Experience

Changelog from my journey

ML Engineer · AI Platform

AILY Labs

May 2024 – Present

Barcelona, Spain

  • Maintain and improve the org's shared ML/AI platform on Kubernetes (EKS), eliminating per-team infrastructure duplication and cutting model deployment lead time with shared GitHub Actions CI/CD workflows.
  • Engineered a standardised FastAPI model-serving framework adopted org-wide: factory pattern, request/response middleware, health checks, Datadog APM tracing, and multi-tenant MLflow model loading.
  • Authored and maintain an internal scikit-learn-compatible ML utilities library covering MRMR/SHAP feature selection, Optuna hyperparameter tuning, statistical drift detection, and MLflow lifecycle management.
  • Designed a Knowledge Graph platform from greenfield to production on Neo4j: NLP-driven entity extraction, matching and merging pipelines, GraphRAG, LLM exploration agents, Pydantic data models.
  • Maintain a shared GenAI library (embeddings, LLMs, vector DBs, Langfuse integration) and a unified data access layer enabling 10+ services to share a single tested data surface.
  • Delivered production LLM and agentic systems: hybrid + vector RAG on OpenSearch, real-time ReAct agents via PydanticAI + MCP servers, a unified LLM gateway (OpenAI, Bedrock) with quota management.
  • Productionising a LoRA fine-tuned model on GPU node pools, culminating in a purpose-built autocomplete model served via the vLLM inference engine on Kubernetes.
  • Built an OpenSearch data platform end-to-end: Textract LAYOUT + Anthropic contextual retrieval, retrieval evals, cluster health and indexing pressure monitoring, cluster infra management with Terraform.
  • Designed and shipped an MCP server exposing OpenSearch to LLM agents via a PydanticAI query agent with a semantic index catalog, DSL validator, and inline Bedrock vector injection, giving agentic RAG a safe, validated query surface.
  • Contributed to a pull-based lakehouse query orchestrator (DuckDB/DuckLake over Redis + Kubernetes) for agentic services: pod family sizing, gradient-based proactive scaling, crash-safe inflight recovery, and a tail latency reduction from ~1s to ~50ms.
  • Contributed to a shared semantic layer (metadata and context layer) providing agents and services a unified catalog API over tables, indexes, and skills, with tenant-aware metadata resolution and a dbt/YAML publish pipeline into Postgres.
Kubernetes · FastAPI · MLflow · Neo4j · PydanticAI · RAG · LLMs · OpenTelemetry · Python

Data Scientist · Projects Officer

IMF

Nov 2023 – Apr 2024 · Apr 2025 – Jun 2025

Remote

  • Quantified causal effects of IMF interventions on member-state conflict under a structural causal framework.
  • Built a RAG pipeline over MONA policy documents to accelerate evidence retrieval for economists.
  • Designed a Human-in-the-Loop annotation pipeline (Label Studio + Few-Shot Learning) that reduced manual labelling effort while maintaining research-grade label quality.
RAG · Causal Inference · NLP · Python · Label Studio

Data Analyst

Gameloft

Jul 2023 – Dec 2023

Barcelona, Spain

  • Shipped funnel dashboards tracking CTR, DAU and RPU across titles.
  • Designed A/B testing frameworks for seasonal campaigns that measurably lifted player retention and monetisation metrics.
A/B Testing · SQL · Dashboards · Analytics

Apr 2023 – Jul 2023

Barcelona, Spain

  • Implemented a novel inductive GraphSAGE variant (GNN/GCN/GAT) to learn embeddings from Knowledge Graphs extracted from earnings-call transcripts.
  • Used the embeddings to estimate cumulative abnormal returns in the 30-day post-event window.
GNN · GraphSAGE · Knowledge Graphs · PyTorch · NLP

Data Engineer

IQVIA

Apr 2021 – Aug 2022

Kerala, India

  • Replaced a legacy Informatica pipeline with Spark SQL-based automated report generation, cutting processing time significantly.
  • Built ELT pipelines in Snowflake integrating pharmaceutical sales data with SAP (Star Schema).
  • Implemented GDPR-compliant data anonymisation for HCPs and onboarded new market data models in Reltio MDM.
Snowflake · Spark · Informatica · SQL · Reltio

Education

2022 – 2023

GPA 8.62 / 10

Causal Inference · Bayesian Statistics · Deep Learning · NLP / LLMs · Graph Neural Networks · Probabilistic Inference

2017 – 2021

CGPA 9.13 / 10