Leo Torres

Senior Data Engineer

LLMs & AI • Python • BigQuery • Leipzig/Berlin

About

I'm a Lead Data Scientist at FGS Global, building LLM-powered data pipelines that process 10M+ documents daily for Fortune 500 clients. Based in Leipzig, Germany (commuting to Berlin), I specialize in large-scale AI systems, distributed data processing, and production data infrastructure.

I architect data infrastructure that handles billion-row datasets with sub-second query performance, using BigQuery, Python, and GCP. As technical lead for a team of 6 engineers on our flagship intelligence product, I design its RAG architecture and vector-search layer.

My approach pairs rigorous computer science fundamentals with pragmatic engineering: I build robust pipelines and APIs that serve real production workloads at scale.

Technical Expertise

Core Competencies

Backend systems, data engineering, and ML infrastructure. Experienced in taking projects from prototype to production.

Python • JavaScript/TypeScript • Distributed Systems • Machine Learning • Data Pipelines • API Design • Cloud Architecture • Performance Optimization

Featured Projects

FGS Global — Flagship Intelligence Product

Lead Data Scientist and technical lead for a team of 6 engineers. Designed and built the news media analysis pipeline, which I own end to end: an LLM-powered system that turns 10M+ documents/day into client-facing intelligence for Fortune 500 firms. Architected retrieval infrastructure over billion-row datasets in BigQuery with sub-second query performance.

Design: a RAG and vector-search layer makes domain-specific retrieval accurate enough for production client use; a FastAPI service layer serves multiple internal products from a shared data backend.

Python • FastAPI • BigQuery • GCP • LLMs • RAG • Vector Databases
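To make the retrieval step of a RAG layer concrete, here is a minimal sketch of top-k vector search by cosine similarity, assuming documents have already been embedded as fixed-length vectors. The function names `cosine` and `top_k_cosine`, the dict-based corpus format, and the toy vectors are all hypothetical; this is an illustration of the general technique, not the production implementation.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_cosine(query: Sequence[float], corpus: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    """Return the ids of the k corpus vectors most similar to the query.

    corpus: hypothetical mapping of document id -> embedding vector.
    """
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in corpus.items()]
    # Sort by descending similarity; break ties by id so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real ones would come from an embedding model.
    corpus = {
        "earnings-report": [0.9, 0.1, 0.0],
        "merger-news": [0.7, 0.7, 0.0],
        "sports-recap": [0.0, 0.1, 0.9],
    }
    print(top_k_cosine([1.0, 0.2, 0.0], corpus, k=2))
```

At billion-row scale a brute-force scan like this is replaced by an approximate nearest-neighbour index, but the scoring and ranking logic stays the same.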

hns — Headless Multi-Agent Orchestration (private)

Headless dispatcher that runs Claude Code workers in parallel, driven by a state machine that handles PR triage, CI gating, stuck detection with escalation, orphan recovery, and autonomous worktree dispatch.

Design: written in Python, with abstractions and control loop designed from first principles.

Python • State Machines • Concurrency • Git Worktrees
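The state-machine idea behind a dispatcher like this can be sketched as an explicit transition table. hns is private, so the states, event names, `Worker` class, and `on_timeout` helper below are hypothetical, loosely named after the stages listed above (CI gating, stuck detection with escalation, orphan recovery); this is a generic illustration, not its actual design.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Tuple


class State(Enum):
    QUEUED = auto()
    RUNNING = auto()     # worker active in its own git worktree (hypothetical)
    CI_PENDING = auto()  # PR opened, waiting on the CI gate
    STUCK = auto()       # no progress within the timeout
    ESCALATED = auto()   # handed off to a human
    DONE = auto()


# Allowed transitions: (current state, event) -> next state. All names are illustrative.
TRANSITIONS = {
    (State.QUEUED, "dispatch"): State.RUNNING,
    (State.RUNNING, "pr_opened"): State.CI_PENDING,
    (State.RUNNING, "timeout"): State.STUCK,
    (State.RUNNING, "orphaned"): State.QUEUED,  # requeue tasks whose worker died
    (State.CI_PENDING, "ci_passed"): State.DONE,
    (State.CI_PENDING, "ci_failed"): State.RUNNING,
    (State.STUCK, "retry"): State.RUNNING,
    (State.STUCK, "give_up"): State.ESCALATED,
}


@dataclass
class Worker:
    task: str
    state: State = State.QUEUED
    history: List[Tuple[State, str]] = field(default_factory=list)

    def handle(self, event: str) -> State:
        """Apply one event, rejecting transitions not in the table."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal event {event!r} in state {self.state.name}")
        self.history.append(key)
        self.state = TRANSITIONS[key]
        return self.state


def on_timeout(worker: Worker, retries_left: int) -> State:
    """Stuck detection with escalation: retry while budget remains, else escalate."""
    worker.handle("timeout")
    return worker.handle("retry" if retries_left > 0 else "give_up")


if __name__ == "__main__":
    w = Worker("fix-flaky-test")
    for event in ["dispatch", "pr_opened", "ci_failed", "pr_opened", "ci_passed"]:
        w.handle(event)
    print(w.state.name)  # DONE
```

Keeping every legal move in one table makes illegal transitions fail loudly instead of leaving a worker in an undefined state.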

The Aris Program

Founder of a multi-project open-source organization building web-native infrastructure for academic publishing. Designed RSM (Readable Science Markup), a markup language for scientific documents, with a tree-sitter grammar.

Built Scroll Press, a preprint server (FastAPI, PostgreSQL, HTMX) that accepts submissions in Typst, Quarto, MyST, and Jupyter formats; now live in beta. Architecting a collaborative editor (Vue) as the reference RSM implementation.

Python • FastAPI • PostgreSQL • HTMX • Vue.js • tree-sitter

XGI: Complex Group Interactions

Co-Lead Developer of a Python library for analyzing higher-order networks and hypergraphs, a NumFOCUS-affiliated project with a growing academic user base. Implemented core algorithms, designed the API, and established a comprehensive testing framework.

Technical leadership: OOP design, CI/CD with GitHub Actions, performance optimization with NumPy and Numba.

Python • NumPy • pandas • Numba • pytest • GitHub Actions • OOP

Manim Community

Organization Owner and Core Developer of the community-maintained version of 3Blue1Brown's mathematical animation engine, an open-source Python library for creating precise, programmatic mathematical visualizations and educational content.

Recognition: Featured in GitHub's Popular Python Repositories. Contributions: Algorithm implementations, performance optimizations, documentation improvements, and community support for mathematical animation workflows.

Python • Mathematical Visualization • OpenGL • Cairo • Community Development

COVID-19 Mobility Data Pipeline

Engineered a data pipeline processing mobility data for 300+ US cities during the COVID-19 pandemic. Built ETL workflows with Apache Airflow, implemented data quality checks, and optimized geospatial queries with PostGIS.

Results: Enabled epidemiologists to analyze movement patterns in near real-time, contributing to public health policy decisions.

Python • Airflow • PostGIS • pandas

Industry Experience

Lead Data Scientist → Tech Lead, Data Platform

May 2023 - Present

FGS Global

Designed and built the news media analysis pipeline at FGS, which I own end to end: an LLM-powered system that turns 10M+ documents/day into client-facing intelligence for Fortune 500 firms

Architected retrieval infrastructure over billion-row datasets in BigQuery with sub-second query performance

Designed the RAG and vector-search layer that made domain-specific retrieval accurate enough for production client use

Technical lead for a team of 6 engineers: set architecture, owned technical direction, drove hiring decisions

Built the FastAPI service layer serving multiple internal products from a shared data backend

Research Intern

May 2019 - Jul 2019

Yahoo! Research

Built graph representation learning models on Tumblr social network data

Processed terabyte-scale datasets using PySpark and distributed computing

Developed Python pipelines for large-scale network analysis

Research Programmer

2012 - 2014

Wolfram Research South America

Developed data pipelines for the Wolfram|Alpha knowledge engine

Owned specific data domains end-to-end, including ingestion and quality

Worked in a remote, globally distributed team

Open-Source Maintainership

hns — Headless Multi-Agent Orchestration (private)

2025 - Present

Headless dispatcher for parallel Claude Code workers

State machine for PR triage, CI gating, stuck-detection with escalation, orphan recovery, autonomous worktree dispatch

Written in Python, with abstractions and control loop designed from first principles

Co-Lead Developer — XGI

Aug 2021 - Present

NumFOCUS-affiliated Python library for higher-order networks

Designed public API, core algorithms, and CI/CD; performance work with NumPy and Numba

Library adopted by researchers across academia and industry

Organization Owner & Core Developer — Manim Community

May 2020 - May 2021

Community-maintained fork of 3Blue1Brown's mathematical animation engine

Featured in GitHub's Popular Python Repositories; grew project from fork to active community

Algorithm implementations, performance work, release management, contributor onboarding

Co-Lead Developer — netrd

Jan 2019 - Jul 2019

Library for network reconstruction and comparison (JOSS-published)

Implemented 40+ algorithms; coordinated 12+ contributors; set coding standards

Reviewer — Journal of Open Source Software

Jul 2020 - Present

Peer review for scientific software submissions

Research Engineering & Academia

Postdoctoral Fellow — Mathematics

Aug 2021 - May 2023

Max Planck Institute for Mathematics in the Sciences

Spectral graph theory research applied to complex networks

Implemented high-performance graph mining tools in Python alongside published research

PhD, Network Science

2016 - 2021

Network Science Institute, Northeastern University

Dissertation: Spectral Aspects of Mining Complex Networks

Developed open-source Python libraries used by the research community

Technical Skills

Languages & Frameworks

  • Python: NumPy, pandas, SciPy, PyTorch, FastAPI, Django, Celery
  • JavaScript: Node.js, React, TypeScript, D3.js, Express
  • Systems: C++, Rust (learning), Go (basic)
  • Other: SQL, GraphQL, Shell scripting, LaTeX

Infrastructure & Tools

  • Cloud: AWS (EC2, S3, Lambda, SageMaker), GCP (BigQuery), Azure
  • Databases: PostgreSQL, MongoDB, Redis, Neo4j, TimescaleDB
  • DevOps: Docker, Kubernetes, Terraform, GitHub Actions, CircleCI
  • Monitoring: Prometheus, Grafana, ELK Stack, Datadog

Methodologies & Practices

  • Architecture: Microservices, Event-driven, REST/GraphQL APIs
  • Development: TDD, CI/CD, Code review, Pair programming
  • Data: ETL pipelines, Stream processing, Data modeling
  • ML Ops: Model versioning, A/B testing, Feature stores

Soft Skills

  • Technical leadership and mentoring
  • Cross-functional collaboration
  • Technical documentation and knowledge sharing
  • Remote team coordination (5+ years)