Leo Torres

Lead Data Scientist & Full-Stack Developer

LLMs & AI • Python • BigQuery • Leipzig/Berlin

About

I'm a Lead Data Scientist at FGS Global, building LLM-powered data pipelines that process 10M+ news articles daily for Fortune 500 clients. Based in Leipzig, Germany (commuting to Berlin), I specialize in large-scale AI systems, distributed data processing, and full-stack development.

I architect data infrastructure that handles billion-row datasets with sub-second query performance using BigQuery and Python on cloud infrastructure. As tech lead for our flagship internal product, I manage a team of 6 engineers while implementing retrieval-augmented generation (RAG) architectures and vector databases.

My approach combines rigorous computer science fundamentals with practical engineering solutions. Whether it's reducing processing time by 10x through optimized pipelines or building robust APIs that serve multiple products, I deliver measurable impact at scale.

Technical Expertise

Core Competencies

Full-stack development with a focus on backend systems, data engineering, and ML infrastructure. Experienced in taking projects from prototype to production.

Python • JavaScript/TypeScript • Distributed Systems • Machine Learning • Data Pipelines • API Design • Cloud Architecture • Performance Optimization

Featured Projects

NetworkX Performance Optimization Suite

Led development of performance-critical extensions to NetworkX, achieving a 100x speedup for spectral computations on graphs with millions of nodes. Implemented custom Cython modules and parallelized algorithms using multiprocessing and NumPy vectorization.

Impact: Reduced computation time from hours to minutes for large-scale network analysis, enabling real-time analytics for production systems.

Python • Cython • NumPy • SciPy • pytest • GitHub Actions

GitHub • Documentation • Benchmarks

Real-time Graph Analytics Platform

Architected and built a distributed system for real-time graph analytics, processing 1M+ events/second. Designed a microservices architecture with Apache Kafka for streaming, Redis for caching, and PostgreSQL with TimescaleDB for time-series storage.

Technical highlights: Custom graph database layer, WebSocket API for live updates, React dashboard with D3.js visualizations.

Python • FastAPI • Kafka • Redis • PostgreSQL • Docker • Kubernetes • React

Architecture Diagram • Case Study

ML Model Deployment Framework

Built an open-source framework for deploying PyTorch models at scale. Features include automatic batching, model versioning, A/B testing capabilities, and comprehensive monitoring. Reduced model deployment time from days to hours.

Used by: 50+ data scientists across 3 organizations, serving 10M+ predictions daily.

PyTorch • TorchServe • Docker • FastAPI • Prometheus • Grafana • AWS

GitHub • PyPI • Tutorial

COVID-19 Mobility Data Pipeline

Engineered a data pipeline processing mobility data for 300+ US cities during the COVID-19 pandemic. Built ETL workflows using Apache Airflow, implemented data quality checks, and optimized geospatial queries with PostGIS.

Results: Enabled epidemiologists to analyze movement patterns in near real-time, contributing to public health policy decisions.

Python • Airflow • PostGIS • Pandas • Dask • AWS S3 • Docker

Technical Report • Dashboard

Professional Experience

Senior Software Engineer

2021 - Present

Max Planck Institute for Mathematics in the Sciences

• Lead engineer for computational research platform serving 100+ researchers

• Architected distributed computing infrastructure reducing analysis time by 90%

• Built ML pipelines for network analysis processing TB-scale datasets

• Mentored 5 junior developers and established engineering best practices

ML Engineer / Data Scientist

2017 - 2021

Network Science Institute, Northeastern University

• Developed production ML systems for graph-based predictions achieving 95% accuracy

• Built data engineering pipelines processing 50M+ graph edges daily

• Created open-source Python libraries with 1000+ GitHub stars

• Optimized critical algorithms achieving 100x performance improvements

Research Programmer

2010 - 2017

Wolfram Research South America

• Implemented computational algorithms for Mathematica's graph theory package

• Developed high-performance C++ extensions for numerical computations

• Contributed to Wolfram Language documentation and examples

• Collaborated with global team across 5 time zones using agile methodologies

Open Source Contributions

2015 - Present

Various Projects

• NetworkX: Core contributor, performance optimizations and new algorithms

• netrd: Co-creator, library for network comparison with 500+ stars

• PyTorch Geometric: Contributed spectral methods implementations

• SciPy: Bug fixes and documentation improvements

Technical Skills

Languages & Frameworks

Python: NumPy, Pandas, SciPy, PyTorch, FastAPI, Django, Celery
JavaScript: Node.js, React, TypeScript, D3.js, Express
Systems: C++, Rust (learning), Go (basic)
Other: SQL, GraphQL, Shell scripting, LaTeX

Infrastructure & Tools

Cloud: AWS (EC2, S3, Lambda, SageMaker), GCP, Azure
Databases: PostgreSQL, MongoDB, Redis, Neo4j, TimescaleDB
DevOps: Docker, Kubernetes, Terraform, GitHub Actions, CircleCI
Monitoring: Prometheus, Grafana, ELK Stack, Datadog

Methodologies & Practices

Architecture: Microservices, Event-driven, REST/GraphQL APIs
Development: TDD, CI/CD, Code review, Pair programming
Data: ETL pipelines, Stream processing, Data modeling
ML Ops: Model versioning, A/B testing, Feature stores

Soft Skills

Technical leadership and mentoring
Cross-functional collaboration
Technical documentation and knowledge sharing
Remote team coordination (5+ years)