Lead Data Scientist & Full-Stack Developer
I'm a Lead Data Scientist at FGS Global, building LLM-powered data pipelines that process 10M+ news articles daily for Fortune 500 clients. Based in Leipzig, Germany (commuting to Berlin), I specialize in large-scale AI systems, distributed data processing, and full-stack development.
I architect data infrastructure handling billion-row datasets with sub-second query performance using BigQuery, Python, and modern cloud technologies. As tech lead for our flagship internal product, I manage a team of 6 engineers while implementing RAG architectures and vector databases.
My approach combines rigorous computer science fundamentals with practical engineering solutions. Whether it's reducing processing time by 10x through optimized pipelines or building robust APIs that serve multiple products, I deliver measurable impact at scale.
Full-stack development with a focus on backend systems, data engineering, and ML infrastructure. Experienced in taking projects from prototype to production.
Led development of performance-critical extensions to NetworkX, achieving 100x speedup for spectral computations on graphs with millions of nodes. Implemented custom Cython modules and parallelized algorithms using multiprocessing and NumPy vectorization.
Impact: Reduced computation time from hours to minutes for large-scale network analysis, enabling near-real-time analytics for production systems.
Python • Cython • NumPy • SciPy • pytest • GitHub Actions
Architected and built a distributed system for real-time graph analytics, processing 1M+ events/second. Designed a microservices architecture with Apache Kafka for streaming, Redis for caching, and PostgreSQL with TimescaleDB for time-series storage.
Technical highlights: Custom graph database layer, WebSocket API for live updates, React dashboard with D3.js visualizations.
Python • FastAPI • Kafka • Redis • PostgreSQL • Docker • Kubernetes • React
Built an open-source framework for deploying PyTorch models at scale. Features include automatic batching, model versioning, A/B testing capabilities, and comprehensive monitoring. Reduced model deployment time from days to hours.
Used by: 50+ data scientists across 3 organizations, serving 10M+ predictions daily.
PyTorch • TorchServe • Docker • FastAPI • Prometheus • Grafana • AWS
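To illustrate the automatic batching idea mentioned above, here is a minimal sketch. It is not the framework's actual code: the `MicroBatcher` class, its `max_batch_size` parameter, and the `fake_model` stand-in are hypothetical names chosen for this example, and a production version would typically also flush on a timeout so that a partial batch never waits indefinitely.

```python
from typing import Callable, List, Sequence


class MicroBatcher:
    """Hypothetical helper: groups single requests into batches for a model."""

    def __init__(self, predict_batch: Callable[[Sequence[float]], List[float]],
                 max_batch_size: int = 4) -> None:
        self.predict_batch = predict_batch
        self.max_batch_size = max_batch_size
        self._pending: List[float] = []
        self._results: List[float] = []

    def submit(self, item: float) -> None:
        """Queue one request; run the model once the batch is full."""
        self._pending.append(item)
        if len(self._pending) >= self.max_batch_size:
            self.flush()

    def flush(self) -> None:
        """Run the model on whatever is queued, even a partial batch."""
        if self._pending:
            self._results.extend(self.predict_batch(self._pending))
            self._pending = []

    def results(self) -> List[float]:
        return list(self._results)


# Stand-in for a real model: doubles each input and records the batch size.
batch_sizes: List[int] = []


def fake_model(batch: Sequence[float]) -> List[float]:
    batch_sizes.append(len(batch))
    return [x * 2 for x in batch]


batcher = MicroBatcher(fake_model, max_batch_size=4)
for value in range(10):
    batcher.submit(value)
batcher.flush()  # process the final partial batch
```

Ten single requests reach the model as three calls instead of ten, which is where the throughput gain of batching comes from.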
Engineered a data pipeline processing mobility data for 300+ US cities during the COVID-19 pandemic. Built ETL workflows using Apache Airflow, implemented data quality checks, and optimized geospatial queries with PostGIS.
Results: Enabled epidemiologists to analyze movement patterns in near real-time, contributing to public health policy decisions.
Python • Airflow • PostGIS • Pandas • Dask • AWS S3 • Docker
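As a rough idea of what a data quality check on mobility records can look like, here is a minimal sketch. It is not the pipeline's actual code: the record fields (`city`, `lat`, `lon`, `trips`) and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def check_record(record: Dict) -> List[str]:
    """Return a list of problems found in one (hypothetical) mobility record."""
    problems: List[str] = []
    if not record.get("city"):
        problems.append("missing city")
    lat = record.get("lat")
    if not isinstance(lat, (int, float)) or not -90 <= lat <= 90:
        problems.append("lat out of range")
    lon = record.get("lon")
    if not isinstance(lon, (int, float)) or not -180 <= lon <= 180:
        problems.append("lon out of range")
    trips = record.get("trips")
    if not isinstance(trips, int) or trips < 0:
        problems.append("negative trips")
    return problems


def split_valid(records: List[Dict]) -> Tuple[List[Dict], Dict[int, List[str]]]:
    """Separate clean records from rejected ones, keyed by input position."""
    valid: List[Dict] = []
    rejected: Dict[int, List[str]] = {}
    for index, record in enumerate(records):
        problems = check_record(record)
        if problems:
            rejected[index] = problems
        else:
            valid.append(record)
    return valid, rejected


sample = [
    {"city": "Boston", "lat": 42.36, "lon": -71.06, "trips": 120},
    {"city": "", "lat": 95.0, "lon": -71.0, "trips": 5},
    {"city": "Austin", "lat": 30.27, "lon": -97.74, "trips": -1},
]
valid, rejected = split_valid(sample)
```

Keeping the rejected rows together with their reasons, rather than silently dropping them, makes it easier to trace quality problems back to the upstream source.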
Max Planck Institute for Mathematics in the Sciences
• Lead engineer for computational research platform serving 100+ researchers
• Architected distributed computing infrastructure reducing analysis time by 90%
• Built ML pipelines for network analysis processing TB-scale datasets
• Mentored 5 junior developers and established engineering best practices
Network Science Institute, Northeastern University
• Developed production ML systems for graph-based predictions achieving 95% accuracy
• Built data engineering pipelines processing 50M+ graph edges daily
• Created open-source Python libraries with 1000+ GitHub stars
• Optimized critical algorithms achieving 100x performance improvements
Wolfram Research South America
• Implemented computational algorithms for Mathematica's graph theory package
• Developed high-performance C++ extensions for numerical computations
• Contributed to Wolfram Language documentation and examples
• Collaborated with global team across 5 time zones using agile methodologies
Various Projects
• NetworkX: Core contributor; performance optimizations and new algorithms
• netrd: Co-creator, library for network comparison with 500+ stars
• PyTorch Geometric: Contributed spectral methods implementations
• SciPy: Bug fixes and documentation improvements