Lead Data Scientist & Full-Stack Developer
I'm a Lead Data Scientist at FGS Global, building LLM-powered data pipelines that process 1M+ news articles daily for Fortune 500 clients. Based in Leipzig, Germany (commuting to Berlin), I specialize in large-scale AI systems, distributed data processing, and full-stack development.
I architect data infrastructure handling billion-row datasets with sub-second query performance using BigQuery, Python, and GCP. As tech lead for our flagship internal product, I manage a team of 6 engineers while implementing RAG architectures and vector databases.
My approach combines rigorous computer science fundamentals with practical engineering solutions. Whether it's reducing processing time by 10x through optimized pipelines or building robust APIs that serve multiple products, I deliver measurable impact at scale.
Full-stack development with a focus on backend systems, data engineering, and ML infrastructure. Experienced in taking projects from prototype to production.
Lead Data Scientist and Tech Lead for the flagship internal product, which processes 1M+ news articles daily for Fortune 500 clients. Built LLM-powered pipelines with a RAG architecture, managing a team of 6 engineers while architecting infrastructure handling 1B+ log entries.
Impact: Reduced processing time by 10x through optimized data pipelines, enabling real-time insights for enterprise clients across multiple industries.
Python • FastAPI • BigQuery • GCP • LLMs • RAG • Vector Databases
Founder and Lead Developer of a next-generation academic publishing platform. Building open-source tools for collaborative scientific writing with real-time version control. Full-stack architecture using Python, FastAPI, Vue.js, and HTMX.
Vision: Revolutionizing how researchers collaborate and publish, launching in 2025 with a focus on transparency and reproducibility.
Python • FastAPI • Vue.js • HTMX • PostgreSQL • Docker • Netlify
Co-Lead Developer of a Python library for analyzing higher-order networks and hypergraphs. NumFOCUS-affiliated project with a growing academic user base. Implemented core algorithms, designed the API, and established a comprehensive testing framework.
Technical leadership: OOP design, CI/CD with GitHub Actions, performance optimization with NumPy and Numba.
Python • NumPy • pandas • Numba • pytest • GitHub Actions • OOP
Organization Owner and Core Developer of the community-maintained version of 3Blue1Brown's mathematical animation engine. Contributing to the open-source Python library that creates precise, programmatic mathematical visualizations and educational content.
Recognition: Featured in GitHub's Popular Python Repositories. Contributions: Algorithm implementations, performance optimizations, documentation improvements, and community support for mathematical animation workflows.
Python • Mathematical Visualization • OpenGL • Cairo • Community Development
Engineered a data pipeline processing mobility data for 300+ US cities during the COVID-19 pandemic. Built ETL workflows using Apache Airflow, implemented data quality checks, and optimized geospatial queries with PostGIS.
Results: Enabled epidemiologists to analyze movement patterns in near real-time, contributing to public health policy decisions.
Python • Airflow • pandas • PostGIS
FGS Global
• Built LLM-powered pipelines processing 1M+ news articles daily for Fortune 500 clients
• Architected data infrastructure handling 1B+ log entries with sub-second query performance
• Tech lead for flagship internal product, managing team of 6 engineers
• Implemented RAG architecture and vector databases for domain-specific information retrieval
Yahoo! Research
• Built graph representation learning models on Tumblr social network data
• Processed terabyte-scale datasets using PySpark and distributed computing
• Developed Python pipelines for large-scale network analysis
Wolfram Research South America
• Developed data pipelines for the Wolfram|Alpha knowledge engine
• Owned specific data domains end-to-end, including ingestion and quality
• Worked in a remote, globally distributed team
NumFOCUS-affiliated Python library for higher-order networks
• Designed public API, core algorithms, and CI/CD; performance work with NumPy and Numba
• Library adopted by researchers across academia and industry
Community-maintained mathematical animation engine (3Blue1Brown)
• Featured in GitHub's Popular Python Repositories; grew project from fork to active community
• Algorithm implementations, performance work, release management, contributor onboarding
Library for network reconstruction and comparison (JOSS-published)
• Implemented 40+ algorithms; coordinated 12+ contributors; set coding standards
Peer review for scientific software submissions
Max Planck Institute for Mathematics in the Sciences
• Spectral graph theory research applied to complex networks
• Implemented high-performance graph mining tools in Python alongside published research
Network Science Institute, Northeastern University
• Dissertation: Spectral Aspects of Mining Complex Networks
• Developed open-source Python libraries used by the research community