Senior Data Engineer
I'm a Lead Data Scientist at FGS Global, building LLM-powered data pipelines that process 10M+ documents daily for Fortune 500 clients. Based in Leipzig, Germany (commuting to Berlin), I specialize in large-scale AI systems, distributed data processing, and production data infrastructure.
I architect data infrastructure handling billion-row datasets with sub-second query performance using BigQuery, Python, and modern cloud technologies. As technical lead for a team of 6 engineers on our flagship internal product, I design and implement RAG architectures and vector-search retrieval.
My approach combines rigorous computer science fundamentals with practical engineering solutions. I deliver measurable impact at scale through robust pipelines and APIs that serve real production workloads.
Backend systems, data engineering, and ML infrastructure. Experienced in taking projects from prototype to production.
Lead Data Scientist and technical lead for a team of 6 engineers. Designed, built, and own end-to-end the news media analysis pipeline — an LLM-powered system processing 10M+ documents/day into client-facing intelligence for Fortune 500 firms. Architected retrieval infrastructure over billion-row datasets in BigQuery with sub-second query performance.
Design: The RAG and vector-search layer made domain-specific retrieval accurate enough for production client use; a FastAPI service layer serves multiple internal products from a shared data backend.
Python • FastAPI • BigQuery • GCP • LLMs • RAG • Vector Databases
Headless dispatcher that runs Claude Code workers in parallel through a state machine: PR triage, CI gating, stuck-detection with escalation, orphan recovery, autonomous worktree dispatch.
Design: Python; abstractions and control loop designed from first principles.
Python • State Machines • Concurrency • Git Worktrees
Founder of a multi-project open-source organization building web-native infrastructure for academic publishing. Designed RSM (Readable Science Markup), a markup language for scientific documents, with a tree-sitter grammar.
Built Scroll Press: a preprint server (FastAPI, PostgreSQL, HTMX) accepting Typst, Quarto, MyST, and Jupyter; live in beta. Architecting a collaborative editor (Vue) as the reference RSM implementation.
Python • FastAPI • PostgreSQL • HTMX • Vue.js • tree-sitter
Co-Lead Developer of a Python library for analyzing higher-order networks and hypergraphs. NumFOCUS-affiliated project with a growing academic user base. Implemented core algorithms, designed the API, and established a comprehensive testing framework.
Technical leadership: OOP design, CI/CD with GitHub Actions, performance optimization with NumPy and Numba.
Python • NumPy • pandas • Numba • pytest • GitHub Actions • OOP
Organization Owner and Core Developer of the community-maintained version of 3Blue1Brown's mathematical animation engine, an open-source Python library for creating precise, programmatic mathematical visualizations and educational content.
Recognition: Featured in GitHub's Popular Python Repositories. Contributions: Algorithm implementations, performance optimizations, documentation improvements, and community support for mathematical animation workflows.
Python • Mathematical Visualization • OpenGL • Cairo • Community Development
Engineered a data pipeline processing mobility data for 300+ US cities during the COVID-19 pandemic. Built ETL workflows using Apache Airflow, implemented data quality checks, and optimized geospatial queries with PostGIS.
Results: Enabled epidemiologists to analyze movement patterns in near real-time, contributing to public health policy decisions.
Python • Airflow • pandas
FGS Global
• Designed, built, and own end-to-end the news media analysis pipeline at FGS — an LLM-powered system processing 10M+ documents/day into client-facing intelligence for Fortune 500 firms
• Architected retrieval infrastructure over billion-row datasets in BigQuery with sub-second query performance
• Designed the RAG and vector-search layer that made domain-specific retrieval accurate enough for production client use
• Technical lead for a team of 6 engineers: set architecture, owned technical direction, drove hiring decisions
• Built the FastAPI service layer serving multiple internal products from a shared data backend
Yahoo! Research
• Built graph representation learning models on Tumblr social network data
• Processed terabyte-scale datasets using PySpark and distributed computing
• Developed Python pipelines for large-scale network analysis
Wolfram Research South America
• Developed data pipelines for the Wolfram|Alpha knowledge engine
• Owned specific data domains end-to-end, including ingestion and quality
• Worked in a remote, globally distributed team
Headless dispatcher for parallel Claude Code workers
• State machine for PR triage, CI gating, stuck-detection with escalation, orphan recovery, autonomous worktree dispatch
• Python; abstractions and control loop designed from first principles
NumFOCUS-affiliated Python library for higher-order networks
• Designed public API, core algorithms, and CI/CD; performance work with NumPy and Numba
• Library adopted by researchers across academia and industry
Community-maintained mathematical animation engine (3Blue1Brown)
• Featured in GitHub's Popular Python Repositories; grew project from fork to active community
• Algorithm implementations, performance work, release management, contributor onboarding
Library for network reconstruction and comparison (JOSS-published)
• Implemented 40+ algorithms; coordinated 12+ contributors; set coding standards
Peer review for scientific software submissions
Max Planck Institute for Mathematics in the Sciences
• Spectral graph theory research applied to complex networks
• Implemented high-performance graph mining tools in Python alongside published research
Network Science Institute, Northeastern University
• Dissertation: Spectral Aspects of Mining Complex Networks
• Developed open-source Python libraries used by the research community