Senior Big Data Engineer (Python)
Join our fast‑growing team to build a unified platform for data analytics, machine learning, and generative AI. We are building a central hub that turns raw events into reliable features, insights, and user‑facing analytics at scale. As a Senior Big Data Engineer, you will integrate an advanced AI/ML toolkit with real‑time streaming into a production-grade feature store and high-performance dashboards, supporting both classic ML and cutting-edge GenAI use cases.
Essential functions
Pipeline Engineering: Design and build "exactly‑once" streaming data pipelines that carry data from event sources to low‑latency feature serving and OLAP queries.
Feature Store Ownership: Stand up and optimize a production feature store (handling schemas, SCD, point‑in‑time correctness, TTL, and backfills).
Tooling Development: Develop reusable libraries, SDKs, and CLIs for data ingestion, feature engineering, and model deployment.
Operational Analytics: Build and operationalize Apache Superset dashboards for monitoring data quality, feature drift, and pipeline health.
Performance Tuning: Benchmark and tune distributed databases (ClickHouse), including partitioning, indexes, and streaming query patterns.
Reliability & Governance: Implement data contracts, lineage, observability, and cost controls within the platform.
Automation: Drive CI/CD and Infrastructure-as-Code (Terraform) to ensure reproducible and safe release environments.
Qualifications
Experience: 4+ years building production data/ML or streaming systems with high TPS and large data volumes.
Languages: Strong coding skills in Python and at least one JVM language (Java or Scala), plus solid SQL.
Big Data Stack: Hands‑on experience with Kafka, Spark or Flink, and OLAP stores (ideally ClickHouse).
Architecture: Solid grasp of distributed systems fundamentals (partitioning, consistency, idempotency, retries).
Infrastructure: Proficiency with Kubernetes, Docker, and CI/CD pipelines (GitHub Actions/GitLab CI).
Methodology: Proven experience designing feature pipelines with point‑in‑time correctness.
Would be a plus
GenAI: Experience with RAG pipelines, embeddings, and model evaluation at scale.
Data Quality: Familiarity with observability tooling like Great Expectations, Deequ, or Monte Carlo.
Advanced Analytics: Experience developing custom visualizations or plugins for Apache Superset.
Orchestration: Hands-on experience with Airflow and dbt for complex transformations.
Security: Exposure to PII handling, RBAC/ABAC, and secrets management.
We offer
Opportunity to work on bleeding-edge projects
Work with a highly motivated and dedicated team
Competitive salary
Flexible schedule
Benefits package: medical insurance, sports
Corporate social events
Professional development opportunities
Well-equipped office
About us
Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, AI, and advanced analytics services. Fusing technical vision with business acumen, we solve the most pressing technical challenges and enable positive business outcomes for enterprise companies undergoing business transformation. A key differentiator for Grid Dynamics is our 8 years of experience and leadership in enterprise AI, supported by profound expertise and ongoing investment in data, analytics, cloud & DevOps, application modernization and customer experience. Founded in 2006, Grid Dynamics is headquartered in Silicon Valley with offices across the Americas, Europe, and India.