Senior Data Architect
Location: Remote. Must overlap with US Central and EU working hours.
Employment Type: Full-time. No part-time availability. No split focus.
Start: ASAP. Client timeline: ~16 weeks for the Phase 2 MVP, with likely follow-on phases; long-term contract with Entrada.
This is a high-rigor environment. You will work with very senior client engineers and principal architects who expect you to reason at depth about Spark/Databricks internals, orchestration semantics, failure modes, and production SDLC.
What you will own (Phase 2 deliverables)
You will lead architecture + hands-on implementation of a Temporal-based orchestration wrapper that triggers, monitors, and classifies Databricks job runs, including:
1) Temporal infrastructure & deployment
- Help deliver a production-grade Temporal deployment aligned to the client's Hub + Spoke architecture (in coordination with Cloud Engineering)
- Demonstrate deployments/execution in staging workspace
- AWS is the target cloud; identify Azure gaps (don't ignore cross-cloud realities)
2) Multi-environment SDLC
- Support multiple environments (dev/staging/production)
- Integrate with the client's existing internal deployment tooling and namespacing patterns
- Ensure clean promotion paths with appropriate guardrails
3) Production pilot: migrate authentication pipeline
- Migrate authentication token generation + secret-writing pipeline from its current orchestration into Temporal as a high-value, low-risk production pilot
4) Implement the "Sequence Pipeline" pattern in Temporal
- Replicate the current "Sequence Job" pattern using Temporal workflows
- Implement "pick up running child job" to prevent redundant compute costs
- Implement step-level recovery: if Task 5 of 10 fails, keep results from 1–4 and allow resume from 5 (no "restart everything")
- Add audit logging / observability for execution history + outcomes
- Deliver an operational runbook for triage and ongoing operations in Temporal
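The step-level recovery requirement above can be illustrated in plain Python. Temporal provides these semantics natively through durable workflow history and replay; this standalone sketch (with hypothetical task names) only models the behavior: completed step results survive a mid-sequence failure, and a rerun resumes at the failed step instead of restarting everything.

```python
# Illustration of step-level resume: results of completed steps are
# persisted, so a rerun after a failure at step 5 skips steps 1-4.
# Temporal gets this from durable workflow state; this is only a model.

completed: dict[int, str] = {}  # step number -> persisted result

def run_sequence(steps, fail_at=None):
    """Run steps in order, skipping any already-completed step."""
    for i, step in enumerate(steps, start=1):
        if i in completed:
            continue  # result survives from the earlier attempt
        if i == fail_at:
            raise RuntimeError(f"step {i} failed")
        completed[i] = f"result-of-{step}"

steps = [f"task_{i}" for i in range(1, 11)]

try:
    run_sequence(steps, fail_at=5)  # first attempt fails at step 5
except RuntimeError:
    pass

assert len(completed) == 4  # steps 1-4 kept, nothing restarted
run_sequence(steps)         # resume: starts at step 5, not step 1
assert len(completed) == 10
```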
5) Security & permissions model
- Implement a robust permissions pattern so Temporal can trigger and monitor "child" jobs across Databricks workspaces
- Maintain strict logical separation: Temporal is the "control plane," Databricks remains the data/compute plane
6) Reference implementation
- Build a "dummy" reference job sequence as a blueprint for the client's engineers to extend in Phase 3
What is intentionally out of scope (so you can focus)
Phase 2 explicitly defers deeper data-domain workstreams (DLQ enhancements, domain-specific pilots, hybrid compute guardrails, cost attribution) to Phase 3. You are not expected to become the business-domain owner of the client's graph logic; your job is to build a reliable orchestration layer that respects it.
This is not a "PowerPoint architect" role
You will:
- Write production code
- Own failure modes and recovery semantics
- Ship to dev/test/prod with a real SDLC
- Produce runbooks that on-call engineers can actually use
If you prefer advisory-only architecture or you need someone else to "operationalize" your designs, this will not be a fit.
Required qualifications (non-negotiable)
Hands-on architecture + delivery
- 8+ years in data engineering / platform engineering, including 3+ years as a technical lead/architect shipping production systems
- Proven ownership of a system from design → implementation → production rollout → operational handoff
Databricks + Spark depth
- Deep expertise with Databricks (Jobs/Workflows, cluster configs, execution semantics, failure patterns)
- Deep Spark fundamentals: shuffles, partitioning, skew, caching, job planning, and debugging via logs/event timelines
- (The client's engineers operate at this level.)
Durable orchestration / workflow systems
- Strong experience with orchestration frameworks beyond UI-based DAG builders:
- Temporal (preferred), Cadence, AWS Step Functions, Argo Workflows, Airflow at scale with custom state/recovery semantics, etc.
- You must understand: idempotency, deterministic execution, retries vs replays, compensation patterns, state persistence, and workflow versioning
Python + API integration
- Strong production Python (packaging, testing, typing discipline, structured logging)
- Experience integrating with REST APIs / SDKs (Databricks Jobs API patterns, auth, rate-limits, retries)
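The retry/rate-limit expectation above amounts to a backoff wrapper around the HTTP call. A hedged, stdlib-only sketch follows; `call` is a stand-in for a real request to something like the Databricks Jobs API (the fake responses and the retry policy are assumptions for illustration):

```python
import time

# Sketch of retry-with-backoff for a REST call: 429 (rate limit) and
# 5xx responses are retried with exponential delay, 4xx client errors
# fail fast. `call` returns (status_code, body).

def call_with_retries(call, max_attempts=5, base_delay=0.01):
    for attempt in range(max_attempts):
        status, body = call()
        if status < 400:
            return body
        if status == 429 or status >= 500:
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
            continue
        raise RuntimeError(f"client error {status}")  # do not retry 4xx
    raise RuntimeError("exhausted retries")

# Fake endpoint: rate-limited, then a server error, then success.
responses = iter([(429, None), (500, None), (200, {"run_id": 42})])
body = call_with_retries(lambda: next(responses))
assert body == {"run_id": 42}
```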
Cloud + security
- AWS fluency: IAM, networking boundaries, secrets management, KMS, deployment patterns
- Comfortable partnering with Cloud Engineering but able to lead technically (you can't outsource all infra thinking)
Operating model
- Able to be 100% dedicated to this workstream during critical phases (no "50% attention" model)
- Comfortable working across time zones (US Central + Europe overlap)
Preferred qualifications (strongly preferred)
- Temporal in production (or Cadence) with real incident learnings
- Experience implementing "meta-orchestrators" that coordinate other orchestrators/systems
- OpenTelemetry / structured observability patterns (logs + metrics + traces)
- Experience with large "DAG of DAGs" pipelines, long runtimes, expensive failure restarts
- Databricks certifications (or willingness to obtain/renew quickly as part of partner commitments)
How we hire:
Introductory Call (20 min): Short conversation with our Recruiter to discuss your background and expectations.
Deep technical interview (1-1.5 h): Spark/Databricks and orchestration semantics, plus a system design exercise (walk through a durable orchestration wrapper with step-level resume).
Client Interview (45 min - 1 h): required for this engagement.

Entrada AI
Entrada AI is a global Databricks consulting partner that delivers AI and data services focused on industry solutions and business results, empowering customers at every stage of the data and AI journey.