ML & LLM Ops Engineer
Sitting between ML teams, LLM product squads, and platform engineering, this role is about building the foundations that make both traditional ML and modern LLM systems dependable at scale. With a mix of infrastructure, tooling, and experimentation workflow design, you’ll help clients move from promising models to robust, observable, and well-governed AI services.
Calimala partners with enterprises across the Gulf and Europe to design, build, and scale Data & AI teams. As an ML & LLM Ops Engineer, you’ll join a network of practitioners who understand how models are built—and what it takes to run them in production—whether that’s a classic forecasting model, a RAG pipeline, or an LLM-based copilot embedded in business workflows.
This role sits at the heart of how ML and LLM solutions are deployed, monitored, and evolved. You’ll design and operate the tooling, pipelines, and platforms that support the full lifecycle—from experimentation and evaluation to deployment, guardrails, monitoring, and retraining.
What you'll be doing
As an ML & LLM Ops Engineer at Calimala, you’ll lead and support engagements where AI is a core part of the solution, and where reliability, safety, and cost all matter. One project might involve setting up a standardized ML platform across teams; another could focus on building and operating LLM-powered services with retrieval, prompt management, and evaluation loops.
“We treat ML and LLM systems as living products: versioned, evaluated, monitored, and improved based on real-world behaviour—not just offline metrics.”
You’ll work closely with ML Engineers, data engineers, platform teams, and security to define how models move from notebooks into production services. You’ll help implement CI/CD for ML and LLM pipelines, build feedback and evaluation loops (including human-in-the-loop where needed), and make sure platform decisions balance speed, control, security, and spend.
Who we're looking for
You’re comfortable working across infrastructure, ML tooling, and LLM-specific stacks. You enjoy designing systems that other teams build on, and you have a strong instinct for automation, observability, and clear standards—especially where AI touches real users and critical processes.
You’ve likely worked in MLOps, platform, or applied AI environments where models had to be auditable and stable, not just impressive in demos. At Calimala, we value depth, accountability, and partnership—you take ownership of the ecosystems you build and help teams use them effectively and safely.
Strong experience with MLOps practices and tooling across the model lifecycle
Proficiency in Python and familiarity with core ML frameworks (e.g. scikit-learn, PyTorch, TensorFlow, XGBoost, or similar)
Hands-on experience deploying and operating LLM-based systems (e.g. RAG pipelines, prompt orchestration, vector stores, guardrails)
Experience with CI/CD and infrastructure-as-code (e.g. GitHub Actions, GitLab CI, Azure DevOps, Jenkins, Terraform)
Solid understanding of containerization and orchestration (e.g. Docker, Kubernetes) for serving models and LLM services in production
Familiarity with monitoring and logging for ML/LLM systems (service health, data/feature drift, model performance, LLM quality and safety signals)
Experience with at least one major cloud platform (Azure, AWS, or GCP) and its data/AI ecosystem; exposure to managed LLM services and/or open-source LLM stacks is a plus
We’re looking for practitioners who see ML & LLM Ops as an enabler for teams: people who build platforms and practices that make it easier, safer, and faster to turn both traditional models and LLMs into dependable, production-grade systems.