MLOps Engineer (Databricks)
At Grape Up, we transform businesses by unlocking the potential of AI and data through innovative software solutions.
We partner with industry leaders in the automotive and aviation industries to build sophisticated Data & Analytics platforms that support production machine learning and AI use cases. Our solutions provide comprehensive capabilities spanning data storage and management, advanced analytics, and machine learning, enabling enterprises to accelerate innovation and make trusted, data-driven decisions.
Responsibilities:
Partner with the Data Science teams to harden experimental code and take it from a sandbox environment into production by applying engineering best practices
Design, implement, and own scalable ML infrastructure and deployment pipelines capable of handling high-volume model training and inference workloads
Build and maintain automated CI/CD pipelines for ML model development, testing, validation, and deployment, integrating with customer platforms and Databricks environments
Define, monitor, and continuously improve KPIs covering model performance, data quality, system reliability, deployment velocity, and operational efficiency
Establish and implement MLOps best practices, including experiment tracking, model versioning, feature stores, and governance (e.g. MLflow, Unity Catalog)
Optimize ML infrastructure for cost efficiency and performance through automated scaling and resource management
Requirements:
Master’s degree in Computer Science, Machine Learning, Data Engineering, or a related field
3+ years of professional experience in MLOps, ML Engineering, or DevOps with a strong focus on production ML systems
Proven experience designing, implementing, and operating production ML systems, preferably on Azure or other cloud platforms (AWS, GCP)
Strong hands-on experience with ML lifecycle and deployment tools such as MLflow, Kubeflow, SageMaker, or Vertex AI
Experience with containerization (Docker) and orchestration (Kubernetes) for ML workloads
Expert proficiency in Python and practical experience with ML frameworks (TensorFlow, PyTorch, scikit-learn)
Experience with Infrastructure as Code tools (Terraform, Pulumi) and CI/CD platforms (GitHub Actions, GitLab CI, Jenkins)
Experience implementing monitoring, observability, and model performance tracking solutions (e.g. Prometheus, Grafana, model monitoring tools)
Experience with workflow orchestration tools (Databricks Workflows, Apache Airflow, or Dagster)
Knowledge of feature stores and data versioning tools (Feast, DVC, or similar)
Strong problem-solving skills and ability to work independently in fast-paced environments
Fluency in English, both written and spoken
Nice to have:
PhD in Computer Science, Data Engineering, AI, or a related field (completed or in progress)
Databricks Certified Machine Learning Professional
Experience with streaming tools such as Apache Kafka or Azure Event Hubs