Link Group
Hundreds of IT opportunities are waiting for you. Let’s make it happen! Since 2016, our team of tech enthusiasts has been building exceptional IT teams for Fortune 500 companies and startups worldwide. Join impactful projects in the BFSI, CPG, Industrial, and Life Sciences & Healthcare industries. Work with cutting-edge technologies such as Cloud, Business Intelligence, Data, and SAP. Unlock your potential, grow your skills, and collaborate with top global clients. Ready for your next big career move? Link up with us!
Job Title: ML Ops Engineer
We are seeking an experienced ML Ops Engineer to support the development and deployment of ML infrastructure for an advanced advertising platform delivering sponsored content to millions of media-enabled devices worldwide. The role focuses on designing, implementing, and maintaining reliable infrastructure for deploying, monitoring, and managing machine learning models in production environments.
Responsibilities:
Design and manage infrastructure for ML model deployment, monitoring, and maintenance.
Build and scale microservices to support ML pipelines and real-time operations.
Work with cross-functional teams to integrate ML workflows into production systems.
Develop and maintain CI/CD pipelines and infrastructure as code using modern tools.
Monitor system health and performance using real-time alerting and visualization tools.
Must-Have Qualifications:
Degree in Computer Science or a related field.
At least 2 years of industry experience working with microservices.
Experience with Infrastructure as Code (e.g., Terraform) and cloud platforms (AWS, including SageMaker, Amazon MWAA for Airflow, Step Functions, Lambda, EC2, and EMR).
Proficiency in CI/CD tools such as GitHub Actions and ArgoCD.
Familiarity with ETL workflows, big data and stream processing tools (e.g., Spark, Flink, Kafka), and ML frameworks (e.g., TensorFlow, PyTorch).
Experience with containerization and orchestration tools (Docker, Kubernetes).
Strong scripting/programming skills in Python (Go is a plus).
Familiarity with communication protocols (e.g., gRPC, HTTP/2).
Experience with real-time monitoring and alerting using Prometheus, Grafana, or Amazon QuickSight.
Knowledge of distributed caching systems (e.g., Redis, Aerospike).
Technologies Used:
Python, Go
REST, gRPC
AWS (SageMaker, EC2, EMR, Lambda)
Terraform
Spark, Hadoop
Snowflake, Snowpark
GitHub Actions, ArgoCD
Airflow
Kubernetes
Grafana, Prometheus
TensorFlow, PyTorch
Redis, Aerospike
Preferred Qualifications:
3+ years of experience with low-latency, high-throughput distributed microservices.
Experience designing system architecture for ML platforms.
Familiarity with online testing and rollout strategies (A/B testing, canary releases, blue-green deployments).
Experience with ML serving and inference tooling (e.g., Seldon, Triton, ONNX, TensorRT).
Background in AdTech, recommendation systems, or RTB environments.
Knowledge of additional OOP languages and SQL scripting.
Familiarity with serialization protocols such as Protobuf, FlatBuffers, Cap’n Proto.
Net per hour - B2B