Senior LLMOps & Cloud Infrastructure Engineer (DevOps)
We provide digital transformation and technology consulting services, with a portfolio of solutions both for clients who are new to Salesforce and for large organizations that already rely on its extensive capabilities ☁.
We also offer IT body-leasing and team-leasing services, supplying specialists across a range of fields.
Model: remote
Employment type: full-time
The Mission:
Standard CI/CD isn’t enough for this project. We are building a non-deterministic AI system serving users in areas with poor connectivity. Your mission is to design, deploy, and maintain the Azure cloud infrastructure and the specific LLMOps ecosystem required to run advanced GenAI agents securely, efficiently, and resiliently.
Who You Are & What You’ll Do:
GenAI Stack Master: Hands-on experience setting up infrastructure for modern AI tools, including Langfuse (observability), LangChain/LangGraph environments, and Vector & Knowledge Graph Databases.
Cloud Architecture & Resiliency: Expert in Azure with a proven track record of deploying robust staging, UAT, and production environments. Implement rate-limiting, circuit breakers, and comprehensive alerting to guarantee system stability.
Asynchronous Payload Handling: Experienced in managing edge-device audio payloads (Voice-to-Text) over poor connectivity (e.g., 3G networks) using asynchronous queueing technologies such as Celery or RabbitMQ to prevent backend overload.
Local & Cloud Flexibility: Comfortable deploying local models for testing and sandboxing, as well as managing enterprise-level API endpoints.
Performance & Cost Optimization: Collaborate with software engineers to optimize infrastructure using caching layers, load balancing, and other techniques to meet strict latency targets (<3s for text) while managing cloud compute costs.
Security & Compliance: Implement secure architecture practices to handle sensitive data, including encryption in transit and at rest, role-based access control, and secure API management.
Monitoring & Observability: Set up end-to-end observability and monitoring pipelines for GenAI workloads, including LLM performance tracking, error rates, and system health dashboards.
Automation & CI/CD for AI: Build and maintain automated deployment pipelines for LLM models and supporting services, integrating infrastructure-as-code tools like Terraform, Azure ARM templates, or Bicep.
Collaboration & Agile Practices: Work closely with cross-functional teams (Data Scientists, AI Engineers, Product) to ensure smooth deployment of LLMs and related services.
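The asynchronous ingestion pattern described above (buffering edge-device audio payloads so that slow uploads over poor connectivity never overwhelm the backend) can be sketched framework-agnostically with a bounded queue. In this role the real implementation would sit on Celery or RabbitMQ; the stdlib sketch below only illustrates the backpressure idea, and all names in it are hypothetical.

```python
import asyncio

async def ingest(queue: asyncio.Queue, payloads: list[bytes]) -> None:
    # Edge devices push audio chunks; put() blocks when the queue is
    # full, applying backpressure instead of overloading the backend.
    for payload in payloads:
        await queue.put(payload)
    await queue.put(None)  # sentinel: no more payloads

async def transcribe_worker(queue: asyncio.Queue, results: list[str]) -> None:
    # A single worker drains the queue at the backend's own pace.
    while (payload := await queue.get()) is not None:
        await asyncio.sleep(0)  # stand-in for a real Voice-to-Text call
        results.append(f"transcript:{len(payload)} bytes")

async def main() -> list[str]:
    # A deliberately small buffer: uploads stall rather than pile up.
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)
    results: list[str] = []
    await asyncio.gather(
        ingest(queue, [b"chunk-a", b"chunk-bb", b"chunk-ccc"]),
        transcribe_worker(queue, results),
    )
    return results

if __name__ == "__main__":
    print(asyncio.run(main()))
```

A message broker adds durability and retries on top of this shape, which matters when a 3G connection drops mid-upload.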
Requirements:
Strong experience in Azure cloud services (VMs, AKS, Storage, Functions, Networking)
Hands-on experience with LLMOps and AI deployment pipelines
Knowledge of LangChain, LangGraph, and RAG pipelines
Experience with Vector Databases (e.g., Pinecone, Weaviate) and Knowledge Graphs
Proficiency with asynchronous queueing systems like Celery, RabbitMQ, or similar
Solid understanding of performance optimization, caching, and load balancing
Familiarity with containerization and orchestration (Docker, Kubernetes, AKS)
Knowledge of infrastructure-as-code (Terraform, ARM templates, Bicep) and CI/CD pipelines
Experience with monitoring and observability tools for AI workloads (e.g., Langfuse, Prometheus, Grafana)
Strong understanding of security best practices in cloud environments
Excellent problem-solving skills, proactive attitude, and ability to work independently in a fast-paced environment
Fluent in English
Experience working in Agile/Scrum environments is a plus.
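The caching expectation in the requirements (shaving repeated model calls to hit the <3s latency target while controlling compute cost) boils down to a response cache with a freshness window. A minimal in-memory sketch follows; the class and method names are illustrative, and a production setup would more likely use Redis or a managed cache in front of the model endpoint.

```python
import time
from collections.abc import Callable

class TTLCache:
    """Tiny illustrative response cache with a time-to-live window."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        now = time.monotonic()
        hit = self._store.get(key)
        if hit and now - hit[0] < self.ttl:
            return hit[1]  # fresh cached answer: no model call, no cost
        value = compute()  # slow path: call the expensive model endpoint
        self._store[key] = (now, value)
        return value
```

Keyed on a normalized prompt, a cache like this turns repeated queries into sub-millisecond lookups instead of full LLM round trips.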
We offer:
Remote work model
B2B contract
Benefits package
Daily support from team leaders
Dedicated certification budget
Integration trips/events