Backend Engineer (Data Processing)
As a Backend Engineer at Shelf, you will focus on building robust backend services for large-scale data processing. We use Python (and Node.js) to create data pipelines and handle data from diverse storage solutions. Your work will center on ensuring data flows efficiently, remains well orchestrated, and can operate seamlessly at scale. You’ll be tackling complex data ingestion, transformation, and orchestration challenges, building the core infrastructure that powers our platform.
But we're not just moving data; we're focused on solving the crucial data quality problems that underpin successful AI initiatives. Shelf is uniquely positioned to address these challenges head-on, as we provide data quality solutions and data enrichment capabilities that are key to building accurate and trustworthy AI systems. We're not simply building a platform; we're building the very foundation for the next generation of AI. This means your work will directly impact the accuracy, reliability, and ultimately the usefulness of AI across the enterprise landscape.
Do you enjoy crafting efficient, testable code and want to be part of the engine behind advanced data processing?
Do you have a passion for building truly robust and accurate systems?
Are you looking for fast professional growth in a very demanding and challenging environment?
If you can answer these three questions confidently with “Yes!”, then this might just be the role for you: a unique opportunity to build products that have a huge impact on real-world AI applications.
Responsibilities
Design, implement, and optimize our distributed ETL pipeline, focusing on background processing logic, data transformation, and scalability.
Develop modular and composable components capable of efficiently processing large-scale data across a diverse range of storage solutions, including S3, RDS/PostgreSQL, Elasticsearch, DynamoDB, data warehouses, and data lakes.
Implement ML model integrations within the data pipeline, working closely with Data Scientists on model deployment and data flow.
Develop clean, maintainable code in Python, adhering to best practices in observability, cost-efficiency, and robust error handling.
Proactively identify and address performance bottlenecks and inefficiencies in current systems, proposing solutions to improve scalability and reliability, while ensuring continuous production stability through thorough testing and monitoring practices.
Share your knowledge, participate in code reviews, and advocate for best practices to advance our backend development standards.
Requirements
Must have:
Over 3 years of professional software engineering experience, including more than 1 year specializing in Python.
Deep understanding of distributed systems, concurrency patterns, and ETL-oriented workflows.
Comfortable working with diverse data stores (SQL and NoSQL), including schema design and performance tuning at scale.
Proven experience building scalable backend applications on either AWS or Azure, including a strong understanding of their respective services for compute, storage, and data processing.
Experience with event-driven architectures, distributed processing techniques, and CQRS.
Ability to write well-structured, testable code with thoughtful abstractions and interfaces.
Strong problem-solving skills, including the ability to troubleshoot performance bottlenecks and legacy code.
Upper-Intermediate or better English skills for technical communication and documentation.
Nice to have:
Experience with cloud-based data lakes and data warehouses is a plus.
Any hands-on experience with NLP, unstructured data processing, Node.js/TypeScript, or RAG pipelines is a significant plus.
Ability to effectively present work both verbally and visually, and create clear, well-structured documentation.
What Shelf Offers:
B2B contract.
Company Stock Options.
Hardware: MacBook Pro.
Modern technical stack. Develop open-source software.
GitHub Copilot subscription.
Access to Claude Code, OpenAI Codex, TypingMind, and MCP Servers.
Why Shelf:
GenAI will be at least a $4 trillion market by 2032, and Shelf provides core infrastructure that enables GenAI to be deployed at scale
Our Leadership Team has the deep knowledge management and AI domain expertise, along with the enterprise SaaS background, to execute this plan
We've been helping our customers prevent knowledge mismanagement since our founding in 2017
We have raised over $60 million in funding and our investors include Tiger Global, Insight Partners, Connecticut Innovations, and others
We have high-velocity growth powered by the most innovative product in our category: 3X growth for 3 years in a row
We now have over 100 employees in multiple U.S. states and European countries, and we have ambitious hiring goals over the next few months
About Shelf
Getting GenAI to work is mission-critical for virtually every modern enterprise. But there’s one thing that consistently holds enterprises back: bad data. Even the best AI strategy will fail without the right Data Strategy. The hardest challenge? Unstructured data, where most of the business’s knowledge is stored.
Poor data quality is the #1 reason GenAI projects never make it into production. That’s the problem we solve.
Shelf is the leading platform for preparing unstructured data for GenAI. Our mission is straightforward: empower humanity with better answers everywhere.
We help some of the best brands like Amazon, Mayo Clinic, AmFam, and Nespresso solve their data issues, enabling their AI to deliver accurate, production-ready results from Day 1.
Simply put, Shelf unlocks AI readiness. We provide the core infrastructure that enables GenAI to be deployed at scale. We help companies deliver more accurate GenAI answers by eliminating bad data in documents and files before they go into an LLM and create bad answers.
We’re not doing this alone. Shelf is proud to partner with Microsoft, Salesforce, Snowflake, Databricks, OpenAI, and other global leaders who are bringing GenAI to the enterprise.
Explore our tech stack → stackshare.io/shelf/shelf
Browse our open-source projects → github.com/shelfio
And there is more. Shelf is an AI-first company in everything we do.
We use the most advanced LLMs and tools available (OpenAI, Claude, Codex, TypingMind, and MCP Servers, as well as AI agents) to power almost every part of our business internally.
Working at Shelf means you are exposed to the absolute cutting edge of what the AI world has to offer today.