AI Data Engineer (Python & LLMs)
We are automating tax reporting and accounting processes using AI. You will build pipelines that scrape data from complex external sources and extract high-precision structured data from financial documents (PDFs, invoices, tax forms) using Python and LLMs.
You will join a tight-knit, fast-moving engineering team in a scaling company, where your code will have an immediate impact on our products.
You will bridge the gap between law and code, working directly with tax and legal experts to transform complex tax regulations into precise, compliant algorithmic solutions.
Requirements:
Core (Must-Haves):
- Python Mastery: 2+ years of professional experience writing clean, maintainable code.
- Advanced Web Scraping: Proficiency with Playwright (preferred) or Selenium, including experience bypassing anti-bot measures (e.g., CAPTCHAs, rate limits, fingerprinting). Some experience with the BeautifulSoup and requests libraries is also required.
- Data Extraction: Strong hands-on experience with Regex and other techniques for transforming messy, real-world data (raw HTML, malformed JSON files, OCR text) into structured formats.
- Data Cleaning: Proficiency with Regex, Pandas, and NumPy for data cleaning and preprocessing.
- Database Fundamentals: Solid understanding of SQL or NoSQL databases.
- LLM Integration: Experience in prompt engineering for LLM APIs, including few-shot prompting, defining output schemas, and handling/parsing responses programmatically.
- PDF Handling: Hands-on experience with libraries such as pdfplumber, docling, or pymupdf to extract data from text-based PDF files of varying quality.
- Mentorship: Ability to perform high-quality code reviews and guide junior developers when necessary.
- Task Delegation: Ability to break down complex architectural features into clear, manageable tasks for junior developers to execute.
- Task Planning: Ability to decompose project stages into weekly sprints, ensuring the team stays unblocked and delivers features in a timely manner.
- English Proficiency (B2+): Ability to analyze international financial documents and technical documentation, and to prompt LLMs effectively.
Preferred:
- Vector Search & RAG: Experience with at least one vector database (e.g., Qdrant, Milvus) and with Retrieval-Augmented Generation workflows.
- Async & Concurrency: Experience with asynchronous programming (asyncio) and concurrency, specifically for efficient scraping.
Nice to have:
- LLM Optimization: Experience with Batch LLM APIs, concurrent requests, and caching strategies to optimize costs and latency.
- PostgreSQL: Specific experience with Postgres.
- Frontend Basics: Familiarity with JavaScript or TypeScript for building internal GUIs.
- Domain Expertise: Background in Fintech, Regtech, or TaxTech, with knowledge of local regulations (e.g., Polish VAT or KSeF).