Junior AI Data Scientist (Python)
We are automating tax reporting and accounting processes using AI. You will build pipelines that scrape data from complex external sources and extract structured data with high precision from financial documents (PDFs, invoices, tax forms) using Python and LLMs. You will also build models using LLMs, classic ML techniques, OCR solutions, or neural networks to classify financial documents, extract key entities, and validate data against complex accounting rules, turning unstructured files into audit-ready assets.
You will work closely with more senior developers and legal experts, gaining hands-on mentorship while solving real-world challenges in the rapidly evolving TaxTech space.
You will join a tight-knit, fast-moving engineering team in a scaling company, where your code will have an immediate impact on our products.
You will bridge the gap between law and code, working directly with tax and legal experts to translate complex tax regulations into precise, compliant algorithmic logic.
Requirements:
Core (Must-Haves):
- Python Fundamentals: Solid understanding of Python syntax, data structures (lists, dictionaries), and basic Object-Oriented Programming. Ability to independently write scripts for data processing, file handling, or basic scraping.
- Data Handling Basics: Familiarity with Pandas, NumPy, and regular expressions, gained through academic projects, bootcamps, or personal repositories.
- Web Fundamentals: Understanding of web architecture (HTML tags, CSS selectors, JSON structure). Ability to inspect webpages to identify data patterns for extraction.
- Math & Statistics Foundation: Strong grasp of fundamental concepts including Probability (understanding distributions and non-determinism), Descriptive Statistics (mean, median, variance, outliers), and basic Linear Algebra (vectors/matrices).
- Evaluation Techniques: Knowledge of how to measure model performance beyond basic accuracy, including Precision, Recall, F1-Score, and Character Error Rate, and when to use each of them (especially for imbalanced datasets).
- LLM & Prompt Engineering: Hands-on experimentation with LLM APIs (e.g. Gemini, ChatGPT) or local models, with a focus on building functional systems around them.
- Analytical Problem-Solving: A natural inclination for parsing "messy" data (broken PDF layouts, malformed HTML) and implementing clean logic for data recovery.
- English Proficiency (B2+): Required for analyzing international financial documents and technical documentation, and for effective prompting.
Nice to have:
- Commercial Exposure: Previous internships or freelance work in Python, Data Science, or Web Development.
- Database Knowledge: Familiarity with any of the following: SQL, NoSQL, or vector databases (e.g. Qdrant, Milvus).
- Scraping Experience: Experience with Playwright, Selenium, BeautifulSoup or requests.
- Project Portfolio: A GitHub repository showcasing projects involving data extraction, automation bots, or real-world dataset analysis.
- PDF Handling: Experience with PDF processing libraries such as pdfplumber, Docling, or PyMuPDF to extract data from text-based PDF files of varying quality.
- Async & Concurrency: Experience with asynchronous programming (asyncio), multiprocessing, or other forms of concurrency.
- Frontend Basics: Experience with JavaScript or TypeScript (to help build internal GUIs).
- Domain Expertise: Interest or background in Fintech, Regtech, or TaxTech, including knowledge of local regulations (e.g., Polish VAT or KSeF).