Data Engineer (Scala, Spark & Azure)

Location: Trójmiasto
Type of work: Full-time
Experience: Senior
Employment type: B2B
Operating mode: Remote

Tech stack

    Spark: advanced
    JSON: advanced
    SQL: advanced
    Scala: advanced
    Microsoft Azure: regular
    Python: regular

Job description

Online interview

We are seeking a highly skilled Senior Data Engineer with deep expertise in Scala, Spark, and Microsoft Azure to join our dynamic team. This role offers an exciting opportunity to lead data engineering initiatives, optimize complex pipelines, and collaborate with cross-functional teams to deliver high-quality data solutions.


Technical Requirements

7+ years of professional experience in Data Engineering.

6-7 years of hands-on experience with Scala, Spark, and Azure in Data Engineering projects.

Proficient in programming languages such as Python and Scala.

Expertise in data pipeline tools and processes.

Strong knowledge of SQL and both relational and non-relational databases.

Familiarity with GitHub for version control and CI/CD workflows.

Solid understanding of Microsoft Azure data services.

Experience with JSON-based configurations for managing multiple data zones.

Knowledge of containerization and orchestration tools (e.g., Docker, Kubernetes) is a plus.

Strong problem-solving skills, fluent English, and excellent communication abilities.


Main Responsibilities


Data Pipeline Maintenance: Continuously monitor and maintain data pipelines for ingesting and transforming data using Scala and SQL on Spark. Diagnose and resolve errors and performance bottlenecks, addressing data discrepancies, ambiguities, and inconsistencies as needed.
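
For illustration only, a minimal sketch of the kind of Scala-on-Spark ingest-and-cleanse job this responsibility describes; the storage paths and column names (event_id, event_ts) are placeholders, and on Azure Synapse the SparkSession is normally supplied by the runtime rather than built by hand:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object RawToClean {
  def main(args: Array[String]): Unit = {
    // On Azure Synapse the session is usually provided by the Spark runtime;
    // building one here keeps the sketch self-contained.
    val spark = SparkSession.builder().appName("raw-to-clean").getOrCreate()

    // Ingest raw JSON events from a placeholder landing path.
    val raw = spark.read.json("abfss://raw@yourstorage.dfs.core.windows.net/events/")

    // Basic cleansing: drop duplicates, discard rows without a key, normalise the timestamp.
    val cleaned = raw
      .dropDuplicates("event_id")
      .filter(col("event_id").isNotNull)
      .withColumn("event_ts", to_timestamp(col("event_ts")))

    cleaned.write
      .mode("overwrite")
      .parquet("abfss://clean@yourstorage.dfs.core.windows.net/events/")

    spark.stop()
  }
}
```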


Technical Support and Version Control: Provide technical support for data analysis while managing source code and configuration artifacts via GitHub. Deploy code artifacts through GitHub Workflows/Actions.


Technical Leadership: Offer hands-on technical guidance and leadership in developing Spark-based data processing applications using Scala, with a focus on Microsoft Azure Synapse Spark Runtime.


Pipeline Optimization: Design and enhance data pipelines to streamline processing across various stages of the Medallion architecture using Azure Synapse Pipelines.
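
As a rough sketch of how the bronze/silver/gold stages of a Medallion layout can be expressed in Spark and Scala (the actual pipelines here are orchestrated with Azure Synapse Pipelines; the table and column names below are invented):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object MedallionStages {
  // Bronze: land the raw files as-is, tagging each row with ingestion metadata.
  def bronze(spark: SparkSession, sourcePath: String): DataFrame =
    spark.read.json(sourcePath).withColumn("ingested_at", current_timestamp())

  // Silver: cleanse and conform the bronze data.
  def silver(bronzeDf: DataFrame): DataFrame =
    bronzeDf.dropDuplicates("order_id").filter(col("order_id").isNotNull)

  // Gold: aggregate into a business-level view ready for analytics.
  def gold(silverDf: DataFrame): DataFrame =
    silverDf.groupBy("customer_id")
      .agg(sum("amount").as("total_amount"), count(lit(1)).as("order_count"))
}
```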


Data Management: Oversee data ingestion processes, enforce data quality checks using tools like DQ, and manage validation and error-handling workflows.
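
The posting does not spell out the DQ tooling, so purely as a generic illustration, a quality gate of this kind can be sketched directly on Spark DataFrames (rule names and columns are hypothetical):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// A hypothetical quality rule: a name plus a function counting violating rows.
final case class QualityRule(name: String, violations: DataFrame => Long)

object QualityChecks {
  // Example rules; the column names are placeholders.
  val rules: Seq[QualityRule] = Seq(
    QualityRule("non_null_id", df => df.filter(col("id").isNull).count()),
    QualityRule("positive_amount", df => df.filter(col("amount") <= 0).count())
  )

  // Run every rule and fail fast, so downstream zones never receive bad data.
  def validate(df: DataFrame): Unit =
    rules.foreach { rule =>
      val bad = rule.violations(df)
      if (bad > 0)
        throw new IllegalStateException(s"Quality rule '${rule.name}' failed for $bad rows")
    }
}
```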


Configuration Management: Develop and manage configuration settings using JSON-based configurations (e.g., ApplicationConfig, TableConfig) for multiple data zones.
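
The exact shape of ApplicationConfig and TableConfig is not given in the posting; as a sketch only, such JSON configurations could be modelled as Scala case classes and decoded with a JSON library such as circe (an assumption, not a stated part of the stack):

```scala
import io.circe.generic.auto._
import io.circe.parser.decode

// Hypothetical shapes; the real ApplicationConfig/TableConfig fields will differ.
final case class TableConfig(name: String, zone: String, format: String, path: String)
final case class ApplicationConfig(environment: String, tables: List[TableConfig])

object ConfigLoader {
  // Decode an ApplicationConfig from a JSON string loaded from storage or the repo.
  def load(json: String): Either[io.circe.Error, ApplicationConfig] =
    decode[ApplicationConfig](json)
}

// Example usage with an inline JSON document:
// ConfigLoader.load("""{"environment":"dev","tables":[{"name":"orders","zone":"silver","format":"delta","path":"/silver/orders"}]}""")
```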


Collaboration: Work closely with data scientists, analysts, and cross-functional teams to ensure smooth integration of data engineering efforts with marketing and business strategies.


Logging and Auditing: Implement and manage logging, auditing, and error-handling practices to maintain data processing integrity, leveraging tools like Azure Log Analytics and KQL queries where applicable.
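
On the application side, audit information is often emitted as structured log lines that Azure Log Analytics can later query with KQL; the record shape below is an assumption for illustration, not the team's actual schema:

```scala
import org.slf4j.LoggerFactory
import java.time.Instant

// A hypothetical audit record for one pipeline step; field names are placeholders.
final case class AuditRecord(pipeline: String, stage: String, rowCount: Long,
                             status: String, at: Instant = Instant.now())

object AuditLog {
  private val logger = LoggerFactory.getLogger("pipeline-audit")

  // Emit one structured, easily queryable line per processing step.
  def record(entry: AuditRecord): Unit =
    logger.info(s"pipeline=${entry.pipeline} stage=${entry.stage} " +
      s"rows=${entry.rowCount} status=${entry.status} at=${entry.at}")
}
```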


Testing and Quality Assurance: Conduct unit testing with tools like ScalaTest and maintain rigorous data quality checks to ensure dependable processing outcomes.
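
A minimal example of the kind of ScalaTest unit test this implies, using a local SparkSession and an invented deduplication transform:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.scalatest.funsuite.AnyFunSuite

class DeduplicateSpec extends AnyFunSuite {

  // A local SparkSession so the test runs without a cluster.
  private val spark = SparkSession.builder()
    .master("local[1]")
    .appName("dedup-test")
    .getOrCreate()

  import spark.implicits._

  // The transformation under test: drop duplicate and null order ids.
  private def deduplicate(df: DataFrame): DataFrame =
    df.dropDuplicates("order_id").filter(col("order_id").isNotNull)

  test("duplicate and null order ids are removed") {
    val input = Seq(("o1", 10.0), ("o1", 10.0), (null, 5.0)).toDF("order_id", "amount")
    assert(deduplicate(input).count() == 1)
  }
}
```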


Required Technical Skills

  • Data Engineering
  • Python, Scala, Spark
  • SQL
  • Microsoft Azure
  • GitHub
  • JSON