Data Quality Tester
Our client, a major player in the Energy industry, is integrating a second-layer SCADA system for renewable assets into a modern Databricks environment. We need a precision-focused Data Quality Tester to act as the gatekeeper of data integrity. You will design and execute automated checks on incoming data streams, ensuring that the transformation logic is sound and the business rules are strictly followed before the data hits the lake.
Details
Start Date: ASAP
Duration: 6 months (with a high probability of extension).
Location: 100% Remote.
Language: English.
Responsibilities
Automated Testing: Develop and run automated data and integration tests within Databricks using Python, PySpark, and SQL.
E2E Pipeline Verification: Conduct end-to-end testing of the data journey, from SCADA through AMQP and into Databricks.
Data Validation: Verify complex transformation logic and ensure data compliance with predefined business rules.
Reporting & Architecture: Use Azure services (Data Lake, Blob Storage, SQL) to capture, store, and prepare test results for analysis.
CI/CD Integration: Work with DevOps teams to bake automated tests into the Azure DevOps pipelines for continuous validation.
Collaboration: Act as the bridge between Data Engineers, SCADA specialists, and Cloud Architects to ensure full traceability and documentation.
Requirements
Must-have:
Technical Stack: Strong proficiency in Python, PySpark, and SQL.
Platform: Hands-on experience with Databricks.
Azure Ecosystem: Familiarity with Azure services for data storage and result preparation.
DevOps: Proven experience with Azure DevOps for test management and CI/CD integration.
Nice-to-have:
Domain Knowledge: Previous experience with SCADA systems or the Energy/Renewables sector.