
Senior Data Engineer / Data Architect
Category: Data

Location: Kraków
Type of work: Undetermined
Experience: Senior
Employment type: B2B
Operating mode: Remote

Tech stack

    Databricks: advanced
    Spark: advanced
    Azure: advanced
    Python: advanced
    Databases: regular
    Kafka: regular
    Event Hub: regular
    Kubernetes: regular

Job description

Online interview

Description

We are on a mission to make science open so everyone can live healthy lives on a healthy planet.


Who we are

Frontiers is an award-winning open science platform and leading open access scholarly publisher.

We are one of the largest and most cited publishers globally. To date, our 200,000 freely available research articles have received more than 1 billion views and downloads and 2 million citations. Our journals span science, health, humanities and social sciences, engineering, and sustainability. And we continue to expand into new academic disciplines so more researchers can publish open access.

Be part of the publishing revolution and help us transform the way research is published, evaluated, and communicated to the world.

The Role

To empower scientists and radically improve how science is published, evaluated and disseminated to researchers, innovators and the public, we have built our own state-of-the-art Artificial Intelligence Review Assistant (AIRA). Data is at the heart of AIRA in the form of AIRA Knowledge – a rich graph of academic knowledge such as scientific publications, citation relationships between those publications, as well as authors, institutions and fields of research. This serves as the basis of all the AI/ML models used by our reviewer recommendation service and our quality checks.
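
The AIRA Knowledge schema itself is not described here, but as a rough illustration, a graph like this is commonly modeled as node and edge tables. A minimal Spark SQL sketch with hypothetical database, table, and column names:

    # Hypothetical node/edge layout for a citation graph; every name below is
    # illustrative and not the actual AIRA Knowledge schema.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("knowledge-graph-sketch").getOrCreate()
    spark.sql("CREATE DATABASE IF NOT EXISTS graph")

    # Node table: one row per publication.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS graph.publications (
            publication_id STRING,
            doi STRING,
            title STRING,
            field_of_research STRING
        ) USING DELTA
    """)

    # Edge table: one row per citation relationship between publications;
    # authors and institutions would get analogous node and edge tables.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS graph.citations (
            citing_id STRING,
            cited_id STRING
        ) USING DELTA
    """)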

We are now looking for a passionate Senior Data Engineer / Data Architect to join our growing team and help us evolve AIRA Knowledge.


Key Responsibilities

As a Senior Data Engineer, you will be responsible for optimizing, or even re-designing, AIRA Knowledge’s data architecture to support our next generation of product features and data initiatives. You will expand and optimize our data pipeline architecture, as well as data flow and collection for AIRA.

The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimizing existing data systems or building them from the ground up. You will work together with other data engineers, software developers, data analysts, and data scientists on data initiatives, and will ensure that the data delivery architecture remains optimal and consistent across ongoing projects.


  • Design and develop scalable end-to-end processes and pipelines to consume, integrate, and analyze complex data from different data sources;
  • Assemble large, complex data sets that meet functional and non-functional business requirements;
  • Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, and so on;
  • Build the solutions required for optimal extraction, transformation, and loading of data from a wide variety of data sources using Azure Big Data, AI, ML, and Analytics technologies (a sketch of such a pipeline follows this list);
  • Apply a strong engineering mindset to the design and development of automated monitoring, alerting, and self-healing features;
  • Work together with other data engineers, software developers, data analysts, and data scientists to strive for greater functionality in the platform;
  • Proactively identify opportunities to improve data management standards, guidelines, and policies.
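
For illustration, a minimal PySpark sketch of the kind of batch ETL pipeline these responsibilities describe, runnable on Databricks. The storage path, schema, and table names are assumptions for illustration only:

    # Ingest raw publication records from a landing zone, normalize them, and
    # load them into a curated Delta table. All names below are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("publications-etl").getOrCreate()

    # Read newline-delimited JSON from an Azure Data Lake container (hypothetical path).
    raw = spark.read.json("abfss://landing@examplelake.dfs.core.windows.net/publications/")

    # Normalize key fields and drop duplicate records keyed on DOI.
    curated = (
        raw.select(
            F.col("doi"),
            F.trim(F.col("title")).alias("title"),
            F.col("published_at").cast("date").alias("published_at"),
        )
        .dropDuplicates(["doi"])
    )

    # Write as a Delta table so downstream consumers get ACID guarantees.
    curated.write.format("delta").mode("overwrite").saveAsTable("curated.publications")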

About You

  • Demonstrated experience in designing and developing data ingestion, data processing, and analytical pipelines for big data, relational database, NoSQL, and data warehouse solutions.
  • Proven experience with Enterprise Data Platform architecture, event-driven architecture, data streaming, software design patterns, and best practices.
  • Experience building and optimizing ‘big data’ pipelines, architectures, and data sets.
  • Strong analytical skills related to working with unstructured datasets.
  • Experience building processes supporting data transformation, data structures, metadata, dependency management, and workload management.
  • A successful history of manipulating, processing, and extracting value from large, disconnected datasets.
  • Working knowledge of message queuing, stream processing, and highly scalable ‘big data’ data stores is nice to have.
  • Robust analytical, critical-thinking, and creative problem-solving skills.
  • Comfortable in fast-paced environments with simultaneous high-priority tasks.
  • Good written and verbal communication skills, with the ability to clearly articulate ideas to both technical and non-technical audiences (the working language is English).
  • Technologies:
    • Hands-on experience implementing data migration, streaming, and processing using cloud services, preferably Azure or AWS.
    • Advanced experience with analytics platforms: Databricks and Spark.
    • Experience with Azure Data Lake, Azure Delta Lake, Azure Data Factory, Azure Functions, Azure Synapse, Azure SQL, Azure Stream Analytics, Azure Analysis Services, and Azure ML Studio.
    • Experience with relational SQL and NoSQL databases.
    • Knowledge of stream-processing systems: Event Hub, Kafka, and Confluent (see the streaming sketch after this list).
    • Knowledge of object-oriented, functional, and scripting languages such as Python, R, Scala, and C++.
    • Knowledge of reporting tools: Power BI and Tableau.
    • Knowledge of CI/CD tooling and infrastructure: Azure DevOps and Kubernetes.
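
To illustrate the stream-processing side, here is a minimal Spark Structured Streaming sketch that reads from Event Hubs through its Kafka-compatible endpoint (Event Hubs speaks the Kafka protocol on port 9093). The namespace, topic, table name, and connection-string handling are assumptions, and the job needs the Spark Kafka connector, which Databricks bundles:

    # Consume events from Azure Event Hubs via its Kafka endpoint and land them
    # in a raw Delta table. Namespace, topic, and table names are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("citations-stream").getOrCreate()

    stream = (
        spark.readStream.format("kafka")
        # Event Hubs exposes a Kafka-compatible endpoint on port 9093.
        .option("kafka.bootstrap.servers", "example-namespace.servicebus.windows.net:9093")
        .option("subscribe", "citation-events")
        .option("kafka.security.protocol", "SASL_SSL")
        .option("kafka.sasl.mechanism", "PLAIN")
        # Event Hubs authenticates with the literal username "$ConnectionString";
        # in practice the secret would come from a secret scope, not a literal.
        .option(
            "kafka.sasl.jaas.config",
            'org.apache.kafka.common.security.plain.PlainLoginModule required '
            'username="$ConnectionString" password="<event-hubs-connection-string>";',
        )
        .load()
    )

    # Keep the raw payload and arrival time; downstream jobs parse the JSON body.
    query = (
        stream.selectExpr("CAST(value AS STRING) AS payload", "timestamp")
        .writeStream.format("delta")
        .option("checkpointLocation", "/tmp/checkpoints/citation-events")
        .toTable("raw_citation_events")
    )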