Role Responsibilities:
- As part of the OneData Federated Delivery team working in the Firm’s Secure Foundations Tech Ecosystem pillar, you will build and maintain data pipelines from various sources.
- Your work will help our leadership understand and act on its strategic outcomes, and will support product and service teams in focusing on the right user sets and measuring the impact of GenAI productivity initiatives.
- You will source raw data from Jira, GitHub, AWS, ServiceNow, and other cloud infrastructure, then process, clean, join, and model it so it is ready for analysis by your team in a Snowflake data warehouse (DWH).
- You will be part of a nimble data team that includes data engineering and data analytics colleagues. You will collaborate closely to ensure the data is fit for purpose per requirements and delivers value to the business.
- You will follow and reinforce data engineering best practices, such as CI/CD, observability, and security.
Skills:
Primary Skills:
- New development, enhancement, defect resolution, and production support of ETL pipelines built on AWS-native services.
- Integration of data sets using AWS services such as Glue and Lambda functions.
- Utilization of AWS SNS to send emails and alerts.
- Authoring ETL processes using Python and PySpark.
- ETL process monitoring using CloudWatch Events.
- Connecting to data sources such as S3 and validating data using Athena.
- Proficiency in Agile methodology.
- Experience with dbt and software engineering practices (Terraform, Dev/QA/Prod environments, CI/CD using GitHub Actions, guardrails, security tests).
- Extensive working experience with advanced SQL and a deep understanding of SQL (Snowflake preferred).
Secondary Skills:
- Experience working with Snowflake and an understanding of Snowflake architecture, including concepts like internal and external tables, stages, and masking policies.
Competencies / Experience:
- Deep technical skills in AWS Glue (Crawler, Data Catalog)
- Hands-on experience with Python
- SQL experience
- CloudFormation and Terraform
- CI/CD with GitHub Actions
- dbt modelling
- Good understanding of AWS services like S3, SNS, Secrets Manager, Athena, and Lambda
- Additionally, familiarity with any of the following is highly desirable: Jira, GitHub, Snowflake.
Must-have skills:
- ETL Data pipelines
- AWS SNS/SQS
- Python/PySpark
- CloudWatch/Airflow
- Terraform