We are looking for a skilled Big Data Administrator to join our team. The ideal candidate has deep experience with Hadoop, Spark, and Kafka, along with a solid background in troubleshooting, deploying, and managing large-scale big data environments. In this role, you will manage, optimize, and ensure the reliability of the big data infrastructure, including deploying new services, patching hosts, and maintaining the overall health and performance of data systems.
You will work closely with engineering teams to ensure seamless integration of services, maintain uptime, and proactively identify and resolve issues across the big data ecosystem.
- Hadoop, Spark, and Kafka Administration:
  - Manage and support the deployment, configuration, and scaling of Hadoop, Spark, and Kafka clusters in production environments.
  - Administer distributed storage systems and ensure that Hadoop clusters are highly available and perform optimally.
  - Configure, monitor, and troubleshoot issues related to Apache Spark jobs and Kafka streams.
  - Perform regular maintenance of big data platforms and associated components to ensure data integrity and availability.
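To give a flavor of the hands-on work, a routine cluster health check might resemble the Python sketch below. It shells out to the standard `hdfs dfsadmin` and `kafka-topics.sh` tools; the broker address, topic name, and paths are placeholders, not part of our environment.

```python
"""Minimal cluster health-check sketch (broker, topic, and paths are placeholders)."""
import subprocess

def run(cmd):
    """Run a CLI command and return its stdout, raising on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# HDFS: report live/dead datanodes and remaining capacity.
hdfs_report = run(["hdfs", "dfsadmin", "-report"])
print(hdfs_report.splitlines()[0])  # e.g. "Configured Capacity: ..."

# Kafka: confirm a critical topic has fully in-sync replicas.
topic_info = run([
    "kafka-topics.sh", "--describe",
    "--bootstrap-server", "broker1:9092",   # placeholder broker
    "--topic", "events",                    # placeholder topic
])
for line in topic_info.splitlines():
    if "Isr:" in line:
        print(line.strip())
```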
- Deployment & Patching:
  - Lead the deployment of new services and solutions within the big data infrastructure.
  - Ensure the timely installation of patches and updates across hosts and systems to maintain security and performance.
  - Develop and automate deployment pipelines for big data services using tools like Ansible, Terraform, or Jenkins.
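As one illustration, a patch-rollout step might be driven from Python by wrapping `ansible-playbook`; the inventory file, playbook name, and host groups below are hypothetical.

```python
"""Sketch of a staged patch rollout wrapping ansible-playbook (names are hypothetical)."""
import subprocess
import sys

def roll_patch(inventory: str, playbook: str, limit: str) -> int:
    """Apply a playbook to one host group at a time so a bad patch is contained."""
    cmd = [
        "ansible-playbook", "-i", inventory, playbook,
        "--limit", limit,   # restrict the run to one group per pass
        "--diff",           # show what each task changed
    ]
    return subprocess.run(cmd).returncode

if __name__ == "__main__":
    # Hypothetical inventory and playbook; roll broker groups one at a time.
    for group in ("kafka_brokers_a", "kafka_brokers_b"):
        if roll_patch("inventory/prod.ini", "patch-hosts.yml", group) != 0:
            sys.exit(f"patch failed on {group}; halting rollout")
```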
- Troubleshooting & Performance Tuning:
  - Troubleshoot and resolve issues related to the big data infrastructure, including performance bottlenecks, data loss, system outages, and other operational incidents.
  - Monitor the health of Hadoop, Spark, and Kafka systems using appropriate tools (e.g., Cloudera Manager or Grafana) and take corrective action as needed.
  - Conduct performance tuning for data workflows, ensuring efficient and reliable data processing in the environment.
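Tuning often starts at the job level. The PySpark sketch below shows the kind of settings involved; the parallelism and memory values are illustrative examples, not recommendations, and the input/output paths are placeholders.

```python
"""Illustrative Spark tuning sketch; values are examples, not recommendations."""
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("etl-tuning-sketch")
    # Match shuffle parallelism to cluster cores instead of the default of 200.
    .config("spark.sql.shuffle.partitions", "400")
    # Let adaptive query execution coalesce small shuffle partitions (Spark 3.x).
    .config("spark.sql.adaptive.enabled", "true")
    # Raise executor memory if jobs spill to disk during wide shuffles.
    .config("spark.executor.memory", "8g")
    .getOrCreate()
)

df = spark.read.parquet("/data/events")   # placeholder input path
df.groupBy("event_type").count().write.mode("overwrite").parquet("/data/summary")
```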
- Monitoring & Maintenance:
  - Set up and configure monitoring and alerting tools to track system performance, data quality, and usage metrics.
  - Proactively monitor system logs, disk usage, network throughput, and other resources to avoid system failures.
  - Perform regular backup and recovery tasks to ensure the integrity and availability of data held in HDFS and Kafka.
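Much of this monitoring is scripted. A minimal disk-usage watchdog, using only the Python standard library, might look like this; the mount points and threshold are placeholders, and a real version would page through the alerting stack rather than print.

```python
"""Disk-usage watchdog sketch; mount points and threshold are placeholders."""
import shutil

MOUNTS = ["/data1", "/data2", "/var/log"]   # placeholder data and log volumes
THRESHOLD = 0.85                            # alert above 85% used

for mount in MOUNTS:
    usage = shutil.disk_usage(mount)
    used_frac = (usage.total - usage.free) / usage.total
    if used_frac > THRESHOLD:
        # In practice this would fire an alert, not print to stdout.
        print(f"ALERT: {mount} is {used_frac:.0%} full")
```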
- Capacity Planning & Scaling:
  - Plan and manage the capacity and growth of the big data ecosystem, including scaling clusters as data volume increases.
  - Ensure that all systems are optimized for high-performance data processing and can handle large-scale, high-volume workloads.
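Capacity planning is often a back-of-the-envelope projection from observed growth; every figure in this sketch is made up for illustration.

```python
"""Back-of-the-envelope capacity projection; all figures are illustrative."""
current_tb = 320.0          # current HDFS usage (placeholder)
growth_tb_per_month = 18.0  # observed ingest growth (placeholder)
cluster_capacity_tb = 600.0 # usable capacity after replication (placeholder)
target_utilization = 0.75   # plan expansion before the cluster is 75% full

headroom_tb = cluster_capacity_tb * target_utilization - current_tb
months_until_expand = headroom_tb / growth_tb_per_month
print(f"Expand the cluster within ~{months_until_expand:.1f} months")
```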
- Security & Compliance:
  - Implement and maintain appropriate security measures across big data platforms, including user access control, data encryption, and compliance with data protection regulations.
  - Work with security teams to conduct audits and maintain access controls across systems.
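An access audit can be partly automated. The sketch below flags HDFS paths whose `other` entry grants any permission, using the standard `hdfs dfs -getfacl` command; the sensitive directories listed are placeholders.

```python
"""ACL audit sketch: flag HDFS paths readable by all users (paths are placeholders)."""
import subprocess

SENSITIVE_PATHS = ["/data/pii", "/data/finance"]  # placeholder sensitive dirs

for path in SENSITIVE_PATHS:
    acl = subprocess.run(
        ["hdfs", "dfs", "-getfacl", path],
        capture_output=True, text=True, check=True,
    ).stdout
    for line in acl.splitlines():
        entry = line.strip()
        # "other::---" means no access for unprivileged users; anything else is suspect.
        if entry.startswith("other::") and entry != "other::---":
            print(f"REVIEW: {path} grants '{entry}' to all users")
```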
- Documentation & Knowledge Sharing:
  - Maintain thorough documentation on system architecture, deployment procedures, troubleshooting guides, and maintenance protocols.
  - Share knowledge and provide support to other team members, developers, and stakeholders in optimizing big data solutions.
To succeed in this role, you will need:

- Proven experience as a Big Data Administrator or System Administrator with strong expertise in Hadoop, Spark, and Kafka.
- Hands-on experience deploying, managing, and scaling Hadoop clusters, Apache Spark jobs, and Kafka brokers in production environments.
- Strong knowledge of Linux/Unix system administration, with experience in configuring and maintaining data infrastructure.
- Expertise in troubleshooting, resolving issues, and optimizing the performance of big data environments.
- Experience with deployment automation tools (e.g., Terraform, Jenkins).
- Familiarity with data ingestion and ETL processes in a big data context.
- Knowledge of cloud-based environments (e.g., AWS, GCP, Azure) for managing big data clusters.
- Familiarity with monitoring tools such as Ambari and Grafana.
- Strong scripting skills (e.g., Bash or Python) for automation and troubleshooting.
- Solid understanding of distributed computing principles and data storage systems (e.g., HDFS, S3).
- Strong communication skills and the ability to collaborate with engineering, development, and operations teams.