Lead Data Engineer (PySpark, Airflow, Azure) – Scalable Data Pipelines
We’re looking for an experienced Lead Data Engineer to design, build, and optimize large-scale data pipelines powering analytics and machine learning workloads. This role is ideal for someone who is hands-on, performance-oriented, and comfortable leading other engineers while owning end-to-end data workflows.
You’ll work on both batch and real-time processing, take ownership of Spark performance tuning, and help enforce best practices around data quality, governance, and reliability.
⸻
Responsibilities
• Design, develop, and optimize scalable data pipelines using Python, PySpark (Apache Spark), and Airflow
• Build and maintain batch and streaming data processing systems on Spark
• Design and manage Airflow DAGs to orchestrate complex, dependency-heavy workflows
• Implement data partitioning, caching, and Spark performance tuning to handle large datasets efficiently
• Ensure data quality, governance, security, and reliability across the data lifecycle
• Monitor, troubleshoot, and optimize data jobs; track SLA adherence and pipeline dependencies
• Manage cloud infrastructure (Azure) for data workloads, including cost optimization
• Implement CI/CD pipelines for data workflows using Git, Docker, and Infrastructure-as-Code tools
• Support analytics and ML use cases by working with structured and unstructured data
• Lead and mentor other data engineers, providing architectural guidance and code reviews
• Promote best practices in coding standards, documentation, and version control
• Collaborate effectively with distributed, remote teams in an Agile environment
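To give a flavor of the dependency-heavy orchestration work above, here is a minimal sketch in plain Python (not Airflow itself) of resolving a task graph into a valid execution order — the kind of ordering Airflow's scheduler derives from a DAG's task dependencies. The pipeline tasks named here are hypothetical.

```python
from collections import deque

def execution_order(deps):
    """Topologically order tasks (Kahn's algorithm).

    deps maps each task name to the set of tasks it depends on.
    """
    # Track the still-unmet upstream dependencies of every task.
    pending = {task: set(upstream) for task, upstream in deps.items()}
    # Tasks with no dependencies are immediately runnable.
    ready = deque(sorted(t for t, up in pending.items() if not up))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        # Completing this task may unblock downstream tasks.
        for downstream, upstream in pending.items():
            if task in upstream:
                upstream.remove(task)
                if not upstream:
                    ready.append(downstream)
    if len(order) != len(deps):
        raise ValueError("cycle detected: not a valid DAG")
    return order

# Hypothetical pipeline: extract -> transform -> quality_check -> load.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "quality_check": {"transform"},
    "load": {"transform", "quality_check"},
}
print(execution_order(pipeline))
# ['extract', 'transform', 'quality_check', 'load']
```

In Airflow the same structure would be expressed with operators and `t1 >> t2` dependency arrows; the point here is only the ordering problem underneath.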
⸻
✅ Requirements
• 8+ years of hands-on experience in Data Engineering
• Strong expertise with Apache Spark / PySpark, including internals such as RDDs, DataFrames, DAG execution, partitioning, shuffles, and caching
• Proven experience building and operating Airflow DAGs (scheduling, dependencies, retries, SLAs)
• Advanced Python and SQL skills with a focus on performance and maintainability
• Solid experience with Azure data and compute infrastructure
• Working knowledge of Docker, Kubernetes, Terraform, and CI/CD best practices
• Strong problem-solving skills and ability to optimize large-scale data processing systems
• Prior experience leading or mentoring engineers
• Comfortable working in Agile/Scrum environments
• Excellent communication skills and ability to collaborate with remote teams
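As context for the partitioning and shuffle internals listed above, here is an illustrative sketch in plain Python (no Spark) of hash partitioning — the mechanism behind a Spark shuffle: rows with the same key land in the same partition, so a per-key aggregation can then run partition-locally. The event rows and field names are hypothetical.

```python
import zlib

def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition by a stable hash of its key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 gives a deterministic hash across runs, unlike hash().
        idx = zlib.crc32(str(row[key]).encode()) % num_partitions
        partitions[idx].append(row)
    return partitions

# Hypothetical event rows keyed by user_id.
events = [
    {"user_id": "a", "amount": 10},
    {"user_id": "b", "amount": 20},
    {"user_id": "a", "amount": 5},
]
parts = hash_partition(events, "user_id", 4)
# All rows for a given user_id are guaranteed to share one partition.
```

Spark applies the same idea (via `HashPartitioner`) at cluster scale, which is why skewed keys and partition counts matter so much for shuffle performance.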
⸻
⭐ Nice to Have
• Experience with streaming frameworks (Spark Structured Streaming, Kafka, Event Hubs)
• Familiarity with data governance, lineage, and observability tools
• Experience supporting ML or advanced analytics pipelines
• Background in cost-efficient Spark optimization at scale
Apply Now