bungeetech-talent · ~1d ago
Data Engineer II
Quick Summary
Overview
Job Summary: Building on the foundation of the SDE-I role, the DE-II position takes on a greater level of responsibility and leadership. You'll play a crucial role in driving the evolution and efficiency of our data collection and analytics platform, capable of handling terabyte-scale data and…
Technical Tools
airflow, aws, cassandra, datadog, digitalocean, docker, dynamodb, elasticsearch, gcp, grafana, javascript, jenkins, kafka, kubernetes, new-relic, playwright, prometheus, spark, sql, terraform, ci-cd, distributed-systems, etl, performance-optimization, security-best-practices, streaming-data
Responsibilities
- Lead the design, development, and optimization of large-scale data pipelines and infrastructure using technologies like Apache Airflow, Spark, and Kafka.
- Architect and implement distributed data processing solutions to handle terabyte-scale datasets and billions of records efficiently across multi-region cloud infrastructure (AWS, GCP, DO).
- Develop and maintain real-time data processing solutions for high-volume data collection operations using technologies like Spark Streaming and Kafka.
- Optimize data storage strategies using technologies such as Amazon S3, HDFS, and Parquet/Avro file formats for efficient querying and cost management.
- Build and maintain high-quality ETL pipelines, ensuring robust data collection and transformation processes with a focus on scalability and fault tolerance.
- Collaborate with data analysts, researchers, and cross-functional teams to define and maintain data quality metrics, implement robust data validation, and enforce security best practices.
- Mentor junior engineers (SDE-I) and foster a collaborative, growth-oriented environment.
- Participate in technical discussions, contribute to architectural decisions, and proactively identify improvements for scalability, performance, and cost-efficiency.
- Ensure application performance monitoring (APM) is in place, utilizing tools like Datadog, New Relic, or similar to proactively monitor and optimize system performance, detect bottlenecks, and ensure system health.
- Implement effective data partitioning strategies and indexing for performance optimization in distributed databases such as DynamoDB, Cassandra, or HBase.
- Stay current with advancements in data engineering, orchestration tools, and emerging cloud technologies, continually enhancing the platform's capabilities.
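To make the partitioning responsibility above concrete, here is a minimal, hypothetical sketch (not part of the posting; the function name and parameters are invented for illustration) of the hash-suffix partition-key pattern commonly used to avoid hot partitions in DynamoDB- or Cassandra-style stores:

```python
import hashlib


def partition_key(user_id: str, event_date: str, num_shards: int = 16) -> str:
    """Build a composite partition key that spreads a hot date across shards.

    Prefixing a low-cardinality attribute (the date) with a hash-derived
    shard suffix is a standard write-sharding technique: reads for one
    date fan out over num_shards keys instead of hammering one partition.
    """
    # Derive a stable shard id from the item's natural key.
    shard = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % num_shards
    return f"{event_date}#{shard:02d}"
```

The shard count is a tuning knob: more shards mean better write spread but more parallel queries to reassemble a day's data.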
Requirements
- 4-5+ years of hands-on experience with Apache Airflow and other orchestration tools for managing large-scale workflows and data pipelines.
- Expertise in AWS technologies (Athena, AWS Glue, DynamoDB), as well as Apache Spark, PySpark, SQL, and NoSQL databases.
- Experience in designing and managing distributed data processing systems that scale to terabyte and billion-scale datasets using cloud platforms like AWS, GCP, or Digital Ocean.
- Proficiency in web crawling tooling, including Node.js, HTTP protocols, Puppeteer, Playwright, and Chromium, for large-scale data extraction.
- Experience with monitoring and observability tools such as Grafana, Prometheus, Elasticsearch, and familiarity with monitoring and optimizing resource utilization in distributed systems.
- Strong understanding of infrastructure as code using Terraform, automated CI/CD pipelines with Jenkins, and event-driven architecture with Kafka.
- Experience with data lake architectures and optimizing storage using formats such as Parquet, Avro, or ORC.
- Strong background in optimizing query performance and data processing frameworks (Spark, Flink, or Hadoop) for efficient data processing at scale.
- Knowledge of containerization (Docker, Kubernetes) and orchestration for distributed system deployments.
- Deep experience in designing resilient data systems with a focus on fault tolerance, data replication, and disaster recovery strategies in distributed environments.
- Strong data engineering skills, including ETL pipeline development, stream processing, and distributed systems.
- Excellent problem-solving abilities, with a collaborative mindset and strong communication skills.
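As a hedged illustration of the stream-processing requirement above (again, not part of the posting; names are invented), a tumbling-window aggregation of the kind Spark Structured Streaming or Flink performs at scale can be modeled in a few lines of plain Python:

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds=60):
    """Count events per key within fixed-size, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key) pairs.
    Returns {window_start: {key: count}}, a toy model of the windowed
    aggregation that streaming engines run continuously over Kafka topics.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        # Align each event to the start of its tumbling window.
        window_start = (ts // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return {w: dict(counts) for w, counts in sorted(windows.items())}
```

A real engine adds what this sketch omits: incremental state stores, watermarks for late data, and exactly-once sinks.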
Location & Eligibility
- Location: Chennai, India (on-site at the office)
- Who can apply: India (IN)
Listing Details
- First seen: May 6, 2026
- Last seen: May 7, 2026
Posting Health
- Days active: 0
- Repost count: 0
- Trust level: 42%
- Scored at: May 6, 2026
Signal breakdown: freshness, source trust, content trust, employer trust
External application · ~5 min on bungeetech-talent's site
Please let bungeetech-talent know you found this job on Jobera.