Synthetic Data Engineer (AI Data/Training)
Quick Summary
Design domain-specific synthetic data generation (SDG) pipelines via self-instruct and constitutional prompting. Implement automated quality scoring and de-duplication systems.
Proven experience building large-scale data pipelines (Airflow, Spark, Ray). Deep knowledge of prompt engineering for data generation. Familiarity with dataset distillation and bias mitigation.
We are seeking a Synthetic Data Engineer to design and implement domain-specific synthetic data generation pipelines and to maintain the quality of the data that feeds training loops. Your work will directly support data processing and model training within the organization.
Responsibilities
- Design domain-specific synthetic data generation (SDG) pipelines via self-instruct and constitutional prompting.
- Implement automated quality scoring and de-duplication systems.
- Manage data pipelines that feed directly into SFT and DPO training loops.
Requirements
- Proven experience building large-scale data pipelines (Airflow, Spark, Ray).
- Deep knowledge of prompt engineering for data generation.
- Familiarity with dataset distillation and bias mitigation.
Listing Details
- Posted: April 24, 2026
- First seen: April 24, 2026
- Last seen: May 2, 2026
Posting Health
- Days active: 7
- Repost count: 0
- Trust Level: 35%
- Scored at: May 2, 2026
Web3 and AI talent recruitment agency based in Hong Kong with 700+ placements globally