AI Agent Data Pipeline Intern
Quick Summary
XPENG is a leading smart technology company at the forefront of innovation, integrating advanced AI and autonomous driving technologies into its vehicles, including electric vehicles (EVs), electric vertical take-off and landing (eVTOL) aircraft, and robotics.
Build pipelines to ingest and organize experiment-related data from team communications, meeting notes, experiment plans, analysis documents, metrics, and evaluation results.
Strong skills in Python, SQL, and data processing. Experience working with structured and unstructured data, including text-heavy sources such as documents, notes, messages, or logs.
Responsibilities
- Build pipelines to ingest and organize experiment-related data from team communications, meeting notes, experiment plans, analysis documents, metrics, and evaluation results.
- Use LLM-based methods to clean noisy unstructured data, extract experiment-relevant information, and convert fragmented discussions into structured records.
- Design data schemas, metadata, and quality checks that make experiment context easier to search, trace, and use in downstream agent workflows.
- Support retrieval and indexing workflows, including semantic search or RAG-style pipelines, so the agent can access relevant experiment context.
- Prepare curated datasets for agent evaluation and, where applicable, LLM fine-tuning or instruction-tuning.
- Work with MLEs and platform engineers to understand experiment workflows, data gaps, and the types of insights most useful for planning and analysis.
- Evaluate whether the agent uses curated experiment data correctly to generate summaries, comparisons, recommendations, and analysis insights.
- Contribute to internal tools, dashboards, or reports that help teams monitor experiment status, outcomes, and trends.
Requirements
- Strong skills in Python, SQL, and data processing.
- Experience working with structured and unstructured data, including text-heavy sources such as documents, notes, messages, or logs.
- Familiarity with data pipelines, ETL workflows, or large-scale data processing.
- Interest in LLM development, LLM evaluation, agentic AI systems, RAG pipelines, semantic retrieval, prompt engineering, or LLM-assisted data processing.
- Familiarity with machine learning workflows, model training, evaluation metrics, or MLOps concepts.
- Strong analytical thinking and attention to data quality, consistency, and reliability.
- Comfort working with ambiguous data sources and collaborating with ML and platform engineers to clarify requirements.
- Previous experience building internal tools, automation scripts, or data quality checks.
What We Provide
- A fun, supportive, and engaging environment.
- Infrastructure and computational resources to support your work.
- Opportunity to work on cutting-edge technologies with top talent in the field.
- Opportunity to make a significant impact on the transportation revolution by advancing autonomous driving.
- Competitive compensation package.
- Snacks, lunches, dinners, and fun activities.
Listing Details
- Posted: May 14, 2026
- First seen: May 17, 2026
- Last seen: May 19, 2026
Posting Health
- Days active: 3
- Repost count: 0
- Trust level: 37%
- Scored at: May 20, 2026