Software Engineer, Data & AI Platform (m/f/x)

Germany · Berlin · full-time · mid

Software Engineer · Software Engineering

We’re Cortea, a Berlin startup transforming audits with AI. Manual, document-heavy audits waste expert time while demand keeps rising. Our AI-powered software and specialized AI agents remove the repetitive work so auditors can focus on judgment.

Backed by top-tier VCs with >10m in funding, and with a working product and paying customers, we’re scaling rapidly.

We value first-principles thinking, speed, trust, and kindness. We build side by side in our Berlin office.

We are looking for an Engineer with strong data engineering and AI systems experience to build the data, evaluation, and observability foundation for production-grade LLM agents used in complex audit workflows.

This role sits at the intersection of backend engineering, data engineering, AI infrastructure, and LLM operations. You will work hands-on in our backend and agent architecture, building the systems that help us evaluate, monitor, debug, optimize, and continuously improve AI agents in production.

This is not a traditional analytics, BI, or dashboarding role. You should expect to write production code, design infrastructure, work inside backend systems, and directly improve the quality, cost, reliability, and performance of LLM-based agents.

Responsibilities

You will help build and operate the technical infrastructure around our AI agents, with a focus on data infrastructure, evaluation, observability, and optimization. Your work will include:

→ Building online and offline evaluation systems for LLM agents, including pipelines that use golden datasets, ground-truth data, human review workflows, and experiment results.
→ Creating automated quality gates so changes to prompts, context, models, or agent logic can be tested before reaching production.
→ Analyzing large volumes of agent traces and executions to identify failure modes, quality regressions, latency issues, reliability gaps, and cost optimization opportunities.
→ Working with columnar data stores and analytical databases such as BigQuery, ClickHouse, or similar technologies.
→ Building reliable data retention and replay mechanisms for long-term analysis of production agent behaviour.
→ Creating observability tooling for trace analysis, experiment monitoring, production dashboards, logging, tracing, and debugging.
→ Working inside our core backend and agent architecture, including building new agents or improving existing agents when needed.

Requirements

You will fit into this role if you:

Have strong Python and/or backend engineering experience.
Have strong SQL skills and are comfortable working with large datasets.
Have deployed and operated systems in the cloud, ideally on GCP.
Have practical experience designing data pipelines, ETL/ELT workflows, event-processing systems, or feedback loops for production data.
Are comfortable working with analytical databases, data warehouses, columnar stores, and high-volume event or trace data.
Understand system design, reliability, observability, monitoring, logging, debugging, and operational trade-offs.
Can work in complex existing systems and quickly build a mental model of how they operate.
Bring senior-level engineering judgment: you can make architectural decisions, communicate trade-offs, and build systems that other engineers can extend.
Are comfortable with ambiguity, able to reason from first principles, and excited to build infrastructure for AI systems that are actively used in production.

Nice-to-haves:

Building infrastructure around LLM-based products or agentic systems, including optimizing LLM usage, context windows, reasoning tokens, or model selection.
Working with production traces from complex distributed systems.
Building internal platforms for engineers, domain experts, or operations teams.
Using workflow orchestration systems such as Temporal or similar.
Familiarity with audit, finance, compliance, or other high-accuracy domains.
Experience in an early-stage startup or fast-moving engineering environment.