Data Engineer CV: How to Show Pipeline, Scale, and Reliability in One Page
Data engineering CVs fail in two predictable ways. They list every tool in the modern data stack with no context, or they describe vague outcomes such as `improved data quality` that could mean anything. Both kill your callback rate.
This guide shows the exact structure, bullets, and skills layout for a data engineer CV that survives the ATS and earns the recruiter's attention.
What a data engineering recruiter looks for
In the first 25 seconds, a recruiter or hiring manager scans for:
- The stack: ingestion, storage, transformation, orchestration, BI. Are the named tools current and relevant?
- The scale: rows per day, latency, cost saved, uptime
- The reliability mindset: SLAs, monitoring, on-call, data contracts
- The team context: solo, small team, large platform team
- The output language: business outcomes, not just shipped pipelines
If those signals are not visible in the top half of the first page, you do not get the second pass.
The structure that works
```
Name | Title | Email | Phone | LinkedIn | GitHub | Location
Summary (3 lines)
Skills (categorized, scannable in 5 seconds)
Experience (most recent first, 3 to 5 bullets per role with measurable outcomes)
Projects (1 or 2 only if early-career or career-switching)
Education
Certifications and open source (if relevant)
```
One page if you have under 8 years of experience, two pages above that. Never three.
The summary that lands
Three lines max. Each line earns its place.
Weak: Experienced data engineer with strong skills in big data and cloud, looking for a challenging role.
Strong:
```
Data Engineer with 6 years building batch and streaming pipelines on AWS and GCP. Owned the data platform at a 300-person fintech, processing 4B events per day with 99.95% uptime. Open to senior IC or tech lead roles in B2B SaaS.
```
The second version answers three questions in about 40 words: what you do, the proof of scale, and what you want next.
The skills section format
Categorise. Recruiters scan, they do not read.
```
Skills
Languages: Python (advanced), SQL (advanced), Scala (intermediate), Go (basic)
Ingestion: Kafka, Kinesis, Fivetran, Airbyte, custom CDC
Storage: Snowflake, BigQuery, Redshift, Postgres, Iceberg on S3
Processing: dbt, Spark (Databricks), Flink, Beam
Orchestration: Airflow, Dagster, Prefect
Infra: AWS (Lambda, ECS, RDS, S3), GCP (BigQuery, Dataflow, Pub/Sub), Terraform, Docker
Observability: Datadog, Monte Carlo, Soda, custom dbt tests
BI: Looker, Mode, Metabase
```
Do not list every tool you have ever touched. Pick 25 to 35 that match the kind of roles you want. The rest dilute the signal.
Proficiency labels (advanced, intermediate, basic) on the top two lines only. Beyond that, levels become noise.
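A quick way to check the 25 to 35 guideline is to count the entries in your own skills block. The sketch below is a minimal, hypothetical helper (the name `count_tools` and the parsing rules are assumptions, not a standard tool). It treats a parenthesised group such as `AWS (Lambda, ECS, RDS, S3)` as one entry, which is a judgment call you may want to change.

```python
import re

# The example skills block from this guide, one category per line.
SKILLS = """\
Languages: Python (advanced), SQL (advanced), Scala (intermediate), Go (basic)
Ingestion: Kafka, Kinesis, Fivetran, Airbyte, custom CDC
Storage: Snowflake, BigQuery, Redshift, Postgres, Iceberg on S3
Processing: dbt, Spark (Databricks), Flink, Beam
Orchestration: Airflow, Dagster, Prefect
Infra: AWS (Lambda, ECS, RDS, S3), GCP (BigQuery, Dataflow, Pub/Sub), Terraform, Docker
Observability: Datadog, Monte Carlo, Soda, custom dbt tests
BI: Looker, Mode, Metabase
"""


def count_tools(skills_text):
    """Return {category: number of comma-separated entries} for a skills block."""
    counts = {}
    for line in skills_text.splitlines():
        if ":" not in line:
            continue  # skip headings and blank lines
        category, items = line.split(":", 1)
        # Assumption: drop parenthesised details so "AWS (Lambda, ...)" counts once.
        items = re.sub(r"\([^)]*\)", "", items)
        tools = [t.strip() for t in items.split(",") if t.strip()]
        counts[category.strip()] = len(tools)
    return counts


counts = count_tools(SKILLS)
total = sum(counts.values())
verdict = "within 25-35" if 25 <= total <= 35 else "outside 25-35"
print(f"{total} tools, {verdict}")
```

Run against the example block above, this counts 32 entries, which sits inside the suggested range. If you would rather count each cloud service separately, remove the `re.sub` line and handle the nested commas yourself.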
How to write data engineering bullets
The formula: what you built or owned, the scale, the technical decision, the business outcome.
Weak bullets:
- Built ETL pipelines in Airflow
- Worked with Spark and Kafka
- Responsible for data quality
Strong bullets:
- Migrated 80+ Airflow DAGs from cron to event-triggered Dagster, reducing average pipeline latency from 4h to 35 min and cutting on-call pages by 60% per month
- Designed the streaming layer (Kafka + Flink + Iceberg) that replaced a nightly batch process, enabling real-time fraud scoring and reducing fraud losses by $1.8M annualised
- Built the data contract framework adopted across 14 source teams, dropping schema-break incidents from 9 per quarter to 1
Each bullet shows scope, scale, technical choice, and business impact. The recruiter does not need to interpret what you did.
Bullets by seniority
Junior (0 to 2 years)
Focus on what you owned, even if small. Cite the project.
- Built and shipped the dbt models for the marketing attribution dataset (40 models, 200+ tests), used by the growth team for weekly campaign analysis
- Migrated three batch jobs from Pandas to Spark, cutting daily runtime from 6h to 22 min
Mid (3 to 6 years)
Lead small initiatives, mentor juniors, take on cross-team work.
- Led the migration from Redshift to BigQuery for 1.2 PB of data across 9 source systems, completed in 8 months with zero downtime
- Mentored two junior engineers through their first on-call rotations; both now lead their own incident reviews
Senior and staff (7+ years)
Platform thinking, cross-functional impact, hiring and architecture.
- Designed the company-wide data platform supporting 60 analysts, 12 ML teams, and 4B daily events; reduced average query cost from $0.14 to $0.04 over 18 months
- Hired 5 data engineers across two teams in 14 months; retention at 100% as of last quarter
Projects section: when to include it
Include a projects section if:
- You are early career and want to show concrete work
- You are switching from another field (backend, analytics, ML)
- You contribute to open source data tools
Keep it to 1 or 2 projects with the same bullet structure as experience. Link the GitHub repo. If the project has no link, it carries less weight.
Skip the projects section if you have 5+ years of relevant experience. It pushes real work down the page.
Tools and tech to handle carefully
- Spark vs Pandas: list both if relevant, but make clear which scale you actually used Spark at (10 GB is not Spark territory; 5 TB is)
- `Big data`: avoid the phrase. State the volume
- `Cloud`: name the cloud and the services (`AWS Lambda, S3, RDS`), not just `AWS`
- `ETL/ELT`: pick whichever you actually do. Both is fine if you have done both seriously
- `Machine learning`: only mention if you have actually shipped pipelines feeding production ML, with the framework name
Common data engineer CV mistakes
- Stack inflation. Listing 60 tools signals you are intermediate at most of them. 25 to 35 is the sweet spot.
- Generic outcomes. `Improved data quality` means nothing. `Cut null rate in the orders table from 4.3% to 0.2% over Q3` does.
- No scale anywhere. Volume, latency, cost, uptime: at least three of these should appear in the top two bullets per role.
- Forgotten on-call. Reliability work is half the job. If you have done it, say so.
- Hidden tech leadership. Mentoring, hiring, architecture reviews, RFCs — these belong on the CV at mid-level and above.
- Buzzwords without proof. `Streaming-first`, `cloud-native`, `data mesh`: these need a sentence of evidence each, or they are wallpaper.
The ATS-friendly format checklist
- Single column layout
- Standard fonts: Calibri, Helvetica, Inter, Source Sans, Arial
- No tables, no text boxes, no infographics, no headshot
- File name: `FirstName-LastName-DataEngineer-CV.pdf`
- Export as PDF, not Word, but keep a `.docx` backup
- Test by uploading to Greenhouse or Lever and verifying the parsed output looks correct
Skills section by target role
Streaming-heavy roles
Lead with Kafka, Flink, Kinesis, Spark Structured Streaming. De-emphasise classical ETL.
Analytics platform roles
Lead with dbt, Snowflake or BigQuery, Looker. Show ownership of semantic layer.
Infra-leaning roles
Lead with Terraform, Kubernetes, observability tooling, on-call ownership.
ML-adjacent roles
Feature stores (Feast, Tecton), training data pipelines, vector databases if relevant.
Never ship one CV to all four. Make the top half match the role you are aiming for that week.
Quick QA pass before sending
- [ ] First 6 lines tell a clear story: what you do, scale, what you want next
- [ ] Top 3 hard skills match the job description verbatim
- [ ] Every role has at least one bullet with a number (rows, time saved, cost, uptime)
- [ ] You can defend every tool in your skills section in a 15-minute conversation
- [ ] CV parses cleanly in a free ATS preview (Resume Worded, Jobscan, or upload to a real ATS)
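The "at least one bullet with a number" check in the list above is easy to automate before you send. The sketch below is a rough, hypothetical helper: the role names are invented, the bullets are reused from this guide, and "contains a digit" is only a crude proxy for a measurable outcome (a tool name like `S3` would also pass), so treat a clean result as a prompt to reread, not as proof.

```python
import re

# Hypothetical CV content: role names are made up, bullets come from this guide.
ROLES = {
    "Data Engineer, Example Fintech": [
        "Built the data contract framework adopted across 14 source teams, "
        "dropping schema-break incidents from 9 per quarter to 1",
        "Responsible for data quality",
    ],
    "Junior Data Engineer, Example Startup": [
        "Built ETL pipelines in Airflow",
        "Worked with Spark and Kafka",
    ],
}


def roles_missing_numbers(roles):
    """Return the roles in which no bullet contains a digit."""
    return [
        role
        for role, bullets in roles.items()
        if not any(re.search(r"\d", bullet) for bullet in bullets)
    ]


for role in roles_missing_numbers(ROLES):
    print(f"No quantified bullet found for: {role}")
```

Here only the second role is flagged, because none of its bullets carries a figure. The first role passes on the strength of a single quantified bullet, which matches the checklist's minimum rather than the stronger "scale in the top two bullets" advice earlier in the guide.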
This is what separates a CV that gets the screen call from one that disappears into the pile.