Dev Tech 2025 · November 18, 2025 · Lithuania
Data Engineer vs Software Engineer: Same Same But Different
Both roles write code. Both deploy to production. Both get paged at 2am. But the day-to-day reality is surprisingly different. This talk is about the DNA-level differences between data engineering and software engineering, the frustrations that come from misunderstanding those differences, and how the two roles can actually collaborate better.
Includes a rant about JSON blobs, because someone had to say it.
The DNA Difference
Software engineers build applications. Data engineers build pipelines. Both write code, both think about architecture, both deal with failures. But the core mental model is different.
A software engineer thinks about user interactions, request-response cycles, and application state. A data engineer thinks about data flow, transformation logic, and historical state. A software engineer's deployment is: the new code is live. A data engineer's deployment is: the new code is live AND all the historical data needs to be consistent with it. Backfills are not optional.
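To make the backfill point concrete, here is a minimal sketch of what "deploying" a changed transformation can involve on the data side. Everything in it is assumed for illustration: the in-memory `raw` dictionary stands in for date-partitioned storage, and `to_cents` is a hypothetical logic change. A real pipeline would read and write partitions through its own orchestration and storage layers.

```python
from datetime import date, timedelta
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


def daily_partitions(start: date, end: date) -> Iterator[date]:
    """Yield every date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def backfill(
    raw: Dict[date, List[Row]],
    transform: Callable[[Row], Row],
    start: date,
    end: date,
) -> Dict[date, List[Row]]:
    """Recompute the output of every historical partition with the new transform."""
    return {
        day: [transform(row) for row in raw.get(day, [])]
        for day in daily_partitions(start, end)
    }


# Hypothetical logic change: amounts are now also stored as integer cents.
def to_cents(row: Row) -> Row:
    return {**row, "amount_cents": round(row["amount"] * 100)}


# Stand-in for historical raw data, keyed by partition date.
raw = {
    date(2025, 1, 1): [{"id": 1, "amount": 9.99}],
    date(2025, 1, 2): [{"id": 2, "amount": 20.0}],
}

# Shipping to_cents is not enough: every old partition is rebuilt as well.
rebuilt = backfill(raw, to_cents, date(2025, 1, 1), date(2025, 1, 3))
```

The partition for January 3 has no raw data, so it comes back empty rather than missing, which keeps downstream consumers from confusing "no rows that day" with "never processed".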
This is not a hierarchy. Neither role is "harder" or "more technical." They are just different problems, and the sooner both sides understand that, the better they work together.
The JSON Blob Rant
The scenario: a software engineer designs an API or a database schema where half the fields are stuffed into a JSON blob. Flexible! Schema-less! Easy to extend! From the application side, this makes sense. You parse the JSON in your code, grab the fields you need, and move on.
From the data engineering side, this is a nightmare. You now have to parse nested JSON in SQL, handle missing keys, deal with type inconsistencies (is that field a string or a number? depends on the day), and somehow build reliable reporting on top of it. Every time someone adds a new field to the blob, your pipeline might break silently.
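As a rough illustration of that defensive work, the sketch below pulls a single numeric field out of a blob. The field names `payload` and `amount` are hypothetical, and the choice to return `None` for anything unusable is an assumption; a production pipeline might instead route bad rows to a quarantine table.

```python
import json
from typing import Optional


def extract_amount(blob: Optional[str]) -> Optional[float]:
    """Pull payload.amount out of a JSON blob, returning None when it is unusable."""
    if not blob:
        return None
    try:
        doc = json.loads(blob)
    except json.JSONDecodeError:
        return None  # the blob is not valid JSON at all

    # Missing keys or unexpected shapes: the blob may not be an object,
    # and "payload" may be absent or not an object either.
    if not isinstance(doc, dict):
        return None
    payload = doc.get("payload")
    if not isinstance(payload, dict):
        return None

    value = payload.get("amount")
    # Type inconsistencies: the same field arrives as a number or a string.
    if isinstance(value, bool):
        return None  # bool is a subclass of int in Python; reject it explicitly
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return None
    return None
```

Roughly twenty lines of guarding for one field is the cost the application side rarely sees, and it repeats for every field that reporting depends on.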
The fix is not "never use JSON." The fix is: talk to your data team before shipping a schema that makes their life miserable. Five minutes of conversation saves weeks of workarounds.
Data Quality Is Everyone's Problem
Software engineers have testing frameworks, type systems, and CI/CD pipelines that catch bugs before they reach production. Data engineers have SQL queries and hope. That is only a slight exaggeration.
Data quality is fundamentally harder than application testing because your input is someone else's output, and you do not control it. Schemas change without warning. Volumes spike unexpectedly. NULL can mean "unknown," "not applicable," or "the upstream system broke," and you often cannot tell which one it is.
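Since the input cannot be controlled, one common response is to check each incoming batch before loading it. The sketch below covers the three failure modes just named. The function name, the thresholds, and the row-as-dictionary representation are all assumptions for illustration, not a reference to any particular data quality tool.

```python
from typing import Any, Dict, List, Set

Row = Dict[str, Any]


def check_batch(
    rows: List[Row],
    expected_columns: Set[str],
    typical_rows: int,
    max_null_rate: float = 0.1,     # assumed threshold
    max_volume_ratio: float = 3.0,  # assumed threshold
) -> List[str]:
    """Return human-readable problems found in one incoming batch."""
    problems: List[str] = []

    # Schema drift: columns that disappeared or appeared without warning.
    seen: Set[str] = set()
    for row in rows:
        seen.update(row.keys())
    missing = sorted(expected_columns - seen)
    unexpected = sorted(seen - expected_columns)
    if missing:
        problems.append(f"missing columns: {missing}")
    if unexpected:
        problems.append(f"unexpected columns: {unexpected}")

    # Volume spike: far more rows than this feed usually delivers.
    if typical_rows > 0 and len(rows) > typical_rows * max_volume_ratio:
        problems.append(f"volume spike: {len(rows)} rows vs typical {typical_rows}")

    # NULL rate: the check cannot say why a value is NULL, only that too many are.
    for column in sorted(expected_columns & seen):
        nulls = sum(1 for row in rows if row.get(column) is None)
        if nulls / len(rows) > max_null_rate:
            problems.append(f"high NULL rate in {column}: {nulls}/{len(rows)}")

    return problems


batch = [
    {"id": 1, "email": None, "plan": "pro"},
    {"id": 2, "email": None, "plan": "free"},
]
problems = check_batch(batch, expected_columns={"id", "email"}, typical_rows=2)
```

For this batch the check reports the unannounced `plan` column and the fully NULL `email` column. It still cannot tell which of the three meanings of NULL applies; that answer has to come from the upstream team.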
Data engineers need to adopt more software engineering practices: automated testing, CI/CD, code review, monitoring. Software engineers need to understand that the data they generate will be used downstream, and that small schema changes can cascade into big problems.
How to Actually Collaborate
Practical advice for teams that have both roles:
- Include data engineers in schema design reviews. Not after the fact. During.
- Treat schema changes as breaking changes. Version them. Communicate them.
- Data engineers: learn to write production-quality code. Not just "it works in a notebook" code.
- Software engineers: learn what happens to your data after it leaves your application. The pipeline is real.
- Shared on-call is a fast way to build empathy. Nothing teaches you about the other role like debugging their system at 3am.
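The advice to treat schema changes as breaking changes can be partly automated. Below is a minimal sketch that compares two versions of a schema and lists the changes likely to break consumers. Representing a schema as a mapping from column name to type name is an assumption, as is treating added columns as non-breaking; a team whose pipelines fail on unknown fields would want to flag additions too.

```python
from typing import Dict, List

Schema = Dict[str, str]  # column name -> type name (assumed representation)


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List removed columns and type changes between two schema versions."""
    changes: List[str] = []
    for column, old_type in old.items():
        if column not in new:
            changes.append(f"removed column: {column}")
        elif new[column] != old_type:
            changes.append(f"type changed: {column} {old_type} -> {new[column]}")
    # Columns that exist only in `new` are treated as additive in this sketch.
    return changes


v1 = {"user_id": "int", "amount": "float", "note": "str"}
v2 = {"user_id": "int", "amount": "str", "coupon": "str"}

changes = breaking_changes(v1, v2)
```

Run in a review or CI step, a non-empty result is a prompt to bump the schema version and tell the data team before the change ships, not after.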
The roles are converging. The best data engineers write good code. The best software engineers think about data flow. The wall between the two is getting thinner, and that is a good thing.
Key takeaways
- Data engineering and software engineering solve different problems with overlapping skills.
- JSON blobs in schemas are convenient for applications and painful for data pipelines. Talk to each other.
- Data quality is harder than application testing because you do not control your inputs.
- Both roles benefit from adopting each other's best practices.
- Collaboration starts with understanding, not tools.
Same same, but different. And that is fine.