PyCon Lithuania 2023 · May 2023 · Vilnius

Is It the End for Apache Airflow?

Every year someone writes a blog post declaring Airflow dead. New orchestrators keep appearing: Dagster, Mage, Prefect, and whatever launched last week. So I did what any reasonable person would do: I compared them with actual benchmarks, real usage, and honest opinions from someone who has been running Airflow in production for years.

Spoiler: Airflow is not dead. But the competition is getting interesting.

The Contestants

Three orchestrators: Apache Airflow (the incumbent), Dagster (the challenger with a different philosophy), and Mage (the newer kid on the block). Each gets a fair look at what it does well and where it falls short.

Airflow is the one everyone knows. Battle-tested, massive community, more operators and integrations than you can count. Also: a scheduler that can be temperamental, a UI that shows its age, and deployment patterns that range from "works great" to "why is everything on fire?"

Dagster takes the software-defined asset approach. Define your data as code, with type checking, testability, and lineage built in. The developer experience is genuinely good. The trade-off: smaller community, fewer production war stories, and a learning curve if you are coming from Airflow.
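To make the asset idea concrete, here is a minimal standard-library sketch of "define your data as code". It is a toy and does not use Dagster's API: the ASSETS registry, the asset decorator, and materialize_all are hypothetical names invented for illustration. Each function describes one piece of data, and its parameter names declare which upstream data it needs, so the dependency graph (the lineage) falls out of the code itself.

```python
import inspect
from graphlib import TopologicalSorter

# Hypothetical registry for this sketch: asset name -> function that produces it.
ASSETS = {}


def asset(fn):
    """Register fn as an asset; its parameter names are its upstream assets."""
    ASSETS[fn.__name__] = fn
    return fn


@asset
def raw_orders():
    # Placeholder rows standing in for a real extract step.
    return [{"id": 1, "amount": 40}, {"id": 2, "amount": 60}]


@asset
def order_total(raw_orders):
    # Depends on raw_orders simply by naming it as a parameter.
    return sum(row["amount"] for row in raw_orders)


def materialize_all():
    """Build every registered asset in dependency order and return the results."""
    deps = {
        name: set(inspect.signature(fn).parameters)
        for name, fn in ASSETS.items()
    }
    results = {}
    # static_order() yields each asset only after all of its upstream assets.
    for name in TopologicalSorter(deps).static_order():
        kwargs = {dep: results[dep] for dep in deps[name]}
        results[name] = ASSETS[name](**kwargs)
    return results
```

Because each asset is a plain function, it can be unit-tested by calling it with hand-built inputs, which is the testability argument in a nutshell.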

Mage focuses on the data pipeline development experience with a notebook-style interface and built-in orchestration. The most opinionated of the three, which can be a strength or a limitation depending on your use case.

Actual Benchmarks

Task scheduling latency, resource consumption, startup time, and how each tool handles a growing number of DAGs and tasks. The numbers are specific because vague comparisons are useless.
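As a rough illustration of what "task scheduling latency" can mean in practice, the sketch below summarizes the gap between when a task was due and when it actually started. It is an assumption about how one might measure this, not the harness used for the talk: scheduling_latencies and summarize are hypothetical helpers, and the input is a list of (scheduled, started) timestamp pairs that you would have to export from whichever orchestrator you are testing.

```python
from datetime import datetime, timedelta
from statistics import mean, quantiles


def scheduling_latencies(runs):
    """Seconds between when each task was due and when it actually started."""
    return [(started - scheduled).total_seconds() for scheduled, started in runs]


def summarize(latencies):
    """Mean, median, and 95th percentile of a list of latencies (needs >= 2 values)."""
    # quantiles(n=100) returns 99 cut points; index 49 is p50, index 94 is p95.
    cuts = quantiles(latencies, n=100)
    return {"mean": mean(latencies), "p50": cuts[49], "p95": cuts[94]}


if __name__ == "__main__":
    # Made-up timestamps for demonstration only, not measurements from the talk.
    due = datetime(2023, 5, 1, 12, 0, 0)
    runs = [(due, due + timedelta(seconds=s)) for s in (1, 2, 3, 4)]
    print(summarize(scheduling_latencies(runs)))
```

Reporting a percentile alongside the mean matters here because a scheduler that is usually fast but occasionally stalls looks fine on average and bad at p95.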

Airflow's scheduler improved significantly in Airflow 2.x. The serial scheduling bottleneck from 1.x is largely gone. But Airflow still consumes more resources at rest than Dagster and Mage, mainly because of the metadata database overhead and the web server running alongside the scheduler.

Dagster and Mage are lighter at small scale. The question is what happens when you have 500 DAGs with thousands of tasks and real production load. That is where Airflow's maturity shows.

Community Numbers Do Not Lie

GitHub stars, Stack Overflow answers, PyPI downloads, Slack members, conference talks. Airflow wins every metric by a wide margin. That is not a value judgement; it is just what years of being the default gets you.

But keep in mind: Airflow also had years where it was the only real option. Dagster and Mage are growing fast. The gap is narrowing. It is just not narrow yet.

Managed Offerings

Airflow has MWAA (AWS), Cloud Composer (GCP), and Astronomer. All battle-tested, all with real SLAs.

Dagster has Dagster Cloud. Mage has Mage Cloud. Both are newer, both are improving. But if your company requires a managed service with a track record, Airflow currently has more options and more production references.

So, Is Airflow Dead?

No. Not even close. The community is too large, the ecosystem too deep, and the managed offerings too mature. Airflow 2.x addressed the biggest complaints from the 1.x era. It is not perfect, but it is production-proven at a scale that the alternatives have not reached yet.

That said, the competition is real and healthy. Dagster is doing genuinely innovative work on the developer experience and the asset-based model. Mage is pushing the boundary on how approachable pipeline development can be. These tools are making Airflow better by forcing it to improve.

If you are starting fresh today with a small team and no existing infrastructure, Dagster is worth a serious look. If you have an existing Airflow deployment that works, switching for the sake of switching is probably not worth it. Use the right tool for where you are, not where the hype cycle says you should be.

Key takeaways

  • Airflow is not dead. It is the most deployed, most supported, and most battle-tested orchestrator available.
  • Dagster offers a better developer experience and a fundamentally different (asset-based) approach.
  • Mage makes pipeline development more accessible but is the newest of the three.
  • Community size and ecosystem maturity are real advantages, not just vanity metrics.
  • Competition is making all three tools better. That is good for everyone.
  • Choose based on your current situation, not on blog posts declaring things dead.

The best orchestrator is the one your team can run, debug, and maintain in production. Everything else is a conference talk.