Revisiting Python support in Apache Flink

A couple of years ago I wrote two posts on the fledgling support for python in Apache Flink. To this day, a pretty significant amount of the people who end up on this site from google, end up here after a search for how to use python and flink together, so I thought it would be a good idea to check back in.

I pulled down the bleeding-edge version of flink, exactly the same as I did last time, and set things up:

git clone https://github.com/apache/flink
cd flink
mvn clean install -DskipTests

And then started the cluster locally with this command (which raised a deprecation warning but worked):

./build-target/bin/start-local.sh

The only change that I had to make to get the examples repo to run (find it here) was to switch from using pyflink3.sh to pyflink.sh. Other than that it all still works like a charm, even after a couple of years of major releases.

So that’s good news and bad news. The good news is that my examples and previous posts are still relevant, but bad news is that the story for production now is about the same it was then. Python support is there but not as rich as Apache Spark for the Dataset (batch) API, but not there for streaming, where Flink really shines.

That may be changing soon though, a couple of months ago Zahir Mizrahi gave a talk at Flink forward about bringing python to the Streaming API. You can check out his slides here, he’s using Jython under the hood which is pretty interesting but has some downsides (especially around the python 3.x story). The PR is still open though, so not quite there yet: https://github.com/apache/flink/pull/3838. Whenever it merges in, if you don’t see a post here, find me on twitter and I’ll put together some examples for the repo.

Until then, if anyone is using python and Flink in production, reach out, I’d like to hear how people are using it.