Monday, December 12, 2016

Corporate hackathon

A few months ago, a colleague and I took part in a hackathon organized by my company. The goal was to deploy a service on a Raspberry Pi 3 that could ingest 1 million messages and provide a synthesis of the posted data in JSON format.

It was a chance to get away from routine work and test new technologies. Here are the solutions we thought of:

  • Python + Redis
  • Node.js + Redis
  • Elixir

I discarded Python pretty quickly because it was not fast enough. Node looked promising, and we found cool stuff to generate the synthesis using Redis. We also tried running Node in cluster mode, which gave us the possibility to run 4 node processes. Results were OK on our laptops.
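For readers who have not used it, here is a minimal sketch of what running 4 node processes with the built-in `cluster` module can look like. It is not the hackathon code: the function name `startCluster`, the port and the empty handler are assumptions made for illustration.

```javascript
const cluster = require('cluster');
const http = require('http');

// Hypothetical entry point: worker count, port and handler are assumptions.
function startCluster(workerCount = 4, port = 3000) {
  // isPrimary exists in recent Node releases; older ones call it isMaster.
  const isPrimary = cluster.isPrimary ?? cluster.isMaster;

  if (isPrimary) {
    // The primary process only forks the workers.
    for (let i = 0; i < workerCount; i += 1) {
      cluster.fork();
    }
    return;
  }

  // Each worker has its own event loop and listens on the same port;
  // incoming connections are distributed among the workers.
  http
    .createServer((req, res) => {
      res.writeHead(204); // accept the message, nothing to send back
      res.end();
    })
    .listen(port);
}
```

Calling `startCluster()` at the end of the script would start the primary and its 4 workers. Since each worker is a separate process, any state needed for the synthesis has to live outside the workers, which is presumably one reason to pair this setup with something like Redis.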

We were waiting for the official performance test suite from the organizers (using [gatling.io][gat], hence implemented in Scala). It was buggy for a long time, so we could not rely on it to validate our solution. We were also waiting for the organizers to provide us with an RPI. With the final approaching, I decided to buy one.

The tests on the RPI were a bad surprise: I did not expect such a difference in speed. From 5000 queries per second, we went down to about a third of that on the RPI.

We had another surprise when we received the final version of the official test suite: our synthesis was wrong!

I spent the 2 nights before the final deploying a version based on Node and PostgreSQL to fix this, as I was sure I could compute the synthesis in one SQL query. The request rate dropped below 1000 per second though, which meant we would need more than 15 minutes to ingest 1,000,000 messages. That's pretty long.
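The arithmetic behind that estimate can be sketched as follows, assuming one message per request and a steady rate (the helper name `ingestionMinutes` is made up for the example):

```javascript
// Estimate how many minutes it takes to ingest `messageCount` messages
// at a steady rate, assuming one message per request.
function ingestionMinutes(messageCount, requestsPerSecond) {
  return messageCount / requestsPerSecond / 60;
}

console.log(ingestionMinutes(1e6, 1000).toFixed(1)); // 16.7
console.log(ingestionMinutes(1e6, 5000).toFixed(1)); // 3.3
```

At 1000 requests per second the million messages already take almost 17 minutes, and any rate below that only makes it worse.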

Another flaw: the JSON generated by Node did not format numbers the way the test suite expected. I actually put in an ugly hack at 3 AM to serialize a Double with 2 digits of precision (you'll see it in our code, on the node_pg branch).
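To illustrate the problem: `JSON.stringify` writes a number in its shortest form, so `12.5` never comes out as `12.50`. The sketch below shows one possible workaround, building the JSON text by hand with `toFixed(2)`. It is not the hack from the node_pg branch; the helper name and the field names `sensorType` and `average` are assumptions, and it only handles flat objects with JSON-safe values.

```javascript
// Hypothetical helper: serialize a flat object, writing the fields listed
// in `decimalKeys` with exactly two decimals (unquoted, so still numbers).
function serializeWithTwoDecimals(record, decimalKeys) {
  const fields = Object.entries(record).map(([key, value]) => {
    // Only finite numbers are formatted; NaN or Infinity would not be valid JSON.
    const isDecimal = decimalKeys.includes(key) && Number.isFinite(value);
    const rendered = isDecimal ? value.toFixed(2) : JSON.stringify(value);
    return `${JSON.stringify(key)}:${rendered}`;
  });
  return `{${fields.join(',')}}`;
}

console.log(JSON.stringify({ sensorType: 1, average: 12.5 }));
// {"sensorType":1,"average":12.5}
console.log(serializeWithTwoDecimals({ sensorType: 1, average: 12.5 }, ['average']));
// {"sensorType":1,"average":12.50}
```

The output is still valid JSON, so a parser reads `12.50` back as the number 12.5; only a consumer that compares the raw text sees the difference.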

The winner of the challenge based his implementation on Java, using Undertow as the web server and MapDB for storage. His solution ingested the million messages in less than 3 minutes and 10 million messages in less than 30 minutes, while ours failed miserably after 3 hours of processing! He also had no problem with synthesis generation, as his serialization method corresponded exactly to the test suite's deserialization method.

So we did not do as well as we expected, but we learned a lot:

  • Java is not a thing of the past, and concurrency based on multithreading was appropriate here. It was a great reminder that the relevance of a solution strongly depends on the context.
  • We should have deployed on the RPI sooner, so we would have had time to adapt.
  • PostgreSQL is a beast that is hard to tune if you don't have the time. I don't think it's appropriate on an RPI. It seems hard to limit IO.

By the way, I'm happy with the result, since we did not put that much effort into it and we still won a prize (a Fitbit Blaze that I'm currently trying to sell). Also, I did not have Internet at home at the time, so I did all my tuning offline! It was a great experience and we'll do better next time!