Exploring and analyzing medical datasets using machine learning together with other like-minded people and tech enthusiasts? This is exactly what we do at the Health Hackers. We love to explore and enable each other by sharing knowledge and experiences. The most suitable format for doing so is a so-called project group for people who are eager to get a deeper understanding of a specific topic or technology. This gives us the opportunity to meet regularly and work together on specific aspects. So, it was just a matter of time before we started such a regular meeting focused on machine learning in healthcare.
With the focus set, we had to choose the right format for the group, and we quickly concluded that participating in open machine learning competitions would be ideal. That way, we would directly have a dataset to work with and a platform to compete on in a playful way, not only within the group but also with other machine learning enthusiasts. The most popular site hosting such competitions is Kaggle, which is owned by Google, and so the group name was born: Open Health Hacker Kaggle Squad.
For the first challenge, the group leader, Pablo, picked an interesting and exciting one: Histopathologic Cancer Detection. It was about developing a machine learning model to identify metastatic cancer in small image patches taken from whole slide images. With over 220,000 images, the training data set seemed quite interesting, and the deadline at the end of March 2019 gave us the opportunity to try out several approaches.
During the initial meetings, the first internal challenge was to get everyone started and to provide beginners with a suitable starting point. This was accomplished during a great workshop by Pablo (BIG THANKS!). Beyond that, team formation, discussions on approaches and the necessary infrastructure had to be taken care of. After the initial setup phase, the following meetings were mostly about the participants presenting different approaches or methods to each other. Thus, the knowledge exchange was accomplished in a very fun and appealing way. Additionally, to bridge the gap from the technical to the medical domain, there was a meeting with a pathology expert present. The presentation by Dr. Markus Eckstein from the pathology department of the University Hospital Erlangen and the vibrant discussion that followed were definitely one highlight of the whole challenge. He really understood how to break down his expertise for the more technically minded machine learning enthusiasts. Every question was welcome, even if the vocabulary was sometimes missing and all the different kinds of cells were called “bubbles” at the beginning. It was incredibly interesting to follow his thoughts on how a human expert decides which images are cancerous and which aren’t, particularly in the tricky cases.
Overall, we sent hundreds of increasingly accurate submissions to the Kaggle challenge and quickly achieved a top 10 ranking on the leaderboard. The other competitors seemed hard to beat, but we managed to continuously improve our scores over time. For a while we held the #1 spot on the leaderboard, which was an exciting and motivating experience for the whole group. Unfortunately, the challenge was compromised during the last two weeks, as people were able to gain access to the ground-truth labels of the challenge and thus achieve a perfect score without even training a model. This flaw in the challenge (the data was on GitHub) and the inadequate handling of the resulting frustration on the forums sadly tainted the final outcome. In the end, we ranked 26th out of more than 1,000 participants, and we were really happy about the overall challenge experience and, of course, the great learning opportunity of the six meetings before the deadline on March 30th.
We want to thank everyone who participated, to whatever extent. But a special thanks goes to Pablo, Markus, Andras and Jonas for their commitment and passion!
So, what’s next? The challenge spirit is still driving us, but we decided to rename our group to Health Hacker Machine Learning Group to allow us a little more slack with regard to what we do at each event. This is necessary as there might not always be a great healthcare-related challenge online. But, to make a virtue of necessity, we are currently developing our own internal challenge. For more info, just keep up to date with our event announcements on Meetup and Facebook!
If you are reading this and have some medical data and an interest in creating your own challenge, please do not hesitate to reach out to firstname.lastname@example.org.
Here are some impressions from the last few months: