Week 6: Kaggle, crowdsourcing, decision trees, random forests, social networks, and Google’s hybrid research environment

Introduction to Data Science, Columbia University

Each week Cathy O’Neil blogs about the class. Cross-posted from mathbabe.org.

Yesterday we had two guest lecturers, each of whom took up approximately half the time. First we welcomed William Cukierski from Kaggle, a data science competition platform.

Will went to Cornell for a B.A. in physics and to Rutgers for his Ph.D. in biomedical engineering. He focused on cancer research, studying pathology images. While writing his dissertation, he got more and more involved in Kaggle competitions, finishing very near the top in several of them, and he now works for Kaggle. Here’s what Will had to say.

Crowd-sourcing in Kaggle

What is a data scientist? Some say it’s someone who is better at stats than an engineer and better at engineering than a statistician. But one could argue it’s actually someone who is worse at stats than a statistician and worse at engineering than an engineer. Being a…

