There is a lot of buzz around data science and how machine learning and artificial intelligence can be used to transform business. People new to data science often focus on understanding how the different algorithms work, whether predictive or classification models or clustering algorithms. Many pre-packaged libraries and data science packages make pushing data through these algorithms easy. The danger is that people forget the science behind data science, and the old adage “garbage in, garbage out” often holds true.
This one-hour lecture focuses on how best to approach data science projects by performing a deep dive into a well-established methodology called CRISP-DM (CRoss Industry Standard Process for Data Mining). Students will learn the importance of business understanding, data understanding and data preparation. Get these tasks right and the rest of the process (modelling, evaluation and deployment) is much more likely to succeed. Fast-track or skip them, and the follow-on tasks will lack focus and failure will be more likely. The lecture will then look at the different data science models available and provide some guidance on how to select the most appropriate model and how to tune and optimise it. Finally, the lecture looks at model evaluation and deployment considerations.