Welcome to the website of the Inmas Workshop on Data Science 2021. This website collects links to and files for all relevant workshop content.

Overview

This course is designed to provide a glimpse of modern computational approaches to the analysis of data sets. We cover the concepts of supervised and unsupervised learning and illustrate the usage of some popular methods in these frameworks by means of popular toolboxes in Python.

As many data sets that are encountered in practice are inherently high-dimensional, we aim to gain intuition about the geometry of high-dimensional spaces and distributions, and shed light on computational aspects of some of the covered methods.
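
As a small illustration of the kind of high-dimensional intuition meant here, the following sketch (not part of the official workshop material) draws standard Gaussian points in increasing dimension and measures how much their Euclidean norms vary relative to their mean. The function name `relative_spread`, the sample size, and the chosen dimensions are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_spread(dim, n_points=500):
    """Return std/mean of the Euclidean norms of standard Gaussian points in R^dim."""
    points = rng.standard_normal((n_points, dim))
    norms = np.linalg.norm(points, axis=1)
    return norms.std() / norms.mean()

# Hypothetical choice of dimensions, for illustration only.
spreads = {dim: relative_spread(dim) for dim in (2, 20, 200, 2000)}
for dim, spread in spreads.items():
    print(f"dim={dim:5d}  relative spread of norms = {spread:.3f}")
```

In this sketch the relative spread shrinks as the dimension grows, which suggests that most of the mass of a high-dimensional Gaussian sits in a thin shell rather than near the origin.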

Workshop Schedule

In preparation for the workshop, we encourage you to complete the pre-work before the first session on Friday, March 19.

Thursday, March 18

  • 7:00 PM ET: (Optional) Office Hour: Feedback & help with pre-work

Friday, March 19

Session I

  • 2:00 PM - 3:00 PM ET: Framework of Statistical Learning, Feature Design, Regression in High Dimensions
  • 3:00 PM - 5:00 PM ET: Project work in small groups

Saturday, March 20

Morning Session (Session II)

  • 9:00 AM - 10:00 AM ET: Classification Problems, Natural Language Processing
  • 10:00 AM - 12:00 PM ET: Project work in small groups

Afternoon Session (Session III)

  • 2:00 PM - 3:00 PM ET: Principal Component Analysis, Clustering
  • 3:00 PM - 5:00 PM ET: Project work in small groups

Sunday, March 21

Session IV

  • 9:00 AM - 10:00 AM ET: Neural Networks and Deep Learning
  • 10:00 AM - 12:00 PM ET: Project work in small groups

All sessions will be held via Zoom.

Instructor: Christian Kümmerle (Johns Hopkins University)
Teaching Assistants: Daniel Fuentes-Keuthan, Patrick Martin

After-Workshop Office Hours

  • Sunday, March 28, 11 AM ET
  • Monday, April 5, 7 PM ET

Computational Tools

The practice exercises in this workshop use the Python language, which is widely used for data science and machine learning because it is a general-purpose programming language and because its modularity has attracted the development of a variety of powerful libraries.

The most relevant libraries we will use are:

  • NumPy: Basic manipulation of vectors and matrices.
  • SciPy: Scientific computing, particularly useful for linear algebra, optimization, and signal and image processing.
  • matplotlib: Visualization and plotting.
  • seaborn: Package for visualization, more high-level than matplotlib.
  • scikit-learn: Implementations of a wide range of machine learning algorithms.
  • keras: Interface to deep learning libraries.
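
To give a feel for how two of the libraries listed above fit together, here is a minimal sketch (not taken from the workshop exercises) that builds a synthetic data set with NumPy and fits a linear model with scikit-learn. The data-generating coefficients, the noise level, and all variable names are assumptions chosen purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration):
# y = 3*x0 - 2*x1 + 1 plus a small amount of Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_coef = np.array([3.0, -2.0])
y = X @ true_coef + 1.0 + 0.01 * rng.normal(size=200)

# Fit an ordinary least squares model and compare with the true coefficients.
model = LinearRegression().fit(X, y)
coef_error = np.max(np.abs(model.coef_ - true_coef))

print("estimated coefficients:", model.coef_)
print("estimated intercept:", model.intercept_)
print("largest coefficient error:", coef_error)
```

NumPy handles the array manipulation, while scikit-learn supplies the estimator with its `fit` interface; the same pattern of preparing arrays and then calling `fit` applies to many other scikit-learn models.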