Please complete the following steps before the workshop begins on Friday, March 19! The goal is to set up and familiarize yourself with a Python environment that is suitable for working with data sets and machine learning methods.
Download and install Anaconda (Individual Edition), a Python distribution that includes the packages we will use in this workshop. We suggest working with the integrated development environment (IDE) Spyder, which is already included in Anaconda.
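Once Anaconda is installed, a quick check along the following lines can confirm that the environment is usable, for example by running it in Spyder. This is only a minimal sketch: the package list is an assumption about what the workshop is likely to use, and the helper name `check_packages` is hypothetical.

```python
import importlib.util
import sys

# Assumed package list (import names); the workshop may use others.
PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn"]


def check_packages(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Report the Python version and whether each package is available.
print("Python", sys.version.split()[0])
for name, found in check_packages(PACKAGES).items():
    print(f"{name:12s}", "OK" if found else "MISSING")
```

If a package is reported as MISSING, it is probably not installed in the environment that Spyder is using.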
Revisit what you learnt about linear regression in the Workshop on Statistical Learning in February. For the “Salaries” dataset that you got to know in Chapters 1 and 2 of the workshop, use R to create a scatter plot of the variables “yrs.since.phd” and “salary”, with the points colored by the variable “discipline”.
Find and plot the two linear regression lines of “salary” against “yrs.since.phd”, one for each of the two values of the variable “discipline”.
(The code for the last two tasks was discussed in Section 2.5 of Dr. McNamara’s February workshop material.)
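As a preview of how the same idea carries over to Python, the sketch below fits one least-squares line per group using only the standard library. It does not replace the R exercise. The data values are made-up placeholders and are not taken from the “Salaries” dataset. The function names `fit_line` and `fit_by_group` are hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_by_group(rows):
    """Fit a separate line for each group label in (group, x, y) rows."""
    groups = defaultdict(lambda: ([], []))
    for group, x, y in rows:
        groups[group][0].append(x)
        groups[group][1].append(y)
    return {g: fit_line(xs, ys) for g, (xs, ys) in groups.items()}


# Placeholder rows: (discipline, years since PhD, salary in arbitrary units).
# These values are invented for illustration only.
rows = [
    ("A", 1, 12), ("A", 2, 14), ("A", 3, 16),
    ("B", 1, 20), ("B", 2, 23), ("B", 3, 26),
]

lines = fit_by_group(rows)
for group, (intercept, slope) in sorted(lines.items()):
    print(group, intercept, slope)
```

Each group gets its own intercept and slope, which is what "one regression line per discipline" means. Plotting the fitted lines over the scatter plot could then be done with matplotlib, which ships with Anaconda.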