STOR565: Machine Learning

The goal of statistical machine learning and data mining is not to test a specific hypothesis or construct a confidence interval; instead, it is to find and understand an unknown systematic component in noisy, complex data.

Lectures: MWF 11:15am-12:05pm, Gardner 105. Syllabus

Instructor: Yufeng Liu

Office Hours: Mondays 10-11am; Wednesdays 12:30-1:30pm (Hanes 354)

TA: Seong Jin Lee (Ph.D. Student in Statistics); Email: slee7@unc.edu; Office Hours (virtual): Tuesdays 1-2pm and Thursdays 2-3pm (TA office hour Zoom link)

Enrollment Questions: Please contact Ms. Christine Keat for questions and assistance regarding enrollment and the waiting list for this class.

Textbook: “An Introduction to Statistical Learning with Applications in R” by James, Witten, Hastie, and Tibshirani. The electronic version can be downloaded for free.

Additional References:

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Hastie, Tibshirani, and Friedman (2009). The electronic version can be downloaded for free.

Machine Learning: A Probabilistic Perspective, by Kevin P. Murphy (2012). MIT Press. A free e-book is available through the UNC library.

Statistical Software:

 R

We will use R for this course. R is free, so you can use it anytime and anywhere; it can be downloaded from the R website. We will use the ISLR package (associated with the textbook), which includes the datasets used in the book. RStudio is the recommended interface for R; it is also free and runs on Windows, Mac, and Linux. We will also use R Markdown, which produces high-quality documents and reports.

Reference: W. N. Venables, D. M. Smith, and the R Core Team. 2022. An Introduction to R: Notes on R: A Programming Environment for Data Analysis and Graphics (version 4.3.1).

Evaluation & Grading: There will be homework assignments throughout the semester covering both the theoretical and computational aspects of the course.
The course grade will be based on class participation, homework, the project, the quiz, and the exam. Grades will be maintained in Canvas.
The distribution of the grade is as follows:

• Homework 30% (the lowest homework grade will be dropped);
• Quiz 10%;
• Exam 30% (replaces Quiz grade if higher);
• Project 30%
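To make the weighting concrete, here is a minimal R sketch of how the percentages above combine. It assumes one reading of the policy: every score is on a 0-100 scale, the single lowest homework grade is dropped before averaging, and "replaces Quiz grade if higher" means the quiz score is replaced by the exam score whenever the exam score is larger. The function name `course_score` is hypothetical; this is an illustration, not an official grade calculator, and class participation is not modeled because it has no listed weight.

```r
# Hypothetical helper illustrating the listed weights (not an official calculator).
# Assumes every score is on a 0-100 scale.
course_score <- function(homework, quiz, exam, project) {
  # Drop the single lowest homework grade, then average the rest
  # (requires at least two homework scores).
  stopifnot(length(homework) >= 2)
  hw_avg <- mean(sort(homework)[-1])
  # Assumed reading: the exam score replaces the quiz score when it is higher.
  quiz_used <- max(quiz, exam)
  # Weights: homework 30%, quiz 10%, exam 30%, project 30% (total 100%).
  0.30 * hw_avg + 0.10 * quiz_used + 0.30 * exam + 0.30 * project
}

# Example: homework 80, 90, 100; quiz 70; exam 85; project 90.
# 0.30 * 95 + 0.10 * 85 + 0.30 * 85 + 0.30 * 90 = 89.5
course_score(c(80, 90, 100), quiz = 70, exam = 85, project = 90)
```

When the exam score is higher than the quiz score, this is equivalent to the exam carrying 40% of the course grade.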

Homework Policy: Homework assignments will be posted on the course web page. Each homework assignment will be graded; late or missed assignments submitted without permission will receive a grade of zero. Assignments should be uploaded through Gradescope before the deadline.

Honor Code: Students are expected to adhere to the UNC honor code at all times. Violations of the honor code will be prosecuted.

Announcements, Assignments & Lectures:

Lecture Date Tentative Plan Remark
1 Aug 21 M Introduction & Overview
Reading: ISL Ch1 & 2; Murphy Ch1 (optional)
Homework 1; Overview Slides
2 Aug 23 W Overview; R Intro Lab (ISL 2.3); Review
Reading: ISL Ch1 & 2; Murphy Ch2 (optional)
Review Slides
3 Aug 25 F Review
Reading: ISL Ch3; Murphy Ch7 (optional)
LinearReg Slides
4 Aug 28 M Review; Linear Regression; R Lab (ISL 3.6)
Reading: ISL Ch3; Murphy Ch7 (optional)
5 Aug 30 W Class Cancelled by University
Linear Regression; Cross Validation
Reading: ISL Ch5, Ch6
Hw2 Files (theory & computation); LinearReg2 Slides
6 Sep 1 F Linear Regression
Reading: ISL Ch5, Ch6
Supplement 1
Homework 1 due (moved from Wed. Aug 30)

Sep 4 M Labor Day
7 Sep 6 W Linear Regression
Reading: ISL Ch5, Ch6
8 Sep 8 F Cross Validation
Reading: ISL Ch6
9 Sep 11 M Variable Selection; Penalized Regression; R Lab (ISL6.5)
Reading: ISL Ch6
Hw3 Files (theory & computation)
Homework 2 due
10 Sep 13 W Regression Extensions
Reading: ISL Ch6
Classification Slides
11 Sep 15 F Regression Extensions
Reading: ISL Ch6
12 Sep 18 M Regression Extensions & Classification
Reading: ISL Ch4
13 Sep 20 W Classification & Bayes Rule; Naive Bayes
Reading: ISL Ch4
14 Sep 22 F Logistic Regression
Reading: ISL Ch4
Hw4 Files (theory & computation); Homework 3 due
Sep 25 M Wellness Day
15 Sep 27 W LDA & QDA; KNN
Reading: ISL Ch4
Project Description; Survey I
16 Sep 29 F LDA & QDA; KNN; Lab 4.7
Reading: ISL Ch4
17 Oct 2 M KNN; SVM; R Lab 9.6
Reading: ISL Ch9
Initial Group Formation Due
Oct 3 T Hw5 Files (theory & computation); Homework 4 due
18 Oct 4 W SVM; R Lab 9.6
Reading: ISL Ch9
Nonlinear Slides
19 Oct 6 F SVM; Nonlinear Models
Reading: ISL Ch7
20 Oct 9 M Nonlinear Models
Reading: ISL Ch7
21 Oct 11 W Nonlinear Models
Reading: ISL Ch7
Tree-based Methods Slides
22 Oct 13 F Nonlinear Models; R Lab 7.8; Tree-based Methods
Reading: ISL Ch7, Ch8
Hw6 Files (theory & computation)
23 Oct 16 M Review
Homework 5 due
24 Oct 18 W Quiz
Oct 20 F Fall break
25 Oct 23 M Tree-based Methods
Reading: ISL Ch8
26 Oct 25 W Go over Quiz
Project Proposal due; Survey II
27 Oct 27 F Tree-based Methods; Lab 8.3
Reading: ISL Ch8
28 Oct 30 M Tree-based Methods
Reading: ISL Ch8
Clustering Slides
29 Nov 1 W Clustering; Lab 12.5
Reading: ISL Ch12
30 Nov 3 F Clustering
Reading: ISL Ch12
Hw7 Files (theory & computation); Homework 6 due
31 Nov 6 M Clustering
Reading: ISL Ch12
Dimension Reduction Slides
32 Nov 8 W Dimension Reduction; Lab 12.5
Reading: ISL Ch12
33 Nov 10 F Dimension Reduction
Reading: ISL Ch12
34 Nov 13 M Dimension Reduction
Reading: ISL Ch12
35 Nov 15 W R Examples & Case Study; Unsupervised Learning Best Practices
Reading: ISL Ch12
36 Nov 17 F Review
Homework 7 due
37 Nov 20 M Exam
Nov 22 W Thanksgiving Break
Nov 24 F Thanksgiving Break
38 Nov 27 M Presentations (order generated randomly): Groups 2, 5, 13, 12 (12 minutes each)
Peer Evaluation Survey
39 Nov 29 W Groups 15, 1, 17, 9
40 Dec 1 F Groups 10, 19, 11, 16
41 Dec 4 M Groups 4, 8, 6, 7
42 Dec 6 W Groups 14, 3, 18
Final Report due (electronic copy; please email the report to the instructor)