The goal of statistical machine learning and data mining is not to test a specific hypothesis or construct a confidence interval; instead, the goal is to find and understand an unknown systematic component in noisy, complex data.
Lectures: MWF 11:15am-12:05pm, Gardner 105. Syllabus
Instructor: Yufeng Liu
Office Hours: Mondays 10-11am; Wednesdays 12:30-1:30pm (Hanes 354)
TA: Seong Jin Lee (Ph.D. Student in Statistics) Email: slee7@unc.edu Office Hours (virtual): Tuesdays 1-2pm and Thursdays 2-3pm (TA office hour zoom link)
Enrollment Questions: Please contact Ms. Christine Keat for questions and assistance regarding the enrollment and waiting list for this class.
Textbook: “An Introduction to Statistical Learning with Applications in R” by James, Witten, Hastie, and Tibshirani. The electronic version can be downloaded for free.
Additional References:
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Hastie, Tibshirani, and Friedman (2009). The electronic version can be downloaded for free.
Machine Learning: A Probabilistic Perspective, by Kevin P. Murphy (2012), MIT Press. A free e-book is available through the UNC library.
Statistical Software:
We will use R for this course. R is free, so you can use it anytime and anywhere. R can be downloaded from the R website. We will use the ISLR package (associated with the textbook), which includes the datasets used in the book. RStudio is a recommended interface for R. It is also free, and it runs on Windows, Mac, and Linux operating systems. We will also use R Markdown, which can produce high-quality documents and reports.
Reference: W. N. Venables, D. M. Smith, and the R Core Team. 2022. An Introduction to R: Notes on R: A Programming Environment for Data Analysis and Graphics (version 4.3.1).
Evaluation & Grading: There will be homework assignments throughout the semester on both the theoretical and computational aspects of the course.
The course grade will be based on class participation, homework, the project, the quiz, and the exam. Grades will be maintained in Canvas.
The distribution of the grade is as follows:
• Project 30%
Homework Policy: Homework assignments will be posted on the course web page. Each homework assignment will be graded; late or missed assignments without permission will receive a grade of zero. Assignments should be uploaded through Gradescope before the deadline.
Honor Code: Students are expected to adhere to the UNC honor code at all times. Violations of the honor code will be prosecuted.
Announcements, Assignments & Lectures:
Lectures | Date | Tentative Plan | Remark |
1 | Aug 21 M | Introduction & Overview Reading: ISL Ch1&2; Murphy Ch1(optional) Homework 1 | Overview Slides |
2 | Aug 23 W | Overview; R intro Lab (ISL2.3); Review Reading: ISL Ch1&2; Murphy Ch2(optional) | Review Slides |
3 | Aug 25 F | Review Reading: ISL Ch3 Murphy Ch7(optional) | LinearReg Slides |
4 | Aug 28 M | Review; Linear Regression; R Lab (ISL3.6) Reading: ISL Ch3 Murphy Ch7(optional) | |
5 | Aug 30 W | Class Cancelled by University. Linear Regression; Cross Validation Reading: ISL Ch5, Ch6 Hw2 Files (theory & computation) | LinearReg2 Slides |
6 | Sep 1 F | Linear Regression Reading: ISL Ch5, Ch6 | Supplement 1; Homework 1 due (moved from Wed. Aug 30) |
| Sep 4 M | Labor Day | |
7 | Sep 6 W | Linear Regression Reading: ISL Ch5, Ch6 | |
8 | Sep 8 F | Cross Validation Reading: ISL Ch6 | |
9 | Sep 11 M | Variable Selection; Penalized Regression; R Lab (ISL6.5) Reading: ISL Ch6 Hw3 Files (theory & computation) | Homework 2 due |
10 | Sep 13 W | Regression Extensions Reading: ISL Ch6 | Classification Slides |
11 | Sep 15 F | Regression Extensions Reading: ISL Ch6 | |
12 | Sep 18 M | Regression Extensions & Classification Reading: ISL Ch4 | |
13 | Sep 20 W | Classification & Bayes Rule; Naive Bayes Reading: ISL Ch4 | |
14 | Sep 22 F | Logistic Regression Reading: ISL Ch4 Hw4 Files (theory & computation) | Homework 3 due |
| Sep 25 M | Wellness Day | |
15 | Sep 27 W | LDA & QDA; KNN Reading: ISL Ch4 | Project Description |
16 | Sep 29 F | LDA & QDA; KNN; Lab 4.7 Reading: ISL Ch4 | |
17 | Oct 2 M | KNN; SVM R Lab 9.6 Reading: ISL Ch9 | Initial Group Formation Due |
| Oct 3 T | Hw5 Files (theory & computation) | Homework 4 due |
18 | Oct 4 W | SVM R Lab 9.6 Reading: ISL Ch9 | Nonlinear Slides |
19 | Oct 6 F | SVM; Nonlinear Models Reading: ISL Ch7 | |
20 | Oct 9 M | Nonlinear Models Reading: ISL Ch7 | |
21 | Oct 11 W | Nonlinear Models Reading: ISL Ch7 | Tree-based Methods Slides |
22 | Oct 13 F | Nonlinear Models R Lab 7.8; Tree-based Methods Reading: ISL Ch7, Ch8 Hw6 Files (theory & computation) | |
23 | Oct 16 M | Review | Homework 5 due |
24 | Oct 18 W | Quiz | |
| Oct 20 F | Fall break | |
25 | Oct 23 M | Tree-based Methods Reading: ISL Ch8 | |
26 | Oct 25 W | Go over Quiz | Project Proposal due Survey II |
27 | Oct 27 F | Tree-based Methods Lab 8.3 Reading: ISL Ch8 | |
28 | Oct 30 M | Tree-based Methods Reading: ISL Ch8 | Clustering Slides |
29 | Nov 1 W | Clustering Lab 12.5 Reading: ISL Ch12 | |
30 | Nov 3 F | Clustering Reading: ISL Ch12 Hw7 Files (theory & computation) | Homework 6 due |
31 | Nov 6 M | Clustering Reading: ISL Ch12 | Dimension Reduction Slides |
32 | Nov 8 W | Dimension Reduction Lab 12.5 Reading: ISL Ch12 | |
33 | Nov 10 F | Dimension Reduction Reading: ISL Ch12 | |
34 | Nov 13 M | Dimension Reduction Reading: ISL Ch12 | |
35 | Nov 15 W | R examples & case study Unsupervised Learning best practices Reading: ISL Ch12 | |
36 | Nov 17 F | Review | Homework 7 due |
37 | Nov 20 M | Exam | |
| Nov 22 W | Thanksgiving Break | |
| Nov 24 F | Thanksgiving Break | |
38 | Nov 27 M | Presentation order was generated randomly. Groups 2, 5, 13, 12 (12 mins each) | Peer Evaluation Survey |
39 | Nov 29 W | Groups 15, 1, 17, 9 | |
40 | Dec 1 F | Groups 10, 19, 11, 16 | |
41 | Dec 4 M | Groups 4, 8, 6, 7 | |
42 | Dec 6 W | Groups 14, 3, 18 | Final Report due (electronic copy; please email the report to the instructor) |