Machine Learning

CS5824/CS4824/ECE5424/ECE4424
Fall 2017

Announcements

  • 12/11/17 Some notes on the topics we've covered all semester here.
  • 11/01/17 Homework 5 is now available. All the materials, including the instructions for the written and programming homework, are available at this Bitbucket repository.
  • 10/18/17 Homework 4 is now available. All the materials, including the instructions for the written and programming homework, are available at this Bitbucket repository.
  • 10/11/17 Some notes on the topics we've covered so far are available here.
  • 10/08/17 The project page is now available here.
  • 09/27/17 Homework 3 is now available. The instructions are available in pdf and as LaTeX, and the code for the programming portion is available in this zip archive.
  • 09/13/17 Homework 2 is now available. The instructions are available in pdf and as LaTeX, and the code for the programming portion is available in this zip archive.
  • 08/30/17 Homework 1 is now available. The instructions are available in pdf and as LaTeX, and the code for the programming portion is available in this zip archive.
  • 08/27/17 For those who are unable to be added to the class, ECE Professor Joseph Wang is offering a WebEx version of the course. Contact him for information on how to request registration. As with this course, demand is much higher than capacity, so it is also going to be difficult to get a spot in that course. You may instead consider trying to take Data Analytics or Information Retrieval, both of which will cover many of the same topics from a slightly different perspective.
  • 08/24/17 Homework 0 is available now on the course Canvas site (here). This is a graded quiz to assess whether you have the background necessary to succeed in this course. It is due 09/05/17 at 9:30 am. Students who are not yet registered for the course can preview the quiz here, work on the problems now, and enter their answers once they are registered.
  • 08/02/17 Students interested in requesting a force add should attend the first session where we will collect information. We will then make decisions that week on which students get into the class. The course is full and we will be limited by room space and availability of resources, so we cannot promise that anyone who is not already registered will get into the course.

Description

This course will cover the science of machine learning. It focuses on the mathematical foundations and analysis of machine learning methods and how they work.

The graduate listing of the course is titled "Advanced Machine Learning," but this naming is to distinguish it from the undergraduate version. Both levels will cover the same introductory material with the same workload, but graduate and undergraduate sections will be graded on separate scales.

Class meets Tuesday and Thursday from 9:30 AM to 10:45 AM in Torgersen 1060.

  • Instructor:
    Bert Huang, Assistant Professor of Computer Science
    Office hours: Tuesdays 11:00 AM–12:00 PM and Thursdays 2:00 PM–3:00 PM, Torgersen Hall 3160L
    bhuang@vt.edu
  • Teaching Assistants:
    Elaheh Raisi
    Office hours: Mondays 2:00 PM–3:00 PM and Wednesdays 2:00 PM–3:00 PM, Kelly Hall 219
    elaheh@vt.edu

    Sirui Yao
    Office hours: Mondays 12:30 PM–1:30 PM and Wednesdays 3:00 PM–4:00 PM, Kelly Hall 219
    ysirui@vt.edu
  • Office Hour Chart
    Mon.: 12:30 PM–1:30 PM, Sirui, Kelly Hall 219; 2–3 PM, Elaheh, Kelly Hall 219
    Tue.: 11 AM–12 PM, Bert, Torg 3160L
    Wed.: 2–3 PM, Elaheh, Kelly Hall 219; 3–4 PM, Sirui, Kelly Hall 219
    Thu.: 2–3 PM, Bert, Torg 3160L
    Fri.: no office hours

The course homepage (this page) is at http://courses.cs.vt.edu/cs5824/Fall17/.

The course Canvas site is at https://canvas.vt.edu/courses/57388 and should be visible to all users with a Virginia Tech login.

Topics

  • Overview of machine learning: learning from data; overfitting, regularization, cross-validation.
  • Supervised learning: decision trees, naive Bayes, logistic regression, support vector machines, neural networks
  • Unsupervised and semi-supervised learning: clustering (k-means, Gaussian mixtures); principal components analysis
  • Learning theory: probably approximately correct (PAC) learning, model complexity, bias and variance
  • Structured models: Bayesian networks, Markov random fields, hidden Markov models
  • Other topics: reinforcement learning, machine learning applications (vision, natural language processing, recommendation)

Prerequisites

The listed prerequisite courses cover relevant material, including data structures, algorithms, complexity, linear algebra, and basic concepts of probability and statistics (random variables, expectation, conditional distributions, Bayes rule, sampling distributions, estimators, likelihood, and maximum likelihood). The homework assignments will include programming portions using Python.

Please speak with the instructor if you are concerned about your background. Note: If any student needs special accommodations because of any disabilities, please contact the instructor during the first week of classes.

Learning Objectives

A student who successfully completes this class should

  • be familiar with a breadth of foundational machine learning concepts;
  • be able to implement standard machine learning methods without the use of pre-packaged machine learning software;
  • be able to make informed decisions about which machine learning methods are appropriate for different tasks;
  • have awareness of the mathematical and computer science concepts underlying machine learning;
  • and have the background knowledge to be able to understand new machine learning methods not covered in the course.

Reading and Materials

We will use a mix of freely available materials from the web. Our main reading will come from Hal Daumé's online textbook A Course in Machine Learning (http://ciml.info). We will also use chapters from David Barber's text Bayesian Reasoning and Machine Learning, which has a free online version available at http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=Brml.HomePage.

Homework

  • Each homework assignment will include written and programming portions.
  • Both written and programming assignments will be submitted electronically.
  • I strongly recommend learning LaTeX to help write neat, readable math, but I will not require LaTeX.
  • Programming assignments will be in Python.
  • Homework assignments will be listed in the announcements at the top of the course homepage.
  • The last homework is a research project, which will give you an opportunity to try machine learning research on algorithms, theory, or applications.

Schedule

The tentative class schedule is available here and is embedded below. We will update the schedule regularly.

Slides we use in the class sessions will be available at https://www.dropbox.com/sh/bqmle0eff1gpkzd/AAAhRTjKIFza7w1NDBQ_ktM4a?dl=0.

Policies

Exams

Exams for this course will be open-book and open-notes. They will be designed to test your ability to understand and apply the concepts we learn about in class, not whether you can memorize them. The only restriction is that you must not communicate with others, over the Internet or otherwise.

The midterm exam will be held in class. The final exam will be a take-home exam.

Regrading Requests

Requests for regrading due to grading errors must be submitted in writing to the TA within one week of the release of grades.

Late Homework Policy

Homework submitted late without permission will be penalized according to the following formula: (Penalized score) = (Your raw score) * (1 - 0.1*(# of days past deadline))

This formula will apply for up to three days, after which the homework will not be accepted and you will receive a grade of zero. Avoid invoking these penalties by starting early and seeking extra help.
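
As a rough illustration of the late policy above, here is a minimal Python sketch. The function name `penalized_score` and the choice to count lateness in whole days are assumptions made for this example; the policy itself does not say how partial days are counted.

```python
def penalized_score(raw_score: float, days_late: int) -> float:
    """Late penalty sketch: raw * (1 - 0.1 * days), and zero after three days.

    Assumption: days_late is a whole number of days past the deadline.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 3:
        # Past three days the homework is not accepted and earns zero.
        return 0.0
    return raw_score * (1 - 0.1 * days_late)


if __name__ == "__main__":
    # Show the penalized score for a raw score of 90 at 0 through 4 days late.
    for days in range(5):
        print(days, round(penalized_score(90, days), 2))
```

Under these assumptions, each late day costs 10% of the raw score, so a submission three days late keeps 70% of its raw score and anything later receives zero.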

Academic Integrity

The tenets of Virginia Tech's Honor Codes will be strictly enforced in this course, and all assignments shall be subject to the stipulations of the Undergraduate and Graduate Honor Codes. For more information on the Graduate Honor Code, please refer to the GHS Constitution at http://ghs.graduateschool.vt.edu. The Undergraduate Honor Code pledge that each member of the university community agrees to abide by states: "As a Hokie, I will conduct myself with honor and integrity at all times. I will not lie, cheat, or steal, nor will I accept the actions of those who do." A student who has doubts about how the Undergraduate Honor Code applies to any assignment is responsible for obtaining specific guidance from the course instructor before submitting the assignment for evaluation. Ignorance of the rules does not exclude any member of the University community from the requirements and expectations of the Honor Code. For additional information about the Undergraduate Honor Code, please visit: https://www.honorsystem.vt.edu/

This course will have a zero-tolerance philosophy regarding plagiarism or other forms of cheating. Your homework assignments must be your own work, and any external source of code, ideas, or language must be cited to give credit to the original source. I will not hesitate to report incidents of academic dishonesty to the graduate school or honor system.

Principles of Community

Because the course will include in-class discussions, we will adhere to Virginia Tech's Principles of Community. The first two principles are most relevant:

  • We affirm the inherent dignity and value of every person and strive to maintain a climate for work and learning based on mutual respect and understanding.
  • We affirm the right of each person to express thoughts and opinions freely. We encourage open expression within a climate of civility, sensitivity, and mutual respect.

The remaining principles are also important and we will take them seriously as a class.

Grading Breakdown

  • 1%: Homework 0
  • 4%: Class attendance and participation
  • 50%: Homework Sets
  • 15%: Midterm Exam
  • 15%: Final project
  • 15%: Final exam

Based on the grading breakdown above, each student's final grade for the course will be determined by the final percentage of points earned. The grade ranges are as follows:

A   93.3%–100%
A-  90.0%–93.3%
B+  86.6%–90.0%
B   83.3%–86.6%
B-  80.0%–83.3%
C+  76.6%–80.0%
C   73.3%–76.6%
C-  70.0%–73.3%
D+  66.6%–70.0%
D   63.3%–66.6%
D-  60.0%–63.3%
F   0.0%–60.0%
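
To make the arithmetic concrete, the following Python sketch combines the grading breakdown and grade ranges above. The names `WEIGHTS`, `CUTOFFS`, `final_percentage`, and `letter_grade` are hypothetical. It also assumes that each component score is expressed as a percentage from 0 to 100, and that a score exactly on a shared boundary (for example 93.3%) earns the higher grade; the ranges above do not settle that case.

```python
# Component weights from the grading breakdown (they sum to 1.0).
WEIGHTS = {
    "homework_0": 0.01,
    "participation": 0.04,
    "homework_sets": 0.50,
    "midterm": 0.15,
    "final_project": 0.15,
    "final_exam": 0.15,
}

# (lower bound in percent, letter), highest grade first.
# Assumption: the lower bound of each range is inclusive.
CUTOFFS = [
    (93.3, "A"), (90.0, "A-"), (86.6, "B+"), (83.3, "B"), (80.0, "B-"),
    (76.6, "C+"), (73.3, "C"), (70.0, "C-"), (66.6, "D+"), (63.3, "D"),
    (60.0, "D-"),
]


def final_percentage(scores: dict) -> float:
    """Weighted sum of component scores, each given as a percentage (0-100)."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def letter_grade(percentage: float) -> str:
    """Map a final percentage to a letter using the cutoffs above."""
    for lower_bound, letter in CUTOFFS:
        if percentage >= lower_bound:
            return letter
    return "F"


if __name__ == "__main__":
    # Hypothetical component scores for one student.
    example = {
        "homework_0": 100, "participation": 100, "homework_sets": 90,
        "midterm": 80, "final_project": 95, "final_exam": 85,
    }
    pct = final_percentage(example)
    print(round(pct, 2), letter_grade(pct))
```

Because the homework sets carry half of the total weight, a ten-point change in the homework average moves the final percentage by five points, more than three times the effect of the same change on either exam.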