**MIDDLE EAST TECHNICAL UNIVERSITY
DEPT. OF COMPUTER ENGINEERING**

**CEng 574 STATISTICAL DATA ANALYSIS
Fall 2020
Course Webpage**

**Instructor** Volkan Atalay

*e-mail* vatalay AT metu.edu.tr

**Class** Thursday 9:40-12:30 (on-line)

**Office Hour** TBA

**Course web page address** http://blog.metu.edu.tr/vatalay/ceng-574-statistical-data-analysis/

**Course Objectives**

The objective of this course is to introduce the concepts and techniques of clustering and multivariate and exploratory data analysis. This course also offers an opportunity to perform data analysis by using data visualization, projection and embedding.

**Prerequisites** Knowledge of programming, probability, and linear algebra.

**Main Reference Book**

Alpaydın, *Introduction to Machine Learning,* 2nd Edition (2010) or 3rd Edition (2014), The MIT Press.

(Yapay Öğrenme, Turkish language edition, translated by the author, Boğaziçi Üniversitesi Yayınevi, 1st Edition 2011, 2nd Edition 2013, 3rd Edition 2017)

**Course Outline**

1 Data representation, distance metrics and similarity measures

2 Linear and non-linear projection methods, data embedding methods

3 Data clustering algorithms and methods

4 Evaluation of clustering algorithms and validation of clusterings

5 Applications of data clustering in various fields such as bioinformatics and data stream analysis

**Grading**

| Item | Points each | Total |
|---|---|---|
| Assignments #1, #2, #3 | 2 | 6 |
| Assignments #4, #5, #6 | 8 | 24 |
| Assignments #7, #8, #9, #10 | 4 | 16 |
| Paper submission and presentation | | 15 |
| Midterm | | 30 |
| Attendance and class participation | | 10 |

**Notes and Remarks**

**Students coming from graduate programs other than the Computer Engineering program should attend the first class.**

Attend the first lecture at https://cengvideo.ceng.metu.edu.tr/b/m-v-k63-d43

We will use ODTUClass to conduct this course and to distribute all course materials.

Assignments should be done on an individual basis.

Dataset Analysis will be performed in teams of 2 persons.

The R programming language will be used for the applied part of this course.

Late submission policy: you are allowed 4 days of late submission.

Academic Integrity Guide for Students: http://oidb.metu.edu.tr/system/files/Academic%20Integrity%20Guide%20for%20Students.pdf

**Interactive Demonstrations of Some of the Algorithms from the Course**

**by Mehmet Akif Akkus, Begüm Yağmur, Shakiba R, Abdullah Al-shiabi**

Step-by-step *k*-means algorithm in a 2D interactive environment: http://user.ceng.metu.edu.tr/~akifakkus/courses/ceng574/k-means/

Step-by-step Mean-shift algorithm in a 2D interactive environment: http://user.ceng.metu.edu.tr/~akifakkus/courses/ceng574/mean-shift/
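As a rough companion to the *k*-means demonstration, the two alternating steps of the standard algorithm (assign each point to its nearest centroid, then move each centroid to the mean of its points) can be sketched in a few lines. This is only an illustrative sketch, not the code behind the demo: it is written in Python rather than the R used in the course, and the function names, the toy 2D points, and the initial centroids are all assumptions made up for the example.

```python
import math

def assign(points, centroids):
    """Assignment step: index of the nearest centroid (Euclidean) for each point."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]

def update(points, labels, centroids):
    """Update step: move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            # Empty cluster: keep the old centroid (one of several possible conventions).
            new_centroids.append(old)
            continue
        new_centroids.append(tuple(sum(m[d] for m in members) / len(members)
                                   for d in range(len(old))))
    return new_centroids

def kmeans(points, centroids, max_iter=100):
    """Alternate the two steps until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)

# Hypothetical toy data: two well-separated groups in 2D.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
initial = [(1.0, 1.0), (8.0, 8.0)]  # assumed starting centroids
centroids, labels = kmeans(points, initial)
print(labels)      # [0, 0, 0, 1, 1, 1]
print(centroids)
```

The result of *k*-means depends on the initial centroids; here they are fixed by hand so that the run is reproducible.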

**Related Links**

**Learning R**

An Introduction to R, http://cran.r-project.org/doc/manuals/r-release/R-intro.html

HSAUR3: A Handbook of Statistical Analyses Using R (3rd Edition), Torsten Hothorn and Brian S. Everitt, Chapman & Hall/CRC, 2014, https://cran.r-project.org/web/packages/HSAUR3/

R tutorials, Tal Galili, December 10, 2015, https://www.r-bloggers.com/how-to-learn-r-2/

R Tutorial: Introduction to R, https://www.youtube.com/watch?v=7cGwYMhPDUY

Introduction to Data Science with R – Data Analysis Part 1, https://www.youtube.com/watch?v=32o0DnuRjfg

Also, see “Documentation” and “Manuals” at https://www.r-project.org/ and “Contributed” at https://cran.r-project.org/

**PCA**

A Tutorial on Principal Component Analysis, Jonathon Shlens, 2014, http://arxiv.org/pdf/1404.1100v1.pdf

A tutorial on Principal Components Analysis, Lindsay I Smith, February 26, 2002, http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf

Principal Component Analysis in R, Gregory B. Anderson, 2013, http://www.ime.usp.br/~pavan/pdf/MAE0330-PCA-R-2013

Principal Components Analysis: A How-To Manual for R, Emily Mankin, http://people.tamu.edu/~alawing/materials/ESSM689/pca.pdf

5 functions to do Principal Components Analysis in R, Gaston Sanchez, http://www.gastonsanchez.com/visually-enforced/how-to/2012/06/17/PCA-in-R/

Computing and visualizing PCA in R, R-bloggers, 2013, http://www.r-bloggers.com/computing-and-visualizing-pca-in-r/

Step by step implementation of PCA in R using Lindsay Smith’s tutorial, http://stats.stackexchange.com/questions/90331/step-by-step-implementation-of-pca-in-r-using-lindsay-smiths-tutorial

PCA in R, Ed Boone, https://www.youtube.com/watch?v=Heh7Nv4qimU

Principal Components Analysis Using R – P1, Steve Pittard, https://www.youtube.com/watch?v=5zk93CpKYhg
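For orientation before reading the tutorials above, the core PCA computation (center the data, form the covariance matrix, take its leading eigenvector, project onto it) can be sketched as follows. This is a minimal illustration under stated assumptions, not taken from any of the linked tutorials: it is in Python rather than R, it recovers only the first principal component, it uses power iteration instead of a full eigendecomposition, and the function name and toy data are hypothetical.

```python
import math

def pca_first_component(rows, n_iter=200):
    """Return (direction, variance, scores) for the first principal component."""
    n, d = len(rows), len(rows[0])

    # 1. Center each variable on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    # 2. Sample covariance matrix (divides by n - 1).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]

    # 3. Power iteration for the leading eigenvector.
    #    Assumes the start vector is not orthogonal to that eigenvector
    #    and that the data are not all identical (nonzero covariance).
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # 4. Variance along v (Rayleigh quotient) and the projected scores.
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, variance, scores

# Hypothetical toy data lying exactly on the line y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, variance, scores = pca_first_component(data)
print(direction)  # unit vector proportional to (1, 2)
print(variance)
print(scores)
```

Because the toy points are perfectly collinear, the first component captures all of the variance; real data would leave some variance for the remaining components.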