This graduate-level course is designed for students majoring in applied statistics at the Department of Mathematics, Jinan University, and is taught by Weiwen Wang (王伟文). The course is mathematically rigorous, with a strong focus on high-dimensional data analysis, and includes a variety of examples and practical applications to illustrate key concepts.

References

  1. Vershynin, R. High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge University Press, Cambridge, 2018.
  2. Wainwright, M.J. High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press, Cambridge, 2019.
  3. Wright, J. & Ma, Y. High-Dimensional Data Analysis with Low-Dimensional Models. Cambridge University Press, Cambridge, 2022.
  4. Mohri, M., Rostamizadeh, A. & Talwalkar, A. Foundations of Machine Learning, 2nd Edition. The MIT Press, Cambridge, MA, 2018.

The lecture notes draw heavily on Refs. 1 and 2 and on the video course taught by Roman Vershynin (https://www.math.uci.edu/~rvershyn/teaching/hdp/hdp.html).

ALL MATERIALS ARE INTENDED FOR NON-PROFIT ACADEMIC USE. IF ANY MATERIAL IS PRESENTED IMPROPERLY, PLEASE EMAIL ME TO REQUEST ITS REMOVAL.

Syllabus

Lecture notes and homework are released periodically. The released notes may contain typos.

  • Lecture 1 [notes]
    • Introduction
      • Counterintuitive phenomena in high-dimensional data
      • Non-asymptotic analysis
      • Goals of this course
      • Review of expectation and variance
      • Some classical inequalities
      • Monte Carlo method for integration in high-dimensional space
  • Lecture 2 [notes] [Eigenface]
    • Approximate Carathéodory theorem and its applications
  • Lecture 3 [notes]
    • Review of laws of large numbers, Markov's inequality, and Chebyshev's inequality
    • Concentration inequalities: Gaussian tail bounds
  • Lecture 4 [notes] [Erdős–Rényi model]
    • Chernoff trick
    • Sub-Gaussian variables
    • Hoeffding bound
    • Chernoff bound and its application
  • Lecture 5 [notes]
    • Sub-exponential variables
    • Bernstein-type bound
    • Johnson-Lindenstrauss embedding
  • Lecture 6 [notes]
    • Bounded differences inequality
    • Clique number in random graphs, Rademacher complexity and Gaussian complexity
    • Lipschitz functions of Gaussian variables with applications
      • $\chi^{2}$-concentration
      • Gaussian chaos variables
  • Lecture 7 [notes]
    • Uniform laws of large numbers
      • Glivenko-Cantelli theorem
      • A uniform law via Rademacher complexity
      • Upper bounds on the Rademacher complexity: Vapnik-Chervonenkis dimension
  • Lecture 8 [notes]
    • Rademacher complexity in empirical risk minimization
    • Margin theory
  • Lecture 9 [notes]
    • Gaussian complexity and Rademacher complexity
    • Covering number and packing number
  • Lecture 10 [notes]
    • Random processes
    • Dudley’s integral inequality and its applications
  • Lecture 11 [notes]
    • Random processes
    • Dudley's integral inequality and its application to the Monte Carlo method
  • Lecture 12 [notes]
    • Covering number and VC dimension
    • Applications of Dudley's integral inequality in statistical learning theory
  • Lecture 13 [notes]
    • Review of matrices
      • Spectral decomposition
      • Singular value decomposition
      • Matrix norms
    • Principal component analysis
  • Lecture 14 [notes]
    • Covariance estimation
    • Semicircle law
    • Marchenko-Pastur law
  • Lecture 15 [notes]
    • Matrix calculus
    • Matrix Hoeffding inequality
  • Lecture 16 [notes]
    • Matrix Bernstein inequality and its application in community detection
  • Lecture 17 [notes]
    • Introduction to deep generative models
      • GAN
      • VAE
      • Diffusion models
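Many of the topics above lend themselves to short numerical experiments. As one illustration, the Johnson-Lindenstrauss embedding from Lecture 5 can be demonstrated with a random Gaussian projection: pairwise distances between points in a very high-dimensional space are approximately preserved after projecting to a much lower dimension. The sketch below is a minimal NumPy example; the dimensions n, d, k are arbitrary choices for illustration, not taken from the course notes.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# n points in R^d, projected down to R^k by a random Gaussian matrix.
# The sizes n=50, d=10_000, k=400 are illustrative assumptions.
n, d, k = 50, 10_000, 400

X = rng.normal(size=(n, d))               # data points (one per row)
A = rng.normal(size=(k, d)) / np.sqrt(k)  # scaled Gaussian projection
Y = X @ A.T                               # embedded points in R^k

# Ratio of embedded to original pairwise distances: by the
# Johnson-Lindenstrauss lemma these concentrate near 1.
ratios = np.array([
    np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
    for i, j in combinations(range(n), 2)
])
print(f"distortion range: [{ratios.min():.3f}, {ratios.max():.3f}]")
```

Note that the target dimension k needed to preserve all pairwise distances up to a factor 1 ± ε grows only logarithmically in n, independently of the ambient dimension d, which is the content of the Johnson-Lindenstrauss lemma covered in Lecture 5.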

History

  • 2025-02-06: Build