ESL Lecture Notes: Mathematics of Learning

Feb 03, 2013

Download PDF: The Elements of Statistical Learning

Definition of supervised learning

Preface to the Second Edition

Preface to the First Edition

1 Introduction

2 Overview of Supervised Learning

2.1 Introduction

2.2 Variable Types and Terminology

2.3 Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors

2.3.1 Linear Models and Least Squares

2.3.2 Nearest-Neighbor Methods

2.3.3 From Least Squares to Nearest Neighbors

2.4 Statistical Decision Theory

2.5 Local Methods in High Dimensions

2.6 Statistical Models, Supervised Learning and Function Approximation

2.6.1 A Statistical Model for the Joint Distribution Pr(X, Y)

2.6.2 Supervised Learning

2.6.3 Function Approximation

2.7 Structured Regression Models

2.7.1 Difficulty of the Problem

2.8 Classes of Restricted Estimators

2.8.1 Roughness Penalty and Bayesian Methods

2.8.2 Kernel Methods and Local Regression

2.8.3 Basis Functions and Dictionary Methods

2.9 Model Selection and the Bias–Variance Tradeoff

Bibliographic Notes

Exercises

3 Linear Methods for Regression

3.1 Introduction

3.2 Linear Regression Models and Least Squares

3.2.1 Example: Prostate Cancer

3.2.2 The Gauss–Markov Theorem

3.2.3 Multiple Regression from Simple Univariate Regression

3.2.4 Multiple Outputs

3.3 Subset Selection

3.3.1 Best-Subset Selection

3.3.2 Forward- and Backward-Stepwise Selection

3.3.3 Forward-Stagewise Regression

3.3.4 Prostate Cancer Data Example (Continued)

3.4 Shrinkage Methods

3.4.1 Ridge Regression

3.4.2 The Lasso

3.4.3 Discussion: Subset Selection, Ridge Regression and the Lasso

3.4.4 Least Angle Regression

3.5 Methods Using Derived Input Directions

3.5.1 Principal Components Regression

3.5.2 Partial Least Squares

3.6 Discussion: A Comparison of the Selection and Shrinkage Methods

3.7 Multiple Outcome Shrinkage and Selection

3.8 More on the Lasso and Related Path Algorithms

3.8.1 Incremental Forward Stagewise Regression

3.8.2 Piecewise-Linear Path Algorithms

3.8.3 The Dantzig Selector

3.8.4 The Grouped Lasso

3.8.5 Further Properties of the Lasso

3.8.6 Pathwise Coordinate Optimization

3.9 Computational Considerations

Bibliographic Notes

Exercises

4 Linear Methods for Classification

4.1 Introduction

4.2 Linear Regression of an Indicator Matrix

4.3 Linear Discriminant Analysis

4.3.1 Regularized Discriminant Analysis

4.3.2 Computations for LDA

4.3.3 Reduced-Rank Linear Discriminant Analysis

4.4 Logistic Regression

4.4.1 Fitting Logistic Regression Models

4.4.2 Example: South African Heart Disease

4.4.3 Quadratic Approximations and Inference

4.4.4 L1 Regularized Logistic Regression

4.4.5 Logistic Regression or LDA?

4.5 Separating Hyperplanes

4.5.1 Rosenblatt’s Perceptron Learning Algorithm

4.5.2 Optimal Separating Hyperplanes

Bibliographic Notes

Exercises

5 Basis Expansions and Regularization

5.1 Introduction

5.2 Piecewise Polynomials and Splines

5.2.1 Natural Cubic Splines

5.2.2 Example: South African Heart Disease (Continued)

5.2.3 Example: Phoneme Recognition

5.3 Filtering and Feature Extraction

5.4 Smoothing Splines

5.4.1 Degrees of Freedom and Smoother Matrices

5.5 Automatic Selection of the Smoothing Parameters

5.5.1 Fixing the Degrees of Freedom

5.5.2 The Bias–Variance Tradeoff

5.6 Nonparametric Logistic Regression

5.7 Multidimensional Splines

5.8 Regularization and Reproducing Kernel Hilbert Spaces

5.8.1 Spaces of Functions Generated by Kernels

5.8.2 Examples of RKHS

5.9 Wavelet Smoothing

5.9.1 Wavelet Bases and the Wavelet Transform

5.9.2 Adaptive Wavelet Filtering

Bibliographic Notes

Exercises

Appendix: Computational Considerations for Splines

Appendix: B-splines

Appendix: Computations for Smoothing Splines

Amirkabir University Master's Students Club
