**线性回归回顾**

## What is linear regression:

Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables or multivariables.

Above is the plot of linear regression.

## Mathmatical Function of Linear Regression:

### Hypothesis Function:

Linear regression 的优化方式是最小化Hypothesis函数，而最小化Hypothesis函数自然也就引申到了最小二乘法的损失函数。
最小二乘法也就是我们日常所说的最小平方差，一个是从图像方面可以推导，即每个点到回归线的距离的平方值。 另外对于贝叶斯学派来说，我们通过条件概率推导，也一样可以得到损失函数是最小二乘法。

### Loss function of Linear Regression:

所以对于线性回归的loss function, 这是一个bowl-shaped function. 所以通过gradient descent比较容易收敛到得到$J_{min}(W)$。

## Regularization：

Regularization is a very important technique in machine learning to prevent overfitting. Mathematically speaking, it adds a regularization term in order to prevent the coefficients to fit so perfectly to overfit. The difference between the L1 and L2 is just that L2 is the sum of the square of the weights, while L1 is just the sum of the weights. As follows:

### L1 Regularization:

Linear Regression with L1 regularization also named Lasso Regression.

### L2 Regularization:

Linear Regression with L2 regularization named Ridge Regression.
And *$\frac{1}{2}$* here is to simply computation.

### Gradient descent:

Gradient descent is a method like climbers look down from hill peek , and go down step by step. Of course, the step size could be tuned by people. Repeat until convergence:

#### Learning rate a

If $\alpha $ is small, so it would be a tiny tiny baby step. If $\alpha $ is large, it would be a large step which may fail to converage.

#### Gradient descent with regularization

Cost function like this, then derivate of cost function :

Then gradient descent will repeat :

Therefore, $\lambda$ was introduced to minimize the value of $\theta$ by minus an extra value. And this is the main purpose of regularization.