Math of Intelligence: Linear Regression
Let $x$ be the input feature and $y$ the output that we are interested in.

For linear regression, we need a hypothesis function that predicts $y$, given the input feature $x$.
Let us assume that $y$ is linearly dependent on $x$, so our hypothesis function is:

$$h_\theta(x) = \theta_0 + \theta_1 x$$

Here the $\theta_i$'s are the parameters (or weights). To simplify the notation, we will drop the $\theta$ in the subscript of $h_\theta(x)$ and write it simply as $h(x)$.
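As a minimal sketch (the function and variable names here are my own, not from the notes), the hypothesis is just a line:

```python
def h(x, theta0, theta1):
    """Hypothesis h(x) = theta0 + theta1 * x: a linear function of the input feature."""
    return theta0 + theta1 * x
```

For example, with theta0 = 1.0 and theta1 = 2.0, h(3) evaluates to 7.0.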
Now, we need to find a way to measure the error between our predicted output $h(x)$ and the actual value $y$ over all our training examples.

One way to measure this error is the ordinary least squares method. TODO: Explore other cost functions
So, the cost function (or loss function)* according to the ordinary least squares method will be as follows:

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)^2$$

*there’s some debate about whether they are the same or not, but for now we’ll assume they are the same

On expanding $h(x^{(i)})$, we get

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2$$
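A quick Python sketch of this cost over a small training set (the names are illustrative, not from the notes):

```python
def cost(xs, ys, theta0, theta1):
    """Ordinary least squares cost: half the sum of squared prediction errors."""
    return 0.5 * sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
```

On data that lies exactly on the line y = 2x (e.g. xs = [1, 2], ys = [2, 4]) with theta0 = 0 and theta1 = 2, the cost is 0.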
Our objective is to find the values of $\theta_0$ and $\theta_1$ that minimize the loss function.
One way to do this is by using the gradient descent method. TODO: Explore other methods to find the global minimum of a function

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$

In this method, we first initialize $\theta$ randomly and then update it according to the above rule to come closer to the minimum with each update.

Here, $\alpha$ is the learning rate.
Hence, in order to update $\theta_j$, we need to find the partial derivative of $J(\theta)$ w.r.t. $\theta_j$. In our case, $j = 0$ and $1$.

w.r.t. $\theta_0$:

$$\frac{\partial J(\theta)}{\partial \theta_0} = \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)$$

w.r.t. $\theta_1$:

$$\frac{\partial J(\theta)}{\partial \theta_1} = \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right) x^{(i)}$$
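The two partial derivatives can be sketched in Python as follows (a hypothetical helper, assuming the same training-set layout as before):

```python
def gradients(xs, ys, theta0, theta1):
    """Partial derivatives of the least squares cost w.r.t. theta0 and theta1."""
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    d_theta0 = sum(errors)                             # sum of prediction errors
    d_theta1 = sum(e * x for e, x in zip(errors, xs))  # errors weighted by the input
    return d_theta0, d_theta1
```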
Substituting these partial derivatives into the update rule, we get:

$$\theta_0 := \theta_0 - \alpha \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)$$

$$\theta_1 := \theta_1 - \alpha \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right) x^{(i)}$$
The above equations can be used to update the weights and hence improve the hypothesis function with every training example.
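Putting the update rules together, batch gradient descent can be sketched as below (a minimal illustration; the values of alpha and the step count are arbitrary choices, not from the notes):

```python
def gradient_descent(xs, ys, alpha=0.01, steps=5000):
    """Fit theta0, theta1 by repeatedly stepping against the cost gradient."""
    theta0, theta1 = 0.0, 0.0  # the notes suggest a random init; zeros also work here
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        theta0 -= alpha * sum(errors)                             # dJ/d(theta0)
        theta1 -= alpha * sum(e * x for e, x in zip(errors, xs))  # dJ/d(theta1)
    return theta0, theta1
```

On data generated from y = 2x + 1, the fitted parameters approach theta0 = 1 and theta1 = 2.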
References:
- https://see.stanford.edu/materials/aimlcs229/cs229-notes1.pdf