In particular, find the line that best fits given data
(t_i, y_i), i.e., find C and D so that the line y = C + D t best fits the
data in the least-squares sense. Note that there was confusion in class
about minimizing the error. The equation Ax = b comes from taking
b = (y_i) in R^m and the m-by-2 matrix A = [ones t], whose two columns
are the all-ones vector and t = (t_i). We then find the best approximation
\hat b of b in the column space of A, i.e., the orthogonal projection of b
onto that space. Since \hat b lies in the column space, there exist
C, D such that
A x = \hat b, where x = (C D)^T,
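and these values of C and D define the best line. The confusion was about
why this also solves the problem of minimizing the sum of the squared errors,
minimize sum_i (b_i - A(i,:)x)^2 = ||b - Ax||^2,
where A(i,:) = [1 t_i] is the i-th row of A, so each term is (y_i - C - D t_i)^2.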
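The connection can be seen by one step: b - \hat b is orthogonal to every vector in the column space of A, so for any x, ||b - Ax||^2 = ||b - \hat b||^2 + ||\hat b - Ax||^2 (Pythagoras), and the sum of squared errors is smallest exactly when Ax = \hat b. Below is a minimal NumPy sketch of this, assuming NumPy is available. The helper names fit_line and sum_squared_errors and the sample points (0,6), (1,0), (2,0) are hypothetical illustrations, not from the notes. It obtains x by solving the normal equations A^T A x = A^T b, a standard way (not stated in the notes) to get the x with A x = \hat b, and it assumes the t_i are not all equal so that A^T A is invertible.

```python
import numpy as np


def fit_line(t, y):
    """Least-squares line y = C + D t (hypothetical helper).

    Returns x = (C, D) and b_hat = A x, the projection of b = y
    onto the column space of A = [ones t].
    """
    t = np.asarray(t, dtype=float)
    b = np.asarray(y, dtype=float)
    # A is m-by-2: first column all ones (multiplies C), second column t (multiplies D).
    A = np.column_stack([np.ones_like(t), t])
    # Normal equations A^T A x = A^T b; A^T A is invertible when the t_i are not all equal.
    x = np.linalg.solve(A.T @ A, A.T @ b)
    b_hat = A @ x
    return x, b_hat


def sum_squared_errors(t, y, x):
    """sum_i (y_i - (C + D t_i))^2, i.e. ||b - A x||^2 (hypothetical helper)."""
    t = np.asarray(t, dtype=float)
    b = np.asarray(y, dtype=float)
    C, D = x
    return float(np.sum((b - (C + D * t)) ** 2))


# Illustrative data: three points (0, 6), (1, 0), (2, 0).
t = [0, 1, 2]
y = [6, 0, 0]
x, b_hat = fit_line(t, y)
# Expect C close to 5 and D close to -3, so b_hat is close to [5, 2, -1].
# The residual b - b_hat = [1, -2, 1] is orthogonal to both columns of A,
# and the minimal sum of squared errors is 1 + 4 + 1 = 6.
print(x, b_hat, sum_squared_errors(t, y, x))
```

Perturbing x in any direction increases sum_squared_errors, as the Pythagorean identity predicts. The normal equations are used only because they mirror the projection argument; for ill-conditioned data, np.linalg.lstsq(A, b, rcond=None) is the more robust choice.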