Lagrange Regression
1. **Problem statement:**
We have a noiseless dataset $D = \{(x_1,y_1), \dots, (x_N,y_N)\}$ with distinct $x_i \in \mathbb{R}$. Define the Lagrange basis functions
$$\ell_i(x) = \prod_{j \neq i} \frac{x - x_j}{x_i - x_j}$$
We want to show:
(a) For noiseless data, there exist weights $w_i$ such that
$$\sum_{i=1}^N (y_i - f(x_i))^2 = 0,$$
where
$$f(x) = \sum_{i=1}^N w_i \ell_i(x);$$
(b) what the least squares solution looks like when the data are noisy; and
(c) what the minimizer of the absolute error loss looks like in the noisy case.
2. **Part (a) - noiseless regression:**
By construction, each basis function satisfies $\ell_i(x_j) = \delta_{ij}$ (the Kronecker delta), meaning
$$\ell_i(x_j) = \begin{cases} 1 & \text{if } i=j \\ 0 & \text{if } i \neq j \end{cases}.$$
Substitute into $f(x_j)$:
$$f(x_j) = \sum_{i=1}^N w_i \ell_i(x_j) = w_j,$$
so the prediction at $x_j$ is just $w_j$.
To perfectly fit the dataset, set weights as
$$w_i = y_i$$
then
$$f(x_j) = y_j$$
and hence
$$\sum_{i=1}^N (y_i - f(x_i))^2 = \sum_{i=1}^N (y_i - y_i)^2 = 0.$$
3. **Part (b) - noisy dataset, squared error loss:**
When the data contain noise, interpolating them exactly is usually undesirable; nevertheless, with $N$ basis functions and $N$ distinct points, the least squares solution still interpolates, as we now show.
We minimize the squared error:
$$\min_w \sum_{i=1}^N (y_i - f(x_i))^2 = \min_w \sum_{i=1}^N \left(y_i - \sum_{i'=1}^N w_{i'} \ell_{i'}(x_i)\right)^2.$$
Since $\ell_{i'}(x_i) = \delta_{ii'}$, the design matrix $L$ is identity:
$$L_{ij} = \ell_j(x_i) = \delta_{ij}.$$
Therefore the least squares solution is
$$w = L^{-1} y = y,$$
as in the noiseless case. This means the Lagrange basis interpolates the noisy data exactly, which is often undesirable due to overfitting.
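This can be checked directly. In the NumPy sketch below (synthetic noisy data of our own making), the design matrix comes out as the identity and `np.linalg.lstsq` recovers $w = y$:

```python
import numpy as np

def lagrange_basis(t, nodes, j):
    """Evaluate the j-th Lagrange basis function at the point t."""
    others = np.delete(nodes, j)
    return np.prod((t - others) / (nodes[j] - others))

rng = np.random.default_rng(0)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(x) + 0.1 * rng.standard_normal(len(x))  # synthetic noisy targets

# Design matrix L_ij = ell_j(x_i); the delta property makes it the identity
L = np.array([[lagrange_basis(xi, x, j) for j in range(len(x))] for xi in x])
print(np.array_equal(L, np.eye(len(x))))  # True

# Least squares: with L = I the solution simply copies the noisy targets
w, *_ = np.linalg.lstsq(L, y, rcond=None)
print(np.allclose(w, y))  # True: the fit interpolates the noise
```

The choice of noise level and nodes is arbitrary; any distinct nodes give the same identity design matrix.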
4. **Part (c) - noisy dataset, absolute error loss:**
If instead we minimize the absolute error:
$$\min_w \sum_{i=1}^N |y_i - f(x_i)| = \min_w \sum_{i=1}^N |y_i - w_i|,$$
since, as before, $f(x_i) = w_i$.
The objective separates across datapoints: each $w_i$ appears in exactly one term $|y_i - w_i|$, which attains its minimum value of zero at
$$w_i = y_i.$$
Thus, even under the absolute loss, the weights that fit the observations exactly are optimal, and the minimum absolute error is zero.
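The same conclusion can be checked with a couple of evaluations; again a NumPy sketch with made-up noisy values:

```python
import numpy as np

y = np.array([1.5, -0.3, 2.0])  # synthetic noisy targets

def abs_loss(w):
    # f(x_i) = w_i because ell_{i'}(x_i) = delta_{i i'}, so the loss separates
    return np.sum(np.abs(y - w))

print(abs_loss(y))        # 0.0 at the optimum w = y
print(abs_loss(y + 0.1))  # any perturbation strictly increases the loss
```

Because the loss separates term by term, no trade-off between datapoints is possible, which is why squared and absolute loss agree here despite behaving very differently in general.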
**Final answers:**
\begin{itemize}
\item[(a)] $w_i = y_i$, perfect interpolation, squared error zero.
\item[(b)] $w_i = y_i$, least squares solution (interpolation), not regularized.
\item[(c)] $w_i = y_i$, minimum absolute error is zero, achieved by interpolating points.
\end{itemize}