
Lagrange Regression


1. **Problem statement:** We have a noiseless dataset $D = \{(x_1,y_1), \dots, (x_N,y_N)\}$ with distinct $x_i \in \mathbb{R}$. Define the Lagrange basis functions
$$\ell_i(x) = \prod_{j \neq i} \frac{x - x_j}{x_i - x_j}$$
and the model
$$f(x) = \sum_{i=1}^N w_i \ell_i(x).$$
We want to prove: (a) there exist weights $w_i$ such that
$$\sum_{i=1}^N (y_i - f(x_i))^2 = 0.$$
Parts (b) and (c) then treat a noisy dataset under the squared error loss and the absolute error loss, respectively.

2. **Part (a) - noiseless regression:** By construction, each basis function satisfies $\ell_i(x_j) = \delta_{ij}$ (the Kronecker delta), meaning
$$\ell_i(x_j) = \begin{cases} 1 & \text{if } i=j \\ 0 & \text{if } i \neq j \end{cases}.$$
Substituting into $f(x_j)$ gives
$$f(x_j) = \sum_{i=1}^N w_i \ell_i(x_j) = w_j,$$
so the prediction at $x_j$ is just $w_j$. To fit the dataset perfectly, set the weights to
$$w_i = y_i.$$
Then $f(x_j) = y_j$ for every $j$, and hence
$$\sum_{i=1}^N (y_i - f(x_i))^2 = \sum_{i=1}^N (y_i - y_i)^2 = 0.$$

3. **Part (b) - noisy dataset, squared error loss:** When the data contain noise, fitting them exactly is generally undesirable, but we can still ask which weights minimize the squared error:
$$\min_w \sum_{i=1}^N (y_i - f(x_i))^2 = \min_w \sum_{i=1}^N \left(y_i - \sum_{i'=1}^N w_{i'}\, \ell_{i'}(x_i)\right)^2.$$
Since $\ell_{i'}(x_i) = \delta_{ii'}$, the design matrix $L$ is the identity:
$$L_{ij} = \ell_j(x_i) = \delta_{ij}.$$
Therefore the least-squares solution is
$$w = L^{-1} y = y,$$
exactly as in the noiseless case. The Lagrange basis interpolates the noisy data exactly, which is typically undesirable because it overfits the noise.

4. **Part (c) - noisy dataset, absolute error loss:** If instead we minimize the absolute error,
$$\min_w \sum_{i=1}^N |y_i - f(x_i)| = \min_w \sum_{i=1}^N |y_i - w_i|,$$
where the equality uses $f(x_i) = w_i$ as before. The objective separates into $N$ independent terms, and each term $|y_i - w_i|$ is minimized (with value zero) by choosing
$$w_i = y_i.$$
Thus, even under the absolute loss, the weights that fit the observations exactly are optimal, achieving zero absolute error.

**Final answers:**

- **(a)** $w_i = y_i$: perfect interpolation, squared error zero.
- **(b)** $w_i = y_i$: the least-squares solution is the interpolant, with no regularization.
- **(c)** $w_i = y_i$: the minimum absolute error is zero, again achieved by interpolating the points.
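As an optional sanity check (not part of the original exercise), the sketch below verifies the three conclusions numerically on a small synthetic dataset: the Lagrange design matrix evaluated at the training inputs is the identity, the least-squares weights equal $y$, and both the squared and absolute errors vanish at $w = y$. The sample size, node locations, target function, and noise level are arbitrary assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
x = np.sort(rng.uniform(-1.0, 1.0, size=N))       # distinct inputs x_1 < ... < x_N
y = np.sin(3 * x) + 0.1 * rng.standard_normal(N)  # noisy targets (illustrative choice)

def lagrange_basis(x_query, nodes, i):
    """Evaluate the i-th Lagrange basis function ell_i at a scalar x_query."""
    others = np.delete(nodes, i)
    return np.prod((x_query - others) / (nodes[i] - others))

# Design matrix L with L[k, j] = ell_j(x_k); at the training nodes it is the identity.
L = np.array([[lagrange_basis(xk, x, j) for j in range(N)] for xk in x])
assert np.allclose(L, np.eye(N))

# Parts (a)/(b): the least-squares weights w = L^{-1} y = y interpolate the data exactly.
w_ls, *_ = np.linalg.lstsq(L, y, rcond=None)
assert np.allclose(w_ls, y)
print("squared error at w = y:", np.sum((y - L @ w_ls) ** 2))  # ~ 0

# Part (c): f(x_i) = w_i, so sum_i |y_i - w_i| is minimized term by term at w_i = y_i.
print("absolute error at w = y:", np.sum(np.abs(y - L @ y)))   # ~ 0
```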