author: niplav, created: 2021-04-19, modified: 2022-02-02, language: english, status: on hold, importance: 2, confidence: likely

Solutions to the textbook “Pattern Recognition and Machine Learning” by Christopher M. Bishop.

Solutions to “Pattern Recognition and Machine Learning”

Chapter 1


(*) Consider the sum-of-squares error function given by (1.2) in which the function $y(x, \textbf{w})$ is given by the polynomial (1.1). Show that the coefficients $\textbf{w}=\{w_i\}$ that minimizes this error function are given by the solution to the following set of linear equations

$$ \sum_{j=0}^{M} A_{ij}w_{j}=T_i$$


$$A_{ij}=\sum_{n=1}^{N} (x_n)^{i+j}, T_i=\sum_{n=1}^{N} (x_n)^i t_n.$$

Here a suffix $i$ or $j$ denotes the index of a component, whereas $(x)^i$ denotes $x$ raised to the power of $i$.

Recap: formula 1.1 is

$$ y(x, \textbf{w}) = w_0 + w_1 x + w_2 x^2+ \dots +w_M x^M = \sum_{j=0}^{M} w_j x^j$$

and formula 1.2 (the error function) is

$$E(\textbf{w})=\frac{1}{2} \sum_{n=1}^{N} (y(x_n, \textbf{w})-t_n)^2$$

Substituting 1.1 into 1.2 gives

$$ E(\textbf{w})=\frac{1}{2} \sum_{n=1}^{N} (\sum_{j=0}^{M} w_j x^j-t_n)^2 $$

Differentiating after $\textbf{w}$ then returns

$$ E(\textbf{w})'=\sum_{n=1}^{N} (\sum_{j=0}^{M} (w_j x^j)'-t_n) $$

I really should learn multivariable calculus.


(*) Using the definition (1.38) show that $\text{var}[f(x)]$ satisfies (1.39).

$$ \mathbb{E}[(f(x)-\mathbb{E}[f(x)])^2]=\\ \mathbb{E}[f(x)^2-2\mathbb{E}[f(x)]f(x)+\mathbb{E}[f(x)]^2]=\\ \mathbb{E}[f(x)^2]-\mathbb{E}[2\mathbb{E}[f(x)]f(x)]+\mathbb{E}[\mathbb{E}[f(x)]^2]=\\ \mathbb{E}[f(x)^2]-2\mathbb{E}[f(x)]^2+\mathbb{E}[f(x)]^2=\\ \mathbb{E}[f(x)^2]-\mathbb{E}[f(x)]^2$$