Linear Systems and LU
Linear Systems
\[ \begin{aligned} Ax &= b \\ A &\in \mathbb{R}^{n \times n} \\ x &\in \mathbb{R}^n \\ b &\in \mathbb{R}^n \end{aligned} \]
Case 1: Solvable
\[ \begin{aligned} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} &= \begin{bmatrix} -1 \\ 1 \end{bmatrix} \end{aligned} \] Completely determined
Case 2: No solution
\[ \begin{aligned} \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} &= \begin{bmatrix} -1 \\ 1 \end{bmatrix} \end{aligned} \] Overdetermined
Case 3: Infinitely many solutions
\[ \begin{aligned} \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} &= \begin{bmatrix} -1 \\ -1 \end{bmatrix} \end{aligned} \] Underdetermined
No Other Cases
Proposition
If \(Ax = b\) has two distinct solutions \(x_0\) and \(x_1\), then it has infinitely many solutions.
Proof
\[ \begin{aligned} \text{For any } t\in \mathbb{R}: \quad A(tx_0 + (1-t)x_1) &= tAx_0 + (1-t)Ax_1 \\ &= tb + (1-t)b \\ &= b \end{aligned} \] Since \(x_0 \neq x_1\), each \(t\) gives a distinct solution.
Dependence on Shape
Proposition
Tall matrices admit unsolvable right hand sides.
Proof
\[ A \in \mathbb{R}^{m \times n}, \; m > n: \text{ its } n \text{ columns span at most an } n\text{-dimensional subspace of } \mathbb{R}^m. \\ \begin{aligned} n < m &\xRightarrow{} \exists v \notin col(A)\\ &\xRightarrow{} Ax = v \text{ unsolvable} \end{aligned} \]
Proposition
Wide matrices admit right hand sides with infinitely many solutions.
Proof
\[ \begin{aligned} m < n &\xRightarrow{} \text{cols are linearly dependent} \\ &\xRightarrow{} \exists\, y \neq 0 \text{ s.t. } Ay = 0 \\ &\xRightarrow{} A(x + ty) = Ax = b \;\; \forall t, \text{ whenever } Ax = b \end{aligned} \]
For this Lecture
All matrices will be:
- Square
- Invertible (nonsingular)
Foreshadowing
Even rounding the entries of a matrix can change it from non-invertible to invertible (but ill-conditioned): \[ \begin{bmatrix} 1 & 0 \\ 0 & \epsilon \end{bmatrix} \]
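A quick numerical check of this example (the value of \(\epsilon\) below is an illustrative choice, not from the lecture): the matrix is invertible for any \(\epsilon \neq 0\), but its condition number is \(1/\epsilon\).

```python
import numpy as np

# The 2x2 example from above: invertible whenever eps != 0,
# but its condition number blows up as eps -> 0.
eps = 1e-8  # illustrative choice
A = np.array([[1.0, 0.0],
              [0.0, eps]])

print(np.linalg.cond(A))  # ~1e8: technically invertible, numerically fragile
```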
Row Operations: Permutation
\[ \sigma: \{1, 2, \dots, m\} \rightarrow \{1, 2, \dots, m\} \\ P_\sigma := \begin{bmatrix} - & e_{\sigma(1)}^T & - \\ - & e_{\sigma(2)}^T & - \\ - & \vdots & - \\ - & e_{\sigma(m)}^T & - \\ \end{bmatrix} \]
\(e_{i}^TA\) is the \(i\)th row of \(A\).
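A small NumPy sketch (0-indexed, unlike the 1-indexed \(\sigma\) above) of building \(P_\sigma\) by stacking rows of the identity:

```python
import numpy as np

# Build P_sigma by stacking the rows e_{sigma(i)}^T of the identity.
# sigma is a 0-indexed array: row i of P_sigma @ A is row sigma[i] of A.
sigma = np.array([2, 0, 1])          # example permutation of {0, 1, 2}
P = np.eye(3)[sigma]                 # rows of I reordered by sigma

A = np.arange(9.0).reshape(3, 3)
assert np.allclose(P @ A, A[sigma])  # P_sigma A permutes the rows of A
```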
Row Operations: Row Scaling
\[ S_a := \begin{bmatrix} a_1 & 0 & \dots & 0 \\ 0 & a_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & a_m \end{bmatrix} \]
"Scale row k by c and add to row l":
\[ (I+ce_le_k^T)A \]
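Both row operations are literally matrix multiplications; a short NumPy check, with arbitrary illustrative values for \(a\), \(c\), \(k\), \(l\):

```python
import numpy as np

m, c, k, l = 4, 2.5, 1, 3            # "scale row k by c, add to row l" (0-indexed)
A = np.arange(16.0).reshape(m, m)

# Row scaling: S_a A multiplies row i of A by a_i.
a = np.array([1.0, 2.0, 3.0, 4.0])
S = np.diag(a)
assert np.allclose(S @ A, a[:, None] * A)

# "Scale row k by c and add to row l" as the matrix I + c e_l e_k^T.
e_l, e_k = np.eye(m)[l], np.eye(m)[k]
E = np.eye(m) + c * np.outer(e_l, e_k)
B = A.copy()
B[l] += c * A[k]
assert np.allclose(E @ A, B)
```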
Inverting Matrices
Common rule of thumb: Do not compute \(A^{-1}\) if you do not need it (see the comparison sketch below).
- Not the same as solving \(Ax = b\).
- Can be slow and poorly conditioned.
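A minimal comparison sketch in NumPy (random well-conditioned test matrix, values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

# Preferred: solve the system directly (one LU factorization + two triangular solves).
x_solve = np.linalg.solve(A, b)

# Discouraged: form A^{-1} explicitly and multiply (roughly n triangular solve pairs,
# and typically a larger residual on ill-conditioned problems).
x_inv = np.linalg.inv(A) @ b

print(np.linalg.norm(A @ x_solve - b), np.linalg.norm(A @ x_inv - b))
```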
Steps of Gaussian Elimination
- Forward substitution: For each row \(i\):
  - Scale row \(i\) so its pivot is 1
  - For each \(j > i\), subtract a multiple of row \(i\) from row \(j\) to zero out the pivot column
- Backward substitution: For each row \(i = m, m-1, \dots, 1\):
  - For each \(j < i\), zero out the rest of the column
Total runtime: \(O(n^3)\)
There are \(n\) rows in total; each row has to process the \(O(n)\) rows below it, and handling each of those rows takes \(O(n)\) subtractions, giving \(O(n^3)\) overall (a direct sketch of these steps follows).
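A direct, pivot-free NumPy sketch of the steps above, assuming no zero pivot is ever encountered (pivoting, next, removes that assumption):

```python
import numpy as np

def gaussian_elimination_solve(A, b):
    """Solve Ax = b by the elimination steps above (no pivoting)."""
    n = len(b)
    M = np.hstack([A.astype(float), b.reshape(-1, 1)])  # augmented matrix [A | b]

    # Forward substitution: for each row i, make the pivot 1, then zero out below it.
    for i in range(n):
        M[i] /= M[i, i]                  # scale row i so the pivot is 1
        for j in range(i + 1, n):        # O(n) rows below, O(n) work each -> O(n^3) total
            M[j] -= M[j, i] * M[i]

    # Backward substitution: for each row i = n-1, ..., 0, zero out the rest of the column.
    for i in range(n - 1, -1, -1):
        for j in range(i):
            M[j] -= M[j, i] * M[i]

    return M[:, -1]

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
assert np.allclose(gaussian_elimination_solve(A, b), np.linalg.solve(A, b))
```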
Pivoting
Permute rows and/or columns to avoid dividing by small numbers or zero.
- Partial pivoting: permute rows only
- Full pivoting: permute both rows and columns
LU
Observation: Triangular systems can be solved in \(O(n^2)\) time
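As a concrete instance of this observation (not from the lecture), back substitution on an upper triangular system does \(O(n)\) work per row:

```python
import numpy as np

def back_substitution(U, b):
    """Solve Ux = b for upper triangular U: O(n) work per row, O(n^2) total."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

U = np.array([[2.0, 1.0], [0.0, 3.0]])
b = np.array([4.0, 6.0])
assert np.allclose(back_substitution(U, b), np.linalg.solve(U, b))
```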
\[ E_m \dots E_2E_1Ax = E_m \dots E_2E_1b \] \(E_m \dots E_2E_1 = L^{-1}\) is lower triangular:
- Row scaling: a diagonal matrix
- Forward substitution: \(I + ce_le_j^T\) with \(l > j\), which is lower triangular
\(L^{-1}A\) is upper triangular. \[ L^{-1}A = U \]
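A quick check with SciPy's built-in LU (note it applies partial row pivoting, so a permutation matrix \(P\) appears alongside \(L\) and \(U\)):

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# scipy.linalg.lu returns A = P @ L @ U with unit lower triangular L
# (partial row pivoting, hence the permutation P).
P, L, U = lu(A)
assert np.allclose(P @ L @ U, A)
assert np.allclose(L, np.tril(L))   # L is lower triangular
assert np.allclose(U, np.triu(U))   # U is upper triangular
```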
Why is \(L^{-1}\) triangular?
\[ S := \mathrm{diag}(a_1, a_2, \dots) \\ S^{-1} = \mathrm{diag}(\tfrac{1}{a_1}, \tfrac{1}{a_2}, \dots) \\ E := I + ce_le_k^T, \; l > k \\ E^{-1} = I - ce_le_k^T \]
Proposition: Product of lower triangular matrices is lower triangular.
Proof
\[ \text{Take } A, B \text{ lower triangular} \xRightarrow{} A_{ij} = B_{ij} = 0 \;\; \forall j > i. \\ \text{Define } C := AB \text{ and suppose } j > i: \\ C_{ij} = \sum_{k} A_{ik}B_{kj} = 0, \text{ since } k > i \xRightarrow{} A_{ik} = 0 \text{ and } k \leq i < j \xRightarrow{} B_{kj} = 0. \]
Pivoting by Swapping Columns
\[ (E_k \dots E_2E_1)A(P_1 \dots P_{l-1}) = U \] With \(L^{-1} := E_k \dots E_2E_1\) (elimination) and \(P := (P_1 \dots P_{l-1})^{-1}\) (permutation): \[ A = LUP \] \[ Ax = b \\ LUPx = b \\ x = P^TU^{-1}L^{-1}b \]
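A solving sketch with SciPy. Note that SciPy's routines use the row-pivoted convention \(A = PLU\) rather than the column-swapping \(A = LUP\) derived above, so the inverse permutation lands on \(b\) instead of \(x\):

```python
import numpy as np
from scipy.linalg import lu, lu_factor, lu_solve, solve_triangular

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
b = rng.standard_normal(5)

# Factor once (O(n^3)); every additional right-hand side then costs only
# two triangular solves (O(n^2)).
lu_piv = lu_factor(A)                 # LU with partial (row) pivoting
x = lu_solve(lu_piv, b)
assert np.allclose(A @ x, b)

# The same solve spelled out, using SciPy's convention A = P L U,
# so x = U^{-1} L^{-1} P^T b.
P, L, U = lu(A)
y = solve_triangular(L, P.T @ b, lower=True)
x2 = solve_triangular(U, y, lower=False)
assert np.allclose(x, x2)
```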
Linear regression
Normal equations (the optimality conditions for \(\min_x ||Ax - b||_2^2\)): \(A^TAx = A^Tb\)
\(A^TA\) is the Gram matrix of \(A\).
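A sketch of least squares via the normal equations on synthetic data, checked against NumPy's SVD-based solver (forming \(A^TA\) squares the condition number, so the library solver is generally preferred):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((100, 3))     # tall: more equations than unknowns
b = rng.standard_normal(100)

# Least squares via the normal equations: solve the square system A^T A x = A^T b.
gram = A.T @ A                         # the Gram matrix
x_normal = np.linalg.solve(gram, A.T @ b)

# Reference: library least-squares solver (SVD-based, better conditioned).
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_normal, x_lstsq)
```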
Regularization
Tikhonov regularization ("ridge regression"; Gaussian prior), sketched in code after this list: \[ \min_x ||Ax - b||_2^2 + \alpha ||x||_2^2 \]
Lasso (Laplace prior):
\[ \min_x ||Ax - b||_2^2 + \alpha ||x||_1 \]
Elastic net:
\[ \min_x ||Ax - b||_2^2 + \alpha ||x||_1 + \beta ||x||_2^2 \]
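A sketch of the ridge case, which still reduces to a square linear system; \(\alpha\) below is an arbitrary illustrative value:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((100, 5))
b = rng.standard_normal(100)
alpha = 0.1                            # regularization weight (illustrative)

# Tikhonov / ridge: minimizing ||Ax - b||_2^2 + alpha ||x||_2^2 has the
# optimality condition (A^T A + alpha I) x = A^T b, again a square linear system.
n = A.shape[1]
x_ridge = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ b)

# Compare against plain least squares: the penalty shrinks the solution norm.
x_ls = np.linalg.solve(A.T @ A, A.T @ b)
print(np.linalg.norm(x_ridge), "<=", np.linalg.norm(x_ls))
```

(Lasso and elastic net have no closed form because of the \(||x||_1\) term; they are typically solved with iterative methods instead.)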
Example: Image Alignment
\[ y_k = Ax_k + b \\ A \in \mathbb{R}^{2 \times 2}, b \in \mathbb{R}^2 \\ \min_{A,b} \sum_{k=1}^n ||Ax_k + b - y_k ||_2^2 \]
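A sketch of this fit on synthetic points (the true \(A\), \(b\) below are made up for checking): stacking \([x_k^T, 1]\) rows turns the affine fit into one tall least-squares problem, solved here via the normal equations from above.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
X = rng.standard_normal((n, 2))                     # source points x_k
A_true = np.array([[0.9, -0.2], [0.3, 1.1]])        # made-up ground truth
b_true = np.array([0.5, -1.0])
Y = X @ A_true.T + b_true + 0.01 * rng.standard_normal((n, 2))   # noisy targets y_k

# Write [A | b] [x_k; 1] ~= y_k and stack over all k: linear least squares
# in the 6 unknowns, solved via the normal equations.
Xh = np.hstack([X, np.ones((n, 1))])                # homogeneous coordinates [x_k^T, 1]
T = np.linalg.solve(Xh.T @ Xh, Xh.T @ Y)            # 3x2: A^T stacked over b^T
A_fit, b_fit = T[:2].T, T[2]
print(np.abs(A_fit - A_true).max(), np.abs(b_fit - b_true).max())   # both small
```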
Example: Mesh Parameterization
\[ G = (V, E) \\ \textrm{Variable: } x_i \in \mathbb{R}^{2}, \textrm{ position of vertex } i \\ \min_{x_1, \dots, x_n} \sum_{(i,j) \in E} ||x_i - x_j||_2^2\\ \textrm{s.t. } x_i \textrm{ fixed } \forall i \in \textrm{boundary} \]
- "Tutte parameterization"
- "Harmonic parameterization": A secret differential equation model hiding in the objective function \[ \int ||\nabla f||^2 \]