# Forward Backward Algorithm

The Forward Backward Algorithm, aka Forward Backward Splitting or Iterative Shrinkage or Iterative Soft-Thresholding is one of the most basic and widely used algorithms to solve the LAG problem. The algorithm is much more general and can solve a wide class of convex problems, including some Matrix Completion problems. Below, we describe the basic iteration and refer to variants. Background information on the general algorithm can be found in the survey article [1].

## Iteration

Consider the LAG problem

$\min_x;\|x\|_1 + \frac{\lambda}{2}\|Ax-b\|^2.$

Write this in a more general format

$\min_x\;f(x) + g(x).$

Both f and g are convex, and furthermore g is smooth (meaning that it has a continuous derivative). A basic result of convex optimization is that under mild assumptions, the answer can basically be found by setting the derivative to zero. Because f is not smooth, we generalize the "derivative" to the subdifferential, denoted by $\partial f$. Thus we seek to find some x that solves

$0 \in \partial f (x) + \nabla g(x)$.

For any τ > 0, we can equivalently write

$0 \in \partial f (x) + \tau^{-1} x - \tau^{-1} x+ \nabla g(x)$

and then multiply by τ and re-arrange to write

$x = (I + \tau \partial f)^{-1}( I - \tau\nabla g)(x)$

(we use the fact that even though $\partial f(x)$ is a set-valued operator, the operator $(I + \tau \partial f)^{-1}$ is single-valued).

Define $T=(I + \tau \partial f)^{-1}( I - \tau\nabla g)$ so that the optimality condition can be written

x = Tx

which is a fixed-point equation. When τ is sufficiently small, then T is firmly non-expansive, and this suggests that we can find a fixed point by iterating:

xk + 1 = T(xk).

Under suitable conditions, this converges, and sometimes it is possible to prove the rate of convergence of the objective function.

The operator

$(I + \tau \partial f)^{-1}$

is known as the proximity operator of f (careful: the term proximity operator has many meanings depending on the specific field of math), and it is easy to calculate for many common functions (see[1]). The proximity operator can also be seen as the solution to the minimization problem

$\text{prox}_{\tau f}(y) = \text{argmin}_x\; \tau f(x) + \frac{1}{2}\|x-y\|^2$.

In the case of $f(x) = \|x\|_1$, the proximity operator is known as soft-thresholding or shrinkage. Shrinkage acts component-wise on the input y, so the formula is defined by considering just a scalar variable y:

shrinkτ(y) = sign(y)max( | y | − τ,0).

## Variants and implementations

### Iterative Shrinkage

Applied to the LAG problem, the forward-backward algorithm is known as iterative shrinkage. Although the general form of the forward-backward algorithm has existing convergence results, the paper [2] and [3] derived specific results for the LAG problem.

Ignace Loris has a Mathematica package "L1Packv2" that implements the algorithm in [3]. See also some algorithms at Tibshirani's LASSO page.

### Fixed Point Continuation (FPC)

The series of papers [4][5] analyze the forward-backward algorithm on the LAG problem, and release implementations in Matlab. They find efficient methods for step-size choices, based on the Barzilai-Borwein paper. One of their contributions is to describe a continuation technique which first solves the LAG problem for the wrong value of λ (but which gives faster convergence), and then uses this answer to warm-start the algorithm for the correct value of λ.

#### FPC-AS

FPC-Active Set[6] is an extension to FPC that runs the standard FPC algorithm to define an active set (the set of non-zeros), then uses more powerful smooth optimization techniques to find high-accuracy solutions on the support.

### FISTA

The Fast Iterative Shrinkage-Thresholding Algorithm[7] is a algorithmic break-through that is a variant for the general forward-backward algorithm that can guarantee better convergence rates (the paper [8] also provides a similar method). The variant algorithm is just as simple as the original forward-backward algorithm. For practical use, it is recommended to use the continuation scheme used in FPC.

• SpaRSA is quite similar, and inspired the analysis in Nesterov's classic 2007 paper[8]

## References

1. 1.0 1.1 P. L. Combettes, V. R. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Modeling and Simulation, vol. 4, no. 4, pp. 1168-1200, November 2005. paper.
2. I. Daubechies, M. Defrise, C. De Mol, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Comm. Pure and Applied Math., Vol 57, Issue 11, pp. 1413–1457, Nov. 2004, arXiv:
3. 3.0 3.1 I. Daubechies, M. Fornasier, I. Loris, Accelerated Projected Gradient Method for Linear Inverse Problems with Sparsity Constraints, J. Fourier Analysis and Applications, Vol. 14, N. 5-6, pp 764-792, arXiv:
4. E. T. Hale, W. Yin, and Y. Zhang, A fixed-point continuation method for l1-regularized minimization with applications to compressed sensing, Tech. Report CAAM/TR07-07, Rice University, Houston, TX, 2007. link
5. E. T. Hale, W. Yin, and Y. Zhang, Fixed-point continuation for l1-minimization: Methodology and convergence, SIAM J. Optim. 19 (2008), no. 3, 1107--1130. link
6. Z. Wen, W. Yin, D. Goldfarb, and Y. Zhang, A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization and continuation, SIAM J. Sci. Comput. 32 (2010), no. 4, 1832--1857. link
7. A. Beck, M. Teboulle, Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, SIAM J. Imaging Sciences, Vol. 2 (2009), 183--202 paper
8. 8.0 8.1 Y. Nesterov, Gradient methods for minimizing composite objective function, CORE Discussion Paper 2007/76 Sept. 2007 link