This is a little vignette about using TFOCS to solve linear inverse problems, with a focus on problems from imaging and compressed sensing, and it covers some basic ideas in the TFOCS package.
From a generic point-of-view, linear inverse problems assume that we have data collected via (noisy) linear measurements:
To recover the unknown signal
, we look at the set of signals that agree with the measurements:
for a reasonable value of
.
To select the “best” signal from this class, there are many options. A common choice is choosing the signal with least norm, for some choice of norm. In particular, the
norm is often used because it induces sparsity. For example, we have the Basis Pursuit De-Noising (BPDN) problem:
There are quite a few algorithms that solve similar problems, such as
which are equivalent for some choice of
and
, but this relation is not usually known. There are many algorithms to solve LASSO, but our algorithm, TFOCS, is one of the few first-order algorithms that can efficiently solve the constrained version of this problem (BPDN).
TFOCS can also deal with variations. Consider the following problem:
where
is a wavelet transform (most natural images have wavelet coefficients that decay quickly) and
is the total-variation norm, which penalizes changes in an image. Below are some results on the classic “cameraman” test image, where we have applied various formulations that we solved with TFOCS. This is a denoising problem, so the linear operator
is the identity, and we add noise
.
![]() The original image without noise |
![]() The noisy version of the image (this is what we want to clean-up) |
![]() The cleaned-up image if we use wavelet thresholding (using an oracle to pick the optimal threshold). There are some artifacts. |
![]() The cleaned-up image using the wavelet and TV composite function (see equation above). The TV norm helps reduce the artifacts. |
To solve the problems efficiently, it is preferable to minimize a smooth function rather than a non-smooth function, because gradient descent methods have better convergence rates than sub-gradient descent methods.
The approach of TFOCS is to add a strongly convex term to the primal problem. This doesn't make the primal smooth, but rather it makes the dual smooth, and so we solve the dual problem via gradient descent (using an optimal version of gradient descent, such as Nesterov's algorithm). The picture below shows the effect of the strongly convex term added to the
norm (the red line, labeled “dual smoothed”). As you can see, it retains the sharp kink at 0, which is important since this is what promotes sparsity. Also plotted is the Huber function (the green line, labeled “primal smoothed”), which is a smooth approximation of the
norm that we used in NESTA.

The smoothing is controlled by a parameter
, where the extra term added to the objective is
for some
. That is, we replace an optimization problem like
with
We want
to be small so that we closely approximate the original problem, but we also want it large because then the smoothed problem is easier to solve (i.e. faster convergence). The standard solution is to use continuation: we solve for a large value of
and then use this solution to “seed” the algorithm using a small value of
. Continuation can also be done by keeping
fixed and updating
with a better guess.
The TFOCS paper describes an “accelerated continuation” scheme that improves on this. We analyze the fixed-
version of continuation in terms of the proximal point method, and show that continuation is just the gradient step of a proximal point method. Thus we can apply Nesterov's accelerated gradient algorithm. The plot below shows this: we are plotting the error between the smoothed solution and non-smoothed solution. Each iteration is an update of
in either the simple fashion (“regular continuation”) or the accelerated fashion.
TFOCS also discusses some cases where the strongly convex parameter doesn't affect the solution, in so-called “exact regularization”. Simple cases have been known since the 1970s; more complete results appear in Exact regularization of convex programs by Friedlander and Tseng (2007).

Many compressed sensing problems enjoy a local strong convexity, and it is possible to exploit this using a restarted Nesterov algorithm. Note that, unlike gradient descent, accelerated algorithms do not automatically have linear convergence for strongly convex objectives.
In TFOCS, we explored the idea of automatic restarts, which restart whenever the function value increases. After this, the idea was explored in more depth by Brendan O'Donoghue and Emmanuel Candès in Adaptive resart for accelerated gradient schemes.

The 1990s saw the explosion of interior-point methods (IPM), and by the end of the decade, IPM solvers were mature and robust in both theory and practice. However, they do have scaling issues: for a primal variable of dimension
, each step requires the solving of a linear system on the order of
(or even larger, depending on the number of constraints), so this is automatically an order
algorithm. Even worse, the number of steps scales slightly in
. The KKT system can sometimes be solved with an iterative solver like conjugate gradient, but this introduces other issues.
Below are some plots of a very respected IPM solver (SeDuMi, called via cvx), as well as l1-Magic (which is custom for the basis pursuit problem and uses conjugate gradient to solve the KKT system). You can see that the TFOCS first-order method is way faster at large problem sizes, and (not shown) requires a small memory footprint. SeDuMi required more than 4 GB of RAM on the largest problem and consequently memory calls thrashed.
The plot shows two curves for each algorithm: one curve used a loose tolerance, and one used a higher tolerance. LIPSOL is a Mehrotra-based primal-dual IPM built into MATLAB's linprog, and Simplex is also using MATLAB's implementation in linprog.

But are first-order methods like TFOCS as accurate as IPM methods? For this problem (a noiseless basis pursuit problem, with a sparse signal), the answer is yes. In fact, TFOCS returns an answer exact to machine precision, and exactly identifies the support, whereas IPM never exactly find zero entries (since, by definition, their solution is always on the interior).
Also, the Simplex method failed on most of the problem instances, and SeDuMi failed on some of the larger problems (it returned NaN, so those points are not plotted).

Much of my work involves algorithms that arise from compressed sensing. There are already many excellent overviews of compressed sensing on the web, so I just list a few here:
Rice University maintains the de facto standard of compressed sensing and compressive sampling resources.
Terence Tao's blog posts about CS
For an general overview of compressed sensing, the March 2008 issue of IEEE Signal Processing magazine is devoted to compressed sensing and provides an excellent introduction. (free PDF)
Optimization Online is a repository of e-prints about optimization algorithms.