Marguerite.jl v0.1: A Differentiable Frank-Wolfe Solver

I'm releasing Marguerite.jl, a differentiable Frank-Wolfe (conditional gradient) solver for convex and bilevel optimization in Julia. The package is named in honor of Marguerite Frank (1927–2024), co-inventor of the Frank-Wolfe algorithm (1956).

Marguerite.jl finds parameterized solutions to optimization problems of the form

x_\star(\theta) \in \argmin_{x \in \mathcal{C}(\theta)}\, f(x;\theta),

where $\mathcal{C}(\theta)$ is a compact convex set parameterized by $\theta$ , which is accessed via a linear minimization oracle (LMO). In particular, the LMO returns solutions to approximate program

v_\star(x_\star ; \theta) \in \argmin_{v \in \mathcal{C}(\theta)} \ \langle \nabla f, v \rangle.

This "projection-free" paradigm is useful in instances where it is undesirable to project onto the constraint set. The ideal case is where there is a directly computable solution for the linear subproblem at each iteration. It's even better if it's cheap to evaluate.

Docs: Link Code: Link

Why `Marguerite.jl`?

Marguerite.jl is built for simple and fast bilevel optimization, meaning optimization programs that appear as

\min_{\theta \in \Theta} \ u(x_\star(\theta)) \qquad \mathsf{st} \quad x_\star(\theta) \in \argmin_{x \in \mathcal{C}(\theta)}\, f(x;\theta).

It achieves this by implementing highly optimized implicit differentiation for a restrictive–-but expressive–-selection of constraints. It then abstracts that away into a simple, minimalist interface.

The key features:

Differentiable solve: Implicit differentiation via ChainRulesCore.rrule gives exact gradients $\partial x^* / \partial \theta$ at convergence, without unrolling the solver iterations. The Hessian system is solved by conjugate gradient with Hessian-vector products, so memory is $O(1)$ .
Bilevel optimization: bilevel_solve backpropagates through the solver to learn parameters of constrained inner problems. This handles objectives and constraints that depend on the outer parameters $\theta$ .
Zero-allocation inner loop: Pre-allocated buffers and @inbounds hot paths. The solver itself allocates nothing in the main loop.
Minimal API: Everything goes through solve(f, lmo, x0; ...). Provide a manual gradient with grad=, or let ForwardDiff handle it automatically.

Quick example

using Marguerite, LinearAlgebra

# Minimize a quadratic on the probability simplex
Q = [4.0 1.0; 1.0 2.0]; c = [-3.0, -1.0]
f(x) = 0.5 * dot(x, Q * x) + dot(c, x)

x, result = solve(f, ProbSimplex(), [0.5, 0.5])

Bilevel optimization

Many operations and planning problems have nested structure: An outer loss wraps a constrained inner problem. Marguerite.jl differentiates through the inner solve via implicit differentiation:

x_target = [0.7, 0.3]; θ = zeros(2); η = 0.1

inner(x, θ) = 0.5 * dot(x, x) - dot(θ, x)
outer(x) = sum((x .- x_target).^2)

x_curr = [0.5, 0.5]
for _ in 1:50
    x, dθ, _ = bilevel_solve(outer, inner, ProbSimplex(),
                               x_curr, θ)
    x_curr .= x
    θ .-= η .* dθ
end
# x_curr ≈ [0.7, 0.3]

Connection to randomized switching

Marguerite.jl grew out of my work on network reconfiguration with Dmitrii Ostrovskii and Daniel Molzahn. In that work, we optimize over the space of switching probabilities (try it live!). The structure of this problem makes Frank-Wolfe particularly effective. The graph problems example in the docs demonstrates this connection: minimizing the Laplacian pseudoinverse quadratic form over a masked knapsack polytope.

Install

Requires Julia 1.12+:

using Pkg
Pkg.add(url="https://github.com/samtalki/Marguerite.jl")

Your feedback is welcome – please open issues on GitHub.