Generalised Sylvester

Two MatrixEquations.jl solvers for the same equation:

gsylv(A, B, C, D, E) — Schur-based, $O(n^3)$ per solve.
gsylvkr(A, B, C, D, E) — Kronecker LU on the $n^2 \times n^2$ operator.

Both solve

\[A\,X\,B \;+\; C\,X\,D \;=\; E. \tag{S}\]

Define $G[X] = A\,X\,B + C\,X\,D$, so (S) reads $G[X] = E$. MatrixEquationsAD provides ForwardDiff Dual dispatches and Enzyme forward / reverse rules for both front-ends; the rules share their tangent / adjoint derivations and differ only in which factorisation is cached.

Implementation pointers:

ext/forwarddiff_sylvester.jl, ext/enzyme_sylvester.jl — AD front-ends for gsylv and gsylvkr.

Primal

gsylv uses the generalised-Schur (QZ) decompositions of $(A, C)$ and $(B, D)$ from MatrixEquations.jl; gsylvkr builds the Kronecker matrix directly:

\[\bigl(B^\top \otimes A + D^\top \otimes C\bigr)\,\operatorname{vec}(X) \;=\; \operatorname{vec}(E).\]

The factorisation assumption is that $G$ is nonsingular — equivalently, the operator pairs $(A - \lambda C)$ and $(D + \lambda B)$ are regular and share no common eigenvalue (MatrixEquations.jl documents this requirement).

Worked example

julia> using MatrixEquations: gsylv, gsylvkr

julia> using MatrixEquationsAD

julia> A = [4.0 0.1 0.0; -0.2 3.6 0.3; 0.1 0.0 3.8]
3×3 Matrix{Float64}:
  4.0  0.1  0.0
 -0.2  3.6  0.3
  0.1  0.0  3.8

julia> B = [3.0 0.2; -0.1 2.7]
2×2 Matrix{Float64}:
  3.0  0.2
 -0.1  2.7

julia> C = [0.2 0.0 0.0; 0.0 0.2 0.0; 0.0 0.0 0.2]
3×3 Matrix{Float64}:
 0.2  0.0  0.0
 0.0  0.2  0.0
 0.0  0.0  0.2

julia> D = [0.3 0.0; 0.0 0.3]
2×2 Matrix{Float64}:
 0.3  0.0
 0.0  0.3

julia> E = [1.0 -0.4; 0.3 0.8; -0.2 0.5]
3×2 Matrix{Float64}:
  1.0  -0.4
  0.3   0.8
 -0.2   0.5

julia> X = gsylv(A, B, C, D, E)
3×2 Matrix{Float64}:
  0.0805978  -0.0446488
  0.0362012   0.072903
 -0.017917    0.050781

julia> isapprox(A * X * B + C * X * D - E, zeros(3, 2); atol = 1.0e-12)
true

gsylvkr returns the same solution up to round-off:

julia> gsylvkr(A, B, C, D, E) ≈ X
true

ForwardDiff / Enzyme JVP

Step 1: differentiate the implicit equation. Differentiating (S),

\[G[d X] \;=\; d E \;-\; d A\,X\,B \;-\; A\,X\,d B \;-\; d C\,X\,D \;-\; C\,X\,d D.\]

i.e. $d X$ solves another generalised Sylvester equation against the same operator $G$.

Step 2: cached factorisation.

gsylv: generalised-Schur (QZ) factors of the pairs $(A, C)$ and $(B, D)$, built on the value layer.
gsylvkr: LU of $B^\top \otimes A + D^\top \otimes C$.

Step 3: solve per direction. One triangular sweep (gsylv) or LU back-substitution (gsylvkr) per tangent. Chunked-Dual partials and Enzyme BatchDuplicated tangents reuse the cached factorisation.

Step 4: code path. The AD frontends in ext/forwarddiff_sylvester.jl and ext/enzyme_sylvester.jl build the cache on the value layer and dispatch one solve per lane.

Enzyme VJP

Step 1: differentiate the implicit equation (adjoint). Solve the adjoint generalised Sylvester equation

\[G^*[Y] \;=\; A^\top\,Y\,B^\top \;+\; C^\top\,Y\,D^\top \;=\; \bar X.\]

Step 2: cached factorisation. Same QZ factors (gsylv) or Kronecker LU (gsylvkr) as the JVP, stashed on Enzyme's tape so multiple reverse cotangents reuse it.

Step 3: parameter cotangents.

\[\bar E \;\mathrel{+}=\; Y,\]

and

\[\begin{aligned} \bar A &\mathrel{-}= Y\,B^\top\,X^\top, \\ \bar B &\mathrel{-}= X^\top\,A^\top\,Y, \\ \bar C &\mathrel{-}= Y\,D^\top\,X^\top, \\ \bar D &\mathrel{-}= X^\top\,C^\top\,Y. \end{aligned}\]

The two implementations share these formulas; only the cached factorisation differs.

Step 4: code path. The augmented primal stores the cache; the reverse pass performs one adjoint solve plus the four outer-product accumulations.

Differentiating through `gsylv`

using Enzyme: Active, Const, Duplicated, Reverse, autodiff, make_zero
using LinearAlgebra: dot
using MatrixEquations: gsylv
using MatrixEquationsAD

A = [4.0 0.1 0.0; -0.2 3.6 0.3; 0.1 0.0 3.8]
B = [3.0  0.2; -0.1 2.7]
C = [0.2 0.0 0.0; 0.0 0.2 0.0; 0.0 0.0 0.2]
D = [0.3 0.0; 0.0 0.3]
E = [1.0 -0.4; 0.3 0.8; -0.2 0.5]
W = [0.7 -0.1; -0.2 0.4; 0.5 0.3]

weighted(A, B, C, D, E, W) = dot(W, gsylv(A, B, C, D, E))

dA = make_zero(A); dB = make_zero(B)
dC = make_zero(C); dD = make_zero(D); dE = make_zero(E)
autodiff(
    Reverse, weighted, Active,
    Duplicated(A, dA), Duplicated(B, dB),
    Duplicated(C, dC), Duplicated(D, dD),
    Duplicated(E, dE), Const(W),
)
# dA, dB, dC, dD, dE now hold the gradients.

References: