Generalised Sylvester
Two MatrixEquations.jl solvers for the same equation:
gsylv(A, B, C, D, E)— Schur-based, $O(n^3)$ per solve.gsylvkr(A, B, C, D, E)— Kronecker LU on the $n^2 \times n^2$ operator.
Both solve
\[A\,X\,B \;+\; C\,X\,D \;=\; E. \tag{S}\]
Define $G[X] = A\,X\,B + C\,X\,D$, so (S) reads $G[X] = E$. MatrixEquationsAD provides ForwardDiff Dual dispatches and Enzyme forward / reverse rules for both front-ends; the rules share their tangent / adjoint derivations and differ only in which factorisation is cached.
Implementation pointers:
ext/forwarddiff_sylvester.jl,ext/enzyme_sylvester.jl— AD front-ends forgsylvandgsylvkr.
Primal
gsylv uses the generalised-Schur (QZ) decompositions of $(A, C)$ and $(B, D)$ from MatrixEquations.jl; gsylvkr builds the Kronecker matrix directly:
\[\bigl(B^\top \otimes A + D^\top \otimes C\bigr)\,\operatorname{vec}(X) \;=\; \operatorname{vec}(E).\]
The factorisation assumption is that $G$ is nonsingular — equivalently, the operator pairs $(A - \lambda C)$ and $(D + \lambda B)$ are regular and share no common eigenvalue (MatrixEquations.jl documents this requirement).
Worked example
julia> using MatrixEquations: gsylv, gsylvkr
julia> using MatrixEquationsAD
julia> A = [4.0 0.1 0.0; -0.2 3.6 0.3; 0.1 0.0 3.8]
3×3 Matrix{Float64}:
4.0 0.1 0.0
-0.2 3.6 0.3
0.1 0.0 3.8
julia> B = [3.0 0.2; -0.1 2.7]
2×2 Matrix{Float64}:
3.0 0.2
-0.1 2.7
julia> C = [0.2 0.0 0.0; 0.0 0.2 0.0; 0.0 0.0 0.2]
3×3 Matrix{Float64}:
0.2 0.0 0.0
0.0 0.2 0.0
0.0 0.0 0.2
julia> D = [0.3 0.0; 0.0 0.3]
2×2 Matrix{Float64}:
0.3 0.0
0.0 0.3
julia> E = [1.0 -0.4; 0.3 0.8; -0.2 0.5]
3×2 Matrix{Float64}:
1.0 -0.4
0.3 0.8
-0.2 0.5
julia> X = gsylv(A, B, C, D, E)
3×2 Matrix{Float64}:
0.0805978 -0.0446488
0.0362012 0.072903
-0.017917 0.050781
julia> isapprox(A * X * B + C * X * D - E, zeros(3, 2); atol = 1.0e-12)
truegsylvkr returns the same solution up to round-off:
julia> gsylvkr(A, B, C, D, E) ≈ X
trueForwardDiff / Enzyme JVP
Step 1: differentiate the implicit equation. Differentiating (S),
\[G[d X] \;=\; d E \;-\; d A\,X\,B \;-\; A\,X\,d B \;-\; d C\,X\,D \;-\; C\,X\,d D.\]
i.e. $d X$ solves another generalised Sylvester equation against the same operator $G$.
Step 2: cached factorisation.
gsylv: generalised-Schur (QZ) factors of the pairs $(A, C)$ and $(B, D)$, built on the value layer.gsylvkr: LU of $B^\top \otimes A + D^\top \otimes C$.
Step 3: solve per direction. One triangular sweep (gsylv) or LU back-substitution (gsylvkr) per tangent. Chunked-Dual partials and Enzyme BatchDuplicated tangents reuse the cached factorisation.
Step 4: code path. The AD frontends in ext/forwarddiff_sylvester.jl and ext/enzyme_sylvester.jl build the cache on the value layer and dispatch one solve per lane.
Enzyme VJP
Step 1: differentiate the implicit equation (adjoint). Solve the adjoint generalised Sylvester equation
\[G^*[Y] \;=\; A^\top\,Y\,B^\top \;+\; C^\top\,Y\,D^\top \;=\; \bar X.\]
Step 2: cached factorisation. Same QZ factors (gsylv) or Kronecker LU (gsylvkr) as the JVP, stashed on Enzyme's tape so multiple reverse cotangents reuse it.
Step 3: parameter cotangents.
\[\bar E \;\mathrel{+}=\; Y,\]
and
\[\begin{aligned} \bar A &\mathrel{-}= Y\,B^\top\,X^\top, \\ \bar B &\mathrel{-}= X^\top\,A^\top\,Y, \\ \bar C &\mathrel{-}= Y\,D^\top\,X^\top, \\ \bar D &\mathrel{-}= X^\top\,C^\top\,Y. \end{aligned}\]
The two implementations share these formulas; only the cached factorisation differs.
Step 4: code path. The augmented primal stores the cache; the reverse pass performs one adjoint solve plus the four outer-product accumulations.
Differentiating through gsylv
using Enzyme: Active, Const, Duplicated, Reverse, autodiff, make_zero
using LinearAlgebra: dot
using MatrixEquations: gsylv
using MatrixEquationsAD
A = [4.0 0.1 0.0; -0.2 3.6 0.3; 0.1 0.0 3.8]
B = [3.0 0.2; -0.1 2.7]
C = [0.2 0.0 0.0; 0.0 0.2 0.0; 0.0 0.0 0.2]
D = [0.3 0.0; 0.0 0.3]
E = [1.0 -0.4; 0.3 0.8; -0.2 0.5]
W = [0.7 -0.1; -0.2 0.4; 0.5 0.3]
weighted(A, B, C, D, E, W) = dot(W, gsylv(A, B, C, D, E))
dA = make_zero(A); dB = make_zero(B)
dC = make_zero(C); dD = make_zero(D); dE = make_zero(E)
autodiff(
Reverse, weighted, Active,
Duplicated(A, dA), Duplicated(B, dB),
Duplicated(C, dC), Duplicated(D, dD),
Duplicated(E, dE), Const(W),
)
# dA, dB, dC, dD, dE now hold the gradients.References:
- MatrixEquations.jl documents
gsylvas solving $A X B + C X D = E$ in its Sylvester solver documentation;gsylvkris the Kronecker variant. - LAPACK's DTGSYL documents the generalised Sylvester systems used in Schur-based solvers and their transposed forms.
- Kao and Hennequin derive AD rules for Sylvester-type equations in arXiv:2011.11430.