Backends and Capability Matrix¶
torch-sla dispatches each solve() call to one of several
backends. Pick a backend explicitly via backend="..." or let
backend="auto" choose based on device, dtype, problem size, and
which optional dependencies are installed.
The current backend lineup and what each supports:
Backend |
CPU |
CUDA |
Direct |
Iterative |
Complex |
Batched |
Distributed |
Autograd |
|---|---|---|---|---|---|---|---|---|
|
✔ |
– |
LU / UMFPACK |
CG, BiCGStab, GMRES |
✔ |
via batch helpers |
– |
✔ |
|
✔ |
– |
– |
CG, BiCGStab |
– |
– |
– |
✔ |
|
✔ |
✔ |
– |
CG, BiCGStab, PCG, PBiCGStab |
✔ |
✔ |
via |
✔ |
|
– |
✔ |
LU (cuSPARSE) |
CG, GMRES |
✔ |
via batch helpers |
– |
✔ |
|
– |
✔ |
LU / Cholesky / LDLT / LDLH |
– |
✔ |
– |
– |
✔ |
|
✔ |
✔ (V-cycle only) |
– |
Ruge-Stuben AMG, Smoothed Aggregation |
– |
– |
– |
✔ |
|
– |
✔ |
– |
AMG, PCG, PBiCGStab, FGMRES (NVIDIA AmgX) |
– |
– |
– |
✔ |
Platform availability¶
Direct-solver backends bind to vendor libraries; the table below records which OS each one builds on today.
Backend |
Linux |
Windows |
macOS |
Notes |
|---|---|---|---|---|
|
✔ |
✔ |
✔ |
Pure SciPy; UMFPACK optional via |
|
✔ |
✔ |
✔ |
C++ extension, compiled at install time. |
|
✔ |
✔ |
✔ |
PyTorch-native; CUDA path active when |
|
✔ |
✔ |
– |
Requires NVIDIA CUDA. CuPy has no native macOS wheels. |
|
✔ |
✔ |
– |
Requires |
|
✔ |
✔ |
✔ |
Setup runs on CPU via the optional |
When backend="auto" picks what¶
CUDA tensors: try
cudss(best direct solver) ->cupy(LU) ->pytorch(iterative fallback).CPU tensors, small / medium: prefer
scipyLU.CPU tensors, large or repeated:
pytorchCG / BiCGStab keeps the memory footprint flat.
Override via backend="..." whenever you need exact control (e.g.
backend="cudss" to force a direct GPU solve for a single
ill-conditioned system that iterative methods cannot crack).
Putting it together¶
The capability matrix maps directly to the solve()
parameters: any combination where the cell is ✔ is supported:
import torch
from torch_sla import solve, PreconditionerConfig
A_csr = ... # any accepted matrix format
b = torch.randn(n)
# Direct GPU solve, automatic Cholesky/LDL^H selection
x = solve(A_csr, b, backend="cudss", matrix_type="auto")
# CPU iterative CG with a tuned SSOR preconditioner
x = solve(A_csr, b,
backend="pytorch", method="cg",
preconditioner=PreconditionerConfig(kind="ssor", omega=1.2),
atol=1e-10, maxiter=5_000)
# CPU iterative CG with a real multi-level AMG preconditioner
# (uses PyAMG when installed, falls back to the lightweight
# 2-level stub otherwise). Reduces the iteration count by 10-100x
# on ill-conditioned PDE problems.
x = solve(A_csr, b,
backend="pytorch", method="cg",
preconditioner="amg", # or PreconditionerConfig(kind="amg", ...)
atol=1e-10, maxiter=200)
# Diagnostic return -- iteration count + residual
x, info = solve(A_csr, b, return_info=True)
print(info.iter_count, info.residual, info.converged)
Future backends (roadmap)¶
The next wave of backends will extend the table with cross-platform AMG preconditioning and high-end GPU AMG:
Backend |
Status |
Capability |
Notes |
|---|---|---|---|
|
available (this release) |
CPU AMG setup + cross-device V-cycle |
Already shipping. See above. Standalone solver +
|
|
available (this release) |
CUDA AMG + Krylov (Nvidia AmgX) |
Linux + Windows only. NVIDIA GPU required.
Install via |
|
investigating |
CPU/GPU direct + iterative, distributed (PETSc/hypre BoomerAMG) |
Linux + macOS easy; Windows via WSL. |