Why I Abandoned Candle for libtorch in Rust (And You Might Too)
When you start a Rust project that needs machine learning, Candle looks like the obvious choice. Pure Rust. No C++ dependency. No libtorch. No system-level configuration. Cargo handles everything. You add candle-core to your Cargo.toml, run cargo build, and in theory you have a working ML runtime.
That theory lasts until your first release build on Windows.
The Appeal
I should start by saying why Candle was appealing in the first place. The narrator-tts pipeline needed to run a 1.7B parameter model locally, on Windows, with CUDA support. The project was already in Rust — the orchestrator, the IPC protocol, the worker pool, all Rust. Using a Rust ML framework meant one language, one toolchain, one build system. No FFI boundary. No C++ compiler requirements. No separate build step for native dependencies.
Candle is Hugging Face's pure-Rust ML framework. It implements tensor operations, CUDA kernels, and model loading without any dependency on PyTorch or libtorch. The API is clean. The documentation is decent. For standard operations — matrix multiplication, attention, layer normalization — it works.
The appeal was simple: if everything is Rust, the build is Cargo's problem, not mine. I have spent enough of my career wrestling with CMake and vcpkg and pkg-config. The promise of never touching a C++ build system again was genuinely exciting.
The Wall
The wall is called LNK2038.
On Windows, the C runtime library (CRT) can be linked dynamically (/MD, the default for Rust's MSVC targets) or statically (/MT), and each mode also has a debug variant (/MDd and /MTd). When two object files linked into the same binary were compiled with different CRT settings, the MSVC linker emits LNK2038: mismatch detected for 'RuntimeLibrary'.
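To see which side of that split a given Rust build lands on, the compiler exposes the choice as the crt-static target feature. The snippet below is a small illustrative sketch, not code from the narrator-tts pipeline; the helper name crt_linkage is made up for the example, and the /MT and /MD labels only apply literally on MSVC targets.

```rust
/// Maps the `crt-static` target feature to the MSVC flag it corresponds to.
/// `crt_linkage` is a hypothetical helper name used only in this sketch.
fn crt_linkage(crt_static: bool) -> &'static str {
    if crt_static {
        "static (/MT)"
    } else {
        "dynamic (/MD)"
    }
}

fn main() {
    // `cfg!` is resolved at compile time for the crate being built,
    // so this reports how *this* binary was configured.
    let crt_static = cfg!(target_feature = "crt-static");
    println!("CRT linkage for this build: {}", crt_linkage(crt_static));
}
```

Building the same program with RUSTFLAGS set to -C target-feature=+crt-static should flip the reported mode on an MSVC target. It says nothing about how any precompiled native library was built, which is the other half of the mismatch.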
This is not a Candle-specific problem. It is a Windows ABI problem that affects any project mixing libraries compiled with different CRT settings. But Candle makes it acute because the CUDA integration requires linking against precompiled native libraries — cuBLAS, cuDNN, CUDA runtime — and those libraries were compiled with specific CRT settings that do not always match what Cargo produces.
The error looks like this:
error LNK2038: mismatch detected for 'RuntimeLibrary': value 'MD_DynamicRelease' does not match value 'MT_StaticRelease' in object
And then the build fails. Not with a helpful message. Not with a suggested fix. With a linker error that points at a specific object file and a CRT mismatch that you now have to track down through a chain of transitive dependencies.
LNK2038 is not a bug. It is a fundamental impedance mismatch between how Rust's build system configures CRT linkage and how precompiled native libraries expect it. Fixing it means either recompiling the native libraries from source with the correct CRT setting (good luck recompiling CUDA) or modifying your Rust build configuration to force a specific CRT linkage that may break other dependencies.
I spent three weekends on this. Three weekends of .cargo/config.toml tweaks, RUSTFLAGS environment variables, custom build scripts, and GitHub issues from other people hitting the same wall. Some of those issues were from 2024. They were still open.
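For readers who have not been down this road: the usual knob is the crt-static target feature, set with -C target-feature=+crt-static through RUSTFLAGS or the rustflags key in .cargo/config.toml. Cargo passes the active target features to build scripts in the CARGO_CFG_TARGET_FEATURE environment variable, so a build script can at least make the current setting visible. What follows is a minimal sketch of that kind of check, not a reproduction of any script from those weekends; the helper name crt_is_static is invented for the example.

```rust
// build.rs (illustrative sketch)
use std::env;

/// Returns true if the comma-separated feature list contains `crt-static`.
/// `crt_is_static` is a hypothetical helper name used only in this sketch.
fn crt_is_static(features: &str) -> bool {
    features.split(',').any(|f| f.trim() == "crt-static")
}

fn main() {
    // Cargo sets this variable when running build scripts; fall back to an
    // empty string so the sketch also runs outside of Cargo.
    let features = env::var("CARGO_CFG_TARGET_FEATURE").unwrap_or_default();
    let mode = if crt_is_static(&features) {
        "static (/MT)"
    } else {
        "dynamic (/MD)"
    };
    // Lines starting with `cargo:warning=` are surfaced in the build output.
    println!("cargo:warning=CRT linkage for this target: {mode}");
}
```

This only reports which CRT mode the Rust side is using. It does not resolve a mismatch, and it cannot tell you how a precompiled library was built, which is why the tweaking can go on for weekends.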
The Broader Problem
The CRT mismatch is the symptom. The broader problem is that the pure-Rust ML ecosystem is not yet complete enough for production workloads.
Candle supports a growing list of operations, but TTS models use some operations that are not fully implemented or optimized. Attention mechanisms with custom masking. Specific convolution variants. Quantized inference paths. When an operation is missing or slow in Candle, you cannot just drop down to a PyTorch equivalent — there is no PyTorch. You either implement the operation yourself in Rust, file an issue and wait, or find a workaround.
For a research project, this is fine. You pick models that Candle supports well, and you work within those constraints. For a production pipeline that needs to process 600-segment books overnight without intervention, "this operation might panic on certain inputs" is not acceptable.
The gap between "works in the examples" and "works in production" is the entire job. Candle is in the gap. It will probably close the gap eventually. But I needed to ship.
I was spending more time debugging the ML framework than building the pipeline. The orchestrator, the worker pool, the IPC protocol, the recovery system, the fidelity checker — all of that was working. The ML runtime was the bottleneck. Not because the math was wrong, but because the build would not produce a working binary on Windows with CUDA.
The Switch
I replaced Candle with tch, the Rust bindings to libtorch (PyTorch's C++ backend). This was not a decision I made happily. It meant accepting a C++ dependency. It meant users would need libtorch installed. It meant the build was no longer pure Cargo.
It also meant the build works.
libtorch ships precompiled for Windows with known CRT settings. The tch crate handles the linkage correctly. CUDA integration is native — libtorch was built for it, tested for it, and the same CUDA kernels that power PyTorch power your Rust binary. When you call tensor.matmul() in tch, you are calling the exact same implementation that torch.matmul calls in Python.
The switch took a day. The tch API maps closely to PyTorch's API, so translating model code from Python reference implementations was straightforward. The tensor types, the module traits, the optimizer interface — all familiar if you have used PyTorch.
What I lost:
- Pure Rust build. Users need libtorch. This is a real cost. It complicates installation.
- Single toolchain. I now have Cargo plus libtorch plus CUDA Toolkit. Three things that must agree on versions.
- Small binary size. libtorch is large. The release binary went from a few megabytes to a few hundred.
What I gained:
- A build that works. cargo build --release produces a working binary. Every time.
- Full operation coverage. Every PyTorch operation is available. No missing ops, no panics, no workarounds.
- CUDA stability. The CUDA integration is battle-tested through years of PyTorch usage.
- My weekends back.
The Pragmatic Lesson
The lesson is not that Candle is bad. It is not. Candle is an impressive project doing genuinely difficult work — reimplementing an ML runtime in pure Rust is a massive undertaking, and the team has made real progress. I still watch the project. When the CRT linkage story improves on Windows, I will try again.
The lesson is about timing and pragmatism. The right tool is the one that lets you ship. I wanted pure Rust for aesthetic and architectural reasons — one language, one build system, no FFI. Those are good reasons. They are not good enough reasons to spend three weekends fighting a linker error and still not have a working binary.
Choosing a technology because it is elegant is a luxury available to projects without deadlines. Production work has deadlines. The users do not care about your toolchain aesthetics. They care whether the audiobook gets generated.
The tch bindings are not elegant. They are a foreign function interface to a C++ library. The types do not feel native. The error messages sometimes come from libtorch's internals and are not helpful. But the code runs, the build is reproducible, and the output is correct. In production, those three properties beat elegance every time.
If you are starting a Rust ML project today, evaluate both. Try Candle first — it may work for your use case, especially on Linux, especially if your model uses standard operations. But have tch as a fallback plan. Know the signs: linker errors you cannot resolve, operations that are not implemented, CUDA paths that work in examples but fail under load. Those are signals that the ecosystem is not ready for what you need.
Switch early. Do not romanticize the struggle. The pipeline is what matters, not the framework it runs on.