RTX 5070 Ti unsupported? (PyTorch: no kernel image is available)

#19
by hammerill - opened

Hello, I'm trying to run this Docker image on our server, which has a GeForce RTX 5070 Ti GPU, using the Docker Compose file below (it relies on the NVIDIA Container Toolkit to pass the GPU through to the container):

services:
  openaudio:
    image: fishaudio/fish-speech:server-cuda
    container_name: openaudio
    restart: unless-stopped
    ports:
      - "8080:8080"
    environment:
      API_SERVER_NAME: "0.0.0.0"
      API_SERVER_PORT: "8080"
      COMPILE: "1"
    volumes:
      - ./data/checkpoints:/app/checkpoints
      - ./data/references:/app/references
      - ./data/cache:/root/.cache
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: ["gpu"]

These are the commands I run to bring it up:

# Create the directory structure
mkdir -p ./data/checkpoints ./data/references ./data/cache

# Download the OpenAudio S1 Mini model
hf download fishaudio/openaudio-s1-mini --local-dir ./data/checkpoints/openaudio-s1-mini

# Start the container
docker compose up

But I end up with this error:

$ docker compose up
Attaching to openaudio
openaudio  | [2025-10-28 11:32:58] Starting Fish Speech API Server...
openaudio  | [2025-10-28 11:32:58] Device args: none
openaudio  | [2025-10-28 11:32:58] Compile args: --compile
openaudio  | [2025-10-28 11:32:58] Server: 0.0.0.0:8080
openaudio  | /app/.venv/lib/python3.12/site-packages/audiotools/core/audio_signal.py:32: SyntaxWarning: invalid escape sequence '\_'
openaudio  |   """
openaudio  | /app/.venv/lib/python3.12/site-packages/audiotools/core/audio_signal.py:1012: SyntaxWarning: invalid escape sequence '\_'
openaudio  |   """Wrapper around scipy.signal.get_window so one can also get the
openaudio  | /app/.venv/lib/python3.12/site-packages/audiotools/core/audio_signal.py:1092: SyntaxWarning: invalid escape sequence '\_'
openaudio  |   """Compute how the STFT should be padded, based on match\_stride.
openaudio  | /app/.venv/lib/python3.12/site-packages/audiotools/core/audio_signal.py:1131: SyntaxWarning: invalid escape sequence '\_'
openaudio  |   """Computes the short-time Fourier transform of the audio data,
openaudio  | /app/.venv/lib/python3.12/site-packages/audiotools/core/audio_signal.py:1222: SyntaxWarning: invalid escape sequence '\_'
openaudio  |   """Computes inverse STFT and sets it to audio\_data.
openaudio  | INFO:     Started server process [15]
openaudio  | INFO:     Waiting for application startup.
openaudio  | 2025-10-28 11:33:05.319 | INFO     | fish_speech.models.text2semantic.llama:from_pretrained:432 - Loading model from checkpoints/openaudio-s1-mini, config: DualARModelArgs(model_type='dual_ar', vocab_size=155776, n_layer=28, n_head=16, dim=1024, intermediate_size=3072, n_local_heads=8, head_dim=128, rope_base=1000000, norm_eps=1e-06, max_seq_len=8192, dropout=0.0, tie_word_embeddings=False, attention_qkv_bias=False, attention_o_bias=False, attention_qk_norm=True, codebook_size=4096, num_codebooks=10, use_gradient_checkpointing=True, initializer_range=0.03125, is_reward_model=False, scale_codebook_embeddings=True, n_fast_layer=4, fast_dim=1024, fast_n_head=16, fast_n_local_heads=8, fast_head_dim=64, fast_intermediate_size=3072, fast_attention_qkv_bias=False, fast_attention_qk_norm=False, fast_attention_o_bias=False)
openaudio  | 2025-10-28 11:33:09.234 | INFO     | fish_speech.models.text2semantic.llama:from_pretrained:494 - Loaded weights with error: <All keys matched successfully>
openaudio  | /app/.venv/lib/python3.12/site-packages/torch/cuda/__init__.py:283: UserWarning:
openaudio  |     Found GPU0 NVIDIA GeForce RTX 5070 Ti which is of cuda capability 12.0.
openaudio  |     Minimum and Maximum cuda capability supported by this version of PyTorch is
openaudio  |     (5.0) - (9.0)
openaudio  |
openaudio  |   warnings.warn(
openaudio  | /app/.venv/lib/python3.12/site-packages/torch/cuda/__init__.py:304: UserWarning:
openaudio  |     Please install PyTorch with a following CUDA
openaudio  |     configurations:  12.8 12.9 following instructions at
openaudio  |     https://pytorch.org/get-started/locally/
openaudio  |
openaudio  |   warnings.warn(matched_cuda_warn.format(matched_arches))
openaudio  | /app/.venv/lib/python3.12/site-packages/torch/cuda/__init__.py:326: UserWarning:
openaudio  | NVIDIA GeForce RTX 5070 Ti with CUDA capability sm_120 is not compatible with the current PyTorch installation.
openaudio  | The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.
openaudio  | If you want to use the NVIDIA GeForce RTX 5070 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
openaudio  |
openaudio  |   warnings.warn(
openaudio  | 2025-10-28 11:33:09.513 | INFO     | fish_speech.models.text2semantic.inference:init_model:357 - Restored model from checkpoint
openaudio  | 2025-10-28 11:33:09.513 | INFO     | fish_speech.models.text2semantic.inference:init_model:362 - Using DualARTransformer
openaudio  | 2025-10-28 11:33:09.514 | INFO     | fish_speech.models.text2semantic.inference:init_model:375 - Compiling function...
openaudio  | Exception in thread Thread-1 (worker):
openaudio  | Traceback (most recent call last):
openaudio  |   File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
openaudio  |     self.run()
openaudio  |   File "/usr/lib/python3.12/threading.py", line 1010, in run
openaudio  |     self._target(*self._args, **self._kwargs)
openaudio  |   File "/app/fish_speech/models/text2semantic/inference.py", line 544, in worker
openaudio  |     model.setup_caches(
openaudio  |   File "/app/fish_speech/models/text2semantic/llama.py", line 618, in setup_caches
openaudio  |     super().setup_caches(max_batch_size, max_seq_len, dtype)
openaudio  |   File "/app/fish_speech/models/text2semantic/llama.py", line 248, in setup_caches
openaudio  |     b.attention.kv_cache = KVCache(
openaudio  |                            ^^^^^^^^
openaudio  |   File "/app/fish_speech/models/text2semantic/llama.py", line 149, in __init__
openaudio  |     self.register_buffer("k_cache", torch.zeros(cache_shape, dtype=dtype))
openaudio  |                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
openaudio  |   File "/app/.venv/lib/python3.12/site-packages/torch/utils/_device.py", line 103, in __torch_function__
openaudio  |     return func(*args, **kwargs)
openaudio  |            ^^^^^^^^^^^^^^^^^^^^^
openaudio  | torch.AcceleratorError: CUDA error: no kernel image is available for execution on the device
openaudio  | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
openaudio  | For debugging consider passing CUDA_LAUNCH_BLOCKING=1
openaudio  | Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
openaudio  |
Gracefully Stopping... press Ctrl+C again to force
 Container openaudio  Stopping
 Container openaudio  Stopped
openaudio exited with code 137

It looks like the Docker image ships an older PyTorch build that is no longer compatible with our GPU: its kernels only go up to sm_90, while the RTX 5070 Ti (Blackwell) reports compute capability sm_120.
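
You can confirm this from inside the image itself with a quick check (a minimal sketch; the /app/.venv/bin/python path is inferred from the tracebacks above and may differ):

# List the CUDA architectures the bundled PyTorch was compiled for,
# plus the compute capability of GPU 0 (sm_120 should be missing from the list).
docker run --rm --gpus all --entrypoint /app/.venv/bin/python \
  fishaudio/fish-speech:server-cuda \
  -c "import torch; print(torch.cuda.get_arch_list()); print(torch.cuda.get_device_capability(0))"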

For reference, here's our nvidia-smi output:

$ nvidia-smi
Tue Oct 28 12:46:04 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.169                Driver Version: 570.169        CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5070 Ti     Off |   00000000:01:00.0 Off |                  N/A |
|  0%   44C    P8             16W /  300W |      36MiB /  16303MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1333      G   /usr/lib/xorg/Xorg                       10MiB |
|    0   N/A  N/A            1483      G   /usr/bin/gnome-shell                      6MiB |
+-----------------------------------------------------------------------------------------+

Oh, it looks like I've duplicated this GitHub issue.

By the way, I forgot to mention that it works fine directly on the machine with uv, without Docker. So it's really just an issue with the PyTorch version inside the image.
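
For reference, the host-side setup was essentially the following (a sketch; the cu128 extra name comes from the project's pyproject.toml, and cu128/cu129 are the CUDA configurations the PyTorch warning above points to):

# Install with a CUDA wheel index whose builds include Blackwell (sm_120) kernels
git clone https://github.com/fishaudio/fish-speech.git
cd fish-speech
uv sync --extra cu128
# Verify the installed build actually covers the 5070 Ti
uv run python -c "import torch; print(torch.cuda.get_arch_list()); print(torch.cuda.is_available())"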

I have the same problem :(

UPD: I fixed this for CUDA 13 and the 5070 Ti. I didn't bother with Docker because I was able to run it on the host. Here's the git diff:

diff --git a/fish_speech/inference_engine/reference_loader.py b/fish_speech/inference_engine/reference_loader.py
index 8fa6817..63814c8 100644
--- a/fish_speech/inference_engine/reference_loader.py
+++ b/fish_speech/inference_engine/reference_loader.py
@@ -31,7 +31,12 @@ class ReferenceLoader:
         self.encode_reference: Callable
 
         # Define the torchaudio backend
-        backends = torchaudio.list_audio_backends()
+        try:
+            backends = torchaudio.list_audio_backends()
+        except AttributeError:
+            # Fallback for torchaudio >= 2.4
+            backends = ["soundfile"]
+
         if "ffmpeg" in backends:
             self.backend = "ffmpeg"
         else:
diff --git a/pyproject.toml b/pyproject.toml
index 00aef4b..6f4ef41 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -48,10 +48,6 @@ dependencies = [
 ]
 
 [project.optional-dependencies]
-stable = [
-    "torch<2.9.0",
-    "torchaudio",
-]
 cpu = [
   "torch>=2.5.1",
   "torchaudio",
@@ -68,6 +64,10 @@ cu129 = [
   "torch>=2.5.1",
   "torchaudio",
 ]
+cu130 = [
+  "torch>=2.5.1",
+  "torchaudio",
+]
 
 [tool.uv]
 conflicts = [
@@ -76,6 +76,7 @@ conflicts = [
     { extra = "cu126" },
     { extra = "cu128" },
     { extra = "cu129" },
+    { extra = "cu130" },
   ],
 ]
 
@@ -85,12 +86,14 @@ torch = [
   { index = "pytorch-cu126", extra = "cu126" },
   { index = "pytorch-cu128", extra = "cu128" },
   { index = "pytorch-cu129", extra = "cu129" },
+  { index = "pytorch-cu130", extra = "cu130" },
 ]
 torchaudio = [
   { index = "pytorch-cpu", extra = "cpu" },
   { index = "pytorch-cu126", extra = "cu126" },
   { index = "pytorch-cu128", extra = "cu128" },
   { index = "pytorch-cu129", extra = "cu129" },
+  { index = "pytorch-cu130", extra = "cu130" },
 ]
 
 [[tool.uv.index]]
@@ -113,6 +116,11 @@ name = "pytorch-cu129"
 url = "https://download.pytorch.org/whl/cu129"
 explicit = true
 
+[[tool.uv.index]]
+name = "pytorch-cu130"
+url = "https://download.pytorch.org/whl/cu130"
+explicit = true
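
With that diff applied, installing against the new index is just a matter of selecting the new extra (a sketch; assumes uv can resolve cu130 torch/torchaudio wheels for your platform):

# Re-resolve torch/torchaudio from the pytorch-cu130 index added above
uv sync --extra cu130
# Sanity check: sm_120 should now appear in the supported arch list
uv run python -c "import torch; print(torch.cuda.get_arch_list())"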

You can also check the GitHub issue for the solutions: https://github.com/fishaudio/fish-speech/issues/1126
