Spaces: thecollabagepatch/magenta-retry

thecollabagepatch committed on Sep 18

Commit 5b3cd33, 1 Parent(s): 8381f2e

declauding the documentation some more

Files changed (1)

documentation.html +72 -69

@@ -100,28 +100,28 @@
 <body>
   <div class="header">
     <h1>MagentaRT Research API</h1>
-    <p class="muted"><strong>AI Music Generation API</strong> • Real-time streaming • Custom fine-tune support</p>
-    <span class="badge">Research Project</span>
   </div>
   <div class="section">
     <h2>what this is</h2>
-    <p>This API serves Google's <a href="https://huggingface.co/google/magenta-realtime" target="_blank">MagentaRT</a> in two distinct ways. First, as a backend for our iOS app (the untitled jamming app) where users create initial loops with Stability AI's <a href="https://huggingface.co/stabilityai/stable-audio-open-small" target="_blank">stable-audio-open-small</a> and then MagentaRT jams on top of that audio context. Second, as a standalone web interface that connects directly to MagentaRT via WebSockets without any audio context.</p>
-    <p>Both modes support switching between base models and custom fine-tunes hosted on Hugging Face. This is designed as a template space for duplication, letting you experiment with real-time music generation outside of Google Colab.</p>
-    <p>This is meant to be duplicated to your own GPU-enabled space since the iOS app is still in active development and doesn't have funding to support multiple concurrent users yet.</p>
     <div class="info">
-      <strong>Hardware Requirements:</strong> Optimal performance requires an L40S GPU (48GB VRAM) for real-time streaming. L4 24GB almost works but will not achieve real-time performance (if someone knows an optimization that will solve this, please let me know).
     </div>
   </div>
   <section id="env-vars" style="margin-top: 24px;">
   <h3>environment variables (optional, but helpful)</h3>
   <p>
-    You can boot this Space directly into your own finetune by setting the variables below in
-    <em>Settings → Variables and secrets → Variables</em>. If you don't set them, you can still
     select models at runtime using <code>/model/select</code> from the frontend/API.
   </p>
@@ -132,48 +132,48 @@
       <li><code>MRT_CKPT_STEP</code> → <code>1863001</code></li>
       <li><code>MRT_SIZE</code> → <code>large</code></li>
     </ul>
-    <p style="margin:8px 0 0 0;"><small>Those values correspond to the example finetune in this repo (checkpoint_1863001.tgz on top of the <em>large</em> base).</small></p>
   </div>
   <table class="var-table" style="width:100%;border-collapse:collapse;margin:12px 0;">
     <thead>
       <tr>
-        <th style="text-align:left;border-bottom:1px solid #ddd;padding:8px;">Name</th>
-        <th style="text-align:left;border-bottom:1px solid #ddd;padding:8px;">What it does</th>
-        <th style="text-align:left;border-bottom:1px solid #ddd;padding:8px;">Example</th>
-        <th style="text-align:left;border-bottom:1px solid #ddd;padding:8px;">When to set</th>
       </tr>
     </thead>
     <tbody>
       <tr>
         <td style="padding:8px;border-bottom:1px solid #eee;"><code>MRT_CKPT_REPO</code></td>
-        <td style="padding:8px;border-bottom:1px solid #eee;">Hugging Face repo ID that hosts your finetune checkpoints/assets.</td>
         <td style="padding:8px;border-bottom:1px solid #eee;"><code>thepatch/magenta-ft</code></td>
-        <td style="padding:8px;border-bottom:1px solid #eee;">Set to make this finetune the default on boot.</td>
       </tr>
       <tr>
         <td style="padding:8px;border-bottom:1px solid #eee;"><code>MRT_CKPT_STEP</code></td>
-        <td style="padding:8px;border-bottom:1px solid #eee;">Checkpoint step number to load on boot.</td>
         <td style="padding:8px;border-bottom:1px solid #eee;"><code>1863001</code></td>
-        <td style="padding:8px;border-bottom:1px solid #eee;">Set if you want a specific checkpoint preselected.</td>
       </tr>
       <tr>
         <td style="padding:8px;border-bottom:1px solid #eee;"><code>MRT_SIZE</code></td>
-        <td style="padding:8px;border-bottom:1px solid #eee;">Base model family used by the finetune (e.g., <em>large</em>).</td>
         <td style="padding:8px;border-bottom:1px solid #eee;"><code>large</code></td>
-        <td style="padding:8px;border-bottom:1px solid #eee;">Set to match the base you finetuned from.</td>
       </tr>
       <tr>
         <td style="padding:8px;border-bottom:1px solid #eee;"><code>SPACE_MODE</code></td>
-        <td style="padding:8px;border-bottom:1px solid #eee;">Controls readiness behavior: <code>serve</code> (GPU, ready to generate) vs <code>template</code> (CPU template for duplication). If unset, the server auto-detects.</td>
         <td style="padding:8px;border-bottom:1px solid #eee;"><code>serve</code> or <code>template</code></td>
-        <td style="padding:8px;border-bottom:1px solid #eee;">Set for explicit behavior; otherwise it falls back to auto-detection.</td>
       </tr>
     </tbody>
   </table>
   <details style="margin-top:12px;">
-    <summary><strong>Alternative: select a model at runtime via API</strong></summary>
     <pre style="background:#111;color:#eee;padding:12px;border-radius:8px;overflow:auto;margin-top:8px;"><code style="background: transparent; color: inherit; padding: 0; border: 0; box-shadow: none; display: block;">curl -X POST https://&lt;your-space&gt;.hf.space/model/select \
   -H 'Content-Type: application/json' \
   -d '{
@@ -182,7 +182,7 @@
     "size": "large",
     "prewarm": true
   }'</code></pre>
-    <p style="margin:8px 0 0 0;"><small>When you call <code>prewarm:true</code>, the backend performs a bar-aligned warmup before returning, so the first jam starts hot.</small></p>
   </details>
 </section>
@@ -190,7 +190,7 @@
   <a class="btn" href="/tester" target="_blank" style="
      display:inline-block; padding:10px 14px; border-radius:8px;
      background:#111; color:#eee; text-decoration:none; border:1px solid #444;">
-    Open Realtime Web Tester
   </a>
 </p>
@@ -205,7 +205,7 @@
   <div class="section">
     <h2>overview</h2>
-    <p>This API powers AI music generation using Google's MagentaRT, designed for real-time audio streaming using finetunes hosted on HF. Built for iOS app integration with WebSocket streaming support.</p>
   </div>
   <div class="section">
@@ -250,66 +250,66 @@
     <h2>API endpoints</h2>
     <div class="endpoint">
-      <strong>POST /generate</strong> - Generate 4–8 bars of music with input audio
     </div>
     <div class="endpoint">
-      <strong>POST /generate_style</strong> - Generate music from style prompts only (experimental)
     </div>
     <div class="endpoint">
-      <strong>POST /jam/start</strong> - Start continuous jamming session
     </div>
     <div class="endpoint">
-      <strong>GET /jam/next</strong> - Get next audio chunk from session
     </div>
     <div class="endpoint">
-      <strong>POST /jam/consume</strong> - Mark chunk as consumed
     </div>
     <div class="endpoint">
-      <strong>POST /jam/stop</strong> - End jamming session
     </div>
     <div class="endpoint">
-      <strong>WEBSOCKET /ws/jam</strong> - Real-time streaming interface
     </div>
     <div class="endpoint">
-      <strong>POST /model/select</strong> - Switch between base and fine-tuned models
     </div>
   </div>
   <div class="section">
     <h2>custom fine-tuning</h2>
-    <p>Train your own MagentaRT models and use them with this API and the iOS app.</p>
     <div class="grid">
       <div class="card">
         <h3>1. train your model</h3>
-        <p>Use the official MagentaRT fine-tuning notebook:</p>
-        <p><a href="https://github.com/magenta-realtime/notebooks/blob/main/Magenta_RT_Finetune.ipynb" target="_blank">MagentaRT Fine-tuning Colab</a></p>
         <p>This will create checkpoint folders like:</p>
         <ul>
           <li><code>checkpoint_1861001/</code></li>
           <li><code>checkpoint_1862001/</code></li>
-          <li>And steering assets: <code>cluster_centroids.npy</code>, <code>mean_style_embed.npy</code></li>
         </ul>
       </div>
       <div class="card">
         <h3>2. package checkpoints</h3>
-        <p>Checkpoints must be compressed as .tgz files to preserve .zarray files correctly.</p>
         <div class="warning">
-          <strong>Important:</strong> Do not download checkpoint folders directly from Google Drive - the .zarray files won't transfer properly.
         </div>
       </div>
     </div>
     <h3>checkpoint packaging script</h3>
-    <p>Use this in a Colab cell to properly package your checkpoints:</p>
     <pre><button class="copy-btn" onclick="copyCode(this)">Copy</button># Mount Drive to access your trained checkpoints
 from google.colab import drive
 drive.mount('/content/drive')
@@ -335,7 +335,7 @@ from google.colab import files
 files.download('/content/checkpoint_1862001.tgz')</pre>
     <h3>3. upload to hugging face</h3>
-    <p>Create a model repository and upload:</p>
     <ul>
       <li>Your <code>.tgz</code> checkpoint files</li>
       <li><code>cluster_centroids.npy</code> (for steering)</li>
@@ -344,70 +344,73 @@ files.download('/content/checkpoint_1862001.tgz')</pre>
     <div class="info">
       <strong>Example Repository:</strong> <a href="https://huggingface.co/thepatch/magenta-ft" target="_blank">thepatch/magenta-ft</a><br>
-      Shows the correct file structure with .tgz files and .npy steering assets in the root directory.
     </div>
     <h3>4. use in the app</h3>
-    <p>In the iOS app's model selector, point to your Hugging Face repository URL. The app will automatically discover available checkpoints and allow switching between them.</p>
   </div>
   <div class="section">
     <h2>technical specifications</h2>
     <ul>
-      <li><strong>Audio Format:</strong> 48 kHz stereo, ~2.0s chunks with ~40ms crossfade</li>
-      <li><strong>Model Sizes:</strong> Base and Large variants available (we didn't notice any speedup in generation time using 'base' rather than 'large')</li>
-      <li><strong>Steering:</strong> Support for text prompts, audio embeddings, and centroid-based fine-tune steering</li>
-      <li><strong>Real-time Performance:</strong> L40S recommended; L4 may experience slight delays</li>
-      <li><strong>Memory Requirements:</strong> ~40GB VRAM for sustained real-time streaming</li>
     </ul>
-    <div class="warning">
-      <strong>Note:</strong> The <code>/generate_style</code> endpoint is experimental and may not properly adhere to BPM without additional context (considering metronome-based context instead of silence).
-    </div>
-  </div>
   <div class="section">
-    <h2>integration with iOS app</h2>
-    <p>This API is designed to work seamlessly with our iOS music generation app:</p>
     <ul>
-      <li>Real-time audio streaming via WebSockets</li>
-      <li>Dynamic model switching between base and fine-tuned models</li>
-      <li>Integration with stable-audio-open-small for combined input audio generation</li>
-      <li>Live parameter adjustment during generation</li>
     </ul>
   </div>
   <div class="section">
     <h2>deployment</h2>
     <p>To run your own instance:</p>
     <ol>
-      <li>Duplicate this Hugging Face Space</li>
-      <li>Ensure you have access to an L40S GPU</li>
-      <li>Point your iOS app to the new space URL (e.g., <code>https://your-username-magenta-retry.hf.space</code>)</li>
-      <li>Upload your fine-tuned models as described above</li>
     </ol>
   </div>
   <div class="section">
     <h2>support & contact</h2>
-    <p>This is an active research project. For questions, technical support, or collaboration:</p>
-    <p><strong>Email:</strong> <a href="mailto:kev@thecollabagepatch.com">kev@thecollabagepatch.com</a></p>
     <div class="info">
-      <strong>Research Status:</strong> This project is under active development. Features and API may change. We welcome feedback and contributions from the research community.
     </div>
   </div>
   <div class="section">
     <h2>licensing</h2>
-    <p>Built on Google's MagentaRT (Apache 2.0 + CC-BY 4.0). Users are responsible for their generated outputs and ensuring compliance with applicable laws and platform policies.</p>
-    <p><a href="/docs">API Reference Documentation</a></p>
   </div>
-  <div class="section">
     <h2>contributors</h2>
     <p>Kevin Griffing and Andrew Luck</p>
-  </div>
   <script>
     function copyCode(button) {

 <body>
   <div class="header">
     <h1>MagentaRT Research API</h1>
+    <p class="muted"><strong>AI Music Generation API</strong> • real-time streaming with http/ws • custom fine-tune model-switching support</p>
+    <span class="badge">research project</span>
   </div>
   <div class="section">
     <h2>what this is</h2>
+    <p>this API serves google's <a href="https://huggingface.co/google/magenta-realtime" target="_blank">magentaRT</a> in two distinct ways. first, as a backend for our iOS app (the untitled jamming app) where users create initial loops with stability ai's <a href="https://huggingface.co/stabilityai/stable-audio-open-small" target="_blank">stable-audio-open-small</a> and then MagentaRT jams on top of that audio context. Second, as a standalone web interface that connects directly to MagentaRT via WebSockets without any audio context.</p>
+    <p>both modes support switching between base models and custom fine-tunes hosted on Hugging Face. this is designed as a template space for duplication, letting you experiment with real-time music generation outside of google colab.</p>
+    <p>this is meant to be duplicated to your own GPU-enabled space since the iOS app is still in active development and doesn't have funding to support multiple concurrent users yet.</p>
     <div class="info">
+      <strong>hardware requirements:</strong> optimal performance requires an L40S GPU (48GB VRAM) for real-time streaming. L4 24GB almost works but will not achieve real-time performance (if someone knows an optimization that will solve this, please let me know).
     </div>
   </div>
   <section id="env-vars" style="margin-top: 24px;">
   <h3>environment variables (optional, but helpful)</h3>
   <p>
+    you can boot this Space directly into your own finetune by setting the variables below in
+    <em>Settings → Variables and secrets → Variables</em>. if you don't set them, you can still
     select models at runtime using <code>/model/select</code> from the frontend/API.
   </p>
       <li><code>MRT_CKPT_STEP</code> → <code>1863001</code></li>
       <li><code>MRT_SIZE</code> → <code>large</code></li>
     </ul>
+    <p style="margin:8px 0 0 0;"><small>those values correspond to the example finetune in this repo (checkpoint_1863001.tgz on top of the <em>large</em> base).</small></p>
   </div>
   <table class="var-table" style="width:100%;border-collapse:collapse;margin:12px 0;">
     <thead>
       <tr>
+        <th style="text-align:left;border-bottom:1px solid #ddd;padding:8px;">name</th>
+        <th style="text-align:left;border-bottom:1px solid #ddd;padding:8px;">what it does</th>
+        <th style="text-align:left;border-bottom:1px solid #ddd;padding:8px;">example</th>
+        <th style="text-align:left;border-bottom:1px solid #ddd;padding:8px;">when to set</th>
       </tr>
     </thead>
     <tbody>
       <tr>
         <td style="padding:8px;border-bottom:1px solid #eee;"><code>MRT_CKPT_REPO</code></td>
+        <td style="padding:8px;border-bottom:1px solid #eee;">huggingface repo ID that hosts your finetune checkpoints/assets.</td>
         <td style="padding:8px;border-bottom:1px solid #eee;"><code>thepatch/magenta-ft</code></td>
+        <td style="padding:8px;border-bottom:1px solid #eee;">set to make this finetune the default on boot.</td>
       </tr>
       <tr>
         <td style="padding:8px;border-bottom:1px solid #eee;"><code>MRT_CKPT_STEP</code></td>
+        <td style="padding:8px;border-bottom:1px solid #eee;">checkpoint step number to load on boot.</td>
         <td style="padding:8px;border-bottom:1px solid #eee;"><code>1863001</code></td>
+        <td style="padding:8px;border-bottom:1px solid #eee;">set if you want a specific checkpoint preselected.</td>
       </tr>
       <tr>
         <td style="padding:8px;border-bottom:1px solid #eee;"><code>MRT_SIZE</code></td>
+        <td style="padding:8px;border-bottom:1px solid #eee;">base model family used by the finetune (e.g., <em>large</em>).</td>
         <td style="padding:8px;border-bottom:1px solid #eee;"><code>large</code></td>
+        <td style="padding:8px;border-bottom:1px solid #eee;">set to match the base you finetuned from.</td>
       </tr>
       <tr>
         <td style="padding:8px;border-bottom:1px solid #eee;"><code>SPACE_MODE</code></td>
+        <td style="padding:8px;border-bottom:1px solid #eee;">controls readiness behavior: <code>serve</code> (GPU, ready to generate) vs <code>template</code> (CPU template for duplication). If unset, the server auto-detects.</td>
         <td style="padding:8px;border-bottom:1px solid #eee;"><code>serve</code> or <code>template</code></td>
+        <td style="padding:8px;border-bottom:1px solid #eee;">set for explicit behavior; otherwise it falls back to auto-detection.</td>
       </tr>
     </tbody>
   </table>
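
a minimal sketch of how the four variables in the table could be read at boot. this is not the Space's actual startup code: the function name `read_boot_config`, the returned keys, and parsing the step as an int are all assumptions made for illustration. only the variable names and the `serve` / `template` values come from the table above.

```python
import os


def read_boot_config(env=None):
    """Collect the documented boot variables; unset ones come back as None.

    Hypothetical helper: the real server may parse these differently.
    """
    env = os.environ if env is None else env
    step = env.get("MRT_CKPT_STEP")
    mode = env.get("SPACE_MODE")
    # Only the two documented modes are accepted here (assumption: strict check).
    if mode is not None and mode not in ("serve", "template"):
        raise ValueError(f"SPACE_MODE must be 'serve' or 'template', got {mode!r}")
    return {
        "ckpt_repo": env.get("MRT_CKPT_REPO"),
        "ckpt_step": int(step) if step else None,
        "size": env.get("MRT_SIZE"),
        "space_mode": mode,  # None means: leave it to the server's auto-detection
    }


# The example values listed in the docs, passed as a plain dict instead of os.environ.
example = read_boot_config({
    "MRT_CKPT_REPO": "thepatch/magenta-ft",
    "MRT_CKPT_STEP": "1863001",
    "MRT_SIZE": "large",
})
print(example)
```

passing a plain dict keeps the sketch testable without touching the real environment; calling `read_boot_config()` with no argument reads `os.environ` instead.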
   <details style="margin-top:12px;">
+    <summary><strong>alternative: select a model at runtime via API</strong></summary>
     <pre style="background:#111;color:#eee;padding:12px;border-radius:8px;overflow:auto;margin-top:8px;"><code style="background: transparent; color: inherit; padding: 0; border: 0; box-shadow: none; display: block;">curl -X POST https://&lt;your-space&gt;.hf.space/model/select \
   -H 'Content-Type: application/json' \
   -d '{
     "size": "large",
     "prewarm": true
   }'</code></pre>
+    <p style="margin:8px 0 0 0;"><small>when you pass <code>prewarm:true</code>, the backend performs a warmup before returning, so the first jam starts hot.</small></p>
   </details>
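
the same request can be built from Python with the standard library. this is a sketch, not a client shipped with the Space: `build_model_select_request` and its `extra` parameter are hypothetical names. only `size` and `prewarm` are visible in the curl example here (the diff hides the other body fields), so any additional fields have to be supplied by you through `extra`.

```python
import json
import urllib.request


def build_model_select_request(space_url, size="large", prewarm=True, extra=None):
    """Build (but do not send) a POST to /model/select.

    `extra` is a hypothetical hook for body fields not shown in the docs excerpt.
    """
    payload = dict(extra or {})
    payload.update({"size": size, "prewarm": prewarm})
    return urllib.request.Request(
        url=space_url.rstrip("/") + "/model/select",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder host, mirroring <your-space> in the curl example.
req = build_model_select_request("https://your-space.hf.space")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually send it (needs a running Space):
# with urllib.request.urlopen(req, timeout=120) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```

building the request separately from sending it makes it easy to inspect the body first. with `prewarm` set to true the call only returns after the warmup, so a generous timeout is probably a good idea.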
 </section>
   <a class="btn" href="/tester" target="_blank" style="
      display:inline-block; padding:10px 14px; border-radius:8px;
      background:#111; color:#eee; text-decoration:none; border:1px solid #444;">
+    open realtime web tester
   </a>
 </p>
   <div class="section">
     <h2>overview</h2>
+    <p>this API revolves around google's magentaRT, designed for real-time audio streaming using finetunes hosted on HF. built for iOS app integration with webSocket streaming support for web applications (and potentially VSTs inside the DAW).</p>
   </div>
   <div class="section">
     <h2>API endpoints</h2>
     <div class="endpoint">
+      <strong>POST /generate</strong> - generate 4–8 bars of music with input audio
     </div>
     <div class="endpoint">
+      <strong>POST /generate_style</strong> - generate music from style prompts only (experimental)
     </div>
     <div class="endpoint">
+      <strong>POST /jam/start</strong> - start continuous jamming session
     </div>
     <div class="endpoint">
+      <strong>GET /jam/next</strong> - get next audio chunk from session
     </div>
     <div class="endpoint">
+      <strong>POST /jam/consume</strong> - mark chunk as consumed
     </div>
     <div class="endpoint">
+      <strong>POST /jam/stop</strong> - end jamming session
     </div>
     <div class="endpoint">
+      <strong>WEBSOCKET /ws/jam</strong> - real-time streaming interface
     </div>
     <div class="endpoint">
+      <strong>POST /model/select</strong> - switch between base and fine-tuned models
     </div>
   </div>
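
a rough sketch of the order an HTTP client might call the jam endpoints in, based only on the one-line descriptions above. the start, then next/consume per chunk, then stop ordering is an assumption, and so is the helper name `jam_call_sequence`. request bodies and query parameters are left out on purpose because they are not documented in this section; see `/docs` on a running Space for the real schemas.

```python
def jam_call_sequence(base_url, n_chunks):
    """Return the (method, url) pairs for one HTTP jam session.

    Assumed flow: start once, then fetch and acknowledge each chunk, then stop.
    """
    base = base_url.rstrip("/")
    calls = [("POST", base + "/jam/start")]
    for _ in range(n_chunks):
        calls.append(("GET", base + "/jam/next"))       # get next audio chunk
        calls.append(("POST", base + "/jam/consume"))   # mark chunk as consumed
    calls.append(("POST", base + "/jam/stop"))
    return calls


# URL pattern taken from the deployment example in these docs.
calls = jam_call_sequence("https://your-username-magenta-retry.hf.space", n_chunks=2)
for method, url in calls:
    print(method, url)
```

the WebSocket route (`/ws/jam`) replaces this polling loop with a single connection, which is why it is listed separately.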
   <div class="section">
     <h2>custom fine-tuning</h2>
+    <p>train your own MagentaRT models and use them in the web app demo and the iOS app.</p>
     <div class="grid">
       <div class="card">
         <h3>1. train your model</h3>
+        <p>use the official MagentaRT fine-tuning notebook:</p>
+        <p><a href="https://colab.research.google.com/github/magenta/magenta-realtime/blob/main/notebooks/Magenta_RT_Finetune.ipynb" target="_blank">MagentaRT Fine-tuning Colab</a></p>
         <p>This will create checkpoint folders like:</p>
         <ul>
           <li><code>checkpoint_1861001/</code></li>
           <li><code>checkpoint_1862001/</code></li>
+          <li>and steering assets: <code>cluster_centroids.npy</code>, <code>mean_style_embed.npy</code></li>
         </ul>
       </div>
       <div class="card">
         <h3>2. package checkpoints</h3>
+        <p>checkpoints must be compressed as .tgz files to preserve .zarray files correctly.</p>
         <div class="warning">
+          <strong>important:</strong> do not download checkpoint folders directly from Google Drive - the .zarray files won't transfer properly.
         </div>
       </div>
     </div>
     <h3>checkpoint packaging script</h3>
+    <p>use this in a Colab cell to properly package your checkpoints:</p>
     <pre><button class="copy-btn" onclick="copyCode(this)">Copy</button># Mount Drive to access your trained checkpoints
 from google.colab import drive
 drive.mount('/content/drive')
 files.download('/content/checkpoint_1862001.tgz')</pre>
     <h3>3. upload to hugging face</h3>
+    <p>create a model repository and upload:</p>
     <ul>
       <li>Your <code>.tgz</code> checkpoint files</li>
       <li><code>cluster_centroids.npy</code> (for steering)</li>
     <div class="info">
       <strong>Example Repository:</strong> <a href="https://huggingface.co/thepatch/magenta-ft" target="_blank">thepatch/magenta-ft</a><br>
+      shows the correct file structure with .tgz files and .npy steering assets in the root directory.
     </div>
     <h3>4. use in the app</h3>
+    <p>in the iOS app's model selector, point to your hf repository URL. the app will automatically discover available checkpoints and allow switching between them.</p>
   </div>
   <div class="section">
     <h2>technical specifications</h2>
     <ul>
+      <li><strong>audio format:</strong> 48 kHz stereo, ~2.0s chunks with ~40ms crossfade. the 4- and 8-bar chunks vary in length because bar duration depends on the input bpm</li>
+      <li><strong>model sizes:</strong> 'base' and 'large' variants available (we didn't notice any speedup in generation time using 'base' rather than 'large')</li>
+      <li><strong>steering:</strong> support for text prompts, audio embeddings, and centroid-based fine-tune steering</li>
+      <li><strong>real-time performance:</strong> L40S recommended; L4 will experience slight delays</li>
+      <li><strong>Memory Requirements:</strong> 30+GB VRAM for sustained real-time streaming</li>
     </ul>
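
to make the audio format numbers concrete, here is a small worked example of the arithmetic. it assumes 4 beats per bar (4/4 time), which is not stated in these docs, and the helper names are made up for illustration. the 48 kHz rate, the ~2.0s chunk and the ~40ms crossfade are the figures from the list above.

```python
SAMPLE_RATE = 48_000  # Hz, from the spec list
CHANNELS = 2          # stereo


def bars_to_seconds(bars, bpm, beats_per_bar=4):
    """Duration of `bars` bars at `bpm`; beats_per_bar=4 is an assumption."""
    return bars * beats_per_bar * 60.0 / bpm


def seconds_to_frames(seconds, sample_rate=SAMPLE_RATE):
    """Frames per channel for a given duration (one frame = one sample per channel)."""
    return round(seconds * sample_rate)


# 4- and 8-bar chunk lengths at a few tempos.
for bpm in (90, 120, 140):
    for bars in (4, 8):
        secs = bars_to_seconds(bars, bpm)
        print(f"{bars} bars @ {bpm} bpm: {secs:.3f}s, {seconds_to_frames(secs)} frames")

# Streaming chunk and crossfade sizes in frames.
chunk_frames = seconds_to_frames(2.0)
crossfade_frames = seconds_to_frames(0.040)
print(chunk_frames, crossfade_frames, chunk_frames * CHANNELS)
```

under that 4/4 assumption, 4 bars at 120 bpm come out to 8 seconds (384000 frames per channel), while a ~2.0s streaming chunk is 96000 frames with a ~1920-frame crossfade.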
   <div class="section">
+    <h2>a little more about the ios app</h2>
+    <p>uses http requests</p>
     <ul>
+      <li>the reseed endpoints are still under development... the idea is to re-inject the initial context with token splicing</li>
+      <li>single-shot generation endpoints (one_shot_generation.py)</li>
+      <li>the stable-audio-open-small backend is hosted by me. it generates with just 2gb GPU RAM</li>
+      <li>gradual style embed changes to try and avoid abrupt genre switches</li>
     </ul>
   </div>
+      <div class="warning">
+      <strong>Note:</strong> The <code>/generate_style</code> endpoint is experimental and may not properly adhere to BPM without additional context (considering metronome-based context instead of silence).
+    </div>
+  </div>
   <div class="section">
     <h2>deployment</h2>
     <p>To run your own instance:</p>
     <ol>
+      <li>duplicate this huggingface space by clicking the three dots in the top right</li>
+      <li>select 'run locally' if you've got a 5090 or something</li>
+      <li>ensure you have access to an L40S GPU by enabling billing</li>
+      <li>point your iOS app to the new space URL (e.g., <code>https://your-username-magenta-retry.hf.space</code>)</li>
+      <li>upload your fine-tuned models to hf as described above</li>
     </ol>
   </div>
   <div class="section">
     <h2>support & contact</h2>
+    <p>this is an active research project. for questions, technical support, or collaboration:</p>
+    <p><strong>email:</strong> <a href="mailto:kev@thecollabagepatch.com">kev@thecollabagepatch.com</a></p>
     <div class="info">
+      <strong>research status:</strong> this project is under very active development. features and API may change. we welcome feedback and contributions from the research community. i'm just a vibe coder.
     </div>
   </div>
   <div class="section">
     <h2>licensing</h2>
+    <p>built on google's magentaRT (Apache 2.0 + CC-BY 4.0). users are responsible for their generated outputs and ensuring compliance with applicable laws and platform policies.</p>
+    <p><a href="/docs">auto-generated API docs (for all the http requests)</a></p>
   </div>
+  <!-- <div class="section">
     <h2>contributors</h2>
     <p>Kevin Griffing and Andrew Luck</p>
+  </div> -->
   <script>
     function copyCode(button) {