```
Metadata-Version: 2.1
Name: audiocraft
Version: 1.4.0a1
Summary: Audio generation research library for PyTorch
Home-page: https://github.com/facebookresearch/audiocraft
Author: FAIR Speech & Audio
Author-email: defossez@meta.com, jadecopet@meta.com
License: MIT License
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Multimedia :: Sound/Audio
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8.0
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: LICENSE_weights
Requires-Dist: av==11.0.0
Requires-Dist: einops
Requires-Dist: flashy>=0.0.1
Requires-Dist: hydra-core>=1.1
Requires-Dist: hydra_colorlog
Requires-Dist: julius
Requires-Dist: num2words
Requires-Dist: numpy<2.0.0
Requires-Dist: sentencepiece
Requires-Dist: spacy>=3.6.1
Requires-Dist: torch==2.1.0
Requires-Dist: torchaudio<2.1.2,>=2.0.0
Requires-Dist: huggingface_hub
Requires-Dist: tqdm
Requires-Dist: transformers>=4.31.0
Requires-Dist: xformers<0.0.23
Requires-Dist: demucs
Requires-Dist: librosa
Requires-Dist: soundfile
Requires-Dist: gradio
Requires-Dist: torchmetrics
Requires-Dist: encodec
Requires-Dist: protobuf
Requires-Dist: torchvision==0.16.0
Requires-Dist: torchtext==0.16.0
Requires-Dist: pesq
Requires-Dist: pystoi
Provides-Extra: dev
Requires-Dist: coverage; extra == "dev"
Requires-Dist: flake8; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Requires-Dist: pdoc3; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Provides-Extra: wm
Requires-Dist: audioseal; extra == "wm"
```
# AudioCraft
AudioCraft is a PyTorch library for deep learning research on audio generation. AudioCraft contains inference and training code
for two state-of-the-art AI generative models producing high-quality audio: AudioGen and MusicGen.
## Installation
AudioCraft requires Python 3.9 and PyTorch 2.1.0. To install AudioCraft, you can run the following:
```shell
# Best to make sure you have torch installed first, in particular before installing xformers.
# Don't run this if you already have PyTorch installed.
python -m pip install 'torch==2.1.0'
# You might need the following before trying to install the packages
python -m pip install setuptools wheel
# Then proceed to one of the following
python -m pip install -U audiocraft  # stable release
python -m pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft  # bleeding edge
python -m pip install -e .  # or if you cloned the repo locally (mandatory if you want to train)
python -m pip install -e '.[wm]'  # if you want to train a watermarking model
```
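After installing, a quick import check can confirm the environment is consistent (a minimal sketch; an import error at this point usually indicates a torch/xformers version mismatch):
```python
# Minimal post-install sanity check (illustrative).
import torch
import audiocraft

print(torch.__version__)       # expected: 2.1.0 for this release
print(audiocraft.__version__)  # e.g. 1.4.0a1
```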
We also recommend having `ffmpeg` installed, either through your system or Anaconda:
```bash
sudo apt-get install ffmpeg
# Or if you are using Anaconda or Miniconda
conda install "ffmpeg<5" -c conda-forge
```
## Models
At the moment, AudioCraft contains the training and inference code for:
* [MusicGen](./docs/MUSICGEN.md): A state-of-the-art controllable text-to-music model (see the inference sketch after this list).
* [AudioGen](./docs/AUDIOGEN.md): A state-of-the-art text-to-sound model.
* [EnCodec](./docs/ENCODEC.md): A state-of-the-art high-fidelity neural audio codec.
* [Multi Band Diffusion](./docs/MBD.md): An EnCodec-compatible decoder using diffusion.
* [MAGNeT](./docs/MAGNET.md): A state-of-the-art non-autoregressive model for text-to-music and text-to-sound.
* [AudioSeal](./docs/WATERMARKING.md): A state-of-the-art audio watermarking model.
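To give a feel for the inference API, here is a minimal MusicGen sketch; the model name `facebook/musicgen-small`, prompt, and generation parameters are illustrative, so see [./docs/MUSICGEN.md](./docs/MUSICGEN.md) for authoritative usage:
```python
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pretrained model from the Hugging Face Hub (downloads weights on first use).
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=8)  # generate 8 seconds of audio

# One waveform is generated per text description.
wav = model.generate(['upbeat acoustic folk with hand claps'])
audio_write('sample', wav[0].cpu(), model.sample_rate, strategy='loudness')
```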
## Training code
AudioCraft contains PyTorch components for deep learning research in audio, along with the training pipelines for the developed models.
For a general introduction to AudioCraft's design principles and instructions for developing your own training pipeline, refer to
the [AudioCraft training documentation](./docs/TRAINING.md).
To reproduce existing work and use the developed training pipelines, refer to the instructions for each specific model,
which provide pointers to configuration, example grids, and model/task-specific information and FAQ.
## API documentation
We provide some [API documentation](https://facebookresearch.github.io/audiocraft/api_docs/audiocraft/index.html) for AudioCraft.
## FAQ
#### Is the training code available?
Yes! We provide the training code for [EnCodec](./docs/ENCODEC.md), [MusicGen](./docs/MUSICGEN.md) and [Multi Band Diffusion](./docs/MBD.md).
#### Where are the models stored?
Hugging Face stores the models in a specific cache location, which can be overridden for AudioCraft models by setting the `AUDIOCRAFT_CACHE_DIR` environment variable.
To change the cache location of the other Hugging Face models, please check out the [Hugging Face Transformers documentation for the cache setup](https://huggingface.co/docs/transformers/installation#cache-setup).
Finally, if you use a model that relies on Demucs (e.g. `musicgen-melody`) and want to change the download location for Demucs, refer to the [Torch Hub documentation](https://pytorch.org/docs/stable/hub.html#where-are-my-downloaded-models-saved).
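For instance, the override can be set from Python before any model is loaded (a minimal sketch; the cache path is hypothetical):
```python
import os

# Must be set before the first model download; the path is illustrative.
os.environ['AUDIOCRAFT_CACHE_DIR'] = '/data/audiocraft_cache'

from audiocraft.models import MusicGen
model = MusicGen.get_pretrained('facebook/musicgen-small')  # weights cached under the directory above
```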
## License
* The code in this repository is released under the MIT license as found in the [LICENSE file](LICENSE).
* The model weights in this repository are released under the CC-BY-NC 4.0 license as found in the [LICENSE_weights file](LICENSE_weights).
## Citation
For the general framework of AudioCraft, please cite the following.
```
@inproceedings{copet2023simple,
    title={Simple and Controllable Music Generation},
    author={Jade Copet and Felix Kreuk and Itai Gat and Tal Remez and David Kant and Gabriel Synnaeve and Yossi Adi and Alexandre Défossez},
    booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
    year={2023},
}
```
When referring to a specific model, please cite as indicated in the model-specific README, e.g.,
[./docs/MUSICGEN.md](./docs/MUSICGEN.md), [./docs/AUDIOGEN.md](./docs/AUDIOGEN.md), etc.