Add library_name and clarify license

This PR adds the `library_name` to the model card metadata. The code examples and mention of a diffusers version suggest compatibility with the Diffusers library, making this a valuable addition for discoverability. The license is also clarified to explicitly state MIT.

Files changed (1)

README.md +205 -4

README.md CHANGED

@@ -1,16 +1,18 @@
 ---
-license: other
-language:
-- en
 base_model:
 - THUDM/CogVideoX-5b
 tags:
 - video
 - video-generation
 - cogvideox
 - alibaba
-pipeline_tag: text-to-video
 ---
 <div align="center">
 <img src="icon.jpg" width="250"/>
@@ -56,6 +58,21 @@ Recent advancements in Diffusion Transformer (DiT) have demonstrated remarkable
 - `2024/08/27` We released our v2 paper including appendix.
 - `2024/07/31` We submitted our paper on arXiv and released our project page.
 ## 🎞️ Showcases
 https://github.com/user-attachments/assets/949d5e99-18c9-49d6-b669-9003ccd44bf1
@@ -66,6 +83,190 @@ https://github.com/user-attachments/assets/4026c23d-229d-45d7-b5be-6f3eb9e4fd50
 All videos are available in this [Link](https://cloudbook-public-daily.oss-cn-hangzhou.aliyuncs.com/Tora_t2v/showcases.zip)
 ## 🤝 Acknowledgements
 We would like to express our gratitude to the following open-source projects that have been instrumental in the development of our project:

 ---
 base_model:
 - THUDM/CogVideoX-5b
+language:
+- en
+license: mit
+pipeline_tag: text-to-video
 tags:
 - video
 - video-generation
 - cogvideox
 - alibaba
+library_name: diffusers
 ---
 <div align="center">
 <img src="icon.jpg" width="250"/>
 - `2024/08/27` We released our v2 paper including appendix.
 - `2024/07/31` We submitted our paper on arXiv and released our project page.
+## 📑 Table of Contents
+- [🎞️ Showcases](#%EF%B8%8F-showcases)
+- [✅ TODO List](#-todo-list)
+- [🧨 Diffusers version](#-diffusers-verision)
+- [🐍 Installation](#-installation)
+- [📦 Model Weights](#-model-weights)
+- [🔄 Inference](#-inference)
+- [🖥️ Gradio Demo](#%EF%B8%8F-gradio-demo)
+- [🧠 Training](#-training)
+- [🎯 Troubleshooting](#-troubleshooting)
+- [🤝 Acknowledgements](#-acknowledgements)
+- [📄 Our previous work](#-our-previous-work)
+- [📚 Citation](#-citation)
 ## 🎞️ Showcases
 https://github.com/user-attachments/assets/949d5e99-18c9-49d6-b669-9003ccd44bf1
 All videos are available in this [Link](https://cloudbook-public-daily.oss-cn-hangzhou.aliyuncs.com/Tora_t2v/showcases.zip)
+## ✅ TODO List
+- [x] Release our inference code and model weights
+- [x] Provide a ModelScope Demo
+- [x] Release our training code
+- [x] Release diffusers version and optimize the GPU memory usage
+- [x] Release complete version of Tora
+## 🧨 Diffusers verision
+Please refer to [the diffusers version](diffusers-version/README.md) for details.
+## 🐍 Installation
+Please make sure your Python version is between 3.10 and 3.12 (inclusive).
+```bash
+# Clone this repository.
+git clone https://github.com/alibaba/Tora.git
+cd Tora
+# Install PyTorch (we use PyTorch 2.4.0) and torchvision following the official instructions: https://pytorch.org/get-started/previous-versions/. For example:
+conda create -n tora python=3.10
+conda activate tora
+conda install pytorch==2.4.0 torchvision==0.19.0 pytorch-cuda=12.1 -c pytorch -c nvidia
+# Install requirements
+cd modules/SwissArmyTransformer
+pip install -e .
+cd ../../sat
+pip install -r requirements.txt
+cd ..
+```
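The Python range stated in the installation note can be checked before installing anything. The following is a minimal sketch; the helper name `python_version_supported` is hypothetical and not part of the Tora repository.

```python
import sys

# Supported range taken from the installation note: 3.10 to 3.12, inclusive.
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def python_version_supported(version=None):
    """Return True if the (major, minor) version lies in the supported range.

    Hypothetical helper for illustration; not part of the Tora codebase.
    """
    if version is None:
        version = sys.version_info[:2]
    return MIN_VERSION <= tuple(version[:2]) <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if python_version_supported() else "outside 3.10-3.12"
    print(f"Python {current}: {status}")
```

Running the file with the interpreter of the activated environment prints whether that interpreter falls inside the documented range.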
+## 📦 Model Weights
+### Folder Structure
+```
+Tora
+└── sat
+    └── ckpts
+        ├── t5-v1_1-xxl
+        │   ├── model-00001-of-00002.safetensors
+        │   └── ...
+        ├── vae
+        │   └── 3d-vae.pt
+        ├── tora
+        │   ├── i2v
+        │   │   └── mp_rank_00_model_states.pt
+        │   └── t2v
+        │       └── mp_rank_00_model_states.pt
+        └── CogVideoX-5b-sat # for training stage 1
+            └── mp_rank_00_model_states.pt
+```
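To confirm that downloaded weights ended up where the folder structure expects them, a small check along these lines may help. It is a sketch only: the function name `missing_checkpoints` is hypothetical, and the relative paths are copied from the tree in this README rather than read from any Tora configuration.

```python
from pathlib import Path

# Relative paths copied from the folder structure in this README.
EXPECTED_ENTRIES = [
    "t5-v1_1-xxl",
    "vae/3d-vae.pt",
    "tora/i2v/mp_rank_00_model_states.pt",
    "tora/t2v/mp_rank_00_model_states.pt",
]
# Listed in the tree as needed for training stage 1 only.
TRAINING_ENTRIES = ["CogVideoX-5b-sat/mp_rank_00_model_states.pt"]


def missing_checkpoints(ckpt_dir, include_training=False):
    """Return the expected entries that do not exist under ckpt_dir.

    Hypothetical helper for illustration; not part of the Tora codebase.
    """
    expected = list(EXPECTED_ENTRIES)
    if include_training:
        expected += TRAINING_ENTRIES
    root = Path(ckpt_dir)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    for rel in missing_checkpoints("Tora/sat/ckpts", include_training=True):
        print(f"missing: {rel}")
```

Only the weights for the mode you intend to run (t2v or i2v) need to be present, so a reported missing entry is not necessarily an error.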
+### Download Links
+*Note: Downloading the `tora` weights requires following the [CogVideoX License](CogVideoX_LICENSE).* You can choose one of the following options: HuggingFace, ModelScope, or native links.\
+After downloading the model weights, you can put them in the `Tora/sat/ckpts` folder.
+#### HuggingFace
+```bash
+# This can be faster
+pip install "huggingface_hub[hf_transfer]"
+HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download Alibaba-Research-Intelligence-Computing/Tora --local-dir ckpts
+```
+or
+```bash
+# use git
+git lfs install
+git clone https://huggingface.co/Alibaba-Research-Intelligence-Computing/Tora
+```
+#### ModelScope
+- SDK
+```python
+from modelscope import snapshot_download
+model_dir = snapshot_download('xiaoche/Tora')
+```
+- Git
+```bash
+git clone https://www.modelscope.cn/xiaoche/Tora.git
+```
+#### Native
+- Download the VAE and T5 model following [CogVideo](https://github.com/THUDM/CogVideo/blob/main/sat/README.md#2-download-model-weights):
+    - VAE: https://cloud.tsinghua.edu.cn/f/fdba7608a49c463ba754/?dl=1
+    - T5: [text_encoder](https://huggingface.co/THUDM/CogVideoX-2b/tree/main/text_encoder), [tokenizer](https://huggingface.co/THUDM/CogVideoX-2b/tree/main/tokenizer)
+- Tora t2v model weights: [Link](https://cloudbook-public-daily.oss-cn-hangzhou.aliyuncs.com/Tora_t2v/mp_rank_00_model_states.pt). Downloading this weight requires following the [CogVideoX License](CogVideoX_LICENSE).
+## 🔄 Inference
+### Text to Video
+Text-to-video inference requires around 30 GiB of GPU memory (tested on an NVIDIA A100).
+```bash
+cd sat
+PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True torchrun --standalone --nproc_per_node=$N_GPU sample_video.py --base configs/tora/model/cogvideox_5b_tora.yaml configs/tora/inference_sparse.yaml --load ckpts/tora/t2v --output-dir samples --point_path trajs/coaster.txt --input-file assets/text/t2v/examples.txt
+```
+You can point `--input-file` and `--point_path` at your own prompt file and trajectory-point file. Please note that the trajectory is drawn on a 256x256 canvas.
+Replace `$N_GPU` with the number of GPUs you want to use.
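If you launch inference from a script, the same command can be assembled in Python. This is a sketch under assumptions: the function names `build_t2v_command` and `run_t2v` are hypothetical, while the flags and default paths are copied from the text-to-video command in this README.

```python
import os
import subprocess


def build_t2v_command(n_gpu, point_path="trajs/coaster.txt",
                      input_file="assets/text/t2v/examples.txt"):
    """Assemble the text-to-video torchrun invocation as an argument list.

    Hypothetical helper; flags mirror the command shown in this README.
    """
    return [
        "torchrun", "--standalone", f"--nproc_per_node={n_gpu}",
        "sample_video.py",
        "--base", "configs/tora/model/cogvideox_5b_tora.yaml",
        "configs/tora/inference_sparse.yaml",
        "--load", "ckpts/tora/t2v",
        "--output-dir", "samples",
        "--point_path", point_path,
        "--input-file", input_file,
    ]


def run_t2v(n_gpu, sat_dir="sat", **kwargs):
    """Run the command from the sat directory with the allocator setting."""
    env = dict(os.environ, PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True")
    return subprocess.run(build_t2v_command(n_gpu, **kwargs),
                          cwd=sat_dir, env=env, check=True)
```

For example, `run_t2v(1, point_path="trajs/sawtooth.txt")` would run on a single GPU with a different trajectory file, assuming the working directory is the repository root.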
+### Image to Video
+```bash
+cd sat
+PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True torchrun --standalone --nproc_per_node=$N_GPU sample_video.py --base configs/tora/model/cogvideox_5b_tora_i2v.yaml configs/tora/inference_sparse.yaml --load ckpts/tora/i2v --output-dir samples --point_path trajs/sawtooth.txt --input-file assets/text/i2v/examples.txt --img_dir assets/images --image2video
+```
+The first-frame images should be placed in the directory passed to `--img_dir`. The name of each image should be specified in the corresponding text prompt in `--input-file`, separated by `@@`.
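The `@@` convention can be illustrated with a small parser. This sketch assumes that the prompt comes before the separator and the image file name after it; the README does not state the order, so check `assets/text/i2v/examples.txt` before relying on it. The function name `split_prompt_line` is hypothetical.

```python
def split_prompt_line(line, separator="@@"):
    """Split one --input-file line into (prompt, image_name).

    Assumption: the prompt precedes the separator and the image file
    name follows it. Hypothetical helper, not part of the Tora codebase.
    """
    prompt, found, image_name = line.strip().partition(separator)
    if not found or not prompt.strip() or not image_name.strip():
        raise ValueError(
            f"expected '<prompt>{separator}<image name>': {line!r}"
        )
    return prompt.strip(), image_name.strip()
```

A line such as `A swan glides across a lake@@swan.png` would then yield the prompt and the file name `swan.png`, which is looked up in the `--img_dir` directory.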
+### Recommendations for Text Prompts
+For text prompts, we highly recommend using GPT-4 to enhance the details. Simple prompts may negatively impact both visual quality and motion control effectiveness.
+You can refer to the following resources for guidance:
+- [CogVideoX Documentation](https://github.com/THUDM/CogVideo/blob/main/inference/convert_demo.py)
+- [OpenSora Scripts](https://github.com/hpcaitech/Open-Sora/blob/main/scripts/inference.py)
+## 🖥️ Gradio Demo
+Usage:
+```bash
+cd sat
+python app.py --load ckpts/tora/t2v
+```
+## 🧠 Training
+### Data Preparation
+Following the [CogVideo dataset guide](https://github.com/THUDM/CogVideo/blob/main/sat/README.md#preparing-the-dataset), structure the dataset as follows:
+```
+.
+├── labels
+│   ├── 1.txt
+│   ├── 2.txt
+│   ├── ...
+└── videos
+    ├── 1.mp4
+    ├── 2.mp4
+    ├── ...
+```
+Training data examples are in `sat/training_examples`.
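Before training, it can be useful to confirm that every video has a label file with the same stem and vice versa. The following is a minimal sketch that assumes the `labels/*.txt` and `videos/*.mp4` layout shown in this README; the function name `unpaired_items` is hypothetical.

```python
from pathlib import Path


def unpaired_items(dataset_dir):
    """Return (videos without a label, labels without a video) by file stem.

    Hypothetical helper for illustration; not part of the Tora codebase.
    """
    root = Path(dataset_dir)
    videos = {p.stem for p in (root / "videos").glob("*.mp4")}
    labels = {p.stem for p in (root / "labels").glob("*.txt")}
    return sorted(videos - labels), sorted(labels - videos)
```

Two empty lists mean that the labels and videos line up one to one.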
+### Text to Video
+Text-to-video training requires around 60 GiB of GPU memory (tested on an NVIDIA A100).
+Replace `$N_GPU` with the number of GPUs you want to use.
+- Stage 1
+```bash
+PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True torchrun --standalone --nproc_per_node=$N_GPU train_video.py --base configs/tora/model/cogvideox_5b_tora.yaml configs/tora/train_dense.yaml --experiment-name "t2v-stage1"
+```
+- Stage 2
+```bash
+PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True torchrun --standalone --nproc_per_node=$N_GPU train_video.py --base configs/tora/model/cogvideox_5b_tora.yaml configs/tora/train_sparse.yaml --experiment-name "t2v-stage2"
+```
+## 🎯 Troubleshooting
+### 1. ValueError: Non-consecutive added token...
+Upgrade the `transformers` package to 4.44.2. See [this issue](https://github.com/THUDM/CogVideo/issues/213).
 ## 🤝 Acknowledgements
 We would like to express our gratitude to the following open-source projects that have been instrumental in the development of our project: