Add Neuron-optimized files for black-forest-labs/FLUX.1-schnell

#1 opened by Jingya (HF Staff)
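This PR adds pre-compiled Neuron artifacts (text encoders, transformer, VAE) alongside the model card update below. As a minimal loading sketch, assuming `optimum-neuron`'s `NeuronFluxPipeline` on an AWS Neuron instance (e.g. inf2/trn1) and reusing the repository id mentioned in the previous README:

```python
# Usage sketch (assumption): load pre-compiled Neuron artifacts and run inference.
# Requires optimum-neuron on a Neuron-equipped instance; the repo id below comes
# from the previous README and may need to be replaced with your own.
from optimum.neuron import NeuronFluxPipeline

pipe = NeuronFluxPipeline.from_pretrained("Jingya/Flux.1-Schnell-1024x1024-neuronx-tp8")

prompt = "A cat holding a sign that says hello world"
# FLUX.1 [schnell] is distilled for few-step sampling: no guidance, 4 steps.
image = pipe(prompt, guidance_scale=0.0, num_inference_steps=4).images[0]
image.save("flux-schnell-neuron.png")
```

Because the artifacts are compiled with static shapes, inference runs at the compiled configuration (1024x1024, batch size 1).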
README.md CHANGED
@@ -1,38 +1,82 @@
 ---
+language:
+- en
 license: apache-2.0
+tags:
+- text-to-image
+- image-generation
+- flux
+- neuron
 ---
 
-Exported with
+![FLUX.1 [schnell] Grid](./schnell_grid.jpeg)
 
-```bash
-optimum-cli export neuron --model black-forest-labs/FLUX.1-schnell --tensor_parallel_size 8 --batch_size 1 --height 1024 --width 1024 --num_images_per_prompt 1 --sequence_length 256 --torch_dtype bfloat16 flux_schnell_neuron_1024_tp8/
+`FLUX.1 [schnell]` is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.
+For more information, please read our [blog post](https://blackforestlabs.ai/announcing-black-forest-labs/).
+
+# Key Features
+1. Cutting-edge output quality and competitive prompt following, matching the performance of closed source alternatives.
+2. Trained using latent adversarial diffusion distillation, `FLUX.1 [schnell]` can generate high-quality images in only 1 to 4 steps.
+3. Released under the `apache-2.0` licence, the model can be used for personal, scientific, and commercial purposes.
+
+# Usage
+We provide a reference implementation of `FLUX.1 [schnell]`, as well as sampling code, in a dedicated [github repository](https://github.com/black-forest-labs/flux).
+Developers and creatives looking to build on top of `FLUX.1 [schnell]` are encouraged to use this as a starting point.
+
+## API Endpoints
+The FLUX.1 models are also available via API from the following sources
+- [bfl.ml](https://docs.bfl.ml/) (currently `FLUX.1 [pro]`)
+- [replicate.com](https://replicate.com/collections/flux)
+- [fal.ai](https://fal.ai/models/fal-ai/flux/schnell)
+- [mystic.ai](https://www.mystic.ai/black-forest-labs/flux1-schnell)
+
+## ComfyUI
+`FLUX.1 [schnell]` is also available in [Comfy UI](https://github.com/comfyanonymous/ComfyUI) for local inference with a node-based workflow.
+
+## Diffusers
+To use `FLUX.1 [schnell]` with the 🧨 diffusers python library, first install or upgrade diffusers
+
+```shell
+pip install -U diffusers
 ```
 
-Or
+Then you can use `FluxPipeline` to run the model
 
 ```python
-# [Export]
 import torch
-from optimum.neuron import NeuronFluxInpaintPipeline
-
-if __name__ == "__main__":
-    compiler_args = {"auto_cast": "none"}
-    input_shapes = {"batch_size": 1, "height": 1024, "width": 1024, "sequence_length": 256}
-
-    pipe = NeuronFluxInpaintPipeline.from_pretrained(
-        "black-forest-labs/FLUX.1-schnell",
-        torch_dtype=torch.bfloat16,
-        export=True,
-        tensor_parallel_size=8,
-        **compiler_args,
-        **input_shapes
-    )
-
-    # Save locally
-    pipe.save_pretrained("flux_schnell_neuron_1024x1024_tp8/")
-
-    # Upload to the HuggingFace Hub
-    pipe.push_to_hub(
-        "flux_schnell_neuron_1024x1024_tp8/", repository_id="Jingya/Flux.1-Schnell-1024x1024-neuronx-tp8"  # Replace with your HF Hub repo id
-    )
+from diffusers import FluxPipeline
+
+pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
+pipe.enable_model_cpu_offload()  # save some VRAM by offloading the model to CPU. Remove this if you have enough GPU power
+
+prompt = "A cat holding a sign that says hello world"
+image = pipe(
+    prompt,
+    guidance_scale=0.0,
+    num_inference_steps=4,
+    max_sequence_length=256,
+    generator=torch.Generator("cpu").manual_seed(0)
+).images[0]
+image.save("flux-schnell.png")
 ```
+
+To learn more check out the [diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux) documentation
+
+---
+# Limitations
+- This model is not intended or able to provide factual information.
+- As a statistical model this checkpoint might amplify existing societal biases.
+- The model may fail to generate output that matches the prompts.
+- Prompt following is heavily influenced by the prompting-style.
+
+# Out-of-Scope Use
+The model and its derivatives may not be used
+
+- In any way that violates any applicable national, federal, state, local or international law or regulation.
+- For the purpose of exploiting, harming or attempting to exploit or harm minors in any way; including but not limited to the solicitation, creation, acquisition, or dissemination of child exploitative content.
+- To generate or disseminate verifiably false information and/or content with the purpose of harming others.
+- To generate or disseminate personal identifiable information that can be used to harm an individual.
+- To harass, abuse, threaten, stalk, or bully individuals or groups of individuals.
+- To create non-consensual nudity or illegal pornographic content.
+- For fully automated decision making that adversely impacts an individual's legal rights or otherwise creates or modifies a binding, enforceable obligation.
+- Generating or facilitating large-scale disinformation campaigns.
model_index.json CHANGED
@@ -1,6 +1,6 @@
 {
   "_class_name": "FluxPipeline",
-  "_diffusers_version": "0.34.0",
+  "_diffusers_version": "0.35.1",
   "_name_or_path": "black-forest-labs/FLUX.1-schnell",
   "feature_extractor": [
     null,
scheduler/scheduler_config.json CHANGED
@@ -1,6 +1,6 @@
 {
   "_class_name": "FlowMatchEulerDiscreteScheduler",
-  "_diffusers_version": "0.34.0",
+  "_diffusers_version": "0.35.1",
   "base_image_seq_len": 256,
   "base_shift": 0.5,
   "invert_sigmas": false,
text_encoder/config.json CHANGED
@@ -17,17 +17,15 @@
   "max_position_embeddings": 77,
   "model_type": "clip_text_model",
   "neuron": {
-    "auto_cast": "none",
+    "auto_cast": "matmul",
     "auto_cast_type": "bf16",
     "compiler_type": "neuronx-cc",
     "compiler_version": "2.19.8089.0+8ab9f450",
     "dynamic_batch_size": false,
-    "float_dtype": "fp32",
     "inline_weights_to_neff": true,
     "input_names": [
       "input_ids"
     ],
-    "int_dtype": "int64",
     "model_type": "clip-text-model",
     "optlevel": "2",
     "output_attentions": false,
@@ -47,6 +45,6 @@
   "projection_dim": 768,
   "torch_dtype": "bfloat16",
   "torchscript": true,
-  "transformers_version": "4.51.0",
+  "transformers_version": "4.51.3",
   "vocab_size": 49408
 }
text_encoder/model.neuron CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b06e7db406a36b5fff59da6901ece973c57b0eb682dd0033e737d797dec79d10
-size 307199155
+oid sha256:4e3d2eaa1c907fa795d1bbf9b352bbffba666890318c1633ad81823c410f6e45
+size 307199283
text_encoder_2/config.json CHANGED
@@ -19,17 +19,15 @@
   "layer_norm_epsilon": 1e-06,
   "model_type": "t5",
   "neuron": {
-    "auto_cast": "none",
+    "auto_cast": "matmul",
     "auto_cast_type": "bf16",
     "compiler_type": "neuronx-cc",
     "compiler_version": "2.19.8089.0+8ab9f450",
     "dynamic_batch_size": false,
-    "float_dtype": "fp32",
     "inline_weights_to_neff": true,
     "input_names": [
       "input_ids"
     ],
-    "int_dtype": "int64",
     "model_type": "t5-encoder",
     "optlevel": "2",
     "output_attentions": false,
@@ -38,9 +36,9 @@
       "last_hidden_state"
     ],
     "static_batch_size": 1,
-    "static_sequence_length": 256,
+    "static_sequence_length": 512,
     "task": "feature-extraction",
-    "tensor_parallel_size": 8
+    "tensor_parallel_size": 4
   },
   "num_decoder_layers": 24,
   "num_heads": 64,
@@ -51,7 +49,7 @@
   "relative_attention_num_buckets": 32,
   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
-  "transformers_version": "4.51.0",
+  "transformers_version": "4.51.3",
   "use_cache": true,
   "vocab_size": 32128
 }
text_encoder_2/model.neuron/tp_0.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f862ee5d89ecf4bb54d731fe141e27b41989913d52e60a8ddaa7d58dddde2415
-size 2353450084
+oid sha256:4ecab0268d019630104a550ef8c939671136976082a51b04c50487bcedde9f9b
+size 2638601250
text_encoder_2/model.neuron/tp_1.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3d699573d836aca259c61ebc1b84fcda0cc05e10d46a2f910d96175b31cb2365
-size 2353450084
+oid sha256:97bb8b8e284f86fcbc2d14242e6987162e582e4fad18d14d4c5491989d7b42f5
+size 2638601246
text_encoder_2/model.neuron/tp_2.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ce6929ccae128108d1e809f80f7adfab864c6106866e09c5bba45cdf6cb3cab7
-size 2353450084
+oid sha256:b3d7d46212df1a9896237b13b535b325d4a0fc07e9e281801e6f3eddb35dea25
+size 2638600234
text_encoder_2/model.neuron/tp_3.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:87ebc6b7d1b98803058d3c1c0a28461dac122e1a2daa2f9eded153777a8f64a2
-size 2353450078
+oid sha256:d6f9acfab2caa330aa22006c1b64955a2a7fc94e8e38c3c1a217dbbbb96af899
+size 2638601252
transformer/config.json CHANGED
@@ -1,7 +1,7 @@
 {
   "_class_name": "FluxTransformer2DModel",
   "_commit_hash": null,
-  "_diffusers_version": "0.34.0",
+  "_diffusers_version": "0.35.1",
   "_use_default_values": [
     "axes_dims_rope",
     "out_channels"
@@ -16,12 +16,11 @@
   "in_channels": 64,
   "joint_attention_dim": 4096,
   "neuron": {
-    "auto_cast": "none",
+    "auto_cast": "matmul",
     "auto_cast_type": "bf16",
     "compiler_type": "neuronx-cc",
     "compiler_version": "2.19.8089.0+8ab9f450",
     "dynamic_batch_size": false,
-    "float_dtype": "bf16",
     "inline_weights_to_neff": true,
     "input_names": [
       "hidden_states",
@@ -30,7 +29,6 @@
       "timestep",
       "image_rotary_emb"
     ],
-    "int_dtype": "int64",
     "model_type": "flux-transformer-2d",
     "optlevel": "2",
     "output_attentions": false,
@@ -44,11 +42,11 @@
     "static_num_channels": 64,
     "static_patch_size": 1,
     "static_rotary_axes_dim": 128,
-    "static_sequence_length": 256,
+    "static_sequence_length": 512,
     "static_vae_scale_factor": 8,
     "static_width": 128,
     "task": "semantic-segmentation",
-    "tensor_parallel_size": 8
+    "tensor_parallel_size": 4
   },
   "num_attention_heads": 24,
   "num_layers": 19,
transformer/model.neuron CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6e0e82e21d817426b9a51645e8c1ecd183fef5997c03278de87f976f241f60e3
-size 10293465
+oid sha256:9a484acd1a81f5d75edd355cb15a3ac25dd66ef9b01f27630853a85f5062e8ab
+size 15259033
transformer/weights/tp0_sharded_checkpoint.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:817168e6975df553be52b029f5cbd495c4c66b92fe15de13879b67735ad64ed0
-size 2975453200
+oid sha256:7e608877d83fc6a5484bfdb9168b7a36bb78a9d4399b91a9e5942d912d390709
+size 5947888624
transformer/weights/tp1_sharded_checkpoint.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:078b9c43674dc9b58c906a56a5f245643bdebe5526fc8cf96b88f1f26c978aac
-size 2975453200
+oid sha256:a298049023e5c6070a2557abfd8457720ed13369c0a5d0de3ca49c1ce4075769
+size 5947888624
transformer/weights/tp2_sharded_checkpoint.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8e78c457d6865a1bc692e2974316a8ece1fb189eb6a9080fb0cca8846fcb59cd
-size 2975453200
+oid sha256:542e1f05d85a121c7d3fa833b319200c9d64b381c40aa554c1a021df91750018
+size 5947888624
transformer/weights/tp3_sharded_checkpoint.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:32b05d960e707d2a865ffe22ff623711ebd151dc6f1befc96aa4bcddccfd3109
-size 2975453200
+oid sha256:9e326944599ffcf49a67ece25876668c91d1abd4d08ec552bfc4d2103b229e41
+size 5947888624
vae_decoder/config.json CHANGED
@@ -1,7 +1,7 @@
 {
   "_class_name": "AutoencoderKL",
   "_commit_hash": null,
-  "_diffusers_version": "0.34.0",
+  "_diffusers_version": "0.35.1",
   "act_fn": "silu",
   "block_out_channels": [
     128,
@@ -23,17 +23,15 @@
   "layers_per_block": 2,
   "mid_block_add_attention": true,
   "neuron": {
-    "auto_cast": "none",
+    "auto_cast": "matmul",
     "auto_cast_type": "bf16",
     "compiler_type": "neuronx-cc",
     "compiler_version": "2.19.8089.0+8ab9f450",
     "dynamic_batch_size": false,
-    "float_dtype": "bf16",
     "inline_weights_to_neff": true,
     "input_names": [
       "latent_sample"
     ],
-    "int_dtype": "int64",
     "model_type": "vae-decoder",
     "optlevel": "2",
     "output_attentions": false,
vae_decoder/model.neuron CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8417b41fa554cc057a82576bb90b3cbabe53590932dcbf77808a4568a435e5f8
-size 700319539
+oid sha256:ff4de5283ac4cbef3008dbc14bb243bf5d4436973f6519ef671f3d1b944c68b1
+size 373356467
vae_encoder/config.json CHANGED
@@ -1,7 +1,7 @@
 {
   "_class_name": "AutoencoderKL",
   "_commit_hash": null,
-  "_diffusers_version": "0.34.0",
+  "_diffusers_version": "0.35.1",
   "act_fn": "silu",
   "block_out_channels": [
     128,
@@ -23,17 +23,15 @@
   "layers_per_block": 2,
   "mid_block_add_attention": true,
   "neuron": {
-    "auto_cast": "none",
+    "auto_cast": "matmul",
     "auto_cast_type": "bf16",
     "compiler_type": "neuronx-cc",
     "compiler_version": "2.19.8089.0+8ab9f450",
     "dynamic_batch_size": false,
-    "float_dtype": "fp32",
     "inline_weights_to_neff": true,
     "input_names": [
       "sample"
     ],
-    "int_dtype": "int64",
     "model_type": "vae-encoder",
     "optlevel": "2",
     "output_attentions": false,
vae_encoder/model.neuron CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:20d721fc229b9185d819bf3bfe0152e3a136fbf9170eb364a9cbadf37ae15a7b
-size 414810291
+oid sha256:d5fa38dafc4fd2f93a91ddc641c192cabc1a3d181d0a87f493f042b33d0a46fd
+size 223240499