Add Neuron-optimized files for black-forest-labs/FLUX.1-schnell

#1 opened by Jingya (HF Staff)
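This PR adds pre-compiled Neuron artifacts (text encoders, transformer, VAE) alongside the model card update below. As a minimal loading sketch, assuming `optimum-neuron`'s `NeuronFluxPipeline` on an AWS Neuron instance (e.g. inf2/trn1) and reusing the repository id mentioned in the previous README:

```python
# Usage sketch (assumption): load pre-compiled Neuron artifacts and run inference.
# Requires optimum-neuron on a Neuron-equipped instance; the repo id below comes
# from the previous README and may need to be replaced with your own.
from optimum.neuron import NeuronFluxPipeline

pipe = NeuronFluxPipeline.from_pretrained("Jingya/Flux.1-Schnell-1024x1024-neuronx-tp8")

prompt = "A cat holding a sign that says hello world"
# FLUX.1 [schnell] is distilled for few-step sampling: no guidance, 4 steps.
image = pipe(prompt, guidance_scale=0.0, num_inference_steps=4).images[0]
image.save("flux-schnell-neuron.png")
```

Because the artifacts are compiled with static shapes, inference runs at the compiled configuration (1024x1024, batch size 1).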
README.md CHANGED
@@ -1,38 +1,82 @@
 ---
+language:
+- en
 license: apache-2.0
+tags:
+- text-to-image
+- image-generation
+- flux
+- neuron
 ---
 
-Exported with
+![FLUX.1 [schnell] Grid](./schnell_grid.jpeg)
 
-```bash
-optimum-cli export neuron --model black-forest-labs/FLUX.1-schnell --tensor_parallel_size 8 --batch_size 1 --height 1024 --width 1024 --num_images_per_prompt 1 --sequence_length 256 --torch_dtype bfloat16 flux_schnell_neuron_1024_tp8/
+`FLUX.1 [schnell]` is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.
+For more information, please read our [blog post](https://blackforestlabs.ai/announcing-black-forest-labs/).
+
+# Key Features
+1. Cutting-edge output quality and competitive prompt following, matching the performance of closed source alternatives.
+2. Trained using latent adversarial diffusion distillation, `FLUX.1 [schnell]` can generate high-quality images in only 1 to 4 steps.
+3. Released under the `apache-2.0` licence, the model can be used for personal, scientific, and commercial purposes.
+
+# Usage
+We provide a reference implementation of `FLUX.1 [schnell]`, as well as sampling code, in a dedicated [github repository](https://github.com/black-forest-labs/flux).
+Developers and creatives looking to build on top of `FLUX.1 [schnell]` are encouraged to use this as a starting point.
+
+## API Endpoints
+The FLUX.1 models are also available via API from the following sources
+- [bfl.ml](https://docs.bfl.ml/) (currently `FLUX.1 [pro]`)
+- [replicate.com](https://replicate.com/collections/flux)
+- [fal.ai](https://fal.ai/models/fal-ai/flux/schnell)
+- [mystic.ai](https://www.mystic.ai/black-forest-labs/flux1-schnell)
+
+## ComfyUI
+`FLUX.1 [schnell]` is also available in [Comfy UI](https://github.com/comfyanonymous/ComfyUI) for local inference with a node-based workflow.
+
+## Diffusers
+To use `FLUX.1 [schnell]` with the 🧨 diffusers python library, first install or upgrade diffusers
+
+```shell
+pip install -U diffusers
 ```
 
-Or
+Then you can use `FluxPipeline` to run the model
 
 ```python
-# [Export]
 import torch
-from optimum.neuron import NeuronFluxInpaintPipeline
-
-if __name__ == "__main__":
-    compiler_args = {"auto_cast": "none"}
-    input_shapes = {"batch_size": 1, "height": 1024, "width": 1024, "sequence_length": 256}
-
-    pipe = NeuronFluxInpaintPipeline.from_pretrained(
-        "black-forest-labs/FLUX.1-schnell",
-        torch_dtype=torch.bfloat16,
-        export=True,
-        tensor_parallel_size=8,
-        **compiler_args,
-        **input_shapes
-    )
-
-    # Save locally
-    pipe.save_pretrained("flux_schnell_neuron_1024x1024_tp8/")
-
-    # Upload to the HuggingFace Hub
-    pipe.push_to_hub(
-        "flux_schnell_neuron_1024x1024_tp8/", repository_id="Jingya/Flux.1-Schnell-1024x1024-neuronx-tp8"  # Replace with your HF Hub repo id
-    )
+from diffusers import FluxPipeline
+
+pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
+pipe.enable_model_cpu_offload()  # save some VRAM by offloading the model to CPU. Remove this if you have enough GPU power
+
+prompt = "A cat holding a sign that says hello world"
+image = pipe(
+    prompt,
+    guidance_scale=0.0,
+    num_inference_steps=4,
+    max_sequence_length=256,
+    generator=torch.Generator("cpu").manual_seed(0)
+).images[0]
+image.save("flux-schnell.png")
 ```
+
+To learn more check out the [diffusers](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux) documentation
+
+---
+# Limitations
+- This model is not intended or able to provide factual information.
+- As a statistical model this checkpoint might amplify existing societal biases.
+- The model may fail to generate output that matches the prompts.
+- Prompt following is heavily influenced by the prompting-style.
+
+# Out-of-Scope Use
+The model and its derivatives may not be used
+
+- In any way that violates any applicable national, federal, state, local or international law or regulation.
+- For the purpose of exploiting, harming or attempting to exploit or harm minors in any way; including but not limited to the solicitation, creation, acquisition, or dissemination of child exploitative content.
+- To generate or disseminate verifiably false information and/or content with the purpose of harming others.
+- To generate or disseminate personal identifiable information that can be used to harm an individual.
+- To harass, abuse, threaten, stalk, or bully individuals or groups of individuals.
+- To create non-consensual nudity or illegal pornographic content.
+- For fully automated decision making that adversely impacts an individual's legal rights or otherwise creates or modifies a binding, enforceable obligation.
+- Generating or facilitating large-scale disinformation campaigns.
model_index.json CHANGED
@@ -1,6 +1,6 @@
 {
   "_class_name": "FluxPipeline",
-  "_diffusers_version": "0.34.0",
+  "_diffusers_version": "0.35.1",
   "_name_or_path": "black-forest-labs/FLUX.1-schnell",
   "feature_extractor": [
     null,
scheduler/scheduler_config.json CHANGED
@@ -1,6 +1,6 @@
 {
   "_class_name": "FlowMatchEulerDiscreteScheduler",
-  "_diffusers_version": "0.34.0",
+  "_diffusers_version": "0.35.1",
   "base_image_seq_len": 256,
   "base_shift": 0.5,
   "invert_sigmas": false,
text_encoder/config.json CHANGED
@@ -17,17 +17,15 @@
   "max_position_embeddings": 77,
   "model_type": "clip_text_model",
   "neuron": {
-    "auto_cast": "none",
+    "auto_cast": "matmul",
     "auto_cast_type": "bf16",
     "compiler_type": "neuronx-cc",
     "compiler_version": "2.19.8089.0+8ab9f450",
     "dynamic_batch_size": false,
-    "float_dtype": "fp32",
     "inline_weights_to_neff": true,
     "input_names": [
       "input_ids"
     ],
-    "int_dtype": "int64",
     "model_type": "clip-text-model",
     "optlevel": "2",
     "output_attentions": false,
@@ -47,6 +45,6 @@
   "projection_dim": 768,
   "torch_dtype": "bfloat16",
   "torchscript": true,
-  "transformers_version": "4.51.0",
+  "transformers_version": "4.51.3",
   "vocab_size": 49408
 }
text_encoder/model.neuron CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b06e7db406a36b5fff59da6901ece973c57b0eb682dd0033e737d797dec79d10
-size 307199155
+oid sha256:4e3d2eaa1c907fa795d1bbf9b352bbffba666890318c1633ad81823c410f6e45
+size 307199283
text_encoder_2/config.json CHANGED
@@ -19,17 +19,15 @@
   "layer_norm_epsilon": 1e-06,
   "model_type": "t5",
   "neuron": {
-    "auto_cast": "none",
+    "auto_cast": "matmul",
     "auto_cast_type": "bf16",
     "compiler_type": "neuronx-cc",
     "compiler_version": "2.19.8089.0+8ab9f450",
     "dynamic_batch_size": false,
-    "float_dtype": "fp32",
     "inline_weights_to_neff": true,
     "input_names": [
       "input_ids"
     ],
-    "int_dtype": "int64",
     "model_type": "t5-encoder",
     "optlevel": "2",
     "output_attentions": false,
@@ -38,9 +36,9 @@
       "last_hidden_state"
     ],
     "static_batch_size": 1,
-    "static_sequence_length": 256,
+    "static_sequence_length": 512,
     "task": "feature-extraction",
-    "tensor_parallel_size": 8
+    "tensor_parallel_size": 4
   },
   "num_decoder_layers": 24,
   "num_heads": 64,
@@ -51,7 +49,7 @@
   "relative_attention_num_buckets": 32,
   "tie_word_embeddings": false,
   "torch_dtype": "bfloat16",
-  "transformers_version": "4.51.0",
+  "transformers_version": "4.51.3",
   "use_cache": true,
   "vocab_size": 32128
 }
text_encoder_2/model.neuron/tp_0.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f862ee5d89ecf4bb54d731fe141e27b41989913d52e60a8ddaa7d58dddde2415
-size 2353450084
+oid sha256:4ecab0268d019630104a550ef8c939671136976082a51b04c50487bcedde9f9b
+size 2638601250
text_encoder_2/model.neuron/tp_1.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3d699573d836aca259c61ebc1b84fcda0cc05e10d46a2f910d96175b31cb2365
-size 2353450084
+oid sha256:97bb8b8e284f86fcbc2d14242e6987162e582e4fad18d14d4c5491989d7b42f5
+size 2638601246
text_encoder_2/model.neuron/tp_2.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ce6929ccae128108d1e809f80f7adfab864c6106866e09c5bba45cdf6cb3cab7
-size 2353450084
+oid sha256:b3d7d46212df1a9896237b13b535b325d4a0fc07e9e281801e6f3eddb35dea25
+size 2638600234
text_encoder_2/model.neuron/tp_3.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:87ebc6b7d1b98803058d3c1c0a28461dac122e1a2daa2f9eded153777a8f64a2
-size 2353450078
+oid sha256:d6f9acfab2caa330aa22006c1b64955a2a7fc94e8e38c3c1a217dbbbb96af899
+size 2638601252
transformer/config.json CHANGED
@@ -1,7 +1,7 @@
 {
   "_class_name": "FluxTransformer2DModel",
   "_commit_hash": null,
-  "_diffusers_version": "0.34.0",
+  "_diffusers_version": "0.35.1",
   "_use_default_values": [
     "axes_dims_rope",
     "out_channels"
@@ -16,12 +16,11 @@
   "in_channels": 64,
   "joint_attention_dim": 4096,
   "neuron": {
-    "auto_cast": "none",
+    "auto_cast": "matmul",
     "auto_cast_type": "bf16",
     "compiler_type": "neuronx-cc",
     "compiler_version": "2.19.8089.0+8ab9f450",
     "dynamic_batch_size": false,
-    "float_dtype": "bf16",
     "inline_weights_to_neff": true,
     "input_names": [
       "hidden_states",
@@ -30,7 +29,6 @@
       "timestep",
       "image_rotary_emb"
     ],
-    "int_dtype": "int64",
     "model_type": "flux-transformer-2d",
     "optlevel": "2",
     "output_attentions": false,
@@ -44,11 +42,11 @@
     "static_num_channels": 64,
     "static_patch_size": 1,
     "static_rotary_axes_dim": 128,
-    "static_sequence_length": 256,
+    "static_sequence_length": 512,
     "static_vae_scale_factor": 8,
     "static_width": 128,
     "task": "semantic-segmentation",
-    "tensor_parallel_size": 8
+    "tensor_parallel_size": 4
   },
   "num_attention_heads": 24,
   "num_layers": 19,
transformer/model.neuron CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:6e0e82e21d817426b9a51645e8c1ecd183fef5997c03278de87f976f241f60e3
-size 10293465
+oid sha256:9a484acd1a81f5d75edd355cb15a3ac25dd66ef9b01f27630853a85f5062e8ab
+size 15259033
transformer/weights/tp0_sharded_checkpoint.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:817168e6975df553be52b029f5cbd495c4c66b92fe15de13879b67735ad64ed0
-size 2975453200
+oid sha256:7e608877d83fc6a5484bfdb9168b7a36bb78a9d4399b91a9e5942d912d390709
+size 5947888624
transformer/weights/tp1_sharded_checkpoint.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:078b9c43674dc9b58c906a56a5f245643bdebe5526fc8cf96b88f1f26c978aac
-size 2975453200
+oid sha256:a298049023e5c6070a2557abfd8457720ed13369c0a5d0de3ca49c1ce4075769
+size 5947888624
transformer/weights/tp2_sharded_checkpoint.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8e78c457d6865a1bc692e2974316a8ece1fb189eb6a9080fb0cca8846fcb59cd
-size 2975453200
+oid sha256:542e1f05d85a121c7d3fa833b319200c9d64b381c40aa554c1a021df91750018
+size 5947888624
transformer/weights/tp3_sharded_checkpoint.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:32b05d960e707d2a865ffe22ff623711ebd151dc6f1befc96aa4bcddccfd3109
-size 2975453200
+oid sha256:9e326944599ffcf49a67ece25876668c91d1abd4d08ec552bfc4d2103b229e41
+size 5947888624
vae_decoder/config.json CHANGED
@@ -1,7 +1,7 @@
 {
   "_class_name": "AutoencoderKL",
   "_commit_hash": null,
-  "_diffusers_version": "0.34.0",
+  "_diffusers_version": "0.35.1",
   "act_fn": "silu",
   "block_out_channels": [
     128,
@@ -23,17 +23,15 @@
   "layers_per_block": 2,
   "mid_block_add_attention": true,
   "neuron": {
-    "auto_cast": "none",
+    "auto_cast": "matmul",
     "auto_cast_type": "bf16",
     "compiler_type": "neuronx-cc",
     "compiler_version": "2.19.8089.0+8ab9f450",
     "dynamic_batch_size": false,
-    "float_dtype": "bf16",
     "inline_weights_to_neff": true,
     "input_names": [
       "latent_sample"
     ],
-    "int_dtype": "int64",
     "model_type": "vae-decoder",
     "optlevel": "2",
     "output_attentions": false,
vae_decoder/model.neuron CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8417b41fa554cc057a82576bb90b3cbabe53590932dcbf77808a4568a435e5f8
-size 700319539
+oid sha256:ff4de5283ac4cbef3008dbc14bb243bf5d4436973f6519ef671f3d1b944c68b1
+size 373356467
vae_encoder/config.json CHANGED
@@ -1,7 +1,7 @@
 {
   "_class_name": "AutoencoderKL",
   "_commit_hash": null,
-  "_diffusers_version": "0.34.0",
+  "_diffusers_version": "0.35.1",
   "act_fn": "silu",
   "block_out_channels": [
     128,
@@ -23,17 +23,15 @@
   "layers_per_block": 2,
   "mid_block_add_attention": true,
   "neuron": {
-    "auto_cast": "none",
+    "auto_cast": "matmul",
     "auto_cast_type": "bf16",
     "compiler_type": "neuronx-cc",
     "compiler_version": "2.19.8089.0+8ab9f450",
     "dynamic_batch_size": false,
-    "float_dtype": "fp32",
     "inline_weights_to_neff": true,
     "input_names": [
       "sample"
     ],
-    "int_dtype": "int64",
     "model_type": "vae-encoder",
     "optlevel": "2",
     "output_attentions": false,
vae_encoder/model.neuron CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:20d721fc229b9185d819bf3bfe0152e3a136fbf9170eb364a9cbadf37ae15a7b
-size 414810291
+oid sha256:d5fa38dafc4fd2f93a91ddc641c192cabc1a3d181d0a87f493f042b33d0a46fd
+size 223240499