---
language:
- en
library_name: transformers
pipeline_tag: image-text-to-text
license: apache-2.0
datasets:
- ServiceNow/BigDocs-Sketch2Flow
base_model:
- mistralai/Pixtral-12B-2409
---

# Model Card for ServiceNow/Pixtral-12B-2409-StarFlow

Pixtral-12B-2409-StarFlow is a vision-language model finetuned for **structured workflow generation from sketch images**. It translates hand-drawn or computer-generated workflow diagrams into structured JSON workflows, including triggers, flow logic, and actions.

## Model Details

### Model Description

Pixtral-12B-2409-StarFlow is part of the **StarFlow** framework for automating workflow creation. It extends Pixtral-12B with domain-specific finetuning on workflow diagrams, enabling accurate sketch-to-workflow generation.

* **Developed by:** ServiceNow Research
* **Model type:** Transformer-based Vision-Language Model (VLM)
* **Language(s) (NLP):** English
* **License:** Apache 2.0
* **Finetuned from model:** [Pixtral-12B](https://huggingface.co/mistralai/Pixtral-12B-2409)

### Model Sources

* **Repository:** [ServiceNow/Pixtral-12B-2409-StarFlow](https://huggingface.co/ServiceNow/Pixtral-12B-2409-StarFlow)
* **Paper:** [StarFlow: Generating Structured Workflow Outputs From Sketch Images](https://arxiv.org/abs/2503.21889)

---

## Uses

### Direct Use

* Translating **sketches of workflows** (hand-drawn, whiteboard, or digital diagrams) into **JSON structured workflows**.
* Supporting **workflow automation** in enterprise platforms by removing the need for manual low-code configuration.

### Downstream Use

* Integration into **enterprise low-code platforms** for rapid prototyping of workflows by users.
* Use in **automation migration pipelines**, e.g., converting legacy workflow screenshots into JSON representations.

### Out-of-Scope Use

* General-purpose vision-language tasks (e.g., image captioning, OCR).
* Use on domains outside workflow automation (e.g., arbitrary diagram-to-code).
* Real-time handwriting recognition (StarFlow focuses on structured workflow translation, not raw OCR).

---

## Bias, Risks, and Limitations

* **Limited generalization**: Finetuned models perform poorly on out-of-distribution diagrams from unfamiliar platforms.
* **Sensitivity to input style**: Whiteboard/handwritten sketches degrade performance compared to digital or UI-rendered workflows.
* **Component naming mismatches**: The model may mispredict action definitions (e.g., `create_user` vs. `create_a_user`), leading to execution errors.
* **Evaluation gap**: Current metrics don’t always reflect execution correctness of generated workflows.

### Recommendations

Users should:

* Validate outputs before deployment.
* Be cautious with **handwritten/ambiguous sketches**.
* Consider supplementing with **retrieval-augmented generation (RAG)** or **tool grounding** for robustness.

---

## How to Get Started with the Model

```python
from transformers import AutoProcessor, LlavaForConditionalGeneration
from PIL import Image

# Load the processor and the finetuned model from the Hugging Face Hub.
processor = AutoProcessor.from_pretrained("ServiceNow/Pixtral-12B-2409-StarFlow")
model = LlavaForConditionalGeneration.from_pretrained("ServiceNow/Pixtral-12B-2409-StarFlow")

# Open the workflow sketch to translate.
image = Image.open("workflow_sketch.png")

# Build model inputs from the image and the text instruction.
inputs = processor(images=image, text="Generate workflow JSON", return_tensors="pt")

# max_new_tokens caps only the generated output, so a long image prompt
# does not consume the generation budget (as max_length would).
outputs = model.generate(**inputs, max_new_tokens=4096)

# Decode the generated token IDs into the workflow JSON string.
workflow_json = processor.decode(outputs[0], skip_special_tokens=True)
print(workflow_json)
```

---

## Training Details

### Training Data

The model was trained using the [ServiceNow/BigDocs-Sketch2Flow](https://huggingface.co/datasets/ServiceNow/BigDocs-Sketch2Flow) dataset, which includes the following data distribution:

* **Synthetic** (12,376 Graphviz-generated diagrams)
* **Manual** (3,035 sketches hand-drawn by annotators)
* **Digital** (2,613 diagrams drawn using software)
* **Whiteboard** (484 sketches drawn on whiteboard / blackboard)
* **User Interface** (373 screenshots from ServiceNow Flow Designer)
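
To make the balance between sources easier to see, the counts listed above can be turned into shares of the total. This is a minimal sketch that takes the five numbers at face value; the card does not say whether they cover the training split only or the whole dataset, so the totals are illustrative rather than official statistics.

```python
# Sample counts per source, copied from the data distribution listed above.
counts = {
    "Synthetic": 12_376,
    "Manual": 3_035,
    "Digital": 2_613,
    "Whiteboard": 484,
    "User Interface": 373,
}

total = sum(counts.values())

# Share of each source as a fraction of all listed samples.
shares = {source: n / total for source, n in counts.items()}

for source, n in counts.items():
    print(f"{source:<15}{n:>7,}{shares[source]:>8.1%}")
print(f"{'Total':<15}{total:>7,}")
```

Under these counts, whiteboard and user-interface samples together account for fewer than 5% of the listed data, which is worth keeping in mind when reading the per-source limitations.
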
### Training Procedure

#### Preprocessing

* Synthetic workflows generated via **heuristics** (Scheduled Loop, IF/ELSE, FOREACH, etc.).
* Annotators recreated flows in digital, manual, and whiteboard formats.

#### Training Hyperparameters

* Optimizer: **AdamW** with β=(0.95, 0.999), lr=2e-5, weight decay=1e-6.
* Scheduler: **cosine learning rate** with 30 warmup steps.
* Early stopping based on validation loss.
* Precision: **bf16 mixed-precision**.
* Sequence length: up to **32k tokens**.

#### Speeds, Sizes, Times

* Trained with **16× NVIDIA H100 80GB GPUs** across two nodes.
* Fully Sharded Data Parallel (FSDP) training, no CPU offloading.

---

## Evaluation

### Testing Data

Same dataset distribution as training: synthetic, manual, digital, whiteboard, UI-rendered workflows.

### Factors

* **Source of sample** (synthetic, manual, UI, etc.)
* **Orientation** (portrait vs. landscape diagrams)
* **Resolution** (small <400k pixels, medium, large >1M pixels)

### Metrics

All evaluation metrics can be found in the official [StarFlow repo](https://github.com/ServiceNow/StarFlow).

* **Flow Similarity (FlowSim)**: tree edit distance similarity.
* **TreeBLEU**: structural recall of subtrees.
* **Trigger Match (TM)**: accuracy of workflow triggers.
* **Component Match (CM)**: overlap of predicted vs. gold components.

### Results

* Without finetuning, proprietary models (GPT-4o, Claude-3.7, Gemini 2.0) outperform open-weight models.
* **Finetuned Pixtral-12B achieves SOTA**:
  * FlowSim w/ inputs: **0.919**
  * TreeBLEU w/ inputs: **0.950**
  * Trigger Match: **0.753**
  * Component Match: **0.930**

#### Summary

Finetuning yields **large gains over base Pixtral-12B and GPT-4o**, particularly in matching workflow components and triggers.

---

## Model Examination

* Finetuned models capture **naming conventions** and structured execution logic better.
* Failure modes include **missing ELSE branches** or **generic table names**.
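
The Trigger Match and Component Match ideas listed under Metrics can be illustrated with a small sketch. This is not the official implementation, which lives in the StarFlow repo. The flat `trigger` / `components` layout, the function names, and the use of Jaccard overlap for "order-agnostic overlap" are all assumptions made for illustration, as are the trigger and component names other than `create_user` / `create_a_user`. It shows how a single naming mismatch of the kind described under Limitations lowers the component score while leaving the trigger score intact.

```python
def trigger_match(pred: dict, gold: dict) -> float:
    """Return 1.0 if the predicted trigger equals the gold trigger, else 0.0."""
    return float(pred.get("trigger") == gold.get("trigger"))


def component_match(pred: dict, gold: dict) -> float:
    """Order-agnostic overlap between predicted and gold component names.

    Assumption: overlap is measured as Jaccard similarity over name sets;
    the official metric may be defined differently.
    """
    pred_names = set(pred.get("components", []))
    gold_names = set(gold.get("components", []))
    if not pred_names and not gold_names:
        return 1.0  # two empty workflows trivially agree
    return len(pred_names & gold_names) / len(pred_names | gold_names)


# Hypothetical workflows; only the create_user / create_a_user pair
# comes from the model card.
gold = {"trigger": "record_created", "components": ["look_up_record", "create_a_user"]}
pred = {"trigger": "record_created", "components": ["look_up_record", "create_user"]}

print(trigger_match(pred, gold))    # triggers agree
print(component_match(pred, gold))  # one shared name out of three distinct names
```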
---

## Technical Specifications

### Model Architecture and Objective

* Base: **Pixtral-12B**, a multimodal transformer (12B parameters).
* Objective: **Image-to-JSON structured workflow generation**.

### Compute Infrastructure

* **Hardware:** 16× NVIDIA H100 80GB (2 nodes)
* **Software:** FSDP, bf16 mixed precision, PyTorch/Transformers

---

## Citation

**BibTeX:**

```bibtex
@article{bechard2025starflow,
  title={StarFlow: Generating Structured Workflow Outputs from Sketch Images},
  author={B{\'e}chard, Patrice and Wang, Chao and Abaskohi, Amirhossein and Rodriguez, Juan and Pal, Christopher and Vazquez, David and Gella, Spandana and Rajeswar, Sai and Taslakian, Perouz},
  journal={arXiv preprint arXiv:2503.21889},
  year={2025}
}
```

**APA:**

Béchard, P., Wang, C., Abaskohi, A., Rodriguez, J., Pal, C., Vazquez, D., Gella, S., Rajeswar, S., & Taslakian, P. (2025). **StarFlow: Generating Structured Workflow Outputs from Sketch Images**. *arXiv preprint arXiv:2503.21889*.

---

## Glossary

* **FlowSim**: Metric based on tree edit distance for workflows.
* **TreeBLEU**: BLEU-like score using tree structures.
* **Trigger Match**: Correctness of predicted workflow trigger.
* **Component Match**: Correctness of predicted components (order-agnostic).

---

## More Information

* [ServiceNow Flow Designer](https://www.servicenow.com/products/platform-flow-designer.html)
* [StarFlow Blog](https://www.servicenow.com/blogs/2025/starflow-ai-turns-sketches-into-workflows)

---

## The StarFlow Team

* Patrice Béchard, Chao Wang, Amirhossein Abaskohi, Juan Rodriguez, Christopher Pal, David Vazquez, Spandana Gella, Sai Rajeswar, Perouz Taslakian

---

## Model Card Contact

* Patrice Béchard - [patrice.bechard@servicenow.com](mailto:patrice.bechard@servicenow.com)
* ServiceNow Research - [research.servicenow.com](https://research.servicenow.com)