---
license: apache-2.0
language:
- en
base_model:
- microsoft/deberta-v3-large
- HuggingFaceTB/SmolLM2-135M-Instruct
pipeline_tag: token-classification
tags:
- NER
- encoder
- decoder
- GLiNER
- information-extraction
---

**GLiNER** is a Named Entity Recognition (NER) model capable of identifying *any* entity type in a **zero-shot** manner.

This architecture combines:

* An **encoder** for representing entity spans
* A **decoder** for generating label names

This hybrid approach enables new use cases, such as **entity linking**, and expands GLiNER’s capabilities. By integrating large modern decoders, trained on vast datasets, GLiNER can leverage their **richer knowledge capacity** while maintaining competitive inference speed.

---
## Key Features
* **Open ontology**: Works when the label set is unknown
* **Multi-label entity recognition**: Assign multiple labels to a single entity
* **Entity linking**: Handle large label sets via constrained generation
* **Knowledge expansion**: Gain from large decoder models
* **Efficient**: Minimal speed reduction on GPU compared to single-encoder GLiNER
---
## Installation
Update to the latest version of GLiNER:
```bash
pip install -U gliner
```
---
## Usage
```python
from gliner import GLiNER

model = GLiNER.from_pretrained("gliner-decoder-large-v1.0")

text = (
    "Apple was founded as Apple Computer Company on April 1, 1976, "
    "by Steve Wozniak, Steve Jobs (1955–2011) and Ronald Wayne to "
    "develop and sell Wozniak's Apple I personal computer."
)
labels = ["person", "other"]

entities = model.run(text, labels, threshold=0.3, num_gen_sequences=1)
print(entities)
```
---
### Example Output
```json
[
  [
    {
      "start": 21,
      "end": 26,
      "text": "Apple",
      "label": "other",
      "score": 0.6795641779899597,
      "generated labels": ["Organization"]
    },
    {
      "start": 47,
      "end": 60,
      "text": "April 1, 1976",
      "label": "other",
      "score": 0.44296327233314514,
      "generated labels": ["Date"]
    },
    {
      "start": 65,
      "end": 78,
      "text": "Steve Wozniak",
      "label": "person",
      "score": 0.9934439659118652,
      "generated labels": ["Person"]
    },
    {
      "start": 80,
      "end": 90,
      "text": "Steve Jobs",
      "label": "person",
      "score": 0.9725918769836426,
      "generated labels": ["Person"]
    },
    {
      "start": 107,
      "end": 119,
      "text": "Ronald Wayne",
      "label": "person",
      "score": 0.9964536428451538,
      "generated labels": ["Person"]
    }
  ]
]
```
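The returned entities are plain dictionaries, so post-processing is straightforward. Below is a minimal sketch, assuming the field names shown in the example output above (`"text"`, `"label"`, `"generated labels"`), that groups predictions by their generated label:

```python
from collections import defaultdict

# A small sample mirroring the structure of the example output above.
entities = [
    {"text": "Apple", "label": "other", "score": 0.68,
     "generated labels": ["Organization"]},
    {"text": "Steve Wozniak", "label": "person", "score": 0.99,
     "generated labels": ["Person"]},
    {"text": "Steve Jobs", "label": "person", "score": 0.97,
     "generated labels": ["Person"]},
]

# Group entity texts by each generated label they received.
by_generated = defaultdict(list)
for ent in entities:
    for gen_label in ent["generated labels"]:
        by_generated[gen_label].append(ent["text"])

print(dict(by_generated))
# → {'Organization': ['Apple'], 'Person': ['Steve Wozniak', 'Steve Jobs']}
```

With `num_gen_sequences` above 1, each entity may carry several generated labels; the same grouping loop handles that case unchanged.
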
---
### Restricting the Decoder
You can limit the decoder to generate labels only from a predefined set:
```python
model.run(
    text, labels,
    threshold=0.3,
    num_gen_sequences=1,
    gen_constraints=[
        "organization", "organization type", "city",
        "technology", "date", "person"
    ]
)
```
---
## Performance Tips
Two label-trie implementations are available.
For the **faster, more memory-efficient C++ version**, install **Cython**:
```bash
pip install cython
```
This can significantly improve performance and reduce memory usage, especially with millions of labels.
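To illustrate why a trie helps with constrained generation, here is a minimal pure-Python label-trie sketch (an assumption for clarity; the actual gliner implementation may differ). At each generation step, only the continuations that keep the partial output inside the allowed label set are permitted:

```python
def build_trie(labels):
    """Build a nested-dict trie over labels, using whitespace as a toy tokenizer."""
    root = {}
    for label in labels:
        node = root
        for token in label.split():
            node = node.setdefault(token, {})
        node["<end>"] = {}  # marks a complete label
    return root

def allowed_next(trie, prefix):
    """Return the tokens (or '<end>') that may follow the given token prefix."""
    node = trie
    for token in prefix:
        node = node.get(token, {})
    return sorted(node)

trie = build_trie(["organization", "organization type", "person"])
print(allowed_next(trie, []))                # → ['organization', 'person']
print(allowed_next(trie, ["organization"]))  # → ['<end>', 'type']
```

Because each step only examines the children of the current node, lookup cost depends on the prefix length rather than the total number of labels, which is what keeps constrained generation tractable with millions of labels.
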