Commit 3637503 (verified) by danielwangwk · 1 Parent(s): 20f692d

Update README.md

Files changed (1)
  1. README.md +298 -115
README.md CHANGED
@@ -1,199 +1,382 @@
1
  ---
2
- library_name: transformers
3
- tags: []
4
  ---
5
 
6
- # Model Card for Model ID
7
 
8
- <!-- Provide a quick summary of what the model is/does. -->
9
 
 
10
 
 
11
 
12
- ## Model Details
13
 
14
- ### Model Description
15
 
16
- <!-- Provide a longer summary of what this model is. -->
17
 
18
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
 
 
 
19
 
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
- - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
 
28
- ### Model Sources [optional]
 
29
 
30
- <!-- Provide the basic links for the model. -->
 
 
31
 
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
 
 
 
35
 
36
- ## Uses
37
 
38
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
 
40
- ### Direct Use
41
 
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
 
 
43
 
44
- [More Information Needed]
45
 
46
- ### Downstream Use [optional]
47
 
48
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
 
 
49
 
50
- [More Information Needed]
 
51
 
52
- ### Out-of-Scope Use
53
-
54
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
-
56
- [More Information Needed]
57
-
58
- ## Bias, Risks, and Limitations
59
-
60
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
 
62
- [More Information Needed]
 
63
 
64
- ### Recommendations
65
 
66
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
 
 
67
 
68
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
 
69
 
70
- ## How to Get Started with the Model
71
 
72
- Use the code below to get started with the model.
73
 
74
- [More Information Needed]
75
 
76
- ## Training Details
77
 
78
- ### Training Data
79
 
80
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
 
82
- [More Information Needed]
83
 
84
- ### Training Procedure
85
 
86
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
 
88
- #### Preprocessing [optional]
89
 
90
- [More Information Needed]
91
 
 
92
 
93
- #### Training Hyperparameters
94
 
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
 
97
- #### Speeds, Sizes, Times [optional]
 
 
98
 
99
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
 
101
- [More Information Needed]
102
 
103
- ## Evaluation
104
 
105
- <!-- This section describes the evaluation protocols and provides the results. -->
 
 
 
106
 
107
- ### Testing Data, Factors & Metrics
108
 
109
- #### Testing Data
 
110
 
111
- <!-- This should link to a Dataset Card if possible. -->
112
 
113
- [More Information Needed]
114
 
115
- #### Factors
116
 
117
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
 
119
- [More Information Needed]
 
 
 
120
 
121
- #### Metrics
122
 
123
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
 
125
- [More Information Needed]
126
 
127
- ### Results
128
 
129
- [More Information Needed]
130
 
131
- #### Summary
132
 
 
133
 
134
-
135
- ## Model Examination [optional]
136
-
137
- <!-- Relevant interpretability work for the model goes here -->
138
-
139
- [More Information Needed]
140
 
141
  ## Environmental Impact
142
 
143
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
 
 
 
144
 
145
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
 
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
 
153
- ## Technical Specifications [optional]
154
-
155
- ### Model Architecture and Objective
156
-
157
- [More Information Needed]
158
-
159
- ### Compute Infrastructure
160
-
161
- [More Information Needed]
162
-
163
- #### Hardware
164
-
165
- [More Information Needed]
166
 
167
- #### Software
168
 
169
- [More Information Needed]
170
 
171
- ## Citation [optional]
172
 
173
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
 
175
- **BibTeX:**
176
 
177
- [More Information Needed]
178
 
179
- **APA:**
180
 
181
- [More Information Needed]
182
 
183
- ## Glossary [optional]
184
 
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
 
187
- [More Information Needed]
188
 
189
- ## More Information [optional]
190
 
191
- [More Information Needed]
 
 
192
 
193
- ## Model Card Authors [optional]
194
 
195
- [More Information Needed]
 
 
 
196
 
197
- ## Model Card Contact
198
 
199
- [More Information Needed]
 
 
 
1
  ---
2
+ language: en
3
+ license: apache-2.0
4
+ tags:
5
+ - question-answering
6
+ - bert
7
+ - squad
8
+ - extractive-qa
9
+ - baseline
10
+ datasets:
11
+ - squad
12
+ metrics:
13
+ - f1
14
+ - exact_match
15
+ model-index:
16
+ - name: bert-base-uncased-squad-baseline
17
+   results:
+   - task:
+       type: question-answering
+       name: Question Answering
+     dataset:
+       name: SQuAD 1.1
+       type: squad
+       split: validation
+     metrics:
+     - type: exact_match
+       value: 79.45
+       name: Exact Match
+     - type: f1
+       value: 87.41
+       name: F1 Score
32
  ---
33
 
34
+ # BERT Base Uncased - SQuAD 1.1 Baseline
35
 
36
+ This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the SQuAD 1.1 dataset for extractive question answering.
37
 
38
+ ## Model Description
39
 
40
+ **BERT (Bidirectional Encoder Representations from Transformers)** fine-tuned on the Stanford Question Answering Dataset (SQuAD 1.1) to perform extractive question answering - finding the answer span within a given context passage.
41
 
42
+ - **Model Type:** Question Answering (Extractive)
43
+ - **Base Model:** `bert-base-uncased`
44
+ - **Language:** English
45
+ - **License:** Apache 2.0
46
+ - **Fine-tuned on:** SQuAD 1.1
47
+ - **Parameters:** 108,893,186 (all trainable)
48
 
49
+ ## Intended Use
50
 
51
+ ### Primary Use Cases
52
 
53
+ This model is designed for extractive question answering tasks where:
54
+ - The answer exists as a continuous span of text within the provided context
55
+ - Questions are factual and answerable from the context
56
+ - The text is in English
57
 
58
+ ### Example Usage
59
 
60
+ ```python
61
+ from transformers import AutoModelForQuestionAnswering, AutoTokenizer, pipeline
62
 
63
+ # Load model and tokenizer
64
+ model = AutoModelForQuestionAnswering.from_pretrained("your-username/bert-squad-baseline")
65
+ tokenizer = AutoTokenizer.from_pretrained("your-username/bert-squad-baseline")
66
 
67
+ # Create QA pipeline
68
+ qa_pipeline = pipeline(
+     "question-answering",
+     model=model,
+     tokenizer=tokenizer,
+ )
73
 
74
+ # Ask a question
75
+ context = """
76
+ The Amazon rainforest is a moist broadleaf tropical rainforest in the Amazon biome
77
+ that covers most of the Amazon basin of South America. This basin encompasses
78
+ 7,000,000 km2 (2,700,000 sq mi), of which 5,500,000 km2 (2,100,000 sq mi) are
79
+ covered by the rainforest.
80
+ """
81
 
82
+ question = "How large is the Amazon basin?"
83
 
84
+ result = qa_pipeline(question=question, context=context)
85
 
86
+ print(f"Answer: {result['answer']}")
87
+ print(f"Confidence: {result['score']:.4f}")
88
+ ```
89
 
90
+ **Output:**
91
+ ```
92
+ Answer: 7,000,000 km2
93
+ Confidence: 0.9234
94
+ ```
95
 
96
+ ### Direct Model Usage (without pipeline)
97
 
98
+ ```python
99
+ import torch
100
+ from transformers import AutoModelForQuestionAnswering, AutoTokenizer
101
 
102
+ model = AutoModelForQuestionAnswering.from_pretrained("your-username/bert-squad-baseline")
103
+ tokenizer = AutoTokenizer.from_pretrained("your-username/bert-squad-baseline")
104
 
105
+ question = "What is the capital of France?"
106
+ context = "Paris is the capital and largest city of France."
107
 
108
+ # Tokenize
109
+ inputs = tokenizer(question, context, return_tensors="pt")
110
 
111
+ # Get predictions
112
+ with torch.no_grad():
113
+     outputs = model(**inputs)
114
+
115
+ # Get answer span
116
+ answer_start = torch.argmax(outputs.start_logits)
117
+ answer_end = torch.argmax(outputs.end_logits) + 1
118
 
119
+ answer = tokenizer.convert_tokens_to_string(
120
+     tokenizer.convert_ids_to_tokens(inputs.input_ids[0][answer_start:answer_end])
121
+ )
122
 
123
+ print(f"Answer: {answer}")
124
+ ```
125
 
126
+ ## Training Data
127
 
128
+ ### Dataset: SQuAD 1.1
129
 
130
+ The Stanford Question Answering Dataset (SQuAD) v1.1 consists of questions posed by crowdworkers on a set of Wikipedia articles.
131
 
132
+ **Training Set:**
133
+ - **Examples:** 87,599
134
+ - **Average question length:** 10.06 words
135
+ - **Average context length:** 119.76 words
136
+ - **Average answer length:** 3.16 words
137
 
138
+ **Validation Set:**
139
+ - **Examples:** 10,570
140
+ - **Average question length:** 10.22 words
141
+ - **Average context length:** 123.95 words
142
+ - **Average answer length:** 3.02 words
143
 
144
+ ### Data Preprocessing
145
 
146
+ - **Tokenizer:** `bert-base-uncased`
147
+ - **Max sequence length:** 384 tokens
148
+ - **Stride:** 128 tokens (for handling long contexts)
149
+ - **Padding:** Maximum length
150
+ - **Truncation:** Only second sequence (context)
151
 
152
+ Long contexts are split into multiple features with overlapping windows to ensure answers aren't lost at sequence boundaries.
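
The overlapping-window split described above can be sketched in plain Python. This is a simplified illustration rather than the tokenizer's actual implementation: it assumes that `stride` is the number of tokens shared by consecutive windows (which is how Hugging Face tokenizers document it), that three positions are reserved for `[CLS]` and the two `[SEP]` tokens, and the function name `context_windows` is hypothetical.

```python
def context_windows(num_question_tokens, num_context_tokens,
                    max_length=384, stride=128):
    """Return (start, end) context-token offsets for each overlapping window.

    Assumes 3 special tokens ([CLS], [SEP], [SEP]) per window and that
    `stride` is the overlap between consecutive windows.
    """
    capacity = max_length - num_question_tokens - 3  # context tokens per window
    if capacity <= stride:
        raise ValueError("question too long for the chosen max_length/stride")
    step = capacity - stride  # how far each new window advances
    windows = []
    start = 0
    while True:
        end = min(start + capacity, num_context_tokens)
        windows.append((start, end))
        if end == num_context_tokens:
            break
        start += step
    return windows


# A 12-token question with a 600-token context is covered by two windows
# that share 128 context tokens.
print(context_windows(12, 600))  # [(0, 369), (241, 600)]
```

Each window becomes a separate feature, so one SQuAD example with a long context can contribute several training or evaluation features.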
153
 
154
+ ## Training Procedure
155
 
156
+ ### Training Hyperparameters
157
 
158
+ | Parameter | Value |
159
+ |-----------|-------|
160
+ | **Base model** | bert-base-uncased |
161
+ | **Optimizer** | AdamW |
162
+ | **Learning rate** | 3e-5 |
163
+ | **Learning rate schedule** | Linear with warmup |
164
+ | **Warmup ratio** | 0.1 (10% of training) |
165
+ | **Weight decay** | 0.01 |
166
+ | **Batch size (train)** | 8 |
167
+ | **Batch size (eval)** | 8 |
168
+ | **Number of epochs** | 1 |
169
+ | **Mixed precision** | FP16 (enabled) |
170
+ | **Gradient accumulation** | 1 |
171
+ | **Max gradient norm** | 1.0 |
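
As a rough sketch of what "linear with warmup" means for the learning rate in the table above, the function below computes the rate at a given optimizer step. The total step count used in the demo is hypothetical (it is not taken from the training log), the function name `linear_warmup_lr` is made up for this example, and the exact rounding of the warmup step count may differ from the training framework.

```python
def linear_warmup_lr(step, total_steps, base_lr=3e-5, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total = 10_000  # hypothetical step count, for illustration only
for s in (0, 500, 1_000, 5_500, 10_000):
    print(s, linear_warmup_lr(s, total))
```

With these settings the rate peaks at 3e-5 once 10% of the steps have run and then falls back to zero by the final step.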
172
 
173
+ ### Training Environment
174
 
175
+ - **Hardware:** NVIDIA GPU (CUDA enabled)
176
+ - **Framework:** PyTorch with Transformers library
177
+ - **Training time:** ~29.5 minutes (1 epoch)
178
+ - **Training samples/second:** 44.95
179
+ - **Total FLOPs:** 14,541,777 GFLOPs (about 1.45 × 10^16 floating-point operations)
180
 
181
+ ### Training Metrics
182
 
183
+ - **Final training loss:** 1.2236
184
+ - **Evaluation strategy:** End of epoch
185
+ - **Metric for best model:** Evaluation loss
186
 
187
+ ## Performance
188
 
189
+ ### Evaluation Results
190
 
191
+ Evaluated on SQuAD 1.1 validation set (10,570 examples):
192
 
193
+ | Metric | Score |
194
+ |--------|-------|
195
+ | **Exact Match (EM)** | **79.45%** |
196
+ | **F1 Score** | **87.41%** |
197
 
198
+ ### Metric Explanations
199
 
200
+ - **Exact Match (EM):** Percentage of predictions that match the ground truth answer exactly
201
+ - **F1 Score:** Token-level F1 score measuring overlap between predicted and ground truth answers
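
A minimal sketch of how these two metrics are computed for a single prediction, following the normalization used by the SQuAD v1.1 evaluation (lowercasing, removing punctuation and English articles, collapsing whitespace). The function names are chosen for this example. The reported scores additionally take the maximum over all reference answers for a question and average over the dataset, which is omitted here.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, ground_truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(ground_truth))


def f1_score(prediction, ground_truth):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Amazon basin", "Amazon Basin"))
print(f1_score("7,000,000 km2", "7,000,000 km2 (2,700,000 sq mi)"))
```

The second call shows why F1 is usually higher than EM: a prediction that covers only part of the reference still earns partial credit.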
202
 
203
+ ### Comparison to BERT Base Performance
204
 
205
+ | Model | EM | F1 | Training time |
206
+ |-------|----|----|----------|
207
+ | **This model (1 epoch)** | 79.45 | 87.41 | 29.5 min |
208
+ | BERT Base (original paper, 3 epochs) | 80.8 | 88.5 | ~2-3 hours |
209
+ | BERT Base (fully trained) | 81-84 | 88-91 | ~2-3 hours |
210
 
211
+ **Note:** This is a baseline model trained for only 1 epoch. Performance can be improved with additional training epochs.
212
 
213
+ ### Performance by Question Type
214
 
215
+ The model performs well on:
216
+ - ✅ Factual questions (What, When, Where, Who)
217
+ - ✅ Short answer spans (1-5 words)
218
+ - ✅ Questions with clear context
219
 
220
+ May struggle with:
221
+ - ⚠️ Questions requiring reasoning across multiple sentences
222
+ - ⚠️ Very long answer spans
223
+ - ⚠️ Ambiguous questions with multiple valid answers
224
+ - ⚠️ Questions requiring world knowledge not in context
225
 
226
+ ## Limitations and Biases
227
 
228
+ ### Known Limitations
229
 
230
+ 1. **Extractive Only:** Can only extract answers present in the context; cannot generate or synthesize answers
231
+ 2. **Single Answer:** Provides only one answer span, even if multiple valid answers exist
232
+ 3. **Context Dependency:** Requires relevant context; cannot answer from general knowledge
233
+ 4. **Length Constraints:** Limited to 384 tokens per context window
234
+ 5. **English Only:** Trained on English text; not suitable for other languages
235
+ 6. **Training Duration:** Only 1 epoch of training; may underfit compared to longer training
236
 
237
+ ### Potential Biases
238
 
239
+ - **Domain Bias:** Trained primarily on Wikipedia articles; may perform worse on other text types (news, technical docs, etc.)
240
+ - **Temporal Bias:** Training data from 2016; may have outdated information
241
+ - **Cultural Bias:** Reflects biases present in Wikipedia content
242
+ - **Answer Position Bias:** May favor answers appearing in certain positions within context
243
+ - **BERT Base Biases:** Inherits any biases from the pre-trained BERT base model
244
 
245
+ ### Out-of-Scope Use
246
 
247
+ This model should NOT be used for:
248
+ - ❌ Medical, legal, or financial advice
+ - ❌ High-stakes decision making
+ - ❌ Generative question answering (creating new answers)
+ - ❌ Non-English languages
+ - ❌ Yes/no or multiple choice questions (without adaptation)
+ - ❌ Questions requiring reasoning beyond the context
+ - ❌ Real-time fact checking or verification
255
+
256
+ ## Technical Specifications
257
+
258
+ ### Model Architecture
259
+
260
+ ```
261
+ BertForQuestionAnswering(
+   (bert): BertModel(
+     (embeddings): BertEmbeddings
+     (encoder): BertEncoder (12 layers)
+   )  # no pooling layer in the QA model
+   (qa_outputs): Linear(768 -> 2)  # Start and end position logits
+ )
269
+ ```
270
+
271
+ - **Hidden size:** 768
272
+ - **Attention heads:** 12
273
+ - **Intermediate size:** 3072
274
+ - **Hidden layers:** 12
275
+ - **Vocabulary size:** 30,522
276
+ - **Max position embeddings:** 512
277
+ - **Total parameters:** 108,893,186
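
The parameter total can be reproduced from the sizes listed above with a short calculation. This is a back-of-the-envelope sketch: the token-type vocabulary size of 2 is the standard BERT value and is assumed here rather than stated in this card, and the sum only matches when no pooling layer is counted.

```python
hidden, layers, intermediate = 768, 12, 3072
vocab, max_pos = 30_522, 512
type_vocab = 2  # assumption: standard BERT segment vocabulary

layer_norm = 2 * hidden  # scale + bias
# Word, position and token-type embeddings plus the embedding LayerNorm.
embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm

# Query, key, value and attention-output projections (weights + biases).
attention = 4 * (hidden * hidden + hidden)
# Two feed-forward projections: hidden -> intermediate -> hidden.
feed_forward = (hidden * intermediate + intermediate) + (intermediate * hidden + hidden)
per_layer = attention + feed_forward + 2 * layer_norm

qa_head = hidden * 2 + 2  # Linear(768 -> 2) with bias
total = embeddings + layers * per_layer + qa_head
print(f"{total:,}")  # 108,893,186
```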
278
+
279
+ ### Input Format
280
+
281
+ The model expects tokenized input with:
282
+ - Question and context concatenated with `[SEP]` token
283
+ - Format: `[CLS] question [SEP] context [SEP]`
284
+ - Token type IDs to distinguish question (0) from context (1)
285
+ - Attention mask to identify real vs padding tokens
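
To make the layout above concrete, here is a small sketch that assembles the three input sequences by hand. It uses pre-split word tokens instead of real WordPiece ids, and `build_inputs` is a hypothetical helper written for this example; in practice the tokenizer produces these fields.

```python
def build_inputs(question_tokens, context_tokens, max_length=16):
    """Assemble [CLS] question [SEP] context [SEP] with segment ids and mask."""
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + context_tokens + ["[SEP]"]
    if len(tokens) > max_length:
        raise ValueError("sequence exceeds max_length; split the context first")
    # Segment 0 covers [CLS], the question and the first [SEP]; segment 1 the rest.
    token_type_ids = [0] * (len(question_tokens) + 2) + [1] * (len(context_tokens) + 1)
    attention_mask = [1] * len(tokens)
    # Pad up to max_length; padded positions get attention mask 0.
    pad = max_length - len(tokens)
    tokens = tokens + ["[PAD]"] * pad
    token_type_ids = token_type_ids + [0] * pad
    attention_mask = attention_mask + [0] * pad
    return tokens, token_type_ids, attention_mask


tokens, type_ids, mask = build_inputs(
    ["what", "is", "the", "capital", "?"],
    ["paris", "is", "the", "capital", "of", "france", "."],
)
for row in zip(tokens, type_ids, mask):
    print(row)
```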
286
+
287
+ ### Output Format
288
+
289
+ Returns:
290
+ - `start_logits`: Scores for each token being the start of the answer span
291
+ - `end_logits`: Scores for each token being the end of the answer span
292
+
293
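+ The predicted answer is the span with the highest combined score, i.e. the (start, end) pair with end >= start that maximizes `start_logits[start] + end_logits[end]`. When the two independent argmaxes already satisfy end >= start, this is simply the span between them.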
294
+
295
+ ## Evaluation Data
296
+
297
+ **SQuAD 1.1 Validation Set**
298
+ - 10,570 question-context-answer triples
299
+ - Same source and format as training data
300
+ - Used for final performance evaluation
301
 
302
  ## Environmental Impact
303
 
304
+ - **Training hardware:** 1x NVIDIA GPU
305
+ - **Training time:** ~29.5 minutes
306
+ - **Compute region:** Not specified
307
+ - **Carbon footprint:** Not measured; expected to be small given the short training time
308
 
309
+ ## Model Card Authors
310
 
311
+ [Your Name / Team Name]
312
 
313
+ ## Model Card Contact
314
 
315
+ [Your Email / Contact Information]
316
 
317
+ ## Citation
318
 
319
+ If you use this model, please cite:
320
 
321
+ ```bibtex
322
+ @misc{bert-squad-baseline-2025,
323
+ author = {Your Name},
324
+ title = {BERT Base Uncased Fine-tuned on SQuAD 1.1 (Baseline)},
325
+ year = {2025},
326
+ publisher = {HuggingFace},
327
+ howpublished = {\url{https://huggingface.co/your-username/bert-squad-baseline}}
328
+ }
329
+ ```
330
 
331
+ ### Original BERT Paper
332
 
333
+ ```bibtex
334
+ @article{devlin2018bert,
335
+ title={BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding},
336
+ author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
337
+ journal={arXiv preprint arXiv:1810.04805},
338
+ year={2018}
339
+ }
340
+ ```
341
 
342
+ ### SQuAD Dataset
343
 
344
+ ```bibtex
345
+ @article{rajpurkar2016squad,
346
+ title={SQuAD: 100,000+ Questions for Machine Comprehension of Text},
347
+ author={Rajpurkar, Pranav and Zhang, Jian and Lopyrev, Konstantin and Liang, Percy},
348
+ journal={arXiv preprint arXiv:1606.05250},
349
+ year={2016}
350
+ }
351
+ ```
352
 
353
+ ## Additional Information
354
 
355
+ ### Future Improvements
356
 
357
+ Potential enhancements for this baseline model:
358
+ - 🔄 Train for additional epochs (2-3 epochs recommended)
359
+ - 📈 Increase batch size with gradient accumulation
360
+ - 🎯 Try alternative learning rate schedules (this run already uses linear decay with warmup)
361
+ - 🔍 Add answer validation/verification
362
+ - 📊 Ensemble with multiple models
363
+ - 🚀 Distillation to smaller model for deployment
364
 
365
+ ### Related Models
366
 
367
+ - [bert-base-uncased](https://huggingface.co/bert-base-uncased) - Base model
368
+ - [bert-large-uncased-whole-word-masking-finetuned-squad](https://huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad) - Larger BERT variant
369
+ - [distilbert-base-uncased-distilled-squad](https://huggingface.co/distilbert-base-uncased-distilled-squad) - Smaller, faster variant
370
 
371
+ ### Acknowledgments
372
 
373
+ - Google Research for BERT
374
+ - Stanford NLP for SQuAD dataset
375
+ - Hugging Face for Transformers library
376
+ - [Your course/institution if applicable]
377
 
378
+ ---
379
 
380
+ **Last updated:** October 2025
381
+ **Model version:** 1.0 (Baseline)
382
+ **Status:** Baseline model - suitable for development/comparison