---
license: mit
language:
- en
tags:
- finance
- ContextNER
- language models
datasets:
- him1411/EDGAR10-Q
metrics:
- rouge
---

EDGAR-BART-Base
=============

BART base model finetuned on the [EDGAR10-Q dataset](https://huggingface.co/datasets/him1411/EDGAR10-Q).

You may want to check out:

* Our paper: [CONTEXT-NER: Contextual Phrase Generation at Scale](https://arxiv.org/abs/2109.08079/)
* GitHub: [him1411/edgar10q-dataset](https://github.com/him1411/edgar10q-dataset)

Direct Use
=============

You can use this model to generate text, which is useful for experimentation and for understanding its capabilities. **It should not be used directly in production or for work that may directly impact people.**

How to Use
=============

You can load the model with Transformers instead of downloading it manually. The [bart-base model](https://huggingface.co/facebook/bart-base) is the backbone of our model. Here is how to use it in PyTorch:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("him1411/EDGAR-BART-Base")
model = AutoModelForSeq2SeqLM.from_pretrained("him1411/EDGAR-BART-Base")
```

Or clone the model repo:

```
git lfs install
git clone https://huggingface.co/him1411/EDGAR-BART-Base
```

Inference Example
=============

Here we provide an example for the ContextNER task, using one instance from the EDGAR10-Q dataset.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("him1411/EDGAR-BART-Base")
model = AutoModelForSeq2SeqLM.from_pretrained("him1411/EDGAR-BART-Base")

# Example input sentence from the EDGAR10-Q dataset
input_text = "14.5 years . The definite lived intangible assets related to the contracts and trade names had estimated weighted average useful lives of 5.9 years and 14.5 years, respectively, at acquisition."
inputs = tokenizer(input_text, return_tensors="pt")

# Ideal output for this input is
# 'Definite lived intangible assets weighted average remaining useful life'
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

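The metadata above lists ROUGE as the evaluation metric. As a rough illustration of how a generated phrase could be scored against the ideal phrase, here is a minimal ROUGE-1 F1 sketch; this is a simplified toy re-implementation for illustration only, not the scorer used in the paper:

```python
from collections import Counter

def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap ROUGE-1 F1 between two strings (simplified toy version)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped unigram overlap via multiset intersection
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1(
    "definite lived intangible assets useful life",
    "definite lived intangible assets weighted average remaining useful life",
)
print(round(score, 3))  # prints 0.8
```

In practice you would use a standard ROUGE package rather than this sketch, but it shows what the reported metric measures: token overlap between the generated phrase and the gold phrase.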
BibTeX Entry and Citation Info
===============

If you are using our model, please cite our paper:

```bibtex
@article{gupta2021context,
  title={Context-NER: Contextual Phrase Generation at Scale},
  author={Gupta, Himanshu and Verma, Shreyas and Kumar, Tarun and Mishra, Swaroop and Agrawal, Tamanna and Badugu, Amogh and Bhatt, Himanshu Sharad},
  journal={arXiv preprint arXiv:2109.08079},
  year={2021}
}
```