Added deviation section in README
- README.md +26 -0
- pipeline.png +0 -0

README.md
CHANGED
extending the `Encoder` base class in the `encoder_models.py` file.

## Deviations from Published Methodology
In our implementation, we expand upon the methodology presented in the original paper, which focused solely on
extractive model summaries. The primary approach in the paper involved ranking sentences in the source document based on
ground-truth reference sentences. The Normalized Cumulative Gain (NCG) score was computed using the formula:

$$\text{NCG} = \frac{\text{cumulative gain}}{\text{ideal cumulative gain}}$$
as depicted in the following image:

![NCG Image](pipeline.png)
Key deviations in our implementation from the paper include:

1. **Inclusion of Abstractive Model Summaries:** Unlike the paper, which exclusively considered extractive model
summaries, our implementation supports both extractive and abstractive summarization models.
2. **Enhanced Calculation of NCG Scores:** For both extractive and abstractive summaries, we compute rankings based on
both the reference/ground truth (`gt_gain`) and predicted summaries (`pred_gain`). The NCG score is calculated using the
method shown below:

```python
def compute_ncg(pred_gains, gt_gains, k: int) -> float:
    # gt_gains and pred_gains are (sentence_position, gain) pairs, best-ranked first.
    gt_dict = dict(gt_gains)
    # Ideal cumulative gain: the top-k gains under the ground-truth ranking.
    gt_rel = [v for _, v in gt_gains[:k]]
    # Achieved cumulative gain: ground-truth gains of the model's top-k sentences.
    model_rel = [gt_dict[position] for position, _ in pred_gains[:k]]
    return sum(model_rel) / sum(gt_rel)
```
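As a quick sanity check, the scoring routine can be exercised on hypothetical gain lists (the function is repeated here so the snippet runs standalone; the positions and gain values are illustrative only, not taken from the paper):

```python
def compute_ncg(pred_gains, gt_gains, k: int) -> float:
    gt_dict = dict(gt_gains)
    gt_rel = [v for _, v in gt_gains[:k]]
    model_rel = [gt_dict[position] for position, _ in pred_gains[:k]]
    return sum(model_rel) / sum(gt_rel)

# (sentence_position, gain) pairs: ground truth sorted by descending gain,
# predictions sorted by the model's own ranking (the prediction scores are arbitrary).
gt_gains = [(2, 3.0), (0, 2.0), (4, 1.0), (1, 0.5), (3, 0.0)]
pred_gains = [(0, 0.9), (4, 0.8), (2, 0.7), (1, 0.2), (3, 0.1)]

# The model's top-2 picks (positions 0 and 4) earn 2.0 + 1.0 = 3.0 of the
# ideal 3.0 + 2.0 = 5.0, so NCG@2 = 0.6.
print(compute_ncg(pred_gains, gt_gains, k=2))  # prints 0.6
```

A score of 1.0 at a given `k` would mean the model's top-k sentences carry exactly the same total gain as the ground truth's top-k.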
This approach allows us to evaluate summarization quality across both extractive and abstractive methods, providing a
more comprehensive assessment than the original methodology.
## Citation

```bibtex
pipeline.png
ADDED