---
license: mit
datasets:
  - shorecode/summary-collection-200k-rows
language:
  - en
base_model:
  - google/t5-efficient-tiny-nh8
library_name: transformers
tags:
  - summary
  - summarizer
widget:
  - text: Model training
    output:
      url: Screenshot_20251104_204645.png
metrics:
  - f1
  - rouge
  - extractiveness
model-index:
  - name: t5-efficient-tiny-summarizer-general-purpose-v2
    results:
      - task:
          type: Summarization
        dataset:
          name: shorecode/summary-collection-60k-rows
          type: shorecode/summary-collection-60k-rows
        metrics:
          - name: F1 score
            type: f1
            value: 0.29
      - task:
          type: Summarization
        dataset:
          name: shorecode/summary-collection-60k-rows
          type: shorecode/summary-collection-60k-rows
        metrics:
          - name: Faithfulness (facebook/bart-large-cnn)
            type: facebook/bart-large-cnn
            value: 1.71
      - task:
          type: Summarization
        dataset:
          name: shorecode/summary-collection-60k-rows
          type: shorecode/summary-collection-60k-rows
        metrics:
          - name: Summarization Compression
            type: Lighteval extractiveness
            value: 7.52
      - task:
          type: Summarization
        dataset:
          name: shorecode/summary-collection-60k-rows
          type: shorecode/summary-collection-60k-rows
        metrics:
          - name: Summarization Coverage
            type: Lighteval extractiveness
            value: 0.96
      - task:
          type: Summarization
        dataset:
          name: shorecode/summary-collection-60k-rows
          type: shorecode/summary-collection-60k-rows
        metrics:
          - name: Summarization Density
            type: Lighteval extractiveness
            value: 8.68
      - task:
          type: Summarization
        dataset:
          name: shorecode/summary-collection-60k-rows
          type: shorecode/summary-collection-60k-rows
        metrics:
          - name: rougeL precision
            type: Lighteval
            value: 0.59
      - task:
          type: Summarization
        dataset:
          name: shorecode/summary-collection-60k-rows
          type: shorecode/summary-collection-60k-rows
        metrics:
          - name: rougeL recall
            type: Lighteval
            value: 0.31
      - task:
          type: Summarization
        dataset:
          name: shorecode/summary-collection-60k-rows
          type: shorecode/summary-collection-60k-rows
        metrics:
          - name: rougeL fmeasure
            type: Lighteval
            value: 0.41
      - task:
          type: Summarization
        dataset:
          name: shorecode/summary-collection-60k-rows
          type: shorecode/summary-collection-60k-rows
        metrics:
          - name: rouge1 precision
            type: Lighteval
            value: 0.63
      - task:
          type: Summarization
        dataset:
          name: shorecode/summary-collection-60k-rows
          type: shorecode/summary-collection-60k-rows
        metrics:
          - name: rouge1 recall
            type: Lighteval
            value: 0.33
      - task:
          type: Summarization
        dataset:
          name: shorecode/summary-collection-60k-rows
          type: shorecode/summary-collection-60k-rows
        metrics:
          - name: rouge1 fmeasure
            type: Lighteval
            value: 0.44
---

This model was built to shorten text before it is injected into LLM prompts, reducing API costs.

It achieves very high compression (7x+), meaning the summarized text sent to your LLM provider is roughly one seventh the size of the original. Recommended generation kwargs:

  • num_beams=2 or 3
  • no_repeat_ngram_size=2
  • min_length=20
  • max_new_tokens=500, or as high as you can tolerate; at roughly 4 characters per token, 500 tokens ≈ 2000 characters, and output beyond that is clipped
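A minimal usage sketch with the recommended kwargs above. The repo id `shorecode/t5-efficient-tiny-summarizer-general-purpose-v2` is assumed from the model name in this card; adjust it if your repo id differs.

```python
# Generation settings from the recommended kwargs above.
GEN_KWARGS = {
    "num_beams": 2,             # 2 or 3 both work
    "no_repeat_ngram_size": 2,
    "min_length": 20,
    "max_new_tokens": 500,      # as high as you can tolerate; output past ~2000 chars is clipped
}


def compress(text: str) -> str:
    """Summarize `text` before injecting it into an LLM prompt."""
    # Imported lazily so GEN_KWARGS is usable without transformers installed.
    from transformers import pipeline

    # Repo id assumed from the model name in this card.
    summarizer = pipeline(
        "summarization",
        model="shorecode/t5-efficient-tiny-summarizer-general-purpose-v2",
    )
    return summarizer(text, **GEN_KWARGS)[0]["summary_text"]
```

Call `compress(long_context)` on any text you would otherwise paste verbatim into a prompt, and send the returned summary instead.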

Training report (Weights & Biases): https://api.wandb.ai/links/shorecode-shorecode-llc/6udfudmr