metadata
language:
  - en
license: apache-2.0
tags:
  - biencoder
  - sentence-transformers
  - text-classification
  - sentence-pair-classification
  - semantic-similarity
  - semantic-search
  - retrieval
  - reranking
  - generated_from_trainer
  - dataset_size:9233417
  - loss:ArcFaceInBatchLoss
base_model: answerdotai/ModernBERT-large
widget:
  - source_sentence: >-
      Hayley Vaughan portrayed Ripa on the ABC daytime soap opera , `` All My
      Children `` , between 1990 and 2002 .
    sentences:
      - >-
        Traxxpad is a music application for Sony 's PlayStation Portable
        published by Definitive Studios and developed by Eidos Interactive .
      - >-
        Between 1990 and 2002 , Hayley Vaughan Ripa portrayed in the ABC soap
        opera `` All My Children `` .
      - >-
        Between 1990 and 2002 , Ripa Hayley portrayed Vaughan in the ABC soap
        opera `` All My Children `` .
  - source_sentence: >-
      Olivella monilifera is a species of dwarf sea snail , small gastropod
      mollusk in the family Olivellidae , the marine olives .
    sentences:
      - >-
        Olivella monilifera is a species of the dwarf - sea snail , small
        gastropod mollusk in the Olivellidae family , the marine olives .
      - >-
        He was cut by the Browns after being signed by the Bills in 2013 . He
        was later released .
      - >-
        Olivella monilifera is a kind of sea snail , marine gastropod mollusk in
        the Olivellidae family , the dwarf olives .
  - source_sentence: >-
      Hayashi said that Mackey `` is a sort of `` of the original model for
      Tenchi .
    sentences:
      - >-
        In the summer of 2009 , Ellick shot a documentary about Malala Yousafzai
        .
      - >-
        Hayashi said that Mackey is `` sort of `` the original model for Tenchi
        .
      - >-
        Mackey said that Hayashi is `` sort of `` the original model for Tenchi
        .
  - source_sentence: >-
      Much of the film was shot on location in Los Angeles and in nearby Burbank
      and Glendale .
    sentences:
      - >-
        Much of the film was shot on location in Los Angeles and in nearby
        Burbank and Glendale .
      - >-
        Much of the film was shot on site in Burbank and Glendale and in the
        nearby Los Angeles .
      - >-
        Traxxpad is a music application for the Sony PlayStation Portable
        developed by the Definitive Studios and published by Eidos Interactive .
  - source_sentence: >-
      According to him , the earth is the carrier of his artistic work , which
      is only integrated into the creative process by minimal changes .
    sentences:
      - National players are Bold players .
      - >-
        According to him , earth is the carrier of his artistic work being
        integrated into the creative process only by minimal changes .
      - >-
        According to him , earth is the carrier of his creative work being
        integrated into the artistic process only by minimal changes .
datasets:
  - redis/langcache-sentencepairs-v2
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_precision@1
  - cosine_recall@1
  - cosine_ndcg@10
  - cosine_mrr@1
  - cosine_map@100
  - cosine_auc_precision_cache_hit_ratio
  - cosine_auc_similarity_distribution
model-index:
  - name: Redis fine-tuned BiEncoder model for semantic caching on LangCache
    results:
      - task:
          type: custom-information-retrieval
          name: Custom Information Retrieval
        dataset:
          name: test
          type: test
        metrics:
          - type: cosine_accuracy@1
            value: 0.6070776173931731
            name: Cosine Accuracy@1
          - type: cosine_precision@1
            value: 0.6070776173931731
            name: Cosine Precision@1
          - type: cosine_recall@1
            value: 0.588632794022045
            name: Cosine Recall@1
          - type: cosine_ndcg@10
            value: 0.7755359823507149
            name: Cosine Ndcg@10
          - type: cosine_mrr@1
            value: 0.6070776173931731
            name: Cosine Mrr@1
          - type: cosine_map@100
            value: 0.7291245351244533
            name: Cosine Map@100
          - type: cosine_auc_precision_cache_hit_ratio
            value: 0.348058858138603
            name: Cosine Auc Precision Cache Hit Ratio
          - type: cosine_auc_similarity_distribution
            value: 0.21125989323367672
            name: Cosine Auc Similarity Distribution

Redis fine-tuned BiEncoder model for semantic caching on LangCache

This is a sentence-transformers model finetuned from answerdotai/ModernBERT-large on the LangCache Sentence Pairs (all) dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for sentence pair similarity.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: answerdotai/ModernBERT-large
  • Maximum Sequence Length: 100 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: LangCache Sentence Pairs (all)
  • Language: en
  • License: apache-2.0

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 100, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
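
The Pooling module above produces the sentence embedding by mean pooling: averaging the token embeddings of real (non-padding) tokens. As an illustrative sketch of that computation (the function name and shapes are ours, assuming the standard attention-mask-weighted mean used by Sentence Transformers):

import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 1024) hidden states from the transformer
    # attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(dim=1)   # sum embeddings of real tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)        # number of real tokens per sentence
    return summed / counts                          # (batch, 1024) sentence embeddings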

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("redis/langcache-embed-experimental")
# Run inference
sentences = [
    'According to him , the earth is the carrier of his artistic work , which is only integrated into the creative process by minimal changes .',
    'According to him , earth is the carrier of his artistic work being integrated into the creative process only by minimal changes .',
    'According to him , earth is the carrier of his creative work being integrated into the artistic process only by minimal changes .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9609, 0.4414],
#         [0.9609, 1.0000, 0.4395],
#         [0.4414, 0.4395, 1.0000]], dtype=torch.bfloat16)
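
Because this model targets semantic caching, a common pattern is to embed incoming queries and compare them against previously cached queries, treating any match above a similarity threshold as a cache hit. A minimal sketch (the 0.9 threshold and the in-memory cache are illustrative assumptions, not part of the model):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("redis/langcache-embed-experimental")

# Hypothetical cache of previously seen queries
cached_queries = [
    "How do I reset my password?",
    "What is the refund policy?",
]
cache_embeddings = model.encode(cached_queries)

def lookup(query: str, threshold: float = 0.9):
    """Return the index of the best cached match, or None on a cache miss."""
    scores = model.similarity(model.encode([query]), cache_embeddings)[0]
    best = int(scores.argmax())
    return best if float(scores[best]) >= threshold else None

print(lookup("How can I reset my password?"))  # expected: 0 (cache hit)
print(lookup("What is the weather today?"))    # expected: None (cache miss)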

Evaluation

Metrics

Custom Information Retrieval

  • Dataset: test
  • Evaluated with ir_evaluator.CustomInformationRetrievalEvaluator
Metric Value
cosine_accuracy@1 0.6071
cosine_precision@1 0.6071
cosine_recall@1 0.5886
cosine_ndcg@10 0.7755
cosine_mrr@1 0.6071
cosine_map@100 0.7291
cosine_auc_precision_cache_hit_ratio 0.3481
cosine_auc_similarity_distribution 0.2113
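
The CustomInformationRetrievalEvaluator above is project-specific, but the same accuracy@k / NDCG@k / MRR / MAP metric family can be reproduced with the stock InformationRetrievalEvaluator from Sentence Transformers. A toy sketch (the ids and texts are made up):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("redis/langcache-embed-experimental")

queries = {"q1": "Much of the film was shot on location in Los Angeles ."}
corpus = {
    "d1": "Much of the film was shot on location in Los Angeles and in nearby Burbank and Glendale .",
    "d2": "Traxxpad is a music application for Sony 's PlayStation Portable .",
}
relevant_docs = {"q1": {"d1"}}  # for each query, the set of relevant corpus ids

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="test")
results = evaluator(model)
print(results["test_cosine_ndcg@10"])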

Training Details

Training Dataset

LangCache Sentence Pairs (all)

  • Dataset: LangCache Sentence Pairs (all)
  • Size: 126,938 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min 8, mean 27.27, max 49 tokens
    • positive: string; min 8, mean 27.27, max 48 tokens
    • negative: string; min 7, mean 26.54, max 61 tokens
  • Samples:
    • anchor: The newer Punts are still very much in existence today and race in the same fleets as the older boats .
      positive: The newer punts are still very much in existence today and run in the same fleets as the older boats .
      negative: how can I get financial freedom as soon as possible?
    • anchor: The newer punts are still very much in existence today and run in the same fleets as the older boats .
      positive: The newer Punts are still very much in existence today and race in the same fleets as the older boats .
      negative: The older Punts are still very much in existence today and race in the same fleets as the newer boats .
    • anchor: Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .
      positive: Turner Valley , , was located at Turner Valley Bar N Ranch Airport , southwest of Turner Valley Bar N Ranch , Alberta , Canada .
      negative: Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .
  • Loss: losses.ArcFaceInBatchLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
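
losses.ArcFaceInBatchLoss is a custom loss whose implementation is not shown here. Conceptually, an in-batch ArcFace-style loss scores each anchor against every in-batch positive with cosine similarity, adds an angular margin to the true pair, and applies a scaled softmax (scale = 20.0 above). A minimal sketch of that idea (the margin value is our assumption; the real loss may differ):

import torch
import torch.nn.functional as F

def arcface_in_batch_loss(anchors, positives, scale=20.0, margin=0.1):
    # Normalize so dot products are cosine similarities (similarity_fct = cos_sim)
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    cos = a @ p.T                                   # (batch, batch) similarity matrix
    # Additive angular margin on the diagonal (each anchor's true positive)
    theta = torch.acos(cos.diagonal().clamp(-1 + 1e-7, 1 - 1e-7))
    margined = torch.cos(theta + margin)
    eye = torch.eye(cos.size(0), dtype=torch.bool, device=cos.device)
    logits = torch.where(eye, torch.diag(margined), cos)
    # Every other in-batch positive acts as a negative for the anchor
    labels = torch.arange(cos.size(0), device=cos.device)
    return F.cross_entropy(scale * logits, labels)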
    

Evaluation Dataset

LangCache Sentence Pairs (all)

  • Dataset: LangCache Sentence Pairs (all)
  • Size: 126,938 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min 8, mean 27.27, max 49 tokens
    • positive: string; min 8, mean 27.27, max 48 tokens
    • negative: string; min 7, mean 26.54, max 61 tokens
  • Samples:
    • anchor: The newer Punts are still very much in existence today and race in the same fleets as the older boats .
      positive: The newer punts are still very much in existence today and run in the same fleets as the older boats .
      negative: how can I get financial freedom as soon as possible?
    • anchor: The newer punts are still very much in existence today and run in the same fleets as the older boats .
      positive: The newer Punts are still very much in existence today and race in the same fleets as the older boats .
      negative: The older Punts are still very much in existence today and race in the same fleets as the newer boats .
    • anchor: Turner Valley , was at the Turner Valley Bar N Ranch Airport , southwest of the Turner Valley Bar N Ranch , Alberta , Canada .
      positive: Turner Valley , , was located at Turner Valley Bar N Ranch Airport , southwest of Turner Valley Bar N Ranch , Alberta , Canada .
      negative: Turner Valley Bar N Ranch Airport , , was located at Turner Valley Bar N Ranch , southwest of Turner Valley , Alberta , Canada .
  • Loss: losses.ArcFaceInBatchLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 100
  • per_device_eval_batch_size: 100
  • weight_decay: 0.001
  • adam_beta2: 0.98
  • adam_epsilon: 1e-06
  • max_steps: 75000
  • warmup_ratio: 0.1
  • load_best_model_at_end: True
  • optim: stable_adamw
  • ddp_find_unused_parameters: False
  • push_to_hub: True
  • hub_model_id: redis/langcache-embed-experimental
  • batch_sampler: no_duplicates
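
As a sketch, these non-default values map directly onto SentenceTransformerTrainingArguments (the output_dir is an illustrative placeholder; everything else mirrors the list above):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="langcache-embed-experimental",  # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=100,
    per_device_eval_batch_size=100,
    weight_decay=0.001,
    adam_beta2=0.98,
    adam_epsilon=1e-6,
    max_steps=75_000,
    warmup_ratio=0.1,
    load_best_model_at_end=True,
    optim="stable_adamw",
    ddp_find_unused_parameters=False,
    push_to_hub=True,
    hub_model_id="redis/langcache-embed-experimental",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)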

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 100
  • per_device_eval_batch_size: 100
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.001
  • adam_beta1: 0.9
  • adam_beta2: 0.98
  • adam_epsilon: 1e-06
  • max_grad_norm: 1.0
  • num_train_epochs: 3.0
  • max_steps: 75000
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: stable_adamw
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: False
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: True
  • resume_from_checkpoint: None
  • hub_model_id: redis/langcache-embed-experimental
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Click to expand
Epoch Step Training Loss Validation Loss test_cosine_ndcg@10
-1 -1 - - 0.6274
0.0054 500 2.0433 0.5003 0.7156
0.0108 1000 0.2913 0.3804 0.7423
0.0162 1500 0.1876 0.3343 0.7526
0.0217 2000 0.1484 0.3172 0.7528
0.0271 2500 0.132 0.2945 0.7569
0.0325 3000 0.1161 0.2822 0.7636
0.0379 3500 0.1105 0.2918 0.7580
0.0433 4000 0.1072 0.2820 0.7597
0.0487 4500 0.1061 0.2483 0.7661
0.0542 5000 0.0991 0.2671 0.7600
0.0596 5500 0.0971 0.2843 0.7595
0.0650 6000 0.0953 0.2448 0.7640
0.0704 6500 0.1015 0.3021 0.7632
0.0758 7000 0.0985 0.2744 0.7616
0.0812 7500 0.1009 0.2764 0.7615
0.0866 8000 0.0984 0.2865 0.7608
0.0921 8500 0.0947 0.3062 0.7600
0.0975 9000 0.0914 0.2997 0.7584
0.1029 9500 0.0896 0.2484 0.7617
0.1083 10000 0.0846 0.2850 0.7594
0.1137 10500 0.0907 0.2896 0.7571
0.1191 11000 0.0859 0.2657 0.7599
0.1245 11500 0.0875 0.2509 0.7620
0.1300 12000 0.0849 0.2728 0.7620
0.1354 12500 0.0788 0.2707 0.7587
0.1408 13000 0.0804 0.2985 0.7567
0.1462 13500 0.0815 0.2526 0.7620
0.1516 14000 0.0783 0.2441 0.7655
0.1570 14500 0.0791 0.2707 0.7645
0.1625 15000 0.0797 0.2781 0.7576
0.1679 15500 0.077 0.2624 0.7595
0.1733 16000 0.0742 0.2882 0.7620
0.1787 16500 0.0739 0.2654 0.7630
0.1841 17000 0.0695 0.2832 0.7607
0.1895 17500 0.0726 0.2595 0.7627
0.1949 18000 0.0739 0.2376 0.7653
0.2004 18500 0.0751 0.2671 0.7652
0.2058 19000 0.0717 0.3013 0.7595
0.2112 19500 0.0696 0.2538 0.7671
0.2166 20000 0.0659 0.2569 0.7612
0.2220 20500 0.0669 0.2595 0.7648
0.2274 21000 0.0679 0.2231 0.7664
0.2328 21500 0.0657 0.2732 0.7636
0.2383 22000 0.0703 0.2658 0.7674
0.2437 22500 0.0636 0.2582 0.7676
0.2491 23000 0.0688 0.2586 0.7682
0.2545 23500 0.0598 0.2612 0.7675
0.2599 24000 0.0664 0.2581 0.7655
0.2653 24500 0.0621 0.2393 0.7642
0.2708 25000 0.0641 0.2309 0.7673
0.2762 25500 0.0624 0.2346 0.7700
0.2816 26000 0.0595 0.2179 0.7671
0.2870 26500 0.0605 0.2332 0.7664
0.2924 27000 0.0609 0.2227 0.7678
0.2978 27500 0.0621 0.2312 0.7688
0.3032 28000 0.0626 0.2404 0.7680
0.3087 28500 0.063 0.2429 0.7672
0.3141 29000 0.0601 0.2275 0.7671
0.3195 29500 0.0617 0.2235 0.7663
0.3249 30000 0.0581 0.2370 0.7698
0.3303 30500 0.06 0.2450 0.7652
0.3357 31000 0.0591 0.2851 0.7638
0.3411 31500 0.0585 0.2718 0.7664
0.3466 32000 0.0563 0.2532 0.7664
0.3520 32500 0.059 0.2330 0.7689
0.3574 33000 0.0545 0.2158 0.7695
0.3628 33500 0.0567 0.2263 0.7672
0.3682 34000 0.0566 0.2338 0.7682
0.3736 34500 0.0586 0.2244 0.7696
0.3791 35000 0.0559 0.2474 0.7671
0.3845 35500 0.053 0.2332 0.7687
0.3899 36000 0.0507 0.2258 0.7679
0.3953 36500 0.0527 0.2240 0.7712
0.4007 37000 0.0545 0.2229 0.7700
0.4061 37500 0.0558 0.2119 0.7704
0.4115 38000 0.0538 0.2611 0.7693
0.4170 38500 0.0549 0.2336 0.7686
0.4224 39000 0.0501 0.2316 0.7687
0.4278 39500 0.0497 0.2289 0.7697
0.4332 40000 0.0512 0.2299 0.7683
0.4386 40500 0.0511 0.2654 0.7704
0.4440 41000 0.0498 0.2272 0.7731
0.4495 41500 0.053 0.2327 0.7696
0.4549 42000 0.0487 0.2380 0.7715
0.4603 42500 0.0518 0.2230 0.7724
0.4657 43000 0.0488 0.2249 0.7703
0.4711 43500 0.0529 0.2452 0.7716
0.4765 44000 0.0497 0.2341 0.7720
0.4819 44500 0.0486 0.2480 0.7696
0.4874 45000 0.0518 0.2349 0.7715
0.4928 45500 0.0471 0.2237 0.7720
0.4982 46000 0.0483 0.2299 0.7712
0.5036 46500 0.0462 0.2184 0.7705
0.5090 47000 0.0497 0.2335 0.7718
0.5144 47500 0.05 0.2302 0.7697
0.5198 48000 0.0488 0.2252 0.7701
0.5253 48500 0.045 0.2291 0.7687
0.5307 49000 0.048 0.2135 0.7698
0.5361 49500 0.0442 0.2215 0.7704
0.5415 50000 0.0479 0.2233 0.7707
0.5469 50500 0.0464 0.2275 0.7713
0.5523 51000 0.0454 0.2175 0.7717
0.5578 51500 0.0477 0.2152 0.7719
0.5632 52000 0.0463 0.2364 0.7701
0.5686 52500 0.0433 0.2430 0.7736
0.5740 53000 0.0454 0.2328 0.7708
0.5794 53500 0.0472 0.2283 0.7722
0.5848 54000 0.0447 0.2320 0.7720
0.5902 54500 0.0445 0.2404 0.7689
0.5957 55000 0.0429 0.2353 0.7693
0.6011 55500 0.0422 0.2366 0.7722
0.6065 56000 0.0436 0.2321 0.7720
0.6119 56500 0.0453 0.2250 0.7723
0.6173 57000 0.0431 0.2219 0.7733
0.6227 57500 0.0421 0.2244 0.7723
0.6281 58000 0.0434 0.2137 0.7728
0.6336 58500 0.0416 0.2181 0.7743
0.6390 59000 0.0412 0.2230 0.7717
0.6444 59500 0.0436 0.2116 0.7737
0.6498 60000 0.0404 0.2114 0.7736
0.6552 60500 0.041 0.2095 0.7736
0.6606 61000 0.0408 0.2079 0.7741
0.6661 61500 0.0408 0.2040 0.7756
0.6715 62000 0.0404 0.2098 0.7733
0.6769 62500 0.0418 0.2105 0.7741
0.6823 63000 0.0402 0.2081 0.7741
0.6877 63500 0.0394 0.2120 0.7742
0.6931 64000 0.0418 0.2129 0.7742
0.6985 64500 0.0406 0.2145 0.7753
0.7040 65000 0.0382 0.2257 0.7741
0.7094 65500 0.0373 0.2250 0.7756
0.7148 66000 0.0382 0.2269 0.7732
0.7202 66500 0.0405 0.2087 0.7764
0.7256 67000 0.042 0.2114 0.7753
0.7310 67500 0.0389 0.2138 0.7748
0.7364 68000 0.0339 0.2084 0.7761
0.7419 68500 0.0379 0.2090 0.7760
0.7473 69000 0.0369 0.2161 0.7742
0.7527 69500 0.0354 0.2226 0.7748
0.7581 70000 0.0396 0.2191 0.7753
0.7635 70500 0.0356 0.2195 0.7759
0.7689 71000 0.0359 0.2182 0.7760
0.7744 71500 0.0389 0.2187 0.7753
0.7798 72000 0.0366 0.2194 0.7753
0.7852 72500 0.0351 0.2198 0.7749
0.7906 73000 0.038 0.2175 0.7754
0.7960 73500 0.0378 0.2172 0.7756
0.8014 74000 0.0376 0.2174 0.7754
0.8068 74500 0.038 0.2176 0.7753
0.8123 75000 0.0379 0.2174 0.7755
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 5.1.0
  • Transformers: 4.56.0
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.10.1
  • Datasets: 4.0.0
  • Tokenizers: 0.22.0
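
To reproduce this environment, the versions above can be pinned (the PyTorch CUDA build will depend on your system):

pip install sentence-transformers==5.1.0 transformers==4.56.0 accelerate==1.10.1 datasets==4.0.0 tokenizers==0.22.0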

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}