Upload folder using huggingface_hub
Files changed:
- README.md (+7, -2)
- evaluation_scripts/eval_bright_short.sh (+49, -0)
    	
README.md  CHANGED

@@ -165,7 +165,7 @@ print(scores.cpu().tolist())
 
 ## Evaluation
 
-BGE-Reasoner-Embed-Qwen3-8B-0923 exhibits strong performance in reasoning-intensive retrieval tasks, as demonstrated by its results (nDCG@10 = 37.1 using original query) on the BRIGHT benchmark.
+BGE-Reasoner-Embed-Qwen3-8B-0923 exhibits strong performance in reasoning-intensive retrieval tasks, as demonstrated by its results (nDCG@10 = 37.1 using original query) on the BRIGHT benchmark. You can reproduce the evaluation results using [this script](https://huggingface.co/BAAI/bge-reasoner-embed-qwen3-8b-0923/tree/main/evaluation_scripts/eval_bright_short.sh) (see also the corresponding script in [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding/blob/master/examples/evaluation/bright/eval_bright_short.sh)).
 
 <img src="./imgs/bright-performance.png" alt="BRIGHT Performance" style="zoom:200%;" />
 
@@ -194,5 +194,10 @@ Note:
 If you find this repository useful, please consider giving a star :star: and citation
 
 ```
-
+@article{chen2025reasonembed,
+  title={ReasonEmbed: Enhanced Text Embeddings for Reasoning-Intensive Document Retrieval},
+  author={Chen, Jianlyu and Lan, Junwei and Li, Chaofan and Lian, Defu and Liu, Zheng},
+  journal={arXiv preprint arXiv:2510.08252},
+  year={2025}
+}
 ```
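The reproduction pointer added above refers to the script committed below. As a rough end-to-end sketch (not part of this commit; it assumes a FlagEmbedding installation that provides the `FlagEmbedding.evaluation.bright` entry point the script invokes, and the 8-GPU `--devices` layout configured there):

```
# Hedged reproduction sketch, not from the commit itself. Assumes the PyPI
# FlagEmbedding release ships the BRIGHT evaluation module and that 8 CUDA
# devices are available; otherwise edit --devices in the script first.
pip install -U FlagEmbedding huggingface_hub
huggingface-cli download BAAI/bge-reasoner-embed-qwen3-8b-0923 \
    evaluation_scripts/eval_bright_short.sh --local-dir .
bash evaluation_scripts/eval_bright_short.sh
```
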
    	
evaluation_scripts/eval_bright_short.sh  ADDED

# Use the default Hugging Face hub cache if HF_HUB_CACHE is not already set.
if [ -z "$HF_HUB_CACHE" ]; then
    export HF_HUB_CACHE="$HOME/.cache/huggingface/hub"
fi

# full datasets
dataset_names="biology earth_science economics psychology robotics stackoverflow sustainable_living leetcode pony aops theoremqa_questions theoremqa_theorems"

# Embedder settings: BGE-Reasoner-Embed-Qwen3-8B-0923 with last-token pooling,
# 8192-token query/passage windows, run on 8 GPUs (cuda:0 through cuda:7).
model_args="\
    --embedder_name_or_path BAAI/bge-reasoner-embed-qwen3-8b-0923 \
    --embedder_model_class decoder-only-base \
    --query_instruction_format_for_retrieval 'Instruct: {}\nQuery: {}' \
    --pooling_method last_token \
    --devices cuda:0 cuda:1 cuda:2 cuda:3 cuda:4 cuda:5 cuda:6 cuda:7 \
    --cache_dir $HF_HUB_CACHE \
    --embedder_batch_size 8 \
    --embedder_query_max_length 8192 \
    --embedder_passage_max_length 8192 \
"

# Query splits to evaluate: "examples" (original queries) and "gpt4_reason" (GPT-4 reasoning-style queries).
split_list=("examples" "gpt4_reason")

for split in "${split_list[@]}"; do
    eval_args="\
        --task_type short \
        --use_special_instructions True \
        --eval_name bright_short \
        --dataset_dir ./bright_short/data \
        --dataset_names $dataset_names \
        --splits $split \
        --corpus_embd_save_dir ./bright_short/corpus_embd \
        --output_dir ./bright_short/search_results/$split \
        --search_top_k 2000 \
        --cache_path $HF_HUB_CACHE \
        --overwrite False \
        --k_values 1 10 100 \
        --eval_output_method markdown \
        --eval_output_path ./bright_short/eval_results_$split.md \
        --eval_metrics ndcg_at_10 recall_at_10 recall_at_100 \
    "

    cmd="python -m FlagEmbedding.evaluation.bright \
        $eval_args \
        $model_args \
    "

    echo $cmd
    eval $cmd

done
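
Given the `--output_dir`, `--eval_output_method markdown`, and `--eval_output_path` arguments above, each loop iteration writes its ranked search results under `./bright_short/search_results/<split>` and a per-split markdown metrics table (nDCG@10, Recall@10, Recall@100). Once a run completes, the summaries can be inspected with, for example:

```
# Per-split metric summaries written by the script (paths set via --eval_output_path).
cat ./bright_short/eval_results_examples.md
cat ./bright_short/eval_results_gpt4_reason.md
```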
