Commit: 9ddbb93
Parent: 80d7919
include chrf

Changed files:
- generation_evaluator.py +13 -0
- requirements.txt +3 -1
generation_evaluator.py
CHANGED

@@ -82,6 +82,9 @@ and then employing another pre-training phrase using synthetic data. Finally it
 it for your specific application (the latter is expected to perform better).
 See the project's README at https://github.com/google-research/bleurt#readme for more information.
 
+ChrF and ChrF++ are two MT evaluation metrics. They both use the F-score statistic for character n-gram matches,
+and ChrF++ adds word n-grams as well, which correlates more strongly with direct assessment. We use the implementation
+that is already present in sacrebleu.
 """
 
 _KWARGS_DESCRIPTION = """
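For context on the metric being added: the sketch below shows what the sacrebleu implementation referenced in the new docstring computes. It is a minimal sketch assuming only sacrebleu (added to requirements.txt in this commit); the sentences are invented examples, not project data.

# Minimal sketch, assuming sacrebleu; sentences are invented examples.
from sacrebleu.metrics import CHRF

hypotheses = ["the cat sat on the mat"]
references = [["the cat is sitting on the mat"]]  # a single reference stream

chrf = CHRF()                 # defaults: char_order=6, word_order=0, beta=2
chrf_pp = CHRF(word_order=2)  # word_order=2 turns chrF into chrF++

print(chrf.corpus_score(hypotheses, references))     # e.g. "chrF2 = ..."
print(chrf_pp.corpus_score(hypotheses, references))  # e.g. "chrF2++ = ..."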
@@ -118,6 +121,12 @@ BERT_SCORE:{
 },
 BLEURT:{
     "scores": List of scores.
+},
+CHRF:{
+    'score' (float): The chrF (chrF++) score,
+    'char_order' (int): The character n-gram order,
+    'word_order' (int): The word n-gram order. If it equals 2, the metric is referred to as chrF++,
+    'beta' (int): Determines the importance of recall w.r.t. precision
 }
 """
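To make the documented fields concrete: a short sketch (assuming the evaluate package pinned in requirements.txt; strings are invented examples) showing that compute() returns exactly these four keys, and that word_order=2 is what the docstring calls chrF++:

# Sketch of the result dict documented above; example data is invented.
import evaluate

chrf = evaluate.load("chrf")

plain = chrf.compute(
    predictions=["the cat sat on the mat"],
    references=[["the cat is sitting on the mat"]],
)
# e.g. {'score': <float>, 'char_order': 6, 'word_order': 0, 'beta': 2}

plus_plus = chrf.compute(
    predictions=["the cat sat on the mat"],
    references=[["the cat is sitting on the mat"]],
    word_order=2,  # reported as chrF++ per the docstring above
)
# e.g. {'score': <float>, 'char_order': 6, 'word_order': 2, 'beta': 2}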
@@ -180,6 +189,9 @@ class GenerationEvaluator(evaluate.Metric):
 
         mean_bleurt_score = np.mean(bleurt_results['scores'])
         bleurt_results['scores'] = round(mean_bleurt_score, 4)
+
+        chrf = evaluate.load("chrf")
+        chrf_results = chrf.compute(predictions=predictions, references=references)
 
         return {
             "ROUGE": rouge_results,
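Note the asymmetry with BLEURT just above: BLEURT yields per-example scores that the existing code collapses into a rounded mean, whereas the chrF score comes back from sacrebleu already aggregated over the whole corpus, so no averaging step is needed. If one wanted to mirror the rounding as well, a hypothetical follow-up line (not part of this commit) would be:

# Hypothetical, NOT in this commit: round the corpus-level chrF score to
# match the treatment of the BLEURT mean above. `chrf_results` is the dict
# produced by chrf.compute() in the added lines.
chrf_results['score'] = round(chrf_results['score'], 4)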
@@ -187,4 +199,5 @@ class GenerationEvaluator(evaluate.Metric):
             "EXACT_MATCH": exact_match_results,
             "BERT_SCORE": bert_score_results,
             "BLEURT": bleurt_results,
+            "CHRF": chrf_results
         }
requirements.txt
CHANGED

@@ -4,4 +4,6 @@ scikit-learn
 gradio
 bert_score
 git+https://github.com/google-research/bleurt.git
-numpy
+numpy
+git+https://github.com/huggingface/evaluate@a4bdc10c48a450b978d91389a48dbb5297835c7d
+sacrebleu
|