maslionok committed on
Commit 6979b00 · 1 Parent(s): d641754
Files changed (1)
  1. app.py +2 -2
app.py CHANGED
@@ -82,11 +82,11 @@ with gr.Blocks(title="Solr Normalization Demo") as demo:
  """
  - **Tokenization**: Splits text into individual tokens
  - **Tokenfilter**: Applies various transformations like:
- - elison: removes apostrophes
+ - elision: removes leading apostrophes and articles in languages like French and Italian
  - lowercase: converts to lowercase
  - asciifolding: converts accented characters to ASCII
  - stop: removes common stopwords
- - stemmer: reduces words to their root form
+ - stemmer: reduces words to a common base or stem, improving recall in search
  - normalization: applies language-specific normalization
  """
  )
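
For reviewers who want to see what the reworded descriptions mean in practice, here is a minimal pure-Python sketch of the elision, lowercase, asciifolding, and stop steps applied in that order. It is not Solr's implementation and is not part of this commit: `ELIDED_ARTICLES`, `STOPWORDS`, and every function name are hypothetical, and the word lists are tiny French samples chosen for illustration. Stemming and language-specific normalization are left out because they depend on a real per-language stemmer.

```python
import unicodedata

# Hypothetical sample data for illustration only; a real analyzer loads
# much larger, per-language lists from its configuration.
ELIDED_ARTICLES = {"l", "d", "j", "m", "t", "s", "n", "c", "qu"}
STOPWORDS = {"le", "la", "les", "de", "des", "et", "a"}


def tokenize(text):
    """Split on whitespace and strip punctuation around each token."""
    tokens = (raw.strip(".,;:!?\"()") for raw in text.split())
    return [t for t in tokens if t]


def elision(token):
    """Drop a leading elided article such as l' or d' (French, Italian)."""
    head, sep, tail = token.replace("\u2019", "'").partition("'")
    if sep and tail and head.lower() in ELIDED_ARTICLES:
        return tail
    return token


def asciifold(token):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(text):
    """Run tokenization, then elision, lowercase, asciifolding, stop."""
    result = []
    for token in tokenize(text):
        token = asciifold(elision(token).lower())
        if token not in STOPWORDS:
            result.append(token)
    return result


print(normalize("L'école et l'Été à Paris"))
```

Note that the stopword check runs after folding, so the sample list stores folded forms (`a` rather than `à`). An apostrophe that does not follow a known article, as in `aujourd'hui`, is left untouched by `elision`.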