maslionok
committed on
Commit
·
6979b00
1
Parent(s):
d641754
small fix
app.py
CHANGED
@@ -82,11 +82,11 @@ with gr.Blocks(title="Solr Normalization Demo") as demo:
 """
 - **Tokenization**: Splits text into individual tokens
 - **Tokenfilter**: Applies various transformations like:
--
+- elision: removes leading apostrophes and articles in languages like French and Italian
 - lowercase: converts to lowercase
 - asciifolding: converts accented characters to ASCII
 - stop: removes common stopwords
-- stemmer: reduces words to
+- stemmer: reduces words to a common base or stem, improving recall in search
 - normalization: applies language-specific normalization
 """
 )
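
Note: the filter chain described in this help text can be approximated in plain Python to see what each step does to a token. The sketch below is not Solr's implementation and is not part of app.py. The function names and the tiny word lists are hypothetical, the ASCII folding only strips combining accents (Solr's filter covers more characters), and stemming is left out because it needs a language-specific algorithm.

```python
import unicodedata

# Hypothetical, deliberately tiny word lists; real analyzers ship much larger ones.
ELISION_ARTICLES = {"l", "d", "j", "qu", "n", "s", "m", "t", "c"}
STOPWORDS = {"le", "la", "les", "de", "des", "et", "un", "une"}

PUNCTUATION = ".,;:!?\"()"


def tokenize(text):
    """Split on whitespace and strip surrounding punctuation (crude stand-in for a tokenizer)."""
    stripped = (tok.strip(PUNCTUATION) for tok in text.split())
    return [tok for tok in stripped if tok]


def elision(token):
    """Drop a leading elided article such as l' or d'."""
    head, sep, tail = token.replace("\u2019", "'").partition("'")
    if sep and tail and head.lower() in ELISION_ARTICLES:
        return tail
    return token


def ascii_fold(token):
    """Remove combining accents after Unicode decomposition (approximation of asciifolding)."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Apply elision, lowercase, asciifolding, and stop filtering in that order."""
    tokens = []
    for tok in tokenize(text):
        tok = ascii_fold(elision(tok).lower())
        if tok not in STOPWORDS:
            tokens.append(tok)
    return tokens


print(analyze("L'école et les étudiants"))
```

The order mirrors the list in the help text: elision runs before lowercasing and folding, and stopword removal runs last so that it sees the normalized form of each token.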