Commit 9d4f5d0 · committed by maslionok
1 Parent(s): 9f9797d
updated a bit the code
app.py
CHANGED
@@ -66,9 +66,15 @@ with gr.Blocks(title="Solr Normalization Demo") as demo:
     # Add Solr explanation accordion
     with gr.Accordion("❓ What is Solr?", open=False) as solr_info:
         gr.Markdown("""
-        **Solr
-
-
+        **Solr is the search engine platform used to power fast and flexible information retrieval.**
+        It indexes large collections of text and allows users to query them efficiently, returning the most relevant results.
+
+        Before data can be used in Solr, it must go through several **preprocessing and indexing steps**.
+        These include tokenization (splitting text into words), lowercasing, stopword removal (e.g., ignoring common words like "the" or "and"), and stemming or lemmatization (reducing words to their root forms).
+
+        Such steps are common in **Natural Language Processing (NLP)** pipelines, as they help standardize text and make search more robust.
+        For example, thanks to normalization, a search for "running" can also match documents containing "run."
+        Similarly, lowercasing ensures that "History" and "history" are treated as the same word, making searches case-insensitive.
         """)
 
     with gr.Row():
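For reviewers who want to see the behavior the new Markdown text describes, the four steps it names (tokenization, lowercasing, stopword removal, stemming) can be sketched in plain Python. This is only an illustration under stated assumptions: it is not part of this commit and it is not how Solr's analyzer chain is implemented. The `STOPWORDS` set, `naive_stem`, and `normalize` are hypothetical names introduced here, and the suffix-stripping rule is a toy stand-in for a real stemmer.

```python
import re

# Hypothetical toy stopword list; a real Solr setup reads its own from configuration.
STOPWORDS = {"the", "and", "a", "an", "of", "in", "is", "to"}


def naive_stem(token: str) -> str:
    """Crude suffix stripping, standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            # Undo consonant doubling, e.g. "runn" -> "run".
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
                stem = stem[:-1]
            return stem
    return token


def normalize(text: str) -> list[str]:
    """Apply the four steps named in the accordion text, in order."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)          # tokenization
    tokens = [t.lower() for t in tokens]                # lowercasing
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return [naive_stem(t) for t in tokens]              # stemming


if __name__ == "__main__":
    print(normalize("Running and the History"))
```

With this sketch, a query term such as "running" and a document term such as "run" normalize to the same token, and "History" and "history" become identical, which is the matching behavior the accordion text claims.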