maslionok committed on
Commit
9d4f5d0
·
1 Parent(s): 9f9797d

updated a bit the code

Files changed (1)
  1. app.py +9 -3
app.py CHANGED
@@ -66,9 +66,15 @@ with gr.Blocks(title="Solr Normalization Demo") as demo:
     # Add Solr explanation accordion
     with gr.Accordion("❓ What is Solr?", open=False) as solr_info:
         gr.Markdown("""
-        **Solr** is the platform that provides search capabilities in Impresso. Several preprocessing steps must be undertaken to prepare data to be searchable in Solr.
-
-        These steps are common in Natural Language Processing pipelines, as they help with normalising textual data by, for example, making the whole text lowercase. This makes possible non case-sensitive searches, where if you either write 'Dog' or 'dog', you can get the same results.
+        **Solr is the search engine platform used to power fast and flexible information retrieval.**
+        It indexes large collections of text and allows users to query them efficiently, returning the most relevant results.
+
+        Before data can be used in Solr, it must go through several **preprocessing and indexing steps**.
+        These include tokenization (splitting text into words), lowercasing, stopword removal (e.g., ignoring common words like "the" or "and"), and stemming or lemmatization (reducing words to their root forms).
+
+        Such steps are common in **Natural Language Processing (NLP)** pipelines, as they help standardize text and make search more robust.
+        For example, thanks to normalization, a search for "running" can also match documents containing "run."
+        Similarly, lowercasing ensures that "History" and "history" are treated as the same word, making searches case-insensitive.
         """)
 
     with gr.Row():
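
The new accordion text names four preprocessing steps: tokenization, lowercasing, stopword removal, and stemming. A minimal Python sketch of such a pipeline might look like the following. It is only an illustration and is not how Solr or this app implements analysis: the `STOPWORDS` set, `naive_stem`, and `normalize` are hypothetical names introduced here, and the suffix stripper is a toy stand-in for a real stemmer.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stopword list (assumption for this sketch);
# real analyzers use much longer, language-specific lists.
STOPWORDS = {"the", "and", "a", "an", "of", "in", "to"}


def naive_stem(token: str) -> str:
    """Toy suffix stripper; NOT a real stemming algorithm."""
    if token.endswith("ing") and len(token) > 5:
        base = token[:-3]
        # Undo consonant doubling, e.g. "runn" -> "run".
        if len(base) >= 2 and base[-1] == base[-2]:
            base = base[:-1]
        return base
    # Strip a plural "s", but leave words like "glass" alone.
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def normalize(text: str) -> List[str]:
    """Apply the four steps in order and return the normalized tokens."""
    tokens = re.findall(r"\w+", text)                   # tokenization
    tokens = [t.lower() for t in tokens]                # lowercasing
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return [naive_stem(t) for t in tokens]              # (toy) stemming


if __name__ == "__main__":
    print(normalize("Running"))
    print(normalize("The History of Dogs"))
```

Under these assumptions, `normalize("Running")` and `normalize("run")` produce the same token list, as do `normalize("History")` and `normalize("history")`, which mirrors the two examples given in the accordion text.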