Space: impresso-project/solr-normalization-demo

maslionok committed on Sep 24

Commit 9d4f5d0 (1 parent: 9f9797d)

updated a bit the code

Files changed (1)

app.py CHANGED

@@ -66,9 +66,15 @@ with gr.Blocks(title="Solr Normalization Demo") as demo:
     # Add Solr explanation accordion
     with gr.Accordion("❓ What is Solr?", open=False) as solr_info:
         gr.Markdown("""
-        **Solr** is the platform that provides search capabilities in Impresso. Several preprocessing steps must be undertaken to prepare data to be searchable in Solr.
-        These steps are common in Natural Language Processing pipelines, as they help with normalising textual data by, for example, making the whole text lowercase. This makes possible non case-sensitive searches, where if you either write 'Dog' or 'dog', you can get the same results.
+        **Solr is the search engine platform used to power fast and flexible information retrieval.**
+        It indexes large collections of text and allows users to query them efficiently, returning the most relevant results.
+        Before data can be used in Solr, it must go through several **preprocessing and indexing steps**.
+        These include tokenization (splitting text into words), lowercasing, stopword removal (e.g., ignoring common words like "the" or "and"), and stemming or lemmatization (reducing words to their root forms).
+        Such steps are common in **Natural Language Processing (NLP)** pipelines, as they help standardize text and make search more robust.
+        For example, thanks to normalization, a search for "running" can also match documents containing "run."
+        Similarly, lowercasing ensures that "History" and "history" are treated as the same word, making searches case-insensitive.
         """)
     with gr.Row():
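The preprocessing steps described in the new Markdown text (tokenization, lowercasing, stopword removal, stemming) can be illustrated with a minimal, self-contained sketch in plain Python. It does not use Solr or any Solr API. In Solr itself these steps are configured as an analyzer chain (a tokenizer followed by filters) in the schema rather than written by hand. Everything here is a hypothetical illustration: the `STOPWORDS` set, the crude `toy_stem` suffix-stripper (a stand-in, not the Porter or Snowball stemmer), and the helper functions `analyze`, `build_index` and `search`.

```python
import re
from collections import defaultdict

# Hypothetical stopword list for illustration only; real Solr setups
# load language-specific stopword lists from configuration files.
STOPWORDS = {"the", "and", "a", "an", "of", "to", "in"}


def toy_stem(token: str) -> str:
    """Crude suffix stripper, NOT Porter/Snowball: removes one common English
    suffix and undoes a doubled final consonant (e.g. "runn" -> "run")."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            stem = token[: -len(suffix)]
            # Keep doubled vowels and "ll"/"ss" (e.g. "falling" -> "fall")
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return token


def analyze(text: str) -> list[str]:
    """Apply the preprocessing steps in order and return the index terms."""
    tokens = re.findall(r"\w+", text)                   # 1. tokenization
    tokens = [t.lower() for t in tokens]                # 2. lowercasing
    tokens = [t for t in tokens if t not in STOPWORDS]  # 3. stopword removal
    return [toy_stem(t) for t in tokens]                # 4. (toy) stemming


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: analyzed term -> ids of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of documents containing every analyzed query term."""
    terms = analyze(query)  # queries go through the same analysis as documents
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


docs = {
    "d1": "The Dog was running in the park.",
    "d2": "A short history of the press.",
    "d3": "Dogs and cats.",
}
index = build_index(docs)

print(analyze(docs["d1"]))               # ['dog', 'was', 'run', 'park']
print(sorted(search(index, "Dog")))      # ['d1', 'd3']
print(sorted(search(index, "run")))      # ['d1']
print(sorted(search(index, "History")))  # ['d2']
```

The key design point is that the query is passed through the same `analyze` chain as the documents: that is why "Dog" matches "dog" and "Dogs", "History" matches "history", and "run" matches "running", mirroring the examples in the Markdown text.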