---
title: Solr Normalization Demo
emoji: 🧹
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
short_description: Understand the normalization of texts during indexing
---

# Solr Normalization Demo

This Space demonstrates how text is normalized in the **Impresso** project by replicating Solr's text processing.

The pipeline passes text through a chain of analyzers, including tokenization, stopword removal, and language-specific transformations, to prepare it for search and analysis.
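
To make the analyzer chain concrete, here is a minimal Python sketch of such a pipeline. It is not the Impresso or Solr implementation: the function names, the tiny stopword lists, and the choice of lowercasing plus accent folding as the "language-specific" step are all assumptions made for illustration.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stopword lists (illustration only).
STOPWORDS = {
    "en": {"the", "of", "and", "a", "to"},
    "fr": {"le", "la", "les", "de", "et"},
    "de": {"der", "die", "das", "und", "nicht"},
}


def tokenize(text):
    """Split text into word tokens on non-word characters."""
    # Compose characters first so accented letters stay inside one token.
    composed = unicodedata.normalize("NFC", text)
    return re.findall(r"\w+", composed)


def fold_accents(token):
    """Strip diacritics, e.g. 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(text, lang):
    """Tokenize, lowercase, drop stopwords, then fold accents."""
    stopwords = STOPWORDS.get(lang, set())
    tokens = [t.lower() for t in tokenize(text)]
    tokens = [t for t in tokens if t not in stopwords]
    return [fold_accents(t) for t in tokens]


print(normalize("Les Élections de la République", "fr"))
# → ['elections', 'republique']
```

The order of the steps matters in this sketch: stopwords are removed before accent folding, so the stopword lists must contain the accented forms.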

## Features
- Multi-language support (German, French, Spanish, Italian, Portuguese, Dutch, English)
- Automatic language detection
- Detailed analyzer pipeline visualization
- Stopword detection and removal
- Token normalization
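
As an illustration of how automatic language detection can work, the sketch below scores each language by how many of its stopwords appear in the input. This is a simple heuristic chosen for the example, not necessarily the detector this demo uses; the stopword lists, the `detect_language` name, and the `default` fallback are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stopword lists (illustration only).
STOPWORDS = {
    "en": {"the", "of", "and", "a", "to"},
    "fr": {"le", "la", "les", "de", "et"},
    "de": {"der", "die", "das", "und", "nicht"},
}


def detect_language(text, default="en"):
    """Return the language whose stopwords overlap most with the text."""
    tokens = {t.lower() for t in re.findall(r"\w+", text)}
    scores = {lang: len(tokens & words) for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to the default when no stopword of any language matched.
    return best if scores[best] > 0 else default


print(detect_language("Der Hund und die Katze"))  # → de
print(detect_language("Le chat et la souris"))    # → fr
```

A heuristic like this is unreliable on very short inputs, which is why the sketch falls back to a default language when nothing matches.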

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference