metadata
title: NER Explorer Tool
emoji: 📚
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 5.36.2
app_file: app.py
pinned: false
license: mit

Named Entity Recognition (NER) Explorer Tool

Background

This is a web-based interactive tool designed specifically for exploring Named Entity Recognition (NER) in practice. It grew out of the Extracting Keywords from Crowdsourced Collections project, funded by Digital Scholarship at Oxford (DiSc).

Overview

The NER Explorer Tool is an educational and exploratory interface that lets users 'play' with different NER models and approaches. It was created to make this Natural Language Processing (NLP) technique more accessible to professionals, volunteers and researchers in Digital Humanities (DH) and in Galleries, Libraries, Archives and Museums (GLAM), who might otherwise lack the means or opportunity to explore what NER can do. Simply paste in some text you would like to test the models on, or click one of the provided examples if you would rather not use your own text.

Why this tool?

During our short exploratory research project on keyword extraction from crowdsourced collections, we found that NER has real potential for enhancing search and discovery in digital archives while allowing records to 'speak for themselves'.

It can be difficult to know where to start when selecting NER models, because they work differently and can be used to find different things. Here we provide access to the models that performed best among those we tested on a small sample, while trying to be clear that no model is perfect.

We also wanted to raise awareness of zero-shot NER models (e.g. GLiNER), which can be more flexible than models with pre-defined entity types (e.g. spaCy), and to show how the two kinds of model can be used together.
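As a rough illustration of what "zero-shot" means here, the sketch below asks a model for entity types chosen at run time rather than fixed in advance. It assumes the `gliner` Python package and its `GLiNER.from_pretrained` and `predict_entities` calls; the helper names `load_gliner` and `find_custom_entities` are hypothetical, and this is not the tool's own code.

```python
from typing import Any, Dict, List


def load_gliner(model_name: str = "knowledgator/modern-gliner-bi-large-v1.0") -> Any:
    """Load a GLiNER model. Imported lazily: requires `pip install gliner`."""
    from gliner import GLiNER
    return GLiNER.from_pretrained(model_name)


def find_custom_entities(model: Any, text: str, labels: List[str],
                         threshold: float = 0.5) -> List[Dict[str, Any]]:
    """Ask a zero-shot model for entity types chosen at run time.

    `model` is anything with a GLiNER-style `predict_entities` method,
    which returns dicts with "text", "label", "start", "end" and "score".
    """
    found = model.predict_entities(text, labels, threshold=threshold)
    # Return the entities in reading order.
    return sorted(found, key=lambda entity: entity["start"])
```

For example, `find_custom_entities(load_gliner(), text, ["ship", "occupation"])` would search for labels that a model with a fixed label set is unlikely to offer. The labels here are only illustrative.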

Models included in the Explorer tool:

  • spacy_en_core_web_trf - spaCy's transformer-based model
  • flair_ner-large - Flair's large English NER model
  • flair_ner-ontonotes-large - Flair's OntoNotes-based model
  • gliner_knowledgator/modern-gliner-bi-large-v1.0 - Modern zero-shot GLiNER model

Key features:

  • Highlighted Text: See entities highlighted directly in your text with color-coded labels
  • Split-Color Highlighting: Entities identified by both common NER models AND custom GLiNER searches are shown with distinctive split-color highlighting (marked with 🤝)
  • Detailed Tables: Examine all identified entities with confidence scores and source attribution
  • Adjustable Confidence Threshold: Control how confident a model must be before it reports an entity (0.1-0.9)
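The threshold and split-color features can be pictured with a minimal sketch. It is not the tool's implementation: the `Entity` record, the function names and the exact-span matching rule are all assumptions made for illustration, and the tool may match overlapping spans differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    text: str
    label: str
    start: int    # character offset where the entity begins
    end: int      # character offset just past the entity
    score: float  # model confidence between 0 and 1
    source: str   # name of the model that produced the entity


def apply_threshold(entities: List[Entity], threshold: float) -> List[Entity]:
    """Keep only entities whose confidence reaches the threshold."""
    if not 0.1 <= threshold <= 0.9:
        raise ValueError("threshold must lie between 0.1 and 0.9")
    return [e for e in entities if e.score >= threshold]


def find_shared_spans(common: List[Entity],
                      custom: List[Entity]) -> List[Tuple[Entity, Entity]]:
    """Pair entities from both groups that cover exactly the same characters.

    Assumption: spans must match exactly to count as shared.
    """
    by_span = {(e.start, e.end): e for e in common}
    return [(by_span[(c.start, c.end)], c)
            for c in custom if (c.start, c.end) in by_span]
```

Entities returned by `find_shared_spans` are the ones that would qualify for split-color highlighting under this matching rule. Raising the threshold before pairing reduces how many such pairs remain.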

Important

Please note that this tool is designed for exploratory and educational purposes. It is not designed or recommended for production use with very long texts (e.g. more than 5,000 characters), large collections or sensitive materials. If you work with these NER models on such material in other environments, additional testing, validation and ethical review are strongly recommended.

If you have any questions about this tool, please email: catherine.conisbee@bodleian.ox.ac.uk

See also: main project repository
