twitter-roberta-large-sensitive-binary

This is a RoBERTa-large model trained on 154M tweets posted up to the end of December 2022 and fine-tuned on the X-Sensitive dataset to detect sensitive content (binary classification). The original Twitter-based RoBERTa model can be found here.

Labels

"id2label": {
    "0": "non-sensitive",
    "1": "sensitive"
}
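
As a rough illustration of how this mapping is applied, the sketch below turns a pair of raw class logits into a label and a probability. The helper name `decode_logits` and the logit values are made up for the example; in practice the logits would come from the model's forward pass.

```python
import math

# Label mapping copied from the model config shown above
ID2LABEL = {0: "non-sensitive", 1: "sensitive"}


def decode_logits(logits):
    """Turn raw class logits into (label, probability) via a softmax."""
    # Subtract the maximum before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits, not real model output
label, prob = decode_logits([-2.0, 3.0])
print(label, round(prob, 4))
```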

Full classification example

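from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
pipe = pipeline("text-classification", model="cardiffnlp/twitter-roberta-large-sensitive-binary")
text = "Call me today to earn some money mofos!"

# Returns a list with one dict per input: the predicted label and its score
pipe(text)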

Output:

[{'label': 'sensitive', 'score': 0.999821126461029}]
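
Downstream code often needs a yes/no decision rather than a label and a score. The following is a minimal sketch, assuming the output format shown above and assuming the score is a softmax probability over the two labels. The helper `is_sensitive` and the default threshold of 0.5 are illustrative choices, not part of the model or the library.

```python
def is_sensitive(result, threshold=0.5):
    """Return True if one pipeline result flags the text as sensitive.

    `result` is a single dict from the pipeline output, for example
    {'label': 'sensitive', 'score': 0.99}. The score belongs to the
    predicted label, so a 'non-sensitive' prediction is converted to the
    complementary probability first (assumes two-way softmax scores).
    """
    score = result["score"]
    p_sensitive = score if result["label"] == "sensitive" else 1.0 - score
    return p_sensitive >= threshold


# Output copied from the example above
example = [{'label': 'sensitive', 'score': 0.999821126461029}]
print(is_sensitive(example[0]))
```

Raising the threshold trades recall for precision; a suitable value depends on the application and is best chosen on held-out data.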

BibTeX entry and citation info

@article{antypas2024sensitive,
  title={Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation},
  author={Antypas, Dimosthenis and Sen, Indira and Perez-Almendros, Carla and Camacho-Collados, Jose and Barbieri, Francesco},
  journal={arXiv preprint arXiv:2411.19832},
  year={2024}
}

Model size: 0.4B params (Safetensors, tensor type F32)
