Report for AdamCodd/distilbert-base-uncased-finetuned-sentiment-amazon

#94

by giskard-bot - opened Jan 26, 2024

Hi Team,

This is a report from Giskard Bot Scan 🐢.

We have identified 7 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset sst2 (subset default, split validation).

👉Overconfidence issues (2)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Overconfidence	major 🔴	`avg_word_length(text)` >= 4.481	Overconfidence rate = 0.804	—	+28.70% than global

🔍✨Examples

For records in the dataset where `avg_word_length(text)` >= 4.481, we found a significantly higher number of overconfident wrong predictions (37 samples, corresponding to 80.43% of the wrong predictions in the data slice).

	text	avg_word_length(text)	label	Predicted `label`
95	this riveting world war ii moral suspense story deals with the shadow side of american culture : racial prejudice in its ugly and diverse forms .	4.61538	negative	positive (p = 1.00), negative (p = 0.00)
643	the jabs it employs are short , carefully placed and dead-center .	4.58333	positive	negative (p = 1.00), positive (p = 0.00)
218	all that 's missing is the spontaneity , originality and delight .	4.58333	negative	positive (p = 0.99), negative (p = 0.01)
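
The overconfidence rate above can be read as the share of wrong predictions in the slice that the model made with high confidence. The report gives 37 samples at 80.43%, which implies roughly 46 wrong predictions in the slice (37 / 46 ≈ 0.8043). The following is a minimal sketch of that calculation, not the scan's actual implementation: the gap-based criterion, the `GAP_THRESHOLD` value, and the function name `overconfidence_rate` are assumptions made for illustration, and the toy probabilities are made up to match the reported counts.

```python
from typing import Sequence

# Hypothetical cutoff: a wrong prediction counts as overconfident when the
# probability gap below is at least this large. The scan's real threshold
# is not stated in the report.
GAP_THRESHOLD = 0.5


def overconfidence_rate(
    p_predicted: Sequence[float],
    p_true: Sequence[float],
    threshold: float = GAP_THRESHOLD,
) -> float:
    """Share of wrong predictions whose probability gap reaches the threshold.

    Both sequences describe wrong predictions only: p_predicted[i] is the
    probability given to the (incorrect) predicted label, p_true[i] the
    probability given to the correct label.
    """
    if len(p_predicted) != len(p_true):
        raise ValueError("inputs must have the same length")
    if not p_predicted:
        return 0.0
    overconfident = sum(
        1 for pp, pt in zip(p_predicted, p_true) if pp - pt >= threshold
    )
    return overconfident / len(p_predicted)


# Toy data shaped like the slice above: 37 confident mistakes (like rows 95,
# 643 and 218) plus 9 hesitant ones, for 46 wrong predictions in total.
p_pred = [1.00] * 37 + [0.55] * 9
p_true = [0.00] * 37 + [0.45] * 9
rate = overconfidence_rate(p_pred, p_true)
print(f"{rate:.4f}")  # 0.8043
```

With a different threshold the count of overconfident samples would change, so the sketch only shows how the reported rate and sample count relate to each other.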

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Overconfidence	major 🔴	`avg_whitespace(text)` < 0.182	Overconfidence rate = 0.804	—	+28.70% than global

🔍✨Examples

For records in the dataset where `avg_whitespace(text)` < 0.182, we found a significantly higher number of overconfident wrong predictions (37 samples, corresponding to 80.43% of the wrong predictions in the data slice).

	text	avg_whitespace(text)	label	Predicted `label`
95	this riveting world war ii moral suspense story deals with the shadow side of american culture : racial prejudice in its ugly and diverse forms .	0.178082	negative	positive (p = 1.00), negative (p = 0.00)
643	the jabs it employs are short , carefully placed and dead-center .	0.179104	positive	negative (p = 1.00), positive (p = 0.00)
218	all that 's missing is the spontaneity , originality and delight .	0.179104	negative	positive (p = 0.99), negative (p = 0.01)
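
The two slicing features above, `avg_word_length(text)` and `avg_whitespace(text)`, select the same 37 samples, which suggests they measure nearly the same property of the text. The sketch below is one plausible reading of them, not a copy of the scanner's code: mean length of whitespace-separated tokens, and the fraction of characters that are whitespace. It reproduces the values shown for row 643 only under the assumption that the raw sst2 record ends with a trailing space that the table does not display.

```python
def avg_word_length(text: str) -> float:
    """Mean length of whitespace-separated tokens (punctuation counts as a token)."""
    tokens = text.split()
    return sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0


def avg_whitespace(text: str) -> float:
    """Fraction of characters in the raw text that are whitespace."""
    return sum(c.isspace() for c in text) / len(text) if text else 0.0


# Row 643, with the trailing space that the raw sst2 record is assumed to carry.
row_643 = "the jabs it employs are short , carefully placed and dead-center . "
print(round(avg_word_length(row_643), 5))  # 4.58333
print(round(avg_whitespace(row_643), 6))   # 0.179104
```

Under this reading, longer words mean fewer spaces per character, which would explain why a lower bound on one feature and an upper bound on the other pick out the same records.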

👉Robustness issues (1)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Robustness	major 🔴	—	Fail rate = 0.105	Add typos	84/800 tested samples (10.5%) changed prediction after perturbation

🔍✨Examples

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 10.5% of the cases. We expected the predictions not to be affected by this transformation.

	text	Add typos(text)	Original prediction	Prediction after perturbation
13	we root for ( clara and paul ) , even like them , though perhaps it 's an emotion closer to pity .	we root for ( clara and paul ) , even like them , htough perhaps it 's an emotiom closer to pity .	positive (p = 0.75)	negative (p = 0.82)
21	the iditarod lasts for days - this just felt like it did .	the irditarod lasts for days - this just felt ike it did .	negative (p = 0.50)	positive (p = 0.53)
33	if the movie succeeds in instilling a wary sense of ` there but for the grace of god , ' it is far too self-conscious to draw you deeply into its world .	if the mofvie succeeds in instilling a wary sense of ` gthere but got the grace f god , ' it is far topo self-conscious to draw ou deeply intk its world	negative (p = 0.99)	positive (p = 0.54)
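
The fail rate above is the fraction of tested samples whose predicted label flips after the perturbation (84 / 800 = 0.105). A minimal sketch of that kind of check follows. It is an illustration under stated assumptions, not the scan's "Add typos" transformation: `add_typo` only swaps adjacent characters, whereas the examples above also show inserted, deleted and substituted letters, and `toy_predict` is a hypothetical stand-in for the real classifier.

```python
import random
from typing import Callable, Sequence


def add_typo(text: str, rng: random.Random) -> str:
    """Swap one randomly chosen pair of adjacent characters."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    chars = list(text)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def fail_rate(
    texts: Sequence[str],
    predict: Callable[[str], str],
    perturb: Callable[[str], str],
) -> float:
    """Fraction of texts whose predicted label changes after perturbation."""
    if not texts:
        return 0.0
    changed = sum(1 for t in texts if predict(t) != predict(perturb(t)))
    return changed / len(texts)


def toy_predict(text: str) -> str:
    """Hypothetical stand-in classifier: 'positive' only if 'good' appears verbatim."""
    return "positive" if "good" in text else "negative"


rng = random.Random(0)  # fixed seed so the perturbations are repeatable
samples = ["a good film", "a dull film", "good fun", "not for me"]
rate = fail_rate(samples, toy_predict, lambda t: add_typo(t, rng))
print(f"fail rate: {rate:.3f}")
```

To run the same check against the scanned model, `toy_predict` would be replaced by a function that calls the model and returns its top label.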

👉Performance issues (4)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	major 🔴	`text_length(text)` < 37.500	Recall = 0.800	—	-12.08% than global

🔍✨Examples

For records in the dataset where `text_length(text)` < 37.500, the Recall is 12.08% lower than the global Recall.

	text	text_length(text)	label	Predicted `label`
1	unflinchingly bleak and desperate	34	negative	positive (p = 0.86)
112	hilariously inept and ridiculous .	35	positive	negative (p = 0.99)
113	this movie is maddening .	26	negative	positive (p = 0.96)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	major 🔴	`text_length(text)` < 65.500 AND `text_length(text)` >= 56.500	Precision = 0.769	—	-10.89% than global

🔍✨Examples

For records in the dataset where `text_length(text)` < 65.500 AND `text_length(text)` >= 56.500, the Precision is 10.89% lower than the global Precision.

	text	text_length(text)	label	Predicted `label`
92	you wo n't like roger , but you will quickly recognize him .	61	negative	positive (p = 0.75)
183	the lower your expectations , the more you 'll enjoy it .	58	negative	positive (p = 0.97)
312	i 'll bet the video game is a lot more fun than the film .	59	negative	positive (p = 0.60)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`avg_word_length(text)` >= 4.635 AND `avg_word_length(text)` < 4.743	Recall = 0.828	—	-9.05% than global

🔍✨Examples

For records in the dataset where `avg_word_length(text)` >= 4.635 AND `avg_word_length(text)` < 4.743, the Recall is 9.05% lower than the global Recall.

	text	avg_word_length(text)	label	Predicted `label`
64	the script kicks in , and mr. hartley 's distended pace and foot-dragging rhythms follow .	4.6875	negative	positive (p = 0.99)
223	corny , schmaltzy and predictable , but still manages to be kind of heartwarming , nonetheless .	4.70588	positive	negative (p = 0.99)
248	a full world has been presented onscreen , not some series of carefully structured plot points building to a pat resolution .	4.72727	positive	negative (p = 0.54)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`avg_whitespace(text)` < 0.177 AND `avg_whitespace(text)` >= 0.174	Recall = 0.828	—	-9.05% than global

🔍✨Examples

For records in the dataset where `avg_whitespace(text)` < 0.177 AND `avg_whitespace(text)` >= 0.174, the Recall is 9.05% lower than the global Recall.

	text	avg_whitespace(text)	label	Predicted `label`
64	the script kicks in , and mr. hartley 's distended pace and foot-dragging rhythms follow .	0.175824	negative	positive (p = 0.99)
223	corny , schmaltzy and predictable , but still manages to be kind of heartwarming , nonetheless .	0.175258	positive	negative (p = 0.99)
248	a full world has been presented onscreen , not some series of carefully structured plot points building to a pat resolution .	0.174603	positive	negative (p = 0.54)
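
The report does not state the global Recall or Precision, but they can be estimated from the slice values if the deviations are read as relative to the global metric. That reading is an assumption, as is the helper name `implied_global`. Under it, both Recall rows point to a global Recall near 0.91, which is at least consistent with the deviations being relative.

```python
def implied_global(slice_value: float, deviation_pct: float) -> float:
    """Global metric implied by a slice value and a relative deviation in percent.

    Assumes slice_value = global * (1 + deviation_pct / 100).
    """
    return slice_value / (1.0 + deviation_pct / 100.0)


# (metric, slice value, deviation in percent), copied from the tables above.
rows = [
    ("Recall", 0.800, -12.08),
    ("Recall", 0.828, -9.05),
    ("Precision", 0.769, -10.89),
]

for metric, value, deviation in rows:
    estimate = implied_global(value, deviation)
    print(f"{metric}: slice {value:.3f} -> implied global {estimate:.3f}")
```

Because the slice values are rounded to three decimals, the estimates are approximate and should be checked against the full scan output.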

Check out the Giskard Space and test your model.

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.
