Hi Team,
This is a report from Giskard Bot Scan 🐢.
We have identified 8 potential vulnerabilities in your model based on an automated scan.
This automated analysis evaluated the model on the dataset tweet_eval (subset sentiment, split validation).
You can find the full version of the scan report here.
        
👉Ethical issues (2)
When feature “text” is perturbed with the transformation “Switch Religion”, the model changes its prediction in 7.06% of the cases. We expected the predictions not to be affected by this transformation.
	
		
| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Fail rate = 0.071 | Switch Religion | 6/85 tested samples (7.06%) changed prediction after perturbation |
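
For readers who want to sanity-check a fail rate by hand, the metric can be sketched as the share of perturbed samples whose predicted label changes. The code below is a minimal illustration, not Giskard's implementation: `fail_rate`, `toy_predict` and the sample texts are hypothetical, and skipping samples that the transformation leaves unchanged is an assumption suggested by the differing numbers of tested samples in this report (85 here, 1000 for the casing and typo checks).

```python
from typing import Callable, Sequence, Tuple


def fail_rate(
    predict: Callable[[str], str],
    transform: Callable[[str], str],
    texts: Sequence[str],
) -> Tuple[int, int, float]:
    """Return (changed, tested, rate) for a text perturbation.

    Assumption: a sample only counts as "tested" when the
    transformation actually modifies it.
    """
    changed = 0
    tested = 0
    for text in texts:
        perturbed = transform(text)
        if perturbed == text:
            continue  # transformation did not apply to this sample
        tested += 1
        if predict(perturbed) != predict(text):
            changed += 1
    rate = changed / tested if tested else 0.0
    return changed, tested, rate


# Hypothetical, deliberately case-sensitive toy classifier.
def toy_predict(text: str) -> str:
    return "positive" if "LOVE" in text else "neutral"


texts = ["i love this", "LOVE IT", "fine"]
changed, tested, rate = fail_rate(toy_predict, str.upper, texts)
print(changed, tested, rate)

# The figure reported for this issue: 6 changed out of 85 tested samples.
print(round(6 / 85, 3))
```

To reuse the sketch, `toy_predict` would be replaced by a function that returns the label predicted by the model under test.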
	
 
Taxonomy
    avid-effect:ethics:E0101
    avid-effect:performance:P0201
 🔍✨Examples
	
		
|  | text | Switch Religion(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 178 | Pope's faster annulment plan may not mean as much in the US. @user | imam's faster annulment plan may not mean as much in the US. @user | neutral (p = 0.52) | negative (p = 0.51) |
| 298 | @user I love Israel. Love the Jews. So I may make a terrible Nazi. :( @user @user @user | @user I love Israel. Love the muslims. So I may make a terrible Nazi. :( @user @user @user | negative (p = 0.52) | positive (p = 0.41) |
| 533 | yo don't ever say that! god forbid! may it not happen! Zayn is cool...don't even try to compare them...i love zaynnn | yo don't ever say that! allah forbid! may it not happen! Zayn is cool...don't even try to compare them...i love zaynnn | neutral (p = 0.35) | positive (p = 0.51) |
	
 
 
When feature “text” is perturbed with the transformation “Switch Gender”, the model changes its prediction in 5.02% of the cases. We expected the predictions not to be affected by this transformation.
	
		
| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Fail rate = 0.050 | Switch Gender | 21/418 tested samples (5.02%) changed prediction after perturbation |
	
 
Taxonomy
    avid-effect:ethics:E0101
    avid-effect:performance:P0201
 🔍✨Examples
	
		
|  | text | Switch Gender(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 40 | Look #Steelers fans I know you may be upset about Suisham missing that kick. Just know that I heard a guy named Billy Cundiff is available. | Look #Steelers fans I know you may be upset about Suisham missing that kick. Just know that I heard a gal named Billy Cundiff is available. | neutral (p = 0.50) | negative (p = 0.48) |
| 139 | I should probs just kiss him cause we are gonna hang out tomorrow #MTVStars Lady Gaga | I should probs just kiss her cause we are gonna hang out tomorrow #MTVStars lord Gaga | positive (p = 0.54) | neutral (p = 0.49) |
| 343 | Big Brother starting next Friday? At the end of this morning @user slipped up & said 'don't cause you'll get me sacked before Friday night | Big sister starting next Friday? At the end of this morning @user slipped up & said 'don't cause you'll get me sacked before Friday night | negative (p = 0.55) | neutral (p = 0.56) |
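
The perturbed texts above ("Lady Gaga" becoming "lord Gaga", "Big Brother" becoming "Big sister") look like the output of a word-level swap that does not recognise proper nouns or show titles. The sketch below reproduces that behaviour with a dictionary-based replacement so that such cases are easier to spot when reviewing the findings. It is an illustration only: `switch_gender` and the word pairs are hypothetical, taken from the examples in this report, and do not represent the word list or logic used by the scan.

```python
import re

# Hypothetical word pairs, taken only from the examples in this report.
PAIRS = [("guy", "gal"), ("him", "her"), ("lady", "lord"), ("brother", "sister")]

# Build a lookup that works in both directions.
SWAP = {**{a: b for a, b in PAIRS}, **{b: a for a, b in PAIRS}}

# Match whole words only, ignoring case.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in SWAP) + r")\b",
    re.IGNORECASE,
)


def switch_gender(text: str) -> str:
    """Swap each matched word for its lowercase counterpart."""
    return PATTERN.sub(lambda match: SWAP[match.group(0).lower()], text)


tweet = (
    "I should probs just kiss him cause we are gonna hang out "
    "tomorrow #MTVStars Lady Gaga"
)
print(switch_gender(tweet))
```

Because the swap is purely lexical, a changed prediction on a sample like this one may reflect the altered proper noun rather than a gender bias in the model.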
	
 
 
 
👉Robustness issues (5)
When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 15.1% of the cases. We expected the predictions not to be affected by this transformation.
	
		
| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Fail rate = 0.151 | Add typos | 151/1000 tested samples (15.1%) changed prediction after perturbation |
	
 
Taxonomy
    avid-effect:performance:P0201
 🔍✨Examples
	
		
|  | text | Add typos(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 1635 | "on Black Friday i always thought Kendrick said ""Coney Island!!"" but he says ""Can you Handle It"" lmfaooo #whyamistupid" | "on Nlack Friday o aways thought Kenddick said ""Coney Island!!"" bjut he says ""Can you Handle It"" lmfaooo #whyamistupid" | neutral (p = 0.46) | negative (p = 0.54) |
| 1254 | Hillary's campaign now reset for the 4th time. Adding humor and heart to a person that has #neither #sadtrombone | Hillarys campaign now reset for the 4th time. Adding humor and heart to a persoj that has #neither sadtrombone | negative (p = 0.62) | neutral (p = 0.41) |
| 129 | Those who criticised the way Tony Blair took the UK to war may reflect that the present PM expresses similar... | Those who criticised the way Tony Blair took the UK to war may reflect that the present PM expresses sumilar... | neutral (p = 0.51) | negative (p = 0.53) |
	
 
 
When feature “text” is perturbed with the transformation “Transform to uppercase”, the model changes its prediction in 14.7% of the cases. We expected the predictions not to be affected by this transformation.
	
		
| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Fail rate = 0.147 | Transform to uppercase | 147/1000 tested samples (14.7%) changed prediction after perturbation |
	
 
Taxonomy
    avid-effect:performance:P0201
 🔍✨Examples
	
		
|  | text | Transform to uppercase(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 1666 | "If it ain't broke don't fix it, why move kris Bryant up to 3rd when he's hitting as good as he has all season at 5" | "IF IT AIN'T BROKE DON'T FIX IT, WHY MOVE KRIS BRYANT UP TO 3RD WHEN HE'S HITTING AS GOOD AS HE HAS ALL SEASON AT 5" | neutral (p = 0.65) | negative (p = 0.77) |
| 680 | @user can you please make Big Brother available at its normal time next Thursday (online or on another channel)?  Thank you. | @USER CAN YOU PLEASE MAKE BIG BROTHER AVAILABLE AT ITS NORMAL TIME NEXT THURSDAY (ONLINE OR ON ANOTHER CHANNEL)?  THANK YOU. | neutral (p = 0.55) | positive (p = 0.80) |
| 1092 | @user @user @user Their release should have been demanded before Kerry ever sat down at the table. | @USER @USER @USER THEIR RELEASE SHOULD HAVE BEEN DEMANDED BEFORE KERRY EVER SAT DOWN AT THE TABLE. | negative (p = 0.61) | neutral (p = 0.56) |
	
 
 
When feature “text” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 9.2% of the cases. We expected the predictions not to be affected by this transformation.
	
		
| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Fail rate = 0.092 | Transform to title case | 92/1000 tested samples (9.2%) changed prediction after perturbation |
	
 
Taxonomy
    avid-effect:performance:P0201
 🔍✨Examples
	
		
|  | text | Transform to title case(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 1242 | the most important thing madonna has ever said is " don't go for 2nd best " | The Most Important Thing Madonna Has Ever Said Is " Don'T Go For 2Nd Best " | neutral (p = 0.49) | positive (p = 0.53) |
| 1636 | @user They're actually going venue shopping tomorrow! They're checking out Grand Bend and surrounding areas (ie. St. Mary's)! | @User They'Re Actually Going Venue Shopping Tomorrow! They'Re Checking Out Grand Bend And Surrounding Areas (Ie. St. Mary'S)! | positive (p = 0.63) | neutral (p = 0.75) |
| 904 | "James: Big Brother, if she (Meg) leaves tomorrow, I'm not going to have anyone to aggravate. #BB17 | "James: Big Brother, If She (Meg) Leaves Tomorrow, I'M Not Going To Have Anyone To Aggravate. #Bb17 | negative (p = 0.51) | neutral (p = 0.56) |
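
The perturbed texts above contain forms such as "Don'T", "2Nd" and "They'Re". These match what Python's built-in `str.title()` produces, because it treats every non-letter character (including apostrophes and digits) as a word boundary. Whether the scan actually relies on `str.title()` is an assumption; the sketch below only shows that the method reproduces the first example, and contrasts it with `string.capwords`, which splits on whitespace only. The variable names are illustrative.

```python
import string

original = 'the most important thing madonna has ever said is " don\'t go for 2nd best "'

# str.title() capitalises the first letter after any non-letter character.
title_cased = original.title()
print(title_cased)

# string.capwords() splits on whitespace only, so contractions and
# ordinals such as "don't" and "2nd" keep their inner casing.
capwords_cased = string.capwords(original)
print(capwords_cased)
```

When reviewing this finding, it may be worth checking whether prediction changes persist with a whitespace-based title casing, since tokens like "Don'T" are unlikely to occur in real tweets.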
	
 
 
When feature “text” is perturbed with the transformation “Punctuation Removal”, the model changes its prediction in 8.2% of the cases. We expected the predictions not to be affected by this transformation.
	
		
| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Fail rate = 0.082 | Punctuation Removal | 82/1000 tested samples (8.2%) changed prediction after perturbation |
	
 
Taxonomy
    avid-effect:performance:P0201
 🔍✨Examples
	
		
|  | text | Punctuation Removal(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 1489 | Curtis Painter...we have a chance again! Can't believe Kerry Collins didn't throw us a pick-six tonight | Curtis Painter we have a chance again Can t believe Kerry Collins didn t throw us a pick six tonight | positive (p = 0.69) | neutral (p = 0.53) |
| 1339 | "i got lots of tweets asking for shoutouts to Niall, if i think about it i will give shoutouts to Niall when i get back from work TOMORROW!!" | i got lots of tweets asking for shoutouts to Niall if i think about it i will give shoutouts to Niall when i get back from work TOMORROW | positive (p = 0.69) | neutral (p = 0.54) |
| 1952 | @user @user Yellow journalism.  But you know?  This may be Harper's Waterloo | @user @user Yellow journalism  But you know  This may be Harper s Waterloo | negative (p = 0.56) | neutral (p = 0.67) |
	
 
 
When feature “text” is perturbed with the transformation “Transform to lowercase”, the model changes its prediction in 5.2% of the cases. We expected the predictions not to be affected by this transformation.
	
		
| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Fail rate = 0.052 | Transform to lowercase | 52/1000 tested samples (5.2%) changed prediction after perturbation |
	
 
Taxonomy
    avid-effect:performance:P0201
 🔍✨Examples
	
		
|  | text | Transform to lowercase(text) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 77 | @user seriously! itunes puts like an entire minute as a preview so 20 seconds is nothing. AND I KNOW! it needs to be monday ASAP! | @user seriously! itunes puts like an entire minute as a preview so 20 seconds is nothing. and i know! it needs to be monday asap! | negative (p = 0.46) | neutral (p = 0.48) |
| 756 | NIKE EMPLOYEE'S: If anyone want to work tomorrow at 5am call!!!!!!!!!!!!!!!!!! | nike employee's: if anyone want to work tomorrow at 5am call!!!!!!!!!!!!!!!!!! | positive (p = 0.56) | neutral (p = 0.60) |
| 950 | The Craft Awards are happening next week on October 4th at the Gladstone Hotel! Invite all your friends and get... | the craft awards are happening next week on october 4th at the gladstone hotel! invite all your friends and get... | neutral (p = 0.51) | positive (p = 0.64) |
	
 
 
 
👉Performance issues (1)
For records in the dataset where text contains "like", the Precision is 5.94% lower than the global Precision.
	
		
| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | text contains "like" | Precision = 0.726 | 5.94% lower than global |
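
The report gives the slice precision (0.726) and the deviation (5.94% lower) but not the global precision itself. The short calculation below recovers it under two readings of the deviation, since the report does not say whether the percentage is relative to the global value or an absolute difference in percentage points. The variable names are illustrative, and neither result is confirmed by the report.

```python
slice_precision = 0.726  # precision on records where text contains "like"
deviation = -0.0594      # reported deviation from the global precision

# Reading 1 (assumption): deviation is relative to the global value,
# i.e. slice = global * (1 + deviation).
global_if_relative = slice_precision / (1 + deviation)

# Reading 2 (assumption): deviation is an absolute difference,
# i.e. slice = global + deviation.
global_if_absolute = slice_precision - deviation

print(round(global_if_relative, 3))
print(round(global_if_absolute, 3))
```

Either way, the implied global precision is below 0.8, so the slice is underperforming a model whose overall precision is itself moderate.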
	
 
Taxonomy
    avid-effect:performance:P0204
 🔍✨Examples
	
		
|  | text | label | Predicted label |
| --- | --- | --- | --- |
| 17 | Why do y'all want Nicki to be pregnant so bad like maybe around the 7th album but she's literally still in her prime. | neutral | negative (p = 0.60) |
| 30 | Nicki did that for white media Idgaf . Nicki may act like she don't give af but she cares what the media thinks | positive | neutral (p = 0.50) |
| 77 | @user seriously! itunes puts like an entire minute as a preview so 20 seconds is nothing. AND I KNOW! it needs to be monday ASAP! | neutral | negative (p = 0.46) |
	
 
 
 
        
Check out the Giskard Space and the Giskard Documentation to learn more about how to test your model.
Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.