Update index.html
index.html CHANGED (+5 -4)
@@ -155,10 +155,11 @@ We provide more details about the running flow of Gradient Cuff in the paper.
 </p>
 
 <h2 id="demonstration">Demonstration</h2>
-<p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder)
-different jailbreak attacks~(GCG, AutoDAN, PAIR, TAP, Base64, and LRL) and benign user queries on 2 LLMs (LLaMA-2-7B-Chat and
-We demonstrate the average refusal rate across these 6 malicious user query datasets as the Average Malicious Refusal
-on benign user queries as the Benign Refusal Rate. The
+<p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder)
+against 6 different jailbreak attacks~(GCG, AutoDAN, PAIR, TAP, Base64, and LRL) and benign user queries on 2 LLMs (LLaMA-2-7B-Chat and
+Vicuna-7B-V1.5). We below demonstrate the average refusal rate across these 6 malicious user query datasets as the Average Malicious Refusal
+Rate and the refusal rate on benign user queries as the Benign Refusal Rate. The defending performance against different jailbreak types is
+shown in the provided bar chart.
 </p>
 
 