Update index.html
index.html  (+11 -5)  CHANGED
@@ -161,7 +161,7 @@ Exploring Refusal Loss Landscapes </title>
 <ul>
 <li>Paper: <a href="https://arxiv.org/abs/2310.02446" target="_blank" rel="noopener noreferrer">
 Low-Resource Languages Jailbreak GPT-4</a></li>
-<li>Brief Introduction: Translate the malicious user query into low
+<li>Brief Introduction: Translate the malicious user query into a low-resource language before using it to query the model.</li>
 </ul>
 </div>
 </div>
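The pipeline the added description refers to is translate, query, translate back. A minimal sketch of that flow, where `translate` and `query_llm` are hypothetical stand-ins rather than code from the paper:

```python
def translate(text: str, target_lang: str) -> str:
    """Stand-in for any machine-translation call; hypothetical helper."""
    raise NotImplementedError

def query_llm(prompt: str) -> str:
    """Stand-in for a chat-completion call to the target model; hypothetical helper."""
    raise NotImplementedError

def low_resource_query(user_query: str, lang: str = "zu") -> str:
    # Translate the query into a low-resource language (e.g. Zulu), send the
    # translated query to the model, then translate the reply back to English.
    translated_query = translate(user_query, target_lang=lang)
    reply = query_llm(translated_query)
    return translate(reply, target_lang="en")
```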
@@ -174,7 +174,8 @@ Exploring Refusal Loss Landscapes </title>
 <ul>
 <li>Paper: <a href="https://arxiv.org/abs/2309.00614" target="_blank" rel="noopener noreferrer">
 Baseline Defenses for Adversarial Attacks Against Aligned Language Models</a></li>
-<li>Brief Introduction:
+<li>Brief Introduction: The Perplexity Filter uses an LLM to compute the perplexity of the input query and rejects queries
+with high perplexity.</li>
 </ul>
 </div>
 <h3>SmoothLLM</h3>
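The added description amounts to thresholding a language model's average per-token negative log-likelihood. A minimal sketch, assuming GPT-2 from Hugging Face transformers as the scoring model and an illustrative threshold (neither choice is specified here):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative scoring model; the page does not say which LLM computes perplexity.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # The model shifts the labels internally, so the returned loss is the
        # average per-token negative log-likelihood of the text.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

def passes_filter(query: str, threshold: float = 200.0) -> bool:
    # Reject queries whose perplexity exceeds the threshold; adversarial
    # suffixes produced by gradient-based attacks tend to score far above it.
    return perplexity(query) <= threshold
```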
@@ -182,7 +183,10 @@ Exploring Refusal Loss Landscapes </title>
 <ul>
 <li>Paper: <a href="https://arxiv.org/abs/2310.03684" target="_blank" rel="noopener noreferrer">
 SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks</a></li>
-<li>Brief Introduction:
+<li>Brief Introduction: SmoothLLM perturbs the original input query to obtain several copies and aggregates
+the intermediate responses of the target LLM to these perturbed queries to give the final response to the
+original query.
+</li>
 </ul>
 </div>
 <h3>Erase-Check</h3>
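A minimal sketch of the perturb-and-aggregate loop described above, using character swaps as the perturbation and a keyword-based refusal check as the judge; `query_llm` is a hypothetical stand-in for the target model and the aggregation is a simplified majority vote:

```python
import random
import string
from collections import Counter

def perturb(prompt: str, q: float = 0.1) -> str:
    # Replace a fraction q of characters with random ones (a character-swap
    # perturbation; the paper also considers insertions and patches).
    chars = list(prompt)
    n_swap = max(1, int(q * len(chars)))
    for i in random.sample(range(len(chars)), k=n_swap):
        chars[i] = random.choice(string.printable)
    return "".join(chars)

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to the target LLM."""
    raise NotImplementedError

def is_jailbroken(response: str) -> bool:
    # Simplified judge: a response counts as jailbroken if it contains no refusal phrase.
    refusal_phrases = ("I'm sorry", "I cannot", "I can't")
    return not any(p in response for p in refusal_phrases)

def smoothllm(prompt: str, n_copies: int = 5) -> str:
    # Query the model on several perturbed copies, take a majority vote on
    # whether the attack succeeded, and return one response consistent with the vote.
    responses = [query_llm(perturb(prompt)) for _ in range(n_copies)]
    votes = [is_jailbroken(r) for r in responses]
    majority = Counter(votes).most_common(1)[0][0]
    return random.choice([r for r, v in zip(responses, votes) if v == majority])
```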
@@ -190,7 +194,8 @@ Exploring Refusal Loss Landscapes </title>
 <ul>
 <li>Paper: <a href="https://arxiv.org/abs/2309.02705" target="_blank" rel="noopener noreferrer">
 Certifying LLM Safety against Adversarial Prompting</a></li>
-<li>Brief Introduction:
+<li>Brief Introduction: Erase-Check employs a safety checker to test whether the original query or any of its erased sub-sentences
+is harmful. The query is rejected if the query itself or one of its sub-sentences is flagged as harmful by the safety checker.</li>
 </ul>
 </div>
 <h3>Self-Reminder</h3>
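A word-level simplification of the erase-and-check procedure described above; the paper works at the token level, and `is_harmful` is a hypothetical stand-in for the safety checker:

```python
def is_harmful(text: str) -> bool:
    """Hypothetical stand-in for the safety checker (an LLM or a fine-tuned classifier)."""
    raise NotImplementedError

def erase_check(query: str, max_erase: int = 20) -> bool:
    # Reject the query if the full query, or any version with up to `max_erase`
    # trailing words erased, is flagged as harmful by the safety checker.
    words = query.split()
    candidates = [query]
    for k in range(1, min(max_erase, len(words) - 1) + 1):
        candidates.append(" ".join(words[: len(words) - k]))
    return any(is_harmful(c) for c in candidates)
```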
@@ -198,7 +203,8 @@ Exploring Refusal Loss Landscapes </title>
 <ul>
 <li>Paper: <a href="https://assets.researchsquare.com/files/rs-2873090/v1_covered_eb589a01-bf05-4f32-b3eb-0d6864f64ad9.pdf?c=1702456350" target="_blank" rel="noopener noreferrer">
 Defending ChatGPT against Jailbreak Attack via Self-Reminder</a></li>
-<li>Brief Introduction:
+<li>Brief Introduction: Self-Reminder modifies the system prompt of the target LLM so that the model reminds itself to process
+and respond to the user in the context of being an aligned LLM.</li>
 </ul>
 </div>
 </div>
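A minimal sketch of the prompt wrapping described above; the reminder wording is a paraphrase rather than the paper's exact prompt, and the returned messages would be sent to the target LLM through whatever chat API is in use:

```python
SYSTEM_REMINDER = (
    "You should be a responsible AI assistant and must not generate harmful or "
    "misleading content. Please answer the following user query in a responsible way."
)
CLOSING_REMINDER = (
    "Remember, you should be a responsible AI assistant and must not generate "
    "harmful or misleading content."
)

def build_self_reminder_messages(user_query: str) -> list:
    # Wrap the user query between the two reminders: one placed in the system
    # prompt and one appended after the query, then query the model as usual.
    return [
        {"role": "system", "content": SYSTEM_REMINDER},
        {"role": "user", "content": f"{user_query}\n\n{CLOSING_REMINDER}"},
    ]
```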