Test Time Calibration
Test-time calibration for improving test-time reasoning
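The demo's exact method is not described here, so the sketch below only illustrates what "calibration" means in its simplest post-hoc form: temperature scaling, which rescales a model's logits on a held-out set so that predicted confidence better matches accuracy, without changing which class is predicted. All names (fit_temperature, val_logits, val_labels) are illustrative assumptions, not this demo's API.

```python
# Minimal post-hoc calibration via temperature scaling -- a generic
# baseline, NOT the specific method behind the Test Time Calibration demo.
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_temperature(val_logits: np.ndarray, val_labels: np.ndarray) -> float:
    """Find the temperature T minimizing negative log-likelihood on a
    held-out set; accuracy is unchanged because argmax is T-invariant."""
    def nll(T: float) -> float:
        probs = softmax(val_logits, T)
        return -np.log(probs[np.arange(len(val_labels)), val_labels] + 1e-12).mean()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x

# Usage (hypothetical arrays):
#   T = fit_temperature(val_logits, val_labels)
#   calibrated_probs = softmax(test_logits, T)
```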
Research Demos and Tools for Trustworthy and Safe AI Development and Deployment
LLM benchmark for Physical Safety
Generate principle-guided jailbreak prompts for LLMs
Detect fake audio in uploaded files
Evaluate audio deepfake detection robustness under corruptions
Evaluate jailbreak risks for Vision-Language Models using Retention Score
Demonstration of Token Highlighter: A Jailbreak Defense
Demonstration of Gradient Cuff: A Jailbreak Defense
Attention Tracker: Prompt Injection Detector
Protect models from low-voltage-induced bit errors
Model-agnostic Toolkit for Neural Network Calibration
Evaluate model robustness using GREAT Score
Defend LLMs against jailbreak attacks (a baseline sketch follows this list)
Detect whether text is AI-generated or human-written
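The jailbreak-defense demos above (Gradient Cuff, Token Highlighter, Attention Tracker) each use their own mechanism; none reduces to a one-liner. As a point of reference, the sketch below shows a well-known lightweight baseline for input-side screening, perplexity filtering: adversarially optimized jailbreak suffixes tend to score far higher perplexity under a language model than natural prompts. The scoring model and threshold are illustrative assumptions, not the mechanism of any demo listed here.

```python
# Perplexity-based jailbreak filter -- a generic baseline, NOT the
# method implemented by Gradient Cuff, Token Highlighter, or
# Attention Tracker.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(prompt: str) -> float:
    """Mean per-token perplexity of the prompt under GPT-2."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token-level NLL
    return float(torch.exp(loss))

def looks_like_jailbreak(prompt: str, threshold: float = 500.0) -> bool:
    # Gibberish adversarial suffixes drive perplexity far above that of
    # natural-language prompts; the threshold here is an assumption and
    # would need tuning on real traffic.
    return perplexity(prompt) > threshold
```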