Walled AI

company
https://www.walled.ai/
walledai
Team members: Rishabh Bhardwaj

walledai's collections (1)

Research
Our AI Safety Research
  • Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique
    Paper • 2408.10701 • Published Aug 20, 2024 • 12
  • Ruby Teaming: Improving Quality Diversity Search with Memory for Automated Red Teaming
    Paper • 2406.11654 • Published Jun 17, 2024 • 6
  • Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
    Paper • 2409.11242 • Published Sep 17, 2024 • 7
  • Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
    Paper • 2308.09662 • Published Aug 18, 2023 • 3