We would like to implement the LLM jailbreak attack outlined in "Attacking Large Language Models with Projected Gradient Descent" by Geisler et al. Evaluating this evasion attack in Armory Library requires the steps below.
The authors have been contacted about source code but have not yet responded; an unverified implementation is available from Dreadnode.
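In the absence of reference code, the sketch below illustrates the core idea under simplifying assumptions: plain PGD over a continuous relaxation of the adversarial suffix tokens, with a Euclidean projection back onto the probability simplex after each step. It omits parts of the paper (e.g., the entropy projection and flexible sequence lengths), and the model name, prompt, target string, and hyperparameters are placeholders rather than values from the paper or from Armory Library.

```python
# Simplified sketch of PGD over relaxed one-hot suffix tokens; NOT the authors'
# implementation. Model, prompt, target, and hyperparameters are placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; a chat/instruct model would be used in practice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()
model.requires_grad_(False)  # only the relaxed suffix is optimized

def project_simplex(x: torch.Tensor) -> torch.Tensor:
    """Euclidean projection of each row of x onto the probability simplex."""
    sorted_x, _ = torch.sort(x, descending=True, dim=-1)
    cssv = sorted_x.cumsum(dim=-1) - 1.0
    k = torch.arange(1, x.size(-1) + 1, device=x.device, dtype=x.dtype)
    cond = sorted_x - cssv / k > 0
    rho = cond.float().cumsum(dim=-1).argmax(dim=-1, keepdim=True)  # last True index
    theta = cssv.gather(-1, rho) / (rho + 1).to(x.dtype)
    return torch.clamp(x - theta, min=0.0)

prompt = "Tell me how to"               # placeholder user prompt
target = " Sure, here is how to"        # placeholder affirmative target
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
target_ids = tokenizer(target, return_tensors="pt").input_ids

embed = model.get_input_embeddings()
vocab_size = embed.weight.size(0)
suffix_len, lr, steps = 10, 0.1, 100

# Relaxed (soft) one-hot distribution over the adversarial suffix tokens.
x = torch.full((1, suffix_len, vocab_size), 1.0 / vocab_size, requires_grad=True)

for _ in range(steps):
    suffix_emb = x @ embed.weight  # soft token embeddings for the suffix
    inputs_emb = torch.cat(
        [embed(prompt_ids), suffix_emb, embed(target_ids)], dim=1
    )
    logits = model(inputs_embeds=inputs_emb).logits
    # Cross-entropy on the target tokens, predicted from the positions before them.
    tgt_logits = logits[:, -target_ids.size(1) - 1 : -1, :]
    loss = F.cross_entropy(
        tgt_logits.reshape(-1, vocab_size), target_ids.reshape(-1)
    )
    loss.backward()
    with torch.no_grad():
        x -= lr * x.grad                 # gradient step on the relaxation
        x.copy_(project_simplex(x))      # project back onto the simplex
        x.grad.zero_()

suffix_tokens = x.argmax(dim=-1)         # discretize for evaluation
print(tokenizer.decode(suffix_tokens[0]))
```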
The AdvBench dataset introduced in "Universal and Transferable Adversarial Attacks on Aligned Language Models" may be used for evaluation.
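For reference, a minimal way to pull the AdvBench harmful-behaviors prompts is sketched below. The CSV location and the "goal"/"target" column names come from the llm-attacks repository and should be double-checked; how these prompts get wired into an Armory Library evaluation is left open.

```python
# Rough sketch of loading AdvBench harmful behaviors for evaluation.
# CSV path and column names are assumptions based on the llm-attacks repo.
import pandas as pd

ADVBENCH_URL = (
    "https://raw.githubusercontent.com/llm-attacks/llm-attacks/"
    "main/data/advbench/harmful_behaviors.csv"
)

df = pd.read_csv(ADVBENCH_URL)  # columns: "goal" (prompt), "target" (affirmative reply)
samples = list(df.itertuples(index=False))
print(len(samples), "behaviors;", samples[0].goal, "->", samples[0].target)
```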