Zeming Wei
ZemingWei
AI & ML interests
Trustworthy AI
Recent Activity
authored a paper 6 days ago
False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize
commented on a paper 8 days ago
False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize
authored a paper over 1 year ago
Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Organizations
None yet