Zeming Wei

ZemingWei

https://weizeming.github.io

AI & ML interests

Trustworthy AI

Recent Activity

authored a paper 2 days ago

False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize

commented on a paper 3 days ago

False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize

authored a paper over 1 year ago

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations

Organizations

None yet

Papers 2

arxiv:2509.03888

arxiv:2310.06387

Models 0

None public yet

Datasets 0

None public yet