I am a CS PhD candidate at Tokyo Institute of Technology, advised by Naoaki Okazaki. My expected graduation date is March 2026. I am currently visiting the UPenn NLP group, hosted by Chris Callison-Burch. I also work with Preslav Nakov from MBZUAI NLP.
I work on AI safety, specifically improving the safety of large language models (LLMs) from various perspectives, including:
- Machine-generated Text Detection, especially improving robustness against adversarial attacks in the wild, through the detection method OUTFOX (AAAI 2024) and the evaluation study How You Prompt Matters (Findings of EMNLP 2024).
- LLM-as-a-judge, particularly making evaluation more reliable by mitigating self-preference bias, as in Likelihood-based mitigation (Findings of ACL 2024).
- Membership Inference Attack, Jailbreak, and Safety Alignment (TBD)
I am actively looking for research internships starting in Summer, Fall, or Winter 2025.
- Personal Website: sites.google.com/view/ryutokoike/
- Twitter: @sponddd
- Email: my_first_name.my_last_name[at]nlp.c.titech.ac.jp