CV
Name: Shinwoo Park
Position: Ph.D. Candidate in Artificial Intelligence
Affiliation: Yonsei University, Seoul, South Korea
Expected Graduation: February 2026
Research Interests
My research focuses on ensuring the safety, transparency, and accountability of large language models (LLMs) through detection and watermarking techniques. I develop multi-modal and multi-lingual systems that identify or trace LLM-generated content across natural language and source code domains.
Specifically, I explore two complementary directions: (1) linguistic and stylistic feature–based detection, which analyzes morphological, syntactic, and stylistic patterns to distinguish human- and LLM-generated text or code; and (2) LLM watermarking, which embeds imperceptible yet verifiable statistical or structural signals into generated outputs.
My recent works include KatFishNet, the first linguistic feature–based detector for Korean text; LPcodedec, a coding-style-driven detector for paraphrased code; STELA, a syntactic-predictability watermark enabling model-free detection; and WaterMod, a probability-balanced modular watermarking framework supporting multi-bit payloads.
Broadly, my goal is to build trustworthy generative systems that are interpretable, regulation-compliant, and resistant to misuse.
Research Summary
My research aims to promote responsible and verifiable AI generation by developing reliable methods for detecting and attributing LLM-generated text and code. I pursue two main directions that mutually reinforce each other:
- Linguistic/Stylistic Feature-based Detection: Models such as KatFishNet and LPcodedec analyze linguistic or coding-style cues—word spacing, part-of-speech diversity, punctuation patterns, naming and indentation consistency—to capture distributional differences between human and LLM authors.
- LLM Watermarking: Frameworks such as STELA and WaterMod embed imperceptible signals during generation using linguistically or probabilistically adaptive mechanisms, enabling publicly verifiable and multi-bit attribution without harming fluency.
These systems demonstrate strong multilingual (English, Korean) and multimodal (text + code) generalization, advancing interpretable and regulation-aligned AI provenance research.
Research Statement
My long-term research vision is to establish a unified framework for provenance-aware and interpretable AI that spans both language and programming modalities. To achieve this, I combine linguistic insight, statistical modeling, and watermark design to construct transparent interfaces between human communication and generative models.
Linguistic / Stylistic Feature-based Detection:
My work on KatFishNet introduces the first benchmark and detector for LLM-generated Korean text, leveraging word-spacing irregularities, POS n-gram diversity, and comma usage to expose morphology-level differences between human and machine writing. Extending this idea to source code, LPcodedec identifies LLM-paraphrased code by quantifying coding-style features such as naming consistency, indentation regularity, and comment ratio.
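To make the feature-based approach concrete, the sketch below computes a few coding-style cues of the kind LPcodedec relies on (comment ratio, indentation regularity, naming consistency). It is a minimal illustration under simplifying heuristics of my own, not the released LPcodedec code; the function name and feature definitions are assumptions for exposition.

    import re

    def coding_style_features(source: str) -> dict:
        """Illustrative coding-style cues computed from raw Python source."""
        code_lines = [l for l in source.splitlines() if l.strip()]

        # Comment ratio: fraction of non-empty lines that are comments.
        comment_ratio = sum(l.lstrip().startswith("#") for l in code_lines) / max(len(code_lines), 1)

        # Indentation regularity: share of indented lines whose indent is a multiple of 4 spaces.
        indents = [len(l) - len(l.lstrip(" ")) for l in code_lines if l.startswith(" ")]
        indent_regularity = sum(i % 4 == 0 for i in indents) / max(len(indents), 1)

        # Naming consistency: share of defined or assigned names written in snake_case.
        names = [n for pair in re.findall(r"def\s+(\w+)|(\w+)\s*=", source) for n in pair if n]
        naming_consistency = sum(bool(re.fullmatch(r"[a-z_][a-z0-9_]*", n)) for n in names) / max(len(names), 1)

        return {"comment_ratio": comment_ratio,
                "indentation_regularity": indent_regularity,
                "naming_consistency": naming_consistency}

    print(coding_style_features("def add_two(a, b):\n    # sum two values\n    return a + b\n"))

In practice, feature vectors of this kind, extracted from human-written and LLM-paraphrased code, are fed to a lightweight classifier that separates the two classes.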
LLM Watermarking:
My research advances from distribution-based watermarking to linguistically adaptive and probability-balanced methods. STELA modulates watermark strength according to syntactic predictability, modeled by POS n-gram entropy, enabling model-free public detection. WaterMod generalizes this concept through modular token-rank partitioning that guarantees at least one high-probability token per class, supporting zero-bit and multi-bit watermarking with minimal quality loss.
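As a rough illustration of the rank-partitioning idea, the sketch below splits token ids by rank modulo m and adds a small bias to the residue class selected by a payload digit. This is a simplified rendering of the concept under my own assumptions, not the WaterMod implementation; the function name, the modulus m, and the bias delta are hypothetical choices.

    import torch

    def rank_modular_bias(logits: torch.Tensor, payload_digit: int,
                          m: int = 4, delta: float = 2.0) -> torch.Tensor:
        # Rank every token id by its logit (rank 0 = most probable token).
        ranks = torch.argsort(torch.argsort(logits, descending=True))
        # Tokens whose rank falls in the residue class chosen by the payload digit.
        mask = (ranks % m) == payload_digit
        # Each residue class contains exactly one of the top-m tokens, so the
        # biased class always offers a high-probability candidate.
        return logits + delta * mask.float()

    # At each decoding step the biased logits replace the raw ones before sampling.
    vocab_logits = torch.randn(32000)    # hypothetical 32k-token vocabulary
    biased = rank_modular_bias(vocab_logits, payload_digit=1)

A detector can then recover the embedded digit by testing which residue class is over-represented among the ranks of the observed tokens; in this sketch, fixing a single class corresponds to the zero-bit setting.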
Together, these studies form a coherent agenda for trustworthy and interpretable generative AI, bridging linguistic analysis and information-theoretic watermark design to meet emerging transparency and safety requirements.
Publications
† First author, * Co-first author
To Appear / Published
DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation
EACL 2026 (Main Conference), to appear.
Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated Code
Findings of EACL 2026, to appear. (* Equal contribution)
WaterMod: Modular Token-Rank Partitioning for Probability-Balanced LLM Watermarking
AAAI 2026 (Oral), to appear.
EnCur: Curriculum-Based In-Context Learning with Structural Encoding for Code Time Complexity Prediction
Expert Systems with Applications, Vol. 296, Article 129094, January 2026.
Detecting Code Paraphrased by Large Language Models using Coding Style Features
Engineering Applications of Artificial Intelligence, Vol. 162, December 2025.
Mondrian: A Framework for Logical Abstract (Re)Structuring
EMNLP 2025 (Main Conference), pp. 33663–33678.
TrapDoc: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents
Findings of EMNLP 2025, pp. 18881–18897.
Advanced Code Time Complexity Prediction Approach Using Contrastive Learning
Engineering Applications of Artificial Intelligence, Vol. 151, July 2025.
KatFishNet: Detecting LLM-Generated Korean Text through Linguistic Feature Analysis
ACL 2025 (Main Conference), pp. 21189–21222.
ConPrompt: Pre-training a Language Model with Machine-Generated Data for Implicit Hate Speech Detection
Findings of EMNLP 2023, pp. 10964–10980.
Contrastive Learning with Keyword-based Data Augmentation for Code Search and Code Question Answering
EACL 2023 (Main Conference), pp. 3609–3619.
Generalizable Implicit Hate Speech Detection using Contrastive Learning
COLING 2022, pp. 6667–6679.
Under Review
From Intuition to Expertise: Rubric-Based Cognitive Calibration for Human Detection of LLM-Generated Korean Text
Steering Language Models Before They Speak: Logit-Level Interventions
A Linguistics-Aware LLM Watermarking via Syntactic Predictability
Select then MixUp: Improving Out-of-Distribution Natural Language Code Search
Projects
- AI for Issue-Fact Mapping (2021–2022):
  Knowledge graph entity retrieval from unstructured text using topic modeling.
- Medical Text Mining (2022–2025):
  Clinical insight extraction from medical records using topic modeling.
- Human-AI Programming Lab (2023–2025):
  Code search, QA, and time complexity prediction for collaborative coding systems.
- Research on Effective Watermarking Techniques for AI-generated Codes (2025–):
  Watermarking techniques that intervene in the LLM code generation process, embedding watermarks while preserving code quality and functionality.
Professional Services
- Reviewer, ACL Rolling Review (2023–)
Skills
- Programming: Python, Java, C, C++, Bash
- ML Frameworks: PyTorch, scikit-learn, Hugging Face Transformers
- NLP: SpaCy, NLTK, KoNLPy, KiwiPiePy
- LLM APIs: OpenAI API, Gemini API, Ollama
- Data Analysis: Pandas, NumPy, SciPy
- Visualization: Matplotlib, Seaborn
- Version Control: Git
- Writing: LaTeX
- Languages: Korean (native), English (fluent)