About Me

Name: Shinwoo Park

Position: Ph.D. Candidate in Artificial Intelligence

Affiliation: Yonsei University, Seoul, South Korea

Expected Graduation: February 2026

Research Interests

My research focuses on ensuring the safety, transparency, and accountability of large language models (LLMs) through detection and watermarking techniques. I develop multilingual and multimodal systems that identify or trace LLM-generated content across natural language and source code domains.

Specifically, I explore two complementary directions: (1) linguistic and stylistic feature-based detection, which analyzes morphological, syntactic, and stylistic patterns to distinguish human- and LLM-generated text or code; and (2) LLM watermarking, which embeds imperceptible yet verifiable statistical or structural signals into generated outputs.
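
On the watermarking side, verification typically reduces to a simple hypothesis test over the generated tokens. The sketch below is a generic illustration of that idea, using a hash-seeded green-list split and a one-proportion z-test; it is not the specific detector from my papers, and the function names and the gamma parameter are illustrative assumptions.

```python
import hashlib
import math

def is_green(prev_token: str, token: str, gamma: float = 0.5) -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by the previous token.
    Illustrative hash-based split; real schemes typically seed a PRNG over the vocabulary."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return (digest[0] / 255.0) < gamma

def watermark_z_score(tokens: list[str], gamma: float = 0.5) -> float:
    """One-proportion z-test: how far the observed green-token count deviates
    from the gamma fraction expected in unwatermarked text."""
    n = len(tokens) - 1
    green = sum(is_green(p, t, gamma) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))

# A large positive z-score (e.g. > 4) indicates the statistical signal is present.
print(watermark_z_score("a simple example sentence to score".split()))
```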

My recent work includes KatFishNet, the first linguistic feature-based detector for Korean text; LPcodedec, a coding-style-driven detector for LLM-paraphrased code; STELA, a syntactic-predictability watermark enabling model-free detection; and WaterMod, a probability-balanced modular watermarking framework supporting multi-bit payloads.

Broadly, my goal is to build trustworthy generative systems that are interpretable, regulation-compliant, and resistant to misuse.

Research Summary

My research aims to promote responsible and verifiable AI generation by developing reliable methods for detecting and attributing LLM-generated text and code. I pursue two mutually reinforcing directions: (1) linguistic and stylistic feature-based detection that distinguishes human- from LLM-generated text and code, and (2) LLM watermarking that embeds verifiable provenance signals into generated outputs.

The resulting systems demonstrate strong multilingual (English, Korean) and multimodal (text + code) generalization, advancing interpretable and regulation-aligned AI provenance research.

Research Statement

My long-term research vision is to establish a unified framework for provenance-aware and interpretable AI that spans both language and programming modalities. To achieve this, I combine linguistic insight, statistical modeling, and watermark design to construct transparent interfaces between human communication and generative models.

Linguistic / Stylistic Feature-based Detection:
My work on KatFishNet introduces the first benchmark and detector for LLM-generated Korean text, leveraging word-spacing irregularities, POS n-gram diversity, and comma usage to expose cross-morphological differences between human and machine writing. Extending this idea to source code, LPcodedec identifies LLM-paraphrased code by quantifying coding-style features such as naming consistency, indentation regularity, and comment ratio.
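
As a rough illustration of the coding-style direction, the snippet below computes three of the feature families mentioned above (comment ratio, indentation regularity, naming consistency) for a piece of Python source. The exact features and classifier used in LPcodedec differ; the function name and thresholds here are illustrative only, and a downstream classifier trained on many samples would do the actual human-vs-LLM separation.

```python
import re
from statistics import pstdev

def coding_style_features(source: str) -> dict:
    """Extract simple coding-style features (illustrative, not the exact LPcodedec set)."""
    lines = source.splitlines()
    code_lines = [ln for ln in lines if ln.strip()]

    # Comment ratio: fraction of non-empty lines that are comments.
    comment_lines = [ln for ln in code_lines if ln.lstrip().startswith("#")]
    comment_ratio = len(comment_lines) / max(len(code_lines), 1)

    # Indentation regularity: low std-dev of space-indent widths -> more regular style.
    indents = [len(ln) - len(ln.lstrip(" ")) for ln in code_lines]
    indent_std = pstdev(indents) if len(indents) > 1 else 0.0

    # Naming consistency: share of identifier-like tokens that follow snake_case (crude proxy).
    identifiers = re.findall(r"\b[a-zA-Z_][a-zA-Z0-9_]*\b", source)
    snake = [t for t in identifiers if re.fullmatch(r"[a-z_][a-z0-9_]*", t)]
    naming_consistency = len(snake) / max(len(identifiers), 1)

    return {
        "comment_ratio": comment_ratio,
        "indent_std": indent_std,
        "naming_consistency": naming_consistency,
    }

# Example: features for a small snippet.
sample = "def add_numbers(a, b):\n    # sum two values\n    return a + b\n"
print(coding_style_features(sample))
```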

LLM Watermarking:
My research advances from distribution-based watermarking to linguistically adaptive and probability-balanced methods. STELA modulates watermark strength according to syntactic predictability modeled by POS n-gram entropy, enabling model-free public detection. WaterMod generalizes this concept through modular token-rank partitioning that guarantees at least one high-probability token per class, supporting zero-bit and multi-bit watermarking with minimal quality loss.
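
To make the rank-partitioning idea concrete, the following single-step sketch assigns each vocabulary token to a class by its probability rank modulo the number of classes, so every class contains at least one high-probability token, and then biases the class encoding the payload digit. It is a simplified illustration rather than the actual WaterMod implementation; `delta`, `num_classes`, and the function name are assumptions.

```python
import numpy as np

def modular_rank_partition_bias(logits: np.ndarray, payload_digit: int,
                                num_classes: int = 4, delta: float = 2.0) -> np.ndarray:
    """Illustrative single-step sketch of rank-modular watermark biasing.

    Tokens are ranked by model score; the token with rank r belongs to class
    r % num_classes, so each class holds one of the top-`num_classes` ranks.
    The class matching the payload digit receives a logit bonus `delta`.
    Simplified illustration, not the exact WaterMod algorithm.
    """
    order = np.argsort(-logits)               # token ids sorted by descending score
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(logits))     # rank of each token id
    selected = (ranks % num_classes) == payload_digit
    return logits + delta * selected

# Toy example: 8-token vocabulary, embedding the digit 2 at this decoding step.
rng = np.random.default_rng(0)
step_logits = rng.normal(size=8)
biased = modular_rank_partition_bias(step_logits, payload_digit=2)
print(np.argmax(biased))  # greedy decoding now favors a class-2 token
```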

Together, these studies form a coherent agenda for trustworthy and interpretable generative AI, bridging linguistic analysis and information-theoretic watermark design to meet emerging transparency and safety requirements.

Projects

Professional Services

Skills