CV

Name: Shinwoo Park

Current Position: Postdoctoral Researcher, University of Luxembourg

Affiliation: Software Verification and Validation (SVV) Group, Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg

PI: Prof. Domenico Bianculli

Degree: Ph.D. in Artificial Intelligence, Yonsei University

Graduation Date: February 2026

Appointments

Research Interests

My research focuses on ensuring the safety, transparency, and accountability of large language models (LLMs) through detection, watermarking, and domain-specific language understanding techniques. I develop multimodal and multilingual systems that identify, trace, structure, and analyze AI-generated or domain-specific content across natural language, source code, and financial/regulatory documents.

Specifically, I explore three complementary directions: (1) linguistic and stylistic feature–based detection, which analyzes morphological, syntactic, and stylistic patterns to distinguish human- and LLM-generated text or code; (2) LLM watermarking, which embeds imperceptible yet verifiable statistical or structural signals into generated outputs; and (3) NLP/NLU and Requirements Engineering for financial and regulatory domains, which aims to support the analysis, structuring, and verification of complex financial contracts and legal documents.

My recent work includes KatFishNet, the first linguistic feature–based detector for Korean text; LPcodedec, a coding-style-driven detector for paraphrased code; STELA, a syntactic-predictability watermark enabling model-free detection; and WaterMod, a probability-balanced modular watermarking framework supporting multi-bit payloads.

Broadly, my goal is to build trustworthy generative and language-understanding systems that are interpretable, regulation-compliant, and robust enough for high-stakes domains.

Research Summary

My research aims to promote responsible and verifiable AI generation by developing reliable methods for detecting, attributing, and interpreting LLM-generated or domain-specific text and code. I pursue three mutually reinforcing directions: linguistic and stylistic feature-based detection, LLM watermarking, and NLP/NLU and Requirements Engineering for financial and regulatory domains.

These systems demonstrate strong multilingual (English, Korean) and multimodal (text + code) generalization, advancing interpretable, regulation-aligned, and domain-aware AI research.

Research Statement

My long-term research vision is to establish a unified framework for provenance-aware, interpretable, and domain-grounded AI that spans language, programming, and high-stakes document analysis. To achieve this, I combine linguistic insight, statistical modeling, watermark design, and requirements-oriented analysis to construct transparent interfaces between human communication, domain knowledge, and generative models.

Linguistic / Stylistic Feature-based Detection:
My work on KatFishNet introduces the first benchmark and detector for LLM-generated Korean text, leveraging word-spacing irregularities, POS n-gram diversity, and comma usage to expose cross-morphological differences between human and machine writing. Extending this idea to source code, LPcodedec identifies LLM-paraphrased code by quantifying coding-style features such as naming consistency, indentation regularity, and comment ratio.

LLM Watermarking:
My research advances from distribution-based watermarking to linguistically adaptive and probability-balanced methods. STELA modulates watermark strength according to syntactic predictability modeled by POS n-gram entropy, enabling model-free public detection. WaterMod generalizes this concept through modular token-rank partitioning that guarantees at least one high-probability token per class, supporting zero-bit and multi-bit watermarking with minimal quality loss.

NLP/NLU and Requirements Engineering for Financial and Regulatory Domains:
My postdoctoral work at the University of Luxembourg extends this agenda toward trustworthy language technologies for high-stakes institutional settings. In collaboration with the European Investment Bank (EIB), I apply NLP, NLU, and Requirements Engineering techniques to problems in the financial and regulatory domains, including the analysis of financial contracts, legal documents, and regulatory texts.

Together, these studies form a coherent agenda for trustworthy and interpretable AI, bridging linguistic analysis, information-theoretic watermark design, and domain-specific language understanding to meet emerging transparency, safety, and regulatory requirements.

Publications


† First author    †* Co-first author (marked with *)

To Appear / Published

A Linguistics-Aware LLM Watermarking via Syntactic Predictability

Shinwoo Park †, Hyejin Park, Hyeseon Ahn, Yo-Sub Han

ACL 2026 (Main Conference), to appear.


DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation

Hyeseon Ahn, Shinwoo Park, Suyeon Woo, Yo-Sub Han

EACL 2026 (Main Conference), pp. 4922–4936.


Marking Code Without Breaking It: Code Watermarking for Detecting LLM-Generated Code

Jungin Kim †*, Shinwoo Park †*, Yo-Sub Han

Findings of EACL 2026, pp. 3990–4002.

* Equal contribution


WaterMod: Modular Token-Rank Partitioning for Probability-Balanced LLM Watermarking

Shinwoo Park †, Hyejin Park, Hyeseon Ahn, Yo-Sub Han

AAAI 2026 (Main Technical Track, Oral Presentation).


EnCur: Curriculum-Based In-Context Learning with Structural Encoding for Code Time Complexity Prediction

Joonghyuk Hahn, Aditi, SeungYeop Baik, Shinwoo Park, Sang-Ki Ko, Yo-Sub Han

Expert Systems with Applications, Vol. 296, 129094, January 2026.


Detecting Code Paraphrased by Large Language Models using Coding Style Features

Shinwoo Park †, Hyundong Jin, Jeong-Won Cha, Yo-Sub Han

Engineering Applications of Artificial Intelligence, Vol. 162, December 2025.


Mondrian: A Framework for Logical Abstract (Re)Structuring

Elizabeth Grace Orwig, Shinwoo Park, Hyundong Jin, Yo-Sub Han

EMNLP 2025 (Main Conference), pp. 33663–33678.


TrapDoc: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents

Hyundong Jin, Sicheol Sung, Shinwoo Park, SeungYeop Baik, Yo-Sub Han

Findings of EMNLP 2025, pp. 18881–18897.


Advanced Code Time Complexity Prediction Approach Using Contrastive Learning

Shinwoo Park †, Joonghyuk Hahn, Elizabeth Orwig, Sang-Ki Ko, Yo-Sub Han

Engineering Applications of Artificial Intelligence, Vol. 151, July 2025.


KatFishNet: Detecting LLM-Generated Korean Text through Linguistic Feature Analysis

Shinwoo Park †, Shubin Kim, Do-Kyung Kim, Yo-Sub Han

ACL 2025 (Main Conference), pp. 21189–21222.


ConPrompt: Pre-training a Language Model with Machine-Generated Data for Implicit Hate Speech Detection

Youngwook Kim, Shinwoo Park, Youngsoo Namgoong, Yo-Sub Han

Findings of EMNLP 2023, pp. 10964–10980.


Contrastive Learning with Keyword-based Data Augmentation for Code Search and Code Question Answering

Shinwoo Park †, Youngwook Kim, Yo-Sub Han

EACL 2023 (Main Conference), pp. 3609–3619.


Generalizable Implicit Hate Speech Detection using Contrastive Learning

Youngwook Kim, Shinwoo Park, Yo-Sub Han

COLING 2022, pp. 6667–6679.

Under Review

Linguistics-Aware Non-Distortionary LLM Watermarking

Shinwoo Park †, Hyejin Park, Hyeseon Ahn, Yo-Sub Han


Sequential Behavioral Watermarking for LLM Agents

Hyeseon Ahn, Shinwoo Park, Dongsu Kim, Yo-Sub Han


Scalable Semantic Code Clone Retrieval via Module-Level Graph Aggregation

Suyoung Park, Seongmin Kim, Shinwoo Park, Sang-Ki Ko


From Intuition to Calibrated Judgment: A Rubric-Based Expert-Panel Study of Human Detection of LLM-Generated Korean Text

Shinwoo Park † and Yo-Sub Han


Steering Language Models Before They Speak: Logit-Level Interventions

Hyeseon Ahn, Shinwoo Park, Hyundong Jin, Yo-Sub Han


Select then MixUp: Improving Out-of-Distribution Natural Language Code Search

Shinwoo Park † and Yo-Sub Han

Projects

Professional Services

Skills