CV
Name: Shinwoo Park
Position: Ph.D. Candidate in Artificial Intelligence
Affiliation: Yonsei University, Seoul, South Korea
Expected Graduation: February 2026
Research Interests
My research centers on enhancing the safety and transparency of large language models (LLMs) by developing robust methods for detecting LLM-generated content. I have explored both linguistic feature-based detection and watermarking techniques—two complementary approaches that enable accurate and imperceptible attribution of LLM outputs. These methods have been applied to a wide range of content types, including natural language (in both English and Korean) and programming code (in Python, C, C++, and Java), demonstrating strong multilingual and multimodal generalization. Through these experiences, I have cultivated a deep interest in AI safety and AI ethics, particularly in building trustworthy and accountable systems for generative AI.
Research Keywords
Core: LLM-Generated Content Detection, Linguistic Feature-Based Detection, LLM Watermarking
Related: AI Safety, AI Ethics, LLM Guardrails, Responsible AI
Research Summary
My research aims to ensure the safe and responsible use of LLMs by developing reliable methods for detecting LLM-generated content. I pursue two complementary directions: (1) linguistic feature-based detection, which analyzes statistical differences in text patterns between human- and LLM-authored content, and (2) LLM watermarking, which embeds imperceptible but detectable signals during the generation process for post hoc attribution.
I apply both methods to natural language and source code, validating across English, Korean, Python, C, C++, and Java. This multimodal and multilingual focus enhances interpretability, robustness, and real-world practicality.
Research Statement
My research is driven by the goal of ensuring the safe and responsible use of LLMs through interpretable techniques for content detection. As LLMs are deployed across more domains, risks such as misinformation, academic integrity violations, and plagiarism must be addressed.
Linguistic feature-based detection:
For text, I extract features such as word spacing, part-of-speech (POS) n-grams, and comma usage; for code, features such as naming-convention consistency and indentation style. These features then feed interpretable classifiers.
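As a rough illustration of this direction, the sketch below maps a document to a few hand-crafted, human-readable features and trains a linear classifier on them. It is a minimal sketch rather than the pipeline from my publications; the particular feature set, the NLTK/scikit-learn choices, and the function name stylometric_features are illustrative assumptions.

```python
# Minimal sketch: hand-crafted stylometric features fed to an interpretable
# linear classifier. Not the exact pipeline from my papers.
# May require nltk.download("punkt") and nltk.download("averaged_perceptron_tagger")
# (or their *_tab / *_eng variants in newer NLTK releases).
from collections import Counter

import nltk
from sklearn.linear_model import LogisticRegression


def stylometric_features(text: str) -> list[float]:
    """Map a document to a small, human-readable feature vector."""
    tokens = nltk.word_tokenize(text)
    pos_tags = [tag for _, tag in nltk.pos_tag(tokens)]
    pos_bigrams = Counter(zip(pos_tags, pos_tags[1:]))
    n_tokens = max(len(tokens), 1)
    return [
        text.count(",") / n_tokens,              # comma density
        sum(len(t) for t in tokens) / n_tokens,  # mean token length
        len(set(tokens)) / n_tokens,             # type-token ratio
        # share of the single most frequent POS bigram
        pos_bigrams.most_common(1)[0][1] / n_tokens if pos_bigrams else 0.0,
    ]


# Toy training step: texts and labels (1 = LLM-generated) are placeholders.
texts = ["An example human-written sentence.", "An example LLM-written sentence."]
labels = [0, 1]
clf = LogisticRegression().fit([stylometric_features(t) for t in texts], labels)
print(clf.coef_)  # each weight maps back to a named, inspectable feature
```

Because every coefficient of the linear model maps back to a named feature, the resulting detector remains easy to inspect and explain.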
LLM watermarking:
I design both zero-bit and multi-bit watermarking schemes that encode signals into generated outputs for origin identification and metadata recovery, while preserving fluency.
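For intuition, the sketch below follows the widely known "green-list" zero-bit scheme (in the spirit of Kirchenbauer et al., 2023) rather than my own published algorithm: at generation time a pseudorandom, previous-token-seeded subset of the vocabulary receives a logit bonus, and detection counts how many emitted tokens fall in that subset. The functions is_green and detect and the hash-based vocabulary split are illustrative assumptions.

```python
# Simplified zero-bit watermark detection in the spirit of green-list schemes.
# Generation (not shown) would add a logit bonus to "green" tokens; detection
# counts green tokens and computes a one-proportion z-score.
import hashlib
import math


def is_green(prev_token: str, token: str, gamma: float = 0.5) -> bool:
    """Pseudorandomly assign `token` to the green list, seeded by `prev_token`."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return (digest[0] / 255.0) < gamma


def detect(tokens: list[str], gamma: float = 0.5) -> float:
    """Return the z-score for 'fraction of green tokens exceeds gamma'."""
    n = len(tokens) - 1
    greens = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    return (greens - gamma * n) / math.sqrt(gamma * (1 - gamma) * n)


# Unwatermarked text should score near 0; text generated with the green-token
# logit bonus applied should yield a large positive z-score.
print(detect("the model writes a short example sentence here".split()))
```

Roughly speaking, a multi-bit variant lets the vocabulary split additionally depend on message bits, which is what allows metadata recovery at detection time.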
My contributions span natural language and source code, validated across English and Korean, supporting the broader vision of trustworthy and safe generative AI.
Research Projects
- AI for Issue-Fact Mapping (2021–2022): Knowledge graph entity retrieval from unstructured text using topic modeling.
- Medical Text Mining (2022–2024): Clinical insight extraction from medical records using topic modeling.
- Human-AI Programming Lab (2023–2025): Code search, QA, and time complexity prediction for collaborative coding systems.
- LLM for Math Categorization (2025): Difficulty/type prediction for math problems using LLMs.
Entrepreneurial Experience
- AI Campus Startup (Spring 2024): Founder and Project Lead; AI-generated graduation albums using Stable Diffusion.
- LLM-Driven Edu-Tech (Fall 2025): PI; CSAT (College Scholastic Ability Test) question generator using LLM pipelines.
Skills
- Programming: Python, Java, C, C++, Bash
- ML Frameworks: PyTorch, scikit-learn, Hugging Face
- NLP: spaCy, NLTK, KoNLPy
- LLM APIs: OpenAI API, Gemini API, Ollama
- Data Analysis: pandas, NumPy, SciPy
- Visualization: Matplotlib, Seaborn
- Version Control: Git
- Writing: LaTeX
- Languages: Korean (native), English (fluent)