Publications


O-VAD: Industrial Video Anomaly Detection through Object-Centric State Tracking and Reasoning

Mei Yuan, Qi Long, Qifeng Wu, Zhenyang Li, Yizhou Zhao, Lei Wang, Yang Liu, Min Xu

Submitted to ECCV 2026

ongoing

TL;DR: A VLM-based reasoning framework that moves video anomaly detection from pattern matching toward cognitive-level understanding. By simulating human spatial perception and representing scene dynamics through object-centric state tracking, the approach achieves state-of-the-art performance on industrial benchmarks and enables explainable anomaly detection for robotic laboratories.

ongoing

Time-STaR: Self-Taught Reasoners Augmented with Tools for Reliable Time Series Analysis

TL;DR: A reasoning-centric framework that repurposes LLMs for time series forecasting. By curating the Time-STaR-CoTT dataset and applying GRPO-style reinforcement learning, we enable models to identify causal relationships, detect regime changes, and generate interpretable forecasts, achieving state-of-the-art results across weather, traffic, and finance domains.

CHI 2026

Guiding Grasp and Growth: Multi-Modal Detection and Feedback on Accented Mispronunciation

Mei Yuan, Boting Li

Submitted to CHI 2026

TL;DR: An interactive text-vision-audio pronunciation coaching system that combines LLM-powered assessment, neural TTS exemplars, and viseme animations. In a validation study with 82 students, satisfaction exceeded 90%, and the system was adopted as an intelligent teaching assistant in a graduate-level English course at Peking University.

EMNLP 2023

DiffuVST: Narrating Fictional Scenes with Global-History-Guided Denoising Models

Shengguang Wu, Mei Yuan, Qi Su

Findings of EMNLP 2023

TL;DR: A non-autoregressive DiffusionLM-based storytelling model that generates coherent narratives around visual sequences. Trained with weighted conditions on global vision-language history, DiffuVST achieves superior performance with 10× faster inference than autoregressive models.

ACM MM 2022

A Person Re-identification Approach Focusing on the Occlusion Problem and Ranking Optimization

Wenkai Zheng, Mei Yuan

ACM Multimedia 2022 (MMSports Workshop)

TL;DR: A robust person re-identification method that addresses occlusion through a dual-branch Vision Transformer with jigsaw patch modules and ranking optimization. It achieved 98.38% mAP and 99.57% rank-1 accuracy, placing second in the DeepSportRadar Player REID Challenge.

Also see Google Scholar for the complete list.