王通 | Tong Wang
Senior Computer Vision Researcher, MT Lab, Meitu Inc.
I am a senior computer vision researcher at MT Lab, Meitu Inc. (HKEX: 01357). I received my M.S. in Computer Science from USC Viterbi and B.S. in Mathematics-Computer Science from UC San Diego, with a minor in Speculative Design.
My research revolves around multimodal deep learning and representation learning — how to learn and align representations across modalities for robust perception and controllable generation. This theme connects my work across several domains: in scene text editing, I design glyph-aware representations within diffusion models to achieve high-fidelity text replacement; in video editing, I extract hierarchical vision-language features to guide temporally consistent manipulation; and in audio-visual speech recognition, I fuse visual lip-movement and acoustic representations to improve recognition under noisy conditions.
news
| Date | News |
|---|---|
| Apr 30, 2026 | 🎉 Two papers accepted at ICML 2026: MiVE on reference-guided video editing and Self-Prompting DiT on open-vocabulary scene text editing! |
| Feb 27, 2025 | 🎉 Our paper GlyphMastero, a glyph encoder for high-fidelity scene text editing, was accepted at CVPR 2025. |
publications
patents
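| Date | Title | Patent No. | Inventor Rank |
|---|---|---|---|
| 2022-05-13 | Synthesized speech evaluation method, apparatus, device, and storage medium | CN114493232B | First inventor |
| 2024-01-26 | Novel-view image generation method, apparatus, device, and readable storage medium | CN117456031A | First inventor |
| 2022-07-29 | Speech data acquisition method, apparatus, electronic device, and storage medium | CN114822494A | First inventor |
| 2025-07-18 | Method, apparatus, readable storage medium, and program product for processing text in images | CN120339462A | Second inventor |
| 2023-03-21 | Voice cloning model generation method, apparatus, and electronic device | CN115831088A | Second inventor |
| 2022-10-25 | Speech synthesis method and apparatus, storage medium, and electronic apparatus | CN115240631A | Third inventor |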