王通 | Tong Wang

Senior Computer Vision Researcher, MT Lab, Meitu Inc.


I am a senior computer vision researcher at MT Lab, Meitu Inc. (HKEX: 01357). I received my M.S. in Computer Science from USC Viterbi and my B.S. in Mathematics-Computer Science from UC San Diego, with a minor in Speculative Design.

My research centers on multimodal deep learning and representation learning: how to learn and align representations across modalities for robust perception and controllable generation. This theme connects my work across several domains. In scene text editing, I design glyph-aware representations within diffusion models to achieve high-fidelity text replacement. In video editing, I extract hierarchical vision-language features to guide temporally consistent manipulation. In audio-visual speech recognition, I fuse visual lip-movement and acoustic representations to improve recognition in noisy conditions.

news

Apr 30, 2026 🎉 Two papers accepted at ICML 2026: MiVE on reference-guided video editing and Self-Prompting DiT on open-vocabulary scene text editing!
Feb 27, 2025 🎉 GlyphMastero, a glyph encoder for high-fidelity scene text editing, accepted at CVPR 2025!

publications

  1. MiVE: Multiscale Vision-Language Features for Reference-Guided Video Editing
    Tong Wang, Meng Zou, Chengjing Wu, Xiaochao Qu, Luoqi Liu, Xiaolin Hu, and Ting Liu
    In International Conference on Machine Learning, 2026
  2. Self-Prompting Diffusion Transformer for Open-Vocabulary Scene Text Editing via In-Context Learning
    Hongxi Li, Tong Wang, Chengjing Wu, Tianbao Liu, Jiangtao Yao, Xiaochao Qu, Xinxiao Wu, Luoqi Liu, and Ting Liu
    In International Conference on Machine Learning, 2026
  3. GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing
    Tong Wang, Ting Liu, Xiaochao Qu, Chengjing Wu, Luoqi Liu, and Xiaolin Hu
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
  4. DenseGAP: Graph-Structured Dense Correspondence Learning with Anchor Points
    Zhengfei Kuang, Jiaman Li, Mingming He, Tong Wang, and Yajie Zhao
    In the 26th International Conference on Pattern Recognition (ICPR, Oral), 2022

patents

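Date | Title | Patent No. | Inventor Rank
2022-05-13 | Synthetic speech evaluation method, apparatus, device, and storage medium | CN114493232B | First inventor
2024-01-26 | Novel-view image generation method, apparatus, device, and readable storage medium | CN117456031A | First inventor
2022-07-29 | Speech data acquisition method, apparatus, electronic device, and storage medium | CN114822494A | First inventor
2025-07-18 | Method and apparatus for processing text in images, readable storage medium, and program product | CN120339462A | Second inventor
2023-03-21 | Voice cloning model generation method, apparatus, and electronic device | CN115831088A | Second inventor
2022-10-25 | Speech synthesis method and apparatus, storage medium, and electronic apparatus | CN115240631A | Third inventor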