Cyberworlds 2025 Forum
Youth Scholars Forum
3D Measurement Data Processing, Analysis, and Generation
Speaker: Honghua Chen
Abstract: With the rapid development of 3D sensing and imaging technologies, the acquisition, processing, and understanding of high-quality 3D data are becoming fundamental to fields such as intelligent manufacturing, robotic perception, and digital twins. This report focuses on the theme of "Research on 3D Geometric Learning in Practical Applications" and systematically presents our progress and achievements in 3D measurement data processing, analysis, and generation. Specifically, the report will cover topics including 3D data denoising, registration, completion and reconstruction, feature computation, as well as 3D editing and generation methods incorporating priors from pre-trained models. In addition, the report will explore future research directions that integrate 3D geometry with physical motion attributes, aiming to achieve unified perception and generation of object shape, physical properties, and dynamic behaviors in complex scenes.
Biography: Honghua Chen is a Research Assistant Professor at Lingnan University, Hong Kong. He received his Ph.D. degree from Nanjing University of Aeronautics and Astronautics in 2022. From December 2020 to April 2022, he was a Research Assistant at The Chinese University of Hong Kong. From January 2023 to July 2025, he worked as a Research Fellow at MMLab@NTU and S-Lab, Nanyang Technological University, Singapore. His main research interests lie in computer-aided design and intelligent manufacturing, with a focus on high-precision 3D measurement data processing, analysis, generation, and assembly quality control. He has published multiple papers in top-tier international conferences and journals (SIGGRAPH & SIGGRAPH Asia, CVPR, ICCV, TPAMI, IJCV, TVCG, CAD, etc.) and holds more than ten granted national invention patents. He was awarded the CCF-CAD&CG Doctoral Dissertation Incentive Award (2024) and the First Prize of the China Invention Association Invention and Innovation Award.
Towards Cognitive Trustworthiness: Mechanisms and Governance of Hallucinations in Large Language Models
Speaker: Xiang Chen
Abstract: Large Language Models (LLMs), while transformative, are hindered by "hallucinations"—factual errors and fabrications—that undermine their cognitive trustworthiness and limit their application in critical domains. This report dissects the underlying mechanisms of hallucination by drawing on interpretability research into "knowledge neurons" and "neural circuits." Based on this analysis, we introduce a governance framework structured around prevention (e.g., alignment via reinforcement learning), mitigation (e.g., retrieval augmentation and decoding interventions), and measurement (e.g., standardized evaluation). The report concludes by outlining future research paths toward building more reliable and trustworthy LLMs.
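As a concrete illustration of the mitigation stage mentioned above, the minimal sketch below shows a generic retrieval-augmented generation loop. It is an editorial illustration only, not the speaker's implementation: the toy retriever, the prompt format, and the `generate` callable are hypothetical placeholders.

```python
# Illustrative sketch of retrieval augmentation as a hallucination-mitigation step.
# `retrieve`, `build_prompt`, and the `generate` callable are hypothetical placeholders.

from typing import Callable, List

def retrieve(query: str, corpus: List[str], k: int = 3) -> List[str]:
    """Toy lexical retriever: rank corpus passages by word overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: len(q_words & set(p.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query: str, passages: List[str]) -> str:
    """Ground the model by prepending retrieved evidence and restricting the answer to it."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the evidence below; reply 'unknown' if the evidence is insufficient.\n"
        f"Evidence:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

def answer(query: str, corpus: List[str], generate: Callable[[str], str]) -> str:
    """`generate` stands in for any LLM call; grounding the prompt reduces fabricated facts."""
    passages = retrieve(query, corpus)
    return generate(build_prompt(query, passages))
```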
Biography: Xiang Chen is a Professor at the College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics (NUAA), and a member of the MIIT Key Laboratory of Pattern Analysis and Machine Intelligence. He received his Ph.D. from Zhejiang University, and his research focuses on Natural Language Processing, Machine Learning, and Knowledge Engineering. Prof. Chen has published over 40 papers in premier AI venues (e.g., NeurIPS, ICLR, ACL, IJCAI), with his work amassing over 3,200 Google Scholar citations. Four of his papers were named among the most influential by Paper Digest. He was selected for the inaugural Jiangsu "U35 Nurturing Program" for Young Science and Technology Talents. He actively serves the academic community as a Senior Program Committee member or Area Chair for top conferences such as EMNLP, IJCAI, ACM MM, ICLR, and NeurIPS. Additionally, he has presided over projects funded by the National Natural Science Foundation of China, the Jiangsu Provincial Natural Science Foundation, the CCF–Didi Gaia Scholar Research Fund, and the CAAI–Huawei MindSpore Research Fund, among others.
Digital Twin–Enabled Precision Diagnosis and Treatment for Pan-Vascular Diseases
Speaker: Weixin Si
Abstract: Pan-vascular diseases have become the leading cause of mortality in China. Although vascular intervention techniques can provide effective minimally invasive treatment, reliable risk assessment and precise planning of stent implantation surgeries still heavily rely on doctors’ experience, leading to challenges such as insufficient accuracy in functional analysis of target lesions, difficulties in interventional procedural planning, lack of depth-aware image guidance, and inaccurate prognosis evaluation. Our team aims to achieve precise functional evaluation of pan-vascular disease targets and accurate planning, navigation, and prognosis of interventional therapies. We have conducted a series of studies addressing two key issues: efficient hybrid modeling of multi-scale, multi-dimensional digital twins for pan-vascular target regions, and the rapid evolutionary mechanisms of personalized interventional strategies for pan-vascular diseases.
Biography: Weixin Si is an Associate Professor at the School of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, and serves as the Associate Director of the Center for Evidence-Based Medicine and Artificial Intelligence at the Institute of Artificial Intelligence. He is a recipient of the Shenzhen Excellent Young Scientist Fund and a Senior Member of both IEEE and the China Computer Federation (CCF). His research focuses on computer-assisted surgery and digital twin technology. In the past three years, he has published over 20 first-author or corresponding-author papers in high-impact journals and conferences such as The Lancet sub-journals, IEEE Transactions, and CCF-A ranked venues. Two of his papers were selected for oral presentation at MICCAI, a top conference in computer-assisted medicine. He holds 11 authorized national invention patents and has been cited over 1,900 times. He has led more than 10 national, provincial, and municipal research projects, including grants from the NSFC (General and Young Scientists Programs), Guangdong Province (Young Scholars and General Programs), and the Shenzhen Excellent Young Scientists Program and Key Projects. He serves as an executive committee member for six national professional committees under organizations such as the China Computer Federation, Chinese Institute of Electronics, and Chinese Hospital Association. Additionally, he is an editorial board member or youth editor for four journals, including a sub-journal of The Innovation, and serves as an area chair for IPCAI 2026. His work has been recognized with a 2024 Highly Cited Paper award in the journal Visual Computing for Industry, Biomedicine, and Art (IF: 6.0) and the Best Paper Award at the Asia Simulation Conference.
Foundation Models for Visual Synthesis, Restoration, and Healthcare
Speaker: Lei Zhu
Abstract: Owing to their generalization capabilities across a broad range of general tasks, foundation models can serve as the base or building blocks for creating more specialized applications with superior performance. In this talk, we will first present our proposed image and video synthesis algorithms based on foundation models. Then, we will introduce our LLM-based image restoration and segmentation methods. Finally, we will present medical image analysis works based on diffusion models, Mamba models, and LLMs.
Biography: Lei Zhu is currently an Assistant Professor at the ROAS Thrust, HKUST(GZ), and an Affiliated Assistant Professor in ECE, HKUST. Before that, he was a postdoctoral researcher at the University of Cambridge, The Hong Kong Polytechnic University, and The Chinese University of Hong Kong. He received his Ph.D. degree from the Department of Computer Science and Engineering, The Chinese University of Hong Kong, in 2017. His research interest is to develop AI-powered image perception theory and algorithms for outdoor vision systems, multimedia, and healthcare. His work has been published in top-tier conferences and journals, e.g., IEEE TPAMI, IJCV, NeurIPS, CVPR, ICCV, IEEE TMI, MICCAI, ECCV, AAAI, IJCAI, ACM MM, IEEE TIP, IEEE TNNLS, IEEE TMM, IEEE TCSVT, IEEE TBME, IEEE TVCG, IEEE TCyb, and Medical Image Analysis. He has served as a program chair of ACM SIGGRAPH VRCAI 2025, 2024, and 2022, an organization chair of Computer Graphics International (CGI) 2023, a senior program committee (SPC) member of AAAI 2026 and IJCAI 2025, an area chair of ACM MM 2025, 2022, and 2021, CVPR 2026 and 2025, ICLR 2026 and 2025, ECCV 2024, MICCAI 2025, 2024, and 2023, and MIUA 2022, a session chair of ACM Multimedia 2022 and CGI 2022, and an associate editor of The Visual Computer. His Google Scholar citations exceed 9,400. His team has received best paper awards four times and has won many international challenges. He was selected among the "World's Top 2% Scientists" by Stanford for research excellence and impact in 2022, 2023, 2024, and 2025. More than ten of his papers have been selected as Oral/Highlight/Spotlight papers at CVPR, MICCAI, ACM MM, and ICRA. He received an Adobe Research Gift Grant in 2025.
Towards Automated Orthodontics: Data Generation and Automatic Tooth Alignment on 3D Dental Models
Speaker: Jiajia Dai
Abstract: The core of invisible orthodontic treatment planning lies in predicting the target tooth positions based on the initial state of a patient's intraoral teeth and mapping the complete movement sequence of the teeth during the orthodontic process. Traditional methods rely heavily on clinicians' expertise and face challenges due to data scarcity. To address the issue of predicting orthodontic target positions, this paper proposes a diffusion model-based network for tooth target position prediction. By learning the distribution of tooth transformation matrices from limited clinical data, the network automatically generates poses from random initial states to normal occlusion, overcoming the generalization limitations of existing methods caused by insufficient data. To tackle the challenge of obtaining intermediate pose sequences for 3D dental models during orthodontic treatment, a dynamic sequence reconstruction framework guided by animated videos is proposed. This framework integrates the geometric features of the initial dental model with 2D projection features from videos, employs a diffusion model to learn the evolution patterns of tooth poses, and incorporates weakly supervised cross-modal constraints through differentiable rendering to generate biomechanically plausible tooth movement sequences. These two approaches advance intelligent orthodontic technology from both static target pose prediction and dynamic process modeling perspectives. Experiments demonstrate that the proposed methods outperform existing solutions in terms of clinical rationality and dynamic consistency of the generated results, providing a comprehensive solution for intelligent invisible orthodontic planning.
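For readers unfamiliar with diffusion-based pose generation, the sketch below illustrates a generic DDPM-style reverse process over a flattened vector of tooth transformation parameters. It is a simplified assumption-laden illustration, not the talk's actual network: the `eps_model` noise predictor, its conditioning on the initial dentition, and the noise schedule are placeholders.

```python
# Illustrative sketch: generic DDPM ancestral sampling over per-tooth pose parameters.
# `eps_model(x, t, cond)` is a hypothetical trained noise predictor conditioned on the
# initial dentition; the linear beta schedule is a common default, not the talk's choice.

import numpy as np

def sample_poses(eps_model, cond, dim, T=1000, seed=0):
    """Draw a pose vector by iteratively denoising Gaussian noise (standard DDPM update)."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)           # noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(dim)                 # start from pure noise
    for t in reversed(range(T)):
        eps = eps_model(x, t, cond)              # predicted noise, conditioned on the initial teeth
        coef = (1.0 - alphas[t]) / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(dim) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise     # posterior sample (variance = beta_t)
    return x                                     # denoised pose parameters (e.g. per-tooth transforms)
```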
Biography: Jiajia Dai is a postdoctoral researcher at Tsinghua University, having earned her Ph.D. from Nanjing University of Aeronautics and Astronautics. Her research focuses on computer vision and computer graphics, primarily encompassing digital geometry processing and machine vision localization. She has published multiple papers in prominent journals, including IEEE Transactions on Intelligent Transportation Systems (T-ITS), IEEE Transactions on Instrumentation and Measurement (TIM), and Computer-Aided Design (CAD).
Meta-Learning-Based 3D Human Mesh Reconstruction from Images
Speaker: Yongwei Nie
Abstract: 3D human mesh reconstruction from single images plays a crucial role in virtual avatars, virtual try-on, and sport-refereeing applications. Existing solutions fall into two paradigms: direct regression and test-time optimization. The latter refines the parameters of a regression network for every test sample by exploiting the detected 2D joint locations. However, when the training and test data are drawn from markedly different distributions, the initial parameters supplied by the regressor no longer constitute a good starting point, and the final mesh accuracy drops. To tackle this problem, we introduce a meta-learning-powered test-time optimization framework that contains two synergistic networks: (i) a meta-learner that learns how to transfer optimization knowledge across samples and produces sample-specific initial parameters, and (ii) an optimization network that performs fast parameter fine-tuning. Extensive experiments show that the proposed method preserves computational efficiency while significantly improving reconstruction accuracy under cross-domain scenarios.
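To make the test-time optimization idea concrete, here is a minimal hypothetical sketch in which sample-specific initial parameters (standing in for the meta-learner's output) are refined against detected 2D joints via a reprojection loss. The `regressor`, `project`, and `joints_2d` names are illustrative placeholders, not the talk's actual components.

```python
# Sketch of test-time optimization from a learned, sample-specific initialization.
# Assumptions: `regressor` maps image features to body-model parameters, `project`
# differentiably maps those parameters to 2D joint positions, `joints_2d` are detections.

import torch

def test_time_optimize(regressor, project, image_feat, joints_2d, steps=20, lr=1e-2):
    """Refine body-model parameters so reprojected 3D joints match the 2D detections."""
    theta = regressor(image_feat).detach().requires_grad_(True)   # sample-specific initialization
    opt = torch.optim.Adam([theta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((project(theta) - joints_2d) ** 2).mean()          # 2D reprojection error
        loss.backward()
        opt.step()                                                 # fast per-sample fine-tuning
    return theta.detach()                                          # refined mesh parameters
```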
Biography: Yongwei Nie is currently an Associate Professor and Ph.D. supervisor at South China University of Technology, and an Associate Editor of The Visual Computer. He has presided over more than ten projects including those funded by the National Natural Science Foundation of China, Guangdong Provincial Natural Science Foundation, and Guangzhou Science and Technology Plan. He received the Second Prize of Hubei Provincial Natural Science Award, the First Prize of Hubei Provincial Excellent Academic Paper Award, and the Best Paper Award at CCF CAD/CG 2022. He has published three papers in top-tier computer graphics journals (ACM Transactions on Graphics) and conferences (SIGGRAPH, SIGGRAPH Asia), and over 30 papers in top journals such as IJCV, TVCG, and TIP, and top conferences including CVPR, ICCV, NeurIPS, and ICLR, among which 20+ are first-authored or corresponding-authored CCF-A papers. He has filed 20 patent applications and obtained 13 invention patents. His research spans computer graphics and computer vision.
Outstanding Doctoral Students Forum

Advancing Open World 3D Point Cloud Understanding with Uncertainty and Structural Awareness
Speaker: Jinfeng Xu
Abstract: 3D point cloud understanding is essential for applications such as robotics, autonomous driving, and immersive environments. However, most existing methods operate under the closed-set assumption, which restricts their ability to generalize in real-world scenarios filled with unknown objects and evolving contexts. To address this limitation, our research explores open world 3D point cloud understanding, aiming to recognize known classes while effectively discovering and adapting to unknown categories.
At the object level, we introduce a saliency-aware structural perception approach that decomposes objects into salient and non-salient parts. This structural separation not only strengthens the representation of known categories but also enables the synthesis of pseudo-unknowns, thereby enhancing open set recognition. Beyond individual objects, at the scene level we propose a probability-driven framework that leverages uncertainty estimation to uncover novel geometric patterns in large-scale environments and incorporates incremental knowledge distillation to continuously assimilate new classes while mitigating catastrophic forgetting.
Although designed at different granularities, both approaches share the common goal of moving beyond the closed-set assumption. Together, they demonstrate that combining structural perception at the object level with probabilistic modeling at the scene level provides a robust pathway for advancing open world 3D point cloud understanding. Extensive experiments on benchmarks including ShapeNet, ModelNet, S3DIS, and ScanNet validate the effectiveness of this line of research, highlighting its potential for building resilient 3D perception systems in open environments.
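As a rough illustration of the probability-driven idea at the scene level, the sketch below flags candidate unknown points by thresholding the predictive entropy of per-point class probabilities. The plain softmax entropy and the fixed threshold are simplifying assumptions for exposition, not the presented framework.

```python
# Illustrative sketch: uncertainty-based discovery of candidate "unknown" points.
# `logits` has shape (N, C): per-point scores over C known classes; the 1.0-nat
# threshold is an arbitrary example value.

import numpy as np

def discover_unknowns(logits, entropy_thresh=1.0):
    """Return a boolean mask marking points whose predictive entropy exceeds the threshold."""
    z = logits - logits.max(axis=1, keepdims=True)           # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)   # predictive uncertainty per point
    return entropy > entropy_thresh                           # high uncertainty -> candidate novel class
```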
Biography: Jinfeng Xu is currently pursuing a Ph.D. degree in Computer Science at Huazhong University of Science and Technology (HUST). His research interests lie in 3D vision, scene understanding, and open world learning. He has published as first author in top-tier conferences, including AAAI and CVPR (two papers), and has also co-authored papers in prestigious journals and conferences such as ACM Transactions on Graphics (TOG), IEEE Transactions on Visualization and Computer Graphics (TVCG), and ACM Multimedia (MM).

Representation and Low-Level Vision for Multi-/Hyperspectral Remote Sensing
Speaker: Wuzhou Quan
Abstract: Multi- and hyperspectral remote sensing imagery provides Earth observation with extremely rich spectral dimensions and spatial detail. However, its high dimensionality and cross-modal nature also present fundamental challenges for representation learning and low-level vision: on the one hand, how to establish consistent representations across the spectral and spatial domains so as to avoid distortion and bias; on the other hand, how to achieve robust and universal understanding given the prevalence of heterogeneous structures and uncertainty. Focusing on this core issue, this paper explores a holistic approach spanning representations and learning mechanisms, emphasizing the integration of information in high-dimensional vision, the resolution of heterogeneity, and the introduction of cognitively driven uncertainty modeling, thereby advancing multi- and hyperspectral remote sensing toward more reliable and intelligent analysis.
Biography: Wuzhou Quan is currently a Ph.D. student at the School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics. His research focuses on pattern recognition, computer vision, remote sensing imagery, and infrared image processing. His work has been published in journals such as TGRS and TOMM. His recent research focuses on the representation of multispectral and hyperspectral data, as well as low-level vision tasks such as pan-sharpening and classification, aiming to improve the effectiveness and reliability of optical remote sensing image analysis.

Research on Empowering Technologies of Remote Sensing Image Visual-Language Models for Smart Cities
Speaker: Zhigang Yang
Abstract: In the context of smart city construction, remote sensing images, as important geospatial data sources, play a significant role in urban planning and management through their efficient interpretation and application. This research proposes a multi-task integrated remote sensing image visual-language model, which aims to raise the level of intelligence of remote sensing imagery in urban applications by integrating visual and language information. The research covers the following key tasks:
1. Pixel-level tasks: a remote sensing image segmentation method for urban elements (including roads, buildings, traffic targets, etc.), together with a newly introduced directional segmentation task. Guided by textual prompt information, this method achieves fine-grained segmentation of specific targets in remote sensing images, meeting the need for personalized target identification and localization.
2. Semantic-level tasks: semantic description generation and visual question answering for remote sensing images. The description generation task uses natural language generation technology to automatically produce accurate descriptions of remote sensing image content. Approaching the problem from the perspectives of generation and reasoning, these two tasks comprehensively improve the model's semantic understanding of remote sensing images and provide a new technical approach for the intelligent interpretation of remote sensing data.
3. Temporal tasks: the model's application scope is further expanded through temporal analysis, which unifies various change detection tasks and provides strong support for dynamic urban monitoring and management.
Biography: Zhigang Yang obtained his master's degree in Computer Science and Technology (Big Data School) from Taiyuan University of Technology in 2023. His research interests mainly focus on the field of remote sensing image visual-language models. His research results have been published in international journals such as IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS) and IEEE Geoscience and Remote Sensing Letters (IEEE GRSL). In addition, he serves as a reviewer for IEEE TGRS, JSTARS, GRSL, International Journal of Digital Earth, and the AAAI conference.

Video Temporal Understanding in Complex Scenes: Detection, Retrieval, and Grounding
Speaker: Min Yang, Ph.D. Candidate, Multimedia Computing Group, Nanjing University
Abstract: How to comprehend the temporal sequence of events in videos has always been a crucial issue in the field of video understanding. With the further development of online media and video recording equipment, video scenes have become more diverse and new user demands are constantly emerging. Video temporal understanding models are evolving toward higher efficiency, faster processing, and the ability to comprehend more complex scenes and tasks. Here, we focus on three fundamental tasks in video temporal understanding: detection, retrieval, and grounding. I will introduce the related work within our group, which ranges from small models deployed on mobile devices to large multimodal models with real-time response. We have put a lot of effort into efficiently deploying practical end-to-end video temporal understanding models, and we hope to inspire further exploration of video temporal understanding in subsequent work.
Biography: Min Yang is a Ph.D. candidate in the Multimedia Computing Group at Nanjing University. He received his B.Eng. degree from the School of Software, Jilin University in 2020, and subsequently entered the Ph.D. program at Nanjing University under the supervision of Prof. Limin Wang. His research interests include temporal action detection, video retrieval, and multimodal large models. He has published several papers at leading computer vision conferences (CVPR, ICCV) and has served as a reviewer for multiple top-tier conferences and journals.

Research on Geometry-Prompt-Driven Adaptive Segmentation with SAM
Speaker: Xueyu Liu
Abstract: Image segmentation faces significant challenges in practical applications: reliance on large-scale pixel-level annotations drives up labeling costs, while modality diversity and limited generalization capability further hinder deployment. The Segment Anything Model (SAM) simplifies the segmentation process through geometric prompts, yet its interactive mode constrains automated deployment. Moreover, generating effective prompts to enhance segmentation accuracy remains a critical issue. To address these issues, this study proposes a geometry-prompt-driven adaptive segmentation method based on SAM, introducing two optimization strategies: (1) a cyclic dual-space prompt engineering approach that jointly optimizes prompt points in the physical and feature spaces; and (2) a dual-space prompt engineering approach with automated optimization of heterogeneous graph structures, which organizes prompt points by integrating physical and feature relationships. The proposed method enables automatic generation of high-quality prompts, reduces manual intervention, and improves both the performance and adaptability of SAM in segmentation tasks.
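For context, the sketch below shows the basic point-prompt interface of the publicly released SAM that such methods build on; the cyclic dual-space and graph-based prompt optimization described above is not reproduced here. The coarse `heatmap` used to pick prompt points automatically is a hypothetical stand-in for any prompt-scoring mechanism.

```python
# Sketch: automatic point prompts fed to the official `segment_anything` package.
# Assumptions: `heatmap` is a hypothetical per-pixel prompt score; a ViT-B checkpoint path
# is supplied by the caller; `image` is an HxWx3 uint8 RGB array.

import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def segment_with_auto_prompts(image, heatmap, checkpoint, k=3):
    """Pick the k highest-scoring pixels as positive point prompts and run SAM once."""
    ys, xs = np.unravel_index(np.argsort(heatmap.ravel())[-k:], heatmap.shape)
    points = np.stack([xs, ys], axis=1).astype(np.float32)   # SAM expects (x, y) coordinates
    labels = np.ones(k, dtype=np.int32)                       # 1 = positive (foreground) prompt

    sam = sam_model_registry["vit_b"](checkpoint=checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(image)                                 # embed the image once
    masks, scores, _ = predictor.predict(point_coords=points, point_labels=labels,
                                         multimask_output=False)
    return masks[0], scores[0]                                 # single mask and its predicted IoU
```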
Biography: Xueyu Liu is currently a Tenure-track Associate Professor and Master’s Supervisor at the College of Artificial Intelligence, Taiyuan University of Technology, China. He has published more than ten papers in SCI-indexed journals and prestigious conferences such as CVPR and MedIA. He is a member of the Institute of Electrical and Electronics Engineers (IEEE), the China Computer Federation (CCF), the China Society of Image and Graphics (CSIG), and the Chinese Association for Artificial Intelligence (CAAI). Xueyu Liu’s research focuses on computer vision and medical data analytics. His interests include foundation models, meta-learning, weakly supervised learning, and few-shot learning in computer vision, as well as pathological image analysis and multimodal data analysis in medical data analytics.