Cyberworlds 2025 Forum
Youth Scholars Forum
3D Measurement Data Processing, Analysis, and Generation
Speaker: Honghua Chen
Abstract: With the rapid development of 3D sensing and imaging technologies, the acquisition, processing, and understanding of high-quality 3D data are becoming fundamental to fields such as intelligent manufacturing, robotic perception, and digital twins. This report focuses on the theme of "Research on 3D Geometric Learning in Practical Applications" and systematically presents our progress and achievements in 3D measurement data processing, analysis, and generation. Specifically, the report will cover topics including 3D data denoising, registration, completion and reconstruction, feature computation, as well as 3D editing and generation methods incorporating priors from pre-trained models. In addition, the report will explore future research directions that integrate 3D geometry with physical motion attributes, aiming to achieve unified perception and generation of object shape, physical properties, and dynamic behaviors in complex scenes.
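To make one of these topics concrete for readers outside the area: registration is the problem of aligning two scans of the same object into a common coordinate frame. The sketch below is a textbook baseline (point-to-point ICP via the Open3D library) run on synthetic data, offered purely as an illustration and not as the speaker's method:

import numpy as np
import open3d as o3d

# Synthetic example: a random cloud and a rigidly shifted copy of it.
pts = np.random.rand(500, 3)
source = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts))
target = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts + [0.05, 0.0, 0.0]))

# Point-to-point ICP: estimate the rigid transform aligning source to target.
result = o3d.pipelines.registration.registration_icp(
    source, target, max_correspondence_distance=0.1,
    estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
print(result.transformation)  # ~identity rotation plus a 0.05 x-translation

In real measurement data, noise and partial overlap are what make registration inseparable from the denoising and completion problems the report covers.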
Biography: Honghua Chen is a Research Assistant Professor at Lingnan University, Hong Kong. He received his Ph.D. degree from Nanjing University of Aeronautics and Astronautics in 2022. From December 2020 to April 2022, he was a Research Assistant at The Chinese University of Hong Kong. From January 2023 to July 2025, he worked as a Research Fellow at MMLab@NTU and S-Lab, Nanyang Technological University, Singapore. His main research interests lie in computer-aided design and intelligent manufacturing, with a focus on high-precision 3D measurement data processing, analysis, generation, and assembly quality control. He has published multiple papers in top-tier international conferences and journals (SIGGRAPH & SIGGRAPH Asia, CVPR, ICCV, TPAMI, IJCV, TVCG, CAD, etc.) and holds more than ten granted national invention patents. He was awarded the CCF-CAD&CG Doctoral Dissertation Incentive Award (2024) and the First Prize of the China Invention Association Invention and Innovation Award.
Towards Cognitive Trustworthiness: Mechanisms and Governance of Hallucinations in Large Language Models
Speaker: Xiang Chen
Abstract: Large Language Models (LLMs), while transformative, are hindered by "hallucinations" (factual errors and fabrications) that undermine their cognitive trustworthiness and limit applications in critical domains. This report dissects the underlying mechanisms of hallucination by drawing on interpretability research into "knowledge neurons" and "neural circuits." Based on this analysis, we introduce a governance framework structured around prevention (e.g., alignment via reinforcement learning), mitigation (e.g., retrieval augmentation and decoding interventions), and measurement (e.g., standardized evaluation). The report concludes by outlining future research paths toward building more reliable and trustworthy LLMs.
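The mitigation side can be made concrete with a minimal retrieval-augmentation sketch. This is an illustrative toy, not the framework presented in the talk: the word-overlap retriever and the prompt template below are stand-ins for a real dense retriever and an instruction-tuned LLM.

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    # Toy lexical retriever: rank passages by word overlap with the query.
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: len(q & set(p.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    # Grounding the answer in retrieved evidence discourages fabrication.
    context = "\n".join(f"- {p}" for p in passages)
    return (f"Answer using ONLY the context below; reply 'unknown' if it is "
            f"not covered.\nContext:\n{context}\nQuestion: {query}\nAnswer:")

corpus = ["The Eiffel Tower is in Paris.", "Mount Fuji is in Japan."]
query = "Where is the Eiffel Tower?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)  # feed this to an LLM of your choice

The key design point is that the model is instructed to abstain when the retrieved context does not cover the question, trading some coverage for trustworthiness.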
Biography: Xiang Chen is a Professor at the School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, China.
Outstanding Doctoral Students Forum

Advancing Open World 3D Point Cloud Understanding with Uncertainty and Structural Awareness
Speaker: Jinfeng Xu
Abstract: 3D point cloud understanding is essential for applications such as robotics, autonomous driving, and immersive environments. However, most existing methods operate under the closed-set assumption, which restricts their ability to generalize in real-world scenarios filled with unknown objects and evolving contexts. To address this limitation, our research explores open world 3D point cloud understanding, aiming to recognize known classes while effectively discovering and adapting to unknown categories.
At the object level, we introduce a saliency-aware structural perception approach that decomposes objects into salient and non-salient parts. This structural separation not only strengthens the representation of known categories but also enables the synthesis of pseudo-unknowns, thereby enhancing open set recognition. Beyond individual objects, at the scene level we propose a probability-driven framework that leverages uncertainty estimation to uncover novel geometric patterns in large-scale environments and incorporates incremental knowledge distillation to continuously assimilate new classes while mitigating catastrophic forgetting.
Although designed at different granularities, both approaches share the common goal of moving beyond the closed-set assumption. Together, they demonstrate that combining structural perception at the object level with probabilistic modeling at the scene level provides a robust pathway for advancing open world 3D point cloud understanding. Extensive experiments on benchmarks including ShapeNet, ModelNet, S3DIS, and ScanNet validate the effectiveness of this line of research, highlighting its potential for building resilient 3D perception systems in open environments.
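To make the scene-level, probability-driven idea concrete: one common uncertainty signal for discovering novel geometry is the predictive entropy of per-point class probabilities, with unusually uncertain points treated as candidates for unknown categories. The sketch below is a generic illustration under our own simplifying assumptions, not necessarily the exact criterion used in this work:

import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    # probs: (N, C) per-point class probabilities from a segmentation model.
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def flag_unknown(probs: np.ndarray, quantile: float = 0.95) -> np.ndarray:
    # Points whose predictive entropy is unusually high become
    # candidate novel-class points for subsequent discovery.
    h = predictive_entropy(probs)
    return h > np.quantile(h, quantile)

rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 13))  # e.g. 13 known S3DIS classes
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(flag_unknown(probs).sum(), "points flagged as potentially unknown")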
Biography: Jinfeng Xu is currently pursuing a Ph.D. degree in Computer Science at Huazhong University of Science and Technology (HUST). His research interests lie in 3D vision, scene understanding, and open world learning. He has published first-author papers at top-tier conferences, including AAAI and CVPR (two papers), and has co-authored papers in prestigious journals and conferences such as ACM Transactions on Graphics (TOG), IEEE Transactions on Visualization and Computer Graphics (TVCG), and ACM Multimedia (MM).

Representation and Low-Level Vision for Multi-/Hyperspectral Remote Sensing
Speaker: Wuzhou Quan
Abstract: Multi- and hyperspectral remote sensing imagery provides Earth observation with extremely rich spectral dimensions and spatial detail. However, its high dimensionality and cross-modal nature also pose fundamental challenges for representation learning and low-level vision: on the one hand, how to establish consistent representations across the spectral and spatial domains so as to avoid distortion and bias; on the other hand, how to achieve robust and general understanding given the prevalence of heterogeneous structures and uncertainty. Focusing on these core issues, this report explores a holistic approach spanning representations and learning mechanisms, emphasizing the integration of information in high-dimensional vision, the resolution of heterogeneity, and the introduction of cognition-driven uncertainty modeling, thereby advancing multi- and hyperspectral remote sensing toward more reliable and intelligent analysis.
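As a concrete point of reference for the spectral-spatial consistency challenge, consider pan-sharpening, one of the low-level vision tasks this line of work addresses (see the biography below). The classical Brovey transform sketched here, an illustrative baseline and not the speaker's method, fuses a high-resolution panchromatic band with upsampled multispectral bands; its naive intensity substitution is exactly where spectral distortion can arise:

import numpy as np

def brovey_pansharpen(ms: np.ndarray, pan: np.ndarray) -> np.ndarray:
    # ms:  (H, W, B) low-resolution multispectral bands upsampled to pan size.
    # pan: (H, W)    high-resolution panchromatic band.
    # Each band is rescaled by the ratio of pan to the mean intensity,
    # injecting spatial detail at the cost of possible spectral distortion.
    intensity = ms.mean(axis=2, keepdims=True)
    return ms * (pan[..., None] / (intensity + 1e-6))

ms = np.random.rand(64, 64, 4)   # synthetic 4-band multispectral patch
pan = np.random.rand(64, 64)     # synthetic panchromatic patch
print(brovey_pansharpen(ms, pan).shape)  # (64, 64, 4)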
Biography: Wuzhou Quan is currently a Ph.D. student at the School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics. His research focuses on pattern recognition, computer vision, remote sensing imagery, and infrared image processing. His work has been published in journals such as TGRS and TOMM. His recent research centers on the representation of multispectral and hyperspectral data, as well as low-level vision tasks such as pan-sharpening and classification, aiming to improve the effectiveness and reliability of optical remote sensing image analysis.

Research on Enabling Technologies of Remote Sensing Image Vision-Language Models for Smart Cities
Speaker: Zhigang Yang
Abstract: In the context of smart city construction, remote sensing images, as an important geospatial data source, play a significant role in urban planning and management through efficient interpretation and application. This research proposes a multi-task remote sensing image vision-language model that enhances the intelligence level of remote sensing imagery in urban applications by integrating visual and language information. The research covers the following key tasks:
1. Pixel-level tasks: a remote sensing image segmentation method for urban elements (roads, buildings, traffic targets, etc.), together with a newly introduced directional segmentation task. Guided by textual prompts, the method achieves fine-grained segmentation of specific targets in remote sensing images, meeting the need for personalized target identification and localization (a minimal sketch of text-guided target selection follows this list).
2. Semantic-level tasks: semantic description generation and visual question answering for remote sensing images. The description generation task uses natural language generation to automatically produce accurate descriptions of image content. Approaching the problem from the perspectives of generation and reasoning respectively, the two tasks jointly improve the model's semantic understanding of remote sensing images and offer a new technical route for intelligent interpretation of remote sensing data.
3. Temporal tasks: the model's application scope is further extended through temporal analysis, unifying various change detection tasks and providing strong support for dynamic urban monitoring and management.
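As referenced in task 1 above, here is a minimal sketch of text-guided target selection. It is a generic illustration using an off-the-shelf vision-language model (CLIP via the Hugging Face transformers library), not the model proposed in this research; the candidate boxes are assumed to come from an upstream proposal or segmentation stage:

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def best_region(image: Image.Image, boxes, prompt: str) -> int:
    # boxes: list of (left, top, right, bottom) candidate regions, e.g.
    # from a region-proposal stage. Returns the index of the region whose
    # crop best matches the textual prompt.
    crops = [image.crop(b) for b in boxes]
    inputs = processor(text=[prompt], images=crops, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    return int(out.logits_per_text.argmax())  # shape (1, num_regions)

In a full pipeline the selected region would then be refined into a pixel-level mask, which is where the directional segmentation task described above takes over.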

Video Temporal Understanding in Complex Scenes: Detection, Retrieval, and Grounding
Speaker: Min Yang, Ph.D. Candidate, Multimedia Computing Group, Nanjing University
Abstract: How to comprehend temporal sequences in videos has always been a crucial issue in video understanding. With the continued growth of online media and video recording devices, video scenes have become more diverse and new user demands keep emerging. Video temporal understanding models are accordingly evolving toward higher efficiency, faster processing, and the ability to handle more complex scenes and tasks. Here, we focus on three fundamental tasks in video temporal understanding: detection, retrieval, and grounding, and I will introduce the related work within our group. The work presented ranges from small models deployed on mobile devices to large multimodal models with real-time response. We have put considerable effort into efficiently deploying practical end-to-end video temporal understanding models, and we hope this inspires further exploration of video temporal understanding in subsequent work.
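For readers new to the grounding task: given a sentence and a video, the goal is to localize the matching temporal segment. The toy sketch below (our own simplification, not the group's models; the visual and text feature extractors are assumed to exist upstream) scores each clip against the sentence embedding and picks the best fixed-length window:

import numpy as np

def ground_moment(clip_feats: np.ndarray, text_feat: np.ndarray, win: int = 5):
    # clip_feats: (T, D) L2-normalized per-clip visual features, T >= win.
    # text_feat:  (D,)   L2-normalized query sentence feature.
    sims = clip_feats @ text_feat                            # cosine similarity per clip
    scores = np.convolve(sims, np.ones(win), mode="valid")   # sliding-window sum
    start = int(scores.argmax())
    return start, start + win                                # [start, end) clip indices

Real grounding models predict variable-length segments and reason jointly over vision and language, but the fixed-window scoring above is the conceptual starting point.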
Biography: Min Yang is a Ph.D. candidate in the Multimedia Computing Group at Nanjing University. He received his B.Eng. degree from the School of Software, Jilin University in 2020, and subsequently entered the Ph.D. program at Nanjing University under the supervision of Prof. Limin Wang. His research interests include temporal action detection, video retrieval, and multimodal large models. He has published several papers at leading computer vision conferences (CVPR, ICCV) and has served as a reviewer for multiple top-tier conferences and journals.

Research on Geometry-Prompt-Driven Adaptive Segmentation with SAM
Speaker: Xueyu Liu
Abstract: Image segmentation faces significant challenges in practical applications due to its reliance on large-scale pixel-level annotations: labeling costs are high, imaging modalities are diverse, and generalization is limited. The Segment Anything Model (SAM) simplifies the segmentation process through geometric prompts, yet its interactive mode constrains automated deployment. Moreover, generating effective prompts that enhance segmentation accuracy remains a critical issue. To address this, this study proposes a geometry-prompt-driven adaptive segmentation method based on SAM, introducing two optimization strategies: (1) a cyclic dual-space prompt engineering approach that jointly optimizes prompt points in the physical and feature spaces; and (2) a dual-space prompt engineering approach with automated optimization of heterogeneous graph structures, which organizes prompt points by integrating physical and feature relationships. The proposed method enables automatic generation of high-quality prompts, reduces manual intervention, and improves both the performance and adaptability of SAM in segmentation tasks.
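The prompt-generation strategies are the contribution of this study; for context, the sketch below shows only the consumer side of such a pipeline, i.e., how automatically generated point prompts would be fed to SAM through its public predictor API (the checkpoint path and model size are placeholders):

import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Placeholder checkpoint; download the official SAM weights separately.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)

def segment_with_points(image: np.ndarray, points: np.ndarray) -> np.ndarray:
    # image:  (H, W, 3) uint8 RGB image.
    # points: (N, 2) xy prompt coordinates, e.g. produced automatically
    #         by a prompt-generation strategy such as those proposed here.
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=points,
        point_labels=np.ones(len(points)),  # 1 = foreground prompt
        multimask_output=True,
    )
    return masks[scores.argmax()]           # keep the highest-scoring mask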
Biography: Xueyu Liu is currently a Tenure-track Associate Professor and Master's Supervisor at the College of Artificial Intelligence, Taiyuan University of Technology, China. He has published more than ten papers in SCI-indexed journals and prestigious conferences, such as Medical Image Analysis (MedIA) and CVPR. He is a member of the Institute of Electrical and Electronics Engineers (IEEE), the China Computer Federation (CCF), the China Society of Image and Graphics (CSIG), and the Chinese Association for Artificial Intelligence (CAAI). Xueyu Liu's research focuses on computer vision and medical data analytics. His interests include foundation models, meta-learning, weakly supervised learning, and few-shot learning in computer vision, as well as pathological image analysis and multimodal data analysis in medical data analytics.