US researchers have developed an AI-driven robotic assembly system that builds physical objects requested as text ...
A modular ROS 2 package for vision-language model (VLM) based perception, supporting zero-shot object detection, entity property recognition (gestures, emotions), and relation detection.
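The snippet does not show the package's node interface; purely as an illustration of how such a perception node could sit in a ROS 2 graph, the sketch below assumes an rclpy node that subscribes to a camera topic and republishes VLM results as vision_msgs detections. The topic names, message types, and the run_vlm_detector() helper are hypothetical placeholders, not this package's actual API.

```python
# Minimal sketch of a VLM perception node (assumptions: topic names, the
# vision_msgs output type, and run_vlm_detector() are illustrative only).
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from vision_msgs.msg import Detection2DArray


class VLMPerceptionNode(Node):
    """Subscribes to camera frames and publishes zero-shot VLM detections."""

    def __init__(self):
        super().__init__('vlm_perception_node')
        self.subscription = self.create_subscription(
            Image, '/camera/image_raw', self.on_image, 10)
        self.publisher = self.create_publisher(
            Detection2DArray, '/vlm/detections', 10)

    def on_image(self, msg: Image) -> None:
        detections = Detection2DArray()
        detections.header = msg.header
        # A real implementation would run the VLM on the frame here, e.g.:
        # detections.detections = run_vlm_detector(msg, prompts=['person', 'cup'])
        self.publisher.publish(detections)


def main():
    rclpy.init()
    node = VLMPerceptionNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```

In practice the zero-shot class prompts would typically be exposed as a ROS parameter rather than hard-coded, so the same node can be retargeted to new objects without code changes.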
That is why CowaRobot chose sanitation as its starting point. “Our approach is pragmatic,” said Liao Wenlong, the company’s ...
Abstract: Classification of pathological images is the basis for automatic cancer diagnosis. Although deep learning methods have achieved remarkable performance, they heavily rely on labeled data, ...
Abstract: Vision-language modeling (VLM) aims to bridge the information gap between images and natural language. Under the new paradigm of first pretraining on massive image-text pairs and then ...
A workspace to pretrain, finetune, and evaluate the DEJIMA VLM. It includes simple environment tooling, training scripts, and evaluation runners for Japanese and general benchmarks.
env/
  env.sh  # ...
The advancement of large Vision-Language-Action (VLA) models has significantly improved robotic manipulation, both in language-guided task execution and in generalization to unseen scenarios. While ...
Vision-language models (VLMs) have shown remarkable capabilities in integrating linguistic and visual reasoning but remain fundamentally limited in understanding dynamic spatiotemporal interactions.