arXiv:10.48551/arXiv.2602.21157


xeuron.com/p/halo-a-unified-vla-model · u/george16152

AI Summary

HALO is a unified vision-language-action (VLA) model that enables embodied multimodal chain-of-thought (EM-CoT) reasoning through three sequential stages: textual task reasoning, visual subgoal prediction for fine-grained guidance, and EM-CoT-augmented action prediction. We instantiate HALO with a Mixture-of-Transformers (MoT) architecture that decouples semantic reasoning, visual foresight, and action prediction into specialized experts while still allowing cross-expert collaboration.
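The sequential EM-CoT process described above can be illustrated with a minimal Python sketch. This is not HALO's implementation: the names `EMCoTTrace`, `run_em_cot`, and the three toy experts are hypothetical, and the plain lists stand in for image and action tensors whose real formats the summary does not specify. The sketch only shows the ordering of the stages, with each stage conditioned on the outputs of the earlier ones.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EMCoTTrace:
    """Hypothetical record of one EM-CoT pass (not a type from the paper)."""
    instruction: str
    reasoning: str        # stage 1: textual task reasoning
    subgoal: List[float]  # stage 2: visual subgoal (placeholder features)
    action: List[float]   # stage 3: action predicted from stages 1 and 2


def run_em_cot(
    instruction: str,
    observation: List[float],
    reason_expert: Callable[[str, List[float]], str],
    vision_expert: Callable[[str, List[float], str], List[float]],
    action_expert: Callable[[str, List[float], str, List[float]], List[float]],
) -> EMCoTTrace:
    """Run the three stages in order, feeding earlier outputs forward."""
    reasoning = reason_expert(instruction, observation)
    subgoal = vision_expert(instruction, observation, reasoning)
    action = action_expert(instruction, observation, reasoning, subgoal)
    return EMCoTTrace(instruction, reasoning, subgoal, action)


# Toy stand-ins for the specialized experts (assumptions for illustration only).
def toy_reason(instruction: str, observation: List[float]) -> str:
    return f"plan: {instruction}"


def toy_vision(instruction: str, observation: List[float], reasoning: str) -> List[float]:
    # Pretend the subgoal is the current observation shifted by 1.0.
    return [x + 1.0 for x in observation]


def toy_action(
    instruction: str,
    observation: List[float],
    reasoning: str,
    subgoal: List[float],
) -> List[float]:
    # Pretend the action is the displacement from observation to subgoal.
    return [g - o for g, o in zip(subgoal, observation)]
```

Calling `run_em_cot("pick up the cup", [0.0, 0.5], toy_reason, toy_vision, toy_action)` returns a trace whose action depends on both the textual plan and the predicted subgoal. In a MoT-style design the three experts would presumably share context inside one model rather than being separate functions; that detail is outside what this sketch tries to show.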
