We introduce the Progressive Visual Token Compression (PVC) in large vision-language models (VLMs), which unifies the visual inputs as videos and progressively compresses vision tokens across video ...
Abstract: The rich spectral information within hyperspectral images (HSIs) results in large data volumes. Thus finding a compact representation for HSIs while maintaining reconstruction quality is a ...
Abstract: Large Vision-Language Models (VLMs) have been extended to understand both images and videos. Visual token compression is leveraged to reduce the considerable token length of visual inputs.
Some results have been hidden because they may be inaccessible to you
Show inaccessible results