We introduce the Progressive Visual Token Compression (PVC) in large vision-language models (VLMs), which unifies the visual inputs as videos and progressively compresses vision tokens across video ...
Abstract: The rich spectral information within hyperspectral images (HSIs) results in large data volumes. Thus finding a compact representation for HSIs while maintaining reconstruction quality is a ...
Abstract: Large Vision-Language Models (VLMs) have been extended to understand both images and videos. Visual token compression is leveraged to reduce the considerable token length of visual inputs.