Photoelectric Encoder

Harnessing Frozen Unimodal Encoders for Flexible Multimodal Alignment

Abstract: Recent contrastive multimodal vision-language models like CLIP have demonstrated robust open-world semantic understanding, becoming the standard image backbones for vision-language ...

GitHub

Exploring the Potential of Encoder-free Architectures in 3D LMMs

Official repository for the paper "Exploring the Potential of Encoder-free Architectures in 3D LMMs". The encoder-free 3D LMM directly utilizes a token embedding module to convert point cloud data ...
