Visual Basic Video Module

Violent Video Recognition Based on Global-Local Visual and Audio Contrastive Learning

Abstract: The aim of the violent recognition task is to determine whether a video contains violent behaviors. Given that violent behavior often comes with visual and audio anomalies, multimodal ...

IEEE

Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach

Abstract: Estimating the camera’s pose given images from a single camera is a traditional task in mobile robots and autonomous vehicles. This problem is called monocular visual odometry and often ...

GitHub

AVF-MAE++: Scaling Affective Video Facial Masked Autoencoders via Efficient Audio-Visual Self-Supervised Learning

Abstract: Affective Video Facial Analysis (AVFA) is important for advancing emotion-aware AI, yet the persistent data scarcity in AVFA presents challenges. Recently, the self-supervised learning (SSL) ...
