ChatVTG: Video Temporal Grounding Via Chat with Video Dialogue Large Language Models
Computer Vision and Pattern Recognition (2024)
Keywords
Language Model, Large Language Models, Temporal Grounding, Natural Language, Training Data, Similarity Score, Visual Features, Intersection Over Union, Visual Signals, Video Clips, Similarity Matrix, Need For Training, Level Of Granularity, Extensive Dataset, Video Content, Textual Features, Video Segments, Multimodal Model, Final Moments, Sliding Window, Number Of Clips, Video Captioning, Conduct Ablation Experiments, Video Understanding, Understanding Of Content, Maximum Similarity, End Time, Feature Space, Semantic Features