CogAgent: A Visual Language Model for GUI Agents
CVPR 2024
Keywords
Graphical User Interface, Language Model, Visual Model, High-resolution Images, Low-resolution Images, Image Encoder, Visual Question Answering, Image Features, Natural Language, Multi-agent, Sequence Of Actions, Visual Features, Original Structure, Bounding Box, Web Page, Model Architecture, Natural Images, Description Task, Residual Connection, Optical Character Recognition, Pre-training Data, Text Sequence, Hidden Size, Floating-point Operations, Decoder Layer, Domain Generalization, Side Of The Image, Data Augmentation