You are sitting in an exit row. You glance at the emergency guide, which combines images and text. Your brain merges the two without effort and hands you the intended message: open the door in the unlikely event of an emergency.
For humans, this ability to correlate comes instinctively, but for a minute, think about how a computer sees the same document. An OCR (optical character recognition) system reads the text. An image recognition model scans the image. Then a third system correlates the image and the text to understand the complete picture.
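
To make the three stages concrete, here is a minimal Python sketch of such a pipeline. It is only an illustration: the names `understand_page`, `PageUnderstanding`, and the toy stand-in functions are hypothetical, and a real system would plug an actual OCR engine and image model into the same slots.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PageUnderstanding:
    text: str            # what the OCR stage read
    objects: List[str]   # what the image recognition stage found
    message: str         # the combined interpretation


def understand_page(
    page: bytes,
    read_text: Callable[[bytes], str],
    recognize_objects: Callable[[bytes], List[str]],
    correlate: Callable[[str, List[str]], str],
) -> PageUnderstanding:
    """Run the three separate stages in sequence (hypothetical helper)."""
    text = read_text(page)                # stage 1: OCR
    objects = recognize_objects(page)     # stage 2: image recognition
    message = correlate(text, objects)    # stage 3: correlation
    return PageUnderstanding(text, objects, message)


# Toy stand-ins so the sketch runs; they return placeholder values,
# not the output of any real model.
def toy_ocr(page: bytes) -> str:
    return "Pull handle to open"


def toy_vision(page: bytes) -> List[str]:
    return ["door", "handle", "arrow"]


def toy_correlate(text: str, objects: List[str]) -> str:
    return f"{text} (shown: {', '.join(objects)})"


result = understand_page(b"", toy_ocr, toy_vision, toy_correlate)
print(result.message)
```

Note that each stage only sees its own slice of the document; the complete picture exists only after the third step stitches the pieces together.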