Size and Model Limitations of Image Inputs in ChatGPT


It’s always good to know a model’s limitations, since it can save you time. For example, I didn’t know the image size limit was 20 MB.
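If you upload images from a script or want to catch oversized files early, a quick local size check can save a failed upload. The sketch below is illustrative only: the function name `is_within_size_limit` is hypothetical, and because the post doesn't say whether "20 MB" means 20,000,000 bytes or 20 × 1024 × 1024 bytes, it assumes the smaller decimal value. A file that passes this check is then under the limit under either interpretation.

```python
import os

# Assumption: the 20 MB limit is treated as 20,000,000 bytes (decimal megabytes).
# This is the stricter of the two readings, so a passing file fits either one.
MAX_IMAGE_BYTES = 20_000_000


def is_within_size_limit(path: str, limit: int = MAX_IMAGE_BYTES) -> bool:
    """Hypothetical helper: return True if the file at `path` is at most `limit` bytes."""
    return os.path.getsize(path) <= limit
```

For example, `is_within_size_limit("photo.png")` returns `False` for a 25 MB file, which is a cue to compress or downscale it before uploading.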


What limitations should users be aware of when using ChatGPT with Image Inputs?

If you're using ChatGPT's new image input feature, it's important to be aware of these limitations:

Medical: The model is not suitable for interpreting specialized medical images such as CT scans and shouldn't be used for medical advice.

Non-English: The model may not perform as well on images containing text in non-Latin scripts, such as Japanese or Korean.

Big text: Enlarge text within the image to improve readability, but avoid cropping important details.

Rotation: The model may misinterpret rotated / upside-down text or images.

Visual elements: The model may struggle to understand graphs or text where colors or line styles (solid, dashed, or dotted) vary.

Spatial: The model struggles with tasks requiring precise spatial localization, such as identifying chess positions.

Accuracy: The model may generate incorrect descriptions or captions in certain scenarios.

Shape: The model struggles with panoramic and fisheye images.

Metadata and resizing: The model doesn't process original file names or metadata, and images are resized before analysis, affecting their original dimensions.

Counting: The model may give only approximate counts of objects in images.


The day after I posted this, GPT-4o came out.