Size and Model Limitations for Image Inputs in ChatGPT


Interesting…

It’s always good to know a model’s limitations; it can be a real time saver. For example, I didn’t know 20 MB was the size limit.

(Source: https://help.openai.com/en/articles/8400551-image-inputs-for-chatgpt-faq#h_69ea507a37)
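Since the size limit is the easiest one to trip over, here is a minimal sketch of a pre-upload check. It assumes "20 MB" means 20 × 1024 × 1024 bytes (the FAQ does not say whether it is counting decimal or binary megabytes), and the function names are my own, not part of any OpenAI tooling.

```python
from pathlib import Path

# Assumption: 20 MB is read as 20 * 1024 * 1024 bytes. The FAQ does not
# specify decimal vs. binary megabytes, so adjust if needed.
MAX_IMAGE_BYTES = 20 * 1024 * 1024


def is_within_size_limit(size_bytes: int, limit: int = MAX_IMAGE_BYTES) -> bool:
    """Return True if a non-empty file of size_bytes fits under the limit."""
    return 0 < size_bytes <= limit


def check_image_file(path: str) -> bool:
    """Check a file on disk against the size limit (hypothetical helper)."""
    return is_within_size_limit(Path(path).stat().st_size)
```

Calling `check_image_file("photo.png")` before uploading would tell you whether the file needs to be compressed or downscaled first.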

What limitations should users be aware of when using ChatGPT with Image Inputs?


If you're using ChatGPT's new image input feature, it's important to be aware of these limitations:


Medical: The model is not suitable for interpreting specialized medical images like CT scans and shouldn't be used for medical advice.


Non-English: The model does not perform as well when handling images that contain text in non-Latin scripts, such as Japanese or Korean.


Big text: Enlarge text within the image to improve readability, but avoid cropping important details.


Rotation: The model may misinterpret rotated / upside-down text or images.


Visual elements: The model may struggle to understand graphs or text where colors or styles like solid, dashed, or dotted lines vary.


Spatial: The model struggles with tasks requiring precise spatial localization, such as identifying chess positions.


Accuracy: The model may generate incorrect descriptions or captions in certain scenarios.


Shape: The model struggles with panoramic and fisheye images.


Metadata and resizing: The model doesn't process original file names or metadata, and images are resized before analysis, affecting their original dimensions.


Counting: The model may give only approximate counts for objects in images.
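A couple of the items above can be screened for before uploading. As a rough sketch for the "Shape" item, the code below reads the width and height from a PNG header using only the standard library and flags very wide or very tall images. The 2.5 ratio threshold is an arbitrary assumption of mine; the FAQ does not define what counts as "panoramic", and this does nothing to detect fisheye distortion.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def looks_panoramic(width: int, height: int, max_ratio: float = 2.5) -> bool:
    """Flag images whose long side is more than max_ratio times the short side.

    The default max_ratio is an illustrative assumption, not a documented limit.
    """
    long_side, short_side = max(width, height), min(width, height)
    return short_side == 0 or long_side / short_side > max_ratio
```

To use it on a real file, read the first 24 bytes with `open(path, "rb").read(24)` and pass them to `png_dimensions`. If the image is flagged, cropping it into more regular tiles before uploading may help.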

---
Update: 

The day after I posted this, GPT-4o came out.