
Interesting…
It’s always good to know a model’s limitations. It can be a real time saver. For example, I didn’t know the image size limit was 20 MB.
(Source: https://help.openai.com/en/articles/8400551-image-inputs-for-chatgpt-faq#h_69ea507a37)
What limitations should users be aware of when using ChatGPT with Image Inputs?
If you're using ChatGPT's new image input feature, it's important to be aware of these limitations:
Medical: The model is not suitable for interpreting specialized medical images like CT scans and shouldn't be used for medical advice.
Non-English: The model does not perform as well handling images with text of non-Latin alphabets, such as Japanese or Korean.
Big text: Enlarge text within the image to improve readability, but avoid cropping important details.
Rotation: The model may misinterpret rotated / upside-down text or images.
Visual elements: The model may struggle to understand graphs or text where colors or styles like solid, dashed, or dotted lines vary.
Spatial: The model struggles with tasks requiring precise spatial localization, such as identifying chess positions.
Accuracy: The model may generate incorrect descriptions or captions in certain scenarios.
Shape: The model struggles with panoramic and fisheye images.
Metadata and resizing: The model doesn't process original file names or metadata, and images are resized before analysis, affecting their original dimensions.
Counting: May give approximate counts for objects in images.
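For the 20 MB limit mentioned at the top, a quick pre-upload check could look something like the sketch below. It's only an illustration: `check_image_size` and `MAX_IMAGE_BYTES` are names made up here, and it assumes "20 MB" means 20 × 1024 × 1024 bytes, which the quoted FAQ section doesn't specify.

```python
import os

# Assumption: the 20 MB limit is counted as 20 * 1024 * 1024 bytes.
MAX_IMAGE_BYTES = 20 * 1024 * 1024


def check_image_size(path, limit=MAX_IMAGE_BYTES):
    """Return (ok, size_in_bytes) for the file at `path`.

    `ok` is True when the file is no larger than `limit` bytes.
    """
    size = os.path.getsize(path)
    return size <= limit, size
```

Calling `check_image_size("photo.png")` (a hypothetical file name) returns a pair such as `(True, 123456)`, so you can tell whether an image needs shrinking before you try to upload it.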
---
Update:
The day after I posted this, ChatGPT-4o came out.