![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhTeZf54i3k8aYZjXlj8yDFNBvTcxRQviaBygAgYeV1Vs18RhQkgnNuDex-sG9XPTZUQBqe2grIptkUM38hYfls5NZx_cRhZ01O8hAei_WNkpUi7Z5nCEUdCnOwvZhLvXWow5bP1ogvhGdYRmXA0lhPWWRD4nZosfpnFbaXyiIbkDcVrMLezQUdj3tojpPd/w640-h276-rw/articlescreenshotopenaimodellimits.png)
Interesting…
It’s always good to know a model’s limitations; it can be a real time saver. For example, I didn’t know 20 MB was the size limit.
(Source: https://help.openai.com/en/articles/8400551-image-inputs-for-chatgpt-faq#h_69ea507a37)
What limitations should users be aware of when using ChatGPT with Image Inputs?
If you're using ChatGPT's new image input feature, it's important to be aware of these limitations:
- Medical: The model is not suitable for interpreting specialized medical images like CT scans and shouldn't be used for medical advice.
- Non-English: The model does not perform as well handling images with text of non-Latin alphabets, such as Japanese or Korean.
- Big text: Enlarge text within the image to improve readability, but avoid cropping important details.
- Rotation: The model may misinterpret rotated / upside-down text or images.
- Visual elements: The model may struggle to understand graphs or text where colors or styles like solid, dashed, or dotted lines vary.
- Spatial: The model struggles with tasks requiring precise spatial localization, such as identifying chess positions.
- Accuracy: The model may generate incorrect descriptions or captions in certain scenarios.
- Shape: The model struggles with panoramic and fisheye images.
- Metadata and resizing: The model doesn't process original file names or metadata, and images are resized before analysis, affecting their original dimensions.
- Counting: May give approximate counts for objects in images.
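Since the 20 MB size limit was the one that surprised me, here is a rough sketch of a check you could run before uploading an image. It is only an illustration: the names `MAX_IMAGE_BYTES`, `size_ok`, and `file_within_limit` are my own, and I am assuming "20 MB" means 20 × 1024 × 1024 bytes, which the FAQ does not spell out.

```python
from pathlib import Path

# Assumption: "20 MB" is read as 20 * 1024 * 1024 bytes. The FAQ does not say
# whether it means binary or decimal megabytes, so adjust if needed.
MAX_IMAGE_BYTES = 20 * 1024 * 1024


def size_ok(num_bytes, limit=MAX_IMAGE_BYTES):
    """Return True if a size in bytes is at or below the limit."""
    return num_bytes <= limit


def file_within_limit(path, limit=MAX_IMAGE_BYTES):
    """Return True if the file at `path` is at or below the limit."""
    return size_ok(Path(path).stat().st_size, limit)
```

Calling `file_within_limit("screenshot.png")` on a local file (the filename here is just a placeholder) returns `True` or `False`, so an oversized image can be shrunk or re-exported before you try to upload it.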
---
Update:
The day after I posted this, GPT-4o came out.