Size and Model Limitations for Image Inputs for ChatGPT

Interesting…

It’s always good to know a model’s limitations; it can be a real time saver. For example, I didn’t know the size limit was 20 MB.

(Source: https://help.openai.com/en/articles/8400551-image-inputs-for-chatgpt-faq#h_69ea507a37)

What limitations should users be aware of when using ChatGPT with Image Inputs?

If you're using ChatGPT's new image input feature, it's important to be aware of these limitations:

Medical: The model is not suitable for interpreting specialized medical images like CT scans and shouldn't be used for medical advice.

Non-English: The model does not perform as well on images containing text in non-Latin alphabets, such as Japanese or Korean.

Big text: Enlarge text within the image to improve readability, but avoid cropping important details.

Rotation: The model may misinterpret rotated / upside-down text or images.

Visual elements: The model may struggle to understand graphs or text where colors or styles like solid, dashed, or dotted lines vary.

Spatial: The model struggles with tasks requiring precise spatial localization, such as identifying chess positions.

Accuracy: The model may generate incorrect descriptions or captions in certain scenarios.

Shape: The model struggles with panoramic and fisheye images.

Metadata and resizing: The model doesn't process original file names or metadata, and images are resized before analysis, affecting their original dimensions.

Counting: The model may give only approximate counts for objects in images.
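Since the 20 MB size limit is the one item here that is easy to check ahead of time, here is a minimal sketch of a pre-upload check in Python. The function name and constant are my own, not anything from OpenAI, and I'm assuming "20 MB" means 20,000,000 bytes, which is the more conservative reading (a file that passes this check also passes if the limit is really 20 x 1024 x 1024 bytes).

```python
import os

# Assumption: the 20 MB limit quoted above, read conservatively as
# 20,000,000 bytes. The FAQ does not say which definition of MB it uses.
MAX_IMAGE_BYTES = 20 * 1000 * 1000


def check_image_size(path, limit=MAX_IMAGE_BYTES):
    """Return (ok, size_in_bytes) for the file at `path`.

    `ok` is True when the file is at or under `limit` bytes.
    This is a hypothetical helper for illustration only.
    """
    size = os.path.getsize(path)
    return size <= limit, size
```

Calling check_image_size("photo.jpg") before uploading would tell you whether the file is likely to be rejected for size. It says nothing about the other limitations in the list, such as rotation or panoramic shapes.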

---

Update:

The day after I posted this, GPT-4o came out.

https://openai.com/index/hello-gpt-4o/
