Posts

Showing posts from May, 2024

GPT-4o is Available

  Time to dig in… https://openai.com/index/hello-gpt-4o/

What does GPT-4o enable?

GPT-4o: GPT-4o (GPT-4 Optimized) enables several advanced features and capabilities:

1. **Improved Efficiency**: GPT-4o is designed to be more efficient in terms of computational resources, making it faster and more responsive.
2. **Enhanced Contextual Understanding**: It offers a deeper understanding of context, enabling more coherent and relevant responses, especially in extended conversations.
3. **Better Adaptability**: GPT-4o can adapt more effectively to various tasks and queries, providing accurate and contextually appropriate answers across different domains.
4. **Advanced Problem-Solving**: It enhances problem-solving abilities by leveraging improved reasoning and analytical skills, making it suitable for complex queries and tasks.
5. **Multimodal Capabilities**: GPT-4o may support multimodal inputs, such as text and images, allowing for richer interactions and more comprehensive responses.
6. **

Size and Model Limitations for Image Inputs for ChatGPT

Interesting… It’s always good to know the model limitations; it could be a time saver. For example, I didn’t know 20 MB was the size limit. (Source: https://help.openai.com/en/articles/8400551-image-inputs-for-chatgpt-faq#h_69ea507a37)

What limitations should users be aware of when using ChatGPT with image inputs? If you're using ChatGPT's new image input feature, it's important to be aware of these limitations:

- Medical: The model is not suitable for interpreting specialized medical images like CT scans and shouldn't be used for medical advice.
- Non-English: The model does not perform as well handling images with text in non-Latin alphabets, such as Japanese or Korean.
- Big text: Enlarge text within the image to improve readability, but avoid cropping important details.
- Rotation: The model may misinterpret rotated / upside-down text or images.
- Visual elements: The model may struggle to understand graphs or text where colors or styles like solid, dashed, or dotted lines vary
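Since the 20 MB cap is easy to hit with screenshots and photos, a quick local check before uploading can save a failed request. A minimal sketch in Python (the limit value comes from the FAQ above; the constant and function names are my own, not part of any official SDK):

```python
import os

MAX_UPLOAD_BYTES = 20 * 1024 * 1024  # 20 MB size limit noted in the FAQ


def within_upload_limit(path: str) -> bool:
    """Return True if the image file at `path` is under the upload size limit."""
    return os.path.getsize(path) <= MAX_UPLOAD_BYTES
```

Running this against a file before attaching it tells you immediately whether it needs resizing or compression.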

Singularity Summit 12 (as seen by Gemini and GPT 4 Vision)

This just fell out of one of my books. I remember going to the Singularity Summit in 2012. Peter Norvig showed examples of how Google could study images from YouTube. It learned there’s a thing called a cat, and it could output a grainy image of a cat. I was fascinated, but in 2012 I didn’t have a frame of reference to know what to do with that knowledge. Looking back, after seeing that presentation I could have created futurist visualizations showing in high fidelity what’s to come with AI imagery. I didn’t create them back then, because I was focused on research for a book. At least now I can create that futurist visualization…

----
Describe the event.
----

Gemini: The Singularity Summit 2012 was a conference focused on cutting-edge technologies with the potential to fundamentally change humanity. Here's a breakdown of what we know about the event: Dates & Location: Held on October 13th and 14th, 2012 [Wikipedia entry on Singularity Summit]

Google Maps is now AI

Image created with AI.

Been absorbed with all things Google Maps Platform (https://mapsplatform.google.com/), AI, and AR lately. As a creative, Geospatial Creator (https://developers.google.com/ar/geospatialcreator) is really interesting. As a global brand, for example, you can already do a lot with AR and AI predictive analytics. It’s feasible to leave smart AR stores at any spatial location. Each one can have a sphere of influence: your phone triggers an augmented experience when you enter the spatial awareness of a geotagged location.

Concept Exploration: https://www.instagram.com/reel/C7-Wu15NTnJ/?igsh=MWRteW1sMzd1a2hyNQ==

Leaving things at a location isn’t new, and AR shopping, try-on, and purchasing aren’t new. But AI vision can now understand so much through the camera that it can deliver an even more nuanced, personalized AR experience. It can even know about you when you are near it, and not directly in the sphere of influence, adding to a brand’s predictive ana
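The "sphere of influence" trigger described above is essentially a geofence: compare the device's position to a geotagged anchor and fire the AR experience only inside a radius. A minimal sketch in Python (the haversine formula is standard great-circle math; the function names and the 50 m default radius are illustrative, not from any Google Maps Platform API):

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_sphere_of_influence(user, anchor, radius_m=50.0):
    """True if the user's (lat, lon) is within radius_m of the geotagged anchor."""
    return haversine_m(*user, *anchor) <= radius_m
```

In a real deployment the radius check would run against the phone's live GPS fix, and crossing into the sphere would launch the geotagged AR scene.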

AR Animated Card

  Scan the QR code with your phone's camera, or visit the link below on your phone to view.

https://adobeaero.app.link/rSis9vJtjJb

Here's a preview of the animation.

Tokens

  In AI art generation, crafting effective prompts is key to achieving the desired results. This excerpt from a guide I created, "Transforming Art into AI Style Reference Models," dives into the concept of tokens, the building blocks that AI systems use to understand our prompts.

Excerpt from Transforming Art into AI Style Reference Models: A CGS User Guide for Artists

Tokens: The Building Blocks of Prompts

Tokens are the Lego building blocks used by many AI systems to understand your text prompts. They do this by breaking down prompt input text into the AI’s internal mathematical representation of natural language patterns and object relationships, acquired during its training process.

Influence on Interpretation: Specifics and Sequence are Significant

In AI systems in general, the total number of tokens in your prompt, as well as their sequence and the specific tokens used, significantly influence how the AI interprets your input. This is because each token carries meani
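Production models use learned subword vocabularies (e.g. byte-pair encoding), but the basic idea of breaking a prompt into tokens and mapping them to IDs can be sketched with a toy word-level tokenizer. The splitting rule and function names here are illustrative only, not how any particular image model actually tokenizes:

```python
import re


def tokenize(text):
    """Toy tokenizer: split prompt text into lowercase word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def encode(tokens, vocab):
    """Map tokens to integer IDs, growing the vocabulary as new tokens appear."""
    return [vocab.setdefault(tok, len(vocab)) for tok in tokens]
```

For example, `tokenize("A cat, a hat")` yields five tokens, and encoding them assigns the repeated "a" the same ID both times, which is how a model sees that two positions in the prompt carry the same token.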

DALL-E System Card

Image created with AI.

Interesting reading: https://cdn.openai.com/papers/DALL_E_3_System_Card.pdf

---
Gemini Summary: DALL-E 3 System Card

This is a summary of the DALL-E 3 system card, a document that details the system's capabilities, potential risks, and mitigation strategies.

Key Points:
- DALL-E 3 is an AI system that generates images from text descriptions. It builds on DALL-E 2 by improving image quality and fidelity to the caption.
- Potential risks include generating biased, racy, or violent content.
- OpenAI, the creator of DALL-E 3, has implemented various mitigation techniques to address these risks, such as:
  - Data filtering to exclude explicit content from the training data.
  - ChatGPT integration to refine prompts before sending them to DALL-E 3.
  - Prompt transformations to ensure prompts comply with guidelines.
  - Image output classifiers to identify and block inappropriate images.

Additional Notes: The system card also details the development process, including learnings

Text to Instant Video

Image created with AI.

Hugging Face always has interesting ML and AI things to test and learn from. Here’s a quick video generator I tried out: https://huggingface.co/spaces/KingNish/Instant-Video

Prompt: “Focus: A large coral reef underwater (Animate: Fish swimming around the coral, occasional sea turtle passing by) (Shot From Distance: A panoramic view that encompasses the vibrant underwater ecosystem)”

Prompt: “Focus: A night sky filled with stars at a remote location (Animate: Shooting stars occasionally streaking across, constellations slowly rotating) (Shot From Distance: A broad view of the heavens that captures the majesty of the night sky)”