GPT Vision/Midjourney UI Test: Render, Describe Using Coordinates, Export HTML


Takeaway: I think understanding AI capabilities in workflows isn't really a choice at this point. Even when the tools are inefficient, hallucinate, and aren't effective, it's good to know that as well.

4/12/24

I did this all on my phone. It was an exploration to test Midjourney's ability to render UI images from a basic text prompt, and GPT's ability to describe the layout using a coordinate system and then export the code for the layout, possibly with SVG for images.

I wrote a quick prompt (just something basic with a 9:32 aspect ratio):

"Website ui for a digital design portfolio of a ui designer, showcasing design theory, color theory, typography, ui and ux components, beautiful website --ar 9:32 --style raw --stylize 0"

and rendered it in Midjourney.

Then I had GPT Vision analyze it a little, to quickly see how the two work together and what they come up with given little coaching, like getting a baseline for their defaults.


Hey GPT, describe.

---

Gpt:

The image appears to be a compilation of website design templates or a portfolio presentation layout. It features a mix of geometric shapes, minimalist furniture images, and design mockups. The color scheme includes coral, teal, black, and cream. Different sections of the design showcase text overlays, navigation menus, and image placeholders, suggesting a sophisticated, modern aesthetic for web design. The layout is organized in a way that emphasizes clean lines and a mix of flat and depth elements, potentially aimed at showcasing design work for a portfolio or a design firm.

---

Me: Yes, it's an example of a UI design as rendered in Midjourney.

---

Gpt:


Understood, the image is of a portfolio website design, which highlights a sleek and contemporary approach to presenting work. It appears to use a balance of typography, color, and imagery to create a professional and visually engaging online showcase.

---

Me: Use a coordinate system and describe it.

---

Gpt:

In the image, we can visualize an abstract coordinate system laid over the portfolio design to describe the content placement and design elements. 

Imagine the x-axis (horizontal) running from left to right and the y-axis (vertical) running from top to bottom of the image.

Top Left (x: low, y: low): 
The header with navigation links.

Top Center (x: medium, y: low): 
A large banner area, possibly a slider with images and text.

Top Right (x: high, y: low): 
Social media or contact icons.

Mid Left (x: low, y: medium): 
Content sections with text and images, some with geometric background shapes.

Center (x: medium, y: medium): 
This area contains the main content blocks with descriptions and visuals in a staggered, grid-like pattern.

Mid Right (x: high, y: medium): 
Additional content blocks or images aligned to the right margin.

Bottom Left (x: low, y: high): 
Smaller text blocks or footer information.

Bottom Center (x: medium, y: high): 
Possibly testimonials or client logos.

Bottom Right (x: high, y: high): 
Call-to-action buttons or contact details.

Each section is designed with margins and padding in mind, creating a harmonious balance and guiding the viewer's eye through the content in a structured manner.

---

Me: Use pixel measurements to describe it.

---

Gpt:

Describing the design using pixel measurements assumes a standard orientation and resolution. 

However, without specific measurements for each element, I'll provide a relative description with hypothetical pixel values.

Header (0, 0 to 1024, 150): 

Stretches across the full width, with navigation and logo.

Main Banner (0, 150 to 1024, 600): 

Occupies a large, central part of the upper screen.

Service Blocks (0, 600 to 340, 900 each): 

Three columns, likely about a third of the width each, with services or portfolio items.

Content Sections (0, 900 to 1024, 1400): 

Various content sections that span the full width, with varying heights, possibly around 100-500 pixels each.

Footer (0, 1400 to 1024, 1536): 

Runs across the bottom of the design, less height than the banner, but stretches full width.

Without the actual design file, it's impossible to give accurate pixel dimensions, but this is a hypothetical framework to describe the layout. 

The full width is assumed to be 1024 pixels, which was a common width for web design, but today's designs are often fluid or responsive, adjusting to the screen width.

---

Me: Give me a labeled grid of the coordinate description

---

Gpt:




import matplotlib.pyplot as plt

# Let's assume the image dimensions are 1024px wide by 1536px high for the purpose of this grid.

# Create figure and axes
fig, ax = plt.subplots(figsize=(5, 15))

# Set the grid and its labels
plt.xticks([256, 512, 768], ['256px', '512px', '768px'])
plt.yticks([192, 384, 576, 768, 960, 1152, 1344, 1536], ['192px', '384px', '576px', '768px', '960px', '1152px', '1344px', '1536px'])

# Draw grid lines
ax.grid(which='both', color='black', linestyle='-', linewidth=1)

# Set axis limits to match the image size
ax.set_xlim(0, 1024)
ax.set_ylim(0, 1536)

# Label the major sections of the image based on the description.
# The labels are placed at the approximate centers of where the sections would be.

# Header
ax.text(512, 1440, 'Header', va='center', ha='center')

# Main Banner
ax.text(512, 1152, 'Main Banner', va='center', ha='center')

# Service Blocks (3 columns)
ax.text(170, 960, 'Service Block 1', va='center', ha='center')
ax.text(512, 960, 'Service Block 2', va='center', ha='center')
ax.text(854, 960, 'Service Block 3', va='center', ha='center')

# Content Sections
ax.text(512, 768, 'Content Section 1', va='center', ha='center')
ax.text(512, 576, 'Content Section 2', va='center', ha='center')
ax.text(512, 384, 'Content Section 3', va='center', ha='center')
ax.text(512, 192, 'Content Section 4', va='center', ha='center')

# Footer
ax.text(512, 48, 'Footer', va='center', ha='center')

# Hide the axes
ax.axis('off')

plt.gca().invert_yaxis() # Invert the y-axis to match the image coordinate system
plt.show()

---

Me: Generate the HTML, using as many style elements as you can from the original portfolio image. Create content and flow it into the HTML.

---

GPT:

Creating HTML with a similar look and feel to the original portfolio image involves using structural elements for layout and CSS for styling. 

Below is a simplified version of what such HTML might look like. 

The content is made up to fit the sections we discussed.

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Portfolio</title>
<style>
  body { font-family: 'Arial', sans-serif; margin: 0; padding: 0; }
  .header { background-color: #ffe3e3; padding: 20px; text-align: center; }
  .banner { background-color: #ffcccc; padding: 50px; color: #333; text-align: center; }
  .service-block { display: flex; }
  .service { flex: 1; padding: 30px; text-align: center; background-color: #ffe3e3; }
  .service:nth-child(even) { background-color: #ffcccc; }
  .content-section { padding: 20px; background-color: #fff0f0; }
  .footer { background-color: #ffe3e3; padding: 10px; text-align: center; }
</style>
</head>
<body>

<div class="header">
  <h1>My Portfolio</h1>
  <nav>
    <a href="#services">Services</a> | 
    <a href="#about">About</a> | 
    <a href="#contact">Contact</a>
  </nav>
</div>

<div class="banner">
  <h2>Welcome to My World of Design</h2>
  <p>Creative solutions for every need.</p>
</div>

<div class="service-block" id="services">
  <div class="service">
    <h3>Web Design</h3>
    <p>Beautiful and responsive websites.</p>
  </div>
  <div class="service">
    <h3>Branding</h3>
    <p>Unique identities for businesses.</p>
  </div>
  <div class="service">
    <h3>Illustration</h3>
    <p>Custom artwork for your projects.</p>
  </div>
</div>

<div class="content-section" id="about">
  <h2>About Me</h2>
  <p>As a designer, I aim to create memorable experiences through my work.</p>
</div>

<div class="content-section" id="contact">
  <h2>Contact Me</h2>
  <p>Let's get in touch and talk about your next project.</p>
</div>

<div class="footer">
  <p>&copy; 2024 My Portfolio</p>
</div>

</body>
</html>

This code defines a basic structure and applies some styling that draws from the color scheme and layout of the portfolio design in the image. Of course, actual web development would involve more complex CSS for responsive design, interactivity with JavaScript, and possibly backend integration. This example is static HTML and would need to be enriched to meet modern web standards and functionalities.

---

Other Midjourney prompt results (setting the aspect ratio to --ar 9:32 helps them look like a site example):







Quick Takeaways:

This was a really fast test. Zero planning. Zero iteration. 

Coordinates should have been fine. GPT has explained coordinates for other things, but those were much simpler; this layout image was possibly too complex for it. In the labeled grid the order got reversed: it put the header where the footer belongs and the footer where the header belongs, but it did create blocks.
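
Looking at the generated script again, the reversal seems to be an axis mix-up rather than a misreading of the image: it places the Header label at y = 1440 and then inverts the y-axis, which sends that label to the bottom. Below is a minimal sketch of a version that stays in image coordinates (y = 0 at the top) the whole way through. The region boxes are the hypothetical values from GPT's pixel description, with the three service blocks collapsed into one full-width row for simplicity. REGIONS, label_centers, and draw are names made up for this sketch, and draw assumes matplotlib is installed.

```python
# Hypothetical regions taken from GPT's pixel description:
# name -> (left, top, right, bottom), origin at the top-left, y growing downward.
# Assumption: the three service blocks are merged into one full-width row.
REGIONS = {
    "Header": (0, 0, 1024, 150),
    "Main Banner": (0, 150, 1024, 600),
    "Service Blocks": (0, 600, 1024, 900),
    "Content Sections": (0, 900, 1024, 1400),
    "Footer": (0, 1400, 1024, 1536),
}
WIDTH, HEIGHT = 1024, 1536


def label_centers(regions):
    """Return name -> (x, y) center of each region, still in image coordinates."""
    return {
        name: ((left + right) / 2, (top + bottom) / 2)
        for name, (left, top, right, bottom) in regions.items()
    }


def draw(regions, width=WIDTH, height=HEIGHT):
    """Plot the labeled regions with the header at the top (needs matplotlib)."""
    import matplotlib.pyplot as plt
    from matplotlib.patches import Rectangle

    fig, ax = plt.subplots(figsize=(4, 6))
    for name, (left, top, right, bottom) in regions.items():
        # Outline each region and write its name at the center of the box.
        ax.add_patch(Rectangle((left, top), right - left, bottom - top, fill=False))
        ax.text((left + right) / 2, (top + bottom) / 2, name, ha="center", va="center")
    ax.set_xlim(0, width)
    # Put y = 0 at the top in one step. The generated script positioned labels
    # as if y grew upward (Header at 1440) and then inverted the axis as well.
    ax.set_ylim(height, 0)
    plt.show()


if __name__ == "__main__":
    for name, (x, y) in label_centers(REGIONS).items():
        print(f"{name}: center at ({x:.0f}, {y:.0f})")
```

Running the file prints the label centers. Calling draw(REGIONS) should show the Header row at the top and the Footer at the bottom, since the labels and the axis now share one convention.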

I should repeat this in a few months to see if it can do it better without a better prompt or coaching.

It would be weird but interesting to see this method work in a polished way. I don't really know how that could happen, I think because it's subjective and based on goals.

I did see an auto-gen GPT for websites that generated a site for you, and you then had to join their service. There are so many different ways of approaching this.

I will likely keep testing to see how close the GPT Vision method and prompt input can get to being pixel perfect. And, if it gets better, throw in responsive columns and mobile-first elements, and try to get it to render clean, usable, inline, no-dependency code for portability, etc. That prompt and workflow is probably floating around out there. I don't think understanding AI capabilities in workflows is really a choice; it's a requirement at this point, even when the tools are inefficient, hallucinate, and aren't effective. It's good to know that as well.

I've also tested Replit in the text-to-code area. Compared with hand coding, for example, that AI iteration method can generate a lot of code and files to evaluate, and that can create some extra work of its own. But my goal was a full, unedited, useful coded page with layout.

I want to see what it can do, and layer in more sophisticated things. 

It seems fitting to leave this here: