In October 2023, OpenAI announced the availability of image generation in ChatGPT using DALL-E 3. This update was notable because GPT-4 previously could not produce image output; it could only accept images as input (multimodal) and respond with text.
In this article, you will learn how GPT-4 and Bing Chat differ in the way they process prompts and generate images using DALL-E 3.
How do GPT-4 and Bing Chat work with DALL-E 3?
GPT-4 and DALL-E
GPT-4 and DALL-E 3 leverage advanced neural network architectures for generating images, but they operate differently.
GPT-4 utilizes vast amounts of text data to understand context and, when coupled with DALL-E 3 as an image generation module, generates relevant images. The GPT-4 setup employs a Transformer architecture, enhancing its ability to capture long-range dependencies in data, meaning more of your prompt's context is captured.
On the other hand, DALL-E 3 is built to generate images from textual descriptions directly. It is an evolution of the original DALL-E, which also used a Transformer architecture, but DALL-E 3 introduces more sophisticated techniques to better correlate text with visual content, thus creating more accurate and contextually relevant images based on the provided textual descriptions.
GPT-4 adds an extra rendering layer for the prompt, where it adjusts your prompt by adding more context and detail to the original request.
Bing Chat and DALL-E
Bing Chat uses GPT-3.5 and GPT-4 for language processing. Bing Chat also has internet access (via Bing) and will conduct an image search while processing the prompt for image generation.
This internet access makes Bing Chat unique compared to GPT-4, which cannot combine image generation with access to the web.
What makes this combo unique compared to other image generators?
Most image generators use the prompt exactly as the user wrote it to generate an image against existing datasets. Image generators also rely on the Transformer architecture to understand context and match it with the appropriate output. However, if the user is not well-versed in prompting, they usually end up with poor-quality images.
GPT-4 overrides your prompt
After you prompt GPT-4 to create an image, it processes the prompt and adjusts it for better image quality, policy compliance, diversity, and inclusivity:
GPT-4 may add more context, but it can also rewrite parts of your prompt and generate an image based on these corrections:
More often than not, these corrections are not helpful; instead, they create extra work as you backtrack to figure out which additions to remove:
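If you want to see exactly what was changed, the ChatGPT interface doesn't show you the rewritten prompt directly, but OpenAI's Images API does: each generated image comes back with a `revised_prompt` field containing the prompt DALL-E 3 actually used. The sketch below builds a request body and extracts that field from a response; the endpoint and field names follow the public API, but the sample response shown is illustrative, not real output.

```python
# Sketch: inspecting DALL-E 3's prompt rewriting via OpenAI's Images API.
# The request shape and `revised_prompt` field follow the public API docs;
# the sample response below is illustrative, not real output.

API_URL = "https://api.openai.com/v1/images/generations"

def build_dalle3_request(prompt: str, size: str = "1024x1024") -> dict:
    """Build the JSON body for a DALL-E 3 image generation call."""
    return {
        "model": "dall-e-3",
        "prompt": prompt,
        "n": 1,  # DALL-E 3 accepts only one image per request
        "size": size,
    }

def extract_revised_prompt(response_body: dict) -> str:
    """Pull the rewritten prompt out of an Images API response."""
    return response_body["data"][0]["revised_prompt"]

# Illustrative response shape (abridged):
sample_response = {
    "data": [{
        "url": "https://example.com/generated.png",
        "revised_prompt": "A photorealistic red bicycle leaning against "
                          "a brick wall in soft morning light",
    }],
}

print(extract_revised_prompt(sample_response))
```

Diffing your original prompt against `revised_prompt` makes it easy to spot which details the rewriting layer injected, and which ones to explicitly exclude in a follow-up prompt.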
Comparison between GPT-4 and Bing Chat for image generation using DALL-E 3
Let’s compare the same prompt in GPT-4 and Bing Chat, both of which rely on the exact same DALL-E 3 technology:
Bing Chat + DALL-E 3
Here is the prompt and results:
Bing Chat offers a few options you can add to your images to make them more context-rich, which is a really nice touch.
From a marketing standpoint, these images look really good:
Here is a prompt you can use for a product you are selling to a specific audience segment and with a particular brand specification you’d like to depict in your creatives:
And here are the results:
Looks pretty good.
However, keep in mind that Bing Chat won’t produce images for some sensitive prompts (especially those with racial or sexual context):
So make sure to work around this by adjusting your prompts.
GPT-4 + DALL-E 3
We get different results in GPT-4, as it rewrites parts of the prompt. Here we can compare the same prompts and check the outputs:
For example, it applied the illustration override on its own. This is a feature specific to GPT-4’s integration with DALL-E 3:
It also has added contextual nuances that were not previously mentioned:
In the example above, the addition is typically inclusive and focuses on ensuring diversity in output.
At times, this override can be troublesome when you are targeting a specific market segment, where inclusivity and diversity would not be applicable.
The following is the same prompt used for a product marketing advertisement:
Again, we see the same phenomenon: GPT-4 adds to the prompt and adjusts it on its own:
Only two images were created, and both were illustrations. It seems GPT-4 struggles with policies and regulations around the use of racial content.
Which is better for image generation: Bing Chat or GPT-4?
Both tools use DALL-E 3 as the image generation base, so image quality is the same. But there are subtle differences:
PROS of using Bing Chat
- Bing Chat can match your prompt more accurately, as it doesn’t override your text
- Bing Chat successfully produced images for a sensitive prompt (when adjusted)
PROS of GPT-4
- GPT-4 adds context automatically
- GPT-4 fine-tunes the prompt with useful details
- GPT-4 adjusts the prompt when the initial input is sensitive, so it can still generate an image, although with only partial fidelity to your request