OpenAI has introduced new image generation capabilities to ChatGPT. Sam Altman, CEO of OpenAI, announced in a livestream Tuesday that ChatGPT’s image generation capabilities have been significantly improved with the help of the GPT-4o model. ChatGPT can now render text within images, create cartoons, and produce images with transparent backgrounds.
GPT-4o's built-in image generation replaces the existing DALL-E model, and it can edit images as well as create them. Text rendering has been improved, so images containing more accurate text can be produced. According to OpenAI, the new model was trained on publicly available data and on data provided by partners such as Shutterstock.
GPT-4o performs more computation during the image generation process, producing more precise and realistic results than existing models. It can also generate conceptual images, such as bicycles with square wheels, which existing models could not produce. At the user's request, it can also edit an image, for example by modifying part of it or changing the background.
Existing AI image generation models rendered text inaccurately, so their output required editing. GPT-4o can insert accurate text into images, so the results can be used immediately without additional modification. In the example released by OpenAI, the model created an image that included a diagram explaining the spectral principle, and when the user asked to change the viewpoint and background, it produced a modified image accordingly.
“This model has the potential to revolutionize education,” said Gabriel Goh, a research scientist at OpenAI.
The feature is currently available to both free and paid ChatGPT users, with developer access via the API coming soon. However, some limitations remain in areas such as cropping, multilingual text rendering, and the representation of high-density information, and the creation of inappropriate content is blocked for safety reasons.







