Table of Contents for What is Stable Diffusion and How Does it Work...:
- What is Stable Diffusion?
- Step-by-step guide to Stable Diffusion
- Advantages and disadvantages of the AI image generator Stable Diffusion
- Copyright for AI-generated content
- Alternatives to Stable Diffusion?
- Stable Diffusion vs. Midjourney
- Conclusion
- FAQ
What is Stable Diffusion?
Stable Diffusion is an AI image generator that creates digital images based on prompts, which are instructions in text form. The application was developed by Stability AI, a London-based startup that has been around since 2020. Runway ML, EleutherAI, the German non-profit LAION and a research group from LMU Munich were also involved in developing the image generator. The first version of the tool came out in August 2022.
It is open source software. This means that users can build on the existing code and develop it further. The whole thing is based on a deep learning system, i.e. on a deep neural network consisting of several layers that make it possible to recognize and "learn" complex patterns and relationships in data sets. In this tool, image understanding and language understanding come together: the AI interprets the text prompts that users enter and generates an image that matches them, drawing on the patterns it learned from its training images rather than looking anything up in an image database.
The AI was trained on an extremely large number of images, each paired with a matching text description, using a latent diffusion model. Diffusion here means starting from random noise (a pattern of points or pixels) and removing that noise step by step until an image emerges that matches the prompt. During training, the model learns to recognize and undo the noise that was added to known images. The many millions of training images came from the LAION Aesthetics dataset, a filtered subset of the much larger LAION-5B collection of image-text pairs. The AI can only draw on what it learned from these existing sources to generate "new" images.
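To make the idea of diffusion a little more tangible, here is a minimal toy sketch in Python. It is not how Stable Diffusion itself is implemented: the real model works on compressed ("latent") image representations and uses a trained neural network, guided by the prompt, to estimate the noise. The sketch assumes a standard linear noise schedule, and it cheats by reusing the true noise instead of a learned estimate. All function names are illustrative.

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    alpha_bars = []
    product = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def add_noise(clean, noise, alpha_bar):
    """Forward diffusion: blend the clean values with Gaussian noise."""
    keep = math.sqrt(alpha_bar)
    mix = math.sqrt(1.0 - alpha_bar)
    return [keep * x + mix * n for x, n in zip(clean, noise)]


def remove_noise(noisy, noise_estimate, alpha_bar):
    """Invert the blend, given an estimate of the noise that was added."""
    keep = math.sqrt(alpha_bar)
    mix = math.sqrt(1.0 - alpha_bar)
    return [(x - mix * n) / keep for x, n in zip(noisy, noise_estimate)]


rng = random.Random(0)
pixels = [0.0, 0.25, 0.5, 0.75, 1.0]  # a tiny stand-in for an image
alpha_bars = make_alpha_bars(1000)

noise = [rng.gauss(0.0, 1.0) for _ in pixels]
noisy = add_noise(pixels, noise, alpha_bars[-1])  # almost pure noise

# Assumption: a perfect noise estimate. A real model has to learn this.
restored = remove_noise(noisy, noise, alpha_bars[-1])
```

With a perfect noise estimate, `restored` matches `pixels` up to rounding errors. The hard part, and the reason for the enormous training set, is learning to estimate that noise well enough for images the model has never seen.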
Step-by-step guide to Stable Diffusion
Stable Diffusion can be accessed in several ways. Option 1: Open Stability AI's website and click on the "DreamStudio" tool. Option 2: Use the model via the Hugging Face Hub platform. Option 3: Download the software to your own device. The steps below walk through option 1.
Step 1:
Open the Stability AI website.
Step 2:
Scroll down until you see the "DreamStudio" button. Click on it.
Step 3:
On the page that opens, look for the "Get started" button (it may also be labeled "Try me now" or "Try for free"). Click on it.
Step 4:
Register with your e-mail address in the form that opens.
Step 5:
You will receive a confirmation email. Use the link in the email to access the DreamStudio front-end application.
Step 6:
You will see another input form. Enter your prompt, i.e. the text command, in the designated text field.
Important to know: The quality of the prompt is directly related to the quality of the output. The more precise you are, the more accurate the output you get. Because not everyone is a gifted prompt engineer, Stability AI has published a prompt guide.
For best results, use English prompts. The tool can also work with German instructions, but it was trained on far more English-language data. The prompts should be as detailed as possible. Keywords are better understood than full sentences.
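As a small illustration of the keyword approach, the following Python sketch assembles a prompt from a subject and a list of descriptive keywords. The helper name `build_prompt` and the example keywords are made up for this article; they are not part of any Stability AI tool.

```python
def build_prompt(subject, keywords):
    """Join a subject and descriptive keywords into one comma-separated prompt."""
    parts = [subject.strip()]
    seen = {parts[0].lower()}
    for keyword in keywords:
        cleaned = keyword.strip()
        # Skip empty entries and repeated keywords.
        if cleaned and cleaned.lower() not in seen:
            parts.append(cleaned)
            seen.add(cleaned.lower())
    return ", ".join(parts)


prompt = build_prompt(
    "lighthouse on a cliff",
    ["sunset", "oil painting", "warm colors", "sunset"],
)
print(prompt)  # lighthouse on a cliff, sunset, oil painting, warm colors
```

The resulting text can be pasted into the prompt field as it is. Adding or removing single keywords is an easy way to steer the result.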
Once you have entered your prompt, the tool provides you with four image variants. You can then pick one of these variants and continue working with it.
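If you would rather run the model on your own device (option 3), generating four variants could look roughly like the Python sketch below. It rests on several assumptions that are not part of the steps above: that the open source `torch` and `diffusers` packages are installed, that the model ID `CompVis/stable-diffusion-v1-4` is available on the Hugging Face Hub, and that variants are produced by using a different random seed for each image. The helper names `variant_seeds` and `generate_variants` are hypothetical.

```python
def variant_seeds(base_seed, count):
    """Return one distinct random seed per image variant."""
    return [base_seed + offset for offset in range(count)]


def generate_variants(prompt, model_id="CompVis/stable-diffusion-v1-4",
                      count=4, base_seed=0, steps=30):
    """Generate `count` image variants for one prompt (returns PIL images)."""
    # Heavy dependencies are imported here, so the file loads without them.
    import torch
    from diffusers import StableDiffusionPipeline

    # Use the GPU with half precision if available, otherwise the (slow) CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    # The first call downloads the model weights (several gigabytes).
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=dtype)
    pipe = pipe.to(device)

    images = []
    for seed in variant_seeds(base_seed, count):
        # A fixed seed makes each variant reproducible.
        generator = torch.Generator(device=device).manual_seed(seed)
        result = pipe(prompt, generator=generator, num_inference_steps=steps)
        images.append(result.images[0])
    return images
```

Calling `generate_variants("lighthouse on a cliff, sunset, oil painting")` returns a list of images that can each be stored with `image.save("variant.png")`. Expect this to be slow without a dedicated graphics card.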
Advantages and disadvantages of the AI image generator Stable Diffusion
First of all, it sounds relatively easy to generate usable images with this tool. And it is. You should be reasonably fluent in English and be able to describe what you expect from the tool. This way you can generate images in sufficient resolution for free and in a manageable amount of time.
But this is also where the problems start: The images are usable and the resolution is good, but neither is outstanding. The more specific you want your results to be, the more time-consuming it becomes to generate the material. At a certain point, the time required is no longer manageable at all. And then there is still the problem that Stable Diffusion can only work with what it learned from the image material in the LAION dataset. So it is not possible to create something completely new.
The biggest advantages are that the tool is free to use and intuitive.
Copyright for AI-generated content
What about copyrights and usage rights? First of all, the legislation varies in the different countries where the tool is accessible. There is no uniform regulation. And then there is an overall dispute about who owns the rights to AI-generated content. There are good arguments that the copyrights belong to those who programmed the AI. After all, without those people, the content could not be created. But equally logical is the position that the copyrights lie with those who, via the input of custom prompts, got the AI to create that very content. So this question is not conclusively settled. It is also unclear who can be held liable in the event of problematic content.
Given this, it is completely understandable that companies are very hesitant to use AI-generated content. After all, the rights to use artistic and creative content can only be granted by those who hold the copyright. And who that is, as mentioned above, is not clear. In any case, the applicable terms and conditions should be thoroughly reviewed before content is used to any extent.
Alternatives to Stable Diffusion?
There are indeed some AI image generators you can try as an alternative. Artbreeder is one of them; DeepAI and DALL-E are other possibilities. Craiyon, NightCafe and Visionist are also more or less suitable for generating images. Probably the best-known representative among AI image generators, however, is Midjourney.
Stable Diffusion vs. Midjourney
The first noticeable point is: Stable Diffusion is free to use, and the resolution is good enough compared to Midjourney (higher than DALL-E). The speed and implementation of the prompts are satisfactory, and the image quality is comparable. However, it is notable that DreamStudio gives you direct access to the input form and the results of Stability AI's tool. Midjourney is currently (summer 2023) still used via Discord. Discord needs to be installed, you need a user account, and the service is often overloaded. Then you wait a long time for your prompts to be processed, even for relatively simple tasks, which is annoying.
The second point is privacy. With Midjourney, the generated image content does not belong to you. Midjourney reserves the right to show your generated material as an example in the gallery. This means that the images are available to anyone who is interested, and they can continue to work with them. If you want to generate more than a handful of images and use them commercially, you will not get around a subscription. Privacy comes at a price, too.
Conclusion
Generating images via AI has become much easier in the last two years. The technology is making tremendous progress. De facto, the development of the tools is ahead of the formation of opinion in society - we simply don't know today how to legally and morally deal with this visual material. The visual material is not curated, which is why there can be offensive material. You can't expect unique visuals tailored to your application here. You can't even expect flawless imagery, because horses with five legs and similar blunders happen all the time. You should not expect diversity in terms of skin colors, nationalities, languages, etc., either; this is where algorithmic bias comes into play.
If the result is still sufficient for you, there is nothing to be said against using Stable Diffusion or a comparable tool. AI image generators will not disappear again, but will find and hold their place in the creative industries. So it's time to look into them: technically, from an ethical point of view, from a user's point of view, and from a legal point of view.