
Getting Started with AI Image Generation

New to image generation with AI? You’re in the right place!

This is a high-level walkthrough of some of the concepts and terms you’ll see as you start using InvokeAI. Please note that this is not an exhaustive guide, and it may be out of date given the rapidly changing nature of the space.

Using InvokeAI

Prompt Crafting

To get started, here’s an easy template to use for structuring your prompts:
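
One general-purpose structure (an illustrative pattern, not the only way to write a prompt) is:

  [subject], [medium or art style], [artist or aesthetic influence], [extra details], [quality keywords]

A filled-in prompt might read: “a lighthouse on a rocky coast, oil painting, in the style of a vintage travel poster, dramatic clouds, highly detailed”.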

Generation Workflows

Improving Image Quality

Terms & Concepts

If you’re interested in learning more, check out this presentation from one of our maintainers (@lstein).

Stable Diffusion

Stable Diffusion is a deep-learning, text-to-image model that is the foundation of the capabilities found in InvokeAI. Since its release, many subsequent models have been created based on Stable Diffusion, each designed to generate specific types of images.
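
To make “text-to-image” concrete, here’s a minimal sketch of running a Stable Diffusion checkpoint with the Hugging Face diffusers library (InvokeAI handles this for you through its UI; the model name here is just one common example):

  import torch
  from diffusers import StableDiffusionPipeline

  # Download and load a Stable Diffusion checkpoint from the Hugging Face Hub.
  pipe = StableDiffusionPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
  ).to("cuda")

  # The prompt is the only required input; everything else has defaults.
  image = pipe("a lighthouse on a rocky coast at dawn").images[0]
  image.save("lighthouse.png")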

Prompts

Prompts give the model directions on what to generate. As a general rule of thumb, the more detailed your prompt is, the better your result will be; for example, “a gothic castle on a cliff at sunset, oil painting, dramatic clouds” gives the model far more to work with than “a castle”.

Models

Models are the magic that powers InvokeAI. These files are the output of training a machine-learning system to understand massive numbers of images, giving it the ability to generate new images from just a text description of what you’d like to see. (Like Stable Diffusion!)

Invoke offers a simple way to download several different models upon installation, but many more can be discovered online, including at https://models.invoke.ai

Each model produces a unique style of output based on the images it was trained on. Try out different models to see which best fits your creative vision!
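
If you download a checkpoint file from a model site, recent versions of the diffusers library can load it directly. A rough sketch (the file name here is hypothetical):

  from diffusers import StableDiffusionPipeline

  # Load a single downloaded .safetensors checkpoint instead of a Hub repo.
  pipe = StableDiffusionPipeline.from_single_file("./analog-style.safetensors")
  image = pipe("portrait photo of an astronaut").images[0]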

Scheduler

Schedulers guide the process of removing noise (de-noising) from data. They determine:

  1. The number of steps to take to remove the noise.
  2. Whether the steps are random (stochastic) or predictable (deterministic).
  3. The specific method (algorithm) used for de-noising.

Experimenting with different schedulers is recommended as each will produce different outputs!
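
As a sketch of what experimenting with schedulers looks like in code (again using diffusers; the scheduler choices and seed are illustrative):

  import torch
  from diffusers import (
      StableDiffusionPipeline,
      EulerDiscreteScheduler,
      DPMSolverMultistepScheduler,
  )

  pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
  prompt = "a watercolor fox in a forest"

  # Fix the seed so the only difference between the two images is the scheduler.
  pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)
  image_euler = pipe(prompt, generator=torch.Generator().manual_seed(42)).images[0]

  pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
  image_dpm = pipe(prompt, generator=torch.Generator().manual_seed(42)).images[0]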

Steps

The number of de-noising steps taken during each generation.

Schedulers can be intricate, and there’s often a balance to strike between how quickly they can de-noise data and how well they can do it. Much has been written online about the different schedulers and the right number of steps for each. You can save generation time by reducing the number of steps used, but make sure you’re still satisfied with the quality of the images produced!
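
One way to explore the speed/quality trade-off is to render the same seed at several step counts (a sketch; the prompt and step counts are arbitrary):

  import torch
  from diffusers import StableDiffusionPipeline

  pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

  # Same prompt and seed at each step count, so only the steps change.
  for steps in (10, 20, 30, 50):
      image = pipe(
          "a city street in the rain",
          num_inference_steps=steps,
          generator=torch.Generator().manual_seed(42),
      ).images[0]
      image.save(f"street_{steps}_steps.png")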

Low-Rank Adaptations / LoRAs

Low-Rank Adaptations (LoRAs) are like smaller, more focused versions of models, trained to give a model a better understanding of how a specific character, style, or concept looks.
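
LoRAs are distributed as small files that are applied on top of a base model at generation time. A rough sketch with diffusers (the file name and scale are hypothetical):

  from diffusers import StableDiffusionPipeline

  pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

  # Apply a LoRA file on top of the base model's weights.
  pipe.load_lora_weights(".", weight_name="watercolor-style.safetensors")

  # "scale" controls how strongly the LoRA influences the result (0 = off).
  image = pipe(
      "a watercolor painting of a lighthouse",
      cross_attention_kwargs={"scale": 0.8},
  ).images[0]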

Textual Inversion Embeddings

Textual Inversion Embeddings, like LoRAs, make it easier to prompt for certain characters, styles, or concepts. However, embeddings are trained to update the relationship between a specific word (known as the “trigger”) and the intended output.
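
In use, you load the embedding and then include its trigger word in the prompt. A sketch with diffusers (the embedding repo and trigger token are illustrative):

  from diffusers import StableDiffusionPipeline

  pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

  # Load an embedding; it binds the new concept to the trigger token below.
  pipe.load_textual_inversion("sd-concepts-library/cat-toy", token="<cat-toy>")

  image = pipe("a photo of <cat-toy> on a beach").images[0]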

ControlNet

ControlNets are neural network models that are able to extract key features from an existing image and use these features to guide the output of the image generation model.
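
A typical example is the Canny edge ControlNet: edges extracted from a reference image constrain the composition of the new image. A sketch with diffusers and OpenCV (the reference file name is hypothetical):

  import cv2
  import numpy as np
  from PIL import Image
  from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

  # Extract edges from an existing image; these are the "key features".
  source = np.array(Image.open("reference.png").convert("RGB"))
  edges = np.stack([cv2.Canny(source, 100, 200)] * 3, axis=2)
  control_image = Image.fromarray(edges)

  controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
  pipe = StableDiffusionControlNetPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", controlnet=controlnet
  )

  # The edge map guides layout; the prompt decides content and style.
  image = pipe("a futuristic city street", image=control_image).images[0]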

VAE

A variational auto-encoder (VAE) is an encode/decode model that translates the small “latent” image produced during the image generation process into the full-size pixel image that we see.
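
To see the VAE’s role directly, you can ask a diffusers pipeline for raw latents and decode them yourself (a sketch, assuming a recent diffusers version):

  import torch
  from diffusers import StableDiffusionPipeline

  pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

  # Stop before decoding: latents are the small tensors the model works on.
  latents = pipe("a lighthouse at dawn", output_type="latent").images

  # The VAE decoder turns a small latent (e.g. 4x64x64) into 512x512 pixels.
  with torch.no_grad():
      decoded = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
  image = pipe.image_processor.postprocess(decoded)[0]
  image.save("decoded.png")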