Last Updated | Changes |
---|---|
10/8/2023 | First version published |
What is a Prompt?
With the rise of Generative AI, a fascinating and transformative development has emerged: the ability to generate images, music, text, and even video from simple instructions known as “prompts”. These prompts act as a guide for AI systems, providing a framework for the output we expect them to produce. This beginner-to-advanced Prompt-Crafting guide will walk you through the steps required to take you from never-having-prompted to Prompt Master!
Imagine whispering a concept into an artist’s ear and waiting for their imagination to run wild, except the artist, in this case, is an AI model, and the whisper is our written prompt.
Prompting is a two-way interaction between the user and the AI system. We offer the system an idea, a phrase, or a sentence, and in return, the AI paints us a digital picture, crafts a piece of text, or composes a melody. This level of engagement brings about an intriguing blend of precision and unpredictability.
We’re already seeing new career paths for particularly skilled “prompters” – the role of the Prompt Engineer. A Prompt Engineer is akin to a translator, or guide, helping bridge the gap between human intention and AI interpretation. By understanding how AI processes language and imagery, Prompt Engineers are able to craft prompts that effectively steer the AI’s generative abilities towards the desired outcome.
While the concept of prompting in Generative AI might seem straightforward at first glance, there’s a common misconception that prompt engineering is easy. On a basic level, anyone can input a sentence and generate an output. However, this is merely the tip of the iceberg. There’s prompting, and then there’s the art and science of effective prompting; it’s a little like playing an instrument – one might be able to play the first three chords of Stairway to Heaven, but to play with the same flair, nuance, and precision as guitarist Jimmy Page is another thing entirely. Likewise, simply creating a prompt doesn’t make one an expert at harnessing the full potential of Generative AI.
While the basics of prompting can be grasped easily, the mastery of prompt engineering takes substantial knowledge, practice, and an intimate understanding of the AI’s inner workings.
What makes a good prompt?
Firstly, it’s important to understand that prompts must be tailored to the platform. AI models have different specializations, and this significantly influences how they interpret and respond to instruction. A text-to-image (txt2img) system like Stable Diffusion and a text-based model like ChatGPT each have unique underlying architectures, biases, trained knowledge bases, and their own “dialects” to which they respond; our prompts must be written in the “language” the AI system understands best.
A crucial thing to remember about the art of prompting is that there’s a significant element of personal preference involved. It’s a bit like cooking: every chef has their own unique style, their preferred techniques, and their special ingredients. When discussing prompting with other prompters, you’ll find that some people swear by one method, structuring their prompts in a certain way, while others will favor an entirely different approach. And that’s perfectly okay!
This guide isn’t intended to impose hard-and-fast rules or declare a definitive “best way” to prompt AI systems. Instead, think of it as a starter kit – a collection of useful tools, insights, and tips to help you start your journey in prompt engineering.
The aim is to equip you with a basic understanding and a set of strategies that you can tweak, modify, and refine according to your preferences. Remember, prompting is as much an art (AI detractors will dispute this!) as it is a science, and the beauty of art lies in its diversity and personal expression.
Stable Diffusion
This guide will initially focus on prompting for Stable Diffusion (txt2img), as that’s what we’re known for here at Civitai, but will evolve over time to include strategies for other Generative AI technologies.
Keep in Mind!
One of the most important things to keep in mind while prompting for Stable Diffusion is that while the general structure and syntax often remain much the same across models, specific tokens will produce a wide variety of results – what works well for one model won’t necessarily work well for another!
Additionally, prompting for SD 1.4/1.5 is very different to prompting for models using the SDXL architecture. There is overlap, and often an SD 1.5 prompt will work well with an SDXL model, but perhaps not quite as well as a prompt specifically tailored to the framework.
Basic Prompting – The Essentials
As mentioned above, there are many ways to prompt, and none of them are “wrong”, so long as we’re happy with the output. By giving structure to our prompts, however, we can maintain consistency and foster good “prompting habits” that keep our prompts easily readable.
The following sections outline the fundamental knowledge required to start prompting effectively!
The Positive Prompt – Basic Prompt Elements
The Positive Prompt, most often just referred to as “the prompt”, contains all the details of what we want to see in our images (the subject). It also defines the medium, style, composition, and color & lighting of the image.
Note that there’s absolutely no requirement to include all of these elements in every prompt. Some prompts might only have a subject and medium, some might not even have a subject! It is entirely up to you!
The Subject
The subject is the focal point of our image – the main object we want to depict.
Subject description: 1girl, woman, petite
Medium
The medium refers to the materials or tools an artist uses to create their work. It can include things like oil paint, watercolors, charcoal, pencil, etc. We can direct Stable Diffusion to reproduce a particular medium by specifying it in our prompt. Many models are created to reproduce a specific medium, and may not need additional prompting to achieve the desired effect.
Medium: watercolor painting
Style
Style defines the artistic style of our image. Examples of style include impressionism, realism, pop-art, surrealism, etc. As with the medium, many models produce a specific style and may not require additional tokens to achieve it.
Style: impressionist background
Composition
Composition describes how the various elements of the image are arranged within the artwork to create a pleasing result. This may include tokens to control the balance and symmetry of items within the scene, the framing of the image, scale and proportion, and other artistic concepts to make our images exactly as we imagine they should look.
Composition: from above
Color & Lighting
We have full control over both color and lighting in our images. This can include the color of particular items in our scene, or the overall hue.
Similarly, lighting plays a huge part in any artwork, and we can control many aspects of it, including shadows and overall brightness and vibrancy.
Color: rainbow hue
Lighting: bright
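To make the elements concrete, here is a minimal Python sketch that assembles a positive prompt from the example tokens listed in the sections above. The `PromptElements` class and `build_prompt` function are hypothetical names introduced purely for illustration; they are not part of Stable Diffusion or any particular interface. Empty elements are simply skipped, reflecting the fact that no element is mandatory.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PromptElements:
    """Hypothetical container for the basic prompt elements (illustrative only)."""
    subject: list[str] = field(default_factory=list)
    medium: list[str] = field(default_factory=list)
    style: list[str] = field(default_factory=list)
    composition: list[str] = field(default_factory=list)
    color: list[str] = field(default_factory=list)
    lighting: list[str] = field(default_factory=list)


def build_prompt(elements: PromptElements) -> str:
    """Join every non-empty element into one comma-separated prompt string."""
    groups = [
        elements.subject, elements.medium, elements.style,
        elements.composition, elements.color, elements.lighting,
    ]
    tokens = [token for group in groups for token in group]
    return ", ".join(tokens)


example = PromptElements(
    subject=["1girl", "woman", "petite"],
    medium=["watercolor painting"],
    style=["impressionist background"],
    composition=["from above"],
    color=["rainbow hue"],
    lighting=["bright"],
)
print(build_prompt(example))
```

The element order used here is just one possible choice; as noted above, how you arrange a prompt is largely a matter of preference.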
The Positive Prompt – Basic Syntax & Structure
As mentioned previously, it’s good practice to apply a somewhat standardized syntax and structure when prompting. As we start writing our own prompts, we discover what works well for us and how we prefer to lay out our prompts.
The following is one example of how a prompt can be organized. Other than looking tidy and enhancing readability, there are some practical reasons for prompting in this way. The most important is that keeping similar tokens grouped together increases the chances of them being included in the final output. For example, if “red hair” sits at the very start of a prompt and “long nose” at the very end, it’s likely one of them will be skipped; pairing them together in a “block” of tokens describing the subject increases the chances that both will be taken into account.
We’re going to break our prompt into sections:
The First Section – Subject & Setting
The first few tokens describe the subject and their appearance, followed by the setting: 1girl, woman, petite, pale skin, detailed face, bobcut hair, blue eyes, wearing yellow tank top, happy, laugh, statement sunglasses in a park, sky, trees, moonlight, stars
The Second Section – Color, Style and Lighting
In the second block of the prompt we define the styles and modifiers related to color and lighting: vivid colors, bokeh background, dramatic color, cartoon
The Third Section – Composition & Additional Modifiers
The last section of this prompt defines the composition, and adds a few extra words to help capture the desired atmosphere of the image: from below, cinematic, whimsical
The final prompt, and result:
1girl, woman, petite, pale skin, detailed face, bobcut hair, blue eyes, wearing yellow tank top, happy, laugh, statement sunglasses in a park, sky, trees, moonlight, stars
, vivid colors, bokeh background, dramatic color, cartoon
, from below, cinematic, whimsical
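The sectioned layout above can be sketched in a few lines of Python. This is only an illustration of the grouping idea, assuming each section is kept as its own list of tokens; the variable names and the `join_sections` helper are hypothetical and not tied to any Stable Diffusion tool.

```python
from __future__ import annotations

# Each block keeps related tokens together, mirroring the three sections above.
subject_and_setting = [
    "1girl", "woman", "petite", "pale skin", "detailed face", "bobcut hair",
    "blue eyes", "wearing yellow tank top", "happy", "laugh",
    "statement sunglasses in a park", "sky", "trees", "moonlight", "stars",
]
color_style_lighting = ["vivid colors", "bokeh background", "dramatic color", "cartoon"]
composition_and_modifiers = ["from below", "cinematic", "whimsical"]


def join_sections(*sections: list[str]) -> str:
    """Flatten the sections in order and join all tokens with ', '."""
    return ", ".join(token for section in sections for token in section)


final_prompt = join_sections(
    subject_and_setting, color_style_lighting, composition_and_modifiers
)
print(final_prompt)
```

Keeping each section as a separate list makes it easy to swap out, say, the composition block while leaving the subject description untouched.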
Negative Prompts
The Negative Prompt allows us to control what we don’t want to see in our images. Often, stating what we don’t want to see in the Positive Prompt has the opposite effect! The solution is to use the dedicated Negative Prompt section to remove undesired features from our images.
Many prompts you’ll come across will have a “common” or “universal” negative – a set of objects, attributes, and concepts which are generally deemed to be undesirable in our images. One such negative might look like:
low res, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, ugly
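As a rough illustration of how the two prompts travel together, the sketch below packages a positive prompt and a negative prompt into a single settings dictionary. The `build_request` helper is hypothetical, and the `prompt` and `negative_prompt` key names are an assumption modeled on the parameter names that several Stable Diffusion front ends use, so check your own tool's documentation before relying on them.

```python
# The "universal" negative from the example above, stored once for reuse.
UNIVERSAL_NEGATIVE = (
    "low res, bad hands, text, error, missing fingers, extra digit, "
    "fewer digits, cropped, worst quality, low quality, normal quality, "
    "jpeg artifacts, signature, watermark, username, blurry, ugly"
)


def build_request(positive: str, extra_negative: str = "") -> dict:
    """Pair a positive prompt with a negative prompt in one settings dict.

    Hypothetical helper: the key names are assumed, not taken from a specific API.
    """
    negative = UNIVERSAL_NEGATIVE
    if extra_negative:
        # Unwanted features go here, rather than as "no X" in the positive prompt.
        negative = f"{negative}, {extra_negative}"
    return {"prompt": positive, "negative_prompt": negative}


request = build_request("1girl, woman, petite, in a park", extra_negative="sunglasses")
print(request["negative_prompt"])
```

Note that the unwanted feature ("sunglasses") is appended to the negative prompt instead of being written as a negation in the positive prompt.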
Words vs Tokens
Stable Diffusion doesn’t understand words in the sense that humans do. It relies on a “tokenizer” (CLIP, in the case of SD 1.5) to convert the words in our prompt into “tokens” – numerical representations of the words it has in its “dictionary” of vocabulary.
Many common words equate to a single token. Some longer and more complex words are broken down by the tokenizer into separate tokens, each with their own meaning. Many particularly obscure or unusual words might not be recognized at all, and will have odd effects when used in the prompt.
Let’s examine the following prompt:
Outcome | |
---|---|
Text prompt | a beautiful cat with long whiskers |
Tokenized Split | [a] [beautiful] [cat] [with] [long] [whis] [kers] |
Numeric Tokens | 7 tokens: 320, 1215, 2368, 593, 1538, 6024, 2880 |
The interesting point here is that the word whiskers had to be split into two tokens (6024, 2880) to be understood by Stable Diffusion. Having a word be split into multiple tokens isn’t necessarily a bad thing – Stable Diffusion still recognizes many multi-token words!
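The splitting behavior in the table can be imitated with a toy tokenizer. The sketch below is not the real CLIP tokenizer, which uses byte-pair encoding with learned merge rules over a far larger vocabulary; as a simplifying assumption it uses greedy longest-match splitting against a tiny vocabulary built only from the pieces and IDs shown in the table. The names `TOY_VOCAB`, `split_word`, and `tokenize` are hypothetical.

```python
from __future__ import annotations

# Toy vocabulary using only the pieces and IDs from the table above.
TOY_VOCAB = {
    "a": 320, "beautiful": 1215, "cat": 2368, "with": 593,
    "long": 1538, "whis": 6024, "kers": 2880,
}


def split_word(word: str) -> list[str]:
    """Greedily split a word into the longest known vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            if word[start:end] in TOY_VOCAB:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # No piece of any length matched at this position.
            raise ValueError(f"no known piece for {word[start:]!r}")
    return pieces


def tokenize(prompt: str) -> list[int]:
    """Convert a prompt into token IDs, word by word."""
    return [
        TOY_VOCAB[piece]
        for word in prompt.lower().split()
        for piece in split_word(word)
    ]


print(split_word("whiskers"))  # ['whis', 'kers']
print(tokenize("a beautiful cat with long whiskers"))
# [320, 1215, 2368, 593, 1538, 6024, 2880]
```

Six words become seven tokens because “whiskers” is not in the toy vocabulary as a whole word, so it falls back to two smaller pieces.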
Conclusion
In this guide we’ve explored the absolute basics of prompt creation, but there’s so much more to cover! In Part 2, we’ll look at prompt Weight & Emphasis, and in the final part we’ll examine a number of advanced topics!