🎨JSON Style Guides for Controlled Image Generation with GPT-4o and GPT-Image-1
raphiki

Publish Date: May 8

Image generation with GPT-4o and GPT-Image-1 can yield visually stunning results, but without clear instructions the output can vary widely between runs. JSON style guides bring clarity, structure, and repeatability to your prompts. This tutorial explains why JSON style guides matter, shows how to use them effectively, and provides a reference for the fields you can define.


🚀 Why Use a JSON Style Guide?

Natural language is powerful but often ambiguous. By organizing your image prompts using JSON:

  • ✅ You reduce ambiguity with structured fields.
  • ✅ You improve consistency across multiple generations.
  • ✅ You can automate or scale prompt creation for batch processing.
  • ✅ You separate content from style, making iterations easier.
  • ✅ Developers and designers can work together using shared, machine-readable formats.
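To make the batch-processing and content/style separation points concrete, here is a minimal Python sketch. It assumes a shared style block that is merged into several content-only guides; the names `HOUSE_STYLE`, `CONTENT_ITEMS`, and `build_prompts`, as well as the otter scene, are hypothetical and only serve as illustration.

```python
import json

# Hypothetical shared style block, reused unchanged for every image in a batch.
HOUSE_STYLE = {
    "style": "storybook illustration",
    "color_palette": ["forest green", "gold", "midnight blue"],
    "lighting": "soft dappled sunlight",
    "mood": "whimsical and cozy",
}

# Per-image content: only the scene and subjects change between items.
CONTENT_ITEMS = [
    {"scene": "a magical forest clearing",
     "subjects": [{"type": "fox", "description": "wearing a wizard hat"}]},
    {"scene": "a riverbank at dusk",
     "subjects": [{"type": "otter", "description": "carrying a lantern"}]},
]


def build_prompts(style, items):
    """Merge the shared style into each content item and serialize to JSON text."""
    prompts = []
    for item in items:
        guide = {**item, **style}  # style keys win on conflict, keeping the batch uniform
        prompts.append(json.dumps(guide, indent=2, ensure_ascii=False))
    return prompts


prompts = build_prompts(HOUSE_STYLE, CONTENT_ITEMS)
print(prompts[0])
```

Changing one value in `HOUSE_STYLE` restyles the whole batch, while the content items stay untouched.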

🛠️ How to Use a JSON Style Guide

A JSON prompt is simply a structured document specifying everything you want the model to include. Here’s a simple example:

{
  "scene": "a magical forest clearing",
  "subjects": [
    {
      "type": "fox",
      "description": "wearing a wizard hat, sitting on a tree stump",
      "position": "center"
    }
  ],
  "style": "storybook illustration",
  "color_palette": ["forest green", "gold", "midnight blue"],
  "lighting": "soft dappled sunlight",
  "mood": "whimsical and cozy",
  "background": "glowing mushrooms and tall trees",
  "composition": "eye-level view, centered subject"
}

This structure gives the model explicit, interpretable instructions for what to render and how.
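If you work through the API rather than the ChatGPT interface, one way to submit a guide is to serialize it and pass the JSON text as the prompt. The sketch below assumes the official `openai` Python SDK and an `OPENAI_API_KEY` in the environment; the helper names `guide_to_prompt` and `generate_image` are hypothetical, and the article itself does not specify how its prompts were submitted.

```python
import base64
import json


def guide_to_prompt(guide: dict) -> str:
    """Serialize the style guide; the JSON text itself is used as the prompt."""
    return json.dumps(guide, indent=2, ensure_ascii=False)


def generate_image(guide: dict, out_path: str = "image.png") -> str:
    """Send the guide to GPT-Image-1 and save the PNG (requires OPENAI_API_KEY)."""
    from openai import OpenAI  # imported lazily so the rest runs without the SDK

    client = OpenAI()
    result = client.images.generate(
        model="gpt-image-1",
        prompt=guide_to_prompt(guide),
        size="1024x1024",
    )
    # gpt-image-1 returns the image as base64-encoded data.
    with open(out_path, "wb") as fh:
        fh.write(base64.b64decode(result.data[0].b64_json))
    return out_path


fox_guide = {
    "scene": "a magical forest clearing",
    "style": "storybook illustration",
    "mood": "whimsical and cozy",
}
prompt_text = guide_to_prompt(fox_guide)
print(prompt_text)
# generate_image(fox_guide)  # uncomment to call the API
```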

Fox in Magical Forest


📚 Parameter Reference

Here’s a breakdown of possible fields you can use in a JSON style guide.

1. scene

A short overview of the entire setting or environment.

  • Example: "a futuristic city at sunset"

2. subjects (array of objects)

Describes each key subject in the image. Each subject can include:

{
  "type": "robot",
  "description": "silver body with glowing blue eyes",
  "position": "foreground",
  "pose": "standing upright",
  "size": "large",
  "expression": "neutral",
  "interaction": "looking at a floating screen"
}
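When subjects are built in code, a small dataclass can keep the field names consistent. This is only a sketch: treating `type` and `description` as required and the other fields as optional is an assumption, not something the style guide format mandates.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Subject:
    # Assumed required fields.
    type: str
    description: str
    # Assumed optional fields, mirroring the example above.
    position: Optional[str] = None
    pose: Optional[str] = None
    size: Optional[str] = None
    expression: Optional[str] = None
    interaction: Optional[str] = None

    def to_dict(self) -> dict:
        """Return only the fields that were set, so the JSON stays compact."""
        return {k: v for k, v in asdict(self).items() if v is not None}


robot = Subject(
    type="robot",
    description="silver body with glowing blue eyes",
    position="foreground",
)
subject_json = robot.to_dict()
print(subject_json)
```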

3. style

The artistic or visual rendering style.

  • Examples: "photorealistic", "watercolor", "pixel art", "cyberpunk", "anime"

4. color_palette

An array of dominant and accent colors.

  • Example: ["emerald green", "burnt orange", "charcoal"]

5. lighting

How the image is lit.

  • Examples: "sunset backlight", "soft studio lighting", "glow from below"

6. mood

The emotional tone or atmosphere.

  • Examples: "peaceful", "dramatic", "eerie", "playful"

7. background

The scenery or backdrop.

  • Examples: "mountain landscape", "white cyclorama", "dreamy nebula sky"

8. composition

Overall layout and positioning.

  • Examples: "symmetrical", "rule of thirds", "top-down shot", "portrait orientation"

9. camera

Virtual photography settings.

{
  "angle": "eye-level",
  "distance": "medium shot",
  "lens": "wide-angle",
  "focus": "sharp subject, blurred background"
}

10. medium

Simulated medium or format.

  • Examples: "oil painting", "3D render", "ink drawing", "chalkboard sketch"

11. textures

Surface qualities and tactile impressions.

  • Examples: "soft velvet", "rusty metal", "wet pavement"

12. resolution

Intended resolution or output size.

  • Examples: "4K", "web banner", "Instagram square"
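When calling GPT-Image-1 through the API, the pixel dimensions come from the request's `size` parameter, so a `resolution` value in the guide acts as a hint rather than a guarantee. The sketch below maps hints to sizes; the `HINT_TO_SIZE` table and `size_for` helper are hypothetical, and the supported sizes listed are the ones I know of for `gpt-image-1`, so check the current API reference.

```python
# Sizes accepted for gpt-image-1, to the best of my knowledge.
SUPPORTED_SIZES = {"1024x1024", "1536x1024", "1024x1536", "auto"}

# Hypothetical mapping from the guide's free-text hints to an API size.
HINT_TO_SIZE = {
    "instagram square": "1024x1024",
    "web banner": "1536x1024",
    "4k": "1536x1024",  # no true 4K output; fall back to the largest landscape size
    "portrait": "1024x1536",
}


def size_for(guide: dict) -> str:
    """Pick an API size from the guide's resolution hint, defaulting to 'auto'."""
    hint = str(guide.get("resolution", "")).strip().lower()
    return HINT_TO_SIZE.get(hint, "auto")


print(size_for({"resolution": "Instagram square"}))  # 1024x1024
print(size_for({"scene": "no resolution given"}))    # auto
```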

13. details

Extra fine-grained attributes.

{
  "clothing": "flowing red cape",
  "weather": "light snowfall",
  "facial_features": "freckles and sharp jawline",
  "material": "glass and brass",
  "ornaments": "glasses, ring"
}

14. effects

Special effects or visual treatments.

  • Examples: "lens flare", "bokeh blur", "double exposure", "film grain"

15. inspirations

Known references to guide visual style.

  • Examples: "inspired by Studio Ghibli", "in the style of Van Gogh", "similar to Blade Runner"
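The fifteen fields above can be turned into a light sanity check before a guide is sent. The following sketch treats the list as a convention rather than a formal schema: the expected types in `KNOWN_FIELDS` are my assumptions, and `check_guide` returns warnings instead of raising errors because extra keys may still be useful to the model.

```python
# Field names from the reference above; the expected types are assumptions.
KNOWN_FIELDS = {
    "scene": str, "subjects": list, "style": str, "color_palette": list,
    "lighting": str, "mood": str, "background": (str, dict), "composition": str,
    "camera": dict, "medium": str, "textures": (str, list), "resolution": str,
    "details": dict, "effects": (str, list), "inspirations": (str, list),
}


def check_guide(guide: dict) -> list:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    warnings = []
    for key, value in guide.items():
        expected = KNOWN_FIELDS.get(key)
        if expected is None:
            warnings.append(f"unknown field: {key}")
        elif not isinstance(value, expected):
            warnings.append(f"unexpected type for {key}: {type(value).__name__}")
    return warnings


draft = {"scene": "urban terrace", "props": ["coffee cups"], "camera": "eye level"}
print(check_guide(draft))
# ['unknown field: props', 'unexpected type for camera: str']
```

A warning about an unknown field such as `props` is not necessarily a mistake; it simply highlights keys that fall outside the reference list.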

🧪 Example Use Cases

Fantasy Character Concept Art

{
  "scene": "mountaintop at sunrise",
  "subjects": [
    {
      "type": "warrior elf",
      "description": "leather armor, long silver hair",
      "pose": "standing with sword raised",
      "position": "foreground"
    }
  ],
  "style": "digital painting",
  "color_palette": ["misty gray", "light gold", "teal"],
  "lighting": "sunrise backlight",
  "mood": "heroic and calm",
  "background": "foggy mountains",
  "composition": "rule of thirds",
  "camera": {
    "angle": "low angle",
    "distance": "medium shot",
    "focus": "sharp on character"
  }
}

Fantasy Character

Product Mockup

{
  "scene": "minimalist white studio",
  "subjects": [
    {
      "type": "smartwatch",
      "description": "silver frame with red strap",
      "position": "center",
      "pose": "lying flat"
    }
  ],
  "style": "photorealistic",
  "lighting": "diffused light from above",
  "mood": "clean and sleek",
  "background": "white gradient",
  "composition": "centered product with top view",
  "resolution": "4K"
}

Smartwatch

Realistic Scene with Two Characters

{
  "scene": "urban café terrace in Paris during golden hour",
  "subjects": [
    {
      "type": "young woman",
      "description": "30s, black hair in a bun, wearing a white blouse and tan trench coat, holding a coffee cup",
      "pose": "sitting at a café table, leaning forward slightly",
      "position": "left foreground",
      "expression": "engaged, smiling softly"
    },
    {
      "type": "young man",
      "description": "30s, light brown curly hair, wearing a navy blue jacket and scarf, gesturing with one hand",
      "pose": "sitting across from the woman, mid-conversation",
      "position": "right foreground",
      "expression": "animated, talking"
    }
  ],
  "style": "hyper-realistic photography",
  "lighting": "natural golden hour light with soft shadows and sun flare",
  "mood": "warm and intimate",
  "background": {
    "elements": ["street with bicycles", "café signage", "distant pedestrians"],
    "depth_of_field": "shallow, blurred background"
  },
  "composition": "framed using the rule of thirds, both characters centered with table between them",
  "camera": {
    "angle": "eye level",
    "distance": "medium close-up",
    "focus": "sharp on characters' faces"
  },
  "color_palette": ["warm gold", "beige", "navy", "soft rose", "espresso brown"],
  "props": ["ceramic coffee cups", "croissants on a small plate", "notebook and pen on table"],
  "resolution": "4K"
}

Coffee Break in Paris


Using JSON style guides gives you a consistent, modular, and precise way to control image generation. Whether you're creating a portfolio of characters, designing branded assets, or prototyping environments, structured prompts give you the power to communicate with clarity and scale with confidence.

And don’t hesitate to use ChatGPT to refine or co-create your JSON Style Guides! It can turn vague ideas into structured, generation-ready prompts in seconds.

Comments 7 total

  • raphiki, May 25, 2025

    And this also works with Gemini:

    Realistic Scene with Two Characters

  • raphiki, May 25, 2025

    And this also works with Gemini:
    Product Mockup

  • raphiki, May 25, 2025

    And this also works with Gemini:
    Fantasy Character Concept Art

  • raphiki, May 25, 2025

    And this also works with Gemini:
    Fox in Magical Forest

  • raphiki, Aug 20, 2025

    Here are instructions you can give to an assistant or agent to propose a full-blown JSON Style Guide based on a few input words:

    You are tasked with generating a JSON object in English (even if the initial user prompt is in another language) containing properties for image generation based on a simple design prompt provided by the user. The JSON object should reflect the artistic intent, technical specifications, and thematic elements of the prompt and be compatible with Stable Diffusion or Flux models. Be creative and consistent in your interpretations.
    Instructions:

    1. Ask the user for a short, simple design prompt (e.g., "a serene Japanese garden at sunrise" or "cyberpunk, city skyline, flying cars").
    2. Analyze the design prompt to extract all relevant artistic, thematic, and technical details.
    3. Identify visual style, subjects, colors, mood, composition, lighting, and other distinctive elements.
    4. Generate a structured JSON object containing (at minimum) the following top-level keys:
       • style_name – A catchy, descriptive phrase that encapsulates the design prompt’s overall identity.
       • inspiration – 2–4 artistic styles, artists, or cultural references that align with the prompt.
       • scene – A concise description of the overall setting/environment.
       • subjects – An array of objects describing each main element in the scene: { "type": "robot", "description": "silver body with glowing blue eyes", "position": "foreground", "pose": "standing upright", "size": "large", "expression": "neutral", "interaction": "looking at a floating screen" }
       • style – Artistic rendering style (e.g., "watercolor", "photorealistic", "anime").
       • color_palette – Object containing:
           • primary: hex code for main color
           • secondary: complementary hex color
           • highlight: accent hex color
           • shadow: depth hex color
           • background_gradient: array of two hex colors for gradients
       • lighting – Description of light source, tone, and direction (e.g., "soft morning light", "neon backlighting").
       • mood – Emotional tone or atmosphere (e.g., "serene", "mysterious", "energetic").
       • background – Object describing backdrop:
           • type: "solid", "gradient", "pattern", "scenery"
           • details: description of background elements
       • composition – Layout and framing approach (e.g., "rule of thirds", "center focus", "top-down view").
       • camera – Virtual camera settings:
           • angle: e.g., "eye-level", "low angle"
           • distance: e.g., "close-up", "wide shot"
           • lens: e.g., "35mm", "wide-angle"
           • focus: e.g., "sharp subject, blurred background"
       • medium – Simulated artistic medium (e.g., "oil painting", "charcoal sketch", "3D render").
       • textures – Surface qualities (e.g., "rough concrete", "silky fabric").
       • resolution – Intended output size (e.g., "4K", "Instagram square").
       • details – Extra specific attributes (e.g., clothing, weather, materials, accessories).
       • effects – Special visual treatments (e.g., "bokeh blur", "lens flare", "film grain").
       • themes – 3–5 conceptual or emotional themes in the design.
       • usage_notes – Short guidelines on how to apply the style effectively.
    5. Optional – You can add additional keys and properties, only if relevant, to produce a comprehensive JSON Style Guide to be used for image generation by Stable Diffusion or Flux models.
    6. Merge everything into one single JSON file and output only the final JSON file, with proper indentation and formatting for readability and without any extra text before or after. Ensure that all properties in the JSON object reflect the input design prompt and that the text is in English.
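Since an agent following these instructions should return JSON only, its reply can be checked automatically before use. The sketch below is one possible check, not part of the original instructions: `review_agent_output` is a hypothetical helper that verifies the reply parses, contains the eighteen top-level keys listed above, and uses six-digit hex codes in `color_palette` (the exact hex format is an assumption).

```python
import json
import re

REQUIRED_KEYS = [
    "style_name", "inspiration", "scene", "subjects", "style", "color_palette",
    "lighting", "mood", "background", "composition", "camera", "medium",
    "textures", "resolution", "details", "effects", "themes", "usage_notes",
]
HEX_COLOR = re.compile(r"^#[0-9a-fA-F]{6}$")  # assumes the #RRGGBB form


def review_agent_output(raw: str) -> list:
    """Parse the agent's reply and list problems; an empty list means it passed."""
    try:
        guide = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(guide, dict):
        return ["top level is not a JSON object"]

    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in guide]

    palette = guide.get("color_palette")
    if isinstance(palette, dict):
        for name in ("primary", "secondary", "highlight", "shadow"):
            color = palette.get(name)
            if not (isinstance(color, str) and HEX_COLOR.match(color)):
                problems.append(f"color_palette.{name} is not a hex color")
    elif palette is not None:
        problems.append("color_palette is not an object")
    return problems


print(review_agent_output("Sure! Here is your JSON:"))
```

A reply that fails the check can be sent back to the agent together with the list of problems.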
  • raphiki, Aug 23, 2025

    Qwen Edit also understands JSON Prompt Guides
