Flux Kontext Outfit Swap: Preserve Face, Change Clothes
The exact phrasing that swaps outfits in Flux Kontext while keeping the face and background locked. The one phrase that always breaks it, and the rescue.
I burned about three days on Flux Kontext outfit swaps before I figured out that the prompt grammar matters more than any setting in the model. Same input image, same workflow, same denoise. One sentence structure swaps the clothes and keeps the face. A different sentence structure that means almost the same thing breaks identity, melts the face, and gives me someone who is technically wearing the outfit I asked for but who is no longer my character.
This was infuriating because nobody is documenting the difference clearly. The ComfyUI tutorials show you the workflow graph. The YouTube videos show you the result. Nobody pauses on the exact phrase that determines whether you get a clean outfit swap or a stranger in your character's pose.
Here is the working construction, the failure construction, and the catalog-scale workflow I now run for outfit packs and persona-locked content.
Quick Answer: In Flux Kontext, the working prompt structure is "Change the X while preserving facial features, hair, and background." The failure structure is anything that uses "transform," "generate a new," or "make this person look like." The first edits one region. The second tells the model to redraw the whole image, and identity goes with it.
- Outfit swap is an edit, not a transform. Phrase it as a localized change with explicit preserve clauses.
- "Change the X while preserving Y" is the construction that holds. Skip "transform" entirely.
- For full catalog production, stack outfit swap with Apatero AI's persona lock so the source face stays locked across every variant.
- When Kontext drifts on a complex outfit, mask first with Segment Anything, then run the edit on the masked region only.
- For clothing that has fabric-physics ambiguity like draped silk or layered tactical gear, ACE Plus often beats Kontext for first-pass quality.
The Editing Class That Outfit Swap Belongs To
Here is the thing nobody explains up front. Flux Kontext supports two completely different prompt classes, and outfit swap belongs to one of them. If you treat outfit swap like the other class, the model behaves like a different model entirely.
The two classes are localized edits and global transformations. Localized edits change one region of an image and leave the rest alone. Global transformations redraw the entire image to a new specification. The vocabulary that triggers each class is different, and the model picks which class to run from your verb choice.
Outfit swap is a localized edit. You are not asking the model to imagine a new person. You are asking it to repaint the clothing region while leaving the face, the hair, the lighting, the pose, the background, and everything else exactly the same. The model can do this. It does it well. But only if your prompt commits to the localized-edit class with the right verbs.
A practical mental model that helped me. Imagine you are giving instructions to a retoucher. You would not say "transform this woman into someone wearing a leather jacket." You would say "change the gray hoodie to a black leather jacket, keep everything else the same." That second sentence is the exact pattern Kontext wants.
Why "Transform This Person" Always Breaks
I tested this maybe sixty times before I trusted the conclusion. Any prompt that uses "transform," "make this person look like," "generate a new version where," or "create an image of this character but" gets routed to the global transformation class. The model then redraws the entire image, identity included.
The failure mode is subtle. You will get an image that looks like your character at first glance. The hair is roughly right. The face shape is roughly right. The skin tone matches. But the eye spacing is off. The lip shape is wrong. The cheekbone height shifted. You can stack ten of these side by side and they look like ten different people in similar wigs.
Here is what I think is happening. "Transform" is a global instruction. The model interprets it as a license to redraw freely while staying near the source distribution. Near the source is not the same as preserving the source. Near is enough to fool you for one image and ruin a thirty-image catalog.
Real talk. I wasted an entire weekend assuming the issue was the denoise schedule. I tried denoise from 0.3 to 0.8 in twenty-step increments. Nothing fixed it because the denoise was not the problem. The verb was. The moment I switched from "transform" to "change," the identity locked at 0.5 denoise without any other adjustment.
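Since the verb is the failure point, it is cheap to check for it before a prompt ever reaches the model. Here is a minimal sketch of a prompt linter. The function name and the phrase list are my own illustration built from the phrases named above; this is not part of Kontext or ComfyUI.

```python
import re

# Phrases this post reports as triggering a global redraw in Kontext.
GLOBAL_REDRAW_PHRASES = (
    "transform",
    "make this person look like",
    "generate a new",
    "create an image of this character but",
)


def find_global_redraw_phrases(prompt: str) -> list[str]:
    """Return every risky phrase found in the prompt (case-insensitive)."""
    lowered = prompt.lower()
    return [
        phrase
        for phrase in GLOBAL_REDRAW_PHRASES
        # \b anchors the start of the phrase, so "transforming" also matches.
        if re.search(r"\b" + re.escape(phrase), lowered)
    ]
```

An empty list means the prompt avoided the phrases I know about. It does not prove the prompt is a good localized edit, only that it dodged the known bad verbs.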
The Working Construction: Change X While Preserving Y
Here is the exact pattern that works across every outfit swap I have run since I figured this out:
Change the [current garment description] to [target garment description],
while preserving facial features, hair, skin tone, pose, lighting,
and background. The character's identity must remain identical.
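If you are filling this template in for more than a handful of images, it helps to build it from code so the lock clause never gets dropped by accident. A minimal sketch, assuming a hypothetical helper named build_outfit_swap_prompt; the wording is the template above, verbatim.

```python
def build_outfit_swap_prompt(current_garment: str, target_garment: str) -> str:
    """Fill the 'Change X while preserving Y' template with two garment descriptions."""
    current = current_garment.strip()
    target = target_garment.strip()
    if not current or not target:
        raise ValueError("both the current and the target garment must be described")
    return (
        f"Change the {current} to {target}, "
        "while preserving facial features, hair, skin tone, pose, lighting, "
        "and background. The character's identity must remain identical."
    )
```

Calling it with "gray crewneck sweater" and "a black leather moto jacket with silver hardware" returns the full prompt as one line. Note that the target description carries its own article.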
Let me break the template down, because every clause is doing work.
"Change the [current garment description]" tells Kontext which region of the image is in scope for the edit. The more specific you are about what is being replaced, the cleaner the mask the model implicitly builds. "Change the gray crewneck sweater" beats "change the top." "Change the brown leather ankle boots" beats "change the shoes."
"To [target garment description]" gives the model the target. Again, specific beats vague. "To a black leather moto jacket with silver hardware" beats "to a leather jacket."
"While preserving facial features, hair, skin tone, pose, lighting, and background" is the lock clause. This is the part most tutorials skip and it is the single most important sentence in the entire prompt. Kontext is excellent at honoring preserve clauses when they exist. It is reckless when they do not.
"The character's identity must remain identical" is the redundancy that I added after testing. It improves identity hold by maybe another five percent on average. Some people will tell you redundancy in prompts is sloppy. In my testing it is genuinely useful for Kontext outfit work.
Anchor Words That Keep Identity Locked
Beyond the basic construction, there are specific anchor words that improve identity preservation in Kontext outfit swaps. I collected these by running the same outfit swap with and without each word and measuring the identity score across a small in-house grid.
Words that anchor identity reliably include "facial features," "face shape," "eye spacing," "nose bridge," "lip shape," "cheekbones," "jawline," and "ear shape." Adding two or three of these to your preserve clause tightens the lock. My read, which I cannot verify from the outside, is that naming a feature in a preserve clause marks it as out of scope for the edit, so the model leaves it alone.
The over-anchor risk is real but minor. If you stack twelve facial-feature anchors, Kontext sometimes leaves a faint ghost of the original neckline because it has interpreted "preserve everything about the head and face" so literally that it refuses to repaint where the collar should sit. Three to five anchors is the sweet spot.
For background preservation, the equivalent anchors are "background composition," "lighting direction," "ambient color," "depth of field," and "scene context." Most outfit swaps need only two of these. "Lighting direction" alone is the highest-value anchor in this group because clothing changes shift the local reflectance and the model otherwise wants to relight the whole scene to match the new garment.
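The anchor lists and the over-anchor limit are easy to encode so a prompt never ships with twelve facial anchors by mistake. This is a sketch under my own assumptions: the function name, the default of "lighting direction" as the only background anchor, and the hard cap of five face anchors are illustrative choices based on the numbers above.

```python
FACE_ANCHORS = (
    "facial features", "face shape", "eye spacing", "nose bridge",
    "lip shape", "cheekbones", "jawline", "ear shape",
)
BACKGROUND_ANCHORS = (
    "background composition", "lighting direction", "ambient color",
    "depth of field", "scene context",
)


def build_preserve_clause(face_anchors, background_anchors=("lighting direction",),
                          max_face_anchors=5):
    """Join the chosen anchors into a 'while preserving ...' clause."""
    items = [*face_anchors, *background_anchors]
    if not items:
        raise ValueError("at least one anchor is required")
    unknown = [a for a in items if a not in FACE_ANCHORS + BACKGROUND_ANCHORS]
    if unknown:
        raise ValueError(f"unknown anchors: {unknown}")
    if len(face_anchors) > max_face_anchors:
        # Over-anchoring can leave a ghost of the original neckline.
        raise ValueError(f"use at most {max_face_anchors} face anchors")
    if len(items) == 1:
        joined = items[0]
    elif len(items) == 2:
        joined = " and ".join(items)
    else:
        joined = ", ".join(items[:-1]) + ", and " + items[-1]
    return f"while preserving {joined}"
```

The cap only enforces the upper end of the sweet spot. Whether you want three, four, or five anchors for a given character is still a judgment call.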
Outfit Vocabulary That Lands With Kontext
Here is the part where my testing diverged from the public tutorials. Most of them tell you to be descriptive. They are right but they do not tell you which descriptors actually move the needle inside Flux's training distribution.
Kontext responds best to garment vocabulary that mirrors fashion-industry copy rather than casual description. "Crewneck pullover sweater in heather gray" beats "gray sweater" by a noticeable margin. "Slim-fit straight-leg dark indigo selvedge denim" beats "blue jeans." My guess is that the model saw a lot of e-commerce copy during training, so fashion-industry language activates more specific clothing representations.
Materials matter even more than colors. Specifying "matte leather," "brushed cotton," "tech-nylon," "merino wool," or "raw silk" produces different surface treatments. Color modifiers without material specifications often produce a generic plastic-looking garment.
Hardware details are the secret weapon for premium-looking output. "Black leather jacket" gets you the base. "Black leather jacket with antique brass YKK zippers, asymmetric chest zip, and snap-button lapels" gets you a jacket that reads as designed rather than rendered. This level of detail only matters for hero shots, but for hero shots it matters a lot.
Layering. Kontext handles layered outfits but you have to describe the layers in order from the body outward. "White ribbed cotton tank top under an open black denim shirt with rolled sleeves" works. "Open black denim shirt with rolled sleeves over a white ribbed cotton tank top" works almost as well but occasionally produces a tank-top-only output. Inside-out ordering is more reliable in my testing.
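To keep the inside-out ordering consistent across a batch, I find it easier to store layers as a list ordered from the body outward and join them at prompt time. A tiny sketch; describe_layers is a hypothetical helper name, and joining with "under" is simply the phrasing that worked in the example above.

```python
def describe_layers(layers_inner_to_outer: list[str]) -> str:
    """Describe a layered outfit from the body outward, e.g. 'tank top under shirt'."""
    layers = [layer.strip() for layer in layers_inner_to_outer if layer.strip()]
    if not layers:
        raise ValueError("at least one layer is required")
    # Innermost layer comes first, so each later item sits over the previous one.
    return " under ".join(layers)
```

Passing the tank top first and the open denim shirt second reproduces the more reliable of the two phrasings above.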
Lighting and Pose, What You Get for Free and What You Must Specify
If you do not mention lighting in your prompt at all, Kontext usually preserves it. The preserve clause for lighting is helpful but not strictly required for short outfit swaps. The risk is that some garments imply a lighting context. If you swap a workout outfit for an evening gown, the model sometimes interprets that as a scene change and starts adjusting the lighting toward gala-style key lighting. The "preserve lighting" anchor stops this.
Pose preservation is automatic for Kontext outfit swaps because the underlying anatomy is reused. The model is replacing surface texture on a body whose pose is already encoded in the image. This is one of the genuine wins of doing outfit swap as a Kontext edit rather than as a fresh generation. You do not need to ControlNet your way back to the original pose. It comes for free.
The exception is when the outfit forces a pose change. A heavy ball gown will make the model want to flare the skirt outward and away from the legs. A backpack will make the model want to shift the shoulders forward. These are rare cases and they show up as obvious anatomy weirdness rather than identity drift. When it happens, regenerate with a more conservative outfit choice or accept the minor pose drift.
Multi-Pass Outfit Swaps for Catalog Production
Here is where I want to share the actual production workflow because the textbook single-swap is fine for one image and a disaster for thirty.
For catalog-scale outfit production, I do not run Kontext as a one-shot. I run it as a multi-pass pipeline. The first pass establishes the silhouette and the overall material. The second pass refines the hardware and the surface details. The third pass, if needed, addresses any preservation slippage on the face or background.
First pass. Run the prompt as described above with conservative outfit vocabulary. Get the silhouette right. If the jacket is the wrong cut on the first try, fix that before adding hardware.
Second pass. Take the output from the first pass and run it back through Kontext with a refined prompt that focuses on details. "Refine the leather jacket from the previous edit. Add brass hardware, top stitching, and a slightly darker base tone. Preserve facial features and background." The model treats this as a smaller-amplitude edit and will not drift identity further.
Third pass. Only if needed. If after two passes you can see that the face has shifted slightly or the background has lost detail, run a targeted recovery pass. "Restore the original facial features and background detail while keeping the current outfit." This works surprisingly well as a corrective step.
I do this three-pass workflow for hero shots. For volume catalog production where I need thirty outfit variants in a session, I do not multi-pass at all. I run the first pass only and accept the slightly lower-detail output because the catalog viewer is going to see thumbnails anyway.
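The pass logic can be sketched as a small driver. Everything about the interface here is an assumption: edit stands in for whatever function runs one Kontext edit in your setup (image reference and prompt in, new image reference out), and needs_recovery stands in for whatever check you use to spot face or background slippage. Only the recovery prompt text comes from this post.

```python
from typing import Callable, Optional

RECOVERY_PROMPT = (
    "Restore the original facial features and background detail "
    "while keeping the current outfit."
)


def run_outfit_passes(
    edit: Callable[[str, str], str],          # hypothetical: (image, prompt) -> image
    source_image: str,
    first_pass_prompt: str,
    refine_prompt: Optional[str] = None,      # None = catalog mode, first pass only
    needs_recovery: Optional[Callable[[str], bool]] = None,
) -> str:
    """Run one, two, or three sequential edits and return the final image reference."""
    # Pass 1: silhouette and overall material.
    image = edit(source_image, first_pass_prompt)
    if refine_prompt is None:
        return image
    # Pass 2: hardware and surface detail on top of the first-pass output.
    image = edit(image, refine_prompt)
    # Pass 3: only when the caller's check reports slippage.
    if needs_recovery is not None and needs_recovery(image):
        image = edit(image, RECOVERY_PROMPT)
    return image
```

Leaving refine_prompt unset gives the single-pass catalog behavior. Supplying it gives the hero-shot path, with the third pass gated on your own check.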
Combining Kontext Outfit Swap With Background Replace
This is the combo I use most for AI influencer content. Outfit swap plus background replace, both in Kontext, sequentially. You can run them in either order and the results are similar, but I prefer outfit first then background.
Reason. Outfit swap is the more identity-sensitive edit. Doing it first means you spend the model's preservation budget on the face during the outfit pass. When you then run the background replace, the face is already locked-in pixels and the background pass treats it more conservatively. If you reverse the order, the background replace can sometimes loosen the face lock and the subsequent outfit swap drifts further than it would have.
The background replace prompt structure mirrors the outfit one. "Change the background from [current scene] to [target scene], while preserving the character's appearance, outfit, pose, and overall lighting direction. The character's identity must remain identical."
One nuance. If you are doing both edits and the lighting between the source background and the target background is dramatically different, you need an explicit "adjust lighting on the character to match the new scene" clause in the background prompt. Without it, you get a character lit for the original scene composited onto a target scene with different lighting, and the cutout look gives the whole thing away.
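Here is a sketch of the background prompt with the relight clause as a switch. The helper name is hypothetical. One assumption of mine to flag: when relighting is on, I drop "overall lighting direction" from the preserve list so the prompt does not ask for the lighting to be both kept and changed. The post's wording does not specify that detail.

```python
def build_background_prompt(current_scene: str, target_scene: str,
                            relight_character: bool = False) -> str:
    """Build a background-replace prompt, optionally relighting the character."""
    head = f"Change the background from {current_scene} to {target_scene}, "
    tail = " The character's identity must remain identical."
    if relight_character:
        # Assumption: omit the lighting anchor to avoid a contradictory prompt.
        return (
            head
            + "while preserving the character's appearance, outfit, and pose. "
            + "Adjust lighting on the character to match the new scene."
            + tail
        )
    return (
        head
        + "while preserving the character's appearance, outfit, pose, "
        + "and overall lighting direction."
        + tail
    )
```

Use the default when source and target scenes are lit similarly, and turn relight_character on when the lighting is dramatically different.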
When to Use Kontext vs ACE Plus vs CatVTON
Honest comparison from running all three at scale. Kontext is the right choice for outfit swaps where the garment is well-described in fashion-industry vocabulary and where the source pose is straightforward. ACE Plus pulls ahead when the garment has complex physics like draped silk, layered tactical gear, or anything with significant fabric volume. CatVTON remains the strongest option when you have a real-world product photo of the target garment and want it composited onto your character.
The ranking I land at after maybe fifty side-by-side tests. Kontext for prompted outfit swap. ACE Plus for outfits where Kontext drifts on the second or third pass. CatVTON for catalog work where the garment must match an existing physical SKU. There is no single winner. There are three tools with different strengths and you pick based on the input you have.
For solo creators doing AI influencer work where the outfit is described not photographed, Kontext wins on speed and consistency. For ecommerce work where you have flat-lay product photos and a model photo and you want them composited, CatVTON wins. ACE Plus is the rescue tool when the first two miss.
I built the persona-locked outfit pipeline at Apatero AI around exactly this hierarchy. The platform routes outfit swaps to Kontext by default, falls back to ACE Plus if the first pass scores low on a structural-similarity check, and exposes CatVTON as a separate "real garment" workflow tab. Full disclosure, I help build Apatero AI, so I am biased. But the multi-tool routing is genuinely the reason I do not have to remember which tool is best for which case. The platform picks.
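The routing rule is simple enough to write down. To be clear, this is an illustrative sketch of the hierarchy described above, not Apatero AI's actual code. The function name, the enum, and the 0.8 similarity threshold are all placeholders I picked for the example.

```python
from enum import Enum
from typing import Optional


class Tool(Enum):
    KONTEXT = "kontext"
    ACE_PLUS = "ace_plus"
    CATVTON = "catvton"


def pick_tool(has_garment_photo: bool,
              first_pass_similarity: Optional[float] = None,
              min_similarity: float = 0.8) -> Tool:
    """Choose a tool from the inputs available and the first-pass score, if any."""
    if has_garment_photo:
        # A real product photo goes to the virtual try-on path.
        return Tool.CATVTON
    if first_pass_similarity is not None and first_pass_similarity < min_similarity:
        # The Kontext first pass scored low on the structural check, so fall back.
        return Tool.ACE_PLUS
    return Tool.KONTEXT
```

With no photo and no score yet, the answer is always Kontext. The fallback only kicks in after a first pass has been scored.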
Encoding the Pattern in an Apatero AI Outfit-Swap Workflow
Here is what I built and run daily. One workflow tab inside Apatero AI labeled outfit swap. Inputs are a source image, a target outfit description, and an optional preserve-strength slider. The tab routes the request to the right tool based on whether the description is text-only or photo-paired and on whether the source has draped fabric in scope.
What I get on the output side is a consistent outfit swap with no tool-juggling overhead. I do not have to know which prompt structure works for which tool. I do not have to manage three separate ComfyUI workflows on disk. I do not have to remember which one needs Segment Anything pre-masking and which one does not.
For AI influencer work specifically, this matters because the wardrobe-lock-plus-scene-variation pattern requires running outfit swap dozens of times across a content batch. Pulling this off without a hosted workflow means doing the same three-tool dance every time. With the hosted tab it is one button.
I covered the broader wardrobe-strategy side of this in my Five Looks Method for AI influencer wardrobes and the persona-lock foundations in How to Lock a Character Across 50 Images With Apatero. If you are building catalog volume at thirty SKUs or more per day, the AI Catalog Photography workflow walks through the production schedule that builds on the outfit-swap fundamentals from this post.
FAQ
Can Flux Kontext do outfit swap without preserving the face?
Yes, but you would not want to. If you omit the preserve clause entirely, Kontext will swap the outfit and drift the face. The drift is sometimes subtle on a single image and obvious on a pack. Always include "preserving facial features" at minimum.
What denoise setting works best for Kontext outfit swaps?
The default 0.5 to 0.6 works for most outfit swaps with proper prompt construction. Drop to 0.4 if you are getting identity drift despite a clean prompt. Raise to 0.7 if the original outfit is bleeding through visibly and you need a stronger edit. I rarely go outside 0.4 to 0.7.
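That adjustment rule, written as a sketch. The function name and the two boolean inputs are hypothetical, and how you detect drift or bleed-through is up to you. The case where both show up at once is not covered above, so keeping the default there is my own assumption.

```python
def next_denoise(identity_drift: bool, outfit_bleed_through: bool,
                 default: float = 0.5) -> float:
    """Suggest the denoise value for the next attempt based on what went wrong."""
    if identity_drift and outfit_bleed_through:
        # Assumption: conflicting symptoms point at the prompt, not the setting.
        return default
    if identity_drift:
        return 0.4  # weaker edit to protect the face
    if outfit_bleed_through:
        return 0.7  # stronger edit to cover the original outfit
    return default
```

If both symptoms appear together, I would recheck the verb and the preserve clause before touching the setting again.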
Why does Kontext sometimes change the hair when I only asked for an outfit change?
Hair drift happens when the new outfit has a strong style implication that the model interprets as a full look change. A black-tie outfit on a casual portrait can drift the hair toward an updo. Add "preserve hair length, style, and color" explicitly to the preserve clause.
Can I do multiple outfit variations in one prompt?
No. Kontext is single-target. For multi-variant production, run separate edits and batch them in a workflow tool. Apatero AI handles this natively as a batch tab.
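If you are batching by hand, the pattern is one prompt per variant, each run as its own edit against the same source image. A minimal sketch with a hypothetical helper name; it only builds the prompts and does not call any model or the Apatero batch tab.

```python
def build_variant_prompts(current_garment: str, target_garments: list[str]) -> list[str]:
    """Return one single-target outfit-swap prompt per target garment."""
    lock_clause = (
        "while preserving facial features, hair, skin tone, pose, lighting, "
        "and background. The character's identity must remain identical."
    )
    # One prompt per variant: never merge several targets into one prompt.
    return [
        f"Change the {current_garment} to {target}, {lock_clause}"
        for target in target_garments
    ]
```

Each returned prompt should be paired with the original source image, not with the output of the previous variant, so drift does not accumulate across the pack.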
Does this approach work for accessories like glasses, hats, or bags?
Yes, with the same construction. "Add a brown leather messenger bag while preserving facial features, outfit, pose, and background." The model adds accessories cleanly when phrased as additions rather than replacements.
What happens if I use "transform" anyway?
You get a global redraw. Identity drifts, sometimes mildly, sometimes badly. Stick with "change" or "replace" or "swap" as the verb.
Is this prompt structure the same in Flux Kontext Dev and Pro?
Yes, same structure works in both. Pro tends to honor preserve clauses slightly more strictly. Dev is more sensitive to prompt phrasing.
How does this compare to LoRA-based outfit swap?
A trained outfit LoRA gets you a specific garment with very high fidelity but requires training overhead. Kontext outfit swap gets you any garment described in text with no training. For one-off swaps, Kontext wins. For brand-specific recurring outfits, a LoRA is worth the training time.
Can I outfit-swap a generated character that does not exist as a real photo?
Yes. Kontext does not care whether the source image is photographic or AI-generated. The preserve clauses work identically.
What is the realistic identity preservation rate?
In my testing across about two hundred outfit swaps with the construction described in this post, identity preservation lands at roughly 92 to 95 percent. The remaining failure rate is split between extreme outfit changes that imply scene context and prompts where I forgot to anchor a specific facial feature.
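To put that rate in concrete terms, here is the arithmetic as a small sketch. The function names are mine. The second function assumes each image in a pack succeeds or fails independently, which is a simplification and not something I measured.

```python
def expected_failures(total_swaps: int, preservation_pct: int) -> int:
    """Number of swaps expected to lose identity at a given preservation rate."""
    return total_swaps * (100 - preservation_pct) // 100


def chance_all_hold(images: int, preservation_pct: int) -> float:
    """Probability that every image in a pack holds identity, assuming independence."""
    return (preservation_pct / 100) ** images
```

Across two hundred swaps, 92 to 95 percent works out to between 10 and 16 failures. Under the independence assumption, a thirty-image pack comes through with zero drifted images only about 8 to 21 percent of the time, which is why a per-image rate that sounds high still leaves cleanup work on a full catalog.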
Wrapping Up
The grammar of the prompt determines whether Kontext does what you want. "Change X while preserving Y" is the construction. Skip "transform." Lock the face with explicit anchors. Run multi-pass for hero shots and single-pass for catalog volume. Combine with background replace in the right order. Route to ACE Plus when Kontext misses on fabric physics.
If all of this sounds like more pipeline overhead than you want to manage, the outfit-swap workflow inside Apatero AI handles the routing automatically. The relevant external references for deeper technical dives are the official Flux Kontext documentation in ComfyUI, the ACE Plus comparison from MyAIForce, and the RunComfy Kontext face swap workflow which uses the same preserve-clause logic for adjacent edit types.
The takeaway from three days of frustration. The model is not the bottleneck. The verb is.