Building an AI Shopify App: How We Built Krafted

Krafted is an embedded Shopify app that turns a product URL into a complete, live product page. Paste a link from AliExpress, Alibaba, Amazon, or your own store — Krafted pulls the product data, classifies it into a niche, generates copy and a colour-matched layout, and injects the finished page directly into your Shopify theme. One click, publishable result.

This case study covers how we built it, where we hit genuine technical walls, what we'd do differently, and where the product is going next.

The Problem

Shopify merchants who source products at volume — particularly from AliExpress and Alibaba — have a problem most e-commerce tooling ignores: product page quality doesn't scale with product quantity.

The import flow itself is fast. Pull the data, create the Shopify product, set a price. What's slow is everything after that — writing copy that doesn't read like a translated AliExpress listing, picking the right hero image out of 12 mediocre shots, choosing colours that match the store brand, building a layout that actually sells the product. For a merchant listing 20 products a week, that's 20+ hours of repetitive creative work that eats the margin that volume importing was supposed to create.

The brief: automate the entire post-import phase without trading quality for speed. A generated page should be publishable, not a first draft you have to fix.

How We Approached It

Before writing code, we mapped every component of the system independently and asked the same question about each: is this an AI problem, a deterministic logic problem, or a platform constraint problem? The answer shaped every architectural decision we made.

Most AI product builds treat AI as the solution to everything and bolt on corrections later. We didn't do that. Three principles ran through the entire build:

The right tool for each task, not the most popular one. In our benchmarks, GPT-4 was measurably better than the alternatives at categorical disambiguation. Gemini is better, and significantly cheaper, at evaluating product images. node-vibrant is more reliable than any language model at extracting dominant colours from an existing image. We built task-specific routing from the start, not as an optimisation after the fact.

Accept platform constraints early. AliExpress and Alibaba have OAuth implementations that don't support server-to-server token refresh. That's not an engineering problem with a clever solution — it's a fixed constraint. We spent time early trying to work around it. That was a mistake. Once we designed the UX around it instead of against it, the authentication architecture became simple and maintainable.

Validate what AI can't. Colour contrast ratios, Shopify theme write success, theme asset availability after installation — none of these can be trusted to model output alone. Every AI-generated value that has an objective correctness criterion gets programmatic validation before we mark the step complete.

The Generation Pipeline

Every step runs through a runStep() wrapper that handles database idempotency, status tracking, and error alerting. The pipeline is fully resumable — an interrupted generation picks up from the last completed step, not from scratch. That resilience wasn't retrofitted. It was the starting architecture.
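A minimal sketch of what a wrapper like this could look like. Only the name runStep() comes from our codebase as described here; the signature, the StepStore interface, the in-memory store standing in for the database, and the onError alerting hook are all assumptions made for illustration.

```typescript
// Hypothetical shapes: the article names runStep() but not its signature.
type StepStatus = "running" | "completed" | "failed";

interface StepRecord {
  status: StepStatus;
  output?: unknown;
  error?: string;
}

// Stand-in for the database table that tracks step state per generation run.
interface StepStore {
  get(key: string): StepRecord | undefined;
  set(key: string, record: StepRecord): void;
}

class InMemoryStepStore implements StepStore {
  private records = new Map<string, StepRecord>();
  get(key: string) {
    return this.records.get(key);
  }
  set(key: string, record: StepRecord) {
    this.records.set(key, record);
  }
}

async function runStep<T>(
  store: StepStore,
  runId: string,
  stepName: string,
  work: () => Promise<T>,
  onError: (stepName: string, error: Error) => void = () => {},
): Promise<T> {
  const key = `${runId}:${stepName}`;
  const existing = store.get(key);
  // Idempotency: a completed step returns its stored output instead of re-running.
  if (existing?.status === "completed") {
    return existing.output as T;
  }
  store.set(key, { status: "running" });
  try {
    const output = await work();
    store.set(key, { status: "completed", output });
    return output;
  } catch (err) {
    const error = err instanceof Error ? err : new Error(String(err));
    store.set(key, { status: "failed", error: error.message });
    onError(stepName, error); // alerting hook (assumed)
    throw error;
  }
}
```

Resuming an interrupted run then amounts to calling the same sequence of steps again with the same run ID: completed steps return their stored output immediately, and the first incomplete step does real work.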

Step 0 — Title refinement (marketplace imports only)

AliExpress titles are keyword-stuffed for search, not for human reading. "2024 New Hot Selling Girl Toy Pretend Play Cosmetic Set Birthday Gift 32pcs Makeup Kit Children Educational Toy" is five potential niches compressed into one string. Before any AI classification runs, we convert the raw title into a clean DTC-brand product name and generate a URL-safe Shopify handle. Skipped entirely for native Shopify products.
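The handle half of this step is deterministic and can be sketched without any AI involved. The function below is an illustration, not our production code: toShopifyHandle and its maxLength default are hypothetical, and it only approximates the handle format (lowercase letters, digits, and hyphens).

```typescript
// Hypothetical helper: turns a cleaned product name into a URL-safe handle.
function toShopifyHandle(title: string, maxLength = 80): string {
  const slug = title
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // strip combining accents left by NFKD
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse every other run of characters to one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
  // Truncate, then trim again in case the cut landed on a hyphen.
  return slug.slice(0, maxLength).replace(/-+$/g, "");
}
```

For example, toShopifyHandle("Kids' Pretend Makeup Kit (32 pcs)") returns "kids-pretend-makeup-kit-32-pcs".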

Step 1 — Niche detection

AI classifies the product across 20+ categories into one of seven niche themes: Baby, Cosmetics, Fitness, Home Decor, Kitchen, Pet, or Toys. Each theme has its own section layout, copy tone, colour families, and template structure. This is the highest-leverage decision in the pipeline — every downstream step inherits it.

Step 2 — AI product image generation (optional)

When the source product has no usable imagery, AI generates a hero product image. Output is Sharp-optimised before being prepended to the product's image array in Shopify.

→ Publish gate

A billing-aware checkpoint. If the merchant's subscription tier restricts further processing, the run pauses at publish_pending. The AI work in Steps 0–2 has already run — usage is recorded, the work isn't lost. The pipeline resumes automatically on plan upgrade.
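The gate can be pictured as a checkpoint between two groups of steps. In this sketch only the publish_pending status comes from the description above; the step names, the GenerationRun shape, and the advanceRun function are assumptions.

```typescript
type RunStatus = "in_progress" | "publish_pending" | "completed";

interface GenerationRun {
  status: RunStatus;
  completedSteps: string[];
}

// Hypothetical step names for Steps 0-2 and Steps 3-6.
const PRE_GATE_STEPS = ["title_refinement", "niche_detection", "image_generation"];
const POST_GATE_STEPS = ["product_creation", "theme_merge", "page_generation", "homepage_generation"];

function advanceRun(
  run: GenerationRun,
  planAllowsPublish: boolean,
  execute: (step: string) => void,
): GenerationRun {
  const completed = [...run.completedSteps];
  const runPending = (steps: string[]) => {
    for (const step of steps) {
      if (!completed.includes(step)) {
        execute(step);
        completed.push(step);
      }
    }
  };

  runPending(PRE_GATE_STEPS);
  if (!planAllowsPublish) {
    // Pause here: the pre-gate work is recorded and is not repeated on resume.
    return { status: "publish_pending", completedSteps: completed };
  }
  runPending(POST_GATE_STEPS);
  return { status: "completed", completedSteps: completed };
}
```

Calling advanceRun again on a paused run after a plan upgrade executes only the post-gate steps, which is the behaviour the gate is meant to guarantee.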

Step 3 — Product creation (marketplace imports only)

Creates the Shopify product, migrates all images from platform CDNs to Shopify-hosted URLs, and publishes to all sales channels.

Step 4 — Theme merge

Overlays niche-specific sections and templates onto the merchant's active theme. Produces a mergeSummary with the baseProductTemplate that page generation builds against.

Step 5 / 6 — Page and homepage generation

AI generates full product page content against the niche template. In homepage-plus-product-page mode, a second pass generates a matching homepage with the product's data embedded.

The Hard Problems

Niche detection: benchmarking for your actual task

The first version classified products on title alone. It worked on clean inputs and broke on anything ambiguous. A girls' toy makeup set came back as cosmetics. Technically defensible. Wrong — the product belonged in the Toys theme, with completely different layout and copy tone.

The fix was two things: Step 0 to clean the title before classification runs, and a prompt change that forces the model to synthesise both title and description rather than pattern-matching on a single field.
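To make the prompt change concrete, here is a rough sketch of a prompt builder and a reply parser. The seven niche names are the real ones; the prompt wording, buildNichePrompt, and parseNiche are illustrative assumptions, not the prompt we ship.

```typescript
const NICHES = ["Baby", "Cosmetics", "Fitness", "Home Decor", "Kitchen", "Pet", "Toys"] as const;
type Niche = (typeof NICHES)[number];

// Hypothetical prompt: asks the model to weigh both fields, not one keyword.
function buildNichePrompt(title: string, description: string): string {
  return [
    `Classify the product into exactly one of: ${NICHES.join(", ")}.`,
    "Weigh the title and the description together. Do not decide from a single keyword.",
    "If they point to different niches, choose the one that matches the intended buyer and use.",
    "Reply with the niche name only.",
    "",
    `Title: ${title}`,
    `Description: ${description}`,
  ].join("\n");
}

// Accept the reply only if it matches a known niche; anything else is rejected.
function parseNiche(modelReply: string): Niche | null {
  const cleaned = modelReply.trim().toLowerCase().replace(/[.\s]+$/, "");
  return NICHES.find((niche) => niche.toLowerCase() === cleaned) ?? null;
}
```

Returning null for an unrecognised reply lets the calling step retry or fail loudly instead of passing an invalid niche to every downstream step.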

We benchmarked GPT-4, Claude, and several open-source alternatives on a set of deliberately ambiguous products. GPT-4 was meaningfully better at following the disambiguation instruction. Not because it's generally smarter (other models weren't far behind on standard tasks) but because it handled the "synthesise both signals, don't pattern-match" constraint more consistently. Other models reverted to matching the dominant keyword in the title, particularly when title and description pointed to different niches.

The lesson: general benchmark scores don't predict performance on your specific prompt structure. Test on your actual task.

Colour generation: three problems, three solutions

Colour generation was the hardest single problem in the build. The task sounds straightforward — generate a palette for each page section that's visually coherent and accessible. It breaks into three independent sub-problems that each required a different approach.

Coherence with the merchant's existing store. A generated palette that clashes with the merchant's current theme is worse than no palette at all. We use node-vibrant to extract dominant colours from the merchant's active theme assets and pass those as seed inputs to colour generation. The model gets real context for what "on-brand" means for that specific store, not a generic niche palette.

Cross-section consistency. Language models generate colour values in isolation. They can reason that "coral and navy pair well" in the abstract, but they can't verify how a background chosen for Section A will look adjacent to an accent chosen for Section C. We constrain generation: the model selects from pre-validated colour families per niche rather than generating hex values freely. This bounds the combinatorial problem without requiring the model to do spatial reasoning it can't do reliably.
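One way to picture the constraint: the model returns a family ID, and code resolves it to colours. Everything in this sketch is a placeholder, including the family IDs, the hex values, and the fallback to the first family when the reply is not a known ID.

```typescript
type Niche = "Baby" | "Cosmetics" | "Fitness" | "Home Decor" | "Kitchen" | "Pet" | "Toys";

interface ColourFamily {
  id: string;
  background: string;
  text: string;
  accent: string;
}

// Placeholder palettes for two niches only; real families would be curated per niche.
const FAMILIES: Partial<Record<Niche, ColourFamily[]>> = {
  Toys: [
    { id: "toys-sunny", background: "#FFF8E7", text: "#1F2933", accent: "#D9480F" },
    { id: "toys-sky", background: "#EAF4FF", text: "#102A43", accent: "#0B69A3" },
  ],
  Cosmetics: [
    { id: "cosmetics-blush", background: "#FFF5F7", text: "#2D1B24", accent: "#B83280" },
  ],
};

// The model is only ever offered IDs, never asked for hex values.
function familyPrompt(niche: Niche): string {
  const ids = (FAMILIES[niche] ?? []).map((family) => family.id);
  return `Pick one colour family for this page. Reply with exactly one of: ${ids.join(", ")}.`;
}

function resolveFamily(niche: Niche, modelReply: string): ColourFamily {
  const families = FAMILIES[niche] ?? [];
  if (families.length === 0) {
    throw new Error(`No colour families defined for ${niche}`);
  }
  const chosen = families.find((family) => family.id === modelReply.trim());
  // Assumed fallback: an unrecognised reply (for example a raw hex value) gets the first family.
  return chosen ?? families[0];
}
```

Because every section draws from the same resolved family, the question of whether Section A's background works next to Section C's accent is answered when the family is curated, not at generation time.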

Accessibility compliance. WCAG AA requires a contrast ratio of at least 4.5:1 for normal-size text. Every model we tested, including GPT-4, produced outputs below this threshold without explicit enforcement; ratios of 2.8:1 were common. Prompting did not fix this reliably. Language models don't compute a contrast ratio when they emit a hex value; they approximate. We built programmatic contrast validation as a mandatory post-generation layer, with rule-based correction for failures. This is non-negotiable for any AI tool that outputs content a human will read.
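The check itself is small. The sketch below uses the relative luminance and contrast ratio formulas from the WCAG 2.x definitions. The correction rule is an assumption chosen for illustration (fall back to black or white text, whichever contrasts more with the background), and the function names are hypothetical.

```typescript
function hexToRgb(hex: string): [number, number, number] {
  const match = /^#?([0-9a-f]{6})$/i.exec(hex.trim());
  if (!match) throw new Error(`Expected a 6-digit hex colour, got "${hex}"`);
  const value = parseInt(match[1], 16);
  return [(value >> 16) & 255, (value >> 8) & 255, value & 255];
}

// WCAG 2.x relative luminance of an sRGB colour.
function relativeLuminance(hex: string): number {
  const [r, g, b] = hexToRgb(hex).map((channel) => {
    const c = channel / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// WCAG 2.x contrast ratio, from 1 (identical) to 21 (black on white).
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

const WCAG_AA_NORMAL_TEXT = 4.5;

// Assumed correction rule: keep the generated colour if it passes,
// otherwise use black or white, whichever contrasts more.
function ensureReadableText(text: string, background: string): string {
  if (contrastRatio(text, background) >= WCAG_AA_NORMAL_TEXT) return text;
  const black = contrastRatio("#000000", background);
  const white = contrastRatio("#ffffff", background);
  return black >= white ? "#000000" : "#ffffff";
}
```

The black-or-white fallback always clears 4.5:1: in the worst case, where both contrast equally with the background, the ratio is about 4.58:1. A production rule set would likely try to stay closer to the generated colour, for example by darkening or lightening it in steps.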

Platform OAuth: knowing when to stop fighting a constraint

AliExpress and Alibaba's OAuth 2.0 implementations don't support machine-to-machine token refresh. Getting API access at all required weeks of company documentation review and approval. Documentation had gaps that only surfaced during implementation. For Amazon, we skip a first-party API entirely and use ScrapingBee and Puppeteer instead.

The mandatory manual step in AliExpress auth — a human retrieving an authorisation code for the initial token — is a hard platform constraint. We tried for a while to design around it. That was time wasted. Once we accepted it as immovable and designed the UX to surface it clearly rather than hiding it, the architecture became clean and we stopped chasing a problem that had no solution.

Shopify reliability: trust nothing, verify everything

Two Shopify-specific failure modes shaped how we approach every theme operation in the app.

The first: silent write failures. Shopify's theme write API can return an HTTP 200 response with no GraphQL errors while silently dropping an app block. This happens when an extension UID is not properly registered or released. There is no error signal; the block simply doesn't persist. Every theme write in Krafted is followed by a read-back that confirms the expected content is in place before we mark the operation complete. This isn't something we added after an incident. It's the baseline.
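The write-then-read-back pattern, sketched against a hypothetical ThemeClient interface. The interface and the writeAndVerify name are assumptions that stand in for whatever wraps the real Shopify calls; they are not Shopify's API.

```typescript
// Hypothetical abstraction over the theme read and write calls.
interface ThemeClient {
  writeAsset(key: string, content: string): Promise<void>;
  readAsset(key: string): Promise<string | null>;
}

async function writeAndVerify(
  client: ThemeClient,
  key: string,
  content: string,
  mustContain: string,
): Promise<void> {
  // A resolved write only means the API accepted the request.
  await client.writeAsset(key, content);
  // Read back and confirm the part that matters actually persisted.
  const stored = await client.readAsset(key);
  if (stored === null || !stored.includes(mustContain)) {
    throw new Error(`Write to ${key} reported success but "${mustContain}" did not persist`);
  }
}
```

Checking for a specific marker such as the app block identifier, not for byte-for-byte equality, tolerates any normalisation of the stored file while still catching a dropped block.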

The second: theme installation as a distributed process. Installing a pre-built theme involves seven steps: download assets, archive, upload to S3, generate a presigned URL, trigger installation via GraphQL, wait for Shopify's async processing, verify the result. Each step can fail independently. The async processing step has a race condition — theme assets aren't always committed by the time Shopify fires a webhook. We solved this by reading base templates from our own stored copies rather than making live reads against the newly installed theme. The flow is deterministic regardless of Shopify's processing time.

The underlying principle: Shopify's APIs are eventually consistent in ways the documentation doesn't always acknowledge. Build as if every write needs verification and every async operation might not be ready when you expect it.

Bundle logic: writing code for themes you've never seen

The bundle extension — cross-sell bundle offers with automatic discounts on any Shopify storefront — required a Liquid block that works correctly inside any merchant theme, written by any theme developer, running any JavaScript. You can't test against every theme. The implementation has to be correct in environments it will never encounter before production.

The challenge: intercept the cart add event, silently add bundle items via a secondary request, preserve the primary cart response that the merchant's theme uses to update its UI, and suppress conflicting change listeners that would otherwise cause variant selection to revert or display incorrectly. Cart JavaScript varies substantially across themes — different fetch implementations, different event dispatch patterns, different state management approaches. The implementation has to handle all of it without assuming anything about the host environment.
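A heavily simplified sketch of the interception idea, assuming the theme adds to cart with a fetch-style POST to a URL containing /cart/add. The FetchLike type, withBundleItems, and the bundleVariantIds callback are hypothetical. The sketch does not cover themes that use XMLHttpRequest, pass Request or FormData objects, or the change-listener suppression described above.

```typescript
interface CartRequestInit {
  method?: string;
  headers?: Record<string, string>;
  body?: string;
}

// Minimal fetch-shaped function type so the sketch does not depend on browser typings.
type FetchLike<R> = (url: string, init?: CartRequestInit) => Promise<R>;

function withBundleItems<R>(
  originalFetch: FetchLike<R>,
  bundleVariantIds: () => number[], // hypothetical: which variants the active bundle adds
): FetchLike<R> {
  return async (url, init) => {
    const response = await originalFetch(url, init);
    const isCartAdd =
      url.includes("/cart/add") && (init?.method ?? "GET").toUpperCase() === "POST";
    if (!isCartAdd) return response;

    const ids = bundleVariantIds();
    if (ids.length > 0) {
      try {
        // The secondary request uses the original fetch so it is not intercepted again.
        await originalFetch("/cart/add.js", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ items: ids.map((id) => ({ id, quantity: 1 })) }),
        });
      } catch {
        // A failed bundle add must not break the theme's own add-to-cart.
      }
    }
    // The theme receives the primary response untouched.
    return response;
  };
}
```

Awaiting the bundle add before returning means that by the time the theme refreshes its cart UI, the bundle items are already in the cart. The cost is a slightly slower add-to-cart.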

For discount logic, we used Shopify Functions to apply bundle discounts server-side as automatic discounts rather than through storefront JavaScript. This removes timing and race condition problems from the client-side layer entirely. The trade-off is Shopify Functions' data access constraints, which required careful scoping of the discount logic — but the reliability gain was worth it.
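The appeal of the server-side approach is that the discount decision becomes a pure function of the cart contents. The sketch below shows only that idea. Its types are simplified stand-ins, not the Shopify Functions input or output schema, and the all-items-present rule is an assumption.

```typescript
// Simplified, hypothetical shapes; the real Functions schema differs.
interface CartLine {
  variantId: string;
  quantity: number;
}

interface Bundle {
  variantIds: string[];
  percentOff: number;
}

interface BundleDiscount {
  variantIds: string[];
  percentOff: number;
}

// Assumed rule: a bundle's discount applies only when every one of its variants is in the cart.
function bundleDiscounts(lines: CartLine[], bundles: Bundle[]): BundleDiscount[] {
  const inCart = new Set(
    lines.filter((line) => line.quantity > 0).map((line) => line.variantId),
  );
  return bundles
    .filter((bundle) => bundle.variantIds.every((id) => inCart.has(id)))
    .map((bundle) => ({ variantIds: bundle.variantIds, percentOff: bundle.percentOff }));
}
```

Because the result depends only on the cart it is given, it does not matter which storefront script added the items or in what order.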

What We Shipped

AI product page generation — full Shopify page from any product URL across 7 niche themes, with niche-specific layouts, copy, and colour schemes
AI homepage generation — full homepage with product data embedded, in homepage-plus-product-page mode
AI logo generation — brand name + niche → logo across five format types (Icon, Wordmark, Emblem, Abstract, Monogram), style presets, batch output
AI product image generation — hero image generation with style presets, batch variations, Sharp-optimised output
7 niche theme system — Baby, Cosmetics, Fitness, Home Decor, Kitchen, Pet, Toys — install from ZIP, niche asset overlay, store logo integration
Bundle offers extension — Shopify app block with Shopify Functions-powered automatic discounts, auto-synced on product update and delete webhooks
Billing and usage infrastructure — HeyMantle subscription enforcement, per-generation usage tracking, billing-aware publish gate with resumable runs
GDPR compliance — full customer data request, redaction, and shop data handlers
Admin tooling — example output management, AI image style configuration, email alerting on consecutive generation errors

What's Next

More niche themes. Electronics, Clothing/Apparel, and Health/Supplements are the highest-volume Shopify categories not yet covered. Each requires its own layout logic, copy patterns, and colour constraint system — it's not a configuration add, it's a full theme build.

TikTok Shop and eBay import support. The scraping and normalisation architecture is already abstracted per platform. The primary work is schema normalisation for TikTok Shop's product data structure, which differs enough from AliExpress to require new niche detection benchmarking.

Post-generation CRO analytics. We know what we generate but not what converts. A feedback layer that connects generated page elements to real conversion outcomes would close the loop — and eventually let us use that data to improve generation decisions. This turns Krafted from a generation tool into a continuous improvement system.

Streaming generation UI. The current pipeline uses client polling. A section-by-section preview as each step completes would dramatically improve perceived performance and let merchants catch a wrong niche classification before the full pipeline runs.

Colour generation evolution. The current three-layer approach (node-vibrant seed → constrained generation → programmatic validation) is robust but manually maintained. As multimodal models improve at spatial reasoning, there's a path toward models that understand colour relationships well enough to reduce the constraint system. We're watching how GPT-5 and Gemini 2.0's vision capabilities develop specifically for this.

If you're building something in a similar space — an embedded Shopify app, a multi-step AI generation pipeline, or anything that needs to write reliably into third-party systems you don't control — get in touch. We're happy to talk through the architecture before you start.

For more on our approach to AI integration and automation, and to generative AI development, see our services pages.
