Building ShopMule: AI Diagnostics for an Industry That Runs on Paper
The Short Version:
I built the operating system for truck repair shops — work orders, invoicing, payroll, and an AI assistant that helps technicians diagnose failures and find parts. The industry was still on paper.
The commercial vehicle repair industry is large, fragmented, and almost entirely offline. A heavy-duty truck repair shop — the kind that services 18-wheelers and construction equipment — might process 30-50 work orders per week. Most shops track this on paper tickets, whiteboards, or, at best, a spreadsheet someone set up in 2008.
The software tools that exist are expensive, complicated, and built for dealership-scale fleets, not the 3-bay independent shop that does $2M in revenue with 8 technicians. ShopMule was built for those shops.
This post is about the technical architecture — specifically the AI diagnostics layer, which turned out to be both the hardest and the most valuable part.
The Domain Problem
Before writing a line of code, I spent time in actual shops. The workflow is more complex than it looks from the outside:
A truck comes in. The service advisor creates a work order, writes down the complaint ("engine light, runs rough"), and assigns it to a technician. The technician diagnoses the issue — this can take anywhere from 20 minutes to two days for intermittent faults. They request parts from the parts manager. Parts get ordered, received, tracked. Labor hours are logged. The work order closes. An invoice is generated. Payment is collected.
Multiply this by 30-50 units per week, across multiple technicians with different specialties, and you have a coordination problem that paper simply cannot solve.
The software requirement was clear: work orders, time tracking, parts/inventory, invoicing, payroll, and multi-user access with role-based permissions. That's the baseline. The AI layer came after.
Architecture
ShopMule is a Next.js 14 monorepo with a PostgreSQL backend via Prisma. Multi-tenancy is handled at the database level — every record has a shopId foreign key, and all queries are scoped through a tenant middleware layer.
```typescript
// Middleware that scopes all DB queries to the current tenant
export async function withTenant<T>(
  shopId: string,
  query: (tx: Prisma.TransactionClient) => Promise<T>
): Promise<T> {
  // Row-level security: every RLS policy filters on app.current_shop_id,
  // so set it for the lifetime of this transaction
  return prisma.$transaction(async (tx) => {
    // SET doesn't accept bind parameters; set_config() does
    await tx.$executeRaw`SELECT set_config('app.current_shop_id', ${shopId}, true)`;
    return query(tx);
  });
}
```

The API has 100+ endpoints covering the full shop lifecycle. The ones that see the most traffic are work order creation/updates, parts lookups, and time clock entries (technicians clock in/out per job).
Role-based access is strict:
- Owner — full access including payroll and financials
- Service Advisor — create/manage work orders, customer communication
- Technician — view assigned jobs, log time, request parts
- Parts Manager — inventory management, purchase orders
- Accounting — invoices, payments, reports
Getting the permission matrix right took longer than expected. The edge cases multiply: a technician who is also the owner. A service advisor who should see some but not all financial data. I ended up with a capability-based system rather than pure role checks.
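A capability-based check like the one described can be sketched as follows. The role names mirror the list above; the capability strings and the `can` helper are illustrative, not the actual ShopMule implementation:

```typescript
// Capabilities are fine-grained permissions; roles are just named bundles.
type Capability =
  | "workorder:write"
  | "payroll:read"
  | "invoice:read"
  | "inventory:write";

const ROLE_CAPABILITIES: Record<string, Capability[]> = {
  owner: ["workorder:write", "payroll:read", "invoice:read", "inventory:write"],
  serviceAdvisor: ["workorder:write", "invoice:read"],
  technician: ["workorder:write"],
  partsManager: ["inventory:write"],
  accounting: ["invoice:read"],
};

// A user can hold multiple roles (the technician-owner edge case), so
// their capabilities are the union across all assigned roles.
function capabilitiesFor(roles: string[]): Set<Capability> {
  return new Set(roles.flatMap((r) => ROLE_CAPABILITIES[r] ?? []));
}

function can(roles: string[], capability: Capability): boolean {
  return capabilitiesFor(roles).has(capability);
}
```

Checks then ask about capabilities, never roles, so "owner who also turns wrenches" needs no special casing.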
The AI Diagnostic Assistant
This is what separates ShopMule from a glorified spreadsheet.
The diagnostic assistant is a chat interface embedded in the work order page. When a technician is working on a job, they can describe what they're seeing and ask the assistant for help. The assistant responds with diagnostic procedures, common failure causes, torque specs, and parts recommendations.
The model is Llama 3.3 70B served through Groq's inference API. Why Groq? Speed. A technician mid-job needs an answer in under two seconds, and Groq's LPU hardware sustains 800+ tokens/second on Llama; getting comparable latency from OpenAI or Anthropic would cost roughly 5x more per token.
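The request itself is a standard OpenAI-style chat completion. A sketch of the payload, assuming a `systemPrompt` already assembled from work-order context (the model ID matches Groq's Llama 3.3 70B endpoint at the time of writing; the parameter values are illustrative):

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Builds the chat-completion request body. With the groq-sdk client this
// would be passed to groq.chat.completions.create(...).
function buildDiagnosticRequest(systemPrompt: string, question: string) {
  return {
    model: "llama-3.3-70b-versatile", // Groq's Llama 3.3 70B endpoint
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: question },
    ] as ChatMessage[],
    temperature: 0.3, // diagnostics want consistency, not creativity
    max_tokens: 1024, // keep answers short enough to read mid-job
  };
}
```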
The system prompt is the key engineering challenge. A general-purpose LLM knows trucks in the same way it knows everything — superficially. The real value comes from injecting context:
```
You are a master diesel technician with 20+ years of experience
on heavy-duty commercial vehicles. You are helping a technician at [shop.name].

ACTIVE WORK ORDER:
- Unit: [year] [make] [model]
- Engine: [engineMake] [engineModel]
- Mileage: [mileage]
- Complaint: [complaint text]
- DTCs present: [comma-separated fault codes]

SHOP INVENTORY (parts currently in stock):
- [partNumber]: [description] ([qty] in stock)
...

Provide specific, actionable diagnostic steps. When recommending parts,
check the inventory above first. Cite TSBs or known issues when relevant.
Respond concisely — technicians are working, not reading essays.
```
The context window includes the specific vehicle (year/make/model/engine), the customer complaint, any diagnostic trouble codes already pulled, and a filtered subset of the shop's parts inventory relevant to the complaint category.
That last piece — inventory-aware recommendations — is what makes it genuinely useful. An assistant that says "you probably need a #3 injector" when the shop already has three of them in stock is more valuable than one that gives you a general answer and makes you go look it up.
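Filling the template above is plain string interpolation over the work order record. A sketch, with illustrative field names rather than the actual ShopMule schema:

```typescript
interface WorkOrderContext {
  shopName: string;
  unit: { year: number; make: string; model: string };
  engine: { make: string; model: string };
  mileage: number;
  complaint: string;
  dtcs: string[];
  // Already filtered to the complaint's category before prompting
  parts: { partNumber: string; description: string; qty: number }[];
}

// Assembles the system prompt from a work order plus in-stock parts.
function buildSystemPrompt(ctx: WorkOrderContext): string {
  const inventory = ctx.parts
    .map((p) => `- ${p.partNumber}: ${p.description} (${p.qty} in stock)`)
    .join("\n");
  return [
    `You are a master diesel technician with 20+ years of experience`,
    `on heavy-duty commercial vehicles. You are helping a technician at ${ctx.shopName}.`,
    ``,
    `ACTIVE WORK ORDER:`,
    `- Unit: ${ctx.unit.year} ${ctx.unit.make} ${ctx.unit.model}`,
    `- Engine: ${ctx.engine.make} ${ctx.engine.model}`,
    `- Mileage: ${ctx.mileage}`,
    `- Complaint: ${ctx.complaint}`,
    `- DTCs present: ${ctx.dtcs.join(", ")}`,
    ``,
    `SHOP INVENTORY (parts currently in stock):`,
    inventory,
    ``,
    `Provide specific, actionable diagnostic steps. When recommending parts,`,
    `check the inventory above first. Respond concisely.`,
  ].join("\n");
}
```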
Retrieval-Augmented Diagnostics
For common faults, I went further than pure LLM generation. I built a small knowledge base of:
- Known service bulletins for major engine families (Cummins ISX, Detroit DD15, PACCAR MX)
- Common fault code descriptions and procedures
- OEM torque specs for high-frequency repairs
These are chunked, embedded (using OpenAI's text-embedding-3-small), and stored in pgvector. Before every assistant response, I retrieve the top-3 most relevant chunks and inject them into the context:
```typescript
async function getRelevantContext(complaint: string, dtcs: string[]) {
  const query = [complaint, ...dtcs].join(' ');
  const embedding = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: query,
  });
  // pgvector accepts a '[0.1, 0.2, ...]' literal cast to ::vector
  const vector = JSON.stringify(embedding.data[0].embedding);
  // $queryRaw is a tagged template; a parameterized SQL string goes
  // through $queryRawUnsafe. <=> is pgvector's cosine-distance operator.
  const relevant = await db.$queryRawUnsafe(
    `SELECT content, source
     FROM knowledge_base
     ORDER BY embedding <=> $1::vector
     LIMIT 3`,
    vector
  );
  return relevant;
}
```

This grounds the LLM's responses in actual OEM documentation rather than letting it hallucinate procedures. Technicians noticed the difference immediately: responses started citing real TSB numbers and matching OEM procedures they already knew.
Stripe Integration and Invoicing
Shops need to get paid. The invoicing system generates PDF invoices from work orders, tracks payment status, and processes payments via Stripe. The payment flow handles both immediate card payments and net-30 terms for fleet accounts.
One thing I got wrong initially: I built the invoice as a separate entity from the work order. This created sync issues — technicians would add labor after the invoice was generated and the totals wouldn't match. The fix was making the invoice a computed view of the work order rather than a separate record. It's generated on-demand and always reflects current state.
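The "computed view" idea reduces to deriving totals from current line items on every read instead of storing them on an invoice row. A minimal sketch, with illustrative field names and a flat tax rate standing in for real tax handling:

```typescript
interface LaborEntry { hours: number; rate: number }
interface PartLine { qty: number; unitPrice: number }

// Totals are recomputed from the work order's current state every time
// the invoice is rendered, so late-added labor can never desync them.
function computeInvoiceTotal(
  labor: LaborEntry[],
  parts: PartLine[],
  taxRate: number
): { subtotal: number; tax: number; total: number } {
  const laborTotal = labor.reduce((sum, l) => sum + l.hours * l.rate, 0);
  const partsTotal = parts.reduce((sum, p) => sum + p.qty * p.unitPrice, 0);
  const subtotal = laborTotal + partsTotal;
  const tax = subtotal * taxRate;
  return { subtotal, tax, total: subtotal + tax };
}
```

The trade-off is recomputation cost on every render, which is negligible at dozens of line items per work order.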
Mobile App
Technicians aren't at desks. The React Native companion app covers the use cases that happen in the shop bay:
- Clock in/out on jobs
- View assigned work orders
- Scan parts barcodes to add to a work order
- Photo documentation of damage/repairs
The mobile app shares the same API layer as the web app. The only mobile-specific engineering was offline support — shop bays often have poor WiFi. I used a local SQLite cache with sync-on-reconnect for work order data, so technicians can continue logging time even when connectivity drops.
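The sync-on-reconnect pattern can be sketched with an in-memory queue standing in for the SQLite cache; the entry shape and `send` callback are illustrative:

```typescript
interface TimeEntry {
  workOrderId: string;
  action: "clockIn" | "clockOut";
  at: number; // client timestamp, since the server sees entries late
}

class OfflineQueue {
  private pending: TimeEntry[] = [];

  // Always succeeds locally, even with no connectivity.
  log(entry: TimeEntry) {
    this.pending.push(entry);
  }

  // Called when the network comes back; replays oldest-first so the
  // server sees clock-ins before their matching clock-outs.
  async flush(send: (e: TimeEntry) => Promise<void>): Promise<number> {
    let sent = 0;
    while (this.pending.length > 0) {
      await send(this.pending[0]);
      this.pending.shift(); // drop only after the server confirms
      sent++;
    }
    return sent;
  }
}
```

In the real app the queue persists to SQLite so entries survive an app restart, but the ordering and confirm-before-delete logic is the same.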
What I Learned
Domain expertise matters more than model capability. In my testing, a well-prompted Llama 3.3 with good context consistently beat GPT-4 with a generic prompt. Injecting the right information is the job.
Multi-tenancy is a first-class concern, not an afterthought. Retrofitting tenant isolation after building features is painful. I learned this from having to go back and add shopId scoping to 40+ queries after the fact.
Trust your users on UX. My initial work order form had 12 required fields. Shops refused to use it. The revised version has 3 required fields and 20 optional ones. Adoption went up immediately.
ShopMule is live and being used by real shops. The AI assistant has become the feature that gets mentioned in every demo — not because AI is a buzzword, but because it genuinely cuts diagnostic time for junior technicians who don't yet have the 15 years of mental models that make a master tech fast.