Growth

    Home Services - AI Lead Engagement System

    Case study on using an AI lead generation system for home improvement services in United Kingdom

    Outcome: 98% leads contacted within 5 minutes
    Home Services - AI Lead Engagement System

    Client: Confidential (Home Services — Roofing & Exteriors)

    Industry: Home Services (Roofing, Siding, Guttering)

    Location: Greater Manchester, UK

    Revenue: £3.2M/year (2025)

    Team Size: 18 employees (4 sales, 2 admin, 12 field crews)

    Engagement: 10-week pilot-to-production

    Department: Growth

    Executive Summary

    A family-run roofing company in Greater Manchester was haemorrhaging leads without knowing it. They did great work — their Google reviews were solid, their referral rate was the envy of their competitors — but they had a hidden problem that was quietly costing them close to a quarter of a million pounds every single year.

    The problem wasn't their prices, their workmanship, or their reputation. It was something far more mundane: they simply weren't getting back to people fast enough.

    When a homeowner's roof springs a leak at 4pm on a Friday, they don't wait until Monday morning for a callback. They pick up the phone and call the next roofer on the list. And if nobody picks up — which was happening 66% of the time — they leave a voicemail, wait a few hours, and then call the next company. By the time Monday rolled around, most of those leads had already booked someone else.

    Web form submissions were even worse. They landed in a shared inbox and sat there for 4 to 8 hours before anyone even opened them. Weekend leads? They'd be lucky to hear back by Tuesday.

    We built them an AI lead engagement system that changed everything. It brought response times down from hours to seconds — literally 8 seconds on average. It recovered about a third of the leads they'd been losing. And it freed up 12 hours of admin time every single week, which the team poured back into high-value work like following up on outstanding quotes and managing their online reputation.

    The total project cost about £12,500. They made that back in about six weeks.

    1. Audit Phase

    What We Audited

    Before we built anything, we needed to understand exactly what was happening on the ground. So we parked ourselves in their office for a week and shadowed everything.

    We tracked three specific workflows:

    The phones. Every single inbound call across five working days got logged. What time did it come in? Did somebody pick up? If not, did the caller leave a voicemail? And if they left a voicemail, did anyone call them back within a reasonable window — say, four hours?

    The web forms. This company was running a fairly standard WordPress site with WPForms, and every time somebody submitted a quote request or a service inquiry, it landed in the team's shared email inbox. We traced each one of those submissions from the moment it arrived to the moment a human first made contact. We wanted to know: how long did that take, and what happened next?

    The qualification process. When a sales rep finally did get on the phone with a lead, what did they actually ask? What information were they trying to gather? And — crucially — was any of that information being written down somewhere structured, or was it living in their heads and on sticky notes?

    We didn't go in with assumptions. We just watched, took notes, and counted.

    What We Found

    The numbers that came back were sobering — not because the team was bad, but because the system was fundamentally broken in ways they'd stopped noticing.

    Case study illustration

    Let those sink in for a second. Two-thirds of phone calls were going to voicemail. Nearly a quarter of those voicemails were never returned at all. Web leads took nearly six hours to get a response on average. Weekend leads — and in home services, weekends are when people are home and noticing problems — got zero response until the work week started.

    The thing is, this wasn't anybody's fault. The company had two admin staff trying to cover phones, emails, and scheduling across 18 people. During a typical day, the office was completely unattended for three or more hours — lunch breaks, materials runs, site visits. The admin team was doing their best, but they were outnumbered and overwhelmed.

    The research on this is well-established: if you contact a lead within 5 minutes, your chances of converting them go up by something like 9x. This company was averaging 5.7 hours.

    And even on the calls that did happen, we noticed something else. Sales reps were spending the first 5 to 10 minutes of every conversation asking the same basic questions — what type of roof, how old is it, what's the urgency, what's your budget — and then none of that information was being recorded anywhere structured. It lived in the rep's memory or on a scrap of paper. So if a different rep picked up the follow-up, they'd ask the same questions all over again.

    What the Client Took Away

    We sat down with the two brothers who own the business and walked them through the numbers. It was one of those conversations where you could see the penny dropping in real time.

    The first realisation: they weren't losing business because their work was subpar or their prices were too high. They were losing business because they were invisible after the first inquiry. When a homeowner needs their roof fixed, they don't wait around — they call the person who answers first.

    The second realisation: their admin team was drowning in work that a machine could handle. Call answering, basic data entry, repeating the same qualification questions over and over — these are exactly the kinds of tasks that AI is good at. And the time they were spending on those tasks was time they couldn't spend on things that actually move the needle, like scheduling surveys or following up on quotes that had gone cold.

    The third realisation — and this one came from the founders themselves, not from us — was that any solution had to feel personal. One of the brothers put it perfectly. He said: "Our customers choose us because we're the friendly local roofer. I don't want to lose that."

    What They Required

    Based on everything we'd learned during the audit, we sat down and wrote out a clear set of requirements before we wrote a single line of workflow logic.

    First, speed was non-negotiable. Every lead that came through a web form had to get a first response within 60 seconds, 24 hours a day, 7 days a week. Not "within business hours." Not "as soon as someone's free." Immediately.

    Second, the system had to qualify leads conversationally — asking the same questions the sales team asked (roof type, age, urgency, budget, location) and writing the answers down somewhere useful.

    Third, everything had to sync to their CRM (they were on HubSpot). Every interaction, every scrap of qualification data, every email sent and received — it all had to be recorded automatically.

    Fourth, when a lead looked promising — urgent, in their service area, with a decent budget — the sales team needed to know about it immediately, with full context, so they could jump on it.

    Fifth, and this was the one the founders cared about most, there had to be a human handoff that felt natural. The sales team needed to be able to step into any conversation at any point and take over seamlessly, with the full history right there in front of them. But if a lead wasn't ready to buy, the system needed to keep the conversation going on its own.

    Sixth, weekends and evenings couldn't be dead zones anymore. Out-of-hours leads needed the same instant-response experience as someone who submitted a form at 10am on a Tuesday.

    And seventh, and maybe most importantly, every single communication had to sound like them. Not like a chatbot. Not like a corporate FAQ machine. Like a friendly local roofer who genuinely wants to help. No robotic responses, no weirdly formal language, nothing that would make a customer feel like they were talking to a piece of software.

    2. Build Phase

    What We Built

    Over the course of four weeks, we designed and built a multi-component AI lead engagement system. It ran on a self-hosted n8n instance that tied together several different pieces: GPT-4o for understanding and generating language, HubSpot for CRM, Calendly for booking, Slack for alerts, and Twilio for SMS fallback.

    The architecture looked something like this:

    WordPress (WPForms)


    n8n Workflow Engine (self-hosted)

    ├──► GPT-4o (understanding & response generation)
    ├──► HubSpot CRM API (contact & deal management)
    ├──► Calendly API (survey booking)
    ├──► Slack API (hot lead alerts)
    └──► Twilio SMS (backup for urgent leads)

    The web form trigger. When somebody fills out a quote request on the website, a webhook fires immediately to our n8n endpoint. Within a second or two, the workflow has extracted all the form fields, checked the postcode against the company's service area (12 postcode districts around Manchester), and sent the whole package to GPT-4o with a prompt designed to generate a warm, personal first-response email.

    We wrote and rewrote that prompt three times before we got it right. The first version was too formal — it read like a hotel confirmation email. We stripped it back, made the sentences shorter, added local references ("Manchester weather has been brutal on roofs this year, hasn't it?"), and made the call to action crystal clear: book a free survey, here's a link, pick a time that works for you.

    The email goes out through HubSpot so open rates and click-throughs are tracked. And if the lead provided a phone number, a brief SMS follow-up goes out two minutes later as a backup — just in case the email lands in spam or goes unread.

    The qualification conversation. When a lead replies to that first email — and most of them did — the agent picks up the thread and starts a natural back-and-forth to gather the information the sales team needs. What kind of property is it? What type of roof? How old? What's the issue — a leak, storm damage, a full replacement? How urgent is it? What sort of budget are you working with? Are you the homeowner or is this a rental property?

    We designed the agent to ask these questions one at a time, in context, rather than firing off a long list. If a lead says "I've got a leak in my kitchen", the agent doesn't then ask "what's your budget?" — it follows up naturally: "That sounds urgent. Has the water damaged anything inside yet?" Each answer is extracted and stored as a custom field on the HubSpot contact record, so by the time a human gets involved, they have the full picture without having to ask a single question.

    The scoring engine. Once the agent has enough information, it runs the lead through a scoring model we built into the n8n workflow. Here's roughly how it worked:

    In-service area leads started at +30. Urgent jobs scored higher — +25 for "immediate", +15 for "within the month". Roofs older than 15 years were +20 (they're more likely to need significant work). Budgets over £5K were +20. Homeowners scored higher than renters. Leaks and storm damage scored higher than general inquiries.

    A lead scoring 70 or above was classified as hot. When that happened, a notification went to the #hot-leads Slack channel with the lead's name, their full qualification summary, a breakdown of how they scored, and a direct link to the conversation. The assigned sales rep could click a "Take Over" button that instantly switched the conversation to manual mode — from that point on, all emails came from the rep's inbox, with the full history pre-loaded.

    Warm leads (scores between 40 and 69) continued in automated conversation. The agent offered them a Calendly link to book a survey and sent a daily summary to the sales team every morning at 9am.

    Cold leads got a polite, genuine response — "Thanks for your interest, no pressure at all. If your situation changes, we're here." — and were flagged for a gentle 30-day nurture sequence. A monthly check-in email, nothing pushy.

    The temperature dial. This was the feature the company owners cared about most, and honestly, it turned out to be the key to the whole project's success.

    We built a sliding scale of autonomy into the HubSpot deal record. Every lead could be assigned one of four levels:

    • Level 1 — Auto. The agent handles everything. The sales team only sees alerts for hot leads. They don't see the routine conversations at all.
    • Level 2 — Notify. The agent handles the conversation but CCs the assigned rep on every email. The rep can read along if they want to, but they don't need to do anything.
    • Level 3 — Co-pilot. The agent drafts every response and queues it in Slack as an approval button. The rep reviews it, clicks approve (or makes a quick edit), and the email goes out.
    • Level 4 — Manual. Everything is handled by the human. The agent logs the interaction for the record but doesn't respond.

    We started every lead at Level 2 — Notify — so the team could see what the agent was saying and build trust gradually. Over the weeks, as they saw the quality of the responses, they started moving leads to Level 1 themselves.

    Testing and Iterating During the Build

    We didn't just build this thing in isolation and hand it over. We tested, tweaked, and refined continuously throughout the four weeks.

    Tone calibration. After the first 50 or so generated responses, we sat down with the admin team and asked them to rate each one on a scale of 1 to 5 for tone, accuracy, and completeness. The average came back at 3.8 — decent, but not where we needed it to be. So we went back to the prompt and refined it with 12 specific examples of "what good looks like", using the team's own past email responses as reference material. The score climbed to 4.6.

    Postcode edge cases. The service-area validation started as a simple list of postcode districts. But we quickly discovered that some postcodes that technically fell within the right districts were actually on the wrong side of a geographic boundary — a river, for instance, that takes 40 minutes to drive around. So we added a Google Maps driving-time check. If a property is more than 45 minutes from the office, the lead gets flagged as remote and the agent offers a virtual consultation before committing to a site visit.

    Conversation context. Early on, we noticed that if a lead replied to an agent email with a follow-up question, the agent would sometimes lose the thread — asking for the postcode again, for example, even though the lead had already provided it two emails ago. We fixed this by adding conversation thread summarization. Before every GPT-4o call, the agent compiles a short summary of the entire conversation so far and prepends it to the prompt. The context drift disappeared entirely.

    3. Testing Phase

    Shadow Mode (Two Weeks)

    Before we let this thing loose on real customers, we ran it in shadow mode for two weeks alongside the existing manual process.

    Here's how it worked: every web form submission was processed by both the AI agent and the admin team, completely independently. The agent's responses were logged but never sent to the lead. The admin team sent their usual manual responses as if nothing had changed. And every day, we compared what the agent would have said against what the human actually sent.

    We graded the agent on five criteria, and we set pass thresholds for each one before we started:

    Week 1 was close but not quite there. The agent was getting the basics right, but the tone was still a bit off — it sounded competent but not quite like them. The factual error rate, while low, wasn't zero: it once quoted a price for a type of roof repair the company didn't actually offer.

    By the end of Week 2, though, the agent had crossed every threshold. And something interesting happened: the admin team, who had been quietly sceptical at the start, started asking if they could switch it on for real. They could see that the agent was handling first responses well, and they wanted their time back.

    The A/B Test (Week 3 of Testing)

    We didn't flip a switch and hope for the best. We ran a controlled A/B test for one week.

    All new web leads were randomly split into two groups. The control group went through the existing manual process — admin responds within 4 to 8 hours, calls back when they can, no structured qualification. The test group got the full AI treatment: instant 8-second response, the conversational qualification flow, hot lead alerts, and Calendly booking links.

    The results were stark:

    Case study illustration

    The booking rate was the headline number. The AI agent was booking surveys at nearly double the rate of the manual process. And nobody complained — not one person unsubscribed or wrote back saying "stop sending me robot emails."

    The founder's reaction said it all: "I was ready to hate this. But honestly, the AI sounds like us. It's faster than we could ever be, and customers are actually booking more surveys."

    4. Production Phase

    Rolling It Out (Week 8)

    We rolled the system into full production over four carefully staged days.

    Day 1 was about building confidence. All new web form leads were routed to the AI agent at Temperature Level 2 — Notify — so the sales team was CC'd on every outbound email and could see exactly what was being sent. We also set up a safety net: if the agent failed to generate a response for any reason (API outage, timeout, unexpected input format), the lead was escalated to a human within 5 minutes via a Slack alert.

    Day 2 we turned on the Calendly booking links in the agent's responses. The team was nervous about double-booking — what if two leads booked the same time slot? — so we monitored this closely. It never happened. The calendar sync handled it cleanly.

    Day 3 we activated the hot lead alerts. The founder set up his phone to receive Slack notifications so he could see hot leads come in even when he was up on a roof. Within the first day, he jumped on a call with an urgent lead within 3 minutes of the form being submitted — something that would have been impossible before.

    Day 4 we formally deprecated the manual lead-response workflow. The admin team's job description shifted: instead of spending their days responding to form submissions and answering the same questions over and over, they now spent their time on survey booking confirmation calls, following up on quotes that had gone cold, and managing the company's Google reviews.

    The Rollback Plan

    We put three conditions in place that would trigger an immediate rollback, and we were completely transparent with the client about what they were:

    1. If two or more customers complained about feeling misled by automated communication — if they felt "botsplained", as some people put it — we'd revert to manual and add a clear disclosure banner to the first response.
    2. If the agent's qualification data capture dropped below 80% over a 3-day rolling window, something was wrong and we'd pause.
    3. If the monthly API costs exceeded £500 without a corresponding lift in bookings, the economics didn't work and we'd reassess.

    None of these conditions were ever triggered. Not in the first month, not in the first 90 days, and not since.

    Training the Team

    We ran two one-hour training sessions before turning the system on. Nothing too elaborate — practical, hands-on stuff.

    With the sales team (four people), we walked through how to read the qualification summary that appeared on every lead before their first call, how to use the temperature dial to take over a conversation when needed, and how to interpret the hot-lead scoring breakdown so they knew which leads to prioritise.

    With the admin team (two people), we covered how to monitor the dashboard, how to review the agent's performance on a weekly basis, how to flag responses that needed tone adjustment, and how to handle the Slack approval queue for Level 3 (co-pilot) conversations.

    We also left them with a 3-page knowledge transfer document — nothing overly technical, just a clear diagram of how the system was put together, the escalation paths if something went wrong, a weekly review checklist, and the name and number of who to call if something broke.

    5. Observe Phase

    What Changed in 90 Days

    We tracked 28 different metrics over the first 90 days of production. Here are the ones that mattered most:

    Case study illustration

    What It Meant for the Business

    The revenue picture was the clearest signal. The company recovered roughly £18,000 to £22,000 per month in revenue that would previously have been lost to slow or non-existent follow-up. Over 90 days, that added up to about £58,000 in incremental revenue — against a total project cost of £12,500 (covering the build and the first three months of LLM API usage and hosting). The payback period was about six and a half weeks.

    The admin team's work changed fundamentally. They went from spending 14 hours a week on lead response — repetitive, draining work — to about 2 hours a week, mostly monitoring and handling the occasional edge case. The 12 hours they got back didn't just disappear into coffee breaks. They redirected it into three specific things: proactive survey booking confirmation calls (making sure homeowners would actually be home for the survey), following up on quotes that had gone cold (going from 12 follow-ups a week to 22), and managing the company's Google Reviews (response rate went from 30% to 90%).

    The customer experience improved, not degraded. This was the outcome that surprised the founders most. They had been genuinely worried that automation would make their business feel impersonal. But the opposite happened. The instant response — even from an AI — was perceived as attentive, professional, and refreshingly fast. Several customers explicitly mentioned in post-job feedback that they appreciated the immediate response and not having to wait for a call back. The company's CSAT score went from 4.2 to 4.5.

    The team felt differently about their work. We ran an anonymous survey at the 90-day mark. The admin team reported higher job satisfaction — they felt they were doing more valuable work instead of fighting fires all day. One of the sales reps put it better than we ever could: "I used to spend the first five minutes of every call trying to figure out basic info. Now I open the CRM and the lead's whole story is there. I can start selling in thirty seconds."

    What We Added After 90 Days

    Once we had three months of data, we went back and added a few improvements based on what we'd observed:

    The first was an SMS fallback for unread emails. We noticed that about 23% of leads never opened the first email the agent sent. So we added a simple trigger: if an email goes unopened for two hours, the agent sends a brief SMS with a Calendly link and a friendly note: "Hi [name], just checking you saw our message about your roof. Here's a link to book a free survey at a time that works for you."

    The second was a quote follow-up agent. The lead engagement system was extended to keep track of outstanding quotes. If a quote went untouched for seven days, the agent sent a personalised check-in: "Just checking in — any questions about the quote? Happy to adjust if the scope has changed." Nothing pushy, just a gentle nudge.

    The third was automated review requests. Once a job was completed, the agent sent a review request via SMS 48 hours later with a direct link to the Google Review page. Review volume went from about 4 per month to about 18 per month, which had its own compounding effect on new lead generation.

    Key Takeaways

    Speed is the highest-leverage variable in lead conversion. Cutting response time from hours to seconds had a bigger impact on booking rates than any change to pricing, messaging, or service quality could have achieved. If you're a service business and you're not responding to leads within minutes, you are leaving money on the table. It's that simple.

    Human handoff is essential for adoption. The temperature dial was the single most important design decision we made. It gave the team a gradual on-ramp — they could monitor, then co-pilot, then trust. Without it, I don't think the owners would have accepted the system. They needed to feel in control, even if they rarely exercised that control.

    Audit data sells itself. We didn't have to convince the founders with industry statistics or abstract arguments. The audit week produced numbers that were specific to their business and undeniable. That's always more powerful than theory.

    The solution has to sound like the business. A boilerplate AI response would have failed the brand-voice test, and the founders would have pulled the plug within a week. The time we spent tuning the tone — using their own past emails as training material, involving the admin team in the scoring, iterating on the prompt — was the difference between a system that felt like a piece of software and one that felt like a member of the team.

    Engagement completed Feb 2026. Client name withheld by request.