GUIDEGEEK
When there's no playbook, you write one. Researching a new product category under maximum uncertainty.
This is a story about research strategy under conditions of maximum uncertainty. The AI was the context. The real work was figuring out how to generate reliable insight when almost nothing was stable.
Discoverability at scale
NYC Tourism's site was content-rich and, for many users, overwhelming. Research and analytics showed a consistent pattern: visitors couldn't find what they needed, got lost in the depth of the experience, and abandoned. A conversational interface offered a potential solution, an intermediary that could surface the right content for the right user at the right moment, without requiring them to navigate the full site architecture.
GuideGeek was that bet. My job was to find out whether it was paying off, and if not, why.
Leading research on GuideGeek meant navigating a set of compounding constraints simultaneously. The product hadn't fully launched yet, which meant early testing relied heavily on internal stakeholders who didn't reflect real user behavior. The tooling wasn't stable: platform limitations with Lyssna and incomplete adoption of UserTesting meant I had to adapt methodology in real time. And the product behaved differently across web, Instagram, and WhatsApp, adding another layer of complexity to any attempt at consistent measurement.
The deeper challenge was definitional. There were no established benchmarks for what a good AI response looked like in this context. I had to define evaluation criteria, shifting from satisfaction-based questions toward understanding user intent, while simultaneously designing and running the research. As a team of one responsible for both the product experience and the research strategy, there was no one to hand off to and no existing framework to borrow from.
The move: Shift the research lens from "are users satisfied?" to "what are users actually trying to do?", and build evaluation criteria that could diagnose gaps in response quality, not just surface them.
The primary audience for the Meeting Planners chatbot wasn't a casual visitor browsing things to do. Meeting planners are professionals making decisions on behalf of groups, which meant every response needed to be accurate, curated, trustworthy, and context-aware. Their queries often masked deeper logistical complexity. A question about venue capacity was really a question about whether they could stake their professional reputation on the recommendation.
That level of expectation raised the design bar significantly. Generic chatbot responses, the kind that work fine for low-stakes consumer queries, weren't going to cut it for this audience. Editorial quality wasn't a nice-to-have. It was the minimum viable bar.
Early launch data showed strong initial engagement. Users were interacting, asking questions, and exploring the product. But they weren't coming back. The instinctive read was a usability problem. The actual diagnosis was more strategic.
The research uncovered an expectation mismatch. Users were looking for curated, editorial-style recommendations, the kind of response a knowledgeable local insider would give. What the product was delivering felt generic. The gap wasn't in the interface. It was in the tone, the response design, and the overall product positioning.
That finding reframed the entire conversation. The question shifted from "how do we make the chatbot easier to use" to "how do we make it worth coming back to." Those are very different product problems with very different solutions.
The implication: Retention wasn't a metric to optimize. It was a signal that the product needed to be repositioned from a functional tool toward a curated, authoritative experience that meeting planners could trust and return to.
Internal testing of the Meeting Planners chatbot launched in January 2025. The early data told a clear story about where the product was strong and where the real opportunity lived.
320 — active users at launch, boosted by digital out-of-home: +1,000% growth
311 — new users, primarily US-based: strong initial acquisition
1,759 — messages sent during the launch period
91% — of interactions via web, vs. WhatsApp and other platforms
The retention signal: only 8 return users. That low number didn't indicate failure. It indicated exactly where the product needed to go next. Acquisition was working; the experience wasn't yet giving users a reason to come back.