Chatbot Bad
Most chatbot interfaces are bad. They masquerade as flexible (“Ask me anything!”) while actually shifting the cognitive burden onto users. Building GenAI products at Amazon for the past year, I’ve learned that the best AI interfaces don’t start with questions; they start with answers. While extremely common, the blank chatbox should be AI’s last resort, not its default.
Gods, Copilots and Cogs
To understand why current approaches fall short, we need to first understand the different roles AI can play. Drew Breunig outlines a useful framework (popularized by Simon Willison):
- Gods: The autonomous, AGI-level systems that can fully replace humans—still largely science fiction. This is what research labs are racing towards.
- Copilots: Supervised assistants that work alongside humans, enhancing rather than replacing human capabilities.
- Cogs: Reliable, single-purpose components that handle specific tasks consistently, like Whisper for transcription or Claude’s vision capabilities for document extraction.
While using LLMs as cogs offers immediate efficiency gains, the real value lies in the Copilot stage. Why? Because copilots can manage end-to-end workflows while keeping humans in the loop for critical oversight.
The Explore-Exploit Framework
Machine learning gives us a framework for thinking about AI interface design: the explore-exploit tradeoff. It’s a pattern we experience every time we open a restaurant menu. Do you go with your go-to order (exploit) or try something new (explore)? Exploitation guarantees satisfaction but might mean missing out on an even better dish. Exploration could lead to a new favorite—or disappointment.
This dilemma extends to how we discover new restaurants entirely. The “explore” approach is like walking into a new city with no guide—you’ll eventually find good food, but it takes time and energy. The “exploit” approach is like having a local friend who immediately takes you to their favorite spots. An app like Yelp combines both: it starts by showing you curated recommendations (exploit), then lets you refine and explore based on your preferences.
This same pattern applies to AI interfaces:
- Pure Exploration: Blank chatbox interfaces that force users to figure out what to ask and how to ask it
- Smart Exploitation: Systems that automatically apply proven workflows and surface insights, then let users explore further
- Better Together: Interfaces that start with valuable outputs, then enable guided exploration when needed
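The tradeoff above is usually formalized as a multi-armed bandit problem. As a minimal sketch (the restaurant names and epsilon value are illustrative, not from any real system), here is the classic epsilon-greedy policy: exploit the best-known option most of the time, and explore a random alternative with small probability:

```python
import random
from collections import defaultdict

class EpsilonGreedy:
    """Minimal epsilon-greedy bandit: exploit the best-known arm most
    of the time; explore a random arm with probability epsilon."""

    def __init__(self, arms, epsilon=0.1):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = defaultdict(int)    # pulls per arm
        self.values = defaultdict(float)  # running mean reward per arm

    def select(self):
        if random.random() < self.epsilon:
            return random.choice(self.arms)                  # explore
        return max(self.arms, key=lambda a: self.values[a])  # exploit

    def update(self, arm, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# The menu analogy: your go-to order vs. the special.
menu = EpsilonGreedy(["go_to_order", "special"], epsilon=0.1)
menu.update("go_to_order", 1.0)  # a satisfying meal
```

The "Better Together" pattern corresponds to a low (but nonzero) epsilon: lead with the high-confidence choice, while leaving a path open for discovery.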
Start with Value: Real-World AI Interfaces
We’ve covered two key ideas: The best AI tools are copilots, and effective interfaces balance exploration with exploitation. Combining these insights reveals a clear pattern—the best AI interfaces start by exploiting known patterns to deliver immediate value, then enable exploration from this foundation.
Here’s how we’ve applied this at Amazon:
- Contract Review System
  - The Challenge: Reviewing contracts for tax compliance
  - Start with Value: Automatic citation finding, term comparison, and classification
  - Enable Exploration: Lawyers can investigate unusual terms or complex implications
  - Impact: Review time reduced from hours to minutes while maintaining quality
- Tax Horizon Scanning
  - The Challenge: Tracking how new tax legislation affects Amazon’s business lines
  - Start with Value: Automated scraping, extraction, and preliminary impact assessment
  - Enable Exploration: Analysts can deep dive into specific implications
  - Impact: Comprehensive coverage with focused human oversight
- Property Tax Processing
  - The Challenge: Generating reports to appeal property tax assessments
  - Start with Value: Automatic classification + data extraction of property tax mail; draft appeal packages by building cost, income, and valuation estimates
  - Enable Exploration: Experts can refine property comparisons and valuation calculations
  - Impact: Standardized output; ability to appeal more property tax assessments
This pattern extends beyond Amazon. Consider these public products:
NotebookLM, a research tool, starts by automatically summarizing your documents and suggesting research questions before you ever type a prompt. You can then use the chat interface to explore your documents with that context already established.
Cursor, an AI IDE, automatically indexes your codebase and maintains context about relevant files and functions. When you ask questions, it already knows what code matters for your query. The result is a flow state where you can quickly accept, reject, or modify its suggestions, allowing you to write a ton of code quickly!
From Pattern to Principles
The examples above reveal consistent principles for designing effective AI interfaces:
- Front-Load Value: Deliver insights before interaction. Don’t ask users what they want to know; show them what they need to know. A system should proactively surface valuable patterns and insights before requiring user input.
- Maintain Context: The system, not the user, should be responsible for managing context. NotebookLM doesn’t make users repeatedly explain their research goals. Cursor doesn’t make developers copy-paste relevant code. The cognitive overhead of maintaining context belongs to the machine.
- Guide, Don’t Gate: The chatbox isn’t forbidden; it’s just not the first stop. Open-ended exploration becomes powerful after users have a foundation of insights to build from. The blank prompt should be a destination, not a starting point.
- Standardize Outcomes: By executing proven workflows automatically, we eliminate the gap between power users and novices. Instead of forcing each user to discover and chain together the right workflows, we codify best practices into the system itself. This leads to more consistent, reliable outcomes regardless of user expertise.
- Learn from Exploration: Today’s exploration becomes tomorrow’s exploitation. As users interact with the system, their explorations create patterns that can be automated. Common questions become automated insights, frequent workflows become default analyses, and manual paths become guided journeys.
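The last principle has a simple mechanical core. As a minimal sketch (the class name, normalization, and threshold are all illustrative assumptions, not a description of any real system): log each free-form query, and once a query recurs often enough, promote it into the set of insights the interface surfaces by default.

```python
from collections import Counter

class ExplorationLog:
    """Track free-form user queries; promote frequent ones into the
    default insights shown before the user types anything.
    Illustrative sketch only -- real systems would cluster semantically
    similar queries rather than match normalized strings."""

    def __init__(self, promote_after=3):
        self.promote_after = promote_after
        self.query_counts = Counter()
        self.default_insights = []  # surfaced up front, no prompt needed

    def record(self, query):
        key = query.strip().lower()  # naive normalization
        self.query_counts[key] += 1
        if (self.query_counts[key] >= self.promote_after
                and key not in self.default_insights):
            self.default_insights.append(key)
```

In other words, yesterday’s exploration (a repeated manual question) becomes today’s exploitation (an automatic insight), closing the loop between the two modes.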
The blank chatbox interface isn’t inherently bad; it’s just misplaced. By leading with value rather than questions, by exploiting before exploring, we can build AI interfaces that truly enhance human capabilities instead of taxing them.