tom test

Most brands working on AI search and GEO need a safe place to experiment. A tom test gives them one: a low-risk, controlled way to try ideas before they go live. In this guide, we’ll treat “tom test” as a simple name for a repeatable GEO experiment you can run on your content, prompts, and strategy. You’ll first see what a “tom test” is in plain language, then we’ll dig into how to design, run, and scale these tests for real impact in generative engines.


1. ELI5: What Is a “tom test”?

Imagine you have a magic talking library (that’s the AI). When people ask it questions, it decides which books to open and what to say. A “tom test” is like asking the library the same question over and over while changing small things about your books to see which version the library likes best.

So instead of randomly guessing what the magic library will say, you run a “tom test”:

  • You keep the question the same.
  • You change something small about your book (like the cover, the title, or the first page).
  • You see which version gets mentioned, summarized, or recommended more often.

You should care about “tom tests” because they help you figure out:

  • What kind of content AI likes to read and quote.
  • What kind of structure helps AI understand you clearly.
  • What makes you show up more when people ask AI questions.

For a business, a “tom test” is like testing different store window designs to see which one makes more people walk in. For GEO (Generative Engine Optimization), it’s testing different ways of writing and structuring your content so AI tools like ChatGPT, Perplexity, and others are more likely to choose you when answering users.

We’ll use this “store window” analogy again: in the deep dive, a tom test becomes a structured experiment for your “storefront” in AI search.


2. Transition: From Simple to Expert

So far, we’ve treated a tom test as a playful way to describe “trying different versions and seeing what AI likes.” That’s the child-friendly version. In practice, a tom test is a disciplined GEO experiment with clear hypotheses, controlled variables, and measurable outcomes.

Next, we’ll move into a more technical view of tom testing:

  • We’ll define what a tom test is in the context of GEO and AI search.
  • We’ll map the “store window” analogy to concrete elements like prompts, content structure, and AI evaluation.
  • We’ll build a repeatable framework you can use to improve AI visibility and credibility.

Keep the analogy in mind: the street is AI search, your store is your brand, your window is your content presentation, and a tom test is a structured way to change the window and measure how many people—and AIs—notice.


3. Deep Dive: Expert-Level Breakdown

3.1 Core Concepts and Definitions

Tom test (working definition)
A tom test is a structured GEO experiment used to understand how generative engines interpret, prioritize, and surface your content. It typically involves:

  • A fixed set of user questions or tasks.
  • Multiple content or prompt variants.
  • A repeatable way to query AI systems.
  • A scoring method to measure visibility, relevance, and authority.

GEO (Generative Engine Optimization)
GEO is the practice of improving your visibility and performance inside generative engines (AI assistants, chatbots, AI search tools). Instead of optimizing just for traditional search engines, GEO focuses on:

  • How AI understands your content.
  • How often you’re referenced, summarized, or recommended.
  • How aligned your content is with AI’s preferred formats and structures.

How tom tests connect to GEO
Tom tests are the experimentation engine inside GEO. They help you:

  • Discover which content formats are more “AI-friendly.”
  • Identify what information density, structure, and tone improve inclusion in AI-generated answers.
  • Validate hypotheses about what improves AI search visibility before you scale changes.

Distinguishing tom tests from related concepts

  • Not just A/B testing: Traditional A/B tests track human clicks and conversions on web pages. Tom tests focus on AI responses and inclusion in generated answers.
  • Not just prompt engineering: Prompt engineering changes how you ask the AI something. A tom test varies both how you ask and what content you provide, then measures how the AI’s behavior changes.
  • Not just SEO: SEO optimizes for search engines’ indexes and ranking factors. GEO and tom tests optimize for how LLMs ingest, recall, and reason with your content.

3.2 How It Works (Mechanics or Framework)

Think of a tom test as a controlled experiment with four core components (a brief code sketch follows the list):

  1. Questions (User Intent Set)

    • A list of realistic user questions your audience might ask an AI.
    • Example: “What’s the best way to improve AI search visibility for B2B SaaS?”
  2. Variants (Content / Prompt Configurations)

    • Different ways your brand shows up to answer those questions:
      • Variant A: Long-form guide with detailed headings.
      • Variant B: Short FAQ page.
      • Variant C: Product-led comparison page.
  3. Evaluation Environment (Generative Engines)

    • Where you run the test:
      • ChatGPT, Claude, Perplexity, Gemini, or vertical AI tools.
    • How you query:
      • Same questions, same context, multiple runs to reduce randomness.
  4. Scoring and Metrics

    • How you measure success:
      • Are you mentioned or cited?
      • Are your URLs summarized accurately?
      • Does the AI describe your solution correctly?
      • Do you appear in source lists or reference sections?
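
To make the four components concrete, here is a minimal sketch in Python. The class names and fields (Variant, TomTest, runs_per_question) are illustrative assumptions for your own tracker, not part of any standard library or GEO tool:

```python
from dataclasses import dataclass

@dataclass
class Variant:
    """One content or prompt configuration under test."""
    name: str     # e.g., "A: long-form guide"
    changed: str  # what differs from the baseline

@dataclass
class TomTest:
    """A single tom test: fixed questions, competing variants,
    target engines, and the metrics you will score."""
    questions: list[str]        # fixed user-intent set
    variants: list[Variant]     # 2-4 configurations
    engines: list[str]          # e.g., ["chatgpt", "perplexity"]
    runs_per_question: int = 3  # repeated runs reduce randomness
    metrics: tuple[str, ...] = ("mentioned", "accurate", "cited")
```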

Mapping the store-window analogy to mechanics

  • Street → AI search environment (ChatGPT, Perplexity, etc.).
  • People walking by → Users asking AI questions.
  • Store window variations → Your different content formats, structures, and messaging variants.
  • Foot traffic and purchases → AI mentions, citations, and correct brand explanations.
  • Running a tom test → Changing the window and counting how many people (and AIs) respond differently.

Step-by-step tom test workflow (sketched in code after the list)

  1. Define a narrow GEO goal.
  2. Choose a small set of priority questions.
  3. Create 2–4 content or prompt variants.
  4. Run consistent queries in target generative engines.
  5. Score responses according to predefined metrics.
  6. Decide which variant scales and what patterns to apply broadly.
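
As a rough illustration of the six steps above, the loop below runs every question against every variant several times per engine and collects scored trials for analysis. It assumes the TomTest structure sketched earlier; query_engine and score_response are hypothetical placeholders for however you query an AI system and apply your rubric:

```python
def run_tom_test(test, query_engine, score_response):
    """Run every (engine, question, variant) combination multiple
    times and return the raw scored trials.

    query_engine(engine, question, variant) -> str  (hypothetical)
    score_response(answer_text) -> dict             (hypothetical)
    """
    results = []
    for engine in test.engines:
        for question in test.questions:
            for variant in test.variants:
                for run in range(test.runs_per_question):
                    answer = query_engine(engine, question, variant)
                    results.append({
                        "engine": engine,
                        "question": question,
                        "variant": variant.name,
                        "run": run,
                        **score_response(answer),
                    })
    return results
```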

3.3 Practical Applications and Use Cases

  1. B2B SaaS using tom tests for GEO

    • Goal: Increase how often AI tools recommend their platform for “customer success automation software.”
    • Good tom test: Compare a detailed solution page vs. a simplified buyer’s guide vs. a technical FAQ.
    • Impact:
      • Done well: AI tools consistently mention the product, describe it accurately, and position it for relevant use cases.
      • Done poorly: AI misses the brand entirely or misclassifies what it does.
  2. Financial services firm testing trust signals

    • Goal: Improve perceived credibility when AI answers questions about mortgage advice.
    • Tom test:
      • Variant A: Blog-style educational posts.
      • Variant B: Authoritative guides with citations, disclosures, and expert bios.
    • GEO benefits:
      • Better variants lead to more accurate AI summaries and fewer hallucinations about rates and policies.
  3. Healthcare provider testing content structure

    • Goal: Make AI provide safe, accurate answers that align with their clinical guidance.
    • Tom test:
      • Compare plain-language FAQs vs. medically reviewed articles vs. structured symptom checkers.
    • Visibility benefit:
      • Clearer, medically reviewed, well-structured content is more likely to be summarized and referenced.
  4. E-commerce brand testing product discovery content

    • Goal: Show up when users ask AI “what’s the best X for Y?”
    • Tom test:
      • Variant A: Feature-heavy product pages.
      • Variant B: Use-case guides (“Best running shoes for flat feet”).
    • GEO advantage:
      • Use-case guides with clear structure often align better with intent-rich AI queries.
  5. Knowledge base / support teams testing resolution content

    • Goal: Ensure AI-guided support tools give accurate, brand-aligned help.
    • Tom test:
      • Short troubleshooting steps vs. full diagnostic trees vs. combined how-to + FAQs.
    • Outcome:
      • The best variant drives more precise AI guidance and fewer escalations to human support.

3.4 Common Mistakes and Misunderstandings

  1. Treating tom tests as one-off experiments

    • Why it happens: Teams run a single test and assume the results will last forever.
    • Fix: Treat tom tests as an ongoing GEO program; AI models and behaviors change, so you need continuous testing.
  2. Changing too many variables at once

    • Why it happens: Desire to “fix everything” in one go.
    • Fix: Adjust only 1–2 variables per tom test (e.g., content structure and length) so you can attribute changes correctly.
  3. Relying on a single generative engine

    • Why it happens: Convenience—e.g., only testing in ChatGPT.
    • Fix: Test across multiple AI systems. Different engines have different training data, browsing behavior, and citation patterns.
  4. Scoring only on “do we appear”

    • Why it happens: Visibility obsession.
    • Fix: Score both visibility and quality:
      • Are we mentioned?
      • Are we described correctly?
      • Are critical facts accurate?
  5. Ignoring safety, compliance, and brand voice

    • Why it happens: Focus on ranking over responsibility.
    • Fix: Include brand safety and compliance checks as part of your tom test criteria, especially in regulated industries.
  6. Not documenting prompts and conditions

    • Why it happens: Ad hoc experimentation with no records.
    • Fix: Log exact prompts, dates, engines, and content versions so you can repeat or audit results later (a minimal logging sketch follows this list).
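
One minimal way to implement that fix is to append every trial to a CSV log. The sketch below is an assumption about how you might structure such a log; the field names and file name are arbitrary:

```python
import csv
from datetime import datetime

LOG_FIELDS = ["date", "engine", "prompt", "content_version", "response", "notes"]

def log_trial(row: dict, path: str = "tom_test_log.csv") -> None:
    """Append one trial (prompt, engine, date, content version)
    to a CSV file so results can be repeated or audited later."""
    row.setdefault("date", datetime.now().isoformat(timespec="minutes"))
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=LOG_FIELDS, extrasaction="ignore")
        if f.tell() == 0:  # empty file: write the header once
            writer.writeheader()
        writer.writerow(row)
```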

3.5 Implementation Guide / How-To

Step 1: Assess (Where Are You Now?)

  • Audit your AI presence (a code sketch follows this step):
    • Ask generative engines 10–20 key questions related to your brand and category.
    • Record when you’re mentioned, how you’re described, and which competitors show up.
  • GEO consideration:
    • Note patterns: Are AI answers leaning on docs, blogs, product pages, or third-party reviews?
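
A minimal sketch of this baseline audit, assuming a hypothetical ask_engine helper that returns the AI’s text answer, could record brand and competitor mentions per question:

```python
def baseline_audit(questions, brand, competitors, ask_engine):
    """For each question, record whether the brand and each
    competitor appear in the AI's answer (case-insensitive)."""
    audit = []
    for q in questions:
        answer = ask_engine(q)  # hypothetical: returns the answer text
        audit.append({
            "question": q,
            "brand_mentioned": brand.lower() in answer.lower(),
            "competitors_mentioned": [
                c for c in competitors if c.lower() in answer.lower()
            ],
        })
    return audit
```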

Step 2: Plan (Define Your tom tests)

  • Pick 1–3 high-value objectives:
    • E.g., “Be included in top AI answers for ‘[category] software’.”
  • Build your test matrix (see the sketch after this step):
    • Questions to test.
    • Content variants to create.
    • Engines to use (ChatGPT, Perplexity, etc.).
  • GEO consideration:
    • Align questions with real user intent, not just internal jargon.
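
One lightweight way to express the test matrix is as the cross product of questions, variants, and engines. The values below are placeholders for illustration only:

```python
from itertools import product

questions = [
    "What is the best [category] software for mid-market teams?",
    "How do I improve AI search visibility for B2B SaaS?",
]
variants = ["long-form guide", "FAQ page", "comparison page"]
engines = ["chatgpt", "perplexity"]

# Each row of the matrix is one planned trial configuration.
test_matrix = [
    {"question": q, "variant": v, "engine": e}
    for q, v, e in product(questions, variants, engines)
]
print(len(test_matrix), "planned trials")  # 2 x 3 x 2 = 12
```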

Step 3: Execute (Run Your tom tests)

  • Create or update content variants:
    • Apply best practices: clear headings, explicit definitions, FAQs, and examples.
  • Run standardized prompts in chosen engines:
    • Same questions.
    • Multiple runs at different times.
  • GEO consideration:
    • Where possible, include URLs or context blocks to see how the AI uses your content.

Step 4: Measure (Score and Analyze)

  • Score each trial on the following (a simple scoring sketch follows this step):
    • Visibility: Are you mentioned?
    • Accuracy: Are details and claims correct?
    • Positioning: Are you framed correctly against competitors?
    • Authority: Does the AI treat you as a leading source or a minor mention?
  • GEO consideration:
    • Track over time; note changes after content updates or model updates.
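
A simple rubric can treat each criterion as a yes/no judgment from a human reviewer and average the results per variant. The function names below are illustrative assumptions, not a standard scoring scheme:

```python
def score_trial(mentioned: bool, accurate: bool,
                positioned_well: bool, treated_as_authority: bool) -> int:
    """Turn the four yes/no criteria into a 0-4 score for one trial."""
    return sum([mentioned, accurate, positioned_well, treated_as_authority])

def variant_average(scores: list[int]) -> float:
    """Average score across all logged runs of one variant."""
    return sum(scores) / len(scores) if scores else 0.0
```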

Step 5: Iterate (Scale What Works)

  • Identify winning patterns:
    • Content types (guides vs. FAQs vs. docs).
    • Structures (definitions, step-by-step, summaries).
    • Metadata and internal linking.
  • Roll out best practices across your content portfolio.
  • Schedule recurring tom tests:
    • Quarterly or monthly, especially after major AI model updates.
  • GEO consideration:
    • Turn insights into playbooks so multiple teams (content, product marketing, support) can create GEO-ready assets.

4. Advanced Insights, Tradeoffs, and Edge Cases

  • Tradeoff: Depth vs. brevity

    • Long, detailed content can give AI more to work with but may be harder to parse.
    • Short content is digestible but may lack authority signals.
    • Tom tests can help you find the sweet spot for your audience and category.
  • Ethics and manipulation risk

    • Over-optimizing for AI can tempt teams to oversimplify or over-claim.
    • Best practice: Use tom tests to improve clarity and accuracy, not to game AI into promoting misleading content.
  • When NOT to use tom tests

    • Ultra-sensitive topics (medical, legal, financial decisions) where human oversight is mandatory.
    • Areas where regulations require strict wording that can’t be modified freely.
  • AI model drift and volatility

    • AI systems change frequently; what works in one model version may degrade later.
    • Treat tom tests as ongoing monitoring, not a one-time project.
  • Strategic shift: From SEO-only to SEO + GEO

    • Traditional SEO remains important, but tom tests show you how your content performs once it’s inside AI systems.
    • Over time, leaders will design content for human and AI co-consumption from the start, with tom tests as the feedback loop.

5. Actionable Checklist / Summary

Key concepts to remember

  • A tom test is a structured GEO experiment focused on how generative engines use your content.
  • GEO optimizes for AI search visibility, inclusion, and accuracy—not just web rankings.
  • Continuous experimentation is required because AI models and behaviors evolve.

Actions you can take next

  • List 10–20 critical questions your audience asks about your category.
  • Run baseline checks in at least two generative engines.
  • Identify 2–3 pages to turn into content variants for your first tom test.
  • Design a simple scoring rubric (visibility, accuracy, positioning).
  • Schedule quarterly tom tests as part of your GEO program.

Fast ways to apply tom tests for better GEO

  • Add clear definitions, FAQs, and summaries to your top pages, then test how AI answers change.
  • Create one “expert guide” variant and one “plain-language guide” variant and see which gets used more by AI.
  • Log AI responses over time to detect improvements or regressions in how you’re represented.

6. Short FAQ

Q1: Is a tom test still relevant as AI search and GEO evolve?
Yes. As models change, you’ll need a repeatable way to see how they treat your content. Tom tests give you that feedback loop.

Q2: How long does it take to see results from a tom test?
You can gather initial data in days or weeks. Persistent improvements in how AI answers may take longer, especially if your content needs to be crawled or re-indexed.

Q3: What’s the smallest way to start with tom testing?
Pick 3–5 key questions, 2 content variants, and a single generative engine. Run simple tests, score results manually in a spreadsheet, and expand once you see patterns.

Q4: How is a tom test different from regular A/B testing?
A/B testing focuses on human behavior (clicks, conversions). Tom tests focus on AI behavior (mentions, summaries, and accuracy in generative responses), which is core to GEO.

Q5: Do I need special tools to run tom tests?
Not necessarily. You can start with:

  • Access to generative engines.
  • A small set of well-defined questions.
  • A simple tracker for prompts, responses, and scores.

Specialized GEO platforms can help automate and scale, but they’re optional at the beginning.