Let’s start with a simple story. You’re hiking in Montana, standing at the base of a mountain, and you see a strange flower. You pull out your phone, take a picture, and say, “What is this?” Within seconds, your phone tells you the flower’s name, where it’s native to, and whether it’s toxic to dogs. That’s not traditional keyword search. That’s visual and voice-powered AI search—and it’s the future.
Now imagine you’re shopping online. You see a photo of a stylish reclaimed wood coffee table on Instagram. There’s no product tag, no brand listed. You take a screenshot and run it through Google Lens. Within seconds, you’re shown visually similar products from several online retailers—one of which is a local furniture company you’ve never heard of before. Their listing has clear images, descriptive metadata, and good reviews—so you click through and buy. That’s visual search connecting products to new customers.
Search is no longer confined to a box with blinking text. People now search with their eyes, their voices, and through generative AI platforms that answer questions, not just match queries. As marketers, this shift forces us to rethink how we write, structure, and present content.
This newsletter explores how to optimize for voice search, visual search, and generative AI search—the three pillars of what we call multimodal search. If your content doesn’t show up across all these channels, you’re missing visibility. Let’s fix that.
Why Search Is No Longer Just “Search”
Search used to be simple. You typed in a phrase like “best hiking boots,” and Google returned ten blue links. But user behavior has evolved. Today’s search journeys involve:
- Voice commands on smart speakers or phones.
- Visual inputs like screenshots or photos via tools like Google Lens.
- Generative AI search from platforms like ChatGPT, Perplexity, or Google’s AI Overviews (formerly SGE).
Search has shifted from keyword matching to intent understanding. To win in this landscape, marketers must create content that is understandable by machines and valuable to humans.
Voice Search Optimization
Let’s start with voice. Voice searches are conversational. Instead of typing “best pizza NYC,” a user says, “Where’s the best place to get pizza near me right now?”
Voice search characteristics:
- It’s usually longer and question-based.
- It often includes local intent or time-sensitivity.
- It requires answers that are natural, direct, and structured.
How to Optimize for Voice Search
Use language that sounds natural when spoken. Build out FAQ sections on your site, write in plain language, and structure content to answer specific questions clearly. Use schema markup like FAQPage and HowTo to give search engines clean data to extract and use.
For example, if you’re a general contractor in Pittsburgh, and someone asks:
“How much does a bathroom renovation cost in Pittsburgh?”
Your site should answer:
“A bathroom renovation in Pittsburgh typically costs between $25,000 and $75,000, depending on the size, layout, and materials used.”
This format is voice-search friendly—concise, specific, and localized.
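One way to expose a question-and-answer pair like this to search engines is FAQPage markup. The Python sketch below builds that JSON-LD from the example above. The function name build_faq_jsonld is hypothetical, and the markup is a starting point under those assumptions, not a guarantee that a voice assistant will read your answer aloud.

```python
import json


def build_faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# The question and answer come from the contractor example in this section.
faq = build_faq_jsonld([
    (
        "How much does a bathroom renovation cost in Pittsburgh?",
        "A bathroom renovation in Pittsburgh typically costs between "
        "$25,000 and $75,000, depending on the size, layout, and materials used.",
    ),
])

# Paste the printed JSON inside a <script type="application/ld+json"> tag.
print(json.dumps(faq, indent=2))
```

Keeping the answer text identical to what is visible on the page is the safer choice, since markup that says something different from the page can be ignored.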
Visual Search Optimization
Visual search allows users to upload an image or use their camera to search. They might snap a photo of a kitchen cabinet, a sneaker, or a treehouse. Platforms like Google Lens and Pinterest Lens then return visually similar results tied to available products or relevant content.
But visuals alone won’t rank. Context matters.
How to Optimize for Visual Search
To optimize images for search, focus on making your visuals understandable to machines. This includes:
- Descriptive file names (e.g., walnut-floating-shelves-modern-kitchen.jpg).
- Accurate alt text describing both the object and its setting.
- Structured data that tells search engines what the image shows (Product, Place, Article, etc.).
- Relevant surrounding content that adds context.
Let’s say you sell handmade ceramic mugs. A product image named “IMG123.jpg” won’t help much. But if it’s named “handmade-blue-ceramic-mug.jpg,” given alt text like “handmade ceramic mug with blue glaze,” and placed in a blog post about local pottery trends, Google Lens and Pinterest have far more context to surface it for visual queries.
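If you publish many images, the naming step is easy to automate. The Python sketch below turns a short description into a hyphenated file name and pairs it with alt text, using the mug example above. The helper names slugify and image_tag and the /images folder are assumptions for illustration, not part of any particular CMS.

```python
import html
import re


def slugify(text):
    """Lowercase the text and join its words with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def image_tag(description, alt_text, folder="/images", extension="jpg"):
    """Return an <img> tag with a descriptive file name and escaped alt text."""
    src = f"{folder}/{slugify(description)}.{extension}"
    return f'<img src="{src}" alt="{html.escape(alt_text, quote=True)}">'


tag = image_tag("Handmade Blue Ceramic Mug", "handmade ceramic mug with blue glaze")
print(tag)
# <img src="/images/handmade-blue-ceramic-mug.jpg" alt="handmade ceramic mug with blue glaze">
```

The file name and the alt text are kept as separate inputs on purpose: the file name names the object, while the alt text can describe the setting as well.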
Also, platforms like Instagram and Pinterest act as search engines now. Users type “modern rustic kitchen” and expect visually rich results. That means your social visuals must also be discoverable, not just beautiful.
AI-Powered Generative Search
This is where the game really changes. Tools like Google’s AI Overviews (formerly SGE), ChatGPT, and Perplexity don’t just return a list of links; they generate answers. These platforms pull from your website, your competitors’ sites, and the broader web to write their own responses to user queries.
They don’t just match keywords. They read your content, interpret its meaning, and decide if it’s credible enough to include.
How to Optimize for AI Search
Start by making your content rich in clarity, structure, and real-world experience. Use simple language, short paragraphs, and descriptive headings. Avoid fluff or overly abstract statements.
To show Experience, Expertise, Authoritativeness, and Trustworthiness (E-E-A-T):
- Include bylines with author bios.
- Use first-person experiences or case studies to show credibility.
- Link to high-quality sources and relevant internal content.
- Display company credentials, years in business, or customer testimonials.
Structured data plays a major role here too. Use Schema.org types like Article, LocalBusiness, Product, Service, or Review. These help AI models understand exactly what your page offers and how to synthesize it accurately.
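Byline and publisher details can be stated in Article markup as well as on the page. The Python sketch below shows one possible shape. The function name build_article_jsonld and every value passed to it (author, job title, company, date) are placeholders, so swap in your own details.

```python
import json


def build_article_jsonld(headline, author_name, author_title, publisher, date_published):
    """Build a schema.org Article object that names its author and publisher."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,  # ISO 8601 date, e.g. 2025-01-15
        "author": {
            "@type": "Person",
            "name": author_name,
            "jobTitle": author_title,
        },
        "publisher": {"@type": "Organization", "name": publisher},
    }


# All values below are hypothetical examples.
article = build_article_jsonld(
    headline="What a Bathroom Renovation Costs in Pittsburgh",
    author_name="Jane Doe",
    author_title="General Contractor",
    publisher="Example Renovations LLC",
    date_published="2025-01-15",
)

# Paste the printed JSON inside a <script type="application/ld+json"> tag.
print(json.dumps(article, indent=2))
```

The markup supports the visible byline and bio; it does not replace them.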
Content That Performs Across All Search Modes
You don’t need three separate strategies for voice, visual, and AI. You need one unified approach that blends clarity, context, and structure into everything you publish.
1. Clarity
Use natural language and straightforward writing. Think subject-verb-object sentences. Avoid complex industry jargon.
Instead of: “Our platform leverages scalable architecture to maximize cross-functional deliverables.”
Say: “Our platform helps marketing teams create and publish content faster.”
2. Context
Never assume your audience—or the algorithm—knows what you mean. Add context. Explain what’s in the photo. Describe who the service is for. Say why it matters.
If you post a photo of a deck renovation, accompany it with text like:
“This covered deck in Mt. Lebanon features composite wood, built-in lighting, and a ceiling fan—perfect for Pittsburgh’s changing seasons.”
This helps both humans and machines categorize your content properly.
3. Structure
Use consistent headings (H1, H2, H3). Break long blocks of text into digestible sections. Add meta descriptions, title tags, and image alt text. And use schema markup wherever possible.
This makes your content easy to navigate and easy for machines to extract relevant snippets from.
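A quick structural check can catch the most common gaps before you publish. The Python sketch below uses only the standard library to flag skipped heading levels, images without alt text, and a missing meta description. The class name StructureAudit and the sample page are assumptions for illustration. Note that it also flags empty alt text, which is valid for purely decorative images, so review the results by hand.

```python
from html.parser import HTMLParser


class StructureAudit(HTMLParser):
    """Collect heading levels, images without alt text, and the meta description."""

    def __init__(self):
        super().__init__()
        self.heading_levels = []
        self.images_missing_alt = []
        self.has_meta_description = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.heading_levels.append(int(tag[1]))
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            self.images_missing_alt.append(attrs.get("src", ""))
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.has_meta_description = bool((attrs.get("content") or "").strip())

    def skipped_levels(self):
        """Return (previous, current) pairs where a heading level was skipped."""
        pairs = zip(self.heading_levels, self.heading_levels[1:])
        return [(a, b) for a, b in pairs if b > a + 1]


# Hypothetical page: jumps from H1 to H3, has one image with no alt text,
# and has no meta description.
page = """
<html><head><title>Deck Renovation</title></head>
<body>
<h1>Covered Deck in Mt. Lebanon</h1>
<h3>Materials</h3>
<img src="covered-deck-mt-lebanon.jpg" alt="Covered composite deck with built-in lighting">
<img src="IMG123.jpg">
</body></html>
"""

audit = StructureAudit()
audit.feed(page)
print(audit.skipped_levels())       # [(1, 3)]
print(audit.images_missing_alt)     # ['IMG123.jpg']
print(audit.has_meta_description)   # False
```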
Don’t Forget Local Search
Voice, visual, and AI search all integrate location data. When someone asks Siri for a nearby solution or takes a picture of a cracked driveway and asks how to fix it, local businesses can show up—if they’re optimized.
Tips for local multimodal optimization:
- Keep your Google Business Profile accurate and updated.
- Mention city and neighborhood names in headings and image metadata.
- Embed maps or geo-based service info in service pages.
- Use schema types like LocalBusiness and Place.
Local results depend heavily on proximity and on consistent NAP (Name, Address, Phone) details across your listings, and AI-generated answers that draw on those listings are likely to reflect the same signals.
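NAP consistency is simple to spot-check yourself. The Python sketch below compares how a business appears in several places after stripping punctuation and formatting differences. The function names, the sample listings, and the US-style ten-digit phone handling are all assumptions for illustration.

```python
import re


def _clean(text):
    """Lowercase, drop punctuation, and collapse repeated whitespace."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return " ".join(text.split())


def normalize_nap(name, address, phone):
    """Reduce a listing to a comparable (name, address, phone) tuple."""
    digits = re.sub(r"\D", "", phone)
    # Assumption: US numbers, so keep the last 10 digits and drop a leading +1.
    return (_clean(name), _clean(address), digits[-10:])


def inconsistent_sources(listings):
    """Return the sources whose NAP differs from the first listing."""
    sources = list(listings)
    reference = normalize_nap(*listings[sources[0]])
    return [s for s in sources[1:] if normalize_nap(*listings[s]) != reference]


# Hypothetical listings for the same business in three places.
listings = {
    "website": ("Example Renovations LLC", "123 Main St., Pittsburgh, PA 15222", "(412) 555-0100"),
    "business_profile": ("Example Renovations LLC", "123 Main St, Pittsburgh, PA 15222", "+1 412-555-0100"),
    "directory": ("Example Renovations", "123 Main Street, Pittsburgh, PA 15222", "412.555.0100"),
}

print(inconsistent_sources(listings))  # ['directory']
```

The check is deliberately strict: it treats “St” and “Street” as different, so a flagged source is a prompt to look at the listing, not proof of an error.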
The New SEO Mindset: Think Like a Machine, Write Like a Human
SEO used to be about ranking. Today, it’s about being understood and being visible across formats.
Machines don’t just crawl your pages. They interpret your content, summarize it, and sometimes display it without linking to you at all. Your job is to become the trusted source they pull from.
Think about how your content appears:
- When spoken aloud by a voice assistant.
- When shown visually in an AI-enhanced search result.
- When summarized by ChatGPT as part of a buying guide.
If your content isn’t structured, credible, and clear, it won’t make the cut.
Your Next Step in Search Strategy
Multimodal search isn’t just the future—it’s already here. Voice assistants, visual discovery tools, and AI search engines are changing how people find and interact with information.
To stay ahead:
- Make your content conversational for voice.
- Make your images contextual for visual search.
- Make your writing structured and trustworthy for AI.
Your next customer might never type a word into a search box. But they might speak a question aloud, upload an image, or ask an AI for advice.
Will your business show up?
