
Having spoken to hundreds of companies over the past year about their SEO strategies, including customers like Notion, ProductHunt, Tome, and Clay, I’d estimate that 90% of companies investing in SEO are not investing in programmatic SEO. Instead, they deploy nearly all their budget into traditional SEO, hiring agencies, writers, and consultants to write content piece-by-piece. This isn’t surprising — programmatic SEO has historically been a niche growth marketing playbook leveraged primarily by engineering-driven growth teams at companies like Zapier, Canva, Airbnb, and Pinterest. These companies typically use programmatic SEO to target extremely narrow and highly structured search patterns (e.g., “Templates for [X]” or “How to connect [X] and [Y]”). Any attempt to go beyond those patterns produced output that came across as rigid, artificial, and inferior in quality to what a human would write.

daydream was built on the premise that a few key advancements in large language models are making it possible for programmatic content to exceed human writing quality. If this happens, we predict that nearly all traditional SEO spend will be re-distributed towards programmatic SEO. Here’s why:

1. LLMs are getting better at reasoning while humans are not

While the average human writer isn’t experiencing a step-function jump in writing and problem-solving ability yearly, LLMs are. Consider the jump from GPT-3.5 to GPT-4: GPT-3.5 scored in the 10th percentile on the Uniform Bar Exam, while GPT-4 scored in the 90th percentile.

While details on GPT-5 aren’t available yet, it is expected to display a similarly large improvement in reasoning capabilities and to be released at some point in 2024. Given this pace of improvement, model intelligence will not be the limiting factor when deciding whether a particular piece of content should be written by AI or by a human.

2. It’s becoming easier for LLMs to absorb context

Imagine asking a freelance writer to produce an article on your company without access to its internal documentation. The freelancer would also be banned from reading anything on the internet published within the last year. Obviously, the quality of the output would be very poor.

This was the state of AI writing when GPT-3 originally came out. There was no easy way to provide the models with access to your company’s documentation, Slack history, call recordings, and other areas of stored context. There was also a training date cut-off, which for GPT-3 was originally October 2019.

However, over the last four years, a few key developments have drastically improved GPT’s ability to absorb context:

Support for fine-tuning

The introduction of fine-tuning for GPT-3.5 Turbo created a more formal and structured way to adapt GPT to specific data sets and purposes, such as internal search applications. This made it much easier to train GPT on your own unique, private data.
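
For a concrete sense of what that workflow looks like, here’s a minimal sketch using OpenAI’s Python SDK. The file name and its contents are placeholders for your own training examples:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload a JSONL file of example conversations drawn from your own private docs.
training_file = client.files.create(
    file=open("company_writing_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Start a fine-tuning job against GPT-3.5 Turbo using that data.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)
```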

Multimodal support is here

With the introduction of GPT-4o, the model now “accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs”, drastically widening the scope of resources and assets you can feed GPT.
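
As an illustration, here’s a minimal sketch of sending mixed text-and-image input to GPT-4o through OpenAI’s Python SDK. The image URL and the prompt are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Send a text instruction and an image in the same request to GPT-4o.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Summarize what this product screenshot shows in two sentences."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/product-screenshot.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```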

RAG solves for real-time internet access

As for the training cut-off, RAG (retrieval-augmented generation) allows LLMs to reference and incorporate third-party knowledge bases before rendering a response. This is what ChatGPT currently does for queries that require online research: it pulls from Bing’s search index to render a response. Perplexity does something similar with its own index (and apparently part of Google’s as well), and it now offers its own “Online LLMs”, which can leverage up-to-date information when forming a response, directly solving for the training cut-off limitation seen in the past.
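
To make the mechanics concrete, here’s a toy sketch of the retrieval step. The knowledge base is invented for illustration, and real systems replace the naive keyword scorer with embeddings, a vector database, or a live search index:

```python
# A toy retrieval-augmented generation loop: retrieve the most relevant
# snippets from a small knowledge base, then prepend them to the prompt.

KNOWLEDGE_BASE = [
    "Acme Corp launched its v2 API in March, adding webhook support.",
    "Acme Corp's pricing changed in June: the Pro plan is now $49/month.",
    "The Acme SDK supports Python, TypeScript, and Go.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    query_terms = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(query_terms & set(d.lower().split())))
    return scored[:k]

def build_prompt(query: str) -> str:
    """Stuff the retrieved context into the prompt ahead of the question."""
    context = "\n".join(retrieve(query, KNOWLEDGE_BASE))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How much does the Pro plan cost?"))
```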

Context windows are increasing

A context window refers to the amount of text an LLM can receive as input for a particular query. A larger context window allows the model to maintain coherence across longer passages of text, which matters when you want to provide large bodies of text as input. When GPT-3 came out, the initial context window was around 2K tokens: enough to process the equivalent of a long-form blog post, but certainly not enough to parse a book.

Within the span of about a year, context windows have expanded drastically. GPT-4 Turbo now handles 128K tokens, while Gemini 1.5 Pro can handle 1M tokens. That means you can now fit several books’ worth of content into Gemini’s context window without an issue.
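
To put those numbers in practical terms, here’s a small sketch that counts tokens with OpenAI’s tiktoken library to check whether a document fits in a 128K-token window. The file name and the output buffer are assumptions for illustration:

```python
import tiktoken  # OpenAI's tokenizer library

# cl100k_base is the encoding used by the GPT-4 family of models.
enc = tiktoken.get_encoding("cl100k_base")

def fits_in_context(text: str,
                    context_window: int = 128_000,
                    reserved_for_output: int = 4_000) -> bool:
    """Check whether a document fits in the model's context window,
    leaving some room for the generated response."""
    n_tokens = len(enc.encode(text))
    return n_tokens + reserved_for_output <= context_window

document = open("entire_knowledge_base.txt").read()
print(fits_in_context(document))
```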

3. Programmatic content can react faster to real-world changes

Unlike human-written content, which requires a writer to manually edit it every time something needs to be updated, programmatic content can be created with triggers that react to real-world changes automatically.

For example, suppose you’re a company like Carta, PitchBook, or CB Insights with ample fundraising round data. Rather than publishing quarter-by-quarter reports on the state of the industry, you could leverage daydream to publish sector-specific reports that update weekly. Every time your data set is refreshed, the report is rebuilt using the most recent data.
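
The trigger mechanics aren’t described here, but a simple version of the idea might look like the sketch below. The data file, the schedule, and the report prompt are all hypothetical; this is not daydream’s actual implementation:

```python
import hashlib
import json
import time

DATA_PATH = "fundraising_rounds.json"   # placeholder export of your data set
last_fingerprint = None

def fingerprint(path: str) -> str:
    """Hash the data file so a refresh can be detected."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def rebuild_report(path: str) -> str:
    """Turn the latest data into a prompt for the content model."""
    rounds = json.load(open(path))
    return (
        "Write a weekly sector-specific funding report based on these rounds:\n"
        + json.dumps(rounds[-50:], indent=2)
    )

while True:
    current = fingerprint(DATA_PATH)
    if current != last_fingerprint:
        prompt = rebuild_report(DATA_PATH)
        # ...send `prompt` to the LLM and publish the regenerated report...
        last_fingerprint = current
    time.sleep(3600)  # check hourly for a data refresh
```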

Conclusion

Programmatic SEO previously had limited adoption because it could not answer the vast majority of search queries while maintaining a high quality bar. As LLMs continue to become more intelligent, absorb context more efficiently, and retain their advantage over humans in reacting faster to real-world changes, programmatic SEO will inevitably become the better choice for businesses investing in SEO, not because it’s cheaper or faster, but because the quality of the output is better than what you’d expect from a human being.

Thanks for reading! If you liked what you read here, please email me at [email protected].

© 2024 daydream Labs, Inc. All rights reserved.