Optimize your Content with TF-IDF Analysis


October 21, 2016 in Strategy by David Zimmerman

Whenever I coach a client about writing SEO content for their website, I’m hesitant to mention a “keyword.” When I do, they tend to overuse that phrase.

I’ve tried to compensate by encouraging them to write about a “topic” rather than a “keyword.” This helps, but some still overuse the words for which they hope to “rank.”

Attempting to correct for this, I started asking clients to write at least 800 words on each page. It’s hard to write junk when you have to write that much. This has the added benefit of naturally including long-tail keyword variations.

Sometimes, however, content still seems too SEOed. That’s when I give them the most useless piece of advice I can: “include more natural variations.” I know that ambiguous advice like that is frustrating, but I can’t think of a clearer way to explain what I need. Or, more importantly, what Google wants to see.

Let’s say we’re writing a page about “Disney t-shirts.” Since everyone knows about Disney, it’s easy to predict the natural variations of that phrase. For instance, any page about Disney would be unusual, if it didn’t mention:

Mickey Mouse
Walt
clothes
etc.

So, if you’re writing about Disney products, the advice to “include more natural variations” is easy. It’s much more difficult when you’re writing about “blue variegated widgets.” How can anyone write 800 words about that topic? What are the natural variations of that phrase?

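To find this information, we need to use TF-IDF. This stands for “Term Frequency – Inverse Document Frequency.” It scores each word in a document by how often the word appears there, discounted by how common the word is across a set of other documents. Words that score high are frequent on the page but rare elsewhere, so they are the ones that characterize the topic.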
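To make the idea concrete, here is a minimal sketch of one common textbook form of TF-IDF, using only the Python standard library. It is an illustration, not how Google or any particular tool computes its scores. The function names (`tokenize`, `tf_idf`) and the sample pages are made up for this example, and real tools typically add smoothing, stop-word handling, and much larger document sets.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / total terms in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample pages, for illustration only.
pages = [
    "Disney t-shirts with Mickey Mouse for kids",
    "Blue widgets and variegated widgets for sale",
    "Cheap t-shirts for sale",
]
scores = tf_idf(pages)

# Highest-weighted terms on the first page.
top_terms = sorted(scores[0], key=scores[0].get, reverse=True)[:3]
print(top_terms)
```

In this sketch, a word that appears on every page (such as “for”) gets a weight of zero, while a word that appears on only one page (such as “disney”) gets the highest weight for its frequency.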

Sound complicated? This is why I recently invested in a tool that helps with TF-IDF analysis: text-tools.net (aff. link). It’s been a huge help. The founder, Michael Kaiser, has kindly subjected himself to an interview. Here’s what he told me about TF-IDF analysis. I think, as you read this, you’ll agree that this tool is invaluable. In other words, this tool will pay for itself in no time.

How would you explain TF-IDF to a 5 year old?

TF-IDF tells us what a text is about. It quantifies important terms in a text. TF-IDF answers the question: “What should I write about a topic to send Google the signals that my text is about that topic?”

Why is TF-IDF better than measuring the “keyword density” on a page?

Keyword density is not able to show relations between terms. TF-IDF shows synonyms, variants and co-occurrences of a given term with their corresponding weights. Those weights are more flexible than simple keyword density values since they take other documents into account.

For example, the keyword density of the phrase “TF-IDF” on this page is: 3.7%
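For comparison, keyword density is a much simpler number. The sketch below assumes the most common definition (occurrences of the phrase divided by the total word count, as a percentage); tools differ in how they tokenize and count, so it is not meant to reproduce the figure above. The function name `keyword_density` and the sample sentence are made up for illustration.

```python
import re

WORD = re.compile(r"[a-z0-9]+(?:['-][a-z0-9]+)*")


def keyword_density(text, phrase):
    """Return occurrences of `phrase` per 100 words of `text`."""
    words = WORD.findall(text.lower())
    target = WORD.findall(phrase.lower())
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase's words appear in sequence.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return 100.0 * hits / len(words)


# Made-up sample text, for illustration only.
sample = "TF-IDF is useful. Many SEOs still ignore TF-IDF."
print(round(keyword_density(sample, "TF-IDF"), 1))
```

Note that the calculation looks at one page only. Nothing in it refers to other documents, which is the gap TF-IDF fills.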

Here’s the Text-Tools.net TF-IDF analysis for “TF-IDF”:

This shows the top 20 words used on pages about TF-IDF. This tool can show you the top 500, but this gives you the idea.

I can also analyze this page, in comparison with other pages:

The yellow chart represents this page. It tells me that I am missing some words and that other words (in the red range) are used too often.

This tells me:

What words should I use on this page, based on what the other pages ranking for this phrase use?
Am I using some words too often?
Am I neglecting words or phrases that other pages include?

Keyword density can’t do that.
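The three questions above can be sketched as a simple comparison between one page and a set of competing pages. This is only an illustration of the idea, not the method Text-Tools.net uses. The function names, the `overuse_factor` threshold, and the sample texts are all assumptions made for this example.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-z0-9]+(?:['-][a-z0-9]+)*")


def term_shares(text):
    """Map each term to its share of all words in the text."""
    words = WORD.findall(text.lower())
    total = len(words)
    if total == 0:
        return {}
    return {term: count / total for term, count in Counter(words).items()}


def compare_page(page, competitors, overuse_factor=2.0):
    """Return (missing, overused) terms for a page versus competitor pages.

    missing:  terms every competitor uses but the page does not
    overused: terms whose share on the page exceeds overuse_factor times
              the average share across competitors (hypothetical threshold)
    """
    if not competitors:
        return [], []
    page_shares = term_shares(page)
    comp_shares = [term_shares(text) for text in competitors]

    common = set.intersection(*(set(shares) for shares in comp_shares))
    missing = sorted(common - set(page_shares))

    overused = []
    for term, share in page_shares.items():
        average = sum(s.get(term, 0.0) for s in comp_shares) / len(comp_shares)
        if average > 0 and share > overuse_factor * average:
            overused.append(term)
    return missing, sorted(overused)


# Made-up sample texts, for illustration only.
my_page = "widgets widgets widgets blue widgets"
other_pages = [
    "blue variegated widgets for gardens",
    "variegated widgets in blue and green",
]
missing, overused = compare_page(my_page, other_pages)
print("missing:", missing)
print("overused:", overused)
```

On real pages, common words such as “the” and “and” would dominate the results, so a practical version would filter stop words or weight terms by TF-IDF rather than by raw share.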

What’s the most surprising thing you’ve learned after doing a TF-IDF analysis of a web page?

With TF-IDF it is possible to achieve impressive rankings for long-tail keywords, sometimes without many links. If you come up with a comprehensive and holistic text about a long-tail topic, you will rank.

Also, the longer the text, the better, with few exceptions. A good example is Wikipedia. They cover one topic holistically. They present more points of view. They cover related topics, as well. This results in pretty good TF-IDF values, which Google seems to favor. Wikipedia has a lot of links & authority, too, but that is a result of the comprehensive content as well.

What are the limitations of using TF-IDF for on-page SEO analysis?

TF-IDF-based services (like Text-Tools.net) reverse-engineer something we think Google is using. The results show that we might be on track, but Google is free to change core parameters of their algorithm. This could render TF-IDF analysis useless. Chances are that they will continue to use TF-IDF (or similar variants of it) to classify content.

Of course, the algorithm is always changing. Over-optimizing may lead to penalties in the future.

Either way, TF-IDF is a good way to understand your content for better ranking, especially for the long tail.

TF-IDF has been big in Europe for a while. What took SEOs in the U.S. so long to catch on to TF-IDF?

Honestly, I do not know. Lots of SEOs stuck to keyword density for a long time. Some still do. TF-IDF is far more complex than keyword density. Why should an SEO deal with a complicated concept, when his keyword density stuff still seems to “work”? Oh, and social media is so much more sexy than boring computer science.

What’s your process for using Text Tools for better SEO performance?

Consider using Text Tools as part of your content creation and incorporate it into your content life cycle.

Write & publish a text on a given topic.
Wait a couple of weeks and analyze the term to get ideas about what to add to your text.
Add some more content about those missing aspects a couple of weeks later.
Rinse, repeat.

With this process you will constantly get fresh and interesting content. Your content is evolving. It becomes a more and more complete and holistic text about your topic.

Not to rub it in, but if you’d signed up for my email list, you would have received a discount code for 25% off recurring plans. Oh well, there’s always next time…

