Editly

Published: 2026-06-08

Word Frequency Analysis: How to Find Overused Words

Find the words you repeat too often with a word frequency counter online. Learn the 0.5% tic threshold, what's normal vs. overused, and how to fix it.

[Figure: Word frequency analysis ranking table showing overused content words rising above the 0.5% repetition threshold]

The fastest way to find the words you overuse is to stop trusting your own ear. You can't hear your own verbal tics — that's what makes them tics. Word frequency analysis counts every unique word, ranks them by how often they appear, and shows each as a percentage of your total, so the repetition you can't feel becomes a number you can't miss.

Here's the threshold that matters: any ordinary content word sitting above 0.5% of your total word count is worth a hard look. In a 1,000-word essay, that's just five appearances. Below is how to run the analysis, read it correctly, and fix what it surfaces.

What "Overused" Actually Means (The Numbers)

Not all repetition is bad. English is wildly top-heavy by design — a handful of function words make up a huge share of any text. This is Zipf's law: the most common word in a corpus appears roughly twice as often as the second most common, three times as often as the third, and so on. In natural English, "the" alone is about 7% of all words. That's not a problem; it's grammar.
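Stated as a formula, in the usual idealized form of the law (a textbook approximation, not something measured from your draft), the frequency f of the word at rank r is:

```latex
f(r) \approx \frac{f(1)}{r}
\qquad\Longrightarrow\qquad
f(2) \approx \tfrac{1}{2}\, f(1), \quad f(3) \approx \tfrac{1}{3}\, f(1)
```

With "the" at about 7%, the idealized curve puts the second-ranked word near 3.5% and the third near 2.3%. Real text only follows this loosely, but it explains why the top of any unfiltered ranking is grammar.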

[Figure: Zipf's law curve: word frequency plotted by rank, dropping steeply from a tall 'the' bar at 7 percent through 'of' and 'and', then flattening into a long tail of content words below the 0.5 percent threshold line]

The shape above is the whole story in one curve. The first few bars — the function words — tower over everything, then the line plunges and flattens into a long tail. Your content words live in that tail. When one of them climbs up out of the tail and crosses the 0.5% line, it's standing somewhere it wasn't meant to be. That's the visual signature of a verbal tic.

So the trick is knowing which numbers are normal and which are a red flag.

| Word type | Typical frequency | Verdict |
|---|---|---|
| Function words (the, of, and, to) | 2–7% | Normal: this is Zipf's law, leave it alone |
| Your topic keyword / character name | 1–3% | Expected: context-dependent, usually fine |
| Generic content word (just, really, things) | > 0.5% | Likely a tic: investigate |
| Filler adverb (very, actually, basically) | any cluster | Cut on sight: these add wordiness, not meaning |

The reason 0.5% is the line: at that rate a reader starts to notice the word, even subconsciously. It's the point where repetition crosses from invisible to grating. A frequency counter that shows percentages — not just raw counts — lets you apply this threshold regardless of document length.
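The length-independence is easy to see in code. The following is a minimal sketch; the helper names (share, isLikelyTic, formatPercent) are hypothetical, and the strict "greater than" comparison is one assumed reading of "above 0.5%".

```javascript
// 0.5% threshold from the article, expressed as a fraction.
const TIC_THRESHOLD = 0.005;

// Fraction of the whole document taken up by one word.
function share(count, totalWords) {
  return totalWords > 0 ? count / totalWords : 0;
}

// True when the word's share is strictly above the threshold.
function isLikelyTic(count, totalWords, threshold = TIC_THRESHOLD) {
  return share(count, totalWords) > threshold;
}

// Render a fraction as a percentage with two decimals.
function formatPercent(fraction) {
  return (fraction * 100).toFixed(2) + "%";
}

// The same raw count means different things at different lengths.
console.log(formatPercent(share(8, 1000)), isLikelyTic(8, 1000));   // 0.80% true
console.log(formatPercent(share(8, 10000)), isLikelyTic(8, 10000)); // 0.08% false
```

Eight hits is a tic in a 1,000-word post and invisible in a 10,000-word story, which is why the raw count alone tells you little.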

To run your own numbers, paste your draft into our Word Frequency Counter. It runs entirely in your browser's JavaScript engine, so your text never touches a server, and it ranks every word with live counts and percentages in one pass.

How to Read the Results Without Fooling Yourself

A ranking is only useful if you interpret it correctly. Three settings change what you see, and the defaults are tuned for exactly this task.

1. Keep "Exclude common words" on. The counter filters a built-in English stop-word list — the, and, of, is, pronouns, and ~80 other high-frequency function words. Without this filter, your top 10 is just grammar and you learn nothing. With it on, the first real signal — your most-repeated content word — jumps to the top of the list.

2. Set the minimum word length to 4. This skips most filler in one move (it, was, but) while keeping the words that carry meaning. Bump it to 5 or 6 if your top results are still cluttered with short connective words.

3. Read the percentage column, not the count. A word appearing 40 times sounds alarming until you realize the document is 30,000 words long — that's 0.13%, perfectly fine. The percentage is calculated against your total word count, including the stop words you filtered out of the display, so it's an honest denominator. Sort by percentage and start from the top.
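The three settings above can be sketched in a few lines. This is an illustration under stated assumptions, not the counter's actual source: rankWords, the tiny STOP_WORDS set, and the simplified regex tokenizer are all hypothetical stand-ins.

```javascript
// Tiny illustrative stop-word list; a real list is much longer (assumption).
const STOP_WORDS = new Set(["the", "and", "of", "is", "a", "to", "it", "was", "but", "in"]);

function rankWords(text, { minLength = 4, excludeCommon = true } = {}) {
  // Simplified tokenizer: runs of letters, allowing inner apostrophes (don't).
  const tokens = text.toLowerCase().match(/\p{L}+(?:['’]\p{L}+)*/gu) ?? [];
  // Denominator is ALL words, including the ones filtered from the display.
  const total = tokens.length;

  const counts = new Map();
  for (const token of tokens) {
    if (excludeCommon && STOP_WORDS.has(token)) continue; // setting 1
    if (token.length < minLength) continue;               // setting 2
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }

  // Setting 3: report a percentage of the full total, highest first.
  return [...counts.entries()]
    .map(([word, count]) => ({ word, count, percent: (count / total) * 100 }))
    .sort((a, b) => b.count - a.count || a.word.localeCompare(b.word));
}

const sample = "The plan looked fine and the team looked ready, but it looked rushed.";
console.log(rankWords(sample)[0].word); // looked
```

In the sample, "looked" appears 3 times out of 13 total words, so its percentage is computed against 13 even though "the," "and," "but," and "it" never appear in the ranking.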

What you're scanning for: a generic verb or adjective punching above 0.5%. "Showed," "looked," "important," "great," "thing" — these are the usual suspects. When one of them outranks words that actually carry your argument, you've found a tic.

The Words Writers Repeat Without Knowing

Some patterns show up in nearly every draft. Run the analysis on enough text and you start to predict them.

  • Crutch verbs: make, get, go, put, take. They're so flexible they sneak in everywhere. "Make a decision," "get an understanding," "take a look" — each is a stronger single verb hiding behind a weak one.
  • Throat-clearing adverbs: very, really, actually, basically, literally. These cluster. If "very" appears 15 times, you don't have an emphasis problem — you have 15 places where a precise word would beat "very + vague word."
  • Hedge words: just, quite, somewhat, perhaps, maybe. One or two soften a sentence; a dozen make your writing sound unsure of itself.
  • Echo nouns: thing, stuff, way, area, aspect. Placeholder nouns you meant to replace and forgot.

This is exactly the raw material for cutting length. Once you've identified an overused crutch verb, the techniques in How to Reduce Word Count turn each hit into a tighter, stronger sentence — replacing "she made a decision" with "she decided" both removes the tic and drops the word count.
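Because these words cluster, it can help to total them by group rather than one at a time. A minimal sketch, assuming you already have a Map of lowercase word counts; TIC_GROUPS and summarizeGroups are hypothetical names, the lists are copied from the four groups above, and only the base forms are matched (so "made" or "got" would need to be added by hand).

```javascript
// Word lists taken from the four groups described in the article.
const TIC_GROUPS = {
  crutchVerbs: ["make", "get", "go", "put", "take"],
  throatClearing: ["very", "really", "actually", "basically", "literally"],
  hedges: ["just", "quite", "somewhat", "perhaps", "maybe"],
  echoNouns: ["thing", "stuff", "way", "area", "aspect"],
};

// counts: Map of lowercase word -> occurrences (assumed input shape).
function summarizeGroups(counts, totalWords) {
  const summary = {};
  for (const [group, words] of Object.entries(TIC_GROUPS)) {
    // Sum the hits for every word in the group; missing words count as 0.
    const hits = words.reduce((sum, word) => sum + (counts.get(word) ?? 0), 0);
    const percent = totalWords > 0 ? (hits / totalWords) * 100 : 0;
    summary[group] = { hits, percent };
  }
  return summary;
}

const counts = new Map([["very", 15], ["really", 6], ["just", 9], ["thing", 4]]);
console.log(summarizeGroups(counts, 3000).throatClearing.hits); // 21
```

A group total makes the cluster visible even when no single adverb crosses the threshold on its own.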

Frequency Analysis vs. Find & Replace — Use Both

Finding the overused word is step one. Fixing it is step two, and they need different tools.

The frequency counter is your diagnosis — it tells you looked appears 23 times in a 4,000-word chapter (0.58%, over the line). It does not, and should not, blindly swap them. Replacing all 23 with the same synonym just trades one tic for another.

For the fix, jump to Find & Replace, which highlights every occurrence in context so you can decide case by case — some "looked"s become "studied," some "glanced," some get cut entirely. It supports full regex with the u flag for Unicode text, so you can match whole words only (/\blooked\b/giu) and skip partial matches inside other words. Diagnose with frequency, treat with find-and-replace. Neither does the other's job well.

One footgun worth knowing if you adapt these patterns: in JavaScript, \b is ASCII-only, even with the u flag. It works perfectly for looked, but the engine doesn't treat Cyrillic or umlaut letters as "word characters," so it sees no boundary between a space and such a letter. /\bдивився\b/giu finds nothing in ordinary running text, and /\büber\b/giu fails the same way because the word starts with ü; a word like schön, with the accented letter in the middle, matches only because both of its ends are ASCII. For solid non-English matching, skip \b and use lookarounds with Unicode property escapes instead, e.g. /(?<![\p{L}\p{M}\p{N}_])looked(?![\p{L}\p{M}\p{N}_])/giu, which requires that no letter, combining mark, digit, or underscore sits on either side of the word.
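Here is a small sketch of a Unicode-aware whole-word matcher built on lookarounds and property escapes. The helper names (escapeRegExp, wholeWord) are hypothetical, and this is not necessarily how the Find & Replace tool works internally; it needs an engine with lookbehind support, which current browsers and Node.js have.

```javascript
// Escape regex metacharacters so the search word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Whole-word matcher: the word must not touch any Unicode letter,
// combining mark, digit, or underscore on either side.
function wholeWord(word) {
  const wordChar = "[\\p{L}\\p{M}\\p{N}_]";
  return new RegExp(`(?<!${wordChar})${escapeRegExp(word)}(?!${wordChar})`, "giu");
}

const text = "Я дивився фільм. She looked up, overlooked it, and looked away.";

console.log(text.match(/\bдивився\b/giu));            // null: \b sees no boundary
console.log(text.match(wholeWord("дивився")));        // [ 'дивився' ]
console.log(text.match(wholeWord("looked")).length);  // 2 ("overlooked" is skipped)
```

Each call builds a fresh RegExp, so the g flag's lastIndex state never leaks between searches.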

This diagnose-then-treat loop is the backbone of a chapter audit. If you're editing fiction, the chapter length guide walks through pairing per-chapter word counts with frequency analysis to catch tics that cluster in specific scenes — action sequences are notorious for stacking the same three verbs.

A Length-Aware Repetition Cheat Sheet

The 0.5% threshold is constant, but it's easier to act on as a raw count. Here's what "overused" looks like at common document lengths.

| Document | 0.5% threshold | What it means |
|---|---|---|
| 500-word essay | ~3 hits | Tight: repeating a content word 3× is already noticeable |
| 1,000-word blog post | ~5 hits | The classic case; 5+ of any non-keyword word is a tic |
| 2,500-word chapter | ~13 hits | Verbs are the usual offenders here |
| 10,000-word short story | ~50 hits | Character names get a pass; generic verbs don't |
| 80,000-word manuscript | ~400 hits | Run chapter by chapter; book-wide totals hide local clusters |

The manuscript row holds the most important caveat: a word can look fine across an entire novel and still be jammed into a single chapter. Book-wide frequency averages out local repetition. That's why editors run the analysis per chapter, not per book — a verb at 0.1% across 80,000 words might be at 1.5% in the one scene where you lost control of it.
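A per-chapter pass might look like the sketch below. The input shape (an array of { title, text } objects), the function name perChapterShare, and the simplified regex tokenizer are assumptions for illustration only.

```javascript
// chapters: array of { title, text } (assumed input shape).
function perChapterShare(chapters, word, thresholdPercent = 0.5) {
  const target = word.toLowerCase();
  return chapters.map(({ title, text }) => {
    // Simplified tokenizer: runs of letters, allowing inner apostrophes.
    const tokens = text.toLowerCase().match(/\p{L}+(?:['’]\p{L}+)*/gu) ?? [];
    const hits = tokens.filter((token) => token === target).length;
    const percent = tokens.length > 0 ? (hits / tokens.length) * 100 : 0;
    return { title, hits, total: tokens.length, percent, flagged: percent > thresholdPercent };
  });
}

// Synthetic chapters: one hit in 303 words versus ten hits in 120 words.
const chapters = [
  { title: "One", text: "She walked home. ".repeat(100) + "He looked away." },
  { title: "Two", text: "He looked back. ".repeat(10) + "Nothing moved here. ".repeat(30) },
];

for (const row of perChapterShare(chapters, "looked")) {
  console.log(row.title, row.hits, row.total, row.flagged);
}
// One 1 303 false
// Two 10 120 true
```

Across both chapters combined the word would sit at 11 hits in 423 words, but the per-chapter view shows that nearly all of it lives in the second one.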

Why the Tokenizer Matters

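Most free frequency counters split text on spaces: text.split(' '). It's the junior approach, and it breaks the moment your writing contains anything interesting. Punctuation and quotation marks stay glued to the word, so "looked," and "looked" are counted as two different words; line breaks and tabs aren't treated as separators; a compound like "mother-in-law" becomes one opaque token instead of being counted sensibly; an em-dash—like this—glues two words together; and space-less scripts such as Chinese or Japanese come back as one giant token per sentence.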

The Word Frequency Counter tokenizes with Intl.Segmenter, the word-boundary detector defined in ECMA-402 (the ECMAScript Internationalization API) and built into every modern browser; in practice it is based on Unicode's text segmentation rules (UAX #29). It correctly handles contractions, hyphenated compounds, accented characters, and even Chinese or Japanese, which have no spaces between words at all. If your frequency data is wrong because the tokenizer is naïve, every percentage above is wrong too: accuracy starts at the segmentation step.
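The difference is easy to see side by side. A minimal sketch, assuming an environment where Intl.Segmenter is available (current browsers and Node.js); tokenizeWords is a hypothetical helper, not the counter's own code.

```javascript
// Word tokenizer built on Intl.Segmenter with word granularity.
function tokenizeWords(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const words = [];
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    // isWordLike is false for spaces and punctuation, so those are dropped.
    if (isWordLike) words.push(segment.toLowerCase());
  }
  return words;
}

const line = "Don't stop—keep going, okay?";

console.log(line.split(" "));
// [ "Don't", 'stop—keep', 'going,', 'okay?' ]

console.log(tokenizeWords(line));
// [ "don't", 'stop', 'keep', 'going', 'okay' ]
```

The space split yields four tokens with punctuation attached, while the segmenter yields five clean words and keeps the contraction whole.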

For the same reason, when you want a precise total word count to sanity-check your percentages against, the Word Counter uses the identical engine, so the two tools always agree.

FAQ

What is word frequency analysis? It's the process of counting how many times each unique word appears in a text and ranking them most-to-least common. The goal is to surface unconscious repetition — the crutch verbs and filler adverbs you lean on without realizing. Showing each word as a percentage of the total separates deliberate keywords from accidental tics.

How do I find overused words in my writing? Paste the text into a word frequency counter, keep the stop-word filter on, and set a minimum word length of 4. Read the top 30 by percentage. Any content word above ~0.5% of your total is a candidate — about 5 hits per 1,000 words. Ignore your topic keyword and character names; target the generic verbs and adjectives that rank unexpectedly high.

What percentage counts as overusing a word? Function words (the, of, and) live at 2–7% and that's normal — it's just how English distributes. Ordinary content words start to feel repetitive above roughly 0.5%, and anything past 1% in a non-keyword role is almost certainly a tic. Fiction gets one exception: a protagonist's name routinely sits at 1–3% without bothering anyone.

Is a word frequency counter the same as keyword density? Same math, opposite goal. Keyword density is SEO — you want a target phrase to appear often enough to signal relevance without tripping spam filters. Frequency analysis is editing — you're hunting repetition to cut. In one you're chasing a number up; in the other you're bringing it down.

Why does the counter ignore words like "the" and "and"? Those are stop words, and they top every list by sheer grammar, drowning out real signal. The tool filters a built-in English stop-word list by default so your most-repeated content word rises to the top instead. Toggle the filter off if you specifically want to study function-word distribution.

Does word frequency analysis work in other languages? Yes — tokenization runs on Intl.Segmenter, which handles accented characters, apostrophes, hyphenated words, and space-less scripts like Chinese correctly, unlike text.split(' '). The built-in stop-word list is English, though, so for other languages turn the stop-word filter off and read the raw ranking.

Frequently Asked Questions

What is word frequency analysis?

Word frequency analysis counts how many times each unique word appears in a text and ranks the results from most to least common. It's the fastest way to catch unconscious repetition — the verbs, adjectives, and filler words you lean on without noticing. A good frequency counter also shows each word as a percentage of the total, so you can tell a deliberate keyword from an accidental verbal tic.

How do I find overused words in my writing?

Paste your text into a word frequency counter, exclude common stop words (the, and, of), and set a minimum word length of 4 to skip filler. Then read the top 30 results. Any content word above roughly 0.5% of your total word count is worth a second look — that's about 5 hits in a 1,000-word essay or 10 in a 2,000-word chapter. Character names and your core topic keyword are expected to rank high; generic verbs and adjectives that rank high are the real targets.

What percentage counts as overusing a word?

For function words like 'the' or 'of', 2–7% is completely normal — that's just how English distributes (Zipf's law). For ordinary content words — verbs, adjectives, adverbs — anything above about 0.5% of your total starts to feel repetitive to readers. A word hitting 1%+ in a non-keyword role is almost always a tic. The exception is fiction: a main character's name routinely sits at 1–3% and nobody notices.

Is a word frequency counter the same as keyword density?

They share the math but serve different goals. Keyword density is an SEO concept: you're checking that a target phrase appears often enough (but not so often it looks spammy). Word frequency analysis is an editing tool: you're hunting for accidental repetition you want to cut. Same percentages, opposite intent: in one you're trying to hit a number, in the other you're trying to bring a number down.

Why does the counter ignore words like 'the' and 'and'?

Those are stop words — high-frequency function words that carry little standalone meaning. The Word Frequency Counter filters a built-in English stop-word list by default so the signal isn't drowned out by 'the' sitting at the top of every list. You can toggle the filter off if you specifically want to see function-word distribution, but for finding overused content words, leave it on.

Does word frequency analysis work in other languages?

Yes. The counter tokenizes with Intl.Segmenter, the word segmenter defined in ECMA-402 (the ECMAScript Internationalization API) and built into modern browsers. It handles apostrophes, accented characters, hyphenated compounds, and even space-less scripts like Chinese and Japanese correctly. The naïve text.split(' ') approach most homegrown counters use returns garbage on any of those. The stop-word list itself is English, so non-English text is best analyzed with the stop-word filter off.
