[{"data":1,"prerenderedAt":550},["ShallowReactive",2],{"blog-en-word-frequency-analysis":3},{"id":4,"title":5,"alt":6,"author":7,"body":8,"category":519,"description":520,"extension":521,"faq":522,"image":536,"meta":537,"navigation":538,"path":539,"publishedAt":540,"seo":541,"stem":542,"tags":543,"__hash__":549},"blog\u002Fen\u002Fword-frequency-analysis.md","Word Frequency Analysis: How to Find Overused Words","Word frequency analysis ranking table showing overused content words rising above the 0.5% repetition threshold","Vibe Apps Pro Team",{"type":9,"value":10,"toc":508},"minimark",[11,20,27,32,39,48,51,54,122,130,141,144,148,151,176,192,202,205,209,212,253,261,265,268,279,299,331,339,341,345,348,420,427,431,438,448,459,463,469,475,481,487,496],[12,13,14,15,19],"p",{},"The fastest way to find the words you overuse is to stop trusting your own ear. You can't hear your own verbal tics — that's what makes them tics. ",[16,17,18],"strong",{},"Word frequency analysis counts every unique word, ranks them by how often they appear, and shows each as a percentage of your total",", so the repetition you can't feel becomes a number you can't miss.",[12,21,22,23,26],{},"Here's the threshold that matters: any ordinary content word sitting above ",[16,24,25],{},"0.5% of your total word count"," is worth a hard look. In a 1,000-word essay, that's just five appearances. Below is how to run the analysis, read it correctly, and fix what it surfaces.",[28,29,31],"h2",{"id":30},"what-overused-actually-means-the-numbers","What \"Overused\" Actually Means (The Numbers)",[12,33,34,35,38],{},"Not all repetition is bad. English is wildly top-heavy by design — a handful of function words make up a huge share of any text. This is Zipf's law: the most common word in a corpus appears roughly twice as often as the second most common, three times as often as the third, and so on. In natural English, \"the\" alone is about ",[16,36,37],{},"7%"," of all words. 
That's not a problem; it's grammar.",[12,40,41],{},[42,43],"img",{"alt":44,"height":45,"src":46,"width":47},"Zipf's law curve: word frequency plotted by rank, dropping steeply from a tall 'the' bar at 7 percent through 'of' and 'and', then flattening into a long tail of content words below the 0.5 percent threshold line",672,"\u002Farticles\u002Fword-frequency-analysis\u002Fsection-1.webp",900,[12,49,50],{},"The shape above is the whole story in one curve. The first few bars — the function words — tower over everything, then the line plunges and flattens into a long tail. Your content words live in that tail. When one of them climbs up out of the tail and crosses the 0.5% line, it's standing somewhere it wasn't meant to be. That's the visual signature of a verbal tic.",[12,52,53],{},"So the trick is knowing which numbers are normal and which are a red flag.",[55,56,57,74],"table",{},[58,59,60],"thead",{},[61,62,63,68,71],"tr",{},[64,65,67],"th",{"align":66},"left","Word type",[64,69,70],{"align":66},"Typical frequency",[64,72,73],{"align":66},"Verdict",[75,76,77,89,100,111],"tbody",{},[61,78,79,83,86],{},[80,81,82],"td",{"align":66},"Function words (the, of, and, to)",[80,84,85],{"align":66},"2–7%",[80,87,88],{"align":66},"Normal — this is Zipf's law, leave it alone",[61,90,91,94,97],{},[80,92,93],{"align":66},"Your topic keyword \u002F character name",[80,95,96],{"align":66},"1–3%",[80,98,99],{"align":66},"Expected — context-dependent, usually fine",[61,101,102,105,108],{},[80,103,104],{"align":66},"Generic content word (just, really, things)",[80,106,107],{"align":66},"> 0.5%",[80,109,110],{"align":66},"Likely a tic — investigate",[61,112,113,116,119],{},[80,114,115],{"align":66},"Filler adverb (very, actually, basically)",[80,117,118],{"align":66},"any cluster",[80,120,121],{"align":66},"Cut on sight — these add wordiness, not meaning",[12,123,124,125,129],{},"The reason 0.5% is the line: at that rate a reader starts to ",[126,127,128],"em",{},"notice"," the 
word, even subconsciously. It's the point where repetition crosses from invisible to grating. A frequency counter that shows percentages — not just raw counts — lets you apply this threshold regardless of document length.",[12,131,132,133,140],{},"To run your own numbers, paste your draft into our ",[16,134,135],{},[136,137,139],"a",{"href":138},"\u002Fword-frequency","Word Frequency Counter"," (it runs entirely in your browser's JavaScript engine, so your text never touches a server), and it'll rank every word with live counts and percentages in one pass.",[142,143],"ad-placeholder",{},[28,145,147],{"id":146},"how-to-read-the-results-without-fooling-yourself","How to Read the Results Without Fooling Yourself",[12,149,150],{},"A ranking is only useful if you interpret it correctly. Three settings change what you see, and the defaults are tuned for exactly this task.",[12,152,153,156,157,161,162,161,165,161,168,171,172,175],{},[16,154,155],{},"1. Keep \"Exclude common words\" on."," The counter filters a built-in English stop-word list — ",[158,159,160],"code",{},"the",", ",[158,163,164],{},"and",[158,166,167],{},"of",[158,169,170],{},"is",", pronouns, and ~80 other high-frequency function words. Without this filter, your top 10 is just grammar and you learn nothing. With it on, the first real signal — your most-repeated ",[126,173,174],{},"content"," word — jumps to the top of the list.",[12,177,178,181,182,161,185,161,188,191],{},[16,179,180],{},"2. Set the minimum word length to 4."," This skips most filler in one move (",[158,183,184],{},"it",[158,186,187],{},"was",[158,189,190],{},"but",") while keeping the words that carry meaning. Bump it to 5 or 6 if your top results are still cluttered with short connective words.",[12,193,194,197,198,201],{},[16,195,196],{},"3. Read the percentage column, not the count."," A word appearing 40 times sounds alarming until you realize the document is 30,000 words long — that's 0.13%, perfectly fine. 
The percentage is calculated against your ",[126,199,200],{},"total"," word count, including the stop words you filtered out of the display, so it's an honest denominator. Sort by percentage and start from the top.",[12,203,204],{},"What you're scanning for: a generic verb or adjective punching above 0.5%. \"Showed,\" \"looked,\" \"important,\" \"great,\" \"thing\" — these are the usual suspects. When one of them outranks words that actually carry your argument, you've found a tic.",[28,206,208],{"id":207},"the-words-writers-repeat-without-knowing","The Words Writers Repeat Without Knowing",[12,210,211],{},"Some patterns show up in nearly every draft. Run the analysis on enough text and you start to predict them.",[213,214,215,226,235,244],"ul",{},[216,217,218,221,222,225],"li",{},[16,219,220],{},"Crutch verbs:"," ",[126,223,224],{},"make, get, go, put, take."," They're so flexible they sneak in everywhere. \"Make a decision,\" \"get an understanding,\" \"take a look\" — each is a stronger single verb hiding behind a weak one.",[216,227,228,221,231,234],{},[16,229,230],{},"Throat-clearing adverbs:",[126,232,233],{},"very, really, actually, basically, literally."," These cluster. If \"very\" appears 15 times, you don't have an emphasis problem — you have 15 places where a precise word would beat \"very + vague word.\"",[216,236,237,221,240,243],{},[16,238,239],{},"Hedge words:",[126,241,242],{},"just, quite, somewhat, perhaps, maybe."," One or two soften a sentence; a dozen make your writing sound unsure of itself.",[216,245,246,221,249,252],{},[16,247,248],{},"Echo nouns:",[126,250,251],{},"thing, stuff, way, area, aspect."," Placeholder nouns you meant to replace and forgot.",[12,254,255,256,260],{},"This is exactly the raw material for cutting length. 
Once you've identified an overused crutch verb, the techniques in ",[136,257,259],{"href":258},"\u002Fblog\u002Fhow-to-reduce-word-count","How to Reduce Word Count"," turn each hit into a tighter, stronger sentence — replacing \"she made a decision\" with \"she decided\" both removes the tic and drops the word count.",[28,262,264],{"id":263},"frequency-analysis-vs-find-replace-use-both","Frequency Analysis vs. Find & Replace — Use Both",[12,266,267],{},"Finding the overused word is step one. Fixing it is step two, and they need different tools.",[12,269,270,271,274,275,278],{},"The frequency counter is your ",[126,272,273],{},"diagnosis"," — it tells you ",[158,276,277],{},"looked"," appears 23 times in a 4,000-word chapter (0.58%, over the line). It does not, and should not, blindly swap them. Replacing all 23 with the same synonym just trades one tic for another.",[12,280,281,282,285,286,290,291,294,295,298],{},"For the ",[126,283,284],{},"fix",", jump to ",[136,287,289],{"href":288},"\u002Ffind-replace","Find & Replace",", which highlights every occurrence in context so you can decide case by case — some \"looked\"s become \"studied,\" some \"glanced,\" some get cut entirely. It supports full regex with the ",[158,292,293],{},"u"," flag for Unicode text, so you can match whole words only (",[158,296,297],{},"\u002F\\blooked\\b\u002Fgiu",") and skip partial matches inside other words. Diagnose with frequency, treat with find-and-replace. Neither does the other's job well.",[12,300,301,302,305,306,309,310,312,313,315,316,319,320,323,324,327,328,330],{},"One footgun worth knowing if you adapt these patterns: in JavaScript, ",[158,303,304],{},"\\b"," is ",[16,307,308],{},"ASCII-only",", even with the ",[158,311,293],{}," flag. 
It works perfectly for ",[158,314,277],{},", but ",[158,317,318],{},"\u002F\\bдивився\\b\u002Fgiu"," or ",[158,321,322],{},"\u002F\\büber\\b\u002Fgiu"," will not match the standalone word at all, because the engine doesn't treat Cyrillic or umlaut letters as \"word characters\", so it sees no boundary between a space and a leading non-ASCII letter. For solid non-English matching, anchor on the surrounding whitespace or punctuation instead, e.g. lookarounds like ",[158,325,326],{},"\u002F(?\u003C=^|[\\s,.])looked(?=$|[\\s,.])\u002Fgiu",", rather than trusting ",[158,329,304],{},".",[12,332,333,334,338],{},"This diagnose-then-treat loop is the backbone of a chapter audit. If you're editing fiction, the ",[136,335,337],{"href":336},"\u002Fblog\u002Fhow-many-words-in-a-chapter","chapter length guide"," walks through pairing per-chapter word counts with frequency analysis to catch tics that cluster in specific scenes — action sequences are notorious for stacking the same three verbs.",[142,340],{},[28,342,344],{"id":343},"a-length-aware-repetition-cheat-sheet","A Length-Aware Repetition Cheat Sheet",[12,346,347],{},"The 0.5% threshold is constant, but it's easier to act on as a raw count. 
Here's what \"overused\" looks like at common document lengths.",[55,349,350,363],{},[58,351,352],{},[61,353,354,357,360],{},[64,355,356],{"align":66},"Document",[64,358,359],{"align":66},"0.5% threshold",[64,361,362],{"align":66},"What it means",[75,364,365,376,387,398,409],{},[61,366,367,370,373],{},[80,368,369],{"align":66},"500-word essay",[80,371,372],{"align":66},"~3 hits",[80,374,375],{"align":66},"Tight — repeating a content word 3× is already noticeable",[61,377,378,381,384],{},[80,379,380],{"align":66},"1,000-word blog post",[80,382,383],{"align":66},"~5 hits",[80,385,386],{"align":66},"The classic case; 5+ of any non-keyword word is a tic",[61,388,389,392,395],{},[80,390,391],{"align":66},"2,500-word chapter",[80,393,394],{"align":66},"~13 hits",[80,396,397],{"align":66},"Verbs are the usual offenders here",[61,399,400,403,406],{},[80,401,402],{"align":66},"10,000-word short story",[80,404,405],{"align":66},"~50 hits",[80,407,408],{"align":66},"Character names get a pass; generic verbs don't",[61,410,411,414,417],{},[80,412,413],{"align":66},"80,000-word manuscript",[80,415,416],{"align":66},"~400 hits",[80,418,419],{"align":66},"Run chapter-by-chapter — book-wide totals hide local clusters",[12,421,422,423,426],{},"The manuscript row holds the most important caveat: ",[16,424,425],{},"a word can look fine across an entire novel and still be jammed into a single chapter."," Book-wide frequency averages out local repetition. That's why editors run the analysis per chapter, not per book — a verb at 0.1% across 80,000 words might be at 1.5% in the one scene where you lost control of it.",[28,428,430],{"id":429},"why-the-tokenizer-matters","Why the Tokenizer Matters",[12,432,433,434,437],{},"Most free frequency counters split text on spaces — ",[158,435,436],{},"text.split(' ')",". It's the junior approach, and it breaks the moment your writing contains anything interesting. 
\"Don't\" becomes one token; \"mother-in-law\" becomes one token instead of being counted sensibly; an em-dash—like this—glues two words together; and any non-English script returns nonsense.",[12,439,440,441,443,444,447],{},"The ",[136,442,139],{"href":138}," tokenizes with ",[158,445,446],{},"Intl.Segmenter",", the W3C-standard word boundary detector built into every modern browser (it's what Chrome's own spell-checker uses). It correctly handles contractions, hyphenated compounds, accented characters, and even Chinese or Japanese, which have no spaces between words at all. If your frequency data is wrong because the tokenizer is naïve, every percentage above is wrong too — accuracy starts at the segmentation step.",[12,449,450,451,453,454,458],{},"For the same reason, when you want a precise ",[126,452,200],{}," word count to sanity-check your percentages against, the ",[136,455,457],{"href":456},"\u002F","Word Counter"," uses the identical engine, so the two tools always agree.",[28,460,462],{"id":461},"faq","FAQ",[12,464,465,468],{},[16,466,467],{},"What is word frequency analysis?","\nIt's the process of counting how many times each unique word appears in a text and ranking them most-to-least common. The goal is to surface unconscious repetition — the crutch verbs and filler adverbs you lean on without realizing. Showing each word as a percentage of the total separates deliberate keywords from accidental tics.",[12,470,471,474],{},[16,472,473],{},"How do I find overused words in my writing?","\nPaste the text into a word frequency counter, keep the stop-word filter on, and set a minimum word length of 4. Read the top 30 by percentage. Any content word above ~0.5% of your total is a candidate — about 5 hits per 1,000 words. 
Ignore your topic keyword and character names; target the generic verbs and adjectives that rank unexpectedly high.",[12,476,477,480],{},[16,478,479],{},"What percentage counts as overusing a word?","\nFunction words (the, of, and) live at 2–7% and that's normal — it's just how English distributes. Ordinary content words start to feel repetitive above roughly 0.5%, and anything past 1% in a non-keyword role is almost certainly a tic. Fiction gets one exception: a protagonist's name routinely sits at 1–3% without bothering anyone.",[12,482,483,486],{},[16,484,485],{},"Is a word frequency counter the same as keyword density?","\nSame math, opposite goal. Keyword density is SEO — you want a target phrase to appear often enough to signal relevance without tripping spam filters. Frequency analysis is editing — you're hunting repetition to cut. In one you're chasing a number up; in the other you're bringing it down.",[12,488,489,492,493,495],{},[16,490,491],{},"Why does the counter ignore words like \"the\" and \"and\"?","\nThose are stop words, and they top every list by sheer grammar, drowning out real signal. The tool filters a built-in English stop-word list by default so your most-repeated ",[126,494,174],{}," word rises to the top instead. Toggle the filter off if you specifically want to study function-word distribution.",[12,497,498,501,502,504,505,507],{},[16,499,500],{},"Does word frequency analysis work in other languages?","\nYes — tokenization runs on ",[158,503,446],{},", which handles accented characters, apostrophes, hyphenated words, and space-less scripts like Chinese correctly, unlike ",[158,506,436],{},". 
The built-in stop-word list is English, though, so for other languages turn the stop-word filter off and read the raw ranking.",{"title":509,"searchDepth":510,"depth":510,"links":511},"",2,[512,513,514,515,516,517,518],{"id":30,"depth":510,"text":31},{"id":146,"depth":510,"text":147},{"id":207,"depth":510,"text":208},{"id":263,"depth":510,"text":264},{"id":343,"depth":510,"text":344},{"id":429,"depth":510,"text":430},{"id":461,"depth":510,"text":462},"Writing Tips","Find the words you repeat too often with a word frequency counter online. Learn the 0.5% tic threshold, what's normal vs. overused, and how to fix it.","md",[523,525,527,529,531,534],{"question":467,"answer":524},"Word frequency analysis counts how many times each unique word appears in a text and ranks the results from most to least common. It's the fastest way to catch unconscious repetition — the verbs, adjectives, and filler words you lean on without noticing. A good frequency counter also shows each word as a percentage of the total, so you can tell a deliberate keyword from an accidental verbal tic.",{"question":473,"answer":526},"Paste your text into a [word frequency counter](\u002Fword-frequency), exclude common stop words (the, and, of), and set a minimum word length of 4 to skip filler. Then read the top 30 results. Any content word above roughly 0.5% of your total word count is worth a second look — that's about 5 hits in a 1,000-word essay or 10 in a 2,000-word chapter. Character names and your core topic keyword are expected to rank high; generic verbs and adjectives that rank high are the real targets.",{"question":479,"answer":528},"For function words like 'the' or 'of', 2–7% is completely normal — that's just how English distributes (Zipf's law). For ordinary content words — verbs, adjectives, adverbs — anything above about 0.5% of your total starts to feel repetitive to readers. A word hitting 1%+ in a non-keyword role is almost always a tic. 
The exception is fiction: a main character's name routinely sits at 1–3% and nobody notices.",{"question":485,"answer":530},"They share the math but serve different goals. Keyword density is an SEO concept — you're checking that a target phrase appears often enough (but not so often it looks spammy). Word frequency analysis is an editing tool — you're hunting for accidental repetition you want to cut. Same percentages, opposite intent: with one you're trying to hit a number; with the other you're trying to bring it down.",{"question":532,"answer":533},"Why does the counter ignore words like 'the' and 'and'?","Those are stop words — high-frequency function words that carry little standalone meaning. The [Word Frequency Counter](\u002Fword-frequency) filters a built-in English stop-word list by default so the signal isn't drowned out by 'the' sitting at the top of every list. You can toggle the filter off if you specifically want to see function-word distribution, but for finding overused content words, leave it on.",{"question":500,"answer":535},"Yes. The counter tokenizes with Intl.Segmenter, the word segmenter standardized in ECMA-402 and built into modern browsers. It handles apostrophes, accented characters, hyphenated compounds, and even space-less scripts like Chinese and Japanese correctly. The naïve text.split(' ') approach most homegrown counters use leaves punctuation attached to words and cannot segment space-less scripts at all. The stop-word list itself is English, so non-English text is best analyzed with the stop-word filter off.","\u002Farticles\u002Fword-frequency-analysis\u002Fhero.webp",{},true,"\u002Fen\u002Fword-frequency-analysis","2026-06-08",{"title":5,"description":520},"en\u002Fword-frequency-analysis",[544,545,546,547,548],"word frequency counter online","word frequency analysis","overused words","self-editing","repetitive writing","Xfz_zKF1I_LWD21QxUoma8wv2LO1iloIb29c_hc3NEQ",1782712871477]