“True” Word Count In LaTeX
February 7, 2007
Posted by Carthik in commands, packages, Readers' Tips.
By way of Wei comes this little nugget of useful information of the kind I love.
If you count the number of words in a LaTeX document using the “wc” command, you will find that you have counted, in addition to the words you wrote, all the LaTeX formatting text, like the “\paragraph”s and the “\textit”s.
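To see the effect on a small scale, here is a quick sketch. The sample string is made up for illustration and is not from any real document:

```shell
# A made-up LaTeX fragment: two prose words ("one", "two") plus four markup tokens.
sample='\begin{itemize} \item one \item two \end{itemize}'

# wc -w splits on whitespace only, so every markup token counts as a "word".
# tr strips the padding that some wc implementations add around the number.
raw_count=$(printf '%s\n' "$sample" | wc -w | tr -d ' ')

echo "wc -w sees $raw_count tokens, but only 2 are prose words"
```

On this fragment wc reports 6, three times the real count.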
Of course if you use Kile like I do, all you have to do is go to “File -> Statistics” to see the word count. But if you don’t use Kile, then you can follow Wei’s advice and install and use the “untex” package by doing a:
$ sudo apt-get install untex
and then a:
$ untex source.tex > target && wc -w target
to count the number of words in the file named “source.tex”.
Alternatively, you can use this online tool to count the words.
A word of caution here: untex does not ignore equations, so the word count might be off by a bit. If you are a perfectionist, I would recommend using detex instead. There is no separate package for detex; it ships in the Ubuntu package texlive-extra-utils.
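Both untex and detex write the stripped text to standard output, so either one can be piped straight into wc without the intermediate “target” file. A minimal sketch, where the function name count_tex_words is my own invention and not part of either package:

```shell
# count_tex_words STRIPPER FILE
# STRIPPER is whichever markup stripper is installed (for example untex or detex).
# Hypothetical helper: prints the number of whitespace-separated words left
# after the stripper has removed the LaTeX markup.
count_tex_words() {
    stripper=$1
    file=$2
    # Fail early with a readable message if the tool is missing.
    if ! command -v "$stripper" >/dev/null 2>&1; then
        echo "error: $stripper not found" >&2
        return 1
    fi
    # tr removes the padding some wc implementations put around the number.
    "$stripper" "$file" | wc -w | tr -d ' '
}

# Example calls (uncomment once the tools are installed):
# count_tex_words untex source.tex
# count_tex_words detex source.tex
```

Running it with both tools on the same file is a quick way to see how much the equations inflate the untex count.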
If your document has citations and references, includes other files, and so on, the only reasonably efficient way to count the words in the final result is to convert the PDF file to text and then count the words. Here is a command that will help you do that:
$ pdftotext file.pdf - | grep -E '\w\w\w+' | iconv -f ISO-8859-15 -t UTF-8 | wc -w
pdftotext is a command line utility provided by Xpdf. You may have to tweak the charsets in the previous command.
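Note that the grep filter in that command keeps or drops whole lines, so short words on a kept line are still counted. If you would rather count only the individual words of three or more characters, something like the following sketch might do. The function name count_long_words is hypothetical, and it assumes a grep that supports the -o option (GNU and BSD grep both do):

```shell
# count_long_words: read plain text on stdin and print how many
# alphanumeric runs of three or more characters it contains.
# Hypothetical helper, a variation on the pipeline above rather than
# a drop-in replacement for it.
count_long_words() {
    # -o prints each match on its own line, so wc -l counts matches.
    grep -oE '[[:alnum:]]{3,}' | wc -l | tr -d ' '
}

# Example call (uncomment once pdftotext is installed and file.pdf exists):
# pdftotext file.pdf - | count_long_words
```

Words containing punctuation, such as “don't”, are split at the punctuation mark, so treat the result as an estimate.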