“True” Word Count In LaTeX February 7, 2007
Posted by Carthik in commands, packages, Readers' Tips.
By way of Wei comes this little nugget of useful information of the kind I love.
If you were to count the number of words in a LaTeX document using the “wc” command, you would find that you had counted, in addition to the words you wrote, all the LaTeX formatting text, like the “\paragraph”s and the “\textit”s.
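To see the problem concretely, here is a small shell sketch using a made-up LaTeX snippet. `wc -w` splits on whitespace only, so markup such as \begin{abstract} and \label{sec:intro} counts as a word whenever it stands on its own.

```shell
#!/bin/sh
# Made-up LaTeX snippet, for illustration only.
sample='\begin{abstract}
This is very short.
\end{abstract}
\section{Introduction} \label{sec:intro}'

# wc -w counts whitespace-separated tokens, so \begin{abstract},
# \end{abstract} and \label{sec:intro} are each counted as a word.
raw=$(( $(printf '%s\n' "$sample" | wc -w) ))
echo "wc -w reports $raw words"
```

Here wc reports 8 words, although the actual prose ("This is very short." plus the heading "Introduction") is only 5.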
Of course, if you use Kile like I do, all you have to do is go to “File -> Statistics” to see the word count. If you don’t use Kile, you can follow Wei’s advice and install the “untex” package:
$ sudo apt-get install untex
and then run:
$ untex source.tex > target && wc -w target
to count the number of words in the file named “source.tex”.
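If untex is not available, a rough approximation can be put together with sed. This is a minimal sketch and not untex itself: the function name strip_latex is hypothetical, and it only drops % comments, a few common commands together with their arguments (\begin, \end, \label, \ref, \cite), and the names of all other commands, keeping the text inside their braces.

```shell
#!/bin/sh
# Hypothetical, crude stand-in for untex (NOT untex itself).
#   - removes % comments, but leaves escaped \% alone
#   - removes \begin{..}, \end{..}, \label{..}, \ref{..}, \cite{..} entirely
#   - removes other \command names (and a trailing *) but keeps {arguments}
#   - turns the remaining braces into spaces
strip_latex() {
    sed -E \
        -e 's/^%.*//' \
        -e 's/([^\\])%.*/\1/' \
        -e 's/\\(begin|end|label|ref|cite)\{[^}]*\}//g' \
        -e 's/\\[a-zA-Z]+\*?//g' \
        -e 's/[{}]/ /g' \
        "$@"
}
```

Usage mirrors the untex command: `strip_latex source.tex | wc -w`. Optional arguments like `\cite[p.~3]{key}` and math are not handled, so treat the result as an estimate.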
Alternatively, you can use this online tool to count the words.
A word of caution: untex does not ignore equations, so the word count may be slightly off. If you are a perfectionist, I would recommend using detex instead. There is no separate package for detex; it ships in the Ubuntu package texlive-extra-utils.
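To see why equations inflate the count, consider inline math: every token of the formula becomes a "word". As a crude illustration only (this is not how detex works internally), single-line `$...$` spans can be deleted with sed before counting. The name strip_inline_math is hypothetical, and escaped dollars (\$), `$$...$$` display math and formulas spanning several lines are not handled.

```shell
#!/bin/sh
# Hypothetical helper: delete single-line inline math $...$ before counting.
# Limitations: does not handle \$, $$...$$ display math, or multi-line math.
strip_inline_math() {
    sed -E -e 's/\$[^$]*\$//g' "$@"
}

line='Let $x^2 + y^2 = r^2$ be a circle.'
printf '%s\n' "$line" | wc -w                      # formula tokens counted too
printf '%s\n' "$line" | strip_inline_math | wc -w  # prose only
```

For this sample the first count is 9 and the second is 4.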
If your document has citations and references, or includes other files, the only reasonably efficient way to count the words in the final result is to convert the PDF file to text and then count the words. Here is a command that will help you do that:
$ pdftotext file.pdf - | grep -E '\w\w\w+' | iconv -f ISO-8859-15 -t UTF-8 | wc -w
pdftotext is a command line utility provided by Xpdf. You may have to tweak the charsets in the previous command.
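The `grep -E '\w\w\w+'` stage keeps only lines that contain at least one run of three or more word characters. This discards bare page numbers and short formula fragments. Here is a minimal sketch on invented text so that it runs without a PDF. The name keep_wordy_lines is hypothetical, and `\w` is assumed to be supported by your grep, as it is in GNU grep.

```shell
#!/bin/sh
# Keep only lines with a run of 3+ word characters, as in the pipeline above.
keep_wordy_lines() {
    grep -E '\w\w\w+'
}

# Invented stand-in for pdftotext output: a heading, a sentence,
# a page number, an empty line and a fragment of a formula.
printf '%s\n' 'Introduction' 'We count the words here.' '3' '' 'x = y' |
    keep_wordy_lines | wc -w
```

Here only the heading and the sentence survive, giving 6 words. The filter drops whole lines, so a genuine line made up only of very short words would be lost as well.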

I’m assuming that this would also work:
$ untex source.tex | wc -w
It’s less typing and fewer disk operations, because you read the file only once (as opposed to two reads and a write in your example).
I haven’t tested it, but it looks like it would work.
There is a nice script called texWordCount.pl at
http://www.comp.nus.edu.sg/~kanmy/software.html that shows the total word count and also the count per section. It can properly handle included files as well.
Hey Sam, thanks for stopping by, and for the script!
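For a quick per-section breakdown without Perl, here is a minimal awk sketch of the same idea. It is not texWordCount.pl. The function name words_per_section is hypothetical. It only recognises \section{...} or \section*{...} at the start of a line, ignores everything before the first section, and counts raw tokens, so markup is included just as with plain wc -w.

```shell
#!/bin/sh
# Hypothetical helper: print "<words><TAB><section title>" for each
# \section in a .tex file (or stdin). Lines before the first \section
# (the preamble) are ignored; the heading line itself is not counted.
words_per_section() {
    awk '
        /^\\section/ {
            title = $0
            sub(/^\\section\*?[{]/, "", title)   # drop "\section{" or "\section*{"
            sub(/[}].*$/, "", title)             # drop "}" and anything after it
            order[++n] = title
            next
        }
        n > 0 { count[n] += NF }                 # add this line'"'"'s tokens
        END {
            for (i = 1; i <= n; i++) printf "%d\t%s\n", count[i], order[i]
        }
    ' "$@"
}
```

Usage: `words_per_section source.tex`. You can pipe the file through a stripping step first if you want markup excluded.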
Very good tip! Helped me a lot!
Thank you!
[...] of this was inspired by this blog post. Having tested on my own set of files I would suggest that these methods could be ranked in order [...]
I would highly recommend Sam’s script posted above… untex is a bit rubbish when you’ve got math in your paper.
Since I often have large documents broken up into multiple files, I use:
$ cat *.tex | untex - | wc -w
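One caveat with `cat *.tex` is that it reads every .tex file in the directory in alphabetical order, including drafts that are never included. Here is a minimal sketch of an alternative that follows the document's own \input{...} and \include{...} lines recursively. The name expand_inputs is hypothetical, and it assumes each command sits alone on its line, with a path relative to the current directory.

```shell
#!/bin/sh
# Hypothetical helper: print a .tex file with every line of the form
# \input{name} or \include{name} replaced by that file's contents,
# recursively. "name" may omit the .tex extension.
expand_inputs() {
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            '\input{'*'}'|'\include{'*'}')
                name=${line#*'{'}             # strip up to and including "{"
                name=${name%'}'}              # strip the closing "}"
                [ -f "$name" ] || name="$name.tex"
                expand_inputs "$name"
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done < "$1"
}
```

Its output can be fed into whatever stripping step you use before `wc -w`, e.g. `expand_inputs main.tex | wc -w`. A file that includes itself would recurse forever, so this is only suitable for ordinary document trees.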
I just made it… so it may be wrong.
Word count without Bibliography entries:
#!/bin/bash
# Second argument: c (passed to wc as -c) or w (passed to wc as -w)
if [ $# -ne 2 ]; then
    echo "Usage: $0 <file.pdf> <c|w>"
    exit 1
fi
if [ "$2" != "c" ] && [ "$2" != "w" ]; then
    echo "Usage: $0 <file.pdf> <c|w>"
    exit 1
fi
echo -n "Words|Characters Found: "
# Print the text up to the first line whose first field mentions "Bibliography";
# that line and everything after it go to ./wc_skipped for review instead.
pdftotext "$1" - | awk 'BEGIN { disp = 1 }
{
    if ($1 ~ /Bibliography/) disp = 0
    if (disp == 1) print $0
    else print $0 > "./wc_skipped"
}' | wc -$2
echo "Check the Bibliography lines ..."
nano ./wc_skipped
echo "Cleaning Up"
rm -f ./wc_skipped
echo "Bye"
exit 0
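A non-interactive variant of the same idea can be handy in scripts or Makefiles. It reads plain text on stdin (for example pdftotext output) and counts the words before the first line that starts with "Bibliography" or "References". This is a hypothetical helper: the heading names are an assumption, and unlike the script above it does not save the skipped lines for review.

```shell
#!/bin/sh
# Hypothetical helper: count the words that appear before the bibliography.
# Assumption: the bibliography heading starts a line with one of these words.
count_before_bibliography() {
    awk '/^(Bibliography|References)/ { exit } { print }' | wc -w
}

# Usage with a real PDF (requires pdftotext):
#   pdftotext paper.pdf - | count_before_bibliography
```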
Good tips. Thanks!
To count the words in included .tex files as well:
$ texcount -inc source.tex