“True” Word Count In LaTeX February 7, 2007

Posted by Carthik in commands, packages, Readers' Tips.

By way of Wei comes this little nugget of useful information of the kind I love.

If you count the words in a LaTeX document using the “wc” command, you will find that you have counted, in addition to the words you wrote, all the LaTeX formatting commands, like the “\paragraph”s and the “\textit”s.
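To see the problem concretely, here is a small sketch. The sample document is made up for illustration, and it only assumes that mktemp and wc are available:

```shell
# Write a tiny, made-up LaTeX document to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
\documentclass{article}
\begin{document}
Hello \textit{brave} new world.
\end{document}
EOF

# wc splits on whitespace only, so every markup line counts as a "word".
raw_count=$(wc -w < "$sample" | tr -d ' ')
echo "wc -w reports $raw_count words"

rm -f "$sample"
```

This reports 7 words, although only four of them (“Hello brave new world.”) are actual prose.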

Of course, if you use Kile like I do, all you have to do is go to “File -> Statistics” to see the word count. If you don’t use Kile, you can follow Wei’s advice and install the “untex” package:
$ sudo apt-get install untex
and then run:
$ untex source.tex > target && wc -w target
to count the number of words in the file named “source.tex”.
Alternatively, you can use this online tool to count the words.

A word of caution here: untex does not ignore equations, so the word count might be off by a bit. If you are a perfectionist, I would recommend using detex instead. There is no separate package for detex; it ships in the Ubuntu package texlive-extra-utils.
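If you go the detex route, a small wrapper along these lines might be handy. This is only a sketch: the function name tex_words is made up, and it assumes detex is on your PATH and writes the stripped text to standard output.

```shell
# tex_words FILE: print the word count of FILE after stripping markup with detex.
# The function name is a made-up helper, not part of any package.
tex_words() {
    if ! command -v detex >/dev/null 2>&1; then
        echo "detex not found; try installing texlive-extra-utils" >&2
        return 127
    fi
    # detex writes the plain text to stdout; wc -w counts what is left.
    detex "$1" | wc -w
}
```

You would then call it as: tex_words source.tex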

If your document has citations and references, includes other files, and so on, the only reasonably efficient way to count the words in the final result is to convert the PDF file to text and then count the words. Here is a command that will help you do that:
$ pdftotext file.pdf - | grep -E '\w\w\w+' | iconv -f ISO-8859-15 -t UTF-8 | wc -w

pdftotext is a command line utility provided by Xpdf. You may have to tweak the charsets in the previous command.
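As a possible variation on that pipeline, the sketch below asks pdftotext for UTF-8 directly through its -enc option, so the iconv step may not be needed. The two function names are made up, and I am assuming your pdftotext build accepts -enc UTF-8. The grep step keeps only lines that contain a run of at least three letters, digits or underscores, which drops stray lines such as page numbers.

```shell
# keep_texty_lines: pass through only lines that contain a run of
# three or more letters, digits or underscores (made-up helper name).
keep_texty_lines() {
    grep -E '[[:alnum:]_]{3,}'
}

# count_pdf_words FILE.pdf: hypothetical wrapper around the pipeline.
count_pdf_words() {
    # -enc UTF-8 requests UTF-8 output; "-" sends the text to stdout.
    pdftotext -enc UTF-8 "$1" - | keep_texty_lines | wc -w
}
```

You can check the filter without a PDF, for example with: printf '12\nsome real text\n' | keep_texty_lines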

Comments»

1. Luke - February 8, 2007

I’m assuming that this would also work:

$untex source.tex | wc -w

It’s less typing and fewer disk operations, because the file is read only once (as opposed to two reads and a write in your example).

I haven’t tested it but it looks like it would work.

2. sam tygier - February 8, 2007

There is a nice script called texWordCount.pl at
http://www.comp.nus.edu.sg/~kanmy/software.html that shows the total word count and also the count per section. It can properly handle included files as well.

3. ubuntonista - February 8, 2007

Hey Sam, thanks for stopping by, and for the script!

4. Kimie Nakahara - May 6, 2007

Very good tip! Helped me a lot!

Thank you!

5. miscellaneous factZ » Blog Archive » Counting Words in a Latex File - August 24, 2007

[...] of this was inspired by this blog post. Having tested on my own set of files I would suggest that these methods could be ranked in order [...]

6. Incie83 - September 17, 2007

I would highly recommend Sam’s script posted above… untex is a bit rubbish when you’ve got math in your paper.

7. Robert Rothenberg - December 3, 2007

Since I often have large documents broken up into multiple files, I use:

cat *.tex | untex - | wc -w

8. urban - November 9, 2009

I just made it… so it may be wrong.
Word count without Bibliography entries:

#!/bin/bash
# Count words (w) or characters (c) in a PDF, skipping everything
# from the first "Bibliography" line onward.

if [ $# -ne 2 ]; then
    echo "Usage: $0 file.pdf c|w"
    exit 1
fi

if [ "$2" != "c" ] && [ "$2" != "w" ]; then
    echo "Usage: $0 file.pdf c|w"
    exit 1
fi

echo -n "Words|Characters Found: "
pdftotext "$1" - | awk 'BEGIN { disp = 1 }
{
    # Once the first field matches "Bibliography", divert this line
    # and everything after it to ./wc_skipped.
    if ($1 ~ /Bibliography/) disp = 0
    if (disp == 1) print $0
    else print $0 > "./wc_skipped"
}' | wc -"$2"

echo "Check the Bibliography lines ..."
nano ./wc_skipped
echo "Cleaning Up"
rm -f ./wc_skipped

echo "Bye"
exit 0

26. laubblat - November 9, 2011

To also count the words in included .tex files:
texcount -inc source.tex

