“True” Word Count In LaTeX
February 7, 2007
Posted by Carthik in commands, packages, Readers' Tips.
By way of Wei comes this little nugget of useful information of the kind I love.
If you count the number of words in a LaTeX document using the “wc” command, you will find that you have counted, in addition to the words you wrote, all the LaTeX formatting text, like the “\paragraph”s and the “\textit”s.
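To make the overcount concrete, here is a small sketch that runs wc -w on a made-up four-line LaTeX snippet. The sample text is invented for illustration; only wc, printf and tr are used.

```shell
# Invented sample: five words a reader would see (Intro, This, is, a, test),
# wrapped in LaTeX markup.
sample='\documentclass{article}
\begin{document}
\paragraph{Intro} This is \textit{a test}.
\end{document}'

# wc -w splits on whitespace only, so every command counts as a "word".
raw_count=$(printf '%s\n' "$sample" | wc -w | tr -d ' ')
echo "wc -w counts $raw_count tokens, but only 5 are real words"
# prints: wc -w counts 8 tokens, but only 5 are real words
```

The three markup-only lines each add one token, which is where the difference comes from.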
Of course if you use Kile like I do, all you have to do is go to “File -> Statistics” to see the word count. But if you don’t use Kile, then you can follow Wei’s advice and install and use the “untex” package by doing a:
$ sudo apt-get install untex
and then a:
$ untex source.tex > target && wc -w target
to count the number of words in the file named “source.tex”.
Alternatively, you can use this online tool to count the words.
A word of caution here: untex does not ignore equations, so the word count may be off by a bit. If you are a perfectionist, I would recommend using detex instead. There is no separate package for detex; it ships in the Ubuntu package texlive-extra-utils.
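To see how much an equation can inflate the count, here is a rough sketch that drops inline math before counting. The helper name strip_inline_math is made up for this example, and it only handles $...$ spans that start and end on the same line; display math, escaped \$ signs and multi-line formulas are not covered.

```shell
# Hypothetical helper (not part of untex or detex): reads LaTeX on stdin
# and removes single-line $...$ spans.
strip_inline_math() {
  sed -E 's/\$[^$]*\$//g'
}

sample='The energy $E = mc^2$ is famous.'

# Count with and without the inline formula.
before_count=$(printf '%s\n' "$sample" | wc -w | tr -d ' ')
after_count=$(printf '%s\n' "$sample" | strip_inline_math | wc -w | tr -d ' ')
echo "with math: $before_count, without math: $after_count"
# prints: with math: 7, without math: 4
```

Assuming untex is installed, the filter could sit in front of it, as in: strip_inline_math < source.tex | untex - | wc -w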
If your document has citations and references, includes other files, and so on, the only reasonably efficient way to count the words in the final result is to convert the PDF file to text and then count the words. Here is a command that will help you do that:
$ pdftotext file.pdf - | grep -E '\w\w\w+' | iconv -f ISO-8859-15 -t UTF-8 | wc -w
pdftotext is a command line utility provided by Xpdf. You may have to tweak the charsets in the previous command.
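The filter in the command above selects whole lines that contain at least one word of three or more characters; it does not pick out individual words. If you want to count only the words that are three or more characters long, a small filter along these lines might do. The function name count_long_words is made up for this sketch, and it treats a "word" as a run of letters and digits, which may split accented words depending on your locale.

```shell
# Hypothetical helper: reads plain text on stdin and prints the number of
# alphanumeric runs that are at least three characters long.
count_long_words() {
  # grep -o prints each match on its own line, so wc -l counts matches.
  grep -oE '[[:alnum:]]{3,}' | wc -l | tr -d ' '
}

printf 'A PDF is a file.\nSee page 12 of it.\n' | count_long_words
# prints: 4   (PDF, file, See, page)
```

Assuming pdftotext is installed, it would be used as: pdftotext file.pdf - | count_long_words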
I’m assuming that this would also work:
$ untex source.tex | wc -w
It’s less typing and fewer disk operations because you read in the file only once (as opposed to two reads and a write in your example).
I haven’t tested it but it looks like it would work.
There is a nice script called texWordCount.pl at
http://www.comp.nus.edu.sg/~kanmy/software.html that shows the total word count and also the count per section. It can properly handle included files as well.
Hey Sam, thanks for stopping by, and for the script!
Very good tip! Helped me a lot!
Thank you!
[…] of this was inspired by this blog post. Having tested on my own set of files I would suggest that these methods could be ranked in order […]
I would highly recommend Sam’s script posted above… untex is a bit rubbish when you’ve got math in your paper.
Since I often have large documents broken up into multiple files, I use:
cat *.tex | untex - | wc -w
I just made it… so it may be wrong.
Word count without Bibliography entries:
#!/bin/bash
# Count words (w) or characters (c) in a PDF, skipping everything from the
# first line whose first field contains "Bibliography" onwards.
if [ $# -ne 2 ]; then
    echo "Usage: $0 <file.pdf> <c|w>"
    exit 1
fi
if [ "$2" != "c" ] && [ "$2" != "w" ]; then
    echo "Usage: $0 <file.pdf> <c|w>"
    exit 1
fi
echo -n "Words|Characters Found: "
# Lines before the bibliography go to wc; the rest go to ./wc_skipped.
pdftotext "$1" - | awk 'BEGIN { disp = 1 }
{
    if ($1 ~ /Bibliography/) {
        print $0 > "./wc_skipped"
        disp = 0
        next
    }
    if (disp == 1) print $0
    else print $0 > "./wc_skipped"
}' | wc -"$2"
echo "Check the Bibliography lines ..."
nano ./wc_skipped
echo "Cleaning Up"
rm -f ./wc_skipped
echo "Bye"
exit 0
good tips. Thanks
To also count the words in included .tex files:
texcount -inc source.tex