Find duplicate copies of files
October 8, 2005
Posted by Carthik in applications, ubuntu.
fdupes is a command-line program for finding duplicate files within specified directories.
I have quite a few mp3s and ebooks, and I suspected that at least a few of them were copies – you know how it is – as your collection grows by leaps and bounds, thanks to friends, it becomes difficult to check each file individually to see if it is already on your computer. So I started looking for a script that checks for duplicate files in an intelligent fashion. I didn’t find the script, but I did find fdupes.
fdupes compares file sizes and then md5 hashes of the files, following up with a byte-by-byte check, so it identifies duplicates correctly. I let it run recursively in the directory which contains my files (which makes it check for duplicates across the different directories within the specified directory), and saved the output to a file by doing:
$ fdupes -r ./stuff > dupes.txt
Then, deleting the duplicates was as easy as checking dupes.txt and deleting the offending copies. fdupes can also prompt you to delete the duplicates as you go along, but I had way too many files and wanted to do the deleting at my own pace. The interactive deletion is useful if you are only checking for duplicates in a given directory with a few files in it.
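For reference, fdupes prints each set of duplicates as a group of paths separated by blank lines, and the -d (--delete) option is what drives the interactive prompt mentioned above. A rough sketch of both modes (the ./stuff paths are placeholders, and the exact prompt text may vary by version):

$ fdupes -r ./stuff
./stuff/music/song.mp3
./stuff/backup/song.mp3

$ fdupes -rd ./stuff
[1] ./stuff/music/song.mp3
[2] ./stuff/backup/song.mp3

Set 1 of 1, preserve files [1 - 2, all]: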
What do you know, you learn something new every day. Thanks for this. 🙂
Pascal
You should also try FSlint by Pádraig Brady. It displays duplicate files in a nice GUI.
I’ve used it for years and it’s dead useful!
Although MD5 is more advanced than CRC, hash values still have a chance of colliding. NoClone uses true byte-by-byte comparison to avoid these cases: http://noclone.net
Thanks Donncha, Alan. I think I will try out fslint and noclone soon – the next time I want to clean out my collection of files.
Hey, Alan the NoClone program requires Windows. Why would you post a link to a Windows program on a Linux blog?
I have been looking for a good linux tool for this all night. Great! Can’t wait to try it when I get in front of my ubuntu boxes.
what are you using to read ebooks?
Justin,
Some of them are pdf versions of books, like some O’Reilly books. Some are comics, which I read using Comical. Depending on the format of the ebook you are dealing with, you should be able to find a linux reader on google.
fdupes threw errors (too many open files) on my huge hard disk.
It is rather handy that the fslint site has an RPM and a .deb as well as a tarball. Trying it out on openSUSE with the pre-built RPM, it requires the RPMs pygtk and pyglade, which are actually provided by the python-gtk RPM in SUSE. It’s a shame the RPM’s dependencies were not declared by file.
I might (depending on the success/failure of ignoring the warnings about conflicts for this package under YaST) build a new RPM using CheckInstall – and submit that as feedback for the guy (or pop it on RPMBone).
The GUI itself loaded with no problems from the RPM, despite the warnings. After some serious disk thrashing – problem solved.
I had spent some time during my weeks without internet (different story) trying to figure out scripts to do this, and found it a harder problem than it seemed. All my scripts seemed to recurse massively: after the basic file-length comparison, once it got into the actual content checking, comparing so many files looked like it could grow out of control. So my hat is off to the chaps behind fslint.
Simple and perfect!
I’ve tried 5 different tools under Windows without finding a good solution. I’m definitely happy to use a Linux box, and thank you for being so clear.
This rocks. Along these lines, I’ve found command line tools are indispensable when dealing with large numbers of files. Here’s a trick to count the number of files within a directory:
ls -1Ra | wc | awk '{printf("There are %s files in this directory!\n",$1-2)}'
ditto Peter Basil
and this also does not work reliably – with -R, the ‘.:’ header line gets counted too, so the total comes out wrong (and a name with a space, like ‘c d’ below, would break any word-based count).
example:
jdu@igneous:~$ mkdir test
jdu@igneous:~$ cd test/
jdu@igneous:~/test$ touch a b ‘c d’
jdu@igneous:~/test$ ls -1
a
b
c d
jdu@igneous:~/test$ ls -1Ra | wc | awk '{printf("There are %s files in this directory!\n",$1-2)}'
There are 4 files in this directory!
jdu@igneous:~/test$
Danny thanks for the feedback on the FSlint rpm.
It’s a pity that the package names are different among distributions. A quick look around suggests the following should be the dependencies:
fedora/redhat: pygtk2-libglade, pygtk2
mandriva: pygtk2.0-libglade, pygtk2.0
opensuse/suse: python-gtk
One can’t create an RPM that requires package1 | package2. The next best thing, I think, is to automatically supply the correct dependencies when building from the source RPM.
I’ve done this for fedora and mandriva as of 2.16,
so I’ll look at supporting [open]suse also.
thanks.
Hi Albert. Yes, command line tools – or more generally the command shell language – have the required flexibility for dealing with files. The FSlint GUI, for example, is just a simple pygtk wrapper around the output of shell scripts. One can invoke the shell scripts directly by adding the fslint scripts directory to the PATH like:
export PATH="$PATH:/usr/share/fslint/fslint"
Then you can do `findup --help` etc.
Note, a more robust/accurate/fast version of the example you gave above is:
printf "There are %'d files in this directory\n" `find | wc -l`
You might find the following of use:
http://www.pixelbeat.org/cmdline.html
Robert – You could also just use:
\ls -l | wc -l
to count files in a directory.
FSlint rocks! This is a handy tool for all my pictures.
Thanks Brady!
If you need to search for similar music and graphics files on Windows, you could use this duplicate file finder.
Andrew:
>Why would you post a link to a Windows program on a Linux blog?
Because you can use WINE to run the Windows program.
Because some folks use Linux and Windows simultaneously.
Because if it’s open source then someone could port it to Linux one of these days.
Because some folks have NFS filesystems that can be mounted on any OS, and one of those OSes might be a Windows platform.
Because a Windows user googling ‘find duplicate copies of files’ might find this page, saving them perhaps a couple of minutes of searching for a solution.
Given forever to think about this question, I could come up with an infinite number of possible answers.
-Alan
Nice answers…
>Because if it’s open source then someone could port it to Linux one of these days.
noclone isn’t open source
Thanks for the mention of fdupes – perfect timing, as I needed to clean out a bunch of stuff, and fdupes is in Ubuntu’s repository.
[…] of them, with more than 1GB. I decided to ask Google whether it knew of something and found this blog: Find duplicate copies of files – and in one comment I found the […]
Related to this is deleting duplicated files (in my case desktop.ini and thumbs.db).
I wrote a howto for deleting these files recursively:
find it here: http://en.tuxero.com/2007/09/how-to-delete-useless-windows-files-in.html
Cheers!
I was wondering if you could add a size option to your “very useful” program. Sometimes we just can’t waste time with small files.
Thanks
FSlint is pretty nice, but the interface is not very useful – it requires you to delete each file by hand. Even this simple free Windows program is better: http://www.geocities.com/hirak_99/goodies/finddups.html
This one is quite useful. You never know about every useful utility that’s out there. Thanks.
This is gonna take a while… 15 minutes and still at zero %. At least it’s at [317/605437], so I know it’s moving 😛 Thanks for the tip, just what I was looking for. I could just apt-get it from Debian sid, by the way.
Have to agree with endolith.
FSlint has zero usable functions for removing duplicate files. Toggling between Select All and Select None serves little purpose on its own!
Plus it seems to only compare filenames, not file contents, returning multiple false positives.
fdupes + shell script wins hands down
[…] the FSlint application, Applications → System Tools […]
exactly what I was after many thanks
cat dupes.txt | while read line; do rm -f "${line}"; done
This command would remove ALL the files listed in the generated dupes.txt – including the originals – so be sure to first delete from dupes.txt the lines for the copies you want to keep.
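A slightly safer variant, sketched on the assumption that dupes.txt came from fdupes -r as in the post (fdupes separates each duplicate set with a blank line): keep the first file of every set and remove only the rest.

#!/bin/sh
# Keep the first file listed in each blank-line-separated set; remove the others.
first=1
while IFS= read -r line; do
  if [ -z "$line" ]; then
    first=1                 # blank line: a new duplicate set starts
  elif [ "$first" = 1 ]; then
    first=0                 # first file of the set: keep it
  else
    echo rm -f -- "$line"   # drop the echo once the output looks right
  fi
done < dupes.txt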
In FSlint you can select by groups → all but newest, for example – it’s the best selection system I’ve ever seen. Don’t judge the app before you read the manual :p
Very nice, very useful.
Any ideas on how to not just delete the dups but replace them with a symlink to the original?
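One hedged sketch of how that could work, again assuming the fdupes -r output format described above, with the first file of each blank-line-separated set treated as the original (test it on a copy of your data first):

#!/bin/sh
# Replace every duplicate after the first in each set with a symlink to the first.
original=
while IFS= read -r line; do
  if [ -z "$line" ]; then
    original=                          # blank line: next set
  elif [ -z "$original" ]; then
    original=$(readlink -f "$line")    # absolute path of the kept file (readlink -f is GNU coreutils)
  else
    ln -sf -- "$original" "$line"      # swap the duplicate for a symlink
  fi
done < dupes.txt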
There are issues with this. As previously mentioned, an MD5 hash has a chance of a collision, which means you might end up deleting files that are unique. Secondly, generating the hash requires reading every single byte of every single file, which is time consuming. If a very large file has a unique file size, you know it is unique without reading it at all. The best way to do this is to generate a table of files with their sizes, sort the table by size, throw out the files that have a unique size, and then compare only the files that share a size.
That’s right. But can you give us the actual commands to run in the terminal, with some details and explanations of how to do all that?
Theory is OK, but we need the commands 🙂
Thanks.
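For what it’s worth, a rough sketch of that size-first approach using standard GNU tools (it assumes file names contain no tabs or newlines, and it still ends with a hash comparison rather than a byte-by-byte check):

# 1. Record the size and path of every file.
find . -type f -printf '%s\t%p\n' > sizes.txt
# 2. Keep only the files whose size occurs more than once, hash just those
#    candidates, and print the groups whose MD5 digests match.
awk -F'\t' 'NR==FNR {n[$1]++; next} n[$1] > 1 {print $2}' sizes.txt sizes.txt \
  | while IFS= read -r f; do md5sum "$f"; done \
  | sort | uniq -w32 --all-repeated=separate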
[…] [Vía] Ubuntu blog – Find duplicate copies of files […]
Wow, even more than 3 years later this information is proving very useful. Thank you very much!
Albert, ls -1Ra | wc | awk '{printf("There are %s files in this directory!\n",$1-2)}' does not always work. If the directory has subdirectories, the count is wrong, as you also count the folders. What you need instead is:
find . -type f | wc -l
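One caveat: that still miscounts if a file name contains an embedded newline. With GNU find you can sidestep names entirely by printing one character per file and counting characters:
find . -type f -printf x | wc -c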
@those talking about md5 collisions ..
From the fdupes man page
DESCRIPTION
Searches the given path for duplicate files. Such files are found by comparing file sizes and MD5 signatures, followed by a byte-by-byte comparison.
I’d give a look to Komparator – it does hash and binary comparison.
MD5 is known to have issues. I have just finished creating a tool that uses SHA-224 as a checksum to find duplicates in a given directory:
http://code.google.com/p/liten2
For Windows I use Fast Duplicate File Finder… a very nice free tool.
[…] fslint is one way to find and eliminate duplicates […]
[…] 3 easy steps to resolving the hassle of manual duplicate file cleanup in your iTunes library, thanks to fslint-gui […]
thanks!
much obliged!!
Thanks a lot for this tip!
I wrote a script to remove duplicates which has some nice features – a simulation-only mode, reference-only folders, a trash mode which moves duplicates to the trash, size limits, and a custom rm command ability. You can see the details and download it here…
http://igurublog.wordpress.com/downloads/script-rmdupe/
I figured there were other tools to do this, but I wanted to write my own with the features I wanted, and it has worked well for me. It also does a full compare, not just checksums (which, as one person pointed out, can result in false matches). I based the interface on the rm command, and it only uses standard Linux commands.
fslint also looks good, but sometimes a command line approach is helpful.
Thanks for your tips, but I gave FSlint a try, as comment #2 (Donncha) suggests. It’s a lightweight app (only about 100kb), user-friendly with a simple GUI, but powerful!
Thanks to both of you 🙂
Thank you for posting this. fdupes is actually in my distribution, but was not installed. I would never have found it without your hint.
[…] You can have a look at this example using script, this one using fdupes or this one using fslint. All of this I found using Google in 0.31 seconds. It took […]
FDupes uses md5sums *and then* a byte by byte comparison to find duplicate files within a set of directories. It has several useful options including recursion.
Fdupes is very nice. I would however like to scan several external HDs where I store backups and photos. Is there any GUI? Any suggestions?
Nice find
Very useful, so using this now.
simple to install from synaptic package manager.
thanks
there’s some strange comments here.. looks like the SPAM bots are testing your blog.. be afraid. Soon this page could be filled with URL links to dodgy sites unless you fix the comment posting system.
Yeah. Remove the spam. And the Windows programs, it makes it easier to use this as a quick guide 😉
Ah, or instead, just remove the spam, and let the comments be, but also mention FSlint from the comments, it looks really nice.
I have a quick piece of advice for all those looking to clean their computers of duplicate files: do not delete any system file which is marked as a duplicate. I used a duplicate file finder to do this and my system crashed. Instead, limit the software to deleting user-created files and downloads. You are not going to save a lot of space by deleting system files anyway, so they are best left alone.
There is also ‘rmlint’ ( https://github.com/sahib/rmlint ), which beats fdupes in terms of speed, options and scriptability. It outputs a log and a ready-to-use script, which is more useful than plain output.
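A minimal usage sketch, assuming rmlint’s default output file names (check the rmlint documentation for your version’s exact behaviour):

$ rmlint ~/stuff     # scan; writes rmlint.sh (and rmlint.json) to the current directory
$ less rmlint.sh     # review exactly what the generated script would remove
$ sh rmlint.sh       # run it; the script asks for confirmation before deleting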
Something to be aware of (since this site came up high on a Google search): FDupes apparently *does not compare filenames*. Only sizes/hashes. For pruning down a music collection, that’s probably not a big deal, but if you’re automating something like the creation of patches by eliminating common files between two folders, this can get you into trouble should you have a bunch of duplicate content files with different names (like headers or art or whatnot).
[…] http://embraceubuntu.com has links to lots of useful programs. It’s an old blog entry, but still very useful. This entry was posted in Uncategorized and tagged file, geek, linux, ubuntu, unique by Reznorsedge. Bookmark the permalink. […]
It’s exactly for posts like this that I love reading your blog!
Thank you very much. I’m a new Linux user, still trying to learn about this OS – fun and interesting.
I’m a new Ubuntu user, still trying to learn more about this great open source OS. Thank you.
A very good article – factual and easy to read.