Bash script to delete duplicate files on a system
The command below finds and lists all duplicate files by size and MD5 hash, so you can later delete whichever copies you choose.
It compares sizes first and then md5sums; it does not delete anything, it just lists the duplicates.
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate | cut -d" " -f3
Example:
[rks@localhost temp]$ find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
ae474fbc97b9d6ff6e1fb37c2b5c0a1d  ./abc
ae474fbc97b9d6ff6e1fb37c2b5c0a1d  ./abc2

[rks@localhost temp]$ find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate | cut -d" " -f3
./abc
./abc2
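If you then want to remove some of the listed copies, one careful follow-up (dupes.txt is just an illustrative name) is to save the list, edit out the lines for the copies you want to KEEP, strip the blank separator lines, and pass the rest to rm. This sketch assumes GNU xargs and filenames without embedded newlines:

find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate | cut -d" " -f3 > dupes.txt
# edit dupes.txt and delete the lines for the copies you want to KEEP, then:
grep -v '^$' dupes.txt | xargs -d '\n' rm --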
The command below finds and lists all duplicate files by md5sum alone, so you can later delete whichever copies you choose.
Find Duplicate Files (based on MD5 hash)
It calculates the md5 sum of each file, sorts the output, runs uniq on the hash column only, and uses cut to strip the hash from the result.
find -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 33 | cut -c 35-
[rks@localhost temp]$ find -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 33 | cut -c 35-
./abc
./abc2
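A related sketch under the same assumptions (GNU tools, filenames without embedded newlines): the awk filter below prints every file except the first one seen for each hash, i.e. only the extra copies you could remove while keeping one sample of each:

find -type f -exec md5sum '{}' + | sort | awk 'seen[$1]++ { sub(/^[0-9a-f]+  /, ""); print }'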
The script below searches for and removes ALL duplicate files (it does not keep one sample of each duplicate; it deletes them all), so please test it on sample files before real use.
#!/bin/bash
#Filename: rmdups.sh
#Description: Find and remove duplicate files.
#Warning: removes ALL copies of a duplicate; it does not keep one sample.

# List files by size. --time-style=long-iso keeps the size in field 5 and the
# file name in field 8, which the awk program below relies on.
ls -lS --time-style=long-iso | awk 'BEGIN {
    getline; getline;                 # skip the "total" line, read the first file
    name1=$8; size=$5
}
{
    name2=$8;
    if (size==$5) {                   # same size as the previous file: compare checksums
        "md5sum "name1 | getline; csum1=$1;
        "md5sum "name2 | getline; csum2=$1;
        if ( csum1==csum2 ) { print name1; print name2 }
    };
    size=$5; name1=name2;
}' | sort -u > duplicate_files

# One entry per checksum, written as ^name$ patterns.
cat duplicate_files | xargs -I {} md5sum {} | sort | uniq -w 32 | awk '{ print "^"$2"$" }' | sort -u > duplicate_sample

echo Removing..
# duplicate_sample holds patterns rather than plain names, so nothing in it
# matches duplicate_files and every listed duplicate gets removed.
comm duplicate_files duplicate_sample -2 -3 | tee /dev/stderr | xargs rm
echo Removed duplicate files successfully.
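A minimal way to test it before real use is a scratch directory with a few known duplicates (the directory and file names here are only illustrative):

mkdir /tmp/dup-test && cd /tmp/dup-test
echo hello > abc
cp abc abc2
cp abc abc3
echo world > xyz
bash /path/to/rmdups.sh      # adjust the path to wherever you saved the script
cat duplicate_files          # the files the script decided to remove
ls                           # check what actually survived before trusting it on real data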
Very, very useful post. After searching for several concepts, I've just found in your blog post what I was looking for. A bash script to remove duplicate files on the system is exactly what I needed. Thanks for your great help.
I was searching for a free tool which finds and eliminates duplicate photos and files. In my search I came across Duplicate File Finder; it is totally free and useful.
This script does not work! Beware! It is overzealous. I tested it with a sample of 6 files, 4 of which were unique, but it only left 2. The files were all very similar; perhaps some of the MD5s were identical? Anyway, be careful! This script is not safe! Test before use. Maybe it would work for different sorts of files? (I was using simple txt files.)
Cheers,
rusl
OK, the problem is that this script deletes ALL duplicate files and does not leave a single sample of each duplicate (which the comments in the code claim it does).
I did a little more digging and came across the free program fdupes (in the Ubuntu main repo, no less) which does this properly. I just ran fdupes -dN and it did what I wanted. See the fdupes manpage for an explanation.
Hope this helps someone. I only used this script because I was being lazy and it was the first hit on google. It is dangerous because it doesn’t work as advertised. And it is silly because you might as well just install and use fdupes if it works on your system.
Cheers,
reusl
Thanks reusl, good catch. It indeed removes all duplicate files; I have updated the comment in the script.
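For anyone who prefers the fdupes route suggested above, typical invocations look roughly like this (check the fdupes manpage on your system):

fdupes -r .      # recursively list sets of duplicate files under the current directory
fdupes -rd .     # ask which copy to keep in each set and delete the rest
fdupes -rdN .    # keep the first file in each set and delete the rest without prompting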
Remove all the unwanted, duplicate files from your machine. The software is called DuplicateFilesDeleter.
But it's paid software… freeware would be of great help!
Delete duplicate files with ease!
Try DuplicateFilesDeleter program and get rid of duplicate files.
Thank you!
As I said before, it's paid software, so not everyone can afford it…