Find duplicate files

So you know you have a bunch of files with identical content but different file names strewn all over a directory tree. How would you identify the duplicates so that you can work towards eliminating them?

Here is some CLI magic that helps you do so (sourced from CommandLineFu).

find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
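
If the one-liner is hard to parse, here is the same pipeline written out one stage per line (a sketch assuming GNU find and coreutils, with the starting directory "." spelled out explicitly):

find . -not -empty -type f -printf "%s\n" |        # print the byte size of every non-empty file
    sort -rn |                                     # sort the sizes numerically
    uniq -d |                                      # keep only sizes that occur more than once
    xargs -I{} -n1 find . -type f -size {}c -print0 |  # re-find the files having each duplicated size
    xargs -0 md5sum |                              # compute an MD5 hash for each candidate file
    sort |                                         # bring identical hashes next to each other
    uniq -w32 --all-repeated=separate              # keep only repeated hashes, blank line between groups

Hashing only the files whose sizes collide keeps the expensive md5sum step limited to plausible duplicates instead of the whole tree.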

In the output, the first column is the MD5 hash and the second is the file path (relative to the directory where you ran the command). Rows are grouped by files with identical content, and each group is separated from the next by a blank line.
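
For example, the output looks something like this (the hashes and file names below are purely illustrative):

0a1b2c3d4e5f60718293a4b5c6d7e8f9  ./notes/todo.txt
0a1b2c3d4e5f60718293a4b5c6d7e8f9  ./backup/todo-copy.txt

3b5d5c3712955042212316173ccf37be  ./img/logo.png
3b5d5c3712955042212316173ccf37be  ./img/old/logo.png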

Oh, so beautiful! Use it to eliminate duplication, a classic code and test smell, from your project!



