Finding File Duplicates

In the previous tip we explained how the Get-FileHash cmdlet (introduced in PowerShell 4) can generate an MD5 hash for script files.

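Hashes make it easy to find duplicate files, because files with identical content produce the same hash. A hash table keeps track of every hash seen so far: if the hash of a file is already in the table, the file is a duplicate of the one stored there. The code below examines all script files in your user profile and reports duplicate files: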

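# maps each file hash to the first file found with that content
$dict = @{}

# -File skips folders that happen to match the filter
Get-ChildItem -Path $home -Filter *.ps1 -Recurse -File |
    ForEach-Object {
        $hash = ($_ | Get-FileHash -Algorithm MD5).Hash
        if ($dict.ContainsKey($hash))
        {
            # hash was seen before: report the file as a duplicate
            [PSCustomObject]@{
                Original  = $dict[$hash]
                Duplicate = $_.FullName
            }
        }
        else
        {
            # first file with this content: remember its path
            $dict[$hash] = $_.FullName
        }
    } |
    Out-GridView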
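
If you would rather see all files that share the same content listed together, grouping by hash is one possible alternative. The following is only a sketch: the function name Find-DuplicateFile and its parameters are made up for this illustration and are not part of the original tip. It assumes PowerShell 4 or better, because it relies on Get-FileHash.

```powershell
# Hypothetical helper (name and parameters are assumptions for illustration):
# groups files by their MD5 hash and returns only groups with more than one file.
function Find-DuplicateFile
{
    param
    (
        [Parameter(Mandatory)]
        [string]$Path,

        [string]$Filter = '*.ps1'
    )

    # unreadable folders are skipped silently
    Get-ChildItem -Path $Path -Filter $Filter -Recurse -File -ErrorAction SilentlyContinue |
        Get-FileHash -Algorithm MD5 |
        Group-Object -Property Hash |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            [PSCustomObject]@{
                Hash  = $_.Name          # the shared hash
                Count = $_.Count         # number of files with this content
                Files = $_.Group.Path    # full paths of all these files
            }
        }
}
```

You could then call it like this: Find-DuplicateFile -Path $home | Out-GridView. Unlike the hash table approach, this version computes all hashes first and only then emits results, so on large folder trees it may take a while before anything shows up.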
