Improving Group-Object

Oct 10, 2018

In the previous tip, we explained what Group-Object can do for you and how awesome it is. Unfortunately, Group-Object does not scale well: when you try to group a large number of objects, the cmdlet can take a very long time.

Here is a line that groups all files in your user profile by size. This is a useful first step when you want to check for duplicate files, because only files of identical size can be duplicates. While this line will eventually yield results, it may take many minutes or even hours:

$start = Get-Date
$result = Get-ChildItem -Path $home -Recurse -ErrorAction SilentlyContinue -File |
    Group-Object -Property Length
$stop = Get-Date

($stop - $start).TotalSeconds
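Grouping by size only narrows down the candidates; files of equal size can still differ in content. Here is a minimal sketch of how the two steps could be combined, assuming that comparing SHA256 hashes from Get-FileHash is good enough to confirm identical content. The function name Find-DuplicateFile is hypothetical and not part of the original tip. Hashing is the expensive step, so it only runs for files whose size is shared with at least one other file.

```powershell
# Hypothetical helper (not part of the original tip): returns groups of
# files under -Path that have identical content.
function Find-DuplicateFile
{
    param
    (
        [string]
        $Path = $home
    )

    # step 1: only files that share a size can be duplicates, so group by
    #         Length and keep only groups that contain more than one file
    # step 2: hash just those candidates and group them by content hash;
    #         every remaining group with Count > 1 is a set of duplicates
    Get-ChildItem -Path $Path -Recurse -File -ErrorAction SilentlyContinue |
        Group-Object -Property Length |
        Where-Object Count -gt 1 |
        ForEach-Object { $_.Group } |
        Get-FileHash -Algorithm SHA256 |
        Group-Object -Property Hash |
        Where-Object Count -gt 1
}
```

Each returned group has a Count, the shared hash as Name, and the Get-FileHash results (including Path) in Group. For large folders, step 1 is exactly where Group-Object becomes the bottleneck.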

Because Group-Object scales so poorly, we created a PowerShell-based implementation of Group-Object and called it Group-ObjectFast. It does essentially the same thing, just faster.

function Group-ObjectFast
{
    param
    (
        [Parameter(Mandatory,Position=0)]
        [Object]
        $Property,

        [Parameter(ParameterSetName='HashTable')]
        [Alias('AHT')]
        [switch]
        $AsHashTable,

        [Parameter(ValueFromPipeline)]
        [psobject[]]
        $InputObject,

        [switch]
        $NoElement,

        [Parameter(ParameterSetName='HashTable')]
        [switch]
        $AsString,

        [switch]
        $CaseSensitive
    )


    begin 
    {
        # if comparison needs to be case-sensitive, use a plain
        # .NET hash table: its default key comparison is case-sensitive
        if ($CaseSensitive)
        {
            $hash = [System.Collections.Hashtable]::new()
        }
        # else, use a default case-insensitive hash table
        else
        {
            $hash = @{}
        }
    }

    process
    {
        foreach ($element in $InputObject)
        {
            # take the key from the property that was requested
            # via -Property

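            # if the user submitted a script block, evaluate it for the
            # current element: piping to ForEach-Object sets $_ to $element,
            # which also works when objects are passed via -InputObject
            if ($Property -is [ScriptBlock])
            {
                $key = $element | ForEach-Object -Process $Property
            }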
            else
            {
                $key = $element.$Property
            }
            # convert the key into a string if requested
            if ($AsString)
            {
                $key = "$key"
            }
            
            # make sure NULL values turn into empty string keys
            # because NULL keys are illegal
            # ($null goes on the left so array values are not filtered by -eq)
            if ($null -eq $key) { $key = '' }
            
            # if there was already an element with this key previously,
            # add this element to the collection
            if ($hash.ContainsKey($key))
            {
                $null = $hash[$key].Add($element)
            }
            # if this was the first occurrence, add a key to the hash table
            # and store the object inside an arraylist so that objects
            # with the same key can be added later
            else
            {
                $hash[$key] = [System.Collections.ArrayList]@($element)
            }
        }
    }

    end
    {
        # default output are objects with properties
        # Count, Name, Group
        if ($AsHashTable -eq $false)
        {
            foreach ($key in $hash.Keys)
            {
                $content = [Ordered]@{
                    Count = $hash[$key].Count
                    Name = $key
                }
                # include the group only if it was requested
                if ($NoElement -eq $false)
                {
                    $content["Group"] = $hash[$key]
                }
                
                # return the custom object
                [PSCustomObject]$content
            }
        }
        else
        {
            # if a hash table was requested, return the hash table as-is
            $hash
        }
    }
}
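The -CaseSensitive switch works by choosing a different hash table type in the begin block. This short, self-contained check shows the difference between the two types the function uses; the word list is just sample data.

```powershell
$words = 'Apple', 'apple', 'APPLE'

# PowerShell hash table literal: string keys compare case-insensitively
$insensitive = @{}
# .NET Hashtable from the default constructor: string keys compare case-sensitively
$sensitive = [System.Collections.Hashtable]::new()

foreach ($word in $words)
{
    $insensitive[$word] = $true
    $sensitive[$word] = $true
}

"@{} keys: $($insensitive.Count)"                  # 1 key
"[hashtable]::new() keys: $($sensitive.Count)"     # 3 keys
```

So without -CaseSensitive, "Apple" and "apple" end up in the same group, just like with Group-Object's default behavior.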

Simply replace Group-Object with Group-ObjectFast in the sample above, and check how much time it takes:

$start = Get-Date
$result = Get-ChildItem -Path $home -Recurse -ErrorAction SilentlyContinue -File |
    Group-ObjectFast -Property Length
$stop = Get-Date

($stop - $start).TotalSeconds
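Timings from your user profile depend on how many files you have. For a repeatable comparison, here is a sketch that times Group-Object against a bare hash table loop (the same technique Group-ObjectFast uses internally) on synthetic numbers. The data size and key count are arbitrary assumptions, and the actual ratio will vary with your PowerShell version and hardware, so no result is claimed here.

```powershell
# synthetic input: 20,000 integers that fall into 100 distinct keys
$data = foreach ($i in 1..20000) { $i % 100 }

$sw = [System.Diagnostics.Stopwatch]::StartNew()
# built-in cmdlet; without -Property it groups by the value itself
$groupsA = $data | Group-Object
$builtInMs = $sw.ElapsedMilliseconds

$sw.Restart()
# hash table loop: one key lookup per item instead of searching the groups
$groupsB = @{}
foreach ($item in $data)
{
    if ($groupsB.ContainsKey($item))
    {
        $null = $groupsB[$item].Add($item)
    }
    else
    {
        $groupsB[$item] = [System.Collections.ArrayList]@($item)
    }
}
$hashLoopMs = $sw.ElapsedMilliseconds

"Group-Object: $builtInMs ms, hash table loop: $hashLoopMs ms"
```

Both approaches produce 100 groups of 200 items each; only the time needed to build them differs.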

In our tests, Group-ObjectFast was roughly 10 times faster than the original Group-Object.
