Dealing with File Encoding and BOM

by Jul 25, 2018

When you write text content to a file, PowerShell cmdlets let you specify the encoding. Encoding determines how characters are stored, and when special characters appear garbled, this typically indicates that there is an encoding mismatch.

However, some encoding details cannot be controlled via cmdlet parameters. Here is an example that saves a process list to a CSV file:

$Path = "$env:temp\export.csv"

Get-Process | 
  Export-CSV -NoTypeInformation -UseCulture -Encoding UTF8 -Path $Path

You can now open the generated CSV file in Excel or any text editor. When you open the file in Notepad++, the status bar reveals the encoding: UTF-8-BOM.

The PowerShell code that generated the file specified the UTF8 encoding, so this part is as expected. BOM stands for "Byte Order Mark". When used, it adds a short, fixed byte sequence (EF BB BF for UTF-8) to the beginning of the file so that programs can detect the encoding automatically.
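To see what the BOM actually consists of, you can ask the .NET encoding object for its "preamble", which is the byte sequence it writes ahead of the content. This is only a small sketch; the variable names are chosen for illustration:

```powershell
# $True = encoder emits a BOM, $False = encoder emits none
$WithBom = New-Object System.Text.UTF8Encoding $True
$NoBom = New-Object System.Text.UTF8Encoding $False

# GetPreamble() returns the bytes written ahead of the actual content
$BomHex = ($WithBom.GetPreamble() | ForEach-Object { $_.ToString('X2') }) -join ' '
$NoBomLength = $NoBom.GetPreamble().Length

"UTF-8 BOM bytes: $BomHex"
"Preamble length without BOM: $NoBomLength"
```

The first line of output lists the bytes EF BB BF, and the second reports a length of 0, because an encoder created with $False writes no preamble at all.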

Some editors and data processing systems cannot deal with a BOM, though. To remove the BOM and write plain UTF-8, use PowerShell code like this:

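function Remove-BomFromFile($OldPath, $NewPath)
{
  # read the entire file as one string (the BOM lets Get-Content detect UTF-8)
  $Content = Get-Content -Path $OldPath -Raw
  # $False = do not emit a BOM when writing
  $Utf8NoBomEncoding = New-Object System.Text.UTF8Encoding $False
  # WriteAllText writes the string unchanged; WriteAllLines would append an extra line break
  [System.IO.File]::WriteAllText($NewPath, $Content, $Utf8NoBomEncoding)
}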

So to convert the UTF-8-BOM file from the example above into plain UTF-8, run this:

$Path = "$env:temp\export.csv"
$NewPath = "$env:temp\export_new.csv"
Remove-BomFromFile -OldPath $Path -NewPath $NewPath
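To confirm that the conversion worked, you can inspect the first three bytes of each file. The helper below is a minimal sketch; Test-Utf8Bom is a made-up name, not a built-in cmdlet, and it only checks for the UTF-8 signature EF BB BF:

```powershell
function Test-Utf8Bom($Path)
{
  # resolve to a full path because .NET methods ignore the current PowerShell location
  $FullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
  $Stream = [System.IO.File]::OpenRead($FullPath)
  try
  {
    # read up to three bytes from the start of the file
    $Buffer = New-Object byte[] 3
    $Count = $Stream.Read($Buffer, 0, 3)
  }
  finally
  {
    $Stream.Dispose()
  }
  # a UTF-8 BOM is exactly the byte sequence EF BB BF
  $Count -eq 3 -and $Buffer[0] -eq 0xEF -and $Buffer[1] -eq 0xBB -and $Buffer[2] -eq 0xBF
}
```

Assuming the two files from the example exist, call it like this:

Test-Utf8Bom -Path "$env:temp\export.csv"
Test-Utf8Bom -Path "$env:temp\export_new.csv"

For a file that Notepad++ reports as UTF-8-BOM, the first call should return True. The second call should return False for the converted file.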
