The Trouble With File Encoding

One of the most frustrating things I deal with on a regular basis is file encoding, usually when exporting data to a file for ingestion into another application.

I understand the need for different character encodings, given the vast differences between languages and the technical hurdle of avoiding a single massive code page that slows everything down trying to decode every possible character.

My understanding, however, does nothing to curb my frustration.

I’m currently working on a project where I’m using PowerShell to pull some data from SQL and create a file that’s consumed by a third-party application. My first pass simply used the default PowerShell file encoding on an Out-File operation, like this:

$Output | Out-File "\\share\directory\filename"

What I ran into was that when my third-party application attempted to ingest the file, it ignored it completely and fell back to its own defaults. As it turns out, Out-File in Windows PowerShell defaults to Unicode (UTF-16 LE), and my application was expecting a UTF-8 encoded file.
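You can see this for yourself by peeking at the first couple of bytes of the exported file; the path here is just the placeholder from the example above:

# Read the first two bytes of the exported file. FF FE is the UTF-16 LE
# byte order mark, which is what "Unicode" means to Out-File.
$bytes = [System.IO.File]::ReadAllBytes("\\share\directory\filename")
"{0:X2} {1:X2}" -f $bytes[0], $bytes[1]   # prints FF FE for a default Out-File export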

Now of course in my mind the perfect way to resolve the problem is that the application should handle whatever file encoding it gets and happily load it. Since we don’t live in a perfect world, and since I don’t have control over this third-party application, I have to find another way.

Of course there are a few options out there to determine the file encoding (a quick Google search brought me to http://poshcode.org/2153), and then I can modify my Out-File command like so:

$Output | Out-File "\\share\directory\filename" -Encoding UTF8
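For reference, scripts like the one linked above generally work by sniffing the byte order mark at the start of the file. Here’s a minimal sketch of that idea; this is my own cut-down version rather than the poshcode script, and it only checks the common BOMs, returning strings that happen to line up with Out-File’s -Encoding parameter:

function Get-FileEncoding {
    param([string]$Path)
    $bytes = [System.IO.File]::ReadAllBytes($Path)
    # Check the well-known byte order marks; files without a BOM
    # are reported as ASCII, which is a simplification.
    if ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF) { return 'UTF8' }
    if ($bytes.Length -ge 2 -and $bytes[0] -eq 0xFF -and $bytes[1] -eq 0xFE) { return 'Unicode' }            # UTF-16 LE
    if ($bytes.Length -ge 2 -and $bytes[0] -eq 0xFE -and $bytes[1] -eq 0xFF) { return 'BigEndianUnicode' }   # UTF-16 BE
    return 'ASCII'
}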

There really must be a better way; my export could be smarter and match the encoding of the target file if it already exists, or something to that effect.
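Until I actually fix it properly, here’s a rough sketch of what “smarter” might look like, reusing the Get-FileEncoding sketch from above and assuming UTF-8 as the fallback when the target doesn’t exist yet (since that’s what my consumer wants):

$target = "\\share\directory\filename"
# Match the existing file's encoding if there is one; otherwise assume UTF-8.
$encoding = if (Test-Path $target) { Get-FileEncoding $target } else { 'UTF8' }
$Output | Out-File $target -Encoding $encoding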

I’m going to leave this post here as a TODO: Fix this problem with PowerShell.
