I recently encountered a scenario where I needed to use PowerShell to select columns from CSV files and output the results to a new set of files. This was necessary because an additional column was accidentally introduced to CSV files that were being loaded hourly with SSIS. When the additional column appeared, SSIS began to choke on the files, resulting in package failures. The hourly process actually continued working, but a couple dozen CSV source files needed to be modified before they could be loaded into the data warehouse. The diagram above shows the required modification.
PowerShell to the Rescue
PowerShell is my favorite Business Intelligence tool that is not a Business Intelligence tool. One area where it really shines is in working with text files. I’ve commented each part of the script below to show how it works.
In this case I used the script to modify a set of CSV files so that they matched the rest of their siblings. It could also be used to simply remove unnecessary columns from a set of CSV files.