Chris Koester

Modern data platform and engineering in Azure


CSV

Combine CSV Files without Duplicating Headings using C#

By: Chris Koester
On: 2017-01-27
In: Data Integration
With: 7 Comments

The simple script below shows how to combine CSV files without duplicating headings using C#. This technique assumes that all of the files have the same structure and that each of them contains a header row. The timings in this post came from combining 8 CSV files with 13 columns and a combined total of 9.2 million rows. I first tried combining the files with the PowerShell technique described here. It was painfully slow and took an hour and a half! This is likely because that technique deserializes and then reserializes every value in the files, which adds a lot of unnecessary overhead. Next… Read More →
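The idea in the excerpt above can be sketched in a few lines. This is a minimal Python illustration, not the C# script from the full post: the function name `combine_csv_files` and its parameters are hypothetical, and it assumes UTF-8 files whose header occupies exactly one physical line. Lines are copied as raw text rather than parsed, which is the kind of overhead the excerpt suggests avoiding.

```python
def combine_csv_files(source_paths, destination_path):
    """Concatenate CSV files, keeping only the first file's header row.

    Assumes every file shares the same structure and starts with a
    single-line header (hypothetical helper, not the post's C# code).
    """
    # newline="" keeps the original line endings untouched on read and write.
    with open(destination_path, "w", encoding="utf-8", newline="") as destination:
        for file_index, source_path in enumerate(source_paths):
            with open(source_path, "r", encoding="utf-8", newline="") as source:
                for line_number, line in enumerate(source):
                    # Skip the header of every file except the first one.
                    if line_number == 0 and file_index > 0:
                        continue
                    # If a file lacks a trailing newline, add one so the next
                    # file's first data row does not fuse with this row.
                    if not line.endswith(("\n", "\r")):
                        line += "\n"
                    destination.write(line)
```

Calling `combine_csv_files(["a.csv", "b.csv"], "combined.csv")` with placeholder file names would write one header followed by the data rows of both files. Because rows are never split into fields, a header containing a quoted line break would not be handled correctly.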

Select columns from multiple CSV files

Use PowerShell to Select Columns from CSV Files

By: Chris Koester
On: 2015-12-08
In: Data Integration, PowerShell
With: 0 Comments

I recently encountered a scenario where I needed to use PowerShell to select columns from CSV files and output the results to a new set of files. This was necessary because an additional column was accidentally introduced to CSV files that were being loaded hourly with SSIS. When the additional column appeared, SSIS began to choke on the files, resulting in package failures. The hourly process actually continued working, but a couple dozen CSV source files needed to be modified before they could be loaded into the data warehouse. The diagram above shows the required modification.

PowerShell to the Rescue

PowerShell is my favorite Business Intelligence tool that is… Read More →
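The column-selection step described in this excerpt can be illustrated with a short sketch. It is written in Python with the standard `csv` module rather than the PowerShell used in the full post, and the function name `select_columns` and its parameters are hypothetical. It assumes each file has a header row and that the wanted columns are identified by name.

```python
import csv


def select_columns(source_path, destination_path, columns_to_keep):
    """Copy a CSV file, writing only the named columns to a new file.

    Hypothetical helper for illustration; column names are supplied by
    the caller and must appear in the source file's header row.
    """
    with open(source_path, "r", encoding="utf-8", newline="") as source:
        reader = csv.DictReader(source)
        header = reader.fieldnames or []
        missing = [name for name in columns_to_keep if name not in header]
        if missing:
            raise ValueError(f"Columns not found in {source_path}: {missing}")
        with open(destination_path, "w", encoding="utf-8", newline="") as destination:
            # extrasaction="ignore" silently drops columns that are not kept,
            # such as an accidentally introduced extra column.
            writer = csv.DictWriter(
                destination, fieldnames=columns_to_keep, extrasaction="ignore"
            )
            writer.writeheader()
            for row in reader:
                writer.writerow(row)
```

To process a couple dozen files, the function could be called in a loop over the source folder, writing each result to a separate output folder so the originals stay untouched.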
