Chris Koester

Modern data platform and engineering in Azure

text

Read the Top N Rows from Large Text Files with C#

By: Chris Koester
On: 2017-03-17
In: Data Integration
With: 0 Comments

This post describes one way to read the top N rows from large text files with C#. This is very useful when you are working with giant files that are too big to open, but you need to view a portion of them to determine the schema, data types, etc. I’ve used PowerShell many times to do this with large CSV files, but in this example we’re going to use C# and look at the Wikipedia XML dump of pages and articles. The 2017-03-01 dump is very large, coming in at 59.5 GB. The script is short and simple. All it does is read… Read More →
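As a rough illustration of the idea, here is a minimal sketch of reading only the first N lines of a large file. It is not the script from the full post (the excerpt above is truncated); it assumes `File.ReadLines`, which streams the file lazily, and the class name `TopRows`, the method name `ReadTopLines`, and the command-line arguments are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class TopRows
{
    // Lazily yields at most `count` lines from the start of the file.
    // File.ReadLines streams the file rather than loading it whole,
    // so only the lines actually consumed are read from disk.
    public static IEnumerable<string> ReadTopLines(string path, int count)
    {
        return File.ReadLines(path).Take(count);
    }

    public static void Main(string[] args)
    {
        // Hypothetical arguments: the input path and the number of rows to show.
        if (args.Length < 2 || !int.TryParse(args[1], out int count))
        {
            Console.Error.WriteLine("Usage: TopRows <path> <rowCount>");
            return;
        }

        foreach (string line in ReadTopLines(args[0], count))
        {
            Console.WriteLine(line);
        }
    }
}
```

Because the enumeration stops after N lines, the size of the file should not matter much; the same call works whether the input is a few kilobytes or many gigabytes.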

Select columns from multiple CSV files

Use PowerShell to Select Columns from CSV Files

By: Chris Koester
On: 2015-12-08
In: Data Integration, PowerShell
With: 0 Comments

I recently encountered a scenario where I needed to use PowerShell to select columns from CSV files and output the results to a new set of files. This was necessary because an additional column was accidentally introduced into CSV files that were being loaded hourly with SSIS. When the additional column appeared, SSIS began to choke on the files, resulting in package failures. The hourly process actually continued working, but a couple dozen CSV source files needed to be modified before they could be loaded into the data warehouse. The diagram above shows the required modification.

PowerShell to the Rescue

PowerShell is my favorite Business Intelligence tool that is… Read More →
