Chris Koester

Modern data platform and engineering in Azure

xml

Read the Top N Rows from Large Text Files with C#


By: Chris Koester
On: 2017-03-17
In: Data Integration

This post describes one way to read the top N rows from large text files with C#. This is useful when a file is too big to open in an editor, but you need to view a portion of it to determine the schema, data types, etc. I’ve used PowerShell many times to do this with large CSV files, but in this example we’re going to use C# and look at the Wikipedia XML dump of pages and articles. The 2017-03-01 dump comes in at 59.5 GB. The script is short and simple. All it does is read…

Query XML on the Web with C# and SSIS


By: Chris Koester
On: 2016-05-05
In: Data Integration

This post describes how to query XML on the web with C# and SSIS. The emphasis is on the C# code, as I assume the reader is somewhat familiar with the SSIS Script Task. To make the example easy for anyone to try, there is no authentication step. If you’re retrieving XML values from an API, there’s a good chance that you will need to authenticate first. I recently worked on a data integration project where I had to perform two steps involving a REST API in order to download web analytics data. To download data in bulk from this API, you first…
