Generate Big Datasets with Hive in HDInsight

Update 2020-05-29: Generate Big Datasets with Databricks I authored a post on the BlueGranite site on how to generate big datasets with Databricks. This is a more modern approach to generating the same TPC-DS benchmark datasets, and I recommend it over the HDInsight solution described below. The TPC (Transaction Processing Performance Council) provides tools for generating the benchmarking data, but using them to generate big data is not trivial, and would take a very long time on modest hardware....

March 21, 2019 · Chris Koester

Delete SSAS Tabular Partitions with C#

This post shows how to delete SSAS Tabular partitions with C#. This is an improvement over a PowerShell script I previously wrote about, Delete SSAS Partitions with TMSL and PowerShell, because it is more dynamic and doesn’t require a manually-authored TMSL script. Specifically, the code below deletes all partitions that contain a specified suffix, which is useful for quickly deleting all partitions used for development purposes. I’ve worked with SSAS models that required several development partitions in order to prevent the workspace database from growing too large....

May 14, 2018 · Chris Koester

Retrieve JSON Data from SQL Server using a Stored Procedure and C#

The previous post demonstrated how to load JSON into SQL Server using a stored procedure and C#. This post will examine the opposite - how to retrieve JSON data from SQL Server using a stored procedure and C#. Just like with the JSON loading technique, the new JSON functions in SQL Server 2016 allow for this retrieval technique. Here’s how it works in short: Data stored in a database table will be converted to JSON text in the stored procedure and returned to the C# client via an output parameter....

March 25, 2018 · Chris Koester

Load JSON into SQL Server Using a Stored Procedure and C#

This post will demonstrate how to load JSON into SQL Server using a stored procedure and C#. This technique is possible with the new JSON functions starting in SQL Server 2016, and is nice for the following reasons: Simplifies application code. Just serialize the C# objects to JSON and send them to the database. Executing a stored procedure with C# is easily done with a small amount of code. JSON is widely used and familiar to developers....

March 21, 2018 · Chris Koester

Push Performance Counter Data into a Power BI Streaming Dataset

In the previous post I showed how you can Push Data into Power BI Streaming Datasets with C#. That example used dummy data. In this post I’ll show how to push performance counter data into a Power BI Streaming Dataset as a real world example. This scenario allows for monitoring a computer or application in near real time in the browser. I won’t go through the steps of creating a Power BI Streaming Dataset....

December 27, 2017 · Chris Koester

Push Data into Power BI Streaming Datasets with C#

This post will demonstrate how to push data into Power BI Streaming Datasets with C#. For demo purposes I normally use LINQPad to run the code, but you could also create a .Net or .Net Core console application. LINQPad is an excellent, lightweight scratchpad for C# and other .Net languages. Power BI Streaming Datasets are a very cool feature because dashboard tiles that use them update in real time. You don’t have to refresh the browser window to display new data....

November 5, 2017 · Chris Koester

Process SSAS Tabular Tables and Partitions with TMSL

This post shows how to process SSAS Tabular tables and partitions with TMSL. TMSL stands for Tabular Model Scripting Language, and it is used for working with SSAS Tabular databases programmatically starting with compatibility level 1200 (SQL Server 2016). Multidimensional models and previous versions of SQL Server all use Analysis Services Scripting Language (ASSL for XMLA). A common use for TMSL is automating the processing of tabular models using the SQL Agent, SSIS, PowerShell, or some other application....

June 21, 2017 · Chris Koester

Download JSON Data with SSIS

SSIS does not have a built in source component for JSON data, so you’ll have to use the Script Component. When you drag the Script Component into a data flow, you’ll be asked how you want to use it. In this case you’ll use it as a source. Define Output Columns Once the Script Component has been defined as a source, the output columns can be defined. For this post, USGS Earthquake data will serve as an example....

June 6, 2017 · Chris Koester

Call an Azure Function from Power BI

This post shows how to call an Azure Function from Power BI. This scenario offers some interesting possibilities, because it allows for integrating the power of several common programming languages with Power BI and Power Query. If you’re not familiar with Azure Functions, it’s a service that allows you to run small pieces of code without having to deal with a server. In this example, we’ll write a query in Power BI that submits a time zone to an Azure Function....

March 28, 2017 · Chris Koester

Read the Top N Rows from Large Text Files with C#

This post describes one way that you can read the top N rows from large text files with C#. This is very useful when working with giant files that are too big to open, but you need to view a portion of them to determine the schema, data types, etc. I’ve used PowerShell many times to do this with large csv files, but in this example we’re going to use C# and look at the Wikipedia XML dump of pages and articles....

March 17, 2017 · Chris Koester