This post describes how to query XML on the web with C# and SSIS. The emphasis is on the C# code, as I assume the reader is somewhat familiar with the SSIS Script Task. And in order to allow anyone to easily try this example, there is no authentication step. If you’re retrieving XML values from an API, there’s a good chance that you will need to authenticate first.
I recently worked on a data integration project where I had to perform two steps involving a REST API in order to download web analytics data. In order to download data in bulk from this API, you first have to know the aggregate numbers for the period that you’re interested in. For example, if there are 4500 website visitors on a particular day, the number 4500 is part of the query for downloading the bulk data.
The first step in this process is to get the aggregate numbers for the desired period, and as indicated in the title, these numbers are contained in XML. Here’s what it looks like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
<?xml version="1.0" encoding="UTF-8"?> <response status="ok"> <type type="visitors"> <date date="2016-05-03"> <item> <value>4500</value> </item> </date> </type> <type type="actions"> <date date="2016-05-03"> <item> <value>21000</value> </item> </date> </type> </response> |
The values that we need to obtain are 4500 and 21000 (Visitors and actions respectively). For the purpose of this exercise, we won’t worry about the dates.
Test the Code in LINQPad
In my project, I’m using a C# script task in SSIS to obtain these values. However, I’d suggest starting with LINQPad. LINQPad is essentially a C# scratchpad, and it allows you to test code quickly without having to create a project in Visual Studio/SQL Server Data Tools.
Once you’ve opened LINQPad, select C# Program from the Language drop down.
Then navigate to Query, Namespace Imports via the menu (Or press F4 as a shortcut). On the Additional Namespace Imports tab, add System.Net as shown below. If you’re not familiar with C#, this allows us to use the WebClient Class in our code without having to prefix it with the namespace each time.
Before getting to the code, the paranoid may want to check my text file to make sure it matches the XML above.
C# Code
Now comes the fun part where we run the code. Copy and paste the following C# code into LINQPad.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
public class Program { static void Main() { string xmlUrl = "https://chris.koester.io/demofiles/WebAnalyticsAggregateData.xml"; // Call method that retrieves xml values from the web int[] tallies = GetDailyTallies(xmlUrl); Console.WriteLine("Visitor tally: " + tallies[0]); Console.WriteLine("Action tally: " + tallies[1]); } // Retrieves values from xml on the web public static int[] GetDailyTallies(string url) { // Create web client and XML document using (WebClient client = new WebClient()) { string tallyXml = client.DownloadString(url); XmlDocument xml = new XmlDocument(); xml.LoadXml(tallyXml); // Use XPath query to obtain values from the xml document int visitorTally = Int32.Parse(xml.SelectSingleNode("/response/type[@type='visitors']/date/item/value").InnerText); int actionTally = Int32.Parse(xml.SelectSingleNode("/response/type[@type='actions']/date/item/value").InnerText); // Return values in an array int[] array = new int[2] {visitorTally, actionTally}; return array; } } } |
If everything works, you should see the values 4500 and 21000 displayed in the LINQPad console.
Move to SSIS Script Task
Once the code is working as desired, it can be moved to an SSIS Script Task. In my SSIS project, these values are used in a subsequent query that downloads data in bulk from the same REST API. The values could also be assigned to a SSIS variable and used in SSIS tasks.
In an SSIS script task, the System.Xml namespace needs to be included in addition to System.Net.
1 2 |
using System.Net; using System.Xml; |