This post describes how to query XML on the web with C# and SSIS. The emphasis is on the C# code, as I assume the reader is somewhat familiar with the SSIS Script Task. And in order to allow anyone to easily try this example, there is no authentication step. If you’re retrieving XML values from an API, there’s a good chance that you will need to authenticate first.

I recently worked on a data integration project where I had to perform two steps involving a REST API in order to download web analytics data. In order to download data in bulk from this API, you first have to know the aggregate numbers for the period that you’re interested in. For example, if there are 4500 website visitors on a particular day, the number 4500 is part of the query for downloading the bulk data.

The first step in this process is to get the aggregate numbers for the desired period, and as indicated in the title, these numbers are contained in XML. Here’s what it looks like:

<?xml version="1.0" encoding="UTF-8"?>
<response status="ok">
   <type type="visitors">
      <date date="2016-05-03">
         <item>
            <value>4500</value>
         </item>
      </date>
   </type>
   <type type="actions">
      <date date="2016-05-03">
         <item>
            <value>21000</value>
         </item>
      </date>
   </type>
</response>

The values that we need to obtain are 4500 and 21000 (Visitors and actions respectively). For the purpose of this exercise, we won’t worry about the dates.

C# Code

Below is the C# code for obtaining the desired values from the XML above. I used LINQPad to test it.

using System.Net;
using System.Xml;

public class Program
{
	static void Main()
	{
	string xmlUrl = "https://chris.koester.io/index.php/2016/05/05/query-xml-web-with-csharp-and-ssis/WebAnalyticsAggregateData.xml";
	
	// Call method that retrieves xml values from the web
	int[] tallies = GetDailyTallies(xmlUrl);
	
	Console.WriteLine("Visitor tally: " + tallies[0]);
	Console.WriteLine("Action tally: " + tallies[1]);
	}
	
	// Retrieves values from xml on the web
    public static int[] GetDailyTallies(string url)
    {
        // Create web client and XML document
        using (WebClient client = new WebClient())
		{
	        string tallyXml = client.DownloadString(url);
	        XmlDocument xml = new XmlDocument();
	        xml.LoadXml(tallyXml);
	
	        // Use XPath query to obtain values from the xml document
	        int visitorTally = Int32.Parse(xml.SelectSingleNode("/response/type[@type='visitors']/date/item/value").InnerText);
	        int actionTally = Int32.Parse(xml.SelectSingleNode("/response/type[@type='actions']/date/item/value").InnerText);
			
			// Return values in an array
			int[] array = new int[2] {visitorTally, actionTally};
	      return array;
		}
    }
}

If everything works, you should see the values 4500 and 21000 written to the console.

Console output of C# code that gets values from XML

Move to SSIS Script Task

Once the code is working as desired, it can be moved to an SSIS Script Task. In my SSIS project, these values are used in a subsequent query that downloads data in bulk from the same REST API. The values could also be assigned to a SSIS variable and used in SSIS tasks.