T-SQL Tuesday #95 – Unstructured Cosmos

T-SQL Tuesday (#TSQL2sDAY) is a blog party founded by Adam Machanic (b | t). Each month a member of the community hosts the party and selects a topic for us to write about. This month, I am hosting.

I invited everyone to write about Big Data. For my post, I chose to learn about Microsoft Azure Cosmos DB. Cosmos DB is a globally distributed document database service, hosted in Microsoft Azure. For my project, I am going to explore Cosmos DB as a store for unstructured internet of things data in the form of vending machine supply data.

Case study – SQL Snacks, Inc.

SQL Snacks, Inc. is a small vending machine maintenance and stocking service. Their machines are internet connected, accept credit cards and Android Pay, and transmit telemetry and transaction data to headquarters. It is important for SQL Snacks to respond quickly to machines in need of restocking, and they also intend to add predictive analysis to their service catalog in the future.

The full project code and example documents are available for download here.

Cosmos DB

Cosmos DB is a globally distributed database capable of storing graph data, table data, and document data in the DocumentDB and MongoDB models. It provides elastic scale and five separate consistency levels.

SQL Snacks embraced Cosmos DB with the future in mind. The north-east region is their bread and butter but an attractive merger opportunity exists on the west coast. They are looking to the future, when their vending machine count quintuples and geographic latency becomes a concern.

SQL Snacks is a .NET shop and selected the Microsoft DocumentDB API for this project.

Creating the database

Getting started with Cosmos DB is very simple. From the Azure Resource Manager portal, create a new Azure Cosmos DB service.

Next, create the database.

Within a database you can have many collections. SQL Snacks is going to work completely out of the sqlsnacks database and will create collections for logical separation of documents. To start, there will be transactions and vendingmachines collections. Each collection has its own size and request unit (RU) rates.
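Although I used the portal, the database and collections can also be provisioned from code. Below is a minimal sketch using the DocumentDB SDK's idempotent create methods. It assumes the Constants class shown later in this post; the class and method names here are illustrative, not the project's actual provisioning code.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;

public static class Provisioning
{
    public static async Task CreateDatabaseAndCollections()
    {
        using (DocumentClient client = new DocumentClient(new Uri(Constants.EndpointUrl), Constants.PrimaryKey))
        {
            // Idempotent: these calls only create the objects if they are missing.
            await client.CreateDatabaseIfNotExistsAsync(new Database { Id = Constants.DatabaseName });

            Uri dbUri = UriFactory.CreateDatabaseUri(Constants.DatabaseName);
            await client.CreateDocumentCollectionIfNotExistsAsync(dbUri, new DocumentCollection { Id = Constants.TransactionCollection });
            await client.CreateDocumentCollectionIfNotExistsAsync(dbUri, new DocumentCollection { Id = Constants.VendingMachineCollection });
        }
    }
}
```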

SQL Snacks documents

During the proof of concept phase, maintenance data was out of scope. Below are the documents used for identifying vending machines and sales transactions.

Example VendingMachine document

The VendingMachine document describes everything about the vending machine. Model, tray counts, and maximum inventory data is stored along with the actual inventory of each tray.

The root document properties below, prefixed with “_”, are appended automatically by Cosmos DB and are mandatory. I simply ignored them throughout the proof of concept.

  • _rid
  • _self
  • _etag
  • _attachments
  • _ts

Document

{
    "FriendlyName": "Machine80",
    "Items": [
        {
            "Name": "Cheetos",
            "Quantity": 9,
            "TrayId": 0,
            "id": "2af25658-1250-4b22-997b-55c2eaf5a2e9"
        },
        {
            "Name": "Snickers",
            "Quantity": 4,
            "TrayId": 1,
            "id": "57955578-6a37-4c23-ab84-d5dc8f9ec110"
        }
    ],
    "Location": {
        "StreetAddress": "391 Redmond Way",
        "PostalCode": "06001",
        "id": "f58d76f6-cf7c-46a2-8d20-4d9adb85406a"
    },
    "IsOutOfOrder": false,
    "VendingMachineModel": {
        "ModelCode": "XJ-252",
        "TrayCount": 6,
        "MaxItemsPerTray": 12,
        "id": "846f4943-57f3-4279-9091-ae07d6127b80"
    },
    "id": "0d892997-7b1e-4825-8410-1a4489dfba9b",
    "_rid": "mzBLAP6R5wArAQAAAAAAAA==",
    "_self": "dbs/mzBLAA==/colls/mzBLAP6R5wA=/docs/mzBLAP6R5wArAQAAAAAAAA==/",
    "_etag": "\"2101d296-0000-0000-0000-59d9008a0000\"",
    "_attachments": "attachments/",
    "_ts": 1507393674
}

Example transaction document

The transaction documents describe the modifications of the inventory and when each transaction occurred. A transaction is typically created at the same time the vending machine's document is updated. Some of the information is duplicated so that the transaction and vendingmachine documents do not need to be joined to perform analysis.

The VendingMachine document is nested in the Transaction to identify where the transaction occurred and the state of the machine when the transaction occurred. For brevity, I truncated the VendingMachine object from this example, as it has already been displayed above.

Document

The numeric Action property relates to an enum in our C# application, seen in the source code. In this example, 2 = Decrement.

{
    "Timestamp_utc": "2017-10-02T17:47:44.3481316Z",
    "Action": 2,
    "Item": {
        "Name": "Skittles",
        "Quantity": 1,
        "TrayId": 5,
        "id": "eb352d70-67b8-4810-9828-9c2a88ea9ca2"
    },
    "VendingMachine": { ... },
    "id": "3b0faffa-e473-4c95-89ab-b8838a92c96c",
    "_rid": "mzBLANl7JQDTCQAAAAAAAA==",
    "_self": "dbs/mzBLAA==/colls/mzBLANl7JQA=/docs/mzBLANl7JQDTCQAAAAAAAA==/",
    "_etag": "\"55016677-0000-0000-0000-59d8fd790000\"",
    "_attachments": "attachments/",
    "_ts": 1507392889
}
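The Action enum itself lives in the downloadable source code. As a hypothetical sketch of its shape, only the 2 = Decrement mapping is confirmed in this post; the other members and values below are illustrative assumptions.

```csharp
// Hypothetical sketch of the transaction Action enum.
// Only Decrement = 2 is confirmed by this post; the other
// members and values are illustrative assumptions.
public enum TransactionAction
{
    Restock = 0,    // assumption
    Increment = 1,  // assumption
    Decrement = 2   // confirmed: inventory reduced by a sale
}
```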

Working with the DocumentDB API

To start I created a Visual Studio 2015 project and installed the DocumentDB SDK. You can download it manually here or use the Manage NuGet Packages wizard in Visual Studio.

All of the DocumentDB API calls are asynchronous but are quite easy to use. The endpoint URL, primary key, database name, and collection are all needed to instantiate the DocumentClient, which is required for all calls.

namespace SQLSnacksInventory.Global
{
    public static class Constants
    {
        public const string EndpointUrl = "https://sqlsnacks.documents.azure.com:443/";
        public const string PrimaryKey = "3NiM8pITFa1UYp5PM6dU81tnU-my-made-up-keyNHbkWgeJhGtYUg43Fz1LgbJwU5LI2vQonw==";
        public const string DatabaseName = "sqlsnacks";
        public const string TransactionCollection = "transactions";
        public const string VendingMachineCollection = "vendingmachines";
        public const int MaxVendingMachinesPerLocation = 10;
    }
}

Wrap all of your DocumentDB API calls in a using statement which instantiates the DocumentClient object.

using (DocumentClient client = new DocumentClient(new Uri(Constants.EndpointUrl), Constants.PrimaryKey))
{
    // DocumentDB API calls go here
}

Document gets

Retrieving documents can be accomplished via document Id or a query. Gets by Id are accomplished with ReadDocumentAsync and queries with CreateDocumentQuery.

ReadDocumentAsync

The first use of ReadDocumentAsync in this project came when creating a document. It is very useful for checking for document existence. If the document does not exist, ReadDocumentAsync will throw a DocumentClientException with an HttpStatusCode.NotFound StatusCode. In the method below, creation of a new document only occurs if the document was not found.

private static async Task CreateDocumentIfNotExists(JSONDocument doc, string collectionname)
{
    using (DocumentClient client = new DocumentClient(new Uri(Constants.EndpointUrl), Constants.PrimaryKey))
    {
        try
        {
            await client.ReadDocumentAsync(UriFactory.CreateDocumentUri(Constants.DatabaseName, collectionname, doc.Id));
        }
        catch (DocumentClientException de)
        {
            if (de.StatusCode == HttpStatusCode.NotFound)
            {
                // Creation code removed for this part of the blog post
            }
            else
            {
                throw;
            }
        }
    }
}

CreateDocumentQuery

Querying Cosmos DB is more powerful and versatile. The CreateDocumentQuery method is used to create an IQueryable<T> object, a member of System.Linq, which can output the query results. The ToList() method will output a List<T> object from the System.Collections.Generic namespace.

private static List<VendingMachine> GetVendingMachineByQuery(string queryString)
{
    FeedOptions queryOptions = new FeedOptions();

    using (DocumentClient client = new DocumentClient(new Uri(Constants.EndpointUrl), Constants.PrimaryKey))
    {
        IQueryable<VendingMachine> query = client.CreateDocumentQuery<VendingMachine>(
            UriFactory.CreateDocumentCollectionUri(Constants.DatabaseName, Constants.VendingMachineCollection),
            queryString,
            queryOptions);

        return query.ToList();
    }
}

DocumentDB implements a limited subset of the SQL standard for querying documents. There is a fun query playground available here to learn the syntax and quirks of querying DocumentDB.

I like to create a generic method, such as above, and then create additional methods for various queries and parameter sets. The below methods will return a list of vendingmachine documents by searching the FriendlyName property.

public static VendingMachine GetFirstVendingMachine(string friendlyName)
{
    return GetVendingMachine(friendlyName).FirstOrDefault();
}

public static List<VendingMachine> GetVendingMachine(string friendlyName)
{
    // Note: with user-supplied input, prefer a parameterized SqlQuerySpec
    // over string formatting to avoid query injection.
    string queryString = String.Format("SELECT * FROM VendingMachine WHERE VendingMachine.FriendlyName = '{0}'", friendlyName);
    return GetVendingMachineByQuery(queryString);
}

Document creation

Document creation is as simple as passing an object into the CreateDocumentAsync method. The CreateDocumentIfNotExists method below checks for the document’s existence by Id and then creates the document if it does not exist. It accepts a parent object I created called JSONDocument so that other methods can accept child objects with different collection destinations. The JSONDocument object has a couple of JSON properties which are required for all the objects. Dive into the source code to see them.

public static async Task CreateVendingMachineDocumentIfNotExists(VendingMachine vendingMachine)
{
    await CreateDocumentIfNotExists(vendingMachine, Constants.VendingMachineCollection);
}

public static async Task CreateTransactionDocumentIfNotExists(Transaction transaction)
{
    await CreateDocumentIfNotExists(transaction, Constants.TransactionCollection);
}

private static async Task CreateDocumentIfNotExists(JSONDocument doc, string collectionname)
{
    using (DocumentClient client = new DocumentClient(new Uri(Constants.EndpointUrl), Constants.PrimaryKey))
    {
        try
        {
            await client.ReadDocumentAsync(UriFactory.CreateDocumentUri(Constants.DatabaseName, collectionname, doc.Id));
        }
        catch (DocumentClientException de)
        {
            if (de.StatusCode == HttpStatusCode.NotFound)
            {
                // An awaited call surfaces DocumentClientException directly,
                // so no AggregateException handling is needed here.
                await client.CreateDocumentAsync(UriFactory.CreateDocumentCollectionUri(Constants.DatabaseName, collectionname), doc);
            }
            else
            {
                throw;
            }
        }
    }
}
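The real JSONDocument class is in the downloadable project; below is a minimal sketch of what such a base class could look like, assuming Json.NET attributes. The key point is that every Cosmos DB document requires an id property, so the base class exposes it for all child types.

```csharp
using Newtonsoft.Json;

// Hypothetical sketch of the JSONDocument parent object; the real
// definition is in the downloadable project source.
public abstract class JSONDocument
{
    // Cosmos DB requires a lowercase "id" property on every document.
    [JsonProperty(PropertyName = "id")]
    public string Id { get; set; }
}

// Child types share the Id contract and can target different collections.
public class VendingMachine : JSONDocument { /* ... */ }
public class Transaction : JSONDocument { /* ... */ }
```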

Document replacement & deletion

Document replacement and deletion both follow the same model as the gets and creates. The ReplaceDocumentAsync and DeleteDocumentAsync methods are the only differences.

public static async Task ReplaceTransactionDocument(Transaction xact)
{
    await ReplaceDocument(xact, Constants.TransactionCollection);
}

public static async Task ReplaceVendingMachineDocument(VendingMachine machine)
{
    await ReplaceDocument(machine, Constants.VendingMachineCollection);
}

private static async Task ReplaceDocument(JSONDocument doc, string collectionName)
{
    using (DocumentClient client = new DocumentClient(new Uri(Constants.EndpointUrl), Constants.PrimaryKey))
    {
        // A catch block that only rethrows (throw de;) would reset the
        // stack trace, so the exception is allowed to propagate naturally.
        await client.ReplaceDocumentAsync(UriFactory.CreateDocumentUri(Constants.DatabaseName, collectionName, doc.Id), doc);
    }
}

public static async Task DeleteTransactionDocument(Transaction transaction)
{
    await DeleteDocument(transaction, Constants.TransactionCollection);
}

private static async Task DeleteDocument(JSONDocument doc, string collectionname)
{
    using (DocumentClient client = new DocumentClient(new Uri(Constants.EndpointUrl), Constants.PrimaryKey))
    {
        await client.DeleteDocumentAsync(UriFactory.CreateDocumentUri(Constants.DatabaseName, collectionname, doc.Id));
    }
}

Document generation

The project, downloadable here, not only establishes the code and concepts for interacting with Cosmos DB but also acts as a document generator so that we can experiment with querying and reporting. The primary goal of this proof of concept is to explore how this data can be used to provide business insights.

I’ll leave it to you to explore the code and play around. However, below is the process flow for the application which produces enough documents to be looked at for reporting.

Generator workflow
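The workflow above can be sketched roughly as follows. This is a hypothetical simplification, not the project's actual generator: the Item property names come from the example documents, and the repository methods are the ones shown earlier in this post, assumed to be in scope.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class Generator
{
    private static readonly Random Rng = new Random();

    // Simulate one sale: decrement a random stocked tray, persist the
    // machine document, and record a matching transaction document
    // (Action 2 = Decrement), mirroring the flow described in this post.
    public static async Task SimulateSale(VendingMachine machine)
    {
        var stocked = machine.Items.Where(i => i.Quantity > 0).ToList();
        if (stocked.Count == 0)
            return; // nothing left to sell

        Item item = stocked[Rng.Next(stocked.Count)];
        item.Quantity--;
        await ReplaceVendingMachineDocument(machine);

        Transaction xact = new Transaction
        {
            Timestamp_utc = DateTime.UtcNow,
            Action = 2, // Decrement
            Item = new Item { Name = item.Name, Quantity = 1, TrayId = item.TrayId },
            VendingMachine = machine // duplicated to avoid joins during analysis
        };
        await CreateTransactionDocumentIfNotExists(xact);
    }
}
```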

Reporting with Power BI

Power BI is a relatively new Microsoft dashboarding tool. Unlike SQL Server Reporting Services, which excels at data grid reporting, Power BI’s primary focus is executive reports with drill down and analytical capabilities. It loads granular data and then rolls it up into high level visualizations so that the granular data is retained for the drill downs.

Power BI is both an Azure service and an on-premises tool. Power BI Desktop is a free development environment used to create reports and files which can be shared, like how Excel files can be shared. It can be downloaded here.

I decided to use Power BI to answer three very simple questions, as a means of demonstrating one way to consume the document data for analysis.

  • Which regions have the most sales?
  • Which locations are low on inventory?
  • What are the daily sales trends?

Importing Cosmos DB data

Importing data from Cosmos DB into Power BI is extremely simple. From a new Power BI report, click the drop down on the Get Data button and select More.

Next, navigate to the Azure page and select Azure Cosmos DB. Then, click Connect.

Fill out the connection details for your Cosmos DB.

Enter your primary or secondary access key.

Select the collection or collections. Then, select Edit.

Rather than loading the data directly, you need to edit the queries; otherwise the data will look just like the preview window: a single column with a blob for each record. It is important to expand the objects out into columns.

Click the column expansion button on the right side of the column header. This will allow for expansion of the columns within the document. Uncheck the columns which are not relevant to the report, such as the auto-created properties that Cosmos DB appended.

You can continue to expand sub-documents to whatever granularity you would like. If there is an array of sub-documents, you will need to allow Power BI to duplicate the rows. For example, a transaction with four Document.Item objects would be expanded into four transaction rows, each with one of the Document.Item objects.

Once complete, click Close and Apply.

Which region has the most sales?

After the queries are complete and applied, the data will be loaded. The queries will show up on the right side [1].

For this report, I wanted a map to highlight the regions quickly when answering the question about where the highest sales counts are. Select the map visual under Visualizations [2] and then select the relevant columns [3].

The report will preview live on the left [4], as you work and tweak how the columns are used [5].

The result easily shows us which regions have the highest sales by the size of the pie charts. Hovering over or selecting a segment of the chart displays detailed data, such as the 98 honey buns sold in postal code 06001.

Which locations are low on inventory?

Exactly like the last report, I created a heat map to show total item counts per location. In this case, however, what is most interesting to the user is which locations have the smallest bubbles. A fun concept that I did not play with in this project would be to invert the quantity counts so they are negative numbers. Then, the largest bubbles would indicate the lowest inventories while being much easier to view on the map and more eye catching.

The result of this report is that Southbury is low on inventory and should be considered for restocking.

What are the daily sales trends?

This report is a simple bar chart by day. It is special, however, because Power BI could not implicitly convert the Document.Timestamp_utc field. To work around this, I created a Document.Date_utc column in Power BI, not in the document.

First, you must open the query editor and then duplicate the Document.Timestamp_utc column. Once you have a safe duplicate column, you will need to transform it into a date column, stripping out the time.

This new column enables Power BI to interpret the date at various granularities as seen in the Axis area above [2].

The resulting report shows the peaks and troughs of the daily sales.

Wrap-up

This was a very fun project for me to learn more about big data and the internet of things through a NoSQL document database. As we have all come to expect, Microsoft did a phenomenal job at making the systems integrate seamlessly and simply. Using the DocumentDB API in my .NET application to interact with Cosmos DB and extracting and analyzing the data in Power BI was easy to learn and use.

There is a lot in the project that I could not cover in this post. I highly encourage you to browse my full project here and experiment with Cosmos DB yourself.
