Dipping My Toe in the CosmosDB Graph API

For the inaugural post of my new blog, I am going to discuss something brand new: CosmosDB. CosmosDB was a major announcement at Microsoft's Build 2017 conference, held last week. For those of you who missed the announcement, you can find it here.

TL;DR Summary on CosmosDB 

CosmosDB is the next generation of Azure's DocumentDB. It is a globally distributed, multi-model database supporting Key/Value, Wide Row (Table), Document, and Graph data models. I am not going to repeat all the details here, but if you want them I suggest you read this post.

My Interest

CosmosDB includes a lot of new features (global replication, data partitioning, tunable consistency levels, …), each worthy of its own post, but what really caught my eye was CosmosDB's Graph API. This new feature provides support for the Apache TinkerPop Gremlin query language. I was particularly interested in this because part of my current project at work has been evaluating TinkerPop-enabled graph databases for upcoming projects. I currently work at a .NET development shop, and in general the .NET drivers and support for the major graph databases lag behind their Java, NodeJS, and Python counterparts. Because CosmosDB is a Microsoft product, the .NET driver is a first-class citizen in the ecosystem, and that makes it an intriguing prospect.

First Impression

My first step with CosmosDB's Graph API was to set up the initial database using the Azure web portal. As shown in their docs (click here), creating the initial graph was the sort of point-and-click experience you would expect from a managed service. A nice additional feature of the web portal is the ability to download either a customized .NET solution or a preconfigured version of the Gremlin console with all the connection information already set up for you. I initially missed this, and I struggled to get the Gremlin console to connect to the graph because I couldn't figure out the correct username and password. It was in the online documentation, but I missed it. In case you need it, the docs on how to configure the Gremlin console manually are available here.
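For reference, the piece I missed is that the Gremlin console username is not an account name but the resource path of the graph, in the form /dbs/<database>/colls/<collection>, and the password is the account's primary key. The short Python sketch below renders a console remote configuration from those values. The function name remote_yaml and its host argument are illustrative assumptions, and the port, SSL, and serializer lines follow my understanding of the Azure docs; treat the serializer class in particular as something to double-check against the current documentation.

```python
def remote_yaml(host: str, database: str, collection: str, primary_key: str) -> str:
    """Render a Gremlin console remote config for a CosmosDB graph (hypothetical helper)."""
    # The username is the graph's resource path, not the account name.
    username = f"/dbs/{database}/colls/{collection}"
    lines = [
        f"hosts: [{host}]",
        "port: 443",  # assumption: the Gremlin endpoint is served over TLS on port 443
        f"username: {username}",
        f"password: {primary_key}",  # the account's primary key, copied from the portal
        "connectionPool: { enableSsl: true }",
        # Assumption: GraphSON serializer returning string results; verify the version in the docs.
        "serializer: { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, "
        "config: { serializeResultToString: true }}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values; substitute your own host, database, collection, and key.
    print(remote_yaml("<account-host>", "graphdb", "Persons", "<primary-key>"), end="")
```

Saved as, say, conf/remote-secure.yaml, the file would be loaded from the console with :remote connect tinkerpop.server conf/remote-secure.yaml.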

Second Step – Migrate a Real Use Case

Since my first experience was so painless, I thought: why not take this a step further and try porting my current project to CosmosDB? Since this application was a .NET Core project, the migration was a rather straightforward process that took me less than an hour. I just replaced my current driver with the .NET CosmosDB driver from NuGet here (it is currently in preview, so make sure to check the box to include prerelease packages).

Once the driver was installed, a bit of coding was required to switch to the CosmosDB way of executing Gremlin traversals. The changes were minimal and easy to copy from their sample project, and I was able to get everything to compile and run. Unfortunately, I ran into a few hiccups with my traversals because some Gremlin features/steps are not yet supported. The list of currently supported steps is available here.
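The main pattern to copy from the sample is the query loop. Assuming, as in the preview .NET sample, that a traversal is submitted as a query and its results are read back in batches until none remain (a HasMoreResults / ExecuteNextAsync loop in C#), the Python sketch below mirrors that loop. PagedQuery and FakeQuery are hypothetical stand-ins, not the real driver API; the fake exists only so the loop can run without a database.

```python
from typing import Iterator, List, Protocol


class PagedQuery(Protocol):
    """Hypothetical stand-in for a driver query object that returns results in pages."""

    @property
    def has_more_results(self) -> bool: ...

    def execute_next(self) -> List[dict]: ...


def run_traversal(query: PagedQuery) -> Iterator[dict]:
    # Keep fetching pages until the query reports there is nothing left.
    while query.has_more_results:
        for result in query.execute_next():
            yield result


class FakeQuery:
    """In-memory fake used only to exercise run_traversal without a database."""

    def __init__(self, pages: List[List[dict]]) -> None:
        self._pages = list(pages)

    @property
    def has_more_results(self) -> bool:
        return bool(self._pages)

    def execute_next(self) -> List[dict]:
        return self._pages.pop(0)


if __name__ == "__main__":
    pages = [[{"id": "saturn"}], [{"id": "neptune"}, {"id": "jupiter"}]]
    print([r["id"] for r in run_traversal(FakeQuery(pages))])
```

Writing the loop as a generator keeps callers unaware of paging, so swapping drivers touches only the code that creates the query object.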

The two specific issues I ran into were:

  • Mid-traversal V() steps (calling .V() again partway through a traversal) are not currently supported, for example: g.V().has('titan', 'name', 'saturn').as('s').V().has('god', 'name', 'neptune').as('n').addE('father').from('n').to('s')

While this was an annoyance, it only took a bit of reworking of my traversals to get my insert queries for adding vertices and edges working.
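One possible rework splits the insert in two: first look up each endpoint's id with its own traversal, then start at one vertex and pass the other to to() as a nested g.V(id) traversal. The Python sketch below only builds the Gremlin strings you would submit through the driver. The helper names are hypothetical, vertex ids are assumed to be strings, and it assumes the service accepts a nested traversal as the argument to to().

```python
def gremlin_str(value: str) -> str:
    """Quote a value as a single-quoted Gremlin string literal, escaping \\ and '."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def lookup_id(label: str, name: str) -> str:
    # Step 1 (run once per endpoint): fetch the id of a vertex by label and name.
    return f"g.V().has({gremlin_str(label)}, 'name', {gremlin_str(name)}).id()"


def add_edge(from_id: str, edge_label: str, to_id: str) -> str:
    # Step 2: start at one vertex and point the edge at the other via a nested g.V(id),
    # so the main traversal never calls V() partway through.
    return (
        f"g.V({gremlin_str(from_id)})"
        f".addE({gremlin_str(edge_label)})"
        f".to(g.V({gremlin_str(to_id)}))"
    )


if __name__ == "__main__":
    print(lookup_id("titan", "saturn"))
    print(lookup_id("god", "neptune"))
    # Placeholder ids; use whatever the two lookups actually return.
    print(add_edge("neptune-id", "father", "saturn-id"))
```

As in the original traversal, the edge runs from neptune to saturn. The cost is an extra round trip per endpoint, which is usually acceptable for inserts.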

  • The subgraph() step is not currently supported. This was a bit of a showstopper for me, as my current project relies heavily on subgraphs when retrieving data.

I contacted the CosmosDB team about these issues, and they quickly responded that both features are either under development or on the near-term roadmap.
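Until subgraph() is available, one stopgap is to fetch the relevant edges with an ordinary traversal and assemble the edge-induced subgraph (the kind subgraph() builds) on the client. The Python sketch below assumes each returned edge is a JSON object with label, outV, and inV fields; the function name is hypothetical and the sample edges are made up for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def edge_induced_subgraph(
    edges: Iterable[dict], label: str
) -> Tuple[Set[str], Dict[str, List[str]]]:
    """Keep only edges with the given label plus their endpoint vertices (hypothetical helper)."""
    vertices: Set[str] = set()
    adjacency: Dict[str, List[str]] = {}
    for edge in edges:
        if edge["label"] != label:
            continue  # drop edges outside the subgraph
        out_v, in_v = edge["outV"], edge["inV"]
        vertices.update((out_v, in_v))  # endpoints come along with each kept edge
        adjacency.setdefault(out_v, []).append(in_v)
    return vertices, adjacency


if __name__ == "__main__":
    sample_edges = [
        {"label": "father", "outV": "jupiter", "inV": "saturn"},
        {"label": "brother", "outV": "jupiter", "inV": "neptune"},
        {"label": "father", "outV": "neptune", "inV": "saturn"},
    ]
    vertices, adjacency = edge_induced_subgraph(sample_edges, "father")
    print(sorted(vertices), adjacency)
```

The obvious cost is that every candidate edge crosses the wire, so this only makes sense for modestly sized result sets.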

Using CosmosDB

In addition to the application drivers, there are two other ways to interact with CosmosDB.

The first is to use the Gremlin console to connect to the remote Gremlin server and send it your traversals. Don't forget to prefix each command with :> to submit it to the remote server, which I forgot to do at first. This lets you run all your Gremlin traversals from a standard terminal window.

The second is the Data Explorer, a visualization tool built into the Azure portal. It lets you interact with your data visually, which I find very helpful when working with graph models.

It does have a few quirks; specifically, the default node layout can start out a bit strange.

I also was not able to figure out how to view edge properties. That said, it is a really nice tool for visualizing your data.

I haven't yet tried connecting any third-party tools, but I suspect (and hope) that any tool that works with the Gremlin console will work here too.

Summary

Having another Gremlin-enabled graph database on the market, especially one with strong .NET support, is a great addition to the graph database ecosystem. I'd like to take some time to explore the different consistency models (read about them here and here) and to look at the performance of the Graph API.

Like other current offerings, CosmosDB has a few rough edges. Still, given that it was only announced as a preview last week, it is a strong initial offering worth taking notice of. Releasing a globally distributed, multi-model datastore that supports four different data models is a huge accomplishment that the entire CosmosDB team should be proud of.

I will be watching to see where it goes next.