Was 2017 the Year of the Graph?

As I sit here working on my presentation for GraphDay Texas 2018 (abstract here), I got to thinking about what Lynn Bender said in his opening to GraphDay Texas 2017, when he declared that 2017 was “The Year of the Graph”.  Looking back on all the happenings in the graph world over the past year, all I can say is that he was certainly right.  As a rule I tend to hate all those “Best of 2017” type articles, but in this case I decided to take some time to reflect on the momentum the graph ecosystem has gained over the last year.

Ever Evolving Landscape

JanusGraph

Last January, JanusGraph was officially announced as a fork of the popular, but no longer maintained, distributed graph database Titan.  This announcement was met with great excitement from the community, as there was now a viable path forward for all the Titan users who had been left out in the cold.  In the year since it was announced, JanusGraph has had two major releases, including several significant updates and additions to the supported backend storage and indexing engines.  As a nod to the maturity of JanusGraph, IBM announced that it was ending support for its IBM Graph product in favor of Compose for JanusGraph.

CosmosDB and Amazon Neptune

In May, at Build 2017, Microsoft got into the graph Database as a Service (gDaaS) game with the announcement of Azure CosmosDB.  CosmosDB is a globally distributed, multi-model data service that supports multiple different data query APIs.  Currently there is support for MongoDB, Table, SQL, Cassandra and Graph using TinkerPop’s Gremlin query language.

At AWS re:Invent 2017, Amazon announced a limited preview of its gDaaS platform, Amazon Neptune.  Amazon Neptune is a fully managed graph database that can be used either as an RDF triple store queried with SPARQL or as a property model datastore queried with TinkerPop’s Gremlin.

Newcomers

There were a variety of newcomers to the field this year, but I wanted to mention two specifically because of how they are both alike and different.  The first vendor I want to talk about is DGraph, which has been around since 2016 but had its 1.0 release this year.

The second vendor is Memgraph, which had a preview release last year and was named one of the London TechStars in 2016.

A few of the interesting similarities I see between these two vendors are:

  • Both are targeting real-time transactional workloads
  • Both are starting from the ground up by designing distributed property model data stores.
  • Unlike many of the other major vendors in the space (Titan, DSE Graph, Neo4j, JanusGraph), they are both building their engines in native code instead of JVM languages (DGraph is using Go, Memgraph is using C++).

While building real-time distributed property model graphs in non-JVM languages is not a novel idea (TigerGraph has done something similar), and two instances are certainly not enough to call a trend, I do think this is strong evidence of the market need in this area.

While there are several interesting similarities, there are also some distinct differences.  The first, and probably most distinct, is their approach to established query languages.  DGraph allows communication either through their gRPC client (which is a novel concept) or through their custom query language, GraphQL+-, which is based on Facebook’s GraphQL.  Memgraph has taken a different approach and made their system compatible with OpenCypher.  I have had the chance to get hands-on with Memgraph (I hope to write this up when I get some free time), and I found that their compatibility with OpenCypher let me quickly and easily integrate with the growing ecosystem of tools that support Neo4j and Cypher.

In my mind this difference is very telling of the raw state of the property model graph ecosystem.  The lack of one truly dominant standard (like SQL in the relational world) is definitely an area where I wish we could come to an agreement.  While both OpenCypher and TinkerPop’s Gremlin are open and widely adopted standards, many vendors have felt the need to create their own languages to address some of the shortcomings in both.
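To make this concrete, here is the same simple lookup written in both languages.  This is just an illustrative sketch; the social-graph label and property names are made up:

    // OpenCypher: declarative pattern matching
    MATCH (p:Person {name: 'Alice'})-[:FRIEND]->(f) RETURN f.name

    // Gremlin: an imperative traversal
    g.V().has('Person', 'name', 'Alice').out('FRIEND').values('name')

Both express the same query, but the styles are different enough that tooling built for one rarely transfers to the other.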

Other Major Events

The items listed above were certainly not the only things to happen in the graph ecosystem this year.  Some of the other events I took note of were:

  • CallidusCloud’s purchase of well-known graph vendor OrientDB.
  • The introduction of Gremlin Language Variants to Apache TinkerPop, allowing different programming languages to query the graph fluently, was a tremendous leap forward.
  • Neo4j’s announcement of its transition from being just a graph database to being a graph platform, including support for Cypher on Apache Spark as well as better tooling and integration in enterprise environments.
  • Additionally, Neo4j introduced a starter kit for modern full-stack application development in the form of the GRAND stack.
  • Increased support for additional graph databases from vendors such as Linkurious, Cambridge Intelligence’s KeyLines, and Tom Sawyer makes visualizing graph data easier than ever.
  • Kelvin Lawrence’s release of Practical Gremlin – An Apache TinkerPop Tutorial, the most complete and informative explanation of the Gremlin query language I have found.  This, along with Doan DuyHai’s blog series The Gremlin Compendium, has become one of my go-to resources.
  • The open source community stepping up to provide some of the much-needed missing tools for working with graph databases.

Looking Forward

Whew, that was a lot to get through, but I am very excited to see what this coming year brings.  In particular I am keeping my eye on the evolution of machine learning integrations in graph databases.  In the last year the ecosystem has seemed to mature enough to move beyond the “What is a Graph Database?” stage to a more productive “How can I use this to help me?” phase.  Both Stardog and Neo4j released integrated machine learning libraries last year, and I am expecting (or maybe just hoping) that others follow suit this year.  With all that being said, I am looking forward to all the new and interesting ideas that come out of DataDay Texas 2018.  If I missed anything you think is important, please feel free to leave it in the comments below.

 

.NET – A Viable Microservices Platform?

For the past 10+ years I have been doing full stack development on data-intensive applications, most of that time on the .NET stack.  For the past 18–24 months I have been focused on building out next generation distributed data platforms using Java and Scala.  I am partial to .NET, but as of the last time I looked, 24 months ago, .NET was just not a viable platform to build highly distributed, microservices-based applications on.  It was optimized for desktop or monolithic server applications and lacked support for many of the technologies that make microservices a workable architecture (containerization, deployment management, simple configuration orchestration, distributed monitoring/logging, etc.).

I work in a primarily .NET shop, architecting and developing our next generation data platform.  Due to the scale, complexity and types of data we work with, we have decided to use microservices and event sourcing as the basis of our architecture.  The majority of our developers are experienced in .NET, so we are trying to use .NET wherever reasonable to flatten the learning curve and improve maintainability.  My job is to figure out the best path to accomplish this, and I was pleasantly surprised by how viable a microservices platform .NET has become since my last investigation.

So what is it that makes .NET so much more viable now?  Well, there are many different things, but I am going to touch on just two of them here.

.NET Core/Standard

I am not going to go into tons of detail on .NET Core/Standard, as there are plenty of articles out there explaining all the details. (.NET Standard, .NET Core, an explanation of the differences)

TL;DR

.NET Core is a development platform, released in June 2016, that enables building cross-platform applications that run on Windows, Linux and macOS.

In the old .NET, creating an application required creating a project, registering that project with IIS, then configuring IIS to run it properly.  Oftentimes .NET versions differed between machines, web.config files worked on one machine and not another, or IIS was configured one way on machine A and another way on machine B.  This process was the bane of many development teams and led to more frustration than probably anything else in the development process.

Building applications with .NET Core is vastly simpler, which streamlines the entire development process.  In .NET Core there is no longer a requirement to run inside IIS, and each application can use a specified version of the runtime.  .NET Core applications are configured by default to run in a self-hosted mode, which minimizes the development headaches across team members.  As an additional bonus, I can build and run .NET Core applications on Windows (Visual Studio or VS Code), Linux (VS Code), or Mac (Visual Studio for Mac or VS Code).
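To give a sense of how little ceremony is involved, here is a minimal self-hosted service.  This is a sketch using the ASP.NET Core 2.0 WebHost APIs; the endpoint response is just a placeholder:

    using Microsoft.AspNetCore;
    using Microsoft.AspNetCore.Builder;
    using Microsoft.AspNetCore.Hosting;
    using Microsoft.AspNetCore.Http;

    public class Program
    {
        public static void Main(string[] args)
        {
            // Kestrel self-hosts the application; there is no IIS registration
            // step, and the runtime version is pinned by the project itself.
            WebHost.CreateDefaultBuilder(args)
                .Configure(app => app.Run(context =>
                    context.Response.WriteAsync("Hello from a self-hosted service")))
                .Build()
                .Run();
        }
    }

Run it with dotnet run and the service listens locally with no web server configuration required.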

Docker Support

The addition of Docker support was a huge leap forward for .NET, not only from a technology standpoint but also as a mindset shift.  If you had told me 5 years ago that .NET would be open sourced, running on Linux, and providing solid support for containerization technologies such as Docker, I would not have believed you, and I think you would be hard pressed to find many who would have.  Well, times have changed, and .NET now has solid support for containerization with Docker and Windows Containers.  Visual Studio also provides the slickest integration with Docker of any IDE I have used.  I am not sure of another IDE where adding Docker support is as easy as:

  1. Right clicking on a project
  2. Selecting “Add Docker Support”
  3. Clicking “Debug”

With those simple steps (Docker must be installed, instructions here) you can now spin up and debug your project inside a running Docker container on your system.  When you combine that with Visual Studio’s ability to start multiple projects, it gives you a powerful way to orchestrate the debugging of multiple microservices.
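Under the covers, “Add Docker Support” generates a Dockerfile for the project.  I won’t reproduce the generated file exactly, but a representative multi-stage Dockerfile for an ASP.NET Core 2.0 service looks something like this (MyService is a placeholder project name):

    # Runtime image
    FROM microsoft/aspnetcore:2.0 AS base
    WORKDIR /app
    EXPOSE 80

    # Build image: restore, compile, and publish
    FROM microsoft/aspnetcore-build:2.0 AS build
    WORKDIR /src
    COPY MyService.csproj ./
    RUN dotnet restore
    COPY . .
    RUN dotnet publish -c Release -o /app

    # Final image: copy the published output into the runtime image
    FROM base AS final
    WORKDIR /app
    COPY --from=build /app .
    ENTRYPOINT ["dotnet", "MyService.dll"]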

Conclusion

Due to its community support and the flexibility of the platform to be augmented by third parties, Java has been the go-to platform of choice when building distributed/microservice systems.  While this is likely to remain true for the foreseeable future, the recent changes to the .NET platform have earned it a place in the conversation.  There are still many other things to take into consideration (e.g. .NET drivers for third-party tools like Kafka tend to lag behind, be incomplete, or not exist at all), but the open sourcing of the .NET framework gives me hope that a robust ecosystem will spring up like the one that exists for Java.

Dipping my Toe in CosmosDB Graph API

For the inaugural post of my new blog I am going to discuss the brand new CosmosDB.  CosmosDB was a major announcement at Microsoft’s Build 2017 conference held last week.  For those of you who missed the announcement, you can find it here.

TL;DR Summary on CosmosDB 

CosmosDB is the next generation of Azure’s DocumentDB and supports globally distributed, multi-model data, including key/value, wide row (table), document and graph data models.  I am not going to repeat all the details here, but if you want them I suggest you read this post.

My Interest

There are a lot of new features included as part of CosmosDB (global replication, data partitioning, tunable consistency levels, …), each worthy of its own post, but what really caught my eye was CosmosDB’s Graph API.  This new feature provides support for the Apache TinkerPop Gremlin query language.  I was particularly interested because part of my current project at work has been evaluating TinkerPop-enabled graph databases for use in upcoming projects.  I currently work at a .NET development shop, and in general the .NET drivers and support for the major graph databases lag behind their Java, NodeJS and Python counterparts.  With CosmosDB being a Microsoft product, the .NET driver is a first-class citizen in the ecosystem, and that makes it an intriguing prospect.

First Impression

My first experience with CosmosDB’s Graph API was setting up the initial database using the Azure web portal.  As shown in their docs (click here), creating the initial graph was the sort of point-and-click experience you would expect from a managed service.  A nice additional feature in the web portal was the ability to download either a customized .NET solution or a pre-configured version of the Gremlin console with all the proper connection information filled in for you.  I initially missed this and struggled to get the Gremlin console connecting to the graph because I could not figure out the correct username and password.  It was in the documentation provided online, but I missed it.  In case you need them, the docs on how to manually configure the Gremlin console are available here.

Second Step – Migrate a Real Use Case

Since my first experience was so painless I thought, why not take this a step further and try porting my current project over to CosmosDB?  Since this application was a .NET Core project, the migration was a rather straightforward process that took me less than an hour.  I just replaced my current driver with the .NET CosmosDB driver from NuGet (currently in preview, so make sure to check the prerelease box) here.

Once the driver was installed, there was a bit of coding required to migrate to the CosmosDB method of executing Gremlin traversals.  The changes required were minimal and easily made by following their sample project, and I was able to get it to compile and run.  Unfortunately I ran into a few hiccups with my traversals due to some Gremlin features/steps that are not yet supported.  The list of currently supported steps is available here.
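For reference, here is roughly what executing a traversal with the preview driver looks like.  This is a minimal sketch based on the Microsoft.Azure.Graphs preview package and their sample project (not my actual project code); the endpoint, key, and database/collection names are placeholders:

    using System;
    using System.Threading.Tasks;
    using Microsoft.Azure.Documents;
    using Microsoft.Azure.Documents.Client;
    using Microsoft.Azure.Documents.Linq;
    using Microsoft.Azure.Graphs;

    public static class GremlinSample
    {
        public static async Task RunAsync()
        {
            // Placeholder endpoint and key -- use the values from your Azure portal.
            var client = new DocumentClient(
                new Uri("https://<your-account>.documents.azure.com:443/"),
                "<your-auth-key>");

            // "graphdb" and "graphcoll" are placeholder database/collection names.
            DocumentCollection graph = await client.ReadDocumentCollectionAsync(
                UriFactory.CreateDocumentCollectionUri("graphdb", "graphcoll"));

            // Gremlin traversals are submitted as strings; results come back in pages.
            IDocumentQuery<dynamic> query =
                client.CreateGremlinQuery<dynamic>(graph, "g.V().count()");
            while (query.HasMoreResults)
            {
                foreach (dynamic result in await query.ExecuteNextAsync())
                {
                    Console.WriteLine(result);
                }
            }
        }
    }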

The two specific issues I ran into were:

  • Recurring .V() steps in a traversal are not currently supported, such as: g.V().has('titan', 'name', 'saturn').as('s').V().has('god', 'name', 'neptune').as('n').addE('father').from('n').to('s')

While this was an annoyance, it only took a bit of reworking of my traversals to get my insert queries for vertices and edges working.

  • Subgraph steps are not currently supported.  This was a bit of a show-stopper for me, as my current project relies heavily on the use of subgraphs when retrieving data (an example of the kind of traversal this rules out is below).
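For context, a subgraph traversal of the kind that was unsupported looks something like this generic TinkerPop example (not from my project), which extracts the matching edges and their endpoints into a new in-memory graph:

    g.E().hasLabel('father').subgraph('sg').cap('sg')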

I contacted the CosmosDB team about these issues, and they quickly responded that both of these features are either under development or on the near-term roadmap.

Using CosmosDB

Beyond the application drivers, there are two additional options for interacting with CosmosDB.

The first is to use the Gremlin console to connect to the remote Gremlin server and send your traversals, but don’t forget to submit your commands with :> (as I failed to do at first).  This gives you the ability to run all your Gremlin traversals in your standard terminal window.
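For example, a remote session looks something like the following (the traversal is illustrative, and the config file is the one downloaded or built from the docs above):

    gremlin> :remote connect tinkerpop.server conf/remote-secure.yaml
    gremlin> :> g.V().count()

The :> prefix is what actually submits the line to the remote server; without it the console tries to evaluate the traversal locally.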

The second is to use the Data Explorer, which is a visualization tool that is built into the Azure Portal.  It provides a way to visually interact with your data, which I find very helpful when working with graph models.  

It does have a few quirks; specifically, the default node layout can start out a bit strange, and I was not able to figure out how to see edge properties.  With that said, it is a really nice tool to help you out when trying to visualize your data.

I haven’t yet tried connecting any of the other third-party tools, but I suspect (and hope) that anything that works with the Gremlin console will work here as well.

Summary

I think it is a really great addition to the graph database ecosystem to have another Gremlin-enabled graph database on the market, especially one with strong support for .NET.  I am interested in taking some time to explore the different consistency models (read about them here and here) as well as looking at the performance of the Graph API.

As with other new offerings, CosmosDB has a few rough edges, but given that it was only announced as a preview last week, it is a strong initial offering worth taking notice of.  The fact that it is a newly released, globally distributed, multi-model datastore supporting four different data models is a huge accomplishment that the entire CosmosDB team should be proud of.

I will be watching to see where this goes next.
