As I was sitting here working on my presentation for GraphDay Texas 2018 (abstract here), I got to thinking about what Lynn Bender said in his opening to GraphDay Texas 2017, when he declared that 2017 was “The Year of the Graph”. Looking back on all the happenings in the graph world over the past year, all I can say is that he was certainly right. As a rule I tend to hate all those “Best of 2017” type articles, but in this case I decided to take some time to reflect on the momentum the graph ecosystem has gained over the last year.
Ever Evolving Landscape
JanusGraph
Last January, JanusGraph was officially announced as a fork of the popular, but no longer maintained, distributed graph database Titan. This announcement was met with great excitement from the community, as there was now a viable path forward for all the Titan users who had been left out in the cold. In the year since it was announced, JanusGraph has had two major releases, including several major updates and additions to the supported backend storage and indexing engines. As a nod to the maturity of JanusGraph, IBM announced that it was ending support for its IBM Graph product in favor of Compose for JanusGraph.
CosmosDB and Amazon Neptune
In May, at Build 2017, Microsoft got into the graph Database as a Service (gDaaS) game with the announcement of Azure CosmosDB. CosmosDB is a globally distributed, multi-model data service that provides support for multiple different query APIs. Currently there is support for MongoDB, Table, SQL, Cassandra, and Graph, the last using Tinkerpop’s Gremlin query language.
At AWS re:Invent 2017, Amazon announced a limited preview of their gDaaS platform called Amazon Neptune. Amazon Neptune is a fully managed graph database that can be used either as an RDF triple store queried with SPARQL or as a property model datastore queried with Tinkerpop’s Gremlin query language.
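For a concrete sense of what querying one of these Gremlin endpoints looks like, here is a minimal sketch using the Apache Tinkerpop gremlinpython driver to submit a Gremlin query string over a WebSocket connection. The endpoint URL, credentials, and data model below are placeholders I made up for illustration, not real values; you would substitute the Gremlin-compatible endpoint your service exposes.

```python
# Minimal sketch: submitting a Gremlin query string over a WebSocket endpoint
# with the gremlinpython driver (pip install gremlinpython).
# Host, credentials, labels, and property names are placeholders.
from gremlin_python.driver import client, serializer

gremlin_client = client.Client(
    'wss://your-graph-endpoint.example.com:443/gremlin',  # hypothetical endpoint
    'g',
    username='/dbs/mydb/colls/mygraph',   # CosmosDB-style resource path (placeholder)
    password='your-primary-key',          # placeholder credential
    message_serializer=serializer.GraphSONSerializersV2d0())

# Find the names of everyone that 'alice' knows.
result_set = gremlin_client.submit(
    "g.V().has('person', 'name', 'alice').out('knows').values('name')")
print(result_set.all().result())

gremlin_client.close()
```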
Newcomers
There were a variety of newcomers to the field this year, but I want to mention two specifically because of both how they are alike and how they differ. The first vendor I want to talk about is DGraph, which has been around since 2016 but had its 1.0 release this year.
The second vendor is Memgraph, which had a preview release last year and was named as one of the London TechStars in 2016.
A few of the interesting similarities I see between these two vendors are:
- Both are targeting real-time transactional workloads
- Both are starting from the ground up by designing distributed property model data stores.
- Unlike many of the other major vendors in the space (Titan, DSE Graph, Neo4j, JanusGraph), they are both building out their engines using native code instead of JVM languages (DGraph is using Go, Memgraph is using C++).
While building real-time distributed property model graphs using non-JVM languages is not a novel idea (something similar has been done by TigerGraph), and two instances are certainly not enough to say that there is a trend, I do think this is strong evidence of the market need in this area.
While there are several interesting similarities, there are also some distinct differences. The first, and probably most distinct, is their approach to established query languages. DGraph allows communication either through their gRPC client (which is a novel concept) or through their custom query language, GraphQL+-, which is based on Facebook’s GraphQL. Memgraph has taken a different approach and made their system compatible with OpenCypher. I have had the chance to get hands on with Memgraph (I hope to write this up when I get some free time), and I found that its compatibility with OpenCypher allowed me to quickly and easily integrate with the growing ecosystem of tools that support Neo4j and Cypher.
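To make the contrast concrete, here is a rough sketch of the same question (“who does Alice know?”) expressed in both styles. The predicate, label, and property names are invented for the example; in practice the GraphQL+- query would be sent through DGraph’s gRPC client (e.g. the pydgraph library) and the Cypher query over Memgraph’s Bolt-compatible endpoint (e.g. with the neo4j Python driver).

```python
# Illustrative only: the same "who does Alice know?" question in both styles.
# Predicate, label, and property names are made up for the example.

# DGraph's GraphQL+- (submitted via its gRPC client):
DGRAPH_QUERY = """
{
  people(func: eq(name, "Alice")) {
    name
    knows {
      name
    }
  }
}
"""

# openCypher (submitted to Memgraph over its Bolt-compatible endpoint):
CYPHER_QUERY = """
MATCH (a:Person {name: "Alice"})-[:KNOWS]->(b:Person)
RETURN b.name
"""
```

Both ask for the same neighborhood, but only the Cypher form plugs directly into the existing Neo4j-oriented tooling, which is exactly the benefit I saw in my hands-on time with Memgraph.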
In my mind this difference is very telling of the raw nature of the property model graph ecosystem. The lack of one truly dominant standard (e.g. SQL) is definitely an area where I wish we were able to come to an agreement. While both OpenCypher and Tinkerpop Gremlin are open and widely adopted standards, many vendors have felt the need to create their own languages to address some of the shortcomings in both.
Other Major Events
The items listed above were certainly not the only ones to happen in the graph ecosystem this year. Some of the other events I took note of were:
- CallidusCloud’s purchase of the well-known graph vendor OrientDB was certainly another noteworthy event.
- The introduction of Gremlin Language Variants to Apache Tinkerpop, allowing programming languages to query the graph fluently, was a tremendous leap forward (see the sketch after this list).
- Neo4j’s announcement of their transition from being just a graph database to being a graph platform including support for Cypher on Apache Spark as well as better tooling and integration in enterprise environments.
- Additionally, Neo4j introduced a new starter kit platform for modern full stack application development in the form of their GRAND stack.
- Increased support for additional graph databases from vendors such as Linkurious, Cambridge Intelligence’s Keylines, and Tom Sawyer makes visualizing graph data easier than ever.
- Kelvin Lawrence’s release of Practical Gremlin – An Apache Tinkerpop Tutorial, which is the most complete and informative explanation of the Gremlin query language. This, along with Doan DuyHai’s blog series The Gremlin Compendium, has become one of my go-to resources.
- The open source community stepping up to provide some of the much-needed missing tools for working with graph databases. This includes:
- A VSCode plugin for CosmosDB which supports running and visualizing Gremlin queries.
- Graph Explorer – an OSS, Gremlin-compatible visualization tool built on D3.js.
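As promised above, here is a minimal sketch of what a Gremlin Language Variant looks like in practice, using gremlinpython’s traversal API: the query is written fluently in the host language rather than shipped to the server as a string. The server address, labels, and property names are placeholders I invented for the example.

```python
# Minimal sketch of a Gremlin Language Variant: the traversal is composed
# natively in Python and sent to the server as Gremlin bytecode rather than
# as a query string. Host, labels, and property names are placeholders.
from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

connection = DriverRemoteConnection('ws://localhost:8182/gremlin', 'g')
g = Graph().traversal().withRemote(connection)

# The same "who does Alice know?" traversal, expressed fluently in Python.
names = (g.V().has('person', 'name', 'alice')
          .out('knows')
          .values('name')
          .toList())
print(names)

connection.close()
```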
Looking Forward
Whew, that was a lot to get through, but I am very excited to see what this coming year brings. In particular, I am keeping my eye on the evolution of machine learning integrations into graph databases. In the last year the ecosystem seems to have matured enough to move beyond the “What is a graph database?” stage and on to a more productive “How can I use this to help me?” phase. Both Stardog and Neo4j released integrated machine learning libraries last year, and I am expecting (or maybe just hoping) that others follow suit this year. With all that being said, I am looking forward to all the new and interesting ideas that come out of DataDay Texas 2018. If I missed anything you think is important, please feel free to leave it in the comments below.