In my previous blog post I praised the open-source database Orient DB. I’ve received many email from people since asking me questions about Orient DB or telling me that they’d love to try Orient DB out but don’t know where to start.

I though it is time for me to contribute to the Orient DB community and write a short tutorial for exactly this audience. Ideally, I'd like to ensure that if you sit through the videos below, you'd at least have an informed view of whether Orient DB is of interest to you, your project and your organization.

Think of this blog post as the first installment. I may come back with some more advanced tutorials later. This tutorial shows you how to:
  • Download and install Orient DB
  • Play with the Orient DB studio and perform queries on the Grateful Dead database provided with the community edition
  • Create your own database (again using Orient DB studio, perhaps later I’ll provide a tutorial to show how to do this from Node or Java)
  • Create server-side functions
  • Perform graph- and document-based queries over Orient DB
I hope this tutorial gives you a good idea of how Orient DB works and some of its power. I also hope it will help illustrate the reasons for my enthusiasm in the previous post where I argued the case for Orient DB. In this post I’ll focus on how you get started.

Introduction (or, Read Me First!)

I had some of my guys review the videos for this tutorial and they pointed out that I assumed that the viewer/reader knew something about graphs. Although this may be a bad assumption, I don’t want to patronize you.

This is my attempt at giving you a short introduction to the concepts of graphs and the vocabulary I'll use in the tutorial.

Orient DB is not only a graph database. It supports all (or nearly all) the features of a document database and an object-oriented database. Hence, discussing only graph theory may be a bit misleading. However, since I’m using primarily the Graph API of Orient DB in this tutorial, it may be prudent of me to ensure you know something about graphs.
Here is what you need to know.
The above picture is taken from the Wikipedia article on Graph Databases. The illustration shows 3 objects (as circles) that we keep track of. There are also lines (or arrows) that define relationship between the objects. We call an object vertex and a relationship edge.

I do have some issues with the graph. They always show two edges that only differs based on the direction they are read. I would have preferred a single line (or a single edge). However, since it is likely that you would lookup Graph Databases in Wikipedia, I thought it would be better if I showed their example and criticized it (OK, I will go and suggest another drawing for Wikipedia. It is on my to do list. Honestly!)

A graph (for the purpose of this tutorial) consists of:
  • Vertices. A vertex is a cluster of information. The information is typically stored in key-value pairs (aka properties). You may think of it as a map (or I’ve notice that many developers call these maps hash tables, however, I find this misleading as hash is a particular strategy for arranging keys in one of the map implementations).
    One way of seeing a vertex is as an identifiable set of properties, where a property is defined as a key-value pair. Unique to Orient DB, the properties can be nested to form complex data structures. I will not do any nesting in this tutorial. I do, however, plan to write another tutorial where I explore Orient DB as a document database.
  • Edges. An edge is a link between two vertices. Edges may also have properties. Edges have have direction. That is, they start in one vertex and end in another vertex (actually, to be correct, they may also end in the same vertex. Such edge has the cute name ‘buckle’ in graph theory).
That’s it! Really.

Most of you probably have some background in relational databases, so let me give you a comparison between OrientDB and typical relational databases. The comparison will be slightly misleading no matter how I present it. To minimize the confusion, I’ll decided to add a column to discuss how the two technologies differs also.
Graph DB Relational DB Differences
Vertex Row A row in a RDB is a flat set of properties. In Orient DB the vertex may be an arbitrarily complex structure. If you are familiar with document databases, it is basically a document.
Edge Relationship A relationship in a relational database is not a first class citizen. The relationship is made ad-hoc by use of joins on keys. In Orient DB (as in all Graph Databases) the edge (relationship) is a first class citizen. This mean it can have an id, have properties, etc.
You may ask, where is the comparison of schema constructs such as table, column, trigger, store-procedure, etc. I think the tutorial does discuss these constructs, so I’ll delay the discussion to later in this tutorial.

In the second tutorial, I will repeat this short theory session. To help you progress from this incomplete theory session (and for your reading pleasure), I’ve provided a few links below:
I decided to use Orient DB Studio in the videos. It is probably the least likely tool that you’ll use to access the database. Most developers prime exposure to Orient DB is through language bindings or libraries. I was thinking of selecting one of these environments and show how Orient DB is used in there. The problem is that if I picked one of the environments I would risk alienating the ones using other environments. I think no matter what environment you're using, you would at some point would bring up Orient DB studio, but perhaps I’m wrong and I just alienated everyone. I hope not!
If time permits, I’ll come back with separate tutorials for how to use Orient DB from Java/Scala/Python/Node.js/...

Part 1: Download and Install Orient DB

I’ve prepared a simple video that explains how to download and install Orient DB. I’m on a Mac, hence the video shows how to install it on Macs. I do, however, think it should be easy  transpose the steps to Windows or Linux.

At the time I created this video, the latest version of Orient DB was 1.6.2 (things are moving rapidly at OrientDB, and I noticed they already released 1.6.3 before I got a chance to publish the blog). Orient DB releases many versions per year, so there is a risk that at the time you read this tutorial, my instructions may be slightly out of date. All the steps that I followed in the video can be found on the Orient Technologies website and I’m sure it will be kept up to date. Hence, if the version you downloaded is different from 1.6.2, you may be better off following the up-to-date instructions on the Orient DB sites.

A bit of warning: Towards the end I do execute a few queries to ensure that everything works. Don't despair if you don't understand what I'm doing. All I wanted to was ensure that I'd succeeded in installing Orient DB.

 

Part 2: Play Around with the Grateful Dead Graph

Orient DB comes with an example database instance. The provided database contains a graph defining performances of Grateful Dead. The graph is sparse and the information model is quite simple. This is unfortunate as it is hard to illustrate all the power of Orient DB with this graph. Perhaps at a later time I’ll try to find an open source dataset that has more interesting information (suggestions welcomed!).
The Grateful Dead graph can be illustrated with the following UML model:
NewImage
The model above is as truthful as I could make it. I did take exercise some artistic 'enhancements'. This to make the model easier to read. Here are my embellishments:
  • Notice the attribute on the ‘followed_by’ association. Vrious versions of UML have provided different ways to depicting attributes on relationships. Let me explain what I mean by my way of using UML. The example database stores a count the number of times a song has been followed by another song in a concert (I did not create the model, so I’m providing my own interpretation). The weight is a count of the number of times. To illustrate this in UML I have to show properties on associations.
    Conceptually, this is not a problem for a Graph Database because it can store properties on edges. It may, however, awkward to see the mapping to other technologies.
  • I decided to expand the enumerated type specifying the "song_type" . A more common way to model this would be the use of an enum-type. However, I think ‘display-wise’ the expanding the enum is better here as it is only used in one place.
  • Another ‘cheat’ is the introduction of classes. The example database do not actually use custom classes (all the objects are instances of a class called ‘V’; short for  vertex). They do, however, provide a property called ‘type’ that is used for the purpose of typing. I believe the model I created is better at illustrating the original intent of the model. 
I would assume only a percentage of the readers are familiar with Grateful Dead, so let me give you a few sentences about them. Grateful Dead is a cult rock band that started playing in California in the 60’s. I assume the graph is populated from the same data set that produced this  illustration:
 
It is really not important who the Grateful Dead were and whether you like them or not. Just think of them as some rock group that performed a set of songs very many times. Someone cared enough to capture when what songs they performed, what song followed another song, who wrote the song, and who sung the song.

With that introduction, let’s move on to the video where I create a set of queries and navigate around the example database.
Just for your reference, here are some of the queries I used
  • Select all vertices (or object) in the database
    • select * from V
  • Select the vertex with the id #9:8
    • select * from V where @rid=#9:8
    • This could also be written as:
      • select * from #9:8 
  • Select all the artists
    • select * from V where type=‘artist'
  • Select all the songs that have been performed more than 10 times
    • select * from V where type = 'song’ and performances > 10
  • Count all songs
    • select count(*) from V where type='song'
  • Count all artists
    • select count(*) from V where type=‘artist'
  • Find all songs sung by the artist with the id #9:8 (notice, the result will include the artist with id #9:8)
    • traverse in_sung_by from (select * from V where @rid=#9:8)
  • Find only songs sung by the artist with id #9:8 (only songs)
    • select * from (traverse in_sung_by from (select * from V where @rid=#9:8)) where type=‘song’
    • This could also have been written as:
      • select expand(set(in('sung_by'))) from #9:8
  • Find authors of songs sung by the artist with the id #9:8
    • select * from ( traverse out_written_by from (select * from ( traverse in_sung_by from ( select * from V where @rid=#9:8 ) ) where type='song') ) where type=‘artist'
    • Or more effectively:
      • select expand(set(in('sung_by').out('written_by'))) from #9:8 

Creating Our Own Database

In this section, I'll explore how you can create your own database. I'll be using custom classes in this tutorial (unlike the Grateful Dead database that only used the built in 'V' class).
NewImage
The following video shows how we can create a database that uses the above model as its schema:


Below, I've included the statements required to create the database used in the above video.
create class Member extends V
create property Member.name string
alter property Member.name min 3
create property Member.password string
create  property Member.email string
create class Article extends V
create property Article.title string
alter property Article.title min 3
create property Article.title string
create class follows extends E
create class replies extends
create class authors extends

Populating the database

In the previous section we defined a database schema. That means that we’ve prepared Orient DB to store and enforce constraints on objects we’ll insert into this database. It would be possible to store the same information without the schema (in so called schema-free mode). However, since we decided to go down the route of schema-full use, Orient DB can help us enforce the constraints on the data structures.

The video below explores different ways of inserting data into the schema we created in the previous step.

If you don’t want to sit through the video, I’ve included some functions/key script fragments below the video for you to instantiate the database as you please.

Important: For some of the functions that I show in the video to work, you have to add a handler to the Orient DB-server-config.xml. Please add the following XML fragment to the oriented-server-config.xml file (this step is covered in the video):
<handler class="com.orientechnologies.orient.graph.handler.OGraphServerHandler">
   <parameters />
</handler>
Why are you doing the above step you may wonder? It is to ensure that the default handler used in the JavaScript functions is of the graph type.


Some key syntax for inserting data:
create vertex {Vertex Class Name} set ({propertyname=value},…)
create edge {EDGE CLASS NAME} from {OUT_VERTEX_ID} to {IN VERTEX ID}

Part 5: Querying the Database (Again)

Now that we have some data (assuming you finished step 4), let’s see if we can define some interesting queries. If you want to try it out without seeing what I do, here are some suggestions for queries to formulate:
  • Given a member X:
    • Lookup everyone that follows X.
    • Lookup all articles that X posted that was eventually replied to.
    • Lookup everyone that at one time participated in an article that originated with a member X.
  • Find all members that have answered some article posted by someone else.
  • If I changed the content of an article, who would be effected by this (the members that posted articles prior to this article and the members posting articles as relies to this article.


Summary

In this post I’ve tried to give you a head start on Orient DB. Most of my use of Orient DB in this tutorial has been graph-oriented. If time permits, I’ll try to make another tutorial later that expose the power of Orient DB as a document database.

I hope you learned something. Please don’t feel shy about leaving comments. I actually moderate the comments as I’ve had a very large number of spam posts. If you’re comment doesn’t appear immediately, please do not disappear. As soon as I get a chance to make sure the comment is not spam, I’ll allow it and try to answer you.
9

View comments

  1. Petter very nice job, I have been working with OrientDb for a little while now. I made a little custom rest adapter to allow for using isomorphics smartclient framework with Orient. One of my biggest problems now in actually building a database is schema design coming from a relational and going to graph. Not alot of docs beyond simple examples. I am looking at designing a CRM app with Orient and still struggling with graph schema,design like with graph when does link and linkset come into play vs just an edge, but you tutorial has helped. Would love to see more on this subject, Nice job!

    Thanks,
    Dan

    ReplyDelete
  2. Excellent tutorial. thanks for posting.

    ReplyDelete
  3. Hi,
    I need for help!
    Assume I have query : select Mail, expand( both('Friend') ) from User where Name='hoang'
    Query return all field. I only want to get field Name, Mail. How can i do that.
    Thank,

    ReplyDelete
    Replies
    1. Can't you just wrap a simple query around your other query?

      Delete
  4. Really excellent tutorial, thanks for taking the time to put this together! I was able to follow along perfectly until 8:14 of Tutorial video #4, when I got the following error:

    "Error on parsing script at position #0: Error on execution of the script Script: createSomeMembers ------^ sun.org.mozilla.javascript.internal.EcmaError: ReferenceError: "gdb" is not defined. (#11) in at line number 11"

    It seems obvious that the reference to the database is broken, but I cannot find anywhere in the OrientDB docs where a "gdb.save()" command exists, and the complexity of SQL vs graph is a bit confusing when searching in the docs for the right spot.

    But in spite of this current roadblock, your tutorial has been really excellent and is very much appreciated! Thank you!

    ReplyDelete
    Replies
    1. Did you setup the graphical database? I show you how to do that in the first video...

      Delete
    2. same error occured when executing the function..

      Delete
  5. This comment has been removed by the author.

    ReplyDelete
  6. https://groups.google.com/forum/#!topic/orient-database/BOueQD8hMOA

    Any thought on this ?

    ReplyDelete

Today, I want to introduce the word 'dugnad' (pronounced ˈdʉːgnɑd]) to my friends and colleagues.  

Dugnad is a word from Old Norse and it is wrongly translated as 'volunteer work' in the English dictionary. Dugnad has a much richer meaning and tradition in Norway.

1

Introduction

I just upgraded to a new Dell Precision Ubuntu-Based laptop from years of using a MacBook. Although for the mosts part, the transition was smooth, one thing frustrated me beyond anything else.

Introduction

How do you translate traditional API’s into REST? Many applications face this problem. We may already have an implementation that exposes a typical functional API. In the previous post, I discussed some of the essential properties of a REST API.

1

Introduction

This is the first blog in a series of blogs on REST. In these blogs I’ll try to demystify the REST. I’ll discuss what REST is and some suggestions for how you can design good REST APIs.

First, let me acknowledge some of the sources of these posts.

2

Introduction

Waterfall, agile, iterative, whatever your process, your chance of success can be measured by the number solid feedback loops between the developers and the domain experts.

Introduction

Over the last 10 years I’ve seen a change in the way the top engineers/architects design, architect and deploy software.

Introduction

Over the last few years I've been experiencing conflicts between pure agile development and the need for budgeting. At SciSpike we do everything agile and it works for us. We have a good idea of the velocity of our teams and can often be very accurate in our estimates.

In my previous blog post I praised the open-source database Orient DB. I’ve received many email from people since asking me questions about Orient DB or telling me that they’d love to try Orient DB out but don’t know where to start.

9

Every now and then you come across open source projects that just amazes you. OrientDB is one of these projects.

I’ve always assumed that I’d have to use a polyglot persistence model in complex applications.

6

Introduction

In preparation for a course I'm teaching over the next couple of weeks on OO and Scrum, I spent the weekend reading up on the new books on Agile and Scrum to see if there are points that I need to add to my courses.

Introduction

My last post triggered quite a few questions. I had no idea so many people read my blog. The emails mostly asked "Why Node?" or "What's good about Node?".

In my previous blog I expressed my frustration learning Node and how I overcame my issues.

4

Introduction

This blog-post is perhaps misnamed. What I want to convey in this blog how you have to change the way you work when you go from a strongly typed language to one of the dynamic languages.

I pride myself on being language agnostic.

Introduction

I just finished up some work for one of my clients building a metamodel and a DSL for services. I built the models and the DSL using eclipse (EMF, XText and GMF).

The issue...

The enabling of the browser as a first-class container when building software solutions has led to a migration of business logic and complexity from the server to the browser.

Overview

Over the last 8 months I’ve been swamped working on a very interesting project. Between this project and a set of other commitments, I’ve literally spent 16h a day working. The blog has been neglected, but I plan to take up writing again.

Introduction

Today a friend asked me if you could execute an action in Eclipse on a breakpoint. I thought for sure that that would work, but decided I had to check before I said yes. Well… there is no such feature in Eclipse, but here is a cool work-around.

2

Description

Over the last 25 years or so, I’ve been often tasked to help optimize the performance of various solutions. I keep careful records of all the work I do and I recently decided it would be a good idea to collect all my thoughts/experiences around performance.

Overview

Over the last years I’ve built a dozen or so large scale Domain Specific Languages (DSL) and almost without exceptions, they’ve been hard sells. Usually, each of these received initial resistance in the organizations they’ve been introduced.

I've been eyeing the new iPad, but when I heard about the delay (and I have to admit also my current trip to Asia and the thought of not having something to play with on the plane), I decided to try out the Motorola Xoom instead.

I've been using the old iPad and briefly tested out iPad2.

This post is just a quick update on the status on alternatives to the PivotalTracker.

I've been playing a bit with Retrospectiva. It seems to be a good alternative to PivotalTracker.

At SciSpike, we'll setup a free service for our customers to move their projects to Retrospectiva.

1

Ouch... I just learned that PivotalTracker will start charging for their online project tracking tool.

I have mixed feelings. On one hand, I recognize their right to make money on what I consider to be an excellent tool, but on the other hand, I feel tricked...

This blog entry is really just a support entry for a YouTube movie I created today.

1

Introduction

For more than 2 years now I’ve been using Adobe’s tools for creating e-learning material. The two tools I’ve been using the most are:

Adobe Captivate Adobe Presenter I’ve been stuck on an older version of Captivate for a while (Captivate 3).

3

I’ve been teaching a few classes on Web Services lately and my students have complained that there are no good description on the net for how to get started building web services on Eclipse. I’m sure there are some, but I was encouraged to create one myself, so here it is.

44

What’s the problem?

When loading a dynamic site using GWT, Dojo, YUI or any other Ajax based approach, it is difficult (and sometimes impossible) for the search engine to obtain your content for indexing. The initial load of the site may only contain a simple HTML element (often a div or span tag).

1

This week (and most of last week) I’ve been developing a GWT application for our new venture SciSpike (www.scispike.com).

The team at SciSpike are all very experienced architects/developers, so the selection of technology to use was quite difficult. Pretty much every options were available to us.

This week I’m documenting architecture for a system where my company has been involved in the design and development of the core framework. I was not originally involved in this project, but have later contributed to the framework.

This week I deployed a DSL with its corresponding code generator for a client. The DSL and corresponding editor was written in Xtext and the code generator in Xpand/Xtend.

I created a e-learning module for how to create plug-in extensions for Eclipse. It is a sub-module from our course how to write eclipse plug-ins that I’ve not yet integrated. Many students have asked for this, so I though I’d make it available for them here.

3

Introduction

This post is going to be very short as there comparison here is rather trivial. Both tools can be integrated with any of the transformation frameworks available in Eclipse.

EMFText does not include transformation nor suggest any specific  transformer.

4

Introduction

In my previous post, I compared Xtext and EMFText. I pointed out a few differences that may help you select one tool over the other. In the next posts I'll go into each of the points in more details.

5

I just finished a project building a textual DSL and a code generator for a client and after some intensive investigation, Xtext seemed the best alternative.

Today, I learned that EMFText just released a new version (1.2), and it looks very promising.

4

I thought I'd share another little trick in Xtext. This trick is actually documented well at various places on the web, but to be able to do what I wanted to do, I had to pull information from many different locations.

1

I ran into a problem in Xtext some weeks ago. I wanted to generate a serialization id for my generated Java classes.

4

Over the last weeks I've been polishing a language for a client. As I mentioned before on this blog, I decided to use Xtext(www.eclipse.org/Xtext/) to build the toolset for the language.

1

I created a new language (again) for one of my clients this week. I love doing new languages and I’ve been working 20h a day on this one. I’ve been doing my last languages using the Microsoft tools, so I was afraid I would be a bit rusty on the open-source tools.

2

This week I’m teaching a course on Advanced C++ in San Diego. I don’t use C++ on a daily basis anymore, but I still feel pretty confident… after all, I’ve been using the language since late 80’s.

I do sometime get stuck on syntax. It doesn’t bother me, as I know I can figure it out pretty quick.

1

I was teaching a class that included a chapter on testing this week. I had added a section on design by contract and decided to look into what is available in the .NET space. I knew about Eiffel for .NET and some time ago looked into sharp#.

1

I'd though I'd post something on the technology I've been most involved with lately... That is Ruby on Rails.

I think I've developed software on the most common web-platform. As of now, I think Ruby on Rails is the fastest (most agile) development platform.

Today I started to write the design chapter for the book. I was trying to collect my thoughts on object-oriented design. The more I thought about it, the more hopeless the task seemed. I think it is impossible to define a single template that would apply to any software project.

I'm currently working on a book together with Richard Mitchell and Vladimir Bacvanski. The topic of the book is how to model software... actually, I take that back...

At InferData, we always build domain models before progressing to system modeling. I've noticed that this is becoming more and more a trend in the industry although, what the various methodologies/methodologists define to be a domain model seems to vary.

I just sat in on a presentation of SysML (http://www.sysml.org). Although the presenter did a good job defending the spec, I found the spec itself quite miserable.

This week I’ve been working on a J2EE application for a client.

Almost 10 years ago, I was doing some work for a client when I found myself with some extra time on my hand. I had designed and implemented a real-time broker and I was waiting for the other development teams to finish up their work so that we could integrate.

2
About Me
About Me
Blog Archive
Subscribe
Subscribe
Links
Subscribe
Subscribe
Loading