Three months ago, I was talking with my friend about the COVID-19 virus.

COVID-19 spread across the whole world in a very short time. How can an entire planet succumb so quickly to a virus that started with a single person?

Well, if I told you that you can reach anyone in the world through at most six people by following a chain of friends of friends, would the speed of the spread make more sense?

Everything is connected, including the universe itself.

Welcome to the graph world.


The Six Degrees of Separation Theory

Have you ever heard of the Six Degrees of Separation theory?

According to the theory of six degrees of separation, any two people on the planet are six or fewer social connections apart. In other words, by following a chain of “a friend of a friend,” we can reach anyone in at most six steps.

This theory was put to the test by Stanley Milgram in his small-world experiment in 1967, where the goal was to send letters from Kansas to a target person in Boston through a chain of “a friend of a friend.” Many letters never arrived, but those that did passed through about five intermediaries on average. Later, the idea was popularized by the 1993 film Six Degrees of Separation, starring Will Smith and Stockard Channing.

Today, with social media, the chain has shortened from about six people to between 2.9 and 4.2 people, according to research by Facebook and other big social media platforms.


The Math Behind It

Let’s put this story into mathematical terms.

Let’s say you want to meet Will Smith and want to find out who can connect you to him. Suppose Will Smith follows 200 people. If one of those 200 people is also among your followers, you can reach Will Smith through that person.

Otherwise, you need to look at the accounts of those 200 people and all the accounts they follow. Assuming each profile follows an average of 200 accounts, the number of accounts to check in the second round will be 200 x 200 = 40,000 people.

If we continue like this:

  • Round 3: 40,000 x 200 = 8 million people
  • Round 4: 8,000,000 x 200 = 1.6 billion people

You’ve already surpassed the Instagram population by the fourth round.

Let’s say you use Instagram so fast that it takes only one second to look at each profile. It would take approximately 50 years to look at the 1.6 billion profiles in the fourth round.
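The arithmetic above can be checked with a short Python sketch. It keeps the text's simplifications: every account follows exactly 200 others, no account is counted twice, and each profile takes one second. The function names are illustrative only.

```python
# Assumptions (taken from the text): each account follows exactly 200 others
# and overlap between accounts is ignored, so these are upper bounds.
FOLLOWS_PER_ACCOUNT = 200
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def accounts_in_round(round_number: int, fan_out: int = FOLLOWS_PER_ACCOUNT) -> int:
    """Accounts to check in a given round, ignoring overlap between rounds."""
    return fan_out ** round_number


def years_to_review(accounts: int, seconds_per_profile: float = 1.0) -> float:
    """Time needed to look at every account once, in years."""
    return accounts * seconds_per_profile / SECONDS_PER_YEAR


for r in range(1, 5):
    print(f"Round {r}: {accounts_in_round(r):,} accounts")

print(f"Round 4 alone: about {years_to_review(accounts_in_round(4)):.1f} years")
```

The growth is exponential in the number of rounds, which is why checking profiles by hand stops being practical after the second or third round.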


Building Pathica

That night, I searched for a website or app where I could test the six degrees of separation theory. I found a few:

The Oracle of Bacon

It links any actor to any other actor through the movies they appeared in together.

Erdős Number

The Erdős number describes the “collaborative distance” between mathematician Paul Erdős and another mathematician, as measured by co-authorship of mathematical papers.

With the exception of those two, I could not find any live examples related to this subject. That was the night I started coding Pathica.

Pathica is an app built on this theory that lets you put it to the test yourself.

Pathica

The logic of the application is very simple. You search for the account you want to meet, or choose one of the ready-made lists on the homepage. When you press the connect button, Pathica finds the people who are between you and that person and shows them to you. In the example above, you can see that there are only three people between Will Smith and myself.

The world is really small, isn’t it?
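Finding the people between two accounts is a shortest-path problem. The sketch below is not Pathica's actual implementation, only a minimal breadth-first search in Python over a made-up follow graph, to show the round-by-round idea described earlier. The account names and the `FOLLOWS` data are hypothetical.

```python
from collections import deque

# Hypothetical toy data: each key follows the accounts in its list.
FOLLOWS = {
    "will": ["alice", "bob"],
    "alice": ["carol"],
    "bob": ["carol", "dave"],
    "carol": ["me"],
    "dave": [],
    "me": [],
}


def shortest_chain(follows, start, target):
    """Return the shortest follow chain from start to target, or None."""
    if start == target:
        return [start]
    previous = {start: None}  # predecessor map, also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in follows.get(current, []):
            if neighbour in previous:
                continue  # already reached by a path that is no longer
            previous[neighbour] = current
            if neighbour == target:
                # Walk back through the recorded predecessors.
                chain = [neighbour]
                while previous[chain[-1]] is not None:
                    chain.append(previous[chain[-1]])
                return chain[::-1]
            queue.append(neighbour)
    return None


print(shortest_chain(FOLLOWS, "will", "me"))
```

Because breadth-first search explores one round at a time, the first chain it finds is also a shortest one. On a graph with billions of edges the same idea needs a storage engine built for traversals, which is where a graph database comes in.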


Why Neo4j?

The search that would take one person 50 years takes about 0.03 seconds in Pathica, roughly 1/30 of a second. Running this kind of multi-hop traversal that quickly is very hard in relational databases or in NoSQL databases that are not built around graphs. That’s why I chose Neo4j, a graph database.

A graph database has two basic building blocks: nodes and relationships. Nodes are connected to each other through relationships.

In Pathica, nodes are “people” and relationships are “following.” The current scale of the database:

MATCH (n:User) RETURN count(n) as total_users
total_users => 373,809,936

MATCH ()-[r:follows]->() RETURN count(r) as total_relations
total_relations => 2,199,487,555

That is about 374 million users and around 2.2 billion relationships. When I query the shortest connection path between two people in this database, it takes only around 0.03 seconds. That’s why I love Neo4j so much.
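The article does not show the shortest-path query itself, so the following is only a guess at its general shape, using Cypher's built-in shortestPath function. The `User` label and `follows` relationship type come from the count queries above. The `username` property, the direction of the pattern, the six-hop limit, and the helper name are assumptions. The query is built in Python so that the hop limit can be validated before it is placed in the query text.

```python
def build_shortest_path_query(max_hops: int = 6) -> str:
    """Build a Cypher shortest-path query with a fixed upper hop limit."""
    if not isinstance(max_hops, int) or isinstance(max_hops, bool) or max_hops < 1:
        raise ValueError("max_hops must be a positive integer")
    # `username` is an assumed property name; `User` and `follows` appear in
    # the count queries in the article.
    return (
        "MATCH (a:User {username: $source}), (b:User {username: $target}), "
        f"p = shortestPath((a)-[:follows*..{max_hops}]->(b)) "
        "RETURN [n IN nodes(p) | n.username] AS chain"
    )


query = build_shortest_path_query(6)
# Placeholder usernames, passed as query parameters rather than string-formatted.
params = {"source": "willsmith", "target": "my_account"}
print(query)
print(params)
```

The query text and the parameter dictionary would then be handed to a Neo4j client. Only the validated integer is formatted into the text, while the usernames stay as parameters.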

They also have a great startup support program. This is what I received after I applied:

“Hello Isa, We’d like to welcome you to the Neo4j Startup Program. Your application is approved and you now have access to the Enterprise Edition of the world’s leading graph database.”


Closing Thoughts

As I finish this article, I would like to thank the Neo4j family for helping make my dream of putting the “six degrees of separation” theory to the test come true. The theory has been discussed, tested, filmed, and debated on talk shows for more than 50 years.

In my next article, I plan to address the problems and some technical issues I encountered while dealing with such big data.

Pathica: https://www.pathica.com

AppStore: https://apps.apple.com/us/app/pathica/id1564780182