Sigmund Freud, the Austrian neurologist most famous for loving his mother very much :), also identified three revolutions in the history of mankind:

  • The first was the Copernican revolution, when humanity understood, via astronomical observations, that Earth is not the center of the Solar System and that we are not the center of the Universe.
  • The second was the Darwinian revolution, when humanity realized that we are descendants of other animal species, inferior to us but not very different in many respects. I should also mention the Creationist counter-revolution and the myth of mankind’s alien origin.
  • The third was the Freudian revolution, when humanity discovered, via Freud and his disciples, unconscious motivation, along with the damning realization that our minds are not transparent to us.

The philosopher Luciano Floridi came up with the concept of a new revolution, the information-digital one, and he claims that mankind is at the beginning of this fourth revolution. We are immersed more and more in the infosphere (the environment of information and communication surrounding us).

For the young generations, born in the digital era, the speed at which information and communication evolve seems normal. And it is not only mobile phones, tablets and Virtual Reality (VR); it is also smart buildings, smart cars, new educational opportunities like MOOCs (Massive Open Online Courses), social media platforms and many more.

The older generations, born “non-digital”, are amazed and sometimes overwhelmed by this digital revolution. The transition from an era with no computers, no Internet and no mobile phones (to mention just a few) to today’s digital wonders is not an easy one. We, the older generation, used to have printed books; we used to read novels and poetry and enjoy theater plays and jazz concerts; we had real friends, not virtual ones; and we used to communicate directly with other human beings or write a letter, on a piece of paper!

The infosphere surrounding us is growing at an amazing speed and becoming more complex every moment, and it is impossible to predict its long-term effects on mankind, on human social behavior and on our society. Paradoxically, the social media platforms where you have thousands of virtual friends, and where you can share information, memories, photo albums and music, are pulling us away from direct human relationships. These social platforms are becoming more and more anti-social.

Do not get me wrong, I truly believe that mankind has embarked on a great journey toward a fantastic future, where new technologies will make our lives better and easier, and we will have more time to enjoy our hobbies and the things we like.

My concern is about the effects of the fourth revolution on the discovery of the third one, the unconscious mind. Surrounded by a super-complex infosphere and bombarded every second with torrents of digital information, how is our unconscious mind going to react? Doctor Jekyll or Mister Hyde, who is going to rule us in the end?



But the truly scary moment will come when your laptop tells you: “Cogito ergo sum!” And that will be the next revolution! Welcome to the Machine!

BIG DATA in small words




I have to admit that, for me, the name Big Data sounds somehow childish. It is as if you, a very intelligent and highly educated IT consultant, asked your six-year-old son: ‘Hey Bill, daddy is working with lots of huge, unstructured data sets and we need a name for them.’ Bill: ‘Aaaaaa…… Big Data?’ :)

A simple definition of Big Data is: very large sets of unstructured data, with sizes beyond the ability of commonly used software tools to capture, manage and process within a tolerable time frame, which are analyzed in order to enable enhanced decision making, discovery and process optimization. The size of these data sets is constantly increasing, from a few terabytes at the beginning of this millennium, to many petabytes today and many exabytes tomorrow. [petabyte (PB) = 10^15 bytes, exabyte (EB) = 10^18 bytes].

In 2001, META Group (later acquired by Gartner Inc.) defined the 3Vs of Big Data (volume, velocity and variety); a fourth V (veracity) was added later:

  • Volume – the amount of data
  • Velocity – the speed at which data flows in and out
  • Variety – the range of data types and sources
  • Veracity – the quality of the data



The next step, after acknowledging the inability of conventional software to process Big Data, was to develop tools able to solve this problem. Seisint Inc. developed a C++ based distributed file-sharing framework for data storage and query, followed in later years by MapReduce and Hadoop, which brought more advanced approaches to Big Data processing.

In order to set up a Big Data processing environment one will need:

  • A serious number of host machines (nodes) organized in a cluster. The nodes can be partitioned into racks.
  • A high-performance storage array of reasonable size
  • A software framework with three main components:
  1. The framework providing the computational resources (CPU, memory, etc.) needed to execute the applications. Hadoop uses the YARN infrastructure (Yet Another Resource Negotiator) for this task.
  2. The framework providing permanent, reliable and distributed storage. Hadoop uses HDFS Federation (Hadoop Distributed File System); Amazon uses S3 (Simple Storage Service).
  3. The MapReduce framework, which is the software layer implementing the MapReduce paradigm. In layman’s terms, MapReduce was designed to use parallel distributed computing to turn big data into regular-sized data, by first mapping the data and then reducing it.
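
To make “mapping the data and reducing the data” a bit more concrete, here is a minimal single-machine sketch in Python of the classic word-count example. It is only an illustration of the paradigm, not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are my own, and a real framework would run the map and reduce steps in parallel on many nodes and do the grouping over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (done by the framework)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into one result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of text records."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data in small words", "Big Data is big"]
    print(word_count(docs))
```

The point of the design is that each call to map_phase looks at one record only and each call to reduce_phase looks at one key only, so both steps can be spread over as many nodes as you have.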


The evolution of Big Data processing ecosystems has triggered the emergence of other non-conventional techniques, like the NoSQL technologies. A NoSQL (“non SQL” or “non-relational”) database provides a mechanism for storage and retrieval of data which is modeled in means other than the tabular relations used in relational databases (Wikipedia).
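
As a rough illustration of what “modeled in means other than tabular relations” can look like, here is a toy in-memory document store in Python. Everything in it (the store dictionary, the put, get and find helpers, the sensor records) is hypothetical and invented for this sketch; it does not use or imitate the API of any real NoSQL product.

```python
import json

# Hypothetical in-memory "document store": each key maps to a JSON document.
# There is no fixed table schema, so every document can have its own fields.
store = {}


def put(key, document):
    """Save a free-form document under a key."""
    store[key] = json.dumps(document)


def get(key):
    """Load the document saved under a key."""
    return json.loads(store[key])


def find(field, value):
    """Return the sorted keys of all documents where field equals value."""
    return sorted(
        key for key, raw in store.items()
        if json.loads(raw).get(field) == value
    )


# Three records with different shapes, which would not fit one fixed table.
put("sensor:1", {"type": "camera", "city": "Vienna", "fps": 30})
put("sensor:2", {"type": "atm", "city": "Vienna", "withdrawals": [20, 50]})
put("sensor:3", {"type": "camera", "resolution": {"w": 1920, "h": 1080}})

if __name__ == "__main__":
    print(find("type", "camera"))
    print(get("sensor:3")["resolution"])
```

In a relational database the three records above would need either three tables or one table full of empty columns; here each document simply carries the fields it has.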

Another interesting development was the evolution of Apache Spark from a tool used within the Hadoop ecosystem into a fast and general engine for large-scale data processing in its own right. Apache Spark can run in standalone cluster mode, on EC2, on Hadoop YARN, or on Apache Mesos.

With the very fast evolution of the Internet of Things (IoT), the variety V of the 3Vs of Big Data is amazing. The sources can be any smart device: smart cars, smart cities, satellites, traffic cameras, surveillance cameras, ATMs, etc. The data is collected in every known format, and in several new formats every day. That points to the real challenge with Big Data, which is the first V (volume), and its growth rate is incredible!

I am, generally speaking, an optimist, and I believe that the future will bring us fantastic ways of processing the Super-Big Data of the future, which can only be described as “as big as China”! :)
