If we need to optimize some database or to find the best solution for some use case with a really huge amount of data then we need to think much more in categories like “memory usage”, “I/O rates depending on data format” etc. Buzz-words like “noSQL is sexy” etc. cannot help us.

Basically the most important question is how quick responses we need to provide to customer. But not based on “what is nice” but “what is really necessary”. Some applications need real-time responses but others are perfectly OK with reports which response time is in minutes even dozens of minutes.

As I mentioned in my text about memory usage, basic difference between “old fashion” RDBMS and “new and progresive” noSQL databases is in memory usage. Resp. dependency on high amounts of memory.

But this is not all – also data format used for storing data is very important. If database uses compression then I/O operations can be significantly quicker then with plain text format. Compressed data will also occupy less cash memory etc. This is nice theory but in reality all depends on data we have – how good they can be compressed. If we can use some column based compression etc.

Other question is more low-level – how big data block database uses and how big data blocks uses your operation system.

Let’s make some summary:

RDBMS:

  1. PostgreSQL:
    • separate data file for every table
    • 8KB data block (can be changed if you compile PG from source codes with different setting)
    • does not compress data
  2. mySQL with InnoDB engine:
    • can use several different sizes of data block
    • can compress data

noSQL data:

  1. MongoDB:
  2. Cassandra: