This text wants to highlight one major difference between old RDBMS systems and new noSQL databases. We will talk about memory usage and memory allocation. This difference is also based on historical reasons. RDBMS where created to function with rather small amount of memory and therefore do a lot of I/O disk operations. NoSQL databases need really a big amount of memory because they hold all necessary data in memory to speed up responses.

I believe all is quite obvious – RDBMS where created in days when servers have only several gigabytes of memory a dedicated database servers where a real luxury. Therefore RDBMS are usually really good in using limited HW sources and therefore on small HW they are always better then modern noSQL databases. So if you try to test noSQL database on the same HW you use for RDBMS this is really a waste of time.

NoSQL databases require a lot of memory and they give you really good responses because they load everything into memory and try to avoid disk operations. If on the other hand you will try to test RDBMS on such memory-rich HW you will be disappointed – many RDBMS are not able to use big amounts of memory – or only in the most recent version and even this with some problems and changes in settings etc.

RDBMS:

  1. PostgreSQL:
    • does not perform directly I/O disk operations and depends on Linux disk cache,
    • released versions cannot use more then 1 CPU/core for 1 query/connection,
    • for memory has basically 2 important memory parameters:
      • work_mem – amount of memory used for every running connection / also in joined queries for every table involved in join – it means that complicated query can allocate a huge amount of memory for only one connection and data are not visible for other connections
      • shared_buffers – amount of memory dedicated for cashed data – PG versions older then 9.4 have problems with big shared_buffers and cannot use buffersw bigger then 8GB, PG 9.4 uses new mechanism

        overview:

        • PostgreSQL is not able to utilize big memory,
        • depends on Linux disk cache,
        • has problems with double occurency of data blocks in Linux cache and PG cache,
        • cannot utilize more CPU/ cores for 1 query/ connection,
        • horizontal sharding is possible with partitioning
  2. mySQL
  3. Oracle

noSQL databases:

  1. MongoDB:
    • heavily depends on keeping data in memory – uses “memory mapped files”

      overview:

      • need machine with big amout of memory, otherwise performs very poorly
  2. Cassandra