In big data processing and querying, a relatively common approach for providing end users with more accurate and up-to-date information is to combine an Online Analytical Processing (OLAP) solution with an Online Transactional Processing (OLTP) solution. The OLAP component performs most of the big data processing on read-only data extracted from different sources. The OLTP component provides write access to the result data and complements query results with real-time data.
Similarly, in big data parallel processing and querying environments built on the HPCC (High Performance Computing Cluster) Systems platform, ROXIE query results can be complemented with data from an external online database. The base data behind these results is first extracted, transformed, and loaded by the Thor cluster. This external component, frequently referred to as the Deltabase, is an OLTP database that supplies real-time data, so query results can include data that has not yet been processed by Thor. This gives end users more accurate and up-to-date information. However, the Deltabase can become a bottleneck for the performance and availability of the entire querying system. It may fail, and as an OLTP system it is not optimized for read-heavy workloads. In such contexts, a caching solution becomes attractive.
Recently, NoSQL databases have been used as a caching mechanism in hybrid OLAP/OLTP solutions. They keep recently executed queries from being processed again by the OLTP system, which is usually slower than a NoSQL database at serving reads. A NoSQL cache adds an optimized component for real-time data access and an additional layer of resilience against failures. It can therefore increase both the performance and the availability of hybrid big data processing and querying solutions.
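The caching pattern described above is commonly called cache-aside. A query result is served from the NoSQL store when present, and the Deltabase is queried only on a miss. The following Python sketch illustrates it under several assumptions. An in-memory dictionary stands in for the NoSQL database, and `fetch_from_deltabase` is a hypothetical callable standing in for the Deltabase query. The time-to-live (TTL) expiry and the stale-value fallback on failure are illustrative choices, not part of the study's design.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class DeltabaseCache:
    """Cache-aside sketch; a plain dict stands in for the NoSQL store (assumption)."""

    def __init__(
        self,
        fetch_from_deltabase: Callable[[Hashable], Any],  # hypothetical Deltabase query
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch_from_deltabase
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (time the value was cached, cached value)
        self._store: Dict[Hashable, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        # Serve from the cache while the entry is younger than the TTL.
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: go to the (slower) Deltabase.
        self.misses += 1
        try:
            value = self._fetch(key)
        except Exception:
            # Deltabase unavailable: fall back to a stale cached value if we have one.
            if entry is not None:
                return entry[1]
            raise
        self._store[key] = (now, value)
        return value
```

A short TTL bounds how stale cached Deltabase data can become. This matters because the Deltabase exists to provide real-time data. The stale fallback illustrates the extra resilience layer, at the cost of possibly serving outdated results while the Deltabase is down.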
The overall objective of this in-progress study is to explore a NoSQL database as a potential caching solution for the Deltabase component of the HPCC Systems platform. To this end, an experimental approach will be taken. Alternative database architectures and caching algorithms will be evaluated and compared in terms of both ROXIE query response time and overall system availability.