Page tree

Versions Compared


  • This line was added.
  • This line was removed.
  • Formatting was changed.

About Johny Chen Jy

Image Modified

Johny Chen Cy is a student at Federal University of Santa Catarina, Florianopolis-SC in Brazil.

Johny's university course covers a broad range of subjects including Object Oriented Programming, Data Structures, Database (Relational and up to newSQL), as well as some marketing and business modules.

He  is developing his undergraduate thesis with the supervision of Artur Baruchi and Hugo Watanuki from the LexisNexis Risk Solutions Brazil office.

Poster Abstract

In the big data processing and querying world, in order to provision more accurate and up to date information to end users, a relatively common technological approach is to combine   an   Online   Analytical   Processing   (OLAP)   solution   with   an  Online Transactional   Processing   (OLTP)   solution.   Whereas   the   OLAP   component   is responsible for performing most of the big data processing based on read-only data extracted from different sources, the OLTP component can be used to provide write access to the result data and complement the query results with real time data.
Similarly, in big data parallel processing and querying environments supported by the HPCC (High Performance Computing Cluster) Systems platform, Roxie query results based on data that was first extracted, transformed, and loaded by the Thor cluster, can be complemented with additional data coming from an external online database. This external  component,  frequently  referred  to  as  Deltabase,  corresponds  to  a  OLTP database that can be used to provide real time data and complement query results with data that eventually has not yet been processed by Thor. Despite the obvious benefit of providing more accurate and up to date information to end users; the Deltabase, in case of its failure or because of its own OLTP nature not optimized for data reading, can become a bottleneck in terms of performance and availability of the entire querying system. In such contexts, the utilization of a caching solution can become attractive.
Recently, NoSQL databases have been leveraged as a caching mechanism in hybrid OLAP/OLTP solutions by avoiding that queries recently executed are once again processed by the OLTP system, which are usually slower for providing read access to stored data in comparison to NoSQL databases. By providing an additional optimized component  for  real  time  data  access  and  an  additional  layer  of  resilience  against failures,  the  inclusion  of  a  NoSQL  database  as  a  caching  solution  can  potentially increase the performance and availability of hybrid big data processing and querying solutions.
The overall objective of this in progress study is to explore the usage of a NoSQL database  as  a  potential  caching  solution  to  the  Deltabase  component of the HPCC Systems platform. To this end, an experimental approach will be leveraged. The alternative database architectures and caching algorithms will be evaluated and compared, both from a Roxie query response time and from an overall system availability.