Lucas Varella is studying for a Bachelor of Information Systems at the Federal University of Santa Catarina (UFSC) in Brazil.
His university course covers a broad range of subjects, including Object-Oriented Programming, Data Structures, and Databases (relational through NewSQL), as well as some marketing and business modules.
Lucas is working with Hugo Watanuki (Senior Technical Support Engineer, LexisNexis Risk Solutions Group) and is completing a year-long internship in the Brazil office.
The advent of containers and their associated orchestration tools in recent years has fundamentally shifted how computational workloads are built and managed in distributed computing environments. Containers offer a consistent, lightweight runtime environment through OS-level virtualization, with low overhead for maintaining and scaling applications efficiently, while the management of containers is handled by container orchestrators. Container orchestration tools, such as Kubernetes, provide mechanisms to launch and manage containers as pods running on a cluster, automating the running of service containers. Orchestration therefore provides a flexible way to run containerized services that require load balancing, fault tolerance, and horizontal scaling.
However, not all distributed computing environments can be easily ported to the container orchestration paradigm. Migration can be a greater challenge for data-intensive supercomputing technologies such as the HPCC (High Performance Computing Cluster) Systems platform. This is due to the batch-queuing nature of most of these platforms, which make strict assumptions about data storage persistence and host-specific shared resources; for example, that each node must securely maintain its own set of data and will read from and write to a single shared file system. For the HPCC Systems platform in particular, which has historically relied on data locality, the move to the container orchestration paradigm is especially challenging.
Despite these challenges, and given the push toward containerization, some progress has been made in making data-intensive platforms such as HPCC Systems available in containerized environments running in public clouds. How this new platform architecture behaves in terms of functionality and performance across different public cloud providers, and in comparison to the original bare-metal architecture, remains largely unknown.
The overall objective of this in-progress study is to explore the use of the first HPCC Systems version with native support for containerization. To this end, a cross-provider experiment will be run to compare overall HPCC Systems performance and functionality among Azure Kubernetes Service (AKS), Amazon's Elastic Kubernetes Service (EKS), and bare metal. A benchmark test suite will be used to measure data transformation performance. This study is expected to contribute to a better understanding of how the recently released HPCC Systems version with native support for containerization behaves in terms of performance and functionality, and to provide insights for future development.
In this video recording, Lucas gives a tour and explanation of his poster content.