This page describes hardware and software requirements and recommendations for running the HPCC. The HPCC system is designed to run on commodity hardware and should work on almost any hardware, but to take full advantage of its power, you should deploy your HPCC system on modern, advanced hardware.
Hardware and software technology are constantly changing and improving; therefore, the latest requirements and recommendations are available on the HPCC Systems Portal. The System Requirements page describes the latest platform requirements in detail.
The network switch is a significant component of the HPCC System.
Sufficient number of ports to allow all nodes to be connected directly to it;
IGMP v.2 support
IGMP snooping support
Ideally, your HPCC system will perform better when each node is connected directly to a single switch. Size the switch to your system so that every node can be plugged into its own port; this optimizes system performance.
Low latency (under 35 µs)
Layer 3 switching
Managed and monitored (SNMP is a plus)
Port channel (port bundling) support
Generally, higher-end, higher throughput switches are also going to provide better performance. For larger systems, a high-capacity managed switch that can be configured and tuned for HPCC efficiency is the best choice.
A load balancer distributes network traffic across a number of servers. Each Roxie Node is capable of receiving requests and returning results. Therefore, a load balancer distributes the load in an efficient manner to get the best performance and avoid a potential bottleneck.
Gigabit Ethernet ports: 4
Balancing Strategy: Flexible (F5 iRules or equivalent)
Ability to provide cyclic load rotation (not load balancing).
Ability to forward SOAP/HTTP traffic
Ability to provide triangulation/n-path routing (traffic comes in through the load balancer to the node, and replies are sent out via the switch).
Ability to treat a cluster of nodes as a single entity (to load balance clusters, not individual nodes), or
Ability to stack or tier the load balancers in multiple levels if it cannot.
An HPCC System can run as a single node system or a multi node system.
These hardware recommendations are intended for a multi-node production system. A test system can use less stringent specifications. Also, while it is easier to manage a system where all nodes are identical, this is not required. However, it is important to note that your system will only run as fast as its slowest node.
Pentium 4 or newer CPU
1 GB RAM per slave
(Note: If you configure more than 1 slave per node, memory is shared. For example, if you want 2 slaves per node with each having 4 GB of memory, the server would need 8 GB total.)
One Hard Drive (with sufficient free space to handle the size of the data you plan to process) or Network Attached Storage.
1 GigE network interface
Dual Core i7 CPU (or better)
4 GB RAM (or more) per slave
1 GigE network interface
PXE boot support in BIOS
PXE boot support is recommended so that you can manage the OS, packages, and other settings when you have a large system.
Optionally IPMI and KVM over IP support
For Roxie nodes:
Two 10K RPM (or faster) SAS Hard Drives
Typically, drive speed is the priority for Roxie nodes
For Thor nodes:
Two 7200 RPM (or faster) SATA hard drives
Optionally 3 or more hard drives can be configured in a RAID 5 container for increased performance and availability
Typically, drive capacity is the priority for Thor nodes
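If you opt for the RAID 5 configuration described above, the array could be assembled with mdadm. This is a hedged sketch only: the device names (/dev/sdb through /dev/sdd, /dev/md0) and the mount point are illustrative assumptions, not values fixed by the HPCC documentation.

```shell
# Illustrative device names -- adjust for your hardware.
sudo mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
sudo mkfs.ext4 /dev/md0                # format the new array
sudo mount /dev/md0 /mnt/hpcc-data     # mount it where your HPCC data will live
```

RAID 5 trades one drive's worth of capacity for parity, so three drives yield roughly two drives of usable space while tolerating a single-drive failure.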
All nodes must have identical operating systems. We also recommend that all nodes have identical BIOS settings and installed packages; this significantly reduces variables when troubleshooting. It is easier to manage a system where all nodes are identical, but this is not required.
Binary installation packages are available for many Linux Operating systems. HPCC System platform requirements are readily available on the HPCC Portal.
Installing HPCC on your system depends on having the required component packages installed. The required dependencies vary by platform, and in some cases they are included in the installation packages. In other cases the installation may fail and the package management utility will prompt you for the required packages. For the specific commands to obtain and install these packages, see the instructions for your operating system.
Note: For CentOS installations, the Fedora EPEL repository is required.
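As an illustrative sketch for a CentOS node (the exact platform package file name varies by release and is an assumption here), you could enable EPEL and then install the platform package with yum, which resolves the remaining dependencies automatically:

```shell
# Illustrative only: the platform package file name varies by release.
sudo yum -y install epel-release                     # enable the EPEL repository (required on CentOS)
sudo yum -y localinstall hpccsystems-platform*.rpm   # installs the package and resolves dependencies
```

On Ubuntu/Debian systems, `dpkg -i` followed by `apt-get install -f` serves the same dependency-resolution purpose.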
The HPCC components use SSH keys to authenticate each other. This is required for communication between nodes. A script to generate keys is provided. Run that script and distribute the public and private keys to all nodes after you have installed the packages on all nodes, but before you configure a multi-node HPCC.
As root (or sudo as shown below), generate a new key using this command:
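The provided script is not reproduced in this excerpt. As a rough sketch of what this step produces, plain ssh-keygen can generate a comparable passphrase-less RSA key pair (the temporary directory and key comment below are illustrative, not part of the platform's procedure):

```shell
# Illustrative sketch only -- the platform ships its own key-generation script.
KEYDIR=$(mktemp -d)                     # temporary directory for the example
ssh-keygen -q -t rsa -b 2048 -N "" -f "$KEYDIR/id_rsa" -C "hpcc cluster key"
ls "$KEYDIR"                            # id_rsa  id_rsa.pub
```

The empty `-N ""` passphrase is what allows the nodes to authenticate to each other unattended.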
Distribute the keys to all nodes. From the /home/hpcc/.ssh directory, copy these three files to the same directory (/home/hpcc/.ssh) on each node:
Make sure the files retain their permissions when they are distributed. These keys must be owned by the user "hpcc".
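The three files are not named in this excerpt; in a standard OpenSSH layout they would be id_rsa, id_rsa.pub, and authorized_keys (an assumption). A hedged sketch of the copy step, using `scp -p` to preserve permissions (the host names are placeholders, not real node names):

```shell
# Hypothetical host names -- substitute your own node list.
for host in node001 node002 node003; do
  scp -p /home/hpcc/.ssh/id_rsa \
         /home/hpcc/.ssh/id_rsa.pub \
         /home/hpcc/.ssh/authorized_keys \
         hpcc@"$host":/home/hpcc/.ssh/
done
```

Running the copy as the hpcc user (as shown) keeps the files owned by "hpcc" on the destination nodes.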
Running the HPCC platform requires a browser on your user workstation that can communicate with the HPCC. You will use it to access ECL Watch, a Web-based interface to your HPCC system. ECL Watch enables you to examine and manage many aspects of the HPCC and allows you to see information about jobs you run, data files, and system metrics.
Internet Explorer® 9 (or later)
Firefox™ 3.0 (or later)
Google Chrome 10 (or later)
Install the ECL IDE
The ECL IDE (Integrated Development Environment) is the tool used to create queries into your data and the ECL files with which to build those queries.
Download the ECL IDE from the HPCC Systems Web portal: http://hpccsystems.com
You can find the ECL IDE and Client Tools on this page using the following URL:
The ECL IDE was designed to run on Windows machines. See the appendix for instructions on running on Linux workstations using Wine.
Microsoft VS 2008 C++ compiler (either Express or Professional edition). This is needed if you are running Windows and want to compile queries locally, allowing you to compile and run ECL code on your Windows workstation.
GCC. This is needed if you are running Linux and want to compile queries locally on a standalone Linux machine (although it may already be available, since it usually comes with the operating system).