HPCC Systems Platform 4.x and greater can integrate with Java directly. This page takes you through the steps needed to configure Java correctly. In this particular case we will see how to make it work on an Ubuntu 13.04 system.

...

1. Download and Install the HPCC Systems platform with plugins 4.x and greater

Follow the installation instructions [here] to download and install HPCC. Specific instructions for installing the package with plugins are in Installing & Running the HPCC Systems® Platform (http://cdn.hpccsystems.com/releases/CE-Candidate-4.2.0/docs/Installing_and_RunningTheHPCCPlatform-4.2.0-1.pdf).

For RPM-based systems you need the package whose name begins with hpccsystems-platform_community-with-plugins-, NOT hpccsystems-platform_community-

...

For Debian-based systems only the hpccsystems-platform_community- package is available, and it includes plugin support.

 


2. Install OpenJDK 1.7 or greater

In some cases you may have to install the default-jdk package as well.

...

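  1. Verify that your Java class was not compiled with a more recent version of Java than the one installed on the cluster.
    1. You can check the installed version by running "java -version", or by listing the installed Java packages with "rpm -qa|grep java" (RPM-based systems) or "dpkg -l|grep jdk" (Debian-based systems such as Ubuntu), on one of the cluster nodes.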

  2. Copy the Java jar or class file to all of the THOR nodes on a cluster.
    1. The default location for java files is /opt/HPCCSystems/classes.

    2. This can be done manually or by running something like:

      for x in $(seq 8); do scp myjava.jar 10.173.147.$x:/opt/HPCCSystems/classes/; done

      Note that /opt/HPCCSystems/classes is owned by root in HPCC Systems 4.x, so the HPCC user (hpcc) cannot write to it. However, a default HPCC installation only sets up SSH key pairs for the hpcc user, and those keys are what allow ssh/scp between hosts without a password prompt. To run the script above, use one of the following methods:

      1) Run it as root and enter the password for each host.

      2) Add root's SSH public key to the authorized keys file (/root/.ssh/authorized_keys) and run it as root.

      3) Change the permissions or ownership of /opt/HPCCSystems/classes so that the hpcc user has write permission, and run the script as the hpcc user.

  3. Set the classpath in the HPCC Systems configuration file   

    1. The JAR file itself can be physically located anywhere. You can add it to the classpath either in /etc/HPCCSystems/environment.conf or in the global Java CLASSPATH environment variable.

    2. Edit environment.conf in your favorite editor and add your Java class directory or JAR file to the classpath entry.

      1. If you are adding a jar file, the jar file itself has to be added to the classpath. For example:

        classpath=/opt/HPCCSystems/classes:/opt/HPCCSystems/classes/myjava.jar

         


  4. Restart the Thor cluster for the classpath changes to take effect.
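The version check in step 1 can also be done on the class file itself. The following is a minimal sketch, not part of HPCC: the class name ClassVersionCheck and its methods are hypothetical. It relies on the standard class-file layout, where the header starts with the magic number 0xCAFEBABE followed by the minor and major version numbers, and on the fact that major version 51 corresponds to Java 7 and 52 to Java 8.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper (an assumption, not shipped with HPCC):
// reports which Java release a compiled .class file targets.
class ClassVersionCheck {

    /** Returns the major version stored in bytes 6-7 of a class-file header. */
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("Too short to be a class file");
        }
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("Missing class-file magic number");
        }
        buf.getShort();                 // skip the minor version
        return buf.getShort() & 0xFFFF; // major version as an unsigned value
    }

    /** Maps a class-file major version to a Java release (51 -> 7, 52 -> 8). */
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        int major = majorVersion(bytes);
        System.out.println(args[0] + " targets Java " + javaRelease(major)
                + " (class version " + major + ")");
        // The running JVM reports the newest class version it can load.
        System.out.println("This JVM supports class versions up to "
                + System.getProperty("java.class.version"));
    }
}
```

Run it on a cluster node against a class extracted from your jar, for example "java ClassVersionCheck MyClass.class" (MyClass.class is a placeholder). If the first number printed is higher than the second, the class was compiled for a newer Java than the node has.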

...

Code Block
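import java;
// Each ECL prototype must declare parameters that match the JNI descriptor.
// The parameter names and the empty second argument below are illustrative assumptions.
STRING segment(STRING text, STRING options) := IMPORT(java,'org/hpccsystems/Segmenter.SegmentText:(Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;');
// Object types in a descriptor end with ';', hence Ljava/lang/Boolean;
STRING clearcache(BOOLEAN flag) := IMPORT(java,'org/hpccsystems/Segmenter.ClearCache:(Ljava/lang/Boolean;)Ljava/lang/String;');

SEQUENTIAL(output(segment('text to segment', '')),
           output(clearcache(true))
);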
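The string passed to IMPORT(java, ...) above combines the class path with slashes, the method name, and a JNI-style method descriptor, and a single typo in the descriptor is easy to make. The sketch below is one possible way to generate these strings from a compiled class instead of writing them by hand. The class ImportStringPrinter is hypothetical and not part of HPCC; it only uses the standard java.lang.invoke.MethodType API (available since Java 7). The JDK tool "javap -s" prints the same descriptors.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper (an assumption, not shipped with HPCC):
// prints "package/Class.method:descriptor" strings for a class.
class ImportStringPrinter {

    /** Builds the JNI-style descriptor, e.g. (Ljava/lang/String;)Ljava/lang/String; */
    static String descriptor(Method m) {
        return MethodType.methodType(m.getReturnType(), m.getParameterTypes())
                .toMethodDescriptorString();
    }

    /** Builds the full string in the form used by the ECL examples on this page. */
    static String importString(Method m) {
        return m.getDeclaringClass().getName().replace('.', '/')
                + "." + m.getName() + ":" + descriptor(m);
    }

    /** Convenience lookup that turns a missing method into an unchecked exception. */
    static String importStringFor(Class<?> owner, String name, Class<?>... params) {
        try {
            return importString(owner.getMethod(name, params));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method: " + name, e);
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // args[0] is a fully qualified class name that must be on the classpath.
        Class<?> owner = Class.forName(args[0]);
        for (Method m : owner.getDeclaredMethods()) {
            int mods = m.getModifiers();
            // Only list public static methods, like the ones in this page's examples.
            if (Modifier.isPublic(mods) && Modifier.isStatic(mods)) {
                System.out.println(importString(m));
            }
        }
    }
}
```

For example, running it with your jar on the classpath and the class name as the argument lists one line per public static method, which can be pasted into the ECL IMPORT call.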

 


You can also browse other Java-related HPCC issues in JIRA, or raise a new issue at http://track.hpccsystems.com to get help resolving a Java problem.

...

Code Block
package org.hpccsystems.streamapi.consumer;

public class DataConsumer {
	
	public static String consume() {
		return "<dataset><rows><row>sample row1</row><row>sample row2</row></rows></dataset>";
	}

}

 


Now, assuming that you have Java configured correctly (if not, read the setting up Java wiki), the sample ECL code to call the Java class will look like:

...

Code Block
IMPORT java;

STRING consume() := IMPORT(java, 
        'org/hpccsystems/streamapi/consumer/DataConsumer.consume:()Ljava/lang/String;');


messages := consume();

OUTPUT(messages);

messagesDS := DATASET([{messages}], {STRING line});

ExtractedRow := RECORD 
  STRING value;
END; 

ExtractedRows := RECORD
  DATASET(ExtractedRow) values;
END;

ExtractedRows RowsTrans := TRANSFORM
  SELF.values := XMLPROJECT('row', TRANSFORM(ExtractedRow, SELF.value := XMLTEXT('')));
END;

parsedData := PARSE(messagesDS, line, RowsTrans, XML('/dataset/rows'));

OUTPUT(parsedData);

 


The call to the Java consume method is accomplished in the first three statements (the IMPORT, the function prototype, and the assignment to messages). The rest of the code extracts the XML content into a more usable dataset of rows.
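Before deploying the jar, it can help to confirm locally that the string returned by the Java method is well-formed XML and contains the rows the ECL PARSE step is expected to extract. The following is a rough standalone sketch using the JDK's built-in DOM parser; the class name RowExtractor is hypothetical, and the sample string is copied from the DataConsumer example rather than obtained by calling it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical local check (an assumption, not part of HPCC):
// lists the text of every <row> element in an XML string.
class RowExtractor {

    static List<String> rows(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("row");
            List<String> values = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            // Malformed XML ends up here; report it as a bad argument.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Same payload as the DataConsumer.consume() example on this page.
        String sample = "<dataset><rows><row>sample row1</row>"
                + "<row>sample row2</row></rows></dataset>";
        System.out.println(rows(sample)); // prints [sample row1, sample row2]
    }
}
```

If the method throws or prints an unexpected list, the problem is in the Java payload rather than in the ECL parsing code.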

...

Additional Examples of Java Plugin usage:

https://github.com/hpcc-systems/TextAnalytics/tree/master/hpcc/ecl/stanfordnlp
https://github.com/hpcc-systems/kafka-integration/blob/master/ecl/DataCollection_Scheduler.ecl