Documentation - Data Patterns

This project was completed by Amy Ma during the 2022 intern program and is no longer available. You are welcome to submit your own idea on this topic.
See her poster showcasing her work.

Project Description

Contribute to our suite of documentation guides by producing a manual on the Data Patterns capabilities available for use with the HPCC Systems Platform. Find out more about the HPCC Systems Summer Internship Program.

Data Patterns is a library or ECL Bundle that provides data profiling and research tools to an ECL programmer. It can be used in three ways:

  • Inside ECL Watch (the easiest option)

  • Using the Data Patterns Bundle

  • Using the Std.DataPatterns module in ECL

The candidate can use the free version of XMLMind Editor (or the XML Editor of their choice) and will submit pull requests to our GitHub Repository.

Completion of this project involves:

A new stand-alone Data Patterns book with the following sections:

    • Overview

    • Using Data Patterns in ECL Watch

    • Using the DataPatterns bundle

    • Using Std.DataPatterns module methods

We will provide sample data files to process. These files will be designed to demonstrate specific capabilities of the product. For example, one file might have an unusual skew or interesting MIN/MAX values.

An includible module (chapter) in DocBook XML format

This chapter will be included in two books: the Standard Library Reference and the new Data Patterns book.

Poster - Showcasing deliverables and their value to the HPCC Systems Open Source Project and Community

Deliverables

End of project

  • A new stand-alone Data Patterns book 

  • An includible module (chapter) in DocBook XML format 

  • Poster - Showcasing deliverables and their value 
