Farm and crop data are abundant. They come from machinery, satellite and drone images, commodities exchanges, and more. Sound decisions depend on in-depth understanding and analysis of the available data, but because the data come from so many diverse sources, farmers and farm managers are overwhelmed and ill-equipped to exploit the information.
In this poster we demonstrate how we created a data lake for agricultural data. Several streams feed the lake with raw data, such as USDA data and mercantile exchange pricing data. The raw data from the streams are assembled and organized to produce derived information, such as production rates. Finally, all of this data, information, and knowledge can be viewed in a custom web application that farmers can use to evaluate the profit of various crop planting scenarios and so better manage their farms.
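As a rough illustration of what evaluating the profit of planting scenarios could look like, here is a minimal Python sketch. The `Scenario` fields, the profit formula (revenue minus per-acre cost), and all numbers are assumptions made up for this example; they are not taken from the poster or from the application itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """A hypothetical planting scenario; every field is an illustrative assumption."""
    crop: str
    acres: float
    yield_per_acre: float   # e.g. bushels per acre
    price_per_unit: float   # e.g. dollars per bushel
    cost_per_acre: float    # seed, fertilizer, fuel, and so on


def profit(s: Scenario) -> float:
    """Assumed model: revenue from the harvest minus total per-acre cost."""
    revenue = s.acres * s.yield_per_acre * s.price_per_unit
    cost = s.acres * s.cost_per_acre
    return revenue - cost


def rank_scenarios(scenarios):
    """Return the scenarios ordered from most to least profitable."""
    return sorted(scenarios, key=profit, reverse=True)


# Invented figures, for illustration only.
scenarios = [
    Scenario("soybeans", acres=100, yield_per_acre=50, price_per_unit=12.0, cost_per_acre=400),
    Scenario("corn", acres=100, yield_per_acre=180, price_per_unit=4.5, cost_per_acre=600),
]

for s in rank_scenarios(scenarios):
    print(f"{s.crop}: {profit(s):.2f}")
```

In a setup like the one described, the yield and price inputs would presumably come from the derived information in the lake rather than being typed in by hand.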
This project is installed on an HPCC Systems server (thanks to Dan Camper). Data from the streams are fetched daily by ECL cron jobs and, once they arrive in the landing zone, sprayed into the lake. Custom ECL code flattens the new and old data and processes them to update the derived information and knowledge. Thanks to the processing power of HPCC Systems, all of these tasks complete within 12 minutes for a dataset of over 39M entries. The data are then indexed and published as REST web services using WsECL.
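The project does this work in ECL, which is not shown here. As a language-neutral sketch of the general idea of combining a daily batch with existing records and then refreshing a derived value, the following Python example may help. The record layout (`commodity`, `date`, `price`), the rule that newer records replace older ones with the same key, and the choice of an average price as the derived value are all assumptions for illustration.

```python
def merge_records(old, new):
    """Combine existing and newly fetched records.

    Assumption: a record is identified by (commodity, date), and a newly
    fetched record replaces an older one with the same key.
    """
    merged = {(r["commodity"], r["date"]): r for r in old}
    for r in new:
        merged[(r["commodity"], r["date"])] = r
    return sorted(merged.values(), key=lambda r: (r["commodity"], r["date"]))


def average_price(records, commodity):
    """Recompute one derived value (mean price) for a commodity, or None if absent."""
    prices = [r["price"] for r in records if r["commodity"] == commodity]
    return sum(prices) / len(prices) if prices else None


# Invented sample data, for illustration only.
existing = [{"commodity": "corn", "date": "2021-06-01", "price": 4.0}]
daily_batch = [
    {"commodity": "corn", "date": "2021-06-01", "price": 4.5},  # correction of an old row
    {"commodity": "corn", "date": "2021-06-02", "price": 5.5},  # new row
]

lake = merge_records(existing, daily_batch)
print(len(lake), average_price(lake, "corn"))
```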
A custom web application written with the Flutter framework presents this information in charts and tables. The application retrieves information from the data lake through the WsECL REST interface. The project exercises the breadth and depth of HPCC Systems.
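To make the retrieval step more concrete, here is a hedged Python sketch of how a client might address a published query and unpack its JSON reply. The actual application is written in Flutter; Python is used here only for illustration. The URL pattern, the default port, the response envelope (`<query>Response` / `Results` / `Row`), and the query name `crop_prices` are assumptions that should be checked against the deployed WsECL service before use.

```python
import json
from urllib.parse import urlencode


def build_query_url(host, target, query, params, port=8002):
    """Build a request URL for a published query.

    Assumption: the service exposes queries under
    /WsEcl/submit/query/<target>/<query>/json on the given port.
    """
    return (
        f"http://{host}:{port}/WsEcl/submit/query/"
        f"{target}/{query}/json?{urlencode(params)}"
    )


def extract_rows(response_text, query, result_name="result_1"):
    """Pull the result rows out of a JSON reply.

    Assumption: the reply is wrapped as
    {"<query>Response": {"Results": {"<result_name>": {"Row": [...]}}}}.
    Returns an empty list when any level is missing.
    """
    payload = json.loads(response_text)
    results = payload.get(f"{query}Response", {}).get("Results", {})
    return results.get(result_name, {}).get("Row", [])


# Hypothetical query name and a hand-written sample reply; no network call is made.
url = build_query_url("localhost", "roxie", "crop_prices", {"commodity": "corn"})
sample_reply = json.dumps(
    {"crop_pricesResponse": {"Results": {"result_1": {"Row": [{"date": "2021-06-01", "price": 4.5}]}}}}
)
rows = extract_rows(sample_reply, "crop_prices")
print(url)
print(rows)
```

Returning an empty list for a missing envelope keeps chart and table code simple, since it can treat "no data" and "unexpected reply" the same way; a real client would probably want to log the second case.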
In this video recording, Gurman provides a tour and explanation of his poster content.