# Implement the CONCORD algorithm

This project was completed by Syed Rahman. The project was his own idea, which he brought to us and completed as a summer intern in 2015.


The CONCORD algorithm is a method to estimate the true population co-variance matrix. The co-variance matrix is a summary of the relationship between every pair of fields in the data. On a standardized scale (that is, when each co-variance is divided by the standard deviations of the two fields), values close to zero indicate that the fields don’t have a linear relationship. Values close to 1 indicate a positive relationship and values close to –1 indicate an inverse relationship.
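As a small illustration of the pairwise summary described above, the sketch below computes the co-variance and its standardized form (the correlation) for every pair of fields. The field names and values are made up for the example, and this is plain Python rather than code from the Machine Learning Library.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample co-variance of two equally long columns (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Co-variance rescaled by both standard deviations, so it lies in [-1, 1]."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


def pairwise_matrix(columns, pair_fn):
    """Apply pair_fn to every pair of fields to build a square matrix."""
    return [[pair_fn(a, b) for b in columns] for a in columns]


# Hypothetical fields, one value per observation.
height = [1.0, 2.0, 3.0, 4.0]
weight = [2.0, 4.0, 6.0, 8.0]    # rises with height
discount = [8.0, 6.0, 4.0, 2.0]  # falls as height rises

fields = [height, weight, discount]
cov = pairwise_matrix(fields, covariance)
corr = pairwise_matrix(fields, correlation)

for row in corr:
    print([round(value, 3) for value in row])
```

The raw co-variances depend on the units of each field, which is why the standardized matrix is the one that reads on the scale from –1 to 1.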

In classic statistics there are many more observations than fields. In this case, the co-variance matrix of the sample is a good estimate for the true co-variance matrix.

Unfortunately, in big data, there are many cases where the number of fields exceeds the number of observations or is close to it. In these cases, the sample co-variance matrix is a very poor estimate for the true co-variance matrix.
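The effect can be seen in a small simulation. The sketch below is not the CONCORD algorithm; it only measures how far the plain sample co-variance matrix lands from the truth. It assumes independent standard normal fields, so the true co-variance matrix is the identity, and the sizes (20 fields, 2000 versus 10 observations) and the seed are arbitrary choices for the example.

```python
import random


def sample_covariance(rows):
    """Sample co-variance matrix; rows are observations, columns are fields."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def worst_error(n_obs, n_fields, rng):
    """Largest absolute gap between a sample co-variance entry and the truth.

    Fields are simulated as independent standard normals (an assumption of
    this sketch), so the true matrix has 1 on the diagonal and 0 elsewhere.
    """
    rows = [[rng.gauss(0.0, 1.0) for _ in range(n_fields)] for _ in range(n_obs)]
    cov = sample_covariance(rows)
    return max(
        abs(cov[i][j] - (1.0 if i == j else 0.0))
        for i in range(n_fields)
        for j in range(n_fields)
    )


rng = random.Random(2015)  # fixed seed so repeated runs give the same numbers
many_obs = worst_error(n_obs=2000, n_fields=20, rng=rng)
few_obs = worst_error(n_obs=10, n_fields=20, rng=rng)

print(f"worst entry error, 2000 observations x 20 fields: {many_obs:.3f}")
print(f"worst entry error,   10 observations x 20 fields: {few_obs:.3f}")
```

With far more observations than fields the worst entry stays close to the truth, while with fewer observations than fields some entries are badly off even though the fields are unrelated.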

Read Syed's blog to find out more about his progress and experience and view his commits on GitHub.

It’s clear that Syed’s addition to our Machine Learning Library is an important improvement, providing a way to get more reliable results in this area.

Syed presented his project on Community Day at the 2015 HPCC Systems® Engineering Summit at the end of September 2015. His presentation demonstrates how this algorithm works and why it is a better method of estimating the true population co-variance matrix. Watch his presentation: Syed Rahman and Kshitij Khare - Presenting about The CONCORD Algorithm (starts around the 30:00 mark). The presentation slides are also available.

For further details, please refer to the JIRA issue for this project.

In 2016, Syed was a returning student intern who completed a machine learning project which is related to this algorithm.

