
# Implement the CONCORD algorithm


This project was completed by Syed Rahman. The project was his own idea, which he brought to us and completed as a summer intern in 2015.

The CONCORD algorithm implemented by Syed Rahman

The CONCORD algorithm is a method for estimating the true (population) covariance matrix. The covariance matrix summarizes the relationship between every pair of fields in the data. A covariance close to zero indicates that two fields have no linear relationship. When the fields are standardized, so that each covariance is a correlation, values close to 1 indicate a positive relationship and values close to -1 indicate an inverse relationship.
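To make the idea of a pairwise summary concrete, here is a minimal sketch in plain Python. It is not taken from Syed's implementation; the function names and the small data set are hypothetical and chosen only for illustration. It computes the sample covariance matrix and then scales it by the standard deviations, which gives correlations that always fall between -1 and 1.

```python
from math import sqrt


def covariance_matrix(rows):
    """Sample covariance matrix of `rows` (one observation per row)."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def correlation_matrix(rows):
    """Covariance matrix rescaled so that every entry lies in [-1, 1]."""
    cov = covariance_matrix(rows)
    p = len(cov)
    sd = [sqrt(cov[j][j]) for j in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]


# Hypothetical data: field 1 rises with field 0, field 2 falls as field 0 rises.
data = [
    [1.0, 2.0, 9.0],
    [2.0, 4.1, 7.0],
    [3.0, 5.9, 5.2],
    [4.0, 8.0, 2.9],
]

corr = correlation_matrix(data)
for row in corr:
    print([round(value, 3) for value in row])
```

In this made-up data set, the entry for fields 0 and 1 comes out close to 1 and the entry for fields 0 and 2 comes out close to -1, while the diagonal is 1 because each field is perfectly related to itself.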

In classical statistics, there are usually many more observations than fields. In this case, the covariance matrix of the sample is a good estimate of the true covariance matrix.

Unfortunately, in big data, there are many cases where the number of fields exceeds, or is close to, the number of observations. In these cases, the sample covariance matrix is a very poor estimate of the true covariance matrix.
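One way to see why the sample estimate breaks down is that a sample covariance matrix built from n observations has rank at most n - 1. With more fields than observations it is therefore singular and cannot be inverted. The sketch below illustrates this in plain Python; it is not part of the CONCORD implementation, and the helper names and the data are hypothetical.

```python
def sample_covariance(rows):
    """Sample covariance matrix of `rows` (one observation per row)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def matrix_rank(matrix, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]  # work on a copy
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # Choose the largest remaining entry in this column as the pivot.
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


# Hypothetical data: only 3 observations of 4 fields (more fields than rows).
observations = [
    [2.0, 1.0, 0.0, 5.0],
    [0.0, 3.0, 1.0, 2.0],
    [1.0, 2.0, 5.0, 2.0],
]

cov = sample_covariance(observations)
rank = matrix_rank(cov)
print(f"fields: {len(cov)}, observations: {len(observations)}, rank: {rank}")
```

Here the 4 x 4 sample covariance matrix has rank below 4, so it is singular. Regularized estimators are meant to give a usable estimate in this setting.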

Read Syed's blog to find out more about his progress and experience, and view his commits on GitHub.

Syed's addition to our Machine Learning Library is an important improvement, providing a way to get more reliable results in this area.

Syed presented his project on Community Day at the 2015 HPCC Systems® Engineering Summit at the end of September. His presentation demonstrates how the algorithm works and why it is a better method of estimating the true covariance matrix. Watch his presentation: Syed Rahman and Kshitij Khare - Presenting about The CONCORD Algorithm (starts around the 30:00 mark). The presentation slides are also available.

For further details, please refer to the JIRA issue for this project.

In 2016, Syed returned as a student intern and completed a machine learning project related to this algorithm.
