The proposal period for 2022 internships is now closed
The proposal period for 2023 internships will open in November 2022

This is a new project; more information is coming soon. If you are interested in this project, contact Lorraine Chapman.

Find out about the HPCC Systems Summer Internship Program.

Project Description

The goal of this project is to construct a Digital Human Reader (DHR) that corrects OCR output. Humans can read OCRed text and correct it using their knowledge of typical OCR mistakes, of words, and of the context in which words appear. The intern will find an OCR engine and text to use as test cases, then study the results in order to construct a DHR.

If you are interested in this project, please contact Add email link to mentor.

Completion of this project involves:

  • Find an OCR engine that can be used to test the OCR DHR
  • Study the OCRed text and come up with rules and knowledge that will help clean up the text
  • Implement the DHR in NLP++
  • Create an NLP++ repository for OCR cleanup
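
As a rough illustration of the kind of rules and knowledge the second step might produce, here is a minimal sketch in Python (not NLP++, and not the project's actual design). It assumes a small table of common OCR character confusions and a word list; both `CONFUSIONS` and `LEXICON` are hypothetical toy data made up for this example. A token that is not a known word is corrected only when exactly one single-substitution variant is a known word.

```python
# Hypothetical confusion table: fragment seen in OCR output -> what was likely printed.
CONFUSIONS = {
    "rn": ["m"],
    "0": ["o"],
    "1": ["l", "i"],
    "5": ["s"],
    "vv": ["w"],
}

# Hypothetical toy lexicon; a real one would be far larger.
LEXICON = {"the", "modern", "system", "little", "word"}


def candidates(token):
    """Yield every variant of token made by one substitution at one position."""
    for wrong, rights in CONFUSIONS.items():
        start = token.find(wrong)
        while start != -1:
            for right in rights:
                yield token[:start] + right + token[start + len(wrong):]
            start = token.find(wrong, start + 1)


def correct_token(token):
    """Return a corrected token, or the token unchanged if no rule applies."""
    lower = token.lower()
    if lower in LEXICON:
        return token  # already a known word
    matches = sorted({c for c in candidates(lower) if c in LEXICON})
    # Correct only when the fix is unambiguous; note the result is lowercased.
    return matches[0] if len(matches) == 1 else token


def correct_text(text):
    """Correct each whitespace-separated token independently."""
    return " ".join(correct_token(t) for t in text.split())


if __name__ == "__main__":
    print(correct_text("the rnodern sy5tem"))
```

This sketch looks at each word in isolation. A DHR as described above would also use word context, which this example does not attempt.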

By the mid-term review, we would expect you to have:

  • <What must be completed to pass the evaluation and continue on to complete the project>

Add Mentor Name
Add link to Email Address 

Backup Mentor: Add Backup Mentor Name
Add link to Email Address 

Skills needed
  • Keen interest in natural language processing
  • Ability to do research via the internet
  • Ability to logically analyze text
  • Ability to learn and program in NLP++


  • <Deliverable(s) to be achieved>

End of project

  • <Deliverables expected by the end of the internship>
Other resources