The proposal period for 2022 internships is now closed
The proposal period for 2023 internships will open in November 2022
This is a new project; more information is coming soon. If you are interested in this project, contact Lorraine Chapman.
Find out about the HPCC Systems Summer Internship Program.
The goal of this project is to construct a Digital Human Reader (DHR) that corrects OCR output. Humans can read OCRed text and correct it using their knowledge of common OCR mistakes, of words, and of word context. Find an OCR engine and sample text for test cases, then study the OCR output to construct a DHR that applies the same kinds of knowledge.
Completion of this project involves:
- Finding an OCR engine that can be used to test the OCR DHR
- Studying the OCRed text and deriving rules and knowledge that will help clean up the text
- Implementing the DHR in NLP++
- Creating an NLP++ repository for OCR cleanup
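To make the "rules and knowledge" step above more concrete, here is a minimal Python sketch of the general idea, not the project's method and not NLP++. It assumes a small hypothetical table of OCR confusions (`CONFUSIONS`) and a toy word list (`LEXICON`); both are invented for illustration and would be replaced by rules derived from studying real OCR output. The sketch uses only word knowledge, not word context.

```python
# Hypothetical OCR confusion rules: misread string -> possible intended strings.
# These pairs are illustrative assumptions, not rules taken from any OCR engine.
CONFUSIONS = {
    "rn": ["m"],
    "vv": ["w"],
    "0": ["o"],
    "1": ["l", "i"],
    "5": ["s"],
}

# Toy word list standing in for real lexical knowledge (an assumption).
LEXICON = {"modern", "hello", "world", "will", "text", "words"}


def candidates(token):
    """Return every string obtained by optionally undoing confusions in token."""
    if not token:
        return {""}
    results = set()
    # Option 1: keep the first character unchanged.
    for rest in candidates(token[1:]):
        results.add(token[0] + rest)
    # Option 2: undo a confusion rule that matches at the start of the token.
    for misread, fixes in CONFUSIONS.items():
        if token.startswith(misread):
            for rest in candidates(token[len(misread):]):
                for fix in fixes:
                    results.add(fix + rest)
    return results


def correct_word(token):
    """Correct one token if exactly one known word explains it."""
    lower = token.lower()
    if lower in LEXICON:
        return token  # already a known word, leave it alone
    matches = sorted(c for c in candidates(lower) if c in LEXICON)
    # Only correct when the evidence is unambiguous; the fix is lowercased.
    return matches[0] if len(matches) == 1 else token


def correct_text(text):
    """Correct each whitespace-separated token independently."""
    return " ".join(correct_word(t) for t in text.split())


if __name__ == "__main__":
    print(correct_text("he11o w0rld"))
```

Tokens that match no known word, or that match more than one, are left unchanged. Resolving those cases is where knowledge of word context would come in.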
By the midterm review, we would expect you to have:
- <What must be completed to pass the evaluation and continue on to complete the project>
Mentor: Add Mentor Name
Backup Mentor: Add Backup Mentor Name
End of project