The proposal period for 2022 internships is now open
Submit your final proposal to Lorraine Chapman before Friday 18th March 2022
This is a new project; more information is coming soon. If you are interested in this project, contact Lorraine Chapman.
Find out about the HPCC Systems Summer Internship Program.
Project Description
To eventually create digital human readers in Spanish, a dictionary must first be established. This project will use the Spanish dictionary from Wiktionary. One interesting aspect of this project is the Spanish verb system, which has a rich morphology.
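To illustrate why the verbs are interesting, here is a minimal Python sketch of regular present-tense conjugation. It is not NLP++, which the project itself will use. One infinitive yields six distinct surface forms, and a dictionary has to map each of them back to its lemma. The function names `conjugate_present` and `build_form_index` are hypothetical. The sketch covers only regular -ar/-er/-ir verbs; irregular verbs (ser, ir) and stem-changing verbs (pensar → pienso) would need their own tables.

```python
# Present-indicative endings for REGULAR Spanish verbs only, ordered:
# yo, tú, él/ella/usted, nosotros, vosotros, ellos/ellas/ustedes.
# Irregular and stem-changing verbs are deliberately not handled.
ENDINGS = {
    "ar": ["o", "as", "a", "amos", "áis", "an"],
    "er": ["o", "es", "e", "emos", "éis", "en"],
    "ir": ["o", "es", "e", "imos", "ís", "en"],
}


def conjugate_present(infinitive: str) -> list[str]:
    """Return the six present-indicative forms of a regular verb."""
    stem, ending = infinitive[:-2], infinitive[-2:]
    if ending not in ENDINGS:
        raise ValueError(f"not an -ar/-er/-ir infinitive: {infinitive!r}")
    return [stem + suffix for suffix in ENDINGS[ending]]


def build_form_index(infinitives: list[str]) -> dict[str, set[str]]:
    """Map every generated form back to the infinitive(s) it came from."""
    index: dict[str, set[str]] = {}
    for inf in infinitives:
        for form in conjugate_present(inf):
            index.setdefault(form, set()).add(inf)
    return index


if __name__ == "__main__":
    print(conjugate_present("hablar"))
    print(build_form_index(["comer", "vivir"])["vive"])
```

Ambiguity shows up quickly: `como` is both a form of `comer` and a conjunction, which is exactly the kind of case a part-of-speech tagger has to resolve.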
If you are interested in this project, please contact the mentor, David de Hilster.
Completion of this project involves:
- Download the Spanish dictionary from Wiktionary
- Write an NLP++ parser to extract the vocabulary from the Wiktionary files into text files
- Write an NLP++ parser to transform the text files into knowledge base files
- Create Spanish test files for part-of-speech tagging
- Write an NLP++ part-of-speech tagger
- Run the tests using the NLP++ Plugin in ECL to show enhancements
- Create an NLP++ repository for the Spanish dictionary and analyzers
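The extraction and tagging steps above can be pictured with a small Python sketch. It is only an illustration under stated assumptions. The project will write these parsers in NLP++ and produce knowledge base files, whose format is not shown here. The sketch assumes English-Wiktionary wikitext conventions: a `==Spanish==` language heading followed by part-of-speech headings such as `===Verb===`. It uses a hypothetical tab-separated text format for the extracted vocabulary, and it tags with a plain dictionary lookup rather than a real disambiguating tagger. All function names are hypothetical.

```python
import re

# ASSUMPTION: English-Wiktionary wikitext layout. A language is a level-2
# heading ("==Spanish=="); parts of speech are level-3/4 headings.
POS_HEADINGS = {"Noun", "Verb", "Adjective", "Adverb", "Pronoun", "Article",
                "Determiner", "Preposition", "Conjunction", "Interjection"}
HEADING = re.compile(r"^(=+)\s*([^=]+?)\s*\1\s*$")
TOKEN = re.compile(r"\w+|[^\w\s]")  # words (Unicode-aware) or single punctuation marks


def spanish_pos(wikitext: str) -> set[str]:
    """Collect part-of-speech headings that appear inside the ==Spanish== section."""
    in_spanish = False
    found = set()
    for line in wikitext.splitlines():
        m = HEADING.match(line)
        if not m:
            continue
        level, title = len(m.group(1)), m.group(2)
        if level == 2:  # a new language section starts here
            in_spanish = title == "Spanish"
        elif in_spanish and title in POS_HEADINGS:
            found.add(title.lower())
    return found


def build_lexicon(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map page titles to their Spanish POS tags, skipping pages with no Spanish entry."""
    lexicon = {}
    for title, text in pages.items():
        tags = spanish_pos(text)
        if tags:
            lexicon[title.lower()] = tags
    return lexicon


def to_text_lines(lexicon: dict[str, set[str]]) -> list[str]:
    """Serialize as 'word<TAB>pos1,pos2' lines (hypothetical intermediate format)."""
    return [f"{word}\t{','.join(sorted(tags))}" for word, tags in sorted(lexicon.items())]


def tag(sentence: str, lexicon: dict[str, set[str]]) -> list[tuple[str, list[str]]]:
    """Lookup 'tagger': each token gets its sorted candidate tags; [] means unknown."""
    return [(tok, sorted(lexicon.get(tok.lower(), ()))) for tok in TOKEN.findall(sentence)]


def lenient_accuracy(tagged, gold_tags: list[str]) -> float:
    """Share of tokens whose gold tag is among the candidates (an optimistic upper bound)."""
    if len(tagged) != len(gold_tags):
        raise ValueError("token count and gold tag count differ")
    if not gold_tags:
        return 0.0
    hits = sum(gold in cands for (_, cands), gold in zip(tagged, gold_tags))
    return hits / len(gold_tags)


if __name__ == "__main__":
    sample = {
        "casa": "==Spanish==\n===Noun===\n",
        "la": "==Spanish==\n===Article===\n===Pronoun===\n",
        "house": "==English==\n===Noun===\n",
    }
    lex = build_lexicon(sample)
    print("\n".join(to_text_lines(lex)))
    print(tag("¿La casa?", lex))
```

The lenient score counts a token as correct whenever its gold tag is among the candidates. A real tagger, which must choose a single tag for ambiguous words such as `la`, would be measured more strictly against the same test files.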
By the midterm review we would expect you to have:
- <What must be completed to pass the evaluation and continue on to complete the project>
Mentor | David de Hilster
Backup Mentor | Add Backup Mentor Name
Skills needed |
Deliverables | Midterm
| End of project
Other resources |