Find out about the HPCC Systems Summer Internship Program.
Project Description
To eventually create digital human readers in Spanish, a dictionary must first be established. This project will use the Spanish dictionary from Wiktionary. One interesting aspect of this project is Spanish verbs, which have a rich morphology.
If you are interested in this project, please contact David Dehilster.
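To give a sense of the verb morphology mentioned above, here is a minimal Python sketch that generates the six present-indicative forms of a regular Spanish verb. It is only an illustration: the function name `conjugate_present` is hypothetical, and the project itself calls for NLP++ analyzers. Irregular and stem-changing verbs, other tenses, and moods are not handled.

```python
# Present-indicative endings for regular Spanish verbs, keyed by the
# infinitive ending. Order: yo, tú, él/ella, nosotros, vosotros, ellos.
PRESENT_ENDINGS = {
    "ar": ["o", "as", "a", "amos", "áis", "an"],
    "er": ["o", "es", "e", "emos", "éis", "en"],
    "ir": ["o", "es", "e", "imos", "ís", "en"],
}


def conjugate_present(infinitive: str) -> list[str]:
    """Return the six present-indicative forms of a regular verb.

    Hypothetical helper for illustration; irregular verbs are not covered.
    """
    stem, ending = infinitive[:-2], infinitive[-2:]
    if ending not in PRESENT_ENDINGS:
        raise ValueError(f"not an -ar/-er/-ir infinitive: {infinitive}")
    return [stem + suffix for suffix in PRESENT_ENDINGS[ending]]


print(conjugate_present("hablar"))
# ['hablo', 'hablas', 'habla', 'hablamos', 'habláis', 'hablan']
```

A single infinitive yields six forms in one tense alone, which is why a dictionary for part-of-speech tagging needs to account for inflected forms and not only headwords.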
To complete this project, you will:
- Download the Spanish dictionary from Wiktionary
- Write an NLP++ parser to extract the vocabulary from the Wiktionary files into text files
- Write an NLP++ parser to transform the text files into knowledge base files
- Create Spanish test files for part-of-speech tagging
- Write an NLP++ part-of-speech tagger
- Run the tests using the NLP++ Plugin in ECL to show enhancements
- Create an NLP++ repository for the Spanish dictionary and analyzers
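The vocabulary-extraction step above can be sketched in Python to show the kind of output the text files might contain. This is an assumption-laden illustration, not the project's method: the project calls for NLP++ parsers, and the sketch assumes wikitext where a level-2 `==Spanish==` heading is followed by part-of-speech headings such as `===Verb===`. The function names, the `POS_NAMES` set, and the tab-separated output format are all hypothetical, and the sample page is hand-written, not real dump content.

```python
import re

# Assumed layout: "==Language==" at level 2, part-of-speech headings at
# level 3 or deeper (for example "===Verb===").
LANGUAGE_HEADING = re.compile(r"^==[ \t]*([^=\n]+?)[ \t]*==[ \t]*$", re.MULTILINE)
SUB_HEADING = re.compile(r"^={3,}[ \t]*([^=\n]+?)[ \t]*={3,}[ \t]*$", re.MULTILINE)

# Hypothetical, incomplete set of headings treated as parts of speech.
POS_NAMES = {"Noun", "Verb", "Adjective", "Adverb", "Pronoun",
             "Preposition", "Conjunction", "Interjection"}


def spanish_section(wikitext: str) -> str:
    """Return the text under the ==Spanish== heading, or "" if absent."""
    headings = list(LANGUAGE_HEADING.finditer(wikitext))
    for i, match in enumerate(headings):
        if match.group(1) == "Spanish":
            end = headings[i + 1].start() if i + 1 < len(headings) else len(wikitext)
            return wikitext[match.end():end]
    return ""


def extract_entries(title: str, wikitext: str) -> list[str]:
    """Return one "word<TAB>pos" line per part of speech found for the title."""
    tags = []
    for match in SUB_HEADING.finditer(spanish_section(wikitext)):
        name = match.group(1)
        if name in POS_NAMES and name not in tags:
            tags.append(name)
    return [f"{title}\t{tag.lower()}" for tag in tags]


# Hand-written sample page for illustration only.
sample = (
    "==Portuguese==\n===Verb===\n# to sing\n\n"
    "==Spanish==\n===Etymology===\nFrom Latin.\n\n"
    "===Verb===\n# to sing\n\n====Conjugation====\n\n"
    "===Noun===\n# song\n"
)
print(extract_entries("cantar", sample))
# ['cantar\tverb', 'cantar\tnoun']
```

Lines in this form could then feed the second step, which transforms the text files into knowledge base files.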
By the midterm review, we would expect you to have:
- More details coming soon
Mentor |
Skills needed |
Deliverables | Midterm; End of project
Other resources |