The proposal period for 2022 internships is now open
Submit your final proposal to Lorraine Chapman before Friday 18th March 2022
This is a new project; more information is coming soon. If you are interested in this project, contact Lorraine Chapman.
This project is available as a student work experience opportunity with HPCC Systems. Curious about other projects we are offering? Take a look at our Ideas List.
Student work experience opportunities also exist for students who want to suggest their own project idea. Project suggestions must be relevant to HPCC Systems and of benefit to our open source community.
Find out about the HPCC Systems Summer Internship Program.
Project Description
Address cleansing solutions are an important part of the big data industry: they are services that help ensure the address information you collect is accurate, up to date, and standardized. From an API perspective, an address clean call typically sends a request to a server and waits for a response. Although this sounds simple at first sight, the call can be multi-threaded, and it can cache socket connections to the server(s) so that you do not pay for connect time on each clean call.
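To illustrate the idea of caching connections, here is a minimal Python sketch of a pool that hands out already-open connections to concurrent callers. It is not taken from the plugin's code base: the `ConnectionPool` class and the `connect` factory are hypothetical names, and a real implementation would also need to detect stale connections.

```python
import queue
from contextlib import contextmanager


def _close(conn):
    """Close a connection if it exposes a close() method."""
    close = getattr(conn, "close", None)
    if close is not None:
        close()


class ConnectionPool:
    """Keeps open connections so a clean call can skip the connect step."""

    def __init__(self, connect, size=4):
        # `connect` is a hypothetical zero-argument factory that opens
        # one connection to the address cleaning server.
        self._connect = connect
        self._idle = queue.LifoQueue(maxsize=size)  # thread-safe

    @contextmanager
    def connection(self):
        try:
            conn = self._idle.get_nowait()   # reuse an idle connection
        except queue.Empty:
            conn = self._connect()           # none idle: pay connect time once
        try:
            yield conn
        except Exception:
            _close(conn)                     # do not reuse after a failure
            raise
        else:
            try:
                self._idle.put_nowait(conn)  # hand back for the next caller
            except queue.Full:
                _close(conn)                 # pool already full
```

Each worker thread would wrap its clean call in `with pool.connection() as conn:`, so the number of connects is bounded by the number of concurrent callers, not by the number of calls.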
The objective of this project is to explore these possibilities and optimize the address cleaner plugin. For instance, if the same address is submitted for cleaning several times within a short period, or within the same query, there is no need to send the request over the network and wait for the response, as long as a cache of recent clean results is available. In that sense, this is a very typical caching project.
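A cache of recent clean results could look roughly like the Python sketch below, which combines least-recently-used eviction with a time-to-live. This is only an illustration under stated assumptions: `CleanCache` and `clean_remote` are hypothetical names, the key normalization (upper-casing and collapsing whitespace) is a guess at what counts as "the same address", and the class is not thread-safe as written.

```python
import time
from collections import OrderedDict


class CleanCache:
    """LRU cache with expiry in front of a remote address clean call."""

    def __init__(self, clean_remote, max_entries=10_000,
                 ttl_seconds=300.0, clock=time.monotonic):
        self._clean_remote = clean_remote  # hypothetical remote clean call
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()      # key -> (expiry time, result)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(address):
        # Assumption: case and spacing differences do not change the result.
        return " ".join(address.upper().split())

    def clean(self, address):
        key = self._key(address)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self._entries.move_to_end(key)     # mark as recently used
            self.hits += 1
            return entry[1]
        self.misses += 1
        result = self._clean_remote(address)   # the only network round trip
        self._entries[key] = (now + self._ttl, result)
        self._entries.move_to_end(key)
        if len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict least recently used
        return result
```

The `hits` and `misses` counters make it easy to measure how much network traffic the cache actually saves for a given workload, which is a useful input when tuning `max_entries` and `ttl_seconds`.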
To get you started, some base code and a working version of the solution are available for further development and polishing.
Depending on your progress during the project, additional optimizations can also be added as part of a wishlist, such as:
- developing an encoding format that reduces the amount of traffic sent over the network, for example by eliminating extra or repeated characters.
- developing a batch/async method, so that a single request carries multiple addresses (or names) and the response is asynchronous, meaning we are notified when results come back and where to find them.
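The batch/async item above can be sketched in Python with `asyncio`. This is a rough illustration, not the plugin's design: `clean_many` and `send_batch` are hypothetical names, and `send_batch` is assumed to be a coroutine that takes a list of addresses and returns the cleaned results in the same order.

```python
import asyncio


async def clean_many(addresses, send_batch, batch_size=100):
    """Clean addresses in batches and return results in input order."""
    # De-duplicate while keeping first-seen order, so repeated
    # addresses are sent over the network only once.
    unique = list(dict.fromkeys(addresses))
    chunks = [unique[i:i + batch_size]
              for i in range(0, len(unique), batch_size)]

    # One request per chunk, all in flight at the same time.
    # gather() returns the replies in the order the chunks were passed.
    replies = await asyncio.gather(*(send_batch(chunk) for chunk in chunks))

    cleaned = {}
    for chunk, reply in zip(chunks, replies):
        cleaned.update(zip(chunk, reply))
    return [cleaned[address] for address in addresses]
```

A caller would run `asyncio.run(clean_many(addresses, send_batch))`. Choosing `batch_size` is a trade-off between fewer round trips and larger individual requests.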
Completion of this project involves:
- a functional and optimized version of the address cleaner plugin
Mentor
Mark Kelly
Backup Mentor: TBD
Skills needed
Deliverables
Midterm
End of project
Other resources