About This Project

The goal of this database project is to integrate a selection of the language resources available for a variety of natural languages. The site is being developed and hosted by me, Andrew Smith, and was inspired by my interests in Machine Intelligence and Natural Language Processing.

While every effort has been made to credit sources and honour the licences under which works have been published, there is always the possibility of error, so please contact the author about any corrections that might be required. All correspondence should be addressed to words at this domain, and should preferably be written in English.


Implementation

The database is being developed using PostgreSQL and will be made available as an SQL database dump. It is intended to be used like a triple-store or knowledge base of facts about natural languages. Internally, however, it is being designed around relations for the sake of efficiency.
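One way to picture "relations inside, triples outside" is the rough sketch below. It is not the project's actual schema: the table word, the view fact, the column names and the sample rows are all hypothetical, and Python's built-in sqlite3 module stands in for PostgreSQL so that the example is self-contained.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with one relation and a triple-style view.

    The schema and the rows are illustrative assumptions only.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- An ordinary relation: one row per word, one column per attribute.
        CREATE TABLE word (
            word_id        INTEGER PRIMARY KEY,
            form           TEXT NOT NULL,
            language       TEXT NOT NULL,
            part_of_speech TEXT NOT NULL
        );

        -- A view that presents the same data as subject/predicate/object facts.
        CREATE VIEW fact AS
            SELECT form AS subject, 'language' AS predicate, language AS object
              FROM word
            UNION ALL
            SELECT form AS subject, 'part_of_speech' AS predicate,
                   part_of_speech AS object
              FROM word;
        """
    )
    conn.executemany(
        "INSERT INTO word (form, language, part_of_speech) VALUES (?, ?, ?)",
        [("chat", "French", "noun"), ("run", "English", "verb")],
    )
    return conn


def objects_of(conn: sqlite3.Connection, subject: str, predicate: str) -> list:
    """Return every object recorded for the given subject and predicate."""
    rows = conn.execute(
        "SELECT object FROM fact WHERE subject = ? AND predicate = ? "
        "ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in rows]
```

With this arrangement a query such as objects_of(conn, "chat", "language") reads like a triple-store lookup, while the data itself stays in a conventional table that can be indexed and constrained in the usual way.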

The specific licence to be used is still under consideration; however, it will have to be compatible with the licences of the source material, and it will be chosen to maximise the usefulness of the project.

Although the contents of the database are being drawn from many sources, all text is being converted to UTF-8 and diacritical marks are being preserved (or restored) whenever possible. However, the database also contains versions of words folded to remove case distinctions and diacritical marks, for ease of indexing and searching.
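The folding described above might look roughly like the following in Python. This is only a sketch of one common approach (Unicode decomposition followed by removal of combining marks), not necessarily the method used in the database; the function name fold is hypothetical.

```python
import unicodedata


def fold(word: str) -> str:
    """Return a search key with case and diacritical marks removed.

    Assumption: diacritics are those that Unicode can decompose into a
    base letter plus combining marks. Letters such as "ø" or "ł" have no
    such decomposition and pass through unchanged (apart from case).
    """
    # Split each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever remains and remove case distinctions.
    return unicodedata.normalize("NFC", stripped).casefold()
```

The original spelling would be stored alongside the folded key, so that a search for "cafe" can match "Café" while the displayed form keeps its accent.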


Database development, website content and design, copyright 2009 by Andrew Smith




