Transliteration from Graphs

Transliteration built with generative models, with language specialists integrated into the workflow by letting them design complex rules themselves.
Motivation
Transliteration is the process of converting text in a given language from one alphabet to another.
This is usually done with rules. However, because languages are complex, there are many sub-rules that only a native speaker or, more realistically, a specialist would know.
As a result, many transliteration strategies are inaccurate or only approximate, and in most cases no accurate data exists.
Interplay Developer & Specialist
While designing transliteration strategies, we identified another problem: the interaction between the developer and the language specialist. The developer usually does not understand one of the languages or alphabets involved in the transliteration and is easily confused.
For that reason, we decided to empower specialists by giving them the opportunity to write code themselves.
Approach: from graphs to a deep learning model to Ruby with ONNX
- A specialist designs the rules on a graph (for instance with lucidgraph).
- Our technology parses graphs that represent complex transliteration rules and generates code that follows them, with each individual rule emitted as a plain placeholder.
- In the next step, the developer implements the subcomponents and runs tests. These first three steps can be repeated as needed.
- Transliteration data is created.
- A deep learning model is trained (at word or character level).
- The model is exported to ONNX.
- The model runs in production in Ruby.
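The first steps above (from graph to placeholder code) can be sketched roughly as follows. This is a minimal illustration in Python, not the project's actual implementation: the `Node` class, the `generate_stubs` function, and the example rule names are all hypothetical, and the graph is written as a plain dictionary rather than parsed from a diagram export.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One box of the specialist's graph (hypothetical structure)."""
    name: str                                      # used as the stub's function name
    text: str                                      # the specialist's description of the rule
    next: List[str] = field(default_factory=list)  # names of the nodes this one points to


def generate_stubs(nodes: Dict[str, Node], start: str) -> str:
    """Walk the graph depth-first from `start` and emit one placeholder function per node."""
    seen, order, stack = set(), [], [start]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        order.append(name)
        # Reverse so that the first outgoing edge is visited first.
        stack.extend(reversed(nodes[name].next))

    chunks = []
    for name in order:
        node = nodes[name]
        successors = ", ".join(node.next) or "end"
        chunks.append(
            f"def {name}(word):\n"
            f"    # Rule: {node.text}\n"
            f"    # Next: {successors}\n"
            f"    return word  # placeholder, to be implemented by the developer\n"
        )
    return "\n".join(chunks)


# Illustrative rules only; a real graph would come from the specialist.
RULES = {
    "start": Node("start", "Receive the source word", ["handle_digraphs"]),
    "handle_digraphs": Node(
        "handle_digraphs", "Replace two-letter combinations first", ["map_single_letters"]
    ),
    "map_single_letters": Node("map_single_letters", "Map the remaining letters one by one"),
}

stub_source = generate_stubs(RULES, "start")
print(stub_source)
```

Each generated function returns its input unchanged, so the skeleton can be run and tested before any rule is implemented. The developer then fills in one placeholder at a time.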
Our work is described in two documents, a presentation and an article, and there is also a blog post about it.
Team
Mahdi Mohajeri and myself.
Code
The code can be found in the transliteration-learner-from-graphs repository.
Automation with ChatGPT
Since each node has a corresponding text snippet, ChatGPT could be leveraged to generate the code automatically. See the branch autocode-gpt-3.
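As a rough sketch of how a node's text snippet might be turned into a code-generation request, the helper below only builds the prompt string. It is an assumption made for illustration: `build_prompt` and the example node are hypothetical, no API call is made, and the actual approach on the autocode-gpt-3 branch may differ.

```python
def build_prompt(node_name: str, node_text: str) -> str:
    """Turn one graph node into a code-generation prompt (hypothetical format)."""
    return (
        f"Write a Python function named `{node_name}` that takes a string `word` "
        f"and returns the transformed string.\n"
        f"The function must implement this transliteration rule:\n"
        f"{node_text.strip()}\n"
        f"Return only the code."
    )


# Illustrative node; real names and texts come from the specialist's graph.
prompt = build_prompt("handle_digraphs", "Replace two-letter combinations first")
print(prompt)
```

Keeping the prompt construction separate from the model call makes it easy to review what is sent for each node, and the generated code can still be checked against the developer's tests.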