Transliteration from Graphs

Transliteration built with generative models, integrating the workflow of language specialists by letting them design complex rules themselves.

Motivation

Transliteration is the process of converting text in a given language from one alphabet to another.

This is usually achieved with rules. However, because languages are complex, there are many sub-rules that only a native speaker or, more realistically, a specialist would know.

As a result, many transliteration strategies are inaccurate or approximate, and in most cases no accurate data exists.

Interplay Developer & Specialist

In designing transliteration strategies, we identified another problem: the interaction between the developer and the language specialist. The developer usually does not understand one of the languages or alphabets involved in the transliteration and is easily confused.

For that reason, we decided to empower the specialists by giving them the opportunity to write the code themselves.

Approach: from graphs to a deep learning model to Ruby with ONNX

  1. A specialist designs the rules as a graph (for instance with lucidgraph).
  2. Our technology parses graphs representing complex transliteration rules and generates code in which each individual rule is a vanilla placeholder.
  3. Next, the developer implements the subcomponents and runs tests. Steps 1 to 3 can be repeated.
  4. Transliteration data is created.
  5. A deep learning model is trained (at word or character level).
  6. The model is exported to ONNX.
  7. The model runs in production in Ruby.
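
To make step 2 more concrete, here is a minimal sketch of how a rule graph could be turned into placeholder functions and then executed once the developer has filled them in. It is not the project's actual parser: the `Node` class, the `rule_` naming scheme, the `(word, outcome)` return convention and the toy rules on Latin letters are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One box in the rule graph (hypothetical structure, not the project's format)."""
    node_id: str
    text: str  # the label the specialist wrote on the node
    # Maps an outcome label (e.g. "yes"/"no") to the id of the next node.
    edges: dict = field(default_factory=dict)


def stub_name(node):
    """Turn a node label into a valid Python function name."""
    slug = re.sub(r"[^0-9a-zA-Z]+", "_", node.text).strip("_").lower()
    return f"rule_{node.node_id}_{slug}"


def generate_stubs(nodes):
    """Emit one placeholder function per node for the developer to implement."""
    parts = []
    for node in nodes:
        parts.append(
            f"def {stub_name(node)}(word):\n"
            f"    # Rule from graph: {node.text}\n"
            f"    raise NotImplementedError\n"
        )
    return "\n".join(parts)


def run_graph(nodes, implementations, start, word):
    """Walk the graph; each implementation returns (word, outcome_label)."""
    by_id = {n.node_id: n for n in nodes}
    current = start
    while current is not None:
        node = by_id[current]
        word, outcome = implementations[node.node_id](word)
        # A missing outcome (or None) ends the walk.
        current = node.edges.get(outcome)
    return word


# Toy graph, purely illustrative and not a real transliteration rule.
nodes = [
    Node("n0", "Ends with vowel?", {"yes": "n1", "no": "n2"}),
    Node("n1", "Append h"),
    Node("n2", "Keep unchanged"),
]

print(generate_stubs(nodes))

# Once implemented, the placeholders are plugged back into the graph.
implementations = {
    "n0": lambda w: (w, "yes" if w[-1] in "aeiou" else "no"),
    "n1": lambda w: (w + "h", None),
    "n2": lambda w: (w, None),
}
print(run_graph(nodes, implementations, "n0", "sara"))  # prints "sarah"
```

Keeping the graph structure separate from the rule implementations means the specialist can redraw the graph without touching the developer's code, which fits the repeated cycle over steps 1 to 3.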

Our work is described in two documents, a presentation and an article, and there is also a blog post about it.

Team

Mahdi Mohajeri and myself.

Code

The code can be found in transliteration-learner-from-graphs.

Automation with ChatGPT

Since each node has a corresponding text snippet, ChatGPT could be leveraged to generate the code automatically; see the branch autocode-gpt-3.
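
As a rough illustration of this idea, the sketch below builds a code-generation prompt from a node's text snippet. The function `build_prompt`, its parameters and the prompt wording are hypothetical and are not taken from the autocode-gpt-3 branch; the call to the model itself is deliberately left out.

```python
def build_prompt(function_name: str, node_text: str) -> str:
    """Build a code-generation prompt from one graph node's text snippet.

    Hypothetical helper: the wording is an assumption, not the project's prompt.
    """
    return (
        "Write a Python function implementing one transliteration rule.\n"
        f"Function name: {function_name}\n"
        "Signature: takes a word (str) and returns the transformed word (str).\n"
        f"Rule description: {node_text}\n"
        "Return only the code."
    )


# Example with a toy rule; the name and description are illustrative only.
prompt = build_prompt("rule_n1_append_h", "Append h")
print(prompt)
```

Any generated code would still need to pass the developer's tests before being used to create transliteration data.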