Diacritization with Interscript

Diacritization with Deep Learning in Arabic and Hebrew.
Interscript
Interscript is a technology developed by Ribose, in Hong Kong, that supports the transliteration of text snippets in multiple languages.
Transliteration can be achieved with a set of rules, but more work is often necessary.
Diacritization for Transliteration
Certain languages require particular treatment before their text can be successfully transliterated. In many if not most languages (including English), the writing system does not make every sound explicit.
Diacritics are a set of marks used in the so-called abjad writing systems, such as Arabic and Hebrew. Their role is to make the pronunciation of vowels clear and explicit. However, they are usually omitted in everyday writing.
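To make this concrete, here is a minimal Python sketch showing that Arabic and Hebrew diacritics are separate combining code points layered on top of the consonants, and what the text looks like once they are dropped, as in everyday writing. The helper name strip_diacritics and the sample words are assumptions chosen for illustration; this is not code from Interscript or rababa.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop nonspacing combining marks (Unicode category Mn).

    Arabic harakat and Hebrew niqqud both fall in this category, so the
    result is the bare consonant skeleton found in everyday writing.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Arabic "kataba" (he wrote): kaf+fatha, teh+fatha, beh+fatha
arabic = "\u0643\u064E\u062A\u064E\u0628\u064E"
# Hebrew "shalom": shin+qamats+shin dot, lamed, vav+holam, final mem
hebrew = "\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD"

for word in (arabic, hebrew):
    bare = strip_diacritics(word)
    # The diacritized form has more code points than the bare form.
    print(word, len(word), "->", bare, len(bare))
```

Running the function the other way around, from the bare form back to the fully marked form, is the diacritization task itself, and it cannot be done by a simple character filter because the missing vowels depend on context.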
Naturally, for rule-based systems, knowing the diacritics allows for more accurate transliteration.
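The effect of diacritics on a rule-based system can be sketched in a few lines of Python. The RULES table below is a hypothetical, deliberately tiny mapping written for this example; it is not one of Interscript's actual transliteration maps, and real systems handle many more letters and context-dependent cases.

```python
# Hypothetical toy rule table: one Latin string per Arabic code point.
RULES = {
    "\u0643": "k",  # kaf
    "\u062A": "t",  # teh
    "\u0628": "b",  # beh
    "\u064E": "a",  # fatha (short a)
    "\u064F": "u",  # damma (short u)
    "\u0650": "i",  # kasra (short i)
    "\u0652": "",   # sukun (no vowel)
}


def transliterate(text: str) -> str:
    """Apply the rules character by character; unknown characters pass through."""
    return "".join(RULES.get(ch, ch) for ch in text)


bare = "\u0643\u062A\u0628"                      # undiacritized k-t-b
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # "he wrote"
kutiba = "\u0643\u064F\u062A\u0650\u0628\u064E"  # "it was written"

print(transliterate(bare))    # ktb
print(transliterate(kataba))  # kataba
print(transliterate(kutiba))  # kutiba
```

The undiacritized input yields only the consonant skeleton, while the two diacritized words, which share that skeleton, come out as distinct and pronounceable transliterations.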
Arabic
We wrote a blog post in Arabic to explain our work.
For Arabic, we came close to replicating the best benchmark results available at the time of our work.
Hebrew
We wrote a blog post in Hebrew to explain our work.
For Hebrew, we obtained better scores than the works existing at the time.
Training and Productionizing
The blog posts mentioned in the two sections above explain how we use PyTorch for training and ONNX to run the models in production from Ruby.
Team
Ahmad Mohsen and myself.
Code
Code is available on GitHub: rababa.