Tanglish
NLP experiments on Tanglish
Links
What the hell is Tanglish?
- Tanglish is Tamil written in the Latin script, i.e. with English letters.
Complexities
- Tanglish has its own sort of grammar. It doesn’t strictly conform to the classical grammar of Tamil, nor does it conform to that of English.
- Moreover, even spoken Tamil is vastly different from written sources. This is a problem when we want to train language models: English, in addition to a treasure trove of literature, has a large corpus of informal, conversational text (Reddit, Twitter, etc.), whereas comparable text for spoken Tamil is scarce.
- That abundance of conversational text is what makes rich NLP and auto-completion possible for English.
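To make the auto-completion point concrete, here is a minimal sketch of bigram-based next-word suggestion. It assumes a tiny hypothetical Tanglish corpus and a naive tokenizer (lowercase, keep only Latin letters); the sentences and the helper names `tokenize`, `build_bigrams`, and `suggest` are illustrative and not taken from the original experiments. The counts only become useful with a large corpus of informal text, which is exactly what Tanglish lacks.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Naive preprocessing (assumption): lowercase and keep runs of Latin letters.
    # It does NOT merge spelling variants such as "enna" vs "ena".
    return re.findall(r"[a-z]+", text.lower())


def build_bigrams(sentences: list[str]) -> dict[str, Counter]:
    # Map each word to a Counter of the words observed immediately after it.
    following: dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for prev, nxt in zip(tokens, tokens[1:]):
            following[prev][nxt] += 1
    return following


def suggest(following: dict[str, Counter], word: str, k: int = 3) -> list[str]:
    # Return up to k of the most frequent continuations of `word`.
    return [w for w, _ in following.get(word.lower(), Counter()).most_common(k)]


# Hypothetical toy corpus of Tanglish sentences.
corpus = [
    "Enna panra?",
    "Enna aachu?",
    "Enna panra da",
    "Naan veetuku poren",
]

bigrams = build_bigrams(corpus)
print(suggest(bigrams, "enna"))  # ['panra', 'aachu']
```

A bigram model is the simplest n-gram choice here; higher-order n-grams give better context but need far more data, which makes the data-scarcity problem even sharper.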
Tasks accomplished
- Preprocessing
- N-grams
- Transliteration
  - We attempted to convert Tanglish to Tamil script so that existing Tamil NLP toolkits could be applied
- POS tagging
- NER
- Sentiment analysis using LSTM and BERT
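As an illustration of what the Tanglish-to-Tamil step involves, here is a minimal rule-based transliteration sketch using greedy longest-match over a small Latin-to-Tamil table. The mapping, the word-initial "n" heuristic, and the function names are hypothetical simplifications, not the method used in these experiments.

```python
from typing import Optional

# Hypothetical, deliberately simplified Latin-to-Tamil consonant table.
CONSONANTS = {
    "zh": "ழ", "th": "த", "ch": "ச",
    "k": "க", "g": "க", "s": "ச", "j": "ஜ", "t": "ட", "d": "ட",
    "p": "ப", "b": "ப", "m": "ம", "y": "ய", "r": "ர", "l": "ல",
    "v": "வ", "w": "வ", "h": "ஹ", "n": "ன",
}
# Each vowel maps to (independent letter, dependent sign attached to a consonant).
VOWELS = {
    "aa": ("ஆ", "ா"), "ii": ("ஈ", "ீ"), "uu": ("ஊ", "ூ"), "ee": ("ஏ", "ே"),
    "oo": ("ஓ", "ோ"), "ai": ("ஐ", "ை"), "au": ("ஔ", "ௌ"),
    "a": ("அ", ""), "i": ("இ", "ி"), "u": ("உ", "ு"), "e": ("எ", "ெ"), "o": ("ஒ", "ொ"),
}
PULLI = "\u0bcd"  # virama: marks a consonant with no following vowel


def match(table: dict, word: str, i: int) -> Optional[str]:
    # Greedy longest match: try 2-letter keys before 1-letter keys.
    for size in (2, 1):
        key = word[i:i + size]
        if len(key) == size and key in table:
            return key
    return None


def transliterate_word(word: str) -> str:
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        cons = match(CONSONANTS, word, i)
        if cons:
            # Heuristic (assumption): word-initial "n" is ந, elsewhere ன.
            letter = "ந" if cons == "n" and i == 0 else CONSONANTS[cons]
            i += len(cons)
            vowel = match(VOWELS, word, i)
            if vowel:
                out.append(letter + VOWELS[vowel][1])  # consonant + vowel sign
                i += len(vowel)
            else:
                out.append(letter + PULLI)  # bare consonant
            continue
        vowel = match(VOWELS, word, i)
        if vowel:
            out.append(VOWELS[vowel][0])  # standalone vowel
            i += len(vowel)
        else:
            out.append(word[i])  # pass through anything unmapped
            i += 1
    return "".join(out)


def transliterate(text: str) -> str:
    return " ".join(transliterate_word(w) for w in text.split())


print(transliterate("Enna thamizh"))  # என்ன தமிழ்
```

The sketch also shows why the conversion is lossy: Tanglish spelling rarely marks vowel length or distinguishes similar consonants, so "amma" comes out as அம்ம rather than அம்மா, and "t"/"d" or "n" each collapse several Tamil letters into one.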
UI
