Corpus
[!question]- What is a corpus? A large, structured collection of texts produced in a natural communicative setting.
[!question]- What is representativeness? It is a measure of how well what is learned from a corpus generalizes to the language as a whole.
[!question]- What are the two factors in determining the representativeness of a corpus?
- Balance -> The range of genres to choose from
- Sampling -> How the chunks for each genre are chosen
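
A minimal sketch of how these two factors might be combined when building a sample: every genre contributes the same number of chunks (balance), and each chunk is a randomly positioned window of words (how the chunk is chosen). The function name `sample_corpus`, its parameters, and the toy documents are assumptions made for illustration, not part of any standard tool.

```python
import random

def sample_corpus(docs_by_genre, chunk_size, chunks_per_genre, seed=0):
    """Draw the same number of fixed-size word chunks from every genre."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = {}
    for genre, docs in docs_by_genre.items():
        chunks = []
        for _ in range(chunks_per_genre):
            words = rng.choice(docs).split()
            # Random start position, so chunks are not always document openings.
            start = rng.randrange(max(1, len(words) - chunk_size + 1))
            chunks.append(words[start:start + chunk_size])
        sample[genre] = chunks
    return sample

# Toy data (hypothetical): one short document per genre.
docs_by_genre = {
    "news": ["the minister announced a new budget for the coming year"],
    "fiction": ["she opened the door and stepped into the quiet garden"],
}
sample = sample_corpus(docs_by_genre, chunk_size=4, chunks_per_genre=2)
```

Giving every genre an equal share is only one possible design; a real corpus might instead weight genres by how common they are in the language.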
[!question]- What is a treebank? It is a corpus that has been linguistically parsed and annotated with semantic or syntactic structure.
[!question]- What are the two types of treebank corpus?
- Semantic
- Syntactic
[!question]- What do semantic treebanks use? A formal representation of the semantic structure of each sentence
[!question]- Give examples of semantic treebanks
- RoboCup Corpus
- Robot Commands Treebank
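[!question]- What do syntactic treebanks use? They annotate the syntactic structure of each sentence; the parsed data can then be converted into a predicate-logic-based meaning representation
[!question]- Give examples of syntactic treebanks
- Penn Arabic Treebank
- Columbia Arabic Treebank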
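
Syntactic annotation is often stored in a Penn-style bracketed notation, where each pair of parentheses holds a label followed by its children. The sketch below shows one way such a string could be read into a nested structure; the helper names `parse_bracketed` and `leaves` and the example sentence are assumptions for illustration, and real treebank files carry extra details (traces, function tags) that this ignores.

```python
def parse_bracketed(text):
    """Parse a Penn-style bracketed tree into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        assert tokens[pos] == "(", "a node must start with an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # leaf word
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read_node()

def leaves(node):
    """Return the words of a parsed tree in sentence order."""
    _, children = node
    words = []
    for child in children:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words

tree = parse_bracketed("(S (NP (DT the) (NN dog)) (VP (VBD barked)))")
sentence = leaves(tree)  # the words without the annotation
```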
[!question]- What are the applications of treebanks?
- In computational linguistics, to build state-of-the-art NLP systems for POS tagging, parsing, semantic analysis, etc.
- In corpus linguistics, to study syntactic phenomena
- In theoretical linguistics and psycholinguistics, as a source of interaction evidence
[!question]- What is a PropBank? It is a corpus annotated with verbal propositions and their arguments
[!question]- What is the use of PropBank? It helps in semantic role labelling
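
A simplified sketch of what one annotated proposition could look like, using PropBank-style numbered argument labels (`ARG0`, `ARG1`, `ARG2`) and a roleset identifier such as `give.01`. The `Proposition` class and the `role_of` helper are hypothetical names for illustration; real PropBank annotation points at nodes of a parse tree rather than plain strings.

```python
from dataclasses import dataclass, field

@dataclass
class Proposition:
    predicate: str                              # the verb as it appears in the sentence
    roleset: str                                # sense of the verb, e.g. "give.01"
    args: dict = field(default_factory=dict)    # role label -> text span

# "Mary gave John a book": giver, thing given, and recipient.
prop = Proposition(
    predicate="gave",
    roleset="give.01",
    args={"ARG0": "Mary", "ARG1": "a book", "ARG2": "John"},
)

def role_of(prop, phrase):
    """Look up which role label a phrase fills, or None if it fills none."""
    for label, span in prop.args.items():
        if span == phrase:
            return label
    return None

giver_role = role_of(prop, "Mary")
```

A semantic role labeller trained on such data learns to predict the `args` mapping for sentences it has not seen.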
[!question]- What is the distinguishing feature of WordNet? Nouns, verbs, adverbs and adjectives are organized into groups of synonyms known as synsets, which are connected to each other through conceptual-semantic and lexical relations
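
A toy sketch of the synset idea: each synset groups words that share a meaning, and a relation (here a single "is a kind of" link) connects one synset to another. The identifiers and entries below are made up for illustration and are not copied from the WordNet database; the function names `synonyms` and `hypernym_chain` are likewise assumptions.

```python
# Toy data (hypothetical): synset id -> member words and a link to a more general synset.
synsets = {
    "dog#1": {"lemmas": ["dog", "hound"], "hypernym": "canine#1"},
    "canine#1": {"lemmas": ["canine", "canid"], "hypernym": "animal#1"},
    "animal#1": {"lemmas": ["animal", "beast"], "hypernym": None},
}

def synonyms(word):
    """Return the other words that share a synset with the given word."""
    found = set()
    for entry in synsets.values():
        if word in entry["lemmas"]:
            found.update(entry["lemmas"])
    found.discard(word)
    return sorted(found)

def hypernym_chain(synset_id):
    """Follow the 'is a kind of' links from a synset up to the most general one."""
    chain = []
    current = synsets[synset_id]["hypernym"]
    while current is not None:
        chain.append(current)
        current = synsets[current]["hypernym"]
    return chain

dog_synonyms = synonyms("dog")
dog_ancestors = hypernym_chain("dog#1")
```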