
Corpus

[!question]- What is a corpus? A large, structured set of texts that have been produced in a natural communicative setting

[!question]- What is representativeness? It is a measure of how well what is learned from a corpus generalizes to the language as a whole

[!question]- What are the two factors in determining the representativeness of a corpus?

  • Balance -> The range of genres included in the corpus
  • Sampling -> How the chunk of text for each genre is selected
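
A minimal sketch of the two factors, assuming a toy corpus held as token lists per genre. The genre names, the texts, and the helper `sample_corpus` are all hypothetical: balance corresponds to which genres appear as keys, and the second factor to how a chunk is drawn from each one (here, a random contiguous window).

```python
import random

def sample_corpus(texts_by_genre, chunk_size, seed=0):
    """Draw one contiguous chunk of `chunk_size` tokens from each genre."""
    rng = random.Random(seed)
    sample = {}
    for genre, tokens in texts_by_genre.items():
        # Pick a random start so the chunk is not always the opening of the text.
        max_start = max(0, len(tokens) - chunk_size)
        start = rng.randint(0, max_start)
        sample[genre] = tokens[start:start + chunk_size]
    return sample

# Hypothetical toy data: three genres, each a list of tokens.
texts_by_genre = {
    "news": "the council approved the new budget on monday evening".split(),
    "fiction": "she closed the door and listened to the rain outside".split(),
    "speech": "well i mean you know it was kind of late".split(),
}

sample = sample_corpus(texts_by_genre, chunk_size=4)
for genre, chunk in sample.items():
    print(genre, chunk)
```

Drawing the same number of tokens from every genre is only one possible choice; a real corpus design might weight genres differently.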

[!question]- What is a treebank? It is a corpus that has been linguistically parsed and annotated with semantic or syntactic structure
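
To make "parsed and annotated" concrete, here is a small sketch that reads one bracketed parse, in the style many syntactic treebanks use, into nested tuples. The example sentence, its tags, and the helpers `parse_bracketed` and `leaves` are illustrative assumptions, not taken from any particular treebank.

```python
def parse_bracketed(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # leaf word
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return read_node()

def leaves(node):
    """Return the words of a parsed tree in sentence order."""
    _, children = node
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words

tree = parse_bracketed("(S (NP (DT the) (NN dog)) (VP (VBD barked)))")
print(tree[0])       # root label of the sentence
print(leaves(tree))  # the words without their annotation
```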

[!question]- What are the two types of treebank?

  • Semantic
  • Syntactic

[!question]- What do semantic treebanks use? A formal representation of each sentence's semantic structure

[!question]- Give examples of semantic treebanks

  • RoboCup Corpus
  • Robot Commands Treebank

[!question]- What do syntactic treebanks use? They use a predicate-logic-based meaning representation

[!question]- Give examples of syntactic treebanks

  • Penn Arabic Treebank
  • Columbia Arabic Treebank

[!question]- What are the applications of treebanks?

  • In computational linguistics, to build state-of-the-art NLP systems for POS tagging, parsing, semantic analysis, etc.
  • In corpus linguistics, to study syntactic phenomena
  • In theoretical linguistics and psycholinguistics, as a source of interaction evidence

[!question]- What is PropBank? It is a corpus annotated with verbal propositions and their arguments

[!question]- What is the use of PropBank? It helps in semantic role labelling
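
A rough sketch of what one PropBank-style annotation might look like as data, and how it supports semantic role labelling. The `Proposition` class and `role_of` helper are hypothetical, and the sentence is made up for illustration; the roleset `give.01` and its numbered arguments follow PropBank's labelling convention (ARG0 giver, ARG1 thing given, ARG2 recipient).

```python
from dataclasses import dataclass, field

@dataclass
class Proposition:
    roleset: str                              # predicate sense, e.g. "give.01"
    args: dict = field(default_factory=dict)  # role label -> text span

# Illustrative annotation for the sentence "Mary gave John a book".
prop = Proposition(
    roleset="give.01",
    args={"ARG0": "Mary", "ARG1": "a book", "ARG2": "John"},
)

def role_of(proposition, span):
    """Return the role label attached to `span`, or None if it has no role."""
    for label, text in proposition.args.items():
        if text == span:
            return label
    return None

print(prop.roleset, role_of(prop, "John"))
```

A semantic role labeller trained on such annotations learns to predict the `args` mapping for unseen sentences.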

[!question]- What is the distinguishing feature of WordNet? Nouns, verbs, adverbs and adjectives are organized into groups of synonyms known as synsets, which are connected to one another by conceptual-semantic and lexical relations
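
The synset idea can be sketched with a tiny hand-made lexicon. Everything below is hypothetical (the ids, the lemmas, and the helpers `synonyms` and `hypernym_chain`) and is not real WordNet data: words in the same synset are synonyms, and the `hypernym` link stands in for one conceptual-semantic relation between synsets.

```python
# Hypothetical miniature lexicon; ids and lemmas are illustrative only.
synsets = {
    "dog.n": {"lemmas": ["dog", "domestic dog"], "hypernym": "canine.n"},
    "canine.n": {"lemmas": ["canine", "canid"], "hypernym": "animal.n"},
    "animal.n": {"lemmas": ["animal", "creature", "beast"], "hypernym": None},
}

def synonyms(word):
    """Return the other lemmas that share a synset with `word`."""
    found = []
    for entry in synsets.values():
        if word in entry["lemmas"]:
            found.extend(l for l in entry["lemmas"] if l != word)
    return found

def hypernym_chain(synset_id):
    """Follow hypernym links from a synset up to the most general one."""
    chain = []
    while synset_id is not None:
        chain.append(synset_id)
        synset_id = synsets[synset_id]["hypernym"]
    return chain

print(synonyms("creature"))
print(hypernym_chain("dog.n"))
```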