Collocation extraction
Collocation extraction is the task of using a computer to extract collocations automatically from a corpus.
The traditional approach to collocation extraction is to compute a score for every word pair, using a formula based on statistical quantities of the words involved. Proposed formulas include mutual information, the t-test, the z-test, the chi-squared test and the likelihood ratio.[1]
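As a rough illustration of such scoring, the sketch below computes two of the measures named above, pointwise mutual information and a t-score, for adjacent word pairs in a tokenized corpus. The function name `score_bigrams`, the `min_count` threshold and the sample sentence are assumptions made for this example. The t-score uses the common approximation in which the sample variance is taken to be roughly the observed relative frequency, so this is a simplified sketch rather than a reference implementation.

```python
import math
from collections import Counter


def score_bigrams(tokens, min_count=2):
    """Return {(w1, w2): (pmi, t_score)} for adjacent word pairs.

    Hypothetical helper for illustration; pairs seen fewer than
    `min_count` times are skipped because scores on rare pairs are unreliable.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    scores = {}
    for (w1, w2), observed in bigrams.items():
        if observed < min_count:
            continue
        # Count expected if w1 and w2 occurred independently of each other.
        expected = unigrams[w1] * unigrams[w2] / n
        # Pointwise mutual information: log ratio of observed to expected.
        pmi = math.log2(observed / expected)
        # t-score, approximating the variance by the observed frequency.
        t_score = (observed - expected) / math.sqrt(observed)
        scores[(w1, w2)] = (pmi, t_score)
    return scores


# Toy usage: rank the surviving pairs by PMI, highest first.
sample = "crystal clear water is crystal clear and the water is clear".split()
for pair, (pmi, t) in sorted(score_bigrams(sample).items(),
                             key=lambda item: item[1][0], reverse=True):
    print(pair, round(pmi, 3), round(t, 3))
```

On a real corpus the threshold and the choice of measure matter considerably: mutual information tends to favour rare pairs, which is one reason a frequency cut-off like `min_count` is often applied first.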
In corpus linguistics, a collocation is defined as a sequence of words or terms that co-occur more often than would be expected by chance. 'Crystal clear', 'middle management', 'nuclear family' and 'cosmetic surgery' are examples of collocated pairs of words. Some words are often found together because they make up a compound noun, for example 'riding boots', 'motor cyclist' or 'collocation extraction' itself.
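One common way to make "more often than would be expected by chance" concrete is to compare the observed count of a pair with the count expected if the two words occurred independently. The notation here is an assumption for illustration: c(w) is the number of occurrences of word w, N is the corpus size in tokens, and O(w_1 w_2) is the observed count of the pair.

```latex
E(w_1 w_2) = \frac{c(w_1)\, c(w_2)}{N},
\qquad
\text{candidate collocation if } \frac{O(w_1 w_2)}{E(w_1 w_2)} \gg 1
```

The independence assumption is only a baseline; the statistical tests used for collocation extraction differ mainly in how they judge whether the gap between observed and expected counts is significant.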
See also
- Collocational restriction
- Collostructional analysis
- Compound noun, adjective and verb
- Phrasal verb
- Siamese twins (English language)
- Terminology extraction
- n-gram analysis
External links
- What is collocation
References
- [1] Manning, C. D.; Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press. ISBN 978-0-262-13360-9.