Emergence of Linguistic Features: Independent Component Analysis of Contexts

Abstract

We show that independent component analysis (ICA) (Hyvärinen et al. 2001) applied to word context data yields distinct features that reflect syntactic and semantic categories. These features are explicit and easy for humans to interpret. The result is obtained without human supervision and without tagged corpora carrying predetermined morphological, syntactic, or semantic information. The results show both the emergence of clear, distinctive categories and a distributed representation, since a word may belong to several categories simultaneously and to varying degrees. We hope that our model provides additional insight into potential cognitive mechanisms in natural language learning and understanding. Our approach aims to show that much linguistic knowledge may be emergent in nature and based on specific learning mechanisms.
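As a rough illustration of the kind of pipeline the abstract describes, the sketch below builds a word-by-context count matrix from a toy corpus and applies a small FastICA-style fixed-point iteration to it. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy corpus, the one-word context window, the log scaling, the PCA whitening to three dimensions, the tanh nonlinearity, and the helper names `context_matrix`, `whiten`, `sym_decorrelate`, and `fast_ica`. A corpus this small cannot yield meaningful linguistic categories; the point is only to show the shape of the computation.

```python
import numpy as np

def context_matrix(tokens, window=1):
    """Count, for every word (column), its neighbouring words (rows)."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for pos, word in enumerate(tokens):
        lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
        for ctx in range(lo, hi):
            if ctx != pos:
                counts[index[tokens[ctx]], index[word]] += 1
    # Log scaling is an assumed preprocessing step to damp frequent contexts.
    return vocab, np.log1p(counts)

def whiten(X, k):
    """Centre the columns (words) and project onto k whitened PCA axes."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)
    top = np.argsort(eigval)[::-1][:k]
    return (eigvec[:, top] / np.sqrt(eigval[top])).T @ Xc

def sym_decorrelate(W):
    """Symmetric decorrelation: W <- (W W^T)^(-1/2) W."""
    s, u = np.linalg.eigh(W @ W.T)
    return u @ np.diag(1.0 / np.sqrt(s)) @ u.T @ W

def fast_ica(Z, n_iter=200, seed=0):
    """Fixed-point ICA iteration with a tanh nonlinearity on whitened data Z."""
    k, n = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        Y = W @ Z
        g = np.tanh(Y)
        g_prime = 1.0 - g ** 2
        W = sym_decorrelate(g @ Z.T / n - np.diag(g_prime.mean(axis=1)) @ W)
    return W @ Z  # one row per component, one column per word

# Hypothetical toy corpus, used only to exercise the code.
tokens = ("the cat sleeps . the dog sleeps . a cat eats . "
          "a dog eats . the dog runs . a cat runs .").split()
vocab, X = context_matrix(tokens)
S = fast_ica(whiten(X, k=3))

# Each word gets a graded activation on every component.
for word, activations in zip(vocab, S.T):
    print(f"{word:>7}", np.round(activations, 2))
```

Each column of `S` gives one word's activations across the components, which corresponds to the graded, multi-category membership mentioned in the abstract. On a realistically sized corpus one would more likely rely on an established ICA implementation than on this hand-written loop.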
