| The
Hong Kong Cantonese Child Language Corpus (CANCORP) is a longitudinal record of the
early language development of eight Cantonese-speaking
children, each of whom was observed for one year, starting
when the child was between one and a half and two years
old. Four of the children are male and the other four
female. The database is deposited both at the Arts Faculty Server of the Chinese University of Hong
Kong and at the CHILDES (Child Language
Data Exchange System) archive at Carnegie
Mellon University. The corpus grew out of the project "The development of grammatical competence in Cantonese-speaking children", funded by the Hong Kong Research Grants Council from 1991 to 1993 as a joint effort of three local universities: The Chinese University of Hong Kong, the Hong Kong Polytechnic University, and the University of Hong Kong. The research team consisted of Thomas Hun-tak Lee (principal investigator, CUHK), Colleen Wong (co-investigator, HKPU), Patricia Yuk-hing Man (HKPU), Alice Shuk-yee Cheung (HKPU), Kitty Szeto (HKU), Cathy Sin-Ping Wong (CUHK, Hawaii) and Samuel Cheung-Shing Leung (co-investigator, HKU). The database contains 171 files coded in the internationally accepted CHAT format (Codes for the Human Analysis of Transcripts) and tagged with 33 part-of-speech labels. The files contain episodes of conversational exchanges between children and adults, with each utterance represented in Chinese characters and in romanization, together with the corresponding part-of-speech tags. The data should be of use to anyone interested in early language development, whether linguists, psychologists, philosophers or educationalists. Queries about the corpus should be directed to Thomas Lee (htlee@netvigator.com). Suggestions about the homepage can be sent to Ann Law (aylaw99@yahoo.com). |