CHINESE CHARACTERS

The corpus has two versions.

The Chinese version
The CHAT version


The Chinese version

The Chinese version requires MS Chinese Windows 95/98 with HKSCS (Hong Kong Supplementary Character Set) support. It contains Cantonese characters that are commonly used in Hong Kong but not in mainland China or Taiwan, and these characters are not found in the standard GB or Big-5 character sets. Anyone using the Chinese version of the corpus will need to download and install the HKSCS software available at http://www.info.gov.hk/digital21/eng/hkscs/index.html/.
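Readers working outside Chinese Windows may find the following minimal Python sketch useful. It is not part of the corpus distribution and is a swapped-in approach: it decodes text with the `big5hkscs` codec from Python's standard library. It assumes, without confirmation from this page, that the Chinese-version files are stored as Big5-HKSCS bytes. The helper names `decode_hkscs` and `read_corpus_file` are hypothetical. Decoding only recovers the characters. Displaying them still requires a font that covers HKSCS.

```python
from pathlib import Path

# Assumption: the Chinese-version files are Big5-HKSCS encoded.
# "big5hkscs" is a codec shipped with Python's standard library.
CORPUS_ENCODING = "big5hkscs"


def decode_hkscs(raw: bytes) -> str:
    """Decode Big5-HKSCS bytes (hypothetical helper, not part of the corpus)."""
    # errors="replace" turns undecodable byte sequences into U+FFFD
    # instead of raising UnicodeDecodeError.
    return raw.decode(CORPUS_ENCODING, errors="replace")


def read_corpus_file(path: str) -> str:
    """Read one file from the unzipped archive and decode it."""
    return decode_hkscs(Path(path).read_bytes())
```

This page does not list the file names inside tagdata.zip, so pass a real path from the unzipped archive to `read_corpus_file`. If the output contains many U+FFFD replacement characters, the encoding assumption is probably wrong.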

Download the Chinese version of the corpus from http://www.arts.cuhk.edu.hk/~cancorp/archive/tagdata.zip.


The CHAT version

The CHAT version now in the CHILDES archive places the romanizations on the main tier and the Chinese characters on a '%can' dependent tier. Brian MacWhinney first merged the two, and the research team then checked the result. Ann Law and Brian MacWhinney provided programming help in converting the user-defined internal codes for Cantonese characters, which earlier versions of the corpus used, to the now standardized codes of the Hong Kong Government's Supplementary Character Set (HKSCS). This conversion has made the Cantonese characters relatively easy to display in both the Chinese and CHAT versions. The help and advice of Brian MacWhinney in the final stages of corpus preparation, as well as his continual support for updating the corpus, are gratefully acknowledged. This version has passed the CHECK test for format consistency.
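The pairing of a romanized main tier with a '%can' character tier can be illustrated with a short Python sketch. It assumes standard CHAT layout: main tiers begin with `*` plus a speaker code and a colon, dependent tiers begin with `%`, headers begin with `@`, and a line starting with a tab continues the previous tier. It also assumes the lines have already been decoded to text. The class `Utterance` and the function `parse_chat` are hypothetical names, and the sample utterances in the usage below are invented for illustration. They are not taken from the corpus.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str          # e.g. "CHI" from a "*CHI:" main-tier line
    romanization: str     # main-tier text (romanized Cantonese)
    characters: str = ""  # text of the %can dependent tier, if present


def parse_chat(lines):
    """Pair each main tier with its %can tier (hypothetical sketch)."""
    utterances = []
    last = None  # (utterance, field name) that a tab-continuation line extends
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("\t") and last is not None:
            # Continuation line: append to the tier it belongs to.
            utt, attr = last
            setattr(utt, attr, getattr(utt, attr) + " " + line.strip())
        elif line.startswith("*") and ":" in line:
            # Main tier, e.g. "*CHI:\t...": split off the speaker code.
            code, text = line[1:].split(":", 1)
            utt = Utterance(code, text.strip())
            utterances.append(utt)
            last = (utt, "romanization")
        elif line.startswith("%can:") and utterances:
            # %can tier attaches to the most recent main tier.
            utterances[-1].characters = line[len("%can:"):].strip()
            last = (utterances[-1], "characters")
        else:
            # Headers (@...), other dependent tiers and their continuations
            # are skipped in this sketch.
            last = None
    return utterances
```

For an invented input such as `["*CHI:\tngo5 jiu3 .", "%can:\t我 要 ."]`, `parse_chat` returns one `Utterance` whose `romanization` is `"ngo5 jiu3 ."` and whose `characters` is `"我 要 ."`. Skipping other dependent tiers keeps the sketch focused on the romanization/character pairing.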

This version requires MS Chinese Windows 95/98 with HKSCS (Hong Kong Supplementary Character Set) support. It contains Cantonese characters that are commonly used in Hong Kong but not in mainland China or Taiwan, and these characters are not found in the standard GB or Big-5 character sets. Anyone who wishes to view the Cantonese characters in the corpus will need to download and install the HKSCS software available at http://www.info.gov.hk/digital21/eng/hkscs/index.html/.

Download the CHAT version of the corpus from http://www.arts.cuhk.edu.hk/~cancorp/archive/chatfile.zip.


