|
The "Mandarin
spoken corpora project" is part of the Language Archives Project
(National Digital Archives Project). The principal investigator is
Dr. Shu-Chuan Tseng. The main aim of the "Mandarin spoken
corpora project" is to collect a wide variety of speech data of
Taiwan Mandarin and to digitally archive the use of Taiwan Mandarin
in audio and video data formats. The project consists of (1) speech
data collection and processing, (2) toolkit and database
development, (3) metadata management, (4) speech annotation design
and (5) web query system construction. Three main Mandarin spoken
corpora are currently under development, funded by the Institute of
Linguistics, the National Science Council and the National Digital
Archives Project.
These are the "Mandarin
Topic-oriented Conversation Corpus" (MTCC), "Mandarin
Conversational Dialogue Corpus" (MCDC) and "Mandarin
Map Task Corpus" (MMTC). The annotation systems include
"discourse annotation", "detailed spontaneous speech phenomena" and
"particular phonetic phenomena". Users can also search the corpora
mentioned above for keywords and annotations through our web query
system.
|