Corpora of Vietnamese Texts (CVT)

Introduction

The Corpora of Vietnamese Texts was completed by Giang Pham (formerly Giang Tang) under the supervision of Kathryn Kohnert, Ph.D., CCC-SLP. Funding was provided by the Graduate Research Partnership Program in the Department of Speech-Language-Hearing Sciences at the University of Minnesota.

Please cite this work using the following reference:

Pham, G., Kohnert, K., & Carney, E. (2008). Corpora of Vietnamese Texts: Lexical Effects of Intended Audience and Publication Place. Behavior Research Methods, 40, 154-163.

Acknowledgements

I would like to send my deepest gratitude and admiration to Dr. Kathryn Kohnert for her guidance and support throughout this research project. I am very thankful for my friend and colleague, Nguyễn Hải Anh, who spent countless hours purchasing children’s books while in Viet Nam, borrowing books from her school library, and typing and proofreading over 350 texts. I am grateful for my parents, Tăng Tiến Đức and Tăng Trần Xuân, for all their support in all my work, and in particular, for their assistance in creating this website as well as typing and proofreading over 50 texts. I really appreciate Nguyễn Hoàng Nam for his technical support on the Research section of this website as well as for designing and creating the entire section on Clinical Materials. I would like to thank Hillcrest Elementary School in Orlando, FL, for allowing me to borrow more than 200 Vietnamese children’s books from their library to complete this project.

Thank you to all those who volunteered to type texts, especially Phạm Đức Tiến and Nguyễn Hoa. Many thanks to Pui Fong Kan, Mahmoud Sadrai, and Bryan Gordon for technical advice related to corpus linguistics. Thanks to Nguyễn Hoàng Nam and Trần Lọc for helpful suggestions about Vietnamese newspaper selection.

Summary

The following table summarizes the composition of the Corpora of Vietnamese Texts.

Composition of CVT

Corpus                     Source        Published     # words
1. Children’s literature   78 books      Abroad         42,690
                           279 books     VN            161,793
                           SUBTOTAL                    204,443
2. Newspaper articles      Thanh Niên    VN            114,099
                           Tuổi Trẻ      VN            151,183
                           VNN           USA           542,834
                           VOA           USA            43,058
                           SUBTOTAL                    851,174
TOTAL WORDS                                          1,055,617