Corpora of Vietnamese Texts (CVT)

CVT Word Lists


The CVT comprises over one million words. The three word lists below summarize it: the first includes all of the words in the entire CVT, the second includes all of the words in the children’s literature corpus, and the third includes all of the words in the newspaper corpus. **Please note that certain tones and vowels have been formatted so that the concordance program could read them during analysis. For a complete list of the formatting changes, see the Font Coding System.** Words are listed from most to least frequent, and each entry gives the number of occurrences and the percentage of occurrence in the entire CVT. Although every effort has been made to keep this information accessible, the word lists are long, so it is advisable to print only the pages that interest you.

The CVT may be printed and used for non-profit research and educational purposes, provided that this website is appropriately cited.
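To make the columns of the lists concrete, the following is a minimal Python sketch of how a ranked list with occurrence counts and percentages can be derived from raw text. It is not the procedure used to build the CVT: the function name `frequency_list` is hypothetical, and splitting on whitespace and lowercasing are assumptions that do not reproduce the CVT's own tokenization or its Font Coding System changes.

```python
from collections import Counter


def frequency_list(text: str) -> list[tuple[str, int, float]]:
    """Hypothetical helper: return (word, occurrences, percent) rows,
    ordered from most to least frequent."""
    # Assumption: words are whitespace-separated; case is folded.
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    # most_common() sorts by descending count; ties keep first-seen order.
    return [(word, n, 100.0 * n / total) for word, n in counts.most_common()]


if __name__ == "__main__":
    for word, n, pct in frequency_list("con mèo con chó"):
        print(f"{word}\t{n}\t{pct:.2f}%")
```

For the four-word input above, "con" occurs twice (50.00%), while "mèo" and "chó" occur once each (25.00%), so the percentages of all entries sum to 100.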

Please use the tabs to navigate to the Vietnamese Children's Literature Corpus and the Vietnamese Newspaper Corpus.

Please click one of the following options to download:



Tang, G. (2006). Corpora of Vietnamese Texts. Retrieved from