A Corpus-based Contrastive Study of the English and Chinese Semantic Primes happen and fasheng in Natural Semantic Metalanguage

This paper reports a corpus-based contrastive study of the English semantic prime happen and its Chinese counterpart fasheng in Natural Semantic Metalanguage. With the aid of the software packages Wordsmith 5.0 and SPSS 19.0, we compared happen and fasheng in a small comparable corpus of English and Chinese that we constructed ourselves. Drawing on evidence from the corpus, we identified and analyzed the distribution of happen and fasheng, their syntactic patterns, their colligation types, and their semantic prosody. We found no significant difference between the English semantic prime happen and its Chinese counterpart fasheng with respect to their distribution, their syntactic patterns, their colligations, or their semantic prosody. The results reveal that the semantic prime happen is identical to its Chinese counterpart fasheng, and thus provide evidence to justify the premise of the Natural Semantic Metalanguage Theory.

input to the analysis of semantic primes represents something new. The marriage of the corpus-based approach and traditional NSM analysis has enabled this thesis to produce a more realistic account in a way that has not been attempted previously.

Construction of the Corpus
In planning the collection of texts to be included in our corpora, we settled two matters beforehand.
The first was that a variety of genres of writing would be gathered for inclusion in the corpus. The second was that each genre would be divided into text samples, and that no sample would exceed a fixed number of words.
After weighing the resources and the time available for creating our own comparable corpus, we determined that the English corpus should contain 500,000 words. According to Hu (2006), 10 English words are equivalent to 16 Chinese characters, so the Chinese corpus should contain 800,000 characters. The list of main categories and their subdivisions was drawn up according to the layout of the Brown Corpus. A few changes were later made on the basis of experience gained in making the selections.

Data collection
In our comparable corpus, the texts of the English and Chinese corpora are each divided into two main categories: informative prose and imaginative prose. The category of informative prose includes academic writing, non-academic writing and press reportage. Academic writing is represented by journal articles and textbooks and covers a wide variety of topics, including natural sciences, medicine, mathematics, social and behavioral sciences, political science, law, education, humanities, technology and engineering. Non-academic writing, by contrast, has a wider and more varied readership, although the subject areas may still be quite specialized. This category includes press reportage, press editorials, press reviews, skills and hobbies, popular lore, miscellaneous and belles lettres, biography, memoirs, etc.
Press reportage includes political, sports, society, financial and cultural reports. All the reports were written by staff reporters and journalists. Each corpus contains a total of 22 individual press reports, taken from national, regional, and local newspapers. Press editorials and press reviews are distinguished from general news reports on the grounds that their main intention is to persuade rather than to inform. They are less directly tied to current events, and they offer the writer the opportunity to be discursive in a way that news journalism does not.
Miscellaneous texts are corporate in origin. They are written on behalf of government departments or other administrative bodies, and their chief aim is to convey information to the general public.
Texts in the skills and hobbies category also offer instruction, but they are directed towards a smaller and more specialized readership. They include car maintenance manuals, cookery books, gardening manuals, etc.
The imaginative prose in the corpus is creative writing: novels and short stories. It covers a variety of fiction types, including general fiction, mystery and detective fiction, science fiction, adventure fiction, romance and love stories, and humor.
Once the basic outline of the corpus had been determined, the actual creation of the corpus began: collection, computerization, sampling, and segmentation of the Chinese characters.
The samples represent a wide range of styles and varieties of prose. Samples were chosen for their representative quality rather than for any subjectively determined excellence. Most of the samples were written in recent years. Since new words come into the language every day, we set the time frame at one year for magazines and newspapers and at five to ten years for books.
After the data had been collected, each text was assigned a unique text code corresponding to its position in the hierarchy of text categories in the corpus. As for the sampling, the number of texts in each category varies. In the English corpus, each sample contains just over 2,000 words. Following the ratio given by Hu (2006), each Chinese sample contains just over 3,200 characters. Each sample begins at the beginning of a sentence, but not necessarily of a paragraph or other larger division, and ends at the first sentence ending after 2,000 words or 3,200 characters.
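The cut-off rule described above can be illustrated with a short Python sketch. It is only an illustration: the paper does not say how the samples were actually cut, so the function names `cut_sample` and `cut_sample_zh`, the punctuation-based test for a sentence ending, and the decision to count every character (punctuation included) in Chinese are all assumptions.

```python
import re

# Assumption: a word ends a sentence if it ends in ., ! or ?, optionally
# followed by closing quotes or brackets. Abbreviations such as "Mr." would
# be misread as sentence endings by this simple test.
SENTENCE_END = re.compile(r"""[.!?]["')\]]*$""")


def cut_sample(text, min_words=2000):
    """Return the text up to the first sentence ending at or after min_words words."""
    words = text.split()
    for i, word in enumerate(words):
        if i + 1 >= min_words and SENTENCE_END.search(word):
            return " ".join(words[: i + 1])
    return " ".join(words)  # no sentence ending found past the threshold


def cut_sample_zh(text, min_chars=3200):
    """Same rule for Chinese, counting characters instead of words."""
    for i, char in enumerate(text):
        if i + 1 >= min_chars and char in "。！？":
            return text[: i + 1]
    return text  # no sentence ending found past the threshold
```

With the default thresholds, `cut_sample(english_text)` and `cut_sample_zh(chinese_text)` would yield samples of the "2000+ words" and "3200+ characters" kind described above, where `english_text` and `chinese_text` stand for any source text that begins at a sentence boundary.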
Detailed information about the corpus is shown in Table 2.

Data Analysis and Discussion
"The ability to examine large text corpora in a systematic manner allows access to a quality of evidence that has not been available before." (Sinclair, 1991:27) The present study is corpus-based and combines quantitative measurement with qualitative analysis of the English and Chinese semantic primes happen and fasheng. Drawing on evidence from the corpora, we present and analyze the distribution of happen and fasheng, their syntactic patterns, colligation types and collocations, as well as their semantic prosody.

Dictionary Explanation of happen and fasheng
In the Collins CoBuild Advanced Learner's English Dictionary (2006), happen is defined as follows:
1. Something that happens occurs or is done without being planned.
2. If something happens, it occurs as a result of a situation or course of action.
According to the NSM Theory, only the "take place" sense of the word happen is proposed as the semantic prime. In like manner, only the "yuanlai meiyou de shi chuxian-le; chansheng" (new things appear; take place) sense of fasheng is proposed as the semantic prime.

Distribution of happen and fasheng
In this section, the overall frequencies of happen and fasheng as well as the respective frequencies of happen and fasheng with different meanings are presented and analyzed.
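Frequency counts of this kind rest on retrieving every occurrence of the node word together with its context. The study used Wordsmith for this purpose; the Python sketch below only illustrates the idea of a keyword-in-context (KWIC) search and a count of word forms. The regular expression, the context width, and the names `kwic` and `form_counts` are illustrative assumptions rather than the query actually used, and the sample string simply joins two example sentences quoted elsewhere in this paper.

```python
import re
from collections import Counter

# Assumption: the lemma "happen" is matched through its four inflected forms.
HAPPEN = re.compile(r"\bhappen(?:s|ed|ing)?\b", re.IGNORECASE)


def kwic(text, pattern=HAPPEN, width=30):
    """Return (left context, node word, right context) for every match."""
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width): match.start()]
        right = text[match.end(): match.end() + width]
        lines.append((left, match.group(0), right))
    return lines


def form_counts(text, pattern=HAPPEN):
    """Count how often each matched word form occurs (case-insensitive)."""
    return Counter(match.group(0).lower() for match in pattern.finditer(text))


sample = "The accident happened. But what happens to you, my orphan?"
for left, node, right in kwic(sample, width=20):
    print(f"{left:>20} [{node}] {right}")
print(form_counts(sample))
```

For the Chinese corpus, where fasheng (发生) does not inflect, a plain substring search over the character-segmented text would play the same role.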

Overall distribution of happen and fasheng
To investigate the usage of happen and fasheng in the comparable corpus, the raw frequencies were calculated; the results are listed in Table 3. As shown in Table 3, the total frequency of happen is 120, while that of fasheng is 194. The proportion of happen in the English corpus is 0.02372%, which is lower than that of fasheng in the Chinese corpus (0.04199%). To test whether the difference is merely due to the different sizes of the corpora used for comparison, we resorted to a statistical test. "The aim of statistical tests of significance is to show whether or not the observed differences between sets of data could reasonably have been expected to occur 'by chance' or whether, on the contrary, they are most probably due to the alternation in the variable whose effect is being investigated." (Butler, 1985: 8) Chi-square tests were performed with SPSS 10.0 in order to find out whether the difference in frequencies is significant at the five percent significance level. The result is shown in Table 4.
can be interpreted as "It is by chance that I have my notebook with me." In Chinese, it is "qiaqiao wo dai le bi".
"Happen to do something" means "to do something by chance". In example [2c], for instance, it means "One morning I turned over the salt-cellar at breakfast by chance."
[3] a. The two friends happened on each other in a town.
b. I happened on the pen I'd been looking for.
In the above two examples, happen also means "occur by chance".
To avoid the happening of the fire.
In the above two examples, fasheng means "something that happens". Here fasheng is a noun, and its English counterpart is happening. "huozai de fasheng" means "the happening of the fire", which differs from "huozai fasheng" and "fasheng huozai", both of which mean "the fire happened".
[5] a. 他们对发生的事件总感到突兀。
tamen dui fasheng de shijian zong gandao tuwu
they towards happen thing always feel abrupt
The things that happened always strike them as abrupt.
b. 这是小时候所发生的事情。
zhe shi xiaoshihou suo fasheng de shiqing
this is childhood happen thing
This is the thing that happened during childhood.
Since the present study analyzes the semantic primes happen and fasheng, we focus only on the sense proposed as the semantic prime.

Frequencies of happen and fasheng with different meanings
To investigate the usage of the semantic primes happen and fasheng in the comparable corpus, the distribution of happen and fasheng across their different meanings was calculated; the results are shown in Table 5. As shown in Table 5, there is still a discrepancy between the two words: the semantic prime meaning accounts for 81.67% of the occurrences of happen, but for 75.26% of those of fasheng. Based on the raw data in Table 5, chi-square tests were performed to find out whether the distribution of happen with the semantic prime meaning differs significantly from that of fasheng at the five percent significance level. Table 6 shows the result of the chi-square test comparing the semantic prime meaning of happen and fasheng, as computed with SPSS.
The accident happened.
In pattern II, "X" bears the role of undergoer. In Chinese, the noun in preverbal position is interpreted as the undergoer, while the second slot is always the event noun. In order to code the undergoer as opposed to the locus, a different structure is used, with a postverbal locative phrase zai X (de) shenshang, which combines the undergoer with shenshang. Here the bound lexeme shen refers to what happens to one's body as a literal embodiment of "person, self, life" (Goddard, 2002). This is the only frame with happen and fasheng that allows both undergoer and event to be introduced. Several examples are given below:
[7] a. But what happens to you, my orphan?
The accident happened in Shaoshan Road.
To investigate the usage in terms of different syntactic patterns, the frequency of each syntactic pattern was calculated and analyzed. The results are listed below. Based on the data in Table 7, chi-square tests were performed to find out whether the distributions of the different syntactic patterns differ significantly. Table 8 shows the results of the chi-square tests comparing the syntactic patterns of happen and fasheng, as computed with SPSS. The significance values obtained are all greater than the critical value of .05. Thus we may conclude that there is no significant difference between the distributions of the three syntactic patterns of happen and fasheng.

Colligations of happen and fasheng
To gain a deeper understanding of the different syntactic patterns, we examined the detailed colligations of each pattern. The results are listed below.
The detailed colligations for syntactic pattern I are shown as follows:
The detailed colligations for syntactic pattern II are shown as follows:
The detailed colligations for syntactic pattern III are shown as follows:

Semantic Prosody and Collocation of happen and fasheng
Since semantic prosody concerns the collocational behavior of lexical items and emphasizes the semantic meaning imposed on the collocational structure, we analyze the collocation and semantic prosody of happen and fasheng together.

Overall semantic prosody of happen and fasheng
In the current study, we evaluated each case in context. A pleasant or favorable affective meaning is labeled as positive, while an unpleasant or unfavorable affective meaning is labeled as negative. When what was happening was completely neutral, or the context provided no evidence of any semantic prosody, the instance is labeled as neutral. It has to be admitted that, since there is no agreed criterion for classifying instances into these three categories, discrepancies do exist for some vague words. However, we do not take this factor into consideration, because such cases account for only a small proportion of the words examined. Based on the raw data above, chi-square tests were performed to find out whether the distribution of the different semantic prosodies differs significantly. Table 13 shows the results of the chi-square tests, as computed with SPSS. The significance values obtained are all greater than .05. Thus we may conclude that there is no significant difference between the semantic prosody of happen and fasheng.

Positive semantic prosody of happen and fasheng
Most of all, the raw material should have qualitative changes so as to make the menu perfect.
Parents-in-law towards her attitude happened historical change unprecedented caring and considerate
Historic changes have occurred in her parents-in-law's attitude towards her; they have become unprecedentedly caring and considerate.
In example [15a], zhi de bianhua (qualitative changes) may be good or bad, but we can infer from goucheng caiyao de wanmei shuxing (make the menu perfect) that good things happened to the material. In example [15b], taidu de bianhua (the change of attitudes) may be good or bad; however, we can infer from kongqian de hehu he titie (unprecedented caring and considerate) that the parents-in-law's attitude to her changed from bad to good.
The collocations of both happen and fasheng with positive semantic prosody constitute a low proportion.
Collocations of happen with positive semantic prosody occur 16 times, accounting for 16.33% of the total occurrences, and those of fasheng occur 21 times, accounting for 14.38% of the total occurrences.
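As a rough cross-check on these figures, the comparison of the two proportions can be sketched as a Pearson chi-square test on a 2 x 2 table. The sketch rests on several assumptions: the totals of 98 and 146 are back-calculated from the reported percentages (16 is 16.33% of 98, and 21 is 14.38% of 146), the statistic is computed without a continuity correction, and the function name `chi_square_2x2` is illustrative. It is not meant to reproduce the SPSS output.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Critical value of the chi-square distribution for 1 degree of freedom, alpha = .05.
CRITICAL_05_DF1 = 3.841

# Counts reported in the text; the totals are inferred from the percentages.
happen_positive, happen_total = 16, 98
fasheng_positive, fasheng_total = 21, 146

chi2 = chi_square_2x2(
    happen_positive, happen_total - happen_positive,
    fasheng_positive, fasheng_total - fasheng_positive,
)
significant = chi2 > CRITICAL_05_DF1
print(f"chi-square = {chi2:.3f}; significant at the .05 level: {significant}")
```

Under these assumptions the statistic falls well below the critical value, so the sketch points in the same direction as the test result reported here.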
According to the chi-square tests, there is no significant difference between happen and fasheng in positive semantic prosody.
The nominal collocations of happen with negative semantic prosody are: misfortune, bad weather, accident, horror, bad weather, the worst, and the tragedy. It is easy to see that all seven of these collocations carry a negative connotation.

Negative semantic prosody of happen and fasheng
[16] a. The tragedy did not happen.
b. An accident happens to his passengers.
c. Bad weather happened on this first setting out.
d. A misfortune happened.
e. She and Winifred were sitting together on the bridge, he told her that things had happened while he was studying abroad that he was sorry for.
From this analysis we observe that both happen and fasheng co-occur more often with negative words than with positive words. It can be inferred that both happen and fasheng lie towards the negative side of the positive-negative continuum, since they are primed to occur with "bad things". In addition to their negative semantic prosody, happen and fasheng have a strong tendency to indicate uncertainty and fortuity, to appear in environments where things are not fully known or determined, and to co-occur with items that express this general semantic area.

Conclusions
This paper has sought to examine the Natural Semantic Metalanguage Theory by conducting a contrastive study of the English semantic prime happen and its Chinese counterpart fasheng. Because it was developed and tested using corpus data, the work presented here has been able to overcome the inaccuracies and biases inherent in previous intuition-based research, and thus provides a more accurate and comprehensive understanding of semantic primes in both English and Chinese. On the basis of both quantitative and qualitative analysis, this thesis holds the view that the semantic prime happen is identical to its Chinese counterpart fasheng.

Major Findings of the Research
In order to test the NSM Theory, we conducted a corpus-based contrastive study of the English semantic prime happen and its Chinese counterpart fasheng. By analyzing the different meanings of happen and fasheng, we found that although there is a significant difference between the overall distributions of happen and fasheng, there is no significant difference between the distributions of their semantic prime meaning in the comparable corpus. Then, based on the concordance lines, we identified three syntactic patterns for happen and fasheng. To investigate the usage in terms of these syntactic patterns, we calculated the frequency of each pattern and found no significant difference between the distributions of the three syntactic patterns of happen and fasheng. After that, we explored the different colligation types within each syntactic pattern. We separated syntactic pattern I into two colligation types, namely "n./pron. + V. or V. + n." and "n./pron. + V. + adv. or n./pron. + adv. + V." In syntactic pattern II, we identified two colligation types, "n. + V. + n." and "n./pron. + prep. + V. or n./pron. + V. + prep." In syntactic pattern III, we identified two colligation types, "V. + locus" and "V. + Time".
Finally, we analyzed the semantic prosody, semantic preference and collocation of happen and fasheng, and observed that, apart from those collocations associated with uncertainty and fortuity, both happen and fasheng co-occur more often with negative words than with positive words. It can be inferred that both happen and fasheng lie towards the negative side of the positive-negative continuum, since they are primed to occur with "bad things", and that there is no significant difference in semantic prosody between happen and fasheng.

Limitations
It should be admitted that, although strenuous efforts have been made, the present study is far from perfect.
This study is exploratory in nature, and some difficult issues have not yet been tackled adequately.
First, owing to the focus and length of the thesis, we had to restrict the investigation to a single semantic prime, happen in English, and its Chinese counterpart fasheng. The selected word is the subjective choice of the author, and it is only the tip of the iceberg among the more than 60 semantic primes.
Second, the comparable corpus is relatively small for a contrastive study. Therefore, the results of the current study may need to be revised in light of evidence from larger corpora.

Suggestions for Further Research
It is clear that much work remains to be done to shed further light on the corpus-based study of the Natural Semantic Metalanguage Theory. First, further studies should be carried out on larger corpora. If the size of the corpus is expanded, the reliability of the research results will increase. Second, more extensive investigations should be carried out by examining all the other semantic primes.
All in all, the Natural Semantic Metalanguage Theory offers a new and promising perspective for semantics and, especially, lexical semantics. More inter-lingual and cross-lingual research could be done concerning the 60