Abstract: This article takes the characteristics of Yi language information processing as its starting point and analyzes the mainstream technologies of Yi language information processing from three aspects: the N-gram model, speech recognition, and syntactic analysis.
The world has entered the era of information networks, and comprehensive informatization is the main trend of social development and technological progress. Language and text information processing is the field with the widest range of professional applications, the largest number of users, and the largest volume of data. In multi-ethnic countries, computer applications face a strong demand to extend their information processing capabilities to local ethnic languages. The natural languages that humanity uses to exchange information, spread knowledge, and develop culture differ in form, but they share deep similarities in meaning. Computer processing of Yi language information therefore has much in common with the processing of other written languages. It also has its own characteristics and its own mainstream technologies.
Second, the characteristics of computer Yi language information processing
Yi language and text information processing refers to the use of computers to process the sound, form, and meaning of Yi. This covers the input, output, recognition, analysis, understanding, and generation of Yi characters, words, sentences, and texts. The field studies how to use computers and computer technology to study the Yi language and to process knowledge about it. It is an interdisciplinary field drawing on computer science, linguistics, philology, mathematics, logic, cognitive science, and other disciplines.
1. The particularity of Yi characters
English, French, and other Western languages are written in Latin letters. Their alphabets are small and the letter shapes are simple, so character input and output and information processing are easy to implement on a computer. This gives these languages an obvious advantage in computer language information processing. Like Chinese, Yi is written in an ideographic script. Yi is divided into six dialect areas with large differences among them, so Yi varies in pronunciation, writing, and meaning: the same character may be pronounced, written, and understood differently from place to place. The national universal Yi script, the National Yi Language Glossary, was proposed by the Standardization Working Committee only in November 2010. It contains 5,598 Yi characters. This large character set makes encoding Yi characters difficult. Drawing on the experience of Western-language and Chinese-character information processing, and taking the characteristics of Yi into account, computer Yi language information processing uses different forms of encoding for different requirements. These include Yi standard encodings (domestic and international schemes), input codes (Quanpin, Jianpin, and stroke-based), and Yi internal codes.
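The Python sketch below makes the idea of an input code concrete with a toy romanization-based ("Quanpin"-style) lookup table. It makes several assumptions:

- It uses the international Unicode encoding, specifically the Yi Syllables block, U+A000 to U+A48C. This block covers the standardized Liangshan syllabary, not the 5,598-character universal script.
- It relies on the fact that Unicode character names spell each syllable in a Latin romanization. For example, U+A000 is named YI SYLLABLE IT.
- The function names are hypothetical.
- The prefix search is only a rough stand-in for abbreviated (Jianpin-style) input. It does not describe any deployed input method.

```python
import unicodedata
from typing import Dict, List, Optional

YI_FIRST = 0xA000  # first code point of the Unicode Yi Syllables block
YI_LAST = 0xA48C   # last assigned Yi syllable in that block
NAME_PREFIX = "YI SYLLABLE "


def build_romanization_table() -> Dict[str, str]:
    """Map lowercase romanized spellings (taken from Unicode names) to Yi characters."""
    table: Dict[str, str] = {}
    for cp in range(YI_FIRST, YI_LAST + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "")
        if name.startswith(NAME_PREFIX):
            # e.g. "YI SYLLABLE IT" -> "it"
            table[name[len(NAME_PREFIX):].lower()] = ch
    return table


def full_code_lookup(table: Dict[str, str], code: str) -> Optional[str]:
    """Hypothetical full-spelling input: exact romanization -> one character."""
    return table.get(code.lower())


def prefix_candidates(table: Dict[str, str], prefix: str, limit: int = 10) -> List[str]:
    """Hypothetical abbreviated input: all characters whose spelling starts with prefix."""
    prefix = prefix.lower()
    matches = [ch for code, ch in table.items() if code.startswith(prefix)]
    return sorted(matches, key=ord)[:limit]  # candidate list ordered by code point
```

For instance, `full_code_lookup(build_romanization_table(), "it")` returns ꀀ (U+A000). A real input method would also need to rank candidates, for example by frequency.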
2. The particularity of written Yi language
Another characteristic of written Yi is that there is no explicit delimiter between words. This makes automatic word segmentation a difficult problem in analyzing written Yi. Word segmentation recombines a continuous sequence of characters into words according to certain norms. English words, for example, are separated by spaces, whereas written Yi marks no such boundary between words. English also has to deal with dividing phrases, but the problem is harder in Yi because Yi words are far more numerous and varied in scope than English words. Any computer Yi language information processing application that involves retrieval, machine translation, summarization, proofreading, and similar tasks needs the word as its basic unit. As research on language and text information processing has deepened, Yi information processing has gradually shifted from character-level processing to language-level processing. Automatic word segmentation is therefore indispensable groundwork for computer Yi language information processing.
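The source does not say which segmentation algorithm the Yi systems use. A common dictionary-based baseline is maximum matching, sketched below in Python. In this sketch each Latin letter stands in for one Yi character, and the dictionary is hypothetical. Running forward and backward matching and comparing the results is a simple way to flag spans where the segmentation is ambiguous.

```python
from typing import List, Set


def forward_max_match(text: str, dictionary: Set[str], max_len: int) -> List[str]:
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def backward_max_match(text: str, dictionary: Set[str], max_len: int) -> List[str]:
    """Greedy right-to-left segmentation: take the longest dictionary word ending at each position."""
    words: List[str] = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = text[j - size:j]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                j -= size
                break
    words.reverse()
    return words
```

With the toy dictionary {"ab", "abc", "cd", "e"} and the input "abcde", the two methods disagree. Forward matching yields abc / d / e, while backward matching yields ab / cd / e. The disagreement marks an ambiguous span that a statistical model, such as an N-gram model, would have to resolve.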
3. The specificity of Yi language phonetics
In phonetics, Yi plosives, affricates, and fricatives each come in two clearly distinguished series, voiceless and voiced. In most dialects, all or most vowels contrast as tense versus lax. The finals of most dialects are simple monophthongs with no stop codas. A few dialects also have diphthongs and vowels with nasal codas (or semi-nasal endings), but most of these occur infrequently or are used only to spell recent Chinese loanwords. Each dialect has its own system of vowel phoneme contrasts. There are generally 3 to 4 tones with simple contours, mostly level or falling, and no dipping tones. The main syllable structures are consonant + vowel + tone and vowel + tone. This structure is relatively simple, so syllable boundaries are relatively clear. Tone and tone sandhi, however, are distinctive features of Yi and generally make speech recognition and speech synthesis difficult. Even so, Yi has relatively few syllables, so processing Yi speech is on the whole easier than other aspects of Yi information processing.
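A small syllable parser can illustrate the consonant + vowel + tone structure. The sketch rests on the Liangshan (Nuosu) standard romanization as we understand it:

- 43 initials
- 10 vowel finals
- a final tone letter: t (high level 55), x (mid-rising 34), or p (low falling 21)
- no tone letter for the mid level tone 33

Other dialects need different inventories, and the table contents are an assumption that should be checked against the official scheme.

```python
from typing import Optional, Tuple

# Assumed Liangshan Yi romanization inventories (verify against the official scheme).
INITIALS = {
    "b", "p", "bb", "nb", "hm", "m", "f", "v", "d", "t", "dd", "nd", "hn", "n",
    "hl", "l", "g", "k", "gg", "mg", "hx", "ng", "h", "w", "z", "c", "zz", "nz",
    "s", "ss", "zh", "ch", "rr", "nr", "sh", "r", "j", "q", "jj", "nj", "ny", "x", "y",
}
FINALS = {"i", "ie", "a", "uo", "o", "e", "u", "ur", "y", "yr"}
TONE_LETTERS = {"t": "55", "x": "34", "p": "21"}  # Chao tone values
UNMARKED_TONE = "33"


def parse_syllable(syllable: str) -> Optional[Tuple[str, str, str]]:
    """Split a romanized syllable into (initial, final, tone); initial may be empty."""
    s = syllable.lower()
    if s and s[-1] in TONE_LETTERS:
        body, tone = s[:-1], TONE_LETTERS[s[-1]]
    else:
        body, tone = s, UNMARKED_TONE
    # Try a two-letter initial, then one letter, then none (vowel + tone syllables).
    for size in (2, 1, 0):
        initial, final = body[:size], body[size:]
        if (size == 0 or initial in INITIALS) and final in FINALS:
            return initial, final, tone
    return None  # not a well-formed syllable under these inventories
```

With these inventories, "nyip" parses as initial ny, final i, and tone 21, while "yr" parses as a vowel-only syllable with the mid tone 33.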
4. The particularity of Yi grammar
In grammar, function words and word order are the main grammatical devices. The basic word order is subject-object-predicate. Nouns and some pronouns used as attributives precede the head word, whereas quantifiers and adjectives used as attributives follow it. In most dialects, a negative word used as an adverbial to modify a verb or adjective is placed before a monosyllabic head word and between the two syllables of a disyllabic head word. Some dialects also use inflection as an auxiliary means of expressing grammatical meaning. Because of these features, sentences are especially prone to ambiguity if their syntax is not handled well. Automatic analysis of Yi sentences is therefore an urgently needed technology.
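The ordering rules above are regular enough to encode directly, and a Yi analyzer or generator must do exactly that. The Python sketch below makes three assumptions:

- It uses English glosses as placeholders instead of Yi words.
- It uses a hypothetical NEG marker, because the actual negative particle differs by dialect.
- It places adjectives before the quantifier. The text only says that both follow the head.

```python
from typing import List, Optional, Sequence

NEG = "NEG"  # hypothetical placeholder for the dialect-specific negative particle


def linearize_np(
    head: str,
    noun_modifiers: Sequence[str] = (),
    adjectives: Sequence[str] = (),
    quantifier: Optional[str] = None,
) -> List[str]:
    """Order a noun phrase: noun/pronoun attributives before the head, others after it."""
    words = list(noun_modifiers) + [head] + list(adjectives)
    if quantifier is not None:
        words.append(quantifier)  # assumed to come after any adjectives
    return words


def negate_head(head_syllables: Sequence[str]) -> List[str]:
    """Insert NEG before a monosyllabic head or between the syllables of a disyllabic head."""
    if len(head_syllables) == 1:
        return [NEG, head_syllables[0]]
    if len(head_syllables) == 2:
        return [head_syllables[0], NEG, head_syllables[1]]
    raise ValueError("rule sketched only for mono- and disyllabic heads")
```

For example, `linearize_np("horse", ["my"], ["black"], "one-CL")` gives my horse black one-CL, and `negate_head(["syl1", "syl2"])` gives syl1 NEG syl2.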
Third, analysis of several computer Yi language information processing technologies
1. N-gram model
Suppose $w_i$ is any word in a text. If the two preceding words $w_{i-2}w_{i-1}$ are known, the conditional probability $P(w_i \mid w_{i-2}w_{i-1})$ can be used to predict the probability that $w_i$ occurs. This is the idea behind a statistical language model. More generally, let $W = w_1 w_2 \dots w_n$ denote an arbitrary sequence of $n$ words in a text. The statistical language model is then the probability $P(W)$ that this sequence occurs. By the product rule of probability, $P(W)$ expands as

$$P(W) = P(w_1)\,P(w_2 \mid w_1)\,P(w_3 \mid w_1 w_2)\cdots P(w_n \mid w_1 w_2 \dots w_{n-1}).$$

Under this expansion, predicting the probability of $w_n$ requires the probabilities of all the words before it, which is too costly to compute. If we instead assume that the probability of any word $w_i$ depends only on the two words before it, the problem simplifies greatly. The resulting model is called a trigram model:

$$P(W) \approx P(w_1)\,P(w_2 \mid w_1) \prod_{i=3}^{n} P(w_i \mid w_{i-2} w_{i-1}),$$

where $\prod_{i=3}^{n}$ denotes the product over $i = 3, \dots, n$. In general, an N-gram model assumes that the probability of the current word depends only on the $N-1$ words before it. Importantly, these probability parameters can be estimated from a large-scale corpus. For the trigram case,

$$P(w_i \mid w_{i-2} w_{i-1}) \approx \frac{\mathrm{count}(w_{i-2} w_{i-1} w_i)}{\mathrm{count}(w_{i-2} w_{i-1})},$$

where $\mathrm{count}(\cdot)$ is the total number of occurrences of a given word sequence in the whole corpus. This provides a computable language model for building Yi corpora and for intelligent retrieval, machine translation, and similar applications.
For example, the National Language and Character Information Processing Research Center of Southwest University for Nationalities built both its Yi language corpus and its automatic Yi word segmentation and tagging system on this computational model.
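The trigram formulas above translate directly into code. This Python sketch estimates probabilities by maximum likelihood from raw counts, exactly as in the count ratio given earlier. The class name and the toy corpus used with it are hypothetical. The sketch applies no smoothing, so any unseen n-gram gets probability zero; a real Yi corpus system would need to address this.

```python
from collections import Counter
from typing import Iterable, Sequence


class TrigramModel:
    """Maximum-likelihood trigram model estimated from raw counts (no smoothing)."""

    def __init__(self, sentences: Iterable[Sequence[str]]) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        self.trigrams: Counter = Counter()
        self.total = 0  # total number of word tokens in the corpus
        for sentence in sentences:
            words = list(sentence)
            self.unigrams.update(words)
            self.bigrams.update(zip(words, words[1:]))
            self.trigrams.update(zip(words, words[1:], words[2:]))
            self.total += len(words)

    def p_trigram(self, w_prev2: str, w_prev1: str, w: str) -> float:
        """P(w | w_prev2 w_prev1) ~= count(w_prev2 w_prev1 w) / count(w_prev2 w_prev1)."""
        denom = self.bigrams[(w_prev2, w_prev1)]
        return self.trigrams[(w_prev2, w_prev1, w)] / denom if denom else 0.0

    def sequence_prob(self, words: Sequence[str]) -> float:
        """P(W) ~= P(w1) P(w2 | w1) * product over i >= 3 of P(wi | w(i-2) w(i-1))."""
        if not words:
            raise ValueError("empty word sequence")
        prob = self.unigrams[words[0]] / self.total if self.total else 0.0
        if len(words) >= 2:
            first = self.unigrams[words[0]]
            prob *= (self.bigrams[(words[0], words[1])] / first) if first else 0.0
        for i in range(2, len(words)):
            prob *= self.p_trigram(words[i - 2], words[i - 1], words[i])
        return prob
```

On the toy corpus [["a", "b", "c"], ["a", "b", "d"]], the estimate P(c | a b) is 1/2. The whole sequence a b c then scores (2/6) × 1 × (1/2) = 1/6.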
2. Speech recognition
The ultimate goal of speech recognition is genuinely free communication between people and machines, so that a machine can understand human speech and respond accurately and promptly. Speech recognition is interdisciplinary. Its main components include signal processing, pattern recognition, probability theory and information theory, the mechanisms of speech production and hearing, and artificial intelligence. Yi speech recognition technology mainly involves three aspects: feature extraction, pattern-matching criteria, and model training. It also involves choosing the recognition unit, and here the Yi syllable is usually used. As for feature extraction, speech signals carry a great deal of information, and the parameters extracted from them are usually called acoustic features. These acoustic feature parameters are the key factor determining recognition quality. Extraction should therefore capture the semantic information the speech conveys while removing interference from the speaker's personal characteristics, so that the feature parameters are effective and accurate. For example, the National Language and Character Information Processing Research Center of Southwest University for Nationalities completed the research and construction of a standard Yi acoustic database in 2019. This work laid a solid foundation for in-depth research on Yi speech recognition technology.
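The three components named above can be seen together in a toy isolated-syllable recognizer:

- **Feature extraction.** Per-frame log energy and zero-crossing rate stand in for real acoustic features such as MFCCs.
- **Pattern-matching criterion.** Dynamic time warping (DTW) against one stored template per syllable stands in for the matching step.
- **Trained models.** Practical systems typically train statistical or neural acoustic models. This sketch stores templates instead.

The syllable labels, frame sizes, and synthetic signals used with this Python sketch are assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[float, float]  # (log energy, zero-crossing rate) for one frame


def frame_features(signal: Sequence[float], frame_len: int = 160, hop: int = 80) -> List[Feature]:
    """Split a signal into overlapping frames and compute two simple features per frame."""
    features: List[Feature] = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        features.append((math.log(energy + 1e-10), crossings / (frame_len - 1)))
    return features


def dtw_distance(a: Sequence[Feature], b: Sequence[Feature]) -> float:
    """Dynamic time warping distance with Euclidean frame cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(features: Sequence[Feature], templates: Dict[str, List[Feature]]) -> str:
    """Return the label of the template closest to the input under DTW."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))
```

As an example, take templates built from synthetic 200 Hz and 1500 Hz tones sampled at 8 kHz. A 210 Hz query matches the first template, because its zero-crossing rate is much closer.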
3. Syntax analysis
Syntactic analysis refers to analyzing the grammatical functions of the words in a sentence. For example, in the Yi sentence meaning "I am going to Beijing", the word for "I" is the subject, the word for "go" is the predicate, and the word for "Beijing" is the object.
Today syntactic analysis is applied mainly in computer Yi language information processing tasks such as machine translation. It directly implements the idea of chunk analysis, building higher-level structural units to simplify the description of sentences. Taking the grammatical features of Yi as its analytical basis, it analyzes the relationships among sentence components and the phrase-structure tree. Syntactic structure analysis addresses four main questions:

- which simple clauses a sentence contains, and what syntactic role each plays;
- what larger grammatical structures lie above the simple clauses;
- what phrases the sentence contains, of what types, and what role each plays;
- how all these components combine with or attach to one another to form the whole sentence.

Exploration of Yi syntactic analysis began with the Yi-Chinese-English parallel corpus developed by Southwest University for Nationalities in 2011. Note that in Chinese and English the object is placed after the predicate, whereas Yi word order is subject-object-predicate, with the object before the predicate. This is a significant difference between Yi and languages such as Chinese and English.
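Because Yi is verb-final, even a very simple rule set can assign the subject, object, and predicate roles in the example sentence "I am going to Beijing". The Python sketch below makes these assumptions:

- The sentence has already been chunked into noun phrases (NP) and verb phrases (VP).
- English glosses stand in for Yi words.
- Only simple declarative clauses are handled.
- The function names are hypothetical.

The second function shows the reordering that a Chinese- or English-to-Yi translation system must perform.

```python
from typing import Dict, List, Tuple

Chunk = Tuple[str, str]  # (text, phrase tag); only "NP" and "VP" are used in this sketch


def label_roles(chunks: List[Chunk]) -> Dict[str, str]:
    """Assign roles in a simple SOV clause: first NP = subject, NP before the verb = object."""
    verb_positions = [i for i, (_, tag) in enumerate(chunks) if tag == "VP"]
    if not verb_positions:
        raise ValueError("no predicate found")
    v = verb_positions[-1]  # Yi is verb-final, so the last VP is taken as the predicate
    noun_phrases = [text for text, tag in chunks[:v] if tag == "NP"]
    roles = {"predicate": chunks[v][0]}
    if noun_phrases:
        roles["subject"] = noun_phrases[0]
    if len(noun_phrases) >= 2:
        roles["object"] = noun_phrases[-1]  # the NP immediately preceding the verb
    return roles


def svo_to_sov(subject: str, predicate: str, obj: str) -> List[str]:
    """Reorder a Chinese/English-style SVO triple into Yi subject-object-predicate order."""
    return [subject, obj, predicate]
```

A realistic analyzer would also need the function words and phrase-internal rules described earlier to find the NP boundaries in the first place.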
Research on computer Yi language information processing technology is of great significance. It integrates Yi linguistics with computer information technology and covers every level of Yi, from words and sentences to paragraphs. It also covers the processing of text, sound, and images, and the input, output, compression, storage, and retrieval of this information. Yi speech and writing are the carriers of Yi information and knowledge. The informatization of the Yi language and script, that is, the development level of Yi language and text information processing technology, bears directly on the modernization and social informatization of Yi areas. The development and application of computer information processing technology for Yi marks the continued expansion of the social functions of the Yi language in this field. It also contributes to the prosperity and development of the Yi language and script. Promoting the modernization and informatization of Yi and carrying forward excellent ethnic culture therefore have important scientific and social significance. As language information processing technology develops and society becomes increasingly informatized, it becomes all the more necessary to analyze and mine Yi in depth, so that Yi language and text information processing can provide genuinely targeted services. In the coming era of informatization and networking, we must continually expand the research fields and directions of Yi information processing. Only then can we keep meeting the needs of the modern development of the Yi language and promote the sustainable development of Yi information processing technology.
This article takes the characteristics of computer Yi language information processing as its starting point and analyzes and discusses its mainstream technologies. I hope this preliminary exploration can stimulate new ideas in the field where Yi language modernization and computer Yi language and text information processing intersect, so that we can jointly develop and improve this technology.
About the author: Wang Chengping (born March 1979), male, of Yi nationality, Ph.D., is an associate professor at the National Language and Writing Information Processing Research Center, Southwest University for Nationalities. This article is one of the research results of National Social Science Fund projects (06XYY021, 07BYY060) and a Southwest University for Nationalities Central University Basic Research Fund project (09SZYZJ04).
Source: Yi People Net