JPH077400B2

JPH077400B2 - Language processing method

Info

Publication number: JPH077400B2
Application number: JP62118338A
Authority: JP
Inventors: 和彦尾関
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 1987-05-15
Filing date: 1987-05-15
Publication date: 1995-01-30
Anticipated expiration: 2010-01-30
Also published as: JPS63282878A

Description

【発明の詳細な説明】［産業上の利用分野］本発明は、日本語連続音声認識装置や、べた書き仮名漢
字変換方式日本語ワードプロセッサなどに用いる言語処
理法に関するもので、表音表記始端および表音表記終端
が様々な位置にある複数の文節候補、すなわち、いわゆ
る文節ラティスが与えられたとき、それらの候補の確実
度と、複数の文節が、ある順序で同時に他の一つの文節
に係る時の整合度を考慮に入れ、日本語の句あるいは文
として最適な文節列が構成されるように文節候補から文
節を選択すると共にその最適な構文を決定し、かつそれ
により得られる最適文節列上の最適構文の日本語の句あ
るいは文としての適格度を計算する言語処理方法に関す
るものである。Description: TECHNICAL FIELD The present invention relates to a language processing method used in a Japanese continuous speech recognition device, a Japanese kana-kanji conversion system Japanese word processor, and the like. When a plurality of bunsetsu candidates whose phonetic transcription ends are at various positions, that is, a so-called bunsetsu lattice is given, the certainty of those candidates and the bunsetsu are related to another bunsetsu at the same time in a certain order. The optimal bunsetsu sequence obtained by selecting the bunsetsu from the bunsetsu candidates and determining its optimal syntax so that the optimal bunsetsu sequence is constructed as a Japanese phrase or sentence, taking into account the degree of consistency of time The present invention relates to a language processing method for calculating eligibility as a Japanese phrase or sentence of the above optimum syntax.

［従来の技術］ベタ書き仮名漢字変換方式日本語ワードプロセッサにお
いて、適切な最終的処理結果を得るためには、与えられ
た仮名文字列が持つ、形態素分割による多義性や同音語
による多義性などを解消しなければならない。[Prior Art] In a Japanese kana-kanji conversion method for a solid writing Japanese word processor, in order to obtain an appropriate final processing result, the polymorphism due to morpheme division and the polysemy due to homophones, which a given kana character string has, are given. Must be resolved.

また、連続音声認識においては、結果がまず表音記号列
で出力されることを想定することができるが、認識の不
確実性を考慮して、通常一つの時間区間に対して複数の
認識候補を挙げることが行われる。従ってこの場合に
は、上記の多義性の解消に加えて、複数の候補の中から
適切なものを選択する処理が必要となる。Also, in continuous speech recognition, it is possible to assume that the result is first output as a phonetic symbol string, but in consideration of the uncertainty of recognition, usually multiple recognition candidates are used for one time interval. Something is done. Therefore, in this case, in addition to eliminating the above-mentioned polysemy, it is necessary to perform processing for selecting an appropriate one from a plurality of candidates.

上のような問題を解決するため、日本語の文、あるいは
句として適格性の高い文節列を定める問題は、文節ラテ
ィスから最適文節列を選択する問題として研究されてい
る。文、あるいは句としての適格性の基準として、二文
節間の係り受けの整合度と、各文節の確実度の和を用い
る場合には、この問題を効率良く解く従来方法が次の文
献に述べられている。In order to solve the above problems, the problem of defining a highly qualified bunsetsu sequence as a Japanese sentence or phrase has been studied as a problem of selecting an optimal bunsetsu sequence from a bunsetsu lattice. When using the sum of the dependency consistency between two clauses and the certainty of each clause as a criterion of eligibility as a sentence or phrase, a conventional method for efficiently solving this problem is described in the following document. Has been.

尾関和彦：「文節ラティスから最適文節列を選択するた
めの多段決定アルゴリズム」、電子通信学会コンピュ
テーション研究会資料、COMP86−48（1986.11.20）［発明が解決しようとする問題点］表音記号列が与えられたとき、それから文節と認定でき
る部分表音記号列をすべて切りだし、そのような各部分
列を表音表記として持つ文節をすべて列挙すると、様々
な表音記号位置を表音表記始端、表音表記終端とする文
節の集合が得られる。このような文節の集合は文節ラテ
ィスと呼ばれている。文節ラティスは、連続音声中から
文節として認められる区間を切り出す方式の連続音声認
識装置の出力としても得られる。文節ラティスという言
葉を用いると、本発明が取り扱う問題は、「文節ラティスが与えられたとき、その中の文節を、終
端と始端が表音表記位置として連続するという条件を満
たすように並べてできるあらゆる文節列を作り、その中
から日本語の文、あるいは句としての適格性と、各文節
の確実度の双方を考慮して、最も妥当な文節列を選択せ
よ」と述べることができる。Kazuhiko Ozeki: “Multistage Decision Algorithm for Selecting Optimal Phrase Sequences from Phrase Lattice”, IEICE Computation Workshop Material, COMP86-48 (1986.11.20) [Problems to be solved by the invention] Phonetic symbols When a sequence is given, cut out all partial phonetic sequences that can be recognized as syllables, and list all the syllables that have each such subsequence as phonetic notation, the various phonetic symbol positions A set of clauses at the beginning and the phonetic transcription end is obtained. Such a set of clauses is called a clause lattice. The phrase lattice is also obtained as an output of a continuous voice recognition device that cuts out a section recognized as a phrase from a continuous voice. Using the term bunsetsu lattice, the problem addressed by the present invention is that "when a bunsetsu lattice is given, any bunsetsu in it can be arranged so as to satisfy the condition that the end and start ends are continuous as phonetic writing positions. Create a bunsetsu sequence, and select the most appropriate bunsetsu sequence, taking into consideration both the eligibility as a Japanese sentence or phrase and the certainty of each bunsetsu. "

従来技術に関する説明の中でも述べたように、日本語の
文あるいは句としての適格性が、二文節間の係り受けの
整合度の和で定義される場合には、この問題を効率良く
解く方法が既に知られている。しかし、このような適格
性の定め方では、一つの述語に係る文節の順序が不自然
な文や、一つの文節に係る文節の格が重複しているよう
な不自然な文でも、そのようなことのない自然な文と同
じ適格性を持つことがあるという問題がある。かかる不
都合を完全に解決するためには、日本語の文あるいは句
としての適格性を、二文節間の係り受けの整合度の和で
はなく、複数の文節が、ある順序で同時に他の一つの文
節に係ることの整合度の基にして定める必要があるが、
適格性をそのように定めると、上記の問題を解く方法は
枚挙法しか知られておらず、計算量が膨大になるという
問題のため、これを実行することはできなかった。As mentioned in the explanation of the prior art, when the eligibility as a Japanese sentence or phrase is defined by the sum of the degree of dependency matching between two clauses, a method for efficiently solving this problem is available. Already known. However, in this way of determining eligibility, even if the order of bunsetsus related to one predicate is unnatural or the case of bunsetsus related to one bunsetsu is unnatural, There is a problem that it may have the same eligibility as a natural sentence that has nothing to do with it. In order to completely solve such inconvenience, the eligibility as a Japanese sentence or phrase is determined not by the sum of the degree of dependency matching between two bunsetsus, but by a plurality of bunsetsus at the same time as another one. Although it is necessary to determine it based on the degree of consistency of the clause,
If the eligibility is set as such, only the enumeration method is known to solve the above-mentioned problem, and this cannot be executed due to the problem that the amount of calculation becomes huge.

そこで、本発明の目的は、日本語の文、あるいは句とし
ての適格度が二文節間の係り受けの整合度ではなく、複
数の文節が、ある順序で同時に他の一つの文節に係るこ
との整合度に基づく場合に、与えられた文節ラティスか
ら、日本語の文あるいは句として最適な文節列を選択す
ると共にその上の構文を決定し、かつそれにより得られ
る文節列の日本語の句あるいは文としての適格度を枚挙
法に比べて格段に効率よく計算することのできる言語処
理方法を提供することにある。Therefore, an object of the present invention is that the eligibility as a Japanese sentence or phrase is not the degree of dependency matching between two bunsetsu, but that a plurality of bunsetsus are related to one other bunsetsu at the same time. When it is based on the degree of matching, it selects the optimal phrase sequence as a Japanese sentence or phrase from the given phrase lattice, determines the syntax above it, and determines the Japanese phrase or phrase of the obtained phrase sequence. It is to provide a language processing method capable of calculating the eligibility of a sentence significantly more efficiently than the enumeration method.

［問題点を解決するための手段］（ａ）文節を単位とした日本語の構造本発明の構成について説明するにあたって、先ず文節を
単位とした日本語の構造について述べる。[Means for Solving Problems] (a) Japanese Structure in Phrase Units In describing the configuration of the present invention, first, a Japanese structure in bunsetsu units will be described.

日本語の文、あるいはまとまった句は、文節という単位
の間の広義の修飾関係によって成り立っていると考える
ことができる。例えば、［S1」「私は７時の電車で会社に行きます」という日本語の文において、「私は」、「７時の」、
「電車で」、「会社に」、「行きます」、はそれぞれ文
節であり、「私は」、「電車で」、「会社に」は、すべ
て「行きます」を修飾し、「７時の」は「電車で」を修
飾することにより一つのまとまった文を構成している。A Japanese sentence, or a set of phrases, can be thought of as being formed by a broadly modified relation between units called bunsetsu. For example, in the Japanese sentence [S1] "I go to the office by train at 7 o'clock", "I am", "7 o'clock",
"By train", "To the company", and "I'm going" are phrases, and "I", "By train", and "To the company" are all modified to "Go", and "Constitutes a single sentence by modifying" by train ".

文節ｘが文節ｙを修飾するとき、ｘはｙに係り、ｙはｘ
を受けるという。また、このような修飾関係を係り受け
という。When clause x modifies clause y, x is related to y and y is x
To receive. In addition, such a modification relationship is called dependency.

文節列が日本語のまとまった句、あるいは文を構成する
ためには、それらの文節間に、次のような条件を満たす
係り受けが存在することが必要であると考えられてい
る。In order for a bunsetsu sequence to form a set of phrases or sentences in Japanese, it is considered that there is a dependency between those bunsetsu that satisfies the following conditions.

［C1］最後の文節以外の文節は、それより文末側にある
文節のいずれか一つに係る。[C1] The clauses other than the last clause relate to any one of the clauses on the end side of the clause.

［C2］二つの文節間の係り受けは、他の二つの文節間の
係り受けと交差しない。[C2] Dependency between two bunsetsu does not intersect with dependency between two other bunsetsu.

条件［C1］，［C2］は、つぎのように定義される、「構
文」によって表わすことができる。The conditions [C1] and [C2] can be expressed by a "syntax" defined as follows.

［D1］（１）ｘが文節のとき、［ｘ］は「構文」であ
る。[D1] (1) When x is a clause, [x] is "syntax".

（２）X₁,X₂,…,Xmが「構文」、ｘが文節のとき、［X₁X₂…XmX］は「構文」である。(2) When X ₁ , X ₂ , ..., Xm are “syntax” and x is a clause, [X ₁ X ₂ ... XmX] is “syntax”.

［D2］文節列x₁x₂…xnに適切に括弧を付け、構文になる
ようにしたものを、x₁x₂,…xn上の構文という。文節列x
₁x₂…xn上の構文の全体をＫ（x₁x₂…xn）と表わすことにする。[D2] The phrase sequence x ₁ x ₂ ... xn with proper parenthesis to obtain the syntax is called the syntax on x ₁ x ₂ , ... xn. Clause sequence x
The overall syntax on ₁ x ₂ ... xn to be expressed as _{_{K (x 1 x 2 ... xn}} ).

構文［X₁X₂…Xmx］,X₁＝［…x₁］,X₂＝［…x₂］，…,Xm
＝［…xm］において、x₁,x₂,…,xmはｘに係ることを表
わすと約束しておくと、上の意味での構文においては、
条件［C1］と［C2］が満たされ、逆に、条件［C1］と
［C2］を満たす文節列における係り受け関係は、必ず上
の意味での構文で表わすことができる。Syntax [X ₁ X ₂ ... Xmx], X ₁ = [... x ₁ ], X ₂ = [... x ₂ ], ..., Xm
Assuming that x = ₁ , ... x ₂ , ..., xm in x = [... xm] is related to x, the syntax in the above sense is
Dependency relations in a clause sequence satisfying the conditions [C1] and [C2] and the conditions [C1] and [C2] can be expressed by the syntax in the above meaning.

さて、一つの文節列に対して、その上の構文は複数個存
在する。例えば、［［私は］［７時の］［電車で］［会社に］行
きます］［［［［［私は］７時の］電車で］会社に］行
きます］［［［私は］［７時の］電車で］［会社に］行
きます］［［［私は］［［７時の］電車で］会社に］行
きます］［［私は］［［７時の］電車で］［会社に］行
きます］などは全て例文［S1］の文節列上の構文である。このよ
うな多くの構文の中から、適格性の高い文節列を選択す
るためには、なんらかの評価関数が必要である。そこ
で、先ず係り受けの整合度が次のように定められるもの
とする。Now, for one clause sequence, there are multiple syntaxes above it. For example, [[I] [7 o'clock] [by train] [company] go] [[[[[I] 7 o'clock] by train] go] [] [[[i] [7 o'clock by train] [go to work] [] [[I] [[7] go by train] go] [[I] [[7] go by train] [Go to company], etc. are all the syntax on the clause sequence of the example sentence [S1]. Some kind of evaluation function is necessary to select a highly qualified phrase sequence from among many such syntaxes. Therefore, it is assumed that the degree of dependency matching is determined as follows.

［C3］文節x₁,x₂,…,xmが、文節ｘにこの順序で係るこ
との整合度は非負の値をとる関数 PEN（x₁,x₂,…,xm,x）で表わされる。[C3] Consistency that clauses x ₁ , x ₂ ,, ..., xm relate to clause x in this order is represented by a non-negative function PEN (x ₁ , x ₂ ,, ..., xm, x) .

PENの値は、０に近いほど整合度が高いと約束してお
く。関数PENをどのように定めるかは、非常に重要な問
題であるが、これは本発明の主眼点ではないので説明を
省く。以上の準備のもとで、構文ｘの適格度Ｐ（Ｘ）
を、次のように再帰的に定める。We promise that the closer the PEN value is to 0, the higher the degree of matching. How to define the function PEN is a very important issue, but this is not the main point of the present invention, and therefore its explanation is omitted. With the above preparations, the eligibility P (X) of the syntax x
Is recursively determined as follows.

［D3］（１）Ｘ＝［ｘ］，（ｘは文節）のとき、Ｐ（Ｘ）＝0, （２）Ｘ＝［X₁X₂…Xmx］,X₁＝［…x₁］,X₂＝［…
x₂］，…,Xm＝［…xm］のとき、Ｐ（Ｘ）＝Ｐ（X₁）＋Ｐ（X₂）＋…＋Ｐ（Xm）＋PEN（x
₁,x₂,…,xm,x）このように定義されたＰ（Ｘ）の値は、Ｘの中のあらゆ
る係り受けに対するPENの値を加算したものになってい
る。[D3] (1) X = [x], (x is clauses) When, P (X) = 0, (2) X = [X 1 X 2 ... Xmx], X 1 = [... x 1], X ₂ = […
When x ₂ ], ..., Xm = [... xm], P (X) = P (X ₁ ) + P (X ₂ ) + ... + P (Xm) + PEN (x
₁ , x ₂ , ..., xm, x) The value of P (X) thus defined is the sum of the PEN values for all the dependencies in X.

後の説明に用いるため、若干を記法を用意しておく。Some notation is prepared for use in later explanation.

［D4］（１）整数列i,i＋1,,…,jのｍ分割とは、ｉ−１＝s₀＜s₁＜s₂＜…＜sm＝ｊを満たす整数の組（s₀,s₁,s₂,…,sm）をいう。[D4] (1) integer column i, i + 1 ,, ..., the m division of _{j, i-1 = s 0} <s 1 <s 2 <... < integer pairs satisfying sm = j (s _0, s ₁ , s ₂ , ..., sm).

（２）整数列i,i＋1,1,…,jのｍ分割の全体をDm（i,j）
と書く： Dm（i,j）＝｛（s₀,s₁,s₂,…sm）|i−１＝s₀＜s₁＜s₂＜
…＜sm＝ｊ｝（３）整数列i,i＋1,,…,jの分割の全体Ｄ（i,j）を、
次のように定義する。(2) Dm (i, j) is the entire m division of the integer sequence i, i + 1,1, ..., j
Write: Dm (i, j) = {(s ₀ , s ₁ , s ₂ , ... sm) | i-1 = s ₀ <s ₁ <s ₂ <
… <Sm = j} (3) The whole division D (i, j) of the integer sequence i, i + 1,
It is defined as follows.

Ｄ（i,j）＝∪（１≦ｍ≦ｊ−ｉ＋１）［Dm（i,j）］［D5］文節集合列A₁,A₂,…,Amに対して KB（A₁A₂…Am）＝｛X|X∈Ｋ（x₁x₂…xm）,x₁∈A₁,x₂∈A₂,…,xm∈Am｝（ｂ）問題の設定以下では長い数式を読みやすくするための代りに Σ（ｉ≦ｍ≦ｊ）［ｆ（ｍ）］という記法を用いる。min,argmin,∪などについても同
様に記す。括弧［、］は混乱のおそれがなければ省略す
ることもある。D (i, j) = ∪ (1 ≦ m ≦ j−i + 1) [Dm (i, j)] [D5] Phrase set sequence A ₁ , A ₂ , ..., Am to KB (A ₁ A ₂ ... Am) = {X | X ∈ K (x ₁ x ₂ … xm), x ₁ ∈ A ₁ , x ₂ ∈ A ₂ ,…, xm ∈ Am} (b) Problem setting In the following, make long mathematical expressions easy to read. For Instead of, the notation Σ (i ≦ m ≦ j) [f (m)] is used. The same applies to min, argmin and ∪. Brackets [,] may be omitted if there is no risk of confusion.

さて、次の状況を考える：［J1］１から自然数Ｎまでの表音記号位置を考え、各i,
j（１≦ｉ≦ｊ≦Ｎ）に対してｉを表音表記始端位置、
ｊを表音表記終端位置とする文節の集合Ｂ（i,j）が与
えられている。また、各文節ｘに対して、非負の実数値
Ｓ（ｘ）が定められている。Now consider the following situation: [J1] Consider the phonetic symbol positions from 1 to a natural number N, for each i,
For j (1 ≦ i ≦ j ≦ N), i is the phonetic notation start position,
A set B (i, j) of clauses having j as a phonetic transcription end position is given. In addition, a non-negative real value S (x) is defined for each clause x.

同じ文節でも表音表記始端位置あるいは表音表記終端位
置が異なれば、別の文節として取り扱う。If the phonetic transcription start position or the phonetic transcription end position is different even in the same phrase, it is treated as a different phrase.

上記のＢ（i,j），（１≦ｉ≦ｊ≦Ｎ）全体を文節ラテ
ィスという。べた書き入力仮名漢字変換方式日本語ワー
ドプロセッサを例にとると、仮名文字列a₁a₂…a_Nが与え
られたとき、Ｂ（i,j）はその部分仮名文字列a₁ai₊₁…a
jを仮名表記として持つ文節の全体である。Ｓ（ｘ）
は、単語の使用頻度などの情報から定まる、文節ｘの確
実度を表わす数値で、０に近いほど確実度が高いとして
おく。また、連続音声認識を例にとれば、Ｂ（i,j）は
表音記号位置i,jをそれぞれ表音表記始端位置、表音表
記終端位置とする区間の認識結果候補として装置が出力
する文節の集合である。この場合、Ｓ（ｘ）は認識装置
が、ｘという認識結果をどの程度の確からしさで認識し
たかという確実度を示す数値であり、たいていの音声認
識装置はそのような数値を認識結果と共に出力するよう
になっている。いずれの場合も、表音記号位置i,jを始
端，終端とする文節が存在しないことがあるので、Ｂ
（i,j）は空集合になりうる。この場合も特別な取扱い
をしなくて済むように、Ｂ（i,j）が空集合のときはダ
ミー文節を加えておき、ダミー文節に対するＳの値は∞
と約束しておく。また、x₁,x₂,…,xm,xの中の少なくと
も一つがダミー文節のとき、PEN（x₁,x₂,…,xm,x）の値
も∞と約束しておく。The entire B (i, j) and (1 ≦ i ≦ j ≦ N) described above is referred to as a phrase lattice. Taking the example of a Japanese word processor for solid input kana-kanji conversion method, given a kana character string a ₁ a ₂ ... a _N , B (i, j) is the partial kana character string a ₁ ai ₊₁ ... a
It is the whole clause that has j as a kana notation. S (x)
Is a numerical value that represents the certainty of the phrase x, which is determined from information such as the frequency of use of words. The closer to zero, the higher the certainty. Taking continuous speech recognition as an example, B (i, j) is output by the device as a recognition result candidate for a section in which the phonetic symbol position i, j is the phonetic transcription starting end position and the phonetic transcription end position, respectively. It is a set of clauses. In this case, S (x) is a numerical value indicating the degree of certainty with which the recognition device has recognized the recognition result x, and most speech recognition devices output such a numerical value together with the recognition result. It is supposed to do. In either case, since there may be no clause starting and ending at the phonetic symbol positions i, j, B
(I, j) can be an empty set. Also in this case, when B (i, j) is an empty set, a dummy clause is added so that no special handling is required, and the value of S for the dummy clause is ∞.
I promise you. Also, when at least one of x ₁ , x ₂ , ..., xm, x is a dummy clause, the value of PEN (x ₁ , x ₂ , ..., xm, x) is also promised to be ∞.

また、Ｓを構文にも適用できるよう拡張しておく。すな
わち、Ｘ∈Ｋ（xixi₊₁…xj）に対してＳ（Ｘ）＝Σ（ｉ≦ｍ≦ｊ）［Ｓ（xm）］と定義する。In addition, S is expanded so that it can be applied to the syntax. That is, S (X) = Σ (i ≦ m ≦ j) [S (xm)] is defined for XεK (xixi ₊₁ ... Xj).

このような状況のもとで、本発明が取り扱う問題は次の
ように述べることができる。表音記号位置1,2,…,Nを固
定し、その分割（s₀,s₁,s₂,…,sm）∈Ｄ（1,N）を一つ選ぶ。この分割に対応して、文節の集合列Ｂ（s₀＋1,s₁,）,B（s₁＋1,s₂），…,B（sm_-1＋1,sm）が定まる。各文節集合Ｂ（sk_-1＋1,sk）から文節xkを一
つずつ選ぶと、その上の構文の全体Ｋ（x₁x₂…xm）が定まる。Ｋ（x₁x₂…xm）の中から構文Ｘを一つ選ぶ
と、その適格度と確実度の和Ｐ（Ｘ）＋Ｓ（Ｘ）が定まる。そこで、上記分割、文節、および構文を可能
な範囲で全て動かし、Ｐ（Ｘ）＋Ｓ（Ｘ）を最小にする
ような分割、文節、および構文を選択する。すなわち、 min（（s₀,s₁,s₂,…,sm）∈Ｄ（1,N））［min（x₁∈Ｂ
（s₀＋1,s₁）,x₂∈Ｂ（s₁＋1,s₂）．…,xm∈Ｂ（sm_-1＋
1,sm））［min（Ｘ∈Ｋ（x₁x₂…xm））［Ｐ（Ｘ）＋Ｓ
（Ｘ）］］］を達成するような各変数の値と、それに対する最小値を
求めるのが、ここでの問題である。Under such circumstances, the problem addressed by the present invention can be stated as follows. The phonetic symbol positions 1, 2, ..., N are fixed, and one of the divisions (s ₀ , s ₁ , s ₂ , ..., Sm) ∈ D (1, N) is selected. Corresponding to this division, the set sequence B (s ₀ +1, s ₁ ,), B (s ₁ +1, s ₂ ), ..., B (sm _-1 +1, sm) is determined. When the clause xk is selected one by one from each clause set B (sk ₋₁ +1, sk), the entire syntax K (x ₁ x ₂ ... xm) is determined. If one syntax X is selected from K (x ₁ x ₂ ... xm), the sum P (X) + S (X) of its eligibility and certainty is determined. Therefore, all the divisions, clauses, and syntaxes are moved within a possible range, and the divisions, clauses, and syntaxes that minimize P (X) + S (X) are selected. That is, min ((s ₀ , s ₁ , s ₂ , ..., sm) ∈ D (1, N)) [min (x ₁ ∈B
(S ₀ + 1, s ₁ ), x ₂ ∈B (s ₁ + 1, s ₂ ). …, Xm ∈ B (sm _-1 +
1, sm)) [min (X∈K (x ₁ x ₂ ... xm)) [P (X) + S
The problem here is to find the value of each variable that achieves (X)]]] and its minimum value.

最適文節列を選ぶためには、文節列の日本語の文あるい
は句としての適格性を考慮しなければならないので、結
局は上記のように最適な構文をも求める問題になる。逆
に最適な構文が求まれば、それを構成する文節列は定ま
るので、 min（（s₀,s₁,s₂,…,sm）∈Ｄ（1,N）［min（x₁∈Ｂ（s
₀＋1,s₁）,x₂∈Ｂ（s₁＋1,s₂），…,xm∈Ｂ（sm_-1＋1,s
m））［min（Ｘ∈Ｋ（x₁x₂…xm））［Ｐ（Ｘ）＋Ｓ
（Ｘ）］］］＝min（Ｘ∈∪（（s₀,s₁,s₂,…,sm）∈Ｄ（1,N））［KB
（Ｂ（s₀,＋1,s₁）,B（s₁＋1,s₂），…,B（sm_-1＋1,s
m））］）［Ｐ（Ｘ）＋Ｓ（Ｘ）］に注意して、上の問題を、次にように構文を変数とする
問題として書き直す。In order to select the optimal bunsetsu sequence, it is necessary to consider the eligibility of the bunsetsu sequence as a Japanese sentence or phrase. Therefore, the problem is to obtain an optimal syntax as described above. On the other hand, if the optimum syntax is found, the clause sequences that compose it are determined, so min ((s ₀ , s ₁ , s ₂ , ..., sm) ∈ D (1, N) [min (x ₁ ∈B (S
₀ + 1, s ₁ ), x ₂ ∈ B (s ₁ + 1, s ₂ ), ..., xm ∈ B (sm ₋₁ + 1, s
m)) [min (X∈K ( x 1 x 2 ... xm)) [P (X) + S
(X)]]] = min (X∈∪ ((s ₀ , s ₁ , s ₂ , ..., sm) ∈D (1, N)) [KB
(B (s ₀ , + 1, s ₁ ), B (s ₁ + 1, s ₂ ), ..., B (sm _-1 + 1, s
m))]) [P (X) + S (X)] and rewrite the above problem as a problem with syntax as follows:

［P1］次の二つのものを求めよ。[P1] Find the following two things.

（１）min（Ｘ∈∪（（s₀,s₁,s₂,…,sm）∈Ｄ（1,N）
［KB（Ｂ（s₀＋1,s₁）,B（s₁＋1,s₂），…,B（sm_-1＋1,
sm））］）［Ｐ（Ｘ）＋Ｓ（Ｘ）］（２）argmin（Ｘ∈∪（（s₀,s₁,s₂,…,sm）∈Ｄ（1,
N）［KB（Ｂ（s₀＋1,s₁）,B（s₁＋1,s₂），…,B（sm_-1
＋1,sm））］）［PX）＋Ｓ（Ｘ）］従来、この問題を解こうとすれば、枚挙法、すなわち、
集合 ∪（（s₀,s₁,s₂,…,sm）∈Ｄ（1,N））［KB（Ｂ（s₀＋
1,s₁）,B（s₁＋1,s₂），…,B（sm_-1＋1,sm））］の全ての元Ｘに対してＰ（Ｘ）＋Ｓ（Ｘ）を逐一計算
し、最小値を与えるＸとそれに対する最小値を求めなけ
ればならなかった。しかし、第１表に示すように、この
集合の元の数は、Ｎと共に急速に増加し、たちまち膨大
なものになるので、枚挙法を実際問題に適用することは
不可能であった。(1) min (X ∈ ∪ ((s ₀ , s ₁ , s ₂ , ..., sm) ∈ D (1, N)
[KB (B (s ₀ + 1, s ₁ ), B (s ₁ + 1, s ₂ ), ..., B (sm _-1 +1,
sm))]) [P (X) + S (X)] (2) argmin (X∈∪ ((s ₀ , s ₁ , s ₂ , ..., sm) ∈ D (1,
N) [KB (B (s ₀ + 1, s ₁ ), B (s ₁ + 1, s ₂ ), ..., B (sm _-1
+ 1, sm))]) [PX) + S (X)] Conventionally, the solution to this problem is enumeration, that is,
Set ∪ ((s ₀ , s ₁ , s ₂ , ..., sm) ∈ D (1, N)) [KB (B (s ₀ +
1, s ₁ ), B (s ₁ + 1, s ₂ ), ..., B (sm ₋₁ + 1, sm))] for every element X of P (X) + S (X), It was necessary to find X that gives the minimum value and the minimum value for it. However, as shown in Table 1, the original number of this set rapidly increased with N and became enormous, and it was impossible to apply the enumeration method to practical problems.

集合∪（（s₀,s₁,…,sm）∈Ｄ（1,N）［KB（Ｂ（s₀＋1,
s₁）,B（s₁＋1,s₂），…,B（sm_-1＋1,sm））］の元の数、ただし、任意のi,j（１≦ｉ≦ｊ≦Ｎ）に対
してＢ（i,j）の元の数をＭとした。 Set ∪ ((s ₀ , s ₁ , ..., sm) ∈ D (1, N) [KB (B (s ₀ +1,
s ₁ ), B (s ₁ + 1, s ₂ ), ..., B (sm ₋₁ + 1, sm))] for any i, j (1 ≦ i ≦ j ≦ N) Let M be the original number of B (i, j).

本発明によれば、このような従来技術の欠点を改善し、
枚挙法と比較して格段に計算量の少ない言語処理方法を
提供することができる。According to the present invention, such drawbacks of the prior art are remedied,
It is possible to provide a language processing method that requires a significantly smaller amount of calculation than the enumeration method.

関数PENが PEN（x₁,x₂,…,xm,x）＝PEN（x₁,x）＋PEN（x₂x）＋…＋PEN（xm,x）と二文節間の係り受けの整合度の和で表わすことができ
る場合については、前記の従来法によってこの問題を効
率良く解くことができる。これに対して、本発明は、関
数PENがこのような和に分解できない場合を対称とする
ものである。The function PEN is PEN (x ₁ , x ₂ ,, ..., xm, x) = PEN (x ₁ , x) + PEN (x ₂ x) + ... + PEN (xm, x) When it can be expressed as a sum, this problem can be solved efficiently by the above-mentioned conventional method. On the other hand, the present invention is symmetric when the function PEN cannot be decomposed into such a sum.

（ｃ）再帰方程式本発明の構成について説明するに当たり、本発明におい
て基本的な役割を果たす再帰方程式について述べる。ま
ず、次の定義を設ける。(C) Recursive Equation In describing the configuration of the present invention, a recursive equation that plays a basic role in the present invention will be described. First, the following definitions are provided.

［D6］（１）Ｂ（i,j）＝∪（ｉ≦ｍ≦ｊ）［Ｂ（m,
j）］（２）ｘ∈Ｂ（i,j）に対して、INIT（ｘ）をｘ∈Ｂ
（m,j）のときINIT（ｘ）＝ｍと定める。[D6] (1) B (i, j) = ∪ (i ≦ m ≦ j) [B (m,
j)] (2) For xεB (i, j), set INIT (x) to xεB
When (m, j), INIT (x) = m.

［D7］自然数Ｎを固定し、１≦ｉ≦ｊ≦N,x∈Ｂ（i,j）
に対して（１）OPT（i,j;x）＝min（Ｘ∈∪（（s₀,s₁,s₂,…,s
p）∈Ｄ（i,INIT（ｘ）−１））［KB（Ｂ（s₀＋1,s₁）,
B（s₁＋1,s₂），…,B（sp_-1＋1,sp），｛ｘ｝）］）
［Ｐ（Ｘ）＋Ｓ（Ｘ）］（２）OPK（i,j;x）＝argmin（Ｘ∈∪（（s₀,s₁,s₂,…,
sp）∈Ｄ（i,INIT（ｘ）−１））［KB（Ｂ（s₀＋1,
s₁）,B（s₁＋1,s₂），…,B（sp_-1＋1,sp），
｛ｘ｝）］）［Ｐ（Ｘ）＋Ｓ（Ｘ）］上記（１），（２）において、INIT（ｘ）＝ｉのときは
Ｄ（i,INIT（ｘ）−１）が定義されていないが、この場
合には ∪（（s₀,s₁,s₂,…,sp）∈Ｄ（i,INIT（ｘ）−１））
［KB（Ｂ（s₀＋1,s₁）,B（s₁＋1,s₂），…,B（sp_-1＋1,
sp），｛ｘ｝）］＝KB（｛ｘ｝）と約束しておく。また、Ｐ（Ｘ）＋Ｓ（Ｘ）を最小にす
るＸは一般に複数個あるので、OPK（i,j;x）は重合とな
る。[D7] With the natural number N fixed, 1 ≦ i ≦ j ≦ N, x ∈ B (i, j)
For (1) OPT (i, j; x) = min (X ∈ ∪ ((s ₀ , s ₁ , s ₂ , ..., s
p) εD (i, INIT (x) −1)) [KB (B (s ₀ + 1, s ₁ ),
B (s ₁ + 1, s ₂ ), ..., B (sp _-1 + 1, sp), {x})])
[P (X) + S (X)] (2) OPK (i, j; x) = argmin (Xε∪ ((s ₀ , s ₁ , s ₂ , ...,
sp) ∈ D (i, INIT (x) -1)) [KB (B (s ₀ +1,
s ₁ ), B (s ₁ + 1, s ₂ ), ..., B (sp ₋₁ + 1, sp),
{X})]) [P (X) + S (X)] In the above (1) and (2), when INIT (x) = i, D (i, INIT (x) -1) is defined. However, in this case ∪ ((s ₀ , s ₁ ,, s ₂ ,, ..., sp) ∈ D (i, INIT (x) -1))
[KB (B (s ₀ + 1, s ₁ ), B (s ₁ + 1, s ₂ ), ..., B (sp _-1 +1,
sp), {x})] = KB ({x}). Also, since there are generally a plurality of Xs that minimize P (X) + S (X), OPK (i, j; x) is a polymerization.

このように定義されたOPTとOPKに関して、次の再帰方程
式が成り立つ。For OPT and OPK defined in this way, the following recurrence equation holds.

［E1］ｘ∈Ｂ（i,j）に対して、（１）INIT（ｘ）＝ｉのとき OPT（i,j;x）＝Ｓ（ｘ）（２）INIT（ｘ）＞ｉのとき OPT（i,j;x）＝min（（s₀,s₁,s₂,…,sm）∈Ｄ（i,INIT
（ｘ）−１）,x₁∈Ｂ（s₀＋1,s₁）,x₂∈Ｂ（s₁＋1,
s₂），…,xm∈Ｂ（sm_-1＋1,sm））［OPT（s₀＋1,s₁;
x₁）＋OPT（s₁＋1,s₂;x）＋…＋OPT（sm_-1＋1,sm;xm）
＋PEN（x₁,x₂,…,xm,x）］＋Ｓ（ｘ）［E2］ｘ∈Ｂ（i,j）に対して、（１）INIT（ｘ）＝ｉのとき OPK（i,j;x）＝［Ｘ］（２）INIT（ｘ）＞ｉのとき、［E1］の（２）で最小値
を与えるs₀,s₁,s₂,…,sm,x₁,x₂,…,xmをそれぞれs₀,s₁,
s₂,…,sm,x₁,x₂,…,xmとするとき OPK（i,j;x）＝［OPK（s₀＋1,s₁;x₁）OPK（s₁＋1,s₂;
x₂）…OPK（sm_-1＋1,sm;xm）ｘ］［E1］と［E2］は次の［E3］が成り立つことに注意すれ
ば容易に証明できるので詳細な説明は省略する。For [E1] xεB (i, j), (1) when INIT (x) = i OPT (i, j; x) = S (x) (2) when INIT (x)> i OPT (i, j; x) = min ((s ₀ , s ₁ , s ₂ , ..., sm) ∈ D (i, INIT
(X) −1), x ₁ ∈B (s ₀ + 1, s ₁ ), x ₂ ∈B (s ₁ +1,
s ₂ ), ..., xm ∈ B (sm ₋₁ + 1, sm)) [OPT (s ₀ + 1, s ₁ ;
x ₁ ) ＋ OPT (s ₁ + 1, s ₂ ; x) ＋… ＋ OPT (sm _-1 + 1, sm; xm)
_{_{+ PEN (x 1, x 2}} , ..., xm, x)] + S (x) [E2] x∈B (i, j) with respect to, (1) INIT (x) = when i OPK (i, j ; x) = [X] (2) When INIT (x)> i, the minimum value is given in (2) of [E1] s ₀ , s ₁ , s ₂ , ..., sm, x ₁ , x ₂ , …, Xm respectively s ₀ , s ₁ ,
When s ₂ , ..., sm, x ₁ , x ₂ , ..., xm OPK (i, j; x) = [OPK (s ₀ + 1, s ₁ ; x ₁ ) OPK (s ₁ + 1, s ₂ ;
x ₂ ) ... OPK (sm ₋₁ + 1, sm; xm) x] [E1] and [E2] can be easily proved by noting that the following [E3] holds, so detailed description will be omitted.

［E3］ ∪（（s₀,s₁,s₂,…,sp）∈Ｄ（i,INIT（ｘ）−１）｝
［KB（Ｂ（s₀＋1,s₁）,B（s₁＋1,s₂），…,B（sp_-1＋1,
sp），｛ｘ｝）］＝∪（（t₀,t₁,t₂,…,tm）∈Ｄ（i,INIT（ｘ）−１）,x
₁∈Ｂ（t₀＋1,t₁）,x₂∈Ｂ（t₁＋1,t₂），…,xm∈Ｂ（t
m_-1＋1,tm））｛［X₁X₂…Xmx］｜ x₁∈∪（（ｓ_1,0,s_1,1,s_1,2，…,s_1,p1）∈Ｄ（t₀＋1,I
NIT（x₁）−１））［KB（Ｂ（ｓ_1,0＋1,s_1,1）,B（ｓ
_1,1＋1,s_1,2），…,B（ｓ_1,p1-1＋1,s_1,p1），
｛x₁｝）］， x₂∈∪（（ｓ_2,0,s_2,1,s_2,2，…,s_2,p2）∈Ｄ（t₁＋1,I
NIT（x₂）−１））［KB（Ｂ（ｓ_2,0＋1,s_2,1）,B（ｓ
_2,1＋1,s_2,2），…,B（ｓ_2,p2-1＋1,s_2,p2），
｛x₂｝）］，…， xm∈∪（（ｓ_m,0,s_m,1,s_m,2，…,s_m,pm）∈Ｄ（tm_-1＋
1,INIT（xm）−１））［KB（Ｂ（ｓ_m,0＋1,s_m,1）,B
（Ｓ_m,1＋1,s_m,2），…,B（ｓ_m,pm-1＋1,s_m,pm），｛x
m｝）］｝さて、構文［X₁X₂…Xmx］,X₁＝［…x₁］,X₂＝［…
x₂］，…,Xm＝［…xm］においては、x₁,x₂,…,xmがｘに
係る。すなわち、ｘはx₁,x₂,…,xmを受ける訳である
が、一つの文節が受ける文節の数には一定の上限がある
と考えてよい。この上限をＬとすると、［E1］において
は、１≦ｍ≦Ｌの範囲で最小化をすればよい。このような制限を設ける
と、［E1］は次のようになる。[E3] ∪ ((s ₀ , s ₁ , s ₂ , ..., sp) ∈ D (i, INIT (x) -1)}
[KB (B (s ₀ + 1, s ₁ ), B (s ₁ + 1, s ₂ ), ..., B (sp _-1 +1,
sp), {x})] = ∪ ((t ₀ , t ₁ , t ₂ , ..., tm) ∈ D (i, INIT (x) -1), x
₁ ∈ B (t ₀ + 1, t ₁ ), x ₂ ∈ B (t ₁ + 1, t ₂ ), ..., xm ∈ B (t
m ₋₁ + 1, tm)) {[X ₁ X ₂ … Xmx] | x ₁ ∈ ∪ ((s _1,0 , s _1,1 , s _1,2 , ..., s _{1, p1} ) ∈ D (t ₀ + 1, I
NIT (x ₁ ) -1)) [KB (B (s _1,0 + 1, s _1,1 ), B (s
_{_{1,1 + 1, s 1,2),}} ..., B (s 1, p1-1 + 1, s 1, p1),
{X ₁ })], x ₂ ∈∪ ((s _2,0 , s _2,1 , s _2,2 , ..., s _{2, p 2} ) D (t ₁ + 1, I
_{NIT (x 2) -1))} [KB (B (s 2,0 + 1, s 2,1), B (s
_{_{2,1 + 1, s 2,2),}} ..., B (s 2, p2-1 + 1, s 2, p2),
_{{X 2})], ...} , xm∈∪ ((s m, 0, s m, 1, s m, 2, ..., s m, pm) ∈D (tm -1 +
1, INIT (xm) -1)) [KB (B (s _{m, 0} + 1, s _{m, 1} ), B
( _{Sm, 1} + 1, sm _{, 2} ), ..., B (sm _{, pm-1} + 1, sm _{, pm} ), {x
m})]} Now, the syntax [X ₁ X ₂ … Xmx], X ₁ = [… x ₁ ], X ₂ ＝ […
In x ₂ ], ..., Xm = [... xm], x ₁ , x ₂ , ..., Xm relate to x. That is, x receives x ₁ , x ₂ , ..., Xm, but it can be considered that there is a certain upper limit to the number of clauses that one clause receives. When this upper limit is L, in [E1], the minimization may be performed within the range of 1 ≦ m ≦ L. With such restrictions, [E1] is as follows.

［E1′］ｘ∈Ｂ（i,j）に対して、（１）INIT（ｘ）＝ｉのとき OPT（i,j;x）＝ｓ（ｘ）（２）INIT（ｘ）＞ｉのとき OPT（i,j;x）＝min（１≦ｍ≦L,（s₀,s₁,s₂…,sm）∈
（i,INIT（ｘ）−１）,x₁∈Ｂ（s₀＋1,s₁）,x₂∈Ｂ（_１
＋1,s₂），…,xm∈Ｂ（sm_-1＋1,sm））［OPT（s₀＋1,
s₁;x₁）＋OPT（s₁＋1,s₂;x₂）＋…＋OPT（sm_-1＋,sm;x
m）＋PEN（x₁,x₂,…,xm,x）］＋Ｓ（ｘ）このような制限を設けても、［E2］には影響はない。Ｎ
≦Ｌとすれば［E1′］は［E1］と同一となる。すなわ
ち、［E1′］は［E1］を特別の場合として含む。従っ
て、以後は［E1′］を用いて説明を進める。For [E1 ′] xεB (i, j), (1) when INIT (x) = i OPT (i, j; x) = s (x) (2) INIT (x)> i When OPT (i, j; x) = min (1 ≦ m ≦ L, (s ₀ , s ₁ , s ₂ …, sm) ∈
(I, INIT (x) −1), x ₁ εB (s ₀ + 1, s ₁ ), x ₂ εB ( ₁
+ 1, s ₂ ), ..., xm ∈ B (sm ₋₁ + 1, sm)) [OPT (s ₀ +1,
s ₁ ; x ₁ ) ＋ OPT (s ₁ + 1, s ₂ ; x ₂ ) ＋… ＋ OPT (sm _-1 ＋, sm; x
_{m) + PEN (x 1,} x 2, ..., xm, x)] + S (x) be provided such limitations, there is no effect on the [E2]. N
If ≤L, [E1 '] is the same as [E1]. That is, [E1 '] includes [E1] as a special case. Therefore, hereinafter, the description will be made using [E1 ′].

［E1′］（２）において、最小値を与えるs₁,s₂,…,sm
をi,j,xに対する最適区分点、またx₁,x₂,…,xmをそれら
の区分点における最適文節と呼ぶことにする。In [E1 ′] (2), s ₁ , s ₂ , ..., Sm giving the minimum value
Are called optimal segment points for i, j, x, and x ₁ , x ₂ , ..., xm are optimal segments at these segment points.

（ｄ）OPT、最適区分点、および最適文節の決定法［E1′］の（２）は、INIT（ｘ）＞ｉのとき、OPT（s,
t,y）（ｉ≦ｓ≦ｔ≦j,y∈Ｂ（s,t））が既に計算され
ていれば、それらの値を用いてOPT（i,j;x）が計算でき
ることを示している。また、INIT（ｘ）＝ｉのときに
は、［E1′］の（１）を用いると、OPT（i,j;x）＝Ｓ
（ｘ）としてOPT（i,j;x）の値が定まる。これらの事実
を用いると、OPT（i,j;x）の値をｊ−ｉが０の部分から
始めて、順次ｊ−ｉがより大きい部分へと計算を進め、
それと同時に最適区分点と最適文節の組を決定して行く
ことができる。OPT（1,N;x）、（ｘ∈Ｂ（1,N））が計
算されたとき、OPT、最適区分点、および最適文節番号
の組の計算が終了する。(D) Method for determining OPT, optimal segmentation point, and optimal clause (2) of [E1 '] is OPT (s, when INIT (x)> i.
t, y) (i ≤ s ≤ t ≤ j, y ∈ B (s, t)) has already been calculated, we show that OPT (i, j; x) can be calculated using those values. There is. When INIT (x) = i, using (1) of [E1 ′], OPT (i, j; x) = S
The value of OPT (i, j; x) is determined as (x). Using these facts, the value of OPT (i, j; x) is calculated starting from the part where j−i is 0 and proceeding to the part where j−i is larger,
At the same time, it is possible to determine the optimum segment point and optimum phrase set. When OPT (1, N; x) and (xεB (1, N)) have been calculated, the calculation of the set of OPT, the optimal segment point, and the optimal clause number ends.

（ｅ）最適構文の計算法簡単のため、最適区分点と最適文節番号の組が常に一意
的に定まる場合について説明する。このとき、OPK（i,
j;x）（１≦ｉ≦ｊ≦N,x∈Ｂ（i,j））はただ一つの構
文に等しい。(E) Optimal Syntax Calculation Method For simplicity, a case will be described in which the set of optimal segment points and optimal clause numbers is always uniquely determined. At this time, OPK (i,
j; x) (1 ≦ i ≦ j ≦ N, xεB (i, j)) is equal to only one syntax.

先ず、 min（Ｘ∈∪（（s₀,s₁,s₂,…,sm）∈Ｄ（1,N））［KB
（Ｂ（s₀＋1,s₁）,B（s₁＋1,s₂），…Ｂ（sm_-1＋1,s
m））］｝［Ｐ（Ｘ）＋Ｓ（Ｘ）］＝min（ｘ∈Ｂ（1,N））［OPT（1,N,x）］であるから、この右辺を計算することにより、最適な文
節列上の最適な構文に対する適格度と信頼度の和が計算
される。また、 x₀＝argmin（ｘ∈Ｂ（1,N））［OPT（1,N,x）］とすれば、最適文節列とその上の最適構文は、 OPK（1,N,x₀）で与えられる。これを更に具体的に計算するには次のよ
うにすればよい。First, min (X ∈ ∪ ((s ₀ , s ₁ , s ₂ , ..., sm) ∈ D (1, N)) [KB
_{(B (s 0 + 1,} s 1), B (s 1 + 1, s 2), ... B (sm -1 + 1, s
m))]} [P (X) + S (X)] = min (xεB (1, N)) [OPT (1, N, x)] Therefore, by calculating this right side, the optimum The sum of the eligibility and the reliability for the optimum syntax on the lexical sequence is calculated. If x ₀ = argmin (x ∈ B (1, N)) [OPT (1, N, x)], the optimal phrase sequence and the optimal syntax on it are OPK (1, N, x ₀ ) Given in. This can be calculated more concretely as follows.

もし、INIT（x₀）＝１ならば、［E2］の（１）によって OPK（1,N,x₀）＝［x₀］であるから、これにより最適構文が決定される。If INIT (x ₀ ) = 1, OPK (1, N, x ₀ ) = [x ₀ ] according to (1) of [E2], and thus the optimum syntax is determined.

INIT（x₀）≠１ならば、1,N,x₀対する最適区分点を s₁,s₂,…,sm 対応する最適文節をそれぞれ x₁,x₂,…,xm とすると、［E2］の（２）によって OPK（1,N,x₀）＝［OPK（1,s₁;x₁）OPK（s₁＋1,s₂;x₂）…OPK（sm_-1＋
1,sm;xm）x₀］が成り立つ。もし、INIT（x₁）＝１ならば、 OPK（1,s₁;x₁）＝［x₁］であるから OPK（1,N,x₀）＝［［x₁］OPK（s₁＋1,s;x₂）…OPK（sm_-1＋1,sm;xm）x
₀］また、INIT（x₁）≠１ならば1,s₁,x₁に対する最適区分
点をt₁,t₂,…,tpに対応する最適文節をそれぞれy₁,y₂,
…,ypとするとき OPK（1,s₁;x₁）＝［［OPK（1,t₁;y₁）OPK（t₁＋1,t₂;y₂）…OPK（tp_-1
＋1,tp;yp）x₁ が成り立つので、OPK（1,N,x₀）は次のように書き直す
ことができる。If INIT (x ₀ ) ≠ 1, the optimal segment points for 1, N, x ₀ are s ₁ ,, s ₂ , ,,, sm, and the corresponding optimal clauses are x ₁ , x ₂ , ,,, xm, respectively, [E2 ] (2) in OPK (1, N, x ₀ ) = [OPK (1, s ₁ ; x ₁ ) OPK (s ₁ + 1, s ₂ ; x ₂ ) ... OPK (sm _-1 +
1, sm; xm) x ₀ ] holds. If INIT (x ₁ ) = 1, OPK (1, s ₁ ; x ₁ ) = [x ₁ ], so OPK (1, N, x ₀ ) = [[x ₁ ] OPK (s ₁ +1 , s; x ₂ ) ... OPK (sm _-1 + 1, sm; xm) x
_0] In addition, INIT (x ₁₎ ≠ ₁ if _{_1,} s 1, t 1 an optimum segment point for x _1, t _2, ..., respectively optimum clause corresponding to tp y _1, y _2,
..., yp OPK (1, s ₁ ; x ₁ ) = [[OPK (1, t ₁ ; y ₁ ) OPK (t ₁ + 1, t ₂ ; y ₂ ) ... OPK (tp _-1
Since + 1, tp; yp) x ₁ holds, OPK (1, N, x ₀ ) can be rewritten as follows.

OPK（1,N,x₀）＝［［OPK（1,t₁;y₁）OPK（t₁＋1,t₂;y₂）…OPK（tp_-1
＋1,tp;yp）ｘ］₁OPK（s₁＋1,s₂;x₂）…OPK（sm_-1＋1,s
m;xm）x₀］このような操作を、現れるOPKがすべて唯一の文節から
構成される構文になるまで続ければ、OPK（1,N,x₀）、
すなわち最適文節列とその上の最適構文を同時に決定す
ることができる。OPK (1, N, x ₀ ) = [[OPK (1, t ₁ ; y ₁ ) OPK (t ₁ + 1, t ₂ ; y ₂ ) ... OPK (tp _-1
+ 1, tp; yp) x] ₁ OPK (s ₁ + 1, s ₂ ; x ₂ ) ... OPK (sm _-1 + 1, s
m; xm) x ₀ ] If we continue this kind of operation until the OPK that appears has a syntax consisting of only one clause, then OPK (1, N, x ₀ ),
That is, the optimum phrase sequence and the optimum syntax on it can be determined at the same time.

一組のi,j,x（１≦ｉ≦ｊ≦N,x∈Ｂ（i,j））に対して
最適区分点と対応する最適文節の組が複数個存在するこ
とがあるが、そのときは、それらの全ての組に対して上
記の操作を行い、得られる構文全てを列挙すればよい。For one set of i, j, x (1 ≤ i ≤ j ≤ N, x ∈ B (i, j)), there may be a plurality of sets of optimal clauses corresponding to optimal segment points. In this case, the above operation should be performed for all the sets, and all the obtained syntaxes should be listed.

本発明は、日本語を対象とする場合のみならず、韓国語
のように日本語と同様の係り受けによって記述できる文
法構造を持つ外国語にも適用できることは言うまでもな
い。It goes without saying that the present invention can be applied not only to the case of Japanese but also to a foreign language having a grammatical structure such as Korean that can be described by the same dependency as Japanese.

［作用］本発明によれば、与えられた文字列位置1,2,…,Nの部分
列i,i＋1,…,jの範囲内での、最後の文節を固定した時
の最適な文節列とその上の最適な構文、およびその適格
度を、長さの短い部分列に対応するものから順次求めて
それを記憶しておき、それらの部分列を含むより長い部
分列に対して同様のものを計算するときにそれらを利用
することによって、同じ計算を繰り返すことなく効率的
に所期の結果を得ることができる。[Operation] According to the present invention, the optimum phrase sequence when the last phrase is fixed within the range of the subsequence i, i + 1, ..., j of the given character string positions 1, 2, ..., N. And its optimal syntax, and its eligibility, are sequentially sought from the ones corresponding to the short subsequences and stored, and the same is applied to the longer subsequences containing those subsequences. By utilizing them when calculating things, the desired result can be efficiently obtained without repeating the same calculation.

［実施例］以下に図面を参照して本発明を詳細に説明する。EXAMPLES The present invention will be described in detail below with reference to the drawings.

以下の説明において、表音記号位置を、1,2,…,Nとす
る。以下では、文節集合Ｂ（i,j）の元の数をNUM（i,
j）とし、Ｂ（i,j）の元をＢ（i,j）＝｛ｘ_i,j,1,x_i,j,2，…,x_i,j,NUM(i,j)｝と列挙して表す。In the following description, phonetic symbol positions are 1, 2, ..., N. In the following, the original number of the phrase set B (i, j) is NUM (i,
j) and the element of B (i, j) is B (i, j) = {x _{i, j, 1} , x _{i, j, 2} , ..., x _{i, j, NUM (i, j)} } Enumerate and represent.

本発明を実施する装置の一実施例を第１図に示す。第１
図において、SCは第２図（Ａ）および（Ｂ）に示すフロ
ーチャートにつき説明するテーブルscoreを実現するた
めのRAMなどによるバッファメモリであり、入力端子i1
から入力される各文節ｘ_i,j,qの確実度Ｓ（ｘ_i,j,q）を
保持するためのものである。BUFは文節入力端子i2から
入力される文節集合を保持するRAMなどによるバッファ
メモリである。各文節はその表音記号列だけでなく、始
端、終端の情報も併せて記述しておく。例として、本発
明を音声認識に用いる時は、認識装置から認識結果とし
て出力される各文節候補を端子i2から入力し、それらの
文節に付随した確実度を端子i1から入力する。また、本
発明をべた書き入力仮名漢字変換方式日本語ワードプロ
セッサに適用する時は、与えられた仮名文字列a₁,a₂,
…,a_Nをまず従来技術で形態素解析し、部分仮名文字列a
i,ai₊₁,…,ajを仮名表記として持つ文節候補を各i,j
（１≦ｉ≦ｊ≦Ｎ）について全て列挙し、それら文節候
補を端子i2から入力する。その際、単語の使用頻度など
から定まる、各文節の確実度を端子i1から入力する。T1
およびT2はそれぞれ第３図（Ａ）および（Ｂ）に示すテ
ーブルtable1およびtable2を実現するためのRAMであ
る。COMBIは表音記号位置1,2,…,Nの中からその分割点０＝ｋ（０）＜ｋ（１）＜ｋ（２）＜…＜ｋ（ｍ）＝Ｎと、それによって定まる文節集合Ｂ（ｋ（０）＋1,k
（１））,B（ｋ（１）＋,k（２）），…,B（ｋ（ｍ−
１）＋1,k（ｍ））の中からそれぞれ文節ｘ
_{k(0)+1,k(1),p(1)},x_{k(1)+1,k(2),p(2)}，…ｘ
_{k(m-1)+1,k(m),p(m)}を選択する組合わせを発生する装置
である。SEL1はバッファメモリBUFから組合わせ発生装
置COMBIにより指定される特定の文節のみを選択する装
置である。PEはバッファメモリBUFから選択装置SEL1を
介して読み出された文節x₁,x₂,…,xmxに対して、PEN（x
₁,x₂,…,xm,x）を計算する装置である。INTは選択装置S
EL1によって読み出された文節ｘの始端を検出する装置
である。SEL2はメモリT1から、組合わせ発生装置COMBI
により指定される特定の情報のみを選択する装置であ
る。SEL3は文節始端検出装置INTから送られた信号に基
づいて、バッファメモリSCに記憶されている情報の中か
ら特定のものを選択する装置である。ADD1はPEN計算装
置PEの出力と、選択装置SEL2によってメモリT1から読み
出された数値とを加算する加算器である。MINは組合わ
せ発生装置COMBIが種々の組合わせを発生するときの加
算器ADD1の出力の最小値とそのときの組合わせを検知す
る最小値検出器である。ADD2は最小値検出器MINの出力
とバッファメモリSCの中の特定の数値とを加算する加算
器である。An embodiment of the apparatus for carrying out the present invention is shown in FIG. First
In the figure, SC is a buffer memory such as a RAM for realizing the table score described in the flowcharts shown in FIGS. 2A and 2B, and the input terminal i1
This is for holding the certainty S (x _{i, j, q} ) of each clause x _{i, j, q} input from. BUF is a buffer memory such as a RAM that holds a phrase set input from the phrase input terminal i2. Each phrase describes not only its phonetic symbol string but also the information of the beginning and end. As an example, when the present invention is used for speech recognition, each phrase candidate output from the recognition device as a recognition result is input from the terminal i2, and the certainty associated with those phrases is input from the terminal i1. Further, when the present invention is applied to a solid writing input kana-kanji conversion method Japanese word processor, a given kana character string a ₁ , a ₂ ,
…, A _N is first morphologically analyzed by the conventional technique, and the partial kana character string a
Each bunsetsu candidate that has i, ai ₊₁ , ..., aj as a kana notation is i, j
All (1≤i≤j≤N) are listed, and the phrase candidates are input from the terminal i2. At that time, the certainty of each phrase, which is determined by the frequency of use of the word, is input from the terminal i1. T1
And T2 are RAMs for realizing the tables table1 and table2 shown in FIGS. 3A and 3B, respectively. COMBI is the division point among phonetic symbol positions 1, 2, ..., N 0 = k (0) <k (1) <k (2) <... <k (m) = N, and the phrase determined by it Set B (k (0) + 1, k
(1)), B (k (1) +, k (2)), ..., B (k (m-
1) + 1, k (m)) each clause x
_{k (0) + 1, k (1), p (1)} , x _{k (1) + 1, k (2), p (2)} , ... x
It is a device that generates a combination that selects _{k (m-1) +1, k (m), p (m)} . SEL1 is a device for selecting only a specific clause specified by the combination generator COMBI from the buffer memory BUF. PE reads PEN (x for the clauses x ₁ , x ₂ , ..., xmx read from the buffer memory BUF via the selection device SEL1.
It is a device that calculates ₁ , x ₂ , ..., xm, x). INT is the selection device S
It is a device that detects the beginning of the phrase x read by EL1. SEL2 is a combination generator COMBI from memory T1
It is a device that selects only specific information specified by. SEL3 is a device for selecting a specific one from the information stored in the buffer memory SC based on the signal sent from the phrase start end detecting device INT. ADD1 is an adder that adds the output of the PEN calculation device PE and the numerical value read from the memory T1 by the selection device SEL2. MIN is a minimum value detector that detects the minimum value of the output of the adder ADD1 when the combination generator COMBI generates various combinations and the combination at that time. ADD2 is an adder that adds the output of the minimum value detector MIN and a specific numerical value in the buffer memory SC.

CONTはこれら各部の動作順序を制御するための制御装置
であって、例えば中央処理装置CPUと各部の制御手順を
予め記憶しておくためのROMの形態のメモリMEM1および
作業用のRAMの形態のメモリMEM2を有する。01および02
はそれぞれメモリT1およびT2に書込まれた計算結果を出
力する出力端子である。CONT is a control device for controlling the operation sequence of each of these parts, for example, in the form of a central processing unit CPU and a memory MEM1 in the form of a ROM for storing the control procedure of each part in advance and a form of a working RAM. It has a memory MEM2. 01 and 02
Are output terminals for outputting the calculation results written in the memories T1 and T2, respectively.

第２図（Ａ）および（Ｂ）は、第１図示の実施例におけ
るメモリMEM1に予め格納しておく制御手順の一例として
の、最適文節列の上の最適構文の適格度、最適文節列、
およびその上の最適構文を定めるための最適区分点と最
適文節番号の組を順次求めるための手順を示すフローチ
ャートである。以下、これについて説明する。2 (A) and 2 (B), as an example of the control procedure stored in advance in the memory MEM1 in the embodiment shown in FIG.
9 is a flowchart showing a procedure for sequentially obtaining a set of optimum segment points and optimum clause numbers for determining the optimum syntax on the table. This will be described below.

第２図（Ａ）および（Ｂ）のフローチャートに付随し
て、第３図（Ａ）および（Ｂ）に示すように、想定して
いる文節列長Ｎに等しい数の行および列、および第ｉ行
第ｊ列において文節集合Ｂ（i,j）の元の数NUM（i,j）
に等しい項を持った２つの３次元のテーブルtable1（i,
j,q）およびtable2（i,j,q）（１≦ｉ≦ｊ≦N,1≦ｑ≦N
UM（i,j））が必要である。各テーブルの添字は左から
順に行、列、項を表す。As shown in FIGS. 3A and 3B accompanying the flowcharts of FIGS. 2A and 2B, the number of rows and columns equal to the assumed bunsetsu column length N, and The original number NUM (i, j) of the phrase set B (i, j) at row i, column j
Two three-dimensional tables with terms equal to table1 (i,
j, q) and table2 (i, j, q) (1 ≤ i ≤ j ≤ N, 1 ≤ q ≤ N
UM (i, j)) is required. The subscripts in each table represent rows, columns, and terms in order from the left.

table1（i,j,q）はOPT（i,j;x_i,i,q）の値を、またtabl
e2（i,j,q）は、i,j,x_i,j,qに対する最適区分点と最適
文節番号の組を記憶するためのものである。文節集合Ｂ
（i,j）の元の数NUM（i,j）は２次元のテーブルnum（i,
j）に入力され保持される。第ｋ文節集合Ｂ（i,j）内の
第ｐ文節ｘ_i,j,pの確実度は３次元のテーブルscore（i,
j,p）に入力され保持される。また、PEN（ｘ
_{k(0)+1,k(1),p(1)},x_{k(1)+1,k(2),p(2)}，…,x
_{k(m-1)+1,km,pm},x_i,j,q）を計算する関数を pen（（ｋ（０）＋1,k（１）,p（１））,k（１）＋1,k
（２）,p（２）），…，（ｋ（ｍ−１）,k（ｍ）,p
（ｍ）），（i,j,q））とする。INIT（ｘ_i,j,q）を計算する関数をinit（i,j,
q）とする。table1 (i, j, q) is the value of OPT (i, j; x _{i, i, q} ) and tabl
e2 (i, j, q) is for storing a set of optimum segment points and optimum clause numbers for i, j, x _{i, j, q} . Bun Set B
The original number NUM (i, j) of (i, j) is a two-dimensional table num (i,
j) is input and held. The certainty of the p-th clause x _{i, j, p} in the k-th clause set B (i, j) is the three-dimensional table score (i,
j, p) and is held. Also, PEN (x
_{k (0) + 1, k (1), p (1)} , x _{k (1) + 1, k (2), p (2)} , ..., x
_The function to calculate _{k (m-1) + 1, km, pm} , x _{i, j, q} ) is pen ((k (0) + 1, k (1), p (1)), k (1) +1 , k
(2), p (2)), ..., (k (m-1), k (m), p
(M)), (i, j, q)). The function that calculates INIT (x _{i, j, q} ) is init (i, j,
q).

第２図（Ａ）および（Ｂ）のフローチャートにおいて、
ステップS1からステップS13において、各テーブルの列
番号ｊを１から始めてＮまで１ずつ増加させ、第ｊ列に
対して次の処理を実行する。In the flow charts of FIGS. 2 (A) and (B),
In steps S1 to S13, the column number j of each table is incremented by 1 starting from 1 and incremented to N, and the following process is executed for the j-th column.

各ステップS2からステップS11において、各テーブルの
行番号ｉをｊから始めて１まで１ずつ減少させ、第ｉ行
に対して次の処理を実行する。In each of steps S2 to S11, the row number i of each table starts from j and is decreased by 1 to 1, and the following processing is executed for the i-th row.

ステップS3からステップS9において、各テーブルの項番
号ｑを１から始めてnum（i,j）まで１ずつ増加させ、第
ｑ項に対して次の処理を実行する。In steps S3 to S9, the term number q of each table is incremented from 1 by 1 up to num (i, j), and the following processing is executed for the q-th term.

（１）ステップS4において、init（i,j,q）≠ｉと判定
されたならば、ステップS5に進み、ここで次の［F1］を
実行し、ついでステップS6において［F2］を実行する。(1) In step S4, if it is determined that init (i, j, q) ≠ i, the process proceeds to step S5, where the next [F1] is executed, and then [F2] is executed in step S6. .

［F1］ table1（i,j,q）：＝min（１≦ｍ≦min｛init（i,j,q）−1,L｝min（ｉ
−１＝ｋ（０）＜ｋ（１）＜Ｋ（２）＜…＜ｋ（ｍ）＝
init（i,j,q）−１）min（１≦ｐ（１）≦num（ｋ
（０）＋1,k（１））,1≦ｐ（２）≦mun（ｋ（１）＋1,
k（２）），…,1≦ｐ（ｍ）≦num（ｋ（ｍ−１）＋1,k
（ｍ））［table1（ｋ（０）＋1,k（１）,p（１））＋t
able1（ｋ（１）＋1,k（２）,p（２））＋…＋table1
（ｋ（ｍ−１）＋1,k（ｍ）,p（ｍ））＋pen（（ｋ
（０）＋1,k（１）,p（１）），（ｋ（１）＋1,k
（２）,p（２）），…，（ｋ（ｍ−１）＋1,k（ｍ）,p
（ｍ）），（i,j,q））］＋score（i,j,q）［F2］ table2（i,j,q）：＝（［F1］において最小値を与える（ｋ（１）,p
（１）），（ｋ（２）,p（２）），…，（ｋ（ｍ）,p
（ｍ））（２）ステップS4において、init（i,j,q）＝ｉと判定
されたならば、ステップS7に進み、ここで次の［F3］を
実行する。[F1] table1 (i, j, q): = min (1≤m≤min {init (i, j, q) -1, L} min (i
−1 = k (0) <k (1) <K (2) <... <k (m) =
init (i, j, q) -1) min (1≤p (1) ≤num (k
(0) + 1, k (1)), 1 ≦ p (2) ≦ mun (k (1) +1,
k (2)), ..., 1 ≦ p (m) ≦ num (k (m−1) + 1, k
(M)) [table1 (k (0) + 1, k (1), p (1)) + t
able1 (k (1) + 1, k (2), p (2)) + ... + table1
(K (m-1) + 1, k (m), p (m)) + pen ((k
(0) + 1, k (1), p (1)), (k (1) + 1, k
(2), p (2)), ..., (k (m-1) + 1, k (m), p
(M)), (i, j, q))] + score (i, j, q) [F2] table2 (i, j, q): = ([F1] gives the minimum value (k (1), p
(1)), (k (2), p (2)), ..., (k (m), p
(M)) (2) If init (i, j, q) = i is determined in step S4, the process proceeds to step S7, where the next [F3] is executed.

［F3］ table1（i,j,q）：＝score（i,j,q）［F2］における区分点と文節の選び方の組合わせの発生
は第１図示の組合わせ発生装置COMBIで行われる。それ
らの組合わせに関する最小値と、最小値を与える組合わ
せの検知は最小値検出器MINで行われる。PEN計算装置PE
においてPENを計算するのに必要な文節の選択は選択装
置SEL1によって行われ、table1（ｋ（０）＋1,k（１）,
p（１））,table1（ｋ（１）＋1,k（２）,p（２）），
…,table1（ｋ（ｍ−１）＋1,k（ｍ）,p（ｍ））の値の
読み出しは選択装置SEL2によって行われる。init（i,j,
q）の計算とそれがｉに等しいか否かの判定は文節始端
検出装置INで行われる。また、テーブルnumはバッファB
UFの一部に記憶される。以上の処理により、table1およ
びtable2の各行、列、項に上述の計算を施し、その結果
を順次table1およびtable2に書込んで行く。[F3] table1 (i, j, q): = score (i, j, q) The combination generating device COMBI shown in FIG. 1 generates the combination of the selection points and the segment points in [F2]. The minimum value detector MIN detects the minimum value of those combinations and the combination giving the minimum value. PEN calculator PE
The selection of the clauses required to calculate PEN at is done by the selector SEL1, table1 (k (0) + 1, k (1),
p (1)), table1 (k (1) + 1, k (2), p (2)),
.., table1 (k (m-1) + 1, k (m), p (m)) is read by the selector SEL2. init (i, j,
The calculation of q) and the determination as to whether it is equal to i or not are performed by the clause start end detecting device IN. Also, table num is buffer B
Memorized in part of UF. Through the above processing, the above calculation is performed on each row, column, and term of table1 and table2, and the results are sequentially written into table1 and table2.

ステップS13においてｊ＞Ｎとなったときに計算が終了
し、table1（i,N,q）にはOPT（1,N,x_1,N,q）、（１≦ｑ
≦NUM（1,N））が記憶されている。また、table2には最
適区分点と最適文節番号の情報が記憶されているので、
（４）（ｅ）で述べた方法により、この情報から最適文
節列と最適構文を構成することができる。When j> N in step S13, the calculation ends, and OPT (1, N, x _{1, N, q} ) and (1 ≦ q for table1 (i, N, q).
≦ NUM (1, N)) is stored. Also, since the information of the optimum segment point and the optimum clause number is stored in table2,
(4) By the method described in (e), the optimum clause sequence and the optimum syntax can be constructed from this information.

本発明を実際に使用するときには、第１図示の装置、お
よび第２図（Ａ）および（Ｂ）に示したフローチャート
の他にtable2の情報から最適な文節列とその上の最適な
構文を構成する機構が必要であるが、本発明の主眼点は
table1およびtable2の内容を計算するところにあり、こ
れらの情報から最適な文節列およびその上の最適な構文
を構成する機構については上記の説明にとどめる。When the present invention is actually used, in addition to the apparatus shown in FIG. 1 and the flow charts shown in FIGS. 2A and 2B, an optimum phrase sequence and an optimum syntax on it are constructed from the information in table2. However, the main point of the present invention is
The contents of table1 and table2 are calculated, and the mechanism of constructing the optimal clause sequence and the optimal syntax on it from this information will be limited to the above description.

但し、table1およびtable2の内容が計算できていれば、
与えられた文節の集合から最適な文節列およびその上の
最適な構文を構成するために必要な計算の内で、最も計
算量の多い部分はもはや完了していることに注意してお
く。However, if the contents of table1 and table2 can be calculated,
Note that the most computationally expensive part of the computation needed to construct the optimal sequence of phrases and the optimal syntax above it from a given set of phrases is now complete.

［F1］において最小値を与える数値の対の組（（ｋ
（１）,p（１）），（（ｋ（２）,p（２）），…，（ｋ
（ｍ）,p（ｍ）））が複数個存在することがあるが、そ
のときには、table2（i,j,q）に複数個の数値の組が記
憶できるようにしておき、［F2］においてそれらを全て
table2（i,j,q）に記憶するようにすればよい。このよ
うに第２図（Ａ）および（Ｂ）のフローチャートを変更
しても計算量には殆ど変わりがない。The pair of numerical values ((k
(1), p (1)), ((k (2), p (2)), ..., (k
(M), p (m))) may exist, in that case, it is possible to store a plurality of sets of numerical values in table2 (i, j, q), and in [F2] All of them
It should be stored in table2 (i, j, q). Thus, even if the flowcharts of FIGS. 2A and 2B are changed, the calculation amount is almost unchanged.

なお、上述した実施例では、最小値を求める処理の場合
を示したが、これらはＳ（ｘ）の値が小さい程文節ｘの
確実度が高く、PENの値が小さいほど係り受けの整合度
が高いとしたためである。もしＳ（ｘ）の値が大きい程
確実度が高く、PENの値が大きい程係り受けの整合度が
高いならば、最小値の代りに最大値を求める処理を行え
ばよい。In the above-described embodiment, the case of the process of obtaining the minimum value is shown. However, the smaller the value of S (x), the higher the certainty of the clause x, and the smaller the value of PEN, the degree of dependency matching. Because it is high. If the larger the value of S (x) is, the higher the certainty is, and the larger the value of PEN is, the higher the matching degree of the dependency is, the process of obtaining the maximum value instead of the minimum value may be performed.

［発明の効果］以上述べたように、本発明によれば、与えられた文字列
位置1,2,…,Nの部分列i,i＋1,…,jの範囲内での、最後
の文節を固定した時の最適な文節列とその上の最適な構
文、およびその適格度を、長さの短い部分列に対応する
ものから順次求めてそれを記憶しておき、それらの部分
列を含むより長い部分列に対して同様のものを計算する
ときにそれらを利用することによって、同じ計算を繰り
返すことなく効率的に所期の結果を得ることができる。[Effects of the Invention] As described above, according to the present invention, the last clause within the range of the subsequence i, i + 1, ..., j of the given character string positions 1, 2 ,. The optimum phrase sequence when fixed, the optimum syntax above it, and its eligibility are obtained sequentially from the ones corresponding to the subsequences with short lengths and stored, and rather than including those subsequences. By utilizing them when calculating similar ones for long subsequences, the desired result can be efficiently obtained without repeating the same calculation.

複数の文節x₁,x₂,…,xmが同時に、ある一つの文節ｘに
係ることの整合度PEN（x₁,x₂,…,xm,x）は、各文節を構
成する単語の属性や、実際の文章の中に現れる係り受け
の頻度などの統計情報などを予め乱書に記述しておき、
それに基づいて計算することができる。その計算量は言
語辞書の構成法などによっても変るが、一つの目安とし
て、次のような場合につき、本発明の計算方法と枚挙法
における加算と比較演算の回数を評価する。なお、ｍ個
の数値の総和を求めるのにはｍ−１回の加算演算が必要
であり、また、ｍ個の数値の最小値を求めるのにはｍ−
１回の比較演算が必要であるとした。Consistency PEN (x ₁ , x ₂ , ..., xm, x) that a plurality of phrases x ₁ , x ₂ , ..., xm are related to a certain phrase x at the same time is attribute of words forming each phrase. Or describe statistical information such as the frequency of dependency appearing in the actual sentence in advance in the irregular letter,
It can be calculated based on it. Although the amount of calculation varies depending on the construction method of the language dictionary and the like, as one guide, the number of times of addition and comparison operations in the calculation method and enumeration method of the present invention are evaluated in the following cases. It should be noted that m-1 number of addition operations are required to obtain the sum of the m numbers, and m− is required to obtain the minimum value of the m numbers.
It is assumed that one comparison operation is necessary.

（１）PEN（x,y）を計算するためには、加算Ｊ回分の計
算量を必要とする。(1) In order to calculate PEN (x, y), the calculation amount for J additions is required.

（２）PEN（x₁,x₂,…,xm,x）を計算するためには、PEN
（x₁,x）＋PEN（x₂,x）＋…＋PEN（xm,x）を計算するの
と同じだけの計算量、すなわち加算（Ｊ＋１）・ｍ−１
回分の計算量を必要とする。(2) To calculate PEN (x ₁ , x ₂ , ..., xm, x),
(X ₁ , x) + PEN (x ₂ , x) + ... + PEN (xm, x) The same amount of calculation, that is, addition (J + 1) · m-1
It requires a lot of calculation.

さらに、文節集合Ｂ（i,j）の元の数は、全てＭに等し
いとする。そうすると、解くべき問題の大きさを定める
パラメータは、全部で次のようになる。Further, it is assumed that the original numbers of the clause set B (i, j) are all equal to M. Then, the parameters that determine the size of the problem to be solved are as follows.

M:各文節集合Ｂ（i,j）の元の数 N:文字列長 L:一つの文節に同時に係り得る文節数の上限 J:二文節間の係り受けの整合度計算量の、加算換算値以上の前提の下で、計算量は次のようになる。M: Original number of each bunsetsu set B (i, j) N: Character string length L: Upper limit of the number of bunsetsus that can be related to one bunsetsu at the same time J: Addition conversion of the degree of dependency matching between two bunsetsus Under the assumption that the value is higher than the value, the calculation amount is

（ａ）本発明関数Ｆ（n,k）をＦ（n,k）＝Σ（m₁＋m₂＋…,mk＝n,1≦mi≦ｎ）［m₁・m
₂…mk］と定義し、これを用いて加算回数と比較回数を表す。(A) The function F (n, k) of the present invention is expressed as F (n, k) = Σ (m ₁ + m ₂ + ..., mk = n, 1 ≦ mi ≦ n) [m ₁ · m
₂ … mk] and use this to express the number of additions and the number of comparisons.

（１）加算関数をｆ（ｎ）と定義すると（２）比較関数ｇ（ｎ）をと定義すると（ｂ）枚挙法 knum（n,L）を長さｎの文節列上の係り受け構造の中
で、一つの文節に同時に係る文節の数がＬ以下のものの
個数とし、_N-1Cnを二項係数とすると、加算回数と比較
回数は次のように表される。(1) Add function to f (n) Is defined as (2) Compare function g (n) Is defined as (B) The enumeration knum (n, L) is defined as the number of bunsetsus that are simultaneously related to one bunsetsu and is L or less in the dependency structure on the bunsetsu string of length n, and _N-1 Cn is two. Assuming a term coefficient, the number of additions and the number of comparisons are expressed as follows.

（１）加算全加算回数＝Σ（０≦ｎ≦Ｎ−１）［_N-1Cn・｛knum
（ｎ＋1,L）・（Ｊ＋１）・ｎ＋ｎ｝・Mn⁺¹］（２）比較全比較回数＝Σ（０≦ｎ≦Ｎ−１）［_N-1Cn・｛knum
（ｎ＋1,L）・Mn⁺¹］−１ knum（n,L）は次の漸化式を用いて計算することができ
る。(1) Addition Total number of additions = Σ (0 ≤ n ≤ N-1) [ _N-1 Cn · {knum
(N + 1, L) · (J + 1) · n + n} · Mn ^{+ 1} ] (2) Comparison Total number of comparisons = Σ (0 ≦ n ≦ N−1) [ _N-1 Cn · {knum
(N + 1, L) * Mn ^{+ 1} ] -1 knum (n, L) can be calculated using the following recurrence formula.

これらの全加算回数および全比較回数をＪ＝１といくつ
かのM,N,Lについて計算した値を第２表および第３表に
掲げる。 Tables 2 and 3 show the values obtained by calculating the total number of additions and the total number of comparisons for J = 1 and some M, N, L.

これらの表によれば本発明の効果は明らかであり、例え
ば、Ｍ＝５、Ｎ＝20、Ｌ＝５のときの計算量は枚挙法の
約10¹³分の１に削減される。 According to these tables the effect of the present invention are apparent, for example, the calculation amount when the M = 5, N = 20, L = 5 is reduced by a factor of about ¹⁰¹³ minutes enumeration methods.

[Brief description of drawings]

第１図は本発明を実施する装置の一実施例を示すブロッ
ク図、第２図（Ａ）および（Ｂ）はその制御手順の一例を示す
フローチャート、第３図（Ａ）および（Ｂ）は第２図のフローチャートを
実行する際に必要となるテーブルの一例を示すテーブル
構造図である。 SC……文節確実度保持用RAM、 BUF……文節集合保持用RAM、 T1……table1用RAM、 T2……table2用RAM、 SEL1……データ選択装置、 SEL2……データ選択装置、 SEL3……データ選択装置、 PE……係り受け整合度計算装置、 COMBI……組合わせ発生装置、 INT……文節始端検出装置、 ADD1……加算器、 ADD2……加算器、 MIN……最小値検出器、 CPU……中央処理装置、 MEM1……制御手順記憶用ROM、 MEM2……CPU作業用RAM、 CONT……各部の動作順序を制御する制御装置、 i1……文節確実度入力端子、 i2……文節入力端子、 o1……メモリT1に得られた結果の出力端子、 o2……メモリT2に得られた結果の出力端子。FIG. 1 is a block diagram showing an embodiment of an apparatus for carrying out the present invention, FIGS. 2 (A) and 2 (B) are flow charts showing an example of the control procedure, and FIGS. 3 (A) and 3 (B) are It is a table structure diagram which shows an example of the table required when performing the flowchart of FIG. SC: RAM for holding clause certainty, BUF: RAM for holding clause sets, T1: RAM for table1, T2: RAM for table2, SEL1 ... data selection device, SEL2 ... data selection device, SEL3 .. Data selection device, PE ... Dependency matching degree calculation device, COMBI ... Combination generator, INT ... Phrase start detection device, ADD1 ... Adder, ADD2 ... Adder, MIN ... Minimum value detector, CPU: Central processing unit, MEM1: ROM for control procedure storage, MEM2: RAM for CPU work, CONT: Control device for controlling the operation sequence of each part, i1 ... Phrase certainty input terminal, i2 ... Phrase Input terminal, o1 ... output terminal of the result obtained in the memory T1, o2 ... output terminal of the result obtained in the memory T2.

Claims

[Claims]

1. A phonetic symbol position determined by a natural number from 1 to N, and a set of clauses whose phonetic notation start position and phonetic notation end position are at various positions within the range of 1 to N, and those. Given a numerical value that represents the certainty of a bunsetsu, minimizes or maximizes the sum of the numerical values that represent the degree of consistency of each bunsetsu with respect to another bunsetsu at the same time and the certainty of each bunsetsu. Under the optimum criterion, the phonetic transcription start position of the first phrase is equal to 1, the phonetic transcription end position of the last phrase is equal to N, and the phonetic transcription end position of a phrase other than the final phrase is 1 From among all the bunsetsu strings that can be arranged so that the value added with is equal to the phonetic transcription start position of the next bunsetsu, the optimum bunsetsu string, the optimum syntax of that bunsetsu string, and its eligibility In the language processing method that determines the degree Two-dimensional upper triangular matrix-shaped first and second tables having the same number of rows and columns as N are prepared, and each grid of the first table and the second table is represented by a phonetic transcription termination. The first table and the second table are three-dimensionalized by dividing into the number of clauses whose position is equal to its column number and whose phonetic notation start position is equal to or more than its line number. For the q-th bunsetsu in the bunsetsu set in which the notation start position is a natural number i or more and the phonetic notation end position is a natural number j, when the phonetic notation start position is equal to i, The certainty of the phrase is stored in the i-th row, the j-th column, and the q-th term, and the phonetic transcription start position is a natural number i or more and the phonetic transcription end position is a natural number j. For the q-th phrase, if its phonetic notation start position is not equal to i, the phonetic Divide into several sections up to the number obtained by subtracting 1 from the phonetic transcription start position of the q-th phrase in the set of phrases whose start position is i or more and the phonetic transcription end position is equal to j. , The start position s, the end position t, and the p-th p in the clause set whose start position is s or more and whose end position is t
Corresponding to the clause, the calculated value is stored in the s-th row, the t-th column and the p-th term of the first table, and if the storage is done, the calculated first value is stored.
The values of the s-th row, the t-th column, and the p-th term of the table are added to all the sections of the above division, and this value is associated with the start position s and the end position t of each section of the division. , The p-th phrase in the phrase set whose phonetic transcription start position is s or more and whose phonetic transcription end position is t has the phonetic transcription start position i or more and the phonetic transcription end position is j. The degree of matching of the q-th phrase in the same phrase set at the same time, and the minimum value or the maximum value for the combination of the above-mentioned divisions and the above-mentioned clauses associated with each section of the division is calculated. , A value obtained by adding to the minimum or maximum value the certainty of the q-th phrase in the phrase set in which the phonetic transcription start position is i or more and the phonetic transcription end position is j. The division that is stored in the i-th row, j-th column, and q-th term of Table 1 and gives the minimum value or the maximum value A segment point and a set of clause numbers associated with each segment of the segment are stored in the i-th row, the j-th column, and the q-th term of the second table, and the first table and the second table are stored according to the above procedure. When the calculated values are stored in the first row and the Nth column of the first table and the second table, the calculated values are sequentially filled in with the calculated values. By obtaining the minimum value or the maximum value in each item in the 1st row and the Nth column, the final eligibility and the clause number of the final clause are obtained, and the optimal segmentation point necessary for constructing the optimal syntax is obtained. And the entire set of optimum phrase numbers is obtained in the second table.