JPS6030992B2

JPS6030992B2 - character encoding device

Info

Publication number: JPS6030992B2
Application number: JP54084597A
Authority: JP
Inventors: 悟富田
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1979-07-02
Filing date: 1979-07-02
Publication date: 1985-07-19
Also published as: JPS569872A

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a character encoding device used, for example, for transmitting images that contain many characters.

In recent years, advances in computer and communication technology have made it relatively easy to handle images, and applications such as image data files and image communication are becoming widespread. How to represent the enormous amount of information in an image efficiently, and how to apply that representation to storage and transmission, is an important problem. As conventional image data compression techniques, various coding schemes that exploit the statistical properties of the pixel signals within an image have been studied, including RL (run-length) coding, predictive coding such as DPCM, and orthogonal transform coding such as the Hadamard transform.

These schemes are general-purpose in the sense that they can be applied regardless of whether the image is a document, a drawing, or a photograph. On the other hand, they are not necessarily the most efficient coding scheme for a particular type of image. In other words, when the type of image is restricted, coding schemes more efficient than these general-purpose ones are conceivable. The present invention was made with this point in mind, and its object is to provide a character encoding device that enables highly efficient encoding of images such as documents containing many characters. An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of a character encoding device according to the present invention. In the figure, 10 denotes a document on which characters are written; 11, a character shape reading unit that reads the shapes of the characters; 12, a first storage unit in which predetermined character shapes are stored in advance; 13, a second storage unit in which no predetermined character shapes are stored in advance; 14, an encoding unit; and 15, a control unit that controls the units 11 to 14.

FIG. 2 is a block diagram showing the configuration of a character decoding device corresponding to the above character encoding device. In the figure, 21 denotes a decoding unit; 22, a first storage unit; 23, a second storage unit; 24, a character shape recording unit; 25, a control unit; and 26, a document copy onto which the characters have been copied. The operation is described next.

For example, a character string written on the document 10, such as 「カナと漢字では、漢字の方がはるかに複雑で、……」 ("Comparing kana and kanji, kanji are far more complex, ..."), is read character by character by the character shape reading unit 11. The control unit 15 then compares and matches each character against the predetermined character shapes stored in the first storage unit 12.

The first storage unit 12 stores in advance the standard shapes, or shape features, of predetermined characters, for example characters with relatively simple shapes such as kana, numerals, and letters of the alphabet. Known character recognition techniques are used to determine whether the read character is identical to one of them.

If the same character is found in the first storage unit 12, that is, if the shape of the character read by the character shape reading unit 11 is stored in the first storage unit 12, the encoding unit 14 encodes the code uniquely assigned to that character as the code for the read character. In the above example, 「カ」 is given the code "01000101" and 「ナ」 the code "01010100". If, on the other hand, the same character is not found in the first storage unit 12, that is, if the shape of the character read by the character shape reading unit 11 is not stored in the first storage unit 12, the shape of the read character is stored in the second storage unit 13, one character after another.

The second storage unit 13 is a storage device that can be both written to and read from. A character newly stored in the second storage unit 13 is given a character code different from those of the characters in the first storage unit 12. This character code may be, for example, a code representing the order in which characters not found in the first storage unit 12 appeared, or the storage address within the second storage unit 13. In the above example again, 「漢」 is given the code "10000000" and 「字」 the code "10000001". The encoding unit 14 encodes this code as the code for the read character. When a character that is not found in the first storage unit 12 but has appeared earlier in the document, and is therefore already stored in the second storage unit 13, appears again, it may either be given a new character code or be encoded with the code assigned to it previously.
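The two-store encoding described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the circuit of the embodiment: text characters stand in for scanned character shapes, a dictionary lookup stands in for character recognition, and the names `FIRST_STORE`, `SECOND_BASE`, `encode`, and `reuse` are hypothetical. Second-store codes are assumed to start at 10000000 and to follow order of first appearance, which matches the codes given for 「漢」 and 「字」.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the first storage unit 12: standard characters
# mapped to fixed 8-bit codes. Only these three codes appear in the
# description; a real device would match scanned shapes, not text.
FIRST_STORE: Dict[str, int] = {
    "カ": 0b01000101,
    "ナ": 0b01010100,
    "と": 0b00010011,
}

# Assumption: second-store codes start at 10000000 and count upward in
# order of first appearance.
SECOND_BASE = 0b10000000


def encode(shapes: List[str], reuse: bool = True) -> Tuple[List[int], List[str]]:
    """Return (code sequence, contents of the second storage unit)."""
    second_store: List[str] = []
    codes: List[int] = []
    for shape in shapes:
        if shape in FIRST_STORE:
            # Standard character: emit its pre-assigned code.
            codes.append(FIRST_STORE[shape])
        elif reuse and shape in second_store:
            # Seen before: reuse the code assigned at first appearance.
            codes.append(SECOND_BASE + second_store.index(shape))
        else:
            # Non-standard character: store the shape and assign a new code.
            second_store.append(shape)
            codes.append(SECOND_BASE + len(second_store) - 1)
    return codes, second_store


codes, second = encode(list("カナと漢字漢字"))
print([format(c, "08b") for c in codes])
print(second)
```

Setting `reuse=False` corresponds to the other option mentioned in the text, in which a repeated non-standard character is stored again and given a new code.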

In the above example, 「漢」 and 「字」 on their second appearance are encoded with the same codes as before, "10000000" and "10000001" respectively. Once every character in the document has been encoded in this way, the character string on the original document can be reproduced, provided the character shapes in the first storage unit 12 and the second storage unit 13 are available.

Now consider transmitting a document with this encoding. The characters in the first storage unit 12 can be provided in advance, as standard characters, on both the transmitting side and the receiving side. The characters in the second storage unit 13, however, cannot be prepared in advance, because the kinds of characters that appear differ from document to document.

The contents of the second storage unit must therefore be conveyed from the transmitting side to the receiving side. At this point the second storage unit 13 holds the non-standard characters that appeared, arranged as 「漢・字・方・複・雑……」, so this arrangement can be treated as an ordinary binary image and encoded by the encoding unit 14 using a method such as RL coding. For transmission, the encoded shape sequence from the second storage unit 13 is appended to the code sequence "01010100", "00010011", "10000000", ... that corresponds to the previously encoded character string. A document that has been encoded in this way, and then stored or transmitted, can have its character string restored by a decoding device such as the one shown in FIG. 2.
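As an illustration of treating a stored character shape as a binary image and run-length coding it, a minimal sketch follows. The 4x4 bitmap, the (value, run length) pair format, and the function names `rl_encode` and `rl_decode` are assumptions made for this example; the description does not specify a bitmap size or a particular RL code.

```python
from typing import List, Tuple


def rl_encode(pixels: List[int]) -> List[Tuple[int, int]]:
    """Run-length encode 0/1 pixels as (value, run length) pairs."""
    runs: List[Tuple[int, int]] = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            # Same value as the current run: extend it.
            runs[-1] = (p, runs[-1][1] + 1)
        else:
            # Value changed: start a new run.
            runs.append((p, 1))
    return runs


def rl_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, run length) pairs back into pixels."""
    pixels: List[int] = []
    for value, length in runs:
        pixels.extend([value] * length)
    return pixels


# Hypothetical 4x4 bitmap standing in for one non-standard character shape,
# scanned row by row.
shape = [
    0, 1, 1, 0,
    1, 0, 0, 1,
    1, 1, 1, 1,
    1, 0, 0, 1,
]
runs = rl_encode(shape)
print(runs)
```

The run-length data for all shapes in the second storage unit would accompany the character code sequence, so that the receiver can rebuild its own second storage unit before decoding the codes.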

The first storage unit 22 stores the standard shapes of the standard characters and is prepared in advance. For image storage applications, the second storage unit 23 can use the contents stored at encoding time as they are; in applications such as image transmission, however, it initially holds nothing. Therefore, the codes of the shape sequence 「漢」「字」「方」「複」「雑」 are received first and decoded by the decoding unit 21 to reproduce the shape sequence, which is stored in the second storage unit 23. Next, the character codes are decoded one after another from the character code sequence "01010100", "00010011", "10000000", the corresponding character shapes are read out of the first storage unit 22 and the second storage unit 23, and the resulting signal is applied to the character shape recording unit 24, which reproduces the character string 「ナ」「と」「漢」 of the original document and yields the document copy 26.

The above description of the embodiment covers only the encoding of character strings on a document. Figures that can be handled in the same way as characters, such as registered trademarks and signatures, can of course also be stored in the first storage unit 12 as standard figures, or stored in the second storage unit 13 as non-standard figures, and encoded.
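The decoding procedure of FIG. 2 can be sketched in the same spirit. As before, this is a hedged illustration: text characters stand in for character shapes, and `FIRST_STORE_BY_CODE`, `SECOND_BASE`, and `decode` are hypothetical names. The rule that codes at or above 10000000 index the second storage unit is an assumption drawn from the example codes, not something the description states as a requirement.

```python
from typing import Dict, List

# Hypothetical stand-in for the first storage unit 22 (prepared in advance),
# keyed by code. Only codes that appear in the description are listed.
FIRST_STORE_BY_CODE: Dict[int, str] = {
    0b01000101: "カ",
    0b01010100: "ナ",
    0b00010011: "と",
}

# Assumption: codes from 10000000 upward index the second storage unit
# in order of first appearance.
SECOND_BASE = 0b10000000


def decode(codes: List[int], second_store: List[str]) -> str:
    """Map each code back to a shape from the first or second store."""
    out: List[str] = []
    for code in codes:
        if code >= SECOND_BASE:
            out.append(second_store[code - SECOND_BASE])
        else:
            out.append(FIRST_STORE_BY_CODE[code])
    return "".join(out)


# The shape sequence is received and stored in the second storage unit 23
# first; the character codes are decoded afterwards.
received_shapes = ["漢", "字", "方", "複", "雑"]
text = decode([0b01010100, 0b00010011, 0b10000000], received_shapes)
print(text)  # ナと漢
```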

As described in detail above, the character encoding device of the present invention, which reads and encodes characters, has two kinds of storage units. When a read character shape is stored in the first storage unit, the code assigned to the corresponding character is used as the code for the read character. When it is not stored in the first storage unit, the shape is stored in the second storage unit and assigned a new code, which is used as the code for the read character. Because every character in the document is thus converted into a code, encoding images such as documents that contain many characters is far more efficient than with ordinary coding schemes such as RL coding, and the device is also versatile in that it can encode images containing figures other than standard characters.
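The efficiency argument can be made concrete with a rough bit count. The sketch below is purely illustrative: the function names and every number in it (document length, number of distinct non-standard shapes, code width, bits per compressed shape) are assumptions chosen for the example and do not come from the description.

```python
def coded_bits(n_chars: int, n_stored_shapes: int, code_bits: int, shape_bits: int) -> int:
    """Bits for the proposed scheme: one code per character, plus one
    compressed shape for each entry of the second storage unit."""
    return n_chars * code_bits + n_stored_shapes * shape_bits


def image_bits(n_chars: int, shape_bits: int) -> int:
    """Bits when every character is sent as a compressed shape."""
    return n_chars * shape_bits


# Illustrative assumptions only: 1000 characters, 100 distinct non-standard
# shapes, 8-bit codes, 256 bits per run-length coded shape.
proposed = coded_bits(1000, 100, 8, 256)
baseline = image_bits(1000, 256)
print(proposed, baseline)
```

Under these assumed figures the saving comes from sending each non-standard shape once and each standard shape not at all; the actual ratio depends on the document and on the image coder used.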

[Brief description of the drawings]

FIG. 1 is a block diagram of a character encoding device according to the present invention, and FIG. 2 is a block diagram of a character decoding device.

11: character shape reading unit; 12: first storage unit; 13: second storage unit; 14: encoding unit.

Claims

[Claims]

1. A character encoding device comprising a character shape reading unit that reads the shapes of characters, a first storage unit in which predetermined character shapes are stored in advance, a second storage unit in which no predetermined character shapes are stored, and an encoding unit, characterized in that the device is configured such that, when the shape of a character read by the character shape reading unit is stored in the first storage unit, the code assigned to the corresponding character stored in the first storage unit is used as the code for the read character, and when the shape of the character read by the character shape reading unit is not stored in the first storage unit, the shape of that character is stored in the second storage unit and assigned a new code, and this code is used as the code for the read character.