JP7235966B2 - File classification device, file classification program and file classification method

Info

Publication number: JP7235966B2
Application number: JP2019090074A
Authority: JP
Inventors: 忠信角田; 孝一矢崎; 和明二村
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2019-05-10
Filing date: 2019-05-10
Publication date: 2023-03-09
Anticipated expiration: 2039-05-10
Also published as: JP2020187429A

Description

The present invention relates to a file classification device, a file classification program, and a file classification method.

In recent years, file existence confirmation tools have been used, for example, as a countermeasure against leakage of information stored in a PC (Personal Computer). When a file that may cause information leakage is stored in the PC, such a tool notifies the user and prompts the user to delete or move the file.

In general, a user's PC contains not only data files created by the user but also a large number of files belonging to the OS (Operating System) and applications. For this reason, a file existence confirmation tool such as the one described above determines whether each file to be checked (hereinafter also referred to as a target file) contains confidential information by, for example, matching the file name or file path of the target file (hereinafter also referred to as the file name or the like) against patterns created in advance (a whitelist or a blacklist). The tool then notifies the user of information about the files determined to possibly contain confidential information. This enables the user to determine, for example, whether each PC is in a state in which it may be taken outside the company.

Specifically, the file existence confirmation tool identifies, for example, Microsoft Office (registered trademark) document files, mail files, and program source code as files that are highly likely to contain confidential information. The tool also identifies, for example, program executable files, temporary files, shortcut files, and configuration files as files that are unlikely to contain confidential information (see, for example, Patent Documents 1 and 2).

Patent Document 1: Japanese Patent Application Laid-Open No. 2008-140102
Patent Document 2: Japanese Patent Application Laid-Open No. 2011-129023

Here, when patterns written as regular expressions, which offer a high degree of descriptive freedom, are used, the file existence confirmation tool takes a long time to match the file name or the like of each target file against the patterns.

Therefore, the file existence confirmation tool matches the file name or the like of each target file against the patterns by, for example, performing prefix matching or suffix matching with patterns that are not written as regular expressions. This allows the tool to quickly separate files that are highly likely to contain confidential information from files that are unlikely to contain it (hereinafter also simply referred to as file classification).
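
As a rough illustration of this trade-off, the following Python sketch contrasts plain prefix and suffix matching with a regular expression that covers the same cases. The pattern lists and function names are hypothetical and are not taken from the patent.

```python
import re

# Hypothetical whitelist patterns (assumptions for illustration only).
PREFIXES = ("c:\\windows\\", "c:\\program files\\")
SUFFIXES = (".exe", ".lnk", ".tmp")

# A single regular expression covering the same cases as the two tuples.
REGEX = re.compile(r"c:\\(windows|program files)\\.*|.*\.(exe|lnk|tmp)")


def matches_plain(path: str) -> bool:
    """Prefix/suffix match using plain string comparison only."""
    p = path.lower()
    return p.startswith(PREFIXES) or p.endswith(SUFFIXES)


def matches_regex(path: str) -> bool:
    """The same decision made with one regular expression."""
    return REGEX.fullmatch(path.lower()) is not None
```

The plain version needs one literal entry per folder or extension, whereas the regular expression can fold many cases into one pattern at the cost of a more expensive match.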

However, patterns that are not written as regular expressions offer little descriptive freedom. When such patterns are used, the number of patterns needed to classify files becomes enormous. Preparing the patterns then takes an enormous amount of time, which can make it difficult to classify files efficiently.

Accordingly, in one aspect, an object of the present invention is to provide a file classification device, a file classification program, and a file classification method that enable files to be classified efficiently.

One aspect of the embodiment includes: a first determination unit that refers to a first storage unit storing a character string and determines whether each of the file names of a plurality of files includes the character string; a second determination unit that, when it is determined that each of the file names of the plurality of files does not include the character string, refers to a second storage unit storing a regular expression and determines whether each of the file names conforms to the regular expression; a common identification unit that, when each of the file names conforms to the regular expression, identifies a common part of the character strings in the file names of the plurality of files; and an information management unit that, when the identified common part conforms to the regular expression, additionally stores a character string corresponding to the identified common part in the first storage unit.

According to one aspect, files can be classified efficiently.

FIG. 1 is a diagram illustrating the configuration of the information processing system 10.
FIG. 2 is a diagram illustrating the hardware configuration of the information processing device 1.
FIG. 3 is a functional block diagram of the information processing device 1.
FIG. 4 is a flowchart outlining the file classification processing in the first embodiment.
FIG. 5 is a flowchart outlining the file classification processing in the first embodiment.
FIG. 6 is a flowchart illustrating details of the file classification processing in the first embodiment.
FIG. 7 is a flowchart illustrating details of the file classification processing in the first embodiment.
FIG. 8 is a flowchart illustrating details of the file classification processing in the first embodiment.
FIG. 9 is a diagram illustrating a specific example of the prefix match information 133.
FIG. 10 is a diagram illustrating a specific example of the regular expression information 132.
FIG. 11 is a diagram illustrating a specific example of the temporary storage information 135.
FIG. 12 is a diagram illustrating the file classification processing in the first embodiment.
FIG. 13 is a diagram illustrating a specific example of the prefix match information 133.
FIG. 14 is a diagram illustrating a specific example of the prefix match information 133.
FIG. 15 is a diagram illustrating a specific example of the temporary storage information 135.
FIG. 16 is a diagram illustrating a specific example of the temporary storage information 135.
FIG. 17 is a flowchart illustrating the file classification processing in the second embodiment.
FIG. 18 is a flowchart illustrating the file classification processing in the second embodiment.
FIG. 19 is a flowchart illustrating the file classification processing in the second embodiment.
FIG. 20 is a flowchart illustrating the file classification processing in the second embodiment.
FIG. 21 is a flowchart illustrating the file classification processing in the second embodiment.
FIG. 22 is a diagram illustrating a specific example of the suffix match information 134.

[Configuration of information processing system]
First, the configuration of the information processing system 10 will be described. FIG. 1 is a diagram illustrating the configuration of the information processing system 10.

As shown in FIG. 1, the information processing system 10 includes, for example, an information processing device 1, which is a PC on which a user performs various tasks; a management device 2 connected to the information processing device 1 via a network NW (for example, the Internet); and a storage device 3 accessed by the management device 2.

When processing for determining whether a target file contains confidential information (hereinafter also referred to as file classification processing) is performed, the information processing device 1, for example, accesses the management device 2, acquires a pattern stored in the storage device 3 (hereinafter also referred to as a first character string), and stores it in a storage area (hereinafter also referred to as a first storage unit). The first character string is, for example, a character string written as a regular expression.

The information processing device 1 then refers to a storage area (hereinafter also referred to as a second storage unit) that stores character strings in which no regular expression is used (hereinafter also referred to as second character strings), and determines whether each of a plurality of target files includes a second character string. A second character string is, for example, a character string that does not use a regular expression.

When it determines, as a result, that each of the plurality of files does not include a second character string, the information processing device 1 refers to the first storage unit and determines whether each of the plurality of target files conforms to the regular expression corresponding to the first character string.

When it determines that each of the plurality of files does not conform to the regular expression, the information processing device 1 determines, for example, that each of the files is highly likely to contain confidential information and notifies the user accordingly.

On the other hand, when it determines that each of the plurality of files conforms to the regular expression, the information processing device 1 identifies the common part of the character strings in the file names of the plurality of target files. When the information processing device 1 determines that the identified common part conforms to the regular expression corresponding to the first character string, it additionally stores the character string corresponding to the identified common part in the second storage unit as at least one of the second character strings.
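
The identification of the common part and the follow-up check can be sketched in Python as shown here. This is only an illustration under stated assumptions: the "common part" is read as the longest character-wise common prefix (the text at this point does not fix the exact definition), and the whitelist expression, function names, and sample paths are hypothetical.

```python
import os.path
import re

# Hypothetical whitelist expression: a path containing a folder or file
# name that begins with a period (for example ".git").
WHITELIST_RE = re.compile(r"(.*[/\\])+\..+")


def common_part(file_names):
    """Longest character-wise common prefix (an assumed definition)."""
    return os.path.commonprefix(list(file_names))


def common_part_is_safe(file_names):
    """True only if the common part itself still fits the expression."""
    part = common_part(file_names)
    return bool(part) and WHITELIST_RE.fullmatch(part) is not None


git_objects = [
    r"c:\documents\test\.git\objects\a1\34567",
    r"c:\documents\test\.git\objects\b2\00001",
]
mixed = [
    r"c:\documents\.git\config",
    r"c:\documents\.svn\entries",
]
print(common_part(git_objects), common_part_is_safe(git_objects))
print(common_part(mixed), common_part_is_safe(mixed))
```

In the second sample both paths fit the expression individually, but their common part stops right after the period and no longer fits. Checking the common part again before storing it keeps an overly broad string such as a bare parent folder from being registered as a plain pattern.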

That is, the information processing device 1 according to the present embodiment determines whether each of the plurality of target files is highly likely to contain confidential information and, at the same time, accumulates second character strings newly identified from the file names or the like of the plurality of files. Here, a second character string is a part common to the file names or the like of a plurality of target files determined to be unlikely to contain confidential information. Therefore, when a new target file whose file name or the like includes a second character string appears, the information processing device 1 can determine that the new target file is unlikely to contain confidential information. Accordingly, the information processing device 1 determines whether each of the plurality of target files includes a second character string before determining whether each of them conforms to the regular expression corresponding to the first character string.

This allows the information processing device 1 to reduce the number of times the file name or the like of a target file is matched against the first character strings written as regular expressions. The information processing device 1 can therefore shorten the time required to classify target files without using patterns that are not written as regular expressions. As a result, the information processing device 1 can classify target files more efficiently while limiting the burden of creating patterns.

[Hardware configuration of information processing system]
Next, the hardware configuration of the information processing system 10 will be described. FIG. 2 is a diagram illustrating the hardware configuration of the information processing device 1.

As shown in FIG. 2, the information processing device 1 has a CPU 101, which is a processor, a memory 102, an external interface (I/O unit) 103, and a storage medium 104. These units are connected to one another via a bus 105.

The storage medium 104 has, for example, a program storage area (not shown) that stores a program 110 for performing the file classification processing. The storage medium 104 also has, for example, a storage unit 130 (hereinafter also referred to as an information storage area 130) that stores information used when the file classification processing is performed. The storage medium 104 may be, for example, an HDD (Hard Disk Drive) or an SSD (Solid State Drive). Each of the first storage unit, the second storage unit, and the third storage unit described above may correspond to, for example, at least part of the storage unit 130.

The CPU 101 executes the program 110 loaded from the storage medium 104 into the memory 102 to perform the file classification processing.

The external interface 103 communicates with the management device 2 via, for example, the network NW.

[Functions of information processing system]
Next, the functions of the information processing system 10 will be described. FIG. 3 is a functional block diagram of the information processing device 1.

As shown in FIG. 3, in the information processing device 1, hardware such as the CPU 101 and the memory 102 cooperates organically with the program 110 to realize various functions including an information reception unit 111, an information management unit 112, a file name extraction unit 113, a first determination unit 114, a second determination unit 115, a common identification unit 116, a third determination unit 117, and an information output unit 118.

As shown in FIG. 3, the information processing device 1 also stores, for example, file information 131, regular expression information 132, prefix match information 133, suffix match information 134, and temporary storage information 135 in the information storage area 130. The first character string described above may correspond to, for example, the regular expression information 132. The second character string described above may correspond to, for example, the prefix match information 133 or the suffix match information 134.

The information reception unit 111 receives, for example, the regular expression information 132 transmitted from the management device 2. The information management unit 112 then stores, for example, the regular expression information 132 received by the information reception unit 111 in the information storage area 130. The regular expression information 132 may instead be stored in the information storage area 130 in advance by a user or the like.

The file name extraction unit 113 acquires the file information 131 from, for example, an OS (not shown) running on the information processing device 1. The file information 131 may include, for example, read information and write information for files. The file name extraction unit 113 then extracts the file name or the like of each file from the file information 131 acquired from the OS.

The first determination unit 114 refers to the prefix match information 133 stored in the information storage area 130 and determines whether each of the file names extracted by the file name extraction unit 113 includes a character string included in the prefix match information 133. The prefix match information 133 is information that includes character strings to be tested for a prefix match against each of the file names extracted by the file name extraction unit 113.

When it is determined that each of the file names extracted by the file name extraction unit 113 does not include a character string included in the prefix match information 133, the second determination unit 115 refers to the regular expression information 132 stored in the information storage area 130 and determines whether each of the extracted file names conforms to a regular expression included in the regular expression information 132.

When each of the file names extracted by the file name extraction unit 113 conforms to a regular expression included in the regular expression information 132, the common identification unit 116 identifies the common part of the character strings in the extracted file names.

The third determination unit 117 determines whether the common part identified by the common identification unit 116 conforms to a regular expression included in the regular expression information 132.

When the common part identified by the common identification unit 116 conforms to a regular expression included in the regular expression information 132, the information management unit 112 stores the character string corresponding to that common part in the information storage area 130 as at least part of the prefix match information 133.

When the second determination unit 115 determines that each of the file names extracted by the file name extraction unit 113 does not conform to a regular expression included in the regular expression information 132, the information output unit 118 notifies the user of information indicating that the files corresponding to the extracted file names are highly likely to contain confidential information. Specifically, in this case, the information output unit 118 outputs that information to an output device (not shown) of the information processing device 1. The suffix match information 134 will be described later.

[Outline of the first embodiment]
Next, an outline of the first embodiment will be described. FIGS. 4 and 5 are flowcharts outlining the file classification processing in the first embodiment.

As shown in FIG. 4, the information processing device 1 waits until the file classification timing arrives (NO in S1). The file classification timing may be, for example, the time at which the user inputs, to the information processing device 1, an instruction to perform the file classification processing.

When the file classification timing arrives (YES in S1), the information processing device 1 refers to the second storage unit storing the second character strings and determines whether each of the file names or the like of the plurality of files includes a second character string (S2).

When it determines, as a result, that each of the file names or the like of the plurality of files does not include a second character string (NO in S3), the information processing device 1 refers to the first storage unit storing the first character string and determines whether each of the file names or the like conforms to the regular expression corresponding to the first character string (S4).

When it determines that each of the file names or the like of the plurality of files conforms to the regular expression corresponding to the first character string (YES in S5), the information processing device 1 identifies, as shown in FIG. 5, the common part of the character strings in the file names or the like of the plurality of files (S11).

Subsequently, the information processing device 1 determines whether the common part identified in S11 conforms to the regular expression corresponding to the first character string (S12).

When it determines, as a result, that the common part identified in S11 conforms to the regular expression corresponding to the first character string (YES in S13), the information processing device 1 additionally stores the character string corresponding to the common part identified in S11 in the second storage unit as a second character string (S14).

On the other hand, when it determines that each of the file names or the like of the plurality of files includes a second character string (YES in S3), or that each of the file names or the like does not conform to the regular expression corresponding to the first character string (NO in S5), the information processing device 1 ends the file classification processing. Likewise, when it determines that the common part identified in S11 does not conform to the regular expression corresponding to the first character string (NO in S13), the information processing device 1 ends the file classification processing.
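
The outline flow from S2 to S14 can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the function name, the return labels, the treatment of the batch as a whole, and the use of a character-wise common prefix as the "common part" are all assumptions made for illustration.

```python
import os.path
import re


def classify_files(file_names, known_strings, patterns):
    """Run one pass of the outline flow and return a short status label.

    known_strings: set of plain strings (the second character strings).
    patterns: compiled regular expressions (the first character strings).
    """
    # S2/S3: does every file name include a stored plain string?
    if all(any(s in name for s in known_strings) for name in file_names):
        return "known"  # YES in S3: end without touching any regex

    def fits(text):
        return any(p.fullmatch(text) for p in patterns)

    # S4/S5: does every file name conform to a regular expression?
    if not all(fits(name) for name in file_names):
        return "not-whitelisted"  # NO in S5: end

    # S11: identify the common part (assumed: character-wise common prefix).
    common = os.path.commonprefix(list(file_names))

    # S12/S13: does the common part itself conform?
    if common and fits(common):
        known_strings.add(common)  # S14: store it as a plain string
        return "learned"
    return "not-learned"  # NO in S13: end
```

After a batch returns "learned", a later file whose name includes the stored string is settled at S3 by a plain substring test, so no regular expression is evaluated for it.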

[Details of the first embodiment]
Next, details of the first embodiment will be described. FIGS. 6 to 8 are flowcharts illustrating details of the file classification processing in the first embodiment. FIGS. 9 to 16 are diagrams illustrating details of the file classification processing in the first embodiment. The following describes the case where each of the regular expression information 132, the prefix match information 133, and the suffix match information 134 is a whitelist.

As shown in FIG. 6, the information reception unit 111 waits until it receives the file information 131 (NO in S21). Specifically, the information reception unit 111 waits until it receives, for example, the file information 131 transmitted from the OS.

When the file information 131 is received (YES in S21), the file name extraction unit 113 extracts the file name or the like of the target file from the file information 131 received in S21 (S22).

Subsequently, the first determination unit 114 determines whether the character string corresponding to the file name or the like extracted in S22 has a prefix match relationship with a character string included in the prefix match information 133 stored in the information storage area 130 (S23). A specific example of the prefix match information 133 is described below.

[Specific example of prefix match information]
FIGS. 9, 13, and 14 are diagrams illustrating specific examples of the prefix match information 133.

The prefix match information 133 shown in FIG. 9 and elsewhere has, as items, "character string", which stores a character string used for prefix match determination, and "time stamp", which stores the epoch seconds at which each entry was generated (updated).

Specifically, in the prefix match information 133 shown in FIG. 9, the first row stores "c:\user\appdata\test\" as the "character string" and "1551128928" as the "time stamp".

In the prefix match information 133 shown in FIG. 9, the second row stores "c:\tmp\" as the "character string" and "1551129475" as the "time stamp".

Therefore, when, for example, the character string corresponding to the file name or the like extracted in S22 is "c:\documents\test\.git\objects\a1\34567", the first determination unit 114 determines that this character string does not have a prefix match relationship with any of the character strings included in the prefix match information 133.
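
The two rows described for FIG. 9 and the S23 check can be reproduced with a short Python sketch. The class name, field names, and the second sample path are hypothetical; only the two stored character strings, their time stamps, and the first sample path come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixEntry:
    text: str       # "character string" item
    timestamp: int  # "time stamp" item, in epoch seconds


# The two rows described for FIG. 9.
PREFIX_INFO = [
    PrefixEntry("c:\\user\\appdata\\test\\", 1551128928),
    PrefixEntry("c:\\tmp\\", 1551129475),
]


def has_prefix_match(path, entries=PREFIX_INFO):
    """S23: True if the path starts with any stored character string."""
    return any(path.startswith(entry.text) for entry in entries)


# Path from the text: no stored string is a prefix of it.
print(has_prefix_match("c:\\documents\\test\\.git\\objects\\a1\\34567"))
# Hypothetical path under c:\tmp\: the second row matches.
print(has_prefix_match("c:\\tmp\\build.log"))
```

Because each check is a plain `str.startswith` comparison, this stage stays cheap even as entries accumulate.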

Returning to FIG. 6, when it is determined that the character string corresponding to the file name or the like extracted in S22 does not have a prefix match relationship with a character string included in the prefix match information 133 stored in the information storage area 130 (NO in S24), the second determination unit 115 determines whether that character string conforms to a regular expression included in the regular expression information 132 stored in the information storage area 130 (S25). A specific example of the regular expression information 132 is described below.

[Specific example of regular expression information]
FIG. 10 is a diagram illustrating a specific example of the regular expression information 132.

The regular expression information 132 shown in FIG. 10 has, as items, "ID", which identifies each entry included in the regular expression information 132, and "character string", which stores a character string written as a regular expression.

具体的に、図１０に示す正規表現情報１３２において、１行目の情報には、「ＩＤ」として「ＲＥＧＥＸＰ１」が記憶されている。また、図１０に示す正規表現情報１３２において、１行目の情報には、「文字列」として、￥または／で区切られたフォルダ以下のピリオドから始まるフォルダ名またはファイル名等を示す「（．＊［／￥￥］）＋￥．．＋」が記憶されている。図１０に含まれる他の情報についての説明は省略する。 Specifically, in the regular expression information 132 shown in FIG. 10, the information on the first line stores "REGEXP1" as the "ID". It also stores, as the "character string", the regular expression "(.*[/\\])+\..+", which denotes a folder name, file name, or the like that begins with a period and is located under folders delimited by \ or /. Description of the other information included in FIG. 10 is omitted.

そのため、例えば、Ｓ２２の処理で抽出したファイル名等に対応する文字列が「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ｔｅｓｔ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥ａ１￥３４５６７」であった場合、第２判定部１１５は、１行目の情報の「文字列」に記憶された正規表現に、Ｓ２２の処理で抽出したファイル名等に対応する文字列が適合すると判定する。 Therefore, for example, if the character string corresponding to the file name or the like extracted in the process of S22 is "c:\documents\test\.git\objects\a1\34567", the second determination unit 115 determines that this character string conforms to the regular expression stored in the "character string" of the information on the first line.
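As a hedged illustration of the S25 check, the following Python sketch applies the first-line regular expression of FIG. 10 to the example path. The use of Python's `re.fullmatch` and the name `matches_regexp1` are assumptions made for illustration; the description does not state which regular-expression engine or matching mode the second determination unit 115 uses.

```python
import re

# Regular expression from the first line of FIG. 10 (ID "REGEXP1"), written
# with ASCII backslashes: one or more folders delimited by "\" or "/",
# followed by a name that begins with a period.
REGEXP1 = re.compile(r"(.*[/\\])+\..+")


def matches_regexp1(path: str) -> bool:
    """Return True when the whole path conforms to REGEXP1 (full match is assumed)."""
    return REGEXP1.fullmatch(path) is not None


print(matches_regexp1(r"c:\documents\test\.git\objects\a1\34567"))  # True
print(matches_regexp1(r"c:\user\appdata\test\abcdef.pdf"))          # False
```

The second path fails because none of its separators is immediately followed by a period.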

図６に戻り、Ｓ２２の処理で抽出したファイル名等に対応する文字列が、情報格納領域１３０に記憶された正規表現情報１３２に含まれる正規表現に適合すると判定した場合（Ｓ２６のＹＥＳ）、共通特定部１１６は、図７に示すように、Ｓ２２の処理で抽出したファイル名等における文字列と、情報格納領域１３０に記憶された一時格納情報１３５に含まれる文字列のそれぞれとが前方一致する関係にあるか否かを判定する（Ｓ３１）。以下、一時格納情報１３５の具体例について説明を行う。 Returning to FIG. 6, when it is determined that the character string corresponding to the file name or the like extracted in the process of S22 conforms to a regular expression included in the regular expression information 132 stored in the information storage area 130 (YES in S26), the common identification unit 116 determines, as shown in FIG. 7, whether the character string of the file name or the like extracted in the process of S22 has a prefix match relationship with each of the character strings included in the temporary storage information 135 stored in the information storage area 130 (S31). A specific example of the temporary storage information 135 is described below.

［一時格納情報の具体例］ [Specific example of temporary storage information]
図１１及び図１５は、一時格納情報１３５の具体例について説明する図である。 FIGS. 11 and 15 are diagrams illustrating specific examples of the temporary storage information 135.

図１１等に示す一時格納情報１３５は、一時格納情報１３５に含まれる各情報と適合する正規表現情報１３２（例えば、図１０で説明した正規表現情報１３２に含まれるいずれかの情報）を識別する「ＩＤ」と、Ｓ２２の処理で抽出したファイル名等の文字列が記憶される「文字列」と、各情報が生成（更新）されたエポック秒が記憶される「タイムスタンプ」とを項目として有する。 The temporary storage information 135 shown in FIG. 11 and elsewhere has the following items: an "ID" identifying the regular expression information 132 to which each piece of information conforms (for example, one of the entries of the regular expression information 132 described with reference to FIG. 10), a "character string" storing the character string of the file name or the like extracted in the process of S22, and a "timestamp" storing the epoch seconds at which each piece of information was generated (updated).

具体的に、図１１に示す一時格納情報１３５において、１行目の情報には、「ＩＤ」として「ＲＥＧＥＸＰ１」が記憶され、「文字列」として「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ｔｅｓｔ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥００￥１２３４５」が記憶され、「タイムスタンプ」として「１５５１１２８８７１」が記憶されている。 Specifically, in the temporary storage information 135 shown in FIG. 11, the information on the first line stores "REGEXP1" as the "ID", "c:\documents\test\.git\objects\00\12345" as the "character string", and "1551128871" as the "timestamp".

また、図１１に示す一時格納情報１３５において、２行目の情報には、「ＩＤ」として「ＲＥＧＥＸＰ２」が記憶され、「文字列」として「ｃ：￥ｕｓｅｒ￥ａｐｐｄａｔａ￥ｔｅｓｔ￥ａｂｃｄｅｆ.ｐｄｆ」が記憶され、「タイムスタンプ」として「１５５１１２８９２８」が記憶されている。図１１に含まれる他の情報についての説明は省略する。 In the temporary storage information 135 shown in FIG. 11, the information on the second line stores "REGEXP2" as "ID" and "c:\user\appdata\test\abcdef.pdf" as "character string". is stored, and "1551128928" is stored as the "timestamp". Description of other information included in FIG. 11 is omitted.

そのため、例えば、Ｓ２２の処理で抽出したファイル名等に対応する文字列が「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ｔｅｓｔ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥ａ１￥３４５６７」であった場合、共通特定部１１６は、Ｓ２２の処理で抽出したファイル名等に対応する文字列と、１行目の情報の「文字列」に記憶された文字列とが前方一致する関係にあると判定する。 Therefore, for example, if the character string corresponding to the file name or the like extracted in the process of S22 is "c:\documents\test\.git\objects\a1\34567", the common identification unit 116 determines that this character string and the character string stored in the "character string" of the information on the first line have a prefix match relationship.

図７に戻り、Ｓ２２の処理で抽出したファイル名等における文字列と、情報格納領域１３０に記憶された一時格納情報１３５に含まれる文字列のそれぞれとが前方一致する関係にあると判定した場合（Ｓ３２のＹＥＳ）、共通特定部１１６は、Ｓ２２の処理で抽出したファイル名等における文字列のうち、Ｓ３１の処理で前方一致する関係にあると判定した文字列を特定する（Ｓ３３）。 Returning to FIG. 7, when it is determined that the character strings in the file name extracted in the process of S22 and the character strings included in the temporary storage information 135 stored in the information storage area 130 have a forward matching relationship. (YES in S32), the common identification unit 116 identifies character strings in the file names and the like extracted in the process of S22 that are determined to have a prefix matching relationship in the process of S31 (S33).

具体的に、図１１で説明した一時格納情報１３５における１行目の情報には、「文字列」として「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ｔｅｓｔ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥００￥１２３４５」が記憶されている。そのため、例えば、Ｓ２２の処理で抽出したファイル名等に対応する文字列が「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ｔｅｓｔ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥ａ１￥３４５６７」である場合、共通特定部１１６は、図１２に示すように、これらの文字列の共通部分である「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ｔｅｓｔ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥」と特定する。 Specifically, the information on the first line of the temporary storage information 135 described with reference to FIG. 11 stores "c:\documents\test\.git\objects\00\12345" as the "character string". Therefore, for example, if the character string corresponding to the file name or the like extracted in the process of S22 is "c:\documents\test\.git\objects\a1\34567", the common identification unit 116 identifies, as shown in FIG. 12, "c:\documents\test\.git\objects\", which is the common part of these character strings.
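The identification of the common part in S33 can be sketched in Python as follows. This is only an illustration under the assumption that the common part is the longest shared leading substring compared character by character; the helper name `common_leading_part` is hypothetical, and the description does not say how the common identification unit 116 computes it.

```python
import os.path


def common_leading_part(a: str, b: str) -> str:
    """Return the longest leading substring shared by both strings."""
    # os.path.commonprefix compares character by character, not per path component.
    return os.path.commonprefix([a, b])


stored = r"c:\documents\test\.git\objects\00\12345"     # first line of FIG. 11
extracted = r"c:\documents\test\.git\objects\a1\34567"  # file name from S22
print(common_leading_part(stored, extracted))
# c:\documents\test\.git\objects\
```

Because the comparison is per character, two names such as `...\a1` and `...\a2` would share a part ending in `\a`. Whether the device truncates at a folder delimiter in such a case is not stated in the description.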

そして、第３判定部１１７は、Ｓ３３の処理で特定した文字列が、Ｓ３１の処理で前方一致する関係にあると判断した一時格納情報１３５に含まれる文字列に対応するＩＤに対応する情報格納領域１３０に記憶された正規表現情報１３２に含まれる正規表現に適合するか否かを判定する（Ｓ３４）。 Then, the third determination unit 117 determines whether the character string identified in the process of S33 conforms to the regular expression that is included in the regular expression information 132 stored in the information storage area 130 and that corresponds to the ID of the character string in the temporary storage information 135 determined in the process of S31 to have the prefix match relationship (S34).

具体的に、例えば、Ｓ３３の処理で特定した文字列が「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ｔｅｓｔ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥」である場合、第３判定部１１７は、Ｓ３３の処理で特定した文字列が、Ｓ３１の処理で前方一致する関係にあると判断した一時格納情報１３５に含まれる文字列「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ｔｅｓｔ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥００￥１２３４５」に対応するＩＤ「ＲＥＧＥＸＰ１」に対応する図１０で説明した正規表現情報１３２における１行目の情報の「文字列」に記憶された正規表現である「（．＊［／￥￥］）＋￥．．＋」に適合すると判定する。 Specifically, for example, if the character string identified in the process of S33 is "c:\documents\test\.git\objects\", the third determination unit 117 determines that it conforms to the regular expression "(.*[/\\])+\..+". This is the regular expression stored in the "character string" of the first line of the regular expression information 132 described with reference to FIG. 10, and it corresponds to the ID "REGEXP1" of the character string "c:\documents\test\.git\objects\00\12345" in the temporary storage information 135 determined in the process of S31 to have the prefix match relationship.

その結果、Ｓ３３の処理で特定した文字列が、情報格納領域１３０に記憶された正規表現情報１３２に含まれる正規表現に適合すると判定した場合（Ｓ３５のＹＥＳ）、情報管理部１１２は、Ｓ３３の処理で特定した文字列を前方一致情報１３３の少なくとも一部として情報格納領域１３０に記憶する（Ｓ３６）。 As a result, when it is determined that the character string identified in the process of S33 conforms to the regular expression included in the regular expression information 132 stored in the information storage area 130 (YES in S35), the information management unit 112 stores the character string identified in the process of S33 in the information storage area 130 as at least part of the prefix match information 133 (S36).

具体的に、例えば、Ｓ３５の処理において正規表現に適合すると判定した文字列が「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ｔｅｓｔ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥」である場合、情報管理部１１２は、図１２に示すように、「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ｔｅｓｔ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥」を前方一致情報１３３として情報格納領域１３０に記憶することを決定する。そして、情報管理部１１２は、例えば、図１３の下線部分に示すように、「文字列」に「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ｔｅｓｔ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥」を記憶した情報（３行目の情報）を、前方一致情報１３３として情報格納領域１３０に追加する。 Specifically, for example, if the character string determined in the process of S35 to conform to the regular expression is "c:\documents\test\.git\objects\", the information management unit 112 decides, as shown in FIG. 12, to store "c:\documents\test\.git\objects\" in the information storage area 130 as the prefix match information 133. Then, as shown in the underlined portion of FIG. 13, for example, the information management unit 112 adds information whose "character string" is "c:\documents\test\.git\objects\" (the information on the third line) to the information storage area 130 as the prefix match information 133.

続いて、情報出力部１１８は、図８に示すように、例えば、Ｓ２２の処理で抽出したファイル名等に対応するファイルが秘密情報を含むファイルでないことを示す情報を生成する（Ｓ４２）。 Subsequently, as shown in FIG. 8, the information output unit 118 generates, for example, information indicating that the file corresponding to the file name extracted in the process of S22 does not contain confidential information (S42).

その後、情報出力部１１８は、Ｓ４２の処理で生成した情報を出力する（Ｓ４４）。具体的に、情報出力部１１８は、例えば、Ｓ４２の処理で生成した情報を情報処理装置１の出力装置（図示しない）に出力する。 After that, the information output unit 118 outputs the information generated in the process of S42 (S44). Specifically, the information output unit 118 outputs the information generated in the process of S42 to an output device (not shown) of the information processing apparatus 1, for example.

また、Ｓ２２の処理で抽出したファイル名等に対応する文字列が、情報格納領域１３０に記憶された前方一致情報１３３に含まれる文字列と前方一致する関係にあると判定した場合も同様に（Ｓ２４のＹＥＳ）、情報出力部１１８は、Ｓ４２以降の処理を行う。 Similarly, when it is determined that the character string corresponding to the file name or the like extracted in the process of S22 has a prefix matching relationship with the character string included in the prefix match information 133 stored in the information storage area 130 ( YES in S24), the information output unit 118 performs the processing from S42 onwards.

なお、情報管理部１１２は、この場合、図１４の下線部分に示すように、情報格納領域１３０に記憶された前方一致情報１３３に含まれるタイムスタンプのうち、Ｓ２２の処理で抽出したファイル名等に対応する文字列と前方一致する関係にあると判定された文字列のタイムスタンプを、現在の日時に更新するものであってよい。 In this case, as shown in the underlined portion of FIG. 14, the information management unit 112 may update to the current date and time the timestamp, among those included in the prefix match information 133 stored in the information storage area 130, of the character string determined to have a prefix match relationship with the character string corresponding to the file name or the like extracted in the process of S22.
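The prefix check of S23 and S24, together with the optional timestamp refresh, might look like the following Python sketch. The dictionary `prefix_info`, its single entry and timestamp value, and the function name `find_prefix_and_refresh` are hypothetical stand-ins for the prefix match information 133, whose concrete storage format is not specified here.

```python
import time

# Hypothetical stand-in for the prefix match information 133:
# character string -> timestamp in epoch seconds (values are illustrative).
prefix_info = {
    "c:\\documents\\test\\.git\\objects\\": 1551128871,
}


def find_prefix_and_refresh(path, info, now=None):
    """Return the stored string that `path` starts with, refreshing its timestamp."""
    now = int(time.time()) if now is None else now
    for prefix in info:
        if path.startswith(prefix):
            info[prefix] = now  # optional update described for YES in S24
            return prefix
    return None  # corresponds to NO in S24
```

A caller would treat a non-`None` result as YES in S24 and skip the regular-expression step entirely.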

一方、Ｓ２２の処理で抽出したファイル名等における文字列と、情報格納領域１３０に記憶された一時格納情報１３５に含まれる文字列のそれぞれとが前方一致する関係にないと判定した場合（Ｓ３２のＮＯ）、情報管理部１１２は、図８に示すように、Ｓ２２の処理で抽出したファイル名等における文字列を一時格納情報１３５として情報格納領域１３０に記憶する（Ｓ４１）。 On the other hand, if it is determined that the character string in the file name extracted in the processing of S22 and the character string included in the temporary storage information 135 stored in the information storage area 130 do not have a forward matching relationship (S32 NO), as shown in FIG. 8, the information management unit 112 stores the character string in the file name or the like extracted in the process of S22 in the information storage area 130 as the temporary storage information 135 (S41).

具体的に、例えば、Ｓ３５の処理において正規表現に適合すると判定した文字列が「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ａｂｃｄ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥２０￥３４５６７」である場合、情報管理部１１２は、例えば、図１５の下線部分に示すように、「文字列」に「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ａｂｃｄ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥２０￥３４５６７」を記憶した情報（４行目の情報）を追加する。 Specifically, for example, if the character string determined in the process of S35 to conform to the regular expression is "c:\documents\abcd\.git\objects\20\34567", the information management unit 112 adds, as shown in the underlined portion of FIG. 15, for example, information whose "character string" is "c:\documents\abcd\.git\objects\20\34567" (the information on the fourth line).

なお、Ｓ３５の処理において正規表現に適合すると判定した文字列が一時格納情報１３５として既に記憶されている場合、情報管理部１１２は、Ｓ３５の処理において正規表現に適合すると判定した文字列に対応するタイムスタンプのみを更新するものであってよい。 If the character string determined to match the regular expression in the process of S35 is already stored as the temporary storage information 135, the information management unit 112 stores the character string determined to match the regular expression in the process of S35. It may be one that updates only the timestamp.

そして、情報出力部１１８は、Ｓ４２以降の処理を行う。また、Ｓ３３の処理で特定した文字列が、情報格納領域１３０に記憶された正規表現情報１３２に含まれる正規表現に適合しないと判定した場合についても同様に（Ｓ３５のＮＯ）、情報管理部１１２等は、Ｓ４１以降の処理を行う。 Then, the information output unit 118 performs the processes after S42. Likewise, when it is determined that the character string specified in the process of S33 does not match the regular expression included in the regular expression information 132 stored in the information storage area 130 (NO in S35), the information management unit 112 etc. perform the processing after S41.

さらに、Ｓ２２の処理で抽出したファイル名等に対応する文字列が、情報格納領域１３０に記憶された正規表現情報１３２に含まれる正規表現に適合しないと判定した場合（Ｓ２６のＮＯ）、情報出力部１１８は、図８に示すように、Ｓ２２の処理で抽出したファイル名等に対応するファイルが秘密情報を含むファイルであることを示す情報を生成する（Ｓ４３）。そして、情報出力部１１８は、Ｓ４３の処理で生成した情報を出力する（Ｓ４４）。 Furthermore, when it is determined that the character string corresponding to the file name or the like extracted in the process of S22 does not conform to any regular expression included in the regular expression information 132 stored in the information storage area 130 (NO in S26), the information output unit 118 generates, as shown in FIG. 8, information indicating that the file corresponding to the file name or the like extracted in the process of S22 is a file containing confidential information (S43). Then, the information output unit 118 outputs the information generated in the process of S43 (S44).

すなわち、本実施の形態における情報処理装置１は、複数の対象ファイルのそれぞれが秘密情報を含む可能性が高いファイルであるか否かを判定するとともに、複数のファイルのファイル名等から新たに特定された前方一致情報１３３の蓄積を行う。ここで、前方一致情報１３３は、秘密情報を含む可能性が低いと判定された複数の対象ファイルのファイル名等の共通部分である。そのため、情報処理装置１は、ファイル名等に前方一致情報１３３が含まれる対象ファイルが新たに発生した場合、その新たに発生した対象ファイルが秘密情報を含む可能性が低いファイルであると判定することが可能である。したがって、情報処理装置１は、複数の対象ファイルのそれぞれが正規表現情報１３２に含まれる正規表現に適合するか否かについての判定を行う前に、複数の対象ファイルのそれぞれが前方一致情報１３３を含むか否かの判定を行う。 That is, the information processing apparatus 1 according to the present embodiment determines whether each of a plurality of target files is likely to contain confidential information, and also accumulates prefix match information 133 newly identified from the file names or the like of the plurality of files. Here, the prefix match information 133 is the common part of the file names or the like of a plurality of target files determined to be unlikely to contain confidential information. Therefore, when a new target file whose file name or the like includes the prefix match information 133 appears, the information processing apparatus 1 can determine that the new target file is unlikely to contain confidential information. Accordingly, the information processing apparatus 1 determines whether each of the plurality of target files includes the prefix match information 133 before determining whether each of them conforms to a regular expression included in the regular expression information 132.

これにより、情報処理装置１は、対象ファイルのファイル名等と正規表現によって記述された正規表現情報１３２とのマッチング回数を抑制することが可能になる。そのため、情報処理装置１は、正規表現によって記述されていないパターンを用いることなく、対象ファイルの分類に要する時間を短縮させることが可能になる。したがって、情報処理装置１は、パターンの作成に要する負担を抑制しつつ、対象ファイルの分類の効率化を行うことが可能になる。 As a result, the information processing apparatus 1 can suppress the number of times of matching between the file name of the target file and the regular expression information 132 described by the regular expression. Therefore, the information processing apparatus 1 can reduce the time required to classify the target file without using patterns not described by regular expressions. Therefore, the information processing apparatus 1 can efficiently classify the target files while suppressing the burden required for pattern creation.
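To make the ordering of the checks concrete, the first embodiment's flow (S23 to S44) can be sketched in Python as below. This is a simplified sketch under several assumptions, not the device's actual implementation: the dictionaries, the function name `classify`, full-match semantics, treating any non-empty shared leading part as a prefix match relationship in S31, and continuing to scan further stored strings after a failed S35 are all illustrative choices.

```python
import os.path
import re
import time

# Hypothetical in-memory stand-in for the regular expression information 132.
REGEXES = {"REGEXP1": re.compile(r"(.*[/\\])+\..+")}


def classify(path, prefixes, temporary, now=None):
    """Return True when `path` is treated as NOT containing confidential information.

    prefixes:  dict, string -> timestamp (prefix match information 133)
    temporary: dict, string -> (regex ID, timestamp) (temporary storage information 135)
    """
    now = int(time.time()) if now is None else now

    # S23/S24: cheap prefix check first, so the regex step is skipped on a hit.
    for prefix in prefixes:
        if path.startswith(prefix):
            prefixes[prefix] = now
            return True

    # S25/S26: fall back to the regular expressions.
    regex_id = next((i for i, rx in REGEXES.items() if rx.fullmatch(path)), None)
    if regex_id is None:
        return False  # S43: reported as containing confidential information

    # S31-S35: find a stored string whose common leading part with `path`
    # still conforms to the regex recorded for that stored string.
    for stored, (stored_id, _) in temporary.items():
        common = os.path.commonprefix([stored, path])
        if common and REGEXES[stored_id].fullmatch(common):
            prefixes[common] = now  # S36
            return True             # S42

    temporary[path] = (regex_id, now)  # S41
    return True                        # S42
```

With empty dictionaries, classifying `c:\documents\test\.git\objects\00\12345` and then `c:\documents\test\.git\objects\a1\34567` leaves `c:\documents\test\.git\objects\` in `prefixes`, so a third file under that folder is accepted by `startswith` alone.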

なお、Ｓ３６の処理において、一時格納情報１３５として既に記憶されている文字列を、Ｓ３３の処理で特定した文字列に置き換えるものであってもよい。 In the process of S36, the character string already stored as the temporary storage information 135 may be replaced with the character string specified in the process of S33.

具体的に、例えば、一時格納情報１３５として既に記憶されている文字列が「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ｔｅｓｔ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥００￥１２３４５」であって、Ｓ３５の処理において正規表現に適合すると判定した文字列が「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ｔｅｓｔ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥」である場合、情報管理部１１２は、図１６の下線部分に示すように、一時格納情報１３５に含まれる文字列である「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ｔｅｓｔ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥００￥１２３４５」を「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ｔｅｓｔ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥」に更新するものであってよい。 Specifically, for example, the character string already stored as the temporary storage information 135 is "c:\documents\test\.git\objects\00\12345", and it is determined in the process of S35 that it matches the regular expression. If the character string obtained is "c:\documents\test\.git\objects\", the information management unit 112 stores the character string " c:\documents\test\.git\objects\00\12345" may be updated to "c:\documents\test\.git\objects\".

これにより、情報管理部１１２は、一時格納情報１３５の記憶に要する記憶領域を削減することが可能になる。 This enables the information management unit 112 to reduce the storage area required for storing the temporary storage information 135 .

また、情報管理部１１２は、Ｓ４１の処理においてだけでなく、Ｓ３３の処理で特定した文字列を前方一致情報１３３として情報格納領域１３０に記憶するタイミング（Ｓ３６の処理が行われるタイミング）においても、一時格納情報１３５の更新を行うものであってよい。 In addition, the information management unit 112 may update the temporary storage information 135 not only in the process of S41 but also at the timing at which the character string identified in the process of S33 is stored in the information storage area 130 as the prefix match information 133 (the timing at which the process of S36 is performed).

この場合、情報管理部１１２は、図１６の下線部分に示すように、対応するタイムスタンプ（１行目の情報のタイムスタンプ）として、Ｓ３６の処理において前方一致情報１３３に記憶された情報のタイムスタンプ（例えば、図１３で説明した前方一致情報１３３における３行目の情報のタイムスタンプ）と同じ日時を記憶する。 In this case, as shown in the underlined portion of FIG. 16, the information management unit 112 stores, as the corresponding timestamp (the timestamp of the information on the first line), the same date and time as the timestamp of the information stored in the prefix match information 133 in the process of S36 (for example, the timestamp of the information on the third line of the prefix match information 133 described with reference to FIG. 13).

これにより、情報管理部１１２は、一時格納情報１３５の記憶に要する記憶領域をより削減することが可能になる。 As a result, the information management unit 112 can further reduce the storage area required for storing the temporary storage information 135 .

さらに、情報管理部１１２は、例えば、前方一致情報１３３及び一時格納情報１３５のそれぞれに含まれる情報のうち、タイムスタンプとして記憶された日時が現在日時よりも所定時間以上前になった情報を随時削除するものであってもよい。 Furthermore, the information management unit 112 may, for example, delete as needed any information, among the information included in each of the prefix match information 133 and the temporary storage information 135, whose timestamp is earlier than the current date and time by a predetermined time or more.
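The timestamp-based deletion can be illustrated with a short Python sketch. The function name `drop_expired`, the dictionary layout, and the 60-second threshold are assumptions for illustration; the description leaves the predetermined time and the storage format open. The two entries reuse the character strings and timestamps listed for FIG. 11.

```python
import time


def drop_expired(entries, max_age_seconds, now=None):
    """Return only the entries whose timestamp is newer than the cutoff.

    entries: dict mapping a character string to its timestamp in epoch seconds.
    An entry is dropped when it is `max_age_seconds` or more older than `now`.
    """
    now = int(time.time()) if now is None else now
    cutoff = now - max_age_seconds
    return {text: stamp for text, stamp in entries.items() if stamp > cutoff}


temporary = {
    "c:\\documents\\test\\.git\\objects\\00\\12345": 1551128871,
    "c:\\user\\appdata\\test\\abcdef.pdf": 1551128928,
}
kept = drop_expired(temporary, max_age_seconds=60, now=1551128931)
print(list(kept))  # only the second entry remains
```

The first entry is exactly 60 seconds old at the chosen `now`, so it falls under "a predetermined time or more" and is removed.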

［第２の実施の形態］ [Second embodiment]
次に、第２の実施の形態について説明する。図１７から図２１は、第２の実施の形態におけるファイル分類処理を説明するフローチャート図である。また、図２２は、第２の実施の形態におけるファイル分類処理を説明する図である。 Next, a second embodiment will be described. FIGS. 17 to 21 are flowcharts illustrating the file classification process in the second embodiment. FIG. 22 is a diagram illustrating the file classification process in the second embodiment.

第２の実施の形態におけるファイル分類処理は、前方一致情報１３３のみでなく、後方一致情報１３４の参照及び更新についても行う。後方一致情報１３４は、ファイル名抽出部１１３が抽出したファイル名等のそれぞれと後方一致する関係にあるか否かの判定を行う文字列を含む情報である。以下、第１の実施の形態におけるファイル分類処理を異なる点についてのみ説明を行う。 The file classification process in the second embodiment is performed not only for the prefix match information 133 but also for reference and update of the suffix match information 134 . The backward matching information 134 is information including a character string for determining whether or not there is a backward matching relationship with each of the file names extracted by the file name extraction unit 113 . Only points that differ from the file classification process in the first embodiment will be described below.

情報受付部１１１は、図１７に示すように、ファイル情報１３１を受け付けるまで待機する（Ｓ５１のＮＯ）。 As shown in FIG. 17, the information reception unit 111 waits until it receives the file information 131 (NO in S51).

そして、ファイル情報１３１を受け付けた場合（Ｓ５１のＹＥＳ）、ファイル名抽出部１１３は、Ｓ５１の処理で受け付けたファイル情報１３１から対象ファイルのファイル名等を抽出する（Ｓ５２）。 When the file information 131 is received (YES in S51), the file name extractor 113 extracts the file name of the target file from the file information 131 received in the process of S51 (S52).

続いて、第１判定部１１４は、Ｓ５２の処理で抽出したファイル名等に対応する文字列が、情報格納領域１３０に記憶された前方一致情報１３３に含まれる文字列と前方一致する関係にあるか否かを判定する（Ｓ５３）。 Subsequently, the first determination unit 114 determines that the character string corresponding to the file name extracted in the process of S52 matches the character string included in the prefix match information 133 stored in the information storage area 130. (S53).

その結果、Ｓ５２の処理で抽出したファイル名等に対応する文字列が、情報格納領域１３０に記憶された前方一致情報１３３に含まれる文字列と前方一致する関係にないと判定した場合（Ｓ５４のＮＯ）、第１判定部１１４は、Ｓ５２の処理で抽出したファイル名等に対応する文字列が、情報格納領域１３０に記憶された後方一致情報１３４に含まれる文字列と後方一致する関係にあるか否かを判定する（Ｓ５５）。以下、後方一致情報１３４の具体例について説明を行う。 As a result, when it is determined that the character string corresponding to the file name or the like extracted in the process of S52 does not have a prefix match relationship with any character string included in the prefix match information 133 stored in the information storage area 130 (NO in S54), the first determination unit 114 determines whether that character string has a backward match relationship with a character string included in the backward match information 134 stored in the information storage area 130 (S55). A specific example of the backward match information 134 is described below.

［後方一致情報の具体例］ [Specific example of backward match information]
図２２は、後方一致情報１３４の具体例について説明する図である。 FIG. 22 is a diagram illustrating a specific example of the backward match information 134.

図２２に示す後方一致情報１３４は、後方一致の判定に用いられる文字列が記憶される「文字列」と、各情報が生成（更新）されたエポック秒が記憶される「タイムスタンプ」とを項目として有する。 The suffix information 134 shown in FIG. 22 consists of a "character string" that stores a character string used for suffix determination, and a "time stamp" that stores the epoch seconds when each piece of information was generated (updated). have as an item.

具体的に、図２２に示す後方一致情報１３４において、１行目の情報には、「文字列」として「．ｅｘｅ」が記憶され、「タイムスタンプ」として「１５５１１２８９４２」が記憶されている。 Specifically, in the backward match information 134 shown in FIG. 22, the information on the first line stores ".exe" as the "character string" and "1551128942" as the "time stamp".

また、図２２に示す後方一致情報１３４において、２行目の情報には、「文字列」として「．ｄｌｌ」が記憶され、「タイムスタンプ」として「１５５１１２９６２１」が記憶されている。 In the backward match information 134 shown in FIG. 22, the information on the second line stores ".dll" as the "character string" and "1551129621" as the "time stamp."

そのため、例えば、Ｓ５２の処理で抽出したファイル名等に対応する文字列が「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ｔｅｓｔ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥ａ１￥９８７６５．ｅｘｅ」であった場合、第１判定部１１４は、Ｓ５２の処理で抽出したファイル名等に対応する文字列と、１行目の情報の「文字列」に記憶された文字列とが後方一致する関係にあると判定する。 Therefore, for example, when the character string corresponding to the file name extracted in the process of S52 is "c:\documents\test\.git\objects\a1\98765.exe", the first determination unit 114 It is determined that the character string corresponding to the file name or the like extracted in the process of S52 and the character string stored in the "character string" of the information on the first line have a backward matching relationship.
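The backward match check of S55 amounts to a suffix test, sketched below in Python. The dictionary `backward_info` mirrors the two lines listed for FIG. 22; the function name `find_backward_match` and the use of `str.endswith` are assumptions for illustration.

```python
# Backward match information 134 as listed for FIG. 22:
# character string -> timestamp in epoch seconds.
backward_info = {
    ".exe": 1551128942,
    ".dll": 1551129621,
}


def find_backward_match(path, info):
    """Return the first stored string that `path` ends with, or None."""
    for suffix in info:
        if path.endswith(suffix):
            return suffix
    return None


print(find_backward_match(r"c:\documents\test\.git\objects\a1\98765.exe", backward_info))
# .exe
```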

図１７に戻り、Ｓ５２の処理で抽出したファイル名等に対応する文字列が、情報格納領域１３０に記憶された後方一致情報１３４に含まれる文字列と後方一致する関係にないと判定した場合（Ｓ５６のＮＯ）、第２判定部１１５は、図１８に示すように、Ｓ５２の処理で抽出したファイル名等に対応する文字列が、情報格納領域１３０に記憶された正規表現情報１３２に含まれる正規表現に適合するか否かを判定する（Ｓ６１）。 Returning to FIG. 17, when it is determined that the character string corresponding to the file name extracted in the process of S52 does not have a backward matching relationship with the character string included in the backward matching information 134 stored in the information storage area 130 ( NO in S56), the second determination unit 115 determines that the regular expression information 132 stored in the information storage area 130 includes a character string corresponding to the file name extracted in the process of S52, as shown in FIG. It is determined whether or not it matches the regular expression (S61).

その結果、Ｓ５２の処理で抽出したファイル名等に対応する文字列が、情報格納領域１３０に記憶された正規表現情報１３２に含まれる正規表現に適合すると判定した場合（Ｓ６２のＹＥＳ）、共通特定部１１６は、Ｓ５２の処理で抽出したファイル名等における文字列と、情報格納領域１３０に記憶された一時格納情報１３５に含まれる文字列のそれぞれとが前方一致する関係にあるか否かを判定する（Ｓ６３）。 As a result, when it is determined that the character string corresponding to the file name or the like extracted in the process of S52 conforms to a regular expression included in the regular expression information 132 stored in the information storage area 130 (YES in S62), the common identification unit 116 determines whether the character string of the file name or the like extracted in the process of S52 has a prefix match relationship with each of the character strings included in the temporary storage information 135 stored in the information storage area 130 (S63).

そして、Ｓ５２の処理で抽出したファイル名等における文字列と、情報格納領域１３０に記憶された一時格納情報１３５に含まれる文字列のそれぞれとが前方一致する関係にあると判定した場合（Ｓ６４のＹＥＳ）、共通特定部１１６は、Ｓ５２の処理で抽出したファイル名等における文字列のうち、Ｓ６３の処理で前方一致する関係にあると判定した文字列を特定する（Ｓ６５）。 Then, when it is determined that the character string of the file name or the like extracted in the process of S52 has a prefix match relationship with a character string included in the temporary storage information 135 stored in the information storage area 130 (YES in S64), the common identification unit 116 identifies, within the character string of the file name or the like extracted in the process of S52, the part determined in the process of S63 to have the prefix match relationship (S65).

続いて、第３判定部１１７は、図１９に示すように、Ｓ６５の処理で特定した文字列が、情報格納領域１３０に記憶された正規表現情報１３２に含まれる正規表現に適合するか否かを判定する（Ｓ７１）。 Subsequently, as shown in FIG. 19, the third determination unit 117 determines whether the character string specified in the process of S65 matches the regular expression included in the regular expression information 132 stored in the information storage area 130. is determined (S71).

その結果、Ｓ６５の処理で特定した文字列が、情報格納領域１３０に記憶された正規表現情報１３２に含まれる正規表現に適合すると判定した場合（Ｓ７２のＹＥＳ）、情報管理部１１２は、Ｓ６５の処理で特定した文字列を前方一致情報１３３の少なくとも一部として情報格納領域１３０に記憶する（Ｓ７３）。 As a result, when it is determined that the character string identified in the process of S65 conforms to a regular expression included in the regular expression information 132 stored in the information storage area 130 (YES in S72), the information management unit 112 stores the character string identified in the process of S65 in the information storage area 130 as at least part of the prefix match information 133 (S73).

一方、Ｓ６５の処理で特定した文字列が、情報格納領域１３０に記憶された正規表現情報１３２に含まれる正規表現に適合しないと判定した場合（Ｓ７２のＮＯ）、情報管理部１１２は、Ｓ７３の処理を行わない。 On the other hand, when it is determined that the character string identified in the process of S65 does not conform to any regular expression included in the regular expression information 132 stored in the information storage area 130 (NO in S72), the information management unit 112 does not perform the process of S73.

また、Ｓ５２の処理で抽出したファイル名等における文字列と、情報格納領域１３０に記憶された一時格納情報１３５に含まれる文字列のそれぞれとが前方一致する関係にないと判定した場合（Ｓ６４のＮＯ）、共通特定部１１６は、Ｓ６５からＳ７３の処理を行わない。 Also, if it is determined that the character string in the file name extracted in the process of S52 and the character string included in the temporary storage information 135 stored in the information storage area 130 do not have a forward matching relationship (S64 NO), the common identification unit 116 does not perform the processing of S65 to S73.

続いて、共通特定部１１６は、Ｓ５２の処理で抽出したファイル名等における文字列と、情報格納領域１３０に記憶された一時格納情報１３５に含まれる文字列のそれぞれとが後方一致する関係にあるか否かを判定する（Ｓ７４）。 Subsequently, the common identification unit 116 finds that the character string in the file name extracted in the process of S52 and the character string included in the temporary storage information 135 stored in the information storage area 130 have a backward matching relationship. It is determined whether or not (S74).

その結果、Ｓ５２の処理で抽出したファイル名等における文字列と、情報格納領域１３０に記憶された一時格納情報１３５に含まれる文字列のそれぞれとが後方一致する関係にあると判定した場合（Ｓ７５のＹＥＳ）、共通特定部１１６は、Ｓ５２の処理で抽出したファイル名等における文字列のうち、Ｓ７４の処理で後方一致する関係にあると判定した文字列を特定する（Ｓ７６）。 As a result, if it is determined that the character string in the file name extracted in the process of S52 and the character string included in the temporary storage information 135 stored in the information storage area 130 have a backward matching relationship (S75 YES), the common identification unit 116 identifies the character string determined to have a backward matching relationship in the process of S74, among the character strings in the file names and the like extracted in the process of S52 (S76).

具体的に、例えば、Ｓ５２の処理で抽出したファイル名等に対応する文字列が「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ｔｅｓｔ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥ａ１￥９８７６５．ｅｘｅ」であり、一時格納情報１３５に含まれる文字列が「ｃ：￥ｄｏｃｕｍｅｎｔｓ￥ｔｅｓｔ￥.ｇｉｔ￥ｏｂｊｅｃｔｓ￥ａ１￥７６５４３．ｅｘｅ」であった場合、共通特定部１１６は、Ｓ５２の処理で抽出したファイル名等に対応する文字列と、一時格納情報１３５に含まれる文字列とが後方一致する関係にあると判定する。そして、共通特定部１１６は、この場合、Ｓ５２の処理で抽出したファイル名等に対応する文字列と、一時格納情報１３５に含まれる文字列とにおける共通部分である「．ｅｘｅ」を特定する。 Specifically, for example, the character string corresponding to the file name extracted in the process of S52 is "c:\documents\test\.git\objects\a1\98765.exe", which is included in the temporary storage information 135. If the character string is "c:\documents\test\.git\objects\a1\76543.exe", the common identification unit 116 adds the character string corresponding to the file name extracted in the process of S52 and the temporary It is determined that the character string included in the stored information 135 has a backward matching relationship. In this case, the common specifying unit 116 specifies “.exe”, which is a common part between the character string corresponding to the file name extracted in the process of S52 and the character string included in the temporary storage information 135 .
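The identification of the common trailing part in S76 can be sketched in Python by reversing both strings. As with the leading part, character-by-character comparison and the helper name `common_trailing_part` are assumptions; the description does not specify the algorithm.

```python
import os.path


def common_trailing_part(a: str, b: str) -> str:
    """Return the longest trailing substring shared by both strings."""
    # Reverse both strings, take the common leading part, then reverse it back.
    reversed_common = os.path.commonprefix([a[::-1], b[::-1]])
    return reversed_common[::-1]


extracted = r"c:\documents\test\.git\objects\a1\98765.exe"  # file name from S52
stored = r"c:\documents\test\.git\objects\a1\76543.exe"     # temporary storage information 135
print(common_trailing_part(extracted, stored))
# .exe
```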

さらに、第３判定部１１７は、図２０に示すように、Ｓ７６の処理で特定した文字列が、情報格納領域１３０に記憶された正規表現情報１３２に含まれる正規表現に適合するか否かを判定する（Ｓ８１）。 Furthermore, as shown in FIG. 20, the third determination unit 117 determines whether or not the character string specified in the process of S76 matches the regular expression contained in the regular expression information 132 stored in the information storage area 130. Determine (S81).

その結果、Ｓ７６の処理で特定した文字列が、情報格納領域１３０に記憶された正規表現情報１３２に含まれる正規表現に適合すると判定した場合（Ｓ８２のＹＥＳ）、情報管理部１１２は、Ｓ７６の処理で特定した文字列を後方一致情報１３４の少なくとも一部として情報格納領域１３０に記憶する（Ｓ８３）。 As a result, when it is determined that the character string identified in the process of S76 conforms to a regular expression included in the regular expression information 132 stored in the information storage area 130 (YES in S82), the information management unit 112 stores the character string identified in the process of S76 in the information storage area 130 as at least part of the backward match information 134 (S83).

一方、Ｓ７６の処理で特定した文字列が、情報格納領域１３０に記憶された正規表現情報１３２に含まれる正規表現に適合しないと判定した場合（Ｓ８２のＮＯ）、情報管理部１１２は、Ｓ８３の処理を行わない。 On the other hand, when it is determined that the character string identified in the process of S76 does not conform to any regular expression included in the regular expression information 132 stored in the information storage area 130 (NO in S82), the information management unit 112 does not perform the process of S83.

また、Ｓ５２の処理で抽出したファイル名等における文字列と、情報格納領域１３０に記憶された一時格納情報１３５に含まれる文字列のそれぞれとが後方一致する関係にないと判定した場合（Ｓ７５のＮＯ）、共通特定部１１６は、Ｓ７６からＳ８３の処理を行わない。 Also, if it is determined that the character string in the file name extracted in the process of S52 and the character string included in the temporary storage information 135 stored in the information storage area 130 do not have a backward matching relationship (S75 NO), the common identification unit 116 does not perform the processing of S76 to S83.

その後、情報管理部１１２は、Ｓ７３またはＳ８３の処理において文字列を前方一致情報１３３または後方一致情報１３４として記憶したか否かを判定する（Ｓ８４）。 After that, the information management unit 112 determines whether a character string was stored as the prefix match information 133 or the backward match information 134 in the process of S73 or S83 (S84).

そして、Ｓ６５またはＳ７６の処理において文字列を記憶していないと判定した場合（Ｓ８５のＹＥＳ）、情報出力部１１８は、図２１に示すように、例えば、Ｓ５２の処理で抽出したファイル名等に対応するファイルが秘密情報を含むファイルでないことを示す情報を生成する（Ｓ９２）。 Then, if it is determined in the process of S65 or S76 that the character string is not stored (YES in S85), the information output unit 118 stores, for example, the file name extracted in the process of S52 as shown in FIG. Information indicating that the corresponding file does not contain confidential information is generated (S92).

その後、情報出力部１１８は、Ｓ９２の処理で生成した情報を出力する（Ｓ９４）。具体的に、情報出力部１１８は、例えば、Ｓ９２の処理で生成した情報を情報処理装置１の出力装置（図示しない）に出力する。 After that, the information output unit 118 outputs the information generated in the process of S92 (S94). Specifically, the information output unit 118 outputs the information generated in the process of S92 to an output device (not shown) of the information processing apparatus 1, for example.

Likewise, when it is determined that the character string corresponding to the file name or the like extracted in the process of S52 prefix-matches a character string included in the prefix match information 133 stored in the information storage area 130 (YES in S54), or that it backward-matches a character string included in the backward match information 134 stored in the information storage area 130 (YES in S56), the information output unit 118 performs the processes from S92 onward.

On the other hand, when it is determined that a character string was stored in the process of S65 or S76 (YES in S85), the information management unit 112 stores, as shown in FIG. 21, the character string in the file name or the like extracted in the process of S52 in the information storage area 130 as the temporary storage information 135 (S91). Then, the information output unit 118 performs the processes from S42 onward.

Furthermore, when it is determined that the character string corresponding to the file name or the like extracted in the process of S52 does not match a regular expression included in the regular expression information 132 stored in the information storage area 130 (NO in S62), the information output unit 118 generates, as shown in FIG. 21, information indicating that the file corresponding to the file name or the like extracted in the process of S52 is a file containing confidential information (S93). Then, the information output unit 118 performs the process of S94.

As a result, the information processing apparatus 1 can further reduce the number of times the file name or the like of a target file is matched against the regular expression information 132, which is written as regular expressions. The information processing apparatus 1 can therefore classify the target files more efficiently.
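As an illustration only, the effect described above can be sketched in Python. The class name `FileClassifier`, its attributes, the sample pattern, and the reading of "matches the regular expression" as `re.fullmatch` are assumptions made for this sketch. It covers prefix matching only, leaves out the backward match information 134 and the timestamps, and is a simplified reading of the flow rather than the implementation of the embodiment.

```python
import re
from os.path import commonprefix


class FileClassifier:
    """Hypothetical sketch: a literal prefix list in front of a regex whitelist."""

    def __init__(self, patterns):
        # Plays the role of the regular expression information 132.
        self.patterns = [re.compile(p) for p in patterns]
        # Plays the role of the prefix match information 133 (plain strings).
        self.prefixes = set()
        # Plays the role of the temporary storage information 135.
        self.pending = []
        # Counter used only to make the saving visible.
        self.regex_calls = 0

    def _matches_regex(self, text):
        self.regex_calls += 1
        return any(p.fullmatch(text) for p in self.patterns)

    def classify(self, name):
        # Cheap literal comparison first; no regular expression is evaluated.
        if any(name.startswith(prefix) for prefix in self.prefixes):
            return "not confidential"
        # Fall back to the regular expressions.
        if not self._matches_regex(name):
            return "confidential"
        # The name matched: look for a prefix shared with an earlier match
        # and keep it only if the prefix itself matches a regular expression.
        for earlier in self.pending:
            common = commonprefix([earlier, name])
            if common and self._matches_regex(common):
                self.prefixes.add(common)
                break
        self.pending.append(name)
        return "not confidential"
```

With the assumed pattern `/usr/lib/.*`, classifying `/usr/lib/python3/os.py` and then `/usr/lib/python3/re.py` stores the prefix `/usr/lib/python3/`, so a later `/usr/lib/python3/sys.py` is classified without evaluating any regular expression.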

Note that the information management unit 112 may, for example, delete as needed any information included in the backward match information 134 whose date and time stored as a time stamp is earlier than the current date and time by a predetermined time or more.
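A minimal sketch of such time-stamp-based deletion follows. The function name `prune_expired`, the dictionary layout (stored string mapped to its time stamp), and the 30-day value in the usage note are hypothetical choices for illustration; the embodiment does not specify them.

```python
from datetime import datetime, timedelta


def prune_expired(entries, now, max_age):
    """Return the entries whose time stamp is less than max_age old.

    entries maps a stored character string to the datetime at which it
    was stored (an assumed layout for the backward match information).
    """
    cutoff = now - max_age
    # An entry that is exactly max_age old counts as "a predetermined
    # time or more" in the past and is therefore dropped.
    return {text: stamp for text, stamp in entries.items() if stamp > cutoff}
```

Calling `prune_expired(entries, datetime.now(), timedelta(days=30))` would keep only the strings stored within the last 30 days.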

The embodiments described above are summarized in the following appendices.

(Appendix 1)
A file classification device comprising:
a first determination unit that refers to a first storage unit storing a character string and determines whether each of the file names of a plurality of files includes the character string;
a second determination unit that, when it is determined that each of the file names of the plurality of files does not include the character string, refers to a second storage unit storing a regular expression and determines whether each of the file names of the plurality of files matches the regular expression;
a common identification unit that, when each of the file names of the plurality of files matches the regular expression, identifies a common part of the character strings in the file names of the plurality of files; and
an information management unit that, when the identified common part matches the regular expression, further stores a character string corresponding to the identified common part in the first storage unit.

(Appendix 2)
The file classification device according to Appendix 1, wherein
the character string stored in the first storage unit is a character string in which no regular expression is used.

(Appendix 3)
The file classification device according to Appendix 1, wherein
the first determination unit refers to the first storage unit and determines whether a first file included in the plurality of files includes the character string,
the second determination unit, when it is determined that the first file does not include the character string, refers to the second storage unit and determines whether the file name of the first file matches the regular expression,
the information management unit, when the file name of the first file matches the regular expression, stores the file name of the first file in a third storage unit,
the first determination unit, after the information management unit stores the file name of the first file, refers to the first storage unit and determines whether a second file included in the plurality of files includes the character string,
the second determination unit, in the process of determining whether the regular expression is matched, when it is determined that the second file does not include the character string, refers to the second storage unit and determines whether the file name of the second file matches the regular expression, and
the common identification unit, when the file name of the second file matches the regular expression, refers to the third storage unit and identifies a common part of the character strings in any of the file names stored in the third storage unit and the file name of the second file.

(Appendix 4)
The file classification device according to Appendix 3, wherein
the information management unit stores the file name of the second file in the third storage unit when the file name of the second file matches the regular expression.

(Appendix 5)
The file classification device according to Appendix 3, wherein
the common identification unit identifies, as the common part, a character string that prefix-matches between any of the file names stored in the third storage unit and the file name of the second file.

(Appendix 6)
The file classification device according to Appendix 3, wherein
the common identification unit identifies, as the common part, a character string that backward-matches between any of the file names stored in the third storage unit and the file name of the second file.

(Appendix 7)
The file classification device according to Appendix 3, further comprising
an information output unit that outputs information indicating that the first file matches the regular expression when it is determined that the file name of the first file includes the character string, and outputs information indicating that the second file matches the regular expression when it is determined that the file name of the second file includes the character string.

(Appendix 8)
The file classification device according to Appendix 7, wherein
the information output unit outputs information indicating that the first file does not match the regular expression when the file name of the first file does not match the regular expression, and outputs information indicating that the second file does not match the regular expression when the file name of the second file does not match the regular expression.

(Appendix 9)
A file classification program that causes a computer to execute a process comprising:
referring to a first storage unit storing a character string and determining whether each of the file names of a plurality of files includes the character string;
when it is determined that each of the file names of the plurality of files does not include the character string, referring to a second storage unit storing a regular expression and determining whether each of the file names of the plurality of files matches the regular expression;
when each of the file names of the plurality of files matches the regular expression, identifying a common part of the character strings in the file names of the plurality of files; and
when the identified common part matches the regular expression, further storing a character string corresponding to the identified common part in the first storage unit.

(Appendix 10)
The file classification program according to Appendix 9, wherein
in the process of determining whether the character string is included, the first storage unit is referred to and it is determined whether a first file included in the plurality of files includes the character string,
in the process of determining whether the regular expression is matched, when it is determined that the first file does not include the character string, the second storage unit is referred to and it is determined whether the file name of the first file matches the regular expression,
the program further causes the computer to execute a process of storing the file name of the first file in a third storage unit when the file name of the first file matches the regular expression,
in the process of determining whether the character string is included, after the process of storing the file name of the first file in the third storage unit, the first storage unit is referred to and it is determined whether a second file included in the plurality of files includes the character string,
in the process of determining whether the regular expression is matched, when it is determined that the second file does not include the character string, the second storage unit is referred to and it is determined whether the file name of the second file matches the regular expression, and
in the identifying process, when the file name of the second file matches the regular expression, the third storage unit is referred to and a common part of the character strings in any of the file names stored in the third storage unit and the file name of the second file is identified.

(Appendix 11)
A file classification method in which a computer executes a process comprising:
referring to a first storage unit storing a character string and determining whether each of the file names of a plurality of files includes the character string;
when it is determined that each of the file names of the plurality of files does not include the character string, referring to a second storage unit storing a regular expression and determining whether each of the file names of the plurality of files matches the regular expression;
when each of the file names of the plurality of files matches the regular expression, identifying a common part of the character strings in the file names of the plurality of files; and
when the identified common part matches the regular expression, further storing a character string corresponding to the identified common part in the first storage unit.

(Appendix 12)
The file classification method according to Appendix 11, wherein
in the process of determining whether the character string is included, the first storage unit is referred to and it is determined whether a first file included in the plurality of files includes the character string,
in the process of determining whether the regular expression is matched, when it is determined that the first file does not include the character string, the second storage unit is referred to and it is determined whether the file name of the first file matches the regular expression,
the computer further executes a process of storing the file name of the first file in a third storage unit when the file name of the first file matches the regular expression,
in the process of determining whether the character string is included, after the process of storing the file name of the first file in the third storage unit, the first storage unit is referred to and it is determined whether a second file included in the plurality of files includes the character string,
in the process of determining whether the regular expression is matched, when it is determined that the second file does not include the character string, the second storage unit is referred to and it is determined whether the file name of the second file matches the regular expression, and
in the identifying process, when the file name of the second file matches the regular expression, the third storage unit is referred to and a common part of the character strings in any of the file names stored in the third storage unit and the file name of the second file is identified.
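
The common part named in Appendices 5 and 6 is a leading or trailing substring shared by two file names. One possible way to compute both is sketched below in Python; the function names `common_prefix` and `common_suffix` and the sample file names are hypothetical, and this is not asserted to be how the embodiment computes the common part.

```python
from os.path import commonprefix


def common_prefix(a, b):
    """Longest string with which both a and b start."""
    # os.path.commonprefix compares character by character.
    return commonprefix([a, b])


def common_suffix(a, b):
    """Longest string with which both a and b end."""
    # Reverse both names, take the shared prefix, and reverse it back.
    return commonprefix([a[::-1], b[::-1]])[::-1]
```

For the assumed names `report_2019_q1.txt` and `report_2019_q2.txt`, `common_prefix` yields `report_2019_q` and `common_suffix` yields `.txt`.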

1: Information processing device
2: Management device
3: Storage device
10: Information processing system
NW: Network

Claims

A file classification device comprising:
a first determination unit that refers to a first storage unit storing a character string and determines whether each of the file names of a plurality of files includes the character string;
a second determination unit that, when it is determined that each of the file names of the plurality of files does not include the character string, refers to a second storage unit storing a regular expression and determines whether each of the file names of the plurality of files matches the regular expression;
a common identification unit that, when each of the file names of the plurality of files matches the regular expression, identifies a common part of the character strings in the file names of the plurality of files; and
an information management unit that, when the identified common part matches the regular expression, further stores a character string corresponding to the identified common part in the first storage unit.

The file classification device according to claim 1, wherein
the character string stored in the first storage unit is a character string in which no regular expression is used.

The file classification device according to claim 1, wherein
the first determination unit refers to the first storage unit and determines whether a first file included in the plurality of files includes the character string,
the second determination unit, when it is determined that the first file does not include the character string, refers to the second storage unit and determines whether the file name of the first file matches the regular expression,
the information management unit, when the file name of the first file matches the regular expression, stores the file name of the first file in a third storage unit,
the first determination unit, after the information management unit stores the file name of the first file, refers to the first storage unit and determines whether a second file included in the plurality of files includes the character string,
the second determination unit, in the process of determining whether the regular expression is matched, when it is determined that the second file does not include the character string, refers to the second storage unit and determines whether the file name of the second file matches the regular expression, and
the common identification unit, when the file name of the second file matches the regular expression, refers to the third storage unit and identifies a common part of the character strings in any of the file names stored in the third storage unit and the file name of the second file.

The file classification device according to claim 3, wherein
the information management unit stores the file name of the second file in the third storage unit when the file name of the second file matches the regular expression.

The file classification device according to claim 3, wherein
the common identification unit identifies, as the common part, a character string that prefix-matches between any of the file names stored in the third storage unit and the file name of the second file.

The file classification device according to claim 3, wherein
the common identification unit identifies, as the common part, a character string that backward-matches between any of the file names stored in the third storage unit and the file name of the second file.

The file classification device according to claim 3, further comprising
an information output unit that outputs information indicating that the first file matches the regular expression when it is determined that the file name of the first file includes the character string, and outputs information indicating that the second file matches the regular expression when it is determined that the file name of the second file includes the character string.

The file classification device according to claim 7, wherein
the information output unit outputs information indicating that the first file does not match the regular expression when the file name of the first file does not match the regular expression, and outputs information indicating that the second file does not match the regular expression when the file name of the second file does not match the regular expression.

A file classification program that causes a computer to execute a process comprising:
referring to a first storage unit storing a character string and determining whether each of the file names of a plurality of files includes the character string;
when it is determined that each of the file names of the plurality of files does not include the character string, referring to a second storage unit storing a regular expression and determining whether each of the file names of the plurality of files matches the regular expression;
when each of the file names of the plurality of files matches the regular expression, identifying a common part of the character strings in the file names of the plurality of files; and
when the identified common part matches the regular expression, further storing a character string corresponding to the identified common part in the first storage unit.

A file classification method in which a computer executes a process comprising:
referring to a first storage unit storing a character string and determining whether each of the file names of a plurality of files includes the character string;
when it is determined that each of the file names of the plurality of files does not include the character string, referring to a second storage unit storing a regular expression and determining whether each of the file names of the plurality of files matches the regular expression;
when each of the file names of the plurality of files matches the regular expression, identifying a common part of the character strings in the file names of the plurality of files; and
when the identified common part matches the regular expression, further storing a character string corresponding to the identified common part in the first storage unit.