JP7488702B2

JP7488702B2 - Information Management System

Info

Publication number: JP7488702B2
Application number: JP2020104593A
Authority: JP
Inventors: 啓幸鈴木; 明佳倉田; 彰規淺原; 秀和森田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2020-06-17
Filing date: 2020-06-17
Publication date: 2024-05-22
Anticipated expiration: 2040-06-17
Also published as: JP2021197016A

Description

The present invention relates to an information management system.

There are high expectations for highly efficient materials development through materials informatics (MI) and process informatics (PI). Handling large amounts of data, including data from experiments conducted by others, is effective because it increases the likelihood of finding new correlations between parameters and of finding optimal parameter values with fewer experiments. On the other hand, materials development typically involves many work processes to achieve the desired performance or functionality, so MI that takes into account the time-series relationships among the parameters of these processes is important.

Patent Document 1 discloses extracting important keywords from the work screen a user is using during development, identifying the user's development field from the extracted keywords, and generating information to propose to the user.

Patent Document 1: JP 2018-67230 A

However, the technology disclosed in Patent Document 1 does not relate to the field of materials development, and when experiments are conducted through a variety of processes, it is difficult for that technology to provide users with information that takes the time series of those processes into account.

The objective of the present invention is to provide an information management system that, in the field of materials development, enables efficient process design using past experimental data.

To achieve the above object, the present invention comprises: a storage unit that stores experimental data on materials; a time series extraction unit that extracts time series information of a material processing process from the experimental data; a keyword extraction unit that extracts keyword candidates related to the material processing process from the experimental data; and an index creation unit that creates an index based on the time series information and the keyword candidates.

According to the present invention, it is possible to provide an information management system that enables efficient process design using past experimental data in the field of materials development.

FIG. 1 is a configuration diagram of the information management system.
FIG. 2 is a diagram showing the configuration of the material-related database group.
FIG. 3 is a diagram showing details of the process time series extraction unit.
FIG. 4 is a diagram showing an example of a table output by the table output unit.
FIG. 5 is a diagram showing an example of time series information of a material processing process.
FIG. 6 is a flow diagram showing the steps by which the process time series extraction unit extracts time series information based on a table included in the experimental data and the material-related database group.
FIG. 7 is a diagram showing details of the keyword extraction unit.
FIG. 8 is a diagram showing an example of keywords related to a material processing process.
FIG. 9 is a diagram showing details of the sample identifier extraction unit.
FIG. 10 is a diagram showing an example of sample identifiers of experimental subjects.
FIG. 11 is a diagram showing an example of an index created by the index creation unit.

An embodiment of the present invention is described below. FIG. 1 is a configuration diagram of the information management system according to this embodiment. The information management system supports research and development of materials. As shown in FIG. 1, it includes an experimental data storage unit 100, a process time series extraction unit 200, a keyword extraction unit 300, a sample identifier extraction unit 400, a data formatting unit 500, an index creation unit 600, and a data catalog unit 700.

The experimental data storage unit 100 stores experimental data on materials. Experimenters and others upload the files related to each project. The stored files include experimental plan documents, tables, and the like.

The process time series extraction unit 200 extracts time series information of the material processing process from the experimental data. It extracts process names and the like from the documents and tables contained in the files in the experimental data storage unit 100. It then estimates the order of the processes based on their relative positions, and displays and registers the estimation results. Details of the process time series extraction unit 200 are described later with reference to FIGS. 3 to 6.

The keyword extraction unit 300 extracts keyword candidates related to the material processing process from the experimental data. It estimates keywords from the documents and tables contained in the files in the experimental data storage unit 100. To do so, it draws on frequently used terms, terms appearing in titles, terms defined by the experimenter, and the like, and it displays and registers the estimation results. Details of the keyword extraction unit 300 are described later with reference to FIGS. 7 and 8.

The sample identifier extraction unit 400 extracts sample identifiers from the experimental data. From the documents and tables contained in the files in the experimental data storage unit 100, it extracts labels such as #1 and #2 (or No. 1 and No. 2, or sample a and sample b). These labels identify which processes a given sample went through. Details of the sample identifier extraction unit 400 are described later with reference to FIGS. 9 and 10.
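The sketch below shows one way identifiers like those above could be picked out of plain text with a regular expression. The source does not say how the sample identifier extraction unit 400 works; it also consults the material-related database group 800 and learns from corrections. The pattern and the function name `extract_sample_ids` are assumptions, and they cover only the ASCII forms listed above.

```python
import re

# Hypothetical pattern for the identifier styles named above:
# "#1", "No. 1" / "No.1", and "sample a" / "sample 12".
# "[Ss]ample\s+" requires whitespace, so a word like "Samples" is not matched.
SAMPLE_ID_PATTERN = re.compile(r"#\d+|No\.\s?\d+|[Ss]ample\s+(?:[A-Za-z]|\d+)\b")


def extract_sample_ids(text: str) -> list[str]:
    """Return sample identifiers in order of first appearance, without duplicates."""
    found: list[str] = []
    for match in SAMPLE_ID_PATTERN.finditer(text):
        token = match.group(0)
        if token not in found:
            found.append(token)
    return found


print(extract_sample_ids(
    "Samples #1 and #2 were mixed. No. 3 was heated; sample a and #1 were evaluated."
))  # ['#1', '#2', 'No. 3', 'sample a']
```

A pattern like this misses identifiers written in other conventions. The correction history the source describes is one way such gaps could be closed over time.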

The data formatting unit 500 converts the documents and tables contained in the files in the experimental data storage unit 100 into data of a standard format and stores them. The index creation unit 600 creates an index based on the time series information, the keyword candidates, and the sample identifiers. The data catalog unit 700 organizes the information obtained by the data formatting unit 500 and the index creation unit 600 into a data catalog and stores it so that users of the information management system can search it. Creating an index that uses keywords in addition to time series information improves the accuracy of users' material searches. The identifiers, however, are not essential for creating the index.

The information management system of this embodiment further includes a data search unit 701, an inter-parameter relationship analysis unit 702, a unit conversion unit 703, a data integration unit 704, and a database storage unit 705. The data search unit 701 searches the information stored in the data catalog unit 700 based on material-related keywords entered by the user. It outputs summaries of the matching experimental results in descending order of relevance. The inter-parameter relationship analysis unit 702 analyzes the relationships between the parameters of the processes. For example, if a heat treatment process was performed after a mixing process, it analyzes the relationship between the heat treatment temperature and the mixing time. The unit conversion unit 703 converts data into a unit system that suits the user's preferences. The data integration unit 704 integrates the data described above, and the database storage unit 705 stores the integrated data as an MI or PI database. In this embodiment, the units from the data search unit 701 to the database storage unit 705 carry out every step up to storage in the database automatically, but the user may also perform these steps manually.
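The source does not state which analysis the inter-parameter relationship analysis unit 702 performs. One simple, assumed choice is a Pearson correlation across samples, between a parameter of an earlier process and a parameter of a later one. The data and variable names below are hypothetical.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length parameter series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


# Hypothetical per-sample values, assuming mixing precedes heat treatment
# in every sample's time series (as in the example above).
mixing_time_min = [10, 20, 30, 40]
heat_treatment_temp_c = [300, 350, 400, 450]
r = pearson(mixing_time_min, heat_treatment_temp_c)  # perfectly linear data, so r is 1.0 up to rounding
```

The process time series matters here because it tells the analysis which parameter belongs to the upstream step. A linear correlation is only one of many possible measures, and it will not capture nonlinear dependencies.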

FIG. 2 is a diagram showing the configuration of the material-related database group 800. As shown in FIG. 2, the material-related database group 800 of this embodiment includes a process name database 801, a parameter name database 802, a unit symbol database 803, and an apparatus/equipment database 804. The process name database 801 stores the specific names of processes used in the materials field, such as heat treatment, mixing, separation, and reaction. The parameter name database 802 stores the names of parameters needed to specify those processes, such as flow rate, time, and temperature. The unit symbol database 803 stores units related to those parameters, such as ml and °C. It also stores symbols conventionally used to express those parameters, for example T_a for temperature. The apparatus/equipment database 804 stores information about the apparatus and equipment used in materials experiments.

The material-related database group 800 of this embodiment also serves as an integrated database that organizes the correlations among the information stored in the individual databases. For example, a given process is often associated with specific parameters, units, and symbols, and information including these associations is stored in the material-related database group 800. Likewise, a specific apparatus may be used to perform a certain process. In that case, the information is stored with the process name and the apparatus linked to each other. The material-related database group 800 may reside within the information management system or on an external server reachable over a network, as long as it is registered in advance.
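The linking role described above can be pictured as a small in-memory structure. In it, each process name carries its related parameters, units, symbols, and equipment. This is a hypothetical sketch, not the actual schema of the material-related database group 800. The class names and sample entries are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessEntry:
    """One process name with the parameters, units, symbols, and equipment linked to it."""
    name: str
    parameters: set[str] = field(default_factory=set)
    units: set[str] = field(default_factory=set)
    symbols: set[str] = field(default_factory=set)
    equipment: set[str] = field(default_factory=set)


class MaterialDBGroup:
    """Hypothetical in-memory stand-in for the integrated material-related database group."""

    def __init__(self) -> None:
        self.processes: dict[str, ProcessEntry] = {}

    def register(self, entry: ProcessEntry) -> None:
        self.processes[entry.name] = entry

    def processes_for_term(self, term: str) -> list[str]:
        """Return the processes linked to a parameter name, unit, symbol, or equipment term."""
        return sorted(
            e.name
            for e in self.processes.values()
            if term in (e.parameters | e.units | e.symbols | e.equipment)
        )


db = MaterialDBGroup()
db.register(ProcessEntry("heat treatment", {"temperature", "time"}, {"°C", "min"}, {"T_a"}, {"furnace"}))
db.register(ProcessEntry("mixing", {"time", "rotation speed"}, {"min", "rpm"}, set(), {"mixer"}))
```

A reverse lookup such as `db.processes_for_term("°C")` shows how a unit found in a table header can point back to a likely process.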

FIG. 3 is a diagram showing details of the process time series extraction unit 200. The process time series extraction unit 200 of this embodiment reads tables from the experimental data stored in the experimental data storage unit 100 and obtains the relative positions of the process names and the like contained in them. It likewise reads documents and obtains the relative positions of the process names and the like contained in those documents. It then uses these relative positions to estimate the time series information of the material processing process and displays the result to the experimenter or others. Once the experimenter or others have checked the result and the data has been registered, the process time series extraction unit 200 stores the estimated time series information.

The specific configuration of the process time series extraction unit 200 is now described with reference to FIG. 3. The process time series extraction unit 200 of this embodiment first includes a table extraction unit 201, a table conversion unit 202, a table output unit 203, a confirmed table storage unit 204, a table header acquisition unit 205, and a header layout acquisition unit 206. Together, these acquire the information needed to estimate the process time series from the tables included in the experimental data, in particular from what is written in their headers. This reflects the fact that material processing processes are generally listed in a table header from left to right, starting with the most upstream process.

The table extraction unit 201 extracts data corresponding to tables from the experimental data stored in the experimental data storage unit 100. Because table data formats may differ from experimenter to experimenter, the table conversion unit 202 converts each table into a standard format. The table output unit 203 outputs the converted table to the experimenter's terminal. In this embodiment, "experimenters" include not only the people who actually performed the experiment. They also include those who did not perform it but created the tables or experimental plans, those who supervise these people, and those who work under their supervision.

FIG. 4 is a diagram showing an example of a table output by the table output unit 203. As shown in FIG. 4, the table header lists process names such as A reaction, B mixing, and C evaluation. For each sample identifier, the parameter values corresponding to each process are listed. If the experimenter checks the output table and finds no problems, it is registered in the confirmed table storage unit 204 as confirmed data.

Next, the table header acquisition unit 205 acquires the header contents from the confirmed table. The header layout acquisition unit 206 then searches the header for information such as process names and acquires the positions of the process names and other items found. In doing so, the header layout acquisition unit 206 can also refer to the material-related database group 800. For the table shown in FIG. 4, for example, it acquires two arrangements. The first is the arrangement of process names: A reaction first from the left, then B mixing, then C evaluation. The second is the arrangement of parameter names: flow rate first from the left, then time, then yield. Besides process names and parameter names, the header layout acquisition unit 206 can use the material-related database group 800 to identify physical quantities, units, or symbols. It can also identify character strings such as experimental apparatus and experimental methods, and acquire their positional relationships within the header. This works because a physical quantity or unit, looked up in the material-related database group 800, indicates to some extent which process it relates to.
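The following is a minimal sketch of left-to-right layout acquisition. It assumes the header is available as a list of cell strings and that process names are matched by plain substring search. The header text and the function name `header_layout` are hypothetical, and the actual unit may also match parameters, units, and apparatus.

```python
def header_layout(header: list[str], process_names: set[str]) -> list[tuple[int, str]]:
    """Return (column index, process name) pairs for names found in a header, left to right."""
    hits = []
    for col, cell in enumerate(header):
        for name in process_names:
            pos = cell.find(name)
            if pos >= 0:
                hits.append((col, pos, name))
    hits.sort()  # left to right: by column, then by position inside the cell
    layout: list[tuple[int, str]] = []
    seen: set[str] = set()
    for col, _, name in hits:
        if name not in seen:  # keep only the leftmost occurrence of each process
            seen.add(name)
            layout.append((col, name))
    return layout


# Header modeled loosely on FIG. 4; the exact cell text is an assumption.
header = ["Sample", "A reaction / flow rate", "B mixing / time", "C evaluation / yield"]
process_names = {"reaction", "mixing", "evaluation", "heat treatment"}
```

Keeping the column index, rather than only the order, preserves the relative position that the estimation step can compare against other sources.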

As shown in FIG. 3, the process time series extraction unit 200 of this embodiment also includes an experimental plan layout acquisition unit 207. This unit acquires the information needed to estimate the process time series from the text of the experimental plan document. This reflects the fact that material processing processes are generally described in a document in order, starting with the most upstream process. The experimental plan layout acquisition unit 207 refers to the material-related database group 800 to search the document for information such as process names, and acquires the positions of the process names and other items found. Like the header layout acquisition unit 206, it can also acquire positional relationships within the document using parameter names, physical quantities, units, or symbols, or experimental apparatus, experimental methods, and the like. When an experimental apparatus and an experimental method correspond to each other, the experimental plan may mention only one of them.
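For a plan document, a comparable sketch orders process names by the character offset of their first mention. It assumes case-insensitive substring matching. It also ignores cases such as a process named in an introduction before the procedure itself, which would mislead a first-mention heuristic. The plan text is invented for illustration.

```python
def document_layout(text: str, process_names: set[str]) -> list[tuple[int, str]]:
    """Return (character offset, process name) pairs, ordered by first mention in the text."""
    lowered = text.lower()
    hits = [(lowered.find(name.lower()), name) for name in process_names]
    return sorted(hit for hit in hits if hit[0] >= 0)  # drop names that never appear


plan = (
    "Weigh the raw powders (weighing). Carry out the reaction at 80 °C, "
    "then purify the product by fractional distillation. "
    "Finally, perform the evaluation by X-ray diffraction."
)
names = {"weighing", "reaction", "fractional distillation", "evaluation", "mixing"}
order = [name for _, name in document_layout(plan, names)]
```

Here `order` comes out as weighing, reaction, fractional distillation, evaluation. "mixing" is dropped because the plan never mentions it.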

The process time series estimation unit 208 estimates the time series of the material processing process from two sources of positional relationships among process names and the like. The first is their positions within the header, acquired by the header layout acquisition unit 206. The second is their positions within the experimental plan, acquired by the experimental plan layout acquisition unit 207. The process time series display unit 209 outputs the estimation result of the process time series estimation unit 208 to the experimenter's terminal.
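The source does not specify how the two positional sources are combined. One possible approach, shown here as an assumption, treats each ordering as a set of "earlier before later" constraints. It then takes a topological order of their union using Python's standard `graphlib` module. Contradictory sources produce a cycle, which the sketch reports as `None` so the experimenter can resolve it.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional


def merge_orders(*orders: list[str]) -> Optional[list[str]]:
    """Merge partial process orders, e.g. one from a table header and one from a plan.

    Each order contributes "earlier before later" constraints between neighbours.
    Returns None when the sources contradict each other.
    """
    predecessors: dict[str, set[str]] = {}
    for order in orders:
        for name in order:
            predecessors.setdefault(name, set())
        for earlier, later in zip(order, order[1:]):
            predecessors[later].add(earlier)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


merged = merge_orders(["reaction", "mixing", "evaluation"], ["weighing", "reaction", "mixing"])
```

When the constraints do not pin down a unique order, `static_order` returns just one valid order. Ambiguous cases like that are good candidates for the confirmation step described below.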

FIG. 5 is a diagram showing an example of time series information of a material processing process displayed by the process time series display unit 209. As shown in FIG. 5, the order of the processes is displayed: first A weighing, then B reaction, then C fractional distillation, and finally D evaluation. Evaluation is performed through various analyses using X-ray instruments, scanning electron microscopes, and the like, covering aspects such as the microstructure and structure of the material. For this reason, there are usually several types of evaluation. Accordingly, FIG. 5 shows that E evaluation was also performed on the same sample identifier in addition to D evaluation. Processes other than evaluation may likewise run in parallel, for example B reaction performed together with F reaction. In other words, as long as the order of the processes can be determined, the time series information may be displayed as a tree that includes parallel branches.
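Because the time series may branch, it can be stored as a map from each process to its predecessors rather than as a flat list. The hypothetical helper below groups such a map into stages. Processes in the same stage, such as D and E evaluation in FIG. 5, can be shown side by side in a tree view. It uses the standard `graphlib.TopologicalSorter`.

```python
from graphlib import TopologicalSorter


def stages(predecessors: dict[str, set[str]]) -> list[list[str]]:
    """Group processes into stages; processes in the same stage may run in parallel."""
    sorter = TopologicalSorter(predecessors)
    sorter.prepare()  # raises CycleError if the time series is inconsistent
    result: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every process whose predecessors are all done
        result.append(ready)
        sorter.done(*ready)
    return result


# FIG. 5 as a predecessor map: D and E evaluation both follow C fractional distillation.
fig5 = {
    "B reaction": {"A weighing"},
    "C fractional distillation": {"B reaction"},
    "D evaluation": {"C fractional distillation"},
    "E evaluation": {"C fractional distillation"},
}
```

`stages(fig5)` yields four stages, and the last one holds both evaluations.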

If the experimenter checks the estimated process time series and finds no problems, it is registered in the process time series storage unit 210. If the displayed process time series is incorrect, the experimenter corrects it. The history of these corrections is registered in the correction method storage unit 211 and is used, via the learning unit 212, for subsequent estimations.
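The learning method behind the learning unit 212 is not described. As a deliberately simple stand-in, the sketch below remembers which pairs of processes an experimenter swapped and reapplies those pairwise rules to later estimates. The class `CorrectionMemory` is hypothetical and far simpler than a trained model.

```python
class CorrectionMemory:
    """Hypothetical stand-in for the correction method storage and learning units."""

    def __init__(self) -> None:
        self.must_precede: set[tuple[str, str]] = set()  # pairs of (earlier, later)

    def record(self, estimated: list[str], corrected: list[str]) -> None:
        """Store every pair whose relative order the experimenter reversed."""
        position = {name: i for i, name in enumerate(corrected)}
        for i, a in enumerate(estimated):
            for b in estimated[i + 1:]:
                if a in position and b in position and position[a] > position[b]:
                    self.must_precede.discard((a, b))  # the newer correction wins
                    self.must_precede.add((b, a))

    def apply(self, estimated: list[str]) -> list[str]:
        """Reorder a new estimate with adjacent swaps that satisfy remembered rules."""
        order = list(estimated)
        for _ in range(len(order)):  # bounded number of passes
            swapped = False
            for i in range(len(order) - 1):
                if (order[i + 1], order[i]) in self.must_precede:
                    order[i], order[i + 1] = order[i + 1], order[i]
                    swapped = True
            if not swapped:
                break
        return order


memory = CorrectionMemory()
memory.record(["weighing", "distillation", "reaction"], ["weighing", "reaction", "distillation"])
```

After this single correction, `memory.apply(["weighing", "distillation", "reaction", "evaluation"])` puts reaction before distillation.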

The process time series extraction unit 200 lets users of the information management system grasp the order in which processes were performed. When reusing data, they can therefore easily infer the governing factors for achieving the desired performance or functionality, which enables efficient process design.

FIG. 6 is a flow diagram showing the steps by which the process time series extraction unit 200 extracts time series information based on the tables included in the experimental data and the material-related database group. First, the table extraction unit 201 extracts the tables included in the experimental data stored in the experimental data storage unit 100 (step S901). Next, the table conversion unit 202 formats each extracted table into standard data, and the table output unit 203 displays the formatted table on the experimenter's terminal (step S902). The experimenter then checks whether the output table contains errors (step S903). If there are none, the table is stored as a confirmed table (step S904). If there are errors, the experimenter corrects the output table (step S905), and the corrected table is stored as the confirmed table.

Next, the table header acquisition unit 205 acquires the header data from the confirmed table (step S906). The header layout acquisition unit 206 then refers to the material-related database group 800 and acquires the relative positions of the data (step S907). The process time series estimation unit 208 estimates the time series of the material processing process based on these relative positions (step S908). The process time series display unit 209 then displays the estimated time series on the experimenter's terminal (step S909). The experimenter checks whether the displayed time series contains errors (step S910). If there are none, the time series is stored in the process time series storage unit 210 (step S911). If there is an error, the experimenter corrects the process time series (step S912), and the corrected version is stored in the process time series storage unit 210.
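The flow of steps S901 to S912 can be sketched as a single function in which two callbacks stand in for the experimenter's confirmation screens. Several details are assumptions: the function name, the callback convention (return `None` to accept, or a corrected value), and the whitespace trimming used as "formatting".

```python
from typing import Callable, Optional

Table = list[list[str]]


def run_time_series_flow(
    raw_table: Table,
    process_names: set[str],
    review_table: Callable[[Table], Optional[Table]],
    review_order: Callable[[list[str]], Optional[list[str]]],
) -> list[str]:
    """Walk through steps S901 to S912; each callback returns None to accept, else a correction."""
    # S901-S902: extract the table and format it (here: just trim whitespace) for display.
    table = [[cell.strip() for cell in row] for row in raw_table]
    # S903-S905: the experimenter confirms or corrects; the result is the confirmed table (S904).
    corrected_table = review_table(table)
    confirmed = table if corrected_table is None else corrected_table
    # S906: take the header row of the confirmed table.
    header = confirmed[0]
    # S907: relative positions (column indices) of process names in the header.
    hits = sorted(
        (col, name)
        for col, cell in enumerate(header)
        for name in process_names
        if name in cell
    )
    # S908: the estimated order is the left-to-right order of first appearance.
    estimated: list[str] = []
    for _, name in hits:
        if name not in estimated:
            estimated.append(name)
    # S909-S912: the experimenter confirms or corrects; the result is what gets stored (S911).
    corrected_order = review_order(estimated)
    return estimated if corrected_order is None else corrected_order


raw = [[" Sample ", " A reaction ", "B mixing", "C evaluation "], ["#1", "5", "10", "0.8"]]
accepted = run_time_series_flow(raw, {"reaction", "mixing", "evaluation"}, lambda t: None, lambda o: None)
```

In a real system the callbacks would be UI interactions. Each correction would also be written to the correction method storage unit 211.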

FIG. 7 is a diagram showing details of the keyword extraction unit 300. The keyword extraction unit 300 of this embodiment estimates keyword candidates from the documents, tables, and the like included in the experimental data stored in the experimental data storage unit 100. It displays the results to the experimenter. Once the experimenter has checked the results and the data has been registered, the keyword extraction unit 300 stores the estimated keywords.

The keyword extraction unit 300 of this embodiment first includes the following units, which it uses to estimate keyword candidates:
a user-defined information acquisition unit 301;
a frequently used term acquisition unit 302;
a table header acquisition unit 303;
a physical quantity/unit symbol acquisition unit 304;
an apparatus/method acquisition unit 305;
a substance information (composition/compound/structural formula, etc.) acquisition unit 306.
The user-defined information acquisition unit 301 acquires keywords that the experimenter has customized and defined, for example when the experimenter wants to record his or her own name or the date of the experiment. The frequently used term acquisition unit 302 acquires, as frequently used terms, terms that appear at least a certain number of times in the documents included in the experimental data. The table header acquisition unit 303 acquires terms written in row and column titles, such as headers, of the tables included in the experimental data, since these are likely to be keywords. The physical quantity/unit symbol acquisition unit 304 acquires physical quantities (e.g., heat quantity), their units (e.g., cal), or conventional symbols (e.g., Q) that appear in the documents or table headers of the experimental data. The apparatus/method acquisition unit 305 acquires the apparatus and methods used in the experiment. The substance information acquisition unit 306 acquires the compositions, compounds, structural formulas, and the like of the substances. When the physical quantity/unit symbol acquisition unit 304 and the apparatus/method acquisition unit 305 acquire information on physical quantities or apparatus, they also refer to the material-related database group 800.
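For the frequently used term acquisition unit 302, a minimal sketch is a word count with a threshold. It assumes whitespace-delimited English text and a tiny hand-written stop-word list. Japanese documents would need a morphological analyzer for tokenization, and the source does not give the threshold value.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a curated one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "was", "were", "in", "at", "for", "with", "then"}


def frequent_terms(documents: list[str], min_count: int = 3) -> list[str]:
    """Return lowercase terms occurring at least `min_count` times across all documents."""
    counts = Counter(
        word
        for doc in documents
        for word in re.findall(r"[a-z][a-z0-9\-]*", doc.lower())
        if word not in STOPWORDS
    )
    return sorted(word for word, count in counts.items() if count >= min_count)


docs = [
    "Anneal the TiO2 film.",
    "The film was annealed; TiO2 film thickness was measured.",
    "TiO2 film again",
]
```

With the default threshold, `frequent_terms(docs)` keeps only `film` (4 occurrences) and `tio2` (3 occurrences).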

The keyword estimation unit 307 works from the information acquired by the frequently used term acquisition unit 302 and the other acquisition units. From that information, it estimates keyword candidates: at least one of the substance names (including abbreviations and common names), compositions, chemical formulas, and structural formulas described in the experimental data. In doing so, the keyword estimation unit 307 may refer to a technical term database 810, a synonym database 820, an ontology database 830, and the like. The keyword display unit 308 then displays the estimation results of the keyword estimation unit 307 on the experimenter's terminal.
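One way the synonym database 820 might be used is to map abbreviations and common names to a canonical substance name, then check the result against the technical term database 810. The dictionaries below are hypothetical stand-ins for those databases.

```python
def estimate_keywords(
    candidates: list[str],
    synonym_db: dict[str, str],
    technical_terms: set[str],
) -> list[str]:
    """Normalize candidate terms via the synonym map and keep recognized technical terms."""
    keywords: list[str] = []
    for term in candidates:
        canonical = synonym_db.get(term, term)  # unknown terms pass through unchanged
        if canonical in technical_terms and canonical not in keywords:
            keywords.append(canonical)
    return keywords


synonym_db = {"TiO2": "titanium dioxide", "titania": "titanium dioxide", "PVA": "polyvinyl alcohol"}
technical_terms = {"titanium dioxide", "polyvinyl alcohol", "annealing"}
```

Normalizing first means "TiO2" and "titania" collapse into one keyword rather than two. An ontology database could extend this further by linking a substance to broader classes.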

FIG. 8 is a diagram showing an example of keywords related to the material processing process displayed by the keyword display unit 308. If the experimenter checks the displayed keywords and finds no problems, they are registered in the keyword storage unit 309. If the displayed terms include ones that are not appropriate keywords, the experimenter corrects the data. The history of these corrections is registered in the correction method storage unit 310 and is used, via the learning unit 311, for subsequent estimations.

The keyword extraction unit 300 links devices and compounds to the experimental data as keywords. As a result, when a user of the information management system searches by entering, for example, "Device A, Compound B", the experimental data containing these keywords can be output.
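A search like this can be sketched as matching comma-separated query terms against each experiment's registered keywords. The results are ranked by the number of matches, in line with output in order of relevance. The experiment IDs, keyword sets, and scoring rule are assumptions.

```python
def search(query: str, keyword_index: dict[str, set[str]]) -> list[str]:
    """Rank experiments by how many comma-separated query terms match their keywords."""
    terms = {t.strip().lower() for t in query.split(",") if t.strip()}
    scored: list[tuple[int, str]] = []
    for experiment_id, keywords in keyword_index.items():
        hits = len(terms & {k.lower() for k in keywords})
        if hits:
            scored.append((-hits, experiment_id))  # negative count so more hits sort first
    return [experiment_id for _, experiment_id in sorted(scored)]


keyword_index = {
    "exp-001": {"Device A", "Compound B", "reaction A"},
    "exp-002": {"Compound B"},
    "exp-003": {"Device C"},
}
```

Ties are broken by experiment ID here. A production search would more likely weight keywords, for example by rarity.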

FIG. 9 is a diagram showing details of the sample identifier extraction unit 400. First, the sample identifier estimation unit 401 refers to the material-related database group 800 and estimates the sample identifiers contained in the documents, tables, and the like included in the experimental data stored in the experimental data storage unit 100. The sample identifier display unit 402 then displays the estimation results on the experimenter's terminal.

FIG. 10 is a diagram showing an example of sample identifiers of experimental subjects displayed by the sample identifier display unit 402. If the experimenter checks the displayed sample identifiers and finds no problems, they are registered in the sample identifier storage unit 403. If a displayed sample identifier is wrong, the experimenter corrects the data. The history of these corrections is registered in the correction method storage unit 404 and is used, via the learning unit 405, for subsequent estimations.

FIG. 11 is a diagram showing an example of an index created by the index creation unit 600. The index creation unit 600 of this embodiment creates the index from three inputs: the time series information extracted by the process time series extraction unit 200, the keyword candidates extracted by the keyword extraction unit 300, and the sample identifiers extracted by the sample identifier extraction unit 400. FIG. 11 shows the following about the sample with identifier #1:
it was produced through process A, process B, and so on, in that order;
the parameter names of these processes are temperature, flow rate, and so on;
the corresponding table data is stored in sheet Y of file X;
its keywords are reaction A and compound B.
If date information is registered as a keyword, a user who wants to find, for example, experiments performed within a specific period can also identify the corresponding experimental data. The index creation unit 600 in this embodiment creates an index that also includes sample identifiers, but it may instead create an index using only the process time series information and the keywords.
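An index entry of the kind shown in FIG. 11, together with a search for experiments within a date range, might look as follows. The field names are hypothetical. Dates are assumed to be registered as ISO 8601 keyword strings (for example `2020-05-12`).

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class IndexEntry:
    """One index row: sample identifier, process order, parameters, data location, keywords."""
    sample_id: str
    process_order: list[str]
    parameter_names: dict[str, str]  # process name -> parameter name
    data_location: str               # which file and sheet hold the table data
    keywords: set[str] = field(default_factory=set)


def samples_in_period(entries: list[IndexEntry], start: date, end: date) -> list[str]:
    """Return sample IDs that have at least one ISO-date keyword within [start, end]."""
    matches: list[str] = []
    for entry in entries:
        for keyword in entry.keywords:
            try:
                day = date.fromisoformat(keyword)
            except ValueError:
                continue  # not a date keyword
            if start <= day <= end:
                matches.append(entry.sample_id)
                break
    return matches


index = [
    IndexEntry("#1", ["process A", "process B"], {"process A": "temperature", "process B": "flow rate"},
               "file X, sheet Y", {"reaction A", "compound B", "2020-05-12"}),
    IndexEntry("#2", ["process A"], {"process A": "temperature"},
               "file X, sheet Z", {"reaction A", "2019-11-03"}),
]
```

Storing dates as typed fields instead of keyword strings would avoid the parse-and-skip step. The keyword form simply mirrors the description above.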

The embodiment above has been described in detail to explain the present invention clearly, and the invention is not necessarily limited to one that includes all of the described configurations.

100 Experimental data storage unit
200 Process time series extraction unit
201 Table extraction unit
202 Table conversion unit
203 Table output unit
204 Confirmed table storage unit
205 Table header acquisition unit
206 In-header layout acquisition unit
207 Experimental plan layout acquisition unit
208 Process time series estimation unit
209 Process time series display unit
210 Process time series storage unit
211 Correction method storage unit
212 Learning unit
300 Keyword extraction unit
301 User-defined information storage unit
302 Frequent term acquisition unit
303 Table header acquisition unit
304 Physical quantity/unit symbol acquisition unit
305 Equipment/method acquisition unit
306 Material information (composition/compound/structural formula) acquisition unit
307 Keyword estimation unit
308 Keyword display unit
309 Keyword storage unit
310 Correction method storage unit
311 Learning unit
400 Sample identifier extraction unit
401 Sample identifier estimation unit
402 Sample identifier display unit
403 Sample identifier storage unit
404 Correction method storage unit
405 Learning unit
500 Data shaping unit
600 Index creation unit
700 Data catalog unit
701 Data search unit
702 Inter-parameter relationship analysis unit
703 Unit conversion unit
704 Data integration unit
705 Database storage unit
800 Material-related database group
801 Process name database
802 Parameter name database
803 Unit symbol database
804 Equipment database

Claims

1. An information management system comprising:
a storage unit that stores experimental data on materials;
a time series extraction unit that extracts time-series information of a material processing process from the experimental data;
a keyword extraction unit that extracts candidates for keywords related to the material processing process from the experimental data; and
an index creation unit that creates an index for searching for desired experimental data by associating the experimental data with the time-series information or the keyword candidates.

2. The information management system according to claim 1, wherein the time series extraction unit refers to a preregistered material-related database to acquire the relative positions of a plurality of process names included in the experimental data, and extracts the order of the processes as the time-series information based on the acquired relative positions.

3. The information management system according to claim 2, wherein said time series extraction unit has a display unit that displays said time series information and enables a user to register whether or not said time series information needs to be corrected.

4. The information management system according to claim 1, further comprising a sample identifier extraction unit that extracts an identifier of a target sample from the experimental data, wherein the index creation unit creates the index based on the identifier.

5. The information management system according to claim 1, wherein the time series extraction unit searches for character strings naming the devices and/or experimental methods described in the experimental data, and extracts the time-series information based on the relative positions of the character strings.

6. The information management system according to claim 1, wherein the keyword extraction unit estimates the titles of rows or columns of a table described in the experimental data as candidates for the keyword.

7. The information management system according to claim 1, wherein the keyword extraction unit estimates at least one of a substance name, composition, chemical formula, and structural formula described in the experimental data as a candidate for the keyword.

8. The information management system according to claim 1, wherein the keyword extraction unit displays the estimated keyword candidates, and any keyword candidate found to be erroneous is corrected by an experimenter.

9. The information management system according to claim 8, wherein the keyword extraction unit has a learning unit that learns from the history of corrections made by the experimenter.
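Claims 2, 5, and 6 describe extraction rules concrete enough to sketch. Both functions below are hypothetical. `extract_process_order` assumes the material-related database can be flattened into a dictionary that maps a process name, device name, or method name to the process it indicates, and it takes the first occurrence of each string in the text as its relative position. `header_keyword_candidates` assumes a table arrives as a list of rows whose first row holds the column titles and whose first column holds the row titles.

```python
def extract_process_order(text: str, process_db: dict[str, str]) -> list[str]:
    """Order processes by where their cue strings first appear in the text (claims 2 and 5).

    process_db is a hypothetical flattening of the material-related database:
    it maps a process name, device name, or experimental method to the process
    it indicates.
    """
    first_pos: dict[str, int] = {}
    for term, process in process_db.items():
        pos = text.find(term)
        if pos >= 0 and pos < first_pos.get(process, len(text)):
            first_pos[process] = pos
    # Earlier position in the document is taken to mean earlier in time.
    return sorted(first_pos, key=first_pos.__getitem__)


def _is_number(cell: str) -> bool:
    try:
        float(cell)
    except ValueError:
        return False
    return True


def header_keyword_candidates(table: list[list[str]]) -> list[str]:
    """Column titles (first row) and row titles (first column) as keyword candidates (claim 6)."""
    if not table:
        return []
    titles = list(table[0]) + [row[0] for row in table[1:] if row]
    candidates: list[str] = []
    for title in (t.strip() for t in titles):
        # Skip empty cells, duplicates, and purely numeric cells (values, not titles).
        if title and title not in candidates and not _is_number(title):
            candidates.append(title)
    return candidates
```

This sketch uses only string positions; the reference signs list also names in-header and experimental-plan layout acquisition units (206, 207), whose layout cues it ignores.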