JP5847313B2

JP5847313B2 - Information processing device

Info

Publication number: JP5847313B2
Application number: JP2014526682A
Authority: JP
Inventors: 地尋吉村; 由子長坂; 秀貴青木
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2012-07-27
Filing date: 2012-07-27
Publication date: 2016-01-20
Anticipated expiration: 2032-07-27
Also published as: WO2014016951A1; JPWO2014016951A1

Description

The present invention relates to an information processing apparatus that transfers data to a storage device by DMA transfer and processes it, and in particular to an information processing apparatus that switches among and executes a plurality of threads.

A computer consists of storage devices that hold data and a central processing unit (CPU) that reads data from the storage devices and processes it. In general, the faster a storage device is, the higher its price per bit (unit price). Faster devices also have a lower number of bits per unit area or unit volume (recording density). For this reason, a fast but expensive, small-capacity storage device is placed near the CPU to hold the data needed most immediately. Data that does not fit there is placed in a slow but inexpensive, large-capacity storage device, and data is swapped between the two devices as needed. Because storage devices trade off speed against cost, or speed against capacity, the concept of a storage hierarchy, in which several kinds of storage devices with different properties are used hierarchically, has been widely used in computing.

This tendency remains unchanged today. Given the kinds of storage elements available to computer vendors, computers today are built mainly with a four-level storage hierarchy consisting of registers, cache memory, main memory, and storage. The main storage elements used at each level are flip-flops for registers, SRAM for cache memory, DRAM for main memory, and HDDs for storage. The speed, cost, and capacity of each storage element make this division into levels necessary.

Next, the storage hierarchy described above is explained from another angle. The CPU processes data held in registers. When the data to be processed is not in a register, the CPU searches the cache and, if the data is held there, loads it from the cache into a register and then processes it. When the data is not in the cache either, the CPU loads it from main memory into the cache. When the data is not in main memory either, it is read from storage into main memory. In this way, when data is missing from the levels close to the CPU, it is read from a more distant level, and the penalty grows accordingly. While the data is being read, the CPU cannot perform the processing it is supposed to do, so it sits idle and CPU utilization falls. The same problem occurs not only when reading data but also when writing it.
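The lookup order described above can be sketched as follows. This is only an illustrative model: the relative costs in `LEVELS` are hypothetical numbers chosen for the example, not figures from the patent, and the function name `access_cost` is likewise an assumption.

```python
# Hypothetical relative access costs per level; the patent gives no figures.
LEVELS = [
    ("register", 1),
    ("cache", 10),
    ("main memory", 100),
    ("storage", 100_000),
]

def access_cost(address, contents):
    """Return (level_name, total_cost) for the nearest level holding `address`.

    `contents` maps a level name to the set of addresses it currently holds.
    Every level that has to be searched adds its cost, so a miss near the
    CPU makes the penalty grow with the distance of the level that hits.
    """
    total = 0
    for name, cost in LEVELS:
        total += cost
        if address in contents.get(name, set()):
            return name, total
    raise KeyError(f"address {address!r} is not present at any level")

contents = {
    "register": {0},
    "cache": {0, 1},
    "main memory": {0, 1, 2},
    "storage": {0, 1, 2, 3},
}
print(access_cost(0, contents))  # hit in a register
print(access_cost(3, contents))  # has to go all the way to storage
```

In this toy model, the CPU is idle for the whole accumulated cost, which is the utilization loss the paragraph describes.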

If DMA (Direct Memory Access) transfer is used for data transfer, the transfer can proceed without CPU intervention, so in principle the CPU should be able to do other processing during the otherwise idle transfer time. For example, one method in use is for the programmer to identify in advance the points in the program where data will be needed and to embed explicit DMA transfer instructions, thereby scheduling the DMA transfer timing and the CPU processing by hand. However, this method makes program tuning cumbersome.

Multithreading has been used as a technique to address this. In multithreading, a unit of processing that can run concurrently is defined as a thread, and when one thread stops executing, another runnable thread is executed. A thread stops executing not only when its processing is complete but also while the data it needs is being read. Combined with the data-read behavior described earlier, this means that when a running thread starts reading the data it needs, other threads are executed in the meantime. CPU utilization can be raised in this way.
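As a rough illustration of why switching threads during a transfer raises utilization, the sketch below uses a deliberately simplified model that is not taken from the patent: every thread alternates a fixed compute time with a fixed transfer time, transfers overlap freely with one another, and switching is free. The function name `cpu_utilization` is hypothetical.

```python
def cpu_utilization(compute, transfer, threads):
    """Fraction of time the CPU is busy under the simplified model.

    Each thread computes for `compute` time units and then waits
    `transfer` units for its data. With `threads` such threads sharing
    one CPU, the CPU can be busy for at most threads * compute units
    out of every compute + transfer units, capped at 100%.
    """
    return min(1.0, threads * compute / (compute + transfer))

for n in (1, 2, 4):
    print(n, cpu_utilization(compute=1, transfer=3, threads=n))
```

With one thread the CPU is busy a quarter of the time in this example; four threads are enough to cover the transfer time completely.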

Prior art relating to such DMA transfer and multithreading includes the techniques disclosed in Patent Document 1 and Patent Document 2. In both, DMA transfer is performed between an on-chip memory on the CPU (also called local memory) and main memory (also called global memory).

Patent Document 1: JP 2005-129001 A
Patent Document 2: JP 2002-163239 A

In recent years, the spread of the Internet and of various terminals has made it easy to acquire large amounts of data. Such large volumes of data are difficult to handle with conventional database management systems and the like, and various technologies have been developed under the banner of big data. In the era of the IoT (Internet of Things), in which all kinds of things are connected to the Internet, data about every event that occurs in those things is sent to the Internet. In other words, a large amount of POE (Point of Event) data is sent over the Internet. In such a world, POE data is expected to be collected from the Internet and analyzed to determine how things relate to things, people to people, or things to people, and the results used to provide appropriate services or to predict the future. For this purpose, computers must be able to process large amounts of data at high speed.

For a computer to process a large amount of data at high speed, the data to be processed should preferably reside in main memory. If the data is in storage, reading it takes time. In particular, analyzing the relationships among various things and people requires reading data about many different things and people. If every such read requires a storage access, the slow read speed of storage becomes the bottleneck. However, lining up a large number of DRAM devices to build a main memory large enough for big data of this kind causes various problems.

DRAM has a higher unit price than the HDDs, flash memory, phase-change memory, and similar devices used for storage, so building main memory from a large amount of DRAM raises cost. DRAM also has a lower recording density, so the apparatus becomes very large compared with HDDs or flash memory of the same capacity. The inventors therefore attempted to solve the read-speed problem while still providing the required capacity by performing DMA transfer between main memory and a storage area to which data from main memory is evacuated.

Thread scheduling is often based on a simple FIFO scheme in which runnable threads are taken from a queue and executed. Scheduling is therefore more efficient when the execution times of the threads are uniform, so when processing is divided into threads it is desirable to divide it so that the load is even.
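The effect of uneven thread lengths on FIFO scheduling can be seen in a small sketch. It assumes a fixed number of identical workers that each take the next thread from the queue as soon as they become free; the function name `fifo_makespan` and the example durations are hypothetical.

```python
import heapq

def fifo_makespan(durations, workers):
    """Finish time when threads are dispatched in FIFO order.

    `durations` lists the run time of each thread in queue order.
    Each thread is handed to whichever worker becomes free first.
    """
    free_at = [0] * workers          # time at which each worker is next free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)       # earliest available worker
        heapq.heappush(free_at, start + d)
    return max(free_at)

# Both workloads contain 8 units of work in total.
print(fifo_makespan([2, 2, 2, 2], workers=2))  # evenly divided
print(fifo_makespan([1, 1, 1, 5], workers=2))  # one long thread at the end
```

Both queues hold the same total work, but the uneven one finishes later because one worker sits idle while the long thread runs.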

However, as computers have spread and applications have diversified, some applications cannot always be divided evenly, and this becomes a problem. For example, graph processing is used when dealing with social-science problems. A graph consists of a set of vertices and a set of edges connecting the vertices. Social-science problems often deal with relationships. For example, relationships between companies are represented by vertices standing for the companies and edges standing for the relationships. When such a graph is to be divided for processing by multiple threads, the natural approach is to assign one thread per vertex. However, when the graph is divided per vertex, the number of edges attached to each vertex varies. The time needed to process a vertex is proportional to the number of vertices it is related to, that is, to its number of edges. Per-vertex division therefore produces variation in the amount of processing among threads, and when the evacuated data is DMA-transferred to main memory, the size of the transferred data varies as well.

This variation is especially pronounced for graphs that arise in social-science problems, because they have a property called the scale-free property. The number of edges connected to a vertex of a graph is called the degree of that vertex. In a scale-free graph the degree distribution follows a power law: a very small number of vertices have extremely high degree, while the great majority have low degree. Applied to the variation in processing amount described above, this means that processing a social-science graph involves a small number of threads with a very large amount of processing and a large number of threads with a small amount.
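A minimal sketch of how per-vertex division leads to uneven work: the toy graph below (one hub connected to many low-degree vertices) is a made-up example and is far too small to be a real power-law graph, but it shows the same skew. The helper name `degrees` is hypothetical.

```python
from collections import Counter

def degrees(edges):
    """Count how many edges touch each vertex of an undirected graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

# One hub connected to eight vertices, plus a single extra edge.
edges = [("hub", f"v{i}") for i in range(8)] + [("v0", "v1")]
deg = degrees(edges)

# With one thread per vertex, work (and data to transfer) scales with degree.
for vertex, d in sorted(deg.items(), key=lambda item: -item[1]):
    print(vertex, d)
```

If one thread is assigned per vertex, the thread for `hub` handles eight edges while most others handle one, so both the processing time and the amount of data to be DMA-transferred differ widely between threads.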

In the techniques disclosed in Patent Document 1 and Patent Document 2, DMA transfer is performed between on-chip memory on the CPU and main memory, and only the data the CPU is about to process is transferred. They are therefore not techniques that resolve the variation in data size described above.

An object of the present invention is to realize efficient DMA transfer even when the amount of data transferred varies.

The information processing apparatus of the present invention comprises a multithreaded processor, a first storage device, a second storage device that performs DMA transfer with the first storage device, and an operating system that allocates a physical address space to the first storage device and provides a virtual address space on that physical address space. It solves the above problem by reserving memory space on the physical address space according to the capacity that a thread scheduled for execution requires for DMA transfer, starting execution of the thread, and releasing the reserved memory space after the thread's processing has finished.
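The reserve, run, release cycle in this summary can be sketched with a simple byte-counting model. This is not the patented implementation: `PinPool`, `run_all`, the FIFO admission policy, and the assumption that the oldest running thread finishes first are all simplifications introduced for illustration.

```python
from collections import deque

class PinPool:
    """Byte-counting stand-in for the memory reserved as a DMA target."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def reserve(self, nbytes):
        """Reserve nbytes if they fit; report whether it succeeded."""
        if nbytes > self.capacity - self.used:
            return False
        self.used += nbytes
        return True

    def release(self, nbytes):
        assert nbytes <= self.used
        self.used -= nbytes

def run_all(pool, requests):
    """Run threads given as (name, bytes_needed); return completion order."""
    pending = deque(requests)
    running = deque()
    finished = []
    while pending or running:
        # Admit pending threads in order while their DMA space fits.
        while pending and pool.reserve(pending[0][1]):
            running.append(pending.popleft())
        if not running:
            raise MemoryError("request is larger than the whole pool")
        name, nbytes = running.popleft()   # this thread finishes processing
        finished.append(name)
        pool.release(nbytes)               # free its space for later threads
    return finished

pool = PinPool(capacity=100)
order = run_all(pool, [("a", 60), ("b", 50), ("c", 30)])
print(order, pool.used)
```

Because each thread reserves only what it needs, the small request `c` can share the pool with `b` once `a` has released its space, rather than every thread being given a fixed worst-case buffer.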

According to the present invention, efficient DMA transfer can be realized even when the amount of data transferred varies, which in turn speeds up the processing of the information processing apparatus.

A diagram showing an example of the configuration of the information processing system of the present invention.
A diagram showing an example of the configuration of the information processing apparatus of the present invention.
A diagram showing an example of the configuration of the NVM subsystem of the present invention.
A diagram for explaining an example of the relationship between threads and processors.
A diagram for explaining an example of the internal structure of a process of the present invention.
A diagram for explaining an example of the correspondence between the physical address space of the information processing apparatus of the present invention and the virtual address space of a process of the present invention.
A diagram for explaining details of an example of the stack area and the PIN area pool.
A diagram for explaining the queues that the master control thread uses to manage threads and DMA transfers.
A diagram for explaining an example of the configuration of a thread management cell.
A diagram showing an example of thread state transitions.
A diagram for explaining an example of the configuration of an NVM transfer request management cell.
A diagram showing an example of state transitions of an NVM transfer request.
A diagram for explaining the cooperation between user-level multithreading and DMA transfer.
A conceptual diagram for explaining the relationship between user-level multithreading and DMA transfer.
A flowchart for explaining an example of the scheduling operation of the master control thread.
A conceptual diagram for explaining the relationship between user-level multithreading and DMA transfer.

Embodiments of the present invention are described below with reference to the drawings. In all the drawings used to describe the embodiments, the same members are in principle given the same reference numerals, and repeated description of them is omitted.

This embodiment describes an information processing system 100 that uses nonvolatile memory such as flash memory or phase-change memory to offer applications a memory larger than one built from DRAM, while using multithreading to overcome the slowness that is the drawback of nonvolatile memory.

FIG. 1 is a diagram showing an example of the configuration of the information processing system 100 of this embodiment. The information processing system 100 has at least one node 110. The node 110 is an information processing apparatus, for example a server. FIG. 1 shows an example of a four-node configuration with nodes 0 to 3 (each denoted by reference numeral 110). The nodes are connected by an inter-node network 120. In addition to the inter-node network 120, the information processing system 100 may also include an NVM subsystem interconnect 130 that connects the non-volatile memory (NVM) subsystems described later.

FIG. 2 is a diagram showing an example of the configuration of the node 110, which is an information processing apparatus. The node 110 includes processors 210 and 220, DIMMs 230 and 240, an I/O hub 250, a NIC 260, a disk controller 270, an HDD 280, an SSD 290, and an NVM subsystem 300. The main memory consists of the DIMMs 230 and 240, which are storage devices built from DRAM, a volatile memory. Each node 110 needs at least one processor; the node 110 in FIG. 2 is an example of a two-processor configuration with processors 210 and 220. Each of the processors 210 and 220 may be a multi-core processor. In the example of FIG. 2, each processor has two cores, so the node 110 as a whole is a four-core node. Each core may also support simultaneous multithreading (SMT). In the example of FIG. 2, each core supports 2-way SMT, so each processor can process four threads simultaneously. That is, each processor is a multithreaded processor. Hereinafter, a thread that the hardware can process simultaneously is called a hardware thread.

The I/O hub 250 provides interfaces for connecting various devices such as the NIC 260, the disk controller 270, and the NVM subsystem 300. The I/O hub 250 is connected to the processors 210 and 220 through the system bus that each processor provides, for example a bus such as HyperTransport. The I/O hub 250 is connected to devices such as the NIC 260, the disk controller 270, and the NVM subsystem 300 through a peripheral bus for attaching peripheral devices, such as PCI Express. In this embodiment the I/O hub 250 is described as being connected to the NIC 260, the disk controller 270, and the NVM subsystem 300 by PCI Express, but the present invention can also be implemented with other interconnects.

Conventionally, the size of a computer's main memory has been determined by the capacity of its DIMMs. Data that does not fit in main memory is stored in storage such as an HDD or SSD. Storage is connected through a disk controller, using a hardware interface such as SAS (Serial Attached SCSI) or SATA (Serial Advanced Technology Attachment). The interface seen by software is the file system. When an application reads or writes a file, the request passes through the file system, and a device driver in the operating system controls the disk controller to read or write the HDD or SSD. Reads and writes therefore have to pass through several layers, and the overhead is large.

In contrast, the information processing system 100 of this embodiment includes the NVM subsystem 300 so that nonvolatile memory with a larger capacity than the DIMMs can be read and written faster than HDD or SSD storage. When reads and writes must be faster than storage allows, the data is read from storage into the NVM subsystem 300 in advance, which enables high-speed reads and writes. Non-volatile memory is abbreviated as NVM.

FIG. 3 is a diagram showing an example of the configuration of the NVM subsystem 300. The NVM subsystem 300 includes a hybrid memory controller 310, a nonvolatile memory (NVM) 320, which is a storage device, and a volatile memory 330, which is also a storage device. The NVM 320 is a nonvolatile memory such as flash memory or phase-change memory. The volatile memory 330 is DRAM, for which DIMMs can be used. The hybrid memory controller 310 is connected to the NVM 320, the volatile memory 330, and the I/O hub 250. In response to requests from software running on the processor 210 or 220, the hybrid memory controller 310 DMA-transfers data stored in the NVM 320 to the DIMM 230 or 240, which form the main memory. The hybrid memory controller 310 also DMA-transfers data stored in the DIMM 230 or 240 to the NVM 320. The volatile memory 330 is used as a buffer during DMA transfers. As mentioned above, the hybrid memory controllers 310 of the individual nodes can also be connected to one another by the NVM subsystem interconnect 130. This connection makes it possible to access data stored in the NVM subsystem 300 of another node.

The hybrid memory controller 310 has a memory-mapped register (MMR) 311. The MMR 311 is a register through which software running on the processors 210 and 220 instructs the hybrid memory controller 310 to perform DMA transfers. With PCI Express, the registers of a peripheral device can be mapped into the same memory space as main memory. Software can therefore access the MMR 311 with the load and store instructions of the processors 210 and 220, in the same way that it reads and writes main memory.
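The idea of issuing a DMA request by ordinary stores into a mapped register window can be sketched as follows. The register layout (`REG_SRC`, `REG_DST`, `REG_LEN`, `REG_CMD`), the `CMD_START` value, and the function `request_dma` are all hypothetical; the patent text here does not specify the format of the MMR 311. A `bytearray` stands in for the mapped region so that the sketch can run without hardware.

```python
import struct

# Hypothetical register offsets; the MMR layout is not given in the text.
REG_SRC = 0x00   # source address in NVM
REG_DST = 0x08   # destination address in main memory
REG_LEN = 0x10   # transfer length in bytes
REG_CMD = 0x18   # writing CMD_START would kick off the transfer
CMD_START = 1

def request_dma(mmr, src, dst, length):
    """Describe a transfer by plain stores into the register window."""
    struct.pack_into("<Q", mmr, REG_SRC, src)
    struct.pack_into("<Q", mmr, REG_DST, dst)
    struct.pack_into("<Q", mmr, REG_LEN, length)
    # The command register is written last, after the descriptor is complete.
    struct.pack_into("<Q", mmr, REG_CMD, CMD_START)

mmr = bytearray(0x20)   # stand-in for the mapped MMIO window
request_dma(mmr, src=0x1000, dst=0x8000_0000, length=4096)
print(struct.unpack_from("<QQQQ", mmr))
```

On real hardware the buffer would be a mapping of the device's register region rather than a `bytearray`, and ordering and width requirements of the device would have to be respected.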

An operating system that supports virtual memory runs on the node 110. As described above, the node 110 contains multiple cores, in a symmetric multiprocessing (SMP) configuration in which all cores share a single main memory. A single operating system therefore runs on the node 110. The embodiment is described below on the premise of a single system image, with one operating system running on each node 110. The operating system running on each node 110 allocates a physical address space to the main memory formed by the DIMMs 230 and 240 of that node and provides virtual address spaces on that physical address space.

FIG. 4 is an explanatory diagram showing the relationship between the cores of the node 110 and the resources that the operating system provides to the user. Each node 110 has a configuration of 2 processors per node, 2 cores per processor, and 2 SMT threads per core, so each node as a whole has 2 × 2 × 2 = 8 hardware threads.
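The hardware-thread count is a simple product, shown here as a short sketch using the figures from the example configuration; the function name `hardware_threads` is hypothetical.

```python
def hardware_threads(processors, cores_per_processor, smt_per_core):
    """Number of threads the hardware can process at the same time."""
    return processors * cores_per_processor * smt_per_core

per_processor = hardware_threads(1, cores_per_processor=2, smt_per_core=2)
per_node = hardware_threads(2, cores_per_processor=2, smt_per_core=2)
print(per_processor, per_node)
```

This matches the description: four hardware threads per processor and eight per node.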

An operating system that supports virtual memory uses the address translation mechanism of the processor's MMU (Memory Management Unit) to separate the virtual address space for applications (user space) from the virtual address space in which the operating system kernel runs (kernel space), thereby ensuring the security and robustness of the system. In user space, each unit called a process has its own independent virtual address space. In general, in an environment that has this notion of a process, threads belong to processes. That is, each process has one or more threads, and each thread runs while sharing the virtual address space of its parent process.

When a single operating system manages multiple cores, the operating system must abstract the cores in some form and assign them to processes so that applications have an environment in which they can use multiple cores. The concept of a thread is used for this purpose as well. As shown in FIG. 4, the operating system provides kernel-level threads to each process.

In general, when threads are executed, the number of hardware threads (M) is limited relative to the number of threads to be executed (N). When N is less than or equal to M, the two can be mapped one to one, but when N is greater than M, switching is required. This switching is called context switching. To perform a context switch with kernel-level threads, however, the virtual address space must be switched from user space to kernel space and the context switch carried out inside the kernel, so the large overhead of context switching becomes a problem.

In the information processing system 100 of this embodiment, kernel-level threads are therefore mapped one to one to hardware threads, as shown in FIG. 4. That is, N = M. Left as is, however, N is constrained by M, and the number of threads the application needs cannot be secured. With few threads there is also less room to hide the DMA transfers described later, so the efficiency of the system as a whole falls. A method is therefore needed that makes a large number of threads available (that is, makes N large) while avoiding context switches in the kernel.

To address this, the information processing system 100 of this embodiment is characterized in that each of the processes 410 and 420 in FIG. 4 is provided with a master control thread 550 and user-level threads, as shown in FIG. 5.

As shown in FIG. 5, in the information processing system 100 of this embodiment the process 410 has inter-thread shared resources 510 and at least two kernel-level threads allocated by the kernel. The master control thread 550 is permanently assigned to one of these kernel-level threads. The user-level threads that the application needs are assigned to kernel-level threads by time division.

The master control thread 550 is a thread that keeps running for as long as the process 410 exists, always occupying one kernel-level thread. It performs context switching and scheduling of user-level threads and manages the inter-thread shared resources. Unlike with kernel-level threads, the master control thread 550 performs this processing inside the process 410, so context switches can be carried out faster, without any switch to kernel space. In other words, the master control thread 550 makes it possible to use a large number of threads while speeding up context switching.
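Switching between user-level threads entirely inside a process can be illustrated with Python generators, which hand control back to a scheduler without involving the kernel. This is only an analogy for the master control thread, not its actual design: the round-robin policy and the names `worker` and `master_control` are assumptions made for the sketch.

```python
from collections import deque

def worker(name, steps, log):
    """A user-level thread: does one step of work, then yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield   # e.g. the point where a DMA transfer would be in flight

def master_control(threads):
    """Round-robin scheduler that runs entirely in user space."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)          # resume the thread until its next yield
        except StopIteration:
            continue              # thread finished; drop it
        ready.append(thread)      # still has work; requeue it

log = []
master_control([worker("A", 2, log), worker("B", 1, log)])
print(log)
```

Each `next()` call plays the role of a context switch that never leaves the process, which is why many more such threads than hardware threads can be kept in flight cheaply.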

As described above, threads operate by sharing the resources of their process. The usual model is that each thread has its own stack and shares every other area with the other threads. In the information processing system 100 of this embodiment, PIN area resources (described later) are additionally allocated to the user-level threads so that the DMA transfers described later can be performed.

図６は、ノード１１０が有する物理アドレス空間６１０と、ノード１１０上で動作するプロセス４１０が有する仮想アドレス空間６２０との対応を記した説明図である。物理アドレス空間６１０に配置される領域は、基本的にはノード１１０の物理的な構成部材と一対一に対応している。ＤＲＡＭ領域６１１はＤＩＭＭ２３０、２４０に対応している。また、ＭＭＩＯ（ＭｅｍｏｒｙＭａｐｐｅｄＩｎｐｕｔ／Ｏｕｔｐｕｔ）領域６１２は、前述したＭＭＲ３１１が配置される領域である。これらの領域はいずれもページと呼ばれる単位で管理される。一般的には１ページが４ＫＢの大きさである。 FIG. 6 is an explanatory diagram showing the correspondence between the physical address space 610 included in the node 110 and the virtual address space 620 included in the process 410 operating on the node 110. The area arranged in the physical address space 610 basically corresponds one-to-one with the physical components of the node 110. The DRAM area 611 corresponds to the DIMMs 230 and 240. An MMIO (Memory Mapped Input / Output) area 612 is an area where the MMR 311 described above is arranged. All of these areas are managed in units called pages. In general, one page is 4 KB in size.

プロセス４１０の仮想アドレス空間６２０は、ｔｅｘｔ領域６２１、ｄａｔａ領域６２２、ｍｍｉｏ領域６２３、およびｓｔａｃｋ領域６２４に大別できる。本実施例のプロセス４１０では、それらに加えてさらにＰＩＮ領域プール５１６を有する。 The virtual address space 620 of the process 410 can be roughly divided into a text area 621, a data area 622, a mmio area 623, and a stack area 624. In addition to these, the process 410 of this embodiment further has a PIN area pool 516.

仮想アドレス空間６２０内の各領域に関して、図５に示すプロセス４１０の内部構造と、図６に示すプロセス４１０の仮想アドレス空間６２０を対応させながら以下に説明する。 Each area in the virtual address space 620 will be described below by associating the internal structure of the process 410 shown in FIG. 5 with the virtual address space 620 of the process 410 shown in FIG.

The process 410 holds the various resources shared among user-level threads as inter-thread shared resources 510. These comprise program code 511, global variables 512, a heap area 513, thread management information 514, NVM transfer request management information 515, and a PIN area pool 516. The program code 511 is the instruction sequence of the program that the threads execute, and it is placed in the text area 621 of the virtual address space 620. The global variables 512 are variables used in common by any subroutine or thread running in the process 410, and they are placed in the data area 622. The heap area 513 is the resource pool from which the program allocates memory dynamically, and it is placed in the data area 622. The thread management information 514, described in detail later, stores the per-thread information needed to manage threads. It is used mainly by the master control thread 550, but because user-level threads must also be able to access it, it has the same properties as the global variables 512 and is placed in the data area 622. The NVM transfer request management information 515 is used to manage the DMA transfers described later and, for the same reason as the thread management information 514, is placed in the data area 622. The stack area provides the stacks used for local variables and for passing subroutine parameters, and it is divided among the threads as described later.

In FIG. 6, the correspondence between the physical address space 610 and the virtual address space 620 is indicated by broken lines. As FIG. 6 shows, the two address spaces are mapped to each other in units of pages, but note that some pages exist in the virtual address space 620 without a corresponding page in the physical address space 610. This is the mechanism behind virtual memory: a page that exists in the virtual address space 620 is not necessarily resident in DRAM and may have been paged out to an HDD or SSD. When such a page is accessed, the MMU raises a page fault exception, and the operating system reads the saved page back from the HDD or SSD and pages it in. Thus, in the information processing system 100, which uses virtual memory, a memory area (page) that exists from the process's point of view does not necessarily exist in physical memory. The effect of this property on DMA transfer is described later.
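The page-granular mapping and the page fault case can be illustrated with a minimal Python sketch. The dictionary-based page table, the `PageFault` exception, the `translate` function, and the example mapping are all assumptions made for illustration; in a real system the MMU performs the translation in hardware and the operating system handles the fault.

```python
PAGE_SIZE = 4 * 1024  # 4 KB pages, the typical size mentioned in the text


class PageFault(Exception):
    """Raised when a virtual page has no physical page behind it."""


def translate(page_table, vaddr):
    """Translate a virtual address with a {virtual page: physical page} table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    ppn = page_table.get(vpn)
    if ppn is None:
        # In a real system the OS would now page the data in from HDD or SSD.
        raise PageFault(f"virtual page {vpn} is not resident")
    return ppn * PAGE_SIZE + offset


# Hypothetical mapping: virtual page 0 is resident, virtual page 1 is paged out.
page_table = {0: 7, 1: None}
resident = translate(page_table, 0x0123)
```

The offset within the page is preserved by the translation; only the page number changes.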

FIG. 7 shows how the PIN area pool 516 and the stack area 624 in the virtual address space 620 relate to threads. Both are divided among the threads that use them. However, whereas the stack area 624 always contains a stack for every thread of the process 410 (the master control thread and all user-level threads), the PIN area pool 516 contains areas for only some of the threads, depending on its size. This is because the stack area 624 can grow as far as the virtual address space allows by relying on the virtual memory mechanism, whereas the PIN area pool 516, for reasons described later, is provisioned only to the extent that pages backed by the DRAM area 611 of the physical address space 610 can be secured.

以下に、情報処理システム１００のユーザレベルマルチスレッディングとＤＭＡ転送を連携させる機構について説明する。ユーザレベルマルチスレッディングとＤＭＡ転送の連携により、例えば、背景技術で述べたような大規模なグラフ処理を高速化できる。 Hereinafter, a mechanism for linking user-level multithreading and DMA transfer in the information processing system 100 will be described. By linking user-level multithreading and DMA transfer, for example, large-scale graph processing as described in the background art can be accelerated.

ここで、ユーザレベルマルチスレッディングとは、プロセス４１０の中で複数のスレッド（ユーザレベルスレッド）を切替えながら動作させていくことを言う。スレッド切替えに必要な処理がプロセス４１０の中で完結するので、カーネルレベルスレッドの切替えよりも高速である。一方で、ユーザレベルマルチスレッディングでは、スレッドの管理もプロセス４１０の中で行われる。本実施例の情報処理システム１００では、マスターコントロールスレッド５５０がスレッドの管理の役割を担う。 Here, the user level multithreading means that a plurality of threads (user level threads) are operated while being switched in the process 410. Since processing necessary for thread switching is completed in the process 410, the processing is faster than kernel level thread switching. On the other hand, in user level multithreading, thread management is also performed in the process 410. In the information processing system 100 of this embodiment, the master control thread 550 plays a role of thread management.
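As a rough illustration of switching threads without entering the kernel, the following Python sketch uses generators as stand-ins for user-level threads. This is an analogy rather than the mechanism of the embodiment: the embodiment saves and restores processor registers, whereas here each `yield` simply returns control to a scheduler loop that plays the role of the master control thread. The names `worker` and `run_all` are hypothetical.

```python
from collections import deque


def worker(name, steps, log):
    """A user-level thread body; each yield hands control back to the scheduler."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield


def run_all(threads):
    """Round-robin scheduler standing in for the master control thread."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)          # resume the thread until its next yield
            ready.append(thread)  # still runnable: back to the end of the queue
        except StopIteration:
            pass                  # the thread has finished


log = []
run_all([worker("A", 2, log), worker("B", 3, log)])
```

All switching happens inside one process, so no system call is involved in moving from one worker to the next.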

FIG. 8 shows the queues that the master control thread 550 uses to manage user-level threads and DMA transfers. These queues are held in memory as the thread management information 514 and the NVM transfer request management information 515.

The thread management information 514 includes a READY queue 810, an IOWAIT queue 811, an NVMWAIT queue 812, and a FIN queue 813. Each entry enqueued in these queues is a thread management cell 900, shown in FIG. 9. The thread management cell 900 consists of a Valid flag 901, a thread ID 902, a thread state 903, a saved context 904, a saved stack pointer 905, a saved program counter 906, a buffer request flag 907, a buffer request size 908, a buffer allocation flag 909, and a buffer area head address 910.

Ｖａｌｉｄフラグ９０１は、当該スレッド管理セル９００が有効であるかどうかを示すフラグである。スレッドＩＤ９０２は、スレッドを一意に識別するための識別子であり、後述するＤＭＡ転送とスレッドのスケジューリングを連携させるという本発明の特徴となる動作を実現させるために用いられる。スレッド状態９０３は、スレッドが現在どのような状態にあるかを示すための情報である。スレッドの状態に関しては後で詳しく説明する。 The Valid flag 901 is a flag indicating whether or not the thread management cell 900 is valid. The thread ID 902 is an identifier for uniquely identifying a thread, and is used to realize an operation that is a feature of the present invention in which DMA transfer and thread scheduling described later are linked. The thread state 903 is information for indicating what state the thread is currently in. The thread state will be described in detail later.

The saved context 904, saved stack pointer 905, and saved program counter 906 hold the information needed to execute the thread; they are saved from the registers of the processors 210 and 220 into the thread management cell 900 when the thread is stopped. The buffer request flag 907, buffer request size 908, buffer allocation flag 909, and buffer area head address 910 are used for the DMA transfers described later, and their details are given later.
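The fields of the thread management cell 900 can be pictured as a simple record. The Python sketch below is illustrative only: the field types, the `ThreadState` member names, and the `save` helper are assumptions, since the text gives only the field names and reference numerals.

```python
from dataclasses import dataclass, field
from enum import Enum


class ThreadState(Enum):
    # State names are assumed from the queue names used in the text.
    READY = "READY"
    RUN = "RUN"
    IOWAIT = "IOWAIT"
    NVMWAIT = "NVMWAIT"
    FIN = "FIN"


@dataclass
class ThreadManagementCell:
    valid: bool = False                                # Valid flag 901
    thread_id: int = 0                                 # thread ID 902
    state: ThreadState = ThreadState.READY             # thread state 903
    saved_context: dict = field(default_factory=dict)  # saved context 904
    saved_sp: int = 0                                  # saved stack pointer 905
    saved_pc: int = 0                                  # saved program counter 906
    buffer_requested: bool = False                     # buffer request flag 907
    buffer_request_size: int = 0                       # buffer request size 908
    buffer_allocated: bool = False                     # buffer allocation flag 909
    buffer_head_address: int = 0                       # buffer area head address 910

    def save(self, registers, sp, pc):
        """Record the register state when the thread is stopped (hypothetical helper)."""
        self.saved_context = dict(registers)
        self.saved_sp = sp
        self.saved_pc = pc


cell = ThreadManagementCell(valid=True, thread_id=1)
cell.save({"r0": 42}, sp=0x7FFF0000, pc=0x400123)
```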

ＲＥＡＤＹキュー８１０は、実行可能なユーザレベルスレッドのスレッド管理セル９００がエンキューされている。マスターコントロールスレッド５５０は、プロセス４１０が有するカーネルレベルスレッドに空きがある場合、または、他のユーザレベルスレッドが停止する場合に、ＲＥＡＤＹキュー８１０からスレッド管理セル９００をデキューする。そして、デキューしたスレッド管理セル９００に含まれる退避コンテキスト９０４、退避スタックポインタ９０５、および退避プログラムカウンタ９０６を用いてコンテキスト切替えを行い、スレッドの実行を開始する。 In the READY queue 810, a thread management cell 900 of an executable user level thread is enqueued. The master control thread 550 dequeues the thread management cell 900 from the READY queue 810 when a kernel level thread included in the process 410 is free or when another user level thread is stopped. Then, context switching is performed using the save context 904, save stack pointer 905, and save program counter 906 included in the dequeued thread management cell 900, and the execution of the thread is started.

実行中（ＲＵＮ）でもなく、実行可能状態（ＲＥＤＡＹ）でも無いスレッドは、何らかの待ち状態にある。マスターコントロールスレッド５５０は、その待ち状態を管理するために、ＩＯＷＡＩＴキュー８１１とＮＶＭＷＡＩＴキュー８１２を用いる。特に、ＮＶＭＷＡＩＴキュー８１２を有することが、本実施例の情報処理システム１００の特徴である。 A thread that is neither executing (RUN) nor being ready (READY) is in some waiting state. The master control thread 550 uses the IOWAIT queue 811 and the NVMWAIT queue 812 to manage the waiting state. In particular, having the NVMWAIT queue 812 is a feature of the information processing system 100 of the present embodiment.

The IOWAIT queue 811 holds threads that are waiting for the completion of I/O requested from the operating system through a system call. When a user-level thread requests I/O that relies on an operating system function, such as file access, it issues a system call to pass the request to the operating system. If the issuing thread then has nothing to do until the system call completes, that is, if it will resume only after the system call has completed, the master control thread 550 moves the thread to the IOWAIT queue 811. When the system call completes, the master control thread 550 moves the thread to the READY queue 810.

The NVMWAIT queue 812 holds threads that are waiting for the completion of a DMA transfer between the main memory and the non-volatile memory (NVM) 320. In the use case envisaged for the information processing system 100 of this embodiment, for example large-scale graph processing, a thread is executed for each of a very large number of vertices. Because the number of threads is then enormous, it is difficult to keep the data needed by all threads in main memory at once. The information processing system 100 therefore stores the data in the NVM 320 and brings it into main memory by DMA transfer when a thread is executed. Whereas the main memory consists of DRAM, the NVM 320 consists of memory such as flash memory or phase-change memory, which provides large capacity at lower cost than DRAM. This DMA transfer also takes time, however, so if the thread that requested the data has nothing else it can execute in the meantime, the master control thread 550 moves that thread to the NVMWAIT queue 812. When the DMA transfer completes, the master control thread 550 moves the thread from the NVMWAIT queue 812 to the READY queue 810.

The FIN queue 813 collects thread management cells 900 that are no longer needed because their threads have finished executing. One thread management cell 900 exists per thread. Consequently, when a large number of threads are repeatedly created and destroyed, as in dynamic large-scale graph processing, allocating memory from the heap area 513 for each new cell and returning it to the heap area 513 each time would incur a large overhead. The master control thread 550 therefore collects used thread management cells 900 in the FIN queue 813 and reuses them from there as needed.

FIG. 10 summarizes the thread states described so far as a state transition diagram. At (1) in FIG. 10, when a new thread is created, the master control thread 550 enqueues the new thread's thread management cell 900 in the READY queue 810. At (2) in FIG. 10, when a resource for executing a thread becomes free, the master control thread 550 dequeues the thread management cell 900 at the head of the READY queue 810 and starts executing that thread, which then enters the running (RUN) state. A resource becomes free either when a user-level thread that was running stops (its cell is moved to the READY queue 810, the IOWAIT queue 811, the NVMWAIT queue 812, or the FIN queue 813) or when a kernel-level thread has no user-level thread assigned to it.

At (3) in FIG. 10, when a running thread voluntarily gives up its resource (yield), the master control thread 550 suspends the thread and enqueues it in the READY queue 810. At (4) in FIG. 10, when a running thread issues a system call, for example for file access, and must wait for the system call to complete, the master control thread 550 suspends the thread and enqueues it in the IOWAIT queue 811.

At (5) in FIG. 10, when a running thread must wait for the completion of a DMA transfer between the NVM 320 and the main memory, the master control thread 550 suspends the thread and enqueues it in the NVMWAIT queue 812.

At (6) in FIG. 10, when the master control thread 550 detects that the system call awaited since the thread was enqueued in the IOWAIT queue 811 at (4) has completed, it moves the thread from the IOWAIT queue 811 to the READY queue 810. Transition (7) in FIG. 10 is, like (6), the action taken when an awaited event actually completes: when the master control thread 550 detects that the DMA transfer awaited since the thread was enqueued in the NVMWAIT queue 812 at (5) has completed, it moves the thread from the NVMWAIT queue 812 to the READY queue 810. Through (6) and (7), a thread that was waiting for a system call or a DMA transfer becomes executable again once that system call or DMA transfer has completed, and it then waits to be scheduled and started. The method for detecting DMA transfer completion is described later.

When a thread finishes executing, or when its execution is aborted partway, the master control thread 550 moves the thread management cell 900 of that thread to the FIN queue 813 (corresponding to (8) and (9) in FIG. 10). A thread can also be aborted while it is in the READY queue 810, the IOWAIT queue 811, or the NVMWAIT queue 812; in these cases too, the master control thread 550 moves the aborted thread's thread management cell 900 to the FIN queue 813. In addition, if a PIN area (described later) had been allocated to the finished or aborted thread, the master control thread 550 cancels the allocation, that is, it frees the memory space that had been allocated as the PIN area.
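The transitions of FIG. 10 can be written down as a small lookup table. The following Python sketch is one possible encoding, not the embodiment's implementation: the event names (`dispatch`, `dma_wait`, and so on), the `NEW` pseudo-state, and the `step` function are hypothetical, and the text does not say which of (8) and (9) denotes completion and which denotes abort.

```python
# Allowed transitions, keyed by (current state, event).
TRANSITIONS = {
    ("NEW", "create"): "READY",           # (1) thread creation
    ("READY", "dispatch"): "RUN",         # (2) an execution resource is free
    ("RUN", "yield"): "READY",            # (3) voluntary yield
    ("RUN", "syscall_wait"): "IOWAIT",    # (4) waiting for a system call
    ("RUN", "dma_wait"): "NVMWAIT",       # (5) waiting for a DMA transfer
    ("IOWAIT", "syscall_done"): "READY",  # (6) system call completed
    ("NVMWAIT", "dma_done"): "READY",     # (7) DMA transfer completed
    ("RUN", "finish"): "FIN",             # normal completion
}
# Abort can happen while running or while sitting in any of the queues.
for _state in ("RUN", "READY", "IOWAIT", "NVMWAIT"):
    TRANSITIONS[(_state, "abort")] = "FIN"


def step(state, event):
    """Return the next state, or raise ValueError for an invalid transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state!r}") from None


state = "NEW"
for event in ("create", "dispatch", "dma_wait", "dma_done", "dispatch", "finish"):
    state = step(state, event)
```

The example walks one thread through creation, a DMA wait, a second dispatch, and completion.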

Thread management cells 900 in the FIN queue 813 are freed by the master control thread 550 as appropriate when the heap area 513 runs short. When a new thread is created (operation (1) in FIG. 10), cells in the FIN queue 813 are used in preference to new allocations. If there are still not enough thread management cells 900, the master control thread 550 allocates space for them from the heap area 513.
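The reuse policy for thread management cells resembles a free list. A minimal Python sketch follows, assuming a hypothetical `CellPool` class in which plain dictionaries stand in for cells and a counter stands in for heap allocations.

```python
from collections import deque


class CellPool:
    """Recycles finished cells before falling back to a fresh allocation."""

    def __init__(self):
        self.fin_queue = deque()
        self.fresh_allocations = 0

    def acquire(self, thread_id):
        if self.fin_queue:
            cell = self.fin_queue.popleft()  # prefer a cell from the FIN queue
        else:
            cell = {}                        # stands in for a heap allocation
            self.fresh_allocations += 1
        cell.clear()
        cell.update(valid=True, thread_id=thread_id)
        return cell

    def retire(self, cell):
        """Mark a cell unused and keep it for later reuse."""
        cell["valid"] = False
        self.fin_queue.append(cell)

    def trim(self):
        """Drop pooled cells, for example when the heap is running short."""
        self.fin_queue.clear()


pool = CellPool()
first = pool.acquire(1)
pool.retire(first)
second = pool.acquire(2)
```

Here the second thread receives the same object that the first thread used, so only one allocation takes place.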

The NVM transfer request management information 515 consists of a REQ queue 820, a WAIT queue 821, a COMPLETE queue 822, and a DISPOSE queue 823. Each entry enqueued in these queues is an NVM transfer request management cell 1100, shown in FIG. 11. The NVM transfer request management cell 1100 consists of a Valid flag 1101, a request source thread ID 1102, a thread management cell pointer 1103, a transfer direction 1104, a transfer state 1105, a transfer source address 1106, a transfer data length 1107, and a transfer destination address 1108.

Ｖａｌｉｄフラグ１１０１は、当該ＮＶＭ転送要求管理セル１１００が有効であるかどうかを示すフラグである。要求元スレッドＩＤ１１０２は、当該ＮＶＭ転送要求を発生させたスレッドのスレッドＩＤを格納するものである。スレッド管理セルポインタ１１０３は、当該ＮＶＭ転送要求を発生させたスレッドを管理するスレッド管理セル９００へのポインタである。つまり、このポインタを辿って得られたスレッド管理セル９００に格納されているスレッドＩＤ９０２は、要求元スレッドＩＤ１１０２と同一である。 The Valid flag 1101 is a flag indicating whether or not the NVM transfer request management cell 1100 is valid. The request source thread ID 1102 stores the thread ID of the thread that generated the NVM transfer request. The thread management cell pointer 1103 is a pointer to the thread management cell 900 that manages the thread that has generated the NVM transfer request. That is, the thread ID 902 stored in the thread management cell 900 obtained by tracing this pointer is the same as the request source thread ID 1102.

The transfer direction 1104 specifies the direction of the NVM transfer as either load or store. A load moves data toward the CPU: data is read from the NVM 320 into the main memory (equivalently, data stored in the NVM 320 is written to the main memory). A store moves data away from the CPU: data is read from the main memory into the NVM 320 (equivalently, data stored in the main memory is written to the NVM 320). The transfer state 1105 indicates the current state of the NVM transfer; the states are described in detail later.

The transfer source address 1106 is the source address of the DMA transfer performed for the NVM transfer. When the transfer direction 1104 is load (from the NVM 320 to the main memory), the transfer source address 1106 is an identifier used by the NVM 320. This identifier may belong to an address space dedicated to the NVM 320 that is separate from the main memory address spaces (physical and virtual). In general, the main memory address space is sized according to the amount of DRAM expected to be installable in computers of the day; even today's 64-bit processors, for example, often implement only about 48 bits of address space, reflecting the amount of DRAM that is realistically affordable. This makes it difficult to map a large-capacity NVM into the main memory address space. When the transfer direction 1104 is store (from the main memory to the NVM 320), the transfer source address 1106 is an address in the main memory address space and, for reasons described later, is specified as a physical address.

転送データ長１１０７は、ＮＶＭ転送で行うＤＭＡ転送の転送長を指定する。転送先アドレス１１０８は、ＮＶＭ転送で行うＤＭＡ転送の転送先となるアドレスである。転送元アドレス１１０６と同様に、転送方向１１０４に応じて、ＮＶＭ３２０で用いている識別子を指定するか、物理アドレスを指定するかのどちらかで利用される。 The transfer data length 1107 designates the transfer length of DMA transfer performed by NVM transfer. The transfer destination address 1108 is an address that is a transfer destination of a DMA transfer performed by NVM transfer. Similar to the transfer source address 1106, an identifier used in the NVM 320 or a physical address is specified according to the transfer direction 1104.
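How the transfer direction determines which side receives the NVM identifier and which side receives the physical address can be sketched as follows. The Python class, the `make_request` helper, the `"REQ"` state string, and the example addresses are assumptions for illustration; only the field names and reference numerals come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    LOAD = "load"    # NVM 320 -> main memory
    STORE = "store"  # main memory -> NVM 320


@dataclass
class NvmTransferRequestCell:
    valid: bool                # Valid flag 1101
    requester_thread_id: int   # request source thread ID 1102
    thread_cell: dict          # thread management cell pointer 1103
    direction: Direction       # transfer direction 1104
    state: str                 # transfer state 1105
    src_address: int           # transfer source address 1106
    length: int                # transfer data length 1107
    dst_address: int           # transfer destination address 1108


def make_request(thread_cell, direction, nvm_id, phys_addr, length):
    """Build a request; the direction decides which address is the source."""
    if direction is Direction.LOAD:
        src, dst = nvm_id, phys_addr
    else:
        src, dst = phys_addr, nvm_id
    return NvmTransferRequestCell(
        True, thread_cell["thread_id"], thread_cell, direction, "REQ", src, length, dst
    )


thread_cell = {"thread_id": 7}  # a dictionary stands in for thread management cell 900
load = make_request(thread_cell, Direction.LOAD, nvm_id=0x10_0000, phys_addr=0x2000, length=4096)
store = make_request(thread_cell, Direction.STORE, nvm_id=0x10_0000, phys_addr=0x2000, length=4096)
```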

ＮＶＭ転送要求の状態遷移に関して、図１２の状態遷移図と、図８のキュー構成を対比させながら説明する。 The state transition of the NVM transfer request will be described while comparing the state transition diagram of FIG. 12 with the queue configuration of FIG.

First, (1) the access request from a thread and (2) the DMA transfer request to the NVM subsystem in FIG. 12 are described. When a thread needs a DMA transfer between the main memory and the NVM 320, the master control thread 550 creates an NVM transfer request management cell 1100 and enqueues it in the REQ queue 820. To start a DMA transfer, a command requesting the start of the transfer must be written to the MMR 311. The NVM subsystem 300 may be provided with multiple sets of MMRs 311 to support multiplexed DMA transfer, in which several DMA transfers proceed concurrently for further speed. When the NVM subsystem 300 has DMA transfer capacity available, the master control thread 550 dequeues an NVM transfer request management cell 1100 from the REQ queue 820 and starts the DMA transfer by writing a command to the MMR 311. The master control thread 550 then enqueues the cell 1100, whose DMA transfer is now in progress, in the WAIT queue 821 and waits for the transfer to complete.

Next, (3) the DMA transfer completion notification from the NVM subsystem, (4) the access completion notification to the thread, and (5) the release and reuse of the NVM access management cell in FIG. 12 are described. Completion of a DMA transfer can be reported by an interrupt from the NVM subsystem 300 to the processors 210 and 220. Alternatively, the master control thread 550 can learn of completion by polling the MMR 311. With interrupts, however, the interrupt is received by an interrupt handler belonging to the operating system, which requires a switch to kernel space and incurs a large overhead. In this embodiment, therefore, the MMR 311 provides a flag indicating that a DMA transfer has completed, and the master control thread 550 polls this flag. Moreover, particularly when several DMA transfers run concurrently, even detecting each completion individually adds overhead. In the information processing system 100 of this embodiment, the master control thread 550 therefore polls the MMR 311 at intervals that do not interfere with the user-level threads or with scheduling, in other words when it has no other work to do, and detects the completion of multiple DMA transfers in a single poll.
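One way to detect several completions with a single poll is to read one status word in which each bit reports one in-flight transfer. The text does not specify the layout of the completion flags in the MMR 311, so the bit-per-channel encoding and the function name in this Python sketch are assumptions.

```python
def completed_channels(flag_word, in_flight):
    """Return the in-flight channel numbers whose completion bit is set."""
    return [ch for ch in sorted(in_flight) if flag_word & (1 << ch)]


# One simulated register read in which bits 0 and 3 are set.
flag_word = 0b1001
done = completed_channels(flag_word, in_flight={0, 1, 3})
```

A single read of `flag_word` yields every finished channel, so the cost of polling does not grow with the number of concurrent transfers.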

An NVM transfer request management cell 1100 whose DMA transfer has been detected as complete is dequeued from the WAIT queue 821 by the master control thread 550 and enqueued in the COMPLETE queue 822. Note that DMA transfers do not necessarily complete in the order in which their cells were enqueued in the WAIT queue 821, so the WAIT queue 821 is not necessarily accessed in FIFO order.

From the standpoint of managing DMA transfers, an NVM transfer request management cell 1100 enqueued in the COMPLETE queue 822 is no longer needed, because its DMA transfer has already completed. In the information processing system 100 of this embodiment, however, the cells 1100 in the COMPLETE queue 822 are used to move the corresponding thread management cells 900 from the NVMWAIT queue 812 to the READY queue 810. That is, according to the completion status of a DMA transfer, the thread that caused the DMA transfer request is moved from the state of waiting for DMA completion to the executable state. This linkage of DMA transfer and multithreading is the characteristic operation of this embodiment.

マスターコントロールスレッド５５０は、定期的にＣＯＭＰＬＥＴＥキュー８２２を監視し、ＣＯＭＰＬＥＴＥキュー８２２にＮＶＭ転送要求管理セル１１００が存在する場合にはそれをデキューして、要求元スレッドＩＤ１１０２をキーとして、ＮＶＭＷＡＩＴキュー８１２の中から対応するスレッド管理セル９００を探す。マスターコントロールスレッド５５０は、対応するスレッド管理セル９００を見つけたら、該スレッド管理セル９００をＲＥＡＤＹキュー８１０にエンキューすると共に、ＮＶＭ転送要求管理セル１１００をＤＩＳＰＯＳＥキュー８２３にエンキューする。なお、ＤＩＳＰＯＳＥキュー８２３は、ＦＩＮキュー８１３と同様に、使用済みのＮＶＭ転送要求管理セル１１００の再利用を行うためのものである。 The master control thread 550 periodically monitors the COMPLETE queue 822. If the NVM transfer request management cell 1100 is present in the COMPLETE queue 822, the master control thread 550 dequeues it, and uses the request source thread ID 1102 as a key, in the NVMWAIT queue 812. The corresponding thread management cell 900 is searched from among them. When the master control thread 550 finds the corresponding thread management cell 900, the master control thread 550 enqueues the thread management cell 900 into the READY queue 810 and enqueues the NVM transfer request management cell 1100 into the DISPOSE queue 823. The DISPOSE queue 823 is used for reusing the used NVM transfer request management cell 1100, similarly to the FIN queue 813.
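The movement of a request through the REQ, WAIT, COMPLETE, and DISPOSE queues, together with the wake-up of the waiting thread, can be simulated in a few lines of Python. This is a simplified model under stated assumptions: the `Master` class and its method names are hypothetical, dictionaries stand in for the cells, each thread has at most one outstanding request, and the NVMWAIT queue is modeled as a dictionary keyed by thread ID rather than searched as a queue.

```python
from collections import deque


class Master:
    def __init__(self, dma_slots):
        self.dma_slots = dma_slots   # number of concurrent DMA transfers
        self.req, self.complete, self.dispose = deque(), deque(), deque()
        self.wait = []               # not a FIFO: completions arrive in any order
        self.nvmwait, self.ready = {}, deque()

    def request(self, thread_id):
        """Park the thread in NVMWAIT and queue its transfer request."""
        self.nvmwait[thread_id] = {"thread_id": thread_id}
        self.req.append({"requester": thread_id})

    def issue(self):
        """Start transfers while DMA capacity is available."""
        while self.req and len(self.wait) < self.dma_slots:
            self.wait.append(self.req.popleft())

    def poll(self, finished_ids):
        """Move every finished transfer from WAIT to COMPLETE."""
        for cell in [c for c in self.wait if c["requester"] in finished_ids]:
            self.wait.remove(cell)
            self.complete.append(cell)

    def wake(self):
        """Make the requesting threads runnable and recycle the request cells."""
        while self.complete:
            cell = self.complete.popleft()
            self.ready.append(self.nvmwait.pop(cell["requester"]))
            self.dispose.append(cell)


m = Master(dma_slots=2)
for tid in (1, 2, 3):
    m.request(tid)
m.issue()    # threads 1 and 2 get DMA slots; thread 3 stays in REQ
m.poll({2})  # the transfer for thread 2 finishes first
m.wake()
```

After these calls only thread 2 is in the READY queue, even though thread 1 issued its request earlier.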

FIG. 13 is an explanatory diagram showing how the threads in the process 410 start and stop under the cooperation between user-level multithreading and DMA transfer described so far. The thread life cycle and its cooperation with DMA transfer are described below with reference to FIG. 13.

The operating system allocates kernel level threads 1-1 to 1-5 to the process 410. The process 410 can assign any thread to kernel level threads 1-1 to 1-5 and execute it. In the information processing system 100 of this embodiment, the master control thread 550 is permanently assigned to kernel level thread 1-1. When the process 410 starts, it runs the master control thread 550 on kernel level thread 1-1.

For example, in large-scale graph processing, the master control thread 550 creates a thread for each vertex of the graph. In the example of FIG. 13, user level threads A, B, C, D, E, F, and G are such threads, and the master control thread 550 first starts user level threads A, D, F, and G on kernel level threads 1-2 to 1-5, respectively.

Following the execution of the threads in time order, at (1) user level thread A asks the master control thread 550 for a DMA transfer from the NVM 320 and also declares that it has no other processing it can perform until that DMA transfer finishes. The master control thread 550 generates the corresponding NVM transfer request management cell 1100 and enqueues it in the REQ queue 820, suspends user level thread A, saves its context in a thread management cell 900, and enqueues that cell in the NVMWAIT queue 812. Kernel level thread 1-2 now has no thread to execute, so the master control thread 550 dequeues one thread management cell 900 from the READY queue 810 and executes that thread on kernel level thread 1-2 (in FIG. 13, the master control thread 550 starts executing user level thread B).

Next, at (2), user level thread D, running on kernel level thread 1-3, issues a DMA transfer request to the master control thread 550 as in (1). In case (2), however, user level thread D declares to the master control thread 550 that it has other processing it can perform without waiting for the DMA transfer to complete, so the master control thread 550 lets user level thread D continue executing without suspending it. This case arises, for example, when a user level thread DMA-transfers data it will need later and then performs other processing in the meantime. In effect, the DMA transfer prefetches data from the NVM 320.
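The difference between cases (1) and (2) can be sketched as a single request handler. This is an illustrative sketch only: the function name `request_dma`, the `must_wait` parameter, and the representation of threads as plain IDs in Python `deque` queues are assumptions for the example, and context saving is omitted.

```python
from collections import deque


def request_dma(thread_id, must_wait, req_q, nvmwait_q, ready_q):
    """Handle a DMA request from a user level thread.

    Returns the ID of the thread that runs next on the caller's
    kernel level thread, or None if that kernel level thread goes idle.
    """
    # In both cases a transfer request is queued (REQ queue 820 in the text).
    req_q.append({"requester": thread_id})

    if not must_wait:
        # Case (2): the thread has other work, so it keeps running and
        # the transfer behaves like a prefetch.
        return thread_id

    # Case (1): the thread cannot proceed, so it is parked in NVMWAIT
    # and the kernel level thread is handed to the next READY thread.
    nvmwait_q.append(thread_id)
    return ready_q.popleft() if ready_q else None
```

With `ready_q` holding thread B, a blocking request from thread A returns B as the next thread to run, while a non-blocking request from thread D returns D itself.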

After that, user level threads B and D are each suspended, and execution of the next threads queued in the READY queue 810 begins. In the example of FIG. 13, user level thread B is scheduled to run again after user level thread C completes, and user level thread D enters a wait for completion of the DMA transfer it requested earlier. Execution of user level threads C and E then starts. During execution of user level thread C, completion of the DMA transfer requested at (1) is notified at (3), so user level thread A is enqueued in the READY queue 810. User level thread A is scheduled after user level thread C finishes (or is suspended) and starts running on kernel level thread 1-2.

On kernel level thread 1-4, user level thread F voluntarily yields its resources and stops at (4). Kernel level thread 1-4 then remains idle for a while with no thread to process, after which user level thread B is scheduled again. As described above, user level thread B was to be rescheduled after user level thread C completed, but kernel level thread 1-2, which it originally used, has been taken by user level thread A, so it runs on kernel level thread 1-4 instead. Because all threads in a process share the virtual address space, kernel level threads 1-1 to 1-5 can be used as equals, much like the processors of an SMP computer.

FIG. 14 is a conceptual diagram showing the relationship between thread contexts (thread management information 514) and DMA transfer of data in this embodiment. The data a thread processes is transferred by DMA from the NVM 320 to main memory via the PIN area pool 516, and the context needed to run the thread is loaded from main memory, where it was saved, into a hardware thread (kernel level threads correspond one-to-one to hardware threads).

FIG. 15 is a flowchart showing the scheduling operation of the master control thread 550 in this embodiment. In step S1501, the master control thread 550 starts scheduling. Scheduling is triggered when an idle kernel level thread is detected. Next, in step S1502, the master control thread 550 determines whether a thread management cell 900 can be dequeued from the READY queue 810. If not, scheduling ends at that point and the master control thread 550 waits for the next scheduling trigger. If so, in step S1503 the master control thread 550 dequeues a thread management cell 900 from the READY queue 810.

Then, in step S1504, the master control thread 550 determines whether a PIN area can be allocated from the PIN area pool 516. A feature of the information processing system 100 of this embodiment is that the thread scheduling shown in FIG. 15 includes this determination of whether a PIN area can be allocated.

In the information processing system 100 of this embodiment, DMA transfer is performed per thread. Because DMA transfer takes place without the involvement of the processor or the operating system, it is basically carried out within the physical address space. That is, because the virtual address space is managed by the operating system, a DMA transfer cannot use virtual addresses. A DMA transfer therefore requires that the target area of main memory reside in physical memory and be placed in the physical address space. For this reason, a process that performs DMA transfer fixes part of its virtual address space to a DRAM area of the physical address space so that virtual memory does not page it out or in. This area, fixed to memory that physically exists, is the PIN area, and in this embodiment it is provided to the process as the PIN area pool 516. The user can set the size of the PIN area pool 516 in advance.

One conceivable approach is to divide the PIN area pool 516 prepared for the process equally among the threads, but this works well only if the amount of data each thread needs is known in advance and every thread requests a PIN area of the same size.

The information processing system 100 of this embodiment therefore prepares the PIN area pool 516 for the process and manages the PIN area held by the process as a resource pool. When a thread is started, it declares the size of the PIN area it needs to the master control thread 550 through the buffer request flag 907 and the buffer request size 908 of its thread management cell, and it moves to execution after the master control thread 550 allocates a PIN area to it.

Accordingly, in step S1504 the master control thread 550 compares the current free space in the PIN area pool 516 with the amount of area requested by the thread about to be started and decides whether that thread can be started. The free space is reduced by whatever has been allocated from the PIN area pool 516 to threads currently executing. If the free space in the PIN area pool 516 is judged insufficient, in step S1506 the master control thread 550 places the thread at the tail of the READY queue 810 and gives priority to executing other threads. If the required area can be secured from the PIN area pool 516, in step S1505 the master control thread 550 allocates the PIN area to the declaring thread and performs the thread switch. The master control thread 550 notifies the declaring thread of the allocated area by writing to the buffer allocation flag 909 and the buffer area head address 910 of the thread management cell 900.
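Steps S1502 to S1506 can be sketched as one scheduling pass over the READY queue. This is a simplified illustration under stated assumptions: `ThreadCell`, `PinPool`, and `schedule_one` are hypothetical names, the pool tracks only a byte count rather than the buffer area head address 910, and the sketch returns after requeuing a thread instead of looping, since the text does not spell out that part of the flowchart.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ThreadCell:
    """Hypothetical stand-in for a thread management cell 900."""
    thread_id: str
    buffer_request_size: int = 0    # cf. buffer request size 908
    buffer_allocated: bool = False  # cf. buffer allocation flag 909


class PinPool:
    """Counter-only model of the PIN area pool 516 (no addresses tracked)."""

    def __init__(self, capacity):
        self.free = capacity

    def try_allocate(self, size):
        if size > self.free:
            return False
        self.free -= size
        return True

    def release(self, size):
        self.free += size


def schedule_one(ready_q, pool):
    """Run one pass of S1502 to S1506; return the cell to execute, or None."""
    if not ready_q:                      # S1502: nothing can be dequeued
        return None
    cell = ready_q.popleft()             # S1503: dequeue from READY
    if pool.try_allocate(cell.buffer_request_size):   # S1504: enough PIN area?
        cell.buffer_allocated = True     # S1505: allocate and switch to the thread
        return cell
    ready_q.append(cell)                 # S1506: requeue at the tail
    return None
```

Calling `schedule_one` again after a `None` result tries the next READY thread, which is how a smaller request can overtake one that does not currently fit. A thread that finishes would call `pool.release` with its allocated size so that requeued threads can eventually start.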

Securing memory space in the physical address space of main memory according to the capacity that each thread scheduled for execution needs for DMA transfer makes it possible to handle variation in the PIN area sizes requested by the threads flexibly. This in turn enables efficient DMA transfer and speeds up processing in the information processing system 100.

This embodiment describes an example of an information processing system that can execute threads even when, as in still larger-scale graph processing, there are so many threads that their thread management cells 900 do not fit in main memory.

When the number of vertices becomes enormous, as in large-scale graph processing, the number of threads also becomes enormous. It then becomes impossible even to keep the thread management cells 900 of all threads in main memory. In this embodiment, therefore, the thread management cells 900 themselves are placed in the NVM 320, as shown in FIG. 16, and are DMA-transferred to main memory as needed. In addition, the master control thread 550 DMA-transfers a thread management cell 900 from the NVM 320 before it is needed; that is, the thread management cells 900 are prefetched. As shown in FIG. 16, thread contexts are DMA-transferred between main memory and the NVM to be saved or restored.

Thread management cells 900 are basically needed in the order in which they are enqueued in the READY queue 810, so it suffices to monitor the READY queue 810 and prefetch in that order. Alternatively, a separate DMA controller may be provided that holds the head address of the READY queue 810 and reads the READY queue 810 in parallel with the processor to perform the prefetch.
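Prefetching in READY-queue order can be sketched as follows. The sketch is only a model: `prefetch_cells`, the `depth` look-ahead parameter, and the dictionaries standing in for the NVM 320 and main memory are assumptions for the example, and a dictionary copy stands in for the actual DMA transfer.

```python
from collections import deque
from itertools import islice


def prefetch_cells(ready_q, nvm_store, cache, depth):
    """Bring the cells of the first `depth` READY entries into main memory.

    ready_q   -- thread IDs in the order they will be scheduled
    nvm_store -- dict modelling thread management cells held in NVM
    cache     -- dict modelling the cells already resident in main memory
    Returns the list of thread IDs fetched by this call.
    """
    fetched = []
    for thread_id in islice(ready_q, depth):
        if thread_id not in cache:
            # Stands in for a DMA transfer from NVM to main memory.
            cache[thread_id] = nvm_store[thread_id]
            fetched.append(thread_id)
    return fetched
```

The look-ahead depth trades main memory use against the chance that a cell is not yet resident when its thread reaches the head of the queue; the source does not specify how far ahead to prefetch.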

The invention made by the present inventors has been described above in concrete terms based on embodiments, but it goes without saying that the present invention is not limited to those embodiments and can be modified in various ways without departing from its gist.

100: information processing system, 110: node, 120: inter-node network, 130: NVM subsystem interconnect, 210, 220: processor, 230, 240: DIMM, 250: I/O hub, 260: NIC, 270: disk controller, 280: HDD, 290: SSD, 300: NVM subsystem, 310: hybrid memory controller, 320: NVM (Non-volatile Memory), 330: volatile memory, 311: MMR (Memory Mapped Register), 410, 420: process, 510: inter-thread shared resources, 511: program code, 512: global variables, 513: heap area, 514: thread management information, 515: NVM transfer request management information, 516: PIN area pool.

Claims

An information processing apparatus comprising:
a multi-thread processor;
a first storage device;
a second storage device that performs DMA transfer with the first storage device; and
an operating system that allocates a physical address space to the first storage device and provides a virtual address space on the physical address space,
wherein the information processing apparatus secures, on the physical address space, a memory space according to the capacity that a thread scheduled for execution requires for the DMA transfer,
moves the thread to execution, and
releases the secured memory space after processing of the thread is completed.

The information processing apparatus according to claim 1,
wherein an area of the physical address space used for the DMA transfer is set in advance, and
the memory space is secured from the part of the preset area that excludes the areas secured for executing threads.

The information processing apparatus according to claim 1,
wherein a thread that has been moved to execution and is waiting for completion of a DMA transfer is temporarily suspended.

The information processing apparatus according to claim 3,
wherein the context of the suspended thread is stored in the first storage device.

The information processing apparatus according to claim 3,
wherein the context of the suspended thread is stored in the second storage device.

The information processing apparatus according to claim 1,
wherein the first storage device is a main storage device.

The information processing apparatus according to claim 1,
wherein the first storage device comprises a volatile memory, and
the second storage device comprises a nonvolatile memory.

The information processing apparatus according to claim 7,
further comprising a hard disk drive.

The information processing apparatus according to claim 1,
wherein the multi-thread processor has a plurality of cores.

The information processing apparatus according to claim 1,
wherein the volatile memory is a DRAM, and
the nonvolatile memory is a flash memory.

The information processing apparatus according to claim 1,
wherein the volatile memory is a DRAM, and
the nonvolatile memory is a phase change memory.