JP7560765B2 - Calculation method, calculation device and program

Info

Publication number: JP7560765B2
Application number: JP2022564862A
Authority: JP
Inventors: 崇元佐々木; 隆一谷田; 英明木全
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc USA
Current assignee: NTT Inc; NTT Inc USA
Priority date: 2020-11-25
Filing date: 2020-11-25
Publication date: 2024-10-03
Anticipated expiration: 2040-11-25
Also published as: WO2022113180A1; JPWO2022113180A1

Description

The present invention relates to a calculation method, a calculation device, and a program.

A data structure called an embedded graph appears in fields such as networking, geographic information science, and image engineering. An embedded graph is a graph, that is, a set of vertices and the edges connecting them, defined together with the positions of its vertices. In a network, for example, a graph can be formed by mapping network nodes to vertices and network links to edges. Taking the physical locations of the nodes, or the node coordinates used when visualizing the network, as the vertex positions then yields an embedded graph.
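As a rough illustration of the definition above, an embedded graph can be held as a vertex-position array plus an edge list. The following is a minimal sketch only; the names `EmbeddedGraph`, `addVertex`, `addEdge`, and `makeClosedPolyline` are hypothetical and do not come from the source, and two-dimensional positions are assumed.

```cpp
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical container: a graph (edge list) together with vertex positions.
struct EmbeddedGraph {
    std::vector<std::array<double, 2>> positions;            // position of vertex i
    std::vector<std::pair<std::size_t, std::size_t>> edges;  // pairs of vertex indices

    // Appends a vertex at (x, y) and returns its index.
    std::size_t addVertex(double x, double y) {
        positions.push_back({x, y});
        return positions.size() - 1;
    }

    // Appends an edge between vertices u and v.
    void addEdge(std::size_t u, std::size_t v) { edges.emplace_back(u, v); }
};

// Builds a closed boundary line: vertices plotted along the boundary,
// each joined to the next, and the last joined back to the first.
inline EmbeddedGraph makeClosedPolyline(const std::vector<std::array<double, 2>>& pts) {
    EmbeddedGraph g;
    for (const auto& p : pts) g.addVertex(p[0], p[1]);
    for (std::size_t i = 0; i < pts.size(); ++i) g.addEdge(i, (i + 1) % pts.size());
    return g;
}
```

Simplification in the sense used here would then mean producing a second `EmbeddedGraph` with fewer entries in `positions` and `edges` that traces nearly the same shape.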

In geographic information science, boundary data such as national and prefectural borders, coastlines, and lake shorelines are generally stored as embedded graphs: vertices are plotted densely along a boundary and joined by edges to represent it. In image engineering, embedded graphs are used to store shape and region data. For example, object shapes in vector images, depth boundaries in depth maps, and subject shapes and texture-region shapes in natural images are all represented as embedded graphs, with vertices plotted on the boundary of each shape or region so that the edges trace the boundary.

For the embedded graphs described above, there is considerable benefit in reducing the number of vertices and edges as far as possible while preserving the shape of the graph as closely as possible (hereinafter called simplification of the embedded graph, or simply simplification).

In a network, more vertices and edges allow more detailed data to be expressed, but they also make it harder to grasp the overall network configuration at a glance, and the resulting data volume places a heavy load on rendering. If the embedded graph of the network can be simplified, the network configuration can be presented in a visually clear form, and the reduced vertex and edge counts compress the data and lighten the rendering load. In geographic information science, plotting vertices densely along a boundary represents it accurately, but the data volume becomes enormous.

Simplifying the boundary data reduces the amount of data to be transmitted and stored, and also produces geographic information with simplified boundaries that is easier to read. In image engineering, likewise, more vertices and edges preserve shape and region data in greater detail and represent it more accurately, but the data volume grows, and with it the code amount needed for transmission and storage.

If these shapes and regions can be simplified, the number of vertices and edges, and hence the code amount, can be reduced while keeping the representation as accurate as possible.
As an invention for achieving this simplification, an embedded graph simplification method has been proposed (see Patent Document 2). Fast processing methods for it have also been proposed, in a two-dimensional version (see Patent Document 3) and a multidimensional version (see Patent Document 4).

These fast processing methods employ Fast Multiple SVT (see Non-Patent Document 1), which computes many instances of singular value thresholding (SVT) (see Patent Document 4) quickly and in parallel. The algorithm is shown in FIG. 16. In the algorithms of FIGS. 16A and 16B, reciprocal square roots are computed for g^-1 and h^-1 on line 3, and a difference of square roots is computed for σ_2 on line 4. The reciprocal square root is the bottleneck in the computation speed of these algorithms.

Computing a difference of square roots is also prone to cancellation (loss of significant digits) and therefore to error. Moreover, in the algorithms of FIGS. 16A and 16B, a square root itself is needed only for the square-root difference on line 4; everywhere else, only the reciprocal square root is used.
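The cancellation problem can be made concrete with a small single-precision experiment. The sketch below is illustrative only and is not taken from the source: the function names are hypothetical, 32-bit IEEE 754 `float` arithmetic is assumed, and the second function uses the textbook remedy of rationalizing the numerator.

```cpp
#include <cmath>

// Direct evaluation: subtracts two nearly equal, already rounded square roots,
// so most of the significant digits cancel.
inline float sqrtDiffNaive(float A, float B) {
    const float sa = std::sqrt(A);
    const float sb = std::sqrt(B);
    return sa - sb;
}

// Rationalized evaluation: sqrt(A) - sqrt(B) = (A - B) / (sqrt(A) + sqrt(B)).
// The only subtraction is between the inputs themselves.
inline float sqrtDiffRationalized(float A, float B) {
    const float sa = std::sqrt(A);
    const float sb = std::sqrt(B);
    return (A - B) / (sa + sb);
}
```

For A = 1000001 and B = 1000000, both square roots are close to 1000, where adjacent `float` values are about 6e-5 apart, while the true difference is only about 5e-4. The direct form therefore keeps only one or two correct digits, whereas the rationalized form stays close to full single precision.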

The reciprocal square root a = 1/√A of a positive real number A is computed repeatedly and in large quantities in computer graphics (CG), in the form of normal-vector calculations for simulating reflected and scattered light.

To reduce this computational cost, the Fast Inverse Square Root (FastInvSqrt) method was invented; by avoiding the square root calculation, it speeds up the reciprocal square root by a factor of several. The technique has been implemented in CG-based game software (see Non-Patent Document 2). A modified FastInvSqrt method has also been invented as a higher-precision version (see Non-Patent Document 3).

Adopting the FastInvSqrt method or the modified FastInvSqrt method for the reciprocal square roots in Fast Multiple SVT described above would be expected to improve computation speed.

Patent Document 1: JP 2016-221829 A
Patent Document 2: JP 2017-211706 A
Patent Document 3: JP 2018-082249 A
Patent Document 4: JP 2019-046196 A

Non-Patent Document 1: Takamoto Sasaki, Masaki Kitahara, Jun Shimizu, "Fast Singular Value Thresholding for Low-Rank Optimization," 16th Forum on Information Science and Technology, 2017.
Non-Patent Document 2: M. Robertson, "A Brief History of InvSqrt," Bachelor Thesis, University of New Brunswick, 2012.
Non-Patent Document 3: C. J. Walczyk, "A Modification of the Fast Inverse Square," MDPI Computation, vol. 7, no. 3, 2019.

However, Fast Multiple SVT also requires the difference of square roots, so the square roots must be computed after all, and a reciprocal must then be taken of each. The speed improvement described above is therefore not achieved.

Consequently, a graph simplification device using Fast Multiple SVT cannot process data at high speed, and graph drawing and graph processing cannot be executed quickly.

In view of the above circumstances, an object of the present invention is to provide a technique for performing calculations involving square roots faster.

One aspect of the present invention is a calculation method comprising: a first reciprocal square root step in which a computer calculates the reciprocal square root of a positive real number A by the fast reciprocal square root method; a second reciprocal square root step in which the computer calculates the reciprocal square root of a positive real number B by the fast reciprocal square root method; a subtraction step in which the computer subtracts B from A; a first multiplication step in which the computer multiplies the result of the first reciprocal square root step by the result of the second reciprocal square root step; a second multiplication step in which the computer multiplies the result of the first multiplication step by the result of the subtraction step; an addition step in which the computer adds the result of the first reciprocal square root step to the result of the second reciprocal square root step; and a division step in which the computer divides the result of the second multiplication step by the result of the addition step.

One aspect of the present invention is a program for causing a computer to execute the above calculation method.

One aspect of the present invention is a calculation device comprising: a first reciprocal square root unit that calculates the reciprocal square root of a positive real number A by the fast reciprocal square root method; a second reciprocal square root unit that calculates the reciprocal square root of a positive real number B by the fast reciprocal square root method; a subtraction unit that subtracts B from A; a first multiplication unit that multiplies the result of the first reciprocal square root unit by the result of the second reciprocal square root unit; a second multiplication unit that multiplies the result of the first multiplication unit by the result of the subtraction unit; an addition unit that adds the result of the first reciprocal square root unit to the result of the second reciprocal square root unit; and a division unit that divides the result of the second multiplication unit by the result of the addition unit.

The present invention makes it possible to perform calculations involving square roots faster.

FIG. 1 is a diagram showing the configuration of an information processing device including a calculation device.
FIG. 2 is a block diagram showing the configuration of a calculation device 10A.
FIG. 3 is a block diagram showing a calculation device 20.
FIG. 4 is a diagram showing computational cost.
FIG. 5 is a block diagram showing the configuration of a calculation device 10B.
FIG. 6 is a block diagram showing a calculation device 40.
FIG. 7 is a block diagram showing the configuration of a calculation device 10C.
FIG. 8 is a block diagram showing the configuration of a calculation device 10D.
FIG. 9 is a block diagram showing the configuration of a calculation device 10E.
FIG. 10 is a block diagram showing the configuration of a calculation device 10F.
FIG. 11 is a diagram showing an outline of graph simplification processing.
FIG. 12 is a diagram showing a configuration example of a graph simplification device.
FIG. 13 is a diagram showing a configuration example of a local linear alignment unit.
FIG. 14 is a diagram for explaining the operation of arranging the vectors of adjacent edges.
FIG. 15A is a diagram showing an algorithm for solving the local linear alignment problem.
FIG. 15B is a diagram showing an algorithm for solving the local linear alignment problem.
FIG. 16A is a diagram showing an algorithm for singular value thresholding.
FIG. 16B is a diagram showing an algorithm for singular value thresholding.

An embodiment of the present invention will be described in detail with reference to the drawings.
FIG. 1 is a diagram showing the configuration of an information processing device 1 including a calculation device 10 according to the embodiment. The information processing device 1 includes an input device 3, the calculation device 10, and an output device 5. The input device 3 supplies input data such as numerical values to the calculation device 10. The calculation device 10 performs calculations using the input data and outputs the results to the output device 5.

The calculation device 10 includes a CPU (Central Processing Unit), a memory, an auxiliary storage device, and the like connected by a bus, and performs calculation processing by executing a calculation program. All or part of the functions of the calculation device 10 may be implemented in hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The calculation program may be recorded on a computer-readable recording medium. Computer-readable recording media include, for example, portable media such as flexible disks, magneto-optical disks, ROMs, CD-ROMs, and semiconductor storage devices (for example, SSDs: Solid State Drives), and storage devices such as hard disks built into computer systems. The calculation program may also be transmitted via a telecommunications line.

The first to sixth embodiments of the calculation device 10 are described below, followed by graph simplification processing as an application example of the calculation device 10. In each embodiment, a "fast reciprocal square root calculation unit" calculates the reciprocal square root by the fast reciprocal square root method.

[First embodiment]
A calculation device 10A, which is the calculation device 10 of the first embodiment, is described below. FIG. 2 is a block diagram showing the configuration of the calculation device 10A, which calculates reciprocal square roots and a square-root difference. The calculation device 10A receives positive real numbers A and B from the input device 3 and outputs a = 1/√A, b = 1/√B, and c = √A - √B to the output device 5. The calculation device 10A includes fast reciprocal square root calculation units 11 and 21, a subtraction unit 12, multiplication units 17 and 117, an addition unit 13, and a division unit 14.

When a positive real number A is input to the calculation device 10A, A is supplied to the fast reciprocal square root calculation unit 11 and the subtraction unit 12. When a positive real number B is input, B is supplied to the fast reciprocal square root calculation unit 21 and the subtraction unit 12. The fast reciprocal square root calculation unit 11 outputs a = 1/√A to the multiplication unit 17, the addition unit 13, and the output device 5. The fast reciprocal square root calculation unit 21 outputs b = 1/√B to the multiplication unit 17, the addition unit 13, and the output device 5. The subtraction unit 12 outputs A - B to the multiplication unit 117.

The multiplication unit 17 outputs ab to the multiplication unit 117. The multiplication unit 117 outputs (A-B)ab to the division unit 14. The addition unit 13 outputs a+b to the division unit 14. The division unit 14 outputs (A-B)ab/(a+b) to the output device 5 as c.

That (A-B)ab/(a+b) = √A-√B can be seen as follows.
√A-√B = (√A-√B)(√A+√B)/(√A+√B)
= (A-B)/(√A+√B)
Here,
√A+√B = (1/√A+1/√B)(√A√B) = (a+b)/(ab)
Therefore,
(A-B)/(√A+√B) = (A-B)ab/(a+b).

As described above, the calculation device 10A outputs a = 1/√A, b = 1/√B, and c = √A-√B to the output device 5. The calculation device 10A calculates reciprocal square roots but never calculates √A or √B themselves, so c is obtained without any square root calculation. In addition, because the numerator is rationalized, the cancellation that occurs when √A and √B are computed and then subtracted is avoided.

FIG. 3 is a block diagram showing, as a reference example, a calculation device 20 that calculates √A and √B in the conventional manner and outputs a, b, and c. The calculation device 20 includes square root calculation units 15 and 25, a subtraction unit 22, and division units 24 and 34.

When a positive real number A is input to the calculation device 20, the square root calculation unit 15 outputs √A to the subtraction unit 22 and the division unit 24. When a positive real number B is input, the square root calculation unit 25 outputs √B to the subtraction unit 22 and the division unit 34.

The division unit 24 outputs a = 1/√A. The division unit 34 outputs b = 1/√B. The subtraction unit 22 outputs c = √A - √B.

To compare computational cost, the cost of each operation is described with reference to FIG. 4. FIG. 4 shows the time required to execute each operation 1,000,000 times in C++, expressed with the cost of one addition taken as 1 flop. In FIG. 4, sqrt(x) and 1/sqrt(x) are the costs of the square root and the reciprocal square root computed with C++ standard library functions. fastInvSqrt(x), fastInvSqrt2(x), and fastInvSqrt3(x) are the costs of the Fast Inverse Square Root method with one, two, and three Newton iterations, respectively, and mFastInvSqrt(x) is the cost of the modified Fast Inverse Square Root method.

According to H.A. Thant et al., "Mobile Agents Based Load Balancing Method for Parallel Applications," 6th Asia-Pacific Symposium on Information and Telecommunication Technologies, Yangon, 2005, if one floating-point addition is likewise counted as 1 flop, a subtraction or a multiplication costs 1 flop, and a division or a square root costs 4 flops. As FIG. 4 shows, a reciprocal square root by the FastInvSqrt method can be estimated at 1 flop.

When a, b, and c are calculated by the conventional calculation device 20, the total cost is 17 flops (two square roots, two divisions, and one subtraction: 2×4 + 2×4 + 1). With the calculation device 10A of this embodiment, the total is 10 flops (two fast reciprocal square roots, one subtraction, two multiplications, one addition, and one division: 2×1 + 1 + 2×1 + 1 + 4). The calculation device 10A is therefore estimated to save 7 flops, or about 41.2% of the floating-point operations, relative to the conventional calculation device 20.

[Second embodiment]
A calculation device 10B, which is the calculation device 10 of the second embodiment, is described below. FIG. 5 is a block diagram showing the configuration of the calculation device 10B, which calculates the reciprocal square roots of a sum and a difference, and the difference between the square roots of the sum and the difference. The calculation device 10B receives positive real numbers X and e (X > e > 0) from the input device 3 and outputs a = 1/√(X+e), b = 1/√(X-e), and c = √(X+e)-√(X-e) to the output device 5. The calculation device 10B includes fast reciprocal square root calculation units 31 and 41, a subtraction unit 32, multiplication units 27, 37, and 47, addition units 23 and 33, and a division unit 44.

When a positive real number X is input to the calculation device 10B, X is supplied to the addition unit 23 and the subtraction unit 32. When a positive real number e is input, e is supplied to the addition unit 23, the subtraction unit 32, and the multiplication unit 27. The addition unit 23 outputs X+e to the fast reciprocal square root calculation unit 31. The subtraction unit 32 outputs X-e to the fast reciprocal square root calculation unit 41. The multiplication unit 27 outputs 2e to the multiplication unit 47.

The fast reciprocal square root calculation unit 31 outputs a = 1/√(X+e) to the multiplication unit 37, the addition unit 33, and the output device 5. The fast reciprocal square root calculation unit 41 outputs b = 1/√(X-e) to the multiplication unit 37, the addition unit 33, and the output device 5. The multiplication unit 37 outputs ab = (1/√(X+e))(1/√(X-e)) to the multiplication unit 47. The multiplication unit 47 outputs 2eab to the division unit 44. The addition unit 33 outputs a+b = 1/√(X+e)+1/√(X-e) to the division unit 44. The division unit 44 outputs 2eab/(a+b) to the output device 5 as c. That 2eab/(a+b) = √(X+e)-√(X-e) follows from the first embodiment by setting A = X+e and B = X-e, so that A-B = 2e.

As described above, the calculation device 10B outputs a = 1/√(X+e), b = 1/√(X-e), and c = √(X+e)-√(X-e) to the output device 5. The calculation device 10B calculates reciprocal square roots but never calculates √(X+e) or √(X-e) themselves, so c is obtained without any square root calculation.

FIG. 6 is a block diagram showing, as a reference example, a calculation device 40 that calculates √(X+e) and √(X-e) in the conventional manner and outputs a, b, and c. The calculation device 40 includes square root calculation units 35 and 45, subtraction units 42 and 52, an addition unit 43, and division units 54 and 64.

When positive real numbers X and e are input to the calculation device 40, the addition unit 43 outputs X+e to the square root calculation unit 35, and the subtraction unit 42 outputs X-e to the square root calculation unit 45. The square root calculation unit 35 outputs √(X+e) to the division unit 54 and the subtraction unit 52. The square root calculation unit 45 outputs √(X-e) to the division unit 64 and the subtraction unit 52.

The division unit 54 outputs a = 1/√(X+e). The division unit 64 outputs b = 1/√(X-e). The subtraction unit 52 outputs c = √(X+e)-√(X-e).

When a, b, and c are calculated by the conventional calculation device 40, the total cost is 19 flops (one addition, two subtractions, two square roots, and two divisions: 1 + 2×1 + 2×4 + 2×4). With the calculation device 10B of this embodiment, the total is 12 flops (two additions, one subtraction, three multiplications, two fast reciprocal square roots, and one division: 2×1 + 1 + 3×1 + 2×1 + 4). The calculation device 10B is therefore estimated to save 7 flops, or about 36.8% of the floating-point operations, relative to the conventional calculation device 40.

[Third embodiment]
A calculation device 10C, which is the calculation device 10 of the third embodiment, is described below. FIG. 7 is a block diagram showing the configuration of the calculation device 10C, which calculates the second singular value of an M×2 matrix, the reciprocal of the sum of its singular values, and the reciprocal of the difference of its singular values. In this embodiment M ≥ 3; the case M = 2 is described in the fourth embodiment. The calculation device 10C receives an M×2 matrix Y = [y_1, y_2] from the input device 3 and outputs the second singular value σ_2, the reciprocal of the sum of the singular values 1/(σ_1+σ_2), and the reciprocal of the difference of the singular values 1/(σ_1-σ_2) to the output device 5.

The calculation device 10C includes the calculation device 10B described in the second embodiment. It also includes a decomposition unit 110, inner product units 16, 26, and 36, a subtraction unit 62, multiplication units 57, 67, 77, and 87, an addition unit 53, and a square root calculation unit 55.

When an M×2 matrix Y = [y_1, y_2] is input to the calculation device 10C, Y is supplied to the decomposition unit 110, which decomposes Y into its column vectors y_1 and y_2. The decomposition unit 110 outputs y_1 to the inner product units 16 and 36, and y_2 to the inner product units 26 and 36.

The inner product unit 16 calculates the inner product a of y_1 with itself and outputs a to the addition unit 53, the multiplication unit 67, and the output device 5. The inner product unit 26 calculates the inner product c of y_2 with itself and outputs c to the addition unit 53, the multiplication unit 67, and the output device 5. The inner product unit 36 calculates the inner product b of y_1 and y_2 and outputs b to the multiplication unit 77 and the output device 5.

The addition unit 53 outputs f = a+c to the calculation device 10B and the output device 5. The multiplication unit 67 outputs ac to the subtraction unit 62. The multiplication unit 77 outputs b^2 to the subtraction unit 62. The subtraction unit 62 outputs d = ac-b^2 to the square root calculation unit 55 and the output device 5. The square root calculation unit 55 outputs e = √d to the multiplication unit 57 and the output device 5. The multiplication unit 57 outputs 2e to the calculation device 10B.

The calculation device 10B outputs 1/√(f+2e) to the output device 5 as 1/(σ_1+σ_2), and 1/√(f-2e) to the output device 5 as 1/(σ_1-σ_2). The calculation device 10B also outputs √(f+2e)-√(f-2e) to the multiplication unit 87 as 2σ_2. The multiplication unit 87 outputs σ_2 to the output device 5.

That 1/√(f ± 2e) equals 1/(σ₁ ± σ₂) can be seen as follows. According to Non-Patent Document 1, the sum and difference of the singular values σ₁ and σ₂ of an M×2 matrix are σ₁ ± σ₂ = √(tr(YᵀY) ± 2√(det(YᵀY))).

Here, tr(YᵀY) = a + c = f, where a is the inner product of y₁ with itself and c is the inner product of y₂ with itself. Likewise, det(YᵀY) = ac - b² = d, where b is the inner product of y₁ and y₂. Therefore, √(det(YᵀY)) = √d = e.

Therefore, √(tr(YᵀY) ± 2√(det(YᵀY))) = √(f ± 2e), and hence 1/√(f ± 2e) = 1/(σ₁ ± σ₂).
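The data flow of the calculation device 10C and the identity just derived can be traced with a small Python sketch. It is a minimal illustration under assumptions: `math.sqrt` replaces the reciprocal-square-root based calculation of the calculation device 10B, the function name is hypothetical, and returning infinity for a zero denominator is a choice made here only to keep the sketch total.

```python
import math


def singular_quantities_mx2(Y):
    """Y is a list of M rows [y_i1, y_i2].

    Returns (sigma2, 1/(sigma1 + sigma2), 1/(sigma1 - sigma2)).
    """
    a = sum(row[0] * row[0] for row in Y)   # inner product of y1 with itself
    c = sum(row[1] * row[1] for row in Y)   # inner product of y2 with itself
    b = sum(row[0] * row[1] for row in Y)   # inner product of y1 and y2
    f = a + c                               # tr(Y^T Y)
    d = a * c - b * b                       # det(Y^T Y)
    e = math.sqrt(max(d, 0.0))              # clamp guards against rounding (assumption)
    s_plus = math.sqrt(f + 2.0 * e)             # sigma1 + sigma2
    s_minus = math.sqrt(max(f - 2.0 * e, 0.0))  # sigma1 - sigma2
    sigma2 = 0.5 * (s_plus - s_minus)
    inv_sum = 1.0 / s_plus if s_plus > 0.0 else math.inf
    inv_diff = 1.0 / s_minus if s_minus > 0.0 else math.inf
    return sigma2, inv_sum, inv_diff
```

For the 3×2 matrix with rows [3, 0], [0, 4], [0, 0], the singular values are 4 and 3, so f = 25, d = 144, e = 12, σ₁ + σ₂ = √49 = 7 and σ₁ - σ₂ = √1 = 1.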

As shown in FIG. 7, the calculation device 10C outputs to the output device 5 not only the second singular value σ₂, the reciprocal of the sum of the singular values 1/(σ₁ + σ₂), and the reciprocal of the difference of the singular values 1/(σ₁ - σ₂), but also a, b, c, d, e, and f.

The amount of computation of the calculation device 10C is as follows. The decomposition unit 110 merely splits a matrix into column vectors, so it performs no floating-point operations (0 flops). Each of the inner product units 16, 26, and 36 requires 2M - 1 flops for M-dimensional column vectors. As described above, a square root calculation by sqrt costs 4 flops. The total amount of computation of the calculation device 10C is therefore 6M + 19 flops. Because the calculation device 10B saves 7 flops, as described above, the calculation device 10C also benefits from this 7-flop reduction.

[Fourth embodiment]
Hereinafter, a calculation device 10D, which is the calculation device 10 in the fourth embodiment, will be described. FIG. 8 is a block diagram showing the configuration of the calculation device 10D, which calculates the second singular value of a 2×2 matrix, the reciprocal of the sum of its singular values, and the reciprocal of the difference of its singular values. The calculation device 10D receives a 2×2 matrix Y = [y₁, y₂] from the input device 3 and outputs the second singular value σ₂, the reciprocal of the sum of the singular values 1/(σ₁ + σ₂), and the reciprocal of the difference of the singular values 1/(σ₁ - σ₂) to the output device 5.

The calculation device 10D includes the calculation device 10B described in the second embodiment. The calculation device 10D also includes a decomposition unit 120, inner product units 46 and 56, a determinant calculation unit 19, an absolute value calculation unit 18, multiplication units 97 and 107, and an addition unit 63.

In the calculation device 10D, when a 2×2 matrix Y = [y₁, y₂] is input, Y is passed to the decomposition unit 120. The decomposition unit 120 decomposes Y into the column vectors y₁ and y₂. It outputs y₁ to the inner product unit 46 and y₂ to the inner product unit 56.

The inner product unit 46 calculates the inner product a of y₁ with itself and outputs a to the addition unit 63. The inner product unit 56 calculates the inner product c of y₂ with itself and outputs c to the addition unit 63. The determinant calculation unit 19 calculates the determinant d of Y and outputs d to the absolute value calculation unit 18 and the output device 5.

The addition unit 63 outputs f = a + c to the calculation device 10B and the output device 5. The absolute value calculation unit 18 outputs the absolute value e of d to the multiplication unit 97. The multiplication unit 97 outputs 2e to the calculation device 10B.

The calculation device 10B outputs 1/√(f + 2e) as 1/(σ₁ + σ₂) to the output device 5, and outputs 1/√(f - 2e) as 1/(σ₁ - σ₂) to the output device 5. The calculation device 10B also outputs √(f + 2e) - √(f - 2e) as 2σ₂ to the multiplication unit 107. The multiplication unit 107 halves this value and outputs σ₂ to the output device 5.

That 1/√(f ± 2e) equals 1/(σ₁ ± σ₂) follows as shown in the third embodiment: for a 2×2 matrix, det(YᵀY) = (det Y)², so the absolute value e = |d| of the determinant d of Y equals √(det(YᵀY)). As shown in FIG. 8, the calculation device 10D outputs to the output device 5 not only the second singular value σ₂, the reciprocal of the sum of the singular values 1/(σ₁ + σ₂), and the reciprocal of the difference of the singular values 1/(σ₁ - σ₂), but also d and f.
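The reason the 2×2 case can replace the square root of the third embodiment with an absolute value is that det(YᵀY) = (det Y)² for a square matrix. The following Python sketch, with hypothetical function names, compares the two ways of obtaining e; it is an illustration only and not the configuration of FIG. 8.

```python
import math


def e_from_gram(Y):
    """e = sqrt(ac - b^2), as computed in the third embodiment."""
    (y11, y12), (y21, y22) = Y
    a = y11 * y11 + y21 * y21   # inner product of column 1 with itself
    c = y12 * y12 + y22 * y22   # inner product of column 2 with itself
    b = y11 * y12 + y21 * y22   # inner product of column 1 and column 2
    return math.sqrt(a * c - b * b)


def e_from_det(Y):
    """e = |det Y|: two multiplications, one subtraction, then a sign flip."""
    (y11, y12), (y21, y22) = Y
    return abs(y11 * y22 - y12 * y21)
```

For Y with rows [1, 2] and [3, 4], a = 10, c = 20, b = 14, so ac - b² = 4 and both functions give e = 2. The determinant route needs 3 floating-point operations and no square root.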

The amount of computation of the calculation device 10D is as follows. The determinant calculation unit 19 requires 3 flops. The absolute value calculation unit 18 only checks the sign and flips it if it is negative, so it performs no floating-point operations (0 flops). The total amount of computation of the calculation device 10D is 24 flops. Because the calculation device 10B saves 7 flops, as described above, the calculation device 10D also benefits from this 7-flop reduction.

[Fifth embodiment]
Hereinafter, a calculation device 10E, which is the calculation device 10 in the fifth embodiment, will be described. FIG. 9 is a block diagram showing the configuration of the calculation device 10E, which calculates the SVT of an M×2 matrix. In this embodiment, M ≥ 3; the case M = 2 is described in the sixth embodiment. The calculation device 10E receives an M×2 matrix Y = [y₁, y₂] and a positive real number μ from the input device 3, performs singular value thresholding (SVT), and calculates the matrix Z shown in (1) below as the result of the singular value thresholding.

In (1), I₂ is the 2×2 identity matrix. γ, δ, e, a, b, c, and d are described later.

The calculation device 10E includes the calculation device 10C described in the third embodiment, a coefficient calculation device 200, and an M×2 matrix transformation device 300.

When an M×2 matrix Y = [y₁, y₂] is input to the calculation device 10E, Y is passed to the calculation device 10C, the coefficient calculation device 200, and the M×2 matrix transformation device 300. When the real number μ is input to the calculation device 10E, μ is passed to the coefficient calculation device 200. As described in the third embodiment, the calculation device 10C outputs the second singular value σ₂, the reciprocal of the sum of the singular values 1/(σ₁ + σ₂), the reciprocal of the difference of the singular values 1/(σ₁ - σ₂), and a, b, c, d, e, and f. Here, as shown in FIG. 9, g = 1/(σ₁ + σ₂) and h = 1/(σ₁ - σ₂).

Of the outputs of the calculation device 10C, d, f, σ₂, g, and h are output to the coefficient calculation device 200, and a, b, c, and e are output to the M×2 matrix transformation device 300.

The coefficient calculation device 200 receives d, f, σ₂, g, h, and μ as input and outputs γ and δ. The method of calculating γ and δ is as follows. First, let R be a sufficiently large real number; for example, R can be about one tenth of the largest value representable in single-precision floating point. The coefficient calculation device 200 then calculates γ and δ by distinguishing the following four cases (a) to (d), and outputs them.

(a) Y is the zero matrix.
(b) Case (a) does not apply, and d = 0.
(c) Neither (a) nor (b) applies, and h > R.
(d) None of (a) to (c) applies.

The values of γ and δ output in each case are shown below. The reciprocal square root obtained by the fast reciprocal square root method may be written isqrt(·); for example, the reciprocal square root of a positive real number A calculated by the fast reciprocal square root method may be written isqrt(A). The subscript + in (·)₊ denotes the ramp function, which outputs 0 if the input is a negative real number and outputs the input unchanged if it is a non-negative real number. The function min outputs the smallest of its input values.

(a): γ = 0, δ = 0
(b): γ = (1 - μ × isqrt(f))₊, δ = 0
(c): γ = (1 - (√2) × μ × isqrt(f))₊, δ = 0
(d): γ = (1 - (μ - σ₂)₊ × h)₊, δ = min(μ, σ₂) × g
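The case distinction for γ and δ can be written as a short Python sketch. It is an illustration under assumptions: `isqrt` is computed with `math.sqrt` instead of the fast reciprocal square root method, R is set to one tenth of the largest single-precision value, the zero-matrix test is passed in as a flag, and the function and parameter names are hypothetical.

```python
import math

# Assumption: "about one tenth of the single-precision maximum".
R = 3.4028235e38 / 10.0


def ramp(x: float) -> float:
    """Ramp function (.)_+ : 0 for negative input, the input itself otherwise."""
    return x if x >= 0.0 else 0.0


def isqrt(x: float) -> float:
    """Stand-in for the fast reciprocal square root of a positive real."""
    return 1.0 / math.sqrt(x)


def coefficients(y_is_zero: bool, d: float, f: float,
                 sigma2: float, g: float, h: float, mu: float):
    """Return (gamma, delta) following cases (a) to (d)."""
    if y_is_zero:                                   # case (a)
        return 0.0, 0.0
    if d == 0.0:                                    # case (b)
        return ramp(1.0 - mu * isqrt(f)), 0.0
    if h > R:                                       # case (c)
        return ramp(1.0 - math.sqrt(2.0) * mu * isqrt(f)), 0.0
    gamma = ramp(1.0 - ramp(mu - sigma2) * h)       # case (d)
    delta = min(mu, sigma2) * g
    return gamma, delta
```

As a plausibility check on cases (b) and (c): when d = 0 the second singular value is 0 and σ₁ = √f, so γ = (1 - μ/σ₁)₊; when the two singular values coincide, f = 2σ₁², so (√2) × μ × isqrt(f) again equals μ/σ₁.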

Based on the above case distinction, the coefficient calculation device 200 outputs γ and δ to the M×2 matrix transformation device 300. The M×2 matrix transformation device 300 then has all the parameters appearing in (1), and uses them to output (1) to the output device 5. When e = 0, the term γδ/e in (1) is set to 0.

The amount of computation of the calculation device 10E is as follows. The ramp function and the function min mainly compare real values, so they perform no floating-point operations (0 flops). A computation following the algorithm shown in FIG. 16B requires 12M + 38 flops, whereas the total amount of computation of the calculation device 10E is 12M + 29 flops. A reduction of 9 floating-point operations can therefore be estimated.

[Sixth embodiment]
Hereinafter, a calculation device 10F, which is the calculation device 10 in the sixth embodiment, will be described. FIG. 10 is a block diagram showing the configuration of the calculation device 10F, which calculates the SVT of a 2×2 matrix. The calculation device 10F receives a 2×2 matrix Y = [y₁, y₂] and a positive real number μ from the input device 3, performs the SVT described above, and calculates the matrix Z shown in (2) below as the result of the singular value thresholding. Note that y_ij is the component in row i and column j of Y.

The function sign(·) in (2) is the sign function, which outputs -1 when the input is negative, 0 when the input is 0, and +1 when the input is positive. γ, δ, and d are described later.
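Python has no built-in sign function with exactly this three-valued behavior, so a direct transcription of the definition may be useful. The sketch below is illustrative and the function name is an assumption.

```python
def sign(x: float) -> int:
    """Sign function: -1 for negative input, 0 for zero, +1 for positive input."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    return 0
```

Note that `math.copysign(1.0, x)` is not a drop-in replacement, because it returns ±1.0 for a zero input rather than 0.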

The calculation device 10F includes the calculation device 10D described in the fourth embodiment, a coefficient calculation device 201, and a 2×2 matrix transformation device 301.

When a 2×2 matrix Y = [y₁, y₂] is input to the calculation device 10F, Y is passed to the calculation device 10D, the coefficient calculation device 201, and the 2×2 matrix transformation device 301. When the real number μ is input to the calculation device 10F, μ is passed to the coefficient calculation device 201. As described in the fourth embodiment, the calculation device 10D outputs the second singular value σ₂, the reciprocal of the sum of the singular values 1/(σ₁ + σ₂), the reciprocal of the difference of the singular values 1/(σ₁ - σ₂), and d and f. Here, as shown in FIG. 10, g = 1/(σ₁ + σ₂) and h = 1/(σ₁ - σ₂).

Of the outputs of the calculation device 10D, d, f, σ₂, g, and h are output to the coefficient calculation device 201, and d is output to the 2×2 matrix transformation device 301.

The coefficient calculation device 201 receives d, f, σ₂, g, h, and μ as input and outputs γ and δ. The method of calculating γ and δ is as follows. First, let R be a sufficiently large real number; for example, R can be about one tenth of the largest value representable in single-precision floating point. The coefficient calculation device 201 then calculates γ and δ by distinguishing the four cases (a) to (d), and outputs them.

The values of γ and δ output in each case are shown below.
(a): γ = 0, δ = 0
(b): γ = (1 - μ × isqrt(f))₊, δ = 0
(c): γ = (1 - (√2) × μ × isqrt(f))₊, δ = 0
(d): γ = (1 - (μ - σ₂)₊ × h)₊, δ = min(μ, σ₂) × g

Based on the above case distinction, the coefficient calculation device 201 outputs γ and δ to the 2×2 matrix transformation device 301. The 2×2 matrix transformation device 301 then has all the parameters appearing in (2), and uses them to output (2) to the output device 5.

The amount of computation of the calculation device 10F is as follows. A computation following the algorithm shown in FIG. 16B requires 41 flops, whereas the total amount of computation of the calculation device 10F is 32 flops. A reduction of 9 floating-point operations can therefore be estimated.

Next, the graph simplification process will be described. FIG. 11 is a diagram showing an overview of the graph simplification process. In FIG. 11, G = (V, E, P) denotes a graph embedded in an M-dimensional space. Here, V and E are the sets of vertices and edges, respectively, and P ∈ R^{|V|×M} holds the vertex coordinates (R here is the set of all real numbers). The row vectors (p₁)ᵀ, (p₂)ᵀ, ..., (p_|V|)ᵀ of P are the coordinates of the vertices of the graph. The set of vertices of degree 2 is denoted V~ = {v ∈ V | deg v = 2}.
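As an illustration of the set V~ of degree-2 vertices, the following Python sketch counts degrees from an edge list. The representation of the graph (hashable vertex labels and a list of vertex pairs, with no self-loops) and the function name are assumptions made for this example only.

```python
from collections import Counter


def degree_two_vertices(vertices, edges):
    """Return the set of vertices whose degree is exactly 2.

    vertices: iterable of hashable vertex labels
    edges:    iterable of (u, v) pairs; a simple graph is assumed
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {v for v in vertices if degree[v] == 2}
```

For a path 0-1-2 with two further edges 2-3 and 2-4, only vertex 1 has degree 2: vertices 0, 3, and 4 are end points and vertex 2 is a branch point.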

As shown in FIG. 11, the graph simplification process takes as input an embedded graph with a distorted shape and many vertices, generates as an intermediate result an embedded graph whose vertices are locally aligned along straight lines (S101), and finally removes unnecessary vertices (S102) to obtain the desired graph with a simplified shape.

FIG. 12 is a diagram showing a configuration example of a graph simplification device 500. The graph simplification device 500 receives a threshold λ and a graph G as input, and outputs a simplified graph G''. The graph simplification device 500 includes a local linear alignment unit 510 and an unnecessary vertex removal unit 520. The local linear alignment unit 510 outputs G', obtained by locally linearly aligning the graph G using the threshold λ, to the unnecessary vertex removal unit 520. The unnecessary vertex removal unit 520 outputs G'', obtained by removing unnecessary vertices from the input graph G'.

FIG. 13 is a diagram showing a configuration example of the local linear alignment unit 510 in FIG. 12. As described above, the local linear alignment unit 510 outputs G', obtained by locally linearly aligning the graph G using the threshold λ, to the unnecessary vertex removal unit 520. The local linear alignment unit 510 includes a convex optimization problem formulation unit 511, a convex optimization problem solving unit 512, and a coordinate information replacement unit 513.

First, the convex optimization problem formulation unit 511 will be described. FIG. 14 is a diagram for explaining the operation of arranging the vectors of adjacent edges. The matrix L_v shown in FIG. 14 arranges the vectors from v to its adjacent vertices. All entries of L_v marked "..." in FIG. 14 are 0. When the coordinates of v are in row k of P, the entries "-1" of L_v in FIG. 14 are in column k. The product L_v P yields the vector that starts at v and ends at the coordinates in row k-1, and the vector that starts at v and ends at the coordinates in row k+1. Applying L_v is a linear map.
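A small Python sketch can make the action of L_v concrete. It assumes, as in the description above, that the two neighbors of v are stored in rows k-1 and k+1 of P; the 0-based indexing, the row order of the result, and the function names are assumptions, since FIG. 14 is not reproduced here.

```python
def neighbor_difference_matrix(num_vertices: int, k: int):
    """Build the 2 x |V| matrix L_v for the vertex in row k of P (0-based)."""
    if not 1 <= k <= num_vertices - 2:
        raise ValueError("vertex k needs neighbors in rows k-1 and k+1")
    L = [[0.0] * num_vertices for _ in range(2)]
    L[0][k - 1], L[0][k] = 1.0, -1.0   # row 0: vector from v to row k-1
    L[1][k + 1], L[1][k] = 1.0, -1.0   # row 1: vector from v to row k+1
    return L


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]
```

With P holding the points (0, 0), (1, 0), (1, 1) and v in the middle row, `matmul(neighbor_difference_matrix(3, 1), P)` gives the two edge vectors (-1, 0) and (0, 1). When the three points are collinear, the two vectors are parallel and the resulting matrix has rank 1, which is what a penalty on the sum of singular values encourages.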

If the shape of the original input graph is reproduced faithfully while the number of bends is small, the local linear alignment of the graph can be considered successful. Here, the L1 norm is used as the measure of fidelity and the nuclear norm function as the regularizer for the number of bends along the edges, and the convex optimization problem formulation unit 511 formulates the optimization problem as shown in (3) below. The nuclear norm function is the function that calculates the sum of the singular values of the input matrix.

To solve the optimization problem (3), Primal-Dual Splitting is used (see L. Condat, "A primal-dual splitting method for convex optimization involving lipschitzian, proximable and linear composite terms," Journal of Optimization Theory and Applications, 2013). The specific procedure executed by the convex optimization problem solving unit 512 is shown in FIG. 15, which presents an algorithm for solving the local linear alignment problem. Note that f in prox_τf on line 4 of FIG. 15 is given by (4) below.

In the algorithm shown in FIG. 15, the outer loop on lines 3 to 11 runs r times, for n = 1 to r, and inside it the loop on lines 6 to 10 runs over every element of the set V~ defined above.

Lines 6 to 10 are therefore executed r × (number of elements of V~) times. Expression (5) below, which is calculated on line 8, is the proximal mapping of the nuclear norm.

Expression (5) is the SVT, with threshold λ, of the matrix in (6) below.
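For orientation, singular value thresholding in its conventional form shrinks every singular value by the threshold and clamps it at zero, while keeping the singular vectors. The Python sketch below computes this for an M×2 matrix through the eigen-decomposition of the 2×2 Gram matrix YᵀY. It is a reference implementation for checking results under assumptions made here (pure Python lists, hypothetical function name); it is not the formula (1) or (2) of the embodiments and makes no attempt to minimize the operation count.

```python
import math


def svt_mx2_reference(Y, mu):
    """Reference SVT of an M x 2 matrix Y (list of rows) with threshold mu > 0.

    Uses Z = Y * W, where W = sum_i (1 - mu/sigma_i)_+ v_i v_i^T and
    (sigma_i^2, v_i) are the eigenpairs of the Gram matrix Y^T Y.
    """
    a = sum(r[0] * r[0] for r in Y)
    b = sum(r[0] * r[1] for r in Y)
    c = sum(r[1] * r[1] for r in Y)
    f = a + c
    d = a * c - b * b
    root = math.sqrt(max(f * f - 4.0 * d, 0.0))
    if b == 0.0:
        # Gram matrix is already diagonal: eigenvectors are the coordinate axes.
        pairs = [(a, (1.0, 0.0)), (c, (0.0, 1.0))]
    else:
        lams = [max(0.5 * (f + root), 0.0), max(0.5 * (f - root), 0.0)]
        # (b, lam - a) is an eigenvector of [[a, b], [b, c]] for eigenvalue lam.
        pairs = [(lam, (b, lam - a)) for lam in lams]
    W = [[0.0, 0.0], [0.0, 0.0]]
    for lam, (vx, vy) in pairs:
        sigma = math.sqrt(lam)
        if sigma <= mu:
            continue  # this singular value is thresholded to zero
        s = (1.0 - mu / sigma) / (vx * vx + vy * vy)
        W[0][0] += s * vx * vx
        W[0][1] += s * vx * vy
        W[1][0] += s * vx * vy
        W[1][1] += s * vy * vy
    return [[r[0] * W[0][0] + r[1] * W[1][0],
             r[0] * W[0][1] + r[1] * W[1][1]] for r in Y]
```

For the matrix with rows [3, 0], [0, 4], [0, 0] and threshold 1, the singular values 3 and 4 become 2 and 3, so the result has rows [2, 0], [0, 3], [0, 0].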

The processing on line 8 accounts for approximately 53.8% of the computation time of the algorithm shown in FIG. 15. By calculating this SVT with the calculation devices 10E and 10F described above, the graph simplification process can therefore be executed faster than before.

The coordinate information replacement unit 513 replaces the coordinate information with the vertex coordinates locally linearly aligned by the convex optimization problem solving unit 512, and outputs the locally linearly aligned graph G'.

This embodiment is applicable not only to graph simplification but to any processing that performs SVT. For example, false color removal in images, like graph simplification, belongs to the class of problems that regularize a large number of small matrices.

As described above, according to this embodiment, the nuclear norm is expressed as a mixed norm without using singular values, which realizes an SVT computation that does not require SVD and thereby reduces the amount of computation. Furthermore, the algorithm can easily be data-parallelized, so a large number of matrices can be processed simultaneously. A data-parallel algorithm can be accelerated by an implementation that uses a data-parallel architecture such as Single Instruction Multiple Data (SIMD), which is adopted by many CPUs installed in personal computers.

In the calculation devices 10A to 10F described above, the components that perform calculations, such as the addition units and inner product units, may be provided with a storage device such as a memory, and may temporarily store their calculation results in this storage device.

Although embodiments of the present invention have been described in detail above with reference to the drawings, the specific configuration is not limited to these embodiments and also includes designs and the like that do not depart from the gist of the present invention.

The present invention is applicable to a calculation device that performs square root calculations.

10, 10A, 10B, 10C, 10D, 10E, 10F, 20, 40 ... calculation device, 11, 21, 31, 41 ... fast reciprocal square root calculation unit, 12, 22, 32, 42, 52 ... subtraction unit, 13, 23, 33, 43, 53, 63 ... addition unit, 14, 24, 34, 44, 54, 64 ... division unit, 15, 25, 35, 45, 55 ... square root calculation unit, 16, 26, 36, 46, 56 ... inner product unit, 17, 27, 37, 47, 57, 67, 77, 87, 97, 107, 117 ... multiplication unit, 18 ... absolute value calculation unit, 19 ... determinant calculation unit, 110, 120 ... decomposition unit, 200, 201 ... coefficient calculation device, 300, 301 ... matrix transformation device, 500 ... graph simplification device, 510 ... local linear alignment unit, 511 ... convex optimization problem formulation unit, 512 ... convex optimization problem solving unit, 513 ... coordinate information replacement unit, 520 ... unnecessary vertex removal unit

Claims

a first reciprocal square root step in which the computer calculates the reciprocal square root of a positive real number A by a fast reciprocal square root method;
a second reciprocal square root step in which the computer calculates the reciprocal square root of the positive real number B using a fast reciprocal square root method;
a subtraction step in which the computer subtracts B from A;
a first multiplication step in which the computer multiplies a result of the first reciprocal square root step by a result of the second reciprocal square root step;
a second multiplication step in which the computer multiplies a result of the first multiplication step by a result of the subtraction step;
an addition step in which the computer adds a result of the calculation in the first reciprocal square root step and a result of the calculation in the second reciprocal square root step;
a division step in which the computer divides the result of the second multiplication step by the result of the addition step;
A calculation method that includes:

a first addition step in which a computer adds a positive real number X and a positive real number e smaller than the real number X;
a subtraction step in which the computer subtracts e from X;
a first multiplication step in which the computer doubles e;
a first reciprocal square root step in which the computer calculates a reciprocal square root of the calculation result in the first addition step by using a fast reciprocal square root method;
a second reciprocal square root step in which the computer calculates a reciprocal square root of the calculation result in the subtraction step by a fast reciprocal square root method;
a second addition step in which the computer adds a result of the calculation in the first reciprocal square root step and a result of the calculation in the second reciprocal square root step;
a second multiplication step in which the computer multiplies a result of the first reciprocal square root step by a result of the second reciprocal square root step;
a third multiplication step in which the computer multiplies a result of the second multiplication step by a result of the subtraction step;
a division step in which the computer divides a result of the third multiplication step by a result of the second addition step;
A calculation method that includes:

a first inner product step in which a computer calculates the inner product of a first column vector with itself, the first column vector and a second column vector constituting an M×2 matrix;
a second inner product step in which the computer calculates the inner product of the second column vector with itself;
a third inner product step in which the computer calculates the inner product of the first column vector and the second column vector;
an addition step in which the computer adds a calculation result of the first inner product step and a calculation result of the second inner product step;
a first multiplication step in which the computer multiplies the calculation result of the first inner product step by the calculation result of the second inner product step;
a second multiplication step in which the computer multiplies the inner product calculated in the third inner product step by itself;
a subtraction step in which the computer subtracts the calculation result in the second multiplication step from the calculation result in the first multiplication step;
a square root calculation step in which the computer calculates the square root of the calculation result in the subtraction step;
a third multiplication step in which the computer doubles the result of the square root calculation step;
A calculation method comprising the above steps, wherein the reciprocal of the sum of the two singular values of the M×2 matrix, the reciprocal of the difference of the two singular values, and one singular value are calculated by the calculation method according to claim 2, with the calculation result of the addition step as X and the calculation result of the third multiplication step as e.

A calculation method comprising each step of the calculation method according to claim 3, and further comprising a step in which a computer performs singular value threshold processing of an M×2 matrix using the reciprocal of the sum of the two singular values and the reciprocal of the difference of the two singular values calculated by the calculation method according to claim 3.

The calculation method according to claim 4, comprising:
an alignment step in which a computer locally linearly aligns vertices of a graph in an M-dimensional space; and
a removal step in which the computer removes unnecessary edges and vertices from the graph linearly aligned in the alignment step, based on angles between edges connecting the vertices,
wherein, in the alignment step, the singular value threshold processing is performed using a nuclear norm function that calculates the sum of singular values.

A program for causing a computer to execute the calculation method according to any one of claims 1 to 5.

a first reciprocal square root unit that calculates a reciprocal square root of a positive real number A by a fast reciprocal square root method;
a second reciprocal square root unit that calculates the reciprocal square root of a positive real number B by a fast reciprocal square root method;
a subtraction unit that subtracts B from A;
a first multiplication unit that multiplies a calculation result in the first reciprocal square root unit by a calculation result in the second reciprocal square root unit;
a second multiplication unit that multiplies a result of the calculation in the first multiplication unit by a result of the calculation in the subtraction unit;
an adder that adds a calculation result of the first reciprocal square root unit and a calculation result of the second reciprocal square root unit;
a division unit that divides a calculation result in the second multiplication unit by a calculation result in the addition unit;
A computing device comprising: