JP4850754B2

JP4850754B2 - Automatic adjustment system for control parameters

Info

Publication number: JP4850754B2
Application number: JP2007050576A
Authority: JP
Inventors: 俊介手塚
Original assignee: Fuji Jukogyo KK
Current assignee: Subaru Corp
Priority date: 2007-02-28
Filing date: 2007-02-28
Publication date: 2012-01-11
Anticipated expiration: 2027-02-28
Also published as: JP2008217155A

Description

The present invention relates to an automatic control parameter adaptation system that adapts control parameters related to the control of a moving body to their optimum values.

Conventionally, in order to automatically obtain (automatically tune) the optimum values of the control parameters of a moving body such as an automobile, techniques are known that acquire optimum parameters through learning based on evaluation values obtained by simulation.

However, with methods that obtain evaluation values by simulation, parameters acquired by learning may not be effective when applied to the actual machine because of approximation errors in the simulation model.

For this reason, techniques that bring the actual machine into the learning loop and obtain evaluation values directly from it have recently been attempted. For example, Patent Document 1 proposes a technique that performs learning using an actual engine in place of a simulation model and acquires control parameters that satisfy target evaluation conditions.

Patent Document 2 discloses, as a method for improving the evaluation values of a plurality of objective functions through learning, a technique that performs the learning computation in a computer based on a set of evaluation values sampled in advance from an actual engine or motor.
Patent Document 1: JP 2004-116351 A
Patent Document 2: JP 2005-285090 A

However, obtaining an evaluation value takes longer on an actual machine than in simulation. Simply switching the source of the evaluation values from simulation to the actual machine, without improving the learning logic, therefore makes it difficult to obtain optimum parameters in a realistic amount of time.

Moreover, unlike simulation, an actual machine suffers from measurement noise, variation in control results, and drift in the characteristics of the vehicle, engine, or other hardware, so it is difficult to treat evaluation values as deterministic. Merely speeding up learning without improving how evaluation values are handled cannot yield optimum parameters.

In this regard, the technique of Patent Document 1 addresses the above problems by evaluating individuals in stages using a genetic algorithm. However, the fitness used to rank individuals is a linear sum of multiple evaluation values, so the designer must determine by trial and error the weighting coefficients that set the balance among the evaluation values, which leaves room for improvement in efficiency.

In Patent Document 2, representative sample points are first acquired from the actual machine, and the subsequent learning computation uses estimated evaluation values calculated from those sample points. Learning time is shortened because evaluation values are not obtained directly from the actual machine during learning, but samples must be acquired beforehand, and problems such as variation in the evaluation values are simply ignored by treating the values at the time of sampling as deterministic. This is insufficient to solve the problems described above.

Furthermore, in Patent Document 2 the estimated evaluation value may differ from the true evaluation value. As with the model approximation error that affects simulation-based evaluation, the possibility that the obtained control parameters are not effective when applied to the actual machine cannot be ruled out.

The present invention has been made in view of the above circumstances. Its object is to provide an automatic control parameter adaptation system that uses an actual machine to eliminate the approximation error of a simulation model, suppresses the increase in learning time and the effects of characteristic variation and drift caused by using the actual machine, and obtains good optimum parameters quickly and efficiently.

To achieve the above object, the automatic control parameter adaptation system according to the present invention adapts control parameters related to the control of a moving body to their optimum values. It comprises a learning calculation unit that introduces an actual machine serving as the moving body into a learning loop, automatically acquires evaluation values through automatic operation of the actual machine, evaluates them independently on a plurality of evaluation axes, and acquires the control parameters that form the optimum solution by a multi-objective genetic algorithm. The learning calculation unit has: a fitness calculation unit that calculates the fitness of individuals containing the control parameters as genes; a parent selection unit that, when selecting some individuals as parent individuals from a population of a plurality of individuals, judges the superiority of solutions using an index corresponding to the degree of crowding relative to other individuals when the fitness of the individuals is equal; a generation alternation unit that generates a child individual from the selected parent individuals by genetic operations; a child individual evaluation unit that selectively evaluates the generated child individual multiple times; and a survival selection unit that selects the individuals to be kept for the next generation. The child individual evaluation unit re-evaluates the generated child individual only when that child individual is judged to be a Pareto optimal solution in its first evaluation.

According to the present invention, the approximation error of a simulation model is eliminated by using an actual machine, while the increase in learning time and the effects of characteristic variation and drift caused by using the actual machine are suppressed, so that good optimum parameters can be obtained quickly and efficiently.

Embodiments of the present invention are described below with reference to the drawings. FIGS. 1 to 11 relate to one embodiment of the present invention. FIG. 1 is an overall configuration diagram of the automatic adaptation system, FIG. 2 is an explanatory diagram showing a speed pattern, FIG. 3 is a block diagram of the automatic driving control system, FIG. 4 is a block diagram of the learning calculation unit, FIG. 5 is an explanatory diagram showing an example of a solution population, FIG. 6 is an explanatory diagram of the niche size, FIG. 7 is an explanatory diagram showing an example of niche count calculation, FIG. 8 is a flowchart of the optimization process, FIG. 9 is a flowchart of the child individual evaluation process, FIG. 10 is an explanatory diagram showing example results of child individual re-evaluation, and FIG. 11 is an explanatory diagram showing the optimization results.

The automatic control parameter adaptation system according to the present invention optimizes parameters related to the control of a human-operated moving body such as an automobile, and is configured as the automatic adaptation system 1 shown in FIG. 1. The automatic adaptation system 1 obtains the target evaluation values by operating the actual machine directly rather than by simulation, and automatically tunes the control parameters to their optimum values through learning.

Specifically, the automatic adaptation system 1 comprises: an actual machine 10 whose control parameters are to be optimized (for example, a power unit system such as an automobile engine or motor, a transmission system, a brake system, or an actual vehicle); an automatic driving device 20 that serves as the controller of the actual machine 10 and operates it automatically; a measuring instrument 30 consisting of sensors and the like that measure various data while the actual machine 10 is under automatic operation; an evaluation value calculation unit 40 that calculates the target evaluation values from the data obtained through the measuring instrument 30; a learning calculation unit 50 that optimizes the control parameters by updating them to better values based on the calculated evaluation values; and a parameter update unit 60 that supplies the updated parameters to the controller of the actual machine (the automatic driving device 20).

The learning calculation unit 50 optimizes the control parameters using a multi-objective optimization method. It judges the superiority of control parameters in an evaluation space whose axes are the target evaluation items, and updates the control parameters by a genetic algorithm based on that judgment. Learning proceeds by repeatedly supplying the control parameters to the actual machine, operating it automatically, and evaluating the results in the evaluation space.

The following describes an example in which, as shown in FIG. 2, an automobile is driven along a specified speed pattern, the RMS (root mean square) of the speed error [km/h] and the fuel economy [km/L] are selected as the two evaluation functions for multi-objective optimization, and the accelerator and brake operation gains are optimized as the design variables.
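As a minimal sketch of the two evaluation values named above, the following computes the RMS speed error and the fuel economy for one run. The function names, the sampled speed traces, and the distance and fuel figures are hypothetical and are not taken from the patent.

```python
import math


def rms_speed_error(target_kmh, actual_kmh):
    """Root mean square of the speed error [km/h] over one run."""
    if len(target_kmh) != len(actual_kmh) or not target_kmh:
        raise ValueError("speed traces must be non-empty and of equal length")
    squared = [(a - t) ** 2 for t, a in zip(target_kmh, actual_kmh)]
    return math.sqrt(sum(squared) / len(squared))


def fuel_economy(distance_km, fuel_l):
    """Fuel economy [km/L] for one run."""
    if fuel_l <= 0:
        raise ValueError("fuel consumed must be positive")
    return distance_km / fuel_l


# Hypothetical samples of the target speed pattern and the measured speed.
target = [0.0, 20.0, 40.0, 40.0]
actual = [0.0, 18.0, 41.0, 42.0]

f1 = fuel_economy(distance_km=1.2, fuel_l=0.1)  # larger is better
f2 = rms_speed_error(target, actual)            # smaller is better
```

In this sketch f1 is to be maximized and f2 minimized, which matches the direction of improvement stated later in the description.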

In this case, fuel economy is expected to improve when, for example, speed is held down during acceleration to reduce fuel consumption and kept slightly high during deceleration to gain distance, but the speed error relative to the speed pattern then increases. This is a trade-off, and using a multi-objective genetic algorithm makes it possible to obtain sets of design variables that keep the trade-off well balanced.

Specifically, as shown in FIG. 3, the automatic driving device 20, which controls operation so as to follow the specified speed pattern, takes an automobile 11 on a chassis dynamometer (CDM) as the actual machine 10. It comprises a speed controller 21 implemented on an FA computer or the like, and a drive unit 22 that receives the output of the speed controller 21 and actuates the accelerator and brake of the automobile 11.

Speed control of the automobile 11 via the drive unit 22 is performed by two operations: an electrical accelerator operation, in which the output voltage of the accelerator position sensor (APS) that detects the depression position of the accelerator pedal is varied and output to the electronically controlled throttle device of the automobile 11, and a mechanical brake operation, in which an actuator (for example, an electric slider mechanism) mounted near the brake pedal of the automobile 11 depresses the brake pedal.

The speed controller 21 comprises a calculation unit 21a, which calculates the accelerator and brake operation amounts according to a speed control law based on a preset driver model and converts these operation amounts into an APS voltage and a brake pedal stroke, and a gain changing unit 21b, which changes the accelerator and brake operation gains to make the vehicle characteristics variable. In this embodiment the speed control law (driver model) is fixed, so from the driver model's point of view, changing the accelerator and brake operation gains amounts to making the vehicle's response to the intended operation more sluggish or more sensitive.

The operating data of the automobile 11 driven by the automatic driving device 20 are sent to the evaluation value calculation unit 40 via the measuring instrument 30. From these data, the evaluation value calculation unit 40 calculates the fuel consumption rate and the RMS of the speed error relative to the driving pattern as evaluation values and sends them to the learning calculation unit 50.

The evaluation value calculation unit 40, the learning calculation unit 50, and the parameter update unit 60 may be implemented as part of the functions of the FA computer constituting the automatic driving device 20, or may be implemented on one or more computers separate from the automatic driving device 20.

The learning calculation unit 50 performs optimization with the accelerator and brake gains (scalar values) as the genes of the multi-objective genetic algorithm, and adaptively learns and acquires the design variables that give the best balance of evaluation values. In this embodiment the genes are normalized to the range 0 to 1, which corresponds to gains of 0.3 to 1.5. A gain of 1.0 is the design value of the automatic driving device 20.
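The gene normalization can be sketched as below. The description only states that the range 0 to 1 corresponds to gains of 0.3 to 1.5, so the linear mapping and the function names here are assumptions made for illustration.

```python
GAIN_MIN, GAIN_MAX = 0.3, 1.5  # gain range stated in the description


def gene_to_gain(gene):
    """Map a normalized gene in [0, 1] to an operation gain (assumed linear)."""
    if not 0.0 <= gene <= 1.0:
        raise ValueError("gene must lie in [0, 1]")
    return GAIN_MIN + gene * (GAIN_MAX - GAIN_MIN)


def gain_to_gene(gain):
    """Inverse of gene_to_gain under the same linear assumption."""
    return (gain - GAIN_MIN) / (GAIN_MAX - GAIN_MIN)


# Under this assumption the design gain 1.0 sits at a gene value of about 0.58.
design_gene = gain_to_gene(1.0)
```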

The optimization in this embodiment is based on SPEA2 (Strength Pareto Evolutionary Algorithm 2), a multi-objective optimization method. However, automatic adaptation using an actual vehicle rather than simulation requires attention to the uncertainty of evaluation values, the limit on the number of trials, and the drift of vehicle characteristics, as described in (1) to (3) below.

SPEA2 is described in detail in, for example, E. Zitzler, M. Laumanns, L. Thiele: SPEA2: Improving the Performance of the Strength Pareto Evolutionary Algorithm, Technical Report 103, Computer Engineering and Communication Networks Lab (TIK), Swiss Federal Institute of Technology (ETH) Zurich (2001).

(1) Uncertainty of evaluation values
Even when the same design variables (constants) are given, vehicle behavior varies with some distribution. Ideally each individual would be evaluated multiple times to obtain the distribution and mean, but this conflicts with the limit on the number of trials described in (2) below, so some ingenuity is required.

(2) Limit on the number of trials
The next problem is the limit on the number of evaluations. Obtaining an evaluation value requires driving a speed pattern of a certain duration, while obtaining a useful solution within a realistic time requires reducing the number of evaluations. If the population size of the multi-objective genetic algorithm is reduced for this purpose, the small population may not capture the shape of the trade-off boundary adequately.

(3) Drift of vehicle characteristics
Even for a vehicle that has been warmed up on the CDM, the vehicle characteristics change gradually as optimization proceeds because of changes in water temperature, oil temperature, tire pressure, and other properties. If this drift is not taken into account, evaluation values obtained early in the optimization may no longer be reliable by the end.

As explained below, problems (1) to (3) can be solved by introducing, respectively, "re-evaluation of child individuals", "sharing evaluation", and "priority evaluation of child individuals". For this purpose, as shown in FIG. 4, the learning calculation unit 50 comprises an initialization unit 51, a fitness calculation unit 52, a parent selection unit 53, a generation alternation unit 54, a child individual evaluation unit 55, a survival selection unit 56, a lifetime determination unit 57, and a termination determination unit 58. The parent selection unit 53, the child individual evaluation unit 55, and the survival selection unit 56 carry out "sharing evaluation", "re-evaluation of child individuals", and "priority evaluation of child individuals", respectively.

The initialization unit 51 takes the control parameters to be optimized as genes and generates individuals having these genes either randomly or uniformly according to a predetermined rule. In this embodiment, the accelerator gain and the brake gain are the genes, and individuals having these genes are generated as the initial individuals. To accommodate the limit on the number of trials described above, the number of initial individuals is made smaller than in ordinary simulation-based optimization, for example reduced from 100 to 10. Of the 10 individuals, 9 are placed on a grid over the range of the design variables and the remaining 1 is placed at a random value.
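The initial population described above (9 grid individuals plus 1 random individual) could be generated roughly as follows. The description does not say where the grid points lie, so the 3 x 3 grid at gene values 0, 0.5, and 1 is an assumption, as are the function and variable names.

```python
import random


def initial_population(rng, grid_points=3):
    """Return (accelerator_gene, brake_gene) pairs in normalized [0, 1] space.

    grid_points**2 individuals sit on a regular grid (assumed to span the
    full gene range) and one further individual is drawn at random.
    """
    levels = [i / (grid_points - 1) for i in range(grid_points)]
    population = [(a, b) for a in levels for b in levels]
    population.append((rng.random(), rng.random()))
    return population


population = initial_population(random.Random(0))  # 9 grid + 1 random individual
```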

The fitness calculation unit 52 calculates the fitness of each individual. In SPEA2 as used in this embodiment, evaluation values f1k, f2k, ... for a plurality of evaluation criteria are assigned individually to each individual Ik, and the fitness Qk of each individual Ik is calculated from the dominance relations (superiority relations) among individuals based on these evaluation values.

Specifically, the fitness calculation unit 52 selects one individual Ik from the population and sends its control parameters P1k to Pnk to the automatic driving device 20 to perform an evaluation run. Based on the response Rk, such as the measurement results output from the measuring instrument 30 and the control records output from the automatic driving device 20, it calculates the evaluation values f1k, f2k, ... for the plurality of evaluation criteria for each individual Ik.

In this embodiment the evaluation criteria are fuel economy and speed error (RMS); f1 is the evaluation value for fuel economy and f2 is the evaluation value for speed error. Plotting each individual Ik on an evaluation plane with f1 on the horizontal axis and f2 on the vertical axis gives a solution population such as that shown in FIG. 5. In this embodiment, an individual is better the larger f1 is and the smaller f2 is.

To calculate the fitness Qk, first, for every individual Ik, the number of other individuals that dominate Ik (the number of superior individuals; the dominance count) is determined. Next, for each individual, the dominance counts of all the other individuals that dominate it are summed. This sum is the fitness Qk of the individual Ik. An individual that is dominated by no other individual has a fitness of 0 and is a Pareto optimal solution (hereinafter simply "Pareto solution"). In the example of FIG. 5, the individuals marked with circles are Pareto solutions, and the broken line connecting them forms the boundary (Pareto boundary) with respect to the non-Pareto solutions (inferior solutions) marked with squares.
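The fitness calculation can be sketched as follows, reading the paragraph above literally: the dominance count of an individual is the number of individuals that dominate it, and its fitness is the sum of the dominance counts of its dominators. The function names and the four evaluation points are hypothetical. Note that the SPEA2 report defines strength as the number of individuals a solution dominates, so this literal reading is not identical to standard SPEA2; under it, an individual dominated only by Pareto solutions also scores 0, so the sketch reads Pareto status from the dominance count.

```python
def dominates(a, b):
    """True if a dominates b; a and b are (f1, f2), f1 maximized, f2 minimized."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def fitness_values(evals):
    """Return (dominance_counts, fitness) following the description literally."""
    n = len(evals)
    dominators = [
        [j for j in range(n) if j != i and dominates(evals[j], evals[i])]
        for i in range(n)
    ]
    counts = [len(d) for d in dominators]
    fitness = [sum(counts[j] for j in dominators[i]) for i in range(n)]
    return counts, fitness


# Hypothetical (fuel economy [km/L], RMS speed error [km/h]) pairs.
evals = [(10.0, 1.0), (12.0, 2.0), (9.0, 1.5), (8.0, 2.5)]
counts, fitness = fitness_values(evals)
pareto = [i for i, c in enumerate(counts) if c == 0]  # non-dominated individuals
```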

The parent selection unit 53 selects as parent individuals a predetermined number of individuals (for example, three) that is smaller than the total population. If the child individual generated in the previous generation was a Pareto solution, that individual is selected as one of the parents in the current generation; if it was not a Pareto solution, parents are selected by a selection method such as tournament selection.

If individuals of equal fitness are chosen as parent candidates, an index of how densely other individuals are clustered around each candidate (the crowding, or niche count) is calculated, and sharing is performed so that the individual with the lower crowding in the solution set is preferred as a parent. When the population is small (10 initial individuals in this embodiment), multiple solutions may cluster in one place in the evaluation space, making it difficult to judge the shape of the Pareto boundary. In such cases, solution-spreading logic based on the "truncation method" is generally adopted, but the truncation method is invoked only when all solutions, including the child individuals, are judged to be Pareto solutions, and it does not act on solutions early in learning.

Therefore, in this embodiment, an evaluation method based on sharing is introduced in addition to the SPEA2 fitness index for tournament-style parent selection in the genetic algorithm. Specifically, when fitness values are equal, a niche count based on the sharing function s(d) given in equation (1) below is calculated and applied to the fitness, and the individual with the lower crowding (the smaller niche count) is taken as the parent.
$$ s(d) = \begin{cases} 1 - (d/d_s)^{\alpha} & (d < d_s) \\ 0 & (d \ge d_s) \end{cases} \qquad (1) $$

Here, as shown in FIG. 6, d is the Euclidean distance between individual i and individual j in the objective function space, and d_s, called the niche size, is the maximum distance used to decide how close similar individuals must be before their evaluation is lowered. The niche size d_s is specified as a preset value or as a proportion of the dispersion of the evaluation values or control parameter values (for example, about 30% of the control parameter range). α is the exponent.

As shown in equation (2) below, the niche count n_i of individual i is the sum of the sharing function values s_j(d) over the individuals j located within the niche size d_s. The larger the niche count n_i, the more densely the individuals are clustered and the more the fitness rating of individual i is lowered, so that a more widely spread set of Pareto solutions can be obtained. FIG. 7 shows an example of niche count calculation; the upper, middle, and lower numbers attached to the circle representing each individual are its dominance count, fitness, and niche count, respectively.
$$ n_i = \sum_j s_j(d) \qquad (2) $$
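Equations (1) and (2) and the tie-break they support can be sketched as follows. The function names, the choice α = 1, the exclusion of the individual itself from the sum, and the sample points are assumptions for illustration. A smaller fitness value is treated as better, since a fitness of 0 denotes a Pareto solution.

```python
import math


def sharing(d, niche_size, alpha=1.0):
    """Sharing function s(d) of equation (1)."""
    if d < niche_size:
        return 1.0 - (d / niche_size) ** alpha
    return 0.0


def niche_count(i, points, niche_size, alpha=1.0):
    """Niche count n_i of equation (2), summed over the other individuals."""
    return sum(
        sharing(math.dist(points[i], points[j]), niche_size, alpha)
        for j in range(len(points))
        if j != i
    )


def pick_parent(a, b, fitness, points, niche_size):
    """Two-way tournament: lower fitness wins, ties go to the less crowded one."""
    if fitness[a] != fitness[b]:
        return a if fitness[a] < fitness[b] else b
    na = niche_count(a, points, niche_size)
    nb = niche_count(b, points, niche_size)
    return a if na <= nb else b


# Three clustered individuals and one isolated individual, all with equal fitness.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (1.0, 1.0)]
fitness = [0, 0, 0, 0]
winner = pick_parent(0, 3, fitness, points, niche_size=0.3)  # isolated one wins
```

The points may be taken in either the evaluation plane or the parameter plane; the sketch does not depend on which is used.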

With this sharing strategy, parent individuals in areas where solutions are densely clustered are less likely to be selected, which in turn spreads out the child individuals. This makes it possible to obtain, with a small population, a solution set that is diverse and appropriately distributed in the solution space. Moreover, because this algorithm also acts on parent selection for the crossover described later, it influences the evolution strategy from the early stages of learning and spreads the solutions out early.

The niche count can be defined on either the evaluation plane or the design variable plane (parameter plane), but from the standpoint of diversity in the optimized control parameters, it is preferable to evaluate it on the parameter plane.

The generation alternation unit 54 generates a child individual from the parent individuals selected by the parent selection unit 53 by genetic operations. In this embodiment, one child individual is generated per generation by a crossover operation on three parent individuals. This ensures that the archive population from which parents are selected always reflects the latest child individual.

For the crossover operation, UNDX (Unimodal Normal Distribution Crossover), for example, is used. UNDX generates a child individual according to normally distributed random numbers determined by the positional relationship of three parent individuals. For example, with two control parameters to be optimized and the first and second parents placed on the parameter plane, the child is generated using normal random numbers so that it follows a normal distribution around the axis connecting the two parents. The third parent is used in an auxiliary role to determine the standard deviation component in the direction orthogonal to the axis connecting the two parents, so the child is not generated at a position far from the first to third parents.
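A simplified two-parameter UNDX step could look like the sketch below. The description gives no numerical settings, so the scale factors alpha = 0.5 and beta = 0.35 (values commonly quoted for UNDX), the division by the square root of the dimension, and the function name are assumptions. Clipping the child to the gene range 0 to 1 is left out.

```python
import math
import random


def undx_child(p1, p2, p3, rng, alpha=0.5, beta=0.35):
    """Generate one child from three parents on a 2-D parameter plane."""
    mx, my = (p1[0] + p2[0]) / 2, (p1[1] + p2[1]) / 2  # midpoint of p1 and p2
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return (mx, my)  # identical parents: no axis to spread along
    ex, ey = dx / length, dy / length  # unit vector along the primary axis
    nx, ny = -ey, ex                   # unit vector orthogonal to it
    # Distance of the third parent from the line through p1 and p2.
    dist3 = abs((p3[0] - p1[0]) * nx + (p3[1] - p1[1]) * ny)
    z1 = rng.gauss(0.0, alpha * length)               # spread along the axis
    z2 = rng.gauss(0.0, beta * dist3 / math.sqrt(2))  # spread across the axis
    return (mx + z1 * ex + z2 * nx, my + z1 * ey + z2 * ny)


rng = random.Random(1)
child = undx_child((0.2, 0.5), (0.8, 0.5), (0.5, 0.9), rng)
```

When the third parent lies on the line through the first two, the orthogonal spread is zero and the child stays on that line, which illustrates the auxiliary role of the third parent.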

For details of UNDX, see, for example, Hiroaki Kitano (ed.), "Genetic Algorithm 4", Sangyo Tosho Co., Ltd., pp. 232-235.
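The crossover described above can be illustrated with the following Python sketch. It follows one common formulation of UNDX and is not taken from this description: the function name `undx`, the parameters `sigma_xi` and `sigma_eta`, and their default values (0.5 and 0.35/sqrt(n), values often quoted for UNDX) are assumptions made for illustration.

```python
import math
import random


def undx(p1, p2, p3, sigma_xi=0.5, sigma_eta=None, rng=random):
    """Generate one child from three parents (hypothetical UNDX sketch).

    p1, p2 define the primary axis; p3 only sets the spread orthogonal to it.
    sigma_xi and sigma_eta are assumed tuning constants, not patent values.
    """
    n = len(p1)
    if sigma_eta is None:
        sigma_eta = 0.35 / math.sqrt(n)

    mid = [(a + b) / 2.0 for a, b in zip(p1, p2)]
    axis = [b - a for a, b in zip(p1, p2)]
    axis_len = math.sqrt(sum(x * x for x in axis))

    if axis_len == 0.0:
        # Degenerate case: identical first and second parents, no primary axis.
        unit = [0.0] * n
        dist3 = math.dist(p1, p3)
    else:
        unit = [x / axis_len for x in axis]
        v = [c - a for a, c in zip(p1, p3)]
        along = sum(x * u for x, u in zip(v, unit))
        # Distance of the third parent from the line through p1 and p2.
        dist3 = math.sqrt(sum((x - along * u) ** 2 for x, u in zip(v, unit)))

    # Component along the primary axis.
    xi = rng.gauss(0.0, sigma_xi)
    # Component orthogonal to the primary axis: draw in all n dimensions,
    # then remove the part that lies along the axis.
    t = [rng.gauss(0.0, sigma_eta) * dist3 for _ in range(n)]
    t_along = sum(x * u for x, u in zip(t, unit))
    t = [x - t_along * u for x, u in zip(t, unit)]

    return [m + xi * a + o for m, a, o in zip(mid, axis, t)]


child = undx([0.0, 0.0], [2.0, 0.0], [1.0, 3.0], rng=random.Random(0))
```

Because the orthogonal spread is scaled by the distance of the third parent from the primary axis, a third parent lying on that axis yields a child on the axis as well.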

The child individual evaluation unit 55 evaluates the generated child individual on the actual machine using its control parameters. If this evaluation determines that the child is a Pareto solution, the child is re-evaluated on the actual machine at least once more with the same control parameters. Re-evaluating only those children judged to be Pareto solutions in the first determination avoids the increase in evaluation count that would result from, for example, evaluating every individual multiple times, and so limits the time spent acquiring evaluation values on the actual machine.

In the re-evaluation, the child individual is judged again in the evaluation space based on the mean and variance of its multiple evaluation values, and it survives to the next generation as a Pareto solution only if it is again judged to be a Pareto solution. For a child judged to be a non-Pareto solution (inferior solution) in the re-evaluation, the first Pareto determination is discarded and the child is treated the same as a solution judged inferior in the first determination.

That is, by using the average of the evaluation values obtained through re-evaluation as the new evaluation value, variation in the actual machine's characteristics is removed and stable Pareto solutions are obtained. Furthermore, if the re-evaluation shows a difference between evaluation values at or above a specified value, or a variance of the evaluation values exceeding the variation predicted in advance, some of the evaluation values or the child individual itself is discarded as abnormal, and a further re-evaluation or the like is performed, so that abnormal values can be detected and eliminated.
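The averaging and abnormal-value check described above might be sketched in Python as follows. The function name `combine_evaluations` and the per-axis thresholds `max_gap` and `max_variance` are hypothetical; the description does not state how the specified value or the predicted variation is represented, and returning `None` for an abnormal result is likewise an assumption.

```python
from statistics import mean, pvariance


def combine_evaluations(runs, max_gap, max_variance):
    """Average repeated evaluations axis by axis, or flag them as abnormal.

    runs:         one tuple of evaluation values per run on the actual machine
    max_gap:      assumed per-axis limit on the spread between runs
    max_variance: assumed per-axis limit on the variance between runs
    Returns the averaged tuple, or None when the runs look abnormal.
    """
    averaged = []
    for axis_values, gap_limit, var_limit in zip(zip(*runs), max_gap, max_variance):
        gap = max(axis_values) - min(axis_values)
        if gap >= gap_limit or pvariance(axis_values) > var_limit:
            return None  # caller may discard the result and re-evaluate
        averaged.append(mean(axis_values))
    return tuple(averaged)


# Two runs of the same child on two evaluation axes (illustrative numbers).
result = combine_evaluations([(1.0, 2.0), (1.2, 2.2)], (0.5, 0.5), (1.0, 1.0))
```

A caller receiving `None` could discard the child or request a further run, corresponding to the handling of abnormal values described in the text.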

The survival selection unit 56 performs a running evaluation with the control parameters of the generated child individual and, if the child's fitness equals that of an existing parent individual, gives the child priority in the top-rank selection. As noted earlier, when evaluation values are calculated on an actual vehicle, their reliability may decline as time passes after the evaluation because the vehicle's characteristics change. Prioritizing the child in survival selection favors the more recent evaluation and limits the divergence between evaluation values and actual characteristics.

The survival selection unit 56 also stores the generation in which each individual was generated and thereby keeps track of each individual's age. When individuals have equal fitness in the selection of survivors for the next generation, the older individual is culled.

This individual age can also be applied as one of the selection conditions for parent individuals described above: when fitness is equal, the younger parent may be selected preferentially.
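The tie-breaking rule of the survival selection can be sketched in Python as below. The `Individual` class, the `select_survivors` function and the `capacity` parameter are hypothetical names. The sketch also assumes that a smaller fitness value is better, in line with the later statement that individuals with low fitness are selected as parents.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    params: tuple   # control parameters (the genes)
    fitness: int    # assumed: smaller value means a better individual
    born: int       # generation in which the individual was generated


def select_survivors(population, child, capacity):
    """Keep the best `capacity` individuals; on equal fitness keep the younger.

    The newly generated child is always the youngest, so sorting ties by
    birth generation also realizes the child-priority rule.
    """
    candidates = list(population) + [child]
    candidates.sort(key=lambda ind: (ind.fitness, -ind.born))
    return candidates[:capacity]


population = [Individual((0.1,), 1, 0), Individual((0.2,), 1, 3)]
child = Individual((0.3,), 1, 5)
survivors = select_survivors(population, child, capacity=2)
```

In this example all three individuals have equal fitness, so the oldest one (generated in generation 0) is culled.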

The lifetime determination unit 57 determines whether the age of each individual has reached a predetermined number of generations (its lifetime). When an individual has exceeded its lifetime, it is re-evaluated with its control parameters so that the validity of the parameters is confirmed periodically. As a result, a Pareto solution may become a non-Pareto solution, and vice versa.
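As a minimal Python sketch of this lifetime check, the function below lists the individuals that are due for re-evaluation. The function name `due_for_recheck`, the mapping from individual id to birth generation, and the use of "age greater than or equal to the lifetime" as the trigger are assumptions; the description does not fix these details.

```python
def due_for_recheck(birth_generation, current_generation, lifetime):
    """Return the ids of individuals whose age has reached the lifetime.

    birth_generation: assumed mapping of individual id -> generation of birth
    """
    return [
        ind_id
        for ind_id, born in birth_generation.items()
        if current_generation - born >= lifetime
    ]


# At generation 10 with a lifetime of 6 generations (illustrative numbers).
to_recheck = due_for_recheck({"a": 0, "b": 7, "c": 4}, 10, 6)
```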

The end determination unit 58 determines whether the termination condition of the optimization is satisfied for the next-generation individuals selected in this way, and the above evolutionary process is repeated until the condition is satisfied. When the termination condition is judged to be satisfied, the control parameters of the individuals that are Pareto solutions are output as the optimized control parameters. The termination condition is, for example, a preset time or number of generations.

Next, the optimization process in the automatic adaptation system 1 described above is explained with reference to the flowcharts of FIGS. 8 and 9.

FIG. 8 is a flowchart showing the overall flow of the optimization process in the automatic adaptation system 1. In the first step S1, as the initialization stage of the genetic algorithm's evolutionary process, the initialization unit 51 generates a predetermined number of initial individuals. Next, in step S2, the control parameters of the initial individuals are given to the automatic driving device 20, and an initial evaluation run is performed by driving the actual machine (automobile 11) on the CDM, with a trial carried out for each initial individual to calculate its fitness. From the second generation onward, a trial is performed only for the single generated child individual, which reduces the number of trials per generation.

After the initial evaluation run of step S2, the process proceeds to step S3, where the evaluation values of each individual are computed from the data acquired through the measuring device 30 and the fitness of each individual is calculated from them. Once the fitness of the initial individuals has been calculated, generation change, in which parent individuals are selected and a child individual is generated, is carried out from step S4 onward.

In this generation change, individuals with low fitness values are selected as parent individuals and, when fitness is equal, individuals with low density are selected preferentially by sharing. Specifically, in step S4 the density of each individual is calculated from the niche count described above. In step S5, individuals with small fitness and density values are selected as parent individuals, and in step S6 a crossover operation or the like is applied to the selected parents to generate a child individual. In this embodiment, only one child individual is generated per generation.
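Steps S4 and S5 can be illustrated with the following Python sketch. The triangular sharing function `max(0, 1 - d / niche_size)` is a commonly used form and is assumed here; the function names `niche_count` and `select_parents` and the parameter `niche_size` are likewise hypothetical and are not taken from this description.

```python
import math


def niche_count(index, points, niche_size):
    """Density of points[index]: sum of an assumed triangular sharing function."""
    total = 0.0
    for j, other in enumerate(points):
        if j == index:
            continue
        dist = math.dist(points[index], other)
        total += max(0.0, 1.0 - dist / niche_size)
    return total


def select_parents(params, fitness, niche_size, count=3):
    """Return indices of the parents: low fitness first, then low density."""
    order = sorted(
        range(len(params)),
        key=lambda i: (fitness[i], niche_count(i, params, niche_size)),
    )
    return order[:count]


# Four individuals on a two-parameter plane (illustrative numbers).
params = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.05)]
fitness = [1, 1, 1, 2]
parents = select_parents(params, fitness, niche_size=1.0)
```

Here the isolated individual at (5.0, 5.0) has a niche count of zero and is chosen first among the three individuals of equal fitness.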

Then, in step S7, the child individual produced by the generation change is evaluated on the actual machine to determine whether it is a Pareto solution. Even if the child is judged to be a Pareto solution in the first determination, the result is not finalized immediately. Instead, the child is re-evaluated multiple times, which eliminates individuals that became Pareto solutions only by chance because they were overrated owing to, for example, variation in the actual machine's characteristics.

In this embodiment, the child individual is evaluated in the child individual evaluation sub-process shown in the flowchart of FIG. 9, and the Pareto determination is finalized after two evaluations. In this sub-process, the actual machine is first run with the child's control parameters in step S21, and evaluation values are calculated from the run results in step S22. Next, the fitness of the individual is calculated from these evaluation values in step S23, and whether the individual is a Pareto solution is determined in step S24.

If step S24 determines that the generated child individual is not a Pareto solution, the sub-process exits. If the child is determined to be a Pareto solution, the process proceeds from step S24 to step S25. In step S25 the actual machine is run again for re-evaluation, and in step S26 the two sets of evaluation values are averaged to give the average evaluation values. In step S27 the fitness is calculated from the average evaluation values, and in step S28 it is determined once more whether the child is a Pareto solution.

If the second determination in step S28 also finds the child to be a Pareto solution, the child is certified as a Pareto solution in step S29 and the sub-process exits. If it is found not to be a Pareto solution, the first Pareto determination is discarded in step S30, the child is treated as a non-Pareto solution (inferior solution), and the sub-process exits.
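The sub-process of steps S21 to S30 can be summarized by the following Python sketch. It makes several simplifying assumptions: every evaluation axis is minimized, the Pareto determination is made by a direct dominance check against an archive of evaluation values rather than through a separately computed fitness, and `run_on_machine` is a hypothetical callback standing in for one run of the actual machine. The names `dominates`, `is_pareto` and `evaluate_child` are also illustrative.

```python
def dominates(a, b):
    """True if a is no worse than b on every axis and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def is_pareto(candidate, archive):
    """True if no archived evaluation dominates the candidate."""
    return not any(dominates(other, candidate) for other in archive)


def evaluate_child(run_on_machine, params, archive):
    """Return (evaluation values, confirmed Pareto flag) for one child."""
    first = run_on_machine(params)                 # S21, S22: first run
    if not is_pareto(first, archive):              # S23, S24: first determination
        return first, False                        # inferior: no second run
    second = run_on_machine(params)                # S25: re-evaluation run
    averaged = tuple((a + b) / 2.0 for a, b in zip(first, second))  # S26
    return averaged, is_pareto(averaged, archive)  # S27 to S30


# Stand-in for the actual machine: returns prepared values (illustrative only).
archive = [(1.0, 3.0), (3.0, 1.0)]
fake_runs = iter([(2.0, 2.0), (4.0, 4.0)])
values, confirmed = evaluate_child(lambda p: next(fake_runs), (0.1, 0.2), archive)
```

In this example the first run looks like a Pareto solution, but the averaged values are dominated by an archived individual, so the first determination is discarded and the second run is the only extra cost incurred.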

For example, in the case shown in FIG. 10, 77 child individuals were generated in total and 20 of them were judged to be Pareto solutions in the first evaluation, but on the basis of the average evaluation values after re-evaluation, half of these (10 individuals) were judged to be inferior (non-Pareto) solutions or culled (not kept for the next generation). The re-evaluation of child individuals can therefore be said to leave highly reliable Pareto solutions for the next generation.

After the child individual has been evaluated, the process proceeds to step S8 of the overall optimization process in FIG. 8, where survival selection is performed. To track the drift in vehicle characteristics and acquire control parameters that are more effective on the actual machine, this survival selection lets the child survive to the next generation in preference to an existing parent individual when the newly generated child and the parent have the same fitness. It also compares the ages of the individuals and, among individuals of equal fitness, culls the older one.

For each individual selected in step S8, step S9 determines whether its age has reached the predetermined number of generations (lifetime). If the lifetime is determined to have been exceeded, a re-evaluation run is performed with that individual's control parameters in step S10, after which the process returns to step S3 and repeats the above processing to confirm the validity of the control parameters.

If step S9 determines that the lifetime has not been reached, step S11 determines whether the termination condition of the optimization (a preset time, number of generations, or the like) is satisfied. If the termination condition is not satisfied, the process returns to step S3 via the re-evaluation run of step S10, and the above evolutionary process is repeated. When the termination condition is determined to be satisfied, the optimization process ends and the control parameters of the individuals that are Pareto solutions are output as the optimized control parameters.

FIG. 11 shows an example of the evolution results of the above optimization process on the parameter plane. In the figure, triangles denote the initial values before optimization, black circles denote inferior solutions after optimization, and hatched circles denote Pareto solutions after optimization. FIG. 11 shows that there is a trade-off between fuel consumption rate and speed error and that the optimization succeeded in acquiring better parameter sets than the initial individuals. For example, comparing the individuals marked with dotted circles, the fuel consumption rate evaluation improves substantially while the speed error RMS evaluation is maintained or slightly improved.

As described above, in the automatic adaptation system 1 of this embodiment, introducing the actual machine into the learning loop and acquiring evaluation values directly eliminates the influence of model errors and the like that arise in offline simulation, making it possible to acquire highly accurate parameters. It also lightens the burden of parameter tuning in control development, reduces work cost, and provides a development environment in which higher-quality performance can be pursued.

In particular, because the automatic adaptation system 1 of this embodiment uses a multi-objective genetic algorithm, which is a multi-objective optimization method, there is no need for weighting coefficients or the like between the objective functions, and the set of optimal solutions can be acquired as a Pareto solution set. By looking at the distribution of the solution set in the evaluation space, the trade-off between evaluation axes can be grasped easily, and the solution that gives the most desirable balance of objective functions can be selected.

Moreover, in applying this multi-objective genetic algorithm, the number of individuals is kept small to reduce the number of evaluation-value acquisitions, and the number of updates per individual is increased. This offsets the time required to acquire evaluation values on the actual machine and shortens the time until optimum control parameter values are found, so effective control parameters can be acquired within a practical time.

In addition, in the evolutionary process of the multi-objective genetic algorithm, the density of solutions in the evaluation space is calculated and sparser solutions are selected preferentially as parents (sharing strategy). Solutions (individuals) can therefore be dispersed efficiently even when the number of individuals is small, the trade-off between evaluation axes can be grasped easily from the Pareto solution set finally obtained, and a diverse range of control parameters can be acquired.

In the Pareto determination, a child individual judged to be a Pareto solution in the first determination is not immediately finalized as one. Instead, it is re-evaluated multiple times and its evaluation values are averaged, which removes variation in the actual machine's characteristics and yields stable Pareto solutions.

When the re-evaluation of a child individual reveals a variance of the evaluation values exceeding the predicted variation, the result is discarded and a further re-evaluation or the like is performed, so abnormal values can be detected and eliminated. Moreover, because a child is re-evaluated only when it was judged to be a Pareto solution in the first determination, the increase in the number of evaluations is smaller than when every individual is evaluated multiple times, and the time spent acquiring evaluation values on the actual machine is limited.

Furthermore, because child individuals are given priority to survive to the next generation in the evolutionary process, the system can track the drift in vehicle characteristics and acquire parameters that are more effective on the actual machine. In addition, because each individual's age is kept and the individual is re-evaluated with the same control parameters at each predetermined lifetime, individuals that have lost their validity can be eliminated (their Pareto-solution status revoked) in response to characteristic drift of the actual machine.

FIG. 1: Overall configuration diagram of the automatic adaptation system
FIG. 2: Explanatory diagram showing the speed pattern
FIG. 3: Block diagram of the automatic driving control system
FIG. 4: Block diagram of the learning calculation unit
FIG. 5: Explanatory diagram showing an example of a solution population
FIG. 6: Explanatory diagram of the niche size
FIG. 7: Explanatory diagram showing an example of niche count calculation
FIG. 8: Flowchart of the optimization process
FIG. 9: Flowchart of the child individual evaluation process
FIG. 10: Explanatory diagram showing an example of child individual re-evaluation results
FIG. 11: Explanatory diagram showing optimization results

Explanation of symbols

1 Automatic adaptation system
10 Actual machine
20 Automatic driving device
50 Learning calculation unit
52 Fitness calculation unit
53 Parent selection unit
54 Generation change unit
55 Child individual evaluation unit
56 Survival selection unit

Claims

An automatic adaptation system for control parameters that adapts control parameters related to control of a moving body to optimum values, comprising:
a learning calculation unit that introduces the actual machine serving as the moving body into a learning loop, automatically acquires evaluation values through automatic operation of the actual machine, evaluates the evaluation values independently on a plurality of evaluation axes, and acquires the control parameters that are optimum solutions by a multi-objective genetic algorithm,
wherein the learning calculation unit has:
a fitness calculation unit that calculates the fitness of an individual having the control parameters as its genes;
a parent selection unit that, when selecting some individuals as parent individuals from a population consisting of a plurality of individuals, determines the superiority or inferiority of individuals having the same fitness by using an index corresponding to the degree of density with respect to other individuals;
a generation change unit that generates a child individual by genetic operations from the selected parent individuals;
a child individual evaluation unit that selectively evaluates the generated child individual a plurality of times; and
a survival selection unit that selects the individuals to be kept for the next generation,
and wherein the child individual evaluation unit
re-evaluates the child individual only when the generated child individual is determined to be a Pareto optimal solution in the initial evaluation.

The automatic adaptation system for control parameters according to claim 1, wherein the survival selection unit,
when the fitness of the generated child individual and that of a parent individual are equal, preferentially keeps the child individual for the next generation.

The automatic adaptation system for control parameters according to claim 1, wherein the survival selection unit
keeps the age of each individual based on the number of generations of the individual, and confirms the evaluation values using the control parameters of an individual that has reached a specified lifetime, thereby eliminating individuals whose validity has declined.

The automatic adaptation system for control parameters according to claim 1, wherein, after the child individual is re-evaluated, the average of the evaluation values is used as the evaluation value of the child individual.

The automatic adaptation system for control parameters according to claim 1, wherein, after the child individual is re-evaluated, when a difference or variance at or above a specified value arises among the evaluation values, some of the evaluation values or the child individual itself is discarded as an abnormal value.

An automatic adaptation system for control parameters that adapts control parameters related to control of a moving body to optimum values, comprising:
a learning calculation unit that introduces the actual machine serving as the moving body into a learning loop, automatically acquires evaluation values through automatic operation of the actual machine, evaluates the evaluation values independently on a plurality of evaluation axes, and acquires the control parameters that are optimum solutions by a multi-objective genetic algorithm,
wherein the learning calculation unit has:
a fitness calculation unit that calculates the fitness of an individual having the control parameters as its genes;
a parent selection unit that, when selecting some individuals as parent individuals from a population consisting of a plurality of individuals, determines the superiority or inferiority of individuals having the same fitness by using an index corresponding to the degree of density with respect to other individuals;
a generation change unit that generates a child individual by genetic operations from the selected parent individuals;
a child individual evaluation unit that selectively evaluates the generated child individual a plurality of times; and
a survival selection unit that selects the individuals to be kept for the next generation,
and wherein the parent selection unit
defines a maximum value of the distance between individuals, in at least one of the evaluation value space and the control parameter space, either as a predesignated value or as a ratio to the variance of the evaluation values or the control parameter values, and calculates the index of the degree of density based on the relationship between the distance and the maximum value.