JP7764025B2

JP7764025B2 - Survey response data processing device, survey response data processing method, and survey response data processing program

Info

Publication number: JP7764025B2
Application number: JP2021207944A
Authority: JP
Inventors: 知大山形; 永和富野
Original assignee: 株式会社Find
Priority date: 2021-12-22
Filing date: 2021-12-22
Publication date: 2025-11-05
Anticipated expiration: 2041-12-22
Also published as: JP2023092749A

Description

The present invention belongs to the technical field of survey response data processing devices, survey response data processing methods, and survey response data processing programs. More specifically, it belongs to the technical field of a survey response data processing device and a survey response data processing method that process response data indicating the responses to a survey conducted over a network such as the Internet in a format that asks for answers to questions, and of a program for that survey response data processing device.

So-called questionnaire surveys have long been widely conducted for purposes such as market research. Conventional survey methods include, for example, mailing a questionnaire to randomly selected households or visiting them to request answers to the questionnaire, asking randomly selected passersby on the street to respond, and telephoning randomly selected households to obtain answers to questions.

In recent years, by contrast, with smartphones in widespread use, surveys are increasingly conducted by sending question data over the Internet to the smartphones of the people expected to respond and then collecting their responses, again over the Internet. Surveys that use smartphones and the Internet in this way have the advantage that responses can be collected cheaply and quickly.

On the other hand, with this survey method, the smartphone screen is small and responses are often entered while the respondent is on the move. As a result, when the questions are numerous or wide-ranging, the credibility of the responses declines (that is, only careless responses are obtained). More specifically, the problems described in items (a) to (c) below exist in practice.

(a) A certain number of respondents give mutually contradictory answers to different questions. For example, a respondent answers that they have "visited" a certain place (such as a tourist spot) while also answering that they "do not know" that place (such respondents are hereinafter simply called "contradictory respondents").

(b) A certain number of respondents answer "I have never drunk any of them" even to a question listing, for example, dozens of beverages; that is, respondents who presumably found it too tedious to read all the options (such respondents are hereinafter simply called "low responders").

(c) In a survey about a product that, according to general statistics, has almost no teenage customers, the number of teenage respondents who answered "I use it" may be excessively high compared with respondents in other age groups; that is, the statistical distribution may cast doubt on whether those teenage respondents can be treated as legitimate respondents. Such respondents may be giving false answers, for example, in order to obtain the reward offered for responding.

If respondents such as those described in items (a) to (c) are treated in the same way as respondents who gave legitimate answers when the survey results are tallied, the statistical distribution of the results is adversely affected, and the planning intent and decision-making based on the survey may turn out to be wrong. The responses of such respondents therefore need to be excluded from the tally. Patent Document 1 below is an example of a document disclosing prior art developed to meet this need.

The technology described in Patent Document 1 measures the time each respondent takes to answer each question, obtains for each question a representative value of the respondents' response times, and obtains for each respondent a response time index for each question answered. It then divides the sum of the response time indexes by the number of questions answered to obtain an average, and tallies the survey after excluding a predetermined proportion of responses in ascending order of that average.
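The screening attributed to Patent Document 1 can be illustrated with a minimal sketch. The text above does not define the representative value or the response time index, so this sketch assumes the median as the representative value and the ratio of a respondent's time to that median as the index. The function name `screen_by_response_time` and the sample data are hypothetical.

```python
from statistics import median


def screen_by_response_time(times, drop_fraction):
    """Return the IDs of respondents to exclude.

    times: {respondent_id: {question_id: seconds}}; every respondent is
    assumed to have answered at least one question.
    """
    # Representative response time per question (assumption: the median).
    per_question = {}
    for answers in times.values():
        for question, seconds in answers.items():
            per_question.setdefault(question, []).append(seconds)
    representative = {q: median(ts) for q, ts in per_question.items()}

    # Response time index (assumption: time / representative value),
    # averaged over the questions each respondent actually answered.
    mean_index = {
        rid: sum(t / representative[q] for q, t in answers.items()) / len(answers)
        for rid, answers in times.items()
    }

    # Exclude the given fraction of respondents, smallest average first.
    n_drop = int(len(mean_index) * drop_fraction)
    ranked = sorted(mean_index, key=mean_index.get)
    return set(ranked[:n_drop])


times = {
    "r1": {"q1": 10.0, "q2": 12.0},
    "r2": {"q1": 1.0, "q2": 1.5},   # answers far faster than the others
    "r3": {"q1": 9.0, "q2": 11.0},
    "r4": {"q1": 11.0, "q2": 13.0},
}
excluded = screen_by_response_time(times, 0.25)
```

Because the screen depends only on timing, a respondent who paces their answers deliberately would not be excluded by it.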

Patent Document 1: Japanese Patent No. 4795496

However, respondents such as those described in items (a) to (c) are likely to be accustomed to surveys. They may, for example, deliberately pace their answers so that the response time described in Patent Document 1 appears appropriate, or vary their answer patterns so that they do not look like inappropriate respondents. As a result, such respondents can appear to be legitimate respondents.

Furthermore, the survey operator that obtains the response data itself (for example, the research company that conducts the survey) is able to exclude response data obtained from respondents such as those described in items (a) to (c). However, an analyst who analyzes response data received from the survey operator (for example, a survey analysis company) has difficulty separating response data resulting from the inappropriate answering behavior described above from other legitimate (that is, highly credible) response data. The analyst therefore has no choice but to analyze the entire body of response data, including the data resulting from that behavior; in other words, to perform an inaccurate analysis.

Moreover, consider using AI (artificial intelligence) to increase the number of pieces of response data obtained from a survey (that is, to augment the response data), as in the data generation method by the present inventors described in JP 2021-179865 A. If the original response data includes response data obtained from respondents such as those described in items (a) to (c), the AI augments the response data on the basis of data that includes them, and the quality of the augmented response data declines as a result.

The present invention has been made in view of the above problems. One example of its object is to provide a survey response data processing device, a survey response data processing method, and a program for the survey response data processing device that can effectively exclude low-quality responses that should be excluded when survey responses are tallied and when those responses are subsequently analyzed.

To solve the above problem, the invention described in claim 1 comprises: response data acquisition means, such as a processing unit, that acquires a plurality of pieces of response data, each including content data indicating the content of a response to a survey conducted over a network in a format that asks for answers to questions, and identification data for identifying that response; extraction means, such as a contradictory respondent cleaning unit, that extracts, from the acquired response data, low-quality response data corresponding to low-quality responses, that is, responses that meet a preset low-quality criterion; deletion means, such as a contradictory respondent cleaning unit, that deletes the low-quality response data from the acquired response data based on the identification data indicating the extracted low-quality response data; and generation means, such as a generation unit, that newly generates, based on the response data remaining after the low-quality response data has been deleted, complementary response data that replaces the deleted low-quality response data, in a number corresponding to the number of pieces of deleted low-quality response data. The low-quality criterion is any one or all of: (i) a first low-quality criterion, under which the responses to a plurality of the questions are treated as low-quality responses when they contradict one another; (ii) a second low-quality criterion, under which a response to a selection-type question is treated as a low-quality response when the number of options selected is smaller than a preset number; and (iii) a third low-quality criterion, under which a response is treated as a low-quality response when it is given by a respondent belonging to a respondent group different from the respondent group from which the response is expected.
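The acquire, extract, delete, and generate sequence of claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the record layout (`Response`), the field names, and the function `clean_and_complement` are hypothetical, only the first and second criteria are shown, and plain resampling of the retained responses stands in for whatever generation technique the generation means actually uses.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Response:
    response_id: str   # identification data
    knows: bool        # content data: "I know this drink"
    has_drunk: bool    # content data: "I have drunk this drink"
    n_selected: int    # options ticked in a selection-type question


def is_contradictory(r):
    # First criterion: claims to have drunk a drink they do not know.
    return r.has_drunk and not r.knows


def is_low_response(r, min_selected):
    # Second criterion: fewer selections than the preset number.
    return r.n_selected < min_selected


def clean_and_complement(responses, min_selected=1, seed=0):
    # Extraction: collect the IDs of low-quality responses.
    low_ids = {
        r.response_id
        for r in responses
        if is_contradictory(r) or is_low_response(r, min_selected)
    }
    # Deletion: drop them by identification data.
    kept = [r for r in responses if r.response_id not in low_ids]
    # Generation (assumption: resampling): one complement per deletion.
    rng = random.Random(seed)
    sources = rng.choices(kept, k=len(low_ids)) if kept else []
    complements = [
        replace(src, response_id=f"gen-{i}") for i, src in enumerate(sources)
    ]
    return kept, complements, low_ids


responses = [
    Response("r1", True, True, 3),
    Response("r2", False, True, 1),   # contradictory
    Response("r3", True, False, 0),   # low response
    Response("r4", True, False, 2),
]
kept, complements, low_ids = clean_and_complement(responses)
```

In this sketch the number of complements equals the number of deletions, so the combined data set keeps its original size.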

To solve the above problem, the invention described in claim 6 is a response data processing method executed in a response data processing device comprising response data acquisition means, extraction means, deletion means, and generation means. The method includes: a response data acquisition step in which the response data acquisition means acquires a plurality of pieces of response data, each including content data indicating the content of a response to a survey conducted over a network in a format that asks for answers to questions, and identification data for identifying that response; an extraction step in which the extraction means extracts, from the acquired response data, low-quality response data corresponding to low-quality responses, that is, responses that meet a preset low-quality criterion; a deletion step in which the deletion means deletes the low-quality response data from the acquired response data based on the identification data indicating the extracted low-quality response data; and a generation step in which the generation means newly generates, based on the response data remaining after the low-quality response data has been deleted, complementary response data that replaces the deleted low-quality response data, in a number corresponding to the number of pieces of deleted low-quality response data. The low-quality criterion is any one or all of: (i) a first low-quality criterion, under which the responses to a plurality of the questions are treated as low-quality responses when they contradict one another; (ii) a second low-quality criterion, under which a response to a selection-type question is treated as a low-quality response when the number of options selected is smaller than a preset number; and (iii) a third low-quality criterion, under which a response is treated as a low-quality response when it is given by a respondent belonging to a respondent group different from the respondent group from which the response is expected.

To solve the above problem, the invention described in claim 7 is a response data processing program that causes a computer included in a response data processing device to function as: response data acquisition means that acquires a plurality of pieces of response data, each including content data indicating the content of a response to a survey conducted over a network in a format that asks for answers to questions, and identification data for identifying that response; extraction means that extracts, from the acquired response data, low-quality response data corresponding to low-quality responses, that is, responses that meet a preset low-quality criterion; deletion means that deletes the low-quality response data from the acquired response data based on the identification data indicating the extracted low-quality response data; and generation means that newly generates, based on the response data remaining after the low-quality response data has been deleted, complementary response data that replaces the deleted low-quality response data, in a number corresponding to the number of pieces of deleted low-quality response data. The low-quality criterion is any one or all of: (i) a first low-quality criterion, under which the responses to a plurality of the questions are treated as low-quality responses when they contradict one another; (ii) a second low-quality criterion, under which a response to a selection-type question is treated as a low-quality response when the number of options selected is smaller than a preset number; and (iii) a third low-quality criterion, under which a response is treated as a low-quality response when it is given by a respondent belonging to a respondent group different from the respondent group from which the response is expected.

According to the invention described in any one of claims 1, 6, and 7, a plurality of pieces of response data corresponding to a survey conducted over a network in a format that asks for answers to questions are acquired, and low-quality response data corresponding to low-quality responses is extracted from that response data and deleted. Any one or all of the first to third low-quality criteria are used as the low-quality criterion for extracting low-quality responses. Low-quality responses that should be excluded when the survey responses are tallied and when those responses are subsequently analyzed can therefore be excluded effectively.
In addition, complementary response data is newly generated, based on the response data remaining after the low-quality response data has been deleted, in a number corresponding to the number of pieces of deleted low-quality response data. Using the response data from which the low-quality response data has been deleted therefore yields response data that is free of the low-quality response data and still sufficient in number.

To solve the above problem, the invention described in claim 2 is the response data processing device described in claim 1, wherein the first low-quality criterion is that the responses to a plurality of the questions included in a single survey contradict one another, and the extraction means extracts, from the acquired response data, the low-quality response data corresponding to the low-quality responses that exhibit such a contradiction.

According to the invention described in claim 2, in addition to the effects of the invention described in claim 1, the first low-quality criterion is that the responses to a plurality of questions included in a single survey contradict one another, and the low-quality response data corresponding to the low-quality responses that exhibit such a contradiction is extracted from the response data. Low-quality response data can therefore be accurately extracted and deleted using an appropriate first low-quality criterion.

To solve the above problem, the invention described in claim 3 is the response data processing device described in claim 2, wherein: the first low-quality criterion is either first low-quality criterion (a), which is the first low-quality criterion described in claim 2, or first low-quality criterion (b), under which a question included in a preliminary survey corresponding to the survey and a question included in the main survey corresponding to the survey have the same purport and the responses to those questions differ from each other; the extraction means extracts, from the acquired response information, the low-quality response information corresponding to the low-quality responses that meet either criterion; each first low-quality criterion is assigned a score such that first low-quality criterion (b) indicates lower response quality than first low-quality criterion (a); the device further comprises determination means that determines the number of pieces of low-quality response data to be deleted based on the scored first low-quality criteria and a preset deletion-number criterion; and the deletion means deletes the determined number of pieces of low-quality response data from the acquired response data.

According to the invention described in claim 3, in addition to the effects of the invention described in claim 2, first low-quality criterion (a) and first low-quality criterion (b) are each assigned a score, and the number of pieces of low-quality response data determined from the scored first low-quality criteria and a preset deletion-number criterion is deleted from the response data. Low-quality response data can therefore be extracted and deleted using a more objective low-quality criterion.
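The scoring in claim 3 can be sketched as follows. The claim only requires that criterion (b) count as lower quality than criterion (a). The concrete scores (1 and 2), the reading of the deletion-number criterion as a maximum share of all responses, and the names `SCORE_A`, `SCORE_B`, and `select_for_deletion` are assumptions made for illustration.

```python
SCORE_A = 1  # contradiction within a single survey (assumed score)
SCORE_B = 2  # preliminary vs. main survey mismatch (assumed score, worse)


def select_for_deletion(flags, total_responses, max_delete_ratio):
    """Pick the response IDs to delete, worst score first.

    flags: {response_id: set of criteria met, drawn from {"a", "b"}}
    max_delete_ratio: assumed form of the deletion-number criterion.
    """
    scores = {
        rid: SCORE_A * ("a" in hit) + SCORE_B * ("b" in hit)
        for rid, hit in flags.items()
    }
    candidates = [rid for rid, score in scores.items() if score > 0]
    # Highest score first; ties are broken by ID so the result is stable.
    candidates.sort(key=lambda rid: (-scores[rid], rid))
    limit = int(total_responses * max_delete_ratio)
    return candidates[:limit]


flags = {
    "r1": {"a"},
    "r2": {"b"},
    "r3": {"a", "b"},
    "r4": set(),
}
to_delete = select_for_deletion(flags, total_responses=10, max_delete_ratio=0.2)
```

With a cap of two deletions out of ten responses, the response that meets both criteria is deleted first, followed by the one that meets only criterion (b).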

To solve the above problem, the invention described in claim 4 is the response data processing device described in any one of claims 1 to 3, wherein the third low-quality criterion is that the difference between the ratio, to the total number of respondents, of the number of respondents belonging to the respondent group from which the response is expected and the ratio, to the total number of respondents, of the number of respondents who belong to the different respondent group and gave the response is equal to or greater than a preset threshold, and the extraction means extracts, from the acquired response data, the low-quality response data corresponding to the low-quality responses given by respondents in the different respondent group that meet the third low-quality criterion.

According to the invention described in claim 4, in addition to the effects of the invention described in any one of claims 1 to 3, under the third low-quality criterion, when the difference between the proportion of respondents belonging to the respondent group from which the response is expected and the proportion of respondents who belong to a different respondent group and gave the response is equal to or greater than a preset threshold, the low-quality response data corresponding to the low-quality responses given by respondents in that different respondent group is extracted from the response data. Low-quality response data can therefore be accurately extracted and deleted using an appropriate third low-quality criterion.

To solve the above problem, the invention described in claim 5 is the response data processing device described in claim 1, wherein the generation means newly generates the complementary response data by referring to the distribution of the acquired response data as a whole.

According to the invention described in claim 5, in addition to the effects of the invention described in claim 1, the complementary response data is newly generated by referring to the distribution of the original response data as a whole. Response data that reflects the distribution of the original response data and is still sufficient in number can therefore be obtained.

As described above, according to the present invention, a plurality of pieces of response data corresponding to a survey conducted over a network in a format that asks for answers to questions are acquired, and low-quality response data corresponding to low-quality responses is extracted from that response data and deleted. Any one or all of the first to third low-quality criteria are used as the low-quality criterion for extracting low-quality responses.

Low-quality responses that should be excluded when the survey responses are tallied and when those responses are subsequently analyzed can therefore be excluded effectively.
In addition, complementary response data is newly generated, based on the response data remaining after the low-quality response data has been deleted, in a number corresponding to the number of pieces of deleted low-quality response data. Using the response data from which the low-quality response data has been deleted therefore yields response data that is free of the low-quality response data and still sufficient in number.

FIG. 1 is a diagram showing an example response by a contradictory respondent in the first embodiment.
FIG. 2 is a diagram showing an example response by a low responder in the first embodiment.
FIG. 3 is a diagram showing an example of a low-quality response that is doubtful in terms of statistical distribution in the first embodiment.
FIG. 4 is a block diagram showing the schematic configuration of the survey response processing device of the first embodiment.
FIG. 5 is a flowchart showing the survey response processing of the first embodiment.
FIG. 6 is a block diagram showing the schematic configuration of the survey response processing device of the second embodiment.

Next, embodiments of the present invention will be described with reference to the drawings. In each embodiment described below, the present invention is applied to a survey response processing device that improves the quality of a question-format survey conducted among users of terminal devices such as smartphones by deleting, from the responses obtained, the responses given by respondents such as those described in items (a) to (c) above. In the following description, responses given by such respondents are collectively called "low-quality responses."

(I) First embodiment
First, a first embodiment of the present invention will be described with reference to FIGS. 1 to 5.

(A) Low-quality responses
Before the survey response processing device of the first embodiment is described, the low-quality responses that are the target of its survey response processing will be explained using specific examples. FIG. 1 shows an example response by a contradictory respondent in the first embodiment, FIG. 2 shows an example response by a low responder in the first embodiment, and FIG. 3 shows an example of a low-quality response that is doubtful in terms of statistical distribution in the first embodiment.

As described above, the low-quality responses targeted by the survey response processing of the first embodiment include responses by the contradictory respondents described in item (a), responses by the low responders described in item (b), and the statistically doubtful responses described in item (c). In the following description, responses by the contradictory respondents of item (a) are called "contradictory responses," responses by the low responders of item (b) are called "low-responder responses," and the statistically doubtful responses of item (c) are called "distribution-doubt responses."

A contradictory response is a "mutually contradictory answer to different questions" given by a contradictory respondent. An example, shown in FIG. 1, is response DX1 in survey result D1 of a survey on whether respondents recognize certain drinks (beverages) and whether they have actually drunk them: the respondent selected "×" for the item "I know it" yet selected "○" for the item "I have drunk it." This respondent is in effect saying that they drank the drink even though they do not know it. The two answers contradict each other, so response DX1 in FIG. 1 is a typical contradictory response.

Next, a low-responder response is a careless (corner-cutting) answer by a low responder to the effect that none of a large number of options applies. An example, shown in FIG. 2, is response DX2 in survey result D2 of a survey asking whether respondents have drunk each of a large number of beverages: the respondent selected "None of these apply" (that is, they have never drunk any of the beverages). With as many beverages offered as options as in FIG. 2, an answer of "None of these apply" is normally hard to believe. The respondent who gave response DX2 in FIG. 2 is therefore presumed to be a respondent (low responder) who, for example, found it too tedious to read all the options.

最後に、上記分布疑義回答は、上記統計分布上の虚偽回答をしていると推測される回答者による回答であり、その例としては、例えば図３に示すように、ある商品又はサービスの顧客データ全体の統計分布の中では十代の男性の割合は極めて少ない（図３において「１．２」と示されている）にも拘わらず、その商品又はサービスについてのあるアンケート調査の調査結果Ｄ３では十代の男性が最も多く回答した回答ＤＸ３（図３において「１６．５」と示されている）が挙げられる。このような分布疑義回答をした回答者は、正当な回答者として扱われるには元の統計分布上の疑義があると考えられ、例えば上述したように、所定の目的のために虚偽の回答をしている可能性があると考えられる回答者であることになる。 Finally, a distribution-questionable answer is an answer given by a respondent who is presumed, in light of the statistical distribution, to be answering falsely. An example, as shown in Figure 3, is answer DX3 in survey result D3 of a questionnaire survey about a certain product or service, in which teenage males were the largest group of respondents (shown as "16.5" in Figure 3) even though the proportion of teenage males in the statistical distribution of all customer data for that product or service is extremely low (shown as "1.2" in Figure 3). Judged against the original statistical distribution, it is doubtful whether respondents who give such distribution-questionable answers can be treated as legitimate respondents; for example, as described above, they may be giving false answers for some particular purpose.

（Ｂ）第１実施形態のアンケート回答処理装置について (B) Regarding the questionnaire response processing device of the first embodiment
次に、上記端末装置を使用する者を対象として行われた質問形式のアンケート調査の結果である回答から上記低品質回答を削除する第１実施形態のアンケート回答処理装置について、図４及び図５を用いて具体的に説明する。なお、図４は第１実施形態のアンケート回答処理装置の概要構成を示すブロック図であり、図５は第１実施形態のアンケート回答処理を示すフローチャートである。このとき図４においては、「データベース」を適宜「ＤＢ」と表している。また図５に示すフローチャートでは、時系列的な処理の流れを実線矢印で、各処理におけるデータの流れを破線矢印で、それぞれ示している。 Next, a questionnaire response processing device according to a first embodiment, which deletes low-quality responses from responses resulting from a question-and-answer survey conducted targeting users of the terminal device, will be described in detail with reference to Figures 4 and 5. Note that Figure 4 is a block diagram showing the general configuration of the questionnaire response processing device according to the first embodiment, and Figure 5 is a flowchart showing the questionnaire response processing according to the first embodiment. In Figure 4, "database" is appropriately represented as "DB." In the flowchart shown in Figure 5, the chronological flow of processing is indicated by solid arrows, and the data flow in each process is indicated by dashed arrows.

図４に示すように、第１実施形態のアンケート回答処理装置Ｓは、具体的には例えばサーバコンピュータやパーソナルコンピュータ等により実現されるものであり、ＣＰＵ等からなる処理部１と、ＨＤＤ（Hard Disk Drive）又はＳＳＤ（Solid State Drive）等からなる記録部２と、キーボード及びマウス等からなる操作部３と、液晶ディスプレイ等からなるディスプレイ４と、により構成されている。 As shown in FIG. 4, the questionnaire response processing device S of the first embodiment is specifically realized by, for example, a server computer or a personal computer, and is composed of a processing unit 1 consisting of a CPU or the like, a recording unit 2 consisting of an HDD (Hard Disk Drive) or SSD (Solid State Drive) or the like, an operation unit 3 consisting of a keyboard, mouse, etc., and a display 4 consisting of a liquid crystal display, etc.

また処理部１は、矛盾回答者クリーニング部１０と、低反応者クリーニング部１１と、統計分布クリーニング部１２と、生成部１３と、により構成されている。 The processing unit 1 is also composed of a contradictory respondent cleaning unit 10, a low responder cleaning unit 11, a statistical distribution cleaning unit 12, and a generation unit 13.

このとき、矛盾回答者クリーニング部１０、低反応者クリーニング部１１、統計分布クリーニング部１２及び生成部１３は、処理部１を構成するＣＰＵ等を含むハードウェアロジック回路により実現されてもよいし、後述する（図５参照）第１実施形態のアンケート回答処理に相当するプログラムを上記ＣＰＵ等が読み込んで実行することにより、ソフトウェア的に実現されてもよい。このとき上記プログラムは、記録部２に予め記録されているものを上記ＣＰＵ等が読み込んでもよいし、図示しない外部のサーバ装置に記録されている当該プログラムをインターネット等のネットワークを介して上記ＣＰＵ等が取得して用いるように構成してもよい。 In this case, the contradictory respondent cleaning unit 10, low responder cleaning unit 11, statistical distribution cleaning unit 12, and generation unit 13 may be realized by a hardware logic circuit including a CPU or the like constituting the processing unit 1, or may be realized in software by the CPU or the like reading and executing a program corresponding to the questionnaire response processing of the first embodiment described below (see FIG. 5). In this case, the program may be pre-recorded in the recording unit 2 and read by the CPU or the like, or the program may be recorded on an external server device (not shown) and retrieved and used by the CPU or the like via a network such as the Internet.

そして、処理部１が本発明の「回答データ取得手段」の一例に相当し、矛盾回答者クリーニング部１０、低反応者クリーニング部１１及び統計分布クリーニング部１２が本発明の「抽出手段」の一例及び「削除手段」の一例にそれぞれ相当し、矛盾回答者クリーニング部１０が本発明の「決定手段」の一例に相当する。また、低反応者クリーニング部１１が本発明の「分類手段」の一例及び「第２決定手段」の一例にそれぞれ相当し、生成部１３が本発明の「生成手段」の一例に相当する。 The processing unit 1 corresponds to an example of the "response data acquisition means" of the present invention, the inconsistent respondent cleaning unit 10, the low responder cleaning unit 11, and the statistical distribution cleaning unit 12 correspond to an example of the "extraction means" and an example of the "deletion means" of the present invention, respectively, and the inconsistent respondent cleaning unit 10 corresponds to an example of the "determination means" of the present invention. Furthermore, the low responder cleaning unit 11 corresponds to an example of the "classification means" and an example of the "second determination means" of the present invention, respectively, and the generation unit 13 corresponds to an example of the "generation means" of the present invention.

以上の構成において、アンケート回答処理装置Ｓの処理部１は、調査結果データベース１００及び統計データベース１０２にそれぞれ接続可能とされている。このとき、調査結果データベース１００は、上記端末装置の使用者を対象として行われた質問形式のアンケート調査の結果（回答）としての回答データであって、当該回答の内容を示す内容データと、当該回答又はその回答者を識別するための識別データと、を含む回答データを格納するデータベースである。そして、調査結果データベース１００に格納されている回答データには、上記低品質回答の回答データ（図１乃至図３参照）と、当該低品質回答ではない（すなわち、通常の正当な）回答の回答データと、が含まれている。これに対し、統計データベース１０２は、例えば公的な機関や地方自治体等が行った種々の且つ一般的な統計データを格納するデータベースである。これらの調査結果データベース１００及び統計データベース１０２それぞれのデータは、記録部２に予め記録されているものであってもよいし、第１実施形態のアンケート回答処理が実行される度に図示しない外部のサーバ装置等からインターネット等のネットワークを介して取得されるものであってもよい。 In the above configuration, the processing unit 1 of the questionnaire response processing device S is connectable to a survey results database 100 and a statistical database 102. The survey results database 100 is a database that stores response data, which is the results (responses) of a question-and-answer survey conducted for users of the terminal device. The response data includes content data indicating the content of the response and identification data for identifying the response or the respondent. The response data stored in the survey results database 100 includes response data for the low-quality responses (see Figures 1 to 3) and response data for responses that are not low-quality responses (i.e., normal, legitimate responses). In contrast, the statistical database 102 is a database that stores various general statistical data collected by, for example, public institutions or local governments. The data for the survey results database 100 and the statistical database 102 may be pre-recorded in the recording unit 2, or may be obtained via a network such as the Internet from an external server device (not shown) each time the questionnaire response processing of the first embodiment is executed.

そして、アンケート回答処理装置Ｓでは、上記調査結果データベース１００及び上記統計データベース１０２を用いつつ、調査結果データベース１００を構成する回答データから低品質回答の回答データを削除する。これに加えてアンケート回答処理装置Ｓは、低品質回答の回答データを削除した後の調査結果データベース１００を構成する回答データに基づき、当該削除前の調査結果データベース１００を構成する回答データを参照しつつ、当該削除した回答データの総数に相当する数（例えば当該削除した回答データの数と同数）の補完回答データを生成する。なお、当該補完回答データの生成方法については、後ほど詳述する。 Then, the survey response processing device S uses the survey results database 100 and the statistical database 102 to delete the response data of low-quality responses from the response data that constitutes the survey results database 100. In addition, the survey response processing device S generates complementary response data in a number equivalent to the total number of deleted response data (for example, the same number as the number of deleted response data) based on the response data that constitutes the survey results database 100 after the response data of low-quality responses has been deleted, while referencing the response data that constitutes the survey results database 100 before the deletion. The method for generating this complementary response data will be described in detail later.

その後アンケート回答処理装置Ｓは、上記削除後の調査結果データベース１００に対して上記生成された補完回答データを加えて新たな高品質データベース１０３を生成する。このとき、新たに生成された高品質データベース１０３を構成する回答データの数は、低品質回答の回答データを削除する前の調査結果データベース１００を構成する回答データの数と同一となる。 Then, the survey response processing device S adds the generated complementary response data to the post-deletion survey results database 100 to generate a new high-quality database 103. At this time, the number of response data constituting the newly generated high-quality database 103 will be the same as the number of response data constituting the survey results database 100 before the response data of the low-quality responses was deleted.
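The count-preserving flow described above (delete the low-quality response data, then add the same number of complementary response data) can be sketched as follows. This is a minimal sketch under stated assumptions: the record layout is hypothetical, and the generator simply copies randomly chosen cleaned records under new IDs as a stand-in. It is not the data generation method that the specification refers to, which is described later.

```python
import random
from typing import Dict, List, Set

def build_high_quality(original: List[Dict], low_quality_ids: Set[str],
                       seed: int = 0) -> List[Dict]:
    """Delete low-quality records, then top the data back up to its original size."""
    cleaned = [r for r in original if r["id"] not in low_quality_ids]
    n_missing = len(original) - len(cleaned)
    if n_missing and not cleaned:
        raise ValueError("nothing left to base complementary data on")
    rng = random.Random(seed)
    # Stand-in generator (assumption): reuse the content of a random cleaned
    # record and give it a fresh, hypothetical ID.
    complementary = [
        {**rng.choice(cleaned), "id": f"gen-{i}"} for i in range(n_missing)
    ]
    return cleaned + complementary

original = [{"id": f"r{i}", "answer": i % 3} for i in range(10)]
high_quality = build_high_quality(original, {"r1", "r4"})
print(len(original), len(high_quality))
```

Whatever generator is used, the property the text relies on is that the high-quality database ends up with the same number of response data as the survey results database had before deletion.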

より具体的に、先ずアンケート回答処理装置Ｓの処理部１の矛盾回答者クリーニング部１０は、予め設定された矛盾回答の削除方法を用いて、調査結果データベース１００を構成する回答データから矛盾回答の回答データを削除する。この矛盾回答の削除方法については、後ほど詳述する。 More specifically, first, the inconsistent answerer cleaning unit 10 of the processing unit 1 of the questionnaire response processing device S uses a preset method for deleting inconsistent answers to delete the answer data of inconsistent answers from the answer data that make up the survey results database 100. This method for deleting inconsistent answers will be described in detail later.

次に処理部１の低反応者クリーニング部１１は、予め設定された低反応回答の削除方法を用いて、調査結果データベース１００を構成する回答データから低反応回答の回答データを削除する。この低反応回答の削除方法についても、後ほど詳述する。低反応者クリーニング部１１による低反応回答の回答データの削除は、上記矛盾回答者クリーニング部１０による矛盾回答の回答データの削除と並行して実行される。 Next, the low responder cleaning unit 11 of the processing unit 1 uses a preset method for deleting low-response answers to delete the response data of low-response answers from the response data that makes up the survey results database 100. This method for deleting low-response answers will also be described in detail later. The deletion of the response data of low-response answers by the low responder cleaning unit 11 is carried out in parallel with the deletion of the response data of contradictory answers by the contradictory respondent cleaning unit 10.

最後に処理部１の統計分布クリーニング部１２は、予め設定された分布疑義回答の削除方法を用いて、統計データベース１０２に格納されている上記統計データに基づき、調査結果データベース１００を構成する回答データから分布疑義回答の回答データを削除する。この分布疑義回答の削除方法についても、後ほど詳述する。統計分布クリーニング部１２による分布疑義回答の回答データの削除は、上記矛盾回答者クリーニング部１０による矛盾回答の回答データの削除及び上記低反応者クリーニング部１１による低反応回答の回答データの削除それぞれと並行して実行される。 Finally, the statistical distribution cleaning unit 12 of the processing unit 1 uses a preset method for deleting distribution-questionable answers to delete the response data of distribution-questionable answers from the response data constituting the survey results database 100, based on the statistical data stored in the statistical database 102. This method for deleting distribution-questionable answers will also be described in detail later. The deletion of the response data of distribution-questionable answers by the statistical distribution cleaning unit 12 is carried out in parallel with both the deletion of the response data of contradictory answers by the contradictory respondent cleaning unit 10 and the deletion of the response data of low-response answers by the low responder cleaning unit 11.

そして、調査結果データベース１００を構成する回答データから、矛盾回答、低反応回答及び分布疑義回答それぞれの回答データが削除された後の回答データは、クリーニング済みデータベース１０１として記録部２に一時的に記録される。 Then, the response data that remains after the response data of the contradictory answers, low-response answers, and distribution-questionable answers has been deleted from the response data constituting the survey results database 100 is temporarily recorded in the recording unit 2 as a cleaned database 101.

次に、処理部１の生成部１３は、矛盾回答者クリーニング部１０、低反応者クリーニング部１１及び統計分布クリーニング部１２によりそれぞれ削除された回答データの数の総数と同数の上記補完回答データを、低品質回答の回答データの削除後の調査結果データベース１００を構成する回答データに基づき、当該削除前の調査結果データベース１００を構成する回答データを参照しつつ生成する。その後生成部１３は、低品質回答の回答データを削除した後の調査結果データベース１００に対して上記生成された補完回答データを加える。これにより生成部１３は、上記高品質データベース１０３を構成すべき回答データであって、低品質回答の回答データを削除する前の調査結果データベース１００を構成する回答データの数と同数の高品質の回答データを生成し、これらを上記高品質データベース１０３に格納する。 Next, the generation unit 13 of the processing unit 1 generates the above-mentioned complementary answer data in the same number as the total number of answer data deleted by the contradictory answerer cleaning unit 10, the low responder cleaning unit 11, and the statistical distribution cleaning unit 12, based on the answer data that constitutes the survey results database 100 after the answer data of the low-quality answers has been deleted, while referencing the answer data that constitutes the survey results database 100 before the deletion. The generation unit 13 then adds the generated complementary answer data to the survey results database 100 after the answer data of the low-quality answers has been deleted. In this way, the generation unit 13 generates high-quality answer data, which is answer data that should constitute the above-mentioned high-quality database 103 and is the same number as the number of answer data that constitutes the survey results database 100 before the answer data of the low-quality answers has been deleted, and stores this in the above-mentioned high-quality database 103.

なお、上述してきた各機能を実行するに当たって必要な操作は操作部３において実行され、当該操作に対応する操作信号が処理部１に出力される。これにより処理部１は、当該操作信号に基づき、上述してきた一連の機能を実行する。また、当該機能の実行に当たって必要な情報は、例えばディスプレイ４に表示され、アンケート回答処理装置Ｓの操作者等に提示される。 The operations required to execute each of the functions described above are performed by the operation unit 3, and operation signals corresponding to these operations are output to the processing unit 1. Based on these operation signals, the processing unit 1 then executes the series of functions described above. Furthermore, information required to execute these functions is displayed, for example, on the display 4 and presented to the operator of the questionnaire response processing device S.

次に、アンケート回答処理装置Ｓにおいて実行される第１実施形態のアンケート回答処理について、具体的に図４及び図５を用いて説明する。 Next, the survey response processing of the first embodiment executed by the survey response processing device S will be explained in detail using Figures 4 and 5.

上述した機能を有するアンケート回答処理装置Ｓにより実行される第１実施形態のアンケート回答処理は、例えばアンケート回答処理装置Ｓの図示しない電源スイッチがオンとされたタイミングから開始される。なお、第１実施形態のアンケート回答処理の対象となる回答データは、予め調査結果データベース１００に格納されているものとする。 The survey response processing of the first embodiment, which is executed by the survey response processing device S having the above-described functions, begins, for example, when a power switch (not shown) of the survey response processing device S is turned on. It is assumed that the response data that is the subject of the survey response processing of the first embodiment is stored in advance in the survey results database 100.

当該アンケート回答処理が開始されると、先ず、アンケート回答処理装置Ｓの処理部１は、当該アンケート回答処理の対象となる回答データを調査結果データベース１００から取得する（ステップＳ１）。その後、調査結果データベース１００から取得された回答データは、矛盾回答者クリーニング部１０、低反応者クリーニング部１１及び統計分布クリーニング部１２に対して、それぞれに並行して同様に出力される。その後、調査結果データベース１００から取得された回答データに対して、矛盾回答者クリーニング部１０は上述した矛盾回答の削除処理（矛盾回答者クリーニング処理）を実行し（ステップＳ２）、低反応者クリーニング部１１は上記低反応回答の削除処理（低反応者クリーニング処理）を実行し（ステップＳ３）、統計分布クリーニング部１２は上記分布疑義回答の削除処理（統計分布クリーニング処理）を実行する（ステップＳ４）。そして、上記矛盾回答、上記低反応回答及び上記分布疑義回答がそれぞれ削除された調査結果データベース１００の回答データは、それぞれ、クリーニング済みデータベース１０１に一時的に格納される。このとき、元の調査結果データベース１００に格納されている回答データの数（換言すれば回答者の数。各データベースにおける回答データの数について、以下同様。）が１，０００個であり、上記矛盾回答者クリーニング処理（ステップＳ２）により５０個の回答が削除され、上記低反応者クリーニング処理（ステップＳ３）により１００個の回答が削除され、上記統計分布クリーニング処理（ステップＳ４）により３０個の回答が削除された場合に、クリーニング済みデータベース１０１に格納される回答データの数は８２０個であることになる。 When the survey response processing begins, the processing unit 1 of the survey response processing device S first acquires the response data to be processed from the survey result database 100 (step S1). The response data acquired from the survey result database 100 is then output in parallel to the inconsistent respondent cleaning unit 10, the low-responder cleaning unit 11, and the statistical distribution cleaning unit 12. The inconsistent respondent cleaning unit 10 then executes the inconsistent response deletion process (inconsistent respondent cleaning process) described above for the response data acquired from the survey result database 100 (step S2), the low-responder cleaning unit 11 executes the low-responder response deletion process (low-responder cleaning process) (step S3), and the statistical distribution cleaning unit 12 executes the distribution-questionable response deletion process (statistical distribution cleaning process) (step S4). The response data from the survey result database 100 from which the inconsistent responses, low-responder responses, and distribution-questionable responses have been deleted is temporarily stored in the cleaned database 101. 
In this case, if the number of response data items (in other words, the number of respondents; the same applies below to the number of response data items in each database) stored in the original survey results database 100 is 1,000, and 50 responses are deleted by the contradictory respondent cleaning process (step S2), 100 responses are deleted by the low responder cleaning process (step S3), and 30 responses are deleted by the statistical distribution cleaning process (step S4), the number of response data items stored in the cleaned database 101 will be 820 (1,000 - 50 - 100 - 30 = 820).
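The parallel execution of steps S2 to S4 and the resulting count can be illustrated with the following sketch. The three `flag_*` functions are hypothetical stand-ins that simply return fixed slices of IDs sized to match the numerical example; the actual determination procedures are described in the subsections that follow.

```python
from concurrent.futures import ThreadPoolExecutor

def flag_contradictory(ids):   # stand-in for step S2 (50 deletions)
    return set(ids[:50])

def flag_low_response(ids):    # stand-in for step S3 (100 deletions)
    return set(ids[50:150])

def flag_distribution(ids):    # stand-in for step S4 (30 deletions)
    return set(ids[150:180])

ids = [f"r{i}" for i in range(1000)]

# Each cleaning step receives the same acquired data and runs in parallel.
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [
        pool.submit(step, ids)
        for step in (flag_contradictory, flag_low_response, flag_distribution)
    ]
    removed = set().union(*(f.result() for f in futures))

cleaned = [rid for rid in ids if rid not in removed]
print(len(removed), len(cleaned))
```

In this sketch the three flagged sets are disjoint, which is what yields 820. Because the deletions are merged as a set union, a respondent flagged by more than one step would be removed only once.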

（ａ）第１実施形態の矛盾回答者クリーニング処理について (a) Regarding the contradictory respondent cleaning process of the first embodiment
ここで、上記ステップＳ２の矛盾回答者クリーニング処理について、より具体的に説明する。当該矛盾回答者クリーニング処理においては、上記ステップＳ１で調査結果データベース１００から取得した回答データにより示される回答の中から矛盾回答を判別するための判別手順（以下、当該判別手順を「矛盾回答判別手順」と称する）が予め設定されている。当該矛盾回答判別手順は、それを実現するためのプログラム等であって例えば記録部２に予め記録されているプログラム等を矛盾回答者クリーニング部１０が読み出して実行することにより実現される。なお、第１実施形態の矛盾回答判別手順は、矛盾回答者クリーニング部１０として固定化されたものではなく、例えば第１実施形態の矛盾回答者として排除されるべき回答者の属性等に基づいて変更可能とされている。また、以下に例示する第１実施形態の矛盾回答判別手順の複数の例については、それらを単独で用いてもよいし、二以上の当該判別手順を組み合わせて用いてもよい。 The contradictory respondent cleaning process of step S2 will now be described in more detail. In the contradictory respondent cleaning process, a determination procedure (hereinafter referred to as the "contradictory answer determination procedure") is set in advance for identifying contradictory answers among the answers indicated by the response data acquired from the survey results database 100 in step S1. The contradictory answer determination procedure is implemented by the contradictory respondent cleaning unit 10 reading and executing a program or the like for implementing the procedure, such as a program pre-recorded in the recording unit 2. The contradictory answer determination procedure of the first embodiment is not fixed in the contradictory respondent cleaning unit 10 but can be changed based on, for example, the attributes of the respondents to be excluded as contradictory respondents in the first embodiment. The examples of the contradictory answer determination procedure of the first embodiment given below may be used individually, or two or more of them may be used in combination.

そして、このような矛盾回答判別手順の第１例としては、図１を用いて説明した回答ＤＸ１のように、一のアンケート調査に含まれている（又は一のアンケート調査に対応する）複数の質問それぞれへの回答が相互に矛盾している場合に、当該各回答を上記矛盾回答と判別する判別手順が挙げられる。 A first example of such a contradictory answer determination procedure is a determination procedure in which, when answers to multiple questions included in a questionnaire survey (or corresponding to a questionnaire survey) are mutually contradictory, such as answer DX1 described using Figure 1, each answer is determined to be a contradictory answer.

次に、矛盾回答判別手順の第２例としては、上記矛盾回答判別手順の第１例のより具体的な例として、調査結果データベース１００に記録されている回答データが得られたアンケート調査に対応する予備調査（いわゆるスクリーニング調査）に含まれている質問と、当該アンケート調査に対応する本調査に含まれている質問と、が同旨であるにも拘わらず、当該各質問それぞれへの回答が相互に異なっている（すなわち矛盾している）場合に、上記予備調査及び上記本調査それぞれにおける当該回答を上記矛盾回答と判別する判別手順が挙げられる。 Next, a second example of a procedure for determining contradictory answers is a more specific example of the first example of the procedure for determining contradictory answers, in which, even though questions included in a preliminary survey (so-called screening survey) corresponding to the questionnaire survey from which the response data recorded in the survey results database 100 was obtained and questions included in a main survey corresponding to that questionnaire survey are of the same nature, the answers to those questions differ (i.e., are contradictory) from each other, and the answers in the preliminary survey and the main survey are determined to be contradictory answers.

次に、矛盾回答判別手順の第３例として、矛盾回答における矛盾の程度を予め点数化（スコア化。以下同様。）しておくことが挙げられる。より具体的に、上記矛盾回答判別手順の第１例により矛盾回答と判別された回答をした回答者に対応する点数（スコア。以下同様。）を例えば「－１００点」としてその旨を当該回答者に関連付けて記録し、上記矛盾回答判別手順の第２例により矛盾回答と判別された回答をした回答者の点数を「－７０点」としてその旨を当該回答者に関連付けて記録する。そして、当該回答者ごとに点数を加算し、加算後の点数（負の数）が相対的に大きい回答者ほど矛盾回答者としての順位が低い（換言すれば、正常な回答者に近いことになる。以下同様。）回答者としてランキングする手順が挙げられる。 Next, as a third example of a procedure for determining inconsistent answers, the degree of inconsistency in an inconsistent answer may be assigned a score (score; the same applies below) in advance. More specifically, a score (score; the same applies below) corresponding to a respondent whose answer was determined to be inconsistent by the first example of the procedure for determining inconsistent answers above may be set to, for example, "-100 points," and this information may be recorded in association with the respondent. A score for a respondent whose answer was determined to be inconsistent by the second example of the procedure for determining inconsistent answers above may be set to "-70 points," and this information may be recorded in association with the respondent. The scores are then added up for each respondent, and respondents with relatively higher scores (negative numbers) after the addition are ranked lower as inconsistent respondents (in other words, closer to normal respondents; the same applies below).
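The scoring and ranking of the third example can be sketched as follows, using the example values of -100 points and -70 points given above. The rule labels `example1` and `example2` and the list-of-pairs input are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Penalties taken from the text: first example -100 points, second example -70 points.
PENALTY = {"example1": -100, "example2": -70}

def score_respondents(hits: List[Tuple[str, str]]) -> Dict[str, int]:
    """Add up the penalty of every determination procedure that flagged a respondent."""
    totals: Dict[str, int] = defaultdict(int)
    for respondent_id, rule in hits:
        totals[respondent_id] += PENALTY[rule]
    return dict(totals)

def rank_most_suspect_first(totals: Dict[str, int]) -> List[str]:
    """Most negative total first; totals nearer zero rank lower as contradictory respondents."""
    return sorted(totals, key=lambda rid: (totals[rid], rid))

hits = [("r1", "example1"), ("r2", "example2"), ("r1", "example2")]
totals = score_respondents(hits)
ranking = rank_most_suspect_first(totals)
print(totals, ranking)
```

Here r1 was flagged by both procedures and totals -170, so it ranks ahead of r2 at -70, which is treated as closer to a normal respondent.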

最後に、矛盾回答判別手順の第４例として、上記矛盾回答判別手順の第１例乃至第３例で矛盾回答と判別された回答の回答データそれぞれの実際の削除方法の例としては、以下の方法が挙げられる。
（あ）矛盾回答と判別された回答の回答データを調査結果データベース１００から全て削除すると共に、当該矛盾回答をした回答者（つまり矛盾回答者）の全てを、そのアンケートの集計対象から除外する方法
（い）上記第３例で矛盾回答と判別された回答の矛盾回答者につき、当該矛盾回答としての点数に応じてアンケートの集計対象から除外する方法。より具体的には、例えば、矛盾回答としての点数の集計結果が高い順に上位２００人までの矛盾回答者を上記集計対象から除外する方法や、当該集計結果が高い矛盾回答者のうち上位１０パーセントまでを上記集計対象から除外する方法が挙げられる。 Finally, as a fourth example of the procedure for determining a contradictory answer, the following method can be given as an example of how to actually delete the answer data of answers determined to be contradictory in the first to third examples of the procedure for determining a contradictory answer.
(a) A method of deleting all response data of answers determined to be contradictory answers from the survey results database 100, and excluding all respondents who gave such contradictory answers (i.e., contradictory respondents) from the tabulation of the survey.
(b) A method of excluding contradictory respondents whose answers were determined to be contradictory in the third example above from the tabulation of the survey according to their contradictory-answer scores. More specifically, for example, a method of excluding the top 200 contradictory respondents in descending order of tallied contradictory-answer score, or a method of excluding the top 10 percent of contradictory respondents with the highest tallied scores.
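Method (b) can be sketched as a selection over the tallied scores. This sketch assumes that "highest score" means the largest accumulated penalty, that is, the most negative total; the text does not state this explicitly. The function and parameter names are hypothetical.

```python
from typing import Dict, List, Optional

def select_for_exclusion(totals: Dict[str, int],
                         top_n: Optional[int] = None,
                         top_percent: Optional[int] = None) -> List[str]:
    """Pick the most heavily penalised respondents, by head count or by percentage."""
    ranked = sorted(totals, key=lambda rid: (totals[rid], rid))  # most negative first
    if top_n is not None:
        return ranked[:top_n]
    if top_percent is not None:
        # Integer ceiling of len(ranked) * top_percent / 100.
        count = (len(ranked) * top_percent + 99) // 100
        return ranked[:count]
    return ranked  # method (a): exclude every flagged respondent

# Hypothetical totals for 20 flagged respondents: p00 = -10, ..., p19 = -200.
totals = {f"p{i:02d}": -10 * (i + 1) for i in range(20)}
by_count = select_for_exclusion(totals, top_n=3)
by_percent = select_for_exclusion(totals, top_percent=10)
print(by_count, by_percent)
```

With 20 flagged respondents, the 10 percent rule selects two of them, and the top-N rule selects exactly N.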

（ｂ）第１実施形態の低反応者クリーニング処理について (b) Regarding the low responder cleaning process of the first embodiment
次に、上記ステップＳ３の低反応者クリーニング処理について、より具体的に説明する。当該低反応者クリーニング処理では、上記ステップＳ１で調査結果データベース１００から取得した回答データにより示される回答の中から低反応回答を判別するための判別手順（以下、当該判別手順を「低反応回答判別手順」と称する）が予め設定されている。当該低反応回答判別手順は、それを実現するためのプログラム等であって例えば記録部２に予め記録されているプログラム等を低反応者クリーニング部１１が読み出して実行することにより実現される。なお、第１実施形態の低反応回答判別手順は、低反応者クリーニング部１１として固定化されたものではなく、例えば第１実施形態の矛盾回答者として排除されるべき回答者の属性等に基づいて変更可能とされている。また、以下に例示する第１実施形態の低反応回答判別手順の複数の例については、それらを単独で用いてもよいし、二以上の当該判別手順を組み合わせて用いてもよい。 Next, the low responder cleaning process of step S3 will be described in more detail. In the low responder cleaning process, a determination procedure (hereinafter referred to as the "low-response answer determination procedure") is set in advance for identifying low-response answers among the answers indicated by the response data acquired from the survey results database 100 in step S1. The low-response answer determination procedure is implemented by the low responder cleaning unit 11 reading and executing a program or the like for implementing the procedure, such as a program pre-recorded in the recording unit 2. The low-response answer determination procedure of the first embodiment is not fixed in the low responder cleaning unit 11 but can be changed based on, for example, the attributes of the respondents to be excluded as contradictory respondents in the first embodiment. The examples of the low-response answer determination procedure of the first embodiment given below may be used individually, or two or more of them may be used in combination.

そして、このような低反応回答判別手順の第１例としては、調査結果データベース１００に記録されている回答データを対象としていわゆるクラスタ分析法による分析を行い、当該分析（分類）されたクラスタの中で最も低反応な（すなわち、質問における具体的な選択肢の選択が最も少ない）クラスタに属する回答者（低反応者）の回答を低反応回答と判別する判別手順が挙げられる。 A first example of such a procedure for identifying low-response responses is a procedure in which the response data recorded in the survey results database 100 is analyzed using the so-called cluster analysis method, and responses from respondents (low responders) who belong to the cluster with the lowest response rate (i.e., the fewest specific options selected in the question) among the analyzed (classified) clusters are identified as low-response responses.
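The text does not specify which clustering algorithm or which features are used. Purely as an illustration, the following sketch clusters respondents by a single assumed feature, the number of options each respondent selected, using a one-dimensional two-cluster k-means written in plain Python, and flags the cluster with the lower mean.

```python
from typing import Dict, List, Tuple

def two_means_1d(values: List[float], iterations: int = 20) -> Tuple[float, float]:
    """One-dimensional k-means with k = 2; returns (low centre, high centre)."""
    lo, hi = float(min(values)), float(max(values))
    for _ in range(iterations):
        low_cluster = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high_cluster = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo = sum(low_cluster) / len(low_cluster)
        if high_cluster:
            hi = sum(high_cluster) / len(high_cluster)
    return lo, hi

def lowest_response_cluster(selected_counts: Dict[str, int]) -> List[str]:
    """Respondents whose selection count sits nearer the low centre than the high one."""
    lo, hi = two_means_1d(list(selected_counts.values()))
    return sorted(
        rid for rid, n in selected_counts.items() if abs(n - lo) <= abs(n - hi)
    )

# Hypothetical counts of options selected per respondent.
selected_counts = {"r1": 0, "r2": 1, "r3": 8, "r4": 9, "r5": 0, "r6": 10}
low_responders = lowest_response_cluster(selected_counts)
print(low_responders)
```

A production analysis would more likely use several features and a library clustering routine; the point here is only that the cluster with the fewest concrete selections is the one treated as low-response.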

次に、低反応回答判別手順の第２例としては、図２を用いて説明した回答ＤＸ２のように、一の質問における複数（多数）の選択肢について「該当する選択肢が一つもない」と回答した回答者（低反応者）の回答を低反応回答と判別する判別手順が挙げられる。 Next, a second example of a procedure for determining low-response responses is a procedure for determining that a response from a respondent (low responder) who answered "none of the options apply" to multiple (large number of) options in a question, such as response DX2 described using Figure 2, is a low-response response.

また、低反応回答判別手順の第３例としては、複数の質問の全てについて「該当しない」と回答した回答者（低反応者）の回答を低反応回答と判別する判別手順が挙げられる。 A third example of a procedure for identifying low-response responses is a procedure for identifying responses from respondents (low responders) who answer "not applicable" to all of a number of questions as low-response responses.
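The second and third examples can be sketched as two simple predicates. In this sketch an empty set of selected options stands for "none of these apply" or "not applicable", and the threshold `many=10` for what counts as a large number of options is an assumed value; neither is specified in the text.

```python
from typing import Dict, Set

def is_low_response_example2(answers: Dict[str, Set[str]],
                             option_counts: Dict[str, int],
                             many: int = 10) -> bool:
    """Second example: nothing selected on a question that offers many options."""
    return any(
        not selected and option_counts[question] >= many
        for question, selected in answers.items()
    )

def is_low_response_example3(answers: Dict[str, Set[str]]) -> bool:
    """Third example: nothing selected on any of the questions."""
    return bool(answers) and all(not selected for selected in answers.values())

# Hypothetical survey: q1 lists 12 beverages, q2 and q3 offer only a few options.
option_counts = {"q1": 12, "q2": 3, "q3": 4}
careful = {"q1": {"tea", "cola"}, "q2": set(), "q3": {"yes"}}
dx2_like = {"q1": set(), "q2": {"a"}, "q3": {"yes"}}
all_none = {"q1": set(), "q2": set(), "q3": set()}

print(is_low_response_example2(dx2_like, option_counts),
      is_low_response_example3(all_none))
```

An empty answer on a question with only a few options, as in `careful`, is not flagged by the second example, since the premise there is that many options were offered.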

更に、低反応回答判別手順の第４例として、低反応回答における低反応の程度を予め点数化しておくことが挙げられる。より具体的に、上記低反応回答判別手順の第１例における最も低反応なクラスタに属する回答者に対応する点数を例えば「－９０点」としてその旨を当該回答者に関連付けて記録し、上記低反応回答判別手順の第２例により低反応回答と判別された回答をした回答者の点数を「－３０点」としてその旨を当該回答者に関連付けて記録し、上記低反応回答判別手順の第３例（質問数が三つの場合）により低反応回答と判別された回答をした回答者の点数を「－１５０点」としてその旨を当該回答者に関連付けて記録する。そして、当該回答者ごとに点数を加算し、加算後の点数（負の数）が相対的に大きい回答者ほど低反応者としての順位が低い回答者としてランキングする手順が挙げられる。 Furthermore, as a fourth example of the procedure for determining low response responses, the degree of low response in low response responses may be assigned a score in advance. More specifically, the score corresponding to a respondent belonging to the lowest response cluster in the first example of the procedure for determining low response responses may be recorded as "-90 points," with this information associated with the respondent; the score of a respondent whose answer was determined to be a low response response in the second example of the procedure for determining low response responses may be recorded as "-30 points," with this information associated with the respondent; and the score of a respondent whose answer was determined to be a low response response in the third example of the procedure for determining low response responses (when there are three questions) may be recorded as "-150 points," with this information associated with the respondent. Then, the scores for each respondent may be added up, and respondents with relatively larger scores (negative numbers) after the addition may be ranked as respondents with lower low response ratings.

最後に、低反応回答判別手順の第５例として、上記低反応回答判別手順の第１例乃至第４例で低反応回答と判別された回答の回答データそれぞれの実際の削除方法の例としては、以下の方法が挙げられる。
（あ）低反応回答と判別された回答の回答データを調査結果データベース１００から全て削除すると共に、当該低反応回答をした回答者（つまり低反応者）の全てを、そのアンケートの集計対象から除外する方法
（い）上記第４例で低反応回答と判別された回答の低反応者につき、当該低反応回答としての点数に応じてアンケートの集計対象から除外する方法。より具体的には、例えば、低反応回答としての点数の集計結果が高い順に上位２００人までの低反応者を上記集計対象から除外する方法、又は、当該集計結果が高い低反応者のうち上位１０パーセントまでを上記集計対象から除外する方法
（う）上記低反応回答判別手順の第１例における最も低反応なクラスタに属する回答者の中で予め設定された三つの質問の全てについて「該当しない」と回答した回答者を上記集計対象から除外する方法
（え）上記矛盾回答者と上記低反応者とを合せて上記集計対象から除外する場合の上限を予め設定し、その上限以下の回答者を上記集計対象から除外する方法 Finally, as a fifth example of the procedure for determining low response responses, the following method can be given as an example of how to actually delete the response data of each of the responses determined to be low response responses in the first to fourth examples of the procedure for determining low response responses.
(a) A method of deleting all response data of answers determined to be low-response answers from the survey results database 100, and excluding all respondents who gave such low-response answers (i.e., low responders) from the tabulation of the survey.
(b) A method of excluding low responders whose answers were determined to be low-response answers in the fourth example above from the tabulation of the survey according to their low-response scores. More specifically, for example, a method of excluding the top 200 low responders in descending order of tallied low-response score, or a method of excluding the top 10 percent of low responders with the highest tallied scores.
(c) A method of excluding from the tabulation those respondents who belong to the lowest-response cluster in the first example of the low-response answer determination procedure and who answered "not applicable" to all three preset questions.
(d) A method of setting in advance an upper limit on the combined number of contradictory respondents and low responders to be excluded from the tabulation, and excluding respondents from the tabulation up to that upper limit.
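Method (d) can be sketched as follows. The text only states that an upper limit is set for the combined exclusions; summing the two kinds of score per respondent and excluding the most negative totals first is an assumption of this sketch, as are the function and parameter names.

```python
from typing import Dict, List

def exclude_with_cap(contradiction_totals: Dict[str, int],
                     low_response_totals: Dict[str, int],
                     cap: int) -> List[str]:
    """Combine both penalty totals and exclude at most `cap` respondents."""
    combined = dict(contradiction_totals)
    for rid, score in low_response_totals.items():
        combined[rid] = combined.get(rid, 0) + score
    ranked = sorted(combined, key=lambda rid: (combined[rid], rid))  # most negative first
    return ranked[:cap]

# Hypothetical totals: r2 was flagged by both kinds of procedure.
contradiction_totals = {"r1": -100, "r2": -70}
low_response_totals = {"r2": -90, "r3": -30}
excluded = exclude_with_cap(contradiction_totals, low_response_totals, cap=2)
print(excluded)
```

With a cap of 2, r2 (combined -160) and r1 (-100) are excluded, while r3 (-30) stays in the tabulation even though it was flagged.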

（ｃ）第１実施形態の統計分布クリーニング処理について (c) Statistical Distribution Cleaning Process of the First Embodiment
次に、上記ステップＳ４の統計分布クリーニング処理について、より具体的に説明する。当該統計分布クリーニング処理では、上記ステップＳ１で調査結果データベース１００から取得した回答データにより示される回答の中から、統計データベース１０２に格納されている上記統計データに基づいて分布疑義回答を判別するための判別手順（以下、当該判別手順を「分布疑義回答判別手順」と称する）が予め設定されている。当該分布疑義回答判別手順は、それを実現するためのプログラム等であって例えば記録部２に予め記録されているプログラム等を統計分布クリーニング部１２が読み出して実行することにより実現される。なお、第１実施形態の分布疑義回答判別手順は、統計分布クリーニング部１２として固定化されたものではなく、例えば第１実施形態の分布疑義回答をしたとして排除したい回答者の属性等に基づいて変更可能とされている。また、以下に例示する第１実施形態の分布疑義回答判別手順の複数の例については、それらを単独で用いてもよいし、二以上の当該判別手順を組み合わせて用いてもよい。 Next, the statistical distribution cleaning process of step S4 will be described in more detail. In this statistical distribution cleaning process, a discrimination procedure (hereinafter referred to as the "distribution-questioned answer discrimination procedure") is preset for discriminating distribution-questioned answers from among the answers indicated by the answer data acquired from the survey result database 100 in step S1 based on the statistical data stored in the statistical database 102. This distribution-questioned answer discrimination procedure is implemented by the statistical distribution cleaning unit 12 reading and executing a program or the like for implementing the procedure, such as a program pre-recorded in the recording unit 2. Note that the distribution-questioned answer discrimination procedure of the first embodiment is not fixed to the statistical distribution cleaning unit 12 but can be changed based on, for example, the attributes of respondents who are to be excluded as having given the distribution-questioned answer of the first embodiment. Furthermore, the multiple examples of the distribution-questioned answer discrimination procedure of the first embodiment shown below may be used individually, or two or more of these discrimination procedures may be used in combination.

そして、このような分布疑義回答判別手順の第１例としては、統計データベース１０２に格納されている上記統計データに基づき、調査結果データベース１００に回答データが記録されている回答において統計的に不適切なセグメントに属する回答者からの当該回答（図３符号「ＤＸ３」参照）を上記分布疑義回答と判別する判別手順が挙げられる。 A first example of such a procedure for determining whether an answer is distributionally questionable is a procedure for determining whether an answer from a respondent who belongs to a statistically inappropriate segment (see "DX3" in Figure 3) is a distributionally questionable answer based on the statistical data stored in the statistical database 102.

より具体的に例えば、上記統計データが性別及び年代別で十歳刻みのセグメントで統計が取られているとする。そして、調査結果データベース１００に回答データが格納されている回答者を上記統計データと同様に性別及び年代別で十歳刻みのセグメントに分けたとき、当該セグメントごとの回答者数が、対応する上記統計データのセグメントに属する者の数から±１０パーセント以上ずれている場合（例えば、統計データ上は男性十代が３パーセントしかいないはずなのに、調査結果データベース１００に回答データが格納されている回答者で男性十代が１８パーセントであった場合等）に、調査結果データベース１００に回答データが格納されている当該男性十代の回答者の回答を上記分布疑義回答と判別する。 More specifically, suppose the statistical data is collected by gender and age group in ten-year age segments. Then, when the respondents whose response data is stored in the survey results database 100 are divided into ten-year age segments by gender and age group in the same way as the statistical data, if the number of respondents in each segment deviates by ±10% or more from the number of people belonging to the corresponding segment of the statistical data (for example, if the statistical data indicates that only 3% of respondents are teenage males, but 18% of the respondents whose response data is stored in the survey results database 100 are teenage males), the response of the teenage male respondent whose response data is stored in the survey results database 100 is determined to be a response with questionable distribution.
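The segment comparison can be sketched as follows. The sketch reads the deviation of ±10 percent as a difference of 10 percentage points between the survey share and the reference share, which matches the 3 percent versus 18 percent example but is an interpretation. The segment labels and function name are hypothetical.

```python
from collections import Counter
from typing import Dict, List

def questionable_segments(segment_of: Dict[str, str],
                          reference_share: Dict[str, float],
                          tolerance: float = 10.0) -> List[str]:
    """Segments whose share of respondents differs from the reference by >= tolerance points."""
    counts = Counter(segment_of.values())
    total = sum(counts.values())
    if total == 0:
        return []
    flagged = []
    for segment, reference in reference_share.items():
        share = 100.0 * counts.get(segment, 0) / total
        if abs(share - reference) >= tolerance:
            flagged.append(segment)
    return flagged

# Hypothetical survey of 100 respondents: 18 teenage males, as in the example above.
segment_of = {}
for i in range(18):
    segment_of[f"t{i}"] = "M10s"
for i in range(40):
    segment_of[f"m{i}"] = "M20s"
for i in range(42):
    segment_of[f"f{i}"] = "F20s"

reference_share = {"M10s": 3.0, "M20s": 47.0, "F20s": 50.0}  # percent, hypothetical
flagged = questionable_segments(segment_of, reference_share)
print(flagged)
```

Only the teenage male segment is flagged here (18 percent against 3 percent); the other two segments deviate by 7 and 8 points, which is inside the tolerance.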

Next, as a second example of the distribution-questioned answer discrimination procedure, the statistical distribution cleaning process of the first embodiment compares the segments of the statistical data, segmented in the same manner as in the first example (hereinafter referred to as "statistical data segments"), with the segments of the answer data stored in the survey results database 100 (hereinafter referred to as "answer data segments"). For each answer data segment whose number of answers does not match that of the corresponding statistical data segment, the answer data belonging to it is reduced, following the same rule for every answer data segment, until the number of respondents belonging to that answer data segment falls within ±5 percent of the number for the corresponding statistical data segment. The answers of the answer data selected for reduction are then determined to be distribution-questioned answers. In doing so, the respondents in the answer data segment subject to reduction are selected, for example, in descending order of their score as inconsistent respondents or low responders (that is, starting with the respondents most likely to be inconsistent respondents or low responders).
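The second example can be sketched in the same spirit. The sketch below rests on several assumptions that the text leaves open: shares are expressed as percentages of the original number of respondents, the ±5 percent band is measured in percentage points, only over-represented segments are reduced (deletion cannot repair an under-represented one), and each record carries a hypothetical `score` in which a higher value means a more likely inconsistent respondent or low responder.

```python
from collections import defaultdict


def select_for_reduction(respondents, reference_pct, tolerance_pct=5):
    """Return ids to delete so that no segment exceeds its reference share plus tolerance.

    respondents:   list of dicts with hypothetical keys "id", "segment", "score"
    reference_pct: dict mapping segment -> share in percent in the statistical data
    """
    total = len(respondents)
    by_segment = defaultdict(list)
    for record in respondents:
        by_segment[record["segment"]].append(record)

    to_remove = []
    for segment, members in by_segment.items():
        # Same rule for every segment: the most suspicious respondents go first.
        ranked = sorted(members, key=lambda r: r["score"], reverse=True)
        limit_pct = reference_pct.get(segment, 0) + tolerance_pct
        kept = len(ranked)
        # Integer-friendly comparison of the segment share against the limit.
        while kept > 0 and kept * 100 > limit_pct * total:
            to_remove.append(ranked[len(ranked) - kept]["id"])
            kept -= 1
    return to_remove
```

Measuring shares against the original total keeps the stopping condition of one segment independent of deletions in another; recomputing the total after each deletion is an equally defensible reading of the text.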

Once the low-quality answers have been deleted by the inconsistent respondent cleaning process (step S2), the low responder cleaning process (step S3), and the statistical distribution cleaning process (step S4), the generation unit 13 of the processing unit 1 newly generates complementary answer data equal in number to the total number of deleted low-quality answer data (step S5). It does so based on the answer data constituting the survey results database 100 after deletion of the low-quality answer data, while referring to the answer data constituting the survey results database 100 before the deletion, using, for example, the data generation method by the inventors of the present invention described in JP 2021-179865 A. If the AI-based data generation method described in JP 2021-179865 A is used to generate the complementary answer data in step S5, the quality of the answer data as a whole, as complemented by the complementary answer data, can be improved.

More specifically, if 50 answers are deleted by the inconsistent respondent cleaning process (step S2), 100 answers by the low responder cleaning process (step S3), and 30 answers by the statistical distribution cleaning process (step S4), the generation unit 13 newly generates 180 complementary answer data using the data generation method or the like. The generation unit 13 then acquires from the cleaned database 101 the answer data of the survey results database 100 from which the low-quality answer data has been deleted, and adds the complementary answer data generated in step S5 to the acquired answer data (step S6). In this way the generation unit 13 produces high-quality answer data equal in number to the answer data that constituted the survey results database 100 before the low-quality answer data was deleted, and stores it in the high-quality database 103 (step S6).
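The bookkeeping of steps S5 and S6 can be illustrated with a short sketch. The generator used here, random resampling of the remaining records, is only a placeholder chosen for brevity; it is not the data generation method of JP 2021-179865 A, whose details are not given in this document. The function name and the `synthetic` marker are hypothetical.

```python
import random


def complement_deleted(original, deleted_ids, rng=None):
    """Delete flagged records and top the data back up to its original size.

    original:    list of dicts, each with a hypothetical "id" key
    deleted_ids: set of ids flagged by steps S2, S3 and S4 combined
    """
    rng = rng or random.Random(0)
    cleaned = [r for r in original if r["id"] not in deleted_ids]
    n_missing = len(original) - len(cleaned)  # 50 + 100 + 30 = 180 in the text
    if n_missing and not cleaned:
        raise ValueError("no remaining answer data to generate from")

    complementary = []
    for k in range(n_missing):
        donor = rng.choice(cleaned)  # placeholder for the actual generator
        complementary.append({**donor, "id": f"complement-{k}", "synthetic": True})
    return cleaned + complementary
```

Taking the union of the three sets of flagged ids means that an answer flagged by more than one cleaning process is counted once; the total of 180 in the text corresponds to the case where the three sets do not overlap.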

The processing unit 1 then determines whether to end the survey response processing of the first embodiment, for example in response to an end operation on the operation unit 3 (step S7). If the survey response processing is to be ended (step S7: YES), the processing unit 1 ends it as is. If, on the other hand, the survey response processing is to be continued, for example for another survey results database 100 (step S7: NO), the processing unit 1 returns to step S1 and continues the processing described above for that other survey results database 100.

As described above, in the survey response processing by the survey response processing device S of the first embodiment, a plurality of answer data corresponding to a questionnaire survey conducted over a network in a format requesting answers to questions is acquired (see step S1 in FIG. 5), and the answer data of low-quality answers is extracted from that answer data and deleted (see steps S2 to S4 in FIG. 5). Because the answer data of inconsistent answers (see step S2 in FIG. 5), of low-response answers (see step S3 in FIG. 5), and of distribution-questioned answers (see step S4 in FIG. 5) is each deleted as answer data of low-quality answers, the answer data of low-quality answers that should be excluded when the survey answers are tallied and subsequently analyzed can be effectively excluded (deleted).

Furthermore, when the answer data of inconsistent answers is discriminated using the first to fourth examples of the inconsistent answer discrimination procedure (see step S2 in FIG. 5), the answer data of the inconsistent answers can be extracted and deleted accurately and appropriately.

In particular, when the third or fourth example of the inconsistent answer discrimination procedure is used (see step S2 in FIG. 5), the answer data of low-quality answers can be extracted and deleted using an objective, score-based criterion.

Furthermore, when the answer data of low-response answers is discriminated using the first to third examples of the low-response answer discrimination procedure (see step S3 in FIG. 5), the answer data of the low-response answers can be accurately extracted and deleted using appropriate criteria.

Moreover, when the fourth example of the low-response answer discrimination procedure is used, the answer data of low-response answers can be deleted using a more objective criterion.

Furthermore, when the answer data of distribution-questioned answers is discriminated by the distribution-questioned answer discrimination procedure (see step S4 in FIG. 5), the answer data of the distribution-questioned answers can be accurately extracted and deleted using appropriate criteria.

Furthermore, complementary answer data for complementary answers, equal in number to the deleted low-quality answers, is newly generated based on the answer data remaining after the answer data of the low-quality answers has been deleted (see steps S5 and S6 in FIG. 5). Using that remaining answer data thus makes it possible to build a high-quality database 103 from which the answer data of low-quality answers has been removed and which still contains a necessary and sufficient number of answer data.

In this case, if a data generation method using AI, for example, is used to generate the complementary answer data, the quality of the answer data as a whole as complemented by the complementary answer data (that is, the answer data of all the complemented answers to the questionnaire survey) can be improved. Furthermore, the number of complementary answer data generated in steps S5 and S6 need not equal the number of deleted low-quality answers; it may instead be a smaller number that corresponds to the number of deleted low-quality answers.

Moreover, because the complementary answer data is newly generated with reference to the distribution in the original survey results database 100, it is possible to build a high-quality database 103 that corresponds to the distribution of the original survey results database 100 while also containing a necessary and sufficient number of answers.

In the survey response processing of the first embodiment shown in FIG. 5, the case was described in which the inconsistent respondent cleaning process (step S2), the low responder cleaning process (step S3), and the statistical distribution cleaning process (step S4) are executed in parallel. Alternatively, these cleaning processes may be executed serially, that is, one after another in time in the order of the inconsistent respondent cleaning process (step S2), the low responder cleaning process (step S3), and the statistical distribution cleaning process (step S4).
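The difference between the two execution orders can be made concrete with a small sketch. It assumes, purely for illustration, that each cleaning process is a function that takes a list of records and returns the set of ids it flags; the names `run_parallel` and `run_serial` are hypothetical. Threads or processes are deliberately left out, since the point is only which data each cleaning process gets to see.

```python
def run_parallel(records, cleaners):
    """Each cleaning process inspects the full, untouched data; flags are merged afterwards."""
    flagged = set()
    for cleaner in cleaners:
        flagged |= cleaner(records)
    return [r for r in records if r["id"] not in flagged]


def run_serial(records, cleaners):
    """Each cleaning process inspects only what the previous one left behind (S2, S3, S4)."""
    remaining = list(records)
    for cleaner in cleaners:
        flagged = cleaner(remaining)
        remaining = [r for r in remaining if r["id"] not in flagged]
    return remaining
```

For cleaning processes that judge each answer in isolation the two orders give the same result. They can differ for a process such as the statistical distribution cleaning, whose decision depends on the makeup of the data set it receives.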

Furthermore, in the survey response processing of the first embodiment described above, the case was described in which the answer data of low-quality answers to be deleted is the answer data of inconsistent answers, of low-response answers, or of distribution-questioned answers. However, the answer data of low-quality answers to be deleted may additionally include the answer data of low-time answers, that is, answers for which the time the respondent took to answer the questions in the questionnaire survey is shorter than a preset threshold. In this case, the device may be configured to delete answer data that falls under any or all of the answer data of inconsistent answers, of low-response answers, of distribution-questioned answers, and of low-time answers.
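A minimal sketch of the low-time criterion and of combining it with the other criteria follows. The field name `elapsed_seconds`, the function names, and the use of seconds as the unit are assumptions made for illustration; the text specifies only that the time taken is compared against a preset threshold.

```python
def find_low_time_ids(respondents, threshold_seconds):
    """Flag answers completed faster than the preset threshold."""
    return {
        r["id"] for r in respondents if r["elapsed_seconds"] < threshold_seconds
    }


def ids_to_delete(*flagged_sets):
    """An answer is deleted if it falls under any one (or several) of the criteria."""
    return set().union(*flagged_sets)
```

Here `ids_to_delete` would be called with the sets flagged for inconsistent, low-response, distribution-questioned, and low-time answers, and the resulting union is what the deletion step removes.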

(II) Second embodiment
Next, a second embodiment, which is another embodiment of the present invention, will be described with reference to FIG. 6. FIG. 6 is a block diagram showing the general configuration of the survey response processing device according to the second embodiment.

In the survey response processing of the first embodiment described above, the answer data of low-quality answers is deleted from the survey results database 100 and complementary answer data equal in number to the deleted data is newly generated. High-quality answer data equal in number to the answer data that constituted the survey results database 100 before the deletion is thereby produced and stored in the high-quality database 103. In contrast, in the database generation processing of the second embodiment described below, the generation unit 13A generates further new answer data, producing a new large-scale database, for example a virtual market database, that contains even more answer data than the high-quality database 103 of the first embodiment.

The hardware configuration of the survey response processing device of the second embodiment is basically the same as that of the survey response processing device S of the first embodiment. In the following description, therefore, components similar to those of the survey response processing device S are given the same reference numbers, and their detailed description is omitted.

As shown in FIG. 6, the generation unit 13A of the processing unit 1A of the survey response processing device S1 of the second embodiment has, in addition to the functions of the generation unit 13 of the first embodiment, the function of further generating new answer data. Based on the answer data in the high-quality database 103, and while referring to, for example, the distribution of the statistical data constituting the statistical database 102, it generates, by the same method as the generation unit 13, a preset number of new answer data, the number being set, for example, in accordance with the number of answer data in the high-quality database 103.

The generation unit 13A then combines the newly generated answer data with the answer data already stored in the high-quality database 103 (that is, it expands the answer data) to produce the answer data constituting the large-scale database 110 of the second embodiment, and stores it in the large-scale database 110.
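The expansion performed by the generation unit 13A can be sketched as follows. As before, the generator is a stand-in: new records are drawn by resampling existing high-quality records, with the segment of each new record chosen in proportion to the statistical data. This is one simple way of "referring to the distribution of the statistical data" and is not claimed to be the method the device uses; all names and the record layout are hypothetical.

```python
import random


def expand_database(high_quality, reference_pct, n_new, rng=None):
    """Return the high-quality records plus n_new generated records.

    high_quality:  list of dicts with hypothetical keys "id" and "segment"
    reference_pct: dict mapping segment -> share in the statistical data
    n_new:         preset number of new answer data to generate
    """
    rng = rng or random.Random(0)
    by_segment = {}
    for record in high_quality:
        by_segment.setdefault(record["segment"], []).append(record)

    # Only segments that have at least one donor record can be sampled.
    segments = [s for s in reference_pct if s in by_segment]
    weights = [reference_pct[s] for s in segments]

    generated = []
    for k, segment in enumerate(rng.choices(segments, weights=weights, k=n_new)):
        donor = rng.choice(by_segment[segment])  # placeholder for the actual generator
        generated.append({**donor, "id": f"generated-{k}", "synthetic": True})
    return high_quality + generated
```

The returned list corresponds to the content of the large-scale database 110: the existing high-quality answer data followed by the newly generated answer data.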

The survey response processing of the second embodiment described above provides, in addition to the same effects as the survey response processing of the first embodiment, the effect that the large-scale database 110 can be built easily.

In addition, according to the survey response processing of the second embodiment described above, when a large-scale virtual market database is built by expanding the answer data using complementary answer data similar to that of the first embodiment (for example, generated using AI), the quality of the data contained in that virtual market database can also be improved.

As described above, the present invention can be used in the field of processing the answer data that constitutes a database, and particularly remarkable effects are obtained when it is applied to improving the quality of that answer data.

1, 1A Processing unit
2 Recording unit
3 Operation unit
4 Display
10 Inconsistent respondent cleaning unit
11 Low responder cleaning unit
12 Statistical distribution cleaning unit
13, 13A Generation unit
100 Survey results database
101 Cleaned database
102 Statistical database
103 High-quality database
110 Large-scale database
S, S1 Survey response processing device
D1, D2, D3 Survey results
DX1, DX2, DX3 Answers

Claims

an answer data acquisition means for acquiring a plurality of answer data including content data indicating the content of the answers to a questionnaire conducted via a network in a format requesting answers to questions, and identification data for identifying the answers;
An extraction means for extracting low-quality answer data corresponding to a low-quality answer that is an answer that falls under a low-quality standard that is preset as the answer from the acquired answer data;
a deletion means for deleting the extracted low-quality response data from the acquired response data based on the identification data indicating the extracted low-quality response data;
a generating means for generating new complementary answer data in a number corresponding to the number of deleted low-quality answer data, based on the answer data after the low-quality answer data has been deleted, to replace the deleted low-quality answer data;
Equipped with
The low quality standard is:
(i) a first low-quality criterion: when the answers to a plurality of the questions are mutually contradictory, the answers are determined to be low-quality answers;
(ii) a second low-quality criterion: determining an answer to the multiple-choice question in which the number of selections is less than a predetermined number as the low-quality answer; and
(iii) Third low-quality criterion: determining that the answer given by a respondent belonging to a different respondent group from the respondent group from which the answer is expected is a low-quality answer;
Response data processing device characterized in that it is any one or all of the above.

2. The response data processing device according to claim 1,
the first low-quality criterion is that the answers to the questions included in one of the questionnaires are mutually contradictory;
The answer data processing device is characterized in that the extraction means extracts the low-quality answer data corresponding to the low-quality answer corresponding to the contradiction from the acquired answer data.

3. The response data processing device according to claim 2,
The first low quality criterion is
A first low-quality criterion (a) according to claim 2, or
A first low-quality criterion (b) is that the questions included in the preliminary survey corresponding to the questionnaire and the questions included in the main survey corresponding to the questionnaire are of the same meaning, and the answers to the questions are different from each other;
Either
The extraction means extracts low-quality answer information corresponding to any of the low-quality answers from the acquired answer information,
Each of the first low quality standards is scored so that the first low quality standard (b) is lower in quality as the answer than the first low quality standard (a),
The device further includes determining means for determining the number of the low-quality response data to be deleted based on each of the first low-quality criteria scored and a predetermined deletion number criterion;
The answer data processing device, wherein the deletion means deletes the determined number of low-quality answer data from the acquired answer data.

4. The response data processing device according to claim 1,
the third low-quality criterion is that the difference between the ratio of the number of respondents belonging to the group of respondents from which the answer is expected to be given to the total number of respondents and the ratio of the number of respondents belonging to the different group of respondents who have given the answer to the total number of respondents is equal to or greater than a predetermined standard;
The answer data processing device is characterized in that the extraction means extracts the low-quality answer data corresponding to the low-quality answers by respondents belonging to the different answer group that fall under the third low-quality criterion from the acquired answer data.

5. The response data processing device according to claim 1,
The answer data processing device, wherein the generating means generates the complementary answer data by referring to the distribution of the entire acquired answer data.

A response data processing method executed in a response data processing device including a response data acquisition means, an extraction means, a deletion means, and a generation means,
a response data acquisition step of acquiring, by the response data acquisition means, a plurality of response data pieces, each of which includes content data indicating the content of the response to a questionnaire conducted via a network in a format requesting an answer to a question, and identification data for identifying the response;
an extraction step of extracting, from the acquired answer data, low-quality answer data corresponding to a low-quality answer that is an answer that falls under a low-quality standard that is preset as the answer by the extraction means;
a deleting step of deleting the extracted low-quality response data from the acquired response data by the deleting means based on the identification data indicating the extracted low-quality response data;
a generating step of generating, by the generating means, new complementary answer data in a number corresponding to the number of deleted low-quality answer data, based on the answer data after the low-quality answer data has been deleted, to replace the deleted low-quality answer data;
Including,
The low quality standard is:
(i) a first low-quality criterion: when the answers to a plurality of the questions are mutually contradictory, the answers are determined to be low-quality answers;
(ii) a second low-quality criterion: determining an answer to the multiple-choice question in which the number of selections is less than a predetermined number as the low-quality answer; and
(iii) Third low-quality criterion: determining that the answer given by a respondent belonging to a different respondent group from the respondent group from which the answer is expected is a low-quality answer;
The answer data processing method is characterized in that any one or all of the above is performed.

The computer included in the response data processing device is
a response data acquisition means for acquiring a plurality of response data including content data indicating the content of the response to a questionnaire conducted via a network in a format requesting an answer to a question, and identification data for identifying the response;
an extraction means for extracting, from the acquired answer data, low-quality answer data corresponding to a low-quality answer that is an answer that falls under a low-quality standard that is preset as the answer;
a deletion means for deleting the extracted low-quality response data from the acquired response data based on the identification data indicating the extracted low-quality response data; and
a generating means for generating new complementary answer data in a number corresponding to the number of deleted low-quality answer data, based on the answer data after the low-quality answer data has been deleted, to replace the deleted low-quality answer data;
A response data processing program that functions as
The low quality standard is:
(i) a first low-quality criterion: when the answers to a plurality of the questions are mutually contradictory, the answers are determined to be low-quality answers;
(ii) a second low-quality criterion: determining an answer to the multiple-choice question in which the number of selections is less than a predetermined number as the low-quality answer; and
(iii) Third low-quality criterion: determining that the answer given by a respondent belonging to a different respondent group from the respondent group from which the answer is expected is a low-quality answer;
A program for processing answer data, characterized in that it is any one or all of the above.