WO2018002953A1 - Integrated decision support system and method for deriving inferences from data sets - Google Patents
- Publication number
- WO2018002953A1 (application PCT/IN2017/050258)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- neurule
- attributes
- sub
- inferences
- uncertainty
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
Definitions
- the embodiments herein relate to expert systems, and more particularly to an integrated decision support system and method for handling uncertain data by integrating uncertainty reasoning, learning and inference mechanisms.
- the present application is based on, and claims priority from, Indian Application Number 201621022150 filed on 28th June, 2016, the disclosure of which is hereby incorporated by reference herein.
- the principal object of the embodiments herein is to provide an integrated decision support system and method which have a reasoning mechanism to quantify uncertainty from the incomplete or inaccurate data and a learning mechanism to evaluate rules and an inference mechanism to make inferences from learned rules.
- Another object of the embodiments herein is to provide an integrated decision support system which can handle the uncertain data by integrating an uncertainty reasoning mechanism along with the learning mechanism.
- Another object of the embodiments herein is to provide an integrated decision support system and method for deriving inferences from data sets.
- Another object of the embodiments herein is to provide an integrated decision support system which produces better models of knowledge acquisition, robust learning and reasoning in presence of uncertainty.
- Another object of the embodiments herein is to provide a method for deriving inferences which can be applied to any real world problem, including both practical and theoretical applications.
- Another object of the embodiments herein is to provide an integration of knowledge representation, reasoning the knowledge, learning and producing rules from it and finally making inferences.
- Another object of the embodiments herein is to provide a method to model conditions for evaluating uncertainty in a system.
- Another object of the embodiments herein is to provide an expert decision support system which can make accurate inferences.
- Another object of the embodiment is to provide an efficient inference mechanism to make inferences from any real world data.
- Another object of the embodiments herein is to provide an integrated reasoning and learning system, which can reason from uncertain or incomplete or inaccurate data and learn and make inferences using associated rules.
- Another object of the embodiments herein is to provide an integrated decision support system which quantifies the uncertainty through the reasoning mechanism and derives inferences by evaluating the learned rules.
- Another object of the embodiments herein is to provide a reasoning mechanism to handle incomplete, inaccurate or uncertain data.
- Another object of the embodiments herein is to provide a learning mechanism to converge with a pattern of data.
- Another object of the embodiments herein is to provide a human computer interactive system for user friendliness.
- the embodiments herein provide a method for deriving inferences from data sets.
- the method includes obtaining a plurality of data sets. Each data set includes a set of attributes.
- the method includes determining sub-attributes for each attribute in each data set.
- the method includes constructing a neurule based on a relation between the set of attributes and the sub-attributes in each data set.
- the method includes creating a training set for each constructed neurule.
- the method includes deriving the inferences from each constructed neurule based on the set of attributes and the sub-attributes in each data set.
- the derived inferences include a plurality of neurules with a value associated with each neurule.
- deriving the inferences from each constructed neurule includes modeling input conditions for evaluating uncertainty in each attribute and sub-attribute.
- each neurule is constructed using the evaluated uncertainty for deriving the inferences from each constructed neurule.
- the embodiments herein provide an integrated decision support system which has a reasoning mechanism to quantify the uncertainty from the incomplete or inaccurate data and a learning mechanism to evaluate rules and an inference mechanism to make inferences from the learned rules.
- the proposed integrated decision support system includes a reasoning module and a learning module for deriving inferences from data sets.
- the reasoning mechanism derives conclusions based on beliefs and the inference mechanism derives conclusions based on facts.
- the embodiments herein provide a computer program product comprising computer executable program code recorded on a computer readable non-transitory storage medium, the computer executable program code when executed causing actions including obtaining a plurality of data sets, each data set including a set of attributes. The computer executable program code when executed causes further actions including determining sub-attributes for each attribute in each data set. The computer executable program code when executed causes further actions including constructing a neurule based on a relation between the set of attributes and the sub-attributes in each data set. The computer executable program code when executed causes further actions including creating a training set for each constructed neurule. The computer executable program code when executed causes further actions including deriving the inferences from each constructed neurule based on the set of attributes and the sub-attributes in each data set.
- FIG. 1a illustrates an architecture of an integrated decision support based system, according to the embodiments as disclosed herein;
- FIG. 1b illustrates various units of the integrated inference engine of the integrated decision support system described in FIG. 1a, according to the embodiments as disclosed herein;
- FIG. 1c illustrates the integrated decision support system with an input and an output, according to an embodiment as disclosed herein;
- FIG. 1d illustrates various modules involved in the input and output of the integrated decision support system, according to an embodiment as disclosed herein;
- FIG. 2 is a flow diagram illustrating a method for deriving inferences from data sets, according to the embodiments as disclosed herein;
- FIGS. 3a and 3b illustrate an example representation of a neurule, according to the embodiments as described herein;
- FIG. 4 illustrates fact assertions in a neurule, according to the embodiments as disclosed herein;
- FIG. 5a illustrates an example neurule structure for a Supercilliosis disease, according to the embodiments as disclosed herein;
- FIG. 5b illustrates the example neurule structure for the Supercilliosis disease with a belief region, according to the embodiments as disclosed herein;
- FIG. 5c illustrates the example neurule structure for the Supercilliosis disease with a sub-arbitrary belief region, according to the embodiments as disclosed herein;
- FIG. 6 is a graph showing comparison of inference mechanisms in terms of runtime, according to the embodiments as disclosed herein;
- FIG. 7 is a graph showing comparison of inference mechanisms in terms of computations, according to the embodiments as disclosed herein;
- FIG. 8 is a graph showing comparison of inference mechanisms in terms of convergent rate, according to the embodiments as disclosed herein.
- FIG. 9 is a graph showing comparison of generalization performance for various learning methods, according to the embodiments as disclosed herein.
- FIG. 10 illustrates a computing environment implementing the method for deriving inferences from data sets, according to the embodiments as disclosed herein.
- the embodiments herein achieve an integrated decision support system and method for deriving inferences from data sets.
- the method includes obtaining a plurality of data sets.
- the data sets are obtained from a knowledge base.
- Each data set includes a set of attributes.
- in an example, each symptom in the data set can be considered as an attribute.
- the method includes determining sub-attributes for each attribute in each data set.
- the method includes constructing a neurule based on a degree of dependency between the set of attributes and the sub-attributes in each data set.
- the degree of dependency indicates a measure of dependency of the sub-attributes on each attribute in the data set.
- the degree of dependency is determined between the set of attributes and the sub-attributes in each data set.
- in an example, the attribute is Hair Loss and the sub-attributes considered for Hair Loss are Namastosis, Cancer, Mental Stress and Baldness.
- in this case, the degree of dependency can be understood in terms of the sub-attributes which cause the Hair Loss.
- for example, the Hair Loss attribute may be dependent only on the sub-attributes Namastosis and Cancer (but not on Mental Stress and Baldness).
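The Hair Loss example can be written down as a small lookup table. The sketch below is illustrative only: the numeric degrees of dependency, the 0.5 cut-off and the function name `dependent_sub_attributes` are hypothetical and do not come from the disclosure.

```python
# Hypothetical degrees of dependency of each sub-attribute on the
# attribute "Hair Loss"; the numbers are illustrative assumptions.
dependency = {
    "Hair Loss": {
        "Namastosis": 0.9,
        "Cancer": 0.8,
        "Mental Stress": 0.2,
        "Baldness": 0.1,
    }
}


def dependent_sub_attributes(attribute, table, threshold=0.5):
    """Return the sub-attributes whose degree of dependency on
    `attribute` reaches `threshold` (an assumed cut-off)."""
    return [name for name, degree in table[attribute].items()
            if degree >= threshold]


selected = dependent_sub_attributes("Hair Loss", dependency)
print(selected)
```

With these assumed numbers, only Namastosis and Cancer are retained, mirroring the example in which Mental Stress and Baldness are excluded.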
- the method includes creating a training set for each constructed neurule. For each initial neurule $N_k$ with $n_k$ conditions, a corresponding initial training set $X_k$ is extracted. In an embodiment, each initial neurule is individually trained using a least mean square (LMS) technique after determining the corresponding training set associated with the neurule.
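A minimal sketch of LMS (Widrow-Hoff) training for a single neurule follows. It assumes, as is common in the neurule literature but not stated in this text, that a neurule behaves like an adaline unit with a bias factor and one significance factor per condition, that conditions are encoded as 1 (true) and -1 (false), and that the unit outputs 1 or -1. The learning rate, epoch count, toy training set and function names are all hypothetical.

```python
def train_lms(samples, n_inputs, rate=0.05, epochs=200):
    """Fit a bias factor and significance factors with the LMS rule."""
    weights = [0.0] * (n_inputs + 1)  # weights[0] plays the role of the bias
    for _ in range(epochs):
        for inputs, target in samples:
            x = [1.0] + list(inputs)
            activation = sum(w * xi for w, xi in zip(weights, x))
            error = target - activation  # LMS uses the linear error
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
    return weights


def neurule_output(weights, inputs):
    """Threshold the weighted sum: 1 means fired, -1 means blocked."""
    activation = weights[0] + sum(w * xi for w, xi in zip(weights[1:], inputs))
    return 1 if activation >= 0 else -1


# Toy training set (assumed): the conclusion holds only if both
# conditions are true, with 1 = true and -1 = false.
training_set = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
sf = train_lms(training_set, n_inputs=2)
predictions = [neurule_output(sf, x) for x, _ in training_set]
print(predictions)
```

The toy set is linearly separable, so one unit suffices here; how a training set that LMS cannot fit is handled is not covered by this sketch.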
- the method includes quantifying uncertainty from the incomplete or inaccurate data. Further, the method includes evaluating neurules and deriving the inferences from each constructed neurule based on the set of attributes and the sub-attributes in each data set. In an embodiment, the derived inferences include a plurality of neurules with a value (i.e., a significant factor value) associated with each neurule. Further, the method includes storing the derived inferences. In an embodiment, the derived inferences are stored in a neurule base.
- the proposed method can be used to derive inferences from the data.
- the proposed method provides a mechanism for handling uncertain data by integrating uncertainty reasoning and learning.
- the proposed integrated decision support system produces better models of knowledge acquisition, robust learning and reasoning under uncertainty, and the proposed integrated decision support system also performs better in inference and generalization.
- symbolic systems are flexible for reasoning, while neural networks are too complex for learning. Hence, by integrating neuro-symbolic learning and reasoning, the logical nature of reasoning and the statistical nature of learning can be combined.
- the proposed method can be used to model the conditions for uncertainty evaluation using uncertainty reasoning principle.
- in the proposed method, the reasoning mechanism derives conclusions based on beliefs and the inference mechanism derives conclusions based on facts.
- the proposed method can be used to model conditions for evaluating uncertainty in a system using uncertainty reasoning principle.
- the proposed method can be applied to any real world problem, both practical and theoretical applications.
- the proposed integrated decision support system includes uncertainty reasoning and inference mechanism along with learning module.
- the proposed integrated decision support system and method have been applied to and tested on many problems with sample datasets available from a machine learning repository, such as Lenses prescription, car evaluation, the mutant transcriptional activity problem, the Parkinsons Telemonitoring problem, the Nursery ranking problem and the Cancer diagnosis problem.
- the proposed method can be applied to any real world problem pertaining to engineering or basic science applications, e.g., electrical circuitry problems, electronics measurement or path detecting problems, mechanical device measurement problems, computational modeling problems, data manipulation problems in the data mining and machine learning field, network security problems, mathematical problems, basic science and bio science application problems, physics related measurement problems, noise cancellation problems in mobile networks or electrical or electronic networks, micro controller applications, risk management, share marketing problems, training inference mechanisms of humanoid robots, or the like.
- the proposed method is a decision support system which can provide integrated uncertainty reasoning and inference mechanism along with the learning module.
- the proposed method can be used to quantify the uncertainty in climate studies such as for estimating components in water balancing, evapotranspiration measurement, water reuse measurement, reservoir water level balancing or the like.
- the proposed method can be used to compute epistemic uncertainty present in ground motion prediction in structural studies of earthquakes, geological analysis or the like.
- the proposed method can be applied to train humanoid robots to answer or to do any particular tasks.
- the inference mechanism of the robot can be trained to handle an uncertain situation.
- if the robot's inference mechanism is not trained to handle uncertain situations, the robot cannot answer or perform any of the tasks.
- with the proposed method, the robot can at least ask the user some related questions (which involve the sub-attributes in the proposed method) and learn from the training to arrive at a decision.
- the robot's inference mechanism can thus learn well and take proper decisions like human beings.
- referring to FIGS. 1 through 10, where similar reference characters denote corresponding features consistently throughout the figures, preferred embodiments are shown.
- FIG. 1a illustrates the architecture of an integrated decision support system 100a, according to the embodiments as disclosed herein.
- the integrated decision support system 100a includes a reasoning module 102 and a learning module 104.
- the reasoning module 102 includes a knowledge base 102a, belief base 102b and an uncertainty reasoning module 102c.
- the learning module 104 includes a working memory 104a, neurule base 104b and an integrated inference engine 104c.
- the reasoning module 102 for quantifying the uncertainty is integrated with the learning module 104 for evaluation of neurule and for deriving inferences and hence the architecture is termed as the integrated decision support system 100a.
- the knowledge base 102a stores an input data which includes the plurality of data sets. Each data set includes the set of attributes.
- the belief base 102b stores the sub-attributes and the corresponding weights to be assigned to each of the sub-attributes associated with the set of attributes in each data set.
- the uncertainty reasoning module 102c computes a combined belief value and a combined disbelief value using the weights (assigned to each sub-attribute) stored in the belief base 102b.
- the reasoning module 102 processes the data sets (stored in the knowledge base 102a). While processing the data sets, if an uncertain factor occurs, the reasoning module 102 determines the sub-attributes and the corresponding weights stored in the belief base 102b. Further, the uncertainty reasoning module 102c computes the combined belief value using the sub-attributes and the corresponding weights stored in the belief base 102b.
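One way to picture the combined belief and disbelief values is Dempster's rule of combination from Dempster-Shafer theory. The sketch below is an assumption about how such a combination could look, not the disclosed computation: it treats each sub-attribute's weights as a mass function over a two-element frame, and the mass values, the frame labels and the function name `combine` are hypothetical.

```python
from functools import reduce
from itertools import product


def combine(m1, m2):
    """Dempster's rule: multiply masses of intersecting focal sets and
    renormalise by the mass that did not fall on the empty set."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


BELIEF = frozenset({"present"})      # supports the claim
DISBELIEF = frozenset({"absent"})    # contradicts the claim
EITHER = BELIEF | DISBELIEF          # uncommitted mass

# Hypothetical masses contributed by two sub-attributes.
sub_attribute_masses = [
    {BELIEF: 0.6, DISBELIEF: 0.1, EITHER: 0.3},
    {BELIEF: 0.5, DISBELIEF: 0.2, EITHER: 0.3},
]

result = reduce(combine, sub_attribute_masses)
combined_belief = result[BELIEF]
combined_disbelief = result[DISBELIEF]
print(round(combined_belief, 3), round(combined_disbelief, 3))
```

Using `frozenset` keys keeps the rule general: the same function would work for frames with more than two elements.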
- the working memory 104a has four fact assertion modules, namely true, false, unknown and uncertain, as shown in FIG. 1a.
- the fact assertion has the structure $(F_i, \mathrm{ass}(F_i))$, where $F_i$ is a fact and $\mathrm{ass}(F_i)$ is the assertion value related to the fact, which can be any one of {TRUE, FALSE, UNKNOWN, UNCERTAIN}.
- a fact has the same format as a condition or a conclusion of a rule, where
- $D_{F_i}$ is the variable and $d_{F_i}$ is the corresponding value associated with the fact.
- the neurule base 104b stores the produced neurules from empirical data.
- the integrated inference engine 104c processes the inferences by considering fact assertions from the working memory 104a and the produced rules from the neurule base 104b.
- the reasoning module 102 based on the uncertainty reasoning under the Dempster-Shafer theory and the learning module 104 based on the neurule inference mechanism are integrated. If the uncertain data is encountered during the processing of data from the knowledge base 102a, uncertainty reasoning is performed. The uncertainty reasoning module 102c computes the uncertain factor value by considering corresponding uncertain factor sub attribute values from the belief base 102b. The resultant values are evaluated by the integrated inference engine 104c and are stored in the working memory 104a. By using all available input combinations, plurality of neurules are produced for each intermediate or output conclusion. The plurality of neurules are stored in the neurule base 104b after evaluating each neurule.
- the input parameters such as domain variables, dependency information, a set of empirical data and a set of sub-attributes are provided to the integrated decision support system 100.
- the various steps involved in constructing the neurules based on the domain variables, the dependency information, the set of empirical data and the set of sub-attributes are described in conjunction with FIG. 2.
- FIG. 1a shows a limited overview of the integrated decision support system 100a, but it is to be understood that other embodiments are not limited thereto.
- the system 100 may include any number of modules other than the modules shown in FIG. 1a.
- the labels or names of the modules are used only for illustrative purposes and do not limit the scope of the invention.
- One or more modules can be combined together to perform the same or a substantially similar function in the system 100.
- FIG. 1b illustrates various modules of the integrated inference engine of the integrated decision support system 100b described in FIG. 1a, according to the embodiments as disclosed herein.
- the various modules of the integrated inference engine 104c are the major functional parts in an expert system.
- the integrated inference engine 104c includes a condition evaluation module 104c1, a neurule evaluation module 104c2 and a goal stack 104c3.
- the condition evaluation module 104c1 evaluates each condition in each neurule. To evaluate a neurule in an efficient inference process, the conditions of the neurule must be evaluated by considering all aspects associated with the neurule. The associated conditions are evaluated during every inference session.
- a fourth condition, 'uncertain', and its value can be evaluated from the associated sub-attributes to derive a proper inference for the neurules of sarcophagus disease, i.e., an input condition $I_i$ can evaluate to TRUE, FALSE, UNKNOWN or UNCERTAIN. Conditions are evaluated according to the contents of the working memory 104a, the user responses or the firings of rules.
- evaluation of an input condition can be performed using the following rules:
- An input condition $I_i$ evaluates to UNKNOWN, denoted by
- a condition evaluates to 'uncertain', if there is a fact with the same variable, predicate and ' Ur as its value.
- the user has to assign a value $\mathrm{value}(D_i)$ to the corresponding variable for evaluating a condition $I_i \equiv$ "$D_i$ is $d_i$" (in which $D_i \in D_{INP}$), where
- a neurule uncertainty reasoning principle is as described herein.
- the input condition $I_i$ of a neurule $N_k$ is evaluated to uncertain when the assertion value of the fact lies between 0 and 1, i.e., $0 < \mathrm{ass}(F_i) < 1$.
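The four-valued condition evaluation can be sketched as a lookup in working memory. The text only states that an assertion value between 0 and 1 means UNCERTAIN; treating 1 as TRUE, 0 as FALSE and a missing fact as UNKNOWN are assumptions made for this illustration, as are the names `Assertion` and `evaluate_condition` and the sample facts.

```python
from enum import Enum


class Assertion(Enum):
    TRUE = "TRUE"
    FALSE = "FALSE"
    UNKNOWN = "UNKNOWN"
    UNCERTAIN = "UNCERTAIN"


def evaluate_condition(variable, value, working_memory):
    """Evaluate the condition "<variable> is <value>" against facts.

    `working_memory` maps (variable, value) pairs to an assertion
    degree. Assumed encoding: 1 = TRUE, 0 = FALSE, absent = UNKNOWN.
    """
    degree = working_memory.get((variable, value))
    if degree is None:
        return Assertion.UNKNOWN
    if degree == 1:
        return Assertion.TRUE
    if degree == 0:
        return Assertion.FALSE
    if 0 < degree < 1:
        return Assertion.UNCERTAIN
    raise ValueError(f"assertion degree out of range: {degree}")


# Hypothetical working-memory contents.
facts = {
    ("hair loss", "yes"): 1,
    ("red ears", "yes"): 0,
    ("dizziness", "yes"): 0.4,
}

print(evaluate_condition("dizziness", "yes", facts))
```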
- conditional probabilities are the key factors which can be used to formalize the process of accumulating evidence and updating probabilities based on new evidence, i.e., the belief in one claim (event, conclusion, diagnosis, and so on) can be conditioned on another claim (evidences, feature or symptoms).
- the claim can be formulated as $P(C \mid E_1, E_2, \ldots, E_k)$, which means the degree of belief for $C$ given the evidences $E_1, \ldots, E_k$.
- conditional probabilities are within the range from 0 to 1 and sum to 1, thus the indeterminacy present in information can be evaluated by considering the evidences or sub-attributes which support the claim.
- the probability of any event is a number between 0 and 1.
- the power set of the elements can be taken to represent propositions, each containing all and only the states in which the proposition is true.
- the theory of evidence assigns a belief mass to each element of the power set, defined by a function as
- the input condition has a set of sub belief factors $b_m$ for the corresponding belief factor $B_{D_i}$ of a variable such that $b_m \in B_{D_i}$, then
- Two sets of input conditions for a neurule $N_k$ can be differentiated at any time of an inference process: the set of evaluated conditions $I_E^{N_k}$ and the set of unevaluated conditions $I_U^{N_k}$.
- the input conditions which have been evaluated, and whose evaluation results have been taken into account in deriving the conclusion of $N_k$, are denoted as $I_E^{N_k}$, and the remaining ones are referred to as $I_U^{N_k}$.
- the neurule evaluation module 104c2 evaluates each neurule to determine the validity of each neurule.
- the neurule evaluation module 104c2 has two states, fired and blocked. If the neurule is in the fired state, then the neurule is evaluated as '1', and if the neurule is in the blocked state, then the neurule is evaluated as '-1'.
- the set of fired rules during an inference is denoted by $N_F$ and the set of blocked rules during inference is denoted by $N_B$. Further, the set of evaluated rules is denoted by $N_E$ and that of unevaluated rules by $N_U$.
- the output of a neurule $N_k$ is computed according to Equations (14) and (15). All input conditions of the neurule should be evaluated to compute $v(N_k)$, the activation value of $N_k$, using the equations below, so that each input condition's contribution to the current activation value is taken into account.
- the known sum of a neurule $N_k$ represents the current potential of the neurule to fire and is defined as the weighted sum of the values of its already "known", i.e., evaluated, conditions (inputs).
- $ks(N_k) = sf_0^{N_k} + \sum_{I_i \in I_E^{N_k}} sf_i^{N_k} \, assv(I_i^{N_k})$ (9)
- the firing ratio (fr) of a neurule $N_k$ is an estimate of its intention to be fired (or blocked) and is defined as the ratio of the absolute value of its known sum over the value of its remaining sum, given that it is nonzero: $fr(N_k) = |ks(N_k)| / rs(N_k)$, with $rs(N_k) \neq 0$.
- the output of a neurule $N_k$ is evaluated to 1, i.e., $N_k$ succeeds or is fired, as soon as $ks(N_k) > 0$ and $|ks(N_k)| > rs(N_k)$.
- the output of a neurule $N_k$ is evaluated to -1, i.e., $N_k$ fails or is blocked, as soon as $ks(N_k) < 0$ and $|ks(N_k)| > rs(N_k)$.
- Success or firing condition of a neurule $N_k$ is the situation where $|ks(N_k)| > rs(N_k)$ (or equivalently, $fr > 1$; $rs(N_k) \neq 0$), and $ks(N_k) > 0$.
- Failure or blocking condition of a neurule $N_k$ is the situation where $|ks(N_k)| > rs(N_k)$ (or equivalently, $fr > 1$; $rs(N_k) \neq 0$), and $ks(N_k) < 0$.
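The known sum, firing ratio and fired/blocked test can be sketched as follows. The text here does not define the remaining sum; the sketch assumes the usual definition from the neurule literature, namely the sum of the absolute significance factors of the unevaluated conditions. The bias, the factors, the 1/-1 encoding of condition values and all function names are hypothetical.

```python
def known_sum(bias, factors, values):
    """Bias factor plus the weighted values of the evaluated conditions."""
    return bias + sum(factors[cond] * v for cond, v in values.items())


def remaining_sum(factors, values):
    """Assumed definition: largest possible contribution of the
    conditions that have not been evaluated yet."""
    return sum(abs(sf) for cond, sf in factors.items() if cond not in values)


def firing_ratio(bias, factors, values):
    """|known sum| / remaining sum; undefined (None) when the latter is 0."""
    rs = remaining_sum(factors, values)
    if rs == 0:
        return None
    return abs(known_sum(bias, factors, values)) / rs


def state(bias, factors, values):
    """Return 'fired', 'blocked' or 'undecided' for the current facts."""
    ks = known_sum(bias, factors, values)
    rs = remaining_sum(factors, values)
    if abs(ks) > rs:
        return "fired" if ks > 0 else "blocked"
    return "undecided"


# Hypothetical neurule: a bias factor and one factor per condition.
bias = -2.0
factors = {"hair loss": 3.0, "red ears": 1.5, "dizziness": 1.0}

one_known = {"hair loss": 1}                  # 1 = true, -1 = false
two_known = {"hair loss": 1, "red ears": 1}

print(state(bias, factors, one_known), firing_ratio(bias, factors, one_known))
print(state(bias, factors, two_known), firing_ratio(bias, factors, two_known))
```

In this toy case the neurule is still undecided after one condition and fires after the second, without the third condition ever being evaluated.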
- the integrated inference engine 104c uses the goal stack 104c3, where the possible solutions or answers are stored in the form of facts, which is termed as "goal facts.”
- the goal facts denote the conclusions of the neurules which contain a goal variable.
- a goal fact $G_i$ related to a neurule base 104b is an expression of the form "$V_{G_i}$ is $v_{G_i}$", where $V_{G_i} \in V_G$ and $v_{G_i} \in S_{V_{G_i}}$.
- the integrated inference engine 104c determines the firing potential of each neurule. Initially, the fr's of all neurules are computed. In this process, after evaluation of a condition of a neurule, the fr's of the neurules that contain that variable (called affected rules) are updated. Further, the neurule with the maximum fr is considered, given that it is the neurule most likely (i.e., with the greatest intention) to fire. This means that its first unevaluated condition is considered as the next goal and so on.
- when a neurule fires, the working memory 104a is updated with the corresponding conclusions.
- a similar thing occurs when a neurule is blocked.
- the rules compete with each other over which is closest to firing, based on their fr's. The one with the greatest fr takes the lead.
- the inference process is terminated either successfully, when there are facts in the working memory 104a that contain goal variables and are assigned the TRUE value or the UNCERTAIN value and no further action can be taken, or unsuccessfully.
- the basic inference procedure is as described herein. Initially, the goal(s) are set on the goal stack 104c3, the initial facts are fed to the working memory 104a, and the fr of each neurule is computed. For each fact in the working memory 104a, the fr of each neurule is computed. Further, all the affected neurules are determined and their corresponding fr's are updated.
- if a neurule fires, the working memory 104a is updated with the sibling assertions related to the conclusion of the rule. Further, the goals having the same variable as the conclusion of the fired rule are removed from the goal stack 104c3.
- if a neurule is blocked, the affected rules are determined and their fr's are updated. Further, the goal corresponding to the conclusion of the blocked rule is removed from the goal stack 104c3.
- if neither of the above conditions is satisfied, the first unevaluated condition of the rule is selected.
- if the condition contains an input variable,
- a value is requested from the user and the working memory 104a is updated with the sibling assertions related to the condition. Further, the affected rules are determined and their fr's are updated.
- otherwise, the condition with the maximum fr is selected.
- if the working memory 104a contains any goal facts assigned a TRUE or UNCERTAIN value, then those goal facts are extracted.
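The fr-driven control loop can be sketched in simplified form. This is an illustration under stated assumptions rather than the disclosed procedure: it omits the goal stack, intermediate variables and sibling assertions, it replaces the user with a dictionary of prepared answers, it assumes the remaining sum is the sum of absolute factors of unevaluated conditions, and it treats a rule whose sums are both zero as blocked. The rule names, factors and answers are hypothetical.

```python
def sums(rule, facts):
    """Return (known sum, remaining sum) of a rule for the current facts."""
    factors = rule["factors"]
    ks = rule["bias"] + sum(sf * facts[c] for c, sf in factors.items() if c in facts)
    rs = sum(abs(sf) for c, sf in factors.items() if c not in facts)
    return ks, rs


def infer(rules, answers):
    """Ask for conditions one at a time, always serving the undecided
    rule with the greatest firing ratio, until every rule is decided."""
    facts = {}             # working memory: condition -> 1 (true) or -1 (false)
    pending = dict(rules)  # rules that have neither fired nor been blocked
    conclusions = {}       # rule name -> True (fired) or False (blocked)
    while pending:
        for name in list(pending):
            ks, rs = sums(pending[name], facts)
            if abs(ks) > rs or rs == 0:
                conclusions[name] = ks > 0
                del pending[name]
        if not pending:
            break

        def fr(name):
            ks, rs = sums(pending[name], facts)
            return abs(ks) / rs  # rs > 0 for every rule still pending

        best = max(pending, key=fr)
        condition = next(c for c in pending[best]["factors"] if c not in facts)
        facts[condition] = answers[condition]  # stands in for asking the user
    return conclusions


rules = {
    "disease_A": {"bias": -2.0,
                  "factors": {"hair loss": 3.0, "red ears": 1.5, "dizziness": 1.0}},
    "disease_B": {"bias": -1.0,
                  "factors": {"dizziness": 2.0, "red ears": 0.5}},
}
answers = {"hair loss": 1, "red ears": 1, "dizziness": -1}

conclusions = infer(rules, answers)
print(conclusions)
```

In this toy run disease_B is blocked after a single question, because its known sum already outweighs everything the remaining condition could contribute.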
- FIG. 1c illustrates the integrated decision support system with an input and an output, according to an embodiment as disclosed herein.
- the input of the integrated decision support system includes a user interface module (UIM).
- the output of the integrated decision support system includes a decision module.
- FIG. 1d illustrates modules involved in the input and output of the integrated decision support system, according to an embodiment as disclosed herein.
- the functional part of the integrated decision support system is shown in FIG. 1d.
- the integrated decision support system mainly includes four modules, namely the input module, the reasoning module 102, the learning module 104 and the output module.
- the input module consists of four sub-modules, namely the UIM, Domain Knowledge, a Problem Modeling Module (PMM) and a Pre-Processing Module (PPM).
- the reasoning module consists of three sub-modules, namely a Knowledge Base (KEB), a Belief Base (BB) and an Uncertainty Reasoning Module (URM).
- the learning module consists of three sub-modules, namely a Working Memory (WM), a Neurule Base (NRB) and an Integrated Inference Engine (IIE).
- the output module consists of three sub modules namely a Post Analysis Module (PAM), a Decision Module (DM) and an Explanation Module (EM).
- the user interacts with the system through the UIM.
- the Domain Knowledge module stores the relevant data about the problem.
- the PMM models the problem into input conditions with attributes.
- the preprocessing is performed by the PPM.
- This preprocessed input data is stored in the KEB. While processing the data from the KEB, if an uncertain factor occurs, the reasoning module assigns sub belief factors, and the belief masses are stored in the BB. Further, the uncertainty reasoning module 102c computes the composite belief value using the sub belief factor mass values from the BB.
- the WM has four fact assertion modules, namely true, false, unknown and uncertain.
- a fact assertion has the structure $(F_i, \mathrm{ass}(F_i))$, where $F_i$ is a fact and $\mathrm{ass}(F_i)$ is the assertion value related to it, which can take any one of {TRUE, FALSE, UNKNOWN, UNCERTAIN}.
- a fact has the same format as a condition or a conclusion of a rule.
- $D_{F_i}$ is the variable and $d_{F_i}$ is the corresponding value associated with the fact.
- Fact assertions are produced as intermediate or final conclusions during an inference process or provided as an initial input data by the user.
- the condition evaluation and the neurule evaluation are performed in the integrated inference engine 104c using the goal stack 104c3.
- the produced neurules from empirical data are stored in the neurule base 104b.
- the integrated inference engine 104c processes the inferences by considering fact assertions from the working memory 104a and the produced rules from the neurule base 104b, using goal facts.
- the conclusions are transferred to the output module for further processing.
- the post analysis of the conclusions is performed in the post analysis module and the decisions are taken at decision module. Further, the explanation module provides explanations of the decisions taken at the decision module to the user.
- FIG. 2 is a flow diagram illustrating a method 200 for deriving inferences from data, according to the embodiments as disclosed herein.
- the method 200 includes obtaining the plurality of data sets.
- the method 200 allows the reasoning module 102 to obtain the plurality of data sets from the knowledge base 102a.
- Each data set includes the set of attributes.
- a first data set can include a first set of attributes such as hair loss, Red Ears and Dizziness.
- a second data set includes either the same first set of attributes (hair loss, Red Ears and Dizziness) or a different set of attributes.
- the method 200 includes determining sub-attributes for each attribute in each data set.
- the method 200 allows the reasoning module 102 to determine the sub-attributes for each attribute in each data set.
- the determined sub-attributes can be Namastosis, Cancer Symptoms, Baldness and Dandruff.
- each symptom can be considered as an attribute in the data set, and the diseases associated with that symptom are determined as the sub-attributes. If hair loss is considered as the attribute, then the sub-attributes are determined as Namastosis, Cancer Symptoms, Baldness and Dandruff, which are the diseases considered for the hair loss symptom.
- the method includes initiating uncertainty reasoning when uncertain data is encountered in each attribute and each sub-attribute.
- the method 200 allows the uncertainty reasoning module to initiate uncertainty reasoning when uncertain data is encountered in each attribute and each sub-attribute.
- the uncertainty reasoning is initiated in the uncertainty reasoning module 102c when uncertain data is encountered in each sub-attribute such as Namastosis, Cancer Symptoms, Baldness, Dandruff or the like.
- the method 200 includes constructing the neurule based on the dependency information between the set of attributes and the sub-attributes in each data set.
- the method 200 allows the working memory module 104a to construct the neurule based on the relation (i.e., dependency information) between the set of attributes and the sub-attributes in each data set.
- the degree of dependency is determined between the set of attributes and the sub-attributes in each data set. In an example, consider the attribute Hair Loss, for which the sub-attributes considered are Namastosis, Cancer, Mental Stress and Baldness. In this case, the degree of dependency can be understood in terms of the sub-attributes which cause the Hair Loss. For example, the Hair Loss attribute may be dependent only on the sub-attributes Namastosis and Cancer (but not on Mental Stress and Baldness).
- the input conditions and values are fed to the learning module 104 from the knowledge base 102a.
- the dependency information is formulated according to domain variables and learning process is initiated.
- the initial neurules are constructed for each value of intermediate or output variable by considering the dependency information and it represents the possible intermediate or final conclusions.
- the method 200 includes creating the training set for each constructed neurule. For each constructed initial neurule N_k, a corresponding initial training set X_k is extracted according to the conditions of that neurule. In an embodiment, each initial neurule is individually trained using the LMS technique after determining the corresponding training set associated with the neurule.
- the method 200 includes deriving the inferences from each constructed rule based on the set of attributes and the sub-attributes in each data set.
- the method 200 allows the integrated inference engine 104c to derive the inferences from each constructed neurule based on the set of attributes and the sub-attributes in each data set.
- the method 200 includes creating the training set for each constructed neurule.
- the method 200 allows the working memory module 104a to create the training set for each constructed rule.
- for each constructed initial neurule N_k, with its own set of conditions, a corresponding initial training set X_k is extracted.
- each initial neurule is individually trained using a least mean square (LMS) technique after determining the corresponding training set associated with the neurule.
- the method 200 includes evaluating the plurality of neurules.
- the method 200 allows the neurule evaluation unit 104c2 to evaluate each neurule from the plurality of neurules.
- the neurule evaluation has two states, fired and blocked. If the neurule is in the fired state, it is evaluated as '1'; if the neurule is in the blocked state, it is evaluated as '-1'.
- the set of fired rules during an inference process is denoted by N_F and that of blocked rules by N_B.
- the set of evaluated rules is denoted by N_E and that of unevaluated rules by N_U.
- the method includes storing the plurality of neurules.
- the method 200 allows the neurule base 104b to store the plurality of neurules.
- the produced neurules are stored in the neurule base 104b and will be used for inferencing.
- the various actions, acts, blocks, steps, or the like in the method 200 may be performed in the order presented, in a different order or simultaneously. Further, in some embodiments, some of the actions, acts, blocks, steps, or the like may be omitted, added, modified, skipped, or the like without departing from the scope of the invention.
- the possible related elements are the elements that may be relevant to the attributes in the domain.
- the dependency information indicates how the variables depend on each other.
- Each variable Di can take values from a set of discrete or continuous values
- the attributes are determined according to the domain knowledge and it consists of the amount of uncertainty, belief and disbelief. It should be noted that the total belief present in a system will be equal to one.
- the composite belief value, i.e., the maximum amount of belief, can be represented by B_D, where B_{D_i} ∈ T_{D_i}.
- the maximum belief value lies between 0 and 1.
- the dependency information for each variable is determined according to domain knowledge (by the expert) and represents how the domain variables depend on, or are related to, each other.
- the dependency information can be represented as a set of ordered pairs from D × D.
- each element of the dependency information can be represented as (D_i, D_j), i.e., "D_i depends on D_j."
- An example dependency matrix is provided in Table 1.
- the empirical data set, E, consists of a number of patterns pi:
- Each pattern pi is an 'm' tuple of values:
- Each d_ik denotes that the fact (condition) "D_k is d_ik" is true.
- The goal value of a pattern p_i is its last value; it corresponds to a goal variable from a subset E_k of E, which is usually used for training a neurule, and is denoted by Did. The goal value is "-1" for negative examples and "1" for positive examples.
- the various steps involved in constructing the neurules based on the domain variables, the dependency information, the set of empirical data and the set of sub-attributes are as described herein.
- the empirical data in the knowledge base 102a is pre-processed. If an incomplete or uncertain data is encountered, the uncertainty reasoning process is initiated.
- each possible sub-attribute is determined, and weights are assigned. Further, the weights of the subsets of each sub-attribute are associated using the theory of evidence, b_{D_i} ⊂ B_D.
- the upper and lower bounds of a probability interval can be defined.
- the lower bound, belief for a set is defined as the sum of all the weights of subsets of the set of interest. Further, the belief interval is categorized into belief, disbelief and uncertainty. The uncertainty value 'Un' is determined by subtracting the belief and disbelief from the total belief.
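The belief interval described here can be sketched in Python. This is a minimal illustration under stated assumptions rather than the disclosed implementation: the mass values are invented for the example, focal sets are modeled as `frozenset` objects, and the function names `belief`, `plausibility` and `belief_interval` are hypothetical. Disbelief is taken as one minus the upper bound (plausibility), which is the usual Dempster-Shafer convention, and the total belief is taken to be one.

```python
# Frame of discernment for the Hair Loss attribute (sub-attributes from the text).
FRAME = frozenset({"Namastosis", "Cancer Symptoms", "Baldness", "Dandruff"})

# Hypothetical mass assignment; the numbers are illustrative only and sum to 1.
masses = {
    frozenset({"Namastosis"}): 0.4,
    frozenset({"Cancer Symptoms"}): 0.2,
    frozenset({"Namastosis", "Cancer Symptoms"}): 0.1,
    FRAME: 0.3,  # mass left on the whole frame
}


def belief(masses, hypothesis):
    """Lower bound: sum of the masses of all subsets of the hypothesis."""
    return sum(m for focal, m in masses.items() if focal <= hypothesis)


def plausibility(masses, hypothesis):
    """Upper bound: sum of the masses of all sets that intersect the hypothesis."""
    return sum(m for focal, m in masses.items() if focal & hypothesis)


def belief_interval(masses, hypothesis):
    """Split the total belief (1.0) into belief, disbelief and uncertainty."""
    bel = belief(masses, hypothesis)
    disbelief = 1.0 - plausibility(masses, hypothesis)
    uncertainty = 1.0 - bel - disbelief  # the 'Un' value
    return bel, disbelief, uncertainty


bel, dis, un = belief_interval(masses, frozenset({"Namastosis"}))
```

With these illustrative masses, the three parts of the interval for Namastosis always add up to the total belief of one, which is the property the text relies on when it obtains 'Un' by subtraction.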
- the uncertain values are computed in the uncertainty reasoning module 102c using the orthogonal summation process of the joint mass representation table, in which independent sources of evidence are combined to compute a composite belief.
- the composite belief (called the joint mass) B_{D_i} is determined from the masses of the different data sets using orthogonal summation.
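The orthogonal summation can be illustrated with Dempster's rule of combination. The sketch below assumes two independent knowledge sources whose mass assignments are invented for the example; the function name `orthogonal_sum` and the dictionary-of-`frozenset` layout are hypothetical choices, not taken from the disclosure.

```python
from itertools import product


def orthogonal_sum(m1, m2):
    """Combine two mass assignments with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (set_a, mass_a), (set_b, mass_b) in product(m1.items(), m2.items()):
        intersection = set_a & set_b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass that falls on the empty set
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    # Normalize so that the joint masses again sum to one.
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


THETA = frozenset({"Namastosis", "Cancer Symptoms", "Baldness"})
N = frozenset({"Namastosis"})
NC = frozenset({"Namastosis", "Cancer Symptoms"})

ks1 = {N: 0.6, THETA: 0.4}   # hypothetical masses from knowledge source KS1
ks2 = {NC: 0.7, THETA: 0.3}  # hypothetical masses from knowledge source KS2

joint = orthogonal_sum(ks1, ks2)
```

In this example the two sources never contradict each other, so the conflict term is zero and the joint mass on Namastosis is 0.6, on {Namastosis, Cancer Symptoms} 0.28, and on the whole frame 0.12.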
- the input conditions and their values are fed to the learning module 104 from the knowledge base 102a.
- the dependency information is formulated according to the domain variables and the learning process is initiated.
- the initial neurules are constructed for each value of each intermediate or output variable by considering the dependency information, and the neurules represent the possible intermediate or final conclusions.
- the initial neurules are constructed using the dependency information.
- One initial neurule is constructed for each value of each intermediate or output variable.
- Initial neurules represent the possible intermediate or final conclusions.
- the conditions of each initial neurule include the variables that contribute in drawing the corresponding conclusion, as specified by the dependency information.
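The construction of initial neurules from the dependency information can be sketched as follows. This is an illustrative reading, not the disclosed procedure: the dependency information is assumed to be a set of ordered pairs (D_i, D_j) meaning "D_i depends on D_j", the variable name "Disease" and the value "OtherDisease" are hypothetical, and the dictionary layout of a neurule is invented for the example.

```python
def initial_neurules(dependencies, values, targets):
    """Build one untrained neurule per value of each intermediate/output variable.

    dependencies: set of (D_i, D_j) pairs, read as "D_i depends on D_j".
    values:       dict mapping each target variable to its possible values.
    targets:      the intermediate or output variables.
    """
    neurules = []
    for variable in targets:
        # Conditions are the variables that contribute to this conclusion.
        condition_vars = sorted(dj for di, dj in dependencies if di == variable)
        for value in values[variable]:
            neurules.append({
                "conclusion": (variable, value),
                "condition_vars": condition_vars,
                "bias_factor": 0.0,  # sf_0, set later by training
                "significance_factors": [0.0] * len(condition_vars),
            })
    return neurules


dependencies = {
    ("Disease", "Swollen Feet"),
    ("Disease", "Red Ears"),
    ("Disease", "Hair Loss"),
}
values = {"Disease": ["Supercilliosis", "OtherDisease"]}  # second value is hypothetical

neurules = initial_neurules(dependencies, values, targets=["Disease"])
```

Each resulting neurule shares the same conditions and differs only in its conclusion; the bias and significance factors stay at zero until training assigns them.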
- an initial training set X k is extracted from X.
- Each pattern in X_k consists of as many values as the number of different variables in N_k. The last value is the goal value.
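Extracting X_k from X can be read as projecting every pattern onto the variables of the neurule and appending the goal value. The sketch below assumes that patterns are dictionaries keyed by variable name and that the goal is encoded as 1 for positive and -1 for negative examples; the function name `extract_training_set` and the variable name "Disease" are hypothetical.

```python
def extract_training_set(patterns, condition_vars, goal_var, goal_value):
    """Project each pattern onto the neurule's variables; the goal comes last."""
    training_set = []
    for pattern in patterns:
        inputs = tuple(pattern[var] for var in condition_vars)
        goal = 1 if pattern[goal_var] == goal_value else -1
        training_set.append(inputs + (goal,))
    return training_set


# Hypothetical empirical data: 1 = true, -1 = false, 0 = unknown.
X = [
    {"Swollen Feet": 1, "Red Ears": 1, "Hair Loss": 1, "Dizziness": -1,
     "Disease": "Supercilliosis"},
    {"Swollen Feet": -1, "Red Ears": 1, "Hair Loss": 0, "Dizziness": 1,
     "Disease": "OtherDisease"},
]

X_k = extract_training_set(
    X, ["Swollen Feet", "Red Ears", "Hair Loss"], "Disease", "Supercilliosis"
)
```

Variables that the neurule does not use, such as Dizziness in this example, are simply dropped from the extracted patterns.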
- each initial neurule is individually trained on its training set using the LMS algorithm. It should be noted that the training is not always successful, i.e., a set of significance and bias factors that correctly classifies all of the training examples cannot always be found. This is the case when the training patterns constitute a nonlinear set and are therefore inseparable. When values for the bias and significance factors are found that classify all training patterns, one neurule is produced. When training fails due to inseparability of the training examples, a splitting process is followed.
- the initial training set of the neurule is divided into two subsets and two copies of the initial neurule are trained, each using one of the training subsets. If the training of either neurule copy fails, its subset is further divided into two other subsets and so on, until there is no failure. In this way, more than one neurule is produced, having the same conditions and the same conclusion but different bias and significance factors; these are called sibling neurules.
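The train-then-split procedure can be sketched with a plain adaline trained by the LMS (delta) rule. Several details are assumptions made for the illustration: the learning rate, the epoch limit, the threshold f(v) = 1 if v >= 0 and -1 otherwise, and the naive halving of the training set (a closeness-based split would replace it in practice). The function names are hypothetical.

```python
def output(weights, inputs):
    """Adaline output: threshold on the bias factor plus the weighted inputs."""
    v = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if v >= 0 else -1


def lms_train(patterns, rate=0.1, max_epochs=200):
    """Return [sf_0, sf_1, ..., sf_n] if every pattern is classified, else None."""
    n_inputs = len(patterns[0]) - 1
    weights = [0.0] * (n_inputs + 1)  # weights[0] is the bias factor
    for _ in range(max_epochs):
        for *inputs, goal in patterns:
            v = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
            error = goal - v  # LMS corrects the raw activation value
            weights[0] += rate * error
            for i, x in enumerate(inputs, start=1):
                weights[i] += rate * error * x
        if all(output(weights, inputs) == goal for *inputs, goal in patterns):
            return weights
    return None  # training failed, e.g. the patterns are inseparable


def train_with_splitting(patterns):
    """Train one neurule, or several sibling neurules if the set is inseparable."""
    weights = lms_train(patterns)
    if weights is not None:
        return [(weights, patterns)]
    if len(patterns) < 2:
        raise RuntimeError("a single pattern could not be fitted")
    half = len(patterns) // 2  # naive split; an assumption of this sketch
    return train_with_splitting(patterns[:half]) + train_with_splitting(patterns[half:])


# An XOR-like set: two inputs followed by the goal value; not linearly separable.
xor_patterns = [(1, 1, -1), (1, -1, 1), (-1, 1, 1), (-1, -1, -1)]
siblings = train_with_splitting(xor_patterns)
```

For the XOR-like set, training on all four patterns fails, so the set is halved and two sibling neurules are produced, each classifying its own subset correctly.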
- a point left unspecified so far is how to divide the training set into "suitable" subsets. Dividing the training set is based on the following criterion.
- the patterns in each of the two subsets are closer to each other than to any of the patterns in the other subset.
- the "closeness" between patterns is estimated through their "distance," based on some distance metric, such as the Hamming, Euclidean or Manhattan distance, the value difference metric (VDM), or the like.
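One simple way to approximate the closeness criterion is to pick the two most distant patterns as seeds and assign every pattern to the nearer seed. This heuristic is an assumption of the sketch, not the disclosed procedure, and it does not guarantee the criterion for every data set. It uses the Hamming distance on the input values only (the goal value is ignored) and expects at least two patterns with differing inputs; the function names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equally long tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def split_by_closeness(patterns, distance=hamming):
    """Split patterns into two subsets around the two most distant patterns."""
    seed_a, seed_b = max(
        combinations(patterns, 2),
        key=lambda pair: distance(pair[0][:-1], pair[1][:-1]),
    )
    first, second = [], []
    for pattern in patterns:
        to_a = distance(pattern[:-1], seed_a[:-1])
        to_b = distance(pattern[:-1], seed_b[:-1])
        (first if to_a <= to_b else second).append(pattern)
    return first, second


# Three input values followed by the goal value (illustrative data).
patterns = [
    (1, 1, 1, 1),
    (1, 1, -1, 1),
    (-1, -1, -1, -1),
    (-1, -1, 1, -1),
]
first, second = split_by_closeness(patterns)
```

Another metric, such as the Euclidean or Manhattan distance, can be passed through the `distance` parameter when the inputs include continuous uncertainty values.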
- the considered sub-attributes are "SA" for "Acute Namastosis," "CS" for "CancerSymptoms," "MS" for "MentalStress," "DA" for "Dandruff," and "BA" for "Baldness."
- "SA" for "Acute Sarcophagus"
- "WD" for "WaxDeposit"
- "IN" for "Injury"
- "AL" for "Allergy"
- "CO" for "Cold."
- the considered belief factors are "SA" for "Acute Sarcophagus," "SL" for "SugarLevel," "PR" for "PulseRate," "FI" for "FoodIntake," and "PE" for "Physical Exercise."
- the reasoning module 102 for quantifying the uncertainty is integrated with the learning module 104 for evaluating neurules and deriving inferences, and hence the architecture is termed the integrated decision support system 100a. While processing the data, if any uncertainty factor occurs, then the corresponding sub-attributes are determined. The upper and lower bounds of the probability interval are defined using Dempster-Shafer evidence theory, and belief masses are assigned to the sub-attributes. In order to reason with uncertainty, the uncertainty region is determined by categorizing the belief region into belief, disbelief and uncertainty, according to the belief interval theorem. Further, the amount of uncertainty present for the individual knowledge sources is computed.
- the training set is created for each neurule; the neurules are denoted as S1, S2, S3, S4 and S5. It should be noted that the training set is created with various combinations of TRUE, FALSE, Unknown and Uncertain. In the table 3, for the neurule S1, the values of "Swollen Feet", "Red Ears" and "Hair Loss" should be TRUE for the disease "Supercilliosis" to be present (which is indicated as "1" in the table 3). In a similar manner, various combinations of TRUE, FALSE, Unknown and Uncertain are assigned for each neurule as mentioned in the table 3. The belief mass assignments formed and the composite belief value computed are mentioned in the table 4 below.
- the knowledge sources KS1 and KS2 include attributes and sub-attributes respectively. Each sub-attribute is assigned a weight, and the composite belief value is computed for the different weights assigned to each sub-attribute corresponding to each attribute in KS1 and KS2 respectively.
- the condition evaluation and the initial neurules are produced as mentioned in the table 5 below.
- the conditions are arranged in various ways to achieve more efficient response. It is noted that an overall ordering of the conditions of neurules derives inferences in an efficient manner.
- the derived inferences can be accurate when the conditions in a neurule N k consisting of nN k conditions are ordered in a way that
- the conditions of each neurule are ordered in that way.
- each constructed neurule is examined as mentioned below.
- FIGS. 3a and 3b illustrate an example representation of a neurule, according to the embodiments as described herein.
- the neurules are the integration of neuro computing and symbolic rules.
- the formation of a neurule is represented in the FIG. 3a.
- I_1, I_2, ..., I_n are the input conditions with corresponding weight values sf_1, sf_2, ..., sf_n known as significance factors.
- the bias value sf_0 is termed the bias factor of the neurule.
- each neurule is considered as an adaline unit as shown in the FIG. 3b, which uses LMS for learning and is more safely convergent for nonlinear training sets.
- Each input condition receives a value from the set of values [1 (true), -1 (false), 0 (unknown), Un (uncertain)].
- the 'Un' denotes the uncertain factor value to be calculated using the reasoning module 102.
- v is the activation value and the threshold function f(v) is called the activation function.
- the output can take one of two values (-1, 1) representing failure or success of the neurule.
- the significance factor of a condition represents the significance (weight) of the condition in deriving the conclusion.
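The evaluation of a single neurule can be sketched as a weighted sum followed by a threshold. The exact equations are not reproduced in this text, so the sketch assumes the standard adaline form v = sf_0 + sum(sf_i * I_i) with f(v) = 1 if v >= 0 and -1 otherwise. The bias factor, the significance factors and the uncertainty value Un = 0.6 are hypothetical numbers chosen only for illustration.

```python
def neurule_output(bias_factor, significance_factors, inputs):
    """Return the activation value v and the thresholded output f(v)."""
    v = bias_factor + sum(sf * x for sf, x in zip(significance_factors, inputs))
    return v, (1 if v >= 0 else -1)


# Conditions: Swollen Feet, Red Ears, Hair Loss (hypothetical factors).
bias_factor = -2.0
significance_factors = [1.5, 1.0, 2.5]

# Swollen Feet true, Red Ears true, Hair Loss uncertain with Un = 0.6.
v_success, out_success = neurule_output(bias_factor, significance_factors, [1, 1, 0.6])

# Swollen Feet false, Red Ears false, Hair Loss unknown (0).
v_failure, out_failure = neurule_output(bias_factor, significance_factors, [-1, -1, 0])
```

An unknown condition (0) contributes nothing to the activation value, whereas an uncertain condition contributes in proportion to its Un value.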
- the neurule evaluation has two states, fired and blocked. If the neurule is in the fired state, the neurule is evaluated as '1'. If the neurule is in the blocked state, then it is evaluated as '-1'.
- the set of fired rules during the inference process is denoted by N_F, whereas that of blocked rules is denoted by N_B. Also, the set of evaluated rules is denoted by N_E and that of unevaluated rules by N_U.
- the output of a neurule N_k is computed according to equations (22) and (23). All input conditions of the neurule are evaluated to compute v(N_k), the activation value of N_k, using these equations, so that the contribution of each input condition to the current activation value is accounted for.
- the success or firing condition of a neurule N_k is the situation where |ks(N_k)| > rs(N_k) and ks(N_k) > 0.
- the failure or blocking condition of a neurule N_k is the situation where |ks(N_k)| > rs(N_k) (or equivalently, fr > 1; rs(N_k) ≠ 0), and ks(N_k) < 0.
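The firing and blocking conditions can be checked before all conditions have been evaluated. The definitions of ks and rs are not spelled out in this text, so the sketch assumes the usual reading from the neurule literature: ks is the bias factor plus the weighted sum of the already evaluated conditions (the known sum), and rs is the sum of the absolute significance factors of the conditions not yet evaluated (the remaining sum). The function name and the numeric factors are hypothetical.

```python
def evaluate_partial(bias_factor, significance_factors, inputs):
    """Classify a neurule as fired, blocked or undetermined.

    inputs holds one entry per condition; None marks a condition that has
    not been evaluated yet (this differs from 0, which means "unknown").
    """
    known_sum = bias_factor + sum(
        sf * x for sf, x in zip(significance_factors, inputs) if x is not None
    )
    remaining_sum = sum(
        abs(sf) for sf, x in zip(significance_factors, inputs) if x is None
    )
    if abs(known_sum) > remaining_sum:
        # The unevaluated conditions can no longer change the sign of the output.
        return "fired" if known_sum > 0 else "blocked"
    return "undetermined"


bias_factor = -2.0
significance_factors = [1.5, 1.0, 2.5]  # hypothetical values

state_open = evaluate_partial(bias_factor, significance_factors, [1, None, None])
state_blocked = evaluate_partial(bias_factor, significance_factors, [-1, -1, None])
state_fired = evaluate_partial(bias_factor, significance_factors, [1, 1, 1])
```

In the second call the neurule is blocked after only two of the three conditions have been asked, which is the kind of saving that the ratio of necessary inputs to asked inputs measures.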
- FIG. 4 illustrates fact assertions in a neurule, according to the embodiments as disclosed herein.
- the fact assertions in the neurule for a Supercilliosis disease are shown in the FIG. 4.
- the fact assertions in the neurule include values as True, False, Unknown and Uncertain as shown in the FIG. 4.
- the input conditions are denoted as condition 1, condition 2 and so on to condition n.
- the weight values sf_1, sf_2, ..., sf_n are the significance factors.
- Each input condition receives a value from the set of values [1 (true), -1 (false), 0 (unknown), Un (uncertain)].
- the 'Un' denotes the uncertain factor value which is evaluated and received from the reasoning module 102.
- FIG. 5a illustrates an example neurule structure for a Supercilliosis disease, according to the embodiments as disclosed herein.
- the neurule structure and adaptation of the neurule to a medical diagnosis problem is depicted in the FIGS. 5a-5c.
- the neurule structure is shown by considering the medical diagnosis problem, it should be noted that any real world problem can be modeled as neurule structure. There can be many such neurules according to the problem domain and its attributes and the sub-attributes.
- the neurule structure for the Supercilliosis disease, the attributes of the Supercilliosis disease (which include Swollen Feet, Hair Loss and Red Ears) and the sub-attributes (i.e., Namastosis, Cancer Symptoms, Mental Stress, Baldness and Dandruff) for the Hair Loss attribute are shown in the FIG. 5a.
- the input values include [1 (true), -1 (false), 0 (unknown), Un (uncertain)].
- the 'Un' denotes the uncertain factor value to be calculated using the reasoning module 102.
- FIG. 5b illustrates the example neurule structure for the Supercilliosis disease with a belief region, according to the embodiments as disclosed herein.
- the neurule structure for the Supercilliosis disease and the attributes of the Supercilliosis disease (which include Swollen Feet, Hair Loss and Red Ears) and the sub-attributes for the Hair Loss attribute along
- FIG. 5c illustrates the example neurule structure for the Supercilliosis disease with a sub-arbitrary belief region, according to the embodiments as disclosed herein.
- FIG. 6 is a graph showing comparison of inference mechanisms in terms of runtime, according to the embodiments as disclosed herein.
- the experimental results regarding the performance of the integrated decision support system 100a, which includes the learning and reasoning system, are presented in the table 6 below (values in msec) and are compared with those of the basic neurule based system.
- FIG. 7 is a graph showing comparison of inference mechanisms in terms of computations, according to the embodiments as disclosed herein.
- the table 6 shows comparison of computational cost of the neurules under uncertainty (integrated decision support system 100b) with that of the neurules without uncertainty.
- the integrated decision support system 100b requires a slightly larger number of computations and a slightly longer runtime than the basic neurule based system, but within a reasonable amount, as shown in the FIG. 6 and the FIG. 7.
- FIG. 8 is a graph showing comparison of inference mechanisms in terms of convergent rate, according to the embodiments as disclosed herein.
- the above table 7 shows the experimental results comparing the performance of the inference mechanisms of the 'integrated decision support system 100b' and the 'basic neurule based system' in terms of convergent rate.
- the convergent rate is slightly higher for the integrated decision support system, but the classification accuracy is high.
- the "convergent rate" is the ratio of the number of necessary (i.e., the least required) inputs to the total number of asked inputs. Since the set of sub-attributes is considered for calculating the uncertainty factor, the number of least required factors will be somewhat larger in neurules with uncertainty than in neurules without uncertainty.
- Two datasets and two rules are used for the comparison of convergent rate. A statistical t-test at the 99% confidence level indicates that the difference in convergent rate, as compared to neurules without uncertainty, is not statistically significant.
- the resultant classification will be more specific and accurate. In other words, when more parameters are considered that are relevant to the particular problem, then the resultant classification will be far more accurate and perfect for that domain although it may take a little more computational time.
- This method finds application in the medical field aiming to find the indeterminacy present in disease identification, since in most cases the symptoms will be incomplete or partial. Another interpretation of this approach is, while dealing with the medical domain problems, the existence of symptom can be determined by calculating its degree and thus identify the disease diagnosed.
- FIG. 9 is a graph showing comparison of generalization performance for various learning methods, according to the embodiments as disclosed herein.
- Generalization capabilities of a learning method can be interpreted as how well the method is capable of handling new input data after the system has been trained. It has been proved that learning methods such as back propagation that use continuous variables might generalize better to unseen examples, thereby creating more robust systems.
- a number of experiments are conducted to test the generalization capabilities of the integrated decision support system 100b i.e., under continuous variables.
- the table 8 and the FIG. 9 show results regarding the classification accuracy (generalization) of the integrated decision support system 100b, i.e., considering continuous variables, on unseen test examples, as compared with those of the basic neurule based system without uncertain factors, i.e., with discrete factors alone.
- the presented network generalized quite well in unseen examples, i.e., to a new pattern, since the network learned to handle the incomplete data.
- FIG. 10 illustrates a computing environment implementing the method for deriving inferences from data sets, according to the embodiments as disclosed herein.
- the computing environment 1002 comprises at least one processing unit 1008 that is equipped with a control unit 1004 and an Arithmetic Logic Unit (ALU)
- the processing unit 1008 is responsible for processing the instructions of the technique.
- the processing unit 1008 receives commands from the control unit in order to perform its processing. Further, any logical and arithmetic operations
- the overall computing environment 1002 can be composed of multiple homogeneous and/or heterogeneous cores, multiple CPUs of different kinds, special media and other accelerators.
- the processing unit 1008 is responsible for processing the instructions of the technique. Further, the plurality of processing units 1008 may be located on a single chip or over multiple chips.
- the technique, comprising the instructions and codes required for the implementation, is stored in either the memory unit 1010 or the storage 1012 or both. At the time of execution, the instructions may be fetched from the corresponding memory 1010 or storage 1012, and executed by the processing unit 1008.
- networking devices 1016 or external I/O devices 1014 may be connected to the computing environment to support the implementation through the networking unit and the I/O device unit.
- the embodiments disclosed herein can be implemented through at least one software program running on at least one hardware device and performing network management functions to control the elements.
- the elements shown in the FIGS. 1 through 10 include blocks which can be at least one of a hardware device, or a combination of hardware device and software module.
Abstract
Embodiments herein provide an integrated decision support system for deriving inferences from data sets. The integrated decision support system includes a reasoning mechanism to quantify the uncertainty from incomplete or inaccurate data, a learning mechanism to evaluate rules and an inference mechanism to make inferences from the learned rules. The integrated decision support system and method provide a mechanism to reason and learn by obtaining a plurality of data sets. Each data set includes a set of attributes. The method includes determining sub-attributes for each attribute in each data set. The method includes constructing a neurule based on a relation between the set of attributes and the sub-attributes in each data set. The method includes creating a training set for each constructed neurule. The method includes deriving the inferences from each constructed neurule based on the set of attributes and the sub-attributes in each data set.
Description
INTEGRATED DECISION SUPPORT SYSTEM AND METHOD FOR DERIVING INFERENCES FROM DATA SETS
FIELD OF INVENTION
[0001] The embodiments herein relate to expert systems and more particularly relate to an integrated decision support system and method for handling uncertain data by integrating uncertainty reasoning, learning and inference mechanisms. The present application is based on, and claims priority from, Indian Application Number 201621022150 filed on 28th June, 2016, the disclosure of which is hereby incorporated by reference herein.
BACKGROUND OF INVENTION
[0002] Generally, expert systems are designed to solve complex problems by reasoning about knowledge. However, imprecision or inconsistency in the database and incompleteness of the knowledge base make the process of reasoning a complex problem. Reasoning is a process of generating conclusions from available knowledge using logical techniques such as deduction and induction. No general methodology has been identified to handle the different types of imprecision and incompleteness of data and knowledge, which results in a state of uncertainty during reasoning. There exist two categories of uncertainty: aleatory uncertainty, which results from the fact that a system can behave in random ways, and epistemic uncertainty, which results from the lack of knowledge about a system. Usually, traditional probability theory along with a frequentist approach is used to deal with aleatory uncertainty, and a Bayesian approach is used to capture epistemic uncertainty. However, Dempster-Shafer theory can deal with these two categories of uncertainty at the same time and hence offers an advantage over other assessment methods.
[0003] Most of the data observed in the real world contains errors, missing values and inconsistencies; the effective integration of automated learning and cognitive reasoning in real-world applications is cumbersome. It is difficult to construct a cognitive model of an intelligent agent that is able to deal with the many complex relations in the observed data. This is because the expert behaviour on high-level cognition is complex to model, elicit and to represent in an automated system. For controlling the accumulation of errors in uncertain environments, i.e., for robustness, there is a need for learning from the changes in the environment and reasoning about commonsense knowledge.
[0004] The above information is presented as background information only to help the reader to understand the present invention. Applicants have made no determination and make no assertion as to whether any of the above might be applicable as Prior Art with regard to the present application.
OBJECT OF INVENTION
[0005] The principal object of the embodiments herein is to provide an integrated decision support system and method which have a reasoning mechanism to quantify uncertainty from the incomplete or inaccurate data and a learning mechanism to evaluate rules and an inference mechanism to make inferences from learned rules.
[0006] Another object of the embodiments herein is to provide an integrated decision support system which can handle the uncertain data by integrating an uncertainty reasoning mechanism along with the learning mechanism.
[0007] Another object of the embodiments herein is to provide an integrated decision support system and method for deriving inferences from data sets.
[0008] Another object of the embodiments herein is to provide an integrated decision support system which produces better models of knowledge acquisition, robust learning and reasoning in presence of uncertainty.
[0009] Another object of the embodiments herein is to provide a method for deriving inferences which can be applied to any real world problem, including both practical and theoretical applications.
[0010] Another object of the embodiments herein is to provide an integration of knowledge representation, reasoning the knowledge, learning and producing rules from it and finally making inferences.
[0011] Another object of the embodiments herein is to provide a method to model conditions for evaluating uncertainty in a system.
[0012] Another object of the embodiments herein is to provide an expert decision support system which can make accurate inferences.
[0013] Another object of the embodiment is to provide an efficient inference mechanism to make inferences from any real world data.
[0014] Another object of the embodiments herein is to provide an integrated reasoning and learning system, which can reason from uncertain or incomplete or inaccurate data and learn and make inferences using associated rules.
[0015] Another object of the embodiments herein is to provide an integrated decision support system which quantifies the uncertainty through the reasoning mechanism and derives inferences by evaluating the learned rules.
[0016] Another object of the embodiments herein is to provide a reasoning mechanism to handle incomplete, inaccurate or uncertain data.
[0017] Another object of the embodiments herein is to provide a learning mechanism to converge with a pattern of data.
[0018] Another object of the embodiments herein is to provide a human computer interactive system for user friendliness.
SUMMARY
[0019] Accordingly the embodiments herein provide a method for deriving inferences from data sets. The method includes obtaining a plurality of data sets. Each data set includes a set of attributes. The method includes determining sub-attributes for each attribute in each data set. The method includes constructing a neurule based on a relation between the set of attributes and the sub-attributes in each data set. The method includes creating a training set for each constructed neurule. The method includes deriving the inferences from each constructed neurule based on the set of attributes and the sub-attributes in each data set.
[0020] In an embodiment, the method includes creating a training set for each constructed neurule.
[0021] In an embodiment, the derived inferences include a plurality of neurules with a value associated with each neurule.
[0022] In an embodiment, deriving the inferences from each constructed neurule includes modeling input conditions for evaluating uncertainty in each attribute and sub-attribute.
[0023] In an embodiment, each neurule is constructed using the evaluated uncertainty for deriving the inferences from each constructed neurule.
[0024] The embodiments herein provide an integrated decision support system which has a reasoning mechanism to quantify the uncertainty from incomplete or inaccurate data, a learning mechanism to evaluate rules and an inference mechanism to make inferences from the learned rules. The proposed integrated decision support system includes a reasoning module and a learning module for deriving inferences from data sets. The reasoning mechanism derives conclusions based on beliefs and the inference mechanism derives conclusions based on facts.
[0025] Accordingly the embodiments herein provide a computer program product comprising computer executable program code recorded on a computer readable non-transitory storage medium, the computer executable program code when executed causing the actions including obtaining a plurality of data sets. Each data set includes a set of attributes. The computer executable program code when executed causes the further actions including determining sub-attributes for each attribute in each data set. The computer executable program code when executed causes the further actions including constructing a neurule based on a relation between the set of attributes and the sub-attributes in each data set. The computer executable program code when executed causes the further actions including creating a training set for each constructed neurule. The computer executable program code when executed causes the further actions including deriving the inferences from each constructed neurule based on the set of attributes and the sub-attributes in each data set.
[0026] These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating preferred embodiments and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
BRIEF DESCRIPTION OF FIGURES
[0027] This invention is illustrated in the accompanying drawings, throughout which like reference letters indicate corresponding parts in the various figures. The embodiments herein will be better understood from the following description with reference to the drawings, in which:
[0028] FIG. 1a illustrates an architecture of an integrated decision support based system, according to the embodiments as disclosed herein;
[0029] FIG. 1b illustrates various units of the integrated inference engine of the integrated decision support system described in the FIG. 1a, according to the embodiments as disclosed herein;
[0030] FIG. 1c illustrates the integrated decision support system with an input and an output, according to an embodiment as disclosed herein;
[0031] FIG. 1d illustrates various modules involved in the input and output of the integrated decision support system, according to an embodiment as disclosed herein;
[0032] FIG. 2 is a flow diagram illustrating a method for deriving inferences from data sets, according to the embodiments as disclosed herein;
[0033] FIGS. 3a and 3b illustrate an example representation of a neurule, according to the embodiments as described herein;
[0034] FIG. 4 illustrates fact assertions in a neurule, according to the embodiments as disclosed herein;
[0035] FIG. 5a illustrates an example neurule structure for a Supercilliosis disease, according to the embodiments as disclosed herein;
[0036] FIG. 5b illustrates the example neurule structure for the Supercilliosis disease with a belief region, according to the embodiments as disclosed herein;
[0037] FIG. 5c illustrates the example neurule structure for the Supercilliosis disease with a sub-arbitrary belief region, according to the embodiments as disclosed herein;
[0038] FIG. 6 is a graph showing comparison of inference mechanisms in terms of runtime, according to the embodiments as disclosed herein;
[0039] FIG. 7 is a graph showing comparison of inference mechanisms in terms of computations, according to the embodiments as disclosed herein;
[0040] FIG. 8 is a graph showing comparison of inference mechanisms in terms of convergent rate, according to the embodiments as disclosed herein;
[0041] FIG. 9 is a graph showing comparison of generalization performance for various learning methods, according to the embodiments as disclosed herein; and
[0042] FIG. 10 illustrates a computing environment implementing the method for deriving inferences from data sets, according to the embodiments as disclosed herein.
DETAILED DESCRIPTION OF INVENTION
[0043] The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. Also, the various embodiments described herein are not necessarily mutually exclusive, as some embodiments can be combined with one or more other embodiments to form new embodiments. The term "or" as used herein, refers to a nonexclusive or, unless otherwise indicated. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein can be practiced and to further enable those skilled in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
[0044] The embodiments herein achieve an integrated decision support system and method for deriving inferences from data sets.
[0045] The method includes obtaining a plurality of data sets. The data sets are obtained from a knowledge base.
[0046] Each data set includes a set of attributes. In an example, if the data set includes a set of symptoms, each symptom in the data set can be considered as an attribute.
[0047] The method includes determining sub-attributes for each attribute in each data set.
[0048] The method includes constructing a neurule based on a degree of dependency between the set of attributes and the sub-attributes in each data set. The degree of dependency indicates a measure of dependency of the sub-attributes on each attribute in the data set. The degree of dependency is determined between the set of attributes and the sub-attributes in each data set. In an example, the attribute is Hair Loss and the sub-attributes for the Hair Loss are considered as Namastosis, Cancer, Mental Stress and Baldness. In this case, the degree of dependency identifies the sub-attributes which cause the Hair Loss. In an example, the Hair Loss attribute is dependent only on the sub-attributes Namastosis and Cancer (but not Mental Stress and Baldness).
[0049] The method includes creating a training set for each constructed neurule. If each constructed (initial) neurule Nk has nk conditions, then a corresponding initial training set Xk is extracted for it. In an embodiment, each initial neurule is individually trained using a least mean square (LMS) technique after determining the corresponding training set associated with the neurule.
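As a non-limiting illustration, the LMS training of a single neurule can be sketched as follows. The sketch assumes that a neurule behaves like an adaline unit with a bias factor and one significance factor per condition, that condition values are encoded as 1 (true) and -1 (false), and that the training set is a list of (inputs, target) pairs. The function names, the learning rate, the epoch count and the sample training set are hypothetical and are not taken from the embodiments.

```python
from typing import List, Sequence, Tuple

# One training example: (condition values, desired output of the neurule).
Example = Tuple[Sequence[float], float]


def train_neurule_lms(training_set: Sequence[Example],
                      learning_rate: float = 0.05,
                      epochs: int = 200) -> List[float]:
    """Fit [bias, sf_1, ..., sf_n] with the LMS (Widrow-Hoff) update rule."""
    n_conditions = len(training_set[0][0])
    factors = [0.0] * (n_conditions + 1)  # factors[0] is the bias factor
    for _ in range(epochs):
        for inputs, target in training_set:
            activation = factors[0] + sum(
                sf * x for sf, x in zip(factors[1:], inputs))
            error = target - activation
            # LMS step: move every factor along the error gradient.
            factors[0] += learning_rate * error
            for i, x in enumerate(inputs):
                factors[i + 1] += learning_rate * error * x
    return factors


def neurule_output(factors: Sequence[float], inputs: Sequence[float]) -> int:
    """Threshold the weighted sum: 1 (fired) or -1 (blocked)."""
    activation = factors[0] + sum(sf * x for sf, x in zip(factors[1:], inputs))
    return 1 if activation >= 0 else -1


# Hypothetical training set: the conclusion holds only if both conditions hold.
training_set = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
factors = train_neurule_lms(training_set)
predictions = [neurule_output(factors, x) for x, _ in training_set]
```

The example data is linearly separable, so a single unit can reproduce every target. A training set that is not separable would need additional handling that this sketch does not attempt.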
[0050] The method includes quantifying uncertainty from the incomplete or inaccurate data. Further, the method includes evaluating neurules and deriving the inferences from each constructed neurule based on the set of attributes and the sub-attributes in each data set. In an embodiment, the derived inferences include a plurality of neurules with a value (i.e., a significant factor value) associated with each neurule. Further, the method includes storing the derived inferences. In an embodiment, the derived inferences are stored in a neurule base.
[0051] Unlike the conventional methods, the proposed method can be used to derive inferences from the data. The proposed method provides a mechanism for handling uncertain data by integrating uncertainty reasoning and learning. The proposed integrated decision support system produces better models of knowledge acquisition, robust learning and reasoning under uncertainty, and the proposed integrated decision support system also performs better in inference and generalization.
[0052] In terms of modularity and explanation, symbolic systems are flexible for reasoning, while neural networks are too complex for learning. Hence, by integration of neuro-symbolic learning and reasoning, the logical nature of reasoning and the statistical nature of learning can be combined.
[0053] In addition, the proposed method can be used to model the conditions for evaluating uncertainty in a system using the uncertainty reasoning principle. The reasoning mechanism derives conclusions based on beliefs and the inference mechanism derives conclusions based on facts. Moreover, the proposed method can be applied to any real world problem, in both practical and theoretical applications. The proposed integrated decision support system includes the uncertainty reasoning and inference mechanisms along with the learning module.
[0054] The proposed integrated decision support system and method were applied to and tested on many problems with sample datasets available from a machine learning repository, such as the Lenses prescription, car evaluation, mutant transcriptional activity, Parkinsons Telemonitoring, Nursery ranking and Cancer diagnosis problems.
[0055] The proposed method can be applied to any real world problem pertaining to engineering field or basic science field applications, i.e., electrical circuitry problems, electronics measurement or path detecting problems, mechanical device measurement problems, many computational modeling problems, various data manipulating problems in the data mining and machine learning field, network security problems, mathematical problems, basic science and bio science application problems, physics related measurement problems, noise cancellation problems in mobile networks or electrical or electronic networks, micro controller applications, risk management, share marketing problems, training inference mechanisms in case of humanoid robots or the like. Hence, the proposed method is a decision support system which can provide integrated uncertainty reasoning and inference mechanism along with the learning module.
[0056] The proposed method can be used to quantify the uncertainty in climate studies such as for estimating components in water balancing, evapotranspiration measurement, water reuse measurement, reservoir water level balancing or the like.
[0057] While computing the evapotranspiration from sub-attributes such as dew point, temperature, humidity, radiation, wind speed, sunshine and so on, the uncertainties associated with each sub-attribute can be quantified using the proposed method, which helps to make accurate predictions. Further, the proposed method can be used to compute epistemic uncertainty present in ground motion prediction in structural studies of earthquakes, geological analysis or the like.
[0058] The proposed method can be applied to train humanoid robots to answer or to do any particular tasks. With the proposed method, the inference mechanism of the robot can be trained to handle an uncertain situation. When the robot's inference mechanism is not trained to handle uncertain situations, the robot cannot answer or cannot do any of the tasks. If the robot is trained to handle uncertain situation with the proposed method, the robot can at least ask some related questions (which includes the sub-attributes in the proposed method) to a user and learns from the training to arrive at a decision. Hence, with the proposed method, robot's inference mechanism can learn well and can take proper decisions like human beings.
[0059] Referring now to the drawings and more particularly to FIGS. 1 through 10 where similar reference characters denote corresponding features consistently throughout the figures, there are shown preferred embodiments.
[0060] FIG. 1a illustrates architecture of an integrated decision support system 100a, according to the embodiments as disclosed herein. As depicted in the FIG. 1a, the integrated decision support system 100a includes a reasoning module 102 and a learning module 104. The reasoning module 102 includes a knowledge base 102a, belief base 102b and an uncertainty reasoning module 102c. The learning module 104 includes a working memory 104a, neurule base 104b and an integrated inference engine 104c. The reasoning module 102 for quantifying the uncertainty is integrated with the learning module 104 for evaluation of neurule and for deriving inferences and hence the architecture is termed as the integrated decision support system 100a.
[0061] In an embodiment, the knowledge base 102a stores input data which includes the plurality of data sets. Each data set includes the set of attributes.
[0062] In an embodiment, the belief base 102b stores the sub-attributes and corresponding weights to be assigned for each of the sub-attributes associated with the set of attributes in each data set.
[0063] In an embodiment, the uncertainty reasoning module 102c computes a combined belief value and a combined disbelief value using weights (assigned for the each sub-attribute) stored in the belief base 102b.
[0064] In order to compute the uncertainty in the system, the reasoning module 102 processes the data sets (stored in the knowledge base 102a). While processing the data sets, if any uncertain factor occurs, then the reasoning module 102 determines the sub-attributes and the corresponding weights for the sub-attributes stored in the belief base 102b. Further, the uncertainty reasoning module 102c computes the combined belief value using the sub-attributes and the corresponding weights stored in the belief base 102b.
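As a non-limiting illustration of how weights from two sources could be merged into combined belief and disbelief values, the following sketch applies the standard Dempster rule of combination from the Dempster-Shafer theory. It is an assumption that this generic rule stands in for the computation performed by the uncertainty reasoning module 102c; the actual computation is defined in the referenced application. The frame of discernment, the mass values and the function name are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two mass functions with Dempster's rule of combination."""
    combined: Dict[FrozenSet[str], float] = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass that falls on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so that the combined masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


PRESENT = frozenset({"present"})
ABSENT = frozenset({"absent"})
FRAME = PRESENT | ABSENT  # mass on the whole frame models ignorance

# Hypothetical weights contributed by two sub-attributes of one uncertain factor.
m_first = {PRESENT: 0.6, FRAME: 0.4}
m_second = {PRESENT: 0.5, ABSENT: 0.2, FRAME: 0.3}

m_combined = combine(m_first, m_second)
combined_belief = m_combined[PRESENT]
combined_disbelief = m_combined[ABSENT]
```

In this sketch the mass left on the whole frame after combination is the part of the evidence that supports neither belief nor disbelief.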
[0065] For a better understanding of the method for computing or quantifying uncertainty in a system, a reference is made to Patent Application No. 201621017838.
[0066] These composite belief values are evaluated as a 'fact' assertion of the uncertain factor in the working memory 104a during the learning process. The working memory 104a has four fact assertion modules, namely true, false, unknown and uncertain, as shown in the FIG. 1a. The 'fact' assertion has the structure (Fi, ass(Fi)), where Fi is a fact and ass(Fi) is the assertion value related to the fact, which can be any one of {TRUE, FALSE, UNKNOWN, UNCERTAIN}.
[0067] In an embodiment, a fact has the same format as a condition or a conclusion of a rule:
Fi≡"DFi is dFi"
where DFi is the variable and dFi is the corresponding value associated with the fact. The fact assertions are produced as intermediate or final conclusions during the inference process.
[0068] The neurule base 104b stores the produced neurules from empirical data. The integrated inference engine 104c processes the inferences by considering fact assertions from the working memory 104a and the produced rules from the neurule base 104b.
[0069] It should be noted that the reasoning module 102 based on the uncertainty reasoning under the Dempster-Shafer theory and the learning module 104 based on the neurule inference mechanism are integrated. If uncertain data is encountered during the processing of data from the knowledge base 102a, uncertainty reasoning is performed. The uncertainty reasoning module 102c computes the uncertain factor value by considering the corresponding uncertain factor sub-attribute values from the belief base 102b. The resultant values are evaluated by the integrated inference engine 104c and are stored in the working memory 104a. By using all available input combinations, a plurality of neurules is produced for each intermediate or output conclusion. The plurality of neurules is stored in the neurule base 104b after evaluating each neurule.
[0070] In order to construct neurules, the input parameters such as domain variables, dependency information, a set of empirical data and a set of sub-attributes are provided to the integrated decision support system 100. The various steps involved in constructing the neurules based on the domain variables, the dependency information, the set of empirical data and the set of sub-attributes are described in conjunction with the FIG. 2.
[0071] The FIG. 1a shows a limited overview of the integrated decision support system 100a but it is to be understood that other embodiments are not limited thereto. In other embodiments, the system 100 may include any number of modules other than the modules shown in the FIG. 1a. Further, the labels or names of the modules are used only for illustrative purposes and do not limit the scope of the invention. One or more modules can be combined together to perform the same or a substantially similar function in the system 100.
[0072] FIG. 1b illustrates various modules of the integrated inference engine of the integrated decision support system 100b described in the FIG. 1a, according to the embodiments as disclosed herein. The various modules of the integrated inference engine 104c are the major functional parts in an expert system. As depicted in the FIG. 1b, the integrated inference engine 104c includes a condition evaluation module 104c1, a neurule evaluation module 104c2 and a goal stack 104c3.
[0073] In an embodiment, the condition evaluation module 104c1 evaluates each condition in each neurule. In order to evaluate a neurule in an efficient inference process, it is necessary to evaluate the conditions of the neurule by considering all aspects associated with the neurule. During every inference session, evaluations of the associated conditions are required.
[0074] Consider an acute sarcophagus problem of a medical diagnosis. The individual sources of knowledge in the knowledge base 102a describe a mapping from a symptom space to a disease space. For example, one such piece of knowledge is represented by producing rules, which can be,
Rule: IF Has-HairLoss (Patient) AND
Has-Dizziness (Patient) AND
Has-SensitiveAretha (Patient)
THEN Bears-Namastosis (Patient).
[0075] From the above rule, it can be inferred that if a patient has HairLoss, Dizziness and SensitiveAretha and if the integrated inference engine 104c infers that the patient is suffering from Namastosis, then the diagnosis may be the correct one in every hundred cases. This type of knowledge part can suffer from two kinds of incompleteness. In the first case, many diseases can occur with same symptoms, and in the second case, the knowledge may be deficient of the degree or level of the symptoms. In these kinds of situations there is a need for reasoning with incomplete or uncertain knowledge. Hence, the conditions with incomplete knowledge can be termed as uncertain.
[0076] In the above mentioned scenarios, a fourth condition value 'uncertain' can be evaluated from the associated sub-attributes to derive a proper inference for the neurules of the sarcophagus disease, i.e., an input condition $I_i$ can evaluate to TRUE, FALSE, UNKNOWN or UNCERTAIN. Conditions are evaluated according to the contents of the working memory 104a, the user responses or the firings of rules.
$$\text{working memory} = \{(F_i, \mathrm{ass}(F_i)),\ i = 1, \ldots, n\}$$
where $F_i \equiv$ "$D_{F_i}$ is $d_{F_i}$" is a fact and $\mathrm{ass}(F_i) \in \{\text{TRUE}, \text{FALSE}, \text{UNKNOWN}, \text{UNCERTAIN}\}$.
[0077] Based on contents of the working memory 104a, evaluation of an input condition can be performed using the following rules:
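[0078]
1. An input condition $I_i$ evaluates to TRUE, denoted by $\mathrm{ass}(I_i) = \text{TRUE}$, if there is a fact assertion $(F_i, \mathrm{ass}(F_i))$ in the working memory 104a with $F_i \equiv I_i$ and $\mathrm{ass}(F_i) = \text{TRUE}$.
2. An input condition $I_i$ evaluates to FALSE, denoted by $\mathrm{ass}(I_i) = \text{FALSE}$, if there is a fact assertion $(F_i, \mathrm{ass}(F_i))$ in the working memory 104a with $F_i \equiv I_i$ and $\mathrm{ass}(F_i) = \text{FALSE}$.
3. An input condition $I_i$ evaluates to UNKNOWN, denoted by $\mathrm{ass}(I_i) = \text{UNKNOWN}$, if there is a fact assertion $(F_i, \mathrm{ass}(F_i))$ in the working memory 104a with $F_i \equiv I_i$ and $\mathrm{ass}(F_i) = \text{UNKNOWN}$.
4. An input condition $I_i$ evaluates to UNCERTAIN, denoted by $\mathrm{ass}(I_i) = \text{UNCERTAIN}$, if there is a fact assertion $(F_i, \mathrm{ass}(F_i))$ in the working memory 104a with $F_i \equiv I_i$ and $\mathrm{ass}(F_i) = \text{UNCERTAIN}$.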
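As a non-limiting illustration, the evaluation rules above amount to a lookup of the condition in the working memory. The sketch assumes that a fact "D is d" is represented as a (variable, value) pair and that the working memory is a dictionary from facts to assertion values; the class name, the function name and the sample facts are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Assertion(Enum):
    TRUE = "TRUE"
    FALSE = "FALSE"
    UNKNOWN = "UNKNOWN"
    UNCERTAIN = "UNCERTAIN"


# A fact or condition "D is d" is held as the pair (variable, value).
Fact = Tuple[str, str]


def evaluate_condition(condition: Fact,
                       working_memory: Dict[Fact, Assertion]
                       ) -> Optional[Assertion]:
    """Return the assertion of the matching fact, or None if the
    condition has not been evaluated yet."""
    return working_memory.get(condition)


# Hypothetical working memory contents.
working_memory = {
    ("HairLoss", "present"): Assertion.TRUE,
    ("Dizziness", "present"): Assertion.UNCERTAIN,
}

hair_loss = evaluate_condition(("HairLoss", "present"), working_memory)
dizziness = evaluate_condition(("Dizziness", "present"), working_memory)
not_yet_asked = evaluate_condition(("SensitiveAretha", "present"), working_memory)
```

Returning None for a condition with no matching fact keeps "not yet evaluated" distinct from the UNKNOWN assertion, which is itself a recorded evaluation result.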
[0079] A condition evaluates to 'uncertain' if there is a fact with the same variable and predicate, and 'Un' as its value. The user has to assign a value $\mathrm{value}(D_i)$ to the corresponding variable for evaluating a condition $I_i \equiv$ "$D_i$ is $d_i$" (in which $D_i \in D_{INP}$), where
$$\mathrm{value}(D_i) \in X_{D_i} \cup \{\text{UNCERTAIN}\}$$
[0080] In an embodiment, a neurule uncertainty reasoning principle is as described herein. The input condition Ii of a neurule Nk is evaluated to uncertain, when the assertion of fact lies between 0 and 1. i.e., 0 < ass(Fi) < 1.
[0081] For reasoning in a technical system, conditional probabilities are the key factors which can be used to formalize the process of accumulating evidence and updating probabilities based on new evidence, i.e., the belief in one claim (event, conclusion, diagnosis, and so on) can be conditioned on another claim (evidence, feature or symptom). If several evidences $E_1, \ldots, E_k$ are given for a claim $C$, the claim can be formulated as $P(C \mid E_1, E_2, \ldots, E_k)$, which means the degree of belief for $C$ given the evidences $E_1, \ldots, E_k$.
[0082] Consider an input condition $I_k$ of a neurule $N_k$ which cannot be evaluated as a discrete value but can be computed by considering some evidences or sub-factors. Let the condition $I_k$ have evidences $E_1, E_2, \ldots, E_m$.
(2)
[0083] The conditional probabilities are within the range from 0 to 1 and sum to 1, thus the indeterminacy present in information can be evaluated by considering the evidences or sub-attributes which support the claim.
[0084] According to the probability axioms, the probability of any event is a number between 0 and 1,
i.e., $0 \le P(E) \le 1$ (3)
[0085] If the frequency with which an event occurs lies between 0 and 1, then the uncertainty, i.e., the personal degree of belief that the event occurs, also lies between 0 and 1.
[0086] According to the Dempster-Shafer theory, in order to reason about the uncertain state, the power set of the elements can be taken to represent propositions, each containing all and only the states in which the proposition is true. The theory of evidence assigns a belief mass to each element of the power set, defined by a function
$$m: 2^X \rightarrow [0, 1] \tag{4}$$
[0087] Thus the total belief value of an uncertain event will vary between 0 and 1.
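As a non-limiting illustration, a belief mass function over the power set and the belief derived from it can be sketched as follows. The sketch relies on the standard Dempster-Shafer conventions that the empty set carries no mass and that all masses sum to 1; these conventions, the function names, the frame of discernment and the mass values are assumptions made for illustration.

```python
from itertools import chain, combinations
from typing import Dict, FrozenSet, List

Mass = Dict[FrozenSet[str], float]


def power_set(frame: FrozenSet[str]) -> List[FrozenSet[str]]:
    """All subsets of the frame of discernment, i.e. the domain of m."""
    items = sorted(frame)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]


def is_valid_mass(mass: Mass, frame: FrozenSet[str], tol: float = 1e-9) -> bool:
    """Check m: 2^X -> [0, 1], m(empty set) = 0 and the masses sum to 1."""
    subsets = set(power_set(frame))
    return (all(s in subsets for s in mass)
            and all(0.0 <= v <= 1.0 for v in mass.values())
            and mass.get(frozenset(), 0.0) == 0.0
            and abs(sum(mass.values()) - 1.0) <= tol)


def belief(mass: Mass, proposition: FrozenSet[str]) -> float:
    """Total mass committed to non-empty subsets of the proposition."""
    return sum(v for s, v in mass.items() if s and s <= proposition)


def plausibility(mass: Mass, proposition: FrozenSet[str]) -> float:
    """Total mass of the subsets that do not contradict the proposition."""
    return sum(v for s, v in mass.items() if s & proposition)


# Hypothetical frame and masses for one uncertain factor.
frame = frozenset({"Namastosis", "Cancer"})
mass = {
    frozenset({"Namastosis"}): 0.5,
    frozenset({"Cancer"}): 0.2,
    frame: 0.3,  # mass that cannot be split between the two states
}

bel_namastosis = belief(mass, frozenset({"Namastosis"}))
pl_namastosis = plausibility(mass, frozenset({"Namastosis"}))
```

Belief and plausibility bracket the support for a proposition, and both stay within the range from 0 to 1 for any valid mass function.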
[0088] In order to evaluate uncertainty, the input condition has a set of sub belief factors $b_{D_i}$ for the corresponding belief factor $B_{D_i}$ of a variable $D_i$ such that $b_{D_i} \in B_{D_i}$, then
where $b_{D_i} \in T_{D_i}$.
Thus, $0 < B_{D_i} < 1$ (6)
Hence the uncertain value of the domain variable $D_i$, $\mathrm{value}(D_i)$, will vary from 0 to 1,
i.e., $0 < D_i < 1$ (7)
[0089] Thus, for an input condition $I_i$, if the value can be computed as a continuous value, then the fact assertion $\mathrm{ass}(F_i)$ for the fact "$D_{F_i}$ is $d_{F_i}$" can be evaluated as UNCERTAIN. Further, this fact assertion is stored in the working memory 104a of the integrated decision support system 100b for the corresponding condition.
[0090] Two sets of input conditions for a neurule $N_k$ can be differentiated at any time of an inference process: the set of evaluated conditions $I_E^{N_k}$ and the set of unevaluated conditions $I_U^{N_k}$. The input conditions which have been evaluated, and whose evaluation results have been taken into account in deriving the conclusion of $N_k$, are denoted as $I_E^{N_k}$ and the remaining ones are referred to as $I_U^{N_k}$.
[0091] In an embodiment, the neurule evaluation unit 104c2 evaluates each neurule to determine the validity of each neurule. The neurule evaluation unit 104c2 has two states, fired and blocked. If the neurule is in the fired state, then the neurule is evaluated as '1' and if the neurule is in the blocked state, then the neurule is evaluated as '-1'. The set of fired rules during an inference is denoted by $N_F$ and the set of blocked rules during inference is denoted by $N_B$. Further, the set of evaluated rules is denoted by $N_E$ and that of unevaluated rules by $N_U$. The output of a neurule $N_k$ is computed according to Equations (14) and (15). All input conditions of the neurule should be evaluated to compute $v(N_k)$, the activation value of $N_k$, by using the below equations, so that each input condition's contribution to the current activation value can be accounted for.
where
$$\mathrm{assv}(I_i^{N_k}) = \begin{cases} 1 & \text{if } \mathrm{ass}(I_i^{N_k}) = \text{TRUE} \\ -1 & \text{if } \mathrm{ass}(I_i^{N_k}) = \text{FALSE} \\ 0 & \text{if } \mathrm{ass}(I_i^{N_k}) = \text{UNKNOWN} \\ Un & \text{if } \mathrm{ass}(I_i^{N_k}) = \text{UNCERTAIN} \end{cases}$$
[0092] The known sum of a neurule $N_k$, at some time, represents the current potential of the neurule to fire and is defined as the weighted sum of the values of its already "known," i.e., evaluated, conditions (inputs):
$$ks(N_k) = sf_0^{N_k} + \sum_{I_i^{N_k} \in I_E^{N_k}} sf_i^{N_k}\, \mathrm{assv}(I_i^{N_k}) \tag{9}$$
[0093] The remaining sum of a neurule $N_k$, at some time, represents its remaining potential to fire and is defined as the largest possible weighted sum of its "remaining," i.e., unevaluated, conditions:
$$rs(N_k) = \sum_{I_i^{N_k} \in I_U^{N_k}} \left| sf_i^{N_k} \right| \tag{10}$$
[0094] The firing ratio (fr) of a neurule $N_k$, at some time, is an estimate of its intention to be fired (or blocked) and is defined as the ratio of the absolute value of its known sum over the value of its remaining sum, given that it is nonzero:
$$fr(N_k) = \frac{\left| ks(N_k) \right|}{rs(N_k)}, \quad rs(N_k) \neq 0 \tag{11}$$
[0095] The output of a neurule Nk is evaluated to 1, i.e., Nk succeeds or is fired, as soon as ks(Nk) ≥ 0 and ks(Nk) ≥ rs(Nk).
[0096] The output of a neurule Nk is evaluated to -1, i.e., Nk fails or is blocked, as soon as ks(Nk) < 0 and |ks(Nk)| > rs(Nk).
[0097] Based on the above, the following definitions are provided herein.
1. Success or firing condition of a neurule Nk is the situation where |ks(Nk)| > rs(Nk) (or equivalently, fr > 1 when rs(Nk) ≠ 0), and ks(Nk) > 0.
2. Failure or blocking condition of a neurule Nk is the situation where |ks(Nk)| > rs(Nk) (or equivalently, fr > 1 when rs(Nk) ≠ 0), and ks(Nk) < 0.
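As a non-limiting illustration, the known sum, the remaining sum, the firing ratio and the firing and blocking conditions defined above can be sketched as follows. The sketch assumes that the significance factors of a neurule are held in a dictionary keyed by condition name and that evaluated conditions are already mapped to numeric values (1, -1, 0, or an uncertain value between 0 and 1). The function names, the factor values and the sample neurule are hypothetical.

```python
from typing import Dict, Optional


def known_sum(bias: float, factors: Dict[str, float],
              evaluated: Dict[str, float]) -> float:
    """Bias factor plus the weighted sum of the evaluated conditions."""
    return bias + sum(sf * evaluated[c]
                      for c, sf in factors.items() if c in evaluated)


def remaining_sum(factors: Dict[str, float],
                  evaluated: Dict[str, float]) -> float:
    """Largest possible weighted sum of the unevaluated conditions."""
    return sum(abs(sf) for c, sf in factors.items() if c not in evaluated)


def firing_ratio(ks: float, rs: float) -> Optional[float]:
    """|ks| / rs, defined only when the remaining sum is nonzero."""
    return abs(ks) / rs if rs != 0 else None


def status(ks: float, rs: float) -> str:
    """Apply the firing and blocking conditions with strict inequalities."""
    if abs(ks) > rs:
        if ks > 0:
            return "fired"
        if ks < 0:
            return "blocked"
    return "undecided"


# Hypothetical neurule for the Namastosis conclusion.
bias = -0.4
factors = {"HairLoss": 1.5, "Dizziness": 0.8, "SensitiveAretha": 0.6}

# Step 1: only HairLoss is known (TRUE maps to 1).
step1 = {"HairLoss": 1.0}
ks1, rs1 = known_sum(bias, factors, step1), remaining_sum(factors, step1)

# Step 2: Dizziness is UNCERTAIN with a hypothetical value of 0.7.
step2 = {"HairLoss": 1.0, "Dizziness": 0.7}
ks2, rs2 = known_sum(bias, factors, step2), remaining_sum(factors, step2)
```

In the second step the remaining condition can no longer change the sign of the sum, so the neurule can be decided without evaluating SensitiveAretha.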
[0098] In an embodiment, the integrated inference engine 104c uses the goal stack 104c3, where the possible solutions or answers are stored in the form of facts, which is termed as "goal facts." The goal facts denote the conclusions of the neurules which contain a goal variable.
[0099] A goal fact 'Gi' related to a neurule base 104b is an expression of the form "$V_{G_i}$ is $v_{G_i}$", where $V_{G_i} \in V_G$ and $v_{G_i} \in S_{V_{G_i}}$.
[00100] The integrated inference engine 104c determines the firing potential of each neurule. Initially, the fr's of all neurules are computed. In this process, after evaluation of a condition of a neurule, the fr's of the neurules that contain that variable (called affected rules) are updated. Further, the neurule with the maximum fr is considered, given that it is the neurule most likely (i.e., with the greatest intention) to fire. This means that its first unevaluated condition is considered as the next goal and so on. After the neurule is fired, the working memory 104a is updated with the corresponding conclusions. A similar thing occurs when a neurule is blocked. Hence, at each step of the inference process, the rules compete with each other as to which is closest to firing, based on their fr's. The one with the greatest fr takes the lead. The inference process is terminated either successfully, when there are facts in the working memory 104a containing goal variables that are assigned the TRUE value or the UNCERTAIN value and no further action can be taken, or unsuccessfully.
[00101] The basic inference procedure is as described herein. Initially, the goal(s) are set on the goal stack 104c3, the initial facts are fed to the working memory 104a, and the fr for each neurule is computed. For each fact in the working memory 104a, all the affected neurules are determined and their corresponding fr's are updated.
[00102] While there are goals in the goal stack 104c3, the neurule with the maximum fr is selected from the affected rules or from the unevaluated neurules, and the following cases are handled:
1. If the firing condition is satisfied for the rule, then the working memory 104a is updated with the sibling assertions related to the conclusion of the rule. Further, the goals having the same variable as the conclusion of the fired rule are removed from the goal stack 104c3.
2. If the blocking condition is satisfied and all the sibling rules of the rule have already been evaluated and blocked, the working memory 104a is updated with the fact related to the falsity of the conclusion of the rule, the affected rules are determined and their fr's are updated. Further, the goal corresponding to the conclusion of the blocked rule is removed from the goal stack 104c3.
3. If neither of the above condition is satisfied, the first unevaluated condition of the rule is selected.
4. If the condition contains an input variable, a value from the user is requested and the working memory 104a is updated with the sibling assertions related to the condition. Further, the affected rules are updated and their fr's are updated.
5. If the condition contains an intermediate variable, the rule with the maximum fr is selected from the sibling rules having that variable in their conclusions, and its first unevaluated condition is selected.
6. If the working memory 104a contains any goal facts assigned a TRUE or UNCERTAIN value, then those goal facts are extracted.
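As a non-limiting illustration, a reduced form of the inference procedure above can be sketched as follows. The sketch is a simplification under stated assumptions: every condition contains an input variable (intermediate variables and sibling rules are not modelled), each neurule conclusion is treated as its own goal, and user responses arrive through a callback that returns a numeric value. The class name, the function names, the factor values and the sample answers are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class Neurule:
    def __init__(self, conclusion: str, bias: float,
                 factors: Dict[str, float]) -> None:
        self.conclusion = conclusion
        self.bias = bias
        self.factors = factors  # condition -> significance factor, in order

    def sums(self, memory: Dict[str, float]) -> Tuple[float, float]:
        """Return (known sum, remaining sum) for the current memory."""
        ks = self.bias + sum(sf * memory[c]
                             for c, sf in self.factors.items() if c in memory)
        rs = sum(abs(sf) for c, sf in self.factors.items() if c not in memory)
        return ks, rs


def firing_ratio(rule: Neurule, memory: Dict[str, float]) -> float:
    ks, rs = rule.sums(memory)
    return abs(ks) / rs if rs else 0.0


def infer(neurules: List[Neurule],
          ask: Callable[[str], float]) -> Dict[str, int]:
    memory: Dict[str, float] = {}   # evaluated conditions -> numeric value
    results: Dict[str, int] = {}    # conclusion -> 1 (fired) or -1 (blocked)
    pending = list(neurules)
    while pending:
        # Fire or block every neurule whose outcome is already determined.
        for rule in list(pending):
            ks, rs = rule.sums(memory)
            if abs(ks) > rs:
                results[rule.conclusion] = 1 if ks > 0 else -1
                pending.remove(rule)
        if not pending:
            break
        # The neurule with the greatest firing ratio takes the lead.
        best = max(pending, key=lambda r: firing_ratio(r, memory))
        unevaluated = [c for c in best.factors if c not in memory]
        if not unevaluated:
            pending.remove(best)  # nothing left to ask; stays undecided
            continue
        memory[unevaluated[0]] = ask(unevaluated[0])
    return results


# Hypothetical neurule base and user answers (0.7 is an uncertain value).
rules = [
    Neurule("Namastosis", -0.4,
            {"HairLoss": 1.5, "Dizziness": 0.8, "SensitiveAretha": 0.6}),
    Neurule("Baldness", -1.0, {"HairLoss": 2.0, "Dizziness": -0.5}),
]
answers = {"HairLoss": 1.0, "Dizziness": 0.7, "SensitiveAretha": -1.0}
asked: List[str] = []


def ask_user(condition: str) -> float:
    asked.append(condition)
    return answers[condition]


results = infer(rules, ask_user)
```

With these sample values both neurules are decided after two questions, so the third condition is never requested from the user.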
[00103] FIG. 1c illustrates the integrated decision support system with an input and an output, according to an embodiment as disclosed herein. As depicted in the FIG. 1c, the input of the integrated decision support system includes a user interface module (UIM). Further, the output of the integrated decision support system includes a decision module.
[00104] FIG. 1d illustrates modules involved in the input and output of the integrated decision support system, according to an embodiment as disclosed herein. The functional part of the integrated decision support system is shown in the FIG. 1d. The integrated decision support system mainly includes four modules, namely the input module, the reasoning module 102, the learning module 104 and the output module. The input module consists of four sub modules, namely the UIM, Domain Knowledge, a Problem Modeling Module (PMM) and a Pre-Processing Module (PPM). The reasoning module consists of three sub modules, namely Knowledge Base (KEB), Belief Base (BB) and Uncertainty Reasoning Module (URM). The learning module consists of three sub modules, namely Working Memory (WM), Neurule Base (NRB) and Integrated Inference Engine (IIE). The output module consists of three sub modules, namely a Post Analysis Module (PAM), a Decision Module (DM) and an Explanation Module (EM). The user interacts with the system through the UIM. The Domain Knowledge module stores the relevant data about the problem. The PMM models the problem into input conditions with attributes. Then the preprocessing is performed by the PPM. This preprocessed input data is stored in the KEB. While processing the data from the KEB, if any uncertain factor occurs, then the reasoning module assigns sub belief factors and the belief masses are stored in the BB. Further, the uncertainty reasoning module 102c computes the composite belief value using the sub belief factor mass values from the BB. These composite belief values are evaluated as the fact assertion of the uncertain factor in the working memory 104a during the learning process. In the learning module 104, the input conditions with respective fact assertions are processed in the WM. The WM has four fact assertion modules, namely true, false, unknown and uncertain.
[00105] A fact assertion has the structure (Fi, ass(Fi)), where Fi is a fact and ass(Fi) is the assertion value related to it, which can take any one of {TRUE, FALSE, UNKNOWN, UNCERTAIN}. A fact has the same format as a condition or a conclusion of a rule:
Fi ≡ "DFi is dFi"
where DFi is the variable and dFi is the corresponding value associated with the fact. Fact assertions are produced as intermediate or final conclusions during an inference process or provided as initial input data by the user.
[00106] The condition evaluation and neurule evaluation are performed in the integrated inference engine 104c using the goal stack 104c3. The produced neurules from empirical data are stored in the neurule base 104b. Further, the integrated inference engine 104c processes the inferences by considering fact assertions from the working memory 104a and the produced rules from the neurule base 104b using goal facts. The conclusions are transferred to the output module for further processing. The post analysis of the conclusions is performed in the post analysis module and the decisions are taken at the decision module. Further, the explanation module provides explanations of the decisions taken at the decision module to the user.
[00107] FIG. 2 is a flow diagram illustrating a method 200 for deriving inferences from data, according to the embodiments as disclosed herein. At step 202, the method 200 includes obtaining the plurality of data sets. The method 200 allows the reasoning module 102 to obtain the plurality of data sets from the knowledge base 102a. Each data set includes the set of attributes. In an example, a first data set can include first set of attributes as hair loss, Red Ears and Dizziness. A second data set includes either the first set of attributes as hair loss, Red Ears and Dizziness or a different set of attributes.
[00108] At step 204, the method 200 includes determining sub- attributes for each of the attribute in each data set. The method 200 allows the reasoning module 102 to determine the sub-attributes for each of the attribute in the each data set. In an example, for the hair loss attribute in the data set, the determined sub-attributes can be Namastosis, Cancer Symptoms, Baldness and Dandruff. In an example, when the data set includes a set of symptoms and associated diseases for those symptoms, then each symptom can be considered as an attribute in the data set and the associated diseases for that symptom are determined as the sub attributes. If the hair loss is considered as the attribute, then the sub-attributes are determined as Namostosis, Cancer Symptoms, Baldness and Dandruff which are the diseases that are considered for the hair loss symptom.
[00109] At step 206, the method 200 includes initiating uncertainty reasoning when uncertain data is encountered in each attribute and each sub-attribute. The method 200 allows the uncertainty reasoning module 102c to initiate uncertainty reasoning when uncertain data is encountered in each attribute and each sub-attribute. The uncertainty reasoning is initiated in the uncertainty reasoning module 102c when uncertain data is encountered in each sub-attribute such as Namastosis, Cancer Symptoms, Baldness, Dandruff or the like.
[00110] At step 208, the method 200 includes constructing the neurule based on the dependency information between the set of attributes and the sub-attributes in each data set. The method 200 allows the working memory module 104a to construct the neurule based on the relation (i.e., dependency information) between the set of attributes and the sub-attributes in each data set. The degree of dependency is determined between the set of attributes and the sub-attributes in each data set. In an example, consider the attribute as Hair Loss and the sub-attributes for the Hair Loss as Namastosis, Cancer, Mental Stress and Baldness. In this case, the degree of dependency indicates which of the sub-attributes cause the Hair Loss. In an example, the Hair Loss attribute is dependent only on the sub-attributes Namastosis and Cancer (but not on Mental Stress and Baldness).
[00111] In an embodiment, the input conditions and values are fed to the learning module 104 from the knowledge base 102a. Then, the dependency information is formulated according to domain variables and learning process is initiated. The initial neurules are constructed for each value of intermediate or output variable by considering the dependency information and it represents the possible intermediate or final conclusions.
[00112] At step 210, the method 200 includes creating the training set for each constructed neurule. If each constructed neurule Nk has nNk conditions, then its corresponding initial training set is extracted. For each initial neurule Nk, an initial training set Xk is extracted. In an embodiment, each initial neurule is individually trained using the LMS technique after determining the corresponding training set associated with the neurule.
[00113] At step 212, the method 200 includes deriving the inferences from each constructed neurule based on the set of attributes and the sub-attributes in each data set. The method 200 allows the integrated inference engine 104c to derive the inferences from each constructed neurule based on the set of attributes and the sub-attributes in each data set. In an embodiment, the method 200 includes creating the training set for each constructed neurule. The method 200 allows the working memory module 104a to create the training set for each constructed neurule. In an example, if each constructed initial neurule Nk has nNk conditions, then its corresponding initial training set is extracted. For each initial neurule Nk, an initial training set Xk is extracted. In an embodiment, each initial neurule is individually trained using a least mean square (LMS) technique after determining the corresponding training set associated with the neurule.
[00114] At step 214, the method 200 includes evaluating the plurality of neurules. The method 200 allows the neurule evaluation unit 104c2 to evaluate each neurule from the plurality of neurules. The neurule evaluation has two states, fired and blocked. If the neurule is in the fired state, it is evaluated as '1', and if the neurule is in the blocked state, it is evaluated as '-1'. The set of fired rules during an inference process is denoted by NF, whereas that of blocked rules is denoted by NB. Also, the set of evaluated rules is denoted by NE and that of unevaluated rules by NU.
[00115] At step 216, the method 200 includes storing the plurality of neurules. The method 200 allows the neurule base 104b to store the plurality of neurules. The produced neurules are stored in the neurule base 104b and are used for inferencing.
[00116] The various actions, acts, blocks, steps, or the like in the method 200 may be performed in the order presented, in a different order or simultaneously. Further, in some embodiments, some of the actions, acts, blocks, steps, or the like may be omitted, added, modified, skipped, or the like without departing from the scope of the invention.
[00117] In order to produce a neurule base under uncertainty, four types of data are required, which includes the following.
• A set of domain specific variables, with their possible set of values (i.e., the specific variables include the attributes and possible set of values include the sub-attributes associated with each attribute in the domain)
• A set of possible related elements and its subsets. The possible related elements are the elements that may be relevant to the attributes in the domain.
• Dependency information between variables. The dependency information indicates how the variables depend on each other, and
• A set of all possible empirical data. The empirical data denotes the data gathered from experimentation.
[00118] It is necessary to mention the problem domain specific variables and its possible set of values for the formation of problem specification. Representation of domain specific variables and their values get interchanged based on the dependency between concepts and relations between them to make decisions or conclusions.
[00119] Consider a finite set of domain specific variables of the problem domain involved in making decisions or inferences in the system as
$D = \{D_i\}, \quad 1 \le i \le n$ (12)
[00120] Each variable Di can take values from a set of discrete or continuous values
$X_{D_i} = \{D_{ij}\}, \quad 1 \le j \le q$ (13)
[00121] Consider a set of possible attributes in which the domain variables have continuous values
$T_{D_i} = \{b_{D_1}, \ldots, b_{D_m}\}, \quad 1 \le k \le m$ (14)
[00122] The attributes are determined according to the domain knowledge and it consists of the amount of uncertainty, belief and disbelief. It should be noted that the total belief present in a system will be equal to one.
i.e., $0 \le T_{D_i} \le 1$ (15)
[00123] Consider a set of sub-attributes $B_{D_i}$ for the corresponding attributes
$B_{D_i} = \{b_1, \ldots, b_n\}, \quad 1 \le l \le n$ (16)
[00124] The composite belief value, i.e., the maximum amount of belief, can be represented by $B_{D_i}$, where $B_{D_i} \in T_{D_i}$. The maximum belief value lies between 0 and 1.
$0 < B_{D_i} < 1$, since $B_{D_i} < T_{D_i}$ (17)
[00125] The dependency information for each variable is computed according to domain knowledge (by the expert) and the dependency information represents how the domain variables depend on each other or related to each other.
[00126] Dependency information $f_{D_{v_i}}$ related to the set of domain specific variables D is a relation:
$f_{D_{v_i}} : D \times D \to \{T, F, 0, Un\}$ (18)
[00127] The dependency information can be represented as a set of ordered pairs
$\{(D_i, D_j) : D_i, D_j \in D,\ i \ne j\}$ (19)
[00128] The dependency information can be represented as, (Di, Dj), i.e., "Di depends on Dj." An example dependency matrix is provided in Table 1. The empirical data set, E, consists of a number of patterns pi:
$E = \{p_i\}, \quad 1 \le i \le n$ (20)
[00129] Each pattern pi is an 'm' tuple of values:
$p_i = \langle d_{i1}, d_{i2}, \ldots, d_{im} \rangle$ (21)
where $m = |D|$ and each $d_{ij} \in E_{d_j}$, $D_j \in D$.
[00130] Each dik denotes that the fact (condition) "Dk is dik" is true. The goal value of a pattern pi is its last value, which corresponds to a goal variable from a subset Ek of E which is usually used for training a neurule and is denoted by Did. The goal value is "-1" for negative examples and "1" for positive examples. The various steps involved in constructing the neurules based on the domain variables, the dependency information, the set of empirical data and the set of sub-attributes are as described herein. The empirical data in the knowledge base 102a is pre-processed. If incomplete or uncertain data is encountered, the uncertainty reasoning process is initiated. The possible sub-attributes are considered and weights (i.e., belief masses) are assigned to them in the belief base 102b using Dempster-Shafer's evidence theory. For each attribute BD of a domain variable Dv having uncertainty, each possible sub-attribute is determined and weights are assigned. Further, weights are associated with the subsets of each sub-attribute using the theory of evidence, bDi ⊆ BD.
[00131] From the assigned weights to the sub-attributes, the upper and lower bounds of a probability interval can be defined. The lower bound, belief for a set, is defined as the sum of all the weights of subsets of the set of interest. Further, the belief interval is categorized into belief, disbelief and uncertainty. The uncertainty value 'Un' is determined by subtracting the belief and disbelief from the total belief.
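The categorization of the belief interval described above can be sketched in Python as follows. This is a minimal illustration and not the patented implementation: the function name `belief_interval` and the dictionary representation are hypothetical, the upper bound is assumed to be the standard Dempster-Shafer plausibility (the text names the upper bound without defining it), and the Θ column of Table 4 is assumed to be the weight left on the whole frame.

```python
def belief_interval(masses, subset):
    """Return (belief, disbelief, uncertainty) of `subset`.

    `masses` maps frozenset focal elements to weights that sum to 1.
    """
    subset = frozenset(subset)
    # Lower bound: sum of the weights of all focal elements inside the subset.
    belief = sum(w for focal, w in masses.items() if focal <= subset)
    # Upper bound (assumed: plausibility): weights of focal elements overlapping it.
    plausibility = sum(w for focal, w in masses.items() if focal & subset)
    disbelief = 1.0 - plausibility
    # Uncertainty: total belief (1) minus belief and disbelief.
    uncertainty = 1.0 - belief - disbelief
    return belief, disbelief, uncertainty


# First Hair Loss row of knowledge source KS1 in Table 4.
THETA = frozenset({"NM", "CS", "MS", "BA", "DA"})  # assumed: the whole frame
ks1 = {
    frozenset({"NM"}): 0.25,
    frozenset({"CS"}): 0.2,
    frozenset({"MS"}): 0.1,
    frozenset({"BA"}): 0.15,
    frozenset({"DA"}): 0.05,
    THETA: 0.25,
}
bel, dis, un = belief_interval(ks1, {"NM"})
print(bel, dis, un)
```

Under these assumptions the three values always add up to the total belief of one, which matches the statement that the uncertainty is what remains after belief and disbelief are subtracted.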
[00132] The uncertain values are computed in the uncertainty reasoning module 102c, using the orthogonal summation process of the joint mass representation table, in which the independent evidence sources are combined to compute a composite belief. The composite belief (called the joint mass) BDi from different data sets of masses is determined using orthogonal summation.
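The orthogonal summation mentioned above is commonly known as Dempster's rule of combination. The sketch below shows that rule for two independent knowledge sources; it is an illustration under assumptions, not the module 102c itself. The function name `orthogonal_sum` is hypothetical, and reading the first Hair Loss rows of KS1 and KS2 in Table 4 as mass assignments, with Θ as the whole frame, is an assumption.

```python
def orthogonal_sum(m1, m2):
    """Combine two mass assignments with Dempster's rule of combination."""
    joint = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            common = a & b
            if common:
                # Each cell of the joint mass table supports the intersection.
                joint[common] = joint.get(common, 0.0) + wa * wb
            else:
                # Disjoint focal elements contribute to the conflict.
                conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    # Normalise so that the combined masses sum to 1 again.
    return {s: w / (1.0 - conflict) for s, w in joint.items()}


THETA = frozenset({"NM", "CS", "MS", "BA", "DA"})  # assumed: the whole frame


def row(nm, cs, ms, ba, da, theta):
    names = ("NM", "CS", "MS", "BA", "DA")
    masses = {frozenset({n}): w for n, w in zip(names, (nm, cs, ms, ba, da))}
    masses[THETA] = theta
    return masses


ks1 = row(0.25, 0.2, 0.1, 0.15, 0.05, 0.25)   # Table 4, KS1, first HL row
ks2 = row(0.37, 0.03, 0.09, 0.06, 0.13, 0.32)  # Table 4, KS2, first HL row
combined = orthogonal_sum(ks1, ks2)

# Small check case without conflict: both sources partly support {A}.
A, AB = frozenset({"A"}), frozenset({"A", "B"})
simple = orthogonal_sum({A: 0.6, AB: 0.4}, {A: 0.5, AB: 0.5})
```

In the small check case the combined mass on {A} is 0.6·0.5 + 0.6·0.5 + 0.4·0.5 = 0.8, with the remaining 0.2 staying on the whole frame. How the specification derives its single composite belief value from such a combined assignment is not shown here.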
[00133] Further, the input conditions and their values are fed to the learning module 104 from the knowledge base 102a. The dependency information is formulated according to the domain variables and the learning process is initiated. For each inferable variable Dv in D and for each possible value dv of Dv, the initial neurules are constructed using the dependency information. One initial neurule is constructed for each value of each intermediate or output variable. Initial neurules represent the possible intermediate or final conclusions. The conditions of each initial neurule include the variables that contribute to drawing the corresponding conclusion, as specified by the dependency information. For each initial neurule Nk, an initial training set Xk is extracted from X. Each pattern in Xk consists of as many values as the number of different variables in Nk. The last value is the goal value.
[00134] After the training set for each initial neurule is determined, each initial neurule is individually trained with the LMS algorithm on its training set. It should be noted that the training is not always successful, i.e., a set of significance and bias factors cannot always be found that correctly classifies all of the training examples. This is the case when the training patterns constitute a nonlinear set and are therefore inseparable. When values for the bias and significance factors are calculated that classify all training patterns, one neurule is produced. When training fails, due to inseparability of the training examples, a splitting process is followed. More specifically, the initial training set of the neurule is divided into two subsets and two copies of the initial neurule are trained, each using one of the training subsets. If training of either neurule copy fails, its subset is further divided into two other subsets, and so on, until there is no failure. In this way, more than one neurule is produced, having the same conditions and the same conclusion but different bias and significance factors; these are called sibling neurules.
[00135] The above mentioned steps can be summarized as mentioned below:
For each Nk,
1. Train Nk using the LMS algorithm with Xk as the training set.
2. If training is successful, produce Nk with the calculated bias and significance factors and terminate (success).
3. Otherwise, divide Xk into two suitable subsets, Xk1 and Xk2.
4. Apply steps 1 to 3 with Xk = Xk1 and Xk = Xk2 separately.
[00136] In the above method, a point left unspecified is how to divide the training set into "suitable" subsets. Dividing the training set is based on the following criteria.
1. The patterns in each of the two subsets are closer to each other than they are to the patterns in the other subset.
2. The "closeness" between patterns is estimated through their "distance," based on some distance metric, such as Hamming, Euclidean, Manhattan, or value difference metric (VDM), or the like.
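The training and splitting steps above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the claimed method: the function names, the learning rate, the epoch limit, the use of Hamming distance, and the choice of splitting around the two most distant patterns are all illustrative choices that the text leaves open. Condition values are encoded as 1 (true) and -1 (false).

```python
def activation(w, x):
    # w[0] is the bias factor; w[1:] are the significance factors.
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def classifies_all(w, patterns):
    return all((1 if activation(w, x) >= 0 else -1) == goal
               for x, goal in patterns)


def train_lms(patterns, lr=0.05, epochs=500):
    """Step 1: LMS (Widrow-Hoff) training; returns (factors, success)."""
    w = [0.0] * (len(patterns[0][0]) + 1)
    for _ in range(epochs):
        for x, goal in patterns:
            err = goal - activation(w, x)
            w[0] += lr * err
            for i, xi in enumerate(x):
                w[i + 1] += lr * err * xi
        if classifies_all(w, patterns):
            return w, True
    return w, False


def hamming(a, b):
    return sum(p != q for p, q in zip(a, b))


def split(patterns):
    """Step 3: group patterns around the two most distant patterns."""
    n = len(patterns)
    _, i, j = max((hamming(patterns[i][0], patterns[j][0]), i, j)
                  for i in range(n) for j in range(i + 1, n))
    seed_a, seed_b = patterns[i][0], patterns[j][0]
    near_a = [p for p in patterns
              if hamming(p[0], seed_a) <= hamming(p[0], seed_b)]
    near_b = [p for p in patterns
              if hamming(p[0], seed_a) > hamming(p[0], seed_b)]
    return near_a, near_b


def produce_neurules(patterns):
    """Steps 1 to 4: returns one factor list per (sibling) neurule."""
    w, ok = train_lms(patterns)
    if ok:
        return [w]                      # step 2: success
    near_a, near_b = split(patterns)    # step 3
    if not near_a or not near_b:
        raise ValueError("contradictory patterns cannot be separated")
    return produce_neurules(near_a) + produce_neurules(near_b)  # step 4


# An inseparable (XOR-like) set yields sibling neurules; a separable one does not.
xor_like = [((1, 1), -1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), -1)]
and_like = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
siblings = produce_neurules(xor_like)
single = produce_neurules(and_like)
```

Each returned list holds the bias factor followed by the significance factors of one neurule, so sibling neurules share conditions and conclusion and differ only in these numbers.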
[00137] The method 200 for deriving the inferences from the data sets is explained with an example as described herein. Consider six symptoms (Swollen Feet, Red Ears, Hair Loss, Dizziness, Sensitive Aretha, Placibin Allergy), two diseases (Supercilliosis and Namastosis), whose diagnoses are based on the symptoms, and three possible treatments (Placibin, Biramibio, and Posiboost).
[00138] One variable is assigned for each symptom, disease, and treatment: "sf" for "Swollen Feet," "re" for "RedEars," "hl" for "Hair Loss," "dz" for "Dizziness," "sa" for "SensitiveAretha," "pa" for "PlacibinAllergy," "sc" for "Supercilliosis," "nm" for "Namastosis," "pl" for "Placibin," "bi" for "Biramibio," and "po" for "Posiboost." The dependency information is depicted in Table 1 (where x means "depends on").
Table 1
[00139] From the table 1, it should be noted that the symptoms "Swollen Feet", "Red Ears" and "Hair Loss" should be present for the disease "Supercilliosis". Further, the symptoms "Hair Loss", "Dizziness" and "SensitiveAretha" should be present for the disease "Namastosis". In a similar manner, for each conclusion to be drawn, the corresponding conditions (marked as 'x' in the table 1) should be present. Further, the table shows five neurules for the five conclusions, which are "Supercilliosis", "Namastosis", "Placibin", "Biramibio" and "Posiboost". Thus, each neurule is created based on the dependency information mentioned in the table 1.
[00140] The empirical data set of the problem is presented in Table 2 (where T means "true," F means "false," X means "unknown" and Un means "uncertain"). Therefore,
$D = \{sf, re, hl, dz, sa, pa, sc, nm, pl, bi, po\}$,
$X_{D_i} = \{\text{true}\}$,
$f_{D_{v_i}} = \{(sc, sf), (sc, re), (sc, hl), (nm, hl), (nm, dz), (nm, sa), (pl, pa), (pl, sc), (pl, nm), (bi, hl), (bi, sc), (bi, nm), (po, pl), (po, bi)\}$
Table 2
[00141] From the table 2, it can be inferred that the symptoms "Swollen Feet", "Red Ears" and "Hair Loss" should be present for the disease "Supercilliosis". In a similar manner, the various combinations of TRUE, FALSE, and Uncertain are depicted in the table 2.
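As an illustration of how the dependency information of this example determines the conditions of the initial neurules, the ordered pairs can be grouped by the variable that depends on the others. The sketch below is illustrative only; the function name `initial_conditions` is hypothetical, and the abbreviations `hl` and `pl` stand for Hair Loss and Placibin.

```python
from collections import defaultdict

# Ordered pairs (Di, Dj) meaning "Di depends on Dj", as listed for the example.
DEPENDENCIES = [
    ("sc", "sf"), ("sc", "re"), ("sc", "hl"),
    ("nm", "hl"), ("nm", "dz"), ("nm", "sa"),
    ("pl", "pa"), ("pl", "sc"), ("pl", "nm"),
    ("bi", "hl"), ("bi", "sc"), ("bi", "nm"),
    ("po", "pl"), ("po", "bi"),
]


def initial_conditions(dependencies):
    """Map each inferable variable to the variables forming its conditions."""
    conditions = defaultdict(list)
    for conclusion, condition in dependencies:
        conditions[conclusion].append(condition)
    return dict(conditions)


rules = initial_conditions(DEPENDENCIES)
for conclusion, conds in rules.items():
    print(conclusion, "<-", ", ".join(conds))
```

Grouping the pairs this way gives one initial neurule per inferable variable (sc, nm, pl, bi and po), each with the conditions marked for it in the dependency information.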
[00142] It is considered that Hair Loss, Red Ears and Dizziness are the attributes which have uncertain values for some instances. To find the uncertain factor value, the sub-attributes are determined (by using the information from journals and other knowledge sources). For Hair Loss, the considered sub-attributes are "SA" for "Acute Namastosis," "CS" for "CancerSymptoms," "MS" for "MentalStress," "DA" for "Dandruff," and "BA" for "Baldness." For Red Ears, the considered belief factors are "SA" for "Acute Sarcophagus," "WD" for "WaxDeposit," "IN" for "Injury," "AL" for "Allergy," and "CO" for "Cold." For Dizziness, the considered belief factors are "SA" for "Acute Sarcophagus," "SL" for "SugarLevel," "PR" for "PulseRate," "FI" for "FoodIntake," and "PE" for "Physical Exercise."
[00143] The reasoning module 102 for quantifying the uncertainty is integrated with the learning module 104 for evaluation of neurules and for deriving inferences, and hence the architecture is termed the integrated decision support system 100a. While processing the data, if any uncertainty factor occurs, then the corresponding sub-attributes are determined. The upper and lower bounds of the probability interval are defined by using Dempster-Shafer's evidence theory, and the belief masses are assigned to the sub-attributes. In order to reason with uncertainty, the uncertainty region is determined by categorizing the belief region into belief, disbelief and uncertainty, according to the belief interval theorem. Further, the amount of uncertainty present for the individual knowledge sources is computed. Then the belief, disbelief and the computed uncertainty of independent knowledge sources are distributed using a joint mass assignment table to compute the composite belief. This computed composite belief value is evaluated as the fact assertion for the corresponding input condition. The uncertain factor value, i.e., the composite belief value, is calculated. Five training sets, one for each of the five initial rules, are formed as depicted in Table 3 (where T means "true," F means "false," 0 means "Unknown" and Un means "uncertainty").
S1            | S2            | S3            | S4            | S5
T  T  T   1   | T  F  0  -1   | F  T  F   1   | T  F  T  -1   | T  F   1
F  0  F  -1   | F  T  T   1   | F  F  T   1   | F  T  F   1   | T  T  -1
F  T  F   1   | T  T  F   1   | T  T  T  -1   | T  T  T  -1   | F  F  -1
T  F  T  -1   | Un Un T   1   | F  F  F  -1   | F  F  F  -1   | F  T  -1
T  Un Un  1   | Un T  T   1   | F  T  T  -1   | F  T  T   1
F  T  T  -1   | F  Un T   1   | T  T  F  -1   | Un T  T   1
F  Un T   1   | T  Un F  -1   | 0  T  T   1   | Un T  F  -1
T  T  Un -1   | T  F  T   1   | T  F  T   1   | T  T  F  -1
F  F  F  -1   | T  F  F   1
T  F  F   1
Table 3
[00144] From the table 3, it can be inferred that a training set is created for each neurule; the training sets are denoted as S1, S2, S3, S4 and S5. It should be noted that each training set is created with various combinations of TRUE, FALSE, Unknown and Uncertain. In the table 3, for the neurule S1, the values of "Swollen Feet", "Red Ears" and "Hair Loss" should be TRUE for the disease "Supercilliosis" to be present (which is indicated as "1" in the table 3). In a similar manner, various combinations of TRUE, FALSE, Unknown and Uncertain are assigned for each neurule as mentioned in the table 3. The belief mass assignments formed and the composite belief values computed are mentioned in the table 4 below.
Symptom   Knowledge Source - KS1                  |  Knowledge Source - KS2
          Sub belief factors frame                |  Sub belief factors frame
HL        NM    CS    MS    BA    DA    Θ         |  NM    CS    MS    BA    DA    Θ
          0.25  0.2   0.1   0.15  0.05  0.25      |  0.37  0.03  0.09  0.06  0.13  0.32
          0.4   0.05  0.2   0.04  0.02  0.29      |  0.23  0.02  0.05  0.07  0.23  0.4
          0.38  0.16  0.22  0.03  0.01  0.2       |  0.19  0.08  0.16  0.09  0.22  0.26
          0.19  0.1   0.14  0.28  0.09  0.2       |  0.32  0.12  0.07  0.09  0.13  0.27
          0.28  0.19  0.12  0.13  0.06  0.22      |  0.38  0.11  0.13  0.05  0.03  0.3
          0.42  0.07  0.01  0.09  0.1   0.31      |  0.35  0.15  0.08  0.02  0.05  0.35
RE        SC    WD    AL    CO    IN    Θ         |  SC    WD    AL    CO    IN    Θ
          0.34  0.08  0.07  0.2   0.19  0.22      |  0.21  0.15  0.13  0.07  0.09  0.35
          0.28  0.24  0.06  0.03  0.01  0.38      |  0.42  0.11  0.16  0.01  0.08  0.22
DZ        SA    PR    FI    PE    SL    Θ         |  SA    PR    FI    PE    SL    Θ
          0.27  0.11  0.07  0.09  0.1   0.36      |  0.31  0.09  0.02  0.14  0.12  0.32
          0.19  0.21  0.2   0.18  0.01  0.21      |  0.25  0.06  0.18  0.11  0.16  0.24
          0.3   0.12  0.13  0.04  0.23  0.18      |  0.29  0.15  0.17  0.08  0.1   0.21
Table 4
[00145] From the table 4, it should be noted that the knowledge sources KS1 and KS2 include attributes and sub-attributes. Each sub-attribute is assigned a weight, and the composite belief value is computed from the weights assigned to each sub-attribute corresponding to each attribute in KS1 and KS2. The condition evaluation is performed and the initial neurules are produced as mentioned in the table 5 below.
R1
(0.3) if Swollen Feet is true (4.7),
Hair Loss is true (3.7),
Red Ears is true (0.7)
then Supercilliosis is true.
R2
(1.5) if Hair Loss is true (4.5),
Dizziness is true (2.5),
Sensitive Aretha is true (1.1)
then Namastosis is true.
R3
(-1.8) if Placibin Allergy is true (-5.6),
Supercilliosis is true (4.3),
Namastosis is true (3.0)
then Placibin is true.
R4
(-3.6) if Hair Loss is true (-4.4),
Supercilliosis is true (4.2),
Namastosis is true (2.0)
then Biramibio is true.
R5
(-2.9) if Placibin is true (-2.1),
Biramibio is true (0.1)
then Posiboost is true.
R6
(2.7) if Biramibio is true (2.5),
Placibin is true (1.1)
then Posiboost is true.
Table 5
[00146] In order to test the firing potential and blocking principle of neurules, consider a neurule R2 from the table 5.
[00147] The initial values of ks(R2) and rs(R2) can be given as ks(R2) = 1.5 and rs(R2) = |4.5| + |2.5| + |1.1| = 8.1. Consider that the first condition "Hair Loss is true" is evaluated to be uncertain, with an uncertain factor value of 0.64. Then, ks(R2) = 1.5 + 0.64*4.5 = 4.38 and rs(R2) = 8.1 - |4.5| = 3.6. So, ks(R2) > 0 and |ks(R2)| > rs(R2), and the rule is fired without any further evaluation, because the firing condition is met.
[00148] Similarly, consider the neurule R4 from the table 5. The initial values of ks(R4) and rs(R4) can be given as ks(R4) = -3.6 and rs(R4) = |-4.4| + |4.2| + |2.0| = 10.6. Let us assume that the first condition "Hair Loss is true" is evaluated to be uncertain, with an uncertain factor value of 0.64. Then, ks(R4) = -3.6 + 0.64*(-4.4) = -6.4 and rs(R4) = 10.6 - |-4.4| = 6.2. So, ks(R4) < 0 and |ks(R4)| > rs(R4), and the rule is blocked without any further evaluation, because the blocking condition is met.
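The two worked cases for R2 and R4 can be reproduced with a short sketch of the early-stopping evaluation. The function name `evaluate_neurule` and the data layout are hypothetical; the known sum ks starts at the bias factor and grows by significance factor times condition value, while the remaining sum rs is the sum of the absolute significance factors of the conditions not yet evaluated, as in the arithmetic above.

```python
def evaluate_neurule(bias, conditions, values):
    """Evaluate conditions in order and stop as soon as |ks| > rs.

    conditions: list of (variable, significance factor) pairs.
    values: condition values, 1 (true), -1 (false), 0 (unknown) or an
            uncertain factor value such as 0.64.
    Returns (state, ks, rs).
    """
    ks = bias
    for done, (name, sf) in enumerate(conditions, start=1):
        ks += sf * values[name]
        rs = sum(abs(s) for _, s in conditions[done:])
        if abs(ks) > rs:
            # The unevaluated conditions can no longer change the sign of ks.
            return ("fired" if ks > 0 else "blocked"), ks, rs
    return "undetermined", ks, 0.0


R2 = (1.5, [("hl", 4.5), ("dz", 2.5), ("sa", 1.1)])
R4 = (-3.6, [("hl", -4.4), ("sc", 4.2), ("nm", 2.0)])

# Only Hair Loss is known so far; it is uncertain with factor value 0.64.
facts = {"hl": 0.64}
state2, ks2, rs2 = evaluate_neurule(*R2, facts)
state4, ks4, rs4 = evaluate_neurule(*R4, facts)
print(state2, round(ks2, 2), round(rs2, 2))
print(state4, round(ks4, 2), round(rs4, 2))
```

Because both neurules are decided after their first condition, the values of the later conditions are never looked up, which is why `facts` only needs the Hair Loss entry here.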
[00149] The conditions are arranged in various ways to achieve more efficient response. It is noted that an overall ordering of the conditions of neurules derives inferences in an efficient manner.
[00150] More specifically, the derived inferences can be accurate when the conditions in a neurule Nk consisting of nNk conditions are ordered in a way that |sf1| > |sf2| > ... > |sfnNk|. Thus, internally, the conditions of each neurule are ordered in that way.
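A small sketch can show why this ordering helps: with the largest significance factors first, the known sum outweighs the remaining sum sooner. The helper `conditions_needed` is hypothetical, the factors are those of R2 in table 5, and the condition values (0.64 for Hair Loss, 0.42 for Dizziness, 1 for Sensitive Aretha) are the ones used in the worked arithmetic of this example.

```python
def conditions_needed(bias, conditions, values):
    """Count the conditions evaluated before |ks| > rs decides the neurule."""
    ks = bias
    for done, (name, sf) in enumerate(conditions, start=1):
        ks += sf * values[name]
        rs = sum(abs(s) for _, s in conditions[done:])
        if abs(ks) > rs:
            return done
    return len(conditions)


values = {"hl": 0.64, "dz": 0.42, "sa": 1.0}

# R2 with its conditions listed in increasing order of |sf| ...
unordered = [("sa", 1.1), ("dz", 2.5), ("hl", 4.5)]
# ... and reordered so that |sf1| >= |sf2| >= ... holds.
ordered = sorted(unordered, key=lambda c: abs(c[1]), reverse=True)

unordered_steps = conditions_needed(1.5, unordered, values)
ordered_steps = conditions_needed(1.5, ordered, values)
print(unordered_steps, ordered_steps)
```

With the ordered conditions R2 is decided after the Hair Loss condition alone, whereas the reversed order needs all three conditions before the firing condition is met.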
[00151] In an example inference, consider the input data (i.e., the data for sf, re, hl, dz, sa, and pa) of the first row of table 2. This means that the following facts are available in the working memory 104a: {("sf is true," T), ("re is true," F), ("hl is true," Un), ("dz is true," T), ("sa is true," T), ("pa is true," T)}.
[00152] In order to derive the inferences, each constructed neurule is examined as mentioned below.
• The neurule R1 is fired, because at some point the known sum is ks(R1) = 0.3 + (1)*(4.7) + (0.64)*(3.7) = 7.3 > 0 and the remaining sum is rs(R1) = 0.7. Since ks(R1) = 7.3 > 0 and |ks(R1)| > rs(R1), i.e., |7.3| > 0.7, ("sc is true", F) is added to the working memory 104a.
• The neurule R2 is also fired, because at some point the known sum is ks(R2) = 1.5 + (0.64)*(4.5) + (0.42)*(2.5) = 5.4 > 0 and the remaining sum is rs(R2) = 1.1. Since ks(R2) = 5.4 > 0 and |ks(R2)| > rs(R2), i.e., |5.4| > 1.1, ("nm is true", T) is added to the working memory 104a.
[00153] After examining Rl and R2, at second level, R3 and R4 are examined:
• The neurule R3 is fired, because at some point the known sum is ks(R3) = -1.8 + (-1)*(-5.6) + (1)*(4.3) = 8.1 > 0 and the remaining sum is rs(R3) = 3.0. Since ks(R3) = 8.1 > 0 and |ks(R3)| > rs(R3), i.e., |8.1| > 3.0, ("pl is true," T) is added to the working memory 104a.
• The neurule R4 is blocked, because at some point the known sum is ks(R4) = -3.6 + (0.64)*(-4.4) + (-1)*(4.2) = -10.6 < 0 and the remaining sum is rs(R4) = 2.0. Since ks(R4) = -10.6 < 0 and |ks(R4)| > rs(R4), i.e., |-10.6| > 2.0, ("bi is true," T) is added to the working memory 104a.
[00154] Finally, the neurules R5 and R6 are examined.
[00155] The neurule R5 is blocked, because at some point the known sum is ks(R5) = -2.9 + (1)*(-2.1) = -5.0 and the remaining sum is rs(R5) = 0.1. Since ks(R5) = -5.0 < 0 and |ks(R5)| > rs(R5), i.e., |-5.0| > 0.1, ("po is true," T) is added to the working memory 104a.
The neurule R6 is fired, because at some point the known sum is ks(R6) = 2.7 + (1)*(2.5) = 5.2 > 0 and the remaining sum is rs(R6) = 1.1. Since ks(R6) = 5.2 > 0 and |ks(R6)| > rs(R6), i.e., |5.2| > 1.1, ("po is true," T) is added to the working memory 104a.
[00156] From the above description, it should be noted that the last five values of the selected row of the table 2 are verified. After testing the above described method using all the patterns of the example data set of the table 2, it is determined that all results (intermediate and final) are accurate.
[00157] FIGS. 3a and 3b illustrate an example representation of a neurule, according to the embodiments as described herein. The neurules are the integration of neuro computing and symbolic rules. The formation of a neurule is represented in the FIG. 3a. In the FIG. 3a, I1, I2, ..., In are the input conditions with corresponding weight values sf1, sf2, ..., sfn known as significance factors. The bias value sf0 is termed the bias factor of the neurule. Internally, each neurule is considered as an adaline unit as shown in the FIG. 3b, which uses LMS for learning and is more safely convergent for nonlinear training sets. Each input condition receives a value from the set of values [1 (true), -1 (false), 0 (unknown), Un (uncertain)]. The 'Un' denotes the uncertain factor value to be calculated using the reasoning module 102.
[00158] The conclusion (which is a decision) of the rule is represented by the output "O", which is calculated through the standard formulae of equations (22) and (23) as mentioned below.
$O = f(v), \quad v = sf_0 + \sum_{i=1}^{n} sf_i I_i$ (22)
$f(v) = \begin{cases} 1 & \text{if } v \ge 0 \\ -1 & \text{otherwise} \end{cases}$ (23)
where v is the activation value and the threshold function f(v) is called the activation function. The output can take one of two values (-1, 1), representing failure or success of the neurule. The significance factor of a condition represents the significance (weight) of the condition in deriving the conclusion.
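As a brief illustration of equations (22) and (23), the output of a single neurule can be computed as below. The function name `neurule_output` is hypothetical, and placing the threshold at v ≥ 0 is an assumption about the activation function. The factors are those of R1 in table 5, and 0.64 is the uncertain factor value used for Hair Loss in the worked example.

```python
def neurule_output(sf0, significance_factors, inputs):
    """Return the output O of a neurule.

    sf0: bias factor.
    significance_factors: sf1..sfn.
    inputs: I1..In, each 1 (true), -1 (false), 0 (unknown) or an
            uncertain factor value.
    """
    # Activation value: bias factor plus the weighted sum of the inputs.
    v = sf0 + sum(sf * i for sf, i in zip(significance_factors, inputs))
    # Threshold function (assumed: 1 when v >= 0, -1 otherwise).
    return 1 if v >= 0 else -1


# R1: (0.3) if Swollen Feet (4.7), Hair Loss (3.7), Red Ears (0.7).
r1_factors = [4.7, 3.7, 0.7]
mixed = neurule_output(0.3, r1_factors, [1, 0.64, -1])     # true, uncertain, false
all_false = neurule_output(0.3, r1_factors, [-1, -1, -1])  # every symptom absent
print(mixed, all_false)
```

In the first call the activation value is 0.3 + 4.7 + 0.64·3.7 - 0.7 = 6.668, so the output is 1; with every input false the activation value is -8.8 and the output is -1.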
[00159] The neurule evaluation has two states, fired and blocked. If the neurule is in the fired state, the neurule is evaluated as '1'. If the neurule is in the blocked state, then it is evaluated as '-1'. The set of fired rules during the inference process is denoted by NF, whereas that of blocked rules is denoted by NB. Also, the set of evaluated rules is denoted by NE and that of unevaluated rules by NU. The output of a neurule Nk is computed according to equations (22) and (23). The input conditions of the neurule are evaluated to compute v(Nk), the activation value of Nk, by using the known sum ks(Nk) and the remaining sum rs(Nk) in the conditions below, so that each input condition's contribution to the current activation value can be accounted for.
[00160] The success or firing condition of a neurule Nk is the situation where |ks(Nk)| ≥ rs(Nk) (or equivalently, fr > 1; rs(Nk) ≠ 0), and ks(Nk) > 0.
[00161] The failure or blocking condition of a neurule Nk is the situation where |ks(Nk)| > rs(Nk) (or equivalently, fr > 1; rs(Nk) ≠ 0), and ks(Nk) < 0.
[00162] FIG. 4 illustrates fact assertions in a neurule, according to the embodiments as disclosed herein. The fact assertions in the neurule for a Supercilliosis disease are shown in the FIG. 4. There can be many such neurules according to the problem domain and its attributes and sub-attributes. The fact assertions in the neurule include values as True, False, Unknown and Uncertain as shown in the FIG. 4. The input conditions are denoted as condition 1, condition 2 and so on to condition n. The weight values sf1, sf2, ..., sfn are the significance factors. Each input condition receives a value from the set of values [1 (true), -1 (false), 0 (unknown), Un (uncertain)]. The 'Un' denotes the uncertain factor value which is evaluated and received from the reasoning module 102.
[00163] The conclusion (which is a decision) of the rule is represented by the output "O", which is calculated through the standard formulae of equations (22) and (23) as described above.
[00164] FIG. 5a illustrates an example neurule structure for a Supercilliosis disease, according to the embodiments as disclosed herein. The neurule structure and adaptation of the neurule to a medical diagnosis problem is depicted in the FIGS. 5a-5c. Although the neurule structure is shown by considering the medical diagnosis problem, it should be noted that any real world problem can be modeled as a neurule structure. There can be many such neurules according to the problem domain and its attributes and sub-attributes.
[00165] The neurule structure for the Supercilliosis disease and the attributes of the Supercilliosis disease (which include Swollen Feet, Hair Loss and Red Ears) and the sub-attributes (i.e., Namastosis, Cancer Symptoms, Mental Stress, Baldness and Dandruff) for the Hair Loss attribute are shown in the FIG. 5a. The input values include [1 (true), -1 (false), 0 (unknown), Un (uncertain)]. The 'Un' denotes the uncertain factor value to be calculated using the reasoning module 102.
[00166] FIG. 5b illustrates the example neurule structure for the Supercilliosis disease with a belief region, according to the embodiments as disclosed herein. The neurule structure for the Supercilliosis disease and the attributes of the Supercilliosis disease (which include Swollen Feet, Hair Loss and Red Ears) and the sub-attributes for the Hair Loss attribute along with the belief region is shown in the FIG. 5b.
[00167] FIG. 5c illustrates the example neurule structure for the Supercilliosis disease with a sub-arbitrary belief region, according to the embodiments as disclosed herein. The neurule structure for the Supercilliosis disease, the attributes of the Supercilliosis disease and the sub-attributes for the Hair Loss attribute along with the sub-arbitrary belief region is shown in the FIG. 5c.
[00168] FIG. 6 is a graph showing comparison of inference mechanisms in terms of runtime, according to the embodiments as disclosed herein. The experimental results regarding the performance of integrated decision support system 100a which includes learning and reasoning system are presented in the table 6 below, and are compared with basic neurule based system which is mentioned in the table 6.
Dataset            Runtime (msec)   Computations   Runtime (msec)   Computations
CANCER (20)        0.033            10.62          0.032            10.06          3.076   5.415
LENSES (24)        0.047            27.02          0.045            25.50          4.347   5.788
ACUTE (39)         0.043            13.05          0.041            11.56          4.761   12.108
IRIS (150)         0.058            63.48          0.055            60.51          5.309   4.790
CAR (1728)         0.158            122.27         0.154            117.96         2.564   3.588
NURSERY (12960)    0.194            147            0.187            138.91         3.674   5.659
RB1 (59 rules)     0.265            129.7          0.256            121.5          3.454   6.528
RB2 (134 rules)    0.508            289.91         0.497            282.60         2.189   2.553
Table 6
[00169] FIG. 7 is a graph showing comparison of inference mechanisms in terms of computations, according to the embodiments as disclosed herein. The table 6 shows a comparison of the computational cost of the neurules under uncertainty (integrated decision support system 100b) with that of the neurules without uncertainty.
[00170] The mean computational time required to draw the conclusions is represented by "Runtime" and the mean number of computations required to reach conclusions is represented by "Computations". In calculating computations, the mean number of times that a product of a significance factor and a corresponding condition value is added to the known sum of a neurule is considered. Since the neurule inference process follows a backward chaining-based strategy for symbolic reasoning, the efficiency of the neurule inference mechanism is high compared to other connectionist expert systems. During the integrated reasoning and learning process, the integrated decision support system 100b consumes a slightly larger number of computations and runtime than the basic neurule based system, but within a reasonable amount, as shown in the FIG. 6 and the FIG. 7. A statistical t-test performed with 99% confidence determines that the difference in computational cost and runtime is not statistically significant as compared to neurules without uncertainty.
[00171] FIG. 8 is a graph showing comparison of inference mechanisms in terms of convergent rate, according to the embodiments as disclosed herein.
Table 7
[00172] Table 7 above shows the experimental results comparing the performance of the inference mechanisms of the 'integrated decision support system 100b' and the 'basic neurule based system' in terms of convergent rate. As shown in FIG. 8, the convergent rate is slightly higher for the integrated decision support system 100b, but the classification accuracy is high. The "convergent rate" is the ratio of the number of necessary (i.e., the least required) inputs to the total number of asked inputs. Since the set of sub-attributes is considered when calculating the uncertainty factor, the number of least required inputs is somewhat larger for neurules with uncertainty than for neurules without uncertainty. Two datasets and two rules are used for the comparison of convergent rate. A statistical t-test at the 99% confidence level indicates that the difference in convergent rate is not statistically significant as compared to neurules without uncertainty.
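The convergent rate defined above is a simple ratio, which the following minimal Python sketch makes explicit. The function name `convergent_rate` and its parameter names are hypothetical, and the input counts in the example are invented for illustration rather than taken from Table 7.

```python
def convergent_rate(necessary_inputs: int, asked_inputs: int) -> float:
    """Ratio of necessary (least required) inputs to total asked inputs."""
    if asked_inputs <= 0:
        raise ValueError("asked_inputs must be positive")
    if necessary_inputs < 0:
        raise ValueError("necessary_inputs must be non-negative")
    return necessary_inputs / asked_inputs


# Example: 3 of the 4 inputs that were asked were actually necessary.
rate = convergent_rate(3, 4)
```

Under this definition, a system that needs more of the inputs it asks for (for example, because sub-attributes must also be supplied) reports a higher rate, which is consistent with the comparison described for FIG. 8.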
[00173] If the conditions or criteria used for classification are more specific and accurate to the problem, then the resultant classification will also be more specific and accurate. In other words, when more parameters relevant to the particular problem are considered, the resultant classification will be more accurate for that domain, although it may take slightly more computational time. This method finds application in the medical field, where it can address the indeterminacy present in disease identification, since in most cases the symptoms will be incomplete or partial. Another interpretation of this approach is that, in medical domain problems, the existence of a symptom can be determined by calculating its degree, and the disease can thus be identified.
[00174] FIG. 9 is a graph showing comparison of generalization performance for various learning methods, according to the embodiments as disclosed herein. The generalization capability of a learning method can be interpreted as how well the method handles new input data after the system has been trained. It has been shown that learning methods such as back propagation that use continuous variables might generalize better to unseen examples, thereby creating more robust systems. A number of experiments are conducted to test the generalization capabilities of the integrated decision support system 100b, i.e., with continuous variables.
BREAST CANCER 96.73 97.15 98.10
NURSERY 99.31 99.82
MONKS 1 99.77 99.95 100
Table 8
[00175] Table 8 and FIG. 9 show results regarding the classification accuracy (generalization) of the integrated decision support system 100b on unseen test examples, i.e., with continuous variables, as compared with that of the basic neurule based system without uncertainty factors, i.e., with discrete factors alone. The presented network generalized quite well to unseen examples, i.e., to new patterns, since the network learned to handle incomplete data.
[00176] FIG. 10 illustrates a computing environment implementing the method for deriving inferences from data sets, according to the embodiments as disclosed herein. As depicted in the figure, the computing environment 1002 comprises at least one processing unit 1008 that is equipped with a control unit 1004 and an Arithmetic Logic Unit (ALU) 1006, a memory 1010, a storage unit 1012, a plurality of networking devices 1016 and a plurality of input/output (I/O) devices 1014. The processing unit 1008 is responsible for processing the instructions of the technique. The processing unit 1008 receives commands from the control unit 1004 in order to perform its processing. Further, any logical and arithmetic operations involved in the execution of the instructions are computed with the help of the ALU 1006.
[00177] The overall computing environment 1002 can be composed of multiple homogeneous and/or heterogeneous cores, multiple CPUs of different kinds, special media and other accelerators. The processing unit 1008 is responsible for processing the instructions of the technique. Further, the plurality of processing units 1008 may be located on a single chip or over multiple chips.
[00178] The technique, comprising the instructions and codes required for the implementation, is stored in either the memory unit 1010 or the storage 1012 or both. At the time of execution, the instructions may be fetched from the corresponding memory 1010 or storage 1012, and executed by the processing unit 1008.
[00179] In case of any hardware implementations, various networking devices 1016 or external I/O devices 1014 may be connected to the computing environment to support the implementation through the networking unit and the I/O device unit.
[00180] The embodiments disclosed herein can be implemented through at least one software program running on at least one hardware device and performing network management functions to control the elements. The elements shown in the FIGS. 1 through 10 include blocks which can be at least one of a hardware device, or a combination of hardware device and software module.
[00182] The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of preferred embodiments, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the embodiments as described herein.
Claims
1. A method for deriving inferences from data sets, the method comprising:
obtaining a plurality of data sets, wherein each data set includes a set of attributes;
determining sub-attributes for each attribute in each data set;
constructing a neurule based on a degree of dependency between the set of attributes and the sub-attributes in each data set; and
deriving the inferences from each constructed neurule based on the set of attributes and the sub-attributes in each data set.
2. The method of claim 1, wherein said method comprises creating a training set for each constructed neurule.
3. The method of claim 1, wherein the derived inferences include a plurality of neurules with a value associated with each neurule.
4. The method of claim 1, wherein deriving the inferences from each constructed neurule includes modeling of input conditions for evaluating uncertainty in each attribute and sub-attribute.
5. The method of claim 4, wherein each neurule is constructed using the evaluated uncertainty for deriving the inferences from each constructed neurule.
6. The method of claim 3, wherein the method further comprises:
evaluating the plurality of neurules to determine validity of each neurule; and
storing the plurality of neurules.
7. A computer program product comprising computer executable program code recorded on a computer readable non-transitory storage medium, the computer executable program code when executed causing the actions including:
obtaining a plurality of data sets, wherein each data set includes a set of attributes;
determining sub-attributes for each attribute in each data set;
constructing a neurule based on a relation between the set of attributes and the sub-attributes in each data set; and
deriving the inferences from each constructed neurule based on the set of attributes and the sub-attributes in each data set.
8. The computer program product of claim 7, wherein the computer executable program code when executed causes the actions including creating a training set for each constructed neurule.
9. The computer program product of claim 7, wherein the derived inferences include a plurality of neurules with a value associated with each neurule.
10. The computer program product of claim 5, wherein deriving the inferences from each constructed neurule includes modeling of input conditions for evaluating uncertainty in each attribute and sub-attribute.
11. The computer program product of claim 10, wherein each neurule is constructed using the evaluated uncertainty for deriving the inferences from each constructed neurule.
12. The computer program product of claim 9, wherein the computer executable program code when executed causes the actions including:
evaluating the plurality of neurules to determine validity of each neurule; and
storing the plurality of neurules.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| IN201621022150 | 2016-06-28 | | |
| IN201621022150 | 2016-06-28 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018002953A1 (en) | 2018-01-04 |
Family
ID=60785275
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/IN2017/050258 Ceased WO2018002953A1 (en) | 2016-06-28 | 2017-06-23 | Integrated decision support system and method for deriving inferences from data sets |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2018002953A1 (en) |
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| GB2465861A (en) * | 2008-12-03 | 2010-06-09 | Logined Bv | A reasoning inference making tool for recommending actions based on a hybridisation of a data driven model and knowledge based logic. |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20210241021A1 (en) * | 2019-03-14 | 2021-08-05 | Panasonic Intellectual Property Corporation Of America | Information processing method and information processing system |
| US11995150B2 (en) * | 2019-03-14 | 2024-05-28 | Panasonic Intellectual Property Corporation Of America | Information processing method and information processing system |
| CN112381321A (en) * | 2020-11-27 | 2021-02-19 | 广东电网有限责任公司肇庆供电局 | Power distribution network operation state sensing method based on gridding division |
| CN112381321B (en) * | 2020-11-27 | 2023-01-24 | 广东电网有限责任公司肇庆供电局 | Power distribution network operation state sensing method based on gridding division |
| CN112949201A (en) * | 2021-03-17 | 2021-06-11 | 华翔翔能科技股份有限公司 | Wind speed prediction method and device, electronic equipment and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17819515; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 17819515; Country of ref document: EP; Kind code of ref document: A1 |