Applied Knowledge & Innovation

Artificial Intelligence : Intelligent Databases

In order to discuss the concept of intelligent database systems it is first useful to briefly examine what is meant by "intelligent". Dictionary definitions of intelligence contain references to concepts such as "intellectual skill" (Chambers 1999), being "endowed with the faculty of reason" or having "the capacity for understanding" and the "ability to perceive and comprehend meaning" (Collins 1994). Other, more general definitions of intelligence often take into account ideas of learning from mistakes. These definitions cannot be credibly applied to even the most advanced machine intelligences currently available so a different interpretation of the term in the context of databases must be sought.

Minsky (1968) concisely defines artificial intelligence as "The science of making machines do things that would require intelligence if done by men" while Eysenck (1990) gives us a way forward into the domain of intelligent databases with the statement, "Artificial intelligence is concerned with the attempt to develop complex computer programs that will be capable of performing difficult cognitive tasks." The tasks that an intelligent database must address are potentially extremely difficult, if not impossible, for a human mind to cope with. Tasks such as these involve searching for and deriving meaningful information across a huge data set. It would be almost impossible for human minds to deduce, induce or infer any significant new data from the vast data repositories with the efficiency or speed that machine intelligences in the form of "intelligent" databases do. Artificial intelligence is very good at addressing the problems that people are very bad at and it is in this context perhaps, that "intelligent" databases should be viewed.

Recent writers in the field such as Bertino, Catania and Zarri (2001) make this marriage between the two technologies explicit: "Intelligent database systems (IDBS) derive from the integration of database (DB) technology with techniques developed in the field of artificial intelligence (AI)". They also point out the inherent weaknesses of each technology when used in isolation: traditional databases lack any semantic value, while artificial intelligence methods are unable to deal with large data sets.

Encyclopaedic sources can offer more simplistic definitions. The World Wide Web reference TechEncyclopedia (2002) describes an intelligent database as "a database that contains knowledge about the content of its data. A set of validation criteria are stored with each field of data, such as the minimum and maximum values that can be entered or a list of all possible entries" whilst Ralston and Reilly (1993) touch upon the role of rules, "In an intelligent database, the rules governing these functions can be stored within the database itself. This allows the database to react with its environment."
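
The idea of validation criteria stored with each field can be illustrated with a minimal sketch using SQLite CHECK constraints from Python's standard library. The table, column names and limits are hypothetical and chosen only for illustration; they do not come from any of the sources cited.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The validation criteria live in the table definition itself:
# a minimum/maximum for one field and a list of allowed entries for another.
# (Table, columns and limits are hypothetical.)
conn.execute(
    """
    CREATE TABLE patient (
        name   TEXT NOT NULL,
        age    INTEGER CHECK (age BETWEEN 0 AND 130),
        gender TEXT CHECK (gender IN ('F', 'M', 'unknown'))
    )
    """
)


def try_insert(row):
    """Return True if the database accepts the row, False if a stored rule rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO patient VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(("Ann", 42, "F"))
rejected_age = try_insert(("Bob", 250, "M"))         # outside the min/max range
rejected_gender = try_insert(("Cy", 30, "robot"))    # not in the list of entries
stored_rows = conn.execute("SELECT COUNT(*) FROM patient").fetchone()[0]
```

Because the criteria are part of the schema, the database rejects the two invalid rows regardless of which program submits them.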

It is also useful to highlight some of the differences between traditional database systems and intelligent systems. A traditional database built around the relational data model, for instance, can only represent facts. Kannan and Geetha (1996) point out strengths from the area of knowledge based systems that traditional databases could benefit from: "facilities for knowledge representation, deductive reasoning, search with backtracking, control structures for deductive, plausible and inductive reasoning, knowledge refinement and validation, automatic classification of knowledge, explanation and dynamic human intervention".

Many of the sources researched for this paper originate from the 1980s and early 1990s and therefore attempt to define the concept of "Intelligent" databases in terms of the technologies available at the time. Parsaye et al (1989) write, "Intelligent databases represent the evolution and merger of several technologies including automatic discovery, hypermedia, object orientation, expert systems and traditional databases". This is the essence of what has come to be known as the "evolutionary" approach: the amalgamation or extension of existing technologies into hybrid forms. An example of this is Ricardo’s (1990) proposal of five architectures for a "Knowledge Based Management system":

  1. Adding some database features to an expert system
  2. Adding some intelligence to a database management system
  3. Incorporating a database management system within an expert system
  4. Loose coupling of an expert system and a database management system
  5. Tight coupling of an expert system and an external database management system

Several authors, however, have also pointed out a fundamental incompatibility between expert systems and database systems when used as subsystems of the same system: traditional database systems use the relational data model, whereas KBS use frames and semantic network knowledge representation structures.

Expert systems themselves are considered by some to have characteristics that qualify them as a form of "intelligent" database. "There is an obvious connection between expert systems and databases, because the knowledge base is simply a type of database. It differs from most databases in that it contains rules about facts rather than just facts" (Fabbri and Schwab 1992).

Also known as Knowledge Based Systems, Expert Systems are designed to emulate an expert in a specialised knowledge domain, such as medicine, or any other area of knowledge where there is a shortage of expert knowledge or a "knowledge bottleneck". Commonly an expert system would consist of two components: a database of facts and rules, known as a knowledge base, and an inference engine, a program that can apply those rules and facts and come up with an "expert" solution to the question of a novice. The knowledge base (rules and facts) is elicited from the expert by a trained "knowledge engineer" using various methods that can include methodical interviews and the repertory grid technique. Often the expert knowledge area is "fuzzy" in nature and contains a great deal of procedural knowledge (knowledge of how to do things as opposed to declarative, or fact based knowledge) so the knowledge engineer must be an expert in knowledge elicitation themselves. Hart (1989) provides a definition of expert systems in this context, "A KBS is a computer system which is designed to help people with tasks involving uncertainty and imprecision, and which require judgment and knowledge."
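
The two-component structure described above (a knowledge base of facts and rules, plus a separate inference engine) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the facts, the rules and the forward-chaining strategy are hypothetical choices, not a description of any particular system.

```python
# Knowledge base: facts plus if-then rules. All names are hypothetical.
FACTS = {"stock_below_threshold", "supplier_lead_time_long"}
RULES = [
    # (conditions that must all be known, conclusion to add)
    (("stock_below_threshold", "supplier_lead_time_long"), "reorder_urgently"),
    (("reorder_urgently",), "notify_purchasing"),
]


def forward_chain(facts, rules):
    """Inference engine: apply rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and set(conditions) <= known:
                known.add(conclusion)
                changed = True
    return known


# Facts the engine derived that were not stated explicitly.
conclusions = forward_chain(FACTS, RULES) - FACTS
```

The engine knows nothing about inventory; replacing FACTS and RULES changes the domain without touching forward_chain, which is the point of keeping the two components separate.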

Ricardo (1990) gives us an example of how an expert system could be shown to be more useful than a traditional database, "while a database might be able to tell us whether an inventory item has reached its reorder point, an expert system could help us decide what the reorder point should be. Similarly, a database may tell us whether a student has a in each course this semester, but an expert system could suggest alternative course of action to be taken if the student's grade in a major course or grade-point average is below a C".

The classic expert system is MYCIN. Developed at Stanford University by Buchanan and Shortliffe in 1974 as a research project, MYCIN is an attempt to systemise medical decision making for practical day-to-day use. Its main function is to help doctors, medical students and paramedics to select antimicrobial therapies for patients with serious infections such as meningitis and bacteraemia by identifying the responsible organisms. To this end, MYCIN instigates a dialogue with a user by asking a series of questions that can be answered with simple single word answers, circumventing the need for a complex language recognition subsystem. Questions and responses deal with results of body fluid analyses, symptoms displayed, and general patient characteristics such as age and gender. They lead in the first instance to a diagnosis, and then to the prescription of a course of drugs that should in theory control the infection and work favourably together. Provision for user uncertainty on test results is built into the system, as MYCIN will accept "unknown" as an input alongside a numerical confidence factor within the range -1 to +1. In this way the system can reason with incomplete data. By inputting "Why" a user can also request MYCIN to explain which hypotheses it is currently testing and how the user response to the current question will support or rule out that line of enquiry. In a series of double blind tests, MYCIN was noted to perform on a par with human medical experts.
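
The text above does not say how factors in the range -1 to +1 are combined. As a hedged illustration, the sketch below uses the combination rule commonly given for MYCIN-style certainty factors in the expert systems literature; the function name and the sample values are assumptions made for this example.

```python
def combine_cf(cf1, cf2):
    """Combine two certainty factors in [-1, 1] that bear on the same hypothesis.

    Uses the combination rule usually attributed to MYCIN/EMYCIN in textbooks;
    it is shown here as an assumption, not taken from the sources cited above.
    """
    if cf1 >= 0 and cf2 >= 0:
        # Two pieces of supporting evidence reinforce each other.
        return cf1 + cf2 * (1 - cf1)
    if cf1 <= 0 and cf2 <= 0:
        # Two pieces of opposing evidence reinforce each other.
        return cf1 + cf2 * (1 + cf1)
    denominator = 1 - min(abs(cf1), abs(cf2))
    if denominator == 0:
        raise ValueError("cannot combine total belief (+1) with total disbelief (-1)")
    # Conflicting evidence partly cancels out.
    return (cf1 + cf2) / denominator


supporting = combine_cf(0.6, 0.4)     # two findings in favour
conflicting = combine_cf(0.6, -0.4)   # one in favour, one against
unknown = combine_cf(0.5, 0.0)        # an "unknown" answer modelled as 0
```

Modelling an "unknown" answer as a factor of 0 leaves the existing belief unchanged, which is one way a system of this kind can keep reasoning with incomplete data.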

MYCIN's underlying data or knowledge base is very detailed, being as comprehensive as most human medical specialists within the domain. Containing as it does the distilled knowledge of a wide range of these specialists, MYCIN considers every disease on which it holds data, whereas human oversight may lead to an incorrect diagnosis and treatment. Data and rule maintenance is performed at a major medical centre, covering developments in the field from journals that only specialists would normally look at and therefore keeping the knowledge base current.

The inference engine that deals with reasoning is a completely separate module from the knowledge base and allows the easy modification and addition of rules, which are implemented immediately, through an English-like language. This is part of a production system called EMYCIN that represents a collection of bilateral rules consisting of a pattern part and an action part. On rule activation, pattern parts are tested for matches in the knowledge base and, when a match is found, the action part of the rule is triggered using variable values determined by the pattern. This production system operates on a backward chaining control strategy.
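
A backward chaining control strategy starts from a goal and works back through the rules to the facts that would establish it. The following is a minimal sketch of that idea only; the rule contents are illustrative placeholders rather than medical knowledge, and nothing here reproduces EMYCIN's actual rule language.

```python
# Each rule pairs a pattern part (conditions) with an action part (a conclusion).
# Rule contents are illustrative placeholders, not medical knowledge.
RULES = [
    (("gram_negative", "rod_shaped", "anaerobic"), "organism_is_bacteroides"),
    (("stain_result_pink",), "gram_negative"),
]


def prove(goal, facts, rules=RULES, seen=frozenset()):
    """Return True if `goal` is a known fact or can be derived by backward chaining."""
    if goal in facts:
        return True
    if goal in seen:  # guard against circular rules
        return False
    for conditions, conclusion in rules:
        # Pick rules whose action part would establish the goal,
        # then try to prove each condition of the pattern part in turn.
        if conclusion == goal and all(
            prove(condition, facts, rules, seen | {goal}) for condition in conditions
        ):
            return True
    return False


complete_case = prove(
    "organism_is_bacteroides", {"stain_result_pink", "rod_shaped", "anaerobic"}
)
incomplete_case = prove("organism_is_bacteroides", {"rod_shaped", "anaerobic"})
```

In an interactive system the base case would ask the user a question instead of consulting a fixed set of facts, which is how a goal-driven dialogue can arise from this strategy.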

Although it was the first to "address the problems of reasoning with uncertain or incomplete information" and provide "a clear and logical explanation of the reasoning" (Luger and Stubblefield 1998), it must be remembered that MYCIN was purely a research project and was never tested or used in a real world environment. MYCIN did however set standards for future developments in expert systems and is still studied as archetypal for the genre.

Some of the disadvantages of expert systems are detailed by Hart (1989). She mentions that some problems, for instance those where experts in a given domain disagree, are too complex for an expert system to handle adequately. Some expert knowledge may suffer from too much uncertainty, and there may be knowledge missing or the knowledge may be changing too quickly to capture it for machine representation. Other commentators have pointed out that expert systems are expensive to develop, maintain and run in terms of both computer power (not such a big problem with recent advances in PCs) and human resource time (expert time being at a premium in the first place). Sleight (1993) also points out deficiencies in computer experts: "Computers have problems with ambiguous words, and words that have multiple meanings. Currently it is impossible to build into a knowledge base a broad sense of the world, so context is a problem".

Abel, Castilho and Campbell (1998) describe an intelligent database system that represents a "tight-coupled hybrid" similar in architecture to item 5 of Ricardo's proposals. PetroGrapher provides for petrographic analysis in the field of sedimentary geology, a domain encompassing large amounts of diverse data and complex knowledge. It aims to be a system that integrates the symbolic knowledge representation and inferencing strengths of an expert system and the large scale data handling abilities of a relational database, i.e. storage, management and consultation. PetroGrapher takes as input a user generated case that describes a sedimentary geological feature through a sophisticated visual interface that supports the user driven process of description by suggesting data input structure and making sedimentary petrography terminology available. The forward-chaining inferencing sub-system then attempts to find a best match between the parts of that case and "knowledge graph" nodes in the relational database subsystem that have been elicited from geology experts. The system eventually provides the user with a geologic interpretation of the described features according to knowledge extracted from an expert.

This is achieved by a three-tiered approach to knowledge modelling that involves relational and object oriented methods, AI frame techniques and inferential components termed "knowledge graphs", which capture the associations between petrographic features and geological interpretations and represent the influence each feature has on interpretation. The graphs are mapped and stored alongside other data in the relational database as a special form of table used for consultation.
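
One way to picture a structure that records the influence each feature has on an interpretation is a table of weights that is scored against the features observed in a case. The sketch below is a loose illustration of that idea and is not PetroGrapher's actual representation; the feature names, interpretation names, weights and additive scoring are all hypothetical.

```python
# Hypothetical "knowledge graph": for each interpretation, the features that
# suggest it and how strongly each one counts (weights are invented).
KNOWLEDGE_GRAPH = {
    "interpretation_A": {"feature_1": 0.6, "feature_2": 0.3},
    "interpretation_B": {"feature_2": 0.5, "feature_3": 0.5},
}


def best_interpretation(observed, graph=KNOWLEDGE_GRAPH):
    """Score every interpretation against the observed features; return the best match."""
    scores = {
        name: sum(weight for feature, weight in weights.items() if feature in observed)
        for name, weights in graph.items()
    }
    return max(scores, key=scores.get), scores


choice_1, scores_1 = best_interpretation({"feature_1", "feature_2"})
choice_2, scores_2 = best_interpretation({"feature_2", "feature_3"})
```

A flat mapping like this could be stored as rows of (interpretation, feature, weight), which hints at how such a structure might sit in a relational table.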

The main disadvantages of PetroGrapher are large overheads in terms of processing speed in periods of high interaction between the various sub-systems, most noticeable during the interpretation process where inferencing is performed. The developers comment that this is momentary and justifiable as the system architecture allows data to remain independent, as they desire. Multi-user access is also an area where performance is less than adequate and it is speculated that future research would be justified.

An alternative perspective to the evolutionary approach comes in the form of the revolutionary. Revolutionary systems propose new architectures that integrate the rules for integrity constraints into the actual database, leaving associated applications free to concentrate resources on their primary tasks. Ralston and Reilly (1993) give an example of an integrity constraint as "one that specifies that oil suppliers have only one store from which they sell their products. A rule might specify that if a supplier supplies screwdrivers, they must also supply screws. This rule obviates the need to list screws explicitly in the database." When application programs handle integrity constraints there can often be inconsistency. For instance, one application may validate data where another may not. For the sake of consistency it makes the best sense for all validation and integrity functions to be performed in the same place, i.e. in the database itself.
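
The screwdriver rule quoted above can be held in the database itself rather than in each application. A minimal sketch, assuming SQLite and a hypothetical table and view, expresses the rule as a view so that screws never need to be listed explicitly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Only explicitly recorded facts are stored (table and data are hypothetical).
    CREATE TABLE supplies (supplier TEXT, product TEXT);
    INSERT INTO supplies VALUES ('Acme', 'screwdrivers'), ('Bolt Co', 'hammers');

    -- Rule held in the database: whoever supplies screwdrivers also supplies screws.
    CREATE VIEW all_supplies AS
        SELECT supplier, product FROM supplies
        UNION
        SELECT supplier, 'screws' FROM supplies WHERE product = 'screwdrivers';
    """
)


def products_of(supplier):
    """List a supplier's products, including those implied by the stored rule."""
    rows = conn.execute(
        "SELECT product FROM all_supplies WHERE supplier = ? ORDER BY product",
        (supplier,),
    )
    return [product for (product,) in rows]


acme_products = products_of("Acme")
bolt_products = products_of("Bolt Co")
```

Every application that queries the view sees the same derived data, so the rule cannot be applied by one program and forgotten by another.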

A revolutionary approach is detailed by Bertino, Catania and Zarri (2001) where they describe an intelligent database as a system characterised by the existence of a large database of several million persistent facts coupled with an extensive rule base that stores rules encoding "intensional domain knowledge". Where earlier attempts at expert systems produced applications that worked off very small fact bases often loaded into volatile memory at execution time, they describe contemporary systems as separate phenomena in that the facts remain persistent. By persistent they mean that the fact base must be maintained permanently even when the intensional rule base is not making inferences about those facts, and adding to the fact base with those inferences.

Examples of these types of databases can be found in deductive databases, sometimes referred to as Logic Databases due to their reliance on mathematical logic to define rules that are expressed declaratively. Deductive rules are most useful when all the information needed to reach a valid conclusion or decision is available. Rules in deductive databases can be used to infer new information about facts already stored, i.e. they permit data to be defined by other data. They can also provide query functions that can operate on facts and rules as well as testing for consistency in the fact base. One of the disadvantages of deductive databases is that they can perform slowly when faced with recursive rule firing, that is, when a rule references itself and does so repeatedly until an escape condition is met. This problem is commented upon in the example of TEMIBASE, given below.
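
Recursive rule firing can be illustrated with the textbook "ancestor" rule evaluated bottom-up until no new facts appear. This is a minimal sketch of naive fixpoint evaluation with hypothetical facts; real deductive databases use more refined evaluation strategies.

```python
def derive_ancestors(parent_facts):
    """Naive bottom-up evaluation of two deductive rules:

        ancestor(X, Y) if parent(X, Y)
        ancestor(X, Z) if parent(X, Y) and ancestor(Y, Z)    (recursive)

    Returns the derived facts and the number of passes over the rules.
    """
    ancestor = set(parent_facts)  # first rule: every parent is an ancestor
    rounds = 0
    while True:
        rounds += 1
        derived = {
            (x, z)
            for (x, y) in parent_facts
            for (y2, z) in ancestor
            if y == y2
        }
        if derived <= ancestor:  # escape condition: nothing new was inferred
            return ancestor, rounds
        ancestor |= derived


# Hypothetical stored facts: parent(ann, bob), parent(bob, cy), parent(cy, dee).
PARENTS = {("ann", "bob"), ("bob", "cy"), ("cy", "dee")}
ancestors, rounds = derive_ancestors(PARENTS)
```

Each pass re-joins the stored facts against everything derived so far, so the work grows with the depth of the recursion. This gives a feel for why recursive rules can be slow.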

Eaglestone and Ridley (1998) conclude that despite little market impact there is great potential for deductive reasoning in databases and that "Few applications have been identified which can fully exploit the expressive power of deductive database technology." Bertino, Catania and Zarri (2001) confirm this, but also point to some successes in deductive database technology in scientific and medical information systems.

Deductive principles are used in what are termed temporal databases. These are database systems that deal not only with current data forms, as do conventional databases, but also maintain past, present and future data and support storage and query of information that may vary in time. Deficiencies in conventional databases for storing and manipulating temporal data are succinctly expressed by Kannan and Geetha (1996): "Conventional databases capture only a snapshot of reality". They remark on the need for systems in areas such as forest information, weather monitoring and population statistics that can perform tasks using temporal reasoning in order to provide maximum utility where prediction, planning, explanation and learning from often incomplete, large scale time based data is crucial. To this end they have developed a system called TEMIBASE that "maintains past, present and future data effectively and allows querying on them with provisions for deductive inferencing to perform non-monotonic temporal reasoning on incomplete temporal data".

TEMIBASE provides a database query language that extends the commercially available Structured Query Language (SQL) so that it can process queries dealing with the change in data over time, coupled with a separate rule manager and inference engine to perform the temporal reasoning. Using these tools conclusions can be drawn from incomplete data in a way that would be impossible for a conventional database system. This is also achieved by TEMIBASE's use of active rules alongside deductive rules. In TEMIBASE, passive deductive rules are used to handle recursion on data in historical systems, population statistics for example, and active rules are used to continuously monitor and respond to events in real time systems such as those used in banking.
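
To make the contrast with a "snapshot" database concrete, the sketch below stores each value together with the period during which it was valid and answers an "as of" question. It does not reproduce TEMIBASE's query language, which is not described here; the record layout, names and figures are hypothetical.

```python
from datetime import date

# (entity, value, valid_from, valid_to); valid_to=None means "still current".
# Hypothetical figures, e.g. a tree count recorded for one forest plot.
HISTORY = [
    ("plot_7", 120, date(1990, 1, 1), date(1995, 1, 1)),
    ("plot_7", 95, date(1995, 1, 1), None),
]


def value_as_of(history, entity, when):
    """Return the value that was valid for `entity` on the date `when`, or None."""
    for ent, value, valid_from, valid_to in history:
        if ent == entity and valid_from <= when and (valid_to is None or when < valid_to):
            return value
    return None  # nothing recorded for that date: the temporal data is incomplete


past_value = value_as_of(HISTORY, "plot_7", date(1992, 6, 1))
current_value = value_as_of(HISTORY, "plot_7", date(2000, 1, 1))
before_records = value_as_of(HISTORY, "plot_7", date(1980, 1, 1))
```

A conventional table would keep only the latest row; keeping the validity period alongside each value is what allows questions about the past to be answered at all.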

TEMIBASE has been tested with data taken from the Tamil Nadu Forest Department and observations on system performance compared favourably with existing commercial database systems when not using the temporal reasoning subsystem or using only a few rules. It was shown, however, that system performance suffered a speed deficit when many rules were used in reasoning. This performance lag, argue Kannan and Geetha, is counterbalanced by the reduction in application development time offered by TEMIBASE and its extended functionality in terms of its temporal reasoning capabilities.

A different form of logic is implemented in the Smart Medical Database, a system developed by Xoetronics (2002). Xoetronics claim that their Smart Medical Database uses inductive rather than deductive logic to produce a database that is dynamic and capable of interaction in real time, and that this change in logical approach has only recently become possible thanks to advances in hardware and software technology. In contrast to deductive rules, inductive rules are used when it is impractical or impossible to gather all of the information that would be required for deductive reasoning.

The Smart Medical Database (SMDB) addresses a need for an immediate, practical way for a doctor to compare treatment choices and relative outcomes for large numbers (1000 or so) of patients to support a well founded diagnosis from a readily accessible compilation of collective, but incomplete (hence the use of induction), medical wisdom. To this end, the SMDB is an integrated software tool based on artificial intelligence and Bayesian probability statistics that seeks to provide the doctor with statistical information from a wide range of statistically significant relevant patient cases. The patient data is integrated with hospital management data such as cost, payment data and strategic planning, so that one outcome of using SMDB can be the lowering of health care expenses by up to 25%. This is made possible because the SMDB gives doctors access to relevant information that can help to avoid costly mistakes such as ineffective courses of treatment. SMDB differs from previously developed databases or expert systems, such as MYCIN, in that it relies on a range of heterogeneous actual clinical data rather than static lists of weighted symptoms. Existing systems based on static list processing would produce differing results influenced by minute changes to single data field values, whereas SMDB makes diagnoses based on probability of correctness functions derived from a large collection of accumulated clinical data. For a particular patient, subsequent medical data is used to refine the diagnosis and that updated data is itself added to the database. SMDB does not rely on a particular diagnosis model as previous systems did but instead dynamically gives the doctor a set of probabilities and outcomes, adapting in real time to help each individual patient case. In this capacity SMDB acts not as an imitation expert, as discussed previously with reference to expert systems, but as "a very sophisticated portal that allows access to data". It does not recommend decision actions but functions as a support tool for doctors to arrive at their own decisions based on clinical statistics.
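
The general idea of presenting probabilities drawn from accumulated cases, rather than firing fixed rules, can be sketched as a simple Bayesian estimate of each treatment's success rate. This is not the SMDB's actual method, which is not documented here; the case records, treatment names and the uniform prior are assumptions made for illustration.

```python
from collections import Counter


def outcome_probabilities(cases, prior_success=1, prior_failure=1):
    """Estimate P(success | treatment) from (treatment, succeeded) case records.

    Uses a Beta prior (uniform by default), so treatments with few recorded
    cases are pulled towards 0.5 instead of producing extreme estimates.
    """
    successes, totals = Counter(), Counter()
    for treatment, succeeded in cases:
        totals[treatment] += 1
        successes[treatment] += int(succeeded)
    return {
        treatment: (successes[treatment] + prior_success)
        / (totals[treatment] + prior_success + prior_failure)
        for treatment in totals
    }


# Hypothetical accumulated cases: treatment A tried ten times, B only once.
CASES = [("A", True)] * 8 + [("A", False)] * 2 + [("B", True)]
estimates = outcome_probabilities(CASES)

# A new clinical result is simply added to the data and the estimates refreshed.
updated = outcome_probabilities(CASES + [("B", False)])
```

The output is a set of probabilities for the doctor to weigh rather than a recommendation, and each new case shifts the estimates without any rule having to be rewritten.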

Whether we can describe the database systems discussed in this paper as truly "intelligent" depends on how "intelligence" is actually defined. What we can say is that by incorporating various techniques of rule implementation from the domain of Artificial Intelligence it is possible to create advanced database systems that are capable of performing tasks that it would be impossible for a human to achieve unaided. Hardware advances in disk storage and memory, alongside new developments in rule algorithms such as those in the field of inductive reasoning, will bring new opportunities for progression in the area of databases, upgrading utility and providing us with increasingly advanced toolkits for information and knowledge management.

Lee Jorgensen Jan 2003

References

  1. Chambers Dictionary, Chambers Harrap Publishers Ltd., 1994
  2. Collins Dictionary, Harper Collins, 1994
  3. Minsky, M.L. (ed.), "Semantic Information Processing", Cambridge, MA: MIT Press, 1968
  4. Eysenck, M.W., "Artificial Intelligence", in M.W. Eysenck (ed.), The Blackwell Dictionary of Cognitive Psychology, Oxford: Basil Blackwell, 1990
  5. Bertino, E., Catania, B. and Zarri, G.P., "Intelligent Database Systems", Addison Wesley, 2001
  6. TechEncyclopedia, 2002
  7. Ralston, A. and Reilly, E.D., "Encyclopaedia of Computer Science", Chapman and Hall, 1993
  8. Kannan and Geetha, "Temporal Reasoning with Intelligent Databases", 1996
  9. Parsaye et al, "Intelligent Databases: Object Oriented, Deductive Hypermedia Technologies", John Wiley and Sons, 1989
  10. Ricardo, C.M., "Database Systems: Principles Design and Implementation", Prentice Hall, 1990
  11. Fabbri and Schwab, "Practical Database Management", PWS-KENT, 1992
  12. Hart, A., "Knowledge Acquisition for Expert Systems", Kogan Page, 1989
  13. Luger, G.F. and Stubblefield, W.A., "Artificial Intelligence: Structures and Strategies for Complex Problem Solving", London: Addison-Wesley, 1998
  14. Sleight, D., "Intelligent Databases: Easing Access to Information", Michigan State University, 1993
  15. Abel, M., Castilho, J.M.V. and Campbell, J.A., "PetroGrapher: a solution in intelligent database system for petrographic analysis", 1998
  16. Eaglestone and Ridley, "Object Databases: An Introduction", McGraw Hill, 1998
  17. Xoetronics, "Smart Medical Database", 2002
