Robert B. Doorenbos

Thesis Title: Production Matching for Large Learning Systems
Degree Type: Ph.D. in Computer Science
Advisor(s): Jill Fain Lehman
Graduated: May 1995

Abstract:

One of the central results of AI research in the 1970's was that to achieve good performance, AI systems must have large amounts of knowledge. Machine learning techniques have been developed to automatically acquire knowledge, often in the form of if-then rules (productions). Unfortunately, this often leads to a utility problem | the \learning" ends up causing an overall slowdown in the system. This is because the more rules a system has, the longer it takes to match them against the current situation in order to determine which ones are applicable.

To address this problem, this thesis is aimed at enabling the scaling up of the number of rules in production systems. We examine a diverse set of testbed systems, each of which learns at least 100,000 rules. We show that with the best existing match algorithms, the match cost increases linearly in the number of rules in these systems. This is inadequate for large learning systems, because it leads to a utility problem. We then examine the causes of this linear increase, and develop techniques which eliminate the major causes. The end result is an improved match algorithm, Rete/UL, which is a general extension of the existing state-of-the-art Rete match algorithm. Rete/UL's performance scales well on a significantly broader class of systems than existing match algorithms. The use of Rete/UL rather than Rete significantly reduces or eliminates the utility problem in all the testbed systems.
 

Thesis Committee:
Jill Fain Lehman (Chair)
Tom Mitchell
Manuela Veloso
Paul Rosenbloom (University of Southern California/Information Sciences Institute)

James Morris, Head, Computer Science Department
Raj Reddy Dean, School of Computer Science

Keywords:
artificial intelligence, large-scale production systems, machine learning, production match, Soar

CMU-CS-95-113.pdf (1.13 MB)
Copyright Notice