John Ockerbloom

Thesis Title: Mediating Among Diverse Data Formats
Degree Type: Ph.D. in Computer Science
Advisor(s): David Garlan
Graduated: May 1998

Abstract:

The growth of the Internet and other global networks has made large quantities of data available in a wide variety of formats. Unfortunately, most programs are only able to interpret a small number of formats, and cannot take advantage of data in unfamiliar formats. As the Internet grows, new applications arise, and legacy data persists, the diversity of formats will continue to increase, worsening the problem. Current approaches to data diversity fail to scale up gracefully, or fail to handle the full heterogeneity of data and data sources found on the Internet.

I have developed a data model and a system of mediator agents that support the widespread use of diverse data formats much more effectively than current approaches do. In this thesis, I describe and evaluate the design and implementation of this data model, known as the Typed Object Model (or TOM), and the system of mediators that supports it. TOM is a read-only object-oriented data model that describes the abstract structure of data formats, their concrete representations, and relations between formats. TOM is supported by a distributed network of mediator agents (known as type brokers) that maintain information about data formats, and provide uniform access to conversions and other operations on those formats. Type brokers plan complex conversion strategies that can involve multiple servers, and ensure that conversions preserve information encoded by clients. Data providers can also register new formats, operations, and conversions with type brokers, making them usable anywhere on the Internet. TOM type brokers now work with hundreds of data formats, often through integration of off-the-shelf programs. TOM also supports a wide variety of applications and interfaces, such as the Web-based TOM Conversion Service, that have users worldwide.
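
As an illustration of what planning a multi-step conversion across several servers might look like, here is a minimal sketch. It is not the TOM implementation: it assumes a type broker's knowledge can be approximated as a list of registered conversions, and it uses a plain breadth-first search to find the shortest chain. The names `CONVERSIONS` and `plan_conversion`, and all format and server names, are hypothetical. The sketch does not model the information-preservation checks mentioned above.

```python
from collections import deque

# Hypothetical registry: (source format, target format, server offering it).
# All names here are invented for illustration only.
CONVERSIONS = [
    ("latex", "dvi", "server-a"),
    ("dvi", "postscript", "server-b"),
    ("postscript", "pdf", "server-c"),
    ("latex", "html", "server-a"),
]


def plan_conversion(source, target, conversions=CONVERSIONS):
    """Return the shortest list of (src, dst, server) steps, or None."""
    if source == target:
        return []

    # Index registered conversions by their source format.
    edges = {}
    for src, dst, server in conversions:
        edges.setdefault(src, []).append((dst, server))

    # Breadth-first search; came_from records how each format was reached.
    came_from = {source: None}
    queue = deque([source])
    while queue:
        fmt = queue.popleft()
        for dst, server in edges.get(fmt, []):
            if dst in came_from:
                continue  # already reached by a path at least as short
            came_from[dst] = (fmt, server)
            if dst == target:
                # Walk back from the target to rebuild the chain of steps.
                steps = []
                node = dst
                while came_from[node] is not None:
                    prev, srv = came_from[node]
                    steps.append((prev, node, srv))
                    node = prev
                return list(reversed(steps))
            queue.append(dst)
    return None  # no chain of registered conversions reaches the target
```

In this sketch, `plan_conversion("latex", "pdf")` yields a three-step chain that passes through three different servers, while a request with no registered route returns `None`.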

Thesis Committee:
David Garlan (Chair)
William L. Scherlis
Jeannette Wing
Peter Schwarz (IBM)

James Morris, Head, Computer Science Department
Raj Reddy, Dean, School of Computer Science

Keywords:
Information system applications, mediators, distributed systems, abstract data types, data formats, conversion, object-oriented methodology

CMU-CS-98-102.pdf (843.1 KB) (166 pages)