Schema Evolution for Object-based Accounting Systems

Norman Chen

Dennis McLeod

Daniel E. O'Leary

Do Not Quote Without the Author's Permission

1. Introduction

1.1 Motivation

The static meta-data view of database management is that the schema of a database is designed before the database is populated and remains relatively fixed over the life cycle of the system. However, the need to support database evolution is clear: a static meta-data view of a database cannot support either next generation dynamic database applications such as interactive multi-media information systems (Christodoulakis, Vanderbroek, Li, Wan, Wang, Papa, & Bertino, 1984). In addition, database evolution is necessary to accommodate an environment where system migration, organizational reengineering and hetrogeneous operations occur.

There are at least three reasons that a database schema would need to change. First, the current schema may not meet the original requirements. Such a schema may be called a "premature schema," resulting from incomplete requirement analysis, or even erroneous schema design. Second, the current schema may not meet new requirements. This type of database schema may be termed an "obsolete schema"; such obsolete schema may be caused by new requirements for structural changes, e.g. company reorganization, and/or process changes, e.g. business reengineering. Third, there are likely to be implementation compromises, made for cost-benefit reasons. This type of database schema will be referred to as a "compromised schema."

1.2 Research Approach

When an accounting database schema does not meet the requirements of a firm, the schema must be changed. One important issue is how the data can be adapted to a new schema. The classical way to deal with this is to write a conversion program to manipulate directly the specific data to fit the new schema. This can be time consuming, nontransferable (e.g., database specific) and costly. An alternative approach is to develop a set of general evolution operators to handle the data adaptation. These schema evolution operators would be used to manipulate the original schema into a new schema, and the populated database would be modified accordingly. Although a set of schema evolution operators is very powerful in conducting an evolution task, how to use these operators to reach a desirable target schema remains as an important research issue.

The expert system approach to guide schema design has been proved very promising, for example, in the case of view creation systems (Storey & Goldstein, 1990), and view integration systems (McCarthy & Rockwell, 1989). This paper discusses a knowledge-based approach to guide schema evolution. To evolve a database schema by using evolution operators requires two types of knowledge: 1) a domain model that can suggest a potential target schema; and 2) domain-specific heuristics that can guide a user in the choice of a sequence of operators to evolve the current schema to the potential target schema. By using these two types of knowledge, we can build a knowledge-based system to guide schema evolution processes. This allows us to address our research question:

What heuristics and domain knowledge are necessary to guide a user in the choice of a sequence of operators to complete a given evolution task for accounting database systems?

This question can be addressed in the following three steps. First, we define a set of basic schema evolution operators and generic evolution principles. Second, we explore what kind of domain-specific knowledge and heuristics can be used to guide the evolution operators in completing evolution tasks for accounting database systems. The REA accounting data model (McCarthy, 1982) is used here to guide the schema evolution process of an accounting database system. Third, we discuss a tool built to choose and apply the schema evolution operators, using the evolution heuristics and domain-specific knowledge. The tool provides a user-friendly interface to guide a non-expert user to complete evolution tasks.

This paper is organized as follows. Section 2 provides the background for this research, including an Object-Based Data Model, where schema evolution is based on, and related research of schema evolution. Section 3 describes a set of schema evolution operators. Section 4 introduces an object-based REA accounting model and discusses how the REA model and REA-based heuristics is used to guide schema evolution. Section 5 describes the architecture and implementation of a schema evolution administration tool, REAtool, and shows the look and feel of its prototype. The last section, Section 6, summaries the research results and discusses some future research directions.

2. Background

2.1. Object-Based Data Model

The Object-Based Data Model (OBDM) is used as a framework to discuss schema evolution. OBDM adopts the most basic and common modeling constructs from such object-based data models as Entity-Relationship Model (Chen, 1976), Semantic Data Model (Hammer & McLeod, 1981), and Object-Oriented Data Model (e.g. Banerjee, Chou, Garza, Kim, Woelk, Ballou, & Kim, 1987). The modeling constructs of OBDM includes class, attribute, class hierarchy, inheritance, and their associated constraints.

A class is a set of objects of the same type. An instance is an object belonging to a class. There are two kinds of classes: abstract classes and value classes. Abstract classes are further classified as entity classes and relationship classes. In OBDM, these two types of abstract classes have no significant difference. A value class is a basic type of data such as Booleans, integers, reals, and strings.

An attribute of a class is a function mapping the class to another class, the domain class of an attribute. A key of a class is an attribute or a set of attributes that can be used to identify an instance of a class. A data-valued attribute has the value class as its domain. An abstract-valued attribute has an abstract class as its domain. An attribute can have a one-to-one mapping or a one-to-many mapping. An abstract class connects with other classes through its attributes. A value class has no attributes. The connections of classes of a database through attributes consist of a semantic network of a database schema.

A class has one superclass except a root class. A root has no superclass. Each class can have several subclasses. A subclass is connected to its superclass by an IS-A link. An IS-A link maintains two inheritance relationships between a subclass and its superclass: 1) A subclass inherits attributes of its superclass. Therefore, a set of attributes of a superclass is included to the set of attributes of its subclass. 2) An instance of a subclass is an instance of its superclass. Therefore, a set of instances of a subclass is included to a set of instances of its superclass. All classes connected through IS-A links consist of a class hierarchical tree. Each hierarchy of classes is a category. A database schema consists of a set of categories.

2.2 Related Research

ORION (Banerjee, Kim, Kim, & Korth, 1987), ENCORE (Skarra & Zdonik, 1986), and GemStone (Penny & Stein, 1987) use object-oriented data models and support evolution mechanisms; ORION and ENCORE employ a screening approach and Gemstone uses a conversion approach. They define modeling invariants and rules as schema evolution constraints. The semantics of schema evolution operators is used to maintain the evolution constraints. PKM (Li & MeLeod, 1989) identifies a rich set of evolution patterns that can be used in a conceptual evolution process. OSAM* Schema Tailoring Tool (Navathe, Geum, Desci & Lam, 1990) is based on OSAM* data model (Su, Krishnamurthy & Lam, 1989) and allows a non-expert user to redesign an OSAM* schema by tailoring its old schema. The tailoring process is accomplished through evolution operations. These operations maintain the schema constraints. If the modeling constraints are violated, the operations will be aborted.

Thus, several researchers have formulated schema evolution constraints as invariants and rules of object-oriented data models. The semantics of a set of evolution operators are then defined, based on the evolution constraints they proposed. However, their research did not employ heuristics or domain knowledge to guide a user in completing evolution tasks. In the research described in this paper, we indeed employ heuristics and domain knowledge to structure the evolution process.

3. Conceptual Schema Evolution

3.1 Schema Evolution Operators

In order to accomplish evolution tasks, a set of conceptual evolution operators are used. Those operators are relatively generic and generally domain independent. Specifically, we define four groups of schema evolution operators based on OBDM: (See Figure 1)

1) Schema Enhancement Operators create and sprout. The operator create creates a new class or attribute. The new class can be a generalized class of several classes or a specialization of some class. The operator sprout generates a new class with its instances having a one-to-one mapping to its source class.

2) Schema Reduction Operators merge and delete. The operator merge merges a class into another class or an attribute into another attribute. The operator merge deletes the meta-data such as classes or attributes, but keeps the data unchanged. The operator delete will delete data as well as meta-data.

3) Schema Restructure Operator move. The operator move changes the structure of class hierarchy by moving classes or by moving attributes.

4) Schema Conversion Operator convert. The operator convert converts a modeling construct among a value class, an entity class, and an attribute.

Insert Figure 1 about here

3.2 Generic Evolution Guiding Principles

The evolution operators are guided by three primary principles: consistency principle, propagation principle, and preservation principle. Schema evolution operators must keep a schema consistent after they have been applied. This is the consistency principle of schema evolution. For example, the instances of a subclass should be the instances of its superclass. To keep database consistent in schema evolution, evolution operators will be ruled by modeling constraints to propagate the changes to related parts of a schema. This is called a propagation principle of schema evolution. Since a propagation effect could change the data and meta-data of a schema that a user does not intend to change, a propagation effect must be controlled to maintain the losslessness of the data and meta-data. This is called the preservation principle of schema evolution. For example, when a subclass is merged into its superclass, its attributes and the values defined by these attributes could be lost. One way to avoid this kind of loss of meta-data and data is to move these attributes of the subclass to a superclass. These three principles are realized by using a set of generic evolution heuristics to guide an evolution process. The details of generic heuristics are beyond the scope of this paper. The following section will focus on how domain knowledge and domain-specific heuristics can be used to guide an evolution process.

4. Schema Evolution Guided by Domain Knowledge

4.1 Object-Based REA Accounting Model

Domain-specific knowledge can be used to guide a user to conduct schema evolution tasks. In particular, this research explores a well-known accounting database model to guide the schema evolution in the context of accounting information systems.

The REA accounting model is a generalized accounting framework used to capture the interaction of economic resources, economic events and economic agents for accounting systems (McCarthy, 1982). Economic resources are scarce assets such as inventory or cash under the control of an enterprise. Economic events are phenomena that reflect changes in economic resources resulting from production, exchange, consumption, and distribution. Purchase and cash disbursement are the examples of economic events. Economic agents are persons and parties who participate in the economic events, e.g. vendor. Economic units are a subset of economic agents and are inside participants, e.g. cashier and buyer.

Their are four types of relationships between these REA entities:

1) Stock-flow relationship: This relationship is used to connect an economic resource and an economic event. The stock part of the relationship is an economic resource; and the flow part of the relationship is an economic event. For example, the stock-flow relationship between Inventory and Purchase has Inventory as its stock part and Purchase as its flow part.

2) Duality relationship: A duality relationship links two events. One event is an increment part of the relationship and the other corresponding event would be a decrement part of the relationship. For example, Purchase Payment is a duality that links the event, Purchase, as its increment part and the event, Cash Disbursement, as its decrement part.

3) Control relationship: A control relationship is a three-way association among an economic event (as exchange transaction part), an economic agent (as outside party), and an economic unit (as inside party). For example, Purchase Supply is a control relationship that associates Purchase (an event with a role as its exchange transaction part), Vendor (an agent with a role as its outside party), and Buyer (a unit with a role as its inside party).

4) Responsibility relationship: This relationship indicates one economic unit as its superior part and the other economic unit as its subordinate part. For example, the relationship "works for" is a responsibility that has Cashier as its subordinate part and Treasurer Department as its superior part.

The REA model was originally described in an entity-relationship (ER) representation. Since this paper employs an object-based approach, the REA entities are modeled as entity classes and their relationships are modeled as relationship classes in OBDM. There are no significant difference between entity classes and relationship classes in an OBDM. Furthermore, the role of an entity, which is pertinent to a relationship in an ER representation, is modeled as an ordinary attribute of the relationship class in OBDM. This transformation of REA model eases an evolution process by simplifying modeling constructs. Since the evolution operators discussed here are based on the object-based data model, the object-based REA model will be used directly to guide the use of these evolution operators. An object-based representation of the REA model is summarized in Figure 2.

Insert Figure 2 about here

4.2 REA Guidance for Schema Evolution

The REA accounting model is used to guide schema evolution of an object-based accounting database. From the viewpoint of an accounting database schema, REA classes are meta-classes. A class of an accounting database are an instance of one of classes of the REA model. For example, the class Purchase Payment is an instance of the meta-class Duality. A family of classes in an accounting database will be said to be REA-compliant if and only if:

The class is an instance of one of REA meta-classes;

It inherits all the attributes from this REA meta-class; and

The values of these attributes of the class are defined.

For example, Class Purchase Payment is defined as an instance of REA meta-class Duality. The class inherits the attributes, increment and decrement, from the meta-class Duality. Furthermore, its increment part is defined as Purchase and its decrement part is defined as Cash Disbursement. Both of them are events. Hence, Class Purchase Payment is REA-compliant. If all classes of an accounting database are REA-compliant, then the schema of this accounting database is REA-compliant.

The concept of REA-compliance provides an instrument to verify if a current schema is REA-compliant. Moreover, it also suggests a potential target schema for a current schema if the current schema is not REA-compliant. There are two contexts where the REA model is used in schema evolution of an accounting database: 1) A current schema is already REA-compliant. The schema must be maintained to be REA-compliant while schema evolution is required. This case is called REA-to-REA evolution. In this case, the REA model is used to verify if an evolution operation keeps the current schema REA-compliant. 2) A current schema is not REA-compliant. The schema is required to evolve to be REA-compliant. This case is called non-REA-to-REA evolution. In this case, the schema description and the evolution heuristics based on the REA model are used to guide a system to select a sequence of evolution operations so that the target schema will be REA-compliant after this sequence of evolution operations. The guidance for non-REA-to-REA evolution is the focus of this paper.

4.2.1 REA Schema Description

To guide schema evolution from the current schema to a potential REA-compliant target schema, we need to build a connection between the current schema and its potential target schema. REA schema description provides this connection by describing a class of the current accounting schema as an instance of an REA meta-class. For example, class Purchase is described as an economic event and class Purchase Payment as a duality relationship. This description provides the necessary information to check the difference between the current schema and the potential target schema, and allows the system to trigger evolution heuristics to guide the schema evolution.

Since the current schema and the potential target schema could be complicated, the schema description is divided into four types of tasks. Each of them is centered around an REA relationship and its associated entity classes. These four tasks are: 1) Stock-Flow Description Task, including description for a stock-flow relationship, an economic resource, and an economic event; 2) Duality Description Task, including description for a duality relationship and two economic events; 3) Control Description Task, including description for a control relationship, an economic event, an economic agent, and an economic unit; and 4) Responsibility Description Task, including description for a responsibility relationship and two economic units. In this research, task selection and REA description are dependent on users' judgment.

A complete schema description of a given task could be redundant. Some sort of partial description is enough provide the necessary information for a system to accomplish an REA schema description. There are two methods for a user to specify a partial description:

1) REA Relationship-Driven Schema Description Method. This description method starts with a description of an REA relationship. While an REA relationship is specified for accounting database classes, the system will add REA entity descriptions accordingly. The REA Schema Description part of Session I in Section 4.3 provides an example.

2) REA Entity-Driven Schema Description Method. This description method starts with an REA entity description. While REA entities of a given task is described for accounting database classes, the system will add REA relationship descriptions accordingly. The REA Schema Description part of Session II in Section 4.3 provides an example.

4.2.2 REA Evolution Heuristics

Evolution will be accomplished using a variety of heuristic rules. A heuristic rule for guiding the evolution operation has three parts: 1) Trigger is the difference between the current schema and the target schema suggested by the REA model. 2) Condition is the relevant characteristics of the current schema required for an evolution operation. 3) Action is the selected evolution operation. Based on a selected evolution task and REA description supplied by a user, the system checks the difference between the current schema and the target schema. The difference is used to trigger evolution heuristic rules. If the condition required by a heuristic rule meets the current schema, a schema evolution operator will be activated. The selected schema evolution operation will further change the current schema. The change of the current schema will change the difference between the current schema and the target schema. This updated difference will further trigger evolution heuristic rules until there is no difference between the current schema and the target schema. The following is a set of REA-based heuristics :

A1: Value Conversion Heuristic

[Trigger]    A class X is missing and it is an REA entity class.       

[Condition]  There exists a value attribute whose domain Y is          
             equivalent to the missing class X.                        

[Action]     Apply the operator Value-to-Entity Convert to convert     
             the value class Y into an entity class.                   

Example:     See Evolution Operation Step 1 and Step 2 of Session II   
             in Section 4.3.                                           


A2: Attribute Conversion Heuristic

[Trigger]    A class X is missing and it is one of these classes:      
             stock-flow, duality, or responsibility.                   

[Condition]  There exists an attribute h who connects two REA entity   
             classes Y and Z.                                          

[Action]     Apply the operator Attribute-to-Entity Convert to         
             convert the attribute h into an entity class.             

Example:     See Evolution Operation Step 3 of Session I in Section    
             4.3.                                                      


A3: Entity Sprout Heuristic

[Trigger]    A class X is missing and it is class control.             

[Condition]  There exists an attribute h of a class Y that is one of   
             the classes Event, Agent, or Unit; and the attribute h    
             has class Z as its domain, where Z is one of the classes  
             Event, Agent, or Unit, but not Y.                         

[Action]     Apply the operator Class Sprout to generate the class     
             control from the class Y.                                 

Example:     See Evolution Operation Step 3 of Session II in Section   
             4.3.                                                      


A4: Attribute Move Heuristic

[Trigger]    An attribute h of the class control is missing. The       
             domain of the attribute h is one of the classes Event,    
             Agent, or Unit.                                           

[Condition]  There exists an attribute i of the class X, which is the  
             domain of an attribute of the control.  The domain of     
             the attribute i is the domain of the attribute h.         

[Action]     Apply the operator Attribute Move to move the attribute   
             i form the class X to class control. The attribute i      
             will become the attribute h.                              

Example:     See Evolution Operation Step 4 of Session II in Section   
             4.3.                                                      


A5: Value Link Heuristic

[Trigger]    A class X is missing and it is one of these classes:      
             stock-flow, duality, or responsibility.                   

[Condition]  There exists no attribute between class Y and class Z,    
             where either Y is a Resource and Z is an Event if X is a  
             stock-flow, or both Y and Z are Events if X is a          
             duality, or both Y and Z are Units if X is a              
             responsibility. And an non-key attribute of class Y has   
             a value domain W that is equivalent to the domain of the  
             key attribute of the class Z.                             

[Action]     Apply the operator Value-to-Entity Convert to convert     
             the value class W to an entity class X.                   

Example:     See Evolution Operation Step 1 of Session I in Section    
             4.3.                                                      


A6: Class Merge Heuristic

[Trigger]    The class X and class Y are redundant.                    

[Condition]  The class X and class Y have the same key attribute.      

[Action]     Apply the operator Class Merge to merge the classes X     
             and Y.                                                    

Example:     See Evolution Operation Step 2 of Session I in Section    
             4.3.                                                      


4.3 Schema Evolution Scenario

This section describes a schema evolution scenario and demonstrates how REA description and evolution heuristics can be used to guide non-REA-to-REA evolution. The example used here is a primitive accounting database for inventory purchases and it is not compliant to the REA model. Its schema is shown in Figure 3.2. In the example, it is assumed that due to growth of the company, this primitive schema is unable to support more sophisticated and complicated enterprise-wide accounting applications. Since a schema based on a REA model can provide a richer view in a larger organization context, the task here is to evolve this non-REA-compliant schema into an REA-compliant schema. The scenario of this non-REA-to-REA evolution contains several short interactive sessions where each session corresponds to a major schema evolution task.

Each session contains an REA schema description and severeal steps of schema evolution operations, which are guided by evolution heuristics. To illustrate this evolution process, we rename the classes of the starting schema to meet the class names of its target schema, i.e. "Purchase Record" becomes "Purchase" and "Payment Record" becomes "Cash Disbursement." (See Figure 3)

Insert Figure 3 about here

Session I. Evolution Session for Purchase Payment In this session, a schema evolution task for a duality relationship is selected.

REA Schema Description An REA Relationship-Driven Schema Description Method is used in this session. A user describes that there exists a duality relationship between class Purchase and class Cash Disbursement. According to this user's description, the system specifies class Purchase and class Cash Disbursement as events.

Schema Evolution Operation After REA schema dscription of the current schema is accomplished, the evolution heuristics will guide the evolution operators in the following three steps. Figure 4 shows the schema changes caused by schema evolution operation in each step of this session.

Step 1: Value-to-Enity Convert The duality task which includes Event Cash Disbursement and Event Purchase misses the relationship duality. This situation triggers the Value Link Heuristics (A5). The required condition of the heuristics meets the condition of the current schema, where event Cash Disbursement has an attribute invoice_no and this attribute happens to be the same as the key attribute of Event Purchase. The operation Value-to-Enity Convert is applied to the value class, which is the domain of the attribute invoice_no of event Cash Disbursement.

Step 2: Class Merge The domain of attribute invoice of event Cash Disbursement is class Invoice, which is redundant to event Purchase. The situation triggers Class Merge Heuristics (A6). The required condition of the heuristics meets the condition of the current schema, where event Purchase and event Invoice have the same key attribute invoice_no. the operation Class Merge is applied to class Invoice and merge it into event Purchase.

Step 3: Attribute-to-Entity Convert The duality task which includes event Cash Disbursement and event Purchase misses the relationship duality. This situation triggers the Attribute Conversion Heuristics (A2). The required condition of the heuristics meets the condition of the current schema, where event Cash Disbursement has an attribute invoice that connects to class Purchase. The operation Attribute-to-Entity Convert is applied to the attribute invoice and convert it a duality, named Purchase Payment.

Insert Figure 4 about here

Session II. Evolution Session for Purchase Supply In this session, a schema evolution task for a control relationshp is selected.

REA Schema Description An REA Entity-Driven Schema Description Method is used in this session. The user specifies Buyer as a unit and Vendor as an agent. Since the system already has classes Purchase and Cash Disbursement as events and REA control is a three-way relationship connecting an event, a unit and an agent, the system creates a control relationship. Since there are two events in the current schema, the user must decide which one participates in this control relationship and also name the control. Here, class Purchase is chosen by the user.

Schema Evolution Operation After REA schema description of the current schema is accomplished, the evolution heuristics will guide the evolution operators in the following five steps:

Step 1: Value-to-Entity Convert An entity class Buyer is missing. This situation triggers the Value Conversion Heuristics (A1). The required condition of the heuristics meets the condition of the current schema, where the domain of attribute buyer_name is equivalent to class Buyer. The operation Value-to-Entity Convert is applied to the value class, the domain of the attribute buyer_name. The newly converted entity class becomes the class Buyer.

Step 2: Value-to-Entity Convert An entity class Vendor is missing. This situation is the same as the situation of the last step and, therefore, it triggers the Value Conversion Heuristics (A1). The required condition of the heuristics meets the condition of the current schema, where the domain of attribute vendor_no is equivalent to class Vendor. The operation Value-to-Entity Convert is applied to the value class, the domain of the attribute vendor. The newly converted entity class becomes the class Vendor.

Step 3: Class Sprout In the task of Purchase Supply, class control is missing. This situation triggers the Entity Sprout Heuristics (AA3). The required condition of the heuristics meets the condition of the current schema, where event Purchase has an attribute buyer pointing to unit Buyer. The operation Class Sprout is applied to event Purchase and generate a control, Purchase Supply.

Step 4: Attribute Move The unit Buyer is not found as the domain of an attribute of the control Purchase Supply. This situation triggers the Attribute Move Heuristics (A4). The required condition of the heuristics meets the condition of the current schema, where the event Purchase has an attribute buyer and the event Purchase is a domain of the attribute exchange transaction of the control Purchase Supply. The operation Attribute Move is applied to the attribute buyer and move it to the control Purchase Supply.

Step 5: Attribute Move The unit Vendor is not found as the domain of an attribute of the control Purchase Supply. This situation triggers the Attribute Move Heuristics (A4). The required condition of the heuristics meets the condition of the current schema, where the event Purchase has an attribute vendor_no and the event Purchase is a domain of the attribute exchange transaction of the control Purchase Supply. The operation Attribute Move is applied to the attribute vendor_no and move it to the control Purchase Supply.

There are three more sessions. The task of Session III, Evolution Session for Payment Supply, is a task for control relationship. The REA schema description and the heuristic guidance of evolution operations of this task is similar to the task of Session II. The tasks of Session IV, Evolution Session for Inflow, and Session V, Evolution Session for Outflow, are tasks for stock-flow relationship. The REA schema description and the heuristic guidance of evolution operations of these tasks is similar to the task of Session I. The details of these three sessions are omitted. After five short sessions, this evolution process reaches its target schema, an REA-compliant schema. The target schema is shown in Figure 5.

Insert Figure 5 about here

5. REAtool: a Schema Evolution Guidance Tool

5.1 The Architecture of REAtool

A Schema Evolution Administration Tool, SEAtool for short, is an experimental prototype that implements the proposed schema evolution methodology. It also can employ domain knowledge, such as REA accounting model and REA evolution heuristics, to guide a user to complete evolution tasks. The version of SEAtool that uses the REA domain knowledge to guide schema evolution is called REAtool. REAtool is divided into such three layers as: SEAshell, SEAengine and SEAbase. Major modules of REAtool is shown in Figure 6.

SEAshell is a user interface layer of REAtool. It provides necessary context information and guides a user through a dialog to select an evolution task, which can be a direct evolution request or a REA schema description. It also gives feedback to allow a user to validate evolution operations. Section 5.2 will show further how SEAshell interacts with a user.

Insert Figure 6 about here

SEAengine is an intelligent layer of REAtool. It accepts direct evolution requests issued by a user from SEAshell and uses two kinds of generic knowledge to guide schema evolution:

Data Model Constraints: Object-Based Data Model provides the constraints for the usage of schema evolution operators.

Generic Evolution Principles: Schema evolution is guided by generic evolution principles to maintain schema consistency, handle evolution propagation, and minimize the loss of the data and meta-data.

SEAengine also accepts REA schema description and uses two additional kinds of domain knowledge to guide schema evolution:

Object-Based REA Model: The REA accounting model suggests a potential target schema to guide evolution process.

REA Evolution Heuristics: A set of REA heuristic rules is used to guide the evolution operators to reach a REA-compliant target schema.

SEAengine has three major modules: 1) Schema Comparison Analyzer, comparing the current schema and the potential target schema suggested by REA model; 2) Evolution Operation Planner, executing evolution operations with a copy of the current schema in Evolution Space; and 3) Evolution Heuristic Manager, using the heuristics to guide evolution operations. The basic information flow of SEAengin is shown in Figure 5. Each detail step is listed below:

(1) A copy of current database schema is sent to Evolution Operation Planner.

(2) Schema Comparison Analyzer accepts an REA schema description from a user through SEAshell.

(3) Schema Comparison Analyzer receives the current schema from Evolution Operation Planner.

(4) Schema Comparison Analyzer detects the difference between the current schema and the potential target schema. The schema difference is sent to Evolution Heuristic Manager to trigger the REA evolution heuristics.

(5) Evolution Heuristic Manager checks the state of the current schema in Evolution Operation Planner with the condition of a triggered rule.

(6) While the condition of a heuristic rule is satisfied, an evolution operation is issued and, thereafter, updates the current schema.

(7) The updated current schema and its REA description is fed back to Schema Comparison Analyzer and starts another cycle.

SEAbase layer defines a data manager to store the data and meta-data of a database. SEAbase is built on the top of Versant Object-Oriented DBMS and uses function calls provided by Versant libraries (Versant Object Technology, 1992). SEAshell and SEAengine are implemented in Objective-C and SEAbase is implemented in C++. SEAtool prototype is developed under the NeXTSTEP programming environment (NeXT Computer, Inc., 1990).

5.2 Task Guidance of REAtool

An evolution task is guided by the Task Guidance Panel (TGP), portion of SEAshell. A user interacts with TGP to submit a direct evolution request or a REA schema description. TGP has four components:

1) Schema Browser. There are three kinds of browsers that can be used. Network Browser and Hierarchical Browser show the semantic relationship and hierarchical structure of classes. REA Browser shows REA meta-classes and their instances.

2) Context Display illustrates graphic objects involved in an evolution process. Therefore, a user can get a comprehensive and coherent control of the evolution process.

3) Task Catalog. A set of direct evolution operation tasks and REA schema description tasks organized in a hierarchical menu to allow a user to choose. For example, a user can choose to do the task of duality description.

4) Working Space. Evolution tasks are guided and completed here. Working Space consists of a stack of Dialog Pages. A user is guided by the system to fill Dialog Pages step by step to complete an evolution task.

Insert Figure 7 about here

A snapshot of Task Guidance Panel is shown in Figure 7 to demonstrate the look and feel of REAtool. Assume a user has chosen the task Duality Description from Task Catalog. A Dialog Page for describing a duality is placed in Working Space. The user is first asked to supply the name of the duality. The user is also asked to specify the Increment and Decrement parts of this duality. After the user clicks the OK button to confirm the REA description, evolution operators will be evoked to complete the evolution task. After that, a newly created duality is shown in the REA Browser.

6. Concluding Remarks

6.1 Research Results

In this research, we have defined a set of basic schema evolution operators, and built a knowledge-based system to guide the usage of this set of schema evolution operators. The knowledge-based system employs such domain knowledge as a domain model and a set of domain-specific heuristics. The REA accounting model has been used on our research as an example of such domain knowledge. The experimental prototype, REAtool, demonstrates our results.

6.2 Future Research Direction

This research only uses the REA model as domain knowledge. While the REA accounting model has been successfully used to guide an evolution process, different types of firms may use different types of accounting databases. For example, the accounting database of a manufacturing-type firm may be quite different from that of a service-type firm (Grabski & Marsh, 1994). The REA model is a general accounting model, which does not capture some specific domain knowledge.

As a future research direction, we will explore more specific domain knowledge to guide schema evolution. We could employ knowledge from the set of three levels of generality for sources of accounting domain knowledge, including Principles Level Knowledge, Industry Level Knowledge, and Company Level Knowledge, to facilitate accounting database schema reconstruction (McCarthy and Rockwell, 1989, and Rockwell, 1992). To guide schema evolution, we provide an additional set of dimensions for classification that is based on the levels of granularity for utilization of knowledge:

Schema Case Knowledge: This type of knowledge includes abstract schema cases from different types of industry and concrete schema cases from various companies, that can be further classified by the type of its industry.

Relationship Template Knowledge: This includes knowledge of various of templates that are organized as REA relationships and whose participant entities are described by REA description.

Entity Index Knowledge: This knowledge includes a list of accounting term, and their synonyms, used to name classes and attributes of an accounting database.

These three levels of knowledge will enhance the current REAtool in at least three ways: 1) Schema Case Knowledge will assist in the choice of database schema by providing a schema "library" with additional knowledge, about specific and relevant schema, to guide evolution operations; 2) Relationship Template Knowledge will find related classes of each relationship in an evolution task and to describe these related classes in REA terminology; and 3) Entity Index Knowledge will resolve unknown classes and attributes of the current schema and to link them to Schema Case Knowledge and Relationship Template Knowledge. To acquire these three levels of knowledge, we will build an REA-based Metadata Analysis Package, called REA MAP, to analyze the different sources of accounting knowledge such as Encyclopedia of Accounting Systems (Pescow, 1976), collection of accounting database schema of various companies, and an accounting lexicon, etc. Figure 8 shows the architecture of REA knowledge base and the tool set of REAtool and REA MAP.

Insert Figure 8 about here

Our findings on a knowledge-based system to facilitate the evolution of databases also have implications for the evolution of knowledge bases themselves. For example, in some cases evolution of knowledge bases can be accomplished using a similar guiding framework, by which the current knowledge elements such as schema cases, relationship templates, and even heuristic rules, can be restructured to fit requirements of a "normalized" target system. However, that discussion is substantial and beyond the scope of this paper.

References

Banerjee, J., Chou, H-T., Garza, J., Kim, W., Woelk, D., Ballou, N., and  
Kim, H. (1987),  Data Model Issues for Object-Oriented Applications. ACM  
Transactions on Office Information Systems, 5(1),  3-26.                  

Banerjee, J., Kim, W., Kim, H., and Korth, H. (1987), Semantics and       
Implementation of Schema Evolution in Object-Oriented Databases.          
Proceedings of the ACM/SIGMOD Annual Conference on Management of Data,    
San Francisco, California,  311-322.                                      

Chen, P. P. (1976), The entity-relationship model: Toward a unified view  
of data. ACM Transactions on Database Systems, 1(1),  9-36.               

Christodoulakis, S., Vanderbroek, J., Li, J., Wan, S., Wang, Y., Papa,    
M., and Bertino, E. (1984), Development of a Multimedia Information       
System for an Office Environment. Proceedings of the 10th International   
Conference on Very Large Data Bases . VLDB Endowment, Saratoga,           
California,   261-271.                                                    

Grabski, S. and Marsh, R. (1994), Integrating Accounting and Advanced     
Manufacturing Information Systems: An ABC and REA-Based Approach. AIS     
Research Symposium, Phoenix, AZ.                                          

Hammer, M. and McLeod, D. (1981), Database Description with SDM: A        
Semantic Database Model. ACM Transactions on Database Systems, 6(3),      
351-387.                                                                  

Li, Q. and McLeod, D. (1989), Conceptual Database Evolution Through       
Learning. In Object-Oriented Databases and Applications, Gupta, R. and    
Horowitz, E. (eds.), Prentice-Hall, 62-74.                                

McCarthy, W. E. (1982) The REA Accounting Model: A Generalized Framework  
for Accounting Systems in a Shared Environment. Accounting Review,        
57(July), 554-578.                                                        

McCarthy, W. E., and Rockwell, S. R. (1989) The Integrated Use of         
First-order Theories, Reconstructive Expertise, and Implementation        
Heuristics in an Accounting Information System Design Tool. Proceedings   
of the 9th International Workshop on Expert Systems and their             
Applicaitons, Avignon, France, 537-548.                                   

NeXTSTEP, (1990), NeXTSTEP Version 3, NeXT Computer, Inc. (Note:          
NeXTSTEP is a registed trademark of NeXT Computer, Inc.)                  

Navathe, S. B., Geum, S., Desci, D. K., and Lam, H. (1990), Conceptual    
Design for Non-database Experts with an Interactive Schema Tailing Tool.  
Proceedings of the 9th International Conference on the                    
Entity-Relationship Approach, Lausanne, Switzerland, 25-42.               

Pescow, J. K. (Ed.), (1976), The Encyclopedia of Accounting Systems.      
Englewood Cliffs, NJ: Prentice-Hall.                                      

Penney, D. J. and Stein, J. (1987), Class Modification in the GemStone    
Object-Oriented DBMS. Proceedings of the Conference on Object-Oriented    
Programming Systems, Languages, and Applications,  111-117.               

Rockwell, S. R. (1992), The Conceptual Modeling and Automated Use of      
Reconstructive Accounting Domain Knowledge. Ph.D. Dissertation of         
Michigan State University.                                                

Skarra, A. H., and Zdonik, S. B. (1986), The Management of Changing       
Types in an Object-Oriented Database, Proceedings of ACM Conference on    
Object-Oriented Programming Systems, Languages, and Applications,         
Portland, Oregon,  483-495.                                               

Storey, V. C., and Goldstein, R. C. (1990), An Expert View Creation       
System for Database Design. Expert Systems Review for Business and        
Accounting, 2(3), 19-45.                                                  

Su, S., Krishnamurthy, V. and Lam, H. (1989), An Object-Oriented          
Semantic Association Model (OSAM*).  In AI in Industrial Engineering and  
Manufacturing: Theoretical Issues and Applications, Kumara, S., Kashyap,  
R., and Soyster, A. (Eds.), American Institue of Industrial Engineering,  
Norcross, GA.                                                             

Versant ODBMS (1992), Versant System Manual.. Versant Object Technology,  
Menlo Park, CA. (Note: Versant is a trademark of Versant Object           
Technology Co.)                                                           


Figure 1. Schema Evolution Operators

Figure 2. The Object-Based REA Accounting Model

Figure 3. Initial Schema of Inventory Purchase

Figure 4. Schema Evolution Task for a Duality Relationship

Figure 5. Target Schema of Inventory Purchase

Figure 6. Major Modules and Information Flow of REAtool

Figure 7. Snapshot of REAtool's Task Guidance Panel

Figure 8. Architecture of REA Knowledge Base and REA-based Tool Set A List of Figures

Figure 1. Schema Evolution Operators

Figure 2. The Object-Based REA Accounting Model

Figure 3. Initial Schema of Inventory Purchase

Figure 4. Schema Evolution Task for a Duality Relationship

Figure 5. Target Schema of Inventory Purchase

Figure 6. Major Modules and Information Flow of REAtool

Figure 7. Snapshot of REAtool's Task Guidance Panel

Figure 8. Architecture of REA Knowledge Base and REA-based Tool Set