A Knowledge Architecture Methodology for Integrating Data Models
May 7, 2009
A previous post outlined some reasons why we think data modeling and semantic approaches are poorly suited for developing common data models across applications, disciplines, functions, and organizations. In particular, we argue that formal, precise representations make it difficult to discuss terms before we have agreed upon a common language. Another problem is class hierarchies, which typically are local to a community. Enforcing a single classification structure in a common model can alienate stakeholders who have a different way of seeing things. Finally, visual models are preferred over textual representations because they more easily work as a neutral common ground, avoiding terminology wars.
This post introduces a modeling methodology that utilizes knowledge architectures to arrive at integrated information and data architectures. By following this approach, you create a conceptual knowledge model, which is suitable for interdisciplinary, cross-functional and cross-organizational communication. The methodology outlines the steps involved in creating common understanding, and some modeling principles that should be followed.
A methodology for developing shared understanding, common language, and common data models should be aligned with group dynamics, e.g. as reflected in the forming-storming-norming-performing phases. A primary concern is to ensure that all stakeholders’ perspectives are represented. The approach therefore combines joint activities with activities that should be performed locally within a disciplinary or functional group:
- What are we talking about? – Joint identification and scoping
First we need to identify the elements and situations that we wish to exchange data about, preferably with practical, concrete examples, to anchor the further discussion. At this stage, the model will contain named elements, not connected to each other.
- What does it mean to us? – Local description
Then each group or community should work by themselves, to describe the elements and situations within the common scope, as they see them. The model will be forked into multiple views, one for each group, where properties and features are added to the elements.
- How do we use it, for what purpose, with what? – Local context
Each group then tries to connect the elements and phenomena into common contexts, as seen from their own local perspective. Dependencies on other groups should also be identified at this stage. The models are extended with relationships and possibly hierarchical structures, e.g. for classification and other forms of ordering. Stages 2 and 3 are closely interlinked, and both add new locally defined elements to the models.
- What do you think about this? – Sharing perspectives
Then the different groups should present their models to each other. The audience should try to put themselves in the place of the presenters, to take their perspective in order to understand how things fit together in their world. Active listening requires two-way communication, but the facilitator should not tolerate discussions about right or wrong at this stage.
- Can we use the same terms? – Common terminology
Now the time has come for careful generalization and ordering of common terms. Common aspects are extracted from the models developed by the different groups.
- Can we create a joint picture? – Common context
This means that we are starting to structure a common model. A main challenge in this phase is the selection of which elements to include, and which to leave out. A common model could cover everything found in any local model, just what is common to all, or everything that is common to two or more groups.
- How can we implement this? – Common architecture
Design of a logical information model involves detailing, classifying, composing and ordering of elements in a way that fits the IT architecture. This is the domain of information architects, and should be separated from the work on the conceptual model in steps 1-6.
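The three scoping options named in step 6 correspond to simple set operations over the terms of the local models. The following Python sketch only illustrates that idea: the group names, the terms, and the function names are assumptions made for the example, not part of the methodology.

```python
from collections import Counter

# Hypothetical local models: the set of terms each group uses.
local_models = {
    "engineering": {"product", "part", "task", "drawing"},
    "finance": {"product", "task", "budget item"},
    "operations": {"product", "part", "task", "shift"},
}


def everything(models):
    """Union: every term found in any local model."""
    return set().union(*models.values())


def common_to_all(models):
    """Intersection: only the terms that every group uses."""
    return set.intersection(*models.values())


def shared_by_at_least(models, n=2):
    """Terms that appear in at least n local models."""
    counts = Counter(term for terms in models.values() for term in terms)
    return {term for term, count in counts.items() if count >= n}
```

With this sample data, the union contains six terms, the intersection only "product" and "task", and the two-or-more rule adds "part". The widest option is the hardest to agree on, while the narrowest may leave out terms that some groups consider essential.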
When many communities use the same term for the same thing, a common terminology is already established. Often we find that different groups use different names for the same things. As long as the different terms refer to the same set of real world things or phenomena, we can regard them as synonyms. Often, though, two “synonyms” will emphasize different features, and these differences should not be neglected. The situation where one or more terms are applied to denote some of the same things, but not all the same things, is more problematic. Then we must ensure that the common data model is sufficiently precise to cover all the different, partially overlapping perspectives.
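One way to make the distinction between synonyms and partially overlapping terms concrete is to compare the sets of real-world things that each term denotes. The sketch below assumes that such sets can be listed for a small example; the function name and the sample identifiers are hypothetical.

```python
def compare_terms(things_a, things_b):
    """Compare two terms by the sets of things they denote."""
    a, b = set(things_a), set(things_b)
    if a == b:
        # Same set of things: the terms can be treated as synonyms,
        # although they may still emphasize different features.
        return "synonyms"
    if a & b:
        # Some of the same things, but not all: the problematic case.
        return "partial overlap"
    return "disjoint"


# Hypothetical example: what two groups mean by "supplier" and "vendor".
supplier = {"Acme Ltd", "Nordic Seats AS"}
vendor = {"Nordic Seats AS", "Cable Corp"}
relation = compare_terms(supplier, vendor)
```

Here `relation` is "partial overlap", which signals that the common model needs to be precise enough to keep the two perspectives apart.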
With local models as a starting point, we can create a federated knowledge architecture that consists of multiple views, separating common views from local views for each community. Typically, it is easier to agree on which terms to include in a common view, than about the structures and relationships among the terms. If two groups disagree on the classification hierarchy for a set of terms, it is probably wise not to include any one interpretation in the common view, to avoid alienating people and creating resistance against further implementation.
The process outlined above does not determine how broad the scope of each iteration should be. An incremental development plan is preferred, where the breadth and cycle time are adapted to the number of participants and their level of competence in IT and data modeling. In order to build trust, it is convenient to start with a small and simple example, but the scope should be large enough for all stakeholder groups to have something meaningful to contribute.
The example model below illustrates some aspects of this approach. It is a typical result of step 6 in the methodology, alternatively step 3 in a group that has a wide focus. The model focuses on a concrete, easily identifiable element, the product called Seat Heating System A23. It further captures different contexts in which this product is found, such as the product parts and modules, the organizational roles involved, the processes of its lifecycle, databases, documents, application systems, engineering principles, financial and business aspects, government regulations etc. Our experience is that such concrete enterprise models communicate meaning a lot better than more abstract class diagrams. Rather than ordering the elements into a classification hierarchy, we focus on capturing the relationships between elements. Because they capture dependencies, relationships are the most important kind of element in a knowledge model.
Our methodology is based on modeling of typical or concrete individual instances, their properties and relationships. Instance modeling can be performed using conventional notations, as long as we apply a liberal interpretation of what the model means. For instance, we can apply a notation for entities or classes to represent concrete instances, if this is necessary for the modeling tool to allow local properties, specialization, and parts to be represented. A modeling tool should also facilitate the integration of multiple views into a common model, in order to compare, relate, and preserve the perspective models of each community.
More concretely, our core modeling language captures these aspects:
- Information, including concrete examples, types, classes, and underlying frameworks such as meta-level hierarchies.
- Roles, actors who participate in the modeling process, or other stakeholders.
- Role Views, information selected and structured according to the perspectives of a given role or group.
- Tasks and processes, for when we need to discuss who (which role) is using what information in which contexts.
For more information about how these IRTV dimensions can be applied to structure complex business-level conceptual models, we refer to our general modeling methodology introduction.
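As a rough illustration of how the four aspects relate, the following Python sketch represents them as plain data classes. It is not the authors' modeling language: the class and field names, the `build_view` helper, and all example data except the product name are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Information:
    """An information element: a concrete example, type, class or framework."""
    name: str
    kind: str = "instance"


@dataclass(frozen=True)
class Role:
    """An actor in the modeling process, or another stakeholder."""
    name: str


@dataclass(frozen=True)
class Task:
    """A task or process step, performed by a role and using information."""
    name: str
    performed_by: Role
    uses: tuple = ()


@dataclass
class RoleView:
    """Information selected and structured from the perspective of one role."""
    role: Role
    elements: list = field(default_factory=list)


def build_view(role, tasks):
    """Collect the information a role uses, in order of first use."""
    view = RoleView(role)
    for task in tasks:
        if task.performed_by == role:
            for info in task.uses:
                if info not in view.elements:
                    view.elements.append(info)
    return view


# Example data; only the product name comes from the text, the rest is invented.
seat_heater = Information("Seat Heating System A23")
test_report = Information("Test report", kind="type")
designer = Role("Design engineer")
tester = Role("Test engineer")
tasks = [
    Task("Design product", designer, (seat_heater,)),
    Task("Verify product", tester, (seat_heater, test_report)),
]
tester_view = build_view(tester, tasks)
```

The tasks are what connect roles to information, so a role view can be derived from them instead of being maintained by hand.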
Most methodologies for information modeling deal with three distinct kinds of models:
- Physical data models describe how the data are stored, e.g. the table structures of a relational database.
- Logical information models describe a language in a way which is comprehensible for business users, but where the structure is adapted to fit the underlying data storage technology, e.g. ER and UML class diagrams for relational databases.
- Conceptual knowledge models describe the semantics, the meaning of the concepts and their interrelationships. This level should be completely independent of IT.
Class diagrams and ER models are typically applied for both the conceptual and the logical model. Some mechanisms, e.g. classification, are often kept out of the logical model if the chosen technology does not support them. In order to support interdisciplinary communication and sense-making, the conceptual model should be as flexible as discussed above. Techniques derived from database structures (ER) and programming languages (UML) are not perfect for human communication. We therefore think that the conceptual model should be purified as an enterprise model, and structured as a knowledge architecture.
As mentioned in the previous post on this topic, the main focus is often on metadata and languages when a common information model is being developed. If the logical model is regarded as an abstraction of the physical model, and the conceptual model as an even more abstract representation, the focus too easily ends up on general features. We instead advocate that you concurrently analyse the data, metadata, and metametadata frameworks involved. All three meta-levels are found in conceptual, logical, and physical models, as exemplified in the table below.
| |Instances and values|Languages and types|Frameworks|
|---|---|---|---|
|Conceptual|Concrete examples and typical instances, with relationships|Stereotypes, patterns and role-specific categories|Enterprise architecture frameworks, mediating between different perspectives from disciplines, functions and other groups|
|Logical|Prototypes and templates for objects and structures|Classes of objects and relationships, with properties and other features|Common data architecture, and differences between local application architectures|
|Physical|Values in database columns, data quality|Definition of tables and columns, with value types|General translation between application data architectures, meta-levels and languages, e.g. between SQL and XSD|
Classification and Specialization
When we start to construct class hierarchies in an information model, we may proceed bottom-up or top-down. This is illustrated below, in two different model fragments. Both represent project as a kind of process. The top-down model to the left separates projects from other kinds of processes. Such structures are useful when modeling the perspective of a single community, where a simple class hierarchy might suffice. If we are to create a common model for several disciplines, however, we need to take multiple classification approaches into account. This makes the bottom-up model, to the right, more suitable. Here we have defined some different contexts in which a project can be found, in this case as a budget item for the accounting department, as a temporary organizational entity that employs consultants, as the actor that designs a new product, and of course as a kind of business process.
For common models, we thus prefer to build classification hierarchies bottom up, by composing different aspects of each term. This approach avoids unnecessary terminology wars over which term is the most correct and important, because incongruent perspectives can co-exist.
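In programming terms, the bottom-up approach resembles composition rather than inheritance: the term is not placed under one parent class, but collects aspects contributed by different communities. The sketch below is one possible reading, not a prescribed implementation. The `Term` class is hypothetical, and apart from the accounting department, the community names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A term defined bottom-up as a composition of aspects."""
    name: str
    # Maps each aspect to the community that contributed it.
    aspects: dict = field(default_factory=dict)

    def add_aspect(self, aspect, community):
        self.aspects[aspect] = community
        return self  # returning self allows chained calls

    def seen_by(self, community):
        """Aspects that one community has contributed to the term."""
        return [a for a, c in self.aspects.items() if c == community]


project = (
    Term("project")
    .add_aspect("budget item", "accounting")
    .add_aspect("temporary organizational entity", "human resources")
    .add_aspect("actor that designs a new product", "product development")
    .add_aspect("business process", "process management")
)
```

No aspect is privileged as the superclass, so `project.seen_by("accounting")` and the other perspectives can co-exist in the same model.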
An approach where terms are defined as compositions of aspects, bottom-up, requires a carefully designed overall structure. A knowledge architecture should meet these criteria. It defines the core dimensions that aspects should be structured into. For instance, a term like product component can be seen as composed of these aspects:
- A process, the lifecycle of the component, with phases such as development, manufacturing, operation, maintenance, recycling etc.
- The organizations and functions that fill roles and responsibilities in the lifecycle of the product.
- The knowledge and skills required,
- The systems, data and application services needed,
- Product decomposition structure of parts, and possibly variant hierarchies for product families,
- A timeline, expected lifespan etc.
- Physical and spatial properties (weight, size etc.)
- Financial properties (price, cost functions),
- Decisions that control the product, e.g. choice among alternative designs.
Depending on their roles and responsibilities, different disciplines will emphasize different aspects of such a term. A clarification of which dimensions are central in a domain is therefore a useful tool for comparing different perspectives on the way towards a unified, common model. Typical dimensions for different domains are thus an important part of our methodology.
Dimensions can also be used to simplify a common model. By defining the central terms along the axis of each dimension, more specific terms can be placed in a “coordinate system”, according to which of the core terms along each axis they specialize. The core common model need not contain every single point in this conceptual knowledge space; representing the basic dimensions is sufficient. This greatly reduces complexity.
We have applied this technique to simplify a unified model for five different enterprise modeling tools in the ATHENA project. Here we could for instance state that the terms “team” and “role” have the same meaning in the process dimension (someone who performs a task), but different meanings in the organization dimension (more than one person vs. one person or more). In some applications these concepts were distinct, in others not. By defining in which dimensions the concepts were different, and in which they were alike, we could easily determine in which contexts “the difference made a difference”.
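The team and role comparison can be sketched as a lookup in such a coordinate system, where each term has one value per dimension. The dictionary layout and the function name are assumptions for illustration; the meanings are paraphrased from the example above.

```python
# Each term is positioned by its meaning along each dimension.
terms = {
    "team": {
        "process": "someone who performs a task",
        "organization": "more than one person",
    },
    "role": {
        "process": "someone who performs a task",
        "organization": "one person or more",
    },
}


def differing_dimensions(term_a, term_b, coordinates):
    """Return the dimensions in which two terms have different meanings."""
    a, b = coordinates[term_a], coordinates[term_b]
    dimensions = set(a) | set(b)
    return sorted(d for d in dimensions if a.get(d) != b.get(d))


differences = differing_dimensions("team", "role", terms)
```

The result lists only the organization dimension, so an application that deals solely with the process dimension can treat the two terms as one.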
Another example is taken from offshore oil and gas field development. There the original data model combined product data with process data (status, milestone, responsible). Because other criteria were judged to be more important for organizing the product class hierarchy, the definition of the process aspects had to be replicated across many product classes. By separating these aspects in the common information model, we were able to generalize and simplify the management of process data, which to a large extent was shared among different product object classes.
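A minimal sketch of this separation, assuming Python data classes: the process aspect is defined once and attached to every product object, whatever its class. Only the three process fields come from the example; the class names, tags, and values are invented.

```python
from dataclasses import dataclass


@dataclass
class ProcessData:
    """Process aspect, defined once and shared by all product classes."""
    status: str
    milestone: str
    responsible: str


@dataclass
class ProductObject:
    """Product aspect; the product class hierarchy stays independent."""
    tag: str
    product_class: str
    process: ProcessData


def with_status(objects, status):
    """Works for any product class, because the process aspect is shared."""
    return [obj.tag for obj in objects if obj.process.status == status]


# Hypothetical sample data.
objects = [
    ProductObject("P-101", "pump", ProcessData("installed", "M3", "mechanical")),
    ProductObject("V-204", "valve", ProcessData("ordered", "M2", "piping")),
    ProductObject("C-310", "cable", ProcessData("installed", "M3", "electrical")),
]
installed = with_status(objects, "installed")
```

Because status handling no longer depends on the product class, a query such as `with_status` does not have to be repeated for pumps, valves, and cables.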
Even a concrete example can be interpreted very differently by different communities. Depending on perspective, focus, and status, the concept of “allocating people to tasks” can be encoded as
- A property or attribute on the task, with the person's name or id as value,
- A part of a workplan for the person, or for a role that he or she fills,
- A relationship between the task and the person,
- A role associated with the task,
- An input parameter for the task,
- A decision tree with alternative persons who could perform the task, e.g. controlled by competence and availability,
- A process that describes the steps leading to selection of a resource,
- An organization unit, e.g. a headhunting firm,
- An application software system or service, e.g. internet matching,
- A discipline within HR.
This wide range of perspectives implies that we should be careful about concluding what kind of element a concept should be represented as. Again it is important that the common model remains open to different perspectives, and allows interpretive flexibility in order to ensure freedom of expression. A methodology that demands that users define an element as either an object, a relationship, or a property is unsuitable. A more reflective approach is needed, where an element can appear as different types in different local views.
These concerns are also an argument for applying conceptual knowledge models, rather than logical information models, as the primary tool for integrating data models. If you are unable to agree on whether a concept is an object, a property, or a relationship, it is difficult to apply a methodology that forces you to decide one way or the other, and in which incongruent, multifaceted notions cannot exist side by side.
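A data structure that supports this kind of flexibility could record the kind of an element per view instead of once for the whole model. The following sketch is an assumption about how that might look; the `Kind` values, the `Concept` class, and the view names are hypothetical.

```python
from enum import Enum


class Kind(Enum):
    PROPERTY = "property"
    RELATIONSHIP = "relationship"
    OBJECT = "object"
    PROCESS = "process"


class Concept:
    """A concept whose kind is decided per local view, not globally."""

    def __init__(self, name):
        self.name = name
        self._kind_by_view = {}

    def represent(self, view, kind):
        """Record how one local view chooses to represent the concept."""
        self._kind_by_view[view] = kind

    def kind_in(self, view):
        """Return the kind used in a view, or None if the view has not decided."""
        return self._kind_by_view.get(view)

    def kinds(self):
        """All kinds under which the concept appears in some view."""
        return set(self._kind_by_view.values())


allocation = Concept("allocating people to tasks")
allocation.represent("project planning", Kind.PROPERTY)
allocation.represent("resource management", Kind.RELATIONSHIP)
allocation.represent("human resources", Kind.PROCESS)
```

The common view can then refer to `allocation` without committing to any one of the three kinds.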
From Knowledge Architecture to Data Models
Sooner or later, however, IT-level information and data models should be defined in order to implement the common conceptual knowledge architecture. This step should be performed by IT architects and data management specialists. Note however that an operational knowledge architecture could be applied to integrate application systems, if a knowledge repository with the right capabilities were available. Alternatively, the common knowledge model, and the modeling principles for knowledge architectures, should guide the implementation of data transformation and mapping solutions. This could simplify the mapping rules and facilitate an agile business-driven integration.
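One way a common model can simplify mapping rules is by acting as a hub: each application is mapped once to the common terms, and translation between any two applications goes through them. The sketch below illustrates that idea under simple assumptions (flat records, one-to-one field mappings). The application names, field names, and functions are invented.

```python
# Hypothetical mappings from each application's field names to common terms.
TO_COMMON = {
    "planning_app": {"task_owner": "responsible", "task_state": "status"},
    "maintenance_app": {"assignee": "responsible", "wo_status": "status"},
}


def to_common(app, record):
    """Rename the fields of a local record to the common terms."""
    mapping = TO_COMMON[app]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def from_common(app, record):
    """Rename common terms back to one application's field names."""
    reverse = {common: local for local, common in TO_COMMON[app].items()}
    return {reverse[k]: v for k, v in record.items() if k in reverse}


def translate(source_app, target_app, record):
    """Translate a record between two applications via the common model."""
    return from_common(target_app, to_common(source_app, record))


result = translate(
    "planning_app",
    "maintenance_app",
    {"task_owner": "Kari", "task_state": "open", "color": "red"},
)
```

Fields without a common term, such as `color` here, are dropped rather than guessed. With this design, adding an application requires one new mapping to the common terms instead of one mapping per existing application.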