Types and Extents in Microsoft’s M

April 20, 2009

My first impression of Microsoft Oslo and the M modeling language was quite positive. In particular, their approach seemed to take better care of instances than e.g. UML does, and I applauded the use of extents (instance collections) rather than types to represent repository database tables.

Upon closer examination, however, some doubts appeared. The syntax seems unnecessarily complex, and the repository and M representations go out of sync with regard to such basic functions as subtyping and instance identity. This post proposes some adjustments to simplify M and better align the visual, textual and repository representations of Oslo. An example is shown below, with one type (A) and two subtypes (B, C), one with a link to the other:

module M1 {
    type A {
        Id : Integer32 = AutoNumber();
        PropA : Text;
    } where identity Id;
    type B : A {
        PropB : Text;
    }
    type C : A {
        PropC : B;
    }
    As : A*;
    Bs: B*;
    Cs: C* where item.PropC in Bs;
}

This is the simplest expression of the intent in M. Because subtyping is resolved during compilation, however, the repository database structure will not contain any trace of the fact that the instances of B and C are also instances of A.

Conversely, the solution that gives the simplest database structure is shown below:

module M2 {
    import System;
    type A: System.RootItem {
        Id : Integer32 = AutoNumber();
        PropA : Text;
    } where identity Id;
    type B : System.DerivedItem {
        Id : Integer32 ;
        PropB : Text;
    } where identity Id;
    type C : System.DerivedItem {
        Id : Integer32 ;
        PropC : B;
    } where identity Id;
    As : A*;
    Bs: B* where item.Id in As.Id;
    Cs: C* where item.Id in As.Id && item.PropC in Bs;
}

The weakness of this approach is that you end up with multiple instances in M for each repository entity. In this case, it is M and Intellipad that do not know about the subtyping.
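To make the duplication concrete outside M itself, here is a rough Python sketch of the M2 layout. The dictionaries are hypothetical in-memory stand-ins for the As, Bs and Cs extents, and I assume that PropC stores the Id of the referenced B record; neither detail comes from Oslo. The sketch checks the two where-constraints of M2 and then collects every record that belongs to one entity.

```python
# Hypothetical in-memory stand-ins for the M2 extents As, Bs and Cs.
As = [{"Id": 1, "PropA": "a1"}, {"Id": 2, "PropA": "a2"}]
Bs = [{"Id": 1, "PropB": "b1"}]
Cs = [{"Id": 2, "PropC": 1}]  # assumption: PropC holds the Id of a B record


def check_m2_constraints(a_records, b_records, c_records):
    """Mirror 'item.Id in As.Id' and 'item.PropC in Bs' from module M2."""
    a_ids = {a["Id"] for a in a_records}
    b_ids = {b["Id"] for b in b_records}
    bs_ok = all(b["Id"] in a_ids for b in b_records)
    cs_ok = all(c["Id"] in a_ids and c["PropC"] in b_ids for c in c_records)
    return bs_ok and cs_ok


def records_for_entity(entity_id, *extents):
    """Collect every record, across all extents, that shares the given Id."""
    return [r for extent in extents for r in extent if r["Id"] == entity_id]


constraints_hold = check_m2_constraints(As, Bs, Cs)
entity_1 = records_for_entity(1, As, Bs, Cs)
```

Entity 1 comes back as two separate records, one from As and one from Bs, tied together only by the shared Id value. Nothing in the data itself says that the B record is an A.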

I would prefer this solution:

module M3 {
    import MSchema;
    type A : MSchema.Extent {
        Id : Integer32 = AutoNumber();
        PropA : Text;
    } where identity Id;
    type B : A {
        PropB : Text;
    }
    type C : A {
        PropC : B;
    }
}

Note that B and C are also extents, by inheritance from A. The repository structure for this specification should be identical to the one for M2 above. Here we define the interpretation of the model by MSchema using subtyping, just like for System.RootItem/DerivedItem in the M2 example. Rather than explicitly defining an extent as a collection of entities of a certain type, we define that the type is also an extent. A type stands for both the metadata of the entities (fields and constraints), and the set (or extent) of entities that possess these features. This duality of the type concept is already present in M, but not utilized to its logical conclusion. This approach would remove the misalignment between the repository and Intellipad, so that subtyping, instance identity, and instance typing are preserved across the three components of Oslo.
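The type-as-extent idea can be sketched in ordinary object-oriented terms. The Python below is only an illustration under my own assumptions: the Extent base class is a hypothetical stand-in for the proposed MSchema.Extent, and the extent() method and the shared counter (playing the role of AutoNumber) are names I made up for the sketch. Each type keeps the set of its instances, and an instance of a subtype is the same object in the extent of its supertype.

```python
import itertools


class Extent:
    """Hypothetical stand-in for the proposed MSchema.Extent."""

    _next_id = itertools.count(1)  # plays the role of AutoNumber()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._members = []  # every type gets its own extent

    def __init__(self):
        self.Id = next(Extent._next_id)
        # Register the one instance in its own type and in every supertype.
        for klass in type(self).__mro__:
            if klass is not Extent and issubclass(klass, Extent):
                klass._members.append(self)

    @classmethod
    def extent(cls):
        """Return a copy of the instances that belong to this type."""
        return list(cls._members)


class A(Extent):
    def __init__(self, prop_a):
        super().__init__()
        self.PropA = prop_a


class B(A):
    def __init__(self, prop_a, prop_b):
        super().__init__(prop_a)
        self.PropB = prop_b


class C(A):
    def __init__(self, prop_a, prop_c):
        super().__init__(prop_a)
        self.PropC = prop_c  # a link to a B instance


b = B("shared property", "only on B")
c = C("another A", b)
```

After these two constructor calls, A.extent() holds both b and c, B.extent() holds only b, and c.PropC is the very same object that sits in both A's and B's extents. There is one instance per entity, with one identity, and subtyping is visible in the extents.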

This solution simplifies the language by separating platform-specific content (MSchema) from the conceptual language core. It simplifies the model by providing default interpretations for the most common usage patterns. You could still define multiple extents from the same type by declaring different named collections, so no expressiveness is lost.

Some might feel uneasy about the duality of the type construct. Note however that M already allows us to let a type identifier stand for a collection, e.g. in queries and constraints. The : symbol can already mean both subtyping and instantiation.

Modules for Separation of Concerns

The above approach could also facilitate better separation of concerns between the different roles working on the same model. In the example above, end users should be able to represent their intent simply and directly, without caring about how the elements are stored in the repository. Storage could be left to the repository administrator, who would annotate the users’ models in a separate module, e.g.:

module UserModule {
    type A : MSchema.Extent {
        PropA : Text;
    }
    type B : A {
        PropB : Text;
    }
    type C : A {
        PropC : B;
    }
}

module RepositoryAdministration {
    import MSchema, UserModule;
    UserModule.A : MSchema.Extent {
        Id : Integer32 = AutoNumber();
    } where identity Id;
}

The great benefit of this approach is that it keeps the user model simple, without any technical encoding details. It requires partial models that can be extended in other modules. In the example, the repository administrator extends the specification of the users’ model with technical details. To further simplify repository administration, we could support mix-in inheritance to allow general default specifications. If included in the context, the model below would create a default interpretation that all types are repository extents, and add a system-level Id to every entity:

module RepositoryAdministrationDefaults {
    import MSchema;
    Language.Entity : MSchema.Extent {
        Id : Integer32 = AutoNumber();
    } where identity Id;
}

The specification above could be one of many different default modules that could be linked to and unlinked from the context depending on preferences. We would also need constructs for removing default specifications, for instance a predefined type MSchema.NoExtent, or perhaps we should use the minus sign to remove inherited features?

type A_transient : A - MSchema.Extent;
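The interplay between a default module and an opt-out can be sketched in Python. This is an illustration of the idea only, not of anything M or MSchema provides: RepositoryDefaults, exclude() and store() are hypothetical names of my own. The user classes carry no storage details, the defaults add an Id and extent membership to everything, and exclude() plays the role of subtracting the extent feature from a type.

```python
import itertools


# User model: no Ids, no storage details.
class A:
    def __init__(self, prop_a):
        self.PropA = prop_a


class ATransient(A):
    pass


class RepositoryDefaults:
    """Hypothetical default module: every type is an extent with a system Id."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._excluded = set()
        self.extents = {}

    def exclude(self, cls):
        """Opt a type out of the default, like subtracting the extent feature."""
        self._excluded.add(cls)

    def store(self, entity):
        """Apply the defaults; return False if the entity's type is excluded."""
        if type(entity) in self._excluded:
            return False
        entity.Id = next(self._ids)  # system-level Id added by the defaults
        for klass in type(entity).__mro__:
            if klass is object or klass in self._excluded:
                continue
            self.extents.setdefault(klass, []).append(entity)
        return True


repo = RepositoryDefaults()
repo.exclude(ATransient)

a = A("stored")
t = ATransient("not stored")
stored_a = repo.store(a)
stored_t = repo.store(t)
```

Here a receives an Id and lands in the extent of A, while t is left untouched and stays purely in memory. Unlinking the defaults would amount to not calling store() at all, which leaves the user model exactly as its authors wrote it.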

In our experience, constructs of this kind greatly reduce the amount of modeling needed to produce an executable representation.

Posted by Håvard Jørgensen
Filed in Tools and Languages
Tags: Microsoft Oslo

Active Knowledge Modeling