Django Tutorial - Abstract Base Classes vs Model Table Inheritance

With the recent QuerySet-Refactor branch merged into trunk, the Django developer community has finally been gifted with one of the more popular outstanding ticket requests it's been waiting for... model inheritance. Funny thing is, people either don't seem to care, aren't ready to try it yet, or don't fully grok it yet. Well, that's no good, because model inheritance promotes code reuse, reduces code, and increases developer productivity. So let's take a look at the three existing options for extending your model classes, take a peek at what's going on in our database, and see what type of syntactic sugar (or magic) we're gaining. Then you can decide which is best for you.

Introduction

This isn't going to be an introduction to Model Table Inheritance (MTI; the Django docs call it multi-table inheritance) or Abstract Base Classes (ABC); that's what the documentation is for. This also isn't going to get into the discussion or debate about "issues" or "gotchas" that exist. Read the docs, play around with it yourself, and visit the forums if you have any questions or concerns.

My goal today is to review the 3 options that exist for reducing code (keeping things DRY): composition (MTI), inheritance (ABC), and relationship (OneToOneField). We'll take a look at the model definition (which we use to define our database schema), the resulting SQL, and the syntax for referencing the properties of our model instances. Let's get right into it...

Actually, before we begin... this and a few other local project concepts will be used as the basis for future tutorials. We'll explore evolving the models, db performance tweaks, constraints, etc. My code here is meant for example only. So yea, grasp the concepts ... do as I say but not as I do kinda thing. Or maybe neither. Now let's get into it...

Model Table Inheritance (MTI)

We're going to reference two model classes I use to power this blog — ContentItemBase and Post. Let's take a peek at ContentItemBase first as I employ it as the "base" for all the content classes I use throughout my blog (and I use a lot - Post, Link, Video, Photo, Music, etc).

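# Imports assumed for these examples (the original listing omits them):
from django.db import models
from django.utils.translation import ugettext_lazy as _  # assumption: _ is the lazy translation helper
from tagging.fields import TagField  # assumption: TagField comes from django-tagging
import enums  # assumption: project-local module holding the *_CHOICES constants

class ContentItemBase(models.Model):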
  #TODO: document
  title = models.CharField(_('title'), max_length=100, unique_for_date="publish_on")
  slug = models.SlugField(_('slug'), prepopulate_from=('title',),)
  created_on = models.DateTimeField(_('created on'), auto_now_add=True, editable=False, )
  updated_on = models.DateTimeField(_('updated on'), editable=False)
  publish_on = models.DateTimeField(_('publish on'), )
  tags = TagField()
  status = models.IntegerField(_('status'), choices=enums.CONTENT_STATUS_CHOICES, default=enums.DEFAULT_CONTENT_STATUS, db_index=True)
  comment_status = models.IntegerField(_('comment status'), choices=enums.COMMENT_STATUS_CHOICES, default=enums.DEFAULT_COMMENT_STATUS, db_index=True)
  
  class Meta:
    pass

What I'm doing here is defining an object that holds the properties all my custom content classes have in common. Each content item has a title, for example. Each has tags, a status, a comment_status, a slug, etc. So why would I want to duplicate my code when I can define one class that is reusable? Let's not repeat ourselves and reduce whenever possible, right? Ah, why did I even bother to ask - you already know the answer. So let's take a look at a truncated version of my Post model...

class Post(ContentItemBase):
  #TODO: document  
  DEFAULT_INPUT_FORMAT = 'X'
  
  INPUT_FORMAT_CHOICES = (
  ('X','XHTML'),
  ('M','Markdown'),
  ('R','reStructuredText'),
  )

  teaser = models.TextField(_('teaser'), blank=True, null=False)

Well, woopdy doo, who cares? I don't see anything special here. Wrong. Take a look at the class definition. Instead of inheriting from models.Model, this time we're inheriting from ContentItemBase. What does this mean? Simple: my Post class now has all the properties, methods, etc. that I defined in my parent ContentItemBase. So, for example, I didn't have to reproduce the title property definition and so forth. This child Post class is a ContentItemBase and therefore shares its members. That's grand, right? Let's take a look at the generated database schema definition for both of these...

CREATE TABLE "dizzy_posts" (
    "contentitembase_ptr_id" integer NOT NULL UNIQUE PRIMARY KEY REFERENCES "dizzy_contentitembase" ("id"),
    "teaser" text NOT NULL,
    ...
);

CREATE TABLE "dizzy_contentitembase" (
    "id" integer NOT NULL PRIMARY KEY,
    "title" varchar(100) NOT NULL,
    "slug" varchar(50) NOT NULL,
    "created_on" datetime NOT NULL,
    "updated_on" datetime NOT NULL,
    "publish_on" datetime NOT NULL,
    "tags" varchar(255) NOT NULL,
    "status" integer NOT NULL,
    "comment_status" integer NOT NULL
);

See anything interesting? I do. Behind the scenes Django created two tables - one for the ContentItemBase model and another for the Post model. Ohhhh, so there is a one-to-one relationship here. Yuppo. The dizzy_posts table has a column named contentitembase_ptr_id. This column holds the ID of the associated dizzy_contentitembase record. Why do we care? We care because it means every Post lookup has to join the two tables. Is this a bad thing? No, not really. But depending on your DB architecture you may want to keep track of how many joins are being executed, how often, etc. These are all general performance concerns, and while most of the time it's nothing to break a sweat over, it's just good to be aware. Let's take a look at how we'd reference the title property via the Python interpreter and the generated SQL...

First we'll grab a record from the database with an ID = 1.

>>> p = Post.objects.get(id=1)
>>> p.title
u'moo'

Next let's take a look at what's going on under the hood. What SQL was just executed to build the resulting Post instance? Wait, no. Let's not. Did you just see that? I referenced the title property without having to go through the relationship the database created for us. There's the syntactic magic we're now gaining from QuerySet-Refactor. If you were using a ForeignKey or a OneToOneField to maintain the relationship you would have to do something like post.contentitembase.title, which actually makes a lot of sense because it's a foreign key in the database and the syntax represents that relationship. But in this example, as far as the model API knows, title is a member of Post and easily accessible. So let's take a look at the generated SQL.

>>> from django.db import connection
>>> connection.queries[-1]
{'time': '0.001', 'sql': u'SELECT "dizzy_contentitembase"."id", "dizzy_contentitembase"."title", 
"dizzy_contentitembase"."slug", "dizzy_contentitembase"."created_on", "dizzy_contentitembase"."updated_on", 
"dizzy_contentitembase"."publish_on", "dizzy_contentitembase"."tags", "dizzy_contentitembase"."status", 
"dizzy_contentitembase"."comment_status", "dizzy_posts"."contentitembase_ptr_id", "dizzy_posts"."teaser"
 FROM "dizzy_posts" INNER JOIN "dizzy_contentitembase" ON ("dizzy_posts"."contentitembase_ptr_id" = 
"dizzy_contentitembase"."id") WHERE "dizzy_posts"."contentitembase_ptr_id" = 1  ORDER BY 
"dizzy_contentitembase"."publish_on" DESC'}

Alright now, I see what's going on. There's that join right there: INNER JOIN "dizzy_contentitembase" ON ("dizzy_posts"."contentitembase_ptr_id" = "dizzy_contentitembase"."id"). So what do we get with MTI? We get the added benefit of intuitive syntax on the model, and we get separate database tables for parent and child, associated through a join. Is this good? Is this bad? Nah, we can discuss that some other time. Let's move on to Abstract Base Classes.
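
Actually, one quick aside before we do. If you want to poke at that join without firing up a Django project, here's a minimal sketch using Python's built-in sqlite3 module. It's only an approximation: the two tables are trimmed down to the columns we care about, the row values are made up to match the session above, and the SQL is hand-written rather than what Django generates.

```python
import sqlite3

# In-memory database with a trimmed-down version of the two MTI tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dizzy_contentitembase (
    id INTEGER NOT NULL PRIMARY KEY,
    title VARCHAR(100) NOT NULL
);
CREATE TABLE dizzy_posts (
    contentitembase_ptr_id INTEGER NOT NULL PRIMARY KEY
        REFERENCES dizzy_contentitembase (id),
    teaser TEXT NOT NULL
);
INSERT INTO dizzy_contentitembase (id, title) VALUES (1, 'moo');
INSERT INTO dizzy_posts (contentitembase_ptr_id, teaser) VALUES (1, '');
""")

# The child table on its own has no title column at all.
post_columns = [row[1] for row in conn.execute("PRAGMA table_info(dizzy_posts)")]

# So reading a post's title means joining child to parent.
row = conn.execute(
    """
    SELECT b.title, p.teaser
    FROM dizzy_posts AS p
    INNER JOIN dizzy_contentitembase AS b
        ON p.contentitembase_ptr_id = b.id
    WHERE p.contentitembase_ptr_id = ?
    """,
    (1,),
).fetchone()

print(post_columns)
print(row)
```

The point of the PRAGMA call is just to show that title lives on the parent table only, which is why the join can't be avoided with this layout.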

Abstract Base Classes

Sticking with the same theme, we'll continue to use the ContentItemBase and Post models as our example classes. This time we're going to see how ABC differs from MTI. It should be pretty straightforward. Let's take a look first at the ContentItemBase definition.

class ContentItemBase(models.Model):
  #TODO: document
  title = models.CharField(_('title'), max_length=100, unique_for_date="publish_on")
  slug = models.SlugField(_('slug'), prepopulate_from=('title',),)
  created_on = models.DateTimeField(_('created on'), auto_now_add=True, editable=False, )
  updated_on = models.DateTimeField(_('updated on'), editable=False)
  publish_on = models.DateTimeField(_('publish on'), )
  tags = TagField()
  status = models.IntegerField(_('status'), choices=enums.CONTENT_STATUS_CHOICES, default=enums.DEFAULT_CONTENT_STATUS, db_index=True)
  comment_status = models.IntegerField(_('comment status'), choices=enums.COMMENT_STATUS_CHOICES, default=enums.DEFAULT_COMMENT_STATUS, db_index=True)
  
  class Meta:
    abstract = True

Hmmm.... not much changed. Wrong. One very important thing changed. The inner Meta class definition now contains abstract = True. Previously we just used pass and skipped any custom Meta options. By setting this value you are telling Django something about this class, hence the reason it's defined in the inner Meta class. You're saying "Hey! This is an abstract class, and you know what, treat me like one, damn it!" So, it's a small change, but we'll soon see that it's a big one. Let's take a look next at the Post model...

class Post(ContentItemBase):
  #TODO: document  
  DEFAULT_INPUT_FORMAT = 'X'
  
  INPUT_FORMAT_CHOICES = (
  ('X','XHTML'),
  ('M','Markdown'),
  ('R','reStructuredText'),
  )

  teaser = models.TextField(_('teaser'), blank=True, null=False)

Errr...? This is exactly the same as the MTI example. Yes! Nothing changed on the child Post class. It too inherits from ContentItemBase, but we didn't have to do anything different here. So what's the big whooohaa then? Let's move to the database...

CREATE TABLE "dizzy_posts" (
    "id" integer NOT NULL PRIMARY KEY,
    "title" varchar(100) NOT NULL,
    "slug" varchar(50) NOT NULL,
    "created_on" datetime NOT NULL,
    "updated_on" datetime NOT NULL,
    "publish_on" datetime NOT NULL,
    "tags" varchar(255) NOT NULL,
    "status" integer NOT NULL,
    "comment_status" integer NOT NULL,
    "teaser" text NOT NULL,
    ...
);

Hello! There's no dizzy_contentitembase table? Nope. When using ABC, Django "flattens" the relationship between parent and child classes and essentially merges the two into the child's table in the backend datastore. What does this mean, Einstein? Correct, no joins when querying for your inherited members. Let's dive into the code...

First, let's grab a Post instance from the database.

>>> p = Post.objects.get(id=1)
>>> p.title
u'moo'

Good. So, we have the same nice, simple, intuitive property reference syntax here, and why wouldn't we? The title field being referenced is a column on the entity's own table and doesn't require any joining via a database relationship. Let's take a look at the resulting SQL.

>>> connection.queries[-1]
{'time': '0.000', 'sql': u'SELECT "dizzy_posts"."id", "dizzy_posts"."title", "dizzy_posts"."slug", 
"dizzy_posts"."created_on", "dizzy_posts"."updated_on", "dizzy_posts"."publish_on", "dizzy_posts"."tags", 
"dizzy_posts"."status", "dizzy_posts"."comment_status", "dizzy_posts"."teaser" FROM "dizzy_posts" WHERE 
"dizzy_posts"."id" = 1  ORDER BY "dizzy_posts"."publish_on" DESC'}

See that? No joins mom! Pretty slick. Not much more to say here other than, good job fellas! I like this one a lot.
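
If the difference between the two flavors still feels fuzzy, here's a toy sketch in plain Python of which columns land on which table. To be clear, this is not how Django builds its schema. ToyModel, is_abstract and columns_for are names I made up for illustration, and the only thing borrowed from the real output above is the <parent>_ptr_id naming.

```python
class ToyModel:
    """Hypothetical stand-in for models.Model; it contributes no columns."""
    fields = ()


def is_abstract(cls):
    # Check the class's own namespace only, so children of an abstract
    # parent are concrete unless they say otherwise.
    return cls.__dict__.get("abstract", False)


def columns_for(cls, top=True):
    """List the columns that would land on the table for `cls`."""
    pointers, inherited = [], []
    for parent in cls.__bases__:
        if parent is ToyModel:
            continue
        if is_abstract(parent):
            # ABC: the parent's fields are copied onto the child's table.
            inherited += columns_for(parent, top=False)
        else:
            # MTI: the parent keeps its own table; the child points at it.
            pointers.append(parent.__name__.lower() + "_ptr_id")
    own = list(cls.__dict__.get("fields", ()))
    if top and not pointers:
        # No parent table to point at, so this table gets its own id.
        return ["id"] + inherited + own
    return pointers + inherited + own


class AbstractBase(ToyModel):
    abstract = True
    fields = ("title", "slug")


class ConcreteBase(ToyModel):
    fields = ("title", "slug")


class PostABC(AbstractBase):
    fields = ("teaser",)


class PostMTI(ConcreteBase):
    fields = ("teaser",)


abc_columns = columns_for(PostABC)
mti_columns = columns_for(PostMTI)
base_columns = columns_for(ConcreteBase)
print(abc_columns)
print(mti_columns)
print(base_columns)
```

With an abstract parent the child's table carries everything; with a concrete parent the child's table carries a pointer plus its own fields, and the shared fields stay on the parent's table.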

Classic OneToOne Field

Before I let you go we need to take a look at the classic way we'd implement this solution. Well, there are two "classic" ways we could implement this. One would be to copy and paste each property definition into each model (talk about not being DRY) and the other is to create a direct relationship between the ContentItemBase and Post models. Sounds kinda like MTI, doesn't it? Well, it's very close but no cigar. I'll keep this one short, since most of you already know this design.

Again, we'll start by taking a peek at the ContentItemBase model definition. This is exactly the same as the MTI solution.

class ContentItemBase(models.Model):
  #TODO: document
  title = models.CharField(_('title'), max_length=100, unique_for_date="publish_on")
  slug = models.SlugField(_('slug'), prepopulate_from=('title',),)
  created_on = models.DateTimeField(_('created on'), auto_now_add=True, editable=False, )
  updated_on = models.DateTimeField(_('updated on'), editable=False)
  publish_on = models.DateTimeField(_('publish on'), )
  tags = TagField()
  status = models.IntegerField(_('status'), choices=enums.CONTENT_STATUS_CHOICES, default=enums.DEFAULT_CONTENT_STATUS, db_index=True)
  comment_status = models.IntegerField(_('comment status'), choices=enums.COMMENT_STATUS_CHOICES, default=enums.DEFAULT_COMMENT_STATUS, db_index=True)
  
  class Meta:
    pass

Now let's take a look at the Post model definition. Pay attention, because this one is a little bit different.

class Post(models.Model):
  #TODO: document
  DEFAULT_INPUT_FORMAT = 'X'
  
  INPUT_FORMAT_CHOICES = (
  ('X','XHTML'),
  ('M','Markdown'),
  ('R','reStructuredText'),
  )

  contentitembase = models.OneToOneField(ContentItemBase, primary_key=True)
  teaser = models.TextField(_('teaser'), blank=True, null=False)

See what we did here? We added a new property called contentitembase and defined it as a OneToOneField pointing at the ContentItemBase model. So what does that look like behind the scenes?

CREATE TABLE "dizzy_posts" (
    "contentitembase_id" integer NOT NULL UNIQUE PRIMARY KEY REFERENCES "dizzy_contentitembase" ("id"),
    "teaser" text NOT NULL,
    ...
);


CREATE TABLE "dizzy_contentitembase" (
    "id" integer NOT NULL PRIMARY KEY,
    "title" varchar(100) NOT NULL,
    "slug" varchar(50) NOT NULL,
    "created_on" datetime NOT NULL,
    "updated_on" datetime NOT NULL,
    "publish_on" datetime NOT NULL,
    "tags" varchar(255) NOT NULL,
    "status" integer NOT NULL,
    "comment_status" integer NOT NULL
);

So, we're very close to MTI's table schema here. What we have now is an association between the dizzy_posts and dizzy_contentitembase tables via the foreign key dizzy_posts.contentitembase_id (MTI named its column contentitembase_ptr_id). So yes, the schemas are nearly identical, but there are differences between MTI and OneToOne in how we get at the data. So let's take a peek at the generated SQL...

>>> post = Post.objects.get(pk=1)
>>> connection.queries[-1]
{'index': 16, 'time': '0.000', 'sql': u'SELECT "dizzy_posts"."contentitembase_id", "dizzy_posts"."teaser" FROM "dizzy_posts" WHERE "dizzy_posts"."contentitembase_id" = 1 '}
>>> post.contentitembase.title
u'foo'
>>> connection.queries[-1]
{'index': 17, 'time': '0.000', 'sql': u'SELECT "dizzy_contentitembase"."id", "dizzy_contentitembase"."title", 
"dizzy_contentitembase"."slug", "dizzy_contentitembase"."created_on", "dizzy_contentitembase"."updated_on", 
"dizzy_contentitembase"."publish_on", "dizzy_contentitembase"."tags", "dizzy_contentitembase"."status", 
"dizzy_contentitembase"."comment_status" FROM "dizzy_contentitembase" WHERE "dizzy_contentitembase"."id" 
= 1 '}

Now that's a little different. On the first line, where I retrieve the post, I chose to use the primary key (pk) value, and in this case that's the ContentItemBase ID rather than a separate Post ID. We set the Post's contentitembase property on the model to a OneToOneField and also defined the field to be the primary key. This properly enforces database integrity by allowing only one ContentItemBase to be associated with only one Post. For database retrieval we have a handful of options. There are a few ways we could add indexes on the Post table beyond the existing primary key, and if needed we could traverse the relationship by querying the ContentItemBase to retrieve the requested Post via its slug, for example, but that's not for this tutorial (although you can read more and review great examples in the documentation).

One important note not to overlook here: using the syntax above to retrieve the Post instance via its pk required an additional query to retrieve the title of the post. Again, this is just one of the performance concerns you need to keep in the back of your head when designing your models and therefore your database tables.
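
To make that extra query concrete, here's a rough sketch with Python's built-in sqlite3 module that counts round trips for the two-step lookup versus a single join. The fetch_one helper and the queries list are hypothetical stand-ins I wrote for this example (the list plays the role of connection.queries), the tables are cut down to a couple of columns, and the SQL is hand-written rather than Django's.

```python
import sqlite3

# In-memory database with a trimmed-down version of the OneToOne tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dizzy_contentitembase (
    id INTEGER NOT NULL PRIMARY KEY,
    title VARCHAR(100) NOT NULL
);
CREATE TABLE dizzy_posts (
    contentitembase_id INTEGER NOT NULL PRIMARY KEY
        REFERENCES dizzy_contentitembase (id),
    teaser TEXT NOT NULL
);
INSERT INTO dizzy_contentitembase (id, title) VALUES (1, 'foo');
INSERT INTO dizzy_posts (contentitembase_id, teaser) VALUES (1, '');
""")

queries = []  # hypothetical log, playing the role of connection.queries


def fetch_one(sql, params=()):
    """Run one statement, remember it, and return the first row."""
    queries.append(sql)
    return conn.execute(sql, params).fetchone()


# Two round trips: first the post, then the related base row on demand.
post = fetch_one(
    "SELECT contentitembase_id, teaser FROM dizzy_posts"
    " WHERE contentitembase_id = ?",
    (1,),
)
base = fetch_one(
    "SELECT title FROM dizzy_contentitembase WHERE id = ?",
    (post[0],),
)
lazy_title, lazy_query_count = base[0], len(queries)

# One round trip: fetch both rows together with a join.
queries.clear()
joined = fetch_one(
    "SELECT b.title, p.teaser FROM dizzy_posts AS p"
    " INNER JOIN dizzy_contentitembase AS b ON p.contentitembase_id = b.id"
    " WHERE p.contentitembase_id = ?",
    (1,),
)
joined_title, joined_query_count = joined[0], len(queries)

print(lazy_title, lazy_query_count)
print(joined_title, joined_query_count)
```

In Django itself, as far as I know, select_related() on the queryset is the usual way to ask for the related row up front instead of paying for it on first access, but check the docs for how it behaves with your version.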

Conclusion

What I didn't want to get into too much here today are the arguments for and against composition and inheritance. In conversations with other developers regarding this topic I've heard that there are issues with inheritance and bugs in the admin, but I can't speak to any of those right now, and I don't bring them up to cry wolf. I just want you to be aware. Spend time testing out the water. Go start a discussion on IRC.

For example, I need to ask some people about polymorphism here. What are the obstacles in dealing with parent/child relationships across the object graph? I assume that if I request a Parent but receive a Child via a Manager class, for example, I can somehow reference one from the other. But honestly, I haven't had to get into that level of granularity yet... so I have more testing to do myself. But the more people talk about it, the more people we'll start to see implementing it. I think the Django team did a terrific job here. Now, whether or not one solution is better than the other isn't for me to tell you. It's for you to find out on your own. If you're interested in follow-up reading, Eric Florenzano has written up a good post discussing his opinions on composition vs inheritance, and even goes into the concept of mixins in another post.

Join the discussion: leave your comment below.

Howdy,

Nice post (I like your icons on the right bar, too ... very clever)!

I was going to ask you what you thought the advantages were of taking the ABC route vs. the MTI route, but then I read your conclusion where you said you didn't want to get into it :-) (Let's assume there were no bugs w/ either).

I usually normalize the hell out of any db that crosses my path ... I'll admit that it might not be the best to do. I'm open to be convinced otherwise.

So, I'd feel a bit dirty taking the ABC route outlined here because it pushes those generic title, date, and whatever fields into each "child" table.

What do you think? A blog post for another time, maybe?

Posted by: Steve L

@Steve - actually I'm an ABC guy myself. It works for me so far on just about every level (admin works, haven't seen any bugs using the api, etc).

That being said I haven't pushed it beyond a one level parent/child hierarchy, but I believe issues start to arise once you extend that inheritance tree beyond the first node.

I see your case for normalization, but I've also had to go waaay back and denormalize the heck out of a database because I took it too far once. So yes, there's absolutely some pros and cons to both.

Thanks for the notes on the design. I have so much touching up and cleaning to do though; it'll get better and cleaner as it progresses, promise.

And yes, I do think this deserves another blog post or a screencast to take the discussion into further detail. Thanks for the response!

Posted by: kevin

Hey Kevin,

Not to be "that guy", (i.e. Pedantic Man) but my understanding was that OneToOne was somewhat deprecated in favor of ForeignKey(unique=True).

This would make sense to me, since OneToOne was sort of this weird middle ground between ForeignKey and MTI. If you look at the generated schema, the foreign key column in the child tables sort of reminds me of the way that tables for models with a OneToOne relationship would have a foreign key as their primary key.

If you just have a foreign key field, then the child model can still have an independent primary key. Not that it's much good (it's redundant) but theoretically it better fits the concept of the child model as a separate entity, rather than a specialization of the parent entity (which is what MTI delivers). I think OneToOne exists only because there wasn't MTI at the time. Just being speculative.

Posted by: Loren Davie

@loren - no, please be that guy.

I believe you are correct about OneToOne field being deprecated at some point in favor of ForeignKey. That being said I've used both solutions and chose OneToOne for this tutorial because of its similarity to MTI versus the ForeignKey alternative, for exactly the point you brought up - the option for the child's independent primary key.

But that's a very good point to reiterate about the child model possibly having its own independent primary key when using ForeignKey(unique=True). That's an excellent point actually, and sometimes I find myself needing that to emulate the child being its own "entity".

That's why I write these posts - to kick off a discussion, to maybe help educate someone else, and to better educate myself.

Posted by: kevin

What I haven't figured out is how inheritance affects deletion etc. If e.g. a Post is deleted, the ContentItemBase is probably also deleted. What if the ContentItemBase is deleted?

Another lack of understanding is how you can utilise the ContentItemBase for more than the creation of the child models. Is it also possible to access it directly to e.g. retrieve all content published between two dates? Then the fields would be different.

Posted by: hans

@hans, thanks for the response.
Ya know what - I think you just fired up some ideas for further tutorials on the subject. I'll make a series out of MTI & ABC in the coming week(s).

Posted by: kevin

Model Inheritance is a nice feature, but I am among those that don't care.

Code reuse through inheritance is no silver bullet. Given the considerations needed for any non-trivial database schema, I don't accept it as a great improvement.

I would never reuse a database model through inheritance without going over everything to reconsider the consequences in terms of performance, query options, etc. Inheriting rather than copy paste would be a minor difference.

I have been bitten quite a few times and have seen others bitten more than I care to count by the mistake of inheriting just because something seems similar. Sure there are cases where a proper is-a relationship is in place, but much too often the schema evolves to a point where you have to change the base-subclass relationship.

Without the ability to handle multi-field primary keys the Django ORM remains a tool for fairly banal databases.

Posted by: Henrik Vendelbo

meta info
Posted on: June 17, 08

abc, django, mti, querysetrefactor
