Domain events FAQ
A colleague asked me a bunch of great questions about best practices for emitting domain events, and I thought I’d share my answers as an FAQ. These are my opinions, based on my own experience, and I would very much love your thoughts and input.
Q: Should you put more than one event type into the same topic?
A: No, for a few reasons. Consumers would need special logic to identify the type of each message, and a consumer that only cares about one type would waste a lot of processing power filtering out the others. Also, if one type of message has high volume, those messages can get in the way of the other types and hurt their latency.
Q: Should you publish data from specific tables or publish a domain-level document that may span multiple tables?
A: Your tables are an internal implementation detail, so generally it’s not a good idea to publish these. The view you want to present to the outside world is of your aggregate domain JSON document that contains multiple entities, identified by the id of the aggregate root.
For example, you wouldn’t have one event for order line items and another event for orders. You would publish the order as a single document containing all the line items. If you publish table-level data, then it’s much harder to refactor your table schema. Also, if you publish pieces of an aggregate separately, then the consumer needs to know how to “piece them together” into a single unit, because the unit of transaction is the aggregate, not individual tables.
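To make the order example concrete, here is a minimal sketch of an event that carries the whole aggregate as one JSON document. The field names, the `OrderUpdated` type, and the `key` field are assumptions made up for illustration, not a prescribed schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class LineItem:
    sku: str
    quantity: int
    unit_price_cents: int


@dataclass
class Order:
    order_id: str  # id of the aggregate root
    customer_id: str
    status: str
    line_items: list[LineItem] = field(default_factory=list)


def order_event(order: Order) -> str:
    """Serialize the whole aggregate, line items included, as one document."""
    return json.dumps(
        {"type": "OrderUpdated", "key": order.order_id, "order": asdict(order)}
    )


order = Order(
    "order-1001",
    "cust-7",
    "PLACED",
    [LineItem("sku-1", 2, 1500), LineItem("sku-2", 1, 999)],
)
event = json.loads(order_event(order))
print(event["key"], len(event["order"]["line_items"]))  # order-1001 2
```

The consumer gets the order and its line items together, keyed by the aggregate root id, and never sees how the tables behind them are laid out.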
Q: When an aggregate changes, do you publish an event that contains the entire aggregate or just the pieces that have changed?
A: It is much better to publish the entire aggregate, for a few reasons. First, some changes are not idempotent if you publish only the change (such as “add line item”): applying that change twice is not the same as applying it once, so if someone retries a message you’ll end up in the wrong state.
A second reason is that the consumer may not have the previous state, and so won’t know how to apply the change.
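The idempotency point can be shown with a small sketch. The two handlers and the event shapes below are hypothetical; the only thing being illustrated is what happens when the same event is delivered twice.

```python
import copy


def apply_delta(state, event):
    """Apply an 'add line item' change event to a copy of the state."""
    new_state = copy.deepcopy(state)
    new_state["line_items"].append(event["line_item"])
    return new_state


def apply_snapshot(state, event):
    """Replace the state with the full aggregate carried by the event."""
    return copy.deepcopy(event["order"])


initial = {"order_id": "order-1001", "line_items": []}
item = {"sku": "sku-1", "quantity": 2}

delta_event = {"type": "LineItemAdded", "line_item": item}
snapshot_event = {
    "type": "OrderUpdated",
    "order": {"order_id": "order-1001", "line_items": [item]},
}

# Deliver each event twice, as a retry would.
after_delta = apply_delta(apply_delta(initial, delta_event), delta_event)
after_snapshot = apply_snapshot(
    apply_snapshot(initial, snapshot_event), snapshot_event
)

print(len(after_delta["line_items"]))     # 2: the retry duplicated the line item
print(len(after_snapshot["line_items"]))  # 1: the retry changed nothing
```

The snapshot handler also never reads the previous state, so a consumer that starts with nothing still ends up correct.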
Another good reason is specific to Kafka: you can create a “compacted” topic, which is guaranteed to retain at least the latest event for each key (e.g. for each order id). But if that event contains only the fields that have changed, you have no way to recover the full data set for that entity.
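Here is a rough simulation of what compaction leaves behind. It uses a plain Python list as the log and is not the Kafka API; in a real cluster compaction is enabled with the topic setting `cleanup.policy=compact` and runs in the background. The order fields are made up for illustration.

```python
def compact(log):
    """Keep only the latest value per key, as log compaction eventually does."""
    latest = {}
    for key, value in log:
        latest[key] = value
    return latest


# Every event carries the full aggregate.
full_log = [
    ("order-1", {"status": "PLACED", "total_cents": 3000}),
    ("order-2", {"status": "PLACED", "total_cents": 500}),
    ("order-1", {"status": "SHIPPED", "total_cents": 3000}),
]

# The last event for order-1 carries only the field that changed.
delta_log = [
    ("order-1", {"status": "PLACED", "total_cents": 3000}),
    ("order-2", {"status": "PLACED", "total_cents": 500}),
    ("order-1", {"status": "SHIPPED"}),
]

print(compact(full_log)["order-1"])   # {'status': 'SHIPPED', 'total_cents': 3000}
print(compact(delta_log)["order-1"])  # {'status': 'SHIPPED'}
```

With change-only events, the total for order-1 is gone once the older record has been compacted away.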
Q: Should I publish events directed to a particular consumer?
A: When you’re first building an event, it’s generally intended for a specific use case, and you tend to write it for that use case. But you should expect to pick up more and more consumers over time. It’s similar to writing an API: you don’t want to be too generic up front, and you want to drive your contract from what a real consumer needs rather than making stuff up. At the same time, you need to keep in mind that the event will evolve, so be careful about tuning it too tightly to a specific consumer. In particular, don’t let concepts from the consumer’s domain “pollute” your domain.
Q: When I need to change the data or semantics of an event, how do I do this without breaking consumers?
A: I’m glad you’re worried about breaking consumers. Not everyone thinks about that. You are building a contract, just as with an API, and you need to make sure all changes are fully backward- and forward-compatible. If you use Avro with a schema registry (such as Confluent Schema Registry), the registry can check each new schema version against the compatibility rules you configure and reject changes that break them. If you absolutely have to make an incompatible change, then just like with an API, you create a new version by publishing to a new topic, and consumers can migrate to that new topic over time. In the meantime you’ll need to publish to both the old and new topics, just like supporting multiple versions of an API.
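As a rough illustration of what a compatible change looks like from the consumer side, the sketch below uses plain dictionaries rather than Avro. The `currency` and `gift_note` fields and the `"USD"` default are assumptions invented for the example. The idea is the same one Avro relies on: a field added later needs a default, and a reader skips fields it does not know.

```python
def read_order(event: dict) -> dict:
    """Read only the fields this consumer knows, with defaults for optional ones."""
    return {
        "order_id": event["order_id"],             # present in every version
        "status": event["status"],                 # present in every version
        "currency": event.get("currency", "USD"),  # added later; assumed default
    }


# An event written before the currency field existed.
v1_event = {"order_id": "order-1", "status": "PLACED"}

# An event from a newer producer that added two fields.
v2_event = {
    "order_id": "order-1",
    "status": "PLACED",
    "currency": "EUR",
    "gift_note": "hi",
}

print(read_order(v1_event))  # old data still reads, using the default
print(read_order(v2_event))  # the unknown gift_note field is ignored
```

Removing `status` or changing its type would break this reader, which is the kind of change that calls for a new topic.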
Q: You say an event should have the entire aggregate. But it seems like it’s possible for the event data to get really huge. What do we do then, and how big is too big?
A: A quick search shows that RabbitMQ messages should be no larger than 128MB, and that the default maximum message size for Kafka is 1MB, although it can be increased. I also found one team that increased it to 5MB with no problems. If you have a single domain aggregate that is getting bigger than 5MB, to me that’s a smell that maybe your domain design needs to be rethought.
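One way to catch oversized aggregates early is to measure the serialized event before publishing it. This is a minimal sketch: the budget constant and function names are hypothetical, and the real size depends on whichever serializer and compression you actually use.

```python
import json

# Assumed budget, roughly the 1MB Kafka default mentioned above.
MAX_EVENT_BYTES = 1_000_000


def encoded_size(event: dict) -> int:
    """Size in bytes of the event as compact UTF-8 JSON."""
    return len(json.dumps(event, separators=(",", ":")).encode("utf-8"))


def check_size(event: dict, limit: int = MAX_EVENT_BYTES) -> int:
    """Return the encoded size, or raise if the event is over the budget."""
    size = encoded_size(event)
    if size > limit:
        raise ValueError(f"event is {size} bytes, over the {limit}-byte budget")
    return size


small = {"order_id": "order-1", "line_items": [{"sku": "sku-1", "quantity": 2}]}
print(check_size(small))
```

Logging the size rather than only rejecting at the limit gives you a warning well before an aggregate grows into the range where the design needs a second look.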
Q: When your domain event refers to an aggregate in another domain, should it include that aggregate’s data or just a reference to it?
A: In general you should include only the data for your aggregate. But some of that data may be derived from data in another aggregate. For example, the Order aggregate may have product details that it obtained from the Product aggregate.
However, there are some situations where it may make more sense to include a reference, such as a containment relationship (a Product contains other Products). This means the consumer needs to either (a) retain information about all Products, so that it can cross-reference when it needs to, or (b) confidently and reliably call the owning service to get the information about the other product. The second is not recommended, because it increases coupling and also forces the owning service to retain history about every entity for an unknown length of time, since some client may ask about an entity from a long time ago.
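Option (a) might look something like the sketch below, where the consumer keeps its own copy of every product event it has seen and resolves references locally. The `component_ids` field and the handler names are hypothetical.

```python
# Local copy of every product this consumer has seen, keyed by product id.
products = {}


def on_product_event(event):
    """Store the latest version of the product carried by the event."""
    products[event["product_id"]] = event


def resolve_components(product_id):
    """Return the names of the products contained in the given product."""
    product = products[product_id]
    return [products[ref]["name"] for ref in product.get("component_ids", [])]


on_product_event({"product_id": "p-1", "name": "Keyboard"})
on_product_event({"product_id": "p-2", "name": "Mouse"})
on_product_event(
    {"product_id": "p-3", "name": "Desktop bundle", "component_ids": ["p-1", "p-2"]}
)

print(resolve_components("p-3"))  # ['Keyboard', 'Mouse']
```

A real consumer would also have to decide what to do when a referenced product has not arrived yet; this sketch would simply raise a `KeyError`.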
Q: How do we guarantee that updating the database and publishing the event are consistent? How are we guaranteed that either both happen or neither?
A: This is a well-known challenge: the data stored in the database that owns the source of truth for a domain can become inconsistent with the domain event data. I have seen two solutions.
One is to write changes to a local events table in the same database, in the same transaction that updates the domain tables. For example, when a product definition changes, you write the event to a product_events table and update the product within a single transaction.
You would then have a separate process read from the product_events table and publish to the products topic. After it publishes an event, it marks that event as published in the product_events table, so after a crash and recovery it knows where to resume. Note that if it crashes after publishing but before marking the row, the event will be published again, so consumers need to tolerate duplicates.
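The events-table approach can be sketched with SQLite from the Python standard library. The table layout, the `published` flag, and the `publish` callback standing in for a real producer are all assumptions made for illustration.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE product_events (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
    """
)


def save_product(product_id, name):
    """Update the domain table and the events table in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT OR REPLACE INTO products (id, name) VALUES (?, ?)",
            (product_id, name),
        )
        conn.execute(
            "INSERT INTO product_events (payload) VALUES (?)",
            (json.dumps({"id": product_id, "name": name}),),
        )


def relay(publish):
    """Publish unpublished events in order, marking each one once it is sent."""
    rows = conn.execute(
        "SELECT seq, payload FROM product_events WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, payload in rows:
        publish("products", payload)  # a crash right after this means a re-send
        with conn:
            conn.execute(
                "UPDATE product_events SET published = 1 WHERE seq = ?", (seq,)
            )


sent = []


def fake_publish(topic, payload):
    """Stand-in for a real producer: just record what would be sent."""
    sent.append((topic, json.loads(payload)))


save_product("p-1", "Keyboard")
save_product("p-1", "Mechanical keyboard")
relay(fake_publish)
relay(fake_publish)  # second run finds nothing left to send
print(len(sent))     # 2
```

Because the event row and the product row commit together, there is never a saved product without its event, or an event for a change that was rolled back.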
The other approach is to write only to the topic, and have a consumer that reads from the topic and updates the domain tables. That consumer can even run in a different service. This is the foundation of event sourcing and the Command Query Responsibility Segregation (CQRS) pattern.
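A toy version of this second approach is sketched below, with a Python list standing in for the topic and a dictionary standing in for the domain tables. The `ProductRenamed` event and the class and function names are hypothetical.

```python
# Stand-in for the products topic: an append-only list of events.
topic = []


def handle_rename(product_id, new_name):
    """The write side only appends an event; it does not touch the tables."""
    topic.append({"type": "ProductRenamed", "id": product_id, "name": new_name})


class ProductProjection:
    """Consumer that builds the queryable view by reading the topic in order."""

    def __init__(self):
        self.rows = {}   # stand-in for the domain tables
        self.offset = 0  # position of the next event to read

    def catch_up(self, log):
        """Apply every event that has not been read yet."""
        for event in log[self.offset:]:
            self.rows[event["id"]] = {"name": event["name"]}
            self.offset += 1


handle_rename("p-1", "Keyboard")
handle_rename("p-1", "Mechanical keyboard")

view = ProductProjection()
view.catch_up(topic)
print(view.rows)  # {'p-1': {'name': 'Mechanical keyboard'}}

# A second consumer, perhaps in another service, rebuilds the same view
# by replaying the topic from the start.
replica = ProductProjection()
replica.catch_up(topic)
print(replica.rows == view.rows)  # True
```

There is only one write, the append to the topic, so there is nothing to keep consistent with it. The trade-off is that the tables lag behind the topic until the consumer catches up.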