Writing a High Level Design
I’ve had many engineers ask me for guidance on this. I’ve had to write a lot of design documents over the years, and have come up with an approach that works for me.
First of all, I want to emphasize this is about high-level design. These kinds of documents provide a framework and guidance for driving the more detailed design and implementation of a feature/system. It’s not normally going to contain full database schema definitions or class definitions. I actually generally avoid writing up design documents at that level of detail because the details change so rapidly. I generally strive to have a document that can stand the test of time over six months to a year before it veers too far off course.
Start with Requirements
The first section I always have is an overall problem statement, and then a list of requirements. I strive to have these requirements defined as scenarios, use cases, or acceptance criteria. You can read me waxing poetic about requirements in this post.
What I usually do is get these reviewed and have solid agreement with stakeholders before I go much further.
Domain Model
In almost every case the next thing I work on is a domain model. This is inspired by Domain Driven Design. The two goals here are to get aligned with the domain experts on how they speak of things and how these things interact with each other.
I can’t emphasize enough how important this is. This establishes the foundation for the rest of the design, and it establishes the foundation for clear and accurate communication with the product owner and everyone else who will be using the system.
Try as much as possible to use the terms commonly used by domain experts, rather than inventing your own. As a counter-example, when we built a system to define insurance plans, we came up with the term “insurance plan boundary” to indicate a limit such as your deductible or annual maximum out of pocket. That term has consistently confused everyone, and it has been the source of bugs and serious misunderstandings.
I normally pull this together in a meeting with the product owner. I bring up a drawing tool like LucidChart and build the diagram on the fly. We have the requirements at hand to test the domain model. Also it’s not uncommon for the process of building the domain model to impact the requirements.
Here’s an example of one I might have pulled together with a product owner to define a domain for a survey feature.
I use the E/R diagram standards to indicate cardinality (one-to-many, many-to-many), as I find they are easier for non-technical folks to understand than UML. But I pull in the triangle notation from UML to indicate the is-a-kind-of relationship such as the different types of questions above.
Once this has settled, I take this diagram, put it into my design document (with a link to the original diagram so it can be updated), and then add a set of bullet points below providing a summary of each of the concepts introduced in the diagram. There is nothing so frustrating as a diagram with nothing else, where you are left to figure it out for yourself.
As an example:
- An Answer is a specific answer to a survey Question by a particular Person. Note that depending on the type of the Question, the valid values for an Answer may differ.
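To make the bullet above concrete, here is a minimal sketch of how those concepts might translate into code. The two question types, the field names, and the `is_valid_answer` method are all hypothetical and chosen only for illustration; the real types come from your own diagram.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Question:
    """Base concept; each concrete question type is a kind of Question."""
    id: str
    text: str

    def is_valid_answer(self, value: object) -> bool:
        raise NotImplementedError


@dataclass
class FreeTextQuestion(Question):
    """Hypothetical question type: any non-blank string is valid."""

    def is_valid_answer(self, value: object) -> bool:
        return isinstance(value, str) and value.strip() != ""


@dataclass
class MultipleChoiceQuestion(Question):
    """Hypothetical question type: the value must be one of the choices."""
    choices: List[str]

    def is_valid_answer(self, value: object) -> bool:
        return value in self.choices


@dataclass
class Person:
    id: str


@dataclass
class Answer:
    """A specific answer to a Question by a particular Person."""
    question: Question
    person: Person
    value: object

    def __post_init__(self) -> None:
        # Valid values depend on the type of the Question.
        if not self.question.is_valid_answer(self.value):
            raise ValueError(f"invalid answer for question {self.question.id}")
```

I wouldn't normally put class definitions like this in the design document itself. The point is only that each bullet should be precise enough that someone could write this down without guessing.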
This is a living document
The domain model is going to evolve as more use cases are uncovered, or as the dev team tries to implement it and realizes that it may look good on paper but doesn’t make sense when you try to get it to actually work. Have ongoing conversations with the product owner and keep updating the model as needed. You won’t regret it.
System Diagram
Most of the time it’s important to lay out the actual components that will be built to implement this design. Most of us are doing microservices these days, so this normally shows what services will be built or invoked as part of implementing the use cases. It’s not unusual for you to create a new service that represents the new domain. How big to make a service is a topic for another day, but in general you want to follow the guideline of having a new service for a given aggregate. In the example above it’s pretty clear the aggregate root is a Survey, and pretty much everything else falls under that.
You want to have a definition of a Person in your service that’s relevant to surveys, but there is likely a separate model of a Person that is the “source of truth” for people in your system, something like an account or user service, and your Person entity likely has a field storing the external id for that external model.
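As a rough sketch of these two ideas, the snippet below shows a Survey acting as the aggregate root and a survey-local Person that only stores a reference to the external record. The field name `external_account_id` and the method names are assumptions made for illustration, not a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Person:
    """Survey-local view of a person, not the source of truth."""
    id: str                   # id local to the survey service
    external_account_id: str  # hypothetical field: id of the record in the account/user service


@dataclass
class Survey:
    """Aggregate root: changes to answers go through the Survey."""
    id: str
    question_ids: Tuple[str, ...]
    answers: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def record_answer(self, person: Person, question_id: str, value: str) -> None:
        # The root enforces the invariants for everything beneath it.
        if question_id not in self.question_ids:
            raise KeyError(f"question {question_id} is not part of survey {self.id}")
        self.answers[(person.id, question_id)] = value

    def answers_for(self, person: Person) -> Dict[str, str]:
        return {q: v for (p, q), v in self.answers.items() if p == person.id}
```

Keeping only the external id means the survey service never has to duplicate or synchronize the rest of the account data.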
But I digress. The main thing is to show what the components are that will need to be built or interacted with to support this design. A UML component diagram works nicely for this.
Note that you don’t always need this. If it’s just one service for example, you could just say “we’re going to build a new survey service” and be done with it. Don’t give yourself more work than you need to.
Exercise the Domain Model and Components
So, you have requirements in the form of scenarios or use cases, and you have a domain model and a set of components. Your scenarios are your test cases for this model and components. These scenarios also allow the necessary APIs you need to build to emerge. I strongly recommend not defining a bunch of APIs ahead of time. Let them be derived from exercising your use cases.
The way I usually do this is by writing a separate paragraph for each scenario, showing how the system will interact and behave for that scenario.
The judicious use of sequence diagrams is an excellent tool for this kind of exploration. But I strongly suggest not using LucidChart for this; these diagrams change a lot and changing sequence diagrams in LucidChart is a miserable affair.
Instead, I highly recommend using WebSequenceDiagrams. They have a plugin for both Confluence and Google Docs that makes it very easy to embed live diagrams. Here is an example of one…
Here’s the text representation:
User->Survey Service: POST /survey?definition_id=[id]
Survey Service->Account Service: verify user
Account Service->Survey Service: OK
Survey Service->User: CREATED [id]
User->Survey Service: POST /survey/[id]/answer?question_id=[id] {data}
Survey Service->User: CREATED [id]
Here’s what WebSequenceDiagrams converts it into (you can choose different styles):
You can see how easy it is to tweak the text and get a new diagram. This easy flexibility is so important as you’re trying to figure out how your system works. +1 to the folks who built this, thanks!
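If it helps to go one step further, a scenario like the one above can also be exercised with a throwaway in-memory sketch. Everything here is hypothetical (the class names, `verify_user`, the integer ids); it only mirrors the two calls in the diagram and lets the API shape emerge from the scenario rather than being defined up front.

```python
import itertools
from typing import Dict, Set


class FakeAccountService:
    """Stand-in for the account service; knows a fixed set of users."""

    def __init__(self, known_users: Set[str]) -> None:
        self._known_users = known_users

    def verify_user(self, user_id: str) -> bool:
        return user_id in self._known_users


class SurveyService:
    def __init__(self, accounts: FakeAccountService) -> None:
        self._accounts = accounts
        self._ids = itertools.count(1)
        self._surveys: Dict[int, dict] = {}

    def create_survey(self, user_id: str, definition_id: str) -> int:
        """Mirrors: POST /survey?definition_id=[id]"""
        if not self._accounts.verify_user(user_id):
            raise PermissionError(f"unknown user: {user_id}")
        survey_id = next(self._ids)
        self._surveys[survey_id] = {
            "definition_id": definition_id,
            "user_id": user_id,
            "answers": {},
        }
        return survey_id

    def add_answer(self, survey_id: int, question_id: str, data: str) -> int:
        """Mirrors: POST /survey/[id]/answer?question_id=[id] {data}"""
        survey = self._surveys[survey_id]
        answer_id = next(self._ids)
        survey["answers"][answer_id] = (question_id, data)
        return answer_id


service = SurveyService(FakeAccountService({"alice"}))
survey_id = service.create_survey("alice", "def-1")
answer_id = service.add_answer(survey_id, "q1", "yes")
```

This is not production code, and I would not put it in the document. It is just another cheap way to check that the scenario hangs together.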
Don’t Forget the Error Cases
I wrote a whole separate post on this, but think about what happens when things go wrong. How should things respond? How can you be resilient to failure? How can you avoid failure altogether?
What I’ve found is that often the analysis of failure scenarios has a deep impact on your design. You don’t want to discover systemic design flaws after the fact! Designs that only consider the case where everything works are what I call “happy path” designs, and they never end well…
It’s particularly important to think about the semantics of partial failures and retries. What happens if someone posts the same answer twice (for example, they’re not getting a response so they click submit over and over again)? What happens if the user decides not to finish the survey? How do you know they’ve finished the survey? How do you handle a failure where you can’t store the survey changes?
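For the duplicate-submit question, one possible answer is to make the operation idempotent. The sketch below assumes there is at most one answer per question per survey, so a repeated submit returns the original answer id instead of creating a second record. That rule, the `AnswerStore` name, and the `(answer_id, created)` return shape are all assumptions for illustration; a client-supplied idempotency key would be another option.

```python
from typing import Dict, Tuple


class AnswerStore:
    """In-memory sketch of idempotent answer submission."""

    def __init__(self) -> None:
        self._next_id = 1
        # (survey_id, question_id) -> (answer_id, data)
        self._by_key: Dict[Tuple[int, str], Tuple[int, str]] = {}

    def submit(self, survey_id: int, question_id: str, data: str) -> Tuple[int, bool]:
        """Return (answer_id, created).

        A repeat submit for the same survey and question reuses the
        existing answer id and reports created=False.
        """
        key = (survey_id, question_id)
        existing = self._by_key.get(key)
        if existing is not None:
            answer_id, _ = existing
            # Keep the latest data but do not create a duplicate record.
            self._by_key[key] = (answer_id, data)
            return answer_id, False
        answer_id = self._next_id
        self._next_id += 1
        self._by_key[key] = (answer_id, data)
        return answer_id, True

    def get(self, survey_id: int, question_id: str) -> str:
        return self._by_key[(survey_id, question_id)][1]
```

Whether a repeat should overwrite, be rejected, or be silently ignored is exactly the kind of decision to settle with the product owner while you are still on paper.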
Iterate
You will likely notice that as you try to exercise the scenarios, you find problems with the domain or the components. That’s the beauty of it. Go back and tweak things, bringing questions back to the product owner or other experts as needed.
Get it reviewed. Walk through it with people you trust. Review one-on-one with folks whose support you need. Use it as a way to communicate to management what it is you’re building and why. They will all give you feedback, and you work to incorporate it.
My experience has been that, by the time you’ve done all this, you are on very solid footing, and you’re ready to start building this thing with good confidence and a lightness in your step.