A while back I was writing a story for a game idea. I was trying out a web-based SaaS writing tool and started writing at my desk. After a few minutes I realized this was going to be a long writing session, so I switched over to my laptop to write on the couch. I was in the zone! All sorts of cool ideas just flowed onto the screen. It was awesome. After several hours, I eventually called it a night, closed my laptop, and went to bed. The next morning, I decided to continue writing, and got out my laptop again. All my work was gone. What happened?! Turns out, I had left the page open on my desktop, and it was periodically auto-saving the mostly empty text from that version, overwriting the content that was coming from my laptop’s version. All my work was lost, in part because of bad API design.
Frustrated by the experience, I decided to make my own writing tool. (Why switch to another tool when you can reinvent the wheel for fun?) From the very beginning, my plan was to design something that avoids the pitfalls I’ve seen so many APIs fall into, by implementing safeguards at the API level to prevent the client from hurting itself.
What went wrong?
In the above situation, the server, API, and client all committed grievous sins that together caused a bad situation. The server certainly could have saved the document history. And the client certainly could have stopped sending save requests every five minutes when nothing had changed. But the API could have worked around both of those problems and prevented other bad scenarios from occurring.
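To make the failure concrete, here is a minimal Python sketch of that lost-update scenario. The `NaiveServer` class and its `save` method are hypothetical stand-ins, not the real tool's API; they only model a server that accepts every write without checking what the client last saw.

```python
class NaiveServer:
    """Hypothetical server that accepts every save unconditionally."""

    def __init__(self) -> None:
        self.text = ""

    def save(self, text: str) -> None:
        # No check of what the client last saw: the last write always wins.
        self.text = text


server = NaiveServer()

# The desktop tab loaded the document early and never changed it again.
desktop_copy = "Chapter 1"
server.save(desktop_copy)

# The laptop does hours of real work and saves it.
laptop_copy = "Chapter 1\nHours of new story ideas..."
server.save(laptop_copy)

# The forgotten desktop tab auto-saves its stale copy.
server.save(desktop_copy)

# The laptop's text has been silently replaced by the stale copy.
lost_work = server.text == desktop_copy
```

Nothing here is a bug in the usual sense. Every call did what it was asked to do, which is exactly the problem.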
The Naive Solution
At first glance, the solution I’d pick seems simple. Add document versioning, have the server send the version number when it sends the document, and have the client send back the version number it believes to be the latest version as part of the update request. If the numbers match, fantastic, you’re good to go. If not, the server should reject the operation, and then the client has to figure out some way to inform the user and let them figure out a way to resolve the conflict.
/article/{articleId}:
  patch:
    parameters:
      - in: path
        name: articleId
        schema:
          type: integer
        required: true
      - lastVersion:
          description: The last known version of the document
          type: number
          required: true
          example: 42
      - body:
          description: The new text to replace the existing document
          type: string
          example: Hello dark and stormy world.
    responses:
      '200':
        description: OK
      '409':
        description: lastVersion does not match latest version
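On the server side, the check described above could look something like the following Python sketch. The `Article` class, `VersionConflict` exception, and `update_article` function are names I made up for illustration, and bumping the version by one on every write is an assumption about how versions are assigned.

```python
from dataclasses import dataclass


class VersionConflict(Exception):
    """Raised when the client's last known version is stale."""


@dataclass
class Article:
    text: str = ""
    version: int = 0


def update_article(article: Article, last_version: int, body: str) -> int:
    """Apply the update only if the client saw the latest version."""
    if last_version != article.version:
        raise VersionConflict(
            f"client has version {last_version}, server has {article.version}"
        )
    article.text = body
    article.version += 1  # assumption: versions are simple counters
    return article.version


article = Article(text="Hello world.", version=42)

# The laptop saw version 42, so its update is accepted.
new_version = update_article(article, 42, "Hello dark and stormy world.")

# The stale desktop tab still believes 42 is the latest and gets rejected.
try:
    update_article(article, 42, "")
    rejected = False
except VersionConflict:
    rejected = True
```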
But this is of course a totally custom solution, and for a problem as common as this, I bet there are some established conventions out there that can handle this problem. After all, there’s no need to reinvent the wheel. Let’s explore a few.
The No Versioning Solution
Let’s say adding the history of the document is too much to ask of the server or database design. Well, surely you’ve at least got a “last modified” timestamp in there somewhere, right? Turns out, there’s a solution we can use that is built into the HTTP spec: If-Unmodified-Since.
Essentially, all we have to do is send the time of the last update we know about, and if there have been no changes since that time, the server accepts the request to update.
/article/{articleId}:
  patch:
    parameters:
      - in: path
        name: articleId
        schema:
          type: integer
        required: true
      - name: If-Unmodified-Since
        in: header
        required: true
        description: Last-modified timestamp from the last response.
        type: string
        example: Sat, 29 Oct 1994 19:43:31 GMT
      - body:
          description: The new text to replace the existing document
          type: string
          example: Hello dark and stormy world.
    responses:
      '200':
        description: OK
      '412':
        description: Document was modified after If-Unmodified-Since
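Here is a rough Python sketch of how a server might evaluate that header. The function name `precondition_holds` is hypothetical, and I am leaning on the standard library's `email.utils.parsedate_to_datetime`, which is more lenient than the strict HTTP-date grammar, so treat the parsing as an approximation.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def precondition_holds(if_unmodified_since: str, last_modified: datetime) -> bool:
    """Return True when the update may proceed."""
    try:
        since = parsedate_to_datetime(if_unmodified_since)
    except (TypeError, ValueError):
        # An unparseable date means the header is ignored, so no check applies.
        return True
    if since.tzinfo is None:
        # Assumption: treat dates without a usable zone as UTC.
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so compare at that resolution.
    return last_modified.replace(microsecond=0) <= since


last_modified = datetime(1994, 10, 29, 19, 43, 31, tzinfo=timezone.utc)

# The client saw the latest version: the update may go through.
fresh = precondition_holds("Sat, 29 Oct 1994 19:43:31 GMT", last_modified)

# The client's copy is older than the stored one: respond with 412 instead.
stale = precondition_holds("Sat, 29 Oct 1994 19:00:00 GMT", last_modified)
```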
There are a couple problems with this approach, however.
First, the client can simply set the time to the end of the universe, which defeats the safety check of requiring a timestamp. You should always assume that at some point, someone using your API will misuse it. People don’t use APIs because they want to follow elegant design. They use APIs because they want to get something done. And if setting this property to the end of time means they don’t have to bother with managing last known times, it means they can get to their desired goal of updating the document faster and with less cognitive load. Your API should protect the client from making poor decisions whenever possible.
The second issue is how timestamps are handled in HTTP land. HTTP dates are strings whose smallest unit of time is seconds. If you have multiple updates within the same second, data loss will still happen as the second request overwrites the first. And if you’re following the standard, you MUST use standard HTTP dates:
A recipient MUST ignore the If-Unmodified-Since header field if the received field-value is not a valid HTTP-date.
No Milliseconds since Unix Epoch for you! Is it likely that two clients will try to update at exactly the same second? No, probably not. But when designing systems, why take that chance? We can do better.
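Both problems are easy to reproduce. The Python sketch below uses made-up timestamps: two saves half a second apart produce the same HTTP date, and a far-future date passes the "not modified since" comparison for any real save.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Two hypothetical saves, 500 ms apart, within the same second.
first_save = datetime(2024, 1, 1, 12, 0, 0, 100_000, tzinfo=timezone.utc)
second_save = first_save + timedelta(milliseconds=500)

# HTTP dates drop fractions of a second, so both saves look identical.
first_header = format_datetime(first_save, usegmt=True)
second_header = format_datetime(second_save, usegmt=True)
same_header = first_header == second_header

# A client that does not want to track timestamps can send a date so far
# in the future that every stored document counts as "unmodified since".
far_future = parsedate_to_datetime("Fri, 31 Dec 9999 23:59:59 GMT")
always_passes = second_save <= far_future
```

A server that only compares timestamps has no way to tell these cases apart from legitimate requests.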
ETags!
There’s another HTTP standard header we can use instead: ETag (stands for ‘entity tag’). The server sends out an ETag whose value somehow represents the last version of the document. How that representation is formed is up to you. You can use a simple version number, like in my naive solution, you can use Milliseconds since Unix Epoch, or you can use something more complex if that better suits your needs.
The client doesn’t send the value back in an ETag header. Instead, it sends it in a different header: If-Match. Here, the intent is made clear by the name: Only accept this update if there’s a match with the specified identifier.
/article/{articleId}:
  patch:
    parameters:
      - in: path
        name: articleId
        schema:
          type: integer
        required: true
      - name: If-Match
        in: header
        description: The ETag of the last known version of the document
        type: string
        required: true
        example: '"42"'
      - body:
          description: The new text to replace the existing document
          type: string
          example: Hello dark and stormy world.
    responses:
      '200':
        description: OK
      '412':
        description: If-Match does not match latest version
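As a rough illustration, a server-side handler for this could be sketched in Python as below. Everything here is an assumption for the sake of the example: the `Article` class, the `patch_article` function, deriving the ETag from a hash of the content, and answering a missing header with 428 Precondition Required. The sketch also compares a single tag only and ignores the `*` wildcard and tag lists that If-Match allows.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Article:
    text: str

    @property
    def etag(self) -> str:
        # One possible representation: a quoted, truncated content hash.
        digest = hashlib.sha256(self.text.encode("utf-8")).hexdigest()[:16]
        return f'"{digest}"'


def patch_article(
    article: Article, if_match: Optional[str], body: str
) -> Tuple[int, Dict[str, str]]:
    """Return (status code, response headers) for a conditional update."""
    if if_match is None:
        # Assumption: refuse unconditional writes outright.
        return 428, {}
    if if_match != article.etag:
        # The client's copy is stale: reject and report the current tag.
        return 412, {"ETag": article.etag}
    article.text = body
    return 200, {"ETag": article.etag}


article = Article("Hello world.")
seen = article.etag  # what both the laptop and the desktop tab received

# The laptop updates first and succeeds.
status_ok, headers = patch_article(article, seen, "Hello dark and stormy world.")

# The desktop tab still holds the old tag, so its auto-save is rejected.
status_stale, _ = patch_article(article, seen, "")
```

With a content hash, the client cannot guess a value that always passes, which closes the "end of the universe" loophole from the timestamp approach.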
This is essentially the same as my naive version, but using established conventions. There’s nothing really gained from my original way of doing things, so following the convention is preferable, since it’s a format people are already familiar with, and it makes the intent more clear.
Think about APIs
I’m a believer that a strong API helps foster a great user experience. Your API should be more than just a dumb communication protocol to let the client throw things into your database. It should consider both the intentions of the client and possible bad things the client might do. Don’t just accept anything because the client asks you – make sure they know what they’re asking you to do. It might be a bit more effort, but your end users will thank you for a smooth experience!