One error that I see over and over again with APIs is improper usage of HTTP status codes. When it comes to web sites, most people understand the most common status codes (200, 404 and 500). However, a lot of people are not 100% sure how to translate HTTP status codes to the API world, or which other status codes to use and when.
Part of the confusion is natural and expected, since HTTP status codes were originally created with web sites in mind and not APIs. Besides this, there is a fair number of them. A huge part of end users, and engineers as well, haven't seen most of them or encountered them day to day. Due to all of this, I usually see three common wrong patterns in the usage of HTTP status codes.
If we are aware of only the 200, 404 and 500 status codes, then it is a given that we will only use them. I would suggest that everyone goes through the list of all HTTP status codes from time to time and reads it, especially when implementing some behavior, to check which HTTP status code makes the most sense for that type of response from our API. By doing this we will enrich our knowledge, which is always a good thing, and also build better products, which is also always a good thing.
If we are inconsistent in our usage of HTTP status codes, it probably means that we have other inconsistencies in the behavior of our API. Whenever there is inconsistency in the behavior of any application, it means that not everyone is on the same page about how things should be done. The way to solve this is to have an agreed way of working, good documentation and, in the case of an API, a proper specification that is ruthlessly enforced. When there is a good agreement on the API specification and everyone follows it, a lot of potential problems will be caught early on.
Using the wrong HTTP status code for a behavior shows a lack of understanding of HTTP status codes and casts doubt on the quality of the product in the first place. Usually this happens when company standards are not defined or are incorrect, or when people do not understand which status code should be used at which point. We will not go over all of the status codes in this post, just the ones that we need in most cases.
This is the easiest of them all: we should just use status code 200. However, keep in mind that there are a lot of 2xx status codes, and 200 isn't the only one that means the request was processed correctly. We should use HTTP status code 200 only when the request was successful and there is data sent back in the response.
This is one of those situations that is often handled improperly. If a good request is received and everything that needed to happen has happened, but there is no data to be sent back, HTTP status code 204 should be used, and not 200. Remember that if a 204 status code is sent, there cannot be a response body.
A common usage for status 204 is with requests that insert or update data, where the API is not expected to return the inserted or updated record. These are the so-called fire & forget requests. In this case we use 204 to let the consumer know that the request was received and that it has been processed, or will be at a later stage.
For example, a subscription to a newsletter:
$ curl -X POST <API>/subscribe -d "email=john.doe@itshark.xyz"
In this case, as long as the consumer knows that the request was received and will be processed at some point in time, all is good. There is no need for any feedback, so 204 is a perfect match.
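To make the fire & forget idea a bit more concrete, here is a minimal sketch of what the server side could look like. The function name `handle_subscribe` and the in-memory list `pending_subscriptions` are made up for illustration; a real API would use its framework's request and response objects and a real queue or database.

```python
from http import HTTPStatus

# Hypothetical in-memory queue standing in for real background processing.
pending_subscriptions = []


def handle_subscribe(form):
    """Accept a newsletter subscription and answer without a response body."""
    # Remember the address; the actual processing can happen at a later stage.
    pending_subscriptions.append(form["email"])
    # 204 No Content: the request was received, there is nothing to send back.
    return HTTPStatus.NO_CONTENT, b""


status, body = handle_subscribe({"email": "john.doe@itshark.xyz"})
print(int(status), body)
```

The important part is the pair that is returned: status 204 and an empty body, never 204 with some payload attached.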
If you are designing an API and are pedantic about use cases and status codes, you will hit one interesting use case: the request for retrieving data is valid, but there is no data to return to the client.
For example, something like this:
$ curl "<API>/persons?first_name=jflajflajfla"
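We can approach this use case from two sides: return 200 with an empty object/list in the response body, or return 204 with no response body at all.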
There are pros and cons to both options. Some people argue that a 200 status code with an empty object/list in the response should be sent, since the front end team can then check for status 200 and show data only if the response contains some. The assumption is that the front end will check whether the object contains data or whether the list is empty. Some well-known websites took this approach.
On the other side there are people like me, who think that since there is no data, there is also no point in creating and sending an empty object/list. Using status code 204 instead of 200 doesn't make a big difference for the front end of a website or application. Checking that the status is 200 and then checking whether there is data, in order to choose which code path to execute, is similar to checking whether the status is 200 or 204 and executing the same code. In terms of coding effort and execution cost there isn't really a big difference. The upside of using 204 is that we will not be sending empty objects across the wire, so we save a little time and network bandwidth, and caches and other intermediaries generally handle 204 responses well anyway.
No matter which course of action you and your team take, 200 or 204, make sure that it is agreed on by the whole team, documented well and followed through. In the end both approaches work and exist in the real world.
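Whichever convention is chosen, the check on the consumer side stays small. The sketch below is only an illustration of that point, written in Python rather than front end code; `extract_persons` is a hypothetical helper, and it assumes the API answers with a JSON list when there is data.

```python
import json


def extract_persons(status, body):
    """Return the list of persons, whichever "no data" convention the API uses."""
    if status == 204:
        # 204 never carries a body, so there is nothing to parse.
        return []
    if status == 200:
        # With the 200 convention the list itself may simply be empty.
        return json.loads(body)
    raise ValueError(f"unexpected status {status}")


# Both conventions end up in the same code path for "nothing found".
for status, body in [(204, b""), (200, b"[]")]:
    persons = extract_persons(status, body)
    print("no results" if not persons else persons)
```

Either way the caller ends up with an empty list for the "no data" case, which is why the choice is mostly about agreement within the team and not about effort.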
It is common practice that some things are publicly available, while other things need to be protected. If someone or something tries to access protected or confidential data without providing credentials, or the credentials provided are incorrect, the response should be 401.
The data sent back in the response will vary depending on the type of API you are building and what it is used for. A balance needs to be found between providing enough info for end users to understand the error of their ways and not helping bad actors with too much info. My suggestion would be to either send no data in the response body, or send a generic message such as "Invalid credentials" for all situations.
Another thing that I see over and over again is that we not only want to protect some of the exposed data, but also want some sets of data to be accessible only to certain roles. When someone provides valid credentials but requests data that this user should not be able to access, the 403 status code should be used.
One of the common errors that I encounter is that status codes 401 and 403 are mixed up, or only one of them is used for both cases. There isn't really a need for that, since they are intended for different situations.
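The difference between the two can be sketched in a few lines. Everything here is hypothetical: the `USERS` table, the token values and the `authorize` function exist only for illustration, and a real API would validate credentials through its own authentication layer.

```python
from http import HTTPStatus

# Hypothetical credential store: token -> roles granted to that user.
USERS = {
    "token-admin": {"admin", "user"},
    "token-user": {"user"},
}


def authorize(token, required_role):
    """Pick the status code for a request to a protected resource."""
    roles = USERS.get(token)
    if roles is None:
        # Missing or incorrect credentials: we do not know who is calling.
        return HTTPStatus.UNAUTHORIZED  # 401
    if required_role not in roles:
        # Valid credentials, but this user may not access the data.
        return HTTPStatus.FORBIDDEN  # 403
    return HTTPStatus.OK


print(int(authorize(None, "admin")))          # no credentials
print(int(authorize("token-user", "admin")))  # known user, wrong role
```

The first check answers "who are you?" and the second one answers "are you allowed to do this?", and each question gets its own status code.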
Nowadays you don't see a 404 page on web sites too often. It is either replaced by some nice page saying that the page you were looking for doesn't exist, or some other page is shown as the best guess of what you were looking for.
In the world of APIs it is still encountered often, but not always in the right places. HTTP status code 404 should be used when the URL that was hit is incorrect, in other words when the endpoint doesn't exist.
There are a few edge cases that should be taken into account.
$ curl <API>/jflajflajlfjalf
This should result in 404, since the URL does not match any existing endpoint.
Where it goes wrong is when people start sending back 404 in cases where there is no data, while the URL is correct.
For example, this request:
$ curl "<API>/persons?first_name=jflajflajflajfla"
We covered this use case above and, as stated, we either return 204, or 200 with an empty object/list in the response. Status 404 can't be used, since the URL is correct and the endpoint exists.
Let us now look at a slightly different example. Let us assume that we can request a person by ID with a request like this:
$ curl <API>/persons/123
In this case the person ID is part of the URL of the request. If there is data, we should send the data back with status code 200. However, if there is no data we should send 404: the requested resource doesn't exist, so we can't retrieve anything using that URL. HTTP status code 404 stands for "Not Found", so if there is no person with ID equal to 123, the URL does not point to any resource and 404 is the correct response.
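The two situations, a lookup by ID and a search by query parameter, can be put side by side in a small sketch. The `PERSONS` dictionary and both function names are assumptions made up for this example, and the search uses the 204 convention; a team that agreed on 200 with an empty list would return that instead.

```python
from http import HTTPStatus

# Hypothetical data store keyed by person ID.
PERSONS = {123: {"id": 123, "first_name": "John"}}


def get_person(person_id):
    """GET /persons/<id>: the ID is part of the URL, so a miss is a 404."""
    person = PERSONS.get(person_id)
    if person is None:
        return HTTPStatus.NOT_FOUND, None
    return HTTPStatus.OK, person


def search_persons(first_name):
    """GET /persons?first_name=...: the URL is valid even without matches."""
    matches = [p for p in PERSONS.values() if p["first_name"] == first_name]
    if not matches:
        # Assumption: the team agreed on 204 rather than 200 with an empty list.
        return HTTPStatus.NO_CONTENT, None
    return HTTPStatus.OK, matches


print(int(get_person(999)[0]))
print(int(search_persons("jflajflajfla")[0]))
```

The same "nothing found" outcome leads to a different status code, purely because of where the identifier lives in the request.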
Keep this in mind when designing an API and the ways to request data. Also keep in mind that there are usually multiple layers between your API and the consumer, and there can be caching and optimization happening at any of them. Not all of them will be under your control. For example, if your API is called from front end code in a browser and some response returns 404, there is a good chance that a call to that same URL will not be made from the browser for some time, and will instead be short-circuited to 404.
I am a true believer that no matter what is thrown at our applications, they should never crash or display an error to the end user. In my mind a similar approach should be taken with APIs. No matter what is thrown at them, APIs should do all in their power to respond accordingly and provide an appropriate status code. Status code 500 should be left as a last resort, for those edge cases that we haven't thought of and haven't covered yet.
In the end, HTTP status code 500 means "Internal Server Error", so if it is produced, it means there is a use case or edge case that we haven't covered, and we should fix it ASAP.
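One possible way to keep 500 as a last resort is a single catch-all around every handler, so that an uncovered edge case is logged and reported instead of crashing the application. This is only a sketch: `safe_handler` and `divide` are hypothetical names, and most web frameworks offer their own hook for this.

```python
import functools
import logging
from http import HTTPStatus


def safe_handler(handler):
    """Wrap a handler so that any uncovered error becomes a logged 500."""
    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        try:
            return handler(*args, **kwargs)
        except Exception:
            # An edge case we have not covered yet: log it so it can be fixed.
            logging.exception("unhandled error in %s", handler.__name__)
            return HTTPStatus.INTERNAL_SERVER_ERROR, None
    return wrapper


@safe_handler
def divide(a, b):
    """Hypothetical handler that forgot about division by zero."""
    return HTTPStatus.OK, a / b


print(int(divide(4, 2)[0]))
print(int(divide(1, 0)[0]))
```

Every 500 that shows up in the log is then a to-do item: add proper handling for that case so that next time the API answers with a more specific status code.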
It is good practice to have a general understanding of which group of status codes is used for what. That way it is easier to find the correct one, and also to validate whether we are on the right track with the ones we already use.
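As a quick reminder of those groups, the first digit of a status code tells which of the five standard classes it belongs to. The helper below, with the made-up name `status_class`, is just a small sketch of that rule.

```python
def status_class(code):
    """Map an HTTP status code to its class, based on the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return classes[code // 100]


for code in (200, 204, 401, 403, 404, 500):
    print(code, status_class(code))
```

A check like this is handy when reviewing an API: if a response that reports a problem with the request falls into the "success" or "server error" class, something is probably off.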