March 02, 2015

Dropbox starts using POST, and why this is poor API design.

Today Dropbox announced in a blogpost titled “Limitations of the GET method in HTTP” that it will start allowing POST requests for APIs that would otherwise only be accessible using GET.

It’s an interesting post, and addresses a common limitation people run into when developing RESTful webservices. How do you deal with complex queries? Using URL parameters is cumbersome for a number of reasons:

There’s a limit to the amount of data you can send. Somewhere between 2KB and 8KB apparently.
URL parameters don’t allow nearly enough flexibility in terms of how you can define your query. The percent-encoded string doesn’t really have a universal way to declare which character set its bytes are in.
The URL is not nearly as versatile and expressive as JSON, let alone XML.
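
To make the clumsiness concrete, here is a minimal sketch of packing a structured query into a URL. The endpoint `https://api.example.com/search`, the parameter name `q` and the shape of the query are all assumptions made up for illustration; none of them come from the Dropbox API.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def query_to_url(base_url: str, query: dict) -> str:
    """Serialize a structured query as JSON and pack it into one URL parameter."""
    # 'q' is a hypothetical parameter name chosen for this sketch.
    compact_json = json.dumps(query, separators=(",", ":"))
    return f"{base_url}?{urlencode({'q': compact_json})}"


def url_to_query(url: str) -> dict:
    """Reverse of query_to_url: pull the JSON back out of the 'q' parameter."""
    params = parse_qs(urlsplit(url).query)
    return json.loads(params["q"][0])
```

Every brace, quote and colon in the JSON ends up percent-encoded, so the URL grows noticeably faster than the query itself, which is how a moderately complex query can creep toward the length limits mentioned above.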

Their solution to this problem is to now allow POST requests on endpoints that traditionally only allowed GET.

Is this the best solution? Well, it’s certainly a pragmatic one. We’re clearly running into artificial limitations here that are poorly solved by existing technology.

The problem with `POST`

Switching to POST discards a number of very useful features though. POST is defined as a non-safe, non-idempotent method. This means that if a POST request fails, an intermediary (such as a proxy) cannot just assume it can make the same request again.

It also ensures that HTTP caches no longer work out of the box for those requests.
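
The retry rule can be sketched as a small lookup. The method sets below follow the safe and idempotent classifications in the HTTP/1.1 specification (RFC 7231); the helper name `may_retry_automatically` is hypothetical and only illustrates the decision a client or proxy has to make.

```python
# Safe methods do not change server state; every safe method is also idempotent.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}

# Idempotent methods can be repeated with the same intended effect as one call.
IDEMPOTENT_METHODS = SAFE_METHODS | {"PUT", "DELETE"}


def may_retry_automatically(method: str) -> bool:
    """Return True if a failed request may be resent without asking anyone."""
    return method.upper() in IDEMPOTENT_METHODS
```

Under this rule a failed GET can simply be resent, while a failed POST has to be surfaced to the caller, because the first attempt may already have had an effect.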

Using `REPORT` instead

The HTTP specification has an extension that defines the PATCH request. This spec is picking up some steam, and a lot of people are starting to use it to solve common problems in API design.

In the same vein, there’s been another standard HTTP method for a while with the name REPORT, which specifically addresses some of the issues with POST.

The REPORT request:

Can have a request body
Is safe
Is idempotent

It appears in the IANA HTTP Method list and is actually quite great for this use-case. The main reason it’s off people’s radar is that it originally appeared in a WebDAV-related spec a long time ago.

However, its semantics are well defined and it works everywhere. I would love to see more people start picking this up and adding it to their HTTP API toolbelt.
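
As a rough sketch of what this looks like from a client, Python's standard `urllib.request.Request` accepts an arbitrary method name, so a REPORT request with a JSON body can be built as below. The URL and the query fields are assumptions for illustration, and the server on the other end would of course have to implement REPORT for that resource.

```python
import json
import urllib.request


def build_report_request(url: str, query: dict) -> urllib.request.Request:
    """Build (but do not send) a REPORT request carrying the query as JSON."""
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="REPORT",  # extension method, passed through as-is
        headers={"Content-Type": "application/json"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Because REPORT is safe and idempotent, a client or intermediary is allowed to repeat the request if it fails.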

Using `GET` with a request body

Whenever this topic comes up on Hacker News, there’s almost guaranteed to be a comment about using GET with a request body.

I wondered about this myself (6 years ago now apparently!) and it’s my top question on stackoverflow. Clearly a lot of people have the same thinking process and wonder about this.

Using a request body with GET is bad. It might be allowed, but it’s specifically defined as meaningless. This means that any HTTP server, client or proxy is free to discard it without altering the semantic meaning of the request, and I guarantee that some of them will.

Furthermore, the benefits of using GET are then completely gone. Caching is not based on request bodies, and these requests are not addressable with a URI.
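
A deliberately simplified model of a cache shows why. The `cache_key` function is hypothetical, and real caches also take things like the `Vary` header into account, but the point it illustrates is that the request body plays no part in identifying a cached response.

```python
def cache_key(method: str, url: str, body: bytes = b"") -> tuple:
    """Simplified cache key: the body is accepted but deliberately ignored."""
    return (method.upper(), url)


def lookup(cache: dict, method: str, url: str, body: bytes = b""):
    """Return the cached response for this request, or None on a miss."""
    return cache.get(cache_key(method, url, body))
```

Two GET requests to the same URI with different bodies map to the same key, so a cache modelled this way would hand the second client the answer to the first client's query.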

Literally the only reason why anyone would do this is because GET looks nicer. It’s an aesthetic decision, and nothing more.

Why real `GET` requests are great: addressability

Whether you use POST or the superior REPORT request, you still miss the biggest advantage of using GET requests.

A GET query is always a URI. Anyone can link to it. Parts of your service can link to specific results. Even external services can integrate with you by referring to specific reports.

A POST query cannot be linked and neither can a REPORT query. All we can do is explain that a certain URI accepts certain HTTP methods with certain media-types, but this is not nearly as elegant as a simple URI. Linking rocks.

An alternative approach

One way to solve this issue entirely and fix all problems related to this is to disconnect the query you are doing from its result.

To do this, you could create a /queries endpoint to which clients submit POST requests, with request bodies containing all the details of the query.

This operation could create a new ‘query resource’ and respond with a Content-Location header saying that the result of this query can be found at /queries/1.

Then to fetch the result of the query, you can just issue a GET request on /queries/1. This means that you:

Don’t break the Web!
Resources in your API are still addressable and can be linked to.
The results are still cacheable, safe and idempotent.

We’re still using a POST request here though, but there’s a fundamental difference: we are using POST to create a new ‘query resource’ and we don’t use it to do the query itself.

The drawback? It’s definitely a bit more complicated to design this API and it requires storage on the server (for the query and/or the result of the query).
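
The flow described above can be sketched without any web framework. Everything in this example is an assumption made for illustration: the `QueryService` class, the `201` status code, the in-memory storage and the tiny `{"field": ..., "equals": ...}` query format. It stores the query rather than the result, and evaluates it on each GET.

```python
import itertools
import json


class QueryService:
    """Toy model of a /queries endpoint; handle() returns (status, headers, body)."""

    def __init__(self, dataset):
        self._dataset = dataset          # the records that queries run against
        self._queries = {}               # server-side storage for query resources
        self._ids = itertools.count(1)   # yields 1, 2, 3, ... for new query ids

    def handle(self, method, path, body=b""):
        if method == "POST" and path == "/queries":
            # POST creates a new query resource; it does not run the query.
            query_id = next(self._ids)
            self._queries[query_id] = json.loads(body)
            return 201, {"Content-Location": f"/queries/{query_id}"}, b""

        if method == "GET" and path.startswith("/queries/"):
            suffix = path[len("/queries/"):]
            if suffix.isdecimal() and int(suffix) in self._queries:
                query = self._queries[int(suffix)]
                rows = [
                    row for row in self._dataset
                    if row.get(query["field"]) == query["equals"]
                ]
                headers = {"Content-Type": "application/json"}
                return 200, headers, json.dumps(rows).encode("utf-8")

        # Anything else is treated as an unknown resource in this sketch.
        return 404, {}, b""
```

A client would first POST `{"field": "type", "equals": "pdf"}` to `/queries`, read the Content-Location header from the response, and then issue a plain GET on that URI, which can be linked to, bookmarked and cached like any other resource.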

But then, REST services are not meant to be simple. They are meant to be robust and long-lasting, just like the web itself.

Comments

nicholaides • Mar 02, 2015

The other drawback to disconnecting the query from its result is that the server is no longer stateless (or as stateless). The server now has to record what all the query params were for a particular query.
- Evert • Mar 02, 2015
  
  I generally agree and tried to address that when talking about requiring additional 'storage'. I'm a bit hesitant to use the term stateless/stateful because that term tends to have a broader meaning in the context of HTTP.
  - gulyasm • Mar 03, 2015
    
    Also, you might need central storage, so when you have to scale, you can distribute the load across machines. Quite elegant solution though, I really like it.
Nathan Boolean Trujillo • Mar 02, 2015

why not just POST to a URL with GET params too and have the best of both worlds?
- Evert • Mar 02, 2015
  
  Actually, that would give you the drawback of both!
  - Nathan Boolean Trujillo • Mar 02, 2015
    
    not if you are posting JSON.
    - Evert • Mar 02, 2015
      
      I can't tell if you're serious!
- pierpaoloramon • Mar 02, 2015
  
  There’s no such thing as ‘GET params’. That’s just the (silly) PHP way of calling ‘query parameters’.
  - Nathan Boolean Trujillo • Mar 02, 2015
    
    yeah, "Query params" is the proper RFC nomenclature.
    You were close on the (silly) language:
    use CGI;
    $q = new CGI;
    my $value = $q->param('my_parameter_bame');
Jason • Mar 02, 2015

:s/In the same vain/In the same vein/
- Jason • Mar 02, 2015
  
  /aestetic/aesthetic/
  - Evert • Mar 02, 2015
    
    Thank you, fixed. That's what I get for not using spell check ;)
vittoriozaccaria • Mar 02, 2015

What about encoding the query as a base64 URL parameter of a GET? You'd probably get a bit more of flexibility..
- Boigus • Mar 02, 2015
  
  still doesn't get past the size limitations though
  - Sean • Mar 03, 2015
    
    A quick search indicates that just about every client and server supports 2 kb urls, which makes for a gigantic space of distinct queries. I use the b64 technique and it's great, save the cryptic url.
Mo Binni • Mar 02, 2015

This is relevant to the article but also just a general question. Let's say you have a GET api endpoint with which you retrieve a certain amount of non-changing data, such as a list of partners you work with. How do you leverage caching this request to not always leverage the query to execute and thus create load on the server?
- Evert • Mar 03, 2015
  
  The easiest? Put an HTTP proxy in front of it! There's plenty that do this out of the box. Squid and Varnish come to mind, but I'm sure there's others.
  - Mo Binni • Mar 03, 2015
    
    Thanks, seems legit
Simon Wood • Mar 02, 2015

You solution detailed in "An alternative approach" is clearly the correct approach!
The only issue I have is GET to /queries/1 would return details of the saved 'query resource'. To get the results you would query /queries/1/results or some other path.
POST to /queries should create a new query.
GET, PUT and DELETE to /queries/1 would interact with or return the query resource
GET to /queries/1/results would return the results resource that corresponds to the saved query response.
Great response to the original Dropbox post.
yehosef • Mar 02, 2015

I think the ideal compromise would be to make a post request to a "/queries" endpoint. It would return the results and the "location" header with a url with a sha1 (or whatever..) token which represents that request. It you didn't want the keep the token=> request body around forever just give it an expiration. Similar to the EVAL/EVALSHA in Redis (if EVAL would also do a SCRIPT LOAD)
OMG_wtf • Mar 03, 2015

I have to agree with you that this is not correct use of the HTTP standards and I think that people at Dropbox are aware of this. I would say that your solution is nice and clean. But even the article mentions this:
"We could have somehow contorted /delta to mesh better with the HTTP
worldview, but there are other things to consider when designing an API,
like performance, simplicity, and developer ergonomics. In the end, we
decided the benefits of making /delta more HTTP-like weren’t worth the
costs and just switched it to HTTP POST."
Which makes perfect sense to me. Basically the drawback which is introduced by your solution is too much for them and they probably considered solution like this.
Anyway good blog post!
orliesaurus • Mar 03, 2015

When I attended a dropbox dev event, they did say out and clear that their API isn't fully RESTful, now I understand more why...great post =)
Pierre • Mar 03, 2015

s/vain/vein/
SelectaSound • Mar 03, 2015

Great article full of detail.
Alexander Weber • Mar 03, 2015

I don't think your suggestion does make sense. First, by splitting the request from the response you introduce a security risk, because now the result of the query can be accessed by everyone knowing the response ID, especially for private data I would not like that. Apart from that the server now has to keep the result available and is going to be bloated with useless data. How long should the results be kept? You can't drop the result after receiving the first GET request, because in that case there is no sense for the GET at all, because any second request made to the same response will fail anyway. So keep it for 90 seconds? What if the user has a very slow internet connection? Keep it for an hour? With 100,000 requests/s you're kind a dead within minutes. So that doesn't scale at all. The next problem is that either you need a database or any kind of storage in the background to keep the data or you need to ensure that the GET request comes to the same host that the original POST got to; so you either need different domains (sub-domains) or a layer 7 load balancing, what you might not need for any other purpose. Another problem is the dramatically increased latency, in fact you double the latency, for a satellite connection this means that you may add up to one second to your response time. And why all that effort? Just to support an idempotent request for something that is, due to its nature, not idempotent. The reason why it is not idempotent is volatility. In fact the resource you query is modifiable and not static, therefore there is no guarantee that the same request will always return the same response (concurrency). So I think what Dropbox did makes sense and I agree to their statement. My 2 cent.
- Rob Johnson • Mar 03, 2015
  
  I would have imagined that you still need to attach a security header for both the query POST and the result GET.
  Secondly, I would have also imagined that the query would be stored, not the actual results of that query. That would mean that the results can be cached appropriately when accessed and would not cause the scalability issues you've mentioned.
  I agree with your comments on latency, but it feels like a necessary evil for me.
  - Alexander Weber • Mar 03, 2015
    
    If you store the query, then the result is no longer idempotent, because the seconds GET might return a different result as the first one, because the data is volatile, right?
    Apart from that you anyway must persist the query somewhere, either local (in memory) or in an distributed storage. If you keep it around like that, you need some form of garbage collection so you produce garbage, what scales very bad.
    It is a security issue as well, because you need either to perform authentication and authorization twice or you need again to persists the authentication and authorization response, which opens up further security issues in your back-end and removes some advantages of stateless communication.
    Another point I see is that caching as a man-in-the-middle was always a bad idea (so by proxies), especially if it comes to resources that require authentication. Therefore, when using authentication, you should use TLS and any caching advantage of GET is directly gone.
    - Rob Johnson • Mar 03, 2015
      
      You're right, the data will not be indempotent, but as you said, it's not by its nature. The queries could be kept in something like a redis cache or memcache, with TTLs, to address the scaling issue, but I agree about the man-in-the middle and the use of proxies.
      - Alexander Weber • Mar 03, 2015
        
        But what do you gain by using GET vs POST, except for a huge amount of disadvantages? Is there any advantage, just one?
        
        Rob Johnson • Mar 03, 2015
        
        Perhaps it is just the one, but that one is following standards. See http://www.w3.org/Protocols... and see the definition of POST
        
        Alexander Weber • Mar 03, 2015
        
        And how does using POST not follow the standard? Where in the section POST does the standard mention something that prohibits the usage of POST for an API call like for example the /delta request? I don't get the point I think.
        
        Rob Johnson • Mar 03, 2015
        
        POST is not for GETting data.
        The below is an abstract from the W3 article:
        ----------------------------------------------------
        POST is designed to allow a uniform method to cover the following functions:
        - Annotation of existing resources;
        - Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles;
        - Providing a block of data, such as the result of submitting a form, to a data-handling process;
        - Extending a database through an append operation.
        
        Alexander Weber • Mar 03, 2015
        
        And you think that this list is exclusive, so nothing except for this, word by word, is allowed for POST? I don't understand the RFC like that, I would say this is a much more common specification that should fit much more cases apart of these few examples. Even while this is now a bit academic, I rather read this:
        "The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line
        ...
        The action performed by the POST method might not result in a resource that can be identified by a URI.
        ...
        The actual function performed by the POST method is determined by the server and is usually dependent on the Request-URI."
        And from that I would assume that the /delta interface may be accessed as well using POST, because: The entity posted is a new subordinate of the resource (/delta), it is a new query. The result is nothing that can be identified by URI and the actual function preformed by the POST method is determined by the server (executing the query).
        
        Rob Johnson • Mar 03, 2015
        
        I think that a RESTful API should try to adhere to standards and use the HTTP method verbs as they were intended (http://en.wikipedia.org/wik...
        A query to filter resources is not the same thing as creating a resource. From reading the definition of /delta surely to POST and create a subordinate would be to create a new instruction, not a query to filter the collection.
        
        Alexander Weber • Mar 04, 2015
        
        There is no such thing like RESTful or REST. These are just buzzwords. You will not find any RFC that defines REST. So to say, the HTTP protocol clearly states:
        "The POST method requests that the target resource process the representation enclosed in the request according to the resource's
        own specific semantics."
        See: http://tools.ietf.org/html/...
        It does not require anything to be created nor does it require anything specific as you say. It contains the examples you mentioned, but these are just examples. So Dropbox is compliant with the HTTP protocol and therefore with the corresponding internet standard, IMHO.
        
        Julio Bastida • Aug 15, 2017
        
        I don't think you see your opinion as humble IMHO
        
        Evert • Mar 03, 2015
        
        One of the core features of the web: you can link to something that can be retrieved using GET.
    - Evert • Mar 03, 2015
      
      Your definition of idempotence is wrong, because by your definition the result of any GET request can't change for it to be idempotent. A 'query' resource is no different from any other resource.
      It is definitely possible to use a trusted proxy to do https requests. It doesn't have to be a traditional proxy, but it can just be an agent that forwards requests on behalf of you and does the appropriate caching.
JonRimmer • Mar 03, 2015

Your alternative suggestion would require two HTTP requests in serial instead of one though, right? If I'm using the API from a device with a poor connection, like a mobile phone, then I've just doubled what already might be a second or more of latency. Personally, I'd choose a better user experience over theoretical purity in this case. The REPORT verb looks interesting though.
- Evert • Mar 03, 2015
  
  I agree that an extra request is one of the drawbacks, although good RESTful design goes well beyond 'theoretical purity'. By calling it that you immediately shut the door for considering it a valid design, and you make it hard to have an actual conversation about it.
  - JonRimmer • Mar 03, 2015
    
    I don't deny it's a valid design. But I feel there is a difference between good RESTful design and perfect RESTful design. With POST queries you lose cacheability, but is the cacheability of such complex queries important? Is it worth the necessity to persist queries as a separate resource, and to introduce extra latency? It might be, but if forced to choose, I would make the decision based on the resulting user experience and not the RESTful-ness of the API.
Eric • Mar 03, 2015

I don't understand how using POST for a complex query in not RESTful.
I can't find anything where Fielding says not to do this. I've read several blog posts and listened to him give speeches at conferences where he says POST is fine when there isn't a good alternative among the other common verbs.
What am I missing?
Greg Sohl • Mar 04, 2015

Doesn't the alternative approach require a stateful server? Would this violate the REST constraint of statelessness?
- Evert • Mar 04, 2015
  
  It's as much stateful as a regular POST/PUT is. If you issue a method like that, you change the server state, and this will cause a different response to be returned after a subsequent GET.
  So, no... this behaves exactly like any other resource.
Zdenek • Mar 07, 2015

Great post & discussion!
Eric Lubisse • Mar 11, 2015

Great post! I like your alternative approach to the problem. Interesting discussion as well :-)
orubel • Mar 18, 2015

POST is fine. PROXY should not be the intermediary. preHandler/postHandler or preFilter/postFilter (depending on your framework) act as mediarys. This keeps the distributed architectured singled threaded and CPU bound.
Henrik Kindvall • Oct 16, 2015

When developing a REST-api to be provided for customers to implement in their custom projects, in my case my customers are expecting the API to be fully functional for them to start querying objects. If I were to provide them with your alternative approach, telling them to create queries which can later be found and queried by looking in the response header, my guess is that they would be dissapointed I don't have an api that out-of-the-box provides the relevant queries they need. (They are expecting me to provide all the relevant queries needed to get the information needed for the service).
With this in mind, I could create the queries before handing them the api , and mention the url-locations (queries/1 , queries/2 , queries/3 aso..) in the api-doc. However this makes me write static documentation on queries that gets rendered on run-time. And this gives me a documentation hell.
Thanks for an interesting article!
- Matt Welke • Sep 20, 2017
  
  I think a potential solution to your problem would be to hard code the queries in the application (as you would if you were developing a hacky POST endpoint for querying for each query), use PUT for the query resources in the author's alternative approach, and have your application "upsert" these query resources every time they are to be used.
  The server will only create the query resources when they haven't been used yet. Either way, given a successful response code (indicating that the query was either already there or just created), your application can then perform a GET on "queries/:id/results".
