Tuesday, 14 August 2007

What is Shrew?

Shrew: a small insectivorous mammal; one of the descendants of the first mammals to evolve. Related to moles and hedgehogs. Shrews are not rodents, and are not related to mice and rats.

It's also an animal name containing the letters S, R and W: I use animal names for my project code names, and I needed this one to contain those three letters. The R and W are for REST and Web Server respectively, the S is for Scheme. I'll get to that later.

The idea of Shrew is to build a web server that makes the creation of RESTful web services easy. Popular web servers are still designed as file servers: a web server exposes a directory tree, and then allows certain files within that tree to be executed on the server, to generate the content to send to the client. This encourages (even forces) web applications and services to be built around files. RESTful services are supposed to be built around resources. These resources are meant to be the logical 'things' that exist in your service: not files. You don't want to have ".aspx" or ".jsp" or ".php" appearing in your resource URI. It should look something like /person/visit/730. And you should not have to bolt on a URL-rewriting module to get there.

Shrew will make resources the centre of an application or service. The resources exposed will be listed and mapped onto a URI form, and also a piece of code to handle that resource. Shrew will also make no attempt to pretend that an application is not running over HTTP. All request headers will be available to the application; the application will be able to override the generation of response headers on a response-by-response basis; the application will be able to specify particular response codes.

Most importantly, dispatch of incoming requests will be performed not just on the request URI, but also the request method: GET, POST, PUT or DELETE.
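To make the idea concrete, here is a minimal sketch of dispatching on both method and URI. It is written in Python purely for illustration (Shrew itself is planned in Scheme), and the names `route`, `dispatch` and the handlers are hypothetical, not Shrew's API.

```python
import re

# Hypothetical route table: each entry is (method, compiled URI pattern, handler).
ROUTES = []

def route(method, pattern):
    """Register a handler for one HTTP method on one URI form."""
    def register(handler):
        ROUTES.append((method, re.compile("^" + pattern + "$"), handler))
        return handler
    return register

@route("GET", r"/person/visit/(\d+)")
def show_visit(visit_id):
    return 200, "visit " + visit_id

@route("DELETE", r"/person/visit/(\d+)")
def delete_visit(visit_id):
    return 204, ""

def dispatch(method, uri):
    """Pick a handler using both the request method and the request URI."""
    uri_known = False
    for route_method, pattern, handler in ROUTES:
        match = pattern.match(uri)
        if match is None:
            continue
        uri_known = True
        if route_method == method:
            return handler(*match.groups())
    # A known resource that lacks this method is a 405; an unknown URI is a 404.
    if uri_known:
        return 405, "method not allowed"
    return 404, "not found"
```

The same URI reaches a different handler depending on the verb, and the application gets to return the exact status code it wants.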

The point of providing this level of control is to allow applications to be properly RESTful. For example, look at the ETag header. When a URI is first served to a client, a web server can generate and attach an ETag header. If the client requests that resource again, it can send the ETag value back in an If-None-Match request header. The server can then use the received ETag to determine whether the resource has changed and needs to be re-sent. Sounds fantastic, right? No need to re-generate a complex page on the server, or send down a large amount of HTML. The server reduces its load, and the client can redisplay faster. A popular web framework even automatically implements ETags for you. Everyone's happy.

Not so fast. That popular web framework generates its ETags by taking an MD5 hash of the page before sending it to the client. This requires generating the full page every time, even when it hasn't changed. It saves the bandwidth, but not the server processing time. To do ETags properly the application needs control of the headers, and it needs to be able to match the response to a particular resource.
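A rough sketch of the alternative, in Python for illustration only: the validator is derived from something cheap that the application already knows about the resource, so an unchanged resource is answered with a 304 before any page is rendered. The `version` field and the names `etag_for`, `render_page` and `handle_get` are assumptions made for this example, and the If-None-Match comparison is simplified to a single exact value.

```python
# Records each full render so the saving is visible.
RENDERS = []

def render_page(resource):
    """Stand-in for an expensive page generation step."""
    RENDERS.append(resource["title"])
    return "<h1>" + resource["title"] + "</h1>"

def etag_for(resource):
    # Built from a cheap per-resource version counter, not from the rendered page.
    return '"v%d"' % resource["version"]

def handle_get(resource, request_headers):
    """Return (status, response headers, body) for a conditional GET."""
    etag = etag_for(resource)
    if request_headers.get("If-None-Match") == etag:
        # Unchanged: no rendering and no body.
        return 304, {"ETag": etag}, ""
    return 200, {"ETag": etag}, render_page(resource)
```

Compared with hashing the finished page, this saves the processing time as well as the bandwidth, but only the application can know what counts as a version of its resource.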

The web changes rapidly: there could be other valuable headers coming soon, and there could be interesting and unique ways of using existing headers. Instead of trying to foresee all those cases, Shrew will simply provide default implementations for headers, but allow an application to override them.
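One way to picture "defaults plus overrides" is the sketch below, in Python for illustration. The default values and the `build_headers` name are made up for the example, and header names are matched by exact spelling to keep it short.

```python
# Hypothetical server-wide defaults.
DEFAULT_HEADERS = {
    "Content-Type": "text/html; charset=utf-8",
    "Cache-Control": "no-cache",
}

def build_headers(overrides=None):
    """Start from the defaults, then apply one response's overrides."""
    headers = dict(DEFAULT_HEADERS)
    for name, value in (overrides or {}).items():
        if value is None:
            headers.pop(name, None)   # None means: drop this header entirely
        else:
            headers[name] = value     # replace a default or add a new header
    return headers
```

A handler that says nothing gets the defaults; one that cares can replace, add or remove headers for that response alone.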

Of course, there's more to web applications than requests and responses. Part of the reason those rewriting solutions are distasteful is that the generated pages need to use URIs in the rewritten space, not the developer's directory space. Shrew will allow resources to link to other resources, by name and id, generating the correct URI at runtime.
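A minimal sketch of linking by name and id, in Python for illustration; the `URI_FORMS` table and `uri_for` function are hypothetical names, and the URI forms are taken from the examples in this post.

```python
# Hypothetical table mapping resource names to their URI forms.
URI_FORMS = {
    "visit": "/person/visit/{id}",
    "cart": "/cart/{id}",
}

def uri_for(name, **ids):
    """Build the URI for a named resource from its identifiers."""
    return URI_FORMS[name].format(**ids)
```

A page would then call `uri_for("visit", id=730)` rather than hard-coding the path, so changing a URI form in one place updates every link.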

Shrew will also include a library for writing markup. This will not be template based. The markup will be written in a Shrew DSL, inspired by Markaby, and then executed to produce the HTML. This markup will be renderable as HTML for serving to a browser, and also as XML for serving to a web service client. This will probably require some hinting, as large parts of the HTML will not be required in an XML view. It's going to be interesting to get that to work...
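To give a flavour of markup written as code rather than as a template, here is a small sketch in Python. It is not the Shrew DSL (which does not exist yet and would be Scheme); `tag` and `render` are invented for the example, and the HTML/XML hinting is left out.

```python
from html import escape

def tag(name, *children, **attrs):
    """Describe an element as plain data: (name, attributes, children)."""
    return (name, attrs, children)

def render(node):
    """Walk the data structure and produce markup text."""
    if isinstance(node, str):
        return escape(node)  # text nodes are escaped
    name, attrs, children = node
    attr_text = "".join(' %s="%s"' % (key, escape(value)) for key, value in attrs.items())
    inner = "".join(render(child) for child in children)
    return "<%s%s>%s</%s>" % (name, attr_text, inner, name)
```

Because the page is an ordinary data structure until `render` is called, a second renderer could walk the same structure and emit a different view of it.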

Finally, in my introduction to Shrew I mentioned some unusual technology choices: the whole thing will be written in Scheme. This is the other half of Shrew as a learning project. As well as really getting a handle on REST, I want to learn Scheme. I can read Scheme and write small projects in Scheme, but there's nothing like writing something large for really getting a feel for a language.

In particular, Scheme's killer feature to me is 'data is code.' Not that code is data, nor the macro system, nor that it's a functional language. I'm really interested in exploring the data is code concept, and a web application/service framework seems like a good place to try that.


Adam said...

I'm intrigued by this REST concept.

I'd like to know what other developers have chosen as URLs for stuff like shopping cart pages or summary pages that change depending on your session? Or search result pages... do you assign each search an id and then allow the user to go back to it?

All solvable but you don't want insanely long URLs? Or do you?

Giles said...

I'm still exploring REST, but I can say that the point is definitely to have short, readable, fixed URIs. Amazon could do something like:


The idea is to use the full feature set of HTTP: headers and all the different verbs. In the above example, the user (or service client) would perform a PUT on the /cart URI to create a cart. The service would then send a link down to the client in an XML payload. Something like: /cart/54235. You then work with the cart using that URI. This allows past carts to be retained forever.
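A rough sketch of that exchange from the server's side, in Python for illustration. The function names, the in-memory `CARTS` store, the 201 status and the shape of the XML are all assumptions; only the /cart and /cart/54235 URIs come from the description above.

```python
import itertools

# Hypothetical in-memory store: carts are kept, never expired.
CARTS = {}
_cart_ids = itertools.count(54235)

def create_cart():
    """Handle the creating request on /cart: make a cart and send back its link."""
    cart_id = next(_cart_ids)
    CARTS[cart_id] = []
    uri = "/cart/%d" % cart_id
    body = '<cart href="%s"/>' % uri
    return 201, {"Content-Type": "application/xml"}, body

def get_cart(uri):
    """Handle GET on /cart/<id>: the cart is found from the URI alone."""
    cart_id = int(uri.rsplit("/", 1)[1])
    if cart_id not in CARTS:
        return 404, {}, ""
    return 200, {"Content-Type": "application/xml"}, '<cart href="%s"/>' % uri
```

Nothing here depends on a session: the client holds the link, and the URI identifies the cart.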

There are equally interesting tricks to do with transactions.

Basically, it's competition for WS-* web services. The RESTful approach is to model how the human readable web of documents and links already works.

By the way, WS-* have just died. Admitting how complicated they are, the standards committee has now split into six sub-committees intended to simplify the specs. Ha!

I'll let you know more as I get further into Shrew.