Monday, April 30, 2007

wrapping up http caching

after poring over the http 1.1 docs and a number of other articles and books, i'm pretty sure i have the basic caching pattern down:


to reduce traffic, use expiration caching (expires for 1.0 and cache-control for 1.1) to tell public caches to serve requests without talking to the origin server.

to reduce bandwidth, use validation caching (last-modified for 1.0 and etags for 1.1) to tell public caches to use conditional gets to the origin server, which can return a 304 instead of a full copy of the resource.
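to make the two halves concrete, here's a rough sketch (in python, not the actual C# ws-loop code - the helper name is mine) of the headers each approach sets on a GET response:

```python
import hashlib
import time
from email.utils import formatdate  # rfc 1123 dates for http headers

def caching_headers(body, max_age, last_modified):
    # expiration caching: lets caches answer without calling the origin server
    # validation caching: lets caches revalidate cheaply with a conditional get
    etag = '"%s"' % hashlib.md5(body).hexdigest()
    return {
        "Expires": formatdate(time.time() + max_age, usegmt=True),  # http/1.0
        "Cache-Control": "public, max-age=%d" % max_age,            # http/1.1
        "Last-Modified": formatdate(last_modified, usegmt=True),    # http/1.0
        "ETag": etag,                                               # http/1.1
    }

hdrs = caching_headers(b"<runlog/>", 600, time.time())
```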

so this is all cool. i have this built into the ws-loop now. if you are doing a GET and the class generating the resource has max-age, etags, and/or last-modified set, caching headers will be sent with the response. and whenever a class is handling a GET, it will check for if-modified-since and if-none-match and return 304 if the resource has not changed.


so that's the public side of things. under the hood, the ws-loop is creating a local copy (on disk) of the requested resource and using that to serve clients as long as the file date is not stale (based on max-age). so, while the class still has to respond (no direct file delivery here), it will deliver a disk copy of the resource for as long as it is directed. that's cool.
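the freshness check under the hood boils down to comparing the file date against max-age. a hypothetical sketch (python; the helper name and file name are invented for illustration, not from the ws-loop):

```python
import os
import tempfile
import time

def serve_from_disk(path, max_age):
    """return the cached bytes if the disk copy is still fresh, else None
    (meaning the class has to regenerate the resource)."""
    try:
        age = time.time() - os.path.getmtime(path)
    except OSError:
        return None               # never cached yet
    if age > max_age:
        return None               # stale per max-age - regenerate
    with open(path, "rb") as f:
        return f.read()

path = os.path.join(tempfile.gettempdir(), "runlog-cache.xml")
with open(path, "wb") as f:
    f.write(b"<runs/>")
os.utime(path, (time.time() - 100,) * 2)   # pretend the copy is 100s old
fresh = serve_from_disk(path, 600)         # inside max-age -> disk copy
stale = serve_from_disk(path, 60)          # past max-age -> regenerate
```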


now the only other item that could be interesting would be gzip and deflate. i need to look into how to implement that for dynamic resources (when supported) using the .net 2.0 compression classes.


once that's done, i have some housekeeping to do with the ws-loop. cleaning up naming, some constants, implementing the ws-loop at the server root using isapi rewriter, etc.

once the clean up is done, some other niceties would be good - like moving the user/auth work into the db instead of xml and maybe making caching an external config instead of code work. then it's back to building REST-ian apps in general.

Saturday, April 28, 2007

implemented basic caching for the rest server tonight

after adding a simple write-to-file routine for the GETs in my REST server, i also implemented support for ETag, Last-Modified, and Cache-Control max-age headers.

now, when a resource is generated, if the max-age is over zero, the results are written to disk and caching headers (etag, last-modified, and max-age) are written to the client.

in addition, when a GET is received at the server, the routine can check for If-None-Match (for etags) and If-Modified-Since (for last-modified) request headers and, if possible, simply return 304 instead of regenerating the response.
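the conditional-get check itself can be sketched like this (python; simplified to exact-string comparison - a real server parses the dates):

```python
def not_modified(req_headers, etag, last_modified):
    """decide whether a 304 can replace the full response."""
    inm = req_headers.get("If-None-Match")
    if inm is not None:
        # the etag check wins when both validators are sent
        return inm == "*" or etag in [t.strip() for t in inm.split(",")]
    ims = req_headers.get("If-Modified-Since")
    if ims is not None:
        return ims == last_modified   # real servers compare parsed dates
    return False
```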

none of this is marked public yet. i still want to work on some details of the cache-control header, including things like must-revalidate, freshness, and no-cache directives for cookie-d requests, etc.

still, the server is getting more robust and generating cache-able content. and that's good!

beat a perf challenge today

last night i added support for xhtml strict doctypes in my html outputs from the rest server project. i immediately saw a serious perf cost to doing this - wow!  it bugged me all night long and it was only this am when i finally had the time to work through the problem.

turns out the cost was not in transforming the xml into xhtml via xsl - it was in reconstituting the output of my transformation from a string pile back into a new xml document (long story on all that...).

anyway, since i need to output the final product (xml, string, strongly-typed object) as a serialized string pile to the browser, i just added some smarts to the base class that handles the http response to allow for more than just an xml document or C# type. now it handles a string pile, too.

*big* improvement! now i know i'm able to generate strict xhtml and i still have solid performance.

next for me is to implement a local caching pattern that allows for support of public caching servers. it's all about the GET this weekend!

Friday, April 27, 2007

tim ewald has a lightbulb experience

I finally get REST. Wow.

The essence of REST is to make the states of the protocol explicit and addressable by URIs. The current state of the protocol state machine is represented by the URI you just operated on and the state representation you retrieved. You change state by operating on the URI of the state you're moving to, making that your new state. A state's representation includes the links (arcs in the graph) to the other states that you can move to from the current state.
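a toy illustration of that reading (python, with made-up uris): each representation carries the links you may follow, and "application state" is just the uri you are currently on:

```python
# hypothetical state graph: uri -> links exposed in its representation
states = {
    "/blog":           ["/blog/posts/1", "/blog/posts/new"],
    "/blog/posts/1":   ["/blog", "/blog/posts/1;edit"],
    "/blog/posts/new": ["/blog"],
}

def follow(current, target):
    """transition by operating on a uri linked from the current representation."""
    if target not in states[current]:
        raise ValueError("no link from %s to %s" % (current, target))
    return target
```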

Wednesday, April 25, 2007

share nothing architecture

as i work toward a rest-ian model for my web apps, i am reminded of a simple rule of thumb when implementing web content: share nothing.

that means don't assume cookies or session state that 'bind' a request or series of requests to a web server. as long as each request is completely stand-alone, it can be delivered from any server that has a copy of the requested resource.

again - thinking in terms of resources (not objects or pages) is the key. when a user makes a request, the origin server will need to resolve the request into a (stand-alone) resource. once this is done, that resource can be stored anywhere (including a third-party caching server) and replayed from the stored location.

the only tricky part there is aging the cached item. since i want to support third-party caches, i can't rely on local 'dirty bits' to clear the cache when data changes. besides, that's a kind of 'not-shared-nothing' approach! instead i need to set a max-age value for caching and/or use an etag model. that way, when requests are made, the cache (local or remote) can properly sort out the details.

when we talk about third parties, things like forced validation and other issues will come into play, but i don't need that right now. what i need to focus on is a clean and simple private caching pattern using max-age and etags. then i can move out from there to public caches.

again, the key is to make sure i use the 'shared-nothing' approach when composing a resource. then it's easier to replay.

auth - now that's a diff story...

Monday, April 23, 2007

crockford made a good point about compressing js files

his major beef with the common js compression utils that remove whitespace *and* obfuscate the code by renaming vars, etc. is that the second step has the chance of introducing bugs.

i know that some of the compressed files i work with always *bork* firebug or the ms script debugger, too.

he also pointed out that using gzip is a big way to reduce bandwidth - good point.

i guess i need to add a quality whitespace remover and some gzip 'judo' to my list of things to do.
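the gzip point is easy to demonstrate - python's gzip module stands in here for whatever the server would actually use:

```python
import gzip

# a stand-in for a whitespace-heavy, repetitive js file
script = b"function add(first, second) { return first + second; }\n" * 200
packed = gzip.compress(script)

# gzip alone shrinks repetitive js dramatically, no obfuscation required
print(len(script), "->", len(packed))
```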

Sunday, April 22, 2007

a new fix for the msie caching bug

ran into trouble with my ajax html client running in msie (7 in this case). it *refused* to refresh the pages it just edited! in other words, xmlhttprequest was not used to get the new data - msie served the data from a local cache instead!

i've seen this before and the most common fix is to add random data to the end of the URL to convince msie that this is a new url:
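something like this (a python sketch of the workaround - the parameter name is arbitrary):

```python
import random
import time

def bust(url):
    """append throwaway query data so msie treats each GET as a brand-new url."""
    sep = "&" if "?" in url else "?"
    return "%s%s_=%d%d" % (url, sep, time.time(), random.randrange(100000))
```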


it works, but is messy. i dug around for a while and turned up a posting on the msdn site that talked about this issue in some detail. the key point, msie honors a custom cache control header extension that allows web servers to control this odd behavior:

Cache-Control: post-check=[sec],pre-check=[sec]

both values are in seconds.

the deal is this:

when ie completes an http get request, it places the results in a local cache. the next time it is about to make the same request, these two values come into play. if the copy in the local cache is younger than the pre-check value, msie will deliver the data from the local cache. *also*, if the copy in the local cache is *older* than the post-check value, msie will (in the background) get a copy of the new resource from the server and save it to disk. that way the *next* time the page is requested, it will be newer, but it will still come from the local cache.

convoluted, but efficient.
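my reading of the msdn description, boiled down to a decision function (python; the function and return names are mine):

```python
def msie_cache_action(copy_age, post_check, pre_check):
    """what msie does with a locally cached copy that is copy_age seconds old."""
    if copy_age >= pre_check:
        return "revalidate"        # old enough to go back to the server now
    if copy_age >= post_check:
        return "serve+refetch"     # serve local copy, refresh in the background
    return "serve"                 # young copy - local cache only
```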

so i added the following cache-control header to all my output:

Cache-Control: post-check=1,pre-check=2

the good news is that this worked like a champ. no messing with the url means i can better support third-party caching servers and still get msie to behave properly.

there are a handful of details to setting this up properly, so it's worth checking out the page at msdn.

built my first client for the rest server

i spent a couple hours building a simple CRUD app that allows me to manage my jogging log entries.

it went without much fanfare. i created a single page (/running/edit/index.html) that handles all the actions via ajax (no direct posting). this means there is nothing bookmarkable, but that's fine for the editor. once i got the pattern down, it felt a lot like windows programming (wow!). the ajax library i am using is from the ajax patterns book/site - highly recommended.

the basic implementation involves two uris:



the running/edit uri holds the single html document that does all the work. powered by lots of js and some key libraries (base2 & mozxpath along with my version of the ajax pattern lib). the runlogs uri holds the actual data. this is served via my rest-server. plain xml from a db table in this case.

i also made sure to secure the /edit/ uri to force authentication. that worked fine, too.

i learned a handful of things along the way. this is a dirt-simple editor. it's also rock-solid. fast, too. a good user experience.

of course, i have no meaningful css here and the gross-level ui experience is very basic.  anyone with decent design and ui skills could make improvements.

but the point here is that i was able to quickly build the full editor and it all works nicely. yay!

Saturday, April 21, 2007

cookies, session state, and rest (again)

now that i'm wrapping up my initial round of implementing a REST-ful web server coding pattern in C#, i am staring directly at the whole "cookies are not REST" issue and working to resolve it as pragmatically as possible.

i even printed a copy of roy fielding's REST dissertation and am working through the parts that contain his comments against cookies and session state. on the surface, they make sense. basically, cookies and session state can have the effect of controlling the way a URI is resolved (use the session state for filtering, the cookie for personalization of the page layout, etc.). this means the resulting document can't be reliably cached or played back for other users (or possibly the same user). ok, i get that.

as a data bucket

so, my next round of thinking is *why* we got into the habit of using cookies and session state. first, they often are shortcuts - nothing more. i can stuff a value in a cookie and carry it around for a while (say, during a multi-page shopping experience) and then use it later. i can do this same thing using a server-side session bucket.  of course, the hitch on implementing server-side session state is that i need at least one cookie on the client to reliably link the client and the server session data.

as authentication

another common way to use cookies is to use them to handle authentication of a user. in other words, once a user presents credentials, a cookie is written to the client. then for every request, the server checks the cookie to make sure it's valid. you can even timeout the cookie to make sure users who leave their client unattended will eventually be 'de-authed.'

as an identifier

also, cookies are often used simply as an identifier. once a user logs in (say using basic-auth or digest-auth), a cookie is created and passed to the client. this identifies the client for all subsequent transactions. usually this is to help with tracking the user's actions on the site. sometimes the identifier is just a random value (commonly referred to as a session id) that is used purely for tracking purposes. it is then possible to play back a session using archived transactions kept on the server.

ok, data bucket, authentication, identifier...

auth not needed

i am working to use only basic-auth and digest-auth for authentication. there is enough support - including httpmodules and the ability to access user and principal objects - to make that all work consistently. i'm confident i don't need cookies for authentication. i just need to accept that the web pages will occasionally pop up the 'rude' browser auth dialog <sigh>.

data bucket, i think i can deal with this

i understand the point, but need to noodle on this for a bit. some trivial examples on the web involve creating a web resource at a 'disposable' URL that a client can use as a data bucket during a session. i can see this working via ajax, but am not clear on how to implement it in a more traditional server-side html page environment. again with the 'composer' issue. i don't want to compose pages on the server that contain non-replayable, personalization, or private data that might end up in the cache. i need to work on it, but i can see the possibility.

identifier - i still think i need this

first, i've started implementing a simple session cookie that i use to track transactions in the session and to prevent simple replay issues (i use some browser agent data as well as a random key). finally, i use a caching trick to time out the session after x minutes of inactivity. by doing this, i can flip a flag in the code that will clear any personal data, start a new session, and force a new auth dialog (if needed). so i kinda really need that <g>.
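that anti-replay cookie could look something like this (a python sketch, not the actual C# code - the secret, names, and format are all invented):

```python
import hashlib
import hmac
import os
import time

SECRET = b"server-side-secret"   # hypothetical key, kept on the server only

def session_cookie(user_agent, now):
    """session id mixing browser-agent data with a keyed random value,
    so a cookie replayed from a different agent fails validation."""
    nonce = os.urandom(8).hex()
    stamp = "%d" % now
    mac = hmac.new(SECRET, ("%s|%s|%s" % (user_agent, nonce, stamp)).encode(),
                   hashlib.sha1).hexdigest()
    return "%s.%s.%s" % (nonce, stamp, mac)

def valid(cookie, user_agent):
    nonce, stamp, mac = cookie.split(".")
    expect = hmac.new(SECRET, ("%s|%s|%s" % (user_agent, nonce, stamp)).encode(),
                      hashlib.sha1).hexdigest()
    return hmac.compare_digest(mac, expect)

cookie = session_cookie("mozilla/4.0 (msie 7.0)", time.time())
```

the stamp would also feed the sliding-timeout check described above (compare it against the inactivity window before honoring the cookie).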

second, while this random data helps me keep track of transactions by a client as well as timing out a client session, i still don't really know *who* this client is. for that, i think i need at least a 'friendly' name cookie or something. not sure why i really need this, but i have a hard time letting this go. the biggest thing that jumps out is that, when using basic-auth, any non-authed pages are missing the auth data entirely. i suspect the same is true for digest-auth, but i'm not positive. so i think i need a 'name' cookie <sigh>.

as long as the session cookie and (if included) the name cookie are not used to control the content of the resource, they are safe to use. it's only when state data is used to change the resource representation in 'hidden' ways that things go bad (non-cachable).

i think i need to check into ways to use caching controls to better track how a resource is identified.

finally, i read a scary 'aside' while digging into this whole cookie battle. it was regarding auth. something like 'tossing authenticate headers around screws up third-party caching.' hmmm... kinda makes sense. and is depressing.

ok, that's all for now. i'll soldier on.

can't forget ssl

while i continue to focus on getting the plumbing working for authentication and authorization in my REST server, i gotta remember that support for SSL should be on my list, too.

first, since i've only implemented basic-auth, ssl would be essential for any app that will live 'in the wild.' second, even w/ digest-auth added (sometime soon), ssl will be very desirable from a privacy standpoint.

that also brings up the issue of supporting https patterns. i suspect i'll do this using an isapi rewriter - not within the app loop itself. too much thrashing in the code, i think.

finally, i need to get ssl installed on my workstation. i think i can use the msft selfssl package for starters. i should also get a (cheap) ssl certificate to install on the public machine.

ok - nuther thing to add to the list then.

major advance on the security side

i spent some time late last night and this am getting the next round of changes into the auth service. i now support a permissions model as well as a list of secured urls for the app.


now, when the request is first received, the server will check the user's permissions collection and match the permission record with the requested uri. that will return the list of allowed httpmethods for this uri. sweet.  no more pinging the roles within the actual controller code. this is all done 'under the covers' before the controller even sees the request.

Secure Urls

i also added a way to control which urls require authentication. for example, you might be able to see the home page just fine, but need to log in before you can edit your blog. again, this is all controlled by a list of path regexp patterns. the system will find the first match and then return "true" or "false" on whether authentication is required to continue.

also, this is done completely under the covers - no controller code needed.


also, when auth is not required, the system will automatically load the credentials for the "guest" account. this makes sure that every request has a set of valid credentials to work with. it also allows me to control the access for guests. again, they might be able to GET, but not POST, etc.


finally, i did some additional work on the session validation this am. now, i keep track of the session timeout (sliding value of 20 min for default). when the session has timed out, i refresh it (based on the existing 'old' cookie) and force a re-auth of the user. if this is a secured url, the user will be prompted again - nice. if it's a non-secure url, the guest account is refreshed quietly.


first, i am doing this all w/ basic-auth. not a big deal, i think, but i still need to implement digest-auth sometime soon. i'm hoping it will be relatively trivial to add the branching code for that.

second, since each url will be auth'ed as either the prompted user login or the "guest", it's not clear how i'll be able to know what user is actually doing what action - esp. in un-auth'ed situations. while - security-wise - this is no big deal (i'll always have a valid security context for each request), it will make logging and some personalization a pita. i think i need to salt the session cookie with some other info that will make it easy to know who is here. once a user is prompted for a login, i should store some identifying data in a cookie for easy access (both server- and client-side). another job to do...


yeah, i did a lot of caching this am, too. i cache the list of user permissions, the list of secured urls, the session cookie (for timeout purposes), and i also cache each authurl that has been requested. this should keep the looping to a minimum at runtime.

well, that's a lot for this weekend - at least on the infrastructure side. i would still like to fill out some UI and general user cases to make things look like a real app/web page.

dealing with composing the html is the next big hurdle. i would like to use xml/xsl - even xinclude. but i need to look carefully at performance and other issues. as usual, a solid javascript-powered ajax page would always work well.

Friday, April 20, 2007

completed initial basic-auth implementation

i finally finished off the basic-auth implementation tonight. while i had the basic-auth request/response working, i still had to get the user/password store and validation working. now i do!

this first pass uses a simple xml storage pattern with user/pass along with a list of associated roles for that user. the details are loaded on the first validation of the user and kept in cache throughout the session. i can now add a permission check at the top of each web method to check the role of the current user. if the role check fails, i return a 403 - sweet!

next step is to move away from the site-wide role-based model and go straight for a uri/http-method model. the user store should have the uri (actually a regexp that can resolve to one or more uris) and a list of allowed actions (get, post, put, delete, head, option, *=all, !=none). this can all be done within the security loop *before* ever getting to the http handler code that implements the method (get, post, etc). that way, the entire security details (authentication and authorization) are outside the handler entirely.
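a sketch of what that table and lookup could look like (python; the patterns, roles, and grants are invented examples, and explicit method sets stand in for the *=all / !=none shorthand):

```python
import re

# hypothetical permission table: first matching regexp wins
PERMS = [
    (re.compile(r"^/running/edit/"),
     {"editor": {"GET", "POST", "PUT", "DELETE"}}),
    (re.compile(r"^/running/"),
     {"editor": {"GET"}, "guest": {"GET"}}),
]

def allowed(uri, role, method):
    """resolve uri + role -> permitted http methods before any handler runs."""
    for pattern, grants in PERMS:
        if pattern.search(uri):
            return method in grants.get(role, set())
    return False   # no rule matched: deny
```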

need to do some work to grok the details of building a list of regexps for uris and a way to cleanly load and walk these uris at runtime.

of course, once the uri/action pattern is solid, i can implement a version of digest-auth, too!

Monday, April 16, 2007

initial solution for the composer dilemma

i've arrived at a tentative decision on the whole 'composer' dilemma regarding authorization and caching on the web. basically, the point is this:

you cache data, not apps.

or, put another way:

decide if your web page is data or presentation. if it's presentation, then it's not cache-able.

while this is probably artificially rigid, for now, it keeps my head straight.

i am also starting to think about making all 'web pages' apps (or app-like thingies<g>). for example, a blog home page is really a tiny app that presents the most recent post, a list of the last ten posts, and a blog roll. those are three data items (fully cache-able via REST-like URIs). the home page of the blog is really just some html and a handful of javascript ajax calls to retrieve the appropriate data. see? the web page is an app, right?

btw - if the web page is really just some html and scripts to get data, then that web page can be safely cached, too, right?


added support for timeouts on ajax calls

i've been using a very nice xmlhttp 'starter' library from the ajax patterns site. while it's a good library, it's not got all the bells and whistles. one item missing is handling timeouts for long-running calls from client to server.

so, keeping it 'in the family', i decided to employ another pattern from the same site. now my library has a solid in-built timeout pattern as well as the ability to register a custom function to handle the timeout event.

along the way i decided to adopt the rule of *not* using the .abort() method of the XMLHttpRequest object. while i see some strong debate on this matter, my testing has shown MSIE and FF behave differently when abort() is used. this, alone, along with tests that show invoking abort is not necessary, convinced me to drop it from the library.

Powered by ScribeFire.

Saturday, April 14, 2007

add support for basic auth today

i added the underlying support for basic-auth to the rest-based server system today. i also added session services. it's a simple cookie-based service, but it will make things easy to deal with. i plan on adding support for digest-auth tomorrow. that would round things out nicely.

i still need to implement a user store and a role-based security model. while this will not be super simple, getting the auth-right is the first step.  i also need to establish a uri table. this can hold the auth and role details for any/all uris as well as details on transformations and caching.  i have the basic ideas, but still need to come up with the details.

generally, it's going well, tho. in fact, i am working on another side project this weekend and find that my mind is already thinking rest-like. this current project is typical RPC/API style and i'm having a bit of trouble keeping on track with the project's details.

i think that's a good sign, eh?


Friday, April 13, 2007


reviewing a couple of documents, including the W3C docs on their ACL implementation, points out that ACLs are applied to resource URIs - makes sense.

the W3C model has a database that lists URIs and their ACL grant. the work of matching the identity->role->grant details is done on the data side - nothing too magical, but would require a bit of 'busting' to get it right, i suspect (including granting for a single resource URI, for a resource URI collection [folder], inheritance downward, etc.).

i am still struggling with the issue of composers. not just the example of an xhtml home page, but also a simple list of resources. if a URI GET results in a list, is that list composed of only resources available to that identity/role? if that's true, does the same URI GET result in a different list for another identity/role?

[sigh] i think i'm missing something...

Thursday, April 12, 2007

grokking REST

when designing a cache-able REST-ful implementation, two things are getting in my way:

- composers

- role-based security

i plan on tackling the composer issue first.


composers are resources (usually xhtml pages) that act as mini-aggregators of the underlying resources. for example, the home page of a blog site probably has the most recent two or three posts in full, a list of the last ten posts (in title/link form), a list of the site's blogroll, and maybe other data. this resource is a composed document made up of other resources. how do i make sure that the home page is replayable, reliably cache-able, and easily updated as the underlying data changes?

server-side composers

there are two basic ways to handle the composing pattern. the first is to complete the composition at the server and then deliver the resulting cache-able document to the client. this works fine, but updating the content of this cached document now gets a bit tricky. to make the cache as up-to-date as possible, any change to the underlying resources (blog posts, lists of posts, blog roll, etc.) needs to invalidate the home page resource cache.

client-side composers

the second method is to complete the composition on the client - usually via ajax-type calls. the advantage here is that the composer itself is fully cache-able and will always honor the cache status of all the underlying resources. the downside is that it assumes much more 'smarts' on the client (will this model work on cell phones? other devices?).

it is certainly possible to create cache dependencies for the home page that will honor changes to the underlying resources. but now this creates a new set of management routines to handle not just the caching, but also the compose dependencies.  another approach is to simply make the composer resources cache for a fixed time (5 minutes, etc.). this simplifies the cache management, but risks clients getting 'stale' home page documents.

my first approach will be to use the client-side composer model. this allows me to focus on keeping the underlying resource implementation clean and cacheable for now. once i get the hang of that, i can focus on creating a workable server-side composer model that can be cached efficiently, too.