aether is nice, but it's a bit slow, especially when many local files
must be parsed to produce a single page. cache.cgi is a simple program that,
in cooperation with any filesystem-based dynamic website, can serve from a
cached copy of the page when it is appropriate to do so.
This version of cache.cgi is very
experimental—I'm not even using it on my own site yet.
The principle of cache.cgi is simple: For GET requests that do not include
a query component, it checks for the existence of a cached copy of the page.
The "cached copy" consists of two files: _cache is a copy of the page
with headers, and _dep is a NUL-separated list of files or directories
that the page depends on. A cached copy is valid if every file named in
_dep exists and has a timestamp older than _dep.
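
The validity rule above can be sketched in Python roughly as follows. This is only an illustration, not the code in cache.cgi: the function name cache_is_valid is made up here, and it assumes that "older" means a strictly smaller modification time than that of _dep.

```python
import os

def cache_is_valid(dep_path):
    """Return True if every path listed in dep_path exists and is older than dep_path.

    Hypothetical helper: dep_path names a _dep file holding a NUL-separated
    list of files or directories.
    """
    try:
        dep_mtime = os.stat(dep_path).st_mtime
        with open(dep_path, "rb") as f:
            names = [n for n in f.read().split(b"\0") if n]
    except OSError:
        return False  # no readable _dep file, so there is no cached copy
    for name in names:
        try:
            if os.stat(name).st_mtime >= dep_mtime:
                return False  # dependency changed at or after _dep was written
        except OSError:
            return False  # dependency no longer exists
    return True
```

Treating an equal timestamp as invalid is a cautious choice; on filesystems with coarse timestamps it may cause some unnecessary regeneration.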
When the cached copy is still valid, it is used. When it is not, the real CGI
is invoked. The real CGI must write the _cache and _dep files.
The real CGI should take care that these files are never incompletely
written, or concurrent requests can get the wrong results.
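
One common way for the real CGI to avoid incompletely written files is to write to a temporary file in the same directory and then rename it into place. The sketch below assumes a POSIX filesystem, where such a rename is atomic; the names write_atomically and store_page are invented for this example, and writing _cache before _dep is a design choice made here, not something the text prescribes.

```python
import os
import tempfile

def write_atomically(path, data):
    """Write bytes to path so readers see the old file or the new one, never a partial one."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)  # atomic within a single filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # do not leave stray temporary files behind
        raise

def store_page(cache_path, dep_path, page, deps):
    """Hypothetical helper: page is headers plus body, deps is a list of bytes paths."""
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    os.makedirs(os.path.dirname(dep_path) or ".", exist_ok=True)
    # Write _cache first: validity is judged from _dep, so _dep should
    # only appear once the page it describes is already in place.
    write_atomically(cache_path, page)
    write_atomically(dep_path, b"\0".join(deps))
```

Note that tempfile.mkstemp creates files readable only by their owner, which is fine as long as cache.cgi and the real CGI run as the same user.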
The "real CGI" is set at the top of cache.cgi. It names a file on the
local filesystem, not a URL. If the "real CGI" is /var/html/index.cgi
then the cache is /var/html/index.cgi-cache, joined with
PATH_INFO, joined with _cache or _dep.
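
As a rough illustration of this naming scheme, the following Python sketch maps a PATH_INFO value to the two file names. It assumes that "joined" means joined as path components, so that _cache and _dep live in a directory named after the page; the text does not spell this out. The function name cache_paths is made up, the __index__ and underscore rules come from the wrinkles listed below, and the rejection of "." and ".." is an extra precaution added here.

```python
import os

def cache_paths(real_cgi, path_info):
    """Return (cache_file, dep_file) for a request, under the assumptions above."""
    parts = [p for p in path_info.split("/") if p]
    for p in parts:
        # Components starting with "_" would collide with _cache and _dep;
        # "." and ".." are refused so a request cannot escape the cache directory.
        if p.startswith("_") or p in (".", ".."):
            raise ValueError("unsupported URL component: %r" % p)
    if not parts:
        parts = ["__index__"]  # name used when PATH_INFO is empty
    base = os.path.join(real_cgi + "-cache", *parts)
    return os.path.join(base, "_cache"), os.path.join(base, "_dep")
```

For example, with the real CGI at /var/html/index.cgi and a PATH_INFO of /foo/bar, this yields /var/html/index.cgi-cache/foo/bar/_cache and /var/html/index.cgi-cache/foo/bar/_dep.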
A few wrinkles:
- URL components must not begin with an underscore.
- When PATH_INFO is empty (e.g., the URL is
http://www.example.com/index.cgi) the name __index__ is
used to locate the _cache and _dep files.
- When the page depends on a file that does not exist, _dep must
list the innermost containing directory that does exist.
- HEAD requests are supported by chopping the contents of _cache
after the first '\n\n' sequence.
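
The HEAD handling in the last item could look roughly like this in Python. It is a minimal sketch with an invented function name; it assumes the cached headers end with a bare '\n\n', as the text says, and keeps that blank line in the output.

```python
def head_response(cached):
    """Return only the header part of a cached response (bytes)."""
    end = cached.find(b"\n\n")
    if end == -1:
        return cached  # no blank line found: treat everything as headers
    return cached[:end + 2]  # keep the headers and the terminating blank line
```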
The exact format of _dep files will probably be made more complicated in the
future. This is to cope with these anticipated problems:
- A "max age" limitation. If you have a sidebar that is generated from an
RSS feed fetched over HTTP, you probably fetch the feed once a day and
cache it yourself. The _dep file will list the local copy of the
feed, but that copy is only refreshed when the page is regenerated, which
won't happen until the page becomes outdated for some other reason.
A "max age" limit fixes this.
- A "time generated" marker. Right now, the timestamp of the _dep
file is used. As with make, this creates a race condition between
writing the _dep file and modifying a file listed in it. If the sequence
of events is
  1. a dependency file is read
  2. the file is modified by another concurrent request
  3. _dep and _cache are written
then _dep is newer than the file, but _cache is based on
an old version of the file.
In a test setup, a moderately complicated aether page takes 600ms to serve,
as measured by ab -n 10 http://www.example.com/index.cgi/. The same page
takes 30ms to serve when it is cached, as measured by ab -n 10
http://www.example.com/cache.cgi/, a speedup of 20x.