I did up some basic ETags and stuff last night, and now working on
finishing up the file serving features. There's going to be some basic
caching of the headers/stat/file junk in the server, and I'm going with
a policy that etags are fixed at:

Etag: mtime-size

So this is what a response looks like:

---------------------
$ curl -i http://localhost:6767/tests/sample.html
HTTP/1.1 200 OK
Date: Wed, 14 Jul 10 17:40:41 +0000
Content-Type: text/plain
Content-Length: 9
Last-Modified: Mon, 05 Jul 10 08:14:20 +0000
ETag: 4c31945c-9
Connection: keep-alive

hi there
---------------------

Questions I have are:

1. Should Etags be mtime-size-ctime? Only cost is the extra bytes of
transfer.

2. I'm allowing keep-alive if the request is "small", but doing
connection:close if it's a large request. This will cut down on hogs,
but I'm curious what people think of that?

3. No directory listings OK?

4. Assuming index.html for now alright? This will become an option
later, but for now just keeping it simple.

5. Any other "file serving" things you wished were available?

Also, if anyone knows of some ninja tricks to get at CRC32 or MD5 hashes
the filesystem knows about let me know. I may tinker with using that
instead of these mtime-size etags.

--
Zed A. Shaw
http://zedshaw.com/
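For readers following along, here is a minimal Python sketch of the mtime-size policy described above. It is not Mongrel2's code: the function names are made up for illustration, and rendering both fields in hexadecimal is an assumption. The mtime part does line up with the sample response (0x4c31945c is 1278317660, which is Mon, 05 Jul 2010 08:14:20 UTC, the Last-Modified value shown), but a size of 9 looks the same in hex and decimal, so the size encoding cannot be confirmed from the example.

```python
import os


def make_etag(mtime: int, size: int) -> str:
    """Build an mtime-size validator (hex for both fields is an assumption)."""
    return f"{mtime:x}-{size:x}"


def etag_for_path(path: str) -> str:
    """Stat the file and derive its validator from whole-second mtime and size."""
    st = os.stat(path)
    return make_etag(int(st.st_mtime), st.st_size)


if __name__ == "__main__":
    # Values taken from the sample response in the message above.
    print(make_etag(1278317660, 9))
```

Because only whole seconds of mtime are used, two same-sized writes within one second would produce the same tag, which is the weakness debated later in the thread.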
* Zed A. Shaw <zedshaw@zedshaw.com> [2010-07-14 19:50]:
>I did up some basic ETags and stuff last night, and now working on
>finishing up the file serving features. There's going to be some basic
>caching of the headers/stat/file junk in the server, and I'm going with
>a policy that etags are fixed at:
>
>Etag: mtime-size

Hmm, that looks too weak for me. Apache's approach of also incorporating
a file's inode is an approach good enough in practise to guarantee
freedom from collisions, even though it suffers its own problems (like
the etag validator producing false-negatives on redundant systems with
mirrored files). But different files with the same mtime and size are
just too easy to generate.

But, to be fair, the HTTP/1.1 requirements on etags and strong
validators are IMHO too strict to be fulfilled in practise. If I argued
in RFC-lawyer-asshole mode, I'd say that even Apache violates the
relevant sections in the HTTP/1.1 RFC.

>1. Should Etags be mtime-size-ctime? Only cost is the extra bytes of
>transfer.

I think that adds no value.

>3. No directory listings OK?

Strictly speaking, directory listings are a relic, and since Mongrel2 is
focusing on web applications, I don't think it's a necessity.

>Also, if anyone knows of some ninja tricks to get at CRC32 or MD5 hashes
>the filesystem knows about let me know. I may tinker with using that
>instead of these mtime-size etags.

That was my thought, too - if only the operating system already kept a
current file hash available for us. But I'm not aware of anything even
remotely portable. ZFS probably has something available, as it checksums
virtually everything, but ZFS isn't available on any but a few operating
systems.

Regards,
Andreas
On Wed, Jul 14, 2010 at 10:45:36PM +0200, Andreas Krennmair wrote:
> * Zed A. Shaw <zedshaw@zedshaw.com> [2010-07-14 19:50]:
> >Etag: mtime-size
>
> Hmm, that looks too weak for me. Apache's approach of also incorporating a
> file's inode is an approach good enough in practise to guarantee freedom from

Nope, inode breaks when you've got multiple boxes. First thing people do
is disable that for mtime-size when they get a 2nd box (or they should).

> collisions, even though it suffers its own problems (like the etag validator
> producing false-negatives on redundant systems with mirrored files). But
> different files with the same mtime and size are just too easy to generate.

Well, I think you're confusing what the etag does. I don't find the file
by the etag, I find the file, then compare the etags. So, if it's the
same path and has the same mtime and the same length, then it's a duck.

The only totally unbustable way is with a crc32 or md5, but that I think
I'll leave for the module system later on so people can cook up
whatever. For now this is the best practice so I'll go with it to get
the feature done.

> >Also, if anyone knows of some ninja tricks to get at CRC32 or MD5 hashes
> >the filesystem knows about let me know. I may tinker with using that
> >instead of these mtime-size etags.
>
> That was my thought, too - if only the operating system already kept a current
> file hash available for us. But I'm not aware of anything even remotely
> portable. ZFS probably has something available, as it checksums virtually
> everything, but ZFS isn't available on any but a few operating systems.

I know they have *some* hash internally, just not sure WTF is available.

--
Zed A. Shaw
http://zedshaw.com/
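The "find the file, then compare the etags" flow can be sketched roughly as below. This is an illustrative Python sketch, not Mongrel2's implementation; `etag_matches` and `status_for` are hypothetical names. It assumes the server has already resolved the path and computed the current tag, and it only decides between 200 and 304 from the client's If-None-Match header. Splitting the header on commas is a simplification that is safe for mtime-size tags, which never contain commas.

```python
def etag_matches(if_none_match: str, current_etag: str) -> bool:
    """Return True when the If-None-Match header value covers current_etag."""
    if if_none_match.strip() == "*":
        return True
    for candidate in if_none_match.split(","):
        candidate = candidate.strip()
        if candidate.startswith("W/"):
            # Weak comparison is allowed for If-None-Match, so drop the prefix.
            candidate = candidate[2:]
        if candidate.strip('"') == current_etag:
            return True
    return False


def status_for(if_none_match, current_etag: str) -> int:
    """Pick 304 when the client's cached copy is still current, else 200."""
    if if_none_match is not None and etag_matches(if_none_match, current_etag):
        return 304
    return 200
```

The tag is never used to look a file up; a collision only matters if the same path changes content while keeping the same mtime and size.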
The important thing is what happens if the etag is wrong. If it forces a
re-download of the data, it takes more time but the behavior is correct.
If it is wrong and the data actually changed, then it's just plain
wrong.

Don't you think that a site that cares enough about this will be using a
CDN or other server set for static content anyway? Let mongrel2 focus on
the dynamic content and not worry about the extreme configuration cases.
It should be correct and fast, but scope the solution appropriately.

John Aughey

On Wed, Jul 14, 2010 at 4:45 PM, Andreas Krennmair <ak@synflood.at> wrote:

> * Zed A. Shaw <zedshaw@zedshaw.com> [2010-07-14 19:50]:
> >I did up some basic ETags and stuff last night, and now working on
> >finishing up the file serving features. There's going to be some basic
> >caching of the headers/stat/file junk in the server, and I'm going with
> >a policy that etags are fixed at:
> >
> >Etag: mtime-size
>
> Hmm, that looks too weak for me. Apache's approach of also incorporating a
> file's inode is an approach good enough in practise to guarantee freedom
> from collisions, even though it suffers its own problems (like the etag
> validator producing false-negatives on redundant systems with mirrored
> files). But different files with the same mtime and size are just too easy
> to generate.
>
> But, to be fair, the HTTP/1.1 requirements on etags and strong validators
> are IMHO too strict to be fulfilled in practise. If I argued in
> RFC-lawyer-asshole mode, I'd say that even Apache violates the relevant
> sections in the HTTP/1.1 RFC.
>
> >1. Should Etags be mtime-size-ctime? Only cost is the extra bytes of
> >transfer.
>
> I think that adds no value.
>
> >3. No directory listings OK?
>
> Strictly speaking, directory listings are a relic, and since Mongrel2 is
> focusing on web applications, I don't think it's a necessity.
>
> >Also, if anyone knows of some ninja tricks to get at CRC32 or MD5 hashes
> >the filesystem knows about let me know. I may tinker with using that
> >instead of these mtime-size etags.
>
> That was my thought, too - if only the operating system already kept a
> current file hash available for us. But I'm not aware of anything even
> remotely portable. ZFS probably has something available, as it checksums
> virtually everything, but ZFS isn't available on any but a few operating
> systems.
>
> Regards,
> Andreas
On Wed, Jul 14, 2010 at 06:21:26PM -0400, John Aughey wrote:
> The important thing is what happens if the etag is wrong. If it forces a
> re-download of the data, it takes more time but the behavior is correct. If
> it is wrong and the data actually changed, then it's just plain wrong.
>
> Don't you think that a site that cares enough about this will be using a CDN
> or other server set for static content anyway? Let mongrel2 focus on the
> dynamic content and not worry about the extreme configuration cases. It
> should be correct and fast, but scope the solution appropriately.

I agree! :-)

--
Zed A. Shaw
http://zedshaw.com/
"Zed A. Shaw" <zedshaw@zedshaw.com> wrote:
> I did up some basic ETags and stuff last night, and now working on
> finishing up the file serving features. There's going to be some basic
> caching of the headers/stat/file junk in the server, and I'm going with
> a policy that etags are fixed at:
>
> Etag: mtime-size
>
> So this is what a response looks like:
>
> ---------------------
> $ curl -i http://localhost:6767/tests/sample.html
> HTTP/1.1 200 OK
> Date: Wed, 14 Jul 10 17:40:41 +0000
> Content-Type: text/plain
> Content-Length: 9
> Last-Modified: Mon, 05 Jul 10 08:14:20 +0000
> ETag: 4c31945c-9
> Connection: keep-alive
>
> hi there
> ---------------------
>
> Questions I have are:
>
> 1. Should Etags be mtime-size-ctime? Only cost is the extra bytes of
> transfer.

ctime is not possible to synchronize across multiple machines if
you're load balancing, so stick with mtime-size.

> 2. I'm allowing keep-alive if the request is "small", but doing
> connection:close if it's a large request. This will cut down on hogs,
> but I'm curious what people think of that?

I've been debating that myself. I think it's alright to always enable
keepalive for idempotent requests like GET/HEAD. Maybe clients can be
smart enough to make a decision to fire off a GET request in another
connection if they notice a large response being sent....

Mainstream browsers double the number of parallel connections if they
detect keep-alive is off, so in some cases it can lead to better
performance if there are large transfers while small ones are happening
(and it can also hurt the server).

> Also, if anyone knows of some ninja tricks to get at CRC32 or MD5 hashes
> the filesystem knows about let me know. I may tinker with using that
> instead of these mtime-size etags.

Since you already use sqlite, I would just lazily compute them and store
them in sqlite. You can compute+store them as extended attributes, too,
but not everybody has or enables them.

--
Eric Wong
On Wed, Jul 14, 2010 at 12:45:47PM -0700, Eric Wong wrote:
> > 1. Should Etags be mtime-size-ctime? Only cost is the extra bytes of
> > transfer.
>
> ctime is not possible to synchronize across multiple machines if
> you're load balancing, so stick with mtime-size.

Ahhh, hadn't thought of that.

> > 2. I'm allowing keep-alive if the request is "small", but doing
> > connection:close if it's a large request. This will cut down on hogs,
> > but I'm curious what people think of that?
>
> Mainstream browsers double the number of parallel connections if they
> detect keep-alive is off, so in some cases it can lead to better
> performance if there are large transfers while small ones are happening
> (and it can also hurt the server).

Yeah, I'll have to test how it works in practice, and maybe just make it
an option.

> > Also, if anyone knows of some ninja tricks to get at CRC32 or MD5 hashes
> > the filesystem knows about let me know. I may tinker with using that
> > instead of these mtime-size etags.
>
> Since you already use sqlite, I would just lazily compute them and store
> them in sqlite. You can compute+store them as extended attributes, too,
> but not everybody has or enables them.

Well, I keep trying to find how you access this information, and it's
basically impossible. So, oh well.

--
Zed A. Shaw
http://zedshaw.com/
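As a rough illustration of the keep-alive policy and the "maybe just make it an option" idea, here is a small Python sketch. It reads "small request" as "small response body", which is an assumption about what was meant. The 64 KiB cutoff, the `always_keepalive` switch and the function name are all hypothetical, not Mongrel2 settings.

```python
# Hypothetical cutoff; the thread does not say what counts as "small".
KEEPALIVE_MAX_BYTES = 64 * 1024


def connection_header(content_length: int,
                      keepalive_max_bytes: int = KEEPALIVE_MAX_BYTES,
                      always_keepalive: bool = False) -> str:
    """Choose the Connection header value for a static file response.

    Small bodies keep the connection open; large ones close it so a single
    big transfer does not hog the connection. always_keepalive models the
    idea of turning the size check off through configuration.
    """
    if always_keepalive or content_length <= keepalive_max_bytes:
        return "keep-alive"
    return "close"
```

Whether closing on large transfers helps or hurts depends on how clients react, as Eric points out, so the threshold would need to be measured rather than guessed.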
On Wed, Jul 14, 2010 at 4:29 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:

> On Wed, Jul 14, 2010 at 12:45:47PM -0700, Eric Wong wrote:
> > > 1. Should Etags be mtime-size-ctime? Only cost is the extra bytes of
> > > transfer.
> >
> > ctime is not possible to synchronize across multiple machines if
> > you're load balancing, so stick with mtime-size.
>
> Ahhh, hadn't thought of that.
>
> > > 2. I'm allowing keep-alive if the request is "small", but doing
> > > connection:close if it's a large request. This will cut down on hogs,
> > > but I'm curious what people think of that?
> >
> > Mainstream browsers double the number of parallel connections if they
> > detect keep-alive is off, so in some cases it can lead to better
> > performance if there are large transfers while small ones are happening
> > (and it can also hurt the server).
>
> Yeah, I'll have to test how it works in practice, and maybe just make it
> an option.
>
> > > Also, if anyone knows of some ninja tricks to get at CRC32 or MD5
> > > hashes the filesystem knows about let me know. I may tinker with using
> > > that instead of these mtime-size etags.
> >
> > Since you already use sqlite, I would just lazily compute them and store
> > them in sqlite. You can compute+store them as extended attributes, too,
> > but not everybody has or enables them.
>
> Well, I keep trying to find how you access this information, and it's
> basically impossible. So, oh well.

I think he's asking why we don't just wait for a request for the file,
check for its md5 in a cache, and otherwise either calculate it
immediately (a slowpath on the send back that does read then send with a
hash in between) or omit the ETag and let some background service or low
priority task populate the entry in the cache.

> --
> Zed A. Shaw
> http://zedshaw.com/
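The first option described above (check a cache, and on a miss hash the file right away) fits naturally with Eric Wong's suggestion of keeping the hashes in sqlite. The Python sketch below is only an illustration under assumptions: the `file_hash` table, its columns and the function names are invented here, and nothing in the thread describes Mongrel2's actual schema. A cached hash is trusted only while the file's whole-second mtime and size are unchanged.

```python
import hashlib
import os
import sqlite3


def open_cache(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the hypothetical hash cache table."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS file_hash ("
        "path TEXT PRIMARY KEY, mtime INTEGER, size INTEGER, md5 TEXT)"
    )
    return db


def cached_md5(db: sqlite3.Connection, path: str) -> str:
    """Return the file's MD5, computing and storing it only on a cache miss."""
    st = os.stat(path)
    mtime, size = int(st.st_mtime), st.st_size
    row = db.execute(
        "SELECT md5 FROM file_hash WHERE path = ? AND mtime = ? AND size = ?",
        (path, mtime, size),
    ).fetchone()
    if row is not None:
        return row[0]  # fast path: file unchanged since it was hashed
    # Slow path: read the whole file once and remember the digest.
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    db.execute(
        "INSERT OR REPLACE INTO file_hash (path, mtime, size, md5) "
        "VALUES (?, ?, ?, ?)",
        (path, mtime, size, digest),
    )
    db.commit()
    return digest
```

Note that invalidation still leans on mtime and size, so this inherits the same blind spot as the plain mtime-size tag; it only makes the tag itself content-based.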
On Wed, Jul 14, 2010 at 07:18:08PM -0400, Alex Gartrell wrote:
> On Wed, Jul 14, 2010 at 4:29 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
> > Well, I keep trying to find how you access this information, and it's
> > basically impossible. So, oh well.
>
> I think he's asking why we don't just wait for a request for the file, check
> for its md5 in a cache, and otherwise either calculate it immediately (a
> slowpath on the send back that does read then send with a hash in between)
> or omit the ETag and let some background service or low priority task
> populate the entry in the cache.

I thought about that, but I think I'll hold that for later when there's
a module system and people can cook up their own crazy schemes.

--
Zed A. Shaw
http://zedshaw.com/
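For anyone who does want to cook up such a scheme later, the other branch Alex describes (omit the ETag on a miss and let a low priority task fill the cache) might look something like this Python sketch. The `HashCache` class and its methods are hypothetical. The in-memory dict stands in for whatever store a real module would use, and invalidation when a file changes is deliberately left out to keep the sketch short.

```python
import hashlib
import queue


class HashCache:
    """Serve a hash-based ETag only once a background pass has computed it."""

    def __init__(self):
        self.hashes = {}               # path -> md5 hex digest
        self.pending = queue.Queue()   # paths waiting to be hashed

    def etag_or_none(self, path):
        """Request path: never blocks on hashing; None means 'send no ETag'."""
        digest = self.hashes.get(path)
        if digest is None:
            self.pending.put(path)
        return digest

    def run_background_once(self):
        """Low priority task: hash one queued file; False if nothing queued."""
        try:
            path = self.pending.get_nowait()
        except queue.Empty:
            return False
        with open(path, "rb") as f:
            self.hashes[path] = hashlib.md5(f.read()).hexdigest()
        return True
```

The trade-off is that the first responses for a file carry no validator at all, in exchange for never paying the hashing cost on the request path.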
On Wed, Jul 14, 2010 at 1:46 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:

> 5. Any other "file serving" things you wished were available?

Along the lines of the index.html lookup, one of the most dang useful
things about nginx is being able to do something like this:

    # If the file exists as a static file serve it directly without
    # running all the other rewrite tests on it
    if (-f $request_filename) {
      break;
    }

    # index.html
    if (-f $request_filename/index.html) {
      rewrite (.*) $1/index.html break;
    }

    # rails caching
    if (-f $request_filename.html) {
      rewrite (.*) $1.html break;
    }
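Stripped of nginx syntax, the lookup order in that config is: the exact file, then index.html inside the path, then the path with .html appended. A minimal Python sketch of the same order follows; `resolve_static` and its parameters are hypothetical names, and the sketch does not guard against path traversal ("..") the way a real server must.

```python
import os


def resolve_static(root: str, request_path: str):
    """Return the first existing file for request_path, or None.

    Checks, in order: the path itself, <path>/index.html, and <path>.html
    (the page-cache style lookup from the nginx example).
    """
    base = os.path.join(root, request_path.lstrip("/"))
    candidates = (
        base,
        os.path.join(base, "index.html"),
        base + ".html",
    )
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None
```

A request for /docs would therefore be served from docs/index.html if that file exists, and a request for /posts from posts.html when no posts file or directory index is present.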
mtime-size scares me. Apache uses inode-mtime-size for a reason. Size
isn't much of a differentiator, and I can definitely see someone, for
some strange (not necessarily good) reason, having files with the same
mtime.

Personally, I think If-Modified-Since, rather than etags, is the way to
go if you're only going to use mtime + size.

On Wed, Jul 14, 2010 at 12:32 PM, Billy Gray <wgray@zetetic.net> wrote:

> On Wed, Jul 14, 2010 at 1:46 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
>
>> 5. Any other "file serving" things you wished were available?
>
> Along the lines of the index.html lookup, one of the most dang useful
> things about nginx is being able to do something like this:
>
>     # If the file exists as a static file serve it directly without
>     # running all the other rewrite tests on it
>     if (-f $request_filename) {
>       break;
>     }
>
>     # index.html
>     if (-f $request_filename/index.html) {
>       rewrite (.*) $1/index.html break;
>     }
>
>     # rails caching
>     if (-f $request_filename.html) {
>       rewrite (.*) $1.html break;
>     }

--
Andrew Cholakian
http://www.andrewvc.com
On Wed, Jul 14, 2010 at 01:01:14PM -0700, Andrew Cholakian wrote:
> mtime-size scares me. Apache uses inode-mtime-size for a reason. Size isn't
> much of a differentiator, and I can definitely see someone for some strange
> (not necessarily good) reason having files with the same mtime.
>
> Personally, I think if-modified-since is the way, rather than etags, to go
> if you're only going to use mtime + size.

Nope, turns out once you have more than one server, inode is the death
since they're different on every server.

--
Zed A. Shaw
http://zedshaw.com/
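The multi-server problem with inodes can be reproduced on one machine. In the Python sketch below, two files in a temporary directory stand in for the same file mirrored onto two servers (that stand-in is an assumption made for the demo, as are the function names). The copy keeps the content and mtime, so the mtime-size tags agree, while the inode-based tags differ because each copy has its own inode.

```python
import os
import shutil
import tempfile


def mtime_size_etag(path: str) -> str:
    """Validator built from whole-second mtime and size only."""
    st = os.stat(path)
    return f"{int(st.st_mtime):x}-{st.st_size:x}"


def inode_mtime_size_etag(path: str) -> str:
    """Apache-style validator that also mixes in the inode number."""
    st = os.stat(path)
    return f"{st.st_ino:x}-{int(st.st_mtime):x}-{st.st_size:x}"


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "server_a.html")
    mirror = os.path.join(tmp, "server_b.html")
    with open(original, "w") as f:
        f.write("hi there\n")
    shutil.copy2(original, mirror)  # copy2 preserves the mtime as well

    same_weak = mtime_size_etag(original) == mtime_size_etag(mirror)
    same_inode = (inode_mtime_size_etag(original)
                  == inode_mtime_size_etag(mirror))
    print("mtime-size tags equal:      ", same_weak)
    print("inode-mtime-size tags equal:", same_inode)
```

Behind a load balancer, differing tags for identical content mean a client validated by one box gets a full re-download from the other, which is the false negative Andreas mentioned earlier in the thread.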
On Wed, Jul 14, 2010 at 03:32:48PM -0400, Billy Gray wrote:
> On Wed, Jul 14, 2010 at 1:46 PM, Zed A. Shaw <zedshaw@zedshaw.com> wrote:
> Along the lines of the index.html lookup, one of the most dang useful things
> about nginx is being able to do something like this:
>
>     # If the file exists as a static file serve it directly without
>     # running all the other rewrite tests on it
>     if (-f $request_filename) {
>       break;
>     }
>
>     # index.html
>     if (-f $request_filename/index.html) {
>       rewrite (.*) $1/index.html break;
>     }
>
>     # rails caching
>     if (-f $request_filename.html) {
>       rewrite (.*) $1.html break;
>     }

That'll come with filters, and, like, actually make sense.

--
Zed A. Shaw
http://zedshaw.com/