Hello, Just a few days ago I needed to serve a 3.5GB file out of mongrel2, and was surprised to receive an empty response. In the logs I see: [ERROR] (src/dir.c:94: errno: Value too large for defined data type) File stat failed: <here is the file path> So I googled about this and found out an option we can pass to gcc when compiling code that will handler 2GB+ files: -D_FILE_OFFSET_BITS=64 I created a branch[1] with this correction, but since I don't have access to *BSDs, OSX and other platforms, it would be great if anyone tried to compile and run the tests just to check thats all ok. Please review the patch and let me know if this -D_FILE_OFFSET_BITS=64 option is ok on all platforms. The patch is here: https://github.com/zedshaw/mongrel2/commit/e3b526e523edb8f3cb98a3f3e149ebf2e406101b Thanks. [1] https://github.com/zedshaw/mongrel2/tree/staticfiles-bigger-2gb -- Dalton Barreto http://daltonmatos.wordpress.com http://wsgid.com
On Thu, May 26, 2011 at 11:09:04PM -0300, Dalton Barreto wrote: > Hello, > > Just a few days ago I needed to serve a 3.5GB file out of mongrel2, > and was surprised to receive an empty response. > > In the logs I see: > > [ERROR] (src/dir.c:94: errno: Value too large for defined data type) > File stat failed: <here is the file path> Jeez, well at least our practice of checking every return value protected against it. The use of stat to find file sizes has hit us in another instance so I wonder if we can just ditch it and do something else. Anyway, I'll grab your stuff and apply it if someone hasn't yet. -- Zed A. Shaw http://zedshaw.com/
I think we will need to do more to make this work correctly. Specifically, all of the steam_file functions take int len's instead of off_t/size_t len's, so we're going to overflow with files greater than 4 gbs anyway. If you wanna go throught and change all this stuff you can, or I can do it; just let me know. Alex On Thu, May 26, 2011 at 10:09 PM, Dalton Barreto <daltonmatos@gmail.com>wrote: > Hello, > > Just a few days ago I needed to serve a 3.5GB file out of mongrel2, > and was surprised to receive an empty response. > > In the logs I see: > > [ERROR] (src/dir.c:94: errno: Value too large for defined data type) > File stat failed: <here is the file path> > > So I googled about this and found out an option we can pass to gcc > when compiling code that will handler 2GB+ files: > -D_FILE_OFFSET_BITS=64 > > I created a branch[1] with this correction, but since I don't have > access to *BSDs, OSX and other platforms, it would be great if anyone > tried to compile and run the tests just to check thats all ok. > > Please review the patch and let me know if this -D_FILE_OFFSET_BITS=64 > option is ok on all platforms. > > The patch is here: > > https://github.com/zedshaw/mongrel2/commit/e3b526e523edb8f3cb98a3f3e149ebf2e406101b > > Thanks. > > [1] https://github.com/zedshaw/mongrel2/tree/staticfiles-bigger-2gb > > -- > Dalton Barreto > http://daltonmatos.wordpress.com > http://wsgid.com >
2011/5/27 Alex Gartrell <agartrell@cmu.edu>: > I think we will need to do more to make this work correctly. Specifically, > all of the steam_file functions take int len's instead of off_t/size_t > len's, so we're going to overflow with files greater than 4 gbs anyway. If > you wanna go throught and change all this stuff you can, or I can do it; > just let me know. > Alex Great Alex, I just didn't see this. If you want, push this fixes directly to the branch I already created. Even not merging all this for the current release, at least we will already have a something good when it''s time for the merge. Thanks for you help. -- Dalton Barreto http://daltonmatos.wordpress.com http://wsgid.com
2011/5/27 Dalton Barreto <daltonmatos@gmail.com>: > 2011/5/27 Alex Gartrell <agartrell@cmu.edu>: >> I think we will need to do more to make this work correctly. Specifically, >> all of the steam_file functions take int len's instead of off_t/size_t >> len's, so we're going to overflow with files greater than 4 gbs anyway. If >> you wanna go throught and change all this stuff you can, or I can do it; >> just let me know. >> Alex > > Great Alex, I just didn't see this. If you want, push this fixes > directly to the branch I already created. Even not merging all this > for the current release, at least we will already have a something > good when it''s time for the merge. > Hey Alex, Take a look at the conversation at issue 5 [1]. I will try to fix this problem without changing the len parameter of IOBuf_stream_* functions, as Zed suggested. If you are willing to help, let me know. Thanks. -- Dalton Barreto http://daltonmatos.wordpress.com http://wsgid.com
2011/5/28 Dalton Barreto <daltonmatos@gmail.com>: > > Hey Alex, > > Take a look at the conversation at issue 5 [1]. I will try to fix this > problem without changing the len parameter of IOBuf_stream_* > functions, as Zed suggested. > Actually, now looking at the code I realized that we still need to change the len parameter. =) At least from IOBuf_stream_file() and IOBuf->stream_file callback. -- Dalton Barreto http://daltonmatos.wordpress.com http://wsgid.com
2011/5/28 Dalton Barreto <daltonmatos@gmail.com>: > 2011/5/28 Dalton Barreto <daltonmatos@gmail.com>: >> >> Hey Alex, >> >> Take a look at the conversation at issue 5 [1]. I will try to fix this >> problem without changing the len parameter of IOBuf_stream_* >> functions, as Zed suggested. >> > > Actually, now looking at the code I realized that we still need to > change the len parameter. =) > At least from IOBuf_stream_file() and IOBuf->stream_file callback. > I've pushed some code to the branch. What's missing is the main logic to send chunks of 2GB. I'll take a look at this later. Thanks. -- Dalton Barreto http://daltonmatos.wordpress.com http://wsgid.com
2011/5/28 Dalton Barreto <daltonmatos@gmail.com>: > 2011/5/28 Dalton Barreto <daltonmatos@gmail.com>: >> 2011/5/28 Dalton Barreto <daltonmatos@gmail.com>: >>> >>> Hey Alex, >>> >>> Take a look at the conversation at issue 5 [1]. I will try to fix this >>> problem without changing the len parameter of IOBuf_stream_* >>> functions, as Zed suggested. >>> >> >> Actually, now looking at the code I realized that we still need to >> change the len parameter. =) >> At least from IOBuf_stream_file() and IOBuf->stream_file callback. >> > > I've pushed some code to the branch. What's missing is the main logic > to send chunks of 2GB. > I'll take a look at this later. > Alright, so we have now a functioning version of chunked responses. I had to disable the timeout logic because Registration struct has a "uint32_t bytes_written", that was obviously overflowing. To skip the overflow problems I changed from int to long long int, but I don't know if this is the best to do. Any suggestions? Other than that, the implementation works pretty well. Even curl reporting at the end: curl: (18) transfer closed with outstanding read data remaining the MD5 sums of the two files match. Please review the patches, make any suggestions you have and I will make all modifications needed before merging into master branch. Thanks! -- Dalton Barreto http://daltonmatos.wordpress.com http://wsgid.com
This breaks on OS X (and I'd assume the BSD's, but I don't have them up and
running at the moment).
src/dir.c: In function ‘Dir_find_file’:
src/dir.c:114: error: ‘O_LARGEFILE’ undeclared (first use in this function)
src/dir.c:114: error: (Each undeclared identifier is reported only once
src/dir.c:114: error: for each function it appears in.)
I was also able to break it with a 3.7 gb file of all zeros on OS X
~/sandbox/bigfiles/files $ curl localhost:6767/bigfile > bigfile2
% Total % Received % Xferd Average Speed Time Time Time
Current
Dload Upload Total Spent Left
Speed
26 3814M 26 1024M 0 0 45.5M 0 0:01:23 0:00:22 0:01:01
41.1M
curl: (56) Received problem 3 in the chunky parser
(it sent the first chunk and then croaked)
[ERROR] (src/bsd_specific.c:64: errno: Socket is not connected) OS X
sendfile wrapper failed
[ERROR] (src/dir.c:185: errno: None) Failed sending chunk content:
IOBuf_stream_file returned -1
Sorry I don't have more time to look at it right now, I'll try to take a
look later on.
Alex
On Sun, May 29, 2011 at 7:35 PM, Dalton Barreto <daltonmatos@gmail.com>wrote:
> 2011/5/28 Dalton Barreto <daltonmatos@gmail.com>:
> > 2011/5/28 Dalton Barreto <daltonmatos@gmail.com>:
> >> 2011/5/28 Dalton Barreto <daltonmatos@gmail.com>:
> >>>
> >>> Hey Alex,
> >>>
> >>> Take a look at the conversation at issue 5 [1]. I will try to fix this
> >>> problem without changing the len parameter of IOBuf_stream_*
> >>> functions, as Zed suggested.
> >>>
> >>
> >> Actually, now looking at the code I realized that we still need to
> >> change the len parameter. =)
> >> At least from IOBuf_stream_file() and IOBuf->stream_file callback.
> >>
> >
> > I've pushed some code to the branch. What's missing is the main logic
> > to send chunks of 2GB.
> > I'll take a look at this later.
> >
>
> Alright, so we have now a functioning version of chunked responses. I
> had to disable the timeout logic because Registration struct has a
> "uint32_t bytes_written", that was obviously overflowing.
>
> To skip the overflow problems I changed from int to long long int, but
> I don't know if this is the best to do. Any suggestions?
>
> Other than that, the implementation works pretty well. Even curl
> reporting at the end:
>
> curl: (18) transfer closed with outstanding read data remaining
>
> the MD5 sums of the two files match.
>
> Please review the patches, make any suggestions you have and I will
> make all modifications needed before merging into master branch.
>
> Thanks!
>
>
> --
> Dalton Barreto
> http://daltonmatos.wordpress.com
> http://wsgid.com
>
2011/5/30 Alex Gartrell <agartrell@cmu.edu>: > This breaks on OS X (and I'd assume the BSD's, but I don't have them up and > running at the moment). > src/dir.c: In function ‘Dir_find_file’: > src/dir.c:114: error: ‘O_LARGEFILE’ undeclared (first use in this function) > src/dir.c:114: error: (Each undeclared identifier is reported only once > src/dir.c:114: error: for each function it appears in.) Indeed. Actually I just compiled without O_LARGEFILE and it worked. I just kept -D_FILE_OFFSET_BITS=64. > I was also able to break it with a 3.7 gb file of all zeros on OS X > ~/sandbox/bigfiles/files $ curl localhost:6767/bigfile > bigfile2 > % Total % Received % Xferd Average Speed Time Time Time > Current > Dload Upload Total Spent Left > Speed > 26 3814M 26 1024M 0 0 45.5M 0 0:01:23 0:00:22 0:01:01 > 41.1M > curl: (56) Received problem 3 in the chunky parser > (it sent the first chunk and then croaked) > [ERROR] (src/bsd_specific.c:64: errno: Socket is not connected) OS X > sendfile wrapper failed > [ERROR] (src/dir.c:185: errno: None) Failed sending chunk content: > IOBuf_stream_file returned -1 This is strange. I tested here with a 3GB file and it streamed just fine. Maybe it's something to do with BSD's sendfile()? > Sorry I don't have more time to look at it right now, I'll try to take a > look later on. > Alex > No problem Alex. Thanks for your time looking at this. I'll keep working. If you have any suggestions just let me know. Thanks a lot. -- Dalton Barreto http://daltonmatos.wordpress.com http://wsgid.com
> 2011/5/30 Alex Gartrell <agartrell@cmu.edu>: > I was also able to break it with a 3.7 gb file of all zeros on OS X > ~/sandbox/bigfiles/files $ curl localhost:6767/bigfile > bigfile2 > % Total % Received % Xferd Average Speed Time Time Time > Current > Dload Upload Total Spent Left > Speed > 26 3814M 26 1024M 0 0 45.5M 0 0:01:23 0:00:22 0:01:01 > 41.1M > curl: (56) Received problem 3 in the chunky parser > (it sent the first chunk and then croaked) Hey Zed and Alex, I've being thinking about this implementation and I may be (probably am) in the wrong path. The HTTP chunked response is better suited when we *do not* know the content length before hand, and this is *not* the case here as we are serving a static file with a known size. So, what do you think about removing this (currently broken) HTTP chunked response logic and just write the chunks directly to the connection socket? In fact, I think that this implementation is more correct than what I'm doing right now. Tell me what you think and I will change the code, it will be a much simpler implementation and it will be easier to merge back into the master branch. Thanks! -- Dalton Barreto http://daltonmatos.wordpress.com http://wsgid.com
On Thu, May 26, 2011 at 11:22:14PM -0400, Alex Gartrell wrote: > I think we will need to do more to make this work correctly. Specifically, > all of the steam_file functions take int len's instead of off_t/size_t > len's, so we're going to overflow with files greater than 4 gbs anyway. If > you wanna go throught and change all this stuff you can, or I can do it; > just let me know. Oh yeah there's going to be a bunch of those looking at this now. Let's put this in the list of ticket and then hit it on the next release instead of now. I want to spend more time testing it and doing it right. -- Zed A. Shaw http://zedshaw.com/
2011/5/27 Zed A. Shaw <zedshaw@zedshaw.com>: > On Thu, May 26, 2011 at 11:22:14PM -0400, Alex Gartrell wrote: >> I think we will need to do more to make this work correctly. Specifically, >> all of the steam_file functions take int len's instead of off_t/size_t >> len's, so we're going to overflow with files greater than 4 gbs anyway. If >> you wanna go throught and change all this stuff you can, or I can do it; >> just let me know. > > Oh yeah there's going to be a bunch of those looking at this now. Let's > put this in the list of ticket and then hit it on the next release > instead of now. I want to spend more time testing it and doing it > right. Alright, issue created: https://github.com/zedshaw/mongrel2/issues/5 Thanks. -- Dalton Barreto http://daltonmatos.wordpress.com http://wsgid.com
On Fri, May 27, 2011 at 08:15:29PM -0300, Dalton Barreto wrote: > Alright, issue created: https://github.com/zedshaw/mongrel2/issues/5 I absolutely hate github's issues system. How in the hell did they screw these up so bad. It's like they don't actually code for a living. For example, I just now replied to that ticket because the emails they send out look exactly like librelist emails, so I thought it was a message you sent to the mailing list. They couldn't be bothered to do [github/mongrel2] instead? Oh no, that's right, because they're all that matters. Then I'm looking for the close button, and I see this massive green OPEN at the top right, a ton of blank space, and nothing. Instead I gotta scroll through the whole list of comments and then ohhhh there it is, a gray button inside a gray panel with only a curvy button bevel to visually differentiate it from all the other gray shit on their page. WTF. -- Zed A. Shaw http://zedshaw.com/