Subject: Re: HTTP PUT for file uploads
From: mike (mike@public.gmane.org)
Date: Sep 6, 2008 2:14:12 pm
List: ru.sysoev.nginx

On Sat, Sep 6, 2008 at 4:45 AM, Valery Kholodkov <valery+nginxen-v//pYa5@public.gmane.org> wrote:

1) multipart/form-data does not bloat the size of the file; it doesn't encode anything. RFC 1867 doesn't explicitly say that any encoding should be applied.

That's weird. I thought someone said it bloats the file. I will have to correct that next time I post about this.

2) The ideal solution is to have byte ranges instead of a Segment ID, since concatenation of parts is not a _scalable operation_. With byte ranges the server will be able to put each part into its proper place in the file, while leaving the other parts empty. I.e. if I have two parts with byte ranges 0-19999/80000 and 40000-59999/80000, I can lseek to the second part and start writing the two parts simultaneously:

|<-- part1 0-19999/80000 -->|<-- zeroes 000000 -->|<-- part2 40000-59999/80000 -->|
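The layout above can be sketched roughly as follows. This is only an illustration in Python on a local filesystem, not a claim about how an nginx module would do it; the function name `write_part` and its parameters are made up for the example, and it says nothing about whether the same approach is safe over NFS.

```python
import os

def write_part(path, start, data, total_size):
    """Write one part at its byte offset; regions not yet written read back as zeroes."""
    # O_CREAT without O_TRUNC, so parts arriving in any order share one file.
    flags = os.O_RDWR | os.O_CREAT | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o600)
    try:
        # Extend the file to its final size if needed (never shrinks it).
        if os.fstat(fd).st_size < total_size:
            os.ftruncate(fd, total_size)
        # Jump to the part's offset, as in the lseek described above.
        os.lseek(fd, start, os.SEEK_SET)
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may write fewer bytes than asked
            view = view[written:]
    finally:
        os.close(fd)
```

Calling `write_part(path, 40000, part2, 80000)` before `write_part(path, 0, part1, 80000)` gives the same file as the other order, with bytes 20000-39999 and 60000-79999 left as zeroes.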

The reason I decided on segments is that multiple segments can be transferred in parallel, and I don't know enough about server-side code and shared filesystems to know whether byte-range writes would work properly over NFS or something else. I am thinking of a solution that has no "it may not work" type of restrictions.

If only one file can be sent at a time, then I was thinking PHP (since this was a first attempt at it, and I only know PHP) could seek to the specific byte range; however, being able to split the file into segments and send them at will allows multiple segments of the same file to be uploaded at once and does not carry any NFS/locking risks.

If you want to code an extension that does this more cleanly and uses byte ranges in a way that is safe over a network filesystem like NFS, that works for me :) I only know PHP, and my assumptions are based on how other things do it.

A 2 gig file at 128k chunks (segments) would wind up being roughly 2000 * 8 = 16000 chunks (about 2000 MB at 8 chunks per MB). That's a lot of files. I started thinking of making "superchunks" which would be groupings of 100 chunks or something, so after 100 chunks (in a row) were successful it would glue those together, and reduce the number of files...
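To make the arithmetic and the "superchunk" grouping concrete, here is a small Python sketch. The names `chunk_count` and `complete_superchunks` are invented for the example, and the group size of 100 is just the rough figure mentioned above. It only decides which groups are ready to be glued together; it does not do the gluing.

```python
CHUNK_SIZE = 128 * 1024   # 128 KiB per chunk
SUPERCHUNK_LEN = 100      # chunks per superchunk (rough figure, an assumption)

def chunk_count(file_size, chunk_size=CHUNK_SIZE):
    """Number of chunks needed to cover file_size bytes (rounded up)."""
    return -(-file_size // chunk_size)

def complete_superchunks(received, total_chunks, group=SUPERCHUNK_LEN):
    """Return the indexes of superchunks whose chunks have all arrived."""
    received = set(received)
    ready = []
    for g in range(-(-total_chunks // group)):
        # The last group may hold fewer than `group` chunks.
        members = range(g * group, min((g + 1) * group, total_chunks))
        if all(i in received for i in members):
            ready.append(g)
    return ready
```

With exact binary units, a 2 GiB file comes to 16384 chunks, which would shrink to 164 files once every superchunk is merged.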

If this idea at least sounds viable, I think I could scrape together a decent amount of cash from my side business and my company to fund this. It would have to support operations safely over NFS, CIFS, or single servers (so a local /tmp file wouldn't work for NFS, since the requests could be sent to any of the webservers; the files would have to be in a shared directory, which should be user configurable).

I suppose on the client side it isn't too hard to seek throughout a file, as the client doesn't have to worry about odd locking issues and writing. It would be great if the server end could be created in a way that it could become an "unofficial" standard for uploading large files, or for uploading over unreliable or slow connections.

It would also take care of progress bar stuff, as it could give feedback to the client when chunks get completed, and the client knows how fast it is sending data... so during a chunk it would be relying on its own internal transfer stats, and it would be able to confirm that everything up to byte 70000 is completed on the server, for example...
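One way the "confirmed up to byte N" figure could be worked out from the completed chunks, sketched in Python. The function name `confirmed_bytes` and the inclusive (start, end) range format are assumptions made for the example, not part of any existing protocol.

```python
def confirmed_bytes(completed_ranges):
    """Return how many bytes, counted from offset 0, are contiguous on the server.

    completed_ranges holds inclusive (start, end) pairs such as (0, 19999).
    """
    confirmed = 0
    for start, end in sorted(completed_ranges):
        if start > confirmed:
            break  # gap: nothing past this point is confirmed yet
        confirmed = max(confirmed, end + 1)
    return confirmed
```

For the two parts 0-19999 and 40000-59999 this reports 20000 confirmed bytes; once 20000-39999 arrives it jumps to 60000.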

Also, there's an issue of garbage collection. A job would have to clean out the [shared] temp directory after a while - I thought something like 4 days would be nice [user configurable is best], because a 2 gig file could take a long time and people might have to resume it over the course of a couple days, but any longer we'd have to assume it's an orphan file that won't be resumed again.
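A cleanup job along those lines might look like the Python sketch below. The 4-day default is the figure suggested above; the function name `remove_orphans`, the flat directory layout, and the use of the file modification time as the "last activity" marker are all assumptions for the example.

```python
import os
import time

MAX_AGE_SECONDS = 4 * 24 * 60 * 60  # 4 days; meant to be user configurable

def remove_orphans(temp_dir, max_age=MAX_AGE_SECONDS, now=None):
    """Delete regular files in temp_dir not modified within max_age seconds."""
    now = time.time() if now is None else now
    with os.scandir(temp_dir) as it:
        entries = list(it)  # snapshot first, then delete
    removed = []
    for entry in entries:
        if not entry.is_file(follow_symlinks=False):
            continue  # skip directories and symlinks
        age = now - entry.stat(follow_symlinks=False).st_mtime
        if age > max_age:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Run from cron or a similar scheduler against the shared temp directory, this would drop chunks from uploads nobody has touched in 4 days while leaving recently resumed ones alone.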

If you're interested in this, I would love to have someone as experienced as you - who has already dealt with handling file uploads and created nginx modules! Let me know, feel free to contact me off list. We might have to work out some more specifics, and you might want to know how much $$ - I'd have to ask at work what they would pay for it, but I'd pledge $500 out of my own pocket. I don't believe any other webserver or anything out there has anything like this (besides maybe some thick client apps with specific servers that only handle file uploads...)

Ideally, other people could create modules for Apache, etc. as well, and it could be adopted by browsers directly and alleviate the need for thick Java/Flash/etc. apps, if it's done in a "standard" enough way to be re-creatable on the client and the server...

I'd love to hear any thoughts, opinions, get any code going, etc. I've got a PHP version of the server component that I think is actually functional (with a surprisingly small amount of code) but don't have a client to test it with yet. I was going to create a PHP client to test it too.