I was in the office the other day, you know, doing the usual: drinking coffee and unleashing my Carry On-esque humour on those unfortunate enough to be listening, when the conversation turned to file upload strategies and progress reporting solutions in Rails.
Now, for various reasons that other people have talked about before, Rails was instantly rejected as a solution for uploading large files. The usual answer to this problem is to use a bank of Merb instances and feed progress back to the client. It was during this conversation that Mr Tanner brought up the subject of the Nginx upload progress module.
Now, we’re big fans of Nginx because it’s fast, lightweight and easy on resources, so I immediately looked into this. The module basically counts the number of bytes it has received and allows the client to retrieve the progress of the upload via AJAX and JSON. Once the file upload is complete, Nginx pushes the payload to the upstream server. This gets around the problems people have had using Nginx with Merb upload progress. Progress reporting is handled by Nginx itself, which for me allows more flexibility in choosing what the upstream server or application will be. Merb’s great. It’s fast and lightweight, but if you’re using it just for file uploads then it could be a bit of a waste. It would be easy to write a simple Mongrel handler to deal with the file uploads, or to use the soon-to-be-released Wisteria, a Ruby micro framework by Kirk Haines.
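To make the polling side a bit more concrete, here is a minimal Ruby sketch of turning one progress report into a percentage. I’m assuming the report arrives as a JSON body with `state`, `received` and `size` fields; those field names, the state values, and the `upload_percent` helper are my assumptions for illustration, so check them against the module’s own documentation before relying on them.

```ruby
require 'json'

# Hypothetical helper: convert one progress report into a percentage.
# ASSUMPTION: the report is JSON with 'state', 'received' and 'size' keys.
def upload_percent(report_json)
  report = JSON.parse(report_json)

  case report['state']
  when 'done'
    100.0
  when 'uploading'
    size = report['size'].to_f
    return 0.0 if size.zero? # avoid dividing by zero on an unknown size

    (report['received'] / size * 100).round(1)
  else
    0.0 # 'starting', 'error', or anything unrecognised
  end
end

puts upload_percent('{"state":"uploading","received":512,"size":2048}') # prints 25.0
```

In the browser the same arithmetic would live in the AJAX callback; the point is only that the client needs nothing more than the bytes received and the total size.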
Wisteria is a micro framework that doesn’t try to do anywhere near as much as Rails or Merb, and because of this it manages approx. 2000 req/s for a simple hello world application. These figures make it a framework to definitely consider in the future if you’re doing large file uploads or dealing with requests that need high performance but none of the polish that Merb or Rails provide.
Of course, there is the problem of validation if you want to verify both the upload payload and any other fields such as descriptions, tags etc. In my architectures I always push off as much work as I can to be done asynchronously. Take a popular video sharing site that has to accept uploaded videos with a description and some tags, and then transcode each video into the appropriate format. There I would probably validate the incoming data and then push it onto a message queue (albeit without the actual file payload, of course) to be dealt with by another process when resources are available.
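As a rough sketch of that flow in plain Ruby: validate the metadata, then enqueue a small job that only points at the stored file. Ruby’s built-in `Queue` stands in for a real message queue here, and `metadata_errors`, `accept_upload` and the field names are hypothetical names I’ve made up for the example.

```ruby
require 'thread'

# Stand-in for a real message queue (ASSUMPTION: in production this
# would be an external broker, not an in-process Queue).
JOBS = Queue.new

# Hypothetical checks on the non-file fields of the upload form.
def metadata_errors(params)
  errors = []
  errors << 'description is required' if params[:description].to_s.strip.empty?
  errors << 'at least one tag is required' if Array(params[:tags]).empty?
  errors << 'file path is required' if params[:file_path].to_s.empty?
  errors
end

# Validate first, then enqueue a job that references the file by path.
# The video payload itself never goes onto the queue.
def accept_upload(params)
  errors = metadata_errors(params)
  return errors unless errors.empty?

  JOBS << {
    file_path:   params[:file_path],
    description: params[:description],
    tags:        Array(params[:tags])
  }
  []
end

errors = accept_upload(file_path: '/tmp/upload-123', description: 'Holiday video', tags: ['beach'])
puts errors.empty? ? "queued #{JOBS.size} job(s)" : errors.join(', ')
```

A separate worker process would then pop jobs off the queue and do the transcoding whenever it has capacity, which keeps the upload handler itself small and quick.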
Now, the conundrum for me is to do with validation. In the lightweight process that’s dealing with the file upload, I probably don’t want to be loading the entire model layer of whatever ORM I’m using, such as ActiveRecord or DataMapper. However, I do want to use the validations that are part of the model. Wouldn’t it be great if I could just mix my validations into my ORM model and also share them with a lightweight DAO that I’m just using as a temporary container? This way I wouldn’t be duplicating business logic.
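To illustrate the shape of what I’m after, here is a sketch in plain Ruby with no ORM involved at all. The `UploadValidations` module, the `PendingUpload` container and the specific rules are all hypothetical; this shows the mixin idea only, not how AR or DataMapper validations actually work.

```ruby
# Hypothetical shared rules. They rely only on the including class
# responding to #description and #tags, so any object will do.
module UploadValidations
  MAX_TAGS = 5

  def upload_errors
    errors = []
    errors << 'description is required' if description.to_s.strip.empty?
    errors << "no more than #{MAX_TAGS} tags allowed" if Array(tags).size > MAX_TAGS
    errors
  end

  # Deliberately not called valid?, to avoid clashing with an ORM's own method.
  def upload_valid?
    upload_errors.empty?
  end
end

# Lightweight temporary container used by the upload handler.
PendingUpload = Struct.new(:description, :tags) do
  include UploadValidations
end

upload = PendingUpload.new('', %w[a b c d e f])
puts upload.upload_valid?   # prints false
puts upload.upload_errors
```

The full ORM model would then `include UploadValidations` as well, so the rules live in exactly one place. Whether they can also be hooked into the ORM’s own validation machinery is the open question.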
To my knowledge, the AR validations are too tightly coupled to AR for me to do this in a lightweight manner. However, I’m fairly sure that I could do this with DataMapper because of the way it was developed using BDD. I’ll experiment more tonight to see if this is indeed feasible.