I posted this over at webhostingtalk.com and got nothing other than the usual advertisement spam from random hosting companies, so I decided to ask here.

Hello guys,

I've come for some advice regarding the architecture of a file serving cluster. Let me give you some background information. I run a gaming community, and we've recently become quite popular. We allow our members to upload gaming-related photos, videos, and files. Currently we're running one dedicated server providing the web app/database, and another cheap unmetered box (lighttpd) for serving the static files. We're literally growing exponentially and I foresee the need to add 2 to 3 file servers within the next month.

What would an ideal architecture be to host/manage these fileservers? Here is what I've come up with, but somehow I think it's not as good as it should be.

- central application server / db server
- central upload server
- cheap unmetered fileserver boxes

The central upload server will be the file upload gateway. It'll mount each of the cheap unmetered servers through NFS, and then appropriately distribute the files to one of the fileservers.

The problems I see with this are:

1. Load balancing (if a file is taking huge amounts of bandwidth, I'd like to distribute it across multiple servers, but how do I do this when I give a user an absolute link to his or her file, like img1.myserver.com/545/file.zip?). I suppose I could do this with rewrite rules, but that seems super excessive.
2. NFS - I've read many bad things.
3. Moving files (same dilemma as load balancing).

If anyone could shed some light or give me some advice, that would be amazing.

Thanks,
Alex
Combine http://www.howtoforge.com/high_availability_nfs_drbd_heartbeat with http://www.howtoforge.com/high_availability_loadbalanced_apache_cluster

Mirror with http://www.howtoforge.com/mirroring_with_rsync

Build your central app and DB server. Either get a good box, or use the Apache cluster and MySQL cluster as I mentioned.

Next, make your central upload server and set it up with rsync.

Next, gather your fileserver boxes and set them all up with rsync. Have them use clustered NFS. Have rsync run every 1-6 hours, or however often you feel you need to update content.

Stipulations:

- Identical file boxes: the servers need to have basically the same port speed and the same HD size for easy management.
- Heartbeat servers: you need more than one heartbeat server so that the setup is highly available. Either two small root-access hosting accounts could be used, or you could put one on the app server and the other on the upload server.

Anyway, that's how I would do it. The key is rsync to keep all server archives absolutely identical, and Apache clustering to make them all load balance. I'm sure you could use lighttpd somewhere in there instead, but I'm unfamiliar with it (Apache/Mongrel fan here), so that would be your own turf, sorry.

Good luck with your project. Sounds fun, wish I was there.
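For what it's worth, the rsync mirroring step could be driven by something like the sketch below, run from the upload server. This is only an illustration, not a tested setup: the hostnames, the paths, and the helper names build_rsync_command and sync_all are all made up for the example. Only the rsync flags themselves (-a, -z, --delete, -e ssh) are standard.

```python
import subprocess

# Hypothetical hostnames and paths, for illustration only.
FILESERVERS = ["fs1.example.com", "fs2.example.com", "fs3.example.com"]
SOURCE_DIR = "/srv/uploads/"  # trailing slash: copy the contents, not the directory itself
DEST_DIR = "/srv/files/"


def build_rsync_command(source_dir, host, dest_dir):
    """Return the rsync argv that mirrors source_dir to host:dest_dir over SSH."""
    return [
        "rsync",
        "-a",        # archive mode: recurse, keep permissions and timestamps
        "-z",        # compress data in transit
        "--delete",  # remove files on the mirror that no longer exist at the source
        "-e", "ssh",
        source_dir,
        f"{host}:{dest_dir}",
    ]


def sync_all(hosts, source_dir=SOURCE_DIR, dest_dir=DEST_DIR, dry_run=True):
    """Build one rsync command per host; run them only when dry_run is False."""
    commands = [build_rsync_command(source_dir, h, dest_dir) for h in hosts]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if a mirror fails to sync
            subprocess.run(cmd, check=True)
    return commands
```

Calling sync_all(FILESERVERS, dry_run=False) from a cron job every 1-6 hours would match the schedule above. The default dry_run=True only returns the commands, so you can inspect them before letting --delete loose on a real mirror.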
Sounds to me like ZFS would be a better fit for your distributed file storage. http://en.wikipedia.org/wiki/ZFS