Web filesystem layout change

Discussion in 'Developers' Forum' started by ispcomm, Aug 26, 2010.

  1. till

    till Super Moderator Staff Member ISPConfig Developer

    One thing that I forgot to mention, if you plan to make the web root changable (when the domain name changes) and not just the symlinks, then this will affect also the ssh and FTP users, the creation of the ssh user jails and also the cronjobs and cron jails. So you will have to change and recreate all these depending "assets" as well.
     
  2. ispcomm

    ispcomm Member

    Thank for the explanation about server.php and the way state is passed. The code makes much more sense now. Now I understand sys_datalog contains the serialization of the old and new data :)
    Well... you're getting me to reconsider all this a/abc.com way of storing site data. A rename of a site looks more like the creation of a new site and copying data over from the old directory (and then deleting the old site). If this approach is followed, onAfterUpdate could just duplicate the settings from sql and copy them to a new site. The plugin would then understand the new "rename" operation and act accordingly.

    Perhaps, it might be wise to just disable site renaming after a site is created ?

    I'm starting to feel that this is a little too much intrusive... otoh I'd like to use glusterfs for the underlying mail and web storage and that "requires" so sort of even distribution of data accross nodes.

    ispcomm
     
  3. till

    till Super Moderator Staff Member ISPConfig Developer

    In my opinion this woud be too much limitation. If a system is used e.g. by a internet agency, then you often run several copies of a website for development and testing and have to switch them easily. e.g. the current website runs as example.com while you develop the new version of the site as test.example.com. Then you get a call from the customer to switch the site live, then you can edit the just the domain example.com > old.example.com and test.example.com > example.com.

    Copying the site data is not an option in my opinion as it will break too much and usernames and permissions will change too.

    I've tried to find a setup that will match your goals of running many thousands of sites from one shared drive and here is my proposal.

    1) We leave the fixed path with the ID's as we have them now. So no domain name placeholder in the real path.

    2) To prevent a bottleneck with the number of clients (too many subdirectories in /var/www/clients/ directory) we add a (numeric) splitter there for the ClientID.

    Current setup:

    /var/www/clients/client324/web880/

    new optional setup with splitting, e.g:

    /var/www/clients/0/3/client324/web880
    /var/www/clients/1/5/client1534/web880

    So we have a directory that splits for thousand and one that splits for hundred. Maybe it is just enough to have a split by hundred, so it looks like this:

    /var/www/clients/3/client324/web880
    /var/www/clients/15/client1534/web880

    3) We leave the symlinks as an optional way to navigate to the fixed web directory of the website but put them in a aphabetic splitted directory tree, e.g.

    /var/www/e/ex/example.com links to /var/www/clients/3/client324/web880 directory where /var/www/e and /var/www/e/ex are directories and /var/www/e/ex/example.com is a symlink.
     
  4. ispcomm

    ispcomm Member

    Could be a reasonable solution

    Till,

    I agree with your proposal. I would only like to refine it a little.

    Ideally the distribution must be as "flat" as possible, hence hashing on client/web by the hundreds or thousands can be improved by just hashing by the "unit" (10 levels, each new site goes to a new dir), tens for first level:

    Example:

    /var/www/clients/1/00/web1
    /var/www/clients/2/00/web2
    /var/www/clients/3/00/web3
    /var/www/clients/1/10/web11
    /var/www/clients/2/10/web12
    /var/www/clients/3/10/web13

    OR

    /var/www/clients/1/0/web1
    /var/www/clients/2/0/web2
    /var/www/clients/3/0/web3
    /var/www/clients/1/1/web11
    /var/www/clients/2/1/web12
    /var/www/clients/3/1/web13


    If there's a requirement to have all sites under the same "client" directory, then a hash on the client number instead of the web site number.

    Ideally, it would be possible to hash on a n-modulus on each digit, effectively deciding the width of the tree (%20 / %30 / %100)....

    ispcomm.
     
    Last edited: Sep 1, 2010
  5. ispcomm

    ispcomm Member

    Let's assume we hash on 2 levels as in this example (easy for customer to calculate it's root directory as each directory corresponds to 1 digit in the web id)
    Each level of directory reduces the amount of subdirectories at the last level by a factor of 10. So, with 1 level, the last directory will contain max_web_id/10 sites.

    2 levels = reduction by 100
    3 levels = reduction by 1000

    With 10.000 sites and 2 levels, the last level will contain 100 subdirectories
    With 100.000 sites and 3 levels, the last level contains 100 subdirectories.

    So... 100.000 sites would be a good number for any single "cluster" of redundant servers and 3 levels seems appropriate even for a small cluster with 1000 sites (1 sites per last directory, but still acceptable).

    In theory, we should calculate the amount of space wasted by directory trees and the number of accesses necessary for an access to directory with "n" subdirectories... but let's leave this to academic circles (and perhaps google engineers).

    The only (minor) problem I see is that contrary to what happens with the hashing of the site/domain name, the website_id numbers are consecutive. This means that there will always be a slight imbalance of some of the directories (the directory containing the 10 will be filled sequentially and so goes for the one containing the 100ths etc). This i think is a minor problem, provided that there will be never more than 10 subdirectories in the "traverse" directory. A method to avoid this would be to hash using a modulus using prime number divisors but this would also mean a difficult way to calculate the final directory path for the customer (and more tickets in the support ticket queue). I don't want to sound too theoretical here also.

    If you agree with this proposal, I'll go forward and implement it in place of the current hash on the directory name. The change will be confined to the plugin and there will be no other issues with ftp/cron etc.

    I'll wait for your approval before moving forward.

    ispcomm.
     
  6. till

    till Super Moderator Staff Member ISPConfig Developer

    Your proposal with paths like "/var/www/clients/1/0/web1 " is fine for me, just that we should do the split based on the clientID and not the webID. I dont expect that a client will have more then a few websites, at least not more then hundred, so hashing on the clientID should work as well for the purpose of splitting the load. In ISPConfig the setup assumes that all sites of a client are in the same directory and I think we should not change this if it is not absolutely nescessary.
     
  7. ispcomm

    ispcomm Member

    Patch attached (finally).

    Till,

    I apologize as this simple change took me 10 days. I've had little time lately.

    Nevertheless, I followed your suggestion and implemented the patch with hashing on the client_id and the website_id. Both are optional and both are fixed when the domain name changes.

    There're 8 new tags, namely

    Code:
    client_idhash_1
    client_idhash_2
    client_idhash_3
    client_idhash_4
    
    website_idhash_1
    website_idhash_2
    website_idhash_3
    website_idhash_4
    They has the client ID or the website ID in a reverse-notated, 0-padded, modulus-10 hash, i.e.: 123 hash 3 becomes 3/2/1 while 123 hash 4 becomes 3/2/1/0.

    If you don't mind this patch, please include it in trunk.

    Thank you,
    ispcomm.

    Code:
    Index: interface/lib/plugins/sites_web_domain_plugin.inc.php
    ===================================================================
    --- interface/lib/plugins/sites_web_domain_plugin.inc.php	(revision 1972)
    +++ interface/lib/plugins/sites_web_domain_plugin.inc.php	(working copy)
    @@ -10,6 +10,24 @@
     	var $plugin_name        = 'sites_web_domain_plugin';
     	var $class_name         = 'sites_web_domain_plugin';
     
    +	// TODO: This function is a duplicate from the one in interface/web/sites/web_domain_edit.php
    +	//       There should be a single "token replacement" function to be called from modules and
    +	//	 from the main code.
    +	// Returna a "3/2/1" path hash from a numeric id '123'
    +	function id_hash($id,$levels) {
    +		$hash = "" . $id % 10 ;
    +		$id /= 10 ;
    +		$levels -- ;
    +		while ( $levels > 0 ) {
    +			$hash .= "/" . $id % 10 ;
    +			$id /= 10 ;
    +			$levels-- ;
    +		}
    +		return $hash;
    +	}
    +	
    +
    +
         /*
                 This function is called when the plugin is loaded
         */
    @@ -38,7 +56,17 @@
             // Get configuration for the web system
             $app->uses("getconf");        
             $web_config = $app->getconf->get_server_config(intval($page_form->dataRecord['server_id']),'web');            
    +	// TODO: This code is a duplicate from interface/web/sites/web_site_edit.php (there should be only 1).
             $document_root = str_replace("[website_id]",$page_form->id,$web_config["website_path"]);
    +print_r($web_config);
    +print_r($page_form);
    +
    +        $document_root = str_replace("[website_idhash_1]",$this->id_hash($page_form->id,1),$document_root);
    +		$document_root = str_replace("[website_idhash_2]",$this->id_hash($page_form->id,1),$document_root);
    +		$document_root = str_replace("[website_idhash_3]",$this->id_hash($page_form->id,1),$document_root);
    +		$document_root = str_replace("[website_idhash_4]",$this->id_hash($page_form->id,1),$document_root);
    +        
    +        
             // get the ID of the client
             if($_SESSION["s"]["user"]["typ"] != 'admin' && !$app->auth->has_clients($_SESSION['s']['user']['userid'])) {                    
                 $client_group_id = $_SESSION["s"]["user"]["default_group"];
    @@ -53,11 +81,20 @@
             // Set the values for document_root, system_user and system_group
             $system_user 				= $app->db->quote('web'.$page_form->id);
             $system_group 				= $app->db->quote('client'.$client_id);
    +	// TODO: Isn't this a duplication of the code above???
             $document_root 				= $app->db->quote(str_replace("[client_id]",$client_id,$document_root));
    -        $php_open_basedir 			= str_replace("[website_path]",$document_root,$web_config["php_open_basedir"]);
    +		$document_root = $app->db->quote(str_replace("[client_idhash_1]",$this->id_hash($client_id,1),$document_root));
    +		$document_root = $app->db->quote(str_replace("[client_idhash_2]",$this->id_hash($client_id,2),$document_root));
    +		$document_root = $app->db->quote(str_replace("[client_idhash_3]",$this->id_hash($client_id,3),$document_root));
    +		$document_root = $app->db->quote(str_replace("[client_idhash_4]",$this->id_hash($client_id,4),$document_root));
    +        $document_root = str_replace("[website_idhash_1]",$this->id_hash($page_form->id,1),$document_root);
    +		$document_root = str_replace("[website_idhash_2]",$this->id_hash($page_form->id,1),$document_root);
    +		$document_root = str_replace("[website_idhash_3]",$this->id_hash($page_form->id,1),$document_root);
    +		$document_root = str_replace("[website_idhash_4]",$this->id_hash($page_form->id,1),$document_root);
    +        $php_open_basedir = str_replace("[website_path]",$document_root,$web_config["php_open_basedir"]);
             $php_open_basedir 			= $app->db->quote(str_replace("[website_domain]",$page_form->dataRecord['domain'],$php_open_basedir));
             $htaccess_allow_override 	= $app->db->quote($web_config["htaccess_allow_override"]);
             $sql = "UPDATE web_domain SET system_user = '$system_user', system_group = '$system_group', document_root = '$document_root', allow_override = '$htaccess_allow_override', php_open_basedir = '$php_open_basedir'  WHERE domain_id = ".$page_form->id;
     		$app->db->query($sql);
    -	}
    -}              	
    \ No newline at end of file
    +	}	
    +}              	
    Index: interface/web/sites/web_domain_edit.php
    ===================================================================
    --- interface/web/sites/web_domain_edit.php	(revision 1972)
    +++ interface/web/sites/web_domain_edit.php	(working copy)
    @@ -251,6 +251,19 @@
     		parent::onShowEnd();
     	}
     
    +	// Returna a "3/2/1" path hash from a numeric id '123'
    +	function id_hash($id,$levels) {
    +		$hash = "" . $id % 10 ;
    +		$id /= 10 ;
    +		$levels -- ;
    +		while ( $levels > 0 ) {
    +			$hash .= "/" . $id % 10 ;
    +			$id /= 10 ;
    +			$levels-- ;
    +		}
    +		return $hash;
    +	}
    +	
     	function onSubmit() {
     		global $app, $conf;
     
    @@ -345,6 +358,10 @@
     		$web_rec = $app->tform->getDataRecord($this->id);
     		$web_config = $app->getconf->get_server_config(intval($web_rec["server_id"]),'web');
     		$document_root = str_replace("[website_id]",$this->id,$web_config["website_path"]);
    +		$document_root = str_replace("[website_idhash_1]",$this->id_hash($this->id,1),$document_root);
    +		$document_root = str_replace("[website_idhash_2]",$this->id_hash($this->id,1),$document_root);
    +		$document_root = str_replace("[website_idhash_3]",$this->id_hash($this->id,1),$document_root);
    +		$document_root = str_replace("[website_idhash_4]",$this->id_hash($this->id,1),$document_root);
     
     		// get the ID of the client
     		if($_SESSION["s"]["user"]["typ"] != 'admin' && !$app->auth->has_clients($_SESSION['s']['user']['userid'])) {
    @@ -361,6 +378,10 @@
     		$system_user = $app->db->quote('web'.$this->id);
     		$system_group = $app->db->quote('client'.$client_id);
     		$document_root = $app->db->quote(str_replace("[client_id]",$client_id,$document_root));
    +		$document_root = $app->db->quote(str_replace("[client_idhash_1]",$this->id_hash($client_id,1),$document_root));
    +		$document_root = $app->db->quote(str_replace("[client_idhash_2]",$this->id_hash($client_id,2),$document_root));
    +		$document_root = $app->db->quote(str_replace("[client_idhash_3]",$this->id_hash($client_id,3),$document_root));
    +		$document_root = $app->db->quote(str_replace("[client_idhash_4]",$this->id_hash($client_id,4),$document_root));
     		$php_open_basedir = str_replace("[website_path]",$document_root,$web_config["php_open_basedir"]);
     		$php_open_basedir = $app->db->quote(str_replace("[website_domain]",$web_rec['domain'],$php_open_basedir));
     		$htaccess_allow_override = $app->db->quote($web_config["htaccess_allow_override"]);
    @@ -426,7 +447,11 @@
     		$web_rec = $app->tform->getDataRecord($this->id);
     		$web_config = $app->getconf->get_server_config(intval($web_rec["server_id"]),'web');
     		$document_root = str_replace("[website_id]",$this->id,$web_config["website_path"]);
    -
    +		$document_root = str_replace("[website_idhash_1]",$this->id_hash($this->id,1),$document_root);
    +		$document_root = str_replace("[website_idhash_2]",$this->id_hash($this->id,1),$document_root);
    +		$document_root = str_replace("[website_idhash_3]",$this->id_hash($this->id,1),$document_root);
    +		$document_root = str_replace("[website_idhash_4]",$this->id_hash($this->id,1),$document_root);
    +		
     		// get the ID of the client
     		if($_SESSION["s"]["user"]["typ"] != 'admin' && !$app->auth->has_clients($_SESSION['s']['user']['userid'])) {
     			$client_group_id = $_SESSION["s"]["user"]["default_group"];
    @@ -516,4 +541,4 @@
     $page = new page_action;
     $page->onLoad(); 
    
     
  8. ispcomm

    ispcomm Member

    @till

    I wonder if you had a chance to review the above patch. I understand you've been away from the forums a few days and you'll have some catchup to do...
    ispcomm.
     
  9. till

    till Super Moderator Staff Member ISPConfig Developer

    The patch looks fine. I will add it to svn.
     
  10. ispcomm

    ispcomm Member

    thank you.

    I'm experimenting with the patch. If I find any problems I'll send you a fix.

    ispcomm
     

Share This Page