cluster migration: almost done!

Discussion in 'General' started by alexolivan, Nov 19, 2014.

  1. alexolivan

    alexolivan Member

    Hi forum!

    Background:
    I have been playing with an ISPConfig 3 (ISPC3) cluster in order to test the feasibility of a scenario: a 4-node virtual Debian cluster, replicated with MariaDB + Galera and GlusterFS (nginx, Postfix, Dovecot, Pure-FTPd).
    The cluster ran OK. It is a blank cluster, with just a test website and a test email account, but it was very impressive to see it running stably for several weeks...
    So we decided to deploy it, add some real sites and test it seriously: we took snapshots of the VMs and brought them back up in the production environment...
    Our idea is that it should be possible to deploy clones of the whole blank cluster into production environments with just a little reconfiguration...
    Now, on to the matter!

    The matter:
    The only thing we had to change was the IP subnet (private addresses too), so we did, reconfigured MariaDB/Galera and Gluster... and tada! It all seemed to work... but we suspect it doesn't really.
    We have noticed that ALL System State / Monitoring / Logs froze at the same time on all cluster nodes (we still only get status/logs for the "master" server)... so we fear that communication between the nodes has been lost.
    We have of course updated the IPs in the GUI (we have a single GUI panel installed on the master) and checked that the nodes have their IPs updated.

    We have also created GRANT entries in the DB with the new IPs, since we had seen old user/grant entries with the original IPs... but we see no difference.

    Maybe someone has a clue about what could be going on here...

    Thank you in advance anyway, guys.
    Best regards!


    EDIT:
    Confirmed: it does NOT work. Creating a new site on the GUI-enabled server is not applied to the rest of the nodes... a DB mess for sure...
    It seems it is not so trivial for ISPConfig to change the IP subnet in a cluster environment...
     
    Last edited: Nov 19, 2014
  2. till

    till Super Moderator Staff Member ISPConfig Developer

    Check the DB logins for the ispcsrv* users from the slaves (see the config.inc.php files) and adjust the IP addresses in the MySQL permission tables to make them work again. As soon as the slaves can connect to the master again, they will start to pick up the changes and update the monitor. Also check the /etc/hosts files on all nodes.
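    A minimal sketch of the "adjust the IP addresses" step, assuming you have already listed the affected accounts (for example with SELECT user, host FROM mysql.user WHERE user LIKE 'ispcsrv%';) and know the old-to-new IP mapping for each slave. It only prints MySQL RENAME USER statements, which move an account together with its privileges to the new host; the account names and addresses in the example are hypothetical, so review the output before running it on the master.

```python
# Sketch: print RENAME USER statements for ispcsrv* accounts after a subnet change.
# The accounts and IP mapping below are hypothetical example data.


def rename_statements(accounts, ip_map):
    """Build one RENAME USER statement per account whose host changed.

    accounts: list of (user, host) pairs as found in mysql.user
    ip_map:   dict mapping old IP -> new IP
    """
    statements = []
    for user, old_host in accounts:
        new_host = ip_map.get(old_host)
        if new_host is None or new_host == old_host:
            continue  # this account's host is not part of the migration
        # RENAME USER keeps the account's privileges under the new host.
        statements.append(
            f"RENAME USER '{user}'@'{old_host}' TO '{user}'@'{new_host}';"
        )
    return statements


if __name__ == "__main__":
    accounts = [("ispcsrv2", "10.0.0.21"), ("ispcsrv3", "10.0.0.22"),
                ("ispconfig0", "localhost")]
    ip_map = {"10.0.0.21": "192.168.30.21", "10.0.0.22": "192.168.30.22"}
    for statement in rename_statements(accounts, ip_map):
        print(statement)
```

    Accounts bound to localhost or to hosts outside the mapping are left alone, so the same mapping can be applied to the full account list.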
     
  3. alexolivan

    alexolivan Member

    .... uhhhh, it's getting worse...

    Since I don't have internal knowledge of ISPConfig 3, and the cluster setup was the result of careful forum study (and it worked), we simply wiped out the databases, cleared the /var/www contents (we are in fact at a testing/research stage) and reinstalled the whole cluster.

    Again it worked... and we got our 4 mirror servers up in the stats after a fresh install.
    We set every server to mirror the "master" in the GUI server config.
    We reinstalled phpMyAdmin from scratch.
    We reinstalled Roundcube from scratch.
    We created a test user/reseller
    and an early test site.

    ...we then discovered that although the data in /var/www and the database is replicated, no configuration is generated in /etc/nginx/sites-available, etc...
    ...again we looked at the log files, and there it was: 20 minutes of entries were readable on the "slaves", then nothing more...
    So we are back at the starting point.

    It is evident that the cluster was up during the install and for some time afterwards, but something breaks it somewhere along the way.
    Overall, the environment is the same, and the database and storage are replicating like a charm...

    Googling for config.inc.php (we did not know what or where it is), we found /usr/local/ispconfig/server/lib/config.inc.php... so we took a look at it:
    It does not seem to contain information about any server/database other than localhost... so either it is not the right file or it has somehow been replaced.

    ispcsrvX user entries are present in our database and are replicated on every node, but I don't see any logic in how ispcsrvX users map to hosts... such internals are too deep for me.

    EDIT1:
    Another thing we changed (as described in a Perfect Server guide): we ticked the checkbox that maps the web user to a system ID in each server config... just to mention everything.

    EDIT2:
    phpMyAdmin is configured on the master, and since it is a socket client, we simply copied /etc/phpmyadmin/* over to the slaves... it seemed to work, since the database and package contents are equally present on every machine... and the database is replicated from the master's package configuration.
    Roundcube is set up in a similar way, with every slave node pointing to the master server with a created remote user (again from the Perfect Server guides).
     
    Last edited: Nov 20, 2014
  4. till

    till Super Moderator Staff Member ISPConfig Developer

    The file contains 2 sets of database connection details on each slave node (if the slave has been installed correctly). The connection to the local database:

    //** Database
    $conf['db_type'] = 'mysql';
    $conf['db_host'] = 'localhost';
    $conf['db_database'] = 'dbispconfig';
    $conf['db_user'] = 'ispconfig';

    and the connection to the master server:

    //** Database settings for the master DB. This setting is only used in multiserver setups
    $conf['dbmaster_type'] = 'mysql';
    $conf['dbmaster_host'] = 'master.example.com';
    $conf['dbmaster_database'] = 'dbispconfig';
    $conf['dbmaster_user'] = 'ispcsrv1';
    $conf['dbmaster_password'] = '261e714fabf8043c7aa50091614f8f11';


    The ISPConfig setup is very easy and straightforward: the slave connects to the master using the MySQL master details, pulls all new changes that are relevant for it from the sys_datalog table and inserts them into its local ISPConfig database.
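    The pull mechanism described above can be sketched roughly as follows. This is not ISPConfig's code: the tables are simplified stand-ins loosely modelled on sys_datalog, and in-memory sqlite3 databases play the roles of the master and local MySQL databases. What it illustrates is that each slave keeps its own local copy and its own "last processed" position, which is why sharing one replicated local database between slaves confuses the process.

```python
# Simplified sketch of a slave pulling its changes from a master changelog.
# NOT ISPConfig's code: table/column names are simplified assumptions,
# and sqlite3 in-memory databases stand in for MySQL.
import sqlite3


def pull_changes(master, local, server_id, last_id):
    """Apply changelog rows newer than last_id that target this server.

    Returns the highest changelog ID processed; the caller must persist it
    so the same change is never applied twice."""
    rows = master.execute(
        "SELECT datalog_id, dbtable, data FROM sys_datalog "
        "WHERE datalog_id > ? AND server_id = ? ORDER BY datalog_id",
        (last_id, server_id),
    ).fetchall()
    for datalog_id, dbtable, data in rows:
        # Apply the change to this slave's own local copy.
        local.execute(
            "INSERT INTO applied (datalog_id, dbtable, data) VALUES (?, ?, ?)",
            (datalog_id, dbtable, data),
        )
        last_id = datalog_id
    local.commit()
    return last_id


def demo():
    master = sqlite3.connect(":memory:")
    master.execute(
        "CREATE TABLE sys_datalog (datalog_id INTEGER PRIMARY KEY, "
        "server_id INTEGER, dbtable TEXT, data TEXT)"
    )
    master.executemany(
        "INSERT INTO sys_datalog VALUES (?, ?, ?, ?)",
        [(1, 2, "web_domain", "site-a"),
         (2, 3, "web_domain", "site-b"),
         (3, 2, "mail_user", "user-a")],
    )
    local = sqlite3.connect(":memory:")
    local.execute(
        "CREATE TABLE applied (datalog_id INTEGER, dbtable TEXT, data TEXT)"
    )
    last = pull_changes(master, local, server_id=2, last_id=0)
    rows = local.execute("SELECT data FROM applied ORDER BY datalog_id")
    return last, [row[0] for row in rows.fetchall()]


if __name__ == "__main__":
    print(demo())  # (3, ['site-a', 'user-a'])
```

    Server 2 picks up rows 1 and 3 and ignores row 2, which is addressed to server 3.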

    If you are using MySQL replication like Galera, then ensure that the database name of the ISPConfig database on each slave node is different, and the username for localhost must be different too, as ISPConfig takes care of replicating its configuration itself. If you mirrored the ISPConfig databases and all slaves used the same database name, then Galera would conflict with ISPConfig, and ISPConfig would no longer know which data has been processed. That's described in the ISPConfig mirror tutorial.
     
  5. alexolivan

    alexolivan Member

    Hi, thank you till for your feedback!

    OK... by studying the cluster guides I deduced the database structure logic (although not its working mechanism), so I ran every install in expert mode, carefully creating a set of databases/users like this:

                  master          slave1          slave2          slave3
    dbname        dbispconfig0    dbispconfig1    dbispconfig2    dbispconfig3
    dbuser        ispconfig0      ispconfig1      ispconfig2      ispconfig3

    That table helped me answer the install script prompts. It is logical with regard to replication, since there is no room for conflict, and overall it works great after a fresh install.
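    A small sketch of the naming scheme from the table, assuming the dbispconfig<N> / ispconfig<N> pattern chosen in this thread (ISPConfig itself does not enforce it). It derives the names per node and flags any database or user name shared by two nodes, which is the situation that would clash under Galera.

```python
# Sketch: per-node ISPConfig database/user names and a clash check.
# The dbispconfig<N> / ispconfig<N> pattern is this thread's convention.


def node_names(nodes):
    """Map each node (in order) to its (database, user) pair."""
    return {node: (f"dbispconfig{i}", f"ispconfig{i}")
            for i, node in enumerate(nodes)}


def find_clashes(names):
    """Return database/user names that appear on more than one node."""
    owner = {}
    clashes = set()
    for node, pair in names.items():
        for name in pair:
            if name in owner and owner[name] != node:
                clashes.add(name)
            owner.setdefault(name, node)
    return clashes


if __name__ == "__main__":
    layout = node_names(["master", "slave1", "slave2", "slave3"])
    print(layout["slave1"])
    print(find_clashes(layout))
```

    find_clashes is mainly useful on a hand-written layout, for example one collected from each node's config.inc.php.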

    This is what config.inc.php contains on the 'master' server:

    //** Database
    $conf['db_type'] = 'mysql';
    $conf['db_host'] = 'localhost';
    $conf['db_database'] = 'dbispconfig0';
    $conf['db_user'] = 'ispconfig0';
    $conf['db_password'] = 'b94f8d3aa277247763ceca5d3ce672c0';
    $conf['db_charset'] = 'utf8'; // same charset as html-charset - (HTML --> MYSQL: "utf-8" --> "utf8", "iso-8859-1" --> "latin1")
    $conf['db_new_link'] = false;
    $conf['db_client_flags'] = 0;

    define('DB_TYPE',$conf['db_type']);
    define('DB_HOST',$conf['db_host']);
    define('DB_DATABASE',$conf['db_database']);
    define('DB_USER',$conf['db_user']);
    define('DB_PASSWORD',$conf['db_password']);
    define('DB_CHARSET',$conf['db_charset']);


    //** Database settings for the master DB. This setting is only used in multiserver setups
    $conf['dbmaster_type'] = 'mysql';
    $conf['dbmaster_host'] = '';
    $conf['dbmaster_database'] = 'dbispconfig';
    $conf['dbmaster_user'] = '';
    $conf['dbmaster_password'] = '67583cbb1f9e60334db4aa7c812fe64c';
    $conf['dbmaster_new_link'] = false;
    $conf['dbmaster_client_flags'] = 0;

    I do not see any reference other than localhost anywhere... I will check the same file on the slaves, compare, and look for the logic.
    I have also turned on the query log on one of the slaves to see what happens, but I do not see anything relevant to me...


    EDIT:

    This is the same portion of the file on one of the slaves...

    //** Database
    $conf['db_type'] = 'mysql';
    $conf['db_host'] = 'localhost';
    $conf['db_database'] = 'dbispconfig1';
    $conf['db_user'] = 'ispconfig1';
    $conf['db_password'] = 'd1bfbcd49b378a0b2400f3357b8e8c08';
    $conf['db_charset'] = 'utf8'; // same charset as html-charset - (HTML --> MYSQL: "utf-8" --> "utf8", "iso-8859-1" --> "latin1")
    $conf['db_new_link'] = false;
    $conf['db_client_flags'] = 0;

    define('DB_TYPE',$conf['db_type']);
    define('DB_HOST',$conf['db_host']);
    define('DB_DATABASE',$conf['db_database']);
    define('DB_USER',$conf['db_user']);
    define('DB_PASSWORD',$conf['db_password']);
    define('DB_CHARSET',$conf['db_charset']);


    //** Database settings for the master DB. This setting is only used in multiserver setups
    $conf['dbmaster_type'] = 'mysql';
    $conf['dbmaster_host'] = '192.168.1.11';
    $conf['dbmaster_database'] = 'dbispconfig0';
    $conf['dbmaster_user'] = 'ispcsrv5';
    $conf['dbmaster_password'] = 'f663c0cbc08d0ecc2f64b6bfd049eda0';
    $conf['dbmaster_new_link'] = false;
    $conf['dbmaster_client_flags'] = 0;

    I see something strange, or maybe I'm wrong, but... given that the master database is "dbispconfig0"... why does the master's config file state the master database as "dbispconfig"?
    Also, the username and host fields are empty, while on the slave they are set and point to the master.
    Is it possible that the file gets corrupted by changing the server config in the GUI? (This is the second time all this has happened, and before every fresh install I wipe out any trace of old install attempts, including making sure the whole /usr/local/ispconfig folder is erased... And the most curious thing is that it worked for 20 minutes.)
     
    Last edited: Nov 21, 2014
  6. till

    till Super Moderator Staff Member ISPConfig Developer

    The master database details are not used on the master, so the defaults are entered there. It does not matter what is in these settings on a master, as the software ignores them anyway.

    No. This file cannot be altered through the GUI. The config that you see as server config in the GUI is stored in the server table of the database.
     
  7. alexolivan

    alexolivan Member

    Aha... OK, so it is correct.

    So I'm wondering if the slaves are having trouble connecting to the master database... so I should enable query logging on the master.

    It is also interesting to compare grants and config (although I don't know the internals, it may shed some light on this...), so I compare the master database connection details in a slave's config.inc.php with the grants for that slave's database user:

    +---------------------------------------------------------------------------------------------------------------------------------+
    | Grants for ispcsrv13@192.168.30.21 |
    +---------------------------------------------------------------------------------------------------------------------------------+
    | GRANT USAGE ON *.* TO 'ispcsrv13'@'192.168.30.21' IDENTIFIED BY PASSWORD '*68AEA822ED1FB0F2C491C4611199C75807061BAA' |
    | GRANT SELECT, INSERT, DELETE ON `dbispconfig0`.`web_backup` TO 'ispcsrv13'@'192.168.30.21' |
    | GRANT SELECT, UPDATE, DELETE ON `dbispconfig0`.`aps_instances` TO 'ispcsrv13'@'192.168.30.21' |
    | GRANT SELECT, UPDATE (status) ON `dbispconfig0`.`software_update_inst` TO 'ispcsrv13'@'192.168.30.21' |
    | GRANT SELECT ON `dbispconfig0`.`sys_group` TO 'ispcsrv13'@'192.168.30.21' |
    | GRANT SELECT, INSERT, DELETE ON `dbispconfig0`.`monitor_data` TO 'ispcsrv13'@'192.168.30.21' |
    | GRANT SELECT, UPDATE (ssl_key, ssl_action, ssl_request, ssl_cert) ON `dbispconfig0`.`web_domain` TO 'ispcsrv13'@'192.168.30.21' |
    | GRANT SELECT, INSERT ON `dbispconfig0`.`sys_log` TO 'ispcsrv13'@'192.168.30.21' |
    | GRANT SELECT, INSERT, UPDATE ON `dbispconfig0`.`web_traffic` TO 'ispcsrv13'@'192.168.30.21' |
    | GRANT SELECT, DELETE ON `dbispconfig0`.`aps_instances_settings` TO 'ispcsrv13'@'192.168.30.21' |
    | GRANT SELECT, INSERT, UPDATE ON `dbispconfig0`.`mail_traffic` TO 'ispcsrv13'@'192.168.30.21' |
    | GRANT SELECT, UPDATE (updated) ON `dbispconfig0`.`server` TO 'ispcsrv13'@'192.168.30.21' |
    | GRANT SELECT, UPDATE (status, error) ON `dbispconfig0`.`sys_datalog` TO 'ispcsrv13'@'192.168.30.21' |
    | GRANT SELECT, UPDATE (response, action_state) ON `dbispconfig0`.`sys_remoteaction` TO 'ispcsrv13'@'192.168.30.21' |
    +---------------------------------------------------------------------------------------------------------------------------------+

    //** Database settings for the master DB. This setting is only used in multiserver setups
    $conf['dbmaster_type'] = 'mysql';
    $conf['dbmaster_host'] = '192.168.1.11';
    $conf['dbmaster_database'] = 'dbispconfig0';
    $conf['dbmaster_user'] = 'ispcsrv13';
    $conf['dbmaster_password'] = 'd7362bccf029a83ea4b930e108a68580';
    $conf['dbmaster_new_link'] = false;
    $conf['dbmaster_client_flags'] = 0;

    Is this a password mismatch?
     
  8. till

    till Super Moderator Staff Member ISPConfig Developer

    This "d7362bccf029a83ea4b930e108a68580" is the cleartext password (so you should change it on your server now that you have posted it), while (*68AEA822ED1FB0F2C491C4611199C75807061BAA) is the hashed password.

    Your config file snippet above must be wrong (you entered the wrong things in the ISPConfig installer):

    $conf['dbmaster_host'] = '192.168.1.11';

    The installer asked you to enter the hostname of the MySQL server, not the IP. Entering an IP there is not an option: if you enter the IP instead of the requested hostname, your setup will break. Take a look at the multiserver tutorials; you can see there that the hostname is always used, and that you have to configure these hostnames in /etc/hosts on all servers of the cluster first.
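    To check the "password mismatch?" question locally: the value in config.inc.php is the cleartext password, while SHOW GRANTS prints MySQL's native password hash, which for MySQL 4.1 and later is '*' followed by the uppercase hex SHA1 of the binary SHA1 of the password. A minimal sketch that recomputes that hash so the two can be compared, assuming the account uses the mysql_native_password scheme:

```python
# Sketch: recompute MySQL's native password hash ('*' + HEX(SHA1(SHA1(pw))))
# so a cleartext password from config.inc.php can be compared with SHOW GRANTS.
import hashlib


def mysql_native_hash(password):
    """Hash format used by mysql_native_password (MySQL 4.1 and later)."""
    inner = hashlib.sha1(password.encode("utf-8")).digest()  # binary, not hex
    return "*" + hashlib.sha1(inner).hexdigest().upper()


def matches_grant(cleartext, grant_hash):
    """True if the cleartext password produces the hash shown in the grant."""
    return mysql_native_hash(cleartext) == grant_hash.upper()
```

    Usage: call matches_grant() with the dbmaster_password value from the slave and the '*...' string from the master's SHOW GRANTS output; False means the slave is using a password the master does not know.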
     
  9. alexolivan

    alexolivan Member

    I'm very sorry, I changed this in the post just because I didn't want to post domain names... (it is all research, so I do not care about the passwords) but I do care about the actual real domain name.

    I actually do set the hostname during install.
    The IP entry you saw was edited by me... again, very sorry.

    //** Database settings for the master DB. This setting is only used in multiserver setups
    $conf['dbmaster_type'] = 'mysql';
    $conf['dbmaster_host'] = 'master.mycrazytestingdomain.es';
    $conf['dbmaster_database'] = 'dbispconfig0';
    $conf['dbmaster_user'] = 'ispcsrv13';
    $conf['dbmaster_password'] = 'd7362bccf029a83ea4b930e108a68580';
    $conf['dbmaster_new_link'] = false;
    $conf['dbmaster_client_flags'] = 0;

    Here I have changed the domain part, just for forum privacy... but the setup was done with hostnames, and I have taken great care with the /etc/hosts contents on every server, so it is correctly set up.
    Again... it all worked for 20 minutes before crashing...

    I'm thinking of reinstalling it all again and checking whether the status/logs are updating and communication is up.
    Put just one server into mirror mode and compare the behaviour...
    But I'm reluctant to solve problems by wiping everything and reinstalling... in real production this would not be an easy option :eek:
     
  10. till

    till Super Moderator Staff Member ISPConfig Developer

    You can test the connection like this: log in to the slave, then run:

    mysql -h master.mycrazytestingdomain.es -u ispcsrv13 -p'd7362bccf029a83ea4b930e108a68580' dbispconfig0

    If the login works, then the MySQL connection is fine.
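    The same test can be driven from the slave's own config file instead of retyping the values. A minimal sketch, assuming the dbmaster_* lines use the single-quoted $conf['key'] = 'value'; form shown in this thread; it only builds the mysql argument list and does not run it. As with the command above, passing the password with -p on the command line makes it visible in the process list.

```python
# Sketch: build the connection test command from a slave's config.inc.php text.
# Assumes the single-quoted $conf['dbmaster_...'] = '...'; lines shown in this thread.
import re
import shlex

CONF_RE = re.compile(r"\$conf\['(dbmaster_\w+)'\]\s*=\s*'([^']*)'\s*;")


def read_master_settings(php_source):
    """Return {key: value} for every quoted dbmaster_* setting."""
    return dict(CONF_RE.findall(php_source))


def mysql_test_command(settings):
    """Argument list equivalent to: mysql -h HOST -u USER -pPASS DATABASE"""
    return [
        "mysql",
        "-h", settings["dbmaster_host"],
        "-u", settings["dbmaster_user"],
        "-p" + settings["dbmaster_password"],
        settings["dbmaster_database"],
    ]


if __name__ == "__main__":
    sample = """
$conf['dbmaster_host'] = 'master.example.test';
$conf['dbmaster_database'] = 'dbispconfig0';
$conf['dbmaster_user'] = 'ispcsrv13';
$conf['dbmaster_password'] = 'secret';
$conf['dbmaster_new_link'] = false;
"""
    print(shlex.join(mysql_test_command(read_master_settings(sample))))
```

    On a real slave you would read /usr/local/ispconfig/server/lib/config.inc.php instead of the sample string; unquoted settings such as dbmaster_new_link are simply skipped.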
     
  11. alexolivan

    alexolivan Member

    I think you got it!

    Name resolution mismatch! :eek:
    It had been modified...

    Will post results... but I feel we got it...

    We had a problem in the /etc/hosts file (it was entirely my fault: I personally did a "clean-up" of the /etc/hosts file, because it was a mess of comments from earlier trials, so I rearranged it, and I made a typo!!! ... and as I distributed it, the error spread to all the slaves...)
    I can now connect from the slaves to the master manually, as you proposed.

    So I'm waiting to see whether the cluster comes back to life...
    Will post feedback on it.
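    Since a single typo in a distributed /etc/hosts file was enough to cut off every slave, a quick consistency check is worth having. A minimal sketch, assuming you know which IP each cluster hostname should map to (the names and addresses below are hypothetical); it parses hosts-file text and reports every hostname that is missing or points elsewhere, keeping the first entry it finds for each name.

```python
# Sketch: verify that cluster hostnames map to the expected IPs in hosts-file text.
# Hostnames and addresses in the example are hypothetical.


def parse_hosts(text):
    """Map each hostname/alias to the IP of the first line that lists it."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)
    return mapping


def check_hosts(text, expected):
    """Return a list of problems; empty means every hostname matches."""
    mapping = parse_hosts(text)
    return [
        f"{name}: expected {ip}, found {mapping.get(name)}"
        for name, ip in expected.items()
        if mapping.get(name) != ip
    ]


if __name__ == "__main__":
    expected = {"master.example.test": "192.168.30.11"}
    typo = "127.0.0.1 localhost\n192.168.30.11 mastr.example.test master\n"
    print(check_hosts(typo, expected))
```

    Running the same check with the same expected map on every node catches a typo before it is copied to the whole cluster.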
     
    Last edited: Nov 21, 2014
  12. alexolivan

    alexolivan Member

    mmmmm :confused:

    I think there's something going on here... my /etc/hosts files are being overwritten; I have to investigate why...

    Anyhow, one of the slaves is back in the cluster... I got its logs and states, and it has updated its nginx vhosts setup with the single demo site.

    The other two slaves can connect with the mysql command using the users and passwords from their config files, so they can reach the database.
    But they have not updated their nginx setup and are not reporting logs back to the server... I have to investigate why and monitor the state of the /etc/hosts files.

    EDIT:
    Maybe the problem with /etc/hosts is caused by working in a screen session inside SSH. I use screen to avoid SSH dropouts in the middle of the ISPConfig install script, but editing the hosts file directly at the plain SSH prompt seems to make the change permanent.

    I can confirm that the two remaining rogue slave servers can connect with their credentials, and have been able to for a while now that /etc/hosts is no longer changing... but they refuse to join the cluster...
     
    Last edited: Nov 21, 2014
  13. till

    till Super Moderator Staff Member ISPConfig Developer

    As a workaround, use this:

    chattr +i /etc/hosts

    after you have changed it to the correct values. You can remove the protection with chattr -i /etc/hosts at any time.

    The most likely reason for changes in that file is that the server is a VM and the virtualisation software changed it, or a network configuration tool or DHCP is changing it.
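    To find out what is rewriting the file, it can help to keep a known-good copy and compare against it periodically (for example from cron), in addition to making it immutable. A minimal sketch using Python's difflib; where you store the known-good copy is up to you and the path argument is hypothetical:

```python
# Sketch: show how /etc/hosts differs from a known-good copy saved earlier.
import difflib
from pathlib import Path


def hosts_diff(known_good, current):
    """Return unified-diff lines between two hosts-file texts ([] if equal)."""
    return list(difflib.unified_diff(
        known_good.splitlines(), current.splitlines(),
        fromfile="known-good", tofile="current", lineterm=""))


def check_file(known_good_path, hosts_path="/etc/hosts"):
    """Print the differences and return True if the file has changed."""
    diff = hosts_diff(Path(known_good_path).read_text(),
                      Path(hosts_path).read_text())
    for line in diff:
        print(line)
    return bool(diff)
```

    Logging the time of each detected change makes it easier to correlate it with a reboot, a DHCP renewal or an action on the hypervisor.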

    If the slaves are not writing data: http://www.faqforge.com/linux/debugging-ispconfig-3-server-actions-in-case-of-a-failure/
     
  14. alexolivan

    alexolivan Member

    I'm not sure whether it is our hypervisor that changes /etc/hosts... I have never seen this except when using VZ containers and changing the hostname on the hypervisor.
    Anyhow, the immutable file trick is cool! :D

    Now for the debugging...
    I followed the instructions.

    Here, on the first rebel server:
    21.11.2014-11:52 - DEBUG - There is already an instance of server.php running. Exiting.

    Frozen?

    EDIT:
    21.11.2014-11:57 - DEBUG - There is already an instance of server.php running. Exiting.
    On the second rebel server.


    EDIT:

    ps aux | grep server.php
    root 16530 0.0 0.3 304392 24832 ? S 10:57 0:00 /usr/bin/php -q /usr/local/ispconfig/server/server.php

    ps aux | grep server.php
    root 22174 0.0 0.3 302548 25100 ? S 10:55 0:00 /usr/bin/php -q /usr/local/ispconfig/server/server.php

    Those processes have been running for almost an hour... are they daemons? Launched from cron?... I don't think so... I will compare with the working slave...
     
    Last edited: Nov 21, 2014
  15. alexolivan

    alexolivan Member

    :confused: I tried to close those processes gracefully, since the working slave does not have any permanently running server.php.

    So I sent SIGTERM to them, and ended up with a zombie crawling on each system:

    ps aux | grep server.php
    root 16530 0.0 0.3 304392 24832 ? D 10:57 0:00 /usr/bin/php -q /usr/local/ispconfig/server/server.php

    ps aux | grep server.php
    root 22174 0.0 0.3 302548 25100 ? D 10:55 0:00 /usr/bin/php -q /usr/local/ispconfig/server/server.php

    I feel this is what stops the system from resuming operation.
    I have to get rid of these zombies; rebooting may be the last resort...
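    For reference, the STAT column in the ps aux output changed from S (interruptible sleep) to D, which is uninterruptible sleep rather than a zombie (zombies show Z). A process in state D is blocked inside the kernel, typically on I/O, and usually does not act on signals, often not even SIGKILL, until that call returns, which would explain why SIGTERM seemed to do nothing. A small sketch that reads the STAT field from a ps aux line, assuming the default ps aux column layout:

```python
# Sketch: read the STAT column from a `ps aux` line and describe its first letter.
# Assumes the default `ps aux` layout:
# USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

STATE_NAMES = {
    "R": "running or runnable",
    "S": "interruptible sleep (waiting for an event)",
    "D": "uninterruptible sleep (usually blocked on I/O)",
    "Z": "zombie (exited, not yet reaped by its parent)",
    "T": "stopped",
}


def process_state(ps_aux_line):
    """Return (STAT, description) for one line of `ps aux` output."""
    # maxsplit=10 keeps the COMMAND column, which may contain spaces, intact.
    fields = ps_aux_line.split(None, 10)
    stat = fields[7]
    return stat, STATE_NAMES.get(stat[0], "other state")


if __name__ == "__main__":
    before = ("root 16530 0.0 0.3 304392 24832 ? S 10:57 0:00 "
              "/usr/bin/php -q /usr/local/ispconfig/server/server.php")
    after = before.replace(" ? S ", " ? D ")
    print(process_state(before))
    print(process_state(after))
```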
     
    Last edited: Nov 21, 2014
  16. alexolivan

    alexolivan Member

    OK... got rid of the defunct processes.

    Here I go...

    /usr/local/ispconfig/server/server.sh
    21.11.2014-12:28 - DEBUG - There is already a lockfile set. Waiting another 10 seconds...
    21.11.2014-12:29 - DEBUG - There is already a lockfile set. Waiting another 10 seconds...
    21.11.2014-12:29 - DEBUG - There is already a lockfile set. Waiting another 10 seconds...
    21.11.2014-12:29 - DEBUG - There is already a lockfile set. Waiting another 10 seconds...
    21.11.2014-12:29 - DEBUG - There is already a lockfile set. Waiting another 10 seconds...

    lockfile?
     
  17. alexolivan

    alexolivan Member

    OK... removing the lockfile helped.
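    Before deleting a lockfile it is worth confirming that nobody still holds it. A minimal sketch, assuming (hypothetically; the thread shows neither the lockfile's path nor its contents) that the lockfile stores the PID of the process holding it: the lock counts as stale only if that PID no longer exists. os.kill(pid, 0) sends no signal; it only checks that the process exists.

```python
# Sketch: decide whether a PID lockfile is stale before removing it.
# Hypothetical assumption: the lockfile contains only the PID of its holder.
import os


def parse_pid(text):
    """Return the PID stored in the lockfile text, or None if unusable."""
    try:
        pid = int(text.strip())
    except ValueError:
        return None
    return pid if pid > 0 else None


def pid_alive(pid):
    """True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0: no signal sent, only existence/permission checks
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists but belongs to another user
    return True


def lock_is_stale(lock_path):
    """True = safe to remove, False = held or absent, None = cannot tell."""
    if not os.path.exists(lock_path):
        return False
    with open(lock_path) as fh:
        pid = parse_pid(fh.read())
    if pid is None:
        return None  # no usable PID inside: inspect it by hand
    return not pid_alive(pid)
```

    Note that a process stuck in state D still exists, so its lock is not stale; that only changes once the process is really gone.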

    The script took a long time the first time... but I have 2 out of 3 servers online! :D

    The last one is giving me trouble with a GlusterFS / IMAP port conflict... this is something serious... but I will manage to work around it...
     
  18. alexolivan

    alexolivan Member

    SOLVED

    The GlusterFS / Dovecot / fstab entries conflict can be solved by editing the LSB tags to control the boot order...
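    The exact headers that were changed are not posted. The general idea is that each Debian init script declares, in its LSB header (### BEGIN INIT INFO ... ### END INIT INFO), what it needs via Required-Start, and insserv derives the boot order from those declarations. A minimal sketch of that idea with hypothetical service names: read Required-Start from a header, then order services so that every dependency starts first.

```python
# Sketch: LSB Required-Start parsing plus dependency-based ordering.
# Service names are hypothetical; on Debian, insserv does the real ordering.
from graphlib import TopologicalSorter  # Python 3.9+


def required_start(header_text):
    """Return the Required-Start entries of an LSB init header."""
    in_block = False
    for line in header_text.splitlines():
        if line.startswith("### BEGIN INIT INFO"):
            in_block = True
        elif line.startswith("### END INIT INFO"):
            break
        elif in_block and line.startswith("# Required-Start:"):
            return line.split(":", 1)[1].split()
    return []


def boot_order(deps):
    """deps maps each service to the services that must start before it."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    deps = {
        "glusterfs-server": [],
        "gluster-mount": ["glusterfs-server"],
        "dovecot": ["gluster-mount"],
    }
    print(boot_order(deps))
```

    A dependency cycle makes static_order() raise graphlib.CycleError, which mirrors the kind of header conflict insserv refuses to install.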

    All systems are synced and rebooting cleanly.

    This thread should be marked as solved!

    Thank you very much Till!!! :)
     
