Hi all,

I've got 10 servers that all take backups in exactly the same way: they mount the backup storage over SSH. On only two of them, I get the following warning every day after the backup runs:

Backup directory /var/backup could not be unmounted
The command /usr/local/ispconfig/server/scripts/backup_dir_umount.sh failed.

Under server config I have selected the option that the backup directory is a remote mount. My fstab entry works fine, as I can mount and unmount manually without a problem. This is the fstab entry, just for reference:

[email protected]:/mnt/raid5/email1 /var/backup fuse.sshfs defaults 0 0

This is what I have in my backup_dir_umount.sh:

Code:
[email protected]:~# cat /usr/local/ispconfig/server/scripts/backup_dir_umount.sh
#!/bin/bash
umount /var/backup

Running backup_dir_mount.sh manually mounts the remote storage fine, and backup_dir_umount.sh unmounts it. The error just lands in my email every day, even though the backups are stored fine. How on earth can I find out why it throws that error?

ISPConfig is 3.2.

Thanks

PS: By the way, when will 3.2.1 be available for download?
When it is ready. You can follow the developer info to find out more exactly. My guess is before the end of the month.

Back to your problem. What kind of setup is this? What is in the script file backup_dir_mount.sh? umount usually fails because some process is still using that mount as its working directory. Try man lsof and something like

Code:
lsof /var/backup

before the umount to see which processes have which files open.
The backup_dir_mount.sh just mounts the directory; all it has is:

Code:
#!/bin/bash
if ! mount | grep -q backup; then
    mount /var/backup || echo "Couldn't mount" && exit
fi

How do I know when the backup is about to finish, so that I can run lsof /var/backup? It runs at 3am and, as mentioned in my post, when I run the scripts manually (including the backup) it doesn't produce any error and mounts/unmounts fine.
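
A side note on the mount script above, for anyone copying it: in bash, `||` and `&&` have equal precedence and group left to right, so `mount /var/backup || echo "Couldn't mount" && exit` reaches `exit` whether or not the mount succeeded, and the script ends with status 0 either way. Also, `mount | grep -q backup` matches any mount line containing "backup", not only /var/backup. Below is a minimal sketch of a stricter variant. The function name `ensure_mounted` is made up for illustration, and it assumes the util-linux `mountpoint` tool is installed.

```shell
#!/bin/bash
# Sketch only: mount DIR unless it is already a mount point.
# ensure_mounted is a hypothetical helper name, not part of ISPConfig.

ensure_mounted() {
    local dir=$1
    # mountpoint -q succeeds only if DIR itself is a mount point.
    if mountpoint -q "$dir"; then
        return 0
    fi
    # Relies on the fstab entry for DIR, like the original script.
    if ! mount "$dir"; then
        echo "Couldn't mount $dir" >&2
        return 1
    fi
    return 0
}
```

The script would then end with something like `ensure_mounted /var/backup || exit 1`, so that a failed mount is visible to whatever called the script.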
Put the command in the unmount script. You could try adding a sleep command to wait a little before unmounting. Once you can identify which processes are using the mount, loop and wait for them to finish before unmounting. Also, if you would, post here what you find. I'd be curious what is using the mount, whether it's something to do with how the backup system now works or just something local.
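
The "loop and wait" idea could look roughly like this. It is only a sketch: the function names, the timeout and the polling interval are invented for illustration, and it assumes `lsof` prints nothing when no process has files open under the mount point.

```shell
#!/bin/bash
# Sketch only: poll until nothing has files open under DIR, with a timeout.
# mount_in_use and wait_until_idle are hypothetical helper names.

# Succeeds while lsof lists at least one open file under DIR.
mount_in_use() {
    [ -n "$(lsof "$1" 2>/dev/null)" ]
}

# wait_until_idle DIR TIMEOUT_SECONDS [INTERVAL_SECONDS]
# Returns 0 once DIR is idle, 1 if it is still busy when the timeout is reached.
wait_until_idle() {
    local dir=$1 timeout=$2 interval=${3:-10} waited=0
    while mount_in_use "$dir"; do
        if (( waited >= timeout )); then
            return 1
        fi
        sleep "$interval"
        (( waited += interval ))
    done
    return 0
}
```

The last line of backup_dir_umount.sh would then be something like `wait_until_idle /var/backup 600 && umount /var/backup`. Unlike a fixed sleep, this returns as soon as the mount is idle and gives up after the timeout instead of hanging forever.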
@Jesse Norell I had sleep 5m in there and it was still the same. I will raise the value and also add the lsof to see what happens.
@Jesse Norell I've added a sleep of 45m and it is still the same, but the storage is not mounted afterwards; I just get that warning email. The functionality works as expected.
Just clarifying: with a 45-minute sleep, the storage is unmounted, but you also get an email with the warning "Backup directory /var/backup could not be unmounted The command /usr/local/ispconfig/server/scripts/backup_dir_umount.sh failed."? Were you able to tell which processes were using it (via lsof, or even just by browsing through ps output)?
Yes @Jesse Norell, that is exactly what is happening. I changed the backup time yesterday to run earlier so I could monitor the process, but it didn't run. I guess I have to run the ISPConfig cron or something else before it picks up the new value? Once I switched to a later time it worked.
I made some progress... On one of the servers where I was getting the email, I didn't receive one today. I logged in and saw that /var/backup was still mounted.

Code:
[email protected]:~# lsof /var/backup
COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
php     31042 root    5r   DIR   0,46     4096   84 /var/backup/mail6
[email protected]:~# ps ax | grep 31042
14208 pts/0    S+     0:00 grep 31042
31042 ?        S      0:01 /bin/php -q -d disable_classes= -d disable_functions= -d open_basedir= /usr/local/ispconfig/server/cron.php
@Jesse Norell shall I report this on GitLab? For some reason the ISPConfig cron is still using the mount while there is nothing to back up.
Would you have multiple backups set up to run at different times, where a later one fires off while an earlier one hasn't finished yet?
Yes, I do. I can change the times if you think that is causing the problem.

Today one more problem appeared. I got the warning email from the email3 server; when I logged in, the remote path wasn't mounted and the backup had finished OK. On the email1 server, on the other hand, I got no email, but the remote storage is still held by the same process (cron), which looks like it is still running. It is not taking any backup, as that started 9 hours ago and took just an hour or so to finish. This is the output:

Code:
[email protected]:~# lsof /var/backup
COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
php     318 root    5r   DIR   0,46     4096   84 /var/backup/mail6
[email protected]:~# ps ax | grep 318
  318 ?        S      0:00 /bin/php -q -d disable_classes= -d disable_functions= -d open_basedir= /usr/local/ispconfig/server/cron.php
23133 pts/0    S+     0:00 grep 318
[email protected]:~#
It would probably be worth filing an RFE in the issue tracker for the server to track ongoing backups and not run the unmount script while any other backup has not completed. For now, either change your script to unmount only if nothing is using the mount point (e.g. based on lsof output), or just suppress the error message (add " 2>/dev/null"), and the last unmount to run should get the job done.

I.e., if I read that correctly, a backup finished and 8 hours later the php process hasn't exited? See what shows up in both ispconfig.log and cron.log if you turn up the debug level in both the server config and $conf['log_priority'] in /usr/local/ispconfig/server/lib/config.inc.php. Also see what other processes are related, e.g. "ps --ppid 318", or look up the process group ID of the running php daemon ("ps -p 318 -o pgrp") and see what else is in that group ("ps -wg ####").
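
The "only unmount if nothing is using the mount point" variant could be sketched as below. The function names are invented for illustration. It assumes that another backup run will call the unmount script again later, so a busy mount is reported and skipped rather than treated as a failure, and it also skips quietly when the directory is no longer mounted.

```shell
#!/bin/bash
# Sketch only: unmount DIR when idle; otherwise report who holds it and skip.
# mount_holders and unmount_if_idle are hypothetical helper names.

# Prints the PIDs that have files open under DIR (lsof -t gives PID-only output).
mount_holders() {
    lsof -t "$1" 2>/dev/null | sort -un
}

# unmount_if_idle DIR
# Returns 0 when DIR was unmounted, was not mounted, or was skipped as busy;
# otherwise returns umount's own exit status.
unmount_if_idle() {
    local dir=$1 pids
    # Nothing to do if an earlier run already unmounted it.
    mountpoint -q "$dir" || return 0
    pids=$(mount_holders "$dir")
    if [ -n "$pids" ]; then
        echo "$dir still in use, skipping umount:" >&2
        # Show process group, elapsed time and command line of each holder.
        ps -o pid,pgid,etime,args -p "$(echo "$pids" | paste -sd, -)" >&2
        return 0
    fi
    umount "$dir"
}
```

The script body would then just be `unmount_if_idle /var/backup`. The trade-off is that a process that never exits, like the php one shown earlier, keeps the storage mounted indefinitely without any warning email.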
Hey there, especially Stelios, did you find out anything else? I have the same problem and can't find a reason why it happens. Everything works as expected: every morning I have a look, the backups are done and the backup directory is unmounted... but the error email is there. Putting lsof /var/backup in /usr/local/ispconfig/server/scripts/backup_dir_umount.sh doesn't show anything (before the umount, yes). I'm not fully up to date with ISPConfig right now (3.2.5, will update soon), but I figure nothing has changed in this area, since the bug report hasn't been closed. Thank you for your hints.
I just checked right now, after the mail came again. The normal backup processes are still running; in particular I see gzip/mysqldump working through all websites one after another (webxxx). Did anybody follow up on this, or shall I ignore it by using the hint from Jesse Norell (add " 2>/dev/null")? If so, do I add it to the line "umount /var/backup"? Thanks again for your help.
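
On where the redirect goes: yes, it is appended to the umount line. One thing to keep in mind is that `2>/dev/null` only hides the error text; the exit status of `umount` is unchanged. The thread doesn't confirm whether ISPConfig's warning is triggered by the script's output or by its exit status, so the sketch below shows both variants. The function names are made up for the comparison.

```shell
#!/bin/bash
# Sketch only: two ways of silencing a failing umount.
# quiet_umount and lenient_umount are hypothetical names.

# Variant 1: hide umount's error message; the exit status is still umount's.
quiet_umount() {
    umount "$1" 2>/dev/null
}

# Variant 2: hide the message and always report success to the caller.
lenient_umount() {
    umount "$1" 2>/dev/null || true
}
```

Variant 1 corresponds to the line `umount /var/backup 2>/dev/null`. Variant 2 also masks real failures, so it trades the daily email for never hearing about a mount that is genuinely stuck.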
@jeensg sorry for the late reply. This is what I have in my backup_dir_umount.sh, and it seems to work OK.

Code:
#!/bin/bash
sleep 2m
umount /var/backup
Thanks for your answer! Unfortunately this doesn't work for me. I tried exactly the same content, but it won't work. I also tried sleep 60m, but then the same email just arrives one hour later than normal, although nothing is using the mount point anymore. I don't get it. Also, the output of "lsof /var/backup" doesn't appear in the mail, although the mount point is still shown as in use when I'm logged in.

The next thing I'll try is the "2>/dev/null", but I'm not at all satisfied with that, because I will not get any error messages then :-(
So, another try. I put the following in my umount script file (yes, deliberately MISspelling the last mount point):

Code:
#!/bin/bash
sleep 2m
lsof /var/backup
echo test
umount /var/backu

This resulted in 2 (!) error mails, the second one 8 minutes after the first (and the mount point is NOT unmounted, as expected). If I see it right, the script is somehow triggered twice, right? But why? Does anybody know how I could check this?

BTW: the lsof command has no output in the email... how can I get this output?
Next try; I'm staying on it. I changed the lsof part of the umount script so that I really get an email:

Code:
#!/bin/bash
lsof /var/backup | mail -s lsof-check root
umount /var/backup

-> Approximately 4 min after the nightly backup starts, the lsof-check email arrives and tells me that a php process is still running (backup of a mail account/domain). At the same time an email arrives saying that the mount point could not be unmounted.
-> 10 min after backup start, another lsof-check mail arrives showing nothing, which means that nothing is using the mount point anymore and the unmount worked as expected.

The question remains why the unmount script is triggered too soon... and therefore twice. I hope somebody can help me. (Would it be better to open another thread? Or another bug report somewhere, if it is a bug? Please tell me.)

BTW: does anybody know how to find out which domains/accounts are associated with the folder "mailxy"? For the websites there is an ID in ISPConfig; for mail, unfortunately, there isn't.
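
To check whether the script really runs twice, and what called it, one option is to append a record to a log file on every invocation instead of relying on the email. This is only a sketch: the log path and the function name are arbitrary choices, not anything ISPConfig provides.

```shell
#!/bin/bash
# Sketch only: record each invocation of the unmount script in a log file.
# The log path and log_invocation are hypothetical choices.
LOG=${LOG:-/var/log/backup_umount_debug.log}

log_invocation() {
    local dir=$1
    {
        # Timestamp plus this script's PID and its parent's PID.
        echo "=== $(date '+%F %T') pid=$$ ppid=$PPID ==="
        # Command line of the process that started this script.
        ps -o args= -p "$PPID"
        # Open files under DIR at this moment; warnings are captured as well.
        lsof "$dir" 2>&1
        echo "--- end ---"
    } >>"$LOG"
}
```

Calling `log_invocation /var/backup` right before `umount /var/backup` should leave one block per run in the log. Two blocks a few minutes apart with different pid/ppid values would support the theory that separate backup jobs each trigger the unmount script.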