How to rsync all users' web folders?

Discussion in 'Technical' started by lonerunner, Sep 9, 2020.

  1. lonerunner

    lonerunner Member

    I want to use rsync to connect to a server and back up all user websites. I am just not completely sure what the proper command would be, since there are multiple users and multiple web folders for each user.

    Before I actually do something stupid on the server, I am kindly asking you guys what the proper command would be to sync it all.

    From my local PC I would connect over SSH and copy all the web folders. Would the command below do the work?

    Code:
    rsync -avP 'ssh -p <port-number>' --stats --delete user@remote-server:/var/www/clients/client*/web*/web /path/to/local/folder
    I've read that using an asterisk * will copy all folders matching that name in that directory, so it should copy client2, client3, client4... and all web folders inside, following the folder structure. Also, removing the trailing slash at the end will copy the folder itself rather than just the files inside it.
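    If I understood it right, the difference would be something like this (made-up example paths, just to illustrate the trailing slash):

    Code:
    # without the trailing slash: the web folder itself ends up inside the destination
    rsync -av user@remote-server:/var/www/clients/client1/web1/web /backup/client1-web1/

    # with the trailing slash: only the contents of web end up in the destination
    rsync -av user@remote-server:/var/www/clients/client1/web1/web/ /backup/client1-web1/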

    Would this command work as I imagined?
     
  2. Taleman

    Taleman Well-Known Member HowtoForge Supporter

    You can try what the command would do with --dry-run and --verbose.
    The command as given looks incorrect; for starters, there is only one parameter. To copy you need both a from and a to.
    When I use rsync, I read the man page chapter "USAGE" and copy commands from there.
    Do you have really large files, or what is the reason for using --partial? It makes incompletely transferred files stay on the target, possibly causing lots of confusion.
    My usual command looks like this:
    Code:
    rsync -avz /var/www/clients/ user@remote-server:/var/www/clients
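    For example, to see first what it would actually transfer without changing anything, the same command with -n (--dry-run) added:
    Code:
    rsync -avzn /var/www/clients/ user@remote-server:/var/www/clients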
     
  3. lonerunner

    lonerunner Member

    I did read the rsync man page as well as a few other websites, but I was not sure I would get the command right. This is what I understood and thought I needed for transferring files:

    Code:
    -a, --archive               archive mode; equals -rlptgoD
    -v, --verbose               increase verbosity
    -P                          same as --partial --progress
    
    I have read that using -z will increase resource usage on the server; I would rather transfer larger files and spend more time than use more resources. I don't know how correct that is, but a few blogs have mentioned it.

    As the server is using a different port for SSH, I need this part of the command:

    Code:
    'ssh -p <port-number>'
    
    The last two options:

    Code:
    --stats                 give some file-transfer stats
    --delete                delete extraneous files from dest dirs < this means it will delete local files that no longer exist on the server
    
    Maybe instead of using -P and --stats I could just use --progress, but I understood that --progress and --stats are different options: one shows stats at the end of the transfer, the other shows progress during the transfer.

    The most confusing part is how to copy only the web folders from ALL users and ALL of their web folders.

    The command you are mentioning, @Taleman, will copy everything from clients, which in my case includes unnecessary files; from one account it will copy folders like dev and bin and it will keep copying those files endlessly. I left my PC copying overnight over FTP and in the morning one file was 15 GB and still copying.

    I thought that:
    Code:
    /var/www/clients/client*/web*/web
    
    will copy only the contents of the web folder from all clients and all of their web folders?
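    One thing I could do first, to see what that pattern actually matches on the server before syncing anything, is something like this (just a listing, nothing gets copied):

    Code:
    ssh -p <port-number> user@remote-server 'ls -d /var/www/clients/client*/web*/web'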
     
  4. Taleman

    Taleman Well-Known Member HowtoForge Supporter

    That is true. You can copy that way if that is your use case.
     
    lonerunner likes this.
  5. nhybgtvfr

    nhybgtvfr Well-Known Member HowtoForge Supporter

    One thing you can do, on the server holding the backups, is use hardlinks, creating a new backup folder with the date appended.

    So your first backup will download every file for the site into e.g. site-8-10-2020.
    The next rsync will compare the files against last night's backup, only download new and changed files to site-9-10-2020, and create hardlinks to the unchanged files in site-8-10-2020.
    The next rsync will compare files against site-9-10-2020 and only download new and changed files to site-10-10-2020, creating hardlinks back to the original files.

    If you then delete the folder site-8-10-2020, it doesn't delete files/folders that still have hardlinks pointing to them.
    This saves a hell of a lot of time on the backups (and on disk space), although you'll need to recreate the backup file system first, massively upping the inode limit.
    I know I'm not explaining it too well here, but basically you can limit the number of backups to e.g. 10, only ever doing incremental backups after the first full backup, and constantly removing the oldest site backup folder after each run. You never lose the original full files unless a file was deleted on the live site long enough ago that it no longer appears in any remaining backup archive.
    In fact, even though site-10-10-2020 would only physically contain the latest new and changed files, if you rsynced that folder back to your webserver it would be a full site restore; every file and folder would be restored in full.
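    The core of it with rsync's --link-dest would look roughly like this (an untested sketch; paths, host and folder names are just examples, adjust to your own setup):

    Code:
    #!/bin/bash
    # sketch only: today's backup folder, hardlinking unchanged files to yesterday's
    TODAY=$(date +%d-%m-%Y)
    YESTERDAY=$(date -d yesterday +%d-%m-%Y)
    BACKUPDIR=/backups

    # new and changed files are downloaded into today's folder,
    # unchanged files become hardlinks into yesterday's folder
    rsync -a \
      --link-dest="$BACKUPDIR/site-$YESTERDAY" \
      user@remote-server:/var/www/clients/client1/web1/web/ \
      "$BACKUPDIR/site-$TODAY/"

    # afterwards the oldest site-* folder can be deleted; files that newer
    # backups still hardlink to are not actually removed from disk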

    I did write a script to do this on an old Linux NAS, which is turned off and sitting away in a corner somewhere. If I can find it, and it still boots, I can post the script here if anyone thinks it'll be useful.
    I just back up to AWS S3 now....
     
    Last edited: Sep 11, 2020
  6. lonerunner

    lonerunner Member

    Thanks for the help @Taleman, I got it tested and working. This is the command:

    Code:
    rsync -avz -e 'ssh -p <port-number>' --stats --delete user@remote-server:/var/www/clients/client*/web*/web /path/to/local/folder
    
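    For anyone copying this: because of --delete, it is worth running it once with -n (--dry-run) added first to double check what it would remove locally, e.g.:

    Code:
    rsync -avzn -e 'ssh -p <port-number>' --stats --delete user@remote-server:/var/www/clients/client*/web*/web /path/to/local/folder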
     
