CentOS 7 Load Average and SMB

Discussion in 'Server Operation' started by GMSS, Nov 10, 2017.

  1. GMSS

    GMSS New Member

    Specifications:
    Linux Distro - CentOS 7
    CPU - 2 CPUs, 6 cores per CPU = 12 cores total
    RAM - 32 GB
    Samba - version 4.6.2
    10 users connected to the SMB share at any one time.


    Hi Gurus,

    I have a very strange situation and I am currently at a loss as to where to go or how to troubleshoot it.
    Basically, my CentOS server seems to run really slowly only on Wednesdays. The only applications I'm running on it are:

    SAMBA - file sharing
    CrashPlan - for backups online
    VNC Server

    On Wednesdays, users connected to the Samba share sometimes get disconnected from it but, more often,
    access to the files is really slow.

    I noticed that on Wednesdays, when I get a call from an end user saying it's slow, the load average fluctuates and can go up to
    2.40. But seeing that I have 12 cores and a load of 2.40 amounts to only 20% of them, I can't see why this
    would be a factor. Am I wrong?
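
The arithmetic behind that 20% figure can be sketched as below. `load_per_core` is a hypothetical helper name, not a standard command. One caveat: on Linux the load average also counts tasks waiting in uninterruptible sleep (typically on disk I/O), so a low per-core figure does not by itself rule out a bottleneck elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: divide a load average by the number of cores.
# $1 = load average (e.g. 2.40), $2 = core count (e.g. 12)
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Figures from the post: a load of 2.40 spread over 12 cores.
load_per_core 2.40 12    # prints 0.20

# On a live system the same figures can be read directly
# (the first field of /proc/loadavg is the 1-minute average; nproc is from coreutils):
# load_per_core "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
```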

    Every other day (apart from Wednesday) the load average is between 0.5 and 1.1 and I get no reports of it being slow.
    The online backup (CrashPlan) doesn't run during the day, so this cannot be causing the issue. There are no cron jobs
    set up either, and no automated scripts running on the server as far as I can tell.

    The only thing I can think of is that the SAMBA service is slowing the server down on Wednesday for some reason.

    How can I find out which process is using up the CPU, and why does it spike only after midday on Wednesdays?

    Many thanks in advance.

    Kind Regards
    GMSS
     
  2. GMSS

    GMSS New Member

    Hi Experts,
    Any advice on this conundrum?
    It doesn't make sense that the server would slow down when the load average is just 2.5 and there are 12 cores (2 CPUs x 6 cores).
    Kind Regards
    GMSS
     
  3. till

    till Super Moderator Staff Member ISPConfig Developer

    Install monitoring software such as Munin on your server to analyze which resources are used over time.
     
  4. GMSS

    GMSS New Member

    Hi Till,
    Many thanks for your reply. I will attempt to get this done.
    I was hoping you could guide me in analysing my server in live time, and in seeing why, with 12 cores, the SMB service should run slowly for connected users when the 1 min load average reaches only 2.4.

    Kind Regards
    GMSS
     
  5. till

    till Super Moderator Staff Member ISPConfig Developer

  6. GMSS

    GMSS New Member

    Hi Till
    Thanks for this, and that's good to know. :)
    Sorry for the confusion, but what I meant by "guide me" and "live time" was whether you could let me know which commands/steps (apart from installing Munin) I could use to try and troubleshoot this conundrum myself.

    Kind Regards
    GMSS
     
  7. till

    till Super Moderator Staff Member ISPConfig Developer

    Munin draws graphs of many relevant system parameters such as load, I/O usage etc. Let it run for a few days and then check whether any of the graphs (e.g. the I/O graph) show spikes at the times when you experience the slowdown.
     
  8. GMSS

    GMSS New Member

    Hi Till,
    OK, I will attempt to do so later in the week. Today is the day when users connecting to the SMB share say that it is running slowly, so in the interim I will monitor top and see what is eating the CPU.
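
As a complement to watching top interactively, a one-shot listing of the busiest processes could look like the sketch below. `busiest` is a hypothetical helper name, and the log path in the comments is only an example. Note that ps reports CPU usage averaged over each process's lifetime, so top remains the better tool for catching momentary spikes.

```shell
#!/bin/sh
# Hypothetical helper: read "pcpu pid command" lines on stdin and print the
# N busiest entries, highest CPU percentage first. $1 = N (default 10).
busiest() {
    sort -k1,1nr | head -n "${1:-10}"
}

# Usage on a live system (procps ps, as shipped with CentOS 7):
# ps -eo pcpu,pid,comm --no-headers | busiest 5
#
# Logging a snapshot every minute makes it possible to look back after a
# user reports slowness:
# while sleep 60; do
#     { date; ps -eo pcpu,pid,comm --no-headers | busiest 5; } >> /tmp/cpu-snapshots.log
# done
```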

    Kind Regards
    GMSS
     
  9. till

    till Super Moderator Staff Member ISPConfig Developer

    You might want to use iotop as well; it's the equivalent of top for monitoring disk I/O.
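
Alongside iotop, it can help to list tasks that are blocked in uninterruptible sleep (state D), since these raise the load average without using any CPU. A minimal sketch, with `blocked_tasks` as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read "stat pid command" lines on stdin and print only
# tasks whose state starts with D (uninterruptible sleep, usually disk wait).
blocked_tasks() {
    awk '$1 ~ /^D/'
}

# Usage on a live system:
# ps -eo stat,pid,comm --no-headers | blocked_tasks
#
# Related standard tools:
# iotop -o -b -n 3    # batch mode, only tasks actually doing I/O, 3 samples (needs root)
# vmstat 5            # the "wa" column is CPU time spent waiting on I/O
# iostat -x 5         # per-device utilisation (sysstat package)
```

If several smbd processes sit in state D while the CPU is mostly idle, that points at the storage rather than at Samba itself.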
     
  10. GMSS

    GMSS New Member

    Hi Till,
    We've now managed to find out what's causing our server to slow down on Wednesday.
    Here is what we did:
    I typed systemctl list-timers at the command prompt and this is what was returned:
    [Attached screenshot: upload_2017-11-15_16-3-29.png]
    It looks like there is a process set up to clean up the tmp files, which starts at 8.50pm on Tuesday and doesn't finish until about 24 hours later, at 8.50pm on Wednesday. So this is what is slowing down the server and making access slow for the end users.

    I've now stopped this process by running the command:
    systemctl stop systemd-tmpfiles-clean.timer
    And the load is looking much better and users are much happier with the performance.

    Obviously this scheduled job is important to run on the server, but can you tell me how I can reschedule it so that it runs on a Friday night instead?
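
One way a systemd timer can be moved, sketched under assumptions: a drop-in file in the unit's .d directory replaces the schedule from the packaged unit. `write_timer_override` is a hypothetical helper name, and Fridays at 23:00 is just an example time.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that replaces a timer's schedule.
# $1 = drop-in directory, $2 = OnCalendar expression (see systemd.time(7))
write_timer_override() {
    mkdir -p "$1" || return 1
    # The empty assignments clear the schedule inherited from the packaged
    # unit before OnCalendar sets the new one.
    printf '[Timer]\nOnBootSec=\nOnUnitActiveSec=\nOnCalendar=%s\n' "$2" \
        > "$1/override.conf"
}

# Usage as root (example schedule: Fridays at 23:00):
# write_timer_override /etc/systemd/system/systemd-tmpfiles-clean.timer.d 'Fri 23:00'
# systemctl daemon-reload
# systemctl restart systemd-tmpfiles-clean.timer
# systemctl list-timers    # check the NEXT column for the new run time
```

Note that systemctl stop only halts the timer until the next boot, so a drop-in like this is the more durable change.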

    Kind Regards
    GMSS
     
  11. GMSS

    GMSS New Member

    Hi Till,
    Just to inform you and everyone that my previous post was a red herring; that is NOT what is causing our server to slow down on Wednesdays.

    I've now found out what the real cause is. It relates to "Patrol Read Mode" on a Dell server. By default, "Patrol Read Mode" on our Dell R510 is set to "Automatic", and it runs between 12pm and 10pm on Wednesdays. The process recurs every 7 days and is definitely the reason the server crawls only on Wednesdays. This link explains what the process does:
    http://www.dell.com/support/article/us/en/19/sln292303/dell-perc-controller-disk-patrol-read?lang=en

    I found this out by searching the system log (/var/log/messages) with the command:
    grep "Server_Administrator" /var/log/messages | more

    It then showed me that "Patrol Read" starts running at 12pm and ends at 10pm. At that point I knew that I'd nailed this conundrum.
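
The log search above can be narrowed to just the Patrol Read entries. This is a sketch only: `patrol_read_events` is a hypothetical helper name, and it assumes the log lines contain the words "Patrol Read" as described in this post.

```shell
#!/bin/sh
# Hypothetical helper: print Server Administrator log lines that mention
# Patrol Read. $1 = log file (defaults to /var/log/messages).
patrol_read_events() {
    grep 'Server_Administrator' "${1:-/var/log/messages}" | grep -i 'patrol read'
}

# Usage as root:
# patrol_read_events
#
# To include rotated logs as well:
# for f in /var/log/messages*; do patrol_read_events "$f"; done
```

Comparing the timestamps of the matching lines with the times users report slowness is what ties the two together.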
    I've now changed this from "Automatic" to "Manual". To do this you will need to install Dell OMSA. Here is a snippet of the option in OMSA:
    [Attached screenshot: upload_2017-12-19_15-13-59.png]
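
For anyone without the OMSA web interface, the same setting should also be reachable from the OMSA command line. The sketch below only prints the command (a dry run). The omconfig syntax is as I understand it from Dell's OMSA CLI documentation, so treat it as an assumption and check it against the guide for your OMSA version. `patrol_read_cmd` is a hypothetical helper name, and controller ID 0 is an example; `omreport storage controller` lists the real IDs.

```shell
#!/bin/sh
# Hypothetical helper: build the OMSA command line for setting the Patrol
# Read mode and print it rather than running it (a dry run).
# $1 = controller ID, $2 = mode (auto, manual or disable)
patrol_read_cmd() {
    case "$2" in
        auto|manual|disable) ;;
        *) echo "unknown mode: $2" >&2; return 1 ;;
    esac
    # Assumed syntax; verify against the OMSA CLI guide before running it.
    printf 'omconfig storage controller action=setpatrolreadmode controller=%s mode=%s\n' "$1" "$2"
}

# Example: controller 0, manual mode.
patrol_read_cmd 0 manual
```

With the mode set to manual, Patrol Read no longer starts on its own schedule, so it would have to be started by hand at a quiet time.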

    This has really taken me a long time to figure out, and I do hope it helps someone out there who may be having the same issue I had.

    Kind Regards
    GMSS
     
