Server big problem... please help :(

Discussion in 'Server Operation' started by draven76, May 9, 2007.

  1. draven76

    draven76 New Member

    Systems: one Linux RHEL 4 ES box (call it penguin) running Postfix, relaying mail to one Windows box running Exchange (joker).

    Rules for relaying are: the recipient's address domain must match one of our domains OR the sender is an authenticated user. We also have some additional antispam rules.

    Outgoing email from inside the LAN doesn't go through penguin; it is handled by Exchange on joker. Exchange accepts email from anyone. I know this is not very good, but joker is not exposed to the internet, so only LAN users can use it to relay mail. Penguin handles outgoing email only from our users who are not working on site (we have some users with notebooks moving around the country).

    Yesterday penguin was nearly frozen. I could move the mouse pointer and try to launch programs from the GUI, but the server would not execute my commands ;( so I tried to reboot. It seemed to start closing the session... it partially closed the GUI and then nothing more... I had the background image, the mouse pointer and a lot of ??? in my mind. Then I got a bit angry (I was also being hammered by 40 users asking why they weren't receiving their precious emails! ;) ) and shut penguin down with a lot of brute force applied to the power button ;).
    At reboot, penguin slowed down a lot while loading services and took something like 20 minutes to boot. Postfix wasn't working: in the log, all emails were deferred because connections to 127.0.0.1 were refused. I started investigating and made one change to the hosts file: I found two different IPs listed with the same machine name (127.0.0.1 penguin and xx.yy.zz.ww penguin), so I fixed that and rebooted. It was still very slow, but now Postfix told me it was working through the queue. Penguin started to execute my commands, very slowly, but I could do a service httpd restart and see the effects. I made the firewall route incoming SMTP to joker (god protect us from spammers... if they find our free relaying server we will be listed forever in spam blacklists) and let penguin take a breath. Then I ran up2date, and this time I succeeded in upgrading a lot of services, even the kernel. Penguin seems faster now, but I want to check that everything is OK. Postfix is still working through a queue of 300 emails. How can I know if the system is at "full speed"? And after the general test of the system... how do I test Postfix?

    Thanks

    Dario palermo
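
The hosts file problem described in this post (one machine name listed against two different IPs) can be checked for with a few lines of shell. This is only a rough sketch: `dup_hosts` is a made-up helper name, and it assumes the usual hosts layout of an IP followed by one or more names.

```shell
# dup_hosts [FILE]: print every hostname that is mapped to more than one
# IP address. FILE defaults to /etc/hosts. (Hypothetical helper name.)
dup_hosts() {
    awk '
        { sub(/#.*/, "") }              # strip comments
        NF < 2 { next }                 # skip blank or incomplete lines
        {
            for (i = 2; i <= NF; i++) {
                key = $i SUBSEP $1
                if (!(key in seen)) {   # count each name/IP pair only once
                    seen[key] = 1
                    count[$i]++
                }
            }
        }
        END {
            for (name in count)
                if (count[name] > 1) print name
        }
    ' "${1:-/etc/hosts}" | sort
}
```

Running `dup_hosts` with no argument checks /etc/hosts. A name such as localhost may legitimately show up if the file lists it for both 127.0.0.1 and ::1, so the output is a list of entries to look at, not a list of errors.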
     
  2. falko

    falko Super Moderator Howtoforge Staff

    I think you should check all your logs for errors, then take a look at
    Code:
    top
    Maybe a monitoring solution such as munin would be good so that you can see a graph of your system load. And maybe you should check your system for rootkits.
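
Alongside top, a quick number to look at is the load average relative to the number of CPUs. The sketch below assumes a Linux box with /proc/loadavg and /proc/cpuinfo; `load_per_cpu` is a made-up helper name, not a standard tool.

```shell
# load_per_cpu [LOADAVG_FILE] [NCPU]: print the 1-minute load average
# divided by the CPU count. Defaults: /proc/loadavg and the number of
# "processor" entries in /proc/cpuinfo. (Hypothetical helper name.)
load_per_cpu() {
    loadfile="${1:-/proc/loadavg}"
    ncpu="${2:-$(grep -c '^processor' /proc/cpuinfo)}"
    awk -v n="$ncpu" '
        BEGIN { if (n < 1) n = 1 }      # avoid dividing by zero
        { printf "%.2f\n", $1 / n }     # first field is the 1-minute load
    ' "$loadfile"
}
```

As a rough guide, a value that stays well above 1.00 means more work is queued than the CPUs can handle. On Linux the load average also counts processes stuck waiting on disk I/O, so it can be high while the CPU percentage stays low.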
     
  3. draven76

    draven76 New Member

    Thanks for the tips

    I used the standard system monitor during the "slow processing" phase, and it showed 3% CPU load under normal operation (Postfix running, very slowly, but running). Anyway... I eventually installed the updates, and the queue is now empty. I'll surely install munin as you suggested. I'd like some benchmarking tools too, just some plain CPU and disk tests to get some data to compare with other similar systems... to be sure my system is running at a decent speed. Any more suggestions? :)
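
For a plain CPU and disk comparison between similar machines, something as crude as the following may be enough to spot a box that is far slower than its peers. It is only a sketch, not a real benchmark: both function names are made up, it assumes a dd that accepts conv=fsync plus md5sum and date +%s, and the figures only mean something when the same commands are run on each machine.

```shell
# disk_write_test [DIR] [MB]: write MB megabytes of zeros to a scratch
# file in DIR, flush it to disk, and print the last line of dd's report.
# (Hypothetical helper; the report format depends on the dd version.)
disk_write_test() {
    dir="${1:-/tmp}"
    mb="${2:-256}"
    f="$dir/ddtest.$$"
    dd if=/dev/zero of="$f" bs=1048576 count="$mb" conv=fsync 2>&1 | tail -n 1
    rm -f "$f"                          # clean up the scratch file
}

# cpu_hash_test [MB]: print the whole seconds needed to md5sum MB
# megabytes of zeros. (Hypothetical helper; coarse by design.)
cpu_hash_test() {
    mb="${1:-512}"
    start=$(date +%s)
    dd if=/dev/zero bs=1048576 count="$mb" 2>/dev/null | md5sum >/dev/null
    end=$(date +%s)
    echo "$((end - start))"
}
```

For example, `disk_write_test /var/spool 512` and `cpu_hash_test 2048` could be run on two machines while they are otherwise idle. Use sizes large enough that each run takes at least several seconds, otherwise the whole-second timing is too coarse to compare.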
     
  4. falko

    falko Super Moderator Howtoforge Staff
