Hi, I have two identical servers, both running Ubuntu with Xen to host virtual servers. On both dom0s I run DRBD, NFS and heartbeat to replicate files between them. Now to the problem: both servers sometimes stop working and refuse to do anything. I can still ping them, but I can't reach any port such as HTTP, SSH or FTP. I have checked the syslog and the kernel log but can't see anything that tells me something went wrong. Both servers have the same problem right now, so I can't post anything from the logs. Next weekend I'm going to the place where they are hosted (450 km from here), and it would be helpful to get some hints about what the problem could be. I have a gut feeling that it has something to do with DRBD/NFS, because one time when I was moving 30-40 GB to them, both died in the same way. Any tips are welcome.
I'd install munin on both servers ( http://www.howtoforge.com/server-monitoring-with-munin-and-monit-on-debian-lenny ). That should make it easier to track down where the problem comes from (e.g. full hard drive, not enough memory or swap, etc.).
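As a stopgap until munin is running, a tiny logger along these lines could record the load and free memory every few seconds, so the last line written before a hang shows what state the box was in. This is only a sketch and not a replacement for munin. The function names and the log format are made up for illustration, and it assumes a Linux host where /proc/meminfo is readable.

```python
import os
import time


def read_meminfo(path="/proc/meminfo"):
    """Return MemFree and SwapFree in kB, or {} if the file is unreadable."""
    wanted = ("MemFree", "SwapFree")
    values = {}
    try:
        with open(path) as fh:
            for line in fh:
                key, _, rest = line.partition(":")
                if key in wanted:
                    values[key] = int(rest.split()[0])
    except (OSError, ValueError, IndexError):
        return {}
    return values


def snapshot_line():
    """Build one log line: timestamp, 1-minute load, free memory and swap."""
    try:
        load1 = "%.2f" % os.getloadavg()[0]
    except OSError:
        load1 = "n/a"
    mem = read_meminfo()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return "%s load1=%s memfree_kb=%s swapfree_kb=%s" % (
        stamp,
        load1,
        mem.get("MemFree", "n/a"),
        mem.get("SwapFree", "n/a"),
    )


def append_snapshot(log_path):
    """Append one snapshot and force it to disk so it survives a hard hang."""
    with open(log_path, "a") as fh:
        fh.write(snapshot_line() + "\n")
        fh.flush()
        os.fsync(fh.fileno())
```

Calling `append_snapshot()` in a loop with `time.sleep(10)` between calls would give one line every ten seconds. If disk I/O itself is what hangs, the logger will hang with it, but the timestamp of the last line still narrows down when things went wrong.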
Are you using a bridge with static IPs to the Virtual hosts or a bridge with DHCP assigned IPs? I had this problem on a CentOS 5.4 server with Xen when I was using DHCP assigned IPs. Switched to static (outside of my DHCP range, but still within the subnet) and everything started working fine.
Falko, I'll try that. Mosquito, all the virtual servers and Xen hosts have static IP addresses. I saw something strange on the servers today: I tried to connect over SSH but it doesn't answer anything, not "No route to host" or "Refused". So the machine still seems to be alive.
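To tell a refused connection, a silent timeout and an unreachable host apart from a remote machine, a small probe like the one below might help. It is a minimal sketch: `probe_tcp` and its return strings are names invented here, not part of any tool mentioned in this thread. With `expect_banner=True` it also waits for the first bytes from the service (an SSH server normally sends a version banner right after the connection is accepted), which can show a case where the kernel still accepts connections but the daemon no longer responds.

```python
import errno
import socket


def probe_tcp(host, port, timeout=5.0, expect_banner=False):
    """Classify one TCP connection attempt to host:port.

    Returns one of: "open", "banner", "open-no-banner", "closed",
    "refused", "timeout", "unreachable", "error".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
        except socket.timeout:
            return "timeout"          # no answer at all to the connect
        except ConnectionRefusedError:
            return "refused"          # host is up, nothing listens on the port
        except OSError as exc:
            if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
                return "unreachable"  # "No route to host" and similar
            return "error"
        if not expect_banner:
            return "open"
        try:
            # Empty data means the peer closed the connection right away.
            return "banner" if sock.recv(64) else "closed"
        except socket.timeout:
            return "open-no-banner"   # connected, but the service stays silent
        except OSError:
            return "error"
```

For example, `probe_tcp(server_ip, 22, expect_banner=True)` returning "timeout" or "open-no-banner" while ping still works would match the behaviour described above.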
This is the only strange thing I can find in the server log. Could this be the reason the server keeps dying?
I think I have found the problem. I installed the same version of Xen (3.3) on Ubuntu 9.10, same as on the servers, and I get exactly the same problem. The screen just goes black, either while the kernel is starting or after the machine has been online for a few hours. I can ping the server but nothing else. I'm using this kernel: linux-image-2.6.24-16-xen_2.6.24-16.30zng1_i386. So I'm going to try installing Xen 4.0.0 with a newer kernel and see if that works.
When they are down, are you trying to access the server(s) by domain or by IP? If by domain, it could be a DNS issue.
Last night I got the same problem again. Both died at almost the same time, between 6 and 7 am. The second server died first and everything was pointed to the first server. Yes, I SSH directly to the IP and not to the DNS name.
Falko, I think you are right, it has to be some cron job that is messed up. When I thought hard about it I realized that I have tried to move everything to the servers twice, both times during the night between Thursday and Friday, and both times the servers died between 06:00 and 07:00. Since this has happened twice, it has to be cron. I'll let you know when I have tried that!
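One way to narrow down which cron entries fall inside the 06:00-07:00 window is to scan the system crontab text for them. The sketch below assumes the system crontab layout used by /etc/crontab and files in /etc/cron.d (minute, hour, day of month, month, day of week, user, command). `jobs_in_window` is a made-up helper, and it only understands a plain number or `*` in the hour field, so ranges and step values are skipped.

```python
def jobs_in_window(crontab_text, start_hour, end_hour):
    """Return (hour, minute, command) for entries that can run in the window.

    The window is start_hour <= hour < end_hour. An hour field of "*"
    counts as a match because the job runs every hour.
    """
    hits = []
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # minute hour dom month dow user command...
        fields = line.split(None, 6)
        if len(fields) < 7:
            continue  # environment assignments such as SHELL=/bin/sh
        minute, hour, command = fields[0], fields[1], fields[6]
        if hour == "*" or (hour.isdigit() and start_hour <= int(hour) < end_hour):
            hits.append((hour, minute, command))
    return hits
```

Feeding it the contents of /etc/crontab and each file under /etc/cron.d with `jobs_in_window(text, 6, 7)` lists the candidates. On a stock Debian or Ubuntu install the cron.daily run is usually scheduled in that hour, but check your own /etc/crontab rather than relying on that.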
I have now stopped cron.daily and created a script that creates 40 GB of files of different sizes, then tars everything and untars it, and then starts over. So now I just need to wait and see what happens.
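The poster's script is not shown, so the following is only a guess at what such a load test could look like: write files of different sizes, pack them into a tar archive, unpack it again, and repeat. All names (`write_file`, `one_cycle`, `run`) and the choice of random data are assumptions for illustration.

```python
import os
import tarfile
import tempfile

CHUNK = 1024 * 1024  # write in 1 MiB pieces to keep memory use flat


def write_file(path, size):
    """Write `size` bytes of random data to `path` and flush it to disk."""
    remaining = size
    with open(path, "wb") as fh:
        while remaining > 0:
            n = min(CHUNK, remaining)
            fh.write(os.urandom(n))
            remaining -= n
        fh.flush()
        os.fsync(fh.fileno())


def one_cycle(base_dir, sizes):
    """Create the files, tar them, untar them; return the bytes extracted."""
    with tempfile.TemporaryDirectory(dir=base_dir) as work:
        src = os.path.join(work, "src")
        out = os.path.join(work, "out")
        os.mkdir(src)
        os.mkdir(out)
        for i, size in enumerate(sizes):
            write_file(os.path.join(src, "file%04d.bin" % i), size)
        archive = os.path.join(work, "data.tar")
        with tarfile.open(archive, "w") as tar:
            tar.add(src, arcname="src")
        with tarfile.open(archive, "r") as tar:
            tar.extractall(out)
        total = 0
        for root, _dirs, files in os.walk(out):
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
        return total  # the work directory is removed when the block exits


def run(base_dir, sizes, cycles):
    """Repeat the create/tar/untar cycle and report progress each round."""
    for n in range(cycles):
        extracted = one_cycle(base_dir, sizes)
        print("cycle %d: extracted %d bytes" % (n + 1, extracted), flush=True)
```

Pointing `base_dir` at a directory on the DRBD/NFS-backed volume and choosing `sizes` that add up to roughly 40 GB would approximate the test described above. Note that each cycle needs about three times that much free space (source files, archive and extracted copy).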
I stopped everything in cron.daily, hourly, weekly and monthly and everything seemed to work OK, but this morning server 2 died. Then I looked at server 1 and found that motd-update and vnstat were in the cron.d folder, so I stopped them on server 1, and it has not died yet. So I started digging in the logs on server 1 for information about the recent crash and found the following. After these entries the logs don't say anything until the manual reboot. It would be nice if someone had the time to look at the logs and see why it dies. Kernel log: Messages log: Syslog:
And now it seems that server 1 is dead. When I try to SSH to the Xen virtual server on server 1 it asks for the password, but after I type the password nothing happens. When I try to SSH to server 1 itself it doesn't do anything, no timeout or anything like that.
I have been looking around on the Xen lists and have noticed other people with similar problems. It seems to be a bug in the Xen kernel on Dell R200-R300 machines, so I think I need to fix that or upgrade to 4.0 or something.