Hi, I hope this is the right place to post this, since I saw others posting DRBD/Heartbeat questions here. If there is a mailing list or forum that deals specifically with such things, please direct me to it.

I have set up 2 identical PCs running Gentoo. Both have DRBD v0.7.21 and Heartbeat v1.2.7 (using ldirectord) installed. I am going for a hot-standby (active/passive) system.

The network is set up like this: each node has 1 ethernet cable connecting it to the LAN (eth1). A crossover ethernet cable connects the 2 PCs directly (eth0); this link is dedicated to DRBD replication. Heartbeat is set up to use eth1 to connect to the LAN and send heartbeats.

Both nodes start up and everything is fine; they share the virtual IP address perfectly. Failover works fine if I test it by turning off heartbeat on the primary node. It also works fine if I unplug the power supply from the primary node.

But if I unplug the eth1 network cable, the IP address fails over but the DRBD disk does not switch. The DRBD disk remains mounted on the primary node and is not mounted on the secondary node, even though heartbeat failed over to the secondary node and the secondary node took over the virtual IP address.

The only way I got this to work is by unplugging both the crossover cable (eth0) and the network cable (eth1) at the same time. DRBD gets cut off and so does the network, and only then does the secondary node take over both the drbddisk and the IP together.

Seeing this, I decided to use just 1 interface (eth1) for both DRBD and heartbeat. When I simulate a network failure (unplug the eth1 cable), both fail over. But when I reconnect the cable, it fails back automatically even though I set auto_failback off! Not only that, the data on the drbddisk does not get replicated to the other node once reconnected. Doing a cat /proc/drbd shows both disks in a consistent state somehow.
If I set the DRBD config to go stand-alone (on-disconnect stand_alone) instead of reconnecting, and I handle the DRBD disks manually, the data does get replicated! I do this with drbdadm commands, and then start heartbeat on the failed node so it can be failed back to.

Basically, I don't know why this is happening. What I want is an active/passive hot-standby setup, with DRBD on a crossover cable and heartbeat on the network. Once one node fails, the other should take over. When the failed node comes back online it should NOT fail back; I want the admin to tend to the node and decide whether to fail back or not. That way DRBD can do a sync first.

Below is the drbd.conf file I am using (at least the relevant parts).

Note: 172.22.0.x is the crossover cable
Note: 192.168.1.x is the LAN network

Code:
resource mirror {
    protocol C;
    incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

    startup {
        degr-wfc-timeout 20; # 20 seconds
    }

    disk {
        on-io-error detach;
    }

    net {
        ko-count 4;
        on-disconnect stand_alone;
        #on-disconnect reconnect;
    }

    syncer {
        rate 10M;
        group 1;
        al-extents 257;
    }

    on gentoo1 {
        device /dev/drbd0;
        disk /dev/sda4;
        address 172.22.0.1:7788;
        meta-disk internal;
    }

    on gentoo2 {
        device /dev/drbd0;
        disk /dev/sda4;
        address 172.22.0.2:7788;
        meta-disk internal;
    }
}

This is the ha.cf config file

Code:
logfile /var/log/ha-log
logfacility local0
keepalive 1
deadtime 15
warntime 5
bcast eth1
auto_failback off
node gentoo1
node gentoo2

This is the haresources file

Code:
gentoo1 drbddisk::mirror Filesystem::/dev/drbd0::/ha::reiserfs 192.168.1.3/8/eth1 ldirectord apache2

Thanks in advance!
Meh, so much for this post. The nodes were split-brained. The fix was to add a STONITH device (a manageable remote power switch) to each node.
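For anyone who lands here with the same symptoms, below is a rough sketch of how one might spot the situation from /proc/drbd before it bites. It is not part of my setup and not a replacement for STONITH. It assumes the DRBD 0.7-style status line (`cs:` for connection state, `st:` for local/peer role). The function names, the `DISCONNECTED` set and the `SAMPLE` text are made up for illustration and are not captured output.

```python
import os
import re

# Assumed DRBD 0.7-style status line, e.g.
#   " 0: cs:Connected st:Primary/Secondary ld:Consistent"
STATUS_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+st:(\w+)/(\w+)")

# Connection states treated as "no link to the peer" (an assumption;
# adjust to the states your DRBD version actually reports).
DISCONNECTED = {"StandAlone", "WFConnection", "Unconnected"}

# Hypothetical sample text, used when /proc/drbd is not available.
SAMPLE = """version: 0.7.21
 0: cs:StandAlone st:Primary/Unknown ld:Consistent
"""


def parse_drbd_status(text):
    """Return one dict per DRBD device line found in text."""
    devices = []
    for line in text.splitlines():
        m = STATUS_RE.match(line)
        if m:
            devices.append({
                "minor": int(m.group(1)),
                "cs": m.group(2),
                "local": m.group(3),
                "peer": m.group(4),
            })
    return devices


def split_brain_risk(dev):
    """True when this node is Primary but has no link to its peer.

    If the peer is in the same state, both sides can accept writes
    independently, which is the split-brain case from this thread.
    """
    return dev["local"] == "Primary" and dev["cs"] in DISCONNECTED


if __name__ == "__main__":
    if os.path.exists("/proc/drbd"):
        with open("/proc/drbd") as f:
            text = f.read()
    else:
        text = SAMPLE
    for dev in parse_drbd_status(text):
        flag = "WARNING: possible split brain" if split_brain_risk(dev) else "ok"
        print("drbd%d cs=%s st=%s/%s %s"
              % (dev["minor"], dev["cs"], dev["local"], dev["peer"], flag))
```

Running something like this on both nodes after pulling only the eth1 cable should make it easier to see whether each side thinks it is Primary while the replication link is down.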