Hi all,

My environment: 2 servers, CentOS 6.4 64-bit, MariaDB-server-10.2.0-1.el6.x86_64. Both nodes were running normally. After I restarted mysql on node1 (10.10.201.3), it failed to start; mysql on node2 (10.10.201.4) continues running without problems.

/etc/my.cnf.d/server.cnf configuration on node1 and node2:

node1

bind-address=10.10.201.3
datadir=/opt/mysql
socket=/opt/mysql/mysql.sock
handlersocket_address="10.10.201.3"
handlersocket_port="9998"
handlersocket_port_wr="9999"
open_files_limit = 25600
log-error=/opt/mysql/log/mysqld.log

[galera]
# Mandatory settings
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://10.10.201.4,10.10.201.3"
wsrep_cluster_name='galera_cluster_www'
wsrep_node_address='10.10.201.3'
wsrep_node_name='www-node1'
wsrep_sst_method=rsync
wsrep_sst_auth=sst_user:password
wsrep-slave-threads=16
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_log_conflicts=ON
wsrep_provider_options="cert.log_conflicts=ON"
wsrep_debug=ON
wsrep_max_ws_size = 2G
binlog_row_image = minimal

node2

bind-address=10.10.201.4
datadir=/opt/mysql
socket=/opt/mysql/mysql.sock
handlersocket_address="10.10.201.4"
handlersocket_port="9998"
handlersocket_port_wr="9999"
open_files_limit = 25600

[galera]
# Mandatory settings
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://10.10.201.3,10.10.201.4"
wsrep_cluster_name='galera_cluster_www'
wsrep_node_address='10.10.201.4'
wsrep_node_name='www-node2'
wsrep_sst_method=rsync
wsrep_sst_auth=sst_user:password
wsrep-slave-threads=16
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_log_conflicts=ON
wsrep_provider_options="cert.log_conflicts=ON"
wsrep_debug=ON
wsrep_max_ws_size = 2G
binlog_row_image = minimal

Cluster information on node2:

MariaDB [(none)]> show status like 'wsrep%';
+------------------------------+--------------------------------------+
| Variable_name                | Value                                |
+------------------------------+--------------------------------------+
| wsrep_apply_oooe             | 0.017353                             |
| wsrep_apply_oool             | 0.000050                             |
| wsrep_apply_window           | 1.021550                             |
| wsrep_causal_reads           | 0                                    |
| wsrep_cert_deps_distance     | 24.564685                            |
| wsrep_cert_index_size        | 48                                   |
| wsrep_cert_interval          | 0.021750                             |
| wsrep_cluster_conf_id        | 69                                   |
| wsrep_cluster_size           | 1                                    |
| wsrep_cluster_state_uuid     | c07f825f-132f-11e6-b219-d7e841605104 |
| wsrep_cluster_status         | Primary                              |
| wsrep_commit_oooe            | 0.000000                             |
| wsrep_commit_oool            | 0.000034                             |
| wsrep_commit_window          | 1.005403                             |
| wsrep_connected              | ON                                   |
| wsrep_evs_delayed            |                                      |
| wsrep_evs_evict_list         |                                      |
| wsrep_evs_repl_latency       | 0/0/0/0/0                            |
| wsrep_evs_state              | OPERATIONAL                          |
| wsrep_flow_control_paused    | 0.000000                             |
| wsrep_flow_control_paused_ns | 0                                    |
| wsrep_flow_control_recv      | 0                                    |
| wsrep_flow_control_sent      | 0                                    |
| wsrep_gcomm_uuid             | 401f6755-71da-11e6-8244-9e88079ed6c4 |
| wsrep_incoming_addresses     | 10.10.201.4:3306                     |
| wsrep_last_committed         | 2364263                              |
| wsrep_local_bf_aborts        | 116                                  |
| wsrep_local_cached_downto    | 2221069                              |
| wsrep_local_cert_failures    | 23                                   |
| wsrep_local_commits          | 729390                               |
| wsrep_local_index            | 0                                    |
| wsrep_local_recv_queue       | 0                                    |
| wsrep_local_recv_queue_avg   | 0.004725                             |
| wsrep_local_recv_queue_max   | 6                                    |
| wsrep_local_recv_queue_min   | 0                                    |
| wsrep_local_replays          | 112                                  |
| wsrep_local_send_queue       | 0                                    |
| wsrep_local_send_queue_avg   | 0.000335                             |
| wsrep_local_send_queue_max   | 2                                    |
| wsrep_local_send_queue_min   | 0                                    |
| wsrep_local_state            | 4                                    |
| wsrep_local_state_comment    | Synced                               |
| wsrep_local_state_uuid       | c07f825f-132f-11e6-b219-d7e841605104 |
| wsrep_protocol_version       | 7                                    |
| wsrep_provider_name          | Galera                               |
| wsrep_provider_vendor        | Codership Oy <>                      |
| wsrep_provider_version       | 25.3.15(r3578)                       |
| wsrep_ready                  | ON                                   |
| wsrep_received               | 1376816                              |
| wsrep_received_bytes         | 630752657                            |
| wsrep_repl_data_bytes        | 303429595                            |
| wsrep_repl_keys              | 3039261                              |
| wsrep_repl_keys_bytes        | 41097380                             |
| wsrep_repl_other_bytes       | 0                                    |
| wsrep_replicated             | 729452                               |
| wsrep_replicated_bytes       | 391211903                            |
| wsrep_thread_count           | 17                                   |
+------------------------------+--------------------------------------+

gvwstate.dat and grastate.dat on node2:

cat gvwstate.dat
my_uuid: 401f6755-71da-11e6-8244-9e88079ed6c4
#vwbeg
view_id: 3 401f6755-71da-11e6-8244-9e88079ed6c4 71
bootstrap: 0
member: 401f6755-71da-11e6-8244-9e88079ed6c4 0
#vwend

cat grastate.dat
# GALERA saved state
version: 2.1
uuid: c07f825f-132f-11e6-b219-d7e841605104
seqno: -1
cert_index:

gvwstate.dat and grastate.dat on node1:

cat gvwstate.dat
# GALERA saved state
version: 2.1
uuid: 00000000-0000-0000-0000-000000000000
seqno: -1
cert_index:

cat grastate.dat
# GALERA saved state
version: 2.1
uuid: 00000000-0000-0000-0000-000000000000
seqno: -1
cert_index:

mysqld.log on node1 when I try to start mysql:

161125 11:51:32 mysqld_safe Starting mysqld daemon with databases from /opt/mysql
161125 11:51:32 mysqld_safe WSREP: Running position recovery with --log_error='/opt/mysql/wsrep_recovery.n5Z5Dc' --pid-file='/opt/mysql/www-node1-recover.pid'
2016-11-25 11:51:32 140135822641184 [Warning] option 'key_cache_block_size': unsigned value 65536 adjusted to 16384
2016-11-25 11:51:32 140135822641184 [Warning] option 'key_cache_block_size': unsigned value 65536 adjusted to 16384
2016-11-25 11:51:32 140135822641184 [Note] /usr/sbin/mysqld (mysqld 10.2.0-MariaDB) starting as process 27378 ...
161125 11:51:37 mysqld_safe WSREP: Recovered position c07f825f-132f-11e6-b219-d7e841605104:2354710
2016-11-25 11:51:37 139649876842528 [Warning] option 'key_cache_block_size': unsigned value 65536 adjusted to 16384
2016-11-25 11:51:37 139649876842528 [Warning] option 'key_cache_block_size': unsigned value 65536 adjusted to 16384
2016-11-25 11:51:37 139649876842528 [Note] /usr/sbin/mysqld (mysqld 10.2.0-MariaDB) starting as process 27445 ...
2016-11-25 11:51:37 139649876842528 [Note] WSREP: Setting wsrep_ready to 0
2016-11-25 11:51:37 139649876842528 [Note] WSREP: Read nil XID from storage engines, skipping position init
2016-11-25 11:51:37 139649876842528 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
2016-11-25 11:51:37 139649876842528 [Note] WSREP: wsrep_load(): Galera 25.3.15(r3578) by Codership Oy <> loaded successfully.
2016-11-25 11:51:37 139649876842528 [Note] WSREP: CRC-32C: using hardware acceleration.
2016-11-25 11:51:37 139649876842528 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
2016-11-25 11:51:37 139649876842528 [Note] WSREP: Passing config to GCS: base_dir = /opt/mysql/; base_host = 10.10.201.3; base_port = 4567; cert.log_conflicts = ON; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /opt/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /opt/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = fa
2016-11-25 11:51:37 139647505303296 [Note] WSREP: Service thread queue flushed.
2016-11-25 11:51:37 139649876842528 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2016-11-25 11:51:37 139649876842528 [Note] WSREP: wsrep_sst_grab()
2016-11-25 11:51:37 139649876842528 [Note] WSREP: Start replication
2016-11-25 11:51:37 139649876842528 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2016-11-25 11:51:37 139649876842528 [Note] WSREP: protonet asio version 0
2016-11-25 11:51:37 139649876842528 [Note] WSREP: Using CRC-32C for message checksums.
2016-11-25 11:51:37 139649876842528 [Note] WSREP: backend: asio
2016-11-25 11:51:37 139649876842528 [Note] WSREP: restore pc from disk successfully
2016-11-25 11:51:37 139649876842528 [Note] WSREP: GMCast version 0
2016-11-25 11:51:37 139649876842528 [Note] WSREP: (00000000, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2016-11-25 11:51:37 139649876842528 [Note] WSREP: (00000000, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2016-11-25 11:51:37 139649876842528 [ERROR] WSREP: failed to open gcomm backend connection: 131: invalid UUID: 00000000 (FATAL) at gcomm/src/pc.cpp:PC():271
2016-11-25 11:51:37 139649876842528 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():208: Failed to open backend connection: -131 (State not recoverable)
2016-11-25 11:51:37 139649876842528 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1379: Failed to open channel 'galera_cluster_www' at 'gcomm://10.10.201.4,10.10.201.3': -131 (State not recoverable)
2016-11-25 11:51:37 139649876842528 [ERROR] WSREP: gcs connect failed: State not recoverable
2016-11-25 11:51:37 139649876842528 [ERROR] WSREP: wsrep::connect(gcomm://10.10.201.4,10.10.201.3) failed: 7
2016-11-25 11:51:37 139649876842528 [ERROR] Aborting
161125 11:51:38 mysqld_safe mysqld from pid file /opt/mysql/www-node1.pid ended
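
The "invalid UUID: 00000000" error in the log above appears to line up with the all-zero uuid in the saved state files on node1. For anyone who wants to check such files quickly, here is a minimal Python sketch. It assumes the files use the plain "key: value" layout shown in the cat output earlier in this post; the function names parse_saved_state and has_nil_uuid are illustrative only and are not part of Galera or MariaDB. It only reports what the file contains and does not repair anything.

```python
# Sketch only: parses the "key: value" layout of grastate.dat as pasted
# in this post. Helper names are hypothetical, not a Galera API.

NIL_UUID = "00000000-0000-0000-0000-000000000000"


def parse_saved_state(text):
    """Return a dict of key/value pairs, skipping comments and blank lines."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only; the value may be empty (cert_index:).
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def has_nil_uuid(state):
    """True when the saved uuid is the all-zero placeholder."""
    return state.get("uuid") == NIL_UUID


# File contents copied from the cat output in this post.
node1_grastate = """# GALERA saved state
version: 2.1
uuid: 00000000-0000-0000-0000-000000000000
seqno: -1
cert_index:
"""

node2_grastate = """# GALERA saved state
version: 2.1
uuid: c07f825f-132f-11e6-b219-d7e841605104
seqno: -1
cert_index:
"""

if __name__ == "__main__":
    for name, text in (("node1", node1_grastate), ("node2", node2_grastate)):
        state = parse_saved_state(text)
        flag = "nil UUID" if has_nil_uuid(state) else "uuid set"
        print(name, state["uuid"], "seqno", state["seqno"], flag)
```

To run it against a real file instead of the pasted text, read the file contents into a string first and pass that to parse_saved_state.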

After running mysqld_safe --wsrep-recover on node1 I have:

161125 14:48:28 mysqld_safe Starting mysqld daemon with databases from /opt/mysql
161125 14:48:28 mysqld_safe WSREP: Running position recovery with --log_error='/opt/mysql/wsrep_recovery.ByCKgy' --pid-file='/opt/mysql/myface-web1-recover.pid'
2016-11-25 14:48:28 140678609938464 [Warning] option 'key_cache_block_size': unsigned value 65536 adjusted to 16384
2016-11-25 14:48:28 140678609938464 [Warning] option 'key_cache_block_size': unsigned value 65536 adjusted to 16384
2016-11-25 14:48:28 140678609938464 [Note] /usr/sbin/mysqld (mysqld 10.2.0-MariaDB) starting as process 5457 ...
161125 14:48:33 mysqld_safe WSREP: Recovered position c07f825f-132f-11e6-b219-d7e841605104:2354710
2016-11-25 14:48:34 140580072642592 [Warning] option 'key_cache_block_size': unsigned value 65536 adjusted to 16384
2016-11-25 14:48:34 140580072642592 [Warning] option 'key_cache_block_size': unsigned value 65536 adjusted to 16384
2016-11-25 14:48:34 140580072642592 [Note] /usr/sbin/mysqld (mysqld 10.2.0-MariaDB) starting as process 5524 ...
2016-11-25 14:48:34 140580072642592 [Note] WSREP: Setting wsrep_ready to 0
2016-11-25 14:48:34 140580072642592 [Note] WSREP: Read nil XID from storage engines, skipping position init
2016-11-25 14:48:34 140580072642592 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
2016-11-25 14:48:34 140580072642592 [Note] WSREP: wsrep_load(): Galera 25.3.15(r3578) by Codership Oy <> loaded successfully.
2016-11-25 14:48:34 140580072642592 [Note] WSREP: CRC-32C: using hardware acceleration.
2016-11-25 14:48:34 140580072642592 [Note] WSREP: Found saved state: c07f825f-132f-11e6-b219-d7e841605104:2354710
2016-11-25 14:48:34 140580072642592 [Note] WSREP: Passing config to GCS: base_dir = /opt/mysql/; base_host = 10.10.201.3; base_port = 4567; cert.log_conflicts = ON; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /opt/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /opt/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = fa
2016-11-25 14:48:34 140577701267200 [Note] WSREP: Service thread queue flushed.
2016-11-25 14:48:34 140580072642592 [Note] WSREP: Assign initial position for certification: 2354710, protocol version: -1
2016-11-25 14:48:34 140580072642592 [Note] WSREP: wsrep_sst_grab()
2016-11-25 14:48:34 140580072642592 [Note] WSREP: Start replication
2016-11-25 14:48:34 140580072642592 [Note] WSREP: Setting initial position to c07f825f-132f-11e6-b219-d7e841605104:2354710
2016-11-25 14:48:34 140580072642592 [Note] WSREP: protonet asio version 0
2016-11-25 14:48:34 140580072642592 [Note] WSREP: Using CRC-32C for message checksums.
2016-11-25 14:48:34 140580072642592 [Note] WSREP: backend: asio
2016-11-25 14:48:34 140580072642592 [Warning] WSREP: access file(/opt/mysql//gvwstate.dat) failed(No such file or directory)
2016-11-25 14:48:34 140580072642592 [Note] WSREP: restore pc from disk failed
2016-11-25 14:48:34 140580072642592 [Note] WSREP: GMCast version 0
2016-11-25 14:48:34 140580072642592 [Note] WSREP: (90fd69ba, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2016-11-25 14:48:34 140580072642592 [Note] WSREP: (90fd69ba, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2016-11-25 14:48:34 140580072642592 [Note] WSREP: EVS version 0
2016-11-25 14:48:34 140580072642592 [Note] WSREP: gcomm: connecting to group 'galera_cluster_myface', peer '10.10.201.4:,10.10.201.3:'
2016-11-25 14:48:34 140580072642592 [Warning] WSREP: (90fd69ba, 'tcp://0.0.0.0:4567') address 'tcp://10.10.201.3:4567' points to own listening address, blacklisting
2016-11-25 14:48:34 140580072642592 [Note] WSREP: (90fd69ba, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2016-11-25 14:48:34 140580072642592 [Note] WSREP: declaring 401f6755 at tcp://10.10.201.4:4567 stable
2016-11-25 14:48:34 140580072642592 [Note] WSREP: Node 401f6755 state prim
2016-11-25 14:48:34 140580072642592 [Note] WSREP: view(view_id(PRIM,401f6755,72) memb { 401f6755,0 90fd69ba,0 } joined { } left { } partitioned { })
2016-11-25 14:48:34 140580072642592 [Note] WSREP: save pc into disk
2016-11-25 14:48:35 140580072642592 [Note] WSREP: gcomm: connected
2016-11-25 14:48:35 140580072642592 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2016-11-25 14:48:35 140580072642592 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2016-11-25 14:48:35 140580072642592 [Note] WSREP: Opened channel 'galera_cluster_myface'
2016-11-25 14:48:35 140577632937728 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
2016-11-25 14:48:35 140577632937728 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2016-11-25 14:48:35 140577632937728 [Note] WSREP: STATE EXCHANGE: sent state msg: 91467fa4-b2e3-11e6-8fa3-7eafa2c96c2a
2016-11-25 14:48:35 140577632937728 [Note] WSREP: STATE EXCHANGE: got state msg: 91467fa4-b2e3-11e6-8fa3-7eafa2c96c2a from 0 (myface-web2)
2016-11-25 14:48:35 140577632937728 [Note] WSREP: STATE EXCHANGE: got state msg: 91467fa4-b2e3-11e6-8fa3-7eafa2c96c2a from 1 (myface-web1)
2016-11-25 14:48:35 140577632937728 [Note] WSREP: Quorum results: version = 3, component = PRIMARY, conf_id = 69, members = 1/2 (joined/total), act_id = 2365567, last_appl. = -1, protocols = 0/7/3 (gcs/repl/appl), group UUID = c07f825f-132f-11e6-b219-d7e841605104
2016-11-25 14:48:35 140580072642592 [Note] WSREP: Waiting for SST to complete.
2016-11-25 14:48:35 140577632937728 [Note] WSREP: Flow-control interval: [23, 23]
2016-11-25 14:48:35 140577632937728 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 2365567)
2016-11-25 14:48:35 140577605678848 [Note] WSREP: State transfer required:
    Group state: c07f825f-132f-11e6-b219-d7e841605104:2365567
    Local state: c07f825f-132f-11e6-b219-d7e841605104:2354710
2016-11-25 14:48:35 140577605678848 [Note] WSREP: New cluster view: global state: c07f825f-132f-11e6-b219-d7e841605104:2365567, view# 70: Primary, number of nodes: 2, my index: 1, protocol version 3
2016-11-25 14:48:35 140577605678848 [Warning] WSREP: Gap in state sequence. Need state transfer.
2016-11-25 14:48:35 140577605678848 [Note] WSREP: Setting wsrep_ready to 0
2016-11-25 14:48:35 140577597290240 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.10.201.3' --datadir '/opt/mysql/' --parent '5524' --binlog '/opt/mysql/binlog/mariadb-binlog' '
2016-11-25 14:48:35 140577605678848 [Note] WSREP: Prepared SST request: rsync|10.10.201.3:4444/rsync_sst
2016-11-25 14:48:35 140577605678848 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-11-25 14:48:35 140577605678848 [Note] WSREP: REPL Protocols: 7 (3, 2)
2016-11-25 14:48:35 140577701267200 [Note] WSREP: Service thread queue flushed.
2016-11-25 14:48:35 140577605678848 [Note] WSREP: Assign initial position for certification: 2365567, protocol version: 3
2016-11-25 14:48:35 140577701267200 [Note] WSREP: Service thread queue flushed.
2016-11-25 14:48:35 140577605678848 [Note] WSREP: Prepared IST receiver, listening at: tcp://10.10.201.3:4568
2016-11-25 14:48:35 140577632937728 [Note] WSREP: Member 1.0 (myface-web1) requested state transfer from '*any*'. Selected 0.0 (myface-web2)(SYNCED) as donor.
2016-11-25 14:48:35 140577632937728 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 2365567)
2016-11-25 14:48:35 140577605678848 [Note] WSREP: Requesting state transfer: success, donor: 0
2016-11-25 14:48:37 140577643427584 [Note] WSREP: (90fd69ba, 'tcp://0.0.0.0:4567') turning message relay requesting off
2016-11-25 14:49:38 140577632937728 [Warning] WSREP: 0.0 (myface-web2): State transfer to 1.0 (myface-web1) failed: -110 (Connection timed out)
2016-11-25 14:49:38 140577632937728 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():733: Will never receive state. Need to abort.
2016-11-25 14:49:38 140577632937728 [Note] WSREP: gcomm: terminating thread
2016-11-25 14:49:38 140577632937728 [Note] WSREP: gcomm: joining thread
2016-11-25 14:49:38 140577632937728 [Note] WSREP: gcomm: closing backend
2016-11-25 14:49:39 140577632937728 [Note] WSREP: view(view_id(NON_PRIM,401f6755,72) memb { 90fd69ba,0 } joined { } left { } partitioned { 401f6755,0 })
2016-11-25 14:49:39 140577632937728 [Note] WSREP: view((empty))
2016-11-25 14:49:39 140577632937728 [Note] WSREP: gcomm: closed
2016-11-25 14:49:39 140577632937728 [Note] WSREP: /usr/sbin/mysqld: Terminated.
161125 14:49:39 mysqld_safe mysqld from pid file /opt/mysql/myface-web1.pid ended
WSREP_SST: [ERROR] Parent mysqld process (PID:5524) terminated unexpectedly. (20161125 14:49:40.204)
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 5567 (20161125 14:49:40.206)
WSREP_SST: [INFO] Joiner cleanup done. (20161125 14:49:40.713)

Please give me some advice. Thank you very much.