今天讲下工作中使用较多的MHA
首先需要确认一点的是,MHA实现的是Master的高可用。
贴一张MHA作者在 MySQL Conference and Expo 2011 上分享的图片
Manager Node 192.168.237.11(管理节点)
Data Node_1 192.168.237.12(数据节点,MySQL主库)
Data Node_2 192.168.237.13(数据节点,MySQL备库,候选主库)
Data Node_3 192.168.237.14(数据节点,MySQL备库)
(1)配置数据节点的iptables
配置数据节点的iptables,确保管理节点能访问到数据节点的MySQL端口
(2)打通SSH互信
正确配置管理节点和数据节点的域名解析,很重要!
管理节点及数据节点间打通SSH互信:
# ssh-keygen -t rsa # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.11 # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.12 # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.13 # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.14
(3)MySQL基础配置
所有数据节点配置相同的复制过滤规则(MHA在启动的时候会检查过滤规则,不同则启动失败)
开启候选主库的log-bin。管理节点会检测候选主库是否有配置log-bin,若没有该配置项,MHA将不会进行failover
RHEL7:
https://github.com/yoshinorim/mha4mysql-manager/releases
https://github.com/yoshinorim/mha4mysql-node/releases
RHEL6:
https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads
(1)配置CPAN源(阿里源为例)
cpan[1]> o conf urllist push http://mirrors.aliyun.com/CPAN/ Please use 'o conf commit' to make the config permanent! cpan[2]> o conf commit commit: wrote '/usr/share/perl5/CPAN/Config.pm' cpan[3]> o conf urllist urllist 0 [http://mirrors.aliyun.com/CPAN/] Type 'o conf' to view all configuration items
(2)配置EPEL源
RHEL6:
# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
RHEL7:
# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
# yum clean all
# yum makecache
(1)更新libmysqlclient.so
# rpm -qa |grep mysql-libs mysql-libs-5.1.73-8.el6_8.x86_64 # rpm -e --nodeps mysql-libs-5.1.73-8.el6_8.x86_64
找一台MySQL拷贝libmysqlclient.so运行库文件 # cp /usr/local/mysql3306/lib/libmysqlclient.so.20 /usr/lib64/ # cp /usr/local/mysql3306/lib/libmysqlclient.so /usr/lib64/
(2)更新DBD-MySQL
# rpm -qa |grep perl-DBD-MySQL perl-DBD-MySQL-4.013-3.el6.x86_64 # rpm -e --nodeps perl-DBD-MySQL-4.013-3.el6.x86_64
# yum install perl-CPAN # ln -s /usr/local/mysql3306/bin/mysql_config /usr/local/bin/ cpan[1]> install DBD-MySQL
cpan[2]> install DBI
cpan[2]> install ExtUtils MakeMaker
(3)安装mha4mysql-node
# tar zxvf mha4mysql-node-0.56.tar.gz # cd mha4mysql-node-0.56 # perl Makefile.PL # make && make install
Tips:常见安装报错(一般是由于yum方式安装perl模块较旧导致)
Tip1、编译时出现Can't locate ExtUtils/MakeMaker.pm in @INC
https://github.com/Perl-Toolchain-Gang/ExtUtils-MakeMaker/releases # tar zxvf ExtUtils-MakeMaker-7.18.tar.gz # cd ExtUtils-MakeMaker-7.18 # perl Makefile.PL
# make && make install
Tip2、编译时出现Can't locate CPAN.pm in @INC
https://cpan.metacpan.org/authors/id/A/AN/ANDK/CPAN-2.26.tar.gz # tar zxvf CPAN-2.26.tar.gz # cd CPAN-2.26 # perl Makefile.PL
# make && make install
(2)设置relay log清除方式 mysql > set global relay_log_purge=0; my.cnf中设置 relay_log_purge=0 关闭该参数的原因是因为在默认情况下,从库上的relay log在SQL线程执行完后会自动被删掉。但是在failover过程中从库需要利用候选主库上的中继日志来补数据,所以MySQL默认打开的自动清除中继日志的功能需要关闭。
但是为了不撑爆磁盘就需要定期清除旧的relay log。清除relay log需要考虑复制延时的问题。在ext3文件系统下,删除大文件建议采用硬链接的方法。 关于relay log的清除,可参考以下文章:
http://daisywei.blog.51cto.com/7837970/1881154
(1)同数据节点一样,也需要安装 mha4mysql-node
(2)安装依赖包 # yum install gcc
cpan[1]> install Config::Tiny cpan[2]> install Log::Dispatch cpan[3]> install Parallel::ForkManager cpan[4]> install Time::HiRes
(3)安装mha4mysql-manager # tar zxvf mha4mysql-manager-0.56.tar.gz # cd mha4mysql-manager-0.56 # perl Makefile.PL # make && make install(4)修改配置文件
建立mha配置文件及脚本目录
# mkdir -p /etc/masterha/
复制源码包内的配置文件、脚本(app1.cnf、masterha_default.cnf 可以合并为一个文件)
# cp ./mha4mysql-manager-0.56/samples/conf/masterha_default.cnf /etc/
# cp ./mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/materha/
# cp ./mha4mysql-manager-0.56/samples/scripts/* /etc/masterha/ # vim /etc/masterha_default.cnf
[server default] #数据库账户(需要select、insert、super、reload权限,以及对mysql.user的更改权限) user=mha password=123456 #SSH账户 ssh_user=root #复制账户 repl_user=replication_user repl_password=123456 ping_interval=1 # master_ip_failover_script= /etc/masterha/master_ip_failover # master_ip_online_change_script= /etc/masterha/master_ip_online_change # shutdown_script= /etc/masterha/power_manager # report_script= /etc/masterha/send_report# vim /etc/masterha/app1.cnf
[server default] manager_workdir=/var/log/masterha/app1 manager_log=/var/log/masterha/app1/manager.log remote_workdir=/var/log/ [server1] hostname=237_12 master_binlog_dir=/data/mysql3306 candidate_master=1 check_repl_delay=0 #防止master故障时,切换时slave有延迟,卡在那里切不过去 [server2] hostname=237_13 master_binlog_dir=/data/mysql3306 candidate_master=1 check_repl_delay=0 [server3] hostname=237_14 master_binlog_dir=/data/mysql3306 no_master=1(5)环境检测 step1
# masterha_check_ssh --conf=/etc/masterha/app1.cnf
提示:All SSH connection tests passed successfully 表示节点间SSH连同
step2
# masterha_check_repl --conf=/etc/masterha/app1.cnf
提示:MySQL Replication Health is OK.
(6)启动MHA Manager
# nohup masterha_manager --conf=/etc/masterha/app1.cnf &
观察日志情况
# tail -f /tmp/mha_manager.log Thu Aug 10 21:49:09 2017 - [info] Reading default configuration from /etc/masterha_default.cnf.. Thu Aug 10 21:49:09 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Thu Aug 10 21:49:09 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
查看MHA状态:
# masterha_check_status --conf=/etc/masterha/app1.cnf app1 (pid:1716) is running(0:PING_OK), master:237_12
例:模拟主库(192.168.237.12)MySQL实例挂掉
观察备库237.14的主从复制关系:主库已经切换为237.13
观察此时新的主库的binlog位置点:
查看管理节点的日志记录(位置由配置文件中参数:manager_log决定)
# cat /var/log/masterha/app1/manager.log
From: 237_12(192.168.237.12:3306) (current master) +--237_13(192.168.237.13:3306) +--237_14(192.168.237.14:3306) To: 237_13(192.168.237.13:3306) (new master) +--237_14(192.168.237.14:3306)[root@237_11 script]# masterha_check_status --conf=/etc/masterha/app1.cnf app1 is stopped(2:NOT_RUNNING).
需要注意的是,无论宕机导致的master切换、还是没有特别设置地手动切换master(不含orig_master_is_new_slave选项),原来的master都不在MHA架构内了,即使重新启动也不会加入,必须手动加入。配置文件中仍然是以挂掉的master为主库,并没有提升candidated master为主库,此时MHA将不再具备高可用的特性,需尽快更新配置文件,并在管理节点运行masterha_check_ssh确认无误后重新启用MHA manager。
此时,若假设原主库(237.12)上的MySQL已成功修复故障,正常启动后,可以以备库的身份加入MHA架构内:
(1)复制采用GTID方式
直接change master ... MASTER_AUTO_POSITION=1;
(2)binlog+position方式
首先,在MHA管理节点的日志中找到,复制需要的主库binlog位置点:
建立到新主库237.13的复制关系
Tips:该步骤很有可能会报1236的错误,并且Slave_IO_Running状态为 No
解决办法如下(从库切换到新的binlog,并重新指向之前卡在的主库binlog位置点):
mysql> stop slave; Query OK, 0 rows affected (0.01 sec) mysql> mysql> flush logs; Query OK, 0 rows affected (0.01 sec) mysql> mysql> show master status \G *************************** 1. row *************************** File: mysql-bin.000035 Position: 107 Binlog_Do_DB: Binlog_Ignore_DB: 1 row in set (0.00 sec) mysql> change master to master_log_file='mysql-bin.000018',master_log_pos=107; Query OK, 0 rows affected (0.01 sec) mysql> start slave; Query OK, 0 rows affected (0.00 sec) mysql> show slave status \G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.237.13 Master_User: replication_user Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000018 Read_Master_Log_Pos: 107 Relay_Log_File: 237_12-relay-bin.000002 Relay_Log_Pos: 253 Relay_Master_Log_File: mysql-bin.000018 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 107 Relay_Log_Space: 410 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 23713 1 row in set (0.00 sec)当建立新主从关系后,可以开始修改MHA配置文件(app1.cnf)中的server1、server2的hostname,重新启用新的高可用。启动MHA前需做如下检查:
step1、删除前一次failover标记文件:
# rm -rf app1.failover.complete
step2、检查配置文件正确性:
# masterha_check_ssh --conf=/etc/masterha/app1.cnf
step3、再次启动MHA: # nohup masterha_manager --conf=/etc/masterha/app1.cnf &
确认mha状态: # masterha_check_status --conf=/etc/masterha/app1.cnf app1 (pid:1977) is running(0:PING_OK), master:node2
(online master switch理论写操作的阻塞时间在0.5秒-2秒)
功能:用于调整无法在线变更的参数(例如innodb_buffer_pool_size等)。通过在从库配置文件中调整参数并重启生效后,执行online master switch将从库提升为主库,原主库同样修改参数后重启,并以从库的身份重新加入MHA架构,从而曲线实现了在线变更参数的需求而无需中断业务。
切换前需要先停止MHA manger,否则会有如下报错。
以下为成功切换过程:
[root@237_11 app1]# masterha_stop --conf=/etc/masterha/app1.cnf Stopped app1 successfully. [1]+ Exit 1 nohup masterha_manager --conf=/etc/masterha/app1.cnf < /dev/null > /tmp/mha_manager.log 2>&1 [root@237_11 app1]# [root@237_11 app1]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=237_12 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0 Sun Aug 13 21:46:10 2017 - [info] MHA::MasterRotate version 0.56. Sun Aug 13 21:46:10 2017 - [info] Starting online master switch.. Sun Aug 13 21:46:10 2017 - [info] Sun Aug 13 21:46:10 2017 - [info] * Phase 1: Configuration Check Phase.. Sun Aug 13 21:46:10 2017 - [info] Sun Aug 13 21:46:10 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Sun Aug 13 21:46:10 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Sun Aug 13 21:46:10 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Sun Aug 13 21:46:10 2017 - [info] GTID failover mode = 0 Sun Aug 13 21:46:10 2017 - [info] Current Alive Master: 237_13(192.168.237.13:3306) Sun Aug 13 21:46:10 2017 - [info] Alive Slaves: Sun Aug 13 21:46:10 2017 - [info] 237_12(192.168.237.12:3306) Version=5.5.54-log (oldest major version between slaves) log-bin:enabled Sun Aug 13 21:46:10 2017 - [info] Replicating from 192.168.237.13(192.168.237.13:3306) Sun Aug 13 21:46:10 2017 - [info] Primary candidate for the new Master (candidate_master is set) Sun Aug 13 21:46:10 2017 - [info] 237_14(192.168.237.14:3306) Version=5.5.54-log (oldest major version between slaves) log-bin:enabled Sun Aug 13 21:46:10 2017 - [info] Replicating from 192.168.237.13(192.168.237.13:3306) Sun Aug 13 21:46:10 2017 - [info] Not candidate for the new Master (no_master is set) Sun Aug 13 21:46:10 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time.. Sun Aug 13 21:46:10 2017 - [info] ok. Sun Aug 13 21:46:10 2017 - [info] Checking MHA is not monitoring or doing failover.. Sun Aug 13 21:46:10 2017 - [info] Checking replication health on 237_12.. Sun Aug 13 21:46:10 2017 - [info] ok. Sun Aug 13 21:46:10 2017 - [info] Checking replication health on 237_14.. Sun Aug 13 21:46:10 2017 - [info] ok. Sun Aug 13 21:46:10 2017 - [info] 237_12 can be new master. Sun Aug 13 21:46:10 2017 - [info] From: 237_13(192.168.237.13:3306) (current master) +--237_12(192.168.237.12:3306) +--237_14(192.168.237.14:3306) To: 237_12(192.168.237.12:3306) (new master) +--237_14(192.168.237.14:3306) +--237_13(192.168.237.13:3306) Sun Aug 13 21:46:10 2017 - [info] Checking whether 237_12(192.168.237.12:3306) is ok for the new master.. Sun Aug 13 21:46:10 2017 - [info] ok. Sun Aug 13 21:46:10 2017 - [info] 237_13(192.168.237.13:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host. Sun Aug 13 21:46:10 2017 - [info] 237_13(192.168.237.13:3306): Resetting slave pointing to the dummy host. Sun Aug 13 21:46:10 2017 - [info] ** Phase 1: Configuration Check Phase completed. Sun Aug 13 21:46:10 2017 - [info] Sun Aug 13 21:46:10 2017 - [info] * Phase 2: Rejecting updates Phase.. Sun Aug 13 21:46:10 2017 - [info] Sun Aug 13 21:46:10 2017 - [warning] master_ip_online_change_script is not defined. Skipping disabling writes on the current master. Sun Aug 13 21:46:10 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root): Sun Aug 13 21:46:10 2017 - [info] Executing FLUSH TABLES WITH READ LOCK.. Sun Aug 13 21:46:10 2017 - [info] ok. Sun Aug 13 21:46:10 2017 - [info] Orig master binlog:pos is mysql-bin.000018:107. Sun Aug 13 21:46:10 2017 - [info] Waiting to execute all relay logs on 237_12(192.168.237.12:3306).. Sun Aug 13 21:46:10 2017 - [info] master_pos_wait(mysql-bin.000018:107) completed on 237_12(192.168.237.12:3306). Executed 0 events. Sun Aug 13 21:46:10 2017 - [info] done. Sun Aug 13 21:46:10 2017 - [info] Getting new master's binlog name and position.. Sun Aug 13 21:46:10 2017 - [info] mysql-bin.000035:107 Sun Aug 13 21:46:10 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='237_12 or 192.168.237.12', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000035', MASTER_LOG_POS=107, MASTER_USER='replication_user', MASTER_PASSWORD='xxx'; Sun Aug 13 21:46:10 2017 - [info] Sun Aug 13 21:46:10 2017 - [info] * Switching slaves in parallel.. Sun Aug 13 21:46:10 2017 - [info] Sun Aug 13 21:46:10 2017 - [info] -- Slave switch on host 237_14(192.168.237.14:3306) started, pid: 8397 Sun Aug 13 21:46:10 2017 - [info] Sun Aug 13 21:46:11 2017 - [info] Log messages from 237_14 ... Sun Aug 13 21:46:11 2017 - [info] Sun Aug 13 21:46:10 2017 - [info] Waiting to execute all relay logs on 237_14(192.168.237.14:3306).. Sun Aug 13 21:46:10 2017 - [info] master_pos_wait(mysql-bin.000018:107) completed on 237_14(192.168.237.14:3306). Executed 0 events. Sun Aug 13 21:46:10 2017 - [info] done. Sun Aug 13 21:46:10 2017 - [info] Resetting slave 237_14(192.168.237.14:3306) and starting replication from the new master 237_12(192.168.237.12:3306).. Sun Aug 13 21:46:11 2017 - [info] Executed CHANGE MASTER. Sun Aug 13 21:46:11 2017 - [info] Slave started. Sun Aug 13 21:46:11 2017 - [info] End of log messages from 237_14 ... Sun Aug 13 21:46:11 2017 - [info] Sun Aug 13 21:46:11 2017 - [info] -- Slave switch on host 237_14(192.168.237.14:3306) succeeded. Sun Aug 13 21:46:11 2017 - [info] Unlocking all tables on the orig master: Sun Aug 13 21:46:11 2017 - [info] Executing UNLOCK TABLES.. Sun Aug 13 21:46:11 2017 - [info] ok. Sun Aug 13 21:46:11 2017 - [info] Starting orig master as a new slave.. Sun Aug 13 21:46:11 2017 - [info] Resetting slave 237_13(192.168.237.13:3306) and starting replication from the new master 237_12(192.168.237.12:3306).. Sun Aug 13 21:46:11 2017 - [info] Executed CHANGE MASTER. Sun Aug 13 21:46:11 2017 - [info] Slave started. Sun Aug 13 21:46:11 2017 - [info] All new slave servers switched successfully. Sun Aug 13 21:46:11 2017 - [info] Sun Aug 13 21:46:11 2017 - [info] * Phase 5: New master cleanup phase.. Sun Aug 13 21:46:11 2017 - [info] Sun Aug 13 21:46:11 2017 - [info] 237_12: Resetting slave info succeeded. Sun Aug 13 21:46:11 2017 - [info] Switching master to 237_12(192.168.237.12:3306) completed successfully.
通过修改脚本:master_ip_failover实现,VIP:192.168.237.50
前提要求:
(1)相应设备不做mac地址绑定;
(2)首次bond vip,需要手工在主库上执行(后续漂移利用master_ip_failover实现)
init_vip.sh脚本如下
#!/bin/bash key="1" vip="192.168.47.50/24" /sbin/ifconfig eth0:$key $vip(3)candidate_master获取到VIP后, 需要如下执行:
# arping -I eth0 -c 3 -s $VIP $GATEWAY >/dev/null
作用是刷新ARP映射表,避免了VIP实际已漂移到从库的情况下,由于缓存的问题导致应用仍然无法访问的情况(已合并到下述脚本中执行)
缺陷:适用于单机单实例
master_ip_failover脚本如下:
#!/usr/bin/env perl # Copyright (C) 2011 DeNA Co.,Ltd. # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA ## Note: This is a sample script and is not complete. Modify the script based on your environment. use strict; use warnings FATAL => 'all'; use Getopt::Long; use MHA::DBHelper; #自定义该组机器的vip my $vip = "192.168.237.50"; my $gateway = "192.168.237.2"; my $key = "1"; my ( $command, $ssh_user, $orig_master_host, $orig_master_ip, $orig_master_port, $new_master_host, $new_master_ip, $new_master_port, $new_master_user, $new_master_password ); GetOptions( 'command=s' => \$command, 'ssh_user=s' => \$ssh_user, 'orig_master_host=s' => \$orig_master_host, 'orig_master_ip=s' => \$orig_master_ip, 'orig_master_port=i' => \$orig_master_port, 'new_master_host=s' => \$new_master_host, 'new_master_ip=s' => \$new_master_ip, 'new_master_port=i' => \$new_master_port, 'new_master_user=s' => \$new_master_user, 'new_master_password=s' => \$new_master_password, ); my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip"; my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down"; exit &main(); sub main { if ( $command eq "stop" || $command eq "stopssh" ) { # $orig_master_host, $orig_master_ip, $orig_master_port are passed. # If you manage master ip address at global catalog database, # invalidate orig_master_ip here. # 此处也可以添加dns、zookeeper、consul等剔除dead master信息 my $exit_code = 1; eval { # updating global catalog, etc &stop_vip(); $exit_code = 0; }; if ($@) { warn "Got Error: $@\n"; exit $exit_code; } exit $exit_code; } elsif ( $command eq "start" ) { # all arguments are passed. # If you manage master ip address at global catalog database, # activate new_master_ip here. # You can also grant write access (create user, set read_only=0, etc) here. my $exit_code = 10; eval { my $new_master_handler = new MHA::DBHelper(); # args: hostname, port, user, password, raise_error_or_not $new_master_handler->connect( $new_master_ip, $new_master_port, $new_master_user, $new_master_password, 1 ); ## Set read_only=0 on the new master $new_master_handler->disable_log_bin_local(); print "Set read_only=0 on the new master.\n"; $new_master_handler->disable_read_only(); ## Creating an app user on the new master print "Creating app user on the new master..\n"; #FIXME_xxx_create_user( $new_master_handler->{dbh} ); $new_master_handler->enable_log_bin_local(); $new_master_handler->disconnect(); ## Update master ip on the catalog database, etc &start_vip(); $exit_code = 0; }; if ($@) { warn $@; # If you want to continue failover, exit 10. exit $exit_code; } exit $exit_code; } elsif ( $command eq "status" ) { # do nothing exit 0; } else { &usage(); exit 1; } } #ssh -o ConnectTimeout=15 -o ConnectionAttempts=3,设置SSH等待超时时间15S,重试3次。防止SSH连不上,MHA超长等待。 #arping -I eth0 -c 3 -s $VIP $GATEWAY > /dev/null,刷新arp映射表 sub start_vip() { `ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $ssh_user\@$new_master_host \" $ssh_start_vip \"`; `ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $ssh_user\@$new_master_host \" arping -I eth0 -c 3 -s $vip $gateway > /dev/null \"`; } sub stop_vip() { `ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`; } sub usage { print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n"; }
利用服务发现替代,适用于单机多实例
利用ProxySQL进行路由规则的切换,详见我的另外一篇博文的第九节:
ProxySQL初体验
manager本身是一个单点,存在风险。
1、利用masterha_master_monitor,不再赘述
send_report #因故障切换后发送报警的脚本,不是必须,可自行编写简单的shell完成
2、masterha_check_status --conf=/etc/masterha/app1.cnf
https://severalnines.com/blog/mysql-replication-failover-maxscale-vs-mha-part-1
MHA参数详解:http://wubx.net/mha-parameters/
MHA在线切换过程:https://www.bbsmax.com/A/amd0ml0Xzg/
MHA 清理relay log(purge_relay_logs):https://blog.csdn.net/leshami/article/details/45688503
安装报错排查:https://blog.51cto.com/arthur376/1812640
虚拟IP(VIP)在高可用上的原理:https://gitos.org/2019/08/19/vip-ha.html
