MySQL Master高可用---MHA的安装与配置说明

xiaoxiao2021-02-28  83

一、背景

今天讲下工作中使用较多的MHA

首先需要确认一点的是,MHA实现的是Master的高可用。

贴一张MHA作者在 MySQL Conference and Expo 2011 上分享的图片

 

二、复制结构与基础配置

1、MySQL复制关系

Manager Node 192.168.237.11(管理节点)

Data Node_1 192.168.237.12(数据节点,MySQL主库)

Data Node_2 192.168.237.13(数据节点,MySQL备库,候选主库)

Data Node_3 192.168.237.14(数据节点,MySQL备库)

2、节点基础配置

(1)配置数据节点的iptables

配置数据节点的iptables,确保管理节点能访问到数据节点的MySQL端口

(2)打通SSH互信

正确配置管理节点和数据节点的域名解析,很重要!

管理节点及数据节点间打通SSH互信:

# ssh-keygen -t rsa # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.11 # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.12 # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.13 # ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.237.14

(3)MySQL基础配置

所有数据节点配置相同的复制过滤规则(MHA在启动的时候会检查过滤规则,不同则启动失败)

开启候选主库的log-bin。管理节点会检测候选主库是否有配置log-bin,若没有该配置项,MHA将不会进行failover

 

三、MHA安装

1、安装包下载

RHEL7:

https://github.com/yoshinorim/mha4mysql-manager/releases

https://github.com/yoshinorim/mha4mysql-node/releases

RHEL6:

https://github.com/yoshinorim/mha4mysql-manager/wiki/Downloads

2、配置CPAN源及EPEL源(可选)

(1)配置CPAN源(阿里源为例)

cpan[1]> o conf urllist push http://mirrors.aliyun.com/CPAN/ Please use 'o conf commit' to make the config permanent! cpan[2]> o conf commit commit: wrote '/usr/share/perl5/CPAN/Config.pm' cpan[3]> o conf urllist     urllist                0 [http://mirrors.aliyun.com/CPAN/] Type 'o conf' to view all configuration items

(2)配置EPEL源

RHEL6:

# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm

RHEL7:

# yum -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

# yum clean all

# yum makecache

3、数据节点部署

(1)更新libmysqlclient.so

# rpm -qa |grep mysql-libs mysql-libs-5.1.73-8.el6_8.x86_64 # rpm -e --nodeps mysql-libs-5.1.73-8.el6_8.x86_64

找一台MySQL拷贝libmysqlclient.so运行库文件 # cp /usr/local/mysql3306/lib/libmysqlclient.so.20 /usr/lib64/ # cp /usr/local/mysql3306/lib/libmysqlclient.so /usr/lib64/

(2)更新DBD-MySQL

# rpm -qa |grep perl-DBD-MySQL perl-DBD-MySQL-4.013-3.el6.x86_64 # rpm -e --nodeps perl-DBD-MySQL-4.013-3.el6.x86_64

# yum install perl-CPAN # ln -s /usr/local/mysql3306/bin/mysql_config /usr/local/bin/ cpan[1]> install DBD-MySQL

cpan[2]>  install DBI

cpan[2]>  install ExtUtils MakeMaker

(3)安装mha4mysql-node

# tar zxvf mha4mysql-node-0.56.tar.gz # cd mha4mysql-node-0.56 # perl Makefile.PL # make && make install

Tips:常见安装报错(一般是由于yum方式安装perl模块较旧导致)

Tip1、编译时出现Can't locate ExtUtils/MakeMaker.pm in @INC

https://github.com/Perl-Toolchain-Gang/ExtUtils-MakeMaker/releases # tar zxvf ExtUtils-MakeMaker-7.18.tar.gz # cd ExtUtils-MakeMaker-7.18 # perl Makefile.PL

# make && make install

Tip2、编译时出现Can't locate CPAN.pm in @INC

https://cpan.metacpan.org/authors/id/A/AN/ANDK/CPAN-2.26.tar.gz # tar zxvf CPAN-2.26.tar.gz # cd CPAN-2.26 # perl Makefile.PL

# make && make install

 

(2)设置relay log清除方式 mysql > set global relay_log_purge=0; my.cnf中设置 relay_log_purge=0 关闭该参数的原因是因为在默认情况下,从库上的relay log在SQL线程执行完后会自动被删掉。但是在failover过程中从库需要利用候选主库上的中继日志来补数据,所以MySQL默认打开的自动清除中继日志的功能需要关闭。

但是为了不撑爆磁盘就需要定期清除旧的relay log。清除relay log需要考虑复制延时的问题。在ext3文件系统下,删除大文件建议采用硬链接的方法。 关于relay log的清除,可参考以下文章:

http://daisywei.blog.51cto.com/7837970/1881154

 

4、管理节点部署

(1)同数据节点一样,也需要安装 mha4mysql-node

(2)安装依赖包 # yum install gcc

cpan[1]> install Config::Tiny cpan[2]> install Log::Dispatch cpan[3]> install Parallel::ForkManager cpan[4]> install Time::HiRes

(3)安装mha4mysql-manager # tar zxvf mha4mysql-manager-0.56.tar.gz # cd mha4mysql-manager-0.56 # perl Makefile.PL # make && make install(4)修改配置文件

建立mha配置文件及脚本目录

# mkdir -p /etc/masterha/

复制源码包内的配置文件、脚本(app1.cnf、masterha_default.cnf 可以合并为一个文件)

# cp ./mha4mysql-manager-0.56/samples/conf/masterha_default.cnf /etc/

# cp ./mha4mysql-manager-0.56/samples/conf/app1.cnf /etc/materha/

# cp ./mha4mysql-manager-0.56/samples/scripts/* /etc/masterha/ # vim /etc/masterha_default.cnf

[server default] #数据库账户(需要select、insert、super、reload权限,以及对mysql.user的更改权限) user=mha password=123456 #SSH账户 ssh_user=root #复制账户 repl_user=replication_user repl_password=123456 ping_interval=1 # master_ip_failover_script= /etc/masterha/master_ip_failover # master_ip_online_change_script= /etc/masterha/master_ip_online_change # shutdown_script= /etc/masterha/power_manager # report_script= /etc/masterha/send_report

# vim /etc/masterha/app1.cnf

[server default] manager_workdir=/var/log/masterha/app1 manager_log=/var/log/masterha/app1/manager.log remote_workdir=/var/log/ [server1] hostname=237_12 master_binlog_dir=/data/mysql3306 candidate_master=1 check_repl_delay=0 #防止master故障时,切换时slave有延迟,卡在那里切不过去 [server2] hostname=237_13 master_binlog_dir=/data/mysql3306 candidate_master=1 check_repl_delay=0 [server3] hostname=237_14 master_binlog_dir=/data/mysql3306 no_master=1

(5)环境检测 step1

# masterha_check_ssh --conf=/etc/masterha/app1.cnf

提示:All SSH connection tests passed successfully 表示节点间SSH连同

step2

# masterha_check_repl --conf=/etc/masterha/app1.cnf

提示:MySQL Replication Health is OK.

(6)启动MHA Manager

# nohup masterha_manager --conf=/etc/masterha/app1.cnf &

观察日志情况

# tail -f /tmp/mha_manager.log Thu Aug 10 21:49:09 2017 - [info] Reading default configuration from /etc/masterha_default.cnf.. Thu Aug 10 21:49:09 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Thu Aug 10 21:49:09 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..

查看MHA状态:

# masterha_check_status --conf=/etc/masterha/app1.cnf app1 (pid:1716) is running(0:PING_OK), master:237_12

 

四、故障演练

1、failover故障切换

例:模拟主库(192.168.237.12)MySQL实例挂掉

观察备库237.14的主从复制关系:主库已经切换为237.13

观察此时新的主库的binlog位置点:

查看管理节点的日志记录(位置由配置文件中参数:manager_log决定)

# cat /var/log/masterha/app1/manager.log

From: 237_12(192.168.237.12:3306) (current master) +--237_13(192.168.237.13:3306) +--237_14(192.168.237.14:3306) To: 237_13(192.168.237.13:3306) (new master) +--237_14(192.168.237.14:3306)

[root@237_11 script]# masterha_check_status --conf=/etc/masterha/app1.cnf  app1 is stopped(2:NOT_RUNNING).

需要注意的是,无论宕机导致的master切换、还是没有特别设置地手动切换master(不含orig_master_is_new_slave选项),原来的master都不在MHA架构内了,即使重新启动也不会加入,必须手动加入。配置文件中仍然是以挂掉的master为主库,并没有提升candidated master为主库,此时MHA将不再具备高可用的特性,需尽快更新配置文件,并在管理节点运行masterha_check_ssh确认无误后重新启用MHA manager。

 

此时,若假设原主库(237.12)上的MySQL已成功修复故障,正常启动后,可以以备库的身份加入MHA架构内:

(1)复制采用GTID方式

直接change master ... MASTER_AUTO_POSITION=1;

(2)binlog+position方式

首先,在MHA管理节点的日志中找到,复制需要的主库binlog位置点:

建立到新主库237.13的复制关系

Tips:该步骤很有可能会报1236的错误,并且Slave_IO_Running状态为 No

解决办法如下(从库切换到新的binlog,并重新指向之前卡在的主库binlog位置点):

mysql> stop slave; Query OK, 0 rows affected (0.01 sec) mysql> mysql> flush logs; Query OK, 0 rows affected (0.01 sec) mysql> mysql> show master status \G *************************** 1. row *************************** File: mysql-bin.000035 Position: 107 Binlog_Do_DB: Binlog_Ignore_DB: 1 row in set (0.00 sec) mysql> change master to master_log_file='mysql-bin.000018',master_log_pos=107; Query OK, 0 rows affected (0.01 sec) mysql> start slave; Query OK, 0 rows affected (0.00 sec) mysql> show slave status \G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.237.13 Master_User: replication_user Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000018 Read_Master_Log_Pos: 107 Relay_Log_File: 237_12-relay-bin.000002 Relay_Log_Pos: 253 Relay_Master_Log_File: mysql-bin.000018 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 107 Relay_Log_Space: 410 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 23713 1 row in set (0.00 sec)

当建立新主从关系后,可以开始修改MHA配置文件(app1.cnf)中的server1、server2的hostname,重新启用新的高可用。启动MHA前需做如下检查:

step1、删除前一次failover标记文件:

# rm -rf app1.failover.complete

step2、检查配置文件正确性:

# masterha_check_ssh --conf=/etc/masterha/app1.cnf

step3、再次启动MHA: # nohup masterha_manager --conf=/etc/masterha/app1.cnf &

确认mha状态: # masterha_check_status --conf=/etc/masterha/app1.cnf app1 (pid:1977) is running(0:PING_OK), master:node2

 

2、在线手动切换

(online master switch理论写操作的阻塞时间在0.5秒-2秒)

功能:用于调整无法在线变更的参数(例如innodb_buffer_pool_size等)。通过在从库配置文件中调整参数并重启生效后,执行online master switch将从库提升为主库,原主库同样修改参数后重启,并以从库的身份重新加入MHA架构,从而曲线实现了在线变更参数的需求而无需中断业务。

切换前需要先停止MHA manger,否则会有如下报错。

 

以下为成功切换过程:

[root@237_11 app1]# masterha_stop --conf=/etc/masterha/app1.cnf Stopped app1 successfully. [1]+ Exit 1 nohup masterha_manager --conf=/etc/masterha/app1.cnf < /dev/null > /tmp/mha_manager.log 2>&1 [root@237_11 app1]# [root@237_11 app1]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=237_12 --orig_master_is_new_slave --running_updates_limit=10000 --interactive=0 Sun Aug 13 21:46:10 2017 - [info] MHA::MasterRotate version 0.56. Sun Aug 13 21:46:10 2017 - [info] Starting online master switch.. Sun Aug 13 21:46:10 2017 - [info] Sun Aug 13 21:46:10 2017 - [info] * Phase 1: Configuration Check Phase.. Sun Aug 13 21:46:10 2017 - [info] Sun Aug 13 21:46:10 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Sun Aug 13 21:46:10 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Sun Aug 13 21:46:10 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Sun Aug 13 21:46:10 2017 - [info] GTID failover mode = 0 Sun Aug 13 21:46:10 2017 - [info] Current Alive Master: 237_13(192.168.237.13:3306) Sun Aug 13 21:46:10 2017 - [info] Alive Slaves: Sun Aug 13 21:46:10 2017 - [info] 237_12(192.168.237.12:3306) Version=5.5.54-log (oldest major version between slaves) log-bin:enabled Sun Aug 13 21:46:10 2017 - [info] Replicating from 192.168.237.13(192.168.237.13:3306) Sun Aug 13 21:46:10 2017 - [info] Primary candidate for the new Master (candidate_master is set) Sun Aug 13 21:46:10 2017 - [info] 237_14(192.168.237.14:3306) Version=5.5.54-log (oldest major version between slaves) log-bin:enabled Sun Aug 13 21:46:10 2017 - [info] Replicating from 192.168.237.13(192.168.237.13:3306) Sun Aug 13 21:46:10 2017 - [info] Not candidate for the new Master (no_master is set) Sun Aug 13 21:46:10 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time.. Sun Aug 13 21:46:10 2017 - [info] ok. Sun Aug 13 21:46:10 2017 - [info] Checking MHA is not monitoring or doing failover.. Sun Aug 13 21:46:10 2017 - [info] Checking replication health on 237_12.. Sun Aug 13 21:46:10 2017 - [info] ok. Sun Aug 13 21:46:10 2017 - [info] Checking replication health on 237_14.. Sun Aug 13 21:46:10 2017 - [info] ok. Sun Aug 13 21:46:10 2017 - [info] 237_12 can be new master. Sun Aug 13 21:46:10 2017 - [info] From: 237_13(192.168.237.13:3306) (current master) +--237_12(192.168.237.12:3306) +--237_14(192.168.237.14:3306) To: 237_12(192.168.237.12:3306) (new master) +--237_14(192.168.237.14:3306) +--237_13(192.168.237.13:3306) Sun Aug 13 21:46:10 2017 - [info] Checking whether 237_12(192.168.237.12:3306) is ok for the new master.. Sun Aug 13 21:46:10 2017 - [info] ok. Sun Aug 13 21:46:10 2017 - [info] 237_13(192.168.237.13:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host. Sun Aug 13 21:46:10 2017 - [info] 237_13(192.168.237.13:3306): Resetting slave pointing to the dummy host. Sun Aug 13 21:46:10 2017 - [info] ** Phase 1: Configuration Check Phase completed. Sun Aug 13 21:46:10 2017 - [info] Sun Aug 13 21:46:10 2017 - [info] * Phase 2: Rejecting updates Phase.. Sun Aug 13 21:46:10 2017 - [info] Sun Aug 13 21:46:10 2017 - [warning] master_ip_online_change_script is not defined. Skipping disabling writes on the current master. Sun Aug 13 21:46:10 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root): Sun Aug 13 21:46:10 2017 - [info] Executing FLUSH TABLES WITH READ LOCK.. Sun Aug 13 21:46:10 2017 - [info] ok. Sun Aug 13 21:46:10 2017 - [info] Orig master binlog:pos is mysql-bin.000018:107. Sun Aug 13 21:46:10 2017 - [info] Waiting to execute all relay logs on 237_12(192.168.237.12:3306).. Sun Aug 13 21:46:10 2017 - [info] master_pos_wait(mysql-bin.000018:107) completed on 237_12(192.168.237.12:3306). Executed 0 events. Sun Aug 13 21:46:10 2017 - [info] done. Sun Aug 13 21:46:10 2017 - [info] Getting new master's binlog name and position.. Sun Aug 13 21:46:10 2017 - [info] mysql-bin.000035:107 Sun Aug 13 21:46:10 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='237_12 or 192.168.237.12', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000035', MASTER_LOG_POS=107, MASTER_USER='replication_user', MASTER_PASSWORD='xxx'; Sun Aug 13 21:46:10 2017 - [info] Sun Aug 13 21:46:10 2017 - [info] * Switching slaves in parallel.. Sun Aug 13 21:46:10 2017 - [info] Sun Aug 13 21:46:10 2017 - [info] -- Slave switch on host 237_14(192.168.237.14:3306) started, pid: 8397 Sun Aug 13 21:46:10 2017 - [info] Sun Aug 13 21:46:11 2017 - [info] Log messages from 237_14 ... Sun Aug 13 21:46:11 2017 - [info] Sun Aug 13 21:46:10 2017 - [info] Waiting to execute all relay logs on 237_14(192.168.237.14:3306).. Sun Aug 13 21:46:10 2017 - [info] master_pos_wait(mysql-bin.000018:107) completed on 237_14(192.168.237.14:3306). Executed 0 events. Sun Aug 13 21:46:10 2017 - [info] done. Sun Aug 13 21:46:10 2017 - [info] Resetting slave 237_14(192.168.237.14:3306) and starting replication from the new master 237_12(192.168.237.12:3306).. Sun Aug 13 21:46:11 2017 - [info] Executed CHANGE MASTER. Sun Aug 13 21:46:11 2017 - [info] Slave started. Sun Aug 13 21:46:11 2017 - [info] End of log messages from 237_14 ... Sun Aug 13 21:46:11 2017 - [info] Sun Aug 13 21:46:11 2017 - [info] -- Slave switch on host 237_14(192.168.237.14:3306) succeeded. Sun Aug 13 21:46:11 2017 - [info] Unlocking all tables on the orig master: Sun Aug 13 21:46:11 2017 - [info] Executing UNLOCK TABLES.. Sun Aug 13 21:46:11 2017 - [info] ok. Sun Aug 13 21:46:11 2017 - [info] Starting orig master as a new slave.. Sun Aug 13 21:46:11 2017 - [info] Resetting slave 237_13(192.168.237.13:3306) and starting replication from the new master 237_12(192.168.237.12:3306).. Sun Aug 13 21:46:11 2017 - [info] Executed CHANGE MASTER. Sun Aug 13 21:46:11 2017 - [info] Slave started. Sun Aug 13 21:46:11 2017 - [info] All new slave servers switched successfully. Sun Aug 13 21:46:11 2017 - [info] Sun Aug 13 21:46:11 2017 - [info] * Phase 5: New master cleanup phase.. Sun Aug 13 21:46:11 2017 - [info] Sun Aug 13 21:46:11 2017 - [info] 237_12: Resetting slave info succeeded. Sun Aug 13 21:46:11 2017 - [info] Switching master to 237_12(192.168.237.12:3306) completed successfully.

 

五、数据库无感访问

1、VIP方式

通过修改脚本:master_ip_failover实现,VIP:192.168.237.50

前提要求:

(1)相应设备不做mac地址绑定;

(2)首次bond vip,需要手工在主库上执行(后续漂移利用master_ip_failover实现)

init_vip.sh脚本如下

#!/bin/bash key="1" vip="192.168.47.50/24" /sbin/ifconfig eth0:$key $vip

(3)candidate_master获取到VIP后, 需要如下执行:

# arping -I eth0 -c 3 -s $VIP $GATEWAY >/dev/null

作用是刷新ARP映射表,避免了VIP实际已漂移到从库的情况下,由于缓存的问题导致应用仍然无法访问的情况(已合并到下述脚本中执行)

 

缺陷:适用于单机单实例

 

master_ip_failover脚本如下:

#!/usr/bin/env perl # Copyright (C) 2011 DeNA Co.,Ltd. # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., # 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA ## Note: This is a sample script and is not complete. Modify the script based on your environment. use strict; use warnings FATAL => 'all'; use Getopt::Long; use MHA::DBHelper; #自定义该组机器的vip my $vip = "192.168.237.50"; my $gateway = "192.168.237.2"; my $key = "1"; my ( $command, $ssh_user, $orig_master_host, $orig_master_ip, $orig_master_port, $new_master_host, $new_master_ip, $new_master_port, $new_master_user, $new_master_password ); GetOptions( 'command=s' => \$command, 'ssh_user=s' => \$ssh_user, 'orig_master_host=s' => \$orig_master_host, 'orig_master_ip=s' => \$orig_master_ip, 'orig_master_port=i' => \$orig_master_port, 'new_master_host=s' => \$new_master_host, 'new_master_ip=s' => \$new_master_ip, 'new_master_port=i' => \$new_master_port, 'new_master_user=s' => \$new_master_user, 'new_master_password=s' => \$new_master_password, ); my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip"; my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down"; exit &main(); sub main { if ( $command eq "stop" || $command eq "stopssh" ) { # $orig_master_host, $orig_master_ip, $orig_master_port are passed. # If you manage master ip address at global catalog database, # invalidate orig_master_ip here. # 此处也可以添加dns、zookeeper、consul等剔除dead master信息 my $exit_code = 1; eval { # updating global catalog, etc &stop_vip(); $exit_code = 0; }; if ($@) { warn "Got Error: $@\n"; exit $exit_code; } exit $exit_code; } elsif ( $command eq "start" ) { # all arguments are passed. # If you manage master ip address at global catalog database, # activate new_master_ip here. # You can also grant write access (create user, set read_only=0, etc) here. my $exit_code = 10; eval { my $new_master_handler = new MHA::DBHelper(); # args: hostname, port, user, password, raise_error_or_not $new_master_handler->connect( $new_master_ip, $new_master_port, $new_master_user, $new_master_password, 1 ); ## Set read_only=0 on the new master $new_master_handler->disable_log_bin_local(); print "Set read_only=0 on the new master.\n"; $new_master_handler->disable_read_only(); ## Creating an app user on the new master print "Creating app user on the new master..\n"; #FIXME_xxx_create_user( $new_master_handler->{dbh} ); $new_master_handler->enable_log_bin_local(); $new_master_handler->disconnect(); ## Update master ip on the catalog database, etc &start_vip(); $exit_code = 0; }; if ($@) { warn $@; # If you want to continue failover, exit 10. exit $exit_code; } exit $exit_code; } elsif ( $command eq "status" ) { # do nothing exit 0; } else { &usage(); exit 1; } } #ssh -o ConnectTimeout=15 -o ConnectionAttempts=3,设置SSH等待超时时间15S,重试3次。防止SSH连不上,MHA超长等待。 #arping -I eth0 -c 3 -s $VIP $GATEWAY > /dev/null,刷新arp映射表 sub start_vip() { `ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $ssh_user\@$new_master_host \" $ssh_start_vip \"`; `ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $ssh_user\@$new_master_host \" arping -I eth0 -c 3 -s $vip $gateway > /dev/null \"`; } sub stop_vip() { `ssh -o ConnectTimeout=15 -o ConnectionAttempts=3 $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`; } sub usage { print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n"; }

 

2、Consul

利用服务发现替代,适用于单机多实例

 

3、ProxySQL

利用ProxySQL进行路由规则的切换,详见我的另外一篇博文的第九节:

ProxySQL初体验

 

六、MHA存活监控

manager本身是一个单点,存在风险。

1、利用masterha_master_monitor,不再赘述

send_report   #因故障切换后发送报警的脚本,不是必须,可自行编写简单的shell完成

2、masterha_check_status --conf=/etc/masterha/app1.cnf

 

参考文档

https://severalnines.com/blog/mysql-replication-failover-maxscale-vs-mha-part-1

MHA参数详解:http://wubx.net/mha-parameters/

MHA在线切换过程:https://www.bbsmax.com/A/amd0ml0Xzg/

MHA 清理relay log(purge_relay_logs):https://blog.csdn.net/leshami/article/details/45688503

安装报错排查:https://blog.51cto.com/arthur376/1812640

虚拟IP(VIP)在高可用上的原理:https://gitos.org/2019/08/19/vip-ha.html

 

转载请注明原文地址: https://www.6miu.com/read-73369.html

最新回复(0)