Contents
  1. 1. 环境介绍
  2. 2. 安装MHA
    1. 2.1. 下载MHA Node和MHA Manager
      1. 2.1.1. 下载MHA Manager
      2. 2.1.2. 下载MHA Node
    2. 2.2. 安装
  3. 3. SSH密钥登录方式设置
  4. 4. 配置MHA管理节点
  5. 5. 测试节点间的SSH登录
  6. 6. 检查候选主库的log-bin是否都启用
  7. 7. 检查复制过滤规则
  8. 8. 检查复制配置
  9. 9. 启动MHA Manager
  10. 10. 停止MHA Manager
  11. 11. 测试failover
  12. 12. 参考文献

MHA for MySQL是MySQL的一个高可用方案,本文介绍MHA的安装配置。MHA分管理节点和数据库节点,支持一主多从、多主多从等配置,本文为一主多从方式,原理都相同,详见官方文档。可实现主库挂掉之后自动将一个从库提升为主库,并自动保持新主库和从库的一致性。

环境介绍

操作系统都为CentOS 7 x86_64,数据库版本为MariaDB 10,EPEL已启用(有些依赖的软件包需要从EPEL软件仓库中下载)。 主从复制需要已经配置好。

服务器详情如下:
主库:192.168.71.151
从库1:192.168.71.152
从库2:192.168.71.153
管理节点:192.168.71.154

安装MHA

MHA由MHA Manager和MHA Node两个软件包组成。MHA Manager运行在管理节点上,MHSA Node运行的每个数据库服务器上。MHA Node程序在需要的时候(配置检查、failover)时由MHA Manager调用。MHA Manger和MHA Node程序都是Perl语言写的。

下载MHA Node和MHA Manager

v0.56程序可从地址 https://code.google.com/p/mysql-master-ha/wiki/Downloads?tm=2 页面里提供的下载地址下载,旧版程序可从地址 http://code.google.com/p/mysql-master-ha/downloads/list 下载。本文以v0.56程序为例。官网只有使用于CentOS 5和CentOS 6的rpm包,没有提供适用于CentOS 7的rpm包,经测试适用于CentOS 6的rpm包在CentOS 7也可以安装成功。

下载MHA Manager

1
curl http://www.mysql.gr.jp/frame/modules/bwiki/index.php\?plugin\=attach\&pcmd\=open\&file\=mha4mysql-manager-0.56-0.el6.noarch.rpm\&refer\=matsunobu -o mha4mysql-manager-0.56-0.el6.noarch.rpm

下载MHA Node

1
curl http://www.mysql.gr.jp/frame/modules/bwiki/index.php\?plugin\=attach\&pcmd\=open\&file\=mha4mysql-node-0.56-0.el6.noarch.rpm\&refer\=matsunobu -o mha4mysql-node-0.56-0.el6.noarch.rpm

安装

MHA依赖一系列的软件包,MHA Node依赖软件包perl-DBD-MySQL,MHA Manager依赖perl-DBD-MySQL,perl-Config-Tiny,perl-Log-Dispatch,perl-Parallel-ForkManager。MHA Manager依赖MHA Node的一些模块,所以管理节点上也需要安装MHA Node。为自动安装MHA的依赖包,使用yum localinstall方式安装。
在所有的服务器上安装MHA Node:

1
yum localinstall -y mha4mysql-node-0.56-0.el6.noarch.rpm

在管理节点(192.168.71.154)上安装MHA Manager:

1
yum localinstall -y mha4mysql-manager-0.56-0.el6.noarch.rpm

SSH密钥登录方式设置

MHA Manager内部使用SSH连接到各个MySQL服务器,最新从库节点上的MHA Node也需要使用SSH (scp)把relay log文件发给各个从库节点,故需要各台服务器见需要配置SSH密钥登录方式。

在服务器 192.168.71.151 上执行

1
2
3
4
ssh-copy-id -i /root/id_rsa.pub root@192.168.71.151
ssh-copy-id -i /root/id_rsa.pub root@192.168.71.152
ssh-copy-id -i /root/id_rsa.pub root@192.168.71.153
ssh-copy-id -i /root/id_rsa.pub root@192.168.71.154

在服务器 192.168.71.152 上执行

1
2
3
4
ssh-copy-id -i /root/id_rsa.pub root@192.168.71.151
ssh-copy-id -i /root/id_rsa.pub root@192.168.71.152
ssh-copy-id -i /root/id_rsa.pub root@192.168.71.153
ssh-copy-id -i /root/id_rsa.pub root@192.168.71.154

在服务器 192.168.71.153 上执行

1
2
3
4
ssh-copy-id -i /root/id_rsa.pub root@192.168.71.151
ssh-copy-id -i /root/id_rsa.pub root@192.168.71.152
ssh-copy-id -i /root/id_rsa.pub root@192.168.71.153
ssh-copy-id -i /root/id_rsa.pub root@192.168.71.154

在服务器 192.168.71.154 上执行

1
2
3
4
ssh-copy-id -i /root/id_rsa.pub root@192.168.71.151
ssh-copy-id -i /root/id_rsa.pub root@192.168.71.152
ssh-copy-id -i /root/id_rsa.pub root@192.168.71.153
ssh-copy-id -i /root/id_rsa.pub root@192.168.71.154

注意:MHA也是通过SSH方式登录本机操作的,故在本机上也需要执行ssh-copy-id,例如在 192.168.71.151 上也需要执行 ssh-copy-id -i /root/id_rsa.pub root@192.168.71.151

配置MHA管理节点

以下操作在管理节点 192.168.71.154。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
mkdir -p /usr/local/mha
mkdir -p /etc/mha
cat > /etc/mha/mha.conf << EOF
[server default]
# mysql user and password
user=joelhy
password=root123
ssh_user=root
# working directory on the manager
manager_workdir=/usr/local/mha
# working directory on MySQL servers
remote_workdir=/usr/local/mha
repl_user=slave_user
repl_password=bigs3cret
[server1]
hostname=192.168.71.151
[server2]
hostname=192.168.71.152
[server3]
hostname=192.168.71.153
EOF

repl_user帐号slave_user用于主从复制时,从库连接到主库复制数据,故有可能由从库上升为主库的都需要添加该帐号,
本文两个从库都作为failover时的主库候选,故除了当前主库需要 添加该帐号外,在这两个数据库节点上也需要创建slave_user帐号,即在所有的数据库节点的MariaDB shell上都需要执行如下语句

1
2
MariaDB [(none)]> GRANT REPLICATION SLAVE ON *.* TO 'slave_user'@'192.168.71.%' IDENTIFIED BY 'bigs3cret';
Query OK, 0 rows affected (0.00 sec)

其中joelhy是具有管理权限的帐号,当然可以不叫joelhy取做其他名称,例如foo,同理也需要在各个数据库上创建该帐号。在数据库节点192.168.71.151, 192.168.71.152, 192.168.71.153上执行

1
2
MariaDB [(none)]> GRANT ALL ON *.* TO joelhy@'192.168.71.%' IDENTIFIED BY 'root123';
Query OK, 0 rows affected (0.00 sec)

测试节点间的SSH登录

在管理节点 192.168.71.154 上执行

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
[root@localhost ~]# masterha_check_ssh --conf=/etc/mha/mha.conf
Fri Feb 6 19:51:30 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Feb 6 19:51:30 2015 - [info] Reading application default configuration from /etc/mha/mha.conf..
Fri Feb 6 19:51:30 2015 - [info] Reading server configuration from /etc/mha/mha.conf..
Fri Feb 6 19:51:30 2015 - [info] Starting SSH connection tests..
Fri Feb 6 19:51:31 2015 - [debug]
Fri Feb 6 19:51:30 2015 - [debug] Connecting via SSH from root@192.168.71.151(192.168.71.151:22) to root@192.168.71.152(192.168.71.152:22)..
Fri Feb 6 19:51:30 2015 - [debug] ok.
Fri Feb 6 19:51:30 2015 - [debug] Connecting via SSH from root@192.168.71.151(192.168.71.151:22) to root@192.168.71.153(192.168.71.153:22)..
Fri Feb 6 19:51:31 2015 - [debug] ok.
Fri Feb 6 19:51:31 2015 - [debug]
Fri Feb 6 19:51:30 2015 - [debug] Connecting via SSH from root@192.168.71.152(192.168.71.152:22) to root@192.168.71.151(192.168.71.151:22)..
Fri Feb 6 19:51:31 2015 - [debug] ok.
Fri Feb 6 19:51:31 2015 - [debug] Connecting via SSH from root@192.168.71.152(192.168.71.152:22) to root@192.168.71.153(192.168.71.153:22)..
Fri Feb 6 19:51:31 2015 - [debug] ok.
Fri Feb 6 19:51:32 2015 - [debug]
Fri Feb 6 19:51:31 2015 - [debug] Connecting via SSH from root@192.168.71.153(192.168.71.153:22) to root@192.168.71.151(192.168.71.151:22)..
Fri Feb 6 19:51:31 2015 - [debug] ok.
Fri Feb 6 19:51:31 2015 - [debug] Connecting via SSH from root@192.168.71.153(192.168.71.153:22) to root@192.168.71.152(192.168.71.152:22)..
Fri Feb 6 19:51:32 2015 - [debug] ok.
Fri Feb 6 19:51:32 2015 - [info] All SSH connection tests passed successfully.

反馈信息显示All SSH connection tests passed successfully.才是SSH登录配置正确,否则需要根据错误信息需要配置。

检查候选主库的log-bin是否都启用

如果当前的从库没有设置log-bin,很明显按照主从复制的要求,是不能作为新的主库的,所以作为候选主库的从库节点必须启用log-bin,本例中只有2个从库节点,且都打算作为候选主库,故都需要启用log-bin。

检查复制过滤规则

MHA要求各个数据库节点的复制过滤规则(binlog-do-db, replicate-ignore-db)都一样。

检查复制配置

使用masterha_check_repl检查当前的MySQL集群拓扑结构是否符合MHA的要求。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
[root@localhost ~]# masterha_check_repl --conf=/etc/mha/mha.conf
Fri Feb 6 20:16:18 2015 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Feb 6 20:16:18 2015 - [info] Reading application default configuration from /etc/mha/mha.conf..
Fri Feb 6 20:16:18 2015 - [info] Reading server configuration from /etc/mha/mha.conf..
Fri Feb 6 20:16:18 2015 - [info] MHA::MasterMonitor version 0.56.
Fri Feb 6 20:16:18 2015 - [info] GTID failover mode = 0
Fri Feb 6 20:16:18 2015 - [info] Dead Servers:
Fri Feb 6 20:16:18 2015 - [info] Alive Servers:
Fri Feb 6 20:16:18 2015 - [info] 192.168.71.151(192.168.71.151:3306)
Fri Feb 6 20:16:18 2015 - [info] 192.168.71.152(192.168.71.152:3306)
Fri Feb 6 20:16:18 2015 - [info] 192.168.71.153(192.168.71.153:3306)
Fri Feb 6 20:16:18 2015 - [info] Alive Slaves:
Fri Feb 6 20:16:18 2015 - [info] 192.168.71.152(192.168.71.152:3306) Version=10.0.16-MariaDB-log (oldest major version between slaves) log-bin:enabled
Fri Feb 6 20:16:18 2015 - [info] Replicating from 192.168.71.151(192.168.71.151:3306)
Fri Feb 6 20:16:18 2015 - [info] 192.168.71.153(192.168.71.153:3306) Version=10.0.16-MariaDB-log (oldest major version between slaves) log-bin:enabled
Fri Feb 6 20:16:18 2015 - [info] Replicating from 192.168.71.151(192.168.71.151:3306)
Fri Feb 6 20:16:18 2015 - [info] Current Alive Master: 192.168.71.151(192.168.71.151:3306)
Fri Feb 6 20:16:18 2015 - [info] Checking slave configurations..
Fri Feb 6 20:16:18 2015 - [info] read_only=1 is not set on slave 192.168.71.152(192.168.71.152:3306).
Fri Feb 6 20:16:18 2015 - [warning] relay_log_purge=0 is not set on slave 192.168.71.152(192.168.71.152:3306).
Fri Feb 6 20:16:18 2015 - [info] read_only=1 is not set on slave 192.168.71.153(192.168.71.153:3306).
Fri Feb 6 20:16:18 2015 - [warning] relay_log_purge=0 is not set on slave 192.168.71.153(192.168.71.153:3306).
Fri Feb 6 20:16:18 2015 - [info] Checking replication filtering settings..
Fri Feb 6 20:16:18 2015 - [info] binlog_do_db= , binlog_ignore_db=
Fri Feb 6 20:16:18 2015 - [info] Replication filtering check ok.
Fri Feb 6 20:16:18 2015 - [info] GTID (with auto-pos) is not supported
Fri Feb 6 20:16:18 2015 - [info] Starting SSH connection tests..
Fri Feb 6 20:16:20 2015 - [info] All SSH connection tests passed successfully.
Fri Feb 6 20:16:20 2015 - [info] Checking MHA Node version..
Fri Feb 6 20:16:20 2015 - [info] Version check ok.
Fri Feb 6 20:16:20 2015 - [info] Checking SSH publickey authentication settings on the current master..
Fri Feb 6 20:16:20 2015 - [info] HealthCheck: SSH to 192.168.71.151 is reachable.
Fri Feb 6 20:16:21 2015 - [info] Master MHA Node version is 0.56.
Fri Feb 6 20:16:21 2015 - [info] Checking recovery script configurations on 192.168.71.151(192.168.71.151:3306)..
Fri Feb 6 20:16:21 2015 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/usr/local/mha/save_binary_logs_test --manager_version=0.56 --start_file=mysql-bin.000011
Fri Feb 6 20:16:21 2015 - [info] Connecting to root@192.168.71.151(192.168.71.151:22)..
Creating /usr/local/mha if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to mysql-bin.000011
Fri Feb 6 20:16:21 2015 - [info] Binlog setting check done.
Fri Feb 6 20:16:21 2015 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Fri Feb 6 20:16:21 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='joelhy' --slave_host=192.168.71.152 --slave_ip=192.168.71.152 --slave_port=3306 --workdir=/usr/local/mha --target_version=10.0.16-MariaDB-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Fri Feb 6 20:16:21 2015 - [info] Connecting to root@192.168.71.152(192.168.71.152:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to relay-bin.000002
Temporary relay log file is /var/lib/mysql/relay-bin.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Feb 6 20:16:22 2015 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='joelhy' --slave_host=192.168.71.153 --slave_ip=192.168.71.153 --slave_port=3306 --workdir=/usr/local/mha --target_version=10.0.16-MariaDB-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Fri Feb 6 20:16:22 2015 - [info] Connecting to root@192.168.71.153(192.168.71.153:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to relay-bin.000005
Temporary relay log file is /var/lib/mysql/relay-bin.000005
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Fri Feb 6 20:16:23 2015 - [info] Slaves settings check done.
Fri Feb 6 20:16:23 2015 - [info]
192.168.71.151(192.168.71.151:3306) (current master)
+--192.168.71.152(192.168.71.152:3306)
+--192.168.71.153(192.168.71.153:3306)
Fri Feb 6 20:16:23 2015 - [info] Checking replication health on 192.168.71.152..
Fri Feb 6 20:16:23 2015 - [info] ok.
Fri Feb 6 20:16:23 2015 - [info] Checking replication health on 192.168.71.153..
Fri Feb 6 20:16:23 2015 - [info] ok.
Fri Feb 6 20:16:23 2015 - [warning] master_ip_failover_script is not defined.
Fri Feb 6 20:16:23 2015 - [warning] shutdown_script is not defined.
Fri Feb 6 20:16:23 2015 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.

注意最后一句为MySQL Replication Health is OK.,如果是MySQL Replication Health is NOT OK!,则需要根据反馈的错误信息修改配置。

启动MHA Manager

在管理节点 192.168.71.154 上执行

1
[root@localhost ~]# nohup masterha_manager --conf=/etc/mha/mha.conf &

停止MHA Manager

如果需要停止MHA Mnager,可运行命令

1
2
[root@localhost ~]# masterha_stop --conf=/etc/mha/mha.conf
Stopped mha successfully.

测试failover

可在MHA Manager管理节点运行MHA Manager时,测试failover。例如直接关闭主库的操作系统或者关闭主库的MySQL服务
在主库节点 192.168.71.151上关闭MySQL:

1
2
[root@localhost ~]# service mysql stop
Shutting down MySQL.. SUCCESS!

注意此时管理节点的MHA Manager会有很长的输出,最后一句为Master failover to 192.168.71.152(192.168.71.152:3306) completed successfully.,说明主库已经failover到192.168.71.152从库节点了。
注意:MHA Manager在failover成功后会自动退出。出故障的原主库在配置更改完后,重新启用后,需要在管理节点上再次运行masterha_check_ssh --conf=/etc/mha/mha.conf,没问题的话,再启动MHA Manager:nohup masterha_manager --conf=/etc/mha/mha.conf &

参考文献

[1] https://code.google.com/p/mysql-master-ha/wiki/TableOfContents?tm=6
[2] 再谈Mysql MHA

Contents
  1. 1. 环境介绍
  2. 2. 安装MHA
    1. 2.1. 下载MHA Node和MHA Manager
      1. 2.1.1. 下载MHA Manager
      2. 2.1.2. 下载MHA Node
    2. 2.2. 安装
  3. 3. SSH密钥登录方式设置
  4. 4. 配置MHA管理节点
  5. 5. 测试节点间的SSH登录
  6. 6. 检查候选主库的log-bin是否都启用
  7. 7. 检查复制过滤规则
  8. 8. 检查复制配置
  9. 9. 启动MHA Manager
  10. 10. 停止MHA Manager
  11. 11. 测试failover
  12. 12. 参考文献