MySQL主从复制搭建实战:数据同步零丢失配置指南
一、概述
1. 背景介绍
在生产环境中运维MySQL数据库十年,主从复制是遇到最频繁的架构需求之一。无论是为了读写分离提升性能,还是为了数据容灾保障业务连续性,主从复制都是MySQL高可用架构的基石。
2019年的一次惨痛经历至今记忆犹新:某电商平台大促期间,单机MySQL扛不住流量直接宕机,由于没有从库,数据恢复花了整整6个小时,直接损失数百万。这次事故之后,主从复制成为所有核心业务数据库的标配。
A. 核心原理
MySQL主从复制的核心原理是:主库将数据变更写入二进制日志(Binary Log),从库通过I/O线程读取主库的binlog并写入本地的中继日志(Relay Log),然后SQL线程读取中继日志并在从库上重放这些变更,从而实现数据同步。
2. 技术特点
MySQL 8.0/8.4 LTS版本在复制功能上有了重大改进:
A. 基于GTID的复制
- 全局事务标识符(GTID)让每个事务都有唯一标识
- 故障切换时自动定位复制位点,无需手动指定binlog文件和位置
- 支持自动故障转移,大幅简化运维复杂度
B. 增强的半同步复制
- 主库提交事务前必须收到至少一个从库的确认
- 配合rpl_semi_sync_source_wait_for_replica_count参数可要求多个从库确认
- 有效防止主库故障时的数据丢失
C. 并行复制优化
- 基于WriteSet的并行复制,大幅提升从库回放效率
- replica_parallel_type=LOGICAL_CLOCK配合binlog_transaction_dependency_tracking=WRITESET
- 实测可将复制延迟从分钟级降到秒级
D. 克隆插件支持
- MySQL 8.0.17+引入Clone Plugin
- 从库初始化无需mysqldump或xtrabackup
- 自动完成数据同步和复制配置
3. 适用场景
| 场景 | 说明 | 推荐配置 |
|---|---|---|
| 读写分离 | 主库写、从库读,分担主库压力 | 异步复制 + 多从库 |
| 数据容灾 | 主库故障时快速切换到从库 | 半同步复制 + GTID |
| 数据备份 | 在从库执行备份,不影响主库性能 | 异步复制 + 延迟从库 |
| 数据分析 | 报表查询在从库执行 | 异步复制 + 专用从库 |
| 跨地域部署 | 多数据中心数据同步 | 异步复制 + 压缩传输 |
4. 环境要求
| 组件 | 版本要求 | 说明 |
|---|---|---|
| MySQL Server | 8.0.35+ 或 8.4 LTS | 推荐使用8.4 LTS长期支持版本 |
| 操作系统 | Rocky 9 / Ubuntu 24.04 | 内核版本5.14+ |
| 内存 | 最低4GB,推荐16GB+ | InnoDB Buffer Pool需要足够内存 |
| 磁盘 | SSD,IOPS 3000+ | binlog和数据文件分盘存储 |
| 网络 | 主从之间延迟<1ms | 同机房部署最佳 |
本文实验环境:
- 主库:192.168.1.100(master.example.com)
- 从库:192.168.1.101(slave.example.com)
- 系统:Rocky Linux 9.3
- MySQL:8.4.0 LTS
二、准备工作
1. 系统环境配置
在主库和从库服务器上执行相同的系统配置:
# 关闭SELinux(或配置相应策略)
sudo setenforce 0
sudo sed -i 's/SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
# 配置防火墙
sudo firewall-cmd --permanent --add-port=3306/tcp
sudo firewall-cmd --reload
# 配置时间同步(主从时间必须一致)
sudo dnf install -y chrony
sudo systemctl enable --now chronyd
# 验证时间同步状态
chronyc tracking
# 配置主机名解析
echo "192.168.1.100 master.example.com master" | sudo tee -a /etc/hosts
echo "192.168.1.101 slave.example.com slave" | sudo tee -a /etc/hosts
# 优化系统参数
cat << 'EOF' | sudo tee /etc/sysctl.d/mysql.conf
# 网络优化
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_keepalive_time = 600
# 内存优化
vm.swappiness = 1
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
# 文件描述符
fs.file-max = 6815744
fs.aio-max-nr = 1048576
EOF
sudo sysctl -p /etc/sysctl.d/mysql.conf
# 配置用户资源限制
cat << 'EOF' | sudo tee /etc/security/limits.d/mysql.conf
mysql soft nofile 65535
mysql hard nofile 65535
mysql soft nproc 65535
mysql hard nproc 65535
EOF2. 安装MySQL 8.4 LTS
# 添加MySQL官方仓库
sudo dnf install -y https://dev.mysql.com/get/mysql84-community-release-el9-1.noarch.rpm
# 安装MySQL Server
sudo dnf install -y mysql-community-server mysql-community-client
# 查看安装版本
mysql --version
# mysql Ver 8.4.0 for Linux on x86_64 (MySQL Community Server - GPL)
# 创建数据目录(建议数据和日志分盘)
sudo mkdir -p /data/mysql/{data,logs,binlog,relaylog,tmp}
sudo chown -R mysql:mysql /data/mysql
sudo chmod 750 /data/mysql3. 生成server-id
MySQL主从复制要求每个节点有唯一的server-id。推荐使用IP地址最后两段作为server-id:
# 主库(192.168.1.100)
# server-id = 1100
# 从库(192.168.1.101)
# server-id = 1101三、核心配置
1. 主库配置
创建主库配置文件 /etc/my.cnf:
[mysqld]
# 基础配置
user = mysql
port = 3306
basedir = /usr
datadir = /data/mysql/data
tmpdir = /data/mysql/tmp
socket = /var/lib/mysql/mysql.sock
pid-file = /var/run/mysqld/mysqld.pid
# 字符集配置
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect = 'SET NAMES utf8mb4'
# 主从复制核心配置
server-id = 1100
log-bin = /data/mysql/binlog/mysql-bin
binlog_format = ROW
binlog_row_image = FULL
max_binlog_size = 1G
binlog_expire_logs_seconds = 604800
sync_binlog = 1
# GTID配置(强烈推荐)
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_gtid_simple_recovery = ON
# 半同步复制配置
# MySQL 8.0.26+使用新参数名
rpl_semi_sync_source_enabled = ON
rpl_semi_sync_source_timeout = 10000
rpl_semi_sync_source_wait_for_replica_count = 1
rpl_semi_sync_source_wait_point = AFTER_SYNC
# InnoDB配置
innodb_buffer_pool_size = 8G
innodb_buffer_pool_instances = 8
innodb_log_file_size = 2G
innodb_log_buffer_size = 64M
innodb_flush_log_at_trx_commit = 1
innodb_flush_method = O_DIRECT
innodb_file_per_table = ON
innodb_io_capacity = 4000
innodb_io_capacity_max = 8000
innodb_read_io_threads = 8
innodb_write_io_threads = 8
# 连接配置
max_connections = 2000
max_connect_errors = 100000
wait_timeout = 600
interactive_timeout = 600
# 日志配置
log_error = /data/mysql/logs/error.log
slow_query_log = ON
slow_query_log_file = /data/mysql/logs/slow.log
long_query_time = 1
log_queries_not_using_indexes = ON
# 性能优化
table_open_cache = 4000
table_definition_cache = 2000
thread_cache_size = 128
sort_buffer_size = 4M
join_buffer_size = 4M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
# 安全配置
local_infile = OFF
skip_symbolic_links = ON
sql_mode = STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
[client]
socket = /var/lib/mysql/mysql.sock
default-character-set = utf8mb4
[mysql]
socket = /var/lib/mysql/mysql.sock
default-character-set = utf8mb4
prompt = "\\u@\\h [\\d]> "2. 从库配置
创建从库配置文件 /etc/my.cnf:
[mysqld]
# 基础配置
user = mysql
port = 3306
basedir = /usr
datadir = /data/mysql/data
tmpdir = /data/mysql/tmp
socket = /var/lib/mysql/mysql.sock
pid-file = /var/run/mysqld/mysqld.pid
# 字符集配置
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect = 'SET NAMES utf8mb4'
# 主从复制核心配置
server-id = 1101
log-bin = /data/mysql/binlog/mysql-bin
binlog_format = ROW
binlog_row_image = FULL
max_binlog_size = 1G
binlog_expire_logs_seconds = 604800
sync_binlog = 1
# 从库开启binlog(用于级联复制或切换为主库)
log_replica_updates = ON
# 中继日志配置
relay_log = /data/mysql/relaylog/relay-bin
relay_log_recovery = ON
relay_log_info_repository = TABLE
relay_log_purge = ON
# GTID配置
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_gtid_simple_recovery = ON
# 半同步复制配置(从库端)
rpl_semi_sync_replica_enabled = ON
# 并行复制配置(大幅降低复制延迟)
replica_parallel_type = LOGICAL_CLOCK
replica_parallel_workers = 16
replica_preserve_commit_order = ON
binlog_transaction_dependency_tracking = WRITESET
transaction_write_set_extraction = XXHASH64
# 复制过滤(按需配置)
# replicate_do_db = app_db
# replicate_ignore_db = test
# replicate_wild_do_table = app_%.%
# replicate_wild_ignore_table = %.tmp_%
# 从库只读配置
read_only = ON
super_read_only = ON
# InnoDB配置
innodb_buffer_pool_size = 8G
innodb_buffer_pool_instances = 8
innodb_log_file_size = 2G
innodb_log_buffer_size = 64M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_file_per_table = ON
innodb_io_capacity = 4000
innodb_io_capacity_max = 8000
innodb_read_io_threads = 8
innodb_write_io_threads = 8
# 连接配置
max_connections = 2000
max_connect_errors = 100000
wait_timeout = 600
interactive_timeout = 600
# 日志配置
log_error = /data/mysql/logs/error.log
slow_query_log = ON
slow_query_log_file = /data/mysql/logs/slow.log
long_query_time = 1
log_queries_not_using_indexes = ON
# 性能优化
table_open_cache = 4000
table_definition_cache = 2000
thread_cache_size = 128
sort_buffer_size = 4M
join_buffer_size = 4M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
# 安全配置
local_infile = OFF
skip_symbolic_links = ON
sql_mode = STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION
[client]
socket = /var/lib/mysql/mysql.sock
default-character-set = utf8mb4
[mysql]
socket = /var/lib/mysql/mysql.sock
default-character-set = utf8mb4
prompt = "\\u@\\h [\\d]> "四、启动和验证
1. 初始化并启动主库
# 初始化数据目录
sudo mysqld --initialize --user=mysql --datadir=/data/mysql/data
# 获取临时密码
sudo grep 'temporary password' /data/mysql/logs/error.log
# 2024-01-15T10:30:45.123456Z 6 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: Abc123!@#xyz
# 启动MySQL服务
sudo systemctl start mysqld
sudo systemctl enable mysqld
# 修改root密码并进行安全配置
mysql -uroot -p'Abc123!@#xyz' --connect-expired-password << 'EOF'
ALTER USER 'root'@'localhost' IDENTIFIED BY 'YourStrongPassword@2024';
DELETE FROM mysql.user WHERE User='';
DROP DATABASE IF EXISTS test;
DELETE FROM mysql.db WHERE Db='test' OR Db='test\_%';
FLUSH PRIVILEGES;
EOF
# 安装半同步复制插件
mysql -uroot -p'YourStrongPassword@2024' << 'EOF'
INSTALL PLUGIN rpl_semi_sync_source SONAME 'semisync_source.so';
SET GLOBAL rpl_semi_sync_source_enabled = ON;
SHOW PLUGINS WHERE Name LIKE '%semi%';
EOF2. 创建复制用户
在主库上创建专用的复制用户:
-- 登录主库
mysql -uroot -p'YourStrongPassword@2024'
-- 创建复制用户
CREATE USER 'repl_user'@'192.168.1.%' IDENTIFIED WITH caching_sha2_password BY 'ReplPassword@2024';
-- 授予复制权限
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'repl_user'@'192.168.1.%';
-- 如果使用Clone插件初始化从库,还需要额外权限
GRANT BACKUP_ADMIN ON *.* TO 'repl_user'@'192.168.1.%';
FLUSH PRIVILEGES;
-- 查看主库状态
SHOW MASTER STATUS\G
/*
*************************** 1. row ***************************
File: mysql-bin.000003
Position: 856
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: 3e11fa47-c5ab-11ee-9c88-000c29123456:1-15
*/
-- 查看GTID执行状态
SELECT @@GLOBAL.GTID_EXECUTED;3. 初始化并启动从库
A. 方法一:使用Clone插件(推荐,MySQL 8.0.17+)
# 从库执行初始化(空数据目录)
sudo mysqld --initialize --user=mysql --datadir=/data/mysql/data
# 获取临时密码并启动
sudo grep 'temporary password' /data/mysql/logs/error.log
sudo systemctl start mysqld
# 修改root密码
mysql -uroot -p'临时密码' --connect-expired-password << 'EOF'
ALTER USER 'root'@'localhost' IDENTIFIED BY 'YourStrongPassword@2024';
EOF-- 登录从库
mysql -uroot -p'YourStrongPassword@2024'
-- 安装Clone插件
INSTALL PLUGIN clone SONAME 'mysql_clone.so';
-- 配置克隆源(主库地址)
SET GLOBAL clone_valid_donor_list = '192.168.1.100:3306';
-- 执行克隆操作(会覆盖从库所有数据并重启)
CLONE INSTANCE FROM 'repl_user'@'192.168.1.100':3306 IDENTIFIED BY 'ReplPassword@2024';
-- 克隆完成后MySQL会自动重启,重新登录
mysql -uroot -p'YourStrongPassword@2024'
-- 安装半同步复制插件
INSTALL PLUGIN rpl_semi_sync_replica SONAME 'semisync_replica.so';
SET GLOBAL rpl_semi_sync_replica_enabled = ON;B. 方法二:使用mysqldump(适合小型数据库)
# 在主库执行导出
mysqldump -uroot -p'YourStrongPassword@2024' \
--all-databases \
--single-transaction \
--routines \
--triggers \
--events \
--set-gtid-purged=ON \
--source-data=2 \
> /tmp/full_backup.sql
# 传输到从库
scp /tmp/full_backup.sql slave:/tmp/
# 在从库导入
mysql -uroot -p'YourStrongPassword@2024' < /tmp/full_backup.sql4. 配置从库复制
-- 登录从库
mysql -uroot -p'YourStrongPassword@2024'
-- 配置复制(使用GTID模式)
CHANGE REPLICATION SOURCE TO
SOURCE_HOST = '192.168.1.100',
SOURCE_PORT = 3306,
SOURCE_USER = 'repl_user',
SOURCE_PASSWORD = 'ReplPassword@2024',
SOURCE_AUTO_POSITION = 1,
SOURCE_CONNECT_RETRY = 10,
SOURCE_RETRY_COUNT = 3,
GET_SOURCE_PUBLIC_KEY = 1;
-- 启动复制
START REPLICA;
-- 查看复制状态
SHOW REPLICA STATUS\G5. 验证复制状态
-- 在从库执行,检查关键指标
SHOW REPLICA STATUS\G
/*
关键字段说明:
Replica_IO_Running: Yes -- I/O线程运行状态
Replica_SQL_Running: Yes -- SQL线程运行状态
Seconds_Behind_Source: 0 -- 复制延迟(秒)
Retrieved_Gtid_Set: ... -- 已接收的GTID集合
Executed_Gtid_Set: ... -- 已执行的GTID集合
Auto_Position: 1 -- GTID自动定位
Source_SSL_Allowed: Yes -- SSL状态
*/
-- 检查半同步状态(主库)
SHOW STATUS LIKE 'Rpl_semi_sync%';
/*
Rpl_semi_sync_source_status | ON
Rpl_semi_sync_source_clients | 1
Rpl_semi_sync_source_yes_tx | 156
Rpl_semi_sync_source_no_tx | 0
*/
-- 检查半同步状态(从库)
SHOW STATUS LIKE 'Rpl_semi_sync%';
/*
Rpl_semi_sync_replica_status | ON
*/6. 测试数据同步
-- 主库创建测试数据
CREATE DATABASE test_repl;
USE test_repl;
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
username VARCHAR(50) NOT NULL,
email VARCHAR(100),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;
INSERT INTO users (username, email) VALUES
('user1', 'user1@example.com'),
('user2', 'user2@example.com'),
('user3', 'user3@example.com');
-- 从库验证数据
USE test_repl;
SELECT * FROM users;
/*
+----+----------+-------------------+---------------------+
| id | username | email | created_at |
+----+----------+-------------------+---------------------+
| 1 | user1 | user1@example.com | 2024-01-15 10:45:30 |
| 2 | user2 | user2@example.com | 2024-01-15 10:45:30 |
| 3 | user3 | user2@example.com | 2024-01-15 10:45:30 |
+----+----------+-------------------+---------------------+
*/五、架构模式
1. 一主多从架构
生产环境中常见的一主多从架构,通常配置2-3个从库分担读压力:
graph TD
M[Master<br/>192.168.1.100] --> S1[Slave1<br/>192.168.1.101<br/>读请求]
M --> S2[Slave2<br/>192.168.1.102<br/>读请求]
M --> S3[Slave3<br/>192.168.1.103<br/>备份专用]从库1和从库2的配置基本相同,从库3作为备份专用,可配置为延迟从库:
# 从库3配置(延迟复制,用于误操作恢复)
[mysqld]
# ...其他配置同上...
# 延迟复制配置:延迟1小时
# 在CHANGE REPLICATION SOURCE时指定
# SOURCE_DELAY = 3600配置延迟从库:
CHANGE REPLICATION SOURCE TO
SOURCE_HOST = '192.168.1.100',
SOURCE_PORT = 3306,
SOURCE_USER = 'repl_user',
SOURCE_PASSWORD = 'ReplPassword@2024',
SOURCE_AUTO_POSITION = 1,
SOURCE_DELAY = 3600; -- 延迟1小时
START REPLICA;2. 级联复制架构
当从库数量较多时,可采用级联复制减轻主库压力:
graph TD
M[Master<br/>192.168.1.100] --> RS[Relay Slave<br/>192.168.1.101]
RS --> S2[Slave2<br/>192.168.1.102]
RS --> S3[Slave3<br/>192.168.1.103]中继从库配置要点:
# 中继从库关键配置
[mysqld]
server-id = 1101
log-bin = /data/mysql/binlog/mysql-bin
# 必须开启此选项,将接收到的事务写入自己的binlog
log_replica_updates = ON
# 其他配置同普通从库六、实际应用案例
1. 电商平台读写分离
某电商平台日活百万,订单系统数据库面临高并发压力。
A. 需求分析
- 写入QPS:约500
- 读取QPS:约5000
- 数据量:200GB
- 可用性要求:99.99%
B. 架构设计
sequenceDiagram
participant App as 应用服务器
participant M as 主库<br/>写操作
participant S1 as 从库1<br/>读操作
participant S2 as 从库2<br/>读操作
App->>M: 写入订单
M-->>App: 写入成功
App->>S1: 查询订单
S1-->>App: 返回数据
App->>S2: 统计报表
S2-->>App: 返回结果C. 应用层配置示例(Python + SQLAlchemy)
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import random
class MySQLRouter:
"""MySQL主从路由器"""
def __init__(self):
# 主库连接(写操作)
self.master = create_engine(
'mysql+pymysql://app_user:password@192.168.1.100:3306/order_db',
pool_size=20,
max_overflow=10,
pool_pre_ping=True,
pool_recycle=3600
)
# 从库连接池(读操作)
self.slaves = [
create_engine(
'mysql+pymysql://app_user:password@192.168.1.101:3306/order_db',
pool_size=30,
max_overflow=20,
pool_pre_ping=True,
pool_recycle=3600
),
create_engine(
'mysql+pymysql://app_user:password@192.168.1.102:3306/order_db',
pool_size=30,
max_overflow=20,
pool_pre_ping=True,
pool_recycle=3600
)
]
def get_read_connection(self):
"""获取读连接(负载均衡)"""
return random.choice(self.slaves)
def get_write_connection(self):
"""获取写连接"""
return self.master
# 使用示例
router = MySQLRouter()
# 写操作
with router.get_write_connection().connect() as conn:
conn.execute(text("INSERT INTO orders (user_id, amount) VALUES (:uid, :amt)"),
{"uid": 1001, "amt": 99.99})
conn.commit()
# 读操作
with router.get_read_connection().connect() as conn:
result = conn.execute(text("SELECT * FROM orders WHERE user_id = :uid"),
{"uid": 1001})
orders = result.fetchall()2. 跨机房容灾部署
某金融系统要求同城双活,异地灾备:
graph LR
subgraph 北京机房
M[Master<br/>192.168.1.100] --> SS[Sync Slave<br/>半同步复制]
end
subgraph 上海机房
M -->|异步复制<br/>30ms| DR[DR Slave<br/>192.168.2.100]
end异地从库特殊配置:
# 异地从库配置
[mysqld]
server-id = 2100
# 压缩传输减少带宽消耗
replica_compressed_protocol = ON
# 适当增大超时时间(跨机房网络延迟)
replica_net_timeout = 120
# 增大中继日志空间
relay_log_space_limit = 20G
# 允许较大的复制延迟
# 通过监控而非配置限制七、最佳实践
1. 性能优化
A. 并行复制调优
MySQL 8.0的并行复制能显著降低复制延迟,关键参数调优:
-- 查看当前并行复制配置
SHOW VARIABLES LIKE '%replica_parallel%';
SHOW VARIABLES LIKE '%binlog_transaction_dependency%';
-- 推荐配置(从库执行)
SET GLOBAL replica_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL replica_parallel_workers = 16; -- 通常设为CPU核数
SET GLOBAL replica_preserve_commit_order = ON;
SET GLOBAL binlog_transaction_dependency_tracking = 'WRITESET';
-- 监控并行复制效果
SELECT * FROM performance_schema.replication_applier_status_by_worker;B. 网络传输优化
# 主库配置
[mysqld]
# 增大binlog缓存
binlog_cache_size = 4M
binlog_stmt_cache_size = 4M
# 批量发送优化
replica_net_timeout = 60# 从库配置(跨机房时启用)
[mysqld]
# 启用压缩传输
replica_compressed_protocol = ON2. 安全加固
A. 网络安全
-- 复制用户仅允许特定IP段访问
CREATE USER 'repl_user'@'192.168.1.0/255.255.255.0'
IDENTIFIED WITH caching_sha2_password BY 'StrongPassword@2024'
REQUIRE SSL; -- 强制SSL连接
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'192.168.1.0/255.255.255.0';B. SSL加密复制
# 生成SSL证书
mkdir -p /etc/mysql/ssl
cd /etc/mysql/ssl
# CA证书
openssl genrsa 2048 > ca-key.pem
openssl req -new -x509 -nodes -days 3650 -key ca-key.pem -out ca-cert.pem \
-subj "/C=CN/ST=Beijing/L=Beijing/O=Example/CN=MySQL-CA"
# 主库证书
openssl req -newkey rsa:2048 -nodes -keyout master-key.pem -out master-req.pem \
-subj "/C=CN/ST=Beijing/L=Beijing/O=Example/CN=master.example.com"
openssl x509 -req -in master-req.pem -days 3650 -CA ca-cert.pem -CAkey ca-key.pem \
-set_serial 01 -out master-cert.pem
# 从库证书
openssl req -newkey rsa:2048 -nodes -keyout slave-key.pem -out slave-req.pem \
-subj "/C=CN/ST=Beijing/L=Beijing/O=Example/CN=slave.example.com"
openssl x509 -req -in slave-req.pem -days 3650 -CA ca-cert.pem -CAkey ca-key.pem \
-set_serial 02 -out slave-cert.pem
# 设置权限
chown mysql:mysql *.pem
chmod 600 *-key.pem
chmod 644 *-cert.pem ca-cert.pem# 主库SSL配置
[mysqld]
ssl-ca = /etc/mysql/ssl/ca-cert.pem
ssl-cert = /etc/mysql/ssl/master-cert.pem
ssl-key = /etc/mysql/ssl/master-key.pem
require_secure_transport = ON-- 从库配置SSL复制
CHANGE REPLICATION SOURCE TO
SOURCE_HOST = '192.168.1.100',
SOURCE_USER = 'repl_user',
SOURCE_PASSWORD = 'StrongPassword@2024',
SOURCE_AUTO_POSITION = 1,
SOURCE_SSL = 1,
SOURCE_SSL_CA = '/etc/mysql/ssl/ca-cert.pem',
SOURCE_SSL_CERT = '/etc/mysql/ssl/slave-cert.pem',
SOURCE_SSL_KEY = '/etc/mysql/ssl/slave-key.pem',
SOURCE_SSL_VERIFY_SERVER_CERT = 1;3. 配置注意事项
- server-id必须唯一:整个复制拓扑中,每个MySQL实例的server-id必须不同
GTID模式限制:
- 不能使用CREATE TABLE ... SELECT语句
- 不能在事务中混合使用事务表和非事务表
- 不能使用CREATE TEMPORARY TABLE(事务内)
binlog_format选择:
- ROW:记录每行数据变化,数据一致性最好,推荐使用
- STATEMENT:记录SQL语句,binlog体积小,但某些函数可能导致不一致
- MIXED:自动选择,不推荐
4. 常见错误处理
| 错误码 | 错误描述 | 原因分析 | 解决方案 |
|---|---|---|---|
| 1062 | Duplicate entry for key | 从库已存在相同主键数据 | 使用pt-table-sync同步数据 |
| 1032 | Can't find record | 从库缺少要更新/删除的数据 | 检查数据一致性,使用pt-table-checksum校验 |
| 1236 | Binary log position error | binlog位置无效或已被清理 | 重新初始化从库或使用GTID重建复制 |
| 1045 | Access denied | 复制用户权限不足 | 检查用户权限和密码 |
| 1593 | Server_id not set | server-id未配置 | 配置唯一的server-id |
| 1794 | Slave SQL thread retried | 复制冲突重试 | 检查并发写入或网络问题 |
| 13117 | GTID_PURGED can't be set | GTID集合冲突 | 使用RESET MASTER清理后重新配置 |
八、故障排查和监控
1. 关键监控指标
| 指标名称 | 说明 | 告警阈值 | 采集方法 |
|---|---|---|---|
| Seconds_Behind_Source | 复制延迟秒数 | >10s告警,>60s严重 | SHOW REPLICA STATUS |
| Replica_IO_Running | I/O线程状态 | !=Yes告警 | SHOW REPLICA STATUS |
| Replica_SQL_Running | SQL线程状态 | !=Yes告警 | SHOW REPLICA STATUS |
| Rpl_semi_sync_source_clients | 半同步从库数 | <1告警 | SHOW STATUS |
| Rpl_semi_sync_source_no_tx | 降级异步次数 | >0告警 | SHOW STATUS |
| Relay_Log_Space | 中继日志大小 | >10GB告警 | SHOW REPLICA STATUS |
2. 常见问题诊断
A. 复制延迟过大
-- 查看复制延迟
SHOW REPLICA STATUS\G
-- 分析延迟原因
-- 1. 检查从库负载
SHOW PROCESSLIST;
SELECT * FROM performance_schema.threads WHERE TYPE = 'BACKGROUND' AND NAME LIKE '%replica%';
-- 2. 检查是否有大事务
SELECT * FROM performance_schema.events_transactions_current WHERE STATE = 'ACTIVE'\G
-- 3. 检查并行复制效率
SELECT WORKER_ID, LAST_SEEN_TRANSACTION, APPLYING_TRANSACTION, LAST_APPLIED_TRANSACTION
FROM performance_schema.replication_applier_status_by_worker;
-- 4. 检查是否有锁等待
SELECT * FROM performance_schema.data_lock_waits;B. I/O线程停止
-- 查看I/O线程状态
SHOW REPLICA STATUS\G
-- 关注:Replica_IO_Running, Last_IO_Error, Last_IO_Errno
-- 常见原因:
-- 1. 网络连接问题
-- 2. 复制用户权限问题
-- 3. 主库binlog被清理
-- 4. 主库server-id变更
-- 重新连接测试
STOP REPLICA IO_THREAD;
START REPLICA IO_THREAD;C. SQL线程停止
-- 查看SQL线程状态
SHOW REPLICA STATUS\G
-- 关注:Replica_SQL_Running, Last_SQL_Error, Last_SQL_Errno
-- 查看最后执行的事务
SELECT * FROM performance_schema.replication_applier_status_by_coordinator\G
SELECT * FROM performance_schema.replication_applier_status_by_worker
WHERE LAST_ERROR_NUMBER != 0\G
-- 数据不一致时的处理
-- 1. 使用pt-table-checksum校验
pt-table-checksum --user=root --password=xxx --host=192.168.1.100 \
--databases=mydb --tables=mytable
-- 2. 使用pt-table-sync修复
pt-table-sync --print --user=root --password=xxx \
--sync-to-master h=192.168.1.101,D=mydb,t=mytable3. 备份与恢复
A. 从库备份策略
利用从库进行备份,不影响主库性能:
#!/bin/bash
# 文件:/opt/scripts/backup_from_slave.sh
# 功能:从库热备份脚本
BACKUP_DIR="/backup/mysql"
DATE=$(date +%Y%m%d_%H%M%S)
MYSQL_USER="backup"
MYSQL_PASS="BackupPass@2024"
RETENTION_DAYS=7
# 创建备份目录
mkdir -p ${BACKUP_DIR}/${DATE}
# 停止复制SQL线程,确保数据一致性
mysql -u${MYSQL_USER} -p${MYSQL_PASS} -e "STOP REPLICA SQL_THREAD;"
# 记录复制位置
mysql -u${MYSQL_USER} -p${MYSQL_PASS} -e "SHOW REPLICA STATUS\G" > ${BACKUP_DIR}/${DATE}/replica_status.txt
# 执行备份
xtrabackup --backup \
--user=${MYSQL_USER} \
--password=${MYSQL_PASS} \
--target-dir=${BACKUP_DIR}/${DATE}/data \
--slave-info \
--safe-slave-backup
# 恢复复制
mysql -u${MYSQL_USER} -p${MYSQL_PASS} -e "START REPLICA SQL_THREAD;"
# 压缩备份
cd ${BACKUP_DIR}
tar -czf ${DATE}.tar.gz ${DATE}
rm -rf ${DATE}
# 清理过期备份
find ${BACKUP_DIR} -name "*.tar.gz" -mtime +${RETENTION_DAYS} -delete
echo "Backup completed: ${BACKUP_DIR}/${DATE}.tar.gz"B. 使用备份重建从库
# 解压备份
cd /backup/mysql
tar -xzf 20240115_100000.tar.gz
# 准备备份
xtrabackup --prepare --target-dir=/backup/mysql/20240115_100000/data
# 停止MySQL
systemctl stop mysqld
# 清空数据目录
rm -rf /data/mysql/data/*
# 恢复数据
xtrabackup --copy-back --target-dir=/backup/mysql/20240115_100000/data
# 修改权限
chown -R mysql:mysql /data/mysql/data
# 启动MySQL
systemctl start mysqld
# 配置复制(从xtrabackup_slave_info获取位置信息)
cat /backup/mysql/20240115_100000/data/xtrabackup_slave_info
# CHANGE REPLICATION SOURCE TO SOURCE_AUTO_POSITION = 1
mysql -uroot -p << 'EOF'
CHANGE REPLICATION SOURCE TO
SOURCE_HOST = '192.168.1.100',
SOURCE_USER = 'repl_user',
SOURCE_PASSWORD = 'ReplPassword@2024',
SOURCE_AUTO_POSITION = 1;
START REPLICA;
EOF九、总结
1. 技术要点回顾
- GTID是现代MySQL复制的标配:简化故障切换,避免手动指定binlog位置
- 半同步复制保障数据安全:rpl_semi_sync_source_wait_point=AFTER_SYNC配合超时设置,在性能和安全间取得平衡
- 并行复制大幅降低延迟:WRITESET依赖追踪配合多个worker线程,将复制延迟从分钟级降到秒级
- Clone插件简化从库初始化:无需依赖外部工具,自动完成数据同步和复制配置
- 监控告警是高可用基础:复制延迟、线程状态、半同步状态必须纳入监控体系
2. 进阶学习方向
- MySQL Group Replication:多主架构,自动故障转移
- MySQL InnoDB Cluster:官方高可用解决方案
- MySQL Router:自动读写分离和故障切换
- Orchestrator:复制拓扑管理和自动化故障恢复
- ProxySQL:高性能MySQL代理
3. 命令速查表
| 命令 | 说明 |
|---|---|
| SHOW MASTER STATUS | 查看主库binlog位置 |
| SHOW REPLICA STATUS\G | 查看从库复制状态 |
| START REPLICA | 启动复制 |
| STOP REPLICA | 停止复制 |
| RESET REPLICA ALL | 清除所有复制配置 |
| SHOW BINARY LOGS | 查看binlog文件列表 |
| PURGE BINARY LOGS TO 'file' | 清理指定文件之前的binlog |
| SELECT @@GLOBAL.GTID_EXECUTED | 查看已执行的GTID集合 |
| SHOW PROCESSLIST | 查看当前连接和复制线程 |
| SHOW VARIABLES LIKE '%replica%' | 查看复制相关配置 |
4. 术语表
| 术语 | 英文 | 说明 |
|---|---|---|
| 主库 | Master/Source | 提供写入的数据库节点 |
| 从库 | Slave/Replica | 复制主库数据的数据库节点 |
| 二进制日志 | Binary Log | 记录数据变更的日志文件 |
| 中继日志 | Relay Log | 从库缓存主库binlog的日志文件 |
| GTID | Global Transaction ID | 全局事务标识符 |
| 半同步复制 | Semi-sync Replication | 主库等待从库确认的复制模式 |
| 异步复制 | Async Replication | 主库不等待从库确认的复制模式 |
| 复制延迟 | Replication Lag | 从库数据落后主库的时间 |
| I/O线程 | I/O Thread | 从库接收binlog的线程 |
| SQL线程 | SQL Thread | 从库回放事务的线程 |