MySQL主从复制搭建实战:数据同步零丢失配置指南

一、概述

1. 背景介绍

在生产环境中运维MySQL数据库十年,主从复制是遇到最频繁的架构需求之一。无论是为了读写分离提升性能,还是为了数据容灾保障业务连续性,主从复制都是MySQL高可用架构的基石。

2019年的一次惨痛经历至今记忆犹新:某电商平台大促期间,单机MySQL扛不住流量直接宕机,由于没有从库,数据恢复花了整整6个小时,直接损失数百万。这次事故之后,主从复制成为所有核心业务数据库的标配。

A. 核心原理

MySQL主从复制的核心原理是:主库将数据变更写入二进制日志(Binary Log),从库通过I/O线程读取主库的binlog并写入本地的中继日志(Relay Log),然后SQL线程读取中继日志并在从库上重放这些变更,从而实现数据同步。

2. 技术特点

MySQL 8.0/8.4 LTS版本在复制功能上有了重大改进:

A. 基于GTID的复制

  • 全局事务标识符(GTID)让每个事务都有唯一标识
  • 故障切换时自动定位复制位点,无需手动指定binlog文件和位置
  • 支持自动故障转移,大幅简化运维复杂度

B. 增强的半同步复制

  • 主库提交事务前必须收到至少一个从库的确认
  • 配合rpl_semi_sync_source_wait_for_replica_count参数可要求多个从库确认
  • 有效防止主库故障时的数据丢失

C. 并行复制优化

  • 基于WriteSet的并行复制,大幅提升从库回放效率
  • replica_parallel_type=LOGICAL_CLOCK配合binlog_transaction_dependency_tracking=WRITESET
  • 实测可将复制延迟从分钟级降到秒级

D. 克隆插件支持

  • MySQL 8.0.17+引入Clone Plugin
  • 从库初始化无需mysqldump或xtrabackup
  • 自动完成数据同步和复制配置

3. 适用场景

场景说明推荐配置
读写分离主库写、从库读,分担主库压力异步复制 + 多从库
数据容灾主库故障时快速切换到从库半同步复制 + GTID
数据备份在从库执行备份,不影响主库性能异步复制 + 延迟从库
数据分析报表查询在从库执行异步复制 + 专用从库
跨地域部署多数据中心数据同步异步复制 + 压缩传输

4. 环境要求

组件版本要求说明
MySQL Server8.0.35+ 或 8.4 LTS推荐使用8.4 LTS长期支持版本
操作系统Rocky 9 / Ubuntu 24.04内核版本5.14+
内存最低4GB,推荐16GB+InnoDB Buffer Pool需要足够内存
磁盘SSD,IOPS 3000+binlog和数据文件分盘存储
网络主从之间延迟<1ms同机房部署最佳

本文实验环境:

  • 主库:192.168.1.100(master.example.com)
  • 从库:192.168.1.101(slave.example.com)
  • 系统:Rocky Linux 9.3
  • MySQL:8.4.0 LTS

二、准备工作

1. 系统环境配置

在主库和从库服务器上执行相同的系统配置:

# 关闭SELinux(或配置相应策略)
sudo setenforce 0
sudo sed -i 's/SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config

# 配置防火墙
sudo firewall-cmd --permanent --add-port=3306/tcp
sudo firewall-cmd --reload

# 配置时间同步(主从时间必须一致)
sudo dnf install -y chrony
sudo systemctl enable --now chronyd

# 验证时间同步状态
chronyc tracking

# 配置主机名解析
echo "192.168.1.100 master.example.com master" | sudo tee -a /etc/hosts
echo "192.168.1.101 slave.example.com slave" | sudo tee -a /etc/hosts

# 优化系统参数
cat << 'EOF' | sudo tee /etc/sysctl.d/mysql.conf
# 网络优化
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_keepalive_time = 600

# 内存优化
vm.swappiness = 1
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5

# 文件描述符
fs.file-max = 6815744
fs.aio-max-nr = 1048576
EOF

sudo sysctl -p /etc/sysctl.d/mysql.conf

# 配置用户资源限制
cat << 'EOF' | sudo tee /etc/security/limits.d/mysql.conf
mysql soft nofile 65535
mysql hard nofile 65535
mysql soft nproc 65535
mysql hard nproc 65535
EOF

2. 安装MySQL 8.4 LTS

# 添加MySQL官方仓库
sudo dnf install -y https://dev.mysql.com/get/mysql84-community-release-el9-1.noarch.rpm

# 安装MySQL Server
sudo dnf install -y mysql-community-server mysql-community-client

# 查看安装版本
mysql --version
# mysql  Ver 8.4.0 for Linux on x86_64 (MySQL Community Server - GPL)

# 创建数据目录(建议数据和日志分盘)
sudo mkdir -p /data/mysql/{data,logs,binlog,relaylog,tmp}
sudo chown -R mysql:mysql /data/mysql
sudo chmod 750 /data/mysql

3. 生成server-id

MySQL主从复制要求每个节点有唯一的server-id。推荐使用IP地址最后两段作为server-id:

# 主库(192.168.1.100)
# server-id = 1100

# 从库(192.168.1.101)
# server-id = 1101

三、核心配置

1. 主库配置

创建主库配置文件 /etc/my.cnf:

[mysqld]
# 基础配置
user = mysql
port = 3306
basedir = /usr
datadir = /data/mysql/data
tmpdir = /data/mysql/tmp
socket = /var/lib/mysql/mysql.sock
pid-file = /var/run/mysqld/mysqld.pid

# 字符集配置
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect = 'SET NAMES utf8mb4'

# 主从复制核心配置
server-id = 1100
log-bin = /data/mysql/binlog/mysql-bin
binlog_format = ROW
binlog_row_image = FULL
max_binlog_size = 1G
binlog_expire_logs_seconds = 604800
sync_binlog = 1

# GTID配置(强烈推荐)
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_gtid_simple_recovery = ON

# 半同步复制配置
# MySQL 8.0.26+使用新参数名
rpl_semi_sync_source_enabled = ON
rpl_semi_sync_source_timeout = 10000
rpl_semi_sync_source_wait_for_replica_count = 1
rpl_semi_sync_source_wait_point = AFTER_SYNC

# InnoDB配置
innodb_buffer_pool_size = 8G
innodb_buffer_pool_instances = 8
innodb_log_file_size = 2G
innodb_log_buffer_size = 64M
innodb_flush_log_at_trx_commit = 1
innodb_flush_method = O_DIRECT
innodb_file_per_table = ON
innodb_io_capacity = 4000
innodb_io_capacity_max = 8000
innodb_read_io_threads = 8
innodb_write_io_threads = 8

# 连接配置
max_connections = 2000
max_connect_errors = 100000
wait_timeout = 600
interactive_timeout = 600

# 日志配置
log_error = /data/mysql/logs/error.log
slow_query_log = ON
slow_query_log_file = /data/mysql/logs/slow.log
long_query_time = 1
log_queries_not_using_indexes = ON

# 性能优化
table_open_cache = 4000
table_definition_cache = 2000
thread_cache_size = 128
sort_buffer_size = 4M
join_buffer_size = 4M
read_buffer_size = 2M
read_rnd_buffer_size = 8M

# 安全配置
local_infile = OFF
skip_symbolic_links = ON
sql_mode = STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION

[client]
socket = /var/lib/mysql/mysql.sock
default-character-set = utf8mb4

[mysql]
socket = /var/lib/mysql/mysql.sock
default-character-set = utf8mb4
prompt = "\\u@\\h [\\d]> "

2. 从库配置

创建从库配置文件 /etc/my.cnf:

[mysqld]
# 基础配置
user = mysql
port = 3306
basedir = /usr
datadir = /data/mysql/data
tmpdir = /data/mysql/tmp
socket = /var/lib/mysql/mysql.sock
pid-file = /var/run/mysqld/mysqld.pid

# 字符集配置
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
init_connect = 'SET NAMES utf8mb4'

# 主从复制核心配置
server-id = 1101
log-bin = /data/mysql/binlog/mysql-bin
binlog_format = ROW
binlog_row_image = FULL
max_binlog_size = 1G
binlog_expire_logs_seconds = 604800
sync_binlog = 1

# 从库开启binlog(用于级联复制或切换为主库)
log_replica_updates = ON

# 中继日志配置
relay_log = /data/mysql/relaylog/relay-bin
relay_log_recovery = ON
relay_log_info_repository = TABLE
relay_log_purge = ON

# GTID配置
gtid_mode = ON
enforce_gtid_consistency = ON
binlog_gtid_simple_recovery = ON

# 半同步复制配置(从库端)
rpl_semi_sync_replica_enabled = ON

# 并行复制配置(大幅降低复制延迟)
replica_parallel_type = LOGICAL_CLOCK
replica_parallel_workers = 16
replica_preserve_commit_order = ON
binlog_transaction_dependency_tracking = WRITESET
transaction_write_set_extraction = XXHASH64

# 复制过滤(按需配置)
# replicate_do_db = app_db
# replicate_ignore_db = test
# replicate_wild_do_table = app_%.%
# replicate_wild_ignore_table = %.tmp_%

# 从库只读配置
read_only = ON
super_read_only = ON

# InnoDB配置
innodb_buffer_pool_size = 8G
innodb_buffer_pool_instances = 8
innodb_log_file_size = 2G
innodb_log_buffer_size = 64M
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_file_per_table = ON
innodb_io_capacity = 4000
innodb_io_capacity_max = 8000
innodb_read_io_threads = 8
innodb_write_io_threads = 8

# 连接配置
max_connections = 2000
max_connect_errors = 100000
wait_timeout = 600
interactive_timeout = 600

# 日志配置
log_error = /data/mysql/logs/error.log
slow_query_log = ON
slow_query_log_file = /data/mysql/logs/slow.log
long_query_time = 1
log_queries_not_using_indexes = ON

# 性能优化
table_open_cache = 4000
table_definition_cache = 2000
thread_cache_size = 128
sort_buffer_size = 4M
join_buffer_size = 4M
read_buffer_size = 2M
read_rnd_buffer_size = 8M

# 安全配置
local_infile = OFF
skip_symbolic_links = ON
sql_mode = STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_ENGINE_SUBSTITUTION

[client]
socket = /var/lib/mysql/mysql.sock
default-character-set = utf8mb4

[mysql]
socket = /var/lib/mysql/mysql.sock
default-character-set = utf8mb4
prompt = "\\u@\\h [\\d]> "

四、启动和验证

1. 初始化并启动主库

# 初始化数据目录
sudo mysqld --initialize --user=mysql --datadir=/data/mysql/data

# 获取临时密码
sudo grep 'temporary password' /data/mysql/logs/error.log
# 2024-01-15T10:30:45.123456Z 6 [Note] [MY-010454] [Server] A temporary password is generated for root@localhost: Abc123!@#xyz

# 启动MySQL服务
sudo systemctl start mysqld
sudo systemctl enable mysqld

# 修改root密码并进行安全配置
mysql -uroot -p'Abc123!@#xyz' --connect-expired-password << 'EOF'
ALTER USER 'root'@'localhost' IDENTIFIED BY 'YourStrongPassword@2024';
DELETE FROM mysql.user WHERE User='';
DROP DATABASE IF EXISTS test;
DELETE FROM mysql.db WHERE Db='test' OR Db='test\_%';
FLUSH PRIVILEGES;
EOF

# 安装半同步复制插件
mysql -uroot -p'YourStrongPassword@2024' << 'EOF'
INSTALL PLUGIN rpl_semi_sync_source SONAME 'semisync_source.so';
SET GLOBAL rpl_semi_sync_source_enabled = ON;
SHOW PLUGINS WHERE Name LIKE '%semi%';
EOF

2. 创建复制用户

在主库上创建专用的复制用户:

-- 登录主库
mysql -uroot -p'YourStrongPassword@2024'

-- 创建复制用户
CREATE USER 'repl_user'@'192.168.1.%' IDENTIFIED WITH caching_sha2_password BY 'ReplPassword@2024';

-- 授予复制权限
GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'repl_user'@'192.168.1.%';

-- 如果使用Clone插件初始化从库,还需要额外权限
GRANT BACKUP_ADMIN ON *.* TO 'repl_user'@'192.168.1.%';

FLUSH PRIVILEGES;

-- 查看主库状态
SHOW MASTER STATUS\G
/*
*************************** 1. row ***************************
             File: mysql-bin.000003
         Position: 856
     Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: 3e11fa47-c5ab-11ee-9c88-000c29123456:1-15
*/

-- 查看GTID执行状态
SELECT @@GLOBAL.GTID_EXECUTED;

3. 初始化并启动从库

A. 方法一:使用Clone插件(推荐,MySQL 8.0.17+)

# 从库执行初始化(空数据目录)
sudo mysqld --initialize --user=mysql --datadir=/data/mysql/data

# 获取临时密码并启动
sudo grep 'temporary password' /data/mysql/logs/error.log
sudo systemctl start mysqld

# 修改root密码
mysql -uroot -p'临时密码' --connect-expired-password << 'EOF'
ALTER USER 'root'@'localhost' IDENTIFIED BY 'YourStrongPassword@2024';
EOF
-- 登录从库
mysql -uroot -p'YourStrongPassword@2024'

-- 安装Clone插件
INSTALL PLUGIN clone SONAME 'mysql_clone.so';

-- 配置克隆源(主库地址)
SET GLOBAL clone_valid_donor_list = '192.168.1.100:3306';

-- 执行克隆操作(会覆盖从库所有数据并重启)
CLONE INSTANCE FROM 'repl_user'@'192.168.1.100':3306 IDENTIFIED BY 'ReplPassword@2024';

-- 克隆完成后MySQL会自动重启,重新登录
mysql -uroot -p'YourStrongPassword@2024'

-- 安装半同步复制插件
INSTALL PLUGIN rpl_semi_sync_replica SONAME 'semisync_replica.so';
SET GLOBAL rpl_semi_sync_replica_enabled = ON;

B. 方法二:使用mysqldump(适合小型数据库)

# 在主库执行导出
mysqldump -uroot -p'YourStrongPassword@2024' \
  --all-databases \
  --single-transaction \
  --routines \
  --triggers \
  --events \
  --set-gtid-purged=ON \
  --source-data=2 \
  > /tmp/full_backup.sql

# 传输到从库
scp /tmp/full_backup.sql slave:/tmp/

# 在从库导入
mysql -uroot -p'YourStrongPassword@2024' < /tmp/full_backup.sql

4. 配置从库复制

-- 登录从库
mysql -uroot -p'YourStrongPassword@2024'

-- 配置复制(使用GTID模式)
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = '192.168.1.100',
  SOURCE_PORT = 3306,
  SOURCE_USER = 'repl_user',
  SOURCE_PASSWORD = 'ReplPassword@2024',
  SOURCE_AUTO_POSITION = 1,
  SOURCE_CONNECT_RETRY = 10,
  SOURCE_RETRY_COUNT = 3,
  GET_SOURCE_PUBLIC_KEY = 1;

-- 启动复制
START REPLICA;

-- 查看复制状态
SHOW REPLICA STATUS\G

5. 验证复制状态

-- 在从库执行,检查关键指标
SHOW REPLICA STATUS\G
/*
关键字段说明:
Replica_IO_Running: Yes          -- I/O线程运行状态
Replica_SQL_Running: Yes         -- SQL线程运行状态
Seconds_Behind_Source: 0         -- 复制延迟(秒)
Retrieved_Gtid_Set: ...          -- 已接收的GTID集合
Executed_Gtid_Set: ...           -- 已执行的GTID集合
Auto_Position: 1                 -- GTID自动定位
Source_SSL_Allowed: Yes          -- SSL状态
*/

-- 检查半同步状态(主库)
SHOW STATUS LIKE 'Rpl_semi_sync%';
/*
Rpl_semi_sync_source_status     | ON
Rpl_semi_sync_source_clients    | 1
Rpl_semi_sync_source_yes_tx     | 156
Rpl_semi_sync_source_no_tx      | 0
*/

-- 检查半同步状态(从库)
SHOW STATUS LIKE 'Rpl_semi_sync%';
/*
Rpl_semi_sync_replica_status    | ON
*/

6. 测试数据同步

-- 主库创建测试数据
CREATE DATABASE test_repl;
USE test_repl;

CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(50) NOT NULL,
    email VARCHAR(100),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;

INSERT INTO users (username, email) VALUES
    ('user1', 'user1@example.com'),
    ('user2', 'user2@example.com'),
    ('user3', 'user3@example.com');

-- 从库验证数据
USE test_repl;
SELECT * FROM users;
/*
+----+----------+-------------------+---------------------+
| id | username | email             | created_at          |
+----+----------+-------------------+---------------------+
|  1 | user1    | user1@example.com | 2024-01-15 10:45:30 |
|  2 | user2    | user2@example.com | 2024-01-15 10:45:30 |
|  3 | user3    | user2@example.com | 2024-01-15 10:45:30 |
+----+----------+-------------------+---------------------+
*/

五、架构模式

1. 一主多从架构

生产环境中常见的一主多从架构,通常配置2-3个从库分担读压力:

graph TD
    M[Master<br/>192.168.1.100] --> S1[Slave1<br/>192.168.1.101<br/>读请求]
    M --> S2[Slave2<br/>192.168.1.102<br/>读请求]
    M --> S3[Slave3<br/>192.168.1.103<br/>备份专用]

mermaid

一主多从架构

从库1和从库2的配置基本相同,从库3作为备份专用,可配置为延迟从库:

# 从库3配置(延迟复制,用于误操作恢复)
[mysqld]
# ...其他配置同上...

# 延迟复制配置:延迟1小时
# 在CHANGE REPLICATION SOURCE时指定
# SOURCE_DELAY = 3600

配置延迟从库:

CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = '192.168.1.100',
  SOURCE_PORT = 3306,
  SOURCE_USER = 'repl_user',
  SOURCE_PASSWORD = 'ReplPassword@2024',
  SOURCE_AUTO_POSITION = 1,
  SOURCE_DELAY = 3600;  -- 延迟1小时

START REPLICA;

2. 级联复制架构

当从库数量较多时,可采用级联复制减轻主库压力:

graph TD
    M[Master<br/>192.168.1.100] --> RS[Relay Slave<br/>192.168.1.101]
    RS --> S2[Slave2<br/>192.168.1.102]
    RS --> S3[Slave3<br/>192.168.1.103]

mermaid

级联复制架构

中继从库配置要点:

# 中继从库关键配置
[mysqld]
server-id = 1101
log-bin = /data/mysql/binlog/mysql-bin

# 必须开启此选项,将接收到的事务写入自己的binlog
log_replica_updates = ON

# 其他配置同普通从库

六、实际应用案例

1. 电商平台读写分离

某电商平台日活百万,订单系统数据库面临高并发压力。

A. 需求分析

  • 写入QPS:约500
  • 读取QPS:约5000
  • 数据量:200GB
  • 可用性要求:99.99%

B. 架构设计

sequenceDiagram
    participant App as 应用服务器
    participant M as 主库<br/>写操作
    participant S1 as 从库1<br/>读操作
    participant S2 as 从库2<br/>读操作

    App->>M: 写入订单
    M-->>App: 写入成功

    App->>S1: 查询订单
    S1-->>App: 返回数据

    App->>S2: 统计报表
    S2-->>App: 返回结果

mermaid

读写分离流程

C. 应用层配置示例(Python + SQLAlchemy)

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import random

class MySQLRouter:
    """MySQL主从路由器"""

    def __init__(self):
        # 主库连接(写操作)
        self.master = create_engine(
            'mysql+pymysql://app_user:password@192.168.1.100:3306/order_db',
            pool_size=20,
            max_overflow=10,
            pool_pre_ping=True,
            pool_recycle=3600
        )

        # 从库连接池(读操作)
        self.slaves = [
            create_engine(
                'mysql+pymysql://app_user:password@192.168.1.101:3306/order_db',
                pool_size=30,
                max_overflow=20,
                pool_pre_ping=True,
                pool_recycle=3600
            ),
            create_engine(
                'mysql+pymysql://app_user:password@192.168.1.102:3306/order_db',
                pool_size=30,
                max_overflow=20,
                pool_pre_ping=True,
                pool_recycle=3600
            )
        ]

    def get_read_connection(self):
        """获取读连接(负载均衡)"""
        return random.choice(self.slaves)

    def get_write_connection(self):
        """获取写连接"""
        return self.master

# 使用示例
router = MySQLRouter()

# 写操作
with router.get_write_connection().connect() as conn:
    conn.execute(text("INSERT INTO orders (user_id, amount) VALUES (:uid, :amt)"),
                 {"uid": 1001, "amt": 99.99})
    conn.commit()

# 读操作
with router.get_read_connection().connect() as conn:
    result = conn.execute(text("SELECT * FROM orders WHERE user_id = :uid"),
                          {"uid": 1001})
    orders = result.fetchall()

2. 跨机房容灾部署

某金融系统要求同城双活,异地灾备:

graph LR
    subgraph 北京机房
        M[Master<br/>192.168.1.100] --> SS[Sync Slave<br/>半同步复制]
    end

    subgraph 上海机房
        M -->|异步复制<br/>30ms| DR[DR Slave<br/>192.168.2.100]
    end

mermaid

跨机房容灾架构

异地从库特殊配置:

# 异地从库配置
[mysqld]
server-id = 2100

# 压缩传输减少带宽消耗
replica_compressed_protocol = ON

# 适当增大超时时间(跨机房网络延迟)
replica_net_timeout = 120

# 增大中继日志空间
relay_log_space_limit = 20G

# 允许较大的复制延迟
# 通过监控而非配置限制

七、最佳实践

1. 性能优化

A. 并行复制调优

MySQL 8.0的并行复制能显著降低复制延迟,关键参数调优:

-- 查看当前并行复制配置
SHOW VARIABLES LIKE '%replica_parallel%';
SHOW VARIABLES LIKE '%binlog_transaction_dependency%';

-- 推荐配置(从库执行)
SET GLOBAL replica_parallel_type = 'LOGICAL_CLOCK';
SET GLOBAL replica_parallel_workers = 16;  -- 通常设为CPU核数
SET GLOBAL replica_preserve_commit_order = ON;
SET GLOBAL binlog_transaction_dependency_tracking = 'WRITESET';

-- 监控并行复制效果
SELECT * FROM performance_schema.replication_applier_status_by_worker;

B. 网络传输优化

# 主库配置
[mysqld]
# 增大binlog缓存
binlog_cache_size = 4M
binlog_stmt_cache_size = 4M

# 批量发送优化
replica_net_timeout = 60
# 从库配置(跨机房时启用)
[mysqld]
# 启用压缩传输
replica_compressed_protocol = ON

2. 安全加固

A. 网络安全

-- 复制用户仅允许特定IP段访问
CREATE USER 'repl_user'@'192.168.1.0/255.255.255.0'
  IDENTIFIED WITH caching_sha2_password BY 'StrongPassword@2024'
  REQUIRE SSL;  -- 强制SSL连接

GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'192.168.1.0/255.255.255.0';

B. SSL加密复制

# 生成SSL证书
mkdir -p /etc/mysql/ssl
cd /etc/mysql/ssl

# CA证书
openssl genrsa 2048 > ca-key.pem
openssl req -new -x509 -nodes -days 3650 -key ca-key.pem -out ca-cert.pem \
  -subj "/C=CN/ST=Beijing/L=Beijing/O=Example/CN=MySQL-CA"

# 主库证书
openssl req -newkey rsa:2048 -nodes -keyout master-key.pem -out master-req.pem \
  -subj "/C=CN/ST=Beijing/L=Beijing/O=Example/CN=master.example.com"
openssl x509 -req -in master-req.pem -days 3650 -CA ca-cert.pem -CAkey ca-key.pem \
  -set_serial 01 -out master-cert.pem

# 从库证书
openssl req -newkey rsa:2048 -nodes -keyout slave-key.pem -out slave-req.pem \
  -subj "/C=CN/ST=Beijing/L=Beijing/O=Example/CN=slave.example.com"
openssl x509 -req -in slave-req.pem -days 3650 -CA ca-cert.pem -CAkey ca-key.pem \
  -set_serial 02 -out slave-cert.pem

# 设置权限
chown mysql:mysql *.pem
chmod 600 *-key.pem
chmod 644 *-cert.pem ca-cert.pem
# 主库SSL配置
[mysqld]
ssl-ca = /etc/mysql/ssl/ca-cert.pem
ssl-cert = /etc/mysql/ssl/master-cert.pem
ssl-key = /etc/mysql/ssl/master-key.pem
require_secure_transport = ON
-- 从库配置SSL复制
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = '192.168.1.100',
  SOURCE_USER = 'repl_user',
  SOURCE_PASSWORD = 'StrongPassword@2024',
  SOURCE_AUTO_POSITION = 1,
  SOURCE_SSL = 1,
  SOURCE_SSL_CA = '/etc/mysql/ssl/ca-cert.pem',
  SOURCE_SSL_CERT = '/etc/mysql/ssl/slave-cert.pem',
  SOURCE_SSL_KEY = '/etc/mysql/ssl/slave-key.pem',
  SOURCE_SSL_VERIFY_SERVER_CERT = 1;

3. 配置注意事项

  1. server-id必须唯一:整个复制拓扑中,每个MySQL实例的server-id必须不同
  2. GTID模式限制:

    • 不能使用CREATE TABLE ... SELECT语句
    • 不能在事务中混合使用事务表和非事务表
    • 不能使用CREATE TEMPORARY TABLE(事务内)
  3. binlog_format选择:

    • ROW:记录每行数据变化,数据一致性最好,推荐使用
    • STATEMENT:记录SQL语句,binlog体积小,但某些函数可能导致不一致
    • MIXED:自动选择,不推荐

4. 常见错误处理

错误码错误描述原因分析解决方案
1062Duplicate entry for key从库已存在相同主键数据使用pt-table-sync同步数据
1032Can't find record从库缺少要更新/删除的数据检查数据一致性,使用pt-table-checksum校验
1236Binary log position errorbinlog位置无效或已被清理重新初始化从库或使用GTID重建复制
1045Access denied复制用户权限不足检查用户权限和密码
1593Server_id not setserver-id未配置配置唯一的server-id
1794Slave SQL thread retried复制冲突重试检查并发写入或网络问题
13117GTID_PURGED can't be setGTID集合冲突使用RESET MASTER清理后重新配置

八、故障排查和监控

1. 关键监控指标

指标名称说明告警阈值采集方法
Seconds_Behind_Source复制延迟秒数>10s告警,>60s严重SHOW REPLICA STATUS
Replica_IO_RunningI/O线程状态!=Yes告警SHOW REPLICA STATUS
Replica_SQL_RunningSQL线程状态!=Yes告警SHOW REPLICA STATUS
Rpl_semi_sync_source_clients半同步从库数<1告警SHOW STATUS
Rpl_semi_sync_source_no_tx降级异步次数>0告警SHOW STATUS
Relay_Log_Space中继日志大小>10GB告警SHOW REPLICA STATUS

2. 常见问题诊断

A. 复制延迟过大

-- 查看复制延迟
SHOW REPLICA STATUS\G

-- 分析延迟原因
-- 1. 检查从库负载
SHOW PROCESSLIST;
SELECT * FROM performance_schema.threads WHERE TYPE = 'BACKGROUND' AND NAME LIKE '%replica%';

-- 2. 检查是否有大事务
SELECT * FROM performance_schema.events_transactions_current WHERE STATE = 'ACTIVE'\G

-- 3. 检查并行复制效率
SELECT WORKER_ID, LAST_SEEN_TRANSACTION, APPLYING_TRANSACTION, LAST_APPLIED_TRANSACTION
FROM performance_schema.replication_applier_status_by_worker;

-- 4. 检查是否有锁等待
SELECT * FROM performance_schema.data_lock_waits;

B. I/O线程停止

-- 查看I/O线程状态
SHOW REPLICA STATUS\G
-- 关注:Replica_IO_Running, Last_IO_Error, Last_IO_Errno

-- 常见原因:
-- 1. 网络连接问题
-- 2. 复制用户权限问题
-- 3. 主库binlog被清理
-- 4. 主库server-id变更

-- 重新连接测试
STOP REPLICA IO_THREAD;
START REPLICA IO_THREAD;

C. SQL线程停止

-- 查看SQL线程状态
SHOW REPLICA STATUS\G
-- 关注:Replica_SQL_Running, Last_SQL_Error, Last_SQL_Errno

-- 查看最后执行的事务
SELECT * FROM performance_schema.replication_applier_status_by_coordinator\G
SELECT * FROM performance_schema.replication_applier_status_by_worker
WHERE LAST_ERROR_NUMBER != 0\G

-- 数据不一致时的处理
-- 1. 使用pt-table-checksum校验
pt-table-checksum --user=root --password=xxx --host=192.168.1.100 \
  --databases=mydb --tables=mytable

-- 2. 使用pt-table-sync修复
pt-table-sync --print --user=root --password=xxx \
  --sync-to-master h=192.168.1.101,D=mydb,t=mytable

3. 备份与恢复

A. 从库备份策略

利用从库进行备份,不影响主库性能:

#!/bin/bash
# 文件:/opt/scripts/backup_from_slave.sh
# 功能:从库热备份脚本

BACKUP_DIR="/backup/mysql"
DATE=$(date +%Y%m%d_%H%M%S)
MYSQL_USER="backup"
MYSQL_PASS="BackupPass@2024"
RETENTION_DAYS=7

# 创建备份目录
mkdir -p ${BACKUP_DIR}/${DATE}

# 停止复制SQL线程,确保数据一致性
mysql -u${MYSQL_USER} -p${MYSQL_PASS} -e "STOP REPLICA SQL_THREAD;"

# 记录复制位置
mysql -u${MYSQL_USER} -p${MYSQL_PASS} -e "SHOW REPLICA STATUS\G" > ${BACKUP_DIR}/${DATE}/replica_status.txt

# 执行备份
xtrabackup --backup \
  --user=${MYSQL_USER} \
  --password=${MYSQL_PASS} \
  --target-dir=${BACKUP_DIR}/${DATE}/data \
  --slave-info \
  --safe-slave-backup

# 恢复复制
mysql -u${MYSQL_USER} -p${MYSQL_PASS} -e "START REPLICA SQL_THREAD;"

# 压缩备份
cd ${BACKUP_DIR}
tar -czf ${DATE}.tar.gz ${DATE}
rm -rf ${DATE}

# 清理过期备份
find ${BACKUP_DIR} -name "*.tar.gz" -mtime +${RETENTION_DAYS} -delete

echo "Backup completed: ${BACKUP_DIR}/${DATE}.tar.gz"

B. 使用备份重建从库

# 解压备份
cd /backup/mysql
tar -xzf 20240115_100000.tar.gz

# 准备备份
xtrabackup --prepare --target-dir=/backup/mysql/20240115_100000/data

# 停止MySQL
systemctl stop mysqld

# 清空数据目录
rm -rf /data/mysql/data/*

# 恢复数据
xtrabackup --copy-back --target-dir=/backup/mysql/20240115_100000/data

# 修改权限
chown -R mysql:mysql /data/mysql/data

# 启动MySQL
systemctl start mysqld

# 配置复制(从xtrabackup_slave_info获取位置信息)
cat /backup/mysql/20240115_100000/data/xtrabackup_slave_info
# CHANGE REPLICATION SOURCE TO SOURCE_AUTO_POSITION = 1

mysql -uroot -p << 'EOF'
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = '192.168.1.100',
  SOURCE_USER = 'repl_user',
  SOURCE_PASSWORD = 'ReplPassword@2024',
  SOURCE_AUTO_POSITION = 1;

START REPLICA;
EOF

九、总结

1. 技术要点回顾

  1. GTID是现代MySQL复制的标配:简化故障切换,避免手动指定binlog位置
  2. 半同步复制保障数据安全:rpl_semi_sync_source_wait_point=AFTER_SYNC配合超时设置,在性能和安全间取得平衡
  3. 并行复制大幅降低延迟:WRITESET依赖追踪配合多个worker线程,将复制延迟从分钟级降到秒级
  4. Clone插件简化从库初始化:无需依赖外部工具,自动完成数据同步和复制配置
  5. 监控告警是高可用基础:复制延迟、线程状态、半同步状态必须纳入监控体系

2. 进阶学习方向

  • MySQL Group Replication:多主架构,自动故障转移
  • MySQL InnoDB Cluster:官方高可用解决方案
  • MySQL Router:自动读写分离和故障切换
  • Orchestrator:复制拓扑管理和自动化故障恢复
  • ProxySQL:高性能MySQL代理

3. 命令速查表

命令说明
SHOW MASTER STATUS查看主库binlog位置
SHOW REPLICA STATUS\G查看从库复制状态
START REPLICA启动复制
STOP REPLICA停止复制
RESET REPLICA ALL清除所有复制配置
SHOW BINARY LOGS查看binlog文件列表
PURGE BINARY LOGS TO 'file'清理指定文件之前的binlog
SELECT @@GLOBAL.GTID_EXECUTED查看已执行的GTID集合
SHOW PROCESSLIST查看当前连接和复制线程
SHOW VARIABLES LIKE '%replica%'查看复制相关配置

4. 术语表

术语英文说明
主库Master/Source提供写入的数据库节点
从库Slave/Replica复制主库数据的数据库节点
二进制日志Binary Log记录数据变更的日志文件
中继日志Relay Log从库缓存主库binlog的日志文件
GTIDGlobal Transaction ID全局事务标识符
半同步复制Semi-sync Replication主库等待从库确认的复制模式
异步复制Async Replication主库不等待从库确认的复制模式
复制延迟Replication Lag从库数据落后主库的时间
I/O线程I/O Thread从库接收binlog的线程
SQL线程SQL Thread从库回放事务的线程

参考资料

  1. MySQL主从复制搭建实战:数据同步零丢失配置指南
最后修改:2026 年 01 月 15 日
如果觉得我的文章对你有用,请随意赞赏