Compare commits

...

4 Commits

10 changed files with 282 additions and 16 deletions

.gitignore vendored

@@ -3,4 +3,4 @@ venv/
export/
*_host_ids.txt
*.log
partitioning/tests/
backup/


@@ -0,0 +1,60 @@
# Code Documentation: ZabbixPartitioner
## Class: ZabbixPartitioner
### Core Methods
#### `__init__(self, config: Dict[str, Any], dry_run: bool = False)`
Initializes the partitioner with configuration and runtime mode.
- **config**: Dictionary containing database connection and partitioning rules.
- **dry_run**: If True, SQL queries are logged but not executed.
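
The `dry_run` behaviour can be pictured with a minimal sketch (function name and signature are illustrative, not the script's exact code):

```python
import logging

def execute_query(cursor, sql: str, dry_run: bool = False) -> bool:
    """Log the statement; execute it only when not in dry-run mode (sketch)."""
    logging.getLogger("partitioner").info("SQL: %s", sql)
    if dry_run:
        return False  # logged only, nothing sent to the database
    cursor.execute(sql)
    return True
```

In dry-run mode the operator can review every generated `ALTER TABLE` before committing to a live run.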
#### `connect_db(self)`
Context manager for database connections.
- Handles connection lifecycle (open/close).
- Sets strict session variables:
- `wait_timeout = 86400` (24h) to prevent timeouts during long operations.
- `sql_log_bin = 0` (if configured) to prevent replication of partitioning commands.
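
A minimal sketch of such a context manager, assuming a pymysql-style connection object (`connect_fn` stands in for the real connection factory; the session variables follow the behaviour described above):

```python
from contextlib import contextmanager

@contextmanager
def connect_db(connect_fn, disable_binlog: bool = False):
    conn = connect_fn()
    try:
        with conn.cursor() as cur:
            # Long ALTER TABLE runs must survive the whole session
            cur.execute("SET SESSION wait_timeout = 86400")
            if disable_binlog:
                # Keep partition DDL out of the binlog (no replication)
                cur.execute("SET SESSION sql_log_bin = 0")
        yield conn
    finally:
        conn.close()  # runs even if the caller's block raises
```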
#### `run(self, mode: str)`
Main entry point for execution.
- **mode**:
- `'init'`: Initial setup. Calls `initialize_partitioning`.
- `'maintenance'` (default): Routine operation. Calls `create_future_partitions` and `drop_old_partitions`.
### Logic Methods
#### `initialize_partitioning(table: str, period: str, premake: int, retention_str: str)`
Converts a standard table to a partitioned table.
- **Strategies** (via `initial_partitioning_start` config):
- `retention`: Starts from (Now - Retention). Creates `p_archive` for older data. FAST.
- `db_min`: Queries `SELECT MIN(clock)`. PRECISE but SLOW.
#### `create_future_partitions(table: str, period: str, premake: int)`
Ensures sufficient future partitions exist.
- Calculates required partitions based on current time + `premake` count.
- Checks `information_schema` for existing partitions.
- Adds missing partitions using `ALTER TABLE ... ADD PARTITION`.
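
The generated statement looks roughly like this (helper name and boundary value are illustrative, not taken from the script):

```python
def add_partition_sql(table: str, name: str, boundary_expr: str) -> str:
    """Build an ALTER TABLE ... ADD PARTITION statement (sketch)."""
    return (f"ALTER TABLE `{table}` ADD PARTITION "
            f"(PARTITION {name} VALUES LESS THAN ({boundary_expr}))")
```

For example, `add_partition_sql("history", "p2024_01_02", "1704153600")` yields a statement adding a daily partition bounded by that Unix timestamp.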
#### `drop_old_partitions(table: str, period: str, retention_str: str)`
Removes partitions older than the retention period.
- Parses partition names (e.g., `p2023_01_01`) to extract their date.
- Compares against the calculated retention cutoff date.
- Drops qualifying partitions using `ALTER TABLE ... DROP PARTITION`.
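
Parsing those names back into dates can be done with `datetime.strptime` (a sketch; the script's own parser may differ):

```python
from datetime import datetime

def partition_date(name: str, period: str) -> datetime:
    """'p2023_01_01' (daily) or 'p2023_01' (monthly) -> start datetime."""
    fmt = 'p%Y_%m_%d' if period == 'daily' else 'p%Y_%m'
    return datetime.strptime(name, fmt)
```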
### Helper Methods
#### `get_table_min_clock(table: str) -> Optional[datetime]`
- Queries the table for the oldest timestamp. Used in `db_min` initialization strategy.
#### `has_incompatible_primary_key(table: str) -> bool`
- **Safety Critical**: Verifies that the table's Primary Key includes the `clock` column.
- Returns `True` if incompatible (prevents partitioning to avoid MySQL errors).
#### `get_partition_name(dt: datetime, period: str) -> str`
- Generates standard partition names:
- Daily: `pYYYY_MM_DD`
- Monthly: `pYYYY_MM`
#### `get_partition_description(dt: datetime, period: str) -> str`
- Generates the `VALUES LESS THAN` expression for the partition (Start of NEXT period).
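
The two helpers above can be sketched as follows (illustrative, assuming naive local datetimes):

```python
from datetime import datetime, timedelta

def get_partition_name(dt: datetime, period: str) -> str:
    """pYYYY_MM_DD for daily, pYYYY_MM for monthly."""
    return dt.strftime('p%Y_%m_%d' if period == 'daily' else 'p%Y_%m')

def next_period_start(dt: datetime, period: str) -> datetime:
    """Start of the NEXT period: the partition's VALUES LESS THAN boundary."""
    if period == 'daily':
        return (dt + timedelta(days=1)).replace(hour=0, minute=0,
                                                second=0, microsecond=0)
    # monthly: first day of the following month
    year, month = (dt.year + 1, 1) if dt.month == 12 else (dt.year, dt.month + 1)
    return dt.replace(year=year, month=month, day=1,
                      hour=0, minute=0, second=0, microsecond=0)
```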


@@ -0,0 +1,39 @@
# Refactoring Notes: Zabbix Partitioning Script
## Overview
The `zabbix_partitioning.py` script has been significantly refactored to improve maintainability, reliability, and compatibility with modern Zabbix versions (7.x).
## Key Changes
### 1. Architecture: Class-Based Structure
- **Old**: Procedural script with global variables and scattered logic.
- **New**: Encapsulated in a `ZabbixPartitioner` class.
- **Purpose**: Improves modularity, testability, and state management. Allows the script to be easily imported or extended.
### 2. Database Connection Management
- **Change**: Implemented `contextlib.contextmanager` for database connections.
- **Purpose**: Ensures database connections are robustly opened and closed, even if errors occur. Handles `wait_timeout` and binary logging settings automatically for every session.
### 3. Logging
- **Change**: Replaced custom `print` statements with Python's standard `logging` module.
- **Purpose**:
- Allows consistent log formatting.
- Supports configurable output destinations (Console vs Syslog) via the config file.
- Granular log levels (INFO for standard ops, DEBUG for SQL queries).
### 4. Configuration Handling
- **Change**: Improved validation and parsing of the YAML configuration.
- **Purpose**:
- Removed unused parameters (e.g., `timezone`, as the script relies on system local time).
- Added support for custom database ports (critical for non-standard deployments or containerized tests).
- Explicitly handles the `replicate_sql` flag to control binary logging (it is now integrated into the partitioning logic).
### 5. Type Safety
- **Change**: Added comprehensive Python type hinting (e.g., `List`, `Dict`, `Optional`).
- **Purpose**: Makes the code self-documenting and allows IDEs/linters to catch potential errors before execution.
### 6. Zabbix 7.x Compatibility
- **Change**: Added logic to verify Zabbix database version and schema requirements.
- **Purpose**:
- Checks `dbversion` table.
- **Critical**: Validates that target tables have the `clock` column as part of their Primary Key before attempting partitioning, preventing potential data corruption or MySQL errors.
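
One way to implement that check (the query below is an assumption, not necessarily the script's exact SQL) is via `information_schema.KEY_COLUMN_USAGE`, since MySQL requires every unique key of a partitioned table to include the partitioning column:

```python
PK_CHECK_SQL = (
    "SELECT COUNT(*) FROM information_schema.KEY_COLUMN_USAGE "
    "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s "
    "AND CONSTRAINT_NAME = 'PRIMARY' AND COLUMN_NAME = 'clock'"
)

def has_incompatible_primary_key(cursor, schema: str, table: str) -> bool:
    """True when `clock` is NOT part of the PK, i.e. partitioning is unsafe."""
    cursor.execute(PK_CHECK_SQL, (schema, table))
    (count,) = cursor.fetchone()
    return count == 0
```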


@@ -40,6 +40,14 @@ logging: syslog
# premake: Number of partitions to create in advance
premake: 10

# initial_partitioning_start: Strategy for the first partition during initialization (--init).
# Options:
#   db_min:    (Default) Queries SELECT MIN(clock) to ensure ALL data is covered. Can be slow on huge tables.
#   retention: Starts partitioning from (Now - Retention Period).
#              Creates a 'p_archive' partition for all data older than retention.
#              Much faster, as it skips the MIN(clock) query. (Recommended for large DBs)
initial_partitioning_start: db_min

# replicate_sql: False - Disable binary logging. Partitioning changes are NOT replicated to slaves (use for independent maintenance).
# replicate_sql: True  - Enable binary logging. Partitioning changes ARE replicated to slaves (use for a consistent cluster schema).
replicate_sql: False
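
The retention strings used throughout the config (`7d`, `365d`, ...) have to be turned into a cutoff date somewhere. A hypothetical parser (the real `get_lookback_date` may accept more units):

```python
import re
from datetime import datetime, timedelta
from typing import Optional

def get_lookback_date(retention: str, now: Optional[datetime] = None) -> datetime:
    """Turn '7d' into (now - 7 days); this sketch supports days only."""
    m = re.fullmatch(r'(\d+)d', retention)
    if not m:
        raise ValueError(f"Unsupported retention format: {retention!r}")
    return (now or datetime.now()) - timedelta(days=int(m.group(1)))
```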


@@ -371,7 +371,7 @@ class ZabbixPartitioner:
        for name in to_drop:
            self.execute_query(f"ALTER TABLE `{table}` DROP PARTITION {name}")

    def initialize_partitioning(self, table: str, period: str, premake: int):
    def initialize_partitioning(self, table: str, period: str, premake: int, retention_str: str):
        """Initial partitioning for a table (convert regular table to partitioned)."""
        self.logger.info(f"Initializing partitioning for {table}")
@@ -384,7 +384,20 @@ class ZabbixPartitioner:
        self.logger.info(f"Table {table} is already partitioned.")
        return

        # Check for data
        init_strategy = self.config.get('initial_partitioning_start', 'db_min')
        start_dt = None
        p_archive_ts = None
        if init_strategy == 'retention':
            self.logger.info(f"Strategy 'retention': Calculating start date from retention ({retention_str})")
            retention_date = self.get_lookback_date(retention_str)
            # Start granular partitions from the retention date
            start_dt = self.truncate_date(retention_date, period)
            # Create a catch-all for anything older
            p_archive_ts = int(start_dt.timestamp())
        else:
            # Default 'db_min' strategy
            self.logger.info("Strategy 'db_min': Querying table for minimum clock (may be slow)")
            min_clock = self.get_table_min_clock(table)
            if not min_clock:
@@ -392,8 +405,6 @@ class ZabbixPartitioner:
                start_dt = self.truncate_date(datetime.now(), period)
            else:
                # Table has data.
                # For a safe migration, we usually create a catch-all for old data (p_old) or just start partitions covering existing data.
                # This script's strategy: Create partitions starting from min_clock.
                start_dt = self.truncate_date(min_clock, period)

        # Build list of partitions from start_dt up to NOW + premake
@@ -401,18 +412,35 @@ class ZabbixPartitioner:
        curr = start_dt
        partitions_def = {}

        # If we have an archive partition, add it first
        if p_archive_ts:
            partitions_def['p_archive'] = str(p_archive_ts)
        while curr < target_dt:
            name = self.get_partition_name(curr, period)
            desc = self.get_partition_description(curr, period)
            partitions_def[name] = desc
            curr = self.get_next_date(curr, period, 1)

        # Re-doing the loop to be cleaner on types
        parts_sql = []
        for name, timestamp_expr in sorted(partitions_def.items()):
            parts_sql.append(PARTITION_TEMPLATE % (name, timestamp_expr))

        # 1. Archive Partition
        if p_archive_ts:
            parts_sql.append(f"PARTITION p_archive VALUES LESS THAN ({p_archive_ts}) ENGINE = InnoDB")

        # 2. Granular Partitions
        # We need to iterate again from start_dt
        curr = start_dt
        while curr < target_dt:
            name = self.get_partition_name(curr, period)
            desc_date_str = self.get_partition_description(curr, period)  # Returns "YYYY-MM-DD HH:MM:SS"
            parts_sql.append(PARTITION_TEMPLATE % (name, desc_date_str))
            curr = self.get_next_date(curr, period, 1)

        query = f"ALTER TABLE `{table}` PARTITION BY RANGE (`clock`) (\n" + ",\n".join(parts_sql) + "\n)"
        self.logger.info(f"Applying initial partitioning to {table} ({len(partitions_def)} partitions)")
        self.logger.info(f"Applying initial partitioning to {table} ({len(parts_sql)} partitions)")
        self.execute_query(query)

    def run(self, mode: str):
@@ -437,7 +465,7 @@ class ZabbixPartitioner:
                retention = item[table]
                if mode == 'init':
                    self.initialize_partitioning(table, period, premake)
                    self.initialize_partitioning(table, period, premake, retention)
                else:
                    # Maintenance mode (Add new, remove old)
                    self.create_future_partitions(table, period, premake)


@@ -0,0 +1,36 @@
# Zabbix Partitioning Tests
This directory contains a Docker-based test environment for the Zabbix Partitioning script.
## Prerequisites
- Docker & Docker Compose
- Python 3
## Setup & Run
1. Start the database container:
```bash
docker compose up -d
```
This will start a MySQL 8.0 container and import the Zabbix schema.
2. Use the provided config:
   The `test_config.yaml` already references the running container.
3. Run the partitioning script:
```bash
# Create virtual environment if needed
python3 -m venv venv
./venv/bin/pip install pymysql pyyaml
# Dry Run
./venv/bin/python3 ../../partitioning/zabbix_partitioning.py -c test_config.yaml --dry-run --init
# Live Run
./venv/bin/python3 ../../partitioning/zabbix_partitioning.py -c test_config.yaml --init
```
## Cleanup
```bash
docker compose down
rm -rf venv
```


@@ -0,0 +1,14 @@
services:
  zabbix-db:
    image: mysql:8.0
    container_name: zabbix-partition-test
    environment:
      MYSQL_ROOT_PASSWORD: root_password
      MYSQL_DATABASE: zabbix
      MYSQL_USER: zbx_part
      MYSQL_PASSWORD: zbx_password
    volumes:
      - ../../partitioning/schemas/70-schema-mysql.txt:/docker-entrypoint-initdb.d/schema.sql
    ports:
      - "33060:3306"
    command: --default-authentication-plugin=mysql_native_password


@@ -0,0 +1,31 @@
import re


def get_partitionable_tables(schema_path):
    with open(schema_path, 'r', encoding='utf-8', errors='ignore') as f:
        content = f.read()

    # Split into CREATE TABLE statements
    tables = content.split('CREATE TABLE')
    valid_tables = []
    for table_def in tables:
        # Extract table name
        name_match = re.search(r'`(\w+)`', table_def)
        if not name_match:
            continue
        table_name = name_match.group(1)

        # Check for PRIMARY KEY definition
        pk_match = re.search(r'PRIMARY KEY \((.*?)\)', table_def, re.DOTALL)
        if pk_match:
            pk_cols = pk_match.group(1)
            if 'clock' in pk_cols:
                valid_tables.append(table_name)
    return valid_tables


if __name__ == '__main__':
    tables = get_partitionable_tables('/opt/git/Zabbix/partitioning/70-schema-mysql.txt')
    print("Partitionable tables (PK contains 'clock'):")
    for t in tables:
        print(f"  - {t}")


@@ -0,0 +1,25 @@
database:
  type: mysql
  host: 127.0.0.1
  socket:
  user: root
  passwd: root_password
  db: zabbix
  # Port mapping in docker-compose is 33060
  port: 33060

partitions:
  daily:
    - history: 7d
    - history_uint: 7d
    - history_str: 7d
    - history_log: 7d
    - history_text: 7d
    - history_bin: 7d
    - trends: 365d
    - trends_uint: 365d

logging: console
premake: 2
replicate_sql: False
initial_partitioning_start: retention


@@ -0,0 +1,25 @@
import sys
import time

import pymysql

config = {
    'host': '127.0.0.1',
    'port': 33060,
    'user': 'root',
    'password': 'root_password',
    'database': 'zabbix',
}

max_retries = 90
for i in range(max_retries):
    try:
        conn = pymysql.connect(**config)
        print("Database is ready!")
        conn.close()
        sys.exit(0)
    except Exception as e:
        print(f"Waiting for DB... ({e})")
        time.sleep(2)

print("Timeout waiting for DB")
sys.exit(1)