Compare commits

4 commits: ecae2e0484 ... b1595ee9af

| SHA1 |
|---|
| b1595ee9af |
| cecd55cd3d |
| 259340df46 |
| 59cd724959 |
.gitignore (vendored, 2 changes)

@@ -3,4 +3,4 @@ venv/
 export/
 *_host_ids.txt
 *.log
-partitioning/tests/
+backup/
partitioning/CODE_DOCUMENTATION.md (new file, 60 lines)

@@ -0,0 +1,60 @@
# Code Documentation: ZabbixPartitioner

## Class: ZabbixPartitioner

### Core Methods

#### `__init__(self, config: Dict[str, Any], dry_run: bool = False)`

Initializes the partitioner with configuration and runtime mode.

- **config**: Dictionary containing database connection and partitioning rules.
- **dry_run**: If True, SQL queries are logged but not executed.

#### `connect_db(self)`

Context manager for database connections.

- Handles the connection lifecycle (open/close).
- Sets strict session variables:
  - `wait_timeout = 86400` (24h) to prevent timeouts during long operations.
  - `sql_log_bin = 0` (if configured) to prevent replication of partitioning commands.
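The lifecycle described above can be sketched with `contextlib.contextmanager`. This is an illustrative mock, not the script's actual code: `FakeConnection` stands in for the real driver connection (the real method would call something like `pymysql.connect`), and the config keys are assumptions based on this diff.

```python
from contextlib import contextmanager

class FakeConnection:
    """Stand-in for a real MySQL connection (illustration only)."""
    def __init__(self):
        self.executed = []
        self.closed = False
    def execute(self, sql):
        self.executed.append(sql)
    def close(self):
        self.closed = True

@contextmanager
def connect_db(config):
    """Open a connection, apply strict session settings, always close it."""
    conn = FakeConnection()  # a real script would call pymysql.connect(...)
    try:
        conn.execute("SET SESSION wait_timeout = 86400")  # survive long ALTERs
        if not config.get('replicate_sql', True):
            conn.execute("SET SESSION sql_log_bin = 0")   # keep DDL out of the binlog
        yield conn
    finally:
        conn.close()  # runs even if the body raises

with connect_db({'replicate_sql': False}) as conn:
    pass
print(conn.closed)  # True
```

The `finally` block is what makes the manager robust: the connection is closed whether the partitioning work succeeds or raises.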
#### `run(self, mode: str)`

Main entry point for execution.

- **mode**:
  - `'init'`: Initial setup. Calls `initialize_partitioning`.
  - `'maintenance'` (default): Routine operation. Calls `create_future_partitions` and `drop_old_partitions`.

### Logic Methods

#### `initialize_partitioning(table: str, period: str, premake: int, retention_str: str)`

Converts a standard table to a partitioned table.

- **Strategies** (via `initial_partitioning_start` config):
  - `retention`: Starts from (Now - Retention). Creates `p_archive` for older data. FAST.
  - `db_min`: Queries `SELECT MIN(clock)`. PRECISE but SLOW.

#### `create_future_partitions(table: str, period: str, premake: int)`

Ensures sufficient future partitions exist.

- Calculates required partitions based on the current time + `premake` count.
- Checks `information_schema` for existing partitions.
- Adds missing partitions using `ALTER TABLE ... ADD PARTITION`.

#### `drop_old_partitions(table: str, period: str, retention_str: str)`

Removes partitions older than the retention period.

- Parses partition names (e.g., `p2023_01_01`) to extract their date.
- Compares against the calculated retention cutoff date.
- Drops qualifying partitions using `ALTER TABLE ... DROP PARTITION`.
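The name parsing step can be illustrated with `datetime.strptime`. This is a hypothetical helper matching the naming scheme documented here (`p2023_01_01` daily, `p2023_01` monthly); the actual implementation may differ.

```python
from datetime import datetime

def parse_partition_date(name):
    """Extract the date encoded in a partition name; None for e.g. `p_archive`."""
    for fmt in ("p%Y_%m_%d", "p%Y_%m"):  # try daily first, then monthly
        try:
            return datetime.strptime(name, fmt)
        except ValueError:
            continue
    return None

print(parse_partition_date("p2023_01_01"))  # 2023-01-01 00:00:00
```

Names that encode no date (such as the `p_archive` catch-all) return `None` and would therefore never match the retention cutoff, which is the safe behavior for a drop routine.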
### Helper Methods

#### `get_table_min_clock(table: str) -> Optional[datetime]`

- Queries the table for the oldest timestamp. Used in the `db_min` initialization strategy.

#### `has_incompatible_primary_key(table: str) -> bool`

- **Safety Critical**: Verifies that the table's Primary Key includes the `clock` column.
- Returns `True` if incompatible (prevents partitioning to avoid MySQL errors).
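The primary-key check can be sketched against `information_schema.KEY_COLUMN_USAGE`. This is a hypothetical version taking a DB-API cursor plus explicit schema/table names; the real method presumably resolves those from its own connection and config.

```python
def has_incompatible_primary_key(cursor, schema, table):
    """True if the table's PRIMARY KEY does not include `clock`.

    MySQL requires the partitioning column to be part of every unique key,
    so attempting to partition such a table fails with an error.
    """
    cursor.execute(
        "SELECT COLUMN_NAME FROM information_schema.KEY_COLUMN_USAGE"
        " WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s"
        " AND CONSTRAINT_NAME = 'PRIMARY'",
        (schema, table),
    )
    pk_columns = {row[0] for row in cursor.fetchall()}
    return 'clock' not in pk_columns
```

Running the check before the `ALTER TABLE` is what makes it "safety critical": it turns a mid-migration MySQL error into a clean refusal.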
#### `get_partition_name(dt: datetime, period: str) -> str`

- Generates standard partition names:
  - Daily: `pYYYY_MM_DD`
  - Monthly: `pYYYY_MM`

#### `get_partition_description(dt: datetime, period: str) -> str`

- Generates the `VALUES LESS THAN` expression for the partition (start of the NEXT period).
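Together, the two helpers can be sketched as follows. The behavior is inferred from the naming scheme above and from the diff's comment that the description is a `"YYYY-MM-DD HH:MM:SS"` string (presumably wrapped in `UNIX_TIMESTAMP()` by `PARTITION_TEMPLATE`, since `clock` is an integer column); callers are assumed to pass an already-truncated `dt`.

```python
from datetime import datetime, timedelta

def get_partition_name(dt, period):
    """pYYYY_MM_DD for daily partitions, pYYYY_MM for monthly ones."""
    return dt.strftime("p%Y_%m_%d" if period == 'daily' else "p%Y_%m")

def get_partition_description(dt, period):
    """Start of the NEXT period, formatted for the VALUES LESS THAN clause."""
    if period == 'daily':
        nxt = dt + timedelta(days=1)
    else:
        # First day of the following month (bool adds as 0/1 to the year).
        nxt = datetime(dt.year + (dt.month == 12), dt.month % 12 + 1, 1)
    return nxt.strftime("%Y-%m-%d %H:%M:%S")

print(get_partition_name(datetime(2023, 1, 5), 'daily'))  # p2023_01_05
```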
partitioning/REFACTORING_NOTES.md (new file, 39 lines)

@@ -0,0 +1,39 @@
# Refactoring Notes: Zabbix Partitioning Script

## Overview

The `zabbix_partitioning.py` script has been significantly refactored to improve maintainability, reliability, and compatibility with modern Zabbix versions (7.x).

## Key Changes

### 1. Architecture: Class-Based Structure

- **Old**: Procedural script with global variables and scattered logic.
- **New**: Encapsulated in a `ZabbixPartitioner` class.
- **Purpose**: Improves modularity, testability, and state management. Allows the script to be easily imported or extended.

### 2. Database Connection Management

- **Change**: Implemented `contextlib.contextmanager` for database connections.
- **Purpose**: Ensures database connections are robustly opened and closed, even if errors occur. Handles `wait_timeout` and binary logging settings automatically for every session.

### 3. Logging

- **Change**: Replaced custom `print` statements with Python's standard `logging` module.
- **Purpose**:
  - Allows consistent log formatting.
  - Supports configurable output destinations (console vs. syslog) via the config file.
  - Provides granular log levels (INFO for standard ops, DEBUG for SQL queries).
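A minimal sketch of such a console/syslog switch is below. The function name, the `/dev/log` socket, and the format string are assumptions for illustration; the actual script's handler setup may differ.

```python
import logging
import logging.handlers

def setup_logging(dest='console', debug=False):
    """Route logs to stderr or syslog; DEBUG level exposes SQL queries."""
    logger = logging.getLogger('zabbix_partitioning')
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    if dest == 'syslog':
        # Assumes a local syslog socket at /dev/log (typical on Linux).
        handler = logging.handlers.SysLogHandler(address='/dev/log')
    else:
        handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    logger.addHandler(handler)
    return logger
```

Because both destinations go through the same logger, call sites stay identical regardless of the configured output.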
### 4. Configuration Handling

- **Change**: Improved validation and parsing of the YAML configuration.
- **Purpose**:
  - Removed unused parameters (e.g., `timezone`, as the script relies on system local time).
  - Added support for custom database ports (critical for non-standard deployments or containerized tests).
  - Explicitly handles the `replicate_sql` flag to control binary logging (it is now integrated into the partitioning logic).

### 5. Type Safety

- **Change**: Added comprehensive Python type hinting (e.g., `List`, `Dict`, `Optional`).
- **Purpose**: Makes the code self-documenting and allows IDEs/linters to catch potential errors before execution.

### 6. Zabbix 7.x Compatibility

- **Change**: Added logic to verify Zabbix database version and schema requirements.
- **Purpose**:
  - Checks the `dbversion` table.
  - **Critical**: Validates that target tables have the `clock` column as part of their Primary Key before attempting partitioning, preventing potential data corruption or MySQL errors.
@@ -40,6 +40,14 @@ logging: syslog
 # premake: Number of partitions to create in advance
 premake: 10
 
+# initial_partitioning_start: Strategy for the first partition during initialization (--init).
+# Options:
+#   db_min:    (Default) Queries SELECT MIN(clock) to ensure ALL data is covered. Can be slow on huge tables.
+#   retention: Starts partitioning from (Now - Retention Period).
+#              Creates a 'p_archive' partition for all data older than retention.
+#              Much faster as it skips the MIN(clock) query. (Recommended for large DBs)
+initial_partitioning_start: db_min
+
 # replicate_sql: False - Disable binary logging. Partitioning changes are NOT replicated to slaves (use for independent maintenance).
 # replicate_sql: True  - Enable binary logging. Partitioning changes ARE replicated to slaves (use for a consistent cluster schema).
 replicate_sql: False
@@ -371,7 +371,7 @@ class ZabbixPartitioner:
         for name in to_drop:
             self.execute_query(f"ALTER TABLE `{table}` DROP PARTITION {name}")
 
-    def initialize_partitioning(self, table: str, period: str, premake: int):
+    def initialize_partitioning(self, table: str, period: str, premake: int, retention_str: str):
         """Initial partitioning for a table (convert regular table to partitioned)."""
         self.logger.info(f"Initializing partitioning for {table}")
@@ -384,35 +384,63 @@ class ZabbixPartitioner:
             self.logger.info(f"Table {table} is already partitioned.")
             return
 
-        # Check for data
-        min_clock = self.get_table_min_clock(table)
+        init_strategy = self.config.get('initial_partitioning_start', 'db_min')
+        start_dt = None
+        p_archive_ts = None
 
-        if not min_clock:
-            # Empty table. Start from NOW
-            start_dt = self.truncate_date(datetime.now(), period)
+        if init_strategy == 'retention':
+            self.logger.info(f"Strategy 'retention': Calculating start date from retention ({retention_str})")
+            retention_date = self.get_lookback_date(retention_str)
+            # Start granular partitions from the retention date
+            start_dt = self.truncate_date(retention_date, period)
+            # Create a catch-all for anything older
+            p_archive_ts = int(start_dt.timestamp())
         else:
-            # Table has data.
-            # For a safe migration, we usually create a catch-all for old data (p_old) or just start partitions covering existing data.
-            # This script's strategy: Create partitions starting from min_clock.
-            start_dt = self.truncate_date(min_clock, period)
+            # Default 'db_min' strategy
+            self.logger.info("Strategy 'db_min': Querying table for minimum clock (may be slow)")
+            min_clock = self.get_table_min_clock(table)
+
+            if not min_clock:
+                # Empty table. Start from NOW
+                start_dt = self.truncate_date(datetime.now(), period)
+            else:
+                # Table has data.
+                start_dt = self.truncate_date(min_clock, period)
 
         # Build list of partitions from start_dt up to NOW + premake
         target_dt = self.get_next_date(self.truncate_date(datetime.now(), period), period, premake)
 
         curr = start_dt
-        partitions_def = {}
-
-        # If we have an archive partition, add it first
-        if p_archive_ts:
-            partitions_def['p_archive'] = str(p_archive_ts)
-
-        while curr < target_dt:
-            name = self.get_partition_name(curr, period)
-            desc = self.get_partition_description(curr, period)
-            partitions_def[name] = desc
-            curr = self.get_next_date(curr, period, 1)
-
-        # Re-doing the loop to be cleaner on types
         parts_sql = []
-        for name, timestamp_expr in sorted(partitions_def.items()):
-            parts_sql.append(PARTITION_TEMPLATE % (name, timestamp_expr))
+
+        # 1. Archive Partition
+        if p_archive_ts:
+            parts_sql.append(f"PARTITION p_archive VALUES LESS THAN ({p_archive_ts}) ENGINE = InnoDB")
+
+        # 2. Granular Partitions
+        # We need to iterate again from start_dt
+        curr = start_dt
+        while curr < target_dt:
+            name = self.get_partition_name(curr, period)
+            desc_date_str = self.get_partition_description(curr, period)  # Returns "YYYY-MM-DD HH:MM:SS"
+            parts_sql.append(PARTITION_TEMPLATE % (name, desc_date_str))
+            curr = self.get_next_date(curr, period, 1)
 
         query = f"ALTER TABLE `{table}` PARTITION BY RANGE (`clock`) (\n" + ",\n".join(parts_sql) + "\n)"
-        self.logger.info(f"Applying initial partitioning to {table} ({len(partitions_def)} partitions)")
+        self.logger.info(f"Applying initial partitioning to {table} ({len(parts_sql)} partitions)")
         self.execute_query(query)
 
     def run(self, mode: str):
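The `retention_str` values come from the config (`'7d'`, `'365d'`) and feed `get_lookback_date` above. A rough sketch of such a parser, assuming day-based retention strings (the real helper may accept more units):

```python
import re
from datetime import datetime, timedelta

def get_lookback_date(retention_str, now=None):
    """Convert a retention string like '7d' into the cutoff datetime (now - N days)."""
    m = re.fullmatch(r'(\d+)d', retention_str.strip())
    if not m:
        raise ValueError(f"Unsupported retention format: {retention_str!r}")
    return (now or datetime.now()) - timedelta(days=int(m.group(1)))

print(get_lookback_date('7d', now=datetime(2024, 1, 10)))  # 2024-01-03 00:00:00
```

Under the `retention` strategy this cutoff becomes both the start of the granular partitions and the upper bound of the `p_archive` catch-all.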
@@ -437,7 +465,7 @@ class ZabbixPartitioner:
             retention = item[table]
 
             if mode == 'init':
-                self.initialize_partitioning(table, period, premake)
+                self.initialize_partitioning(table, period, premake, retention)
             else:
                 # Maintenance mode (Add new, remove old)
                 self.create_future_partitions(table, period, premake)
zabbix-tests/partitioning/README.md (new file, 36 lines)

@@ -0,0 +1,36 @@
# Zabbix Partitioning Tests

This directory contains a Docker-based test environment for the Zabbix partitioning script.

## Prerequisites

- Docker & Docker Compose
- Python 3

## Setup & Run

1. Start the database container:

   ```bash
   docker compose up -d
   ```

   This will start a MySQL 8.0 container and import the Zabbix schema.

2. Create a valid config (done automatically):

   The `test_config.yaml` references the running container.

3. Run the partitioning script:

   ```bash
   # Create a virtual environment if needed
   python3 -m venv venv
   ./venv/bin/pip install pymysql pyyaml

   # Dry run
   ./venv/bin/python3 ../../partitioning/zabbix_partitioning.py -c test_config.yaml --dry-run --init

   # Live run
   ./venv/bin/python3 ../../partitioning/zabbix_partitioning.py -c test_config.yaml --init
   ```

## Cleanup

```bash
docker compose down
rm -rf venv
```
zabbix-tests/partitioning/docker-compose.yml (new file, 14 lines)

@@ -0,0 +1,14 @@
services:
  zabbix-db:
    image: mysql:8.0
    container_name: zabbix-partition-test
    environment:
      MYSQL_ROOT_PASSWORD: root_password
      MYSQL_DATABASE: zabbix
      MYSQL_USER: zbx_part
      MYSQL_PASSWORD: zbx_password
    volumes:
      - ../../partitioning/schemas/70-schema-mysql.txt:/docker-entrypoint-initdb.d/schema.sql
    ports:
      - "33060:3306"
    command: --default-authentication-plugin=mysql_native_password
zabbix-tests/partitioning/find_tables.py (new file, 31 lines)

@@ -0,0 +1,31 @@
import re


def get_partitionable_tables(schema_path):
    with open(schema_path, 'r', encoding='utf-8', errors='ignore') as f:
        content = f.read()

    # Split into CREATE TABLE statements
    tables = content.split('CREATE TABLE')
    valid_tables = []

    for table_def in tables:
        # Extract table name
        name_match = re.search(r'`(\w+)`', table_def)
        if not name_match:
            continue
        table_name = name_match.group(1)

        # Check for PRIMARY KEY definition
        pk_match = re.search(r'PRIMARY KEY \((.*?)\)', table_def, re.DOTALL)
        if pk_match:
            pk_cols = pk_match.group(1)
            if 'clock' in pk_cols:
                valid_tables.append(table_name)

    return valid_tables


if __name__ == '__main__':
    tables = get_partitionable_tables('/opt/git/Zabbix/partitioning/70-schema-mysql.txt')
    print("Partitionable tables (PK contains 'clock'):")
    for t in tables:
        print(f"  - {t}")
zabbix-tests/partitioning/test_config.yaml (new file, 25 lines)

@@ -0,0 +1,25 @@
database:
  type: mysql
  host: 127.0.0.1
  socket:
  user: root
  passwd: root_password
  db: zabbix
  # Port mapping in docker-compose is 33060
  port: 33060

partitions:
  daily:
    - history: 7d
    - history_uint: 7d
    - history_str: 7d
    - history_log: 7d
    - history_text: 7d
    - history_bin: 7d
    - trends: 365d
    - trends_uint: 365d

logging: console
premake: 2
replicate_sql: False
initial_partitioning_start: retention
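The `partitions` section is a mapping from period to a list of single-key `table: retention` entries, which is why the `run` loop in the diff reads `retention = item[table]`. A hypothetical sketch of walking that structure once the YAML is loaded (literal dict used here so the example is self-contained):

```python
# Mirrors the shape produced by yaml.safe_load on the config above.
config = {
    'partitions': {
        'daily': [
            {'history': '7d'},
            {'trends': '365d'},
        ],
    },
}

schedule = []
for period, items in config['partitions'].items():
    for item in items:
        # Each list entry is a one-key dict: {table_name: retention}
        for table, retention in item.items():
            schedule.append((table, period, retention))

print(schedule)  # [('history', 'daily', '7d'), ('trends', 'daily', '365d')]
```

The list-of-one-key-dicts shape preserves the order tables appear in the config, at the cost of this extra unwrapping step.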
zabbix-tests/partitioning/wait_for_db.py (new file, 25 lines)

@@ -0,0 +1,25 @@
import time
import pymysql
import sys

config = {
    'host': '127.0.0.1',
    'port': 33060,
    'user': 'root',
    'password': 'root_password',
    'database': 'zabbix'
}

max_retries = 90
for i in range(max_retries):
    try:
        conn = pymysql.connect(**config)
        print("Database is ready!")
        conn.close()
        sys.exit(0)
    except Exception as e:
        print(f"Waiting for DB... ({e})")
        time.sleep(2)

print("Timeout waiting for DB")
sys.exit(1)