Compare commits

4 Commits: ecae2e0484 ... b1595ee9af

| Author | SHA1 | Date |
|---|---|---|
| | b1595ee9af | |
| | cecd55cd3d | |
| | 259340df46 | |
| | 59cd724959 | |
.gitignore (vendored, 2 changed lines)

@@ -3,4 +3,4 @@ venv/
 export/
 *_host_ids.txt
 *.log
-partitioning/tests/
+backup/
partitioning/CODE_DOCUMENTATION.md (new file, 60 lines)

# Code Documentation: ZabbixPartitioner

## Class: ZabbixPartitioner

### Core Methods

#### `__init__(self, config: Dict[str, Any], dry_run: bool = False)`

Initializes the partitioner with configuration and runtime mode.

- **config**: Dictionary containing database connection and partitioning rules.
- **dry_run**: If True, SQL queries are logged but not executed.

#### `connect_db(self)`

Context manager for database connections.

- Handles the connection lifecycle (open/close).
- Sets strict session variables:
  - `wait_timeout = 86400` (24h) to prevent timeouts during long operations.
  - `sql_log_bin = 0` (if configured) to prevent replication of partitioning commands.
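The lifecycle and session handling described above can be sketched with `contextlib.contextmanager`. This is a sketch, not the script's actual code: the config keys (`host`, `port`, `user`, `passwd`, `db`, `replicate_sql`) and the injectable `connect` parameter are assumptions made for illustration.

```python
from contextlib import contextmanager

@contextmanager
def connect_db(config, connect=None):
    """Yield a configured DB connection and always close it afterwards.

    `connect` defaults to pymysql.connect; it is injectable purely so the
    sketch can be exercised without a live server.
    """
    if connect is None:
        import pymysql
        connect = pymysql.connect
    conn = connect(host=config['host'], port=config.get('port', 3306),
                   user=config['user'], password=config['passwd'],
                   database=config['db'])
    try:
        with conn.cursor() as cur:
            # Long ALTER TABLE operations must not be killed by the server.
            cur.execute("SET SESSION wait_timeout = 86400")
            if not config.get('replicate_sql', True):
                # Keep partitioning DDL out of the binary log.
                cur.execute("SET SESSION sql_log_bin = 0")
        yield conn
    finally:
        conn.close()
```

The `try`/`finally` guarantees the close even when a query inside the `with` block raises.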
#### `run(self, mode: str)`

Main entry point for execution.

- **mode**:
  - `'init'`: Initial setup. Calls `initialize_partitioning`.
  - `'maintenance'` (default): Routine operation. Calls `create_future_partitions` and `drop_old_partitions`.

### Logic Methods

#### `initialize_partitioning(table: str, period: str, premake: int, retention_str: str)`

Converts a standard table to a partitioned table.

- **Strategies** (selected via the `initial_partitioning_start` config key):
  - `retention`: Starts from (Now - Retention). Creates `p_archive` for older data. Fast.
  - `db_min`: Queries `SELECT MIN(clock)`. Precise but slow on large tables.

#### `create_future_partitions(table: str, period: str, premake: int)`

Ensures sufficient future partitions exist.

- Calculates the required partitions from the current time plus the `premake` count.
- Checks `information_schema` for existing partitions.
- Adds missing partitions using `ALTER TABLE ... ADD PARTITION`.
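The "which partitions are missing" calculation can be illustrated for the daily scheme. `missing_daily_partitions` is a hypothetical helper, not the method itself; the real method reads existing names from `information_schema` and emits `ALTER TABLE` statements.

```python
from datetime import datetime, timedelta

def missing_daily_partitions(existing, now, premake):
    """Return daily partition names covering today through today + premake
    days that are absent from `existing` (names already present in
    information_schema)."""
    needed = []
    for offset in range(premake + 1):
        name = (now + timedelta(days=offset)).strftime("p%Y_%m_%d")
        if name not in existing:
            needed.append(name)
    return needed
```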
#### `drop_old_partitions(table: str, period: str, retention_str: str)`

Removes partitions older than the retention period.

- Parses partition names (e.g., `p2023_01_01`) to extract their date.
- Compares against the calculated retention cutoff date.
- Drops qualifying partitions using `ALTER TABLE ... DROP PARTITION`.
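The name-parsing step can be sketched with `datetime.strptime` for the daily naming scheme (a sketch under the assumption that only `pYYYY_MM_DD` names are eligible for dropping):

```python
from datetime import datetime

def is_droppable(partition_name, cutoff):
    """True when a daily partition name like 'p2023_01_01' encodes a date
    older than the retention cutoff; names that do not parse (e.g.
    'p_archive') are never dropped automatically."""
    try:
        part_date = datetime.strptime(partition_name, "p%Y_%m_%d")
    except ValueError:
        return False
    return part_date < cutoff
```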
### Helper Methods

#### `get_table_min_clock(table: str) -> Optional[datetime]`

- Queries the table for the oldest timestamp. Used by the `db_min` initialization strategy.

#### `has_incompatible_primary_key(table: str) -> bool`

- **Safety critical**: Verifies that the table's primary key includes the `clock` column.
- Returns `True` if incompatible (prevents partitioning to avoid MySQL errors).
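The check can be sketched against `information_schema`. The `run_query` callable is a stand-in for the script's own query execution, introduced here purely so the sketch is testable.

```python
def has_incompatible_primary_key(run_query, table):
    """True when `clock` is missing from the primary key. MySQL requires
    every column of the partitioning expression to be part of every unique
    key, so such a table cannot be partitioned by RANGE(clock)."""
    rows = run_query(
        "SELECT COLUMN_NAME FROM information_schema.KEY_COLUMN_USAGE "
        "WHERE TABLE_NAME = %s AND CONSTRAINT_NAME = 'PRIMARY'",
        (table,),
    )
    pk_columns = {row[0] for row in rows}
    return 'clock' not in pk_columns
```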
#### `get_partition_name(dt: datetime, period: str) -> str`

- Generates standard partition names:
  - Daily: `pYYYY_MM_DD`
  - Monthly: `pYYYY_MM`

#### `get_partition_description(dt: datetime, period: str) -> str`

- Generates the `VALUES LESS THAN` expression for the partition (the start of the NEXT period).
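Both helpers can be sketched for the daily and monthly schemes. The exact boundary format is an assumption (the script wraps the date string in its own partition template), and `dt` is assumed to be already truncated to the start of its period.

```python
from datetime import datetime, timedelta

def get_partition_name(dt, period):
    """pYYYY_MM_DD for daily partitions, pYYYY_MM for monthly ones."""
    return dt.strftime("p%Y_%m_%d" if period == 'daily' else "p%Y_%m")

def get_partition_description(dt, period):
    """Start of the NEXT period, used as the VALUES LESS THAN boundary."""
    if period == 'daily':
        nxt = dt + timedelta(days=1)
    else:
        # First day of the following month, rolling over December.
        nxt = datetime(dt.year + (1 if dt.month == 12 else 0),
                       1 if dt.month == 12 else dt.month + 1, 1)
    return nxt.strftime("%Y-%m-%d %H:%M:%S")
```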
partitioning/REFACTORING_NOTES.md (new file, 39 lines)

# Refactoring Notes: Zabbix Partitioning Script

## Overview

The `zabbix_partitioning.py` script has been significantly refactored to improve maintainability, reliability, and compatibility with modern Zabbix versions (7.x).

## Key Changes

### 1. Architecture: Class-Based Structure

- **Old**: Procedural script with global variables and scattered logic.
- **New**: Logic encapsulated in a `ZabbixPartitioner` class.
- **Purpose**: Improves modularity, testability, and state management, and allows the script to be easily imported or extended.

### 2. Database Connection Management

- **Change**: Implemented `contextlib.contextmanager` for database connections.
- **Purpose**: Ensures database connections are robustly opened and closed, even if errors occur. Handles `wait_timeout` and binary logging settings automatically for every session.

### 3. Logging

- **Change**: Replaced custom `print` statements with Python's standard `logging` module.
- **Purpose**:
  - Allows consistent log formatting.
  - Supports configurable output destinations (console vs syslog) via the config file.
  - Provides granular log levels (INFO for standard operations, DEBUG for SQL queries).
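The console/syslog switch might look like the following sketch. The handler choice, logger name, and format string are assumptions; the script's actual setup may differ.

```python
import logging
import logging.handlers

def setup_logging(destination='console', debug=False):
    """Route script output to stderr or the local syslog socket, with the
    DEBUG level reserved for SQL statements."""
    logger = logging.getLogger('zabbix_partitioning')
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    if destination == 'syslog':
        handler = logging.handlers.SysLogHandler(address='/dev/log')
    else:
        handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    logger.handlers = [handler]
    return logger
```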
### 4. Configuration Handling

- **Change**: Improved validation and parsing of the YAML configuration.
- **Purpose**:
  - Removed unused parameters (e.g., `timezone`, as the script relies on system local time).
  - Added support for custom database ports (critical for non-standard deployments or containerized tests).
  - Explicitly handles the `replicate_sql` flag to control binary logging (now integrated into the partitioning logic).

### 5. Type Safety

- **Change**: Added comprehensive Python type hinting (e.g., `List`, `Dict`, `Optional`).
- **Purpose**: Makes the code self-documenting and allows IDEs/linters to catch potential errors before execution.

### 6. Zabbix 7.x Compatibility

- **Change**: Added logic to verify Zabbix database version and schema requirements.
- **Purpose**:
  - Checks the `dbversion` table.
  - **Critical**: Validates that target tables include the `clock` column in their primary key before attempting partitioning, preventing potential data corruption or MySQL errors.
@@ -40,6 +40,14 @@ logging: syslog
 # premake: Number of partitions to create in advance
 premake: 10
 
+# initial_partitioning_start: Strategy for the first partition during initialization (--init).
+# Options:
+#   db_min:    (Default) Queries SELECT MIN(clock) to ensure ALL data is covered. Consistently slow on huge tables.
+#   retention: Starts partitioning from (Now - Retention Period).
+#              Creates a 'p_archive' partition for all data older than retention.
+#              Much faster as it skips the MIN(clock) query. (Recommended for large DBs)
+initial_partitioning_start: db_min
+
 # replicate_sql: False - Disable binary logging. Partitioning changes are NOT replicated to slaves (use for independent maintenance).
 # replicate_sql: True - Enable binary logging. Partitioning changes ARE replicated to slaves (use for consistent cluster schema).
 replicate_sql: False
partitioning/zabbix_partitioning.py

@@ -371,7 +371,7 @@ class ZabbixPartitioner:
         for name in to_drop:
             self.execute_query(f"ALTER TABLE `{table}` DROP PARTITION {name}")
 
-    def initialize_partitioning(self, table: str, period: str, premake: int):
+    def initialize_partitioning(self, table: str, period: str, premake: int, retention_str: str):
         """Initial partitioning for a table (convert regular table to partitioned)."""
         self.logger.info(f"Initializing partitioning for {table}")
 
@@ -384,35 +384,63 @@ class ZabbixPartitioner:
             self.logger.info(f"Table {table} is already partitioned.")
             return
 
-        # Check for data
-        min_clock = self.get_table_min_clock(table)
-        if not min_clock:
-            # Empty table. Start from NOW
-            start_dt = self.truncate_date(datetime.now(), period)
+        init_strategy = self.config.get('initial_partitioning_start', 'db_min')
+        start_dt = None
+        p_archive_ts = None
+        if init_strategy == 'retention':
+            self.logger.info(f"Strategy 'retention': Calculating start date from retention ({retention_str})")
+            retention_date = self.get_lookback_date(retention_str)
+            # Start granular partitions from the retention date
+            start_dt = self.truncate_date(retention_date, period)
+            # Create a catch-all for anything older
+            p_archive_ts = int(start_dt.timestamp())
         else:
-            # Table has data.
-            # For a safe migration, we usually create a catch-all for old data (p_old) or just start partitions covering existing data.
-            # This script's strategy: Create partitions starting from min_clock.
-            start_dt = self.truncate_date(min_clock, period)
+            # Default 'db_min' strategy
+            self.logger.info("Strategy 'db_min': Querying table for minimum clock (may be slow)")
+            min_clock = self.get_table_min_clock(table)
+            if not min_clock:
+                # Empty table. Start from NOW
+                start_dt = self.truncate_date(datetime.now(), period)
+            else:
+                # Table has data.
+                start_dt = self.truncate_date(min_clock, period)
 
         # Build list of partitions from start_dt up to NOW + premake
         target_dt = self.get_next_date(self.truncate_date(datetime.now(), period), period, premake)
 
         curr = start_dt
         partitions_def = {}
 
+        # If we have an archive partition, add it first
+        if p_archive_ts:
+            partitions_def['p_archive'] = str(p_archive_ts)
+
         while curr < target_dt:
             name = self.get_partition_name(curr, period)
             desc = self.get_partition_description(curr, period)
             partitions_def[name] = desc
             curr = self.get_next_date(curr, period, 1)
 
+        # Re-doing the loop to be cleaner on types
         parts_sql = []
-        for name, timestamp_expr in sorted(partitions_def.items()):
-            parts_sql.append(PARTITION_TEMPLATE % (name, timestamp_expr))
+        # 1. Archive Partition
+        if p_archive_ts:
+            parts_sql.append(f"PARTITION p_archive VALUES LESS THAN ({p_archive_ts}) ENGINE = InnoDB")
+
+        # 2. Granular Partitions
+        # We need to iterate again from start_dt
+        curr = start_dt
+        while curr < target_dt:
+            name = self.get_partition_name(curr, period)
+            desc_date_str = self.get_partition_description(curr, period)  # Returns "YYYY-MM-DD HH:MM:SS"
+            parts_sql.append(PARTITION_TEMPLATE % (name, desc_date_str))
+            curr = self.get_next_date(curr, period, 1)
 
         query = f"ALTER TABLE `{table}` PARTITION BY RANGE (`clock`) (\n" + ",\n".join(parts_sql) + "\n)"
-        self.logger.info(f"Applying initial partitioning to {table} ({len(partitions_def)} partitions)")
+        self.logger.info(f"Applying initial partitioning to {table} ({len(parts_sql)} partitions)")
         self.execute_query(query)
 
@@ -437,7 +465,7 @@ class ZabbixPartitioner:
                 retention = item[table]
 
                 if mode == 'init':
-                    self.initialize_partitioning(table, period, premake)
+                    self.initialize_partitioning(table, period, premake, retention)
                 else:
                     # Maintenance mode (Add new, remove old)
                     self.create_future_partitions(table, period, premake)
zabbix-tests/partitioning/README.md (new file, 36 lines)

# Zabbix Partitioning Tests

This directory contains a Docker-based test environment for the Zabbix partitioning script.

## Prerequisites

- Docker & Docker Compose
- Python 3

## Setup & Run

1. Start the database container:

   ```bash
   docker compose up -d
   ```

   This starts a MySQL 8.0 container and imports the Zabbix schema.

2. Create a valid config (done automatically):

   The `test_config.yaml` references the running container.

3. Run the partitioning script:

   ```bash
   # Create a virtual environment if needed
   python3 -m venv venv
   ./venv/bin/pip install pymysql pyyaml

   # Dry run
   ./venv/bin/python3 ../../partitioning/zabbix_partitioning.py -c test_config.yaml --dry-run --init

   # Live run
   ./venv/bin/python3 ../../partitioning/zabbix_partitioning.py -c test_config.yaml --init
   ```

## Cleanup

```bash
docker compose down
rm -rf venv
```
zabbix-tests/partitioning/docker-compose.yml (new file, 14 lines)

```yaml
services:
  zabbix-db:
    image: mysql:8.0
    container_name: zabbix-partition-test
    environment:
      MYSQL_ROOT_PASSWORD: root_password
      MYSQL_DATABASE: zabbix
      MYSQL_USER: zbx_part
      MYSQL_PASSWORD: zbx_password
    volumes:
      - ../../partitioning/schemas/70-schema-mysql.txt:/docker-entrypoint-initdb.d/schema.sql
    ports:
      - "33060:3306"
    command: --default-authentication-plugin=mysql_native_password
```
zabbix-tests/partitioning/find_tables.py (new file, 31 lines)

```python
import re


def get_partitionable_tables(schema_path):
    with open(schema_path, 'r', encoding='utf-8', errors='ignore') as f:
        content = f.read()

    # Split into CREATE TABLE statements
    tables = content.split('CREATE TABLE')
    valid_tables = []

    for table_def in tables:
        # Extract table name
        name_match = re.search(r'`(\w+)`', table_def)
        if not name_match:
            continue
        table_name = name_match.group(1)

        # Check for PRIMARY KEY definition
        pk_match = re.search(r'PRIMARY KEY \((.*?)\)', table_def, re.DOTALL)
        if pk_match:
            pk_cols = pk_match.group(1)
            if 'clock' in pk_cols:
                valid_tables.append(table_name)

    return valid_tables


if __name__ == '__main__':
    tables = get_partitionable_tables('/opt/git/Zabbix/partitioning/70-schema-mysql.txt')
    print("Partitionable tables (PK contains 'clock'):")
    for t in tables:
        print(f"  - {t}")
```
zabbix-tests/partitioning/test_config.yaml (new file, 25 lines)

```yaml
database:
  type: mysql
  host: 127.0.0.1
  socket:
  user: root
  passwd: root_password
  db: zabbix
  # Port mapping in docker-compose is 33060
  port: 33060

partitions:
  daily:
    - history: 7d
    - history_uint: 7d
    - history_str: 7d
    - history_log: 7d
    - history_text: 7d
    - history_bin: 7d
    - trends: 365d
    - trends_uint: 365d

logging: console
premake: 2
replicate_sql: False
initial_partitioning_start: retention
```
zabbix-tests/partitioning/wait_for_db.py (new file, 25 lines)

```python
import time
import pymysql
import sys

config = {
    'host': '127.0.0.1',
    'port': 33060,
    'user': 'root',
    'password': 'root_password',
    'database': 'zabbix'
}

max_retries = 90
for i in range(max_retries):
    try:
        conn = pymysql.connect(**config)
        print("Database is ready!")
        conn.close()
        sys.exit(0)
    except Exception as e:
        print(f"Waiting for DB... ({e})")
        time.sleep(2)

print("Timeout waiting for DB")
sys.exit(1)
```