
autoSMART Development Guide

📚 Developer Documentation Index

This document is the complete guide for developers working on autoSMART. It covers development environment setup, architecture, testing procedures, and a developer-specific changelog.


📁 Codebase Structure

autoSMART follows a modular architecture with clear separation of concerns. Below is the complete directory structure and file descriptions:

Project Root

autoSMART/
├── README.md                   # Symlink to docs/README.md (end-user documentation)
├── .deployignore               # Files excluded from production deployment
├── config/                     # Configuration files and templates
├── docs/                       # Documentation (mixed deployment)
├── lib/                        # Perl modules and core libraries
├── scripts/                    # Executable scripts and utilities
└── sql/                        # Database schema and SQL files

📁 /config/ - Configuration Management

Configuration files are organized by scope and environment:

config/
├── cluster.conf                # Cluster-wide settings (shared across nodes)
├── cluster-ebony.conf          # Node-specific configuration for ebony
├── database.conf               # PostgreSQL connection settings
├── openai.conf                 # OpenAI API configuration and prompts
├── smart.conf                  # SMART parameter thresholds and monitoring rules
├── default                     # Default/template configuration
└── debug-ebony.sh              # Development debugging script for ebony node

Configuration File Details

  • cluster.conf (88 lines):

    • Cluster topology and node definitions
    • Node hostnames, IP addresses, and roles
    • Shared monitoring parameters across cluster
    • Global system settings and defaults
    • Inter-node communication configuration
  • database.conf (30 lines):

    • PostgreSQL connection parameters (host, port, database, credentials)
    • Connection pooling settings and timeouts
    • Database-specific optimizations and tuning parameters
    • SSL configuration and security settings
  • openai.conf (50 lines):

    • OpenAI API key and model configuration
    • Prompt templates for failure prediction analysis
    • Response parsing rules and confidence thresholds
    • Rate limiting and cost management settings
    • Fallback configurations for API failures
  • smart.conf (57 lines):

    • SMART parameter monitoring thresholds for different drive types
    • Critical parameter definitions and escalation rules
    • Alert generation rules and notification preferences
    • Parameter collection intervals and scheduling
    • Drive type specific monitoring configurations
  • default (107 lines):

    • Default/template configuration for new node deployments
    • Standard parameter values and system defaults
    • Configuration validation rules and constraints
    • Example configurations with detailed comments
  • cluster-ebony.conf (13 lines):

    • Node-specific configuration overrides for ebony node
    • Local network settings and hardware-specific parameters
    • Custom thresholds for specific hardware configurations
  • debug-ebony.sh (29 lines):

    • Development debugging utilities for ebony node
    • Test data generation and validation scripts
    • Development environment setup and configuration
    • Debugging tools and diagnostic utilities

📁 /lib/ - Core Perl Modules

Core business logic implemented as reusable Perl modules:

lib/
├── SmartCollector.pm           # SMART data collection and hardware tracking
└── PredictionEngine.pm         # AI-powered failure prediction engine

Module Architecture

  • SmartCollector.pm (802 lines):

    • Hardware Identification: Device detection using serial numbers and model names
    • SMART Data Collection: Integration with smartmontools for comprehensive parameter collection
    • Migration Detection: Algorithms to detect when drives move between nodes or device paths
    • Differential Storage: Intelligent storage system that only saves changed parameters
    • Database Layer: PostgreSQL integration with connection pooling and error handling
    • Storage Efficiency: Real-time monitoring of storage optimization effectiveness
    • Configuration Management: Dynamic configuration loading and validation
    • Error Handling: Comprehensive error handling with detailed logging
  • PredictionEngine.pm (607 lines):

    • OpenAI Integration: Direct API communication with GPT models
    • Prompt Engineering: Sophisticated prompt templates for failure prediction
    • Response Processing: Parsing and validation of AI-generated predictions
    • Confidence Scoring: Statistical analysis of prediction reliability
    • Timeline Estimation: Failure time prediction with confidence intervals
    • Cost Optimization: API usage optimization and request batching
    • Error Recovery: Robust error handling for API failures and rate limits
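PredictionEngine.pm is described above as handling API failures and rate limits. One common way to do that is retry with exponential backoff; the Perl sketch below shows the technique under stated assumptions. `with_retries()`, its `max_attempts`/`base_delay`/`sleep` options and their defaults are hypothetical, not the module's actual API, and the `sleep` hook exists only so tests can skip real waiting.

```perl
use strict;
use warnings;

# Hypothetical helper: call $code until it succeeds or max_attempts is reached,
# waiting base_delay * 2**(attempt-1) seconds between attempts (1, 2, 4, ...).
sub with_retries {
    my ($code, %opts) = @_;
    my $max_attempts = $opts{max_attempts} // 4;
    my $base_delay   = $opts{base_delay}   // 1;                  # seconds
    my $sleep        = $opts{sleep}        // sub { sleep $_[0] };

    for my $attempt (1 .. $max_attempts) {
        my $result;
        # Capture success explicitly; $@ alone is not a reliable error signal.
        my $ok = eval { $result = $code->($attempt); 1 };
        return $result if $ok;
        my $error = $@ || "unknown error\n";
        die $error if $attempt == $max_attempts;                  # give up: rethrow
        $sleep->($base_delay * 2 ** ($attempt - 1));
    }
}

# Example: fail twice with a simulated HTTP 429, then succeed.
my @delays;
my $calls  = 0;
my $answer = with_retries(
    sub { $calls++; die "HTTP 429\n" if $calls < 3; return 'ok' },
    sleep => sub { push @delays, $_[0] },    # record delays instead of sleeping
);
print "answer=$answer after $calls calls, delays=@delays\n";
```

Doubling the delay gives a briefly overloaded API time to recover, while the attempt limit caps both latency and cost.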

📁 /scripts/ - Executable Components

Production scripts and development utilities:

scripts/
├── autosmart-collector.pl          # Main data collection daemon
├── autosmart-predictor.pl          # AI prediction processing
├── autosmart-report.pl             # Report generation engine
├── autosmart-migration-report.pl   # Hardware migration analysis
├── smart-collector-daemon.pl       # Background collection service
├── deploy.sh                       # Unified deployment script
├── deploy-production.sh            # Production cluster deployment
├── install.sh                      # Symlink to deploy.sh for compatibility
├── uninstall.sh                    # Complete system removal
├── monitor-cluster.sh              # Cluster health monitoring
├── test-smart-collection.pl        # SMART collection testing
├── test-differential-storage.pl    # Storage optimization testing
├── test-db-connection.pl           # Database connectivity testing
└── simple-smart-test.pl            # Basic SMART functionality test

Script Categories

Production Scripts
  • autosmart-collector.pl (348 lines):

    • Main collection daemon that runs on each node
    • Scheduled SMART data collection and processing
    • Hardware detection and migration tracking
    • Integration with SmartCollector.pm module
    • Command-line options for daemon mode, single-run, and debugging
  • autosmart-predictor.pl (483 lines):

    • Processes collected data for AI predictions
    • Batch processing of pending SMART readings
    • Integration with PredictionEngine.pm for OpenAI communication
    • Prediction result storage and confidence tracking
  • autosmart-report.pl (662 lines):

    • Generates comprehensive health reports and alerts
    • Configurable report formats (summary, detailed, trend analysis)
    • Email notification system for critical alerts
    • Historical data analysis and trend detection
  • smart-collector-daemon.pl (252 lines):

    • Background service wrapper for collector
    • Process management and restart capabilities
    • Log rotation and system integration
    • Service status monitoring and health checks
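autosmart-predictor.pl is described as batch-processing pending SMART readings. A minimal sketch of the chunking step, assuming pending readings are available as a list of IDs; `make_batches()` and the batch size of 3 are illustrative and not taken from the script.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list of pending reading IDs into batches of
# at most $batch_size items, preserving order.
sub make_batches {
    my ($batch_size, @items) = @_;
    die "batch size must be positive\n" unless $batch_size > 0;
    my @batches;
    push @batches, [ splice(@items, 0, $batch_size) ] while @items;
    return @batches;
}

my @pending = (101 .. 107);                 # illustrative reading IDs
my @batches = make_batches(3, @pending);
print join(' | ', map { "@$_" } @batches), "\n";    # 101 102 103 | 104 105 106 | 107
```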

Deployment Scripts
  • deploy.sh (697 lines):

    • Unified deployment for single node or cluster
    • Supports install, uninstall, and cluster deployment modes
    • Automatic dependency checking and installation
    • Configuration template deployment and customization
    • System service registration and startup
  • deploy-production.sh (116 lines):

    • Production-specific deployment procedures
    • Multi-node cluster deployment automation
    • Production safety checks and validation
    • Rollback capabilities for failed deployments
  • uninstall.sh (187 lines):

    • Complete system cleanup and removal
    • Service stopping and deregistration
    • File and directory cleanup
    • Database cleanup options (configurable)
  • monitor-cluster.sh (515 lines):

    • Ongoing cluster health monitoring
    • Node status verification and reporting
    • Service health checks across all cluster nodes
    • Automated restart capabilities for failed services

Development & Testing Scripts
  • test-smart-collection.pl (132 lines):

    • Validates SMART data collection functionality
    • Tests hardware detection and identification
    • Verifies database connectivity and data storage
    • Performance benchmarking for collection operations
  • test-differential-storage.pl (270 lines):

    • Comprehensive testing of storage optimization
    • Validates differential storage algorithms
    • Tests change detection and storage efficiency
    • Performance analysis and optimization verification
  • test-db-connection.pl (55 lines):

    • Database connectivity verification
    • Connection pooling and timeout testing
    • SQL execution validation
    • Database performance testing
  • simple-smart-test.pl (144 lines):

    • Basic functionality testing
    • Quick validation of core components
    • Integration testing for development
    • Smoke testing for deployment validation

Analysis Scripts
  • autosmart-migration-report.pl (615 lines):
    • Hardware migration tracking and analysis
    • Migration pattern detection and reporting
    • Historical migration data analysis
    • Migration-related issue identification and troubleshooting

📁 /sql/ - Database Schema

PostgreSQL database definitions and utilities:

sql/
├── schema.sql                  # Complete production database schema
└── schema-fixed.sql            # Schema with specific fixes/patches

Database Schema Components

  • Core Tables:
    • hdd_inventory: Hardware identification and location tracking
    • smart_readings: SMART parameter data with differential storage
    • hdd_migrations: Drive movement logging between nodes/paths
  • AI Integration:
    • predictions: AI-generated failure predictions with confidence scores
    • alert_history: Alert notification tracking and escalation
  • Configuration:
    • smart_thresholds: Configurable parameter thresholds and alert rules
    • system_config: System-wide configuration parameters
  • Optimization:
    • Differential storage functions (should_store_smart_reading())
    • Reconstructed views (smart_readings_reconstructed)
    • Change detection algorithms with SHA256 checksums
  • Indexing:
    • Performance-optimized indexes for temporal queries
    • Hardware identification indexes for fast lookups
    • Composite indexes for complex query patterns
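The optimization components above (store a reading only when something changed, detect changes with SHA256 checksums, and let critical parameters and a forced interval override that) can be illustrated with a small Perl sketch. It is not a port of the PostgreSQL `should_store_smart_reading()` function: `should_store()`, `parameters_checksum()`, the argument names and the 24-hour default for `force_interval` are assumptions. The core modules Digest::SHA and JSON::PP are used so the sketch runs without extra installs.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
use JSON::PP;

# canonical(1) sorts hash keys, so the same parameters always serialize identically;
# utf8(1) makes encode() return bytes, which sha256_hex() requires.
my $json = JSON::PP->new->canonical(1)->utf8(1);

sub parameters_checksum {
    my ($params) = @_;
    # Stringify values so that 8 and "8" produce the same checksum.
    my %normalized = map { $_ => (defined $params->{$_} ? "$params->{$_}" : undef) } keys %$params;
    return sha256_hex($json->encode(\%normalized));
}

# Hypothetical decision function; returns { store => 0|1, reason => ... }.
sub should_store {
    my (%args) = @_;
    my ($current, $previous) = @args{qw(current previous)};
    my $force_interval = $args{force_interval} // 86_400;    # seconds, assumed default
    my @critical       = @{ $args{critical} // [] };

    return { store => 1, reason => 'first_reading' } unless $previous;
    return { store => 1, reason => 'forced_interval' }
        if $args{now} - $args{last_stored_at} >= $force_interval;

    # Checked before the checksum so the reason names the critical attribute.
    for my $name (@critical) {
        return { store => 1, reason => "critical_change:$name" }
            if ($current->{$name} // '') ne ($previous->{$name} // '');
    }
    return { store => 1, reason => 'changed' }
        if parameters_checksum($current) ne parameters_checksum($previous);
    return { store => 0, reason => 'unchanged' };
}

my $decision = should_store(
    current        => { Reallocated_Sector_Ct => 8, Temperature_Celsius => 35 },
    previous       => { Reallocated_Sector_Ct => 0, Temperature_Celsius => 35 },
    last_stored_at => 1_000,
    now            => 2_000,
    critical       => ['Reallocated_Sector_Ct'],
);
print "store=$decision->{store} reason=$decision->{reason}\n";
```

Sorting keys before hashing matters: Perl hash order is randomized, so hashing an unsorted serialization would report spurious changes.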

Schema Files Details
  • schema.sql (726 lines):

    • Complete production database schema
    • Full table definitions with constraints and indexes
    • PostgreSQL functions for differential storage
    • Views for data reconstruction and reporting
    • Trigger definitions for automated processes
  • schema-fixed.sql (423 lines):

    • Schema patches and specific fixes
    • Migration scripts for schema updates
    • Performance optimization adjustments
    • Compatibility fixes for different PostgreSQL versions

📁 /docs/ - Documentation

Documentation organized by audience and deployment status:

docs/
├── README.md                   # End-user guide (DEPLOYED)
├── INSTALLATION.md             # Setup and configuration (DEPLOYED)
├── CHANGELOG.md                # Release notes for end-users (DEPLOYED)
├── API.md                      # OpenAI API configuration (DEPLOYED)
├── DEVELOPMENT.md              # Developer guide (NOT DEPLOYED)
└── DIFFERENTIAL_STORAGE.md     # Technical storage details (NOT DEPLOYED)

Documentation Deployment Strategy

  • Deployed docs: End-user facing documentation
  • Non-deployed docs: Developer and technical implementation details

🔧 Key File Relationships

Data Flow Architecture

smartmontools → SmartCollector.pm → PostgreSQL → PredictionEngine.pm → OpenAI API
     ↓               ↓                    ↓              ↓
autosmart-collector.pl → Database → autosmart-predictor.pl → Reports

Configuration Hierarchy

cluster.conf (global) → cluster-<node>.conf (e.g. cluster-ebony.conf) → smart.conf → openai.conf
                                 ↓
                         Individual script configurations
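The hierarchy above implies that more specific files override more general ones. A minimal sketch of such a layered merge, assuming a plain `key = value` syntax with `#` comments (the real files may use a different syntax, for example if they are read with Config::Simple). `parse_config()`, `merge_layers()` and the key `collection_interval` are hypothetical; `force_storage_interval` is a parameter name mentioned in the changelog, used here only as sample data.

```perl
use strict;
use warnings;

# Parse "key = value" lines into a hash ref. '#' starts a comment, so this
# simple parser does not support '#' inside values (an assumed file format).
sub parse_config {
    my ($text) = @_;
    my %conf;
    for my $line (split /\n/, $text) {
        $line =~ s/#.*//;                                   # strip comments
        next unless $line =~ /^\s*([\w.]+)\s*=\s*(.*?)\s*$/;
        $conf{$1} = $2;
    }
    return \%conf;
}

# Merge layers from most general to most specific; later layers win.
sub merge_layers {
    my %merged;
    %merged = (%merged, %$_) for @_;
    return \%merged;
}

# Illustrative layers: a cluster-wide file and a node-specific override.
my $cluster   = parse_config("collection_interval = 3600\nforce_storage_interval = 24h\n");
my $node      = parse_config("# ebony overrides\ncollection_interval = 1800\n");
my $effective = merge_layers($cluster, $node);
print "collection_interval=$effective->{collection_interval} ",
      "force_storage_interval=$effective->{force_storage_interval}\n";
```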

Module Dependencies

autosmart-collector.pl
├── SmartCollector.pm
├── database.conf
├── smart.conf
└── cluster.conf

autosmart-predictor.pl
├── PredictionEngine.pm
├── SmartCollector.pm (for data access)
├── openai.conf
└── database.conf

📊 Codebase Metrics

File Type Distribution

  • Perl Scripts: 5 production/analysis scripts + 4 test scripts (9 total)
  • Perl Modules: 2 core modules (1,409 total lines)
  • Shell Scripts: 4 deployment/management scripts plus the install.sh symlink (1,515 total lines)
  • SQL Files: 2 schema files (1,149 total lines)
  • Configuration: 7 configuration files (374 total lines)
  • Documentation: 6 documentation files

Code Complexity by Lines of Code

  • SmartCollector.pm: 802 lines (High complexity - hardware integration, differential storage)
  • PredictionEngine.pm: 607 lines (Medium complexity - API integration, data processing)
  • Database Schema: 726 lines (High complexity - advanced PostgreSQL features)
  • deploy.sh: 697 lines (Medium complexity - system integration)
  • Report Generation: 662 lines (Medium complexity - data analysis and formatting)
  • Migration Analysis: 615 lines (Medium complexity - pattern detection)
  • Cluster Monitoring: 515 lines (Medium complexity - distributed system monitoring)

Total Codebase Size

  • Production Code: ~3,800 lines (Perl modules + production and analysis scripts)
  • Deployment & Management: ~1,500 lines (deployment and monitoring scripts)
  • Testing Code: ~600 lines (test scripts and utilities)
  • Database Schema: ~1,150 lines (PostgreSQL schema and functions)
  • Configuration: ~375 lines (configuration templates and examples)
  • Total: ~7,400 lines of code

Testing Coverage Areas

  • Unit Tests: Module-specific functionality testing
  • Integration Tests: End-to-end data flow validation
  • Performance Tests: Storage efficiency and query optimization benchmarks
  • Deployment Tests: Installation and configuration validation across environments
  • Regression Tests: Automated testing for core functionality preservation

🏗️ Development Workflow

Getting Started with Development

  1. Clone Repository: Set up local development environment
  2. Database Setup: Configure PostgreSQL connection to development database
  3. Perl Dependencies: Install required CPAN modules
  4. Configuration: Copy and customize configuration templates
  5. Testing: Run test suite to verify setup

Adding New Features

  1. Module Development: Extend existing Perl modules or create new ones
  2. Script Integration: Create or modify scripts to use new functionality
  3. Database Changes: Update schema if new data structures are needed
  4. Testing: Add comprehensive tests for new functionality
  5. Documentation: Update both end-user and developer documentation

Code Organization Principles

  • Separation of Concerns: Each module and script has a specific, well-defined responsibility
  • Configuration-Driven: System behavior is controlled through configuration files rather than hard-coded values
  • Database-Centric: PostgreSQL serves as the central data store with business logic in database functions
  • Modular Design: Components can be developed, tested, and deployed independently
  • Error Handling: Comprehensive error handling and logging throughout all components
  • Performance-First: Optimized for high-volume data collection and processing
  • Scalability: Designed to scale across multiple nodes in a cluster environment

Development Patterns Used

  • Factory Pattern: Configuration-based object creation in Perl modules
  • Observer Pattern: Event-driven processing for hardware changes and alerts
  • Strategy Pattern: Configurable algorithms for different drive types and thresholds
  • Template Method: Standardized data processing pipelines with customizable steps
  • Singleton Pattern: Database connection management and configuration loading
  • Command Pattern: Script-based operations with standardized interfaces

Code Quality Standards

  • Perl Best Practices: Strict warnings, proper scoping, and defensive programming
  • Database Normalization: Proper relational design with referential integrity
  • Configuration Validation: Input validation and sanitization throughout
  • Error Recovery: Graceful degradation and automatic recovery mechanisms
  • Performance Monitoring: Built-in performance metrics and optimization tracking
  • Security Practices: SQL injection prevention, input validation, and secure configuration management

🏗️ Development Environment Setup

Prerequisites

System Requirements

  • Operating System: Linux/macOS (tested on macOS, deployed on Proxmox VE)
  • Perl: Version 5.20+ with CPAN access
  • PostgreSQL: Version 13+ with JSONB and extension support
  • Git: For version control and collaboration

Development Database

# Current test database configuration
Host: 192.168.2.102
Database: autosmart  
User: postgres
Password: (no password)
Port: 5432

Required Perl Modules

# Core database modules
cpan DBI DBD::Pg

# JSON processing
cpan JSON::XS

# System utilities (Time::HiRes ships with Perl)
cpan Config::Simple File::Slurp Time::HiRes

# Security and hashing (Digest::SHA ships with Perl 5.10 and later)
cpan Digest::SHA

# HTTP/API clients for OpenAI integration
# (LWP::Protocol::https is required for https:// URLs such as api.openai.com)
cpan LWP::UserAgent LWP::Protocol::https HTTP::Request::Common

# Optional: Development and testing (Data::Dumper and Test::More ship with Perl)
cpan Data::Dumper Test::More Test::Exception

Development Workflow

1. Environment Setup

# Clone the project
cd /Users/bogdan/Documents/workspace/
git clone <autoSMART-repo>
cd autoSMART

# Set environment variables
export AUTOSMART_DB_HOST=192.168.2.102
export AUTOSMART_DB_NAME=autosmart
export AUTOSMART_DB_USER=postgres
export AUTOSMART_DB_PASS=
export AUTOSMART_DB_PORT=5432

# Optional: OpenAI API key for AI features
export OPENAI_API_KEY=your-api-key-here

2. Database Setup

# Initialize the database schema
psql -h 192.168.2.102 -U postgres -d autosmart -f sql/schema.sql

# Verify installation
psql -h 192.168.2.102 -U postgres -d autosmart -c "\\dt"

3. Testing Environment

# Run the differential storage test suite
cd scripts/
perl test-differential-storage.pl

# Test database connectivity (single quotes keep the shell from expanding $ and !)
perl -MDBI -e '
my $dsn = "DBI:Pg:dbname=autosmart;host=192.168.2.102;port=5432";
my $dbh = DBI->connect($dsn, "postgres", "", {RaiseError => 1});
print "Database connection successful!\n";
$dbh->disconnect();
'

🧩 Architecture Overview

System Components

autoSMART Architecture
┌─────────────────────────────────────────────────────────────────┐
│                         Proxmox Cluster                         │
├─────────────────────┬─────────────────────┬─────────────────────┤
│       Node 1        │       Node 2        │       Node 3        │
│                     │                     │                     │
│ ┌─── SmartCollector │ ┌─── SmartCollector │ ┌─── SmartCollector │
│ │   - HDD Scanning  │ │   - HDD Scanning  │ │   - HDD Scanning  │
│ │   - SMART Reading │ │   - SMART Reading │ │   - SMART Reading │
│ │   - Migration Det │ │   - Migration Det │ │   - Migration Det │
│ └─── Data Storage   │ └─── Data Storage   │ └─── Data Storage   │
└─────────────────────┴──────────┬──────────┴─────────────────────┘
                                 │
                       ┌─────────▼─────────┐
                       │   PostgreSQL DB   │
                       │                   │
                       │ • HDD Inventory   │
                       │ • SMART Readings  │
                       │ • Migrations      │
                       │ • AI Predictions  │
                       └─────────┬─────────┘
                                 │
                     ┌───────────▼───────────┐
                     │   PredictionEngine    │
                     │                       │
                     │ • OpenAI API          │
                     │ • Failure Prediction  │
                     │ • Pattern Analysis    │
                     └───────────┬───────────┘
                                 │
                     ┌───────────▼───────────┐
                     │  autosmart-report.pl  │
                     │                       │
                     │ • Alert Generation    │
                     │ • Report Creation     │
                     │ • Dashboard Data      │
                     └───────────────────────┘

Data Flow

  1. Collection Phase:

    • SmartCollector scans HDDs on each node
    • Hardware identification (serial + model)
    • Migration detection if HDD moved
    • Differential storage decision
    • Store only changed/critical data
  2. Analysis Phase:

    • PredictionEngine.pm (driven by autosmart-predictor.pl) processes stored data
    • Historical pattern analysis
    • OpenAI API calls for predictions
    • Risk assessment and trending
  3. Reporting Phase:

    • autosmart-report.pl generates alerts
    • Dashboard data preparation
    • Health reports creation
    • Maintenance recommendations
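The collection phase identifies each drive by serial number and model before checking whether it moved. The in-memory sketch below illustrates that comparison using the hdd_inventory column names that appear later in this guide (`serial_number`, `model_name`, `current_node_id`, `current_device_path`). The real implementation works against PostgreSQL and logs to hdd_migrations; `detect_migration()`, the `node_id`/`device_path` input keys and the sample drive values are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical in-memory version of the check. $inventory maps "serial|model"
# to the last known location, using hdd_inventory-style column names.
sub detect_migration {
    my ($inventory, $drive) = @_;
    my $key   = join '|', $drive->{serial_number}, $drive->{model_name};
    my $known = $inventory->{$key};

    unless ($known) {
        $inventory->{$key} = {
            current_node_id     => $drive->{node_id},
            current_device_path => $drive->{device_path},
        };
        return { type => 'new_hdd' };
    }

    my %changes;
    $changes{node} = [ $known->{current_node_id}, $drive->{node_id} ]
        if $known->{current_node_id} ne $drive->{node_id};
    $changes{device_path} = [ $known->{current_device_path}, $drive->{device_path} ]
        if $known->{current_device_path} ne $drive->{device_path};
    return { type => 'unchanged' } unless %changes;

    # Record the new location so the next scan reports it as unchanged.
    $known->{current_node_id}     = $drive->{node_id};
    $known->{current_device_path} = $drive->{device_path};
    return { type => 'migration', changes => \%changes };
}

my %inventory;
my $drive = { serial_number => 'EXAMPLE-SN-001', model_name => 'EXAMPLE-MODEL',
              node_id => 'ebony', device_path => '/dev/sda' };
my $first = detect_migration(\%inventory, $drive);
my $moved = detect_migration(\%inventory, { %$drive, device_path => '/dev/sdc' });
print "$first->{type} then $moved->{type}\n";
```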

🔧 Module Development

SmartCollector.pm Development

Key Methods to Understand

# Call patterns inside SmartCollector.pm (methods are invoked on $self)

# Hardware identification and migration detection
my $hdd_id = $self->_detect_or_create_hdd($drive_info, $smart_data);

# Differential storage decision making
my $storage_info = $self->_should_store_reading($hdd_id, $smart_data);

# Optimized data storage
$self->_insert_smart_reading_differential($hdd_id, $drive_info, $smart_data, $storage_info);

Adding New Features

  1. New SMART Parameters:

     # Add parameter processing in collect_smart_data()
     if ($line =~ /New_Parameter.*\s+(\d+)/) {
         $smart_data->{parameters}{'New_Parameter'} = $1;
     }

  2. Custom Manufacturer Detection:

     # Extend _detect_manufacturer() method
     sub _detect_manufacturer {
         my ($self, $model) = @_;
         return 'Custom_Manufacturer' if $model =~ /CUSTOM_PATTERN/;
         # ... existing logic
     }
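The regex in item 1 captures the last whitespace-separated number on the line. For an attribute whose raw value reads `36 (Min/Max 20/45)`, such as Temperature_Celsius, that yields 20 rather than 36. The sketch below anchors on the column layout of the ATA attribute table printed by `smartctl -A` instead. It assumes that standard ten-column layout (NVMe devices print a different format), and `parse_smart_attributes()` and the sample lines are illustrative.

```perl
use strict;
use warnings;

# Parse the ATA attribute table from `smartctl -A`. Columns:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
# Only the leading integer of RAW_VALUE is kept ("36 (Min/Max 20/45)" -> 36).
sub parse_smart_attributes {
    my ($output) = @_;
    my %attrs;
    for my $line (split /\n/, $output) {
        next unless $line =~ /^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\S+)\s+\S+\s+\S+\s+\S+\s+(\d+)/;
        $attrs{$2} = { id => $1, value => $3, worst => $4, thresh => $5, raw => $6 };
    }
    return \%attrs;
}

my $sample = <<'END';
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   036   045   000    Old_age   Always       -       36 (Min/Max 20/45)
END

my $attrs = parse_smart_attributes($sample);
printf "%-22s raw=%s\n", $_, $attrs->{$_}{raw} for sort keys %$attrs;
```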

PredictionEngine.pm Development

AI Integration Patterns

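# OpenAI API call structure
# (assumes $self->{ua} holds an LWP::UserAgent created in the constructor)
use HTTP::Request;
use JSON::XS qw(encode_json decode_json);

sub _call_openai_api {
    my ($self, $prompt, $smart_data) = @_;
    
    my $request = HTTP::Request->new(POST => 'https://api.openai.com/v1/chat/completions');
    $request->header('Authorization' => "Bearer $self->{openai_api_key}");
    $request->header('Content-Type' => 'application/json');
    
    my $payload = {
        model => "gpt-4",
        messages => [
            {
                role => "system", 
                content => "You are an expert in HDD failure prediction..."
            },
            {
                role => "user",
                content => $prompt
            }
        ]
    };
    # The request body must be serialized JSON
    $request->content(encode_json($payload));

    my $response = $self->{ua}->request($request);
    die "OpenAI API error: " . $response->status_line . "\n"
        unless $response->is_success;

    # Chat completions return the generated text in choices[0].message.content
    my $data = decode_json($response->content);
    return $data->{choices}[0]{message}{content};
}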
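openai.conf is described as holding response parsing rules and confidence thresholds. A minimal sketch of validating a model reply before storing it, assuming a hypothetical JSON contract (`failure_risk`, `confidence`, `estimated_days_to_failure`) and an assumed 0.7 threshold; none of these names come from PredictionEngine.pm. It uses the core JSON::PP module so it runs without JSON::XS.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical response contract: the prompt asks the model to answer with JSON such as
# {"failure_risk":"high","confidence":0.82,"estimated_days_to_failure":45}
sub parse_prediction {
    my ($text, $min_confidence) = @_;
    $min_confidence //= 0.7;                      # assumed threshold, e.g. from openai.conf

    # Models sometimes wrap JSON in a fenced code block; strip the fence.
    $text =~ s/^\s*`{3}(?:json)?\s*//;
    $text =~ s/\s*`{3}\s*$//;

    my $data = eval { decode_json($text) };
    return { ok => 0, error => 'invalid JSON' } unless ref $data eq 'HASH';

    my $conf = $data->{confidence};
    return { ok => 0, error => 'confidence missing or outside 0..1' }
        unless defined $conf && $conf =~ /^(?:0(?:\.\d+)?|1(?:\.0+)?)$/;
    return { ok => 0, error => 'unknown risk level' }
        unless ($data->{failure_risk} // '') =~ /^(?:low|medium|high|critical)$/;

    # Predictions below the threshold are kept but not acted on.
    return +{ %$data, ok => 1, actionable => ($conf >= $min_confidence ? 1 : 0) };
}

my $prediction = parse_prediction(
    '{"failure_risk":"high","confidence":0.82,"estimated_days_to_failure":45}'
);
print "risk=$prediction->{failure_risk} actionable=$prediction->{actionable}\n";
```

Rejecting malformed replies here keeps free-form model text out of the predictions table, while low-confidence results are still returned so they can be logged.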

🗃️ Database Development

Schema Evolution

Adding New Tables

-- Always include migration scripts
CREATE TABLE new_feature (
    id SERIAL PRIMARY KEY,
    hdd_id INTEGER REFERENCES hdd_inventory(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Add indexes for performance
CREATE INDEX idx_new_feature_hdd_id ON new_feature(hdd_id);

Modifying Existing Tables

-- Use ALTER statements for compatibility
ALTER TABLE smart_readings ADD COLUMN new_field VARCHAR(100);
-- CONCURRENTLY avoids blocking writes but cannot run inside a transaction block
CREATE INDEX CONCURRENTLY idx_smart_readings_new_field ON smart_readings(new_field);

Query Optimization

Efficient SMART Data Queries

-- Use the reconstructed view for complete data
SELECT * FROM smart_readings_reconstructed 
WHERE hdd_id = $1 
  AND timestamp > NOW() - INTERVAL '30 days'
ORDER BY timestamp DESC;

-- Use raw table for storage statistics
SELECT reading_type, COUNT(*) 
FROM smart_readings 
WHERE timestamp > NOW() - INTERVAL '7 days'
GROUP BY reading_type;

🧪 Testing Guidelines

Unit Testing

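# Example test structure
use strict;
use warnings;
use Test::More;
use FindBin;
use lib "$FindBin::Bin/../lib";
use SmartCollector;

my $collector = SmartCollector->new({
    db_host => '192.168.2.102',
    db_name => 'autosmart_test',
    # ... test config
});

# Illustrative fixtures; the keys must match what collect_smart_data() produces
my $drive_info = { device_path => '/dev/sdb', serial_number => 'TEST-SERIAL-0001', model_name => 'TEST-MODEL' };
my $smart_data = { parameters => { Reallocated_Sector_Ct => 0 } };

# Test hardware identification
my $hdd_id = $collector->_detect_or_create_hdd($drive_info, $smart_data);
ok($hdd_id > 0, "HDD identification successful");

# Test differential storage
my $storage_decision = $collector->_should_store_reading($hdd_id, $smart_data);
ok($storage_decision->{store}, "Storage decision made");

# done_testing() avoids a hard-coded plan that must match the number of tests
done_testing();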

Integration Testing

# Run the comprehensive test suite
cd scripts/
perl test-differential-storage.pl

# Test SMART collection on real hardware (if available)
perl test-smart-collection.pl

Performance Testing

-- Test query performance
EXPLAIN ANALYZE 
SELECT * FROM smart_readings_reconstructed 
WHERE hdd_id IN (1,2,3,4,5) 
  AND timestamp > NOW() - INTERVAL '90 days';

-- Monitor storage efficiency
SELECT 
    reading_type,
    COUNT(*) as readings,
    AVG(octet_length(parameters_json::text)) as avg_size_bytes
FROM smart_readings 
WHERE timestamp > NOW() - INTERVAL '24 hours'
GROUP BY reading_type;

🔍 Debugging and Troubleshooting

Logging System

# Enable debug logging
$ENV{AUTOSMART_DEBUG} = 4;  # Maximum verbosity (debug everything)

# Log levels:
# 1 = Errors only
# 2 = Warnings and errors  
# 3 = Info, warnings, errors
# 4 = Debug everything
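A minimal sketch of how a log call can honor these levels. `log_message()` is a hypothetical stand-in for the modules' `_log()`, and treating an unset AUTOSMART_DEBUG as level 2 is an assumption.

```perl
use strict;
use warnings;

my %LEVEL_NAME = (1 => 'ERROR', 2 => 'WARN', 3 => 'INFO', 4 => 'DEBUG');

# Hypothetical stand-in for the modules' _log(): print a message only when its
# level is at or below AUTOSMART_DEBUG (default of 2 is an assumption).
sub log_message {
    my ($message, $level, $out) = @_;
    my $verbosity = $ENV{AUTOSMART_DEBUG} // 2;
    return 0 if $level > $verbosity;
    $out //= \*STDERR;
    printf {$out} "[%s] %s\n", $LEVEL_NAME{$level} // "LEVEL$level", $message;
    return 1;
}

$ENV{AUTOSMART_DEBUG} = 3;
log_message("SMART collection took 0.42s for /dev/sda", 3);   # printed (INFO)
log_message("raw smartctl output follows", 4);                # suppressed (DEBUG)
```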

Common Issues

Database Connection Problems

# Test database connectivity
psql -h 192.168.2.102 -U postgres -d autosmart -c "SELECT version();"

# Check permissions
psql -h 192.168.2.102 -U postgres -d autosmart -c "\\dp smart_readings"

SMART Data Collection Issues

# Test smartctl access
sudo smartctl -a /dev/sda

# Check permissions
ls -la /dev/sd*

Migration Detection Problems

-- Check migration logs
SELECT * FROM hdd_migrations 
ORDER BY detected_at DESC 
LIMIT 10;

-- Verify HDD inventory
SELECT serial_number, model_name, current_device_path, current_node_id 
FROM hdd_inventory 
WHERE status = 'active';

📊 Performance Monitoring

Database Performance

-- Monitor table sizes
SELECT schemaname, tablename, 
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables 
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

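-- Monitor query performance (requires the pg_stat_statements extension;
-- PostgreSQL 13+ names the column mean_exec_time instead of mean_time)
SELECT query, mean_exec_time, calls 
FROM pg_stat_statements 
WHERE query LIKE '%smart_readings%'
ORDER BY mean_exec_time DESC;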

Application Performance

# Add timing to critical operations
use Time::HiRes qw(time);

my $start_time = time();
my $result = $self->collect_smart_data($device_path);
my $duration = time() - $start_time;

$self->_log("SMART collection took ${duration}s for $device_path", 3);

🚀 Deployment Guidelines

Production Deployment

  1. Database Setup:

    • Use dedicated PostgreSQL server
    • Configure proper backup strategy
    • Set up monitoring and alerting
  2. Security Configuration:

    • Use dedicated database users with minimal privileges
    • Secure API keys and configuration files
    • Enable SSL connections for database
  3. Performance Tuning:

    • Configure PostgreSQL for time-series workload
    • Set up proper indexing strategy
    • Monitor and optimize slow queries

Proxmox Integration

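# Install on cluster nodes
# Note: /etc/pve is the Proxmox cluster filesystem (pmxcfs). It is shared by all
# nodes and size-limited, so one copy is visible cluster-wide; deploy.sh also
# provides a cluster deployment mode.
for node in pve01 pve02 pve03; do
    scp -r autoSMART/ root@$node:/etc/pve/
done

# Configure systemd services (on every node, not only the local one)
for node in pve01 pve02 pve03; do
    ssh root@$node 'systemctl enable --now autosmart-collector'
done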

📚 Additional Resources

Useful Commands

# Monitor system in real-time
watch -n 30 'psql -h 192.168.2.102 -U postgres -d autosmart -c "SELECT COUNT(*) FROM smart_readings WHERE timestamp > NOW() - INTERVAL '\''1 hour'\''"'

# Generate performance report
psql -h 192.168.2.102 -U postgres -d autosmart -f sql/performance-report.sql

Development Tools

  • pgAdmin: Database administration and query development
  • Perl::Critic: Code quality analysis
  • Perl::Tidy: Code formatting
  • Git: Version control with feature branches

📝 Developer Changelog

This section contains detailed technical changes, internal API modifications, and development-specific information that is not relevant for end-users.

[1.0.0] - 2025-08-15 - Development Details

🏗️ Architecture Changes

  • Database Schema Evolution: Complete redesign from simple SMART storage to differential storage architecture
  • Hardware Tracking Implementation: Added hdd_inventory and hdd_migrations tables for hardware-based identification
  • Differential Storage Engine: Implemented should_store_smart_reading() PostgreSQL function with configurable change detection
  • Migration Detection Algorithm: Created automatic hardware migration detection using serial numbers and model matching

🔧 Internal API Changes

  • SmartCollector.pm Refactor:
    • Added hardware identification methods (identify_hardware(), detect_migration())
    • Implemented differential storage integration (should_store_reading())
    • Added storage efficiency monitoring
    • Breaking change: Constructor now requires database handle
  • Database Functions:
    • Added should_store_smart_reading(jsonb, text, text, interval, text[]) function
    • Added smart_readings_reconstructed view for seamless data access
    • Added migration tracking triggers
  • Configuration Schema:
    • Split configuration into cluster-wide (cluster.conf) and node-specific (autosmart.conf)
    • Added differential storage parameters (force_storage_interval, critical_parameters)

🧪 Testing Infrastructure

  • Differential Storage Test Suite: Added comprehensive test coverage in test-differential-storage.pl
  • Migration Detection Tests: Validated hardware tracking across different scenarios
  • Performance Benchmarks: Established baseline performance metrics for storage efficiency
  • Database Integration Tests: Added tests for PostgreSQL function behavior

📊 Performance Optimizations

  • Storage Efficiency: Achieved 60-80% database size reduction through differential storage
  • Query Optimization: Added proper indexing for hardware tracking and temporal queries
  • Background Processing: Implemented non-blocking collection and analysis workflows
  • Memory Management: Optimized Perl module memory usage for long-running processes
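The storage-efficiency figure is a ratio. A small sketch of the arithmetic, measured by row counts; this is an assumption, since on-disk size also depends on row width, TOAST compression and indexes, so byte savings can differ. `storage_reduction_pct()` and the sample counts are illustrative, not measured data.

```perl
use strict;
use warnings;

# Percentage of collected readings that differential storage did not need to write.
# Row counts are only a proxy for database size.
sub storage_reduction_pct {
    my ($collected, $stored) = @_;
    die "collected must be positive\n"     unless $collected > 0;
    die "stored cannot exceed collected\n" if $stored > $collected;
    return 100 * ($collected - $stored) / $collected;
}

# Illustrative numbers: 30 days of hourly scans (720) with 180 readings stored.
printf "%.1f%% reduction\n", storage_reduction_pct(720, 180);   # prints 75.0% reduction
```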

🔒 Security Enhancements

  • Configuration Security: Separated sensitive configuration from shared cluster config
  • Database Security: Implemented proper user permissions and access controls
  • API Key Management: Secure storage and rotation procedures for OpenAI API keys
  • Audit Trail: Complete logging of all system changes and data access

🐛 Known Technical Issues

  • Large Dataset Performance: Initial data collection on large clusters may require tuning
  • Migration Detection Edge Cases: Rare scenarios with identical drives may need manual verification
  • PostgreSQL Version Compatibility: Requires PostgreSQL 13+ for JSONB and advanced indexing features
  • Perl Module Dependencies: Some CPAN modules may require system-level library installation

🔮 Technical Roadmap

  • Phase 2: Real-time streaming data collection with Apache Kafka
  • Phase 3: Machine learning model training on historical data
  • Phase 4: Integration with Proxmox VE API for automated responses
  • Phase 5: Multi-tenant architecture for managed service providers

💻 Development Environment Notes

  • Test Database: Currently using 192.168.2.102 for development and testing
  • Perl Version: Developed and tested on Perl 5.32+
  • PostgreSQL Extensions: Requires uuid-ossp and btree_gin extensions
  • Development Workflow: Feature branch development with PR reviews required

🔧 Technical Reference for Developers

Database Schema Reference

Perl Module Architecture

  • SmartCollector.pm: Data collection and hardware tracking
    • Hardware manufacturer detection
    • Migration detection and logging
    • Differential storage integration
    • Storage efficiency monitoring
  • PredictionEngine.pm: AI-powered analysis and predictions
  • autosmart-report.pl (in scripts/): Report generation and alerting
  • Module documentation: Inline POD documentation in each module

Configuration Management

  • Cluster config: ../config/cluster.conf (shared across all nodes)
  • Node config: ../config/default (template for node-specific settings; per-node overrides live in files such as ../config/cluster-ebony.conf)
  • OpenAI config: ../config/openai.conf (API configuration)
  • Configuration documentation: INSTALLATION.md

Scripts and Development Tools

  • Collection: ../scripts/autosmart-collector.pl
  • Analysis: ../scripts/autosmart-predictor.pl
  • Reporting: ../scripts/autosmart-report.pl
  • Testing: ../scripts/test-differential-storage.pl
  • Deployment: ../scripts/deploy.sh, ../scripts/deploy-production.sh

Development Scenarios

Scenario 1: Adding New SMART Parameters

Files to modify:
  1. lib/SmartCollector.pm - Add parameter collection logic
  2. sql/schema.sql - Update parameter definitions if needed
  3. scripts/test-differential-storage.pl - Add parameter tests
  4. docs/DIFFERENTIAL_STORAGE.md - Document parameter behavior

Scenario 2: Implementing New AI Prediction Models

Files to modify:
  1. lib/PredictionEngine.pm - Add new prediction algorithms
  2. docs/API.md - Update API integration patterns
  3. scripts/autosmart-predictor.pl - Add model selection logic
  4. sql/schema.sql - Add prediction result tables if needed

Scenario 3: Performance Optimization

Areas to investigate:
  1. docs/DIFFERENTIAL_STORAGE.md - Storage optimization techniques
  2. sql/schema.sql - Index optimization
  3. lib/SmartCollector.pm - Collection efficiency
  4. PostgreSQL query performance using EXPLAIN ANALYZE

Scenario 4: Adding New Hardware Support

Files to modify:
  1. lib/SmartCollector.pm - Hardware detection logic
  2. docs/MIGRATION_DETECTION.md - Hardware tracking specifications
  3. scripts/test-differential-storage.pl - Hardware-specific tests
  4. Configuration templates for new hardware types

Code Quality Guidelines

Perl Coding Standards

# Use strict and warnings
use strict;
use warnings;

# Consistent indentation (4 spaces)
sub example_function {
    my ($self, $param) = @_;
    
    # Clear variable names
    my $smart_data = $self->collect_smart_data($param);
    
    # Error handling
    return unless defined $smart_data;
    
    return $smart_data;
}

Database Development Patterns

-- Use transactions for data consistency
BEGIN;
    -- Multiple related operations
    INSERT INTO hdd_inventory (...) VALUES (...);
    INSERT INTO smart_readings (...) VALUES (...);
COMMIT;

-- Use proper indexing (CREATE INDEX CONCURRENTLY must run outside a transaction block)
CREATE INDEX CONCURRENTLY idx_smart_readings_timestamp 
ON smart_readings(timestamp DESC, serial_number);

# Perl (DBI): use parameterized queries to prevent SQL injection
my $sth = $dbh->prepare("SELECT * FROM smart_readings WHERE serial_number = ?");
$sth->execute($serial_number);

This development guide provides the foundation for extending and maintaining the autoSMART system. Follow these guidelines to ensure code quality, performance, and reliability.