autoSMART Development Guide

📚 Developer Documentation Index

This document serves as the complete guide for developers working on autoSMART. It includes development environment setup, architecture documentation, testing procedures, and developer-specific changelog.

Quick Navigation

Codebase Structure
Development Environment Setup
Architecture Overview
Database Development
Module Development
Testing Strategies
Deployment Procedures
Developer Changelog
Technical Reference

📁 Codebase Structure

autoSMART follows a modular architecture with clear separation of concerns. Below is the complete directory structure and file descriptions:

Project Root

autoSMART/
├── README.md                    # Symlink to docs/README.md (end-user documentation)
├── .deployignore               # Files excluded from production deployment
├── config/                     # Configuration files and templates
├── docs/                       # Documentation (mixed deployment)
├── lib/                        # Perl modules and core libraries
├── scripts/                    # Executable scripts and utilities
└── sql/                        # Database schema and SQL files

📁 `/config/` - Configuration Management

Configuration files are organized by scope and environment:

config/
├── cluster.conf                # Cluster-wide settings (shared across nodes)
├── cluster-ebony.conf         # Node-specific configuration for ebony
├── database.conf              # PostgreSQL connection settings
├── openai.conf               # OpenAI API configuration and prompts
├── smart.conf                # SMART parameter thresholds and monitoring rules
├── default                   # Default/template configuration
└── debug-ebony.sh           # Development debugging script for ebony node

Configuration File Details

cluster.conf (88 lines):
- Cluster topology and node definitions
- Node hostnames, IP addresses, and roles
- Shared monitoring parameters across cluster
- Global system settings and defaults
- Inter-node communication configuration
database.conf (30 lines):
- PostgreSQL connection parameters (host, port, database, credentials)
- Connection pooling settings and timeouts
- Database-specific optimizations and tuning parameters
- SSL configuration and security settings
openai.conf (50 lines):
- OpenAI API key and model configuration
- Prompt templates for failure prediction analysis
- Response parsing rules and confidence thresholds
- Rate limiting and cost management settings
- Fallback configurations for API failures
smart.conf (57 lines):
- SMART parameter monitoring thresholds for different drive types
- Critical parameter definitions and escalation rules
- Alert generation rules and notification preferences
- Parameter collection intervals and scheduling
- Drive type specific monitoring configurations
default (107 lines):
- Default/template configuration for new node deployments
- Standard parameter values and system defaults
- Configuration validation rules and constraints
- Example configurations with detailed comments
cluster-ebony.conf (13 lines):
- Node-specific configuration overrides for ebony node
- Local network settings and hardware-specific parameters
- Custom thresholds for specific hardware configurations
debug-ebony.sh (29 lines):
- Development debugging utilities for ebony node
- Test data generation and validation scripts
- Development environment setup and configuration
- Debugging tools and diagnostic utilities

📁 `/lib/` - Core Perl Modules

Core business logic implemented as reusable Perl modules:

lib/
├── SmartCollector.pm          # SMART data collection and hardware tracking
└── PredictionEngine.pm        # AI-powered failure prediction engine

Module Architecture

SmartCollector.pm (802 lines):
- Hardware Identification: Device detection using serial numbers and model names
- SMART Data Collection: Integration with smartmontools for comprehensive parameter collection
- Migration Detection: Algorithms to detect when drives move between nodes or device paths
- Differential Storage: Intelligent storage system that only saves changed parameters
- Database Layer: PostgreSQL integration with connection pooling and error handling
- Storage Efficiency: Real-time monitoring of storage optimization effectiveness
- Configuration Management: Dynamic configuration loading and validation
- Error Handling: Comprehensive error handling with detailed logging
PredictionEngine.pm (607 lines):
- OpenAI Integration: Direct API communication with GPT models
- Prompt Engineering: Sophisticated prompt templates for failure prediction
- Response Processing: Parsing and validation of AI-generated predictions
- Confidence Scoring: Statistical analysis of prediction reliability
- Timeline Estimation: Failure time prediction with confidence intervals
- Cost Optimization: API usage optimization and request batching
- Error Recovery: Robust error handling for API failures and rate limits

📁 `/scripts/` - Executable Components

Production scripts and development utilities:

scripts/
├── autosmart-collector.pl      # Main data collection daemon
├── autosmart-predictor.pl      # AI prediction processing
├── autosmart-report.pl         # Report generation engine
├── autosmart-migration-report.pl # Hardware migration analysis
├── smart-collector-daemon.pl   # Background collection service
├── deploy.sh                   # Unified deployment script
├── deploy-production.sh        # Production cluster deployment
├── install.sh                  # Symlink to deploy.sh for compatibility
├── uninstall.sh               # Complete system removal
├── monitor-cluster.sh          # Cluster health monitoring
├── test-smart-collection.pl    # SMART collection testing
├── test-differential-storage.pl # Storage optimization testing
├── test-db-connection.pl       # Database connectivity testing
└── simple-smart-test.pl        # Basic SMART functionality test

Script Categories

Production Scripts

autosmart-collector.pl (348 lines):
- Main collection daemon that runs on each node
- Scheduled SMART data collection and processing
- Hardware detection and migration tracking
- Integration with SmartCollector.pm module
- Command-line options for daemon mode, single-run, and debugging
autosmart-predictor.pl (483 lines):
- Processes collected data for AI predictions
- Batch processing of pending SMART readings
- Integration with PredictionEngine.pm for OpenAI communication
- Prediction result storage and confidence tracking
autosmart-report.pl (662 lines):
- Generates comprehensive health reports and alerts
- Configurable report formats (summary, detailed, trend analysis)
- Email notification system for critical alerts
- Historical data analysis and trend detection
smart-collector-daemon.pl (252 lines):
- Background service wrapper for collector
- Process management and restart capabilities
- Log rotation and system integration
- Service status monitoring and health checks

Deployment Scripts

deploy.sh (697 lines):
- Unified deployment for single node or cluster
- Supports install, uninstall, and cluster deployment modes
- Automatic dependency checking and installation
- Configuration template deployment and customization
- System service registration and startup
deploy-production.sh (116 lines):
- Production-specific deployment procedures
- Multi-node cluster deployment automation
- Production safety checks and validation
- Rollback capabilities for failed deployments
uninstall.sh (187 lines):
- Complete system cleanup and removal
- Service stopping and deregistration
- File and directory cleanup
- Database cleanup options (configurable)
monitor-cluster.sh (515 lines):
- Ongoing cluster health monitoring
- Node status verification and reporting
- Service health checks across all cluster nodes
- Automated restart capabilities for failed services

Development & Testing Scripts

test-smart-collection.pl (132 lines):
- Validates SMART data collection functionality
- Tests hardware detection and identification
- Verifies database connectivity and data storage
- Performance benchmarking for collection operations
test-differential-storage.pl (270 lines):
- Comprehensive testing of storage optimization
- Validates differential storage algorithms
- Tests change detection and storage efficiency
- Performance analysis and optimization verification
test-db-connection.pl (55 lines):
- Database connectivity verification
- Connection pooling and timeout testing
- SQL execution validation
- Database performance testing
simple-smart-test.pl (144 lines):
- Basic functionality testing
- Quick validation of core components
- Integration testing for development
- Smoke testing for deployment validation

Analysis Scripts

autosmart-migration-report.pl (615 lines):
- Hardware migration tracking and analysis
- Migration pattern detection and reporting
- Historical migration data analysis
- Migration-related issue identification and troubleshooting

📁 `/sql/` - Database Schema

PostgreSQL database definitions and utilities:

sql/
├── schema.sql                  # Complete production database schema
└── schema-fixed.sql           # Schema with specific fixes/patches

Database Schema Components

Core Tables:
- hdd_inventory: Hardware identification and location tracking
- smart_readings: SMART parameter data with differential storage
- hdd_migrations: Drive movement logging between nodes/paths
AI Integration:
- predictions: AI-generated failure predictions with confidence scores
- alert_history: Alert notification tracking and escalation
Configuration:
- smart_thresholds: Configurable parameter thresholds and alert rules
- system_config: System-wide configuration parameters
Optimization:
- Differential storage functions (should_store_smart_reading())
- Reconstructed views (smart_readings_reconstructed)
- Change detection algorithms with SHA256 checksums
Indexing:
- Performance-optimized indexes for temporal queries
- Hardware identification indexes for fast lookups
- Composite indexes for complex query patterns

Schema Files Details

schema.sql (726 lines):
- Complete production database schema
- Full table definitions with constraints and indexes
- PostgreSQL functions for differential storage
- Views for data reconstruction and reporting
- Trigger definitions for automated processes
schema-fixed.sql (423 lines):
- Schema patches and specific fixes
- Migration scripts for schema updates
- Performance optimization adjustments
- Compatibility fixes for different PostgreSQL versions

📁 `/docs/` - Documentation

Documentation organized by audience and deployment status:

docs/
├── README.md                   # End-user guide (DEPLOYED)
├── INSTALLATION.md             # Setup and configuration (DEPLOYED)
├── CHANGELOG.md               # Release notes for end-users (DEPLOYED)
├── API.md                     # OpenAI API configuration (DEPLOYED)
├── DEVELOPMENT.md             # Developer guide (NOT DEPLOYED)
└── DIFFERENTIAL_STORAGE.md    # Technical storage details (NOT DEPLOYED)

Documentation Deployment Strategy

Deployed docs: End-user facing documentation
Non-deployed docs: Developer and technical implementation details

🔧 Key File Relationships

Data Flow Architecture

smartmontools → SmartCollector.pm → PostgreSQL → PredictionEngine.pm → OpenAI API
     ↓               ↓                    ↓              ↓
autosmart-collector.pl → Database → autosmart-predictor.pl → Reports

Configuration Hierarchy

cluster.conf (global) → node-specific.conf → smart.conf → openai.conf
                                ↓
                        Individual script configurations

Module Dependencies

autosmart-collector.pl
├── SmartCollector.pm
├── database.conf
├── smart.conf
└── cluster.conf

autosmart-predictor.pl
├── PredictionEngine.pm
├── SmartCollector.pm (for data access)
├── openai.conf
└── database.conf

📊 Codebase Metrics

File Type Distribution

Perl Scripts: 8 production scripts + 4 test scripts (12 total)
Perl Modules: 2 core modules (1,409 total lines)
Shell Scripts: 5 deployment/management scripts (1,645 total lines)
SQL Files: 2 schema files (1,149 total lines)
Configuration: 7 configuration files (374 total lines)
Documentation: 5 documentation files

Code Complexity by Lines of Code

SmartCollector.pm: 802 lines (High complexity - hardware integration, differential storage)
PredictionEngine.pm: 607 lines (Medium complexity - API integration, data processing)
Database Schema: 726 lines (High complexity - advanced PostgreSQL features)
Deploy Scripts: 697 lines each (Medium complexity - system integration)
Report Generation: 662 lines (Medium complexity - data analysis and formatting)
Migration Analysis: 615 lines (Medium complexity - pattern detection)
Cluster Monitoring: 515 lines (Medium complexity - distributed system monitoring)

Total Codebase Size

Production Code: ~4,500 lines (Perl modules + production scripts)
Deployment & Management: ~1,800 lines (deployment and monitoring scripts)
Testing Code: ~600 lines (test scripts and utilities)
Database Schema: ~1,150 lines (PostgreSQL schema and functions)
Configuration: ~375 lines (configuration templates and examples)
Total: ~8,400+ lines of code

Testing Coverage Areas

Unit Tests: Module-specific functionality testing
Integration Tests: End-to-end data flow validation
Performance Tests: Storage efficiency and query optimization benchmarks
Deployment Tests: Installation and configuration validation across environments
Regression Tests: Automated testing for core functionality preservation

🏗️ Development Workflow

Getting Started with Development

Clone Repository: Set up local development environment
Database Setup: Configure PostgreSQL connection to development database
Perl Dependencies: Install required CPAN modules
Configuration: Copy and customize configuration templates
Testing: Run test suite to verify setup

Adding New Features

Module Development: Extend existing Perl modules or create new ones
Script Integration: Create or modify scripts to use new functionality
Database Changes: Update schema if new data structures are needed
Testing: Add comprehensive tests for new functionality
Documentation: Update both end-user and developer documentation

Code Organization Principles

Separation of Concerns: Each module and script has a specific, well-defined responsibility
Configuration-Driven: System behavior is controlled through configuration files rather than hard-coded values
Database-Centric: PostgreSQL serves as the central data store with business logic in database functions
Modular Design: Components can be developed, tested, and deployed independently
Error Handling: Comprehensive error handling and logging throughout all components
Performance-First: Optimized for high-volume data collection and processing
Scalability: Designed to scale across multiple nodes in a cluster environment

Development Patterns Used

Factory Pattern: Configuration-based object creation in Perl modules
Observer Pattern: Event-driven processing for hardware changes and alerts
Strategy Pattern: Configurable algorithms for different drive types and thresholds
Template Method: Standardized data processing pipelines with customizable steps
Singleton Pattern: Database connection management and configuration loading
Command Pattern: Script-based operations with standardized interfaces

Code Quality Standards

Perl Best Practices: Strict warnings, proper scoping, and defensive programming
Database Normalization: Proper relational design with referential integrity
Configuration Validation: Input validation and sanitization throughout
Error Recovery: Graceful degradation and automatic recovery mechanisms
Performance Monitoring: Built-in performance metrics and optimization tracking
Security Practices: SQL injection prevention, input validation, and secure configuration management

🏗️ Development Environment Setup

Prerequisites

System Requirements

Operating System: Linux/macOS (tested on macOS, deployed on Proxmox VE)
Perl: Version 5.20+ with CPAN access
PostgreSQL: Version 13+ with JSONB and extension support
Git: For version control and collaboration

Development Database

# Current test database configuration
Host: 192.168.2.102
Database: autosmart  
User: postgres
Password: (no password)
Port: 5432

Required Perl Modules

# Core database modules
cpan install DBI DBD::Pg

# JSON processing
cpan install JSON::XS

# System utilities  
cpan install Config::Simple File::Slurp Time::HiRes

# Security and hashing
cpan install Digest::SHA

# HTTP/API clients (for OpenAI integration)
cpan install LWP::UserAgent HTTP::Request::Common

# Optional: Development and testing
cpan install Data::Dumper Test::More Test::Exception

Development Workflow

1. Environment Setup

# Clone the project
cd /Users/bogdan/Documents/workspace/
git clone <autoSMART-repo>
cd autoSMART

# Set environment variables
export AUTOSMART_DB_HOST=192.168.2.102
export AUTOSMART_DB_NAME=autosmart
export AUTOSMART_DB_USER=postgres
export AUTOSMART_DB_PASS=
export AUTOSMART_DB_PORT=5432

# Optional: OpenAI API key for AI features
export OPENAI_API_KEY=your-api-key-here

2. Database Setup

# Initialize the database schema
psql -h 192.168.2.102 -U postgres -d autosmart -f sql/schema.sql

# Verify installation
psql -h 192.168.2.102 -U postgres -d autosmart -c "\\dt"

3. Testing Environment

# Run the differential storage test suite
cd scripts/
perl test-differential-storage.pl

# Test database connectivity
perl -e "
use DBI;
my \$dsn = 'DBI:Pg:dbname=autosmart;host=192.168.2.102;port=5432';
my \$dbh = DBI->connect(\$dsn, 'postgres', '', {RaiseError => 1});
print \"Database connection successful!\\n\";
\$dbh->disconnect();
"

🧩 Architecture Overview

System Components

autoSMART Architecture
┌─────────────────────────────────────────────────────────────┐
│                    Proxmox Cluster                          │
├─────────────────────┬─────────────────────┬─────────────────┤
│      Node 1         │       Node 2        │      Node 3     │
│                     │                     │                 │
│ ┌─── SmartCollector ┤ ┌─── SmartCollector ┤ ┌─── SmartCollector
│ │   - HDD Scanning  │ │   - HDD Scanning  │ │   - HDD Scanning
│ │   - SMART Reading │ │   - SMART Reading │ │   - SMART Reading  
│ │   - Migration Det │ │   - Migration Det │ │   - Migration Det
│ └─── Data Storage   │ └─── Data Storage   │ └─── Data Storage
└─────────────────────┴─────────────────────┴─────────────────┘
                               │
                      ┌────────▼─────────┐
                      │   PostgreSQL DB   │
                      │                  │
                      │ • HDD Inventory  │
                      │ • SMART Readings │
                      │ • Migrations     │
                      │ • AI Predictions │
                      └────────┬─────────┘
                               │
                    ┌──────────▼───────────┐
                    │    SmartAnalyzer     │
                    │                      │
                    │ • OpenAI API         │
                    │ • Failure Prediction │
                    │ • Pattern Analysis   │
                    └──────────┬───────────┘
                               │
                    ┌──────────▼───────────┐
                    │    SmartReporter     │
                    │                      │
                    │ • Alert Generation   │
                    │ • Report Creation    │
                    │ • Dashboard Data     │
                    └──────────────────────┘

Data Flow

Collection Phase:
- SmartCollector scans HDDs on each node
- Hardware identification (serial + model)
- Migration detection if HDD moved
- Differential storage decision
- Store only changed/critical data
Analysis Phase:
- SmartAnalyzer processes stored data
- Historical pattern analysis
- OpenAI API calls for predictions
- Risk assessment and trending
Reporting Phase:
- SmartReporter generates alerts
- Dashboard data preparation
- Health reports creation
- Maintenance recommendations

🔧 Module Development

SmartCollector.pm Development

Key Methods to Understand

# Hardware identification and migration detection
sub _detect_or_create_hdd($drive_info, $smart_data)

# Differential storage decision making
sub _should_store_reading($hdd_id, $smart_data)

# Optimized data storage
sub _insert_smart_reading_differential($hdd_id, $drive_info, $smart_data, $storage_info)

Adding New Features

New SMART Parameters: ```perl

Add parameter processing in collect_smart_data()

if ($line =~ /New_Parameter.*\s+(\d+)/) { $smart_data->{parameters}{'New_Parameter'} = $1; } ```
Custom Manufacturer Detection: ```perl

Extend _detect_manufacturer() method

sub _detect_manufacturer { my ($self, $model) = @_; return 'Custom_Manufacturer' if $model =~ /CUSTOM_PATTERN/; # ... existing logic } ```

SmartAnalyzer.pm Development

AI Integration Patterns

# OpenAI API call structure
sub _call_openai_api {
    my ($self, $prompt, $smart_data) = @_;
    
    my $request = HTTP::Request->new(POST => 'https://api.openai.com/v1/chat/completions');
    $request->header('Authorization' => "Bearer $self->{openai_api_key}");
    $request->header('Content-Type' => 'application/json');
    
    my $payload = {
        model => "gpt-4",
        messages => [
            {
                role => "system", 
                content => "You are an expert in HDD failure prediction..."
            },
            {
                role => "user",
                content => $prompt
            }
        ]
    };
    
    # ... handle response
}

🗃️ Database Development

Schema Evolution

Adding New Tables

-- Always include migration scripts
CREATE TABLE new_feature (
    id SERIAL PRIMARY KEY,
    hdd_id INTEGER REFERENCES hdd_inventory(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Add indexes for performance
CREATE INDEX idx_new_feature_hdd_id ON new_feature(hdd_id);

Modifying Existing Tables

-- Use ALTER statements for compatibility
ALTER TABLE smart_readings ADD COLUMN new_field VARCHAR(100);
CREATE INDEX CONCURRENTLY idx_smart_readings_new_field ON smart_readings(new_field);

Query Optimization

Efficient SMART Data Queries

-- Use the reconstructed view for complete data
SELECT * FROM smart_readings_reconstructed 
WHERE hdd_id = $1 
  AND timestamp > NOW() - INTERVAL '30 days'
ORDER BY timestamp DESC;

-- Use raw table for storage statistics
SELECT reading_type, COUNT(*) 
FROM smart_readings 
WHERE timestamp > NOW() - INTERVAL '7 days'
GROUP BY reading_type;

🧪 Testing Guidelines

Unit Testing

# Example test structure
use Test::More tests => 5;
use lib '../lib';
use SmartCollector;

my $collector = SmartCollector->new({
    db_host => '192.168.2.102',
    db_name => 'autosmart_test',
    # ... test config
});

# Test hardware identification
my $hdd_id = $collector->_detect_or_create_hdd($drive_info, $smart_data);
ok($hdd_id > 0, "HDD identification successful");

# Test differential storage
my $storage_decision = $collector->_should_store_reading($hdd_id, $smart_data);
ok($storage_decision->{store}, "Storage decision made");

Integration Testing

# Run the comprehensive test suite
cd scripts/
perl test-differential-storage.pl

# Test with real hardware (if available)
perl collect-smart-data.pl --test-mode --device /dev/sdb

Performance Testing

-- Test query performance
EXPLAIN ANALYZE 
SELECT * FROM smart_readings_reconstructed 
WHERE hdd_id IN (1,2,3,4,5) 
  AND timestamp > NOW() - INTERVAL '90 days';

-- Monitor storage efficiency
SELECT 
    reading_type,
    COUNT(*) as readings,
    AVG(length(parameters_json::text)) as avg_size_bytes
FROM smart_readings 
WHERE timestamp > NOW() - INTERVAL '24 hours'
GROUP BY reading_type;

🔍 Debugging and Troubleshooting

Logging System

# Enable debug logging
$ENV{AUTOSMART_DEBUG} = 3;  # Maximum verbosity

# Log levels:
# 1 = Errors only
# 2 = Warnings and errors  
# 3 = Info, warnings, errors
# 4 = Debug everything

Common Issues

Database Connection Problems

# Test database connectivity
psql -h 192.168.2.102 -U postgres -d autosmart -c "SELECT version();"

# Check permissions
psql -h 192.168.2.102 -U postgres -d autosmart -c "\\dp smart_readings"

SMART Data Collection Issues

# Test smartctl access
sudo smartctl -a /dev/sda

# Check permissions
ls -la /dev/sd*

Migration Detection Problems

-- Check migration logs
SELECT * FROM hdd_migrations 
ORDER BY detected_at DESC 
LIMIT 10;

-- Verify HDD inventory
SELECT serial_number, model_name, current_device_path, current_node_id 
FROM hdd_inventory 
WHERE status = 'active';

📊 Performance Monitoring

Database Performance

-- Monitor table sizes
SELECT schemaname, tablename, 
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables 
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Monitor query performance
SELECT query, mean_time, calls 
FROM pg_stat_statements 
WHERE query LIKE '%smart_readings%'
ORDER BY mean_time DESC;

Application Performance

# Add timing to critical operations
use Time::HiRes qw(time);

my $start_time = time();
my $result = $self->collect_smart_data($device_path);
my $duration = time() - $start_time;

$self->_log("SMART collection took ${duration}s for $device_path", 3);

🚀 Deployment Guidelines

Production Deployment

Database Setup:
- Use dedicated PostgreSQL server
- Configure proper backup strategy
- Set up monitoring and alerting
Security Configuration:
- Use dedicated database users with minimal privileges
- Secure API keys and configuration files
- Enable SSL connections for database
Performance Tuning:
- Configure PostgreSQL for time-series workload
- Set up proper indexing strategy
- Monitor and optimize slow queries

Proxmox Integration

# Install on cluster nodes
for node in pve01 pve02 pve03; do
    scp -r autoSMART/ root@$node:/etc/pve/
done

# Configure systemd services
systemctl enable autosmart-collector
systemctl start autosmart-collector

📚 Additional Resources

Useful Commands

# Monitor system in real-time
watch -n 30 'psql -h 192.168.2.102 -U postgres -d autosmart -c "SELECT COUNT(*) FROM smart_readings WHERE timestamp > NOW() - INTERVAL '\''1 hour'\''"'

# Generate performance report
psql -h 192.168.2.102 -U postgres -d autosmart -f sql/performance-report.sql

Development Tools

pgAdmin: Database administration and query development
Perl::Critic: Code quality analysis
Perl::Tidy: Code formatting
Git: Version control with feature branches

📝 Developer Changelog

This section contains detailed technical changes, internal API modifications, and development-specific information that is not relevant for end-users.

[1.0.0] - 2025-08-15 - Development Details

🏗️ Architecture Changes

Database Schema Evolution: Complete redesign from simple SMART storage to differential storage architecture
Hardware Tracking Implementation: Added hdd_inventory and hdd_migrations tables for hardware-based identification
Differential Storage Engine: Implemented should_store_smart_reading() PostgreSQL function with configurable change detection
Migration Detection Algorithm: Created automatic hardware migration detection using serial numbers and model matching

🔧 Internal API Changes

SmartCollector.pm Refactor:
- Added hardware identification methods (identify_hardware(), detect_migration())
- Implemented differential storage integration (should_store_reading())
- Added storage efficiency monitoring
- Breaking change: Constructor now requires database handle
Database Functions:
- Added should_store_smart_reading(jsonb, text, text, interval, text[]) function
- Added smart_readings_reconstructed view for seamless data access
- Added migration tracking triggers
Configuration Schema:
- Split configuration into cluster-wide (cluster.conf) and node-specific (autosmart.conf)
- Added differential storage parameters (force_storage_interval, critical_parameters)

🧪 Testing Infrastructure

Differential Storage Test Suite: Added comprehensive test coverage in test-differential-storage.pl
Migration Detection Tests: Validated hardware tracking across different scenarios
Performance Benchmarks: Established baseline performance metrics for storage efficiency
Database Integration Tests: Added tests for PostgreSQL function behavior

📊 Performance Optimizations

Storage Efficiency: Achieved 60-80% database size reduction through differential storage
Query Optimization: Added proper indexing for hardware tracking and temporal queries
Background Processing: Implemented non-blocking collection and analysis workflows
Memory Management: Optimized Perl module memory usage for long-running processes

🔒 Security Enhancements

Configuration Security: Separated sensitive configuration from shared cluster config
Database Security: Implemented proper user permissions and access controls
API Key Management: Secure storage and rotation procedures for OpenAI API keys
Audit Trail: Complete logging of all system changes and data access

🐛 Known Technical Issues

Large Dataset Performance: Initial data collection on large clusters may require tuning
Migration Detection Edge Cases: Rare scenarios with identical drives may need manual verification
PostgreSQL Version Compatibility: Requires PostgreSQL 13+ for JSONB and advanced indexing features
Perl Module Dependencies: Some CPAN modules may require system-level library installation

🔮 Technical Roadmap

Phase 2: Real-time streaming data collection with Apache Kafka
Phase 3: Machine learning model training on historical data
Phase 4: Integration with Proxmox VE API for automated responses
Phase 5: Multi-tenant architecture for managed service providers

💻 Development Environment Notes

Test Database: Currently using 192.168.2.102 for development and testing
Perl Version: Developed and tested on Perl 5.32+
PostgreSQL Extensions: Requires uuid-ossp and btree_gin extensions
Development Workflow: Feature branch development with PR reviews required

🔧 Technical Reference for Developers

Database Schema Reference

Primary location: ../sql/schema.sql
Documentation: DIFFERENTIAL_STORAGE.md, MIGRATION_DETECTION.md
Sample queries: ../sql/sample-queries.sql
Migration scripts: ../sql/migrations/

Perl Module Architecture

SmartCollector.pm: Data collection and hardware tracking
- Hardware manufacturer detection
- Migration detection and logging
- Differential storage integration
- Storage efficiency monitoring
SmartAnalyzer.pm: AI-powered analysis and predictions
SmartReporter.pm: Report generation and alerting
Module documentation: Inline POD documentation in each module

Configuration Management

Cluster config: ../config/cluster.conf (shared across all nodes)
Node config: ../config/defaults/autosmart (node-specific settings)
OpenAI config: ../config/openai.conf (API configuration)
Configuration documentation: INSTALLATION.md

Scripts and Development Tools

Collection: ../scripts/collect-smart-data.pl
Analysis: ../scripts/analyze-smart-data.pl
Reporting: ../scripts/generate-reports.pl
Testing: ../scripts/test-differential-storage.pl
Deployment: ../scripts/deploy.sh, ../scripts/deploy-production.sh

Development Scenarios

Scenario 1: Adding New SMART Parameters

Files to modify: 1. lib/SmartCollector.pm - Add parameter collection logic 2. sql/schema.sql - Update parameter definitions if needed 3. scripts/test-differential-storage.pl - Add parameter tests 4. docs/DIFFERENTIAL_STORAGE.md - Document parameter behavior

Scenario 2: Implementing New AI Prediction Models

Files to modify: 1. lib/SmartAnalyzer.pm - Add new prediction algorithms 2. docs/API.md - Update API integration patterns 3. scripts/analyze-smart-data.pl - Add model selection logic 4. sql/schema.sql - Add prediction result tables if needed

Scenario 3: Performance Optimization

Areas to investigate: 1. docs/DIFFERENTIAL_STORAGE.md - Storage optimization techniques 2. sql/schema.sql - Index optimization 3. lib/SmartCollector.pm - Collection efficiency 4. PostgreSQL query performance using EXPLAIN ANALYZE

Scenario 4: Adding New Hardware Support

Files to modify: 1. lib/SmartCollector.pm - Hardware detection logic 2. docs/MIGRATION_DETECTION.md - Hardware tracking specifications 3. scripts/test-differential-storage.pl - Hardware-specific tests 4. Configuration templates for new hardware types

Code Quality Guidelines

Perl Coding Standards

# Use strict and warnings
use strict;
use warnings;

# Consistent indentation (4 spaces)
sub example_function {
    my ($self, $param) = @_;
    
    # Clear variable names
    my $smart_data = $self->collect_smart_data($param);
    
    # Error handling
    return unless defined $smart_data;
    
    return $smart_data;
}

Database Development Patterns

-- Use transactions for data consistency
BEGIN;
    -- Multiple related operations
    INSERT INTO hdd_inventory (...) VALUES (...);
    INSERT INTO smart_readings (...) VALUES (...);
COMMIT;

-- Use proper indexing
CREATE INDEX CONCURRENTLY idx_smart_readings_timestamp 
ON smart_readings(timestamp DESC, serial_number);

-- Use parameterized queries to prevent SQL injection
my $sth = $dbh->prepare("SELECT * FROM smart_readings WHERE serial_number = ?");
$sth->execute($serial_number);

This development guide provides the foundation for extending and maintaining the autoSMART system. Follow these guidelines to ensure code quality, performance, and reliability.

autoSMART Development Guide

📚 Developer Documentation Index

Quick Navigation

📁 Codebase Structure

Project Root

📁 /config/ - Configuration Management

Configuration File Details

📁 /lib/ - Core Perl Modules

Module Architecture

📁 /scripts/ - Executable Components

Script Categories

Production Scripts

Deployment Scripts

Development & Testing Scripts

Analysis Scripts

📁 /sql/ - Database Schema

Database Schema Components

Schema Files Details

📁 /docs/ - Documentation

Documentation Deployment Strategy

🔧 Key File Relationships

Data Flow Architecture

Configuration Hierarchy

Module Dependencies

📊 Codebase Metrics

File Type Distribution

Code Complexity by Lines of Code

Total Codebase Size

Testing Coverage Areas

🏗️ Development Workflow

Getting Started with Development

Adding New Features

Code Organization Principles

Development Patterns Used

Code Quality Standards

🏗️ Development Environment Setup

Prerequisites

System Requirements

Development Database

Required Perl Modules

Development Workflow

1. Environment Setup

2. Database Setup

3. Testing Environment

🧩 Architecture Overview

System Components

Data Flow

🔧 Module Development

SmartCollector.pm Development

Key Methods to Understand

Adding New Features

Add parameter processing in collect_smart_data()

Extend _detect_manufacturer() method

SmartAnalyzer.pm Development

AI Integration Patterns

🗃️ Database Development

Schema Evolution

Adding New Tables

Modifying Existing Tables

Query Optimization

Efficient SMART Data Queries

🧪 Testing Guidelines

Unit Testing

Integration Testing

Performance Testing

🔍 Debugging and Troubleshooting

Logging System

Common Issues

Database Connection Problems

SMART Data Collection Issues

Migration Detection Problems

📊 Performance Monitoring

Database Performance

Application Performance

🚀 Deployment Guidelines

Production Deployment

Proxmox Integration

📚 Additional Resources

Useful Commands

Development Tools

📁 `/config/` - Configuration Management

📁 `/lib/` - Core Perl Modules

📁 `/scripts/` - Executable Components

📁 `/sql/` - Database Schema

📁 `/docs/` - Documentation