# autoSMART v1.0 - Intelligent HDD Monitoring & Failure Prediction

autoSMART is an intelligent SMART monitoring system for the HDDs in a Proxmox cluster, with AI-based failure prediction and optimized storage in PostgreSQL.

## 🎯 Project Goals

- **Continuous monitoring** of SMART parameters for all HDDs in the cluster
- **AI predictions** of imminent failures using the OpenAI API
- **Long-term storage** in PostgreSQL for time-series analysis
- **Proactive alerting** for preventive maintenance

## Key Features

- **🔍 Hardware-based HDD tracking**: Permanent identification using serial numbers and model names (not volatile /dev/sdX paths)
- **🔄 Migration detection**: Automatic detection and logging when HDDs move between nodes or device paths
- **💾 Differential storage optimization**: Stores only the SMART parameters that changed between collections, reducing database size by 60-80%
- **🤖 AI-powered failure prediction**: Uses OpenAI GPT for intelligent drive failure forecasting
- **🏥 Health monitoring**: Continuous SMART parameter analysis with configurable thresholds
- **📊 Comprehensive reporting**: Detailed drive health reports and predictive analytics
- **🔧 Proxmox cluster integration**: Designed for distributed Proxmox VE environments
- **⚡ High performance**: PostgreSQL backend with optimized indexing and queries
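
As a rough illustration of hardware-based tracking, the sketch below derives a stable key from the `Device Model` and `Serial Number` lines that `smartctl -i` prints for ATA drives. The function name `drive_key` and the `model|serial` key format are assumptions made for this example, not autoSMART's actual implementation.

```bash
# Build a stable drive key from `smartctl -i` output read on stdin.
# The "model|serial" key format is hypothetical, chosen for illustration.
drive_key() {
  awk -F': *' '
    /^Device Model:/  { model = $2 }
    /^Serial Number:/ { serial = $2 }
    END {
      # Strip trailing whitespace, then fail if either field is missing.
      sub(/[ \t]+$/, "", model)
      sub(/[ \t]+$/, "", serial)
      if (model == "" || serial == "") {
        exit 1
      }
      printf "%s|%s\n", model, serial
    }'
}
```

On a node with smartmontools installed, `sudo smartctl -i /dev/sda | drive_key` would print the same key no matter which `/dev/sdX` path the drive currently occupies. SAS and NVMe devices label these fields differently, so a real collector would need additional patterns.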

## 🚀 Quick Start

### Prerequisites
- **PostgreSQL 13+** for data storage
- **Perl 5.20+** with required modules
- **Proxmox VE** cluster environment
- **smartmontools** for SMART data collection
- **OpenAI API key** for failure predictions

### Installation
```bash
# 1. Download autoSMART and run automated deployment
git clone <repository-url>
cd autoSMART
sudo ./scripts/deploy.sh install

# The deployment script automatically:
# - Installs all dependencies (Perl modules, smartmontools, etc.)
# - Creates system directories and sets permissions
# - Deploys application files to /opt/autoSMART/
# - Creates configuration files in /etc/autosmart/
# - Registers and starts systemd services
# - Performs initial system validation

# 2. Configure database connection (interactive prompts during install)
# 3. Configure OpenAI API key (interactive prompts during install)
# 4. System is ready - services are automatically started
```

### Verification
```bash
# Check system status (all three services should be active)
sudo systemctl status autosmart autosmart-collector autosmart-predictor

# View recent SMART data collection
sudo journalctl -u autosmart-collector -f

# Generate initial health report
sudo /opt/autoSMART/scripts/autosmart-report.pl --summary
```

## 📚 Documentation

### Getting Started
- **[CHANGELOG.md](CHANGELOG.md)** - Version history and release notes

### System Configuration
- **[API.md](API.md)** - OpenAI API integration and configuration

## 🏥 Monitoring Dashboard

autoSMART provides comprehensive monitoring capabilities:

### Health Status Overview
- Real-time drive health status for all cluster nodes
- Critical parameter alerts and warnings
- AI-powered failure predictions with confidence scores
- Storage efficiency metrics

### Historical Analysis
- Long-term SMART parameter trends
- Performance degradation tracking
- Migration history between nodes
- Predictive analytics reports
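
One simple way to quantify a long-term trend is the average change per day of a raw SMART value between the oldest and newest stored readings. The sketch below assumes input lines of the form `epoch_seconds value`, oldest first. The function name and input format are illustrative only and say nothing about how autoSMART stores or analyzes its history.

```bash
# Average change per day between the first and last reading on stdin.
# Input format (assumed): "epoch_seconds value", oldest line first.
growth_per_day() {
  awk '
    NR == 1 { t0 = $1; v0 = $2 }   # remember the oldest reading
            { t1 = $1; v1 = $2 }   # keep overwriting with the newest
    END {
      # At least two readings with distinct timestamps are required.
      if (NR < 2 || t1 == t0) {
        exit 1
      }
      printf "%.2f\n", (v1 - v0) / ((t1 - t0) / 86400)
    }'
}
```

For example, a reallocated-sector count that rises from 10 to 16 over two days yields `3.00` sectors per day.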

### Alerting System
- Configurable thresholds for all SMART parameters
- Email/webhook notifications
- Integration with monitoring systems
- Escalation procedures for critical alerts

## 🔧 System Architecture

autoSMART operates as a distributed system across your Proxmox cluster:

### Data Collection
- Continuous SMART data collection from all nodes
- Hardware-based drive identification
- Migration detection and logging
- Differential storage for efficiency

### Analysis Engine
- AI-powered failure prediction
- Threshold-based alerting
- Trend analysis and reporting
- Performance optimization recommendations

### Storage Layer
- PostgreSQL database with optimized schema
- Differential storage reducing size by 60-80%
- Historical data retention policies
- Automated backup and maintenance

## 📁 Installed File Structure

When autoSMART is installed on your system, it creates the following directory structure:

### System Directories

```
/opt/autoSMART/                    # Main installation directory
├── scripts/                      # Executable scripts and utilities
│   ├── autosmart-collector.pl    # Main data collection daemon
│   ├── autosmart-predictor.pl    # AI prediction processing
│   ├── autosmart-report.pl       # Report generation engine
│   ├── autosmart-migration-report.pl # Hardware migration analysis
│   ├── smart-collector-daemon.pl # Background collection service
│   ├── uninstall.sh             # System removal script
│   ├── monitor-cluster.sh        # Cluster health monitoring
│   └── test-*.pl                # Testing and validation utilities
├── lib/                         # Perl modules and core libraries
│   ├── SmartCollector.pm        # SMART data collection and hardware tracking
│   └── PredictionEngine.pm      # AI-powered failure prediction engine
├── config/                      # Configuration templates and examples
│   └── (template files)        # Default configuration templates
└── docs/                        # End-user documentation
    ├── README.md               # System overview and quick start
    ├── CHANGELOG.md            # Release notes and version history
    └── API.md                  # OpenAI API configuration guide

/etc/autosmart/                   # System configuration directory
├── autosmart.conf              # Main system configuration
├── cluster.conf                # Cluster topology and node definitions
├── database.conf               # PostgreSQL connection settings
├── openai.conf                 # OpenAI API configuration and prompts
└── smart.conf                  # SMART parameter thresholds and monitoring rules

/etc/systemd/system/             # Systemd service files
├── autosmart.service           # Main autoSMART service
├── autosmart-collector.service # Data collection service
└── autosmart-predictor.service # AI prediction service
```

### Configuration Files Detail

#### `/etc/autosmart/autosmart.conf`
Main system configuration file containing:
- Database connection parameters
- Collection intervals and scheduling
- Local node identification and settings
- Log levels and debugging options

#### `/etc/autosmart/cluster.conf`
Cluster-wide configuration shared across all nodes:
- Node topology and IP addresses
- Shared monitoring parameters
- Cluster-wide alert settings
- Inter-node communication settings

#### `/etc/autosmart/database.conf`
PostgreSQL database connection settings:
- Database host, port, and credentials
- Connection pooling configuration
- SSL settings and security parameters
- Performance tuning options

#### `/etc/autosmart/openai.conf`
OpenAI API integration configuration:
- API key and model selection
- Prompt templates for failure prediction
- Response parsing and confidence thresholds
- Rate limiting and cost management

#### `/etc/autosmart/smart.conf`
SMART parameter monitoring configuration:
- Parameter thresholds for different drive types
- Critical parameter definitions
- Alert escalation rules and notifications
- Drive-specific monitoring settings

### Service Integration

#### Systemd Services
- **`autosmart.service`**: Main system service that manages other components
- **`autosmart-collector.service`**: Background data collection service
- **`autosmart-predictor.service`**: AI prediction processing service

#### Service Management
```bash
# Start/stop services
sudo systemctl start autosmart
sudo systemctl stop autosmart

# Enable/disable automatic startup
sudo systemctl enable autosmart
sudo systemctl disable autosmart

# Check service status
sudo systemctl status autosmart

# View service logs using systemd journal
sudo journalctl -u autosmart -f                    # Follow main service logs
sudo journalctl -u autosmart-collector -f          # Follow data collection logs  
sudo journalctl -u autosmart-predictor -f          # Follow AI prediction logs

# View logs by time period
sudo journalctl -u autosmart --since "1 hour ago"  # Last hour
sudo journalctl -u autosmart --since today         # Today's logs
sudo journalctl -u autosmart --since yesterday     # Yesterday's logs

# View logs by priority level
sudo journalctl -u autosmart -p err                # Error level and above
sudo journalctl -u autosmart -p warning            # Warning level and above
```

### File Permissions

#### Executable Files
- All scripts in `/opt/autoSMART/scripts/` are executable (755)
- Perl modules in `/opt/autoSMART/lib/` are readable (644)
- Configuration files in `/etc/autosmart/` are readable by the `autosmart` user (640)
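
The modes listed above can be audited with standard tools. The helper below is a small sketch (the name `mode_mismatches` is made up for this example) that lists every regular file under a directory whose permission bits differ from the expected mode.

```bash
# List regular files under directory $1 whose mode is not exactly $2.
# Prints nothing when every file already has the expected mode.
mode_mismatches() {
  find "$1" -type f ! -perm "$2"
}
```

Running `mode_mismatches /etc/autosmart 640`, `mode_mismatches /opt/autoSMART/lib 644`, and `mode_mismatches /opt/autoSMART/scripts 755` should print nothing on an installation that matches the modes described here.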

#### Log Management
- All application logs are handled by systemd journal
- No separate log files created in filesystem
- Log retention managed by journald configuration
- Logs accessible via `journalctl` commands
- Automatic log rotation and cleanup by systemd

### Storage Requirements

#### Disk Space
- **Installation**: ~50MB for application files and documentation
- **Configuration**: ~1MB for all configuration files
- **Logs**: Managed by systemd journal (configurable retention)
- **Database**: Handled separately on PostgreSQL server

#### Network Requirements
- **Database Access**: Persistent connection to PostgreSQL server
- **OpenAI API**: HTTPS access for AI predictions (configurable)
- **Inter-node Communication**: SSH access between cluster nodes for deployment

This layout follows standard Linux conventions and keeps application code (`/opt/autoSMART/`), configuration (`/etc/autosmart/`), service definitions (`/etc/systemd/system/`), and logs (systemd journal) clearly separated.

## 📊 Performance Benefits

### Storage Optimization
- **60-80% reduction** in database storage through differential storage
- **Intelligent change detection** stores only modified SMART parameters
- **Baseline reconstruction** provides complete historical views
- **Configurable retention** policies for long-term storage
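
The idea behind differential storage and baseline reconstruction can be sketched with two small helpers. This is a minimal illustration under stated assumptions: snapshots are plain files with one `attribute=value` line each, and the function names `smart_delta` and `smart_reconstruct` are hypothetical. autoSMART's actual PostgreSQL schema is not described here and may work differently.

```bash
# Print only the lines of the current snapshot ($2) that are new or
# whose value differs from the previous snapshot ($1).
smart_delta() {
  awk -F'=' '
    FILENAME == ARGV[1] { prev[$1] = $2; next }          # load previous values
    !($1 in prev) || (prev[$1] "") != ($2 "")            # print new or changed
  ' "$1" "$2"
}

# Rebuild a full snapshot: read the baseline first, then each delta in
# order, letting later values override earlier ones.
smart_reconstruct() {
  awk -F'=' '
    NF >= 2 { state[$1] = $2 }
    END { for (k in state) print k "=" state[k] }
  ' "$@" | sort
}
```

With a baseline file `t0` and a later reading `t1`, `smart_delta t0 t1 > d1` keeps only the changed attributes, and `smart_reconstruct t0 d1` returns the full state at the time of `t1` (sorted by attribute name). The sketch does not handle attributes that disappear between readings.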

### Monitoring Efficiency
- **Hardware-based tracking** eliminates /dev/sdX path volatility
- **Migration detection** automatically tracks drive movements
- **Real-time analysis** with configurable collection intervals
- **Distributed architecture** scales across cluster nodes
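
Migration detection can be pictured as a lookup of a drive's hardware identity against its last known location. The sketch below assumes a tab-separated registry file with `key`, `node`, and `device` columns, where the key is built from the drive's model and serial number. The file format, the function name `detect_migration`, and the output wording are all assumptions for illustration.

```bash
# Compare a drive's current location with its last recorded one.
# $1 = registry file (key<TAB>node<TAB>device), $2 = drive key,
# $3 = current node, $4 = current device path
detect_migration() {
  awk -F'\t' -v key="$2" -v node="$3" -v dev="$4" '
    $1 == key {
      found = 1
      if ($2 != node || $3 != dev) {
        printf "MIGRATED %s: %s:%s -> %s:%s\n", key, $2, $3, node, dev
      } else {
        print "UNCHANGED " key
      }
    }
    END { if (!found) print "NEW " key }
  ' "$1"
}
```

A drive that was last seen as `/dev/sda` on `pve1` and now appears as `/dev/sdc` on `pve2` is reported as `MIGRATED`, while a key that is absent from the registry is reported as `NEW`.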

## 🚨 Alert Examples

### Critical Alerts
- **Imminent Failure**: AI predicts drive failure within 24-48 hours
- **Temperature Critical**: Drive operating above safe temperature thresholds
- **Reallocated Sectors**: Increasing bad sector count detected
- **Spin Retry Count**: Mechanical issues detected

### Warning Alerts
- **Performance Degradation**: Slower response times detected
- **Temperature Warning**: Operating temperatures approaching limits
- **SMART Threshold**: Parameters approaching warning thresholds
- **Migration Detected**: Drive moved to different node or path
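
The split between warning and critical alerts amounts to comparing a reading against two thresholds. The following sketch shows that comparison for integer values; the function name `classify_reading` and the example thresholds are made up for illustration and are not autoSMART defaults.

```bash
# Classify an integer reading against warning and critical thresholds.
# $1 = value, $2 = warning threshold, $3 = critical threshold
classify_reading() {
  if [ "$1" -ge "$3" ]; then
    echo CRITICAL
  elif [ "$1" -ge "$2" ]; then
    echo WARNING
  else
    echo OK
  fi
}
```

With hypothetical temperature thresholds of 50 °C (warning) and 60 °C (critical), `classify_reading 52 50 60` prints `WARNING` and `classify_reading 62 50 60` prints `CRITICAL`.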

## 💡 Use Cases

### Preventive Maintenance
- Schedule drive replacements before failures occur
- Optimize workload distribution based on drive health
- Plan cluster maintenance windows effectively
- Track warranty and replacement schedules

### Capacity Planning
- Monitor storage growth trends
- Predict future storage requirements
- Optimize drive allocation across nodes
- Plan cluster expansion timing

### Performance Optimization
- Identify performance bottlenecks
- Balance load across healthy drives
- Optimize I/O patterns based on drive characteristics
- Monitor storage tier performance

## 🆘 Support & Troubleshooting

### Common Issues
- **Collection failures**: Check smartmontools installation
- **Database connectivity**: Verify PostgreSQL connection settings
- **API errors**: Validate OpenAI API key and quotas
- **Performance issues**: Review differential storage configuration
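
A quick first step for the issues above is to confirm that the required tools are on the `PATH`. The helper below is a small sketch (the name `missing_commands` is invented for this example) that prints each command it cannot find.

```bash
# Print every command name given as an argument that is not on PATH.
# Prints nothing when all commands are available.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}
```

For example, `missing_commands smartctl psql perl` prints nothing when all three tools are installed. From there, `sudo smartctl --scan` lists the devices smartmontools can see, and `pg_isready` (shipped with the PostgreSQL client tools) reports whether the database server accepts connections.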

### Log Analysis
Use systemd journal for comprehensive log analysis:
- **All service logs**: `sudo journalctl -u 'autosmart*'`
- **Data collection**: `sudo journalctl -u autosmart-collector`
- **AI predictions**: `sudo journalctl -u autosmart-predictor`
- **System errors**: `sudo journalctl -u 'autosmart*' -p err`

### Getting Help
For detailed installation, configuration, and troubleshooting information, refer to the complete documentation in the `docs/` directory.

---

**autoSMART v1.0** - Intelligent drive monitoring for mission-critical infrastructure
