f16725e 3 months ago History
1 contributor
467 lines | 16.261kb

autoSMART Database Documentation

Overview

autoSMART uses PostgreSQL as its primary database for storing SMART data, HDD tracking information, predictions, and system configuration. The database is designed for multi-node cluster deployments with comprehensive HDD mobility tracking.

Database Schema

Core Tables

hdd_inventory

The central inventory table that tracks all HDDs across the cluster.

CREATE TABLE hdd_inventory (
    id                  SERIAL PRIMARY KEY,
    serial_number       VARCHAR(100) NOT NULL,
    model_name          VARCHAR(200) NOT NULL,
    firmware            VARCHAR(50),
    size_gb             INTEGER,
    manufacturer        VARCHAR(100),
    current_device_path VARCHAR(50),
    current_node_id     VARCHAR(50),
    current_slot        VARCHAR(20),
    madagascar_id       VARCHAR(100),
    first_seen          TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    last_seen           TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    status              VARCHAR(20) DEFAULT 'active',
    status_changed_at   TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    notes               TEXT,
    created_at          TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at          TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    
    CONSTRAINT unique_hardware_id UNIQUE (serial_number, model_name)
);

Key Features: - Hardware-based identification: Uses serial_number + model_name as unique constraint - Current location tracking: current_device_path, current_node_id show where HDD is now - Lifecycle management: first_seen, last_seen, status track HDD lifecycle - Madagascar integration: madagascar_id field for cluster-specific identification

hdd_presence

Tracks HDD mobility across cluster nodes - records when HDDs are present on different nodes.

CREATE TABLE hdd_presence (
    id SERIAL PRIMARY KEY,
    serial_number VARCHAR(64) NOT NULL,
    node VARCHAR(64) NOT NULL,
    data_start TIMESTAMP NOT NULL,
    data_end TIMESTAMP NOT NULL,
    is_current BOOLEAN NOT NULL DEFAULT TRUE
);

Key Features: - Mobility tracking: Records when HDDs move between nodes - Time-based records: data_start/data_end define presence periods - Current vs Historic: is_current flag marks active presence - Independent of inventory: Works independently of hdd_inventory for pure mobility data

Example Data: sql id | serial_number | node | data_start | data_end | is_current ----+----------------+-----------+----------------------------+----------------------------+------------ 4 | ZW60K01R | ebony | 2025-08-16 22:05:15.863971 | 2025-08-16 22:05:15.863971 | t 3 | S2HSNXRH402205 | ebony | 2025-08-16 22:05:15.109956 | 2025-08-16 22:05:15.109956 | t 2 | ZW60K01R | baobab | 2025-08-16 21:47:13.873642 | 2025-08-16 22:03:31.052316 | f 1 | S2HSNXRH402205 | tapia | 2025-08-16 21:47:13.078524 | 2025-08-16 22:03:30.268985 | f

smart_readings

Stores SMART data readings with differential storage optimization.

CREATE TABLE smart_readings (
    id                   BIGSERIAL PRIMARY KEY,
    hdd_id               INTEGER REFERENCES hdd_inventory(id),
    serial_number        VARCHAR(100) NOT NULL,
    device_path          VARCHAR(50),
    node_id              VARCHAR(50),
    timestamp            TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    collection_ok        BOOLEAN DEFAULT true,
    temperature          INTEGER,
    parameters_json      JSONB,
    reading_type         VARCHAR(20) DEFAULT 'full',
    changes_detected     BOOLEAN DEFAULT true,
    changed_parameters   JSONB,
    previous_reading_id  INTEGER REFERENCES smart_readings(id),
    checksum             VARCHAR(64)
);

Reading Types: - baseline: First reading for an HDD - full: Complete parameter set (forced by time interval) - differential: Only changed parameters (optimization) - skipped: No changes detected

Key Features: - Differential storage: Only stores changes to reduce data volume - Full context: Links to hdd_inventory and includes node information - Change tracking: previous_reading_id creates reading chains - JSONB parameters: Flexible storage for SMART attributes

predictions

AI-generated failure predictions and analysis.

CREATE TABLE predictions (
    id                    SERIAL PRIMARY KEY,
    hdd_id                INTEGER REFERENCES hdd_inventory(id),
    serial_number         VARCHAR(100) NOT NULL,
    device_path           VARCHAR(50),
    timestamp             TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    risk_level            VARCHAR(20),
    failure_probability   DECIMAL(5,4),
    predicted_failure_date DATE,
    confidence_score      DECIMAL(5,4),
    analysis_summary      TEXT,
    recommendations       JSONB,
    openai_response       JSONB,
    created_at            TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

alert_history

Tracks all alerts sent about HDD issues.

CREATE TABLE alert_history (
    id              SERIAL PRIMARY KEY,
    hdd_id          INTEGER REFERENCES hdd_inventory(id),
    serial_number   VARCHAR(100) NOT NULL,
    alert_type      VARCHAR(50),
    severity        VARCHAR(20),
    message         TEXT,
    sent_at         TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    sent_to         TEXT,
    delivery_status VARCHAR(20) DEFAULT 'pending',
    related_reading_id BIGINT REFERENCES smart_readings(id),
    related_prediction_id INTEGER REFERENCES predictions(id)
);

Configuration Tables

system_config

Global system configuration parameters.

CREATE TABLE system_config (
    id          SERIAL PRIMARY KEY,
    config_key  VARCHAR(100) UNIQUE NOT NULL,
    value       TEXT,
    description TEXT,
    created_at  TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at  TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Default Configuration: - collection_interval_seconds: SMART data collection frequency - differential_storage_enabled: Enable/disable storage optimization - forced_storage_interval_hours: Force full readings periodically - critical_parameter_force_store: Always store critical changes - temperature_change_threshold: Temperature delta for storage

smart_thresholds

SMART parameter warning and critical thresholds.

CREATE TABLE smart_thresholds (
    id                SERIAL PRIMARY KEY,
    parameter_name    VARCHAR(100) NOT NULL,
    warning_threshold NUMERIC,
    critical_threshold NUMERIC,
    weight            NUMERIC DEFAULT 1.0,
    enabled           BOOLEAN DEFAULT true,
    description       TEXT,
    created_at        TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at        TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

Views

smart_readings_reconstructed

Reconstructs complete SMART data from differential storage.

CREATE VIEW smart_readings_reconstructed AS
WITH RECURSIVE reading_chain AS (
    -- Base case: get baseline readings
    SELECT id, hdd_id, serial_number, timestamp, 
           parameters_json, temperature, reading_type,
           previous_reading_id, 1 as chain_level
    FROM smart_readings 
    WHERE reading_type IN ('baseline', 'full')
    
    UNION ALL
    
    -- Recursive case: follow the chain of differential readings
    SELECT sr.id, sr.hdd_id, sr.serial_number, sr.timestamp,
           COALESCE(rc.parameters_json, '{}'::jsonb) || sr.parameters_json as parameters_json,
           COALESCE(sr.temperature, rc.temperature) as temperature,
           sr.reading_type, sr.previous_reading_id,
           rc.chain_level + 1
    FROM smart_readings sr
    JOIN reading_chain rc ON sr.previous_reading_id = rc.id
    WHERE sr.reading_type = 'differential'
)
SELECT id, hdd_id, serial_number, timestamp,
       parameters_json, temperature, reading_type, chain_level
FROM reading_chain;

latest_smart_readings

Current SMART status for all active drives.

CREATE VIEW latest_smart_readings AS
SELECT DISTINCT ON (sr.hdd_id)
    sr.id, sr.hdd_id, sr.serial_number, sr.timestamp,
    sr.parameters_json, sr.temperature,
    hi.model_name, hi.manufacturer, hi.size_gb,
    hi.current_device_path, hi.current_node_id
FROM smart_readings_reconstructed sr
JOIN hdd_inventory hi ON sr.hdd_id = hi.id
ORDER BY sr.hdd_id, sr.timestamp DESC;

drive_health_summary

Comprehensive health overview for all drives.

CREATE VIEW drive_health_summary AS
SELECT 
    hi.id as hdd_id, hi.serial_number, hi.model_name,
    hi.manufacturer, hi.current_device_path, hi.current_node_id,
    hi.status, lsr.timestamp as last_reading, lsr.temperature,
    p.risk_level, p.failure_probability, p.predicted_failure_date,
    EXTRACT(EPOCH FROM (NOW() - lsr.timestamp))/3600 as hours_since_last_reading
FROM hdd_inventory hi
LEFT JOIN latest_smart_readings lsr ON hi.id = lsr.hdd_id
LEFT JOIN LATERAL (
    SELECT risk_level, failure_probability, predicted_failure_date
    FROM predictions 
    WHERE hdd_id = hi.id 
    ORDER BY timestamp DESC 
    LIMIT 1
) p ON true
WHERE hi.status = 'active';

Functions

update_hdd_presence()

Manages HDD presence tracking when a drive is detected on a node.

CREATE OR REPLACE FUNCTION update_hdd_presence(
    p_serial_number VARCHAR(64),
    p_node VARCHAR(64)
) RETURNS VOID AS $$
BEGIN
    -- Mark all previous presence records for this serial as historic
    UPDATE hdd_presence 
    SET is_current = FALSE 
    WHERE serial_number = p_serial_number AND is_current = TRUE AND node <> p_node;
    
    -- Check if there's already a current presence for this serial/node
    IF EXISTS (SELECT 1 FROM hdd_presence WHERE serial_number = p_serial_number AND node = p_node AND is_current = TRUE) THEN
        -- Update data_end for existing current presence
        UPDATE hdd_presence 
        SET data_end = NOW() 
        WHERE serial_number = p_serial_number AND node = p_node AND is_current = TRUE;
    ELSE
        -- Create new presence record
        INSERT INTO hdd_presence (serial_number, node, data_start, data_end, is_current)
        VALUES (p_serial_number, p_node, NOW(), NOW(), TRUE);
    END IF;
END;
$$ LANGUAGE plpgsql;

should_store_smart_reading()

Determines if a SMART reading should be stored based on differential storage logic.

CREATE OR REPLACE FUNCTION should_store_smart_reading(
    p_hdd_id INTEGER,
    p_parameters_json JSONB,
    p_checksum VARCHAR(64),
    p_timestamp TIMESTAMP WITH TIME ZONE DEFAULT NOW()
) RETURNS TABLE(
    should_store BOOLEAN,
    reading_type VARCHAR(20),
    changes_detected BOOLEAN,
    changed_parameters JSONB,
    previous_reading_id INTEGER
) AS $$
-- Function implementation handles:
-- - Differential storage enabled/disabled
-- - Checksum-based change detection
-- - Force intervals for full readings
-- - Reading type determination
$$;

Indexes

Performance Indexes

-- hdd_inventory indexes
CREATE INDEX idx_hdd_inventory_device_path ON hdd_inventory(current_device_path);
CREATE INDEX idx_hdd_inventory_node ON hdd_inventory(current_node_id);
CREATE INDEX idx_hdd_inventory_status ON hdd_inventory(status);
CREATE INDEX idx_hdd_inventory_last_seen ON hdd_inventory(last_seen);

-- hdd_presence indexes
CREATE INDEX idx_hdd_presence_serial_current ON hdd_presence(serial_number, is_current);
CREATE INDEX idx_hdd_presence_node ON hdd_presence(node);
CREATE INDEX idx_hdd_presence_data_end ON hdd_presence(data_end DESC);

-- smart_readings indexes
CREATE INDEX idx_smart_readings_hdd_id ON smart_readings(hdd_id);
CREATE INDEX idx_smart_readings_timestamp ON smart_readings(timestamp DESC);
CREATE INDEX idx_smart_readings_serial ON smart_readings(serial_number);
CREATE INDEX idx_smart_readings_device_path ON smart_readings(device_path);
CREATE INDEX idx_smart_readings_type ON smart_readings(reading_type);
CREATE INDEX idx_smart_readings_checksum ON smart_readings(checksum);
CREATE INDEX idx_smart_readings_previous ON smart_readings(previous_reading_id);

-- JSONB indexes for flexible queries
CREATE INDEX idx_smart_readings_parameters ON smart_readings USING GIN (parameters_json);
CREATE INDEX idx_smart_readings_changed_params ON smart_readings USING GIN (changed_parameters);

Data Flow

Collection Process

  1. Device Discovery: Collector scans /dev/sd* and /dev/nvme* devices
  2. SMART Reading: Uses smartctl to extract device information and parameters
  3. HDD Registration: get_or_create_hdd() adds new devices to hdd_inventory
  4. Presence Tracking: update_hdd_presence() records current node location
  5. Data Storage: Stores SMART readings with differential optimization
  6. Change Detection: Uses checksums to detect parameter changes

Mobility Tracking

  1. HDD Detected: When HDD is found on a new node
  2. Historic Records: Previous presence records marked is_current = FALSE
  3. New Presence: New record created with is_current = TRUE
  4. Timeline: Complete history maintained with data_start/data_end timestamps

Query Examples

Find HDD History

SELECT serial_number, node, data_start, data_end, is_current
FROM hdd_presence 
WHERE serial_number = 'ZW60K01R' 
ORDER BY data_start DESC;

Current HDD Locations

SELECT h.serial_number, h.model_name, p.node, h.current_device_path
FROM hdd_inventory h
JOIN hdd_presence p ON h.serial_number = p.serial_number
WHERE p.is_current = TRUE;

SMART Parameter Trends

SELECT timestamp, 
       parameters_json->>'Power_On_Hours' as power_hours,
       parameters_json->>'Temperature_Celsius' as temp,
       temperature
FROM smart_readings_reconstructed 
WHERE serial_number = 'ZW60K01R' 
ORDER BY timestamp DESC 
LIMIT 10;

Health Summary

SELECT * FROM drive_health_summary 
WHERE current_node_id = 'ebony';

Troubleshooting

Common Issues

1. Node ID Mismatch

Problem: HDD presence shows wrong node name Cause: Deploy script used local hostname instead of target node name Solution: Deploy script now correctly determines target node name from cluster.json

2. Empty hdd_presence Table

Problem: No mobility tracking data Causes: - SMART parameter parsing regex incompatible with new smartctl format - Missing database sequence permissions - Incomplete smart_readings INSERT statements

Solutions: - Updated regex to support both old and new smartctl formats - Added sequence permissions: GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO autosmart; - Fixed INSERT to include all required fields

3. Differential Storage Issues

Problem: Too much or too little data stored Configuration: Adjust in system_config table: sql UPDATE system_config SET value = 'false' WHERE config_key = 'differential_storage_enabled'; UPDATE system_config SET value = '12' WHERE config_key = 'forced_storage_interval_hours';

Permissions

Database User Setup

-- Create autosmart user
CREATE USER autosmart WITH PASSWORD 'autoSMART2025!';

-- Grant permissions
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO autosmart;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO autosmart;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO autosmart;

Sequence Permissions

-- Required for INSERT operations with SERIAL columns
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO autosmart;

Maintenance

Regular Tasks

  1. Monitor disk usage: SMART readings table grows over time
  2. Archive old data: Consider archiving readings older than 1 year
  3. Index maintenance: REINDEX periodically for performance
  4. Backup: Regular PostgreSQL backups recommended

Performance Monitoring

-- Table sizes
SELECT schemaname, tablename, 
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables 
WHERE schemaname = 'public' 
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Recent activity
SELECT COUNT(*) as readings_today 
FROM smart_readings 
WHERE timestamp > CURRENT_DATE;

SELECT COUNT(*) as active_drives 
FROM hdd_inventory 
WHERE status = 'active';