Bogdan Timofte
authored
3 months ago
# autoSMART Differential Storage System

## Overview

autoSMART v1.0 now implements **differential storage optimization**, which reduces database storage requirements (an expected 60-80% in typical HDD environments) while maintaining full data integrity and analysis capabilities.

## How It Works

### Storage Strategy

Instead of storing a complete SMART reading for every collection cycle, the system classifies each reading as one of four types:

1. **Baseline readings** - First reading for each HDD
2. **Full readings** - Stored when a critical parameter changes or the forced storage interval is reached
3. **Differential readings** - Stored when only non-critical parameters change (stores only the changes)
4. **Skipped readings** - When no changes are detected (no storage)
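
The four cases above can be sketched as a small decision function. This is only an illustration under stated assumptions: `classify_reading` and its argument names are hypothetical and are not part of `SmartCollector.pm`, and the order of the checks (forced interval before the "no changes" check) is inferred from the rule that forced full readings happen regardless of changes.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of SmartCollector.pm.
# previous              - last stored reading for this HDD (undef if none)
# changed               - array ref of changed parameter names
# critical              - hash ref whose keys are critical parameter names
# hours_since_full      - hours since the last full reading
# forced_interval_hours - forced storage interval
sub classify_reading {
    my (%args) = @_;
    my @changed = @{ $args{changed} || [] };

    # First reading for this HDD.
    return 'baseline' unless defined $args{previous};

    # Forced full reading, regardless of changes.
    return 'full'
        if $args{hours_since_full} >= $args{forced_interval_hours};

    # Nothing changed: nothing is stored.
    return 'skipped' unless @changed;

    # Any critical parameter among the changes forces a full reading.
    return 'full' if grep { $args{critical}{$_} } @changed;

    return 'differential';
}

my %critical = map { ($_ => 1) } qw(Reallocated_Sector_Ct Current_Pending_Sector);

my $type = classify_reading(
    previous              => { id => 1 },
    changed               => ['Power_On_Hours'],
    critical              => \%critical,
    hours_since_full      => 2,
    forced_interval_hours => 24,
);
print "$type\n";    # prints "differential"
```

In the real system this decision is made inside the database by `should_store_smart_reading()`; the sketch only mirrors the rules listed above.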

### Change Detection

The system uses multiple methods to detect changes:

- **Checksum comparison** - SHA-256 hash of all parameters + temperature
- **Parameter-level analysis** - Individual SMART parameter change detection
- **Critical parameter monitoring** - Immediate storage for health-critical changes
- **Temperature thresholds** - Configurable temperature change sensitivity (default: 5 °C)
- **Time-based forcing** - Periodic full readings regardless of changes (default: 24 hours)
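
As a rough illustration of the first two methods, the sketch below computes a SHA-256 checksum over the parameters plus temperature and lists the parameters that differ between two readings. The helper names and the JSON payload layout are assumptions made for this example; the actual serialization used by autoSMART is not described here. Only core modules (`Digest::SHA`, `JSON::PP`) are used.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
use JSON::PP ();

# Hypothetical helper: hash the parameters together with the temperature.
# Canonical (sorted-key) JSON keeps the digest independent of hash ordering.
sub reading_checksum {
    my ($parameters, $temperature) = @_;
    my $encoder = JSON::PP->new->canonical(1)->utf8(1);
    my $payload = $encoder->encode(
        { parameters => $parameters, temperature => $temperature } );
    return sha256_hex($payload);    # 64 hex characters
}

# Hypothetical helper: names of parameters added, removed, or changed.
sub changed_parameters {
    my ($old, $new) = @_;
    my %names = map { ($_ => 1) } (keys %$old, keys %$new);
    my @changed = grep {
        !exists $old->{$_}
            || !exists $new->{$_}
            || $old->{$_} ne $new->{$_}
    } keys %names;
    my @sorted = sort @changed;
    return @sorted;
}

my %before = (Power_On_Hours => 1000, Reallocated_Sector_Ct => 0);
my %after  = (Power_On_Hours => 1001, Reallocated_Sector_Ct => 0);

# Cheap check first; the per-parameter comparison runs only on a mismatch.
if (reading_checksum(\%before, 35) ne reading_checksum(\%after, 35)) {
    print join(', ', changed_parameters(\%before, \%after)), "\n";
}
```

The hex digest is 64 characters long, which matches the `VARCHAR(64)` checksum column added in the schema changes.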

## Database Schema Changes

### Enhanced smart_readings Table

```sql
ALTER TABLE smart_readings ADD COLUMN reading_type VARCHAR(20) DEFAULT 'full';
ALTER TABLE smart_readings ADD COLUMN changes_detected BOOLEAN DEFAULT true;
ALTER TABLE smart_readings ADD COLUMN changed_parameters JSONB;
ALTER TABLE smart_readings ADD COLUMN previous_reading_id INTEGER REFERENCES smart_readings(id);
ALTER TABLE smart_readings ADD COLUMN checksum VARCHAR(64);
```

### New PostgreSQL Function

The `should_store_smart_reading()` function decides whether and how a reading should be stored:

```sql
-- SELECT * FROM expands the returned fields into separate columns
SELECT * FROM should_store_smart_reading(hdd_id, parameters_json, checksum, current_timestamp);
```

Returns:
- `should_store` - Boolean indicating if reading should be stored
- `reading_type` - 'baseline', 'full', or 'differential'
- `changes_detected` - Boolean indicating if changes were found
- `changed_parameters` - JSON array of changed parameter names
- `previous_reading_id` - Reference to previous reading for chaining

### Reconstructed Data View

The `smart_readings_reconstructed` view uses recursive SQL to rebuild complete SMART data from differential readings:

```sql
SELECT * FROM smart_readings_reconstructed WHERE hdd_id = 123;
```
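
The definition of the view is not shown here, but the idea behind the reconstruction can be sketched in plain Perl: follow `previous_reading_id` back to the nearest baseline or full reading, then apply the differential changes oldest first. The in-memory data model and the `reconstruct_reading` helper are assumptions for illustration only and do not reflect the view's actual SQL.

```perl
use strict;
use warnings;

# Hypothetical in-memory model: each reading is a hash ref with
# reading_type, previous_reading_id, and parameters (only the changed
# parameters for differential readings).
sub reconstruct_reading {
    my ($readings_by_id, $id) = @_;
    my @chain;

    # Walk back until a complete (baseline or full) reading is found.
    while (defined $id) {
        my $reading = $readings_by_id->{$id}
            or die "missing reading $id";
        unshift @chain, $reading;
        last if $reading->{reading_type} ne 'differential';
        $id = $reading->{previous_reading_id};
    }
    die "chain has no baseline or full reading"
        if !@chain || $chain[0]{reading_type} eq 'differential';

    # Apply the readings oldest first so that later values win.
    my %parameters;
    %parameters = (%parameters, %{ $_->{parameters} }) for @chain;
    return \%parameters;
}

my %readings = (
    1 => { reading_type => 'baseline', previous_reading_id => undef,
           parameters => { Temperature_Celsius => 35, Power_On_Hours => 1000 } },
    2 => { reading_type => 'differential', previous_reading_id => 1,
           parameters => { Power_On_Hours => 1001 } },
    3 => { reading_type => 'differential', previous_reading_id => 2,
           parameters => { Temperature_Celsius => 41 } },
);

my $full = reconstruct_reading(\%readings, 3);
print "$_ = $full->{$_}\n" for sort keys %$full;
```

Because each differential reading depends on its predecessors, the periodic forced full readings also bound how long a chain the reconstruction has to walk.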

## Configuration Parameters

Add the following rows to the `system_config` table:

```sql
INSERT INTO system_config (key, value, description) VALUES
('differential_storage_enabled', 'true', 'Enable differential storage optimization'),
('forced_storage_interval_hours', '24', 'Hours between forced full readings'),
('critical_parameter_force_store', 'true', 'Force storage for critical parameter changes'),
('temperature_change_threshold', '5', 'Temperature change threshold for storage (Celsius)');
```
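
All four values are stored as strings, so code that reads them needs to convert them. The following is a minimal sketch of that conversion, assuming the rows have already been fetched into a key/value hash; `load_storage_config` and the names of the resulting settings are hypothetical. The defaults mirror the `INSERT` statement above.

```perl
use strict;
use warnings;

# Defaults taken from the INSERT statement above.
my %DEFAULTS = (
    differential_storage_enabled   => 'true',
    forced_storage_interval_hours  => '24',
    critical_parameter_force_store => 'true',
    temperature_change_threshold   => '5',
);

# Hypothetical helper: merge fetched rows (key => value strings) over the
# defaults and convert them to Perl booleans and numbers.
sub load_storage_config {
    my ($rows) = @_;
    my %raw = (%DEFAULTS, %{ $rows || {} });

    my %config = (
        enabled           => ($raw{differential_storage_enabled} eq 'true' ? 1 : 0),
        forced_interval_h => $raw{forced_storage_interval_hours} + 0,
        force_critical    => ($raw{critical_parameter_force_store} eq 'true' ? 1 : 0),
        temp_threshold_c  => $raw{temperature_change_threshold} + 0,
    );
    return \%config;
}

# Override a single value; everything else falls back to the defaults.
my $config = load_storage_config({ temperature_change_threshold => '3' });
print "Temperature threshold: $config->{temp_threshold_c}\n";
print "Forced interval: $config->{forced_interval_h} hours\n";
```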

## Updated Perl Modules

### SmartCollector.pm Changes

1. **New methods**:
   - `_should_store_reading()` - Check storage requirements
   - `_insert_smart_reading_differential()` - Store with differential info
   - `_get_recent_storage_stats()` - Monitor storage efficiency

2. **Enhanced collection**:
   - Automatic change detection
   - Storage type determination
   - Efficiency reporting

3. **Storage optimization**:
   - Only changed parameters stored for differential readings
   - Checksum validation
   - Chain reference tracking

## Benefits

### Storage Reduction

Expected storage reduction of **60-80%** for typical HDD environments:

- **Baseline readings**: ~1% of all readings
- **Full readings**: ~15-20% of readings (critical changes + forced intervals)
- **Differential readings**: ~5-15% of readings (minor changes)
- **Skipped readings**: ~60-75% of readings (no changes)
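
To see how a mix like this translates into a reduction figure, the sketch below estimates the savings relative to storing every reading in full. It assumes that baseline and full readings cost one unit each, skipped readings cost nothing, and a differential reading costs a fraction of a full one; the `differential_size_ratio` value of 0.25 is an illustrative guess, not a measured number.

```perl
use strict;
use warnings;

# Hypothetical estimate: shares are fractions of all collection cycles.
sub estimated_savings_percent {
    my (%share) = @_;
    my $stored = $share{baseline}
               + $share{full}
               + $share{differential} * $share{differential_size_ratio};
    return 100 * (1 - $stored);
}

my $savings = estimated_savings_percent(
    baseline                => 0.01,
    full                    => 0.20,
    differential            => 0.15,
    differential_size_ratio => 0.25,    # assumed size of a differential row
);
printf "Estimated savings: %.0f%%\n", $savings;
```

With these inputs the stored volume is 0.01 + 0.20 + 0.15 × 0.25 = 0.2475 of the original, a reduction of roughly 75%, which falls inside the expected 60-80% range.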

### Performance Impact

- **Minimal collection overhead**: Single database function call for decision
- **Fast reconstruction**: Recursive SQL with indexes
- **Efficient queries**: Reconstructed view handles complexity

### Data Integrity

- **Complete reconstruction**: All historical data accessible
- **Change tracking**: Full audit trail of parameter changes
- **Critical monitoring**: No loss of important health indicators

## Usage Examples

### Collection with Statistics

```perl
use SmartCollector;

# $config is assumed to be defined elsewhere
my $collector = SmartCollector->new($config);
my $result = $collector->collect_all();

print "Storage efficiency: " . $result->{storage_stats}->{efficiency_percent} . "%\n";
print "Differential readings: " . $result->{storage_stats}->{differential} . "\n";
```

### Testing the System

Run the comprehensive test suite:

```bash
cd /etc/pve/autoSMART
./scripts/test-differential-storage.pl
```

This will:
1. Create test HDD entries
2. Test storage decisions for various change scenarios
3. Validate data reconstruction
4. Show storage efficiency statistics

## Migration from Legacy Data

Existing installations can migrate without losing data:

1. **Schema updates**: Run the enhanced schema SQL
2. **Existing data**: Marked as 'full' readings automatically (the `reading_type` column defaults to 'full')
3. **No data loss**: All existing readings preserved
4. **Gradual optimization**: New readings use differential storage immediately

## Monitoring and Maintenance

### Storage Statistics Query

```sql
SELECT
    reading_type,
    COUNT(*) AS count,
    COUNT(*) * 100.0 / SUM(COUNT(*)) OVER() AS percentage
FROM smart_readings
WHERE timestamp > NOW() - INTERVAL '7 days'
GROUP BY reading_type;
```

### Reconstruction Performance

```sql
EXPLAIN ANALYZE
SELECT * FROM smart_readings_reconstructed
WHERE hdd_id = 123 AND timestamp > NOW() - INTERVAL '30 days';
```

### Space Savings Report

```sql
-- Assumes skipped cycles are also recorded as rows with reading_type = 'skipped';
-- if skipped readings are never inserted, this always reports 0% savings.
-- NULLIF avoids a division-by-zero error when no rows fall in the window.
SELECT
    COUNT(*) AS total_possible_readings,
    COUNT(*) FILTER (WHERE reading_type != 'skipped') AS stored_readings,
    (COUNT(*) FILTER (WHERE reading_type != 'skipped') * 100.0 / NULLIF(COUNT(*), 0)) AS storage_percentage,
    (100 - (COUNT(*) FILTER (WHERE reading_type != 'skipped') * 100.0 / NULLIF(COUNT(*), 0))) AS savings_percentage
FROM smart_readings
WHERE timestamp > NOW() - INTERVAL '30 days';
```

## Critical Parameters List

Default parameters that trigger immediate full storage:
- Reallocated_Sector_Ct
- Current_Pending_Sector
- Offline_Uncorrectable
- Reallocated_Event_Count
- Spin_Retry_Count

Critical parameters are configured in the `smart_thresholds` table with `weight >= 8.0`.
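
As a sketch of how the weight rule could be applied in code, the helper below builds the set of critical parameter names from threshold rows. The column names `parameter_name` and `weight`, the sample weights, and the helper itself are assumptions for illustration; the actual layout of `smart_thresholds` is not described in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: rows are hash refs as they might be fetched from
# smart_thresholds; parameter_name and weight are assumed column names.
sub critical_parameter_set {
    my ($rows, $min_weight) = @_;
    $min_weight = 8.0 unless defined $min_weight;

    my %critical = map  { ($_->{parameter_name} => 1) }
                   grep { $_->{weight} >= $min_weight } @$rows;
    return \%critical;
}

# Sample rows with made-up weights.
my @rows = (
    { parameter_name => 'Reallocated_Sector_Ct', weight => 10.0 },
    { parameter_name => 'Spin_Retry_Count',      weight => 8.0 },
    { parameter_name => 'Power_On_Hours',        weight => 2.0 },
);

my $critical = critical_parameter_set(\@rows);
print join(', ', sort keys %$critical), "\n";
```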

## Conclusion

The differential storage system provides significant storage optimization while maintaining complete data integrity and analytical capabilities. The system automatically adapts to HDD behavior patterns, storing more data when drives show issues and reducing storage when drives are stable.

This optimization is particularly beneficial for large-scale deployments like the Madagascar cluster, where hundreds of HDDs generate continuous SMART data over years of operation.