# Development Guide
## 1. Objectives
The Media Importer project aims to provide a robust, efficient solution for organizing media files by date with proper timezone handling and conflict resolution.
Key objectives:
- Reliable EXIF/metadata extraction and date parsing
- Proper UTC time conversion for QuickTime/Apple media files
- Flexible organization patterns (year/month/day/hour)
- Safe file operations with dry-run capabilities
- Cross-platform compatibility (macOS/Linux)
## 2. Guide
### Development Workflow
When making changes to the project, follow the conventions described in the sections below.
### Changelog Entries Format
All changelog entries should follow this format:
```text
- Date time
- Bug/Feature description
- Changes made
```
Example:
```text
- 2025-09-07 10:30
- Fixed data loss issue when processing already-sorted folders
- Added exclusion patterns for sorted/organized/processed folders
- Added --include-sorted flag to override exclusions when needed
```
### File Move Confirmation
Every file move operation should be confirmed:
- After moving a file, the script must check that the file exists at the destination.
- If the file is not present at the destination after the move operation, the script should immediately stop and report an error.
- This ensures data integrity and prevents silent data loss.
Implementation note (2026-05):
- Move flow now uses `copy -> verify -> delete source`.
- Verification is controlled by `--verify-mode` with modes `size` (default), `strict`, and `none`.
- `size` validates destination existence, size match, and metadata/date consistency.
- `strict` adds byte-for-byte validation (`cmp`) on top of `size` checks.
- Source deletion is allowed only after successful verification.
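
The `copy -> verify -> delete source` flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `media-importer.sh`: the function name `verified_move` and its argument order are hypothetical, the refusal to overwrite an existing destination is an added assumption, and the metadata/date consistency check of `size` mode is left out.

```bash
# Hypothetical sketch: verified_move <src> <dst> [size|strict|none]
verified_move() {
    local src="$1" dst="$2" mode="${3:-size}"

    # Assumption: the caller has already picked a free destination name.
    if [ -e "$dst" ]; then
        echo "ERROR: destination already exists: $dst" >&2
        return 1
    fi

    # Copy first; the source stays untouched until verification succeeds.
    cp -p -- "$src" "$dst" || return 1

    if [ "$mode" != "none" ]; then
        # The destination must exist after the copy.
        if [ ! -f "$dst" ]; then
            echo "ERROR: missing destination: $dst" >&2
            return 1
        fi
        # Compare sizes with wc -c, which behaves the same on macOS and Linux.
        local src_size dst_size
        src_size=$(wc -c < "$src" | tr -d '[:space:]')
        dst_size=$(wc -c < "$dst" | tr -d '[:space:]')
        if [ "$src_size" != "$dst_size" ]; then
            echo "ERROR: size mismatch: $dst" >&2
            return 1
        fi
    fi

    if [ "$mode" = "strict" ]; then
        # Byte-for-byte comparison on top of the size checks.
        if ! cmp -s -- "$src" "$dst"; then
            echo "ERROR: content mismatch: $dst" >&2
            return 1
        fi
    fi

    # Delete the source only after every requested check has passed.
    rm -- "$src"
}
```

Any failed check returns a non-zero status and leaves the source file in place, so the caller can stop immediately and report the error.
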
### Destination Inside Source Handling
- Given the script's purpose, the destination folder may be inside the source folder.
- In this case, all files within the destination folder must be excluded from scanning and processing.
- For extra safety: before processing any file, if its path matches (or is inside) the destination path, the script must report an error and stop immediately.
- This prevents accidental re-processing or moving of files that have already been sorted, ensuring data integrity.
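
One possible way to implement both rules is to prune the destination subtree during discovery and to keep a separate guard for the per-file check. The helper names below are hypothetical, and the sketch assumes that the source and destination paths are written in the same form (for example both relative to the same directory, or both absolute) and contain no glob characters.

```bash
# Hypothetical helpers; the names are illustrative, not taken from media-importer.sh.

# Return 0 when <path> is the destination itself or lies inside it.
is_inside_destination() {
    local path="${1%/}" dest="${2%/}"
    case "$path" in
        "$dest"|"$dest"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print regular files under <source>, skipping the whole destination subtree.
list_source_files() {
    local source="${1%/}" dest="${2%/}"
    find "$source" -path "$dest" -prune -o -type f -print
}
```

A caller could run `is_inside_destination "$file" "$dest"` before each move and abort on a match, so a bug in the discovery step cannot silently re-process sorted files.
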
### Testing
Testing is essential to ensure the script's reliability and data safety. The following methodology should be used:
#### Test Environment Setup
- The `samples` directory contains a variety of media files for testing.
- Create a dedicated working directory named `test` for each test run.
- Copy selected files from `samples` into the `test` directory to simulate real-world scenarios.
- Perform import operations using the script, targeting the `test` directory as the source and a subdirectory (e.g., `test/sorted`) as the destination.
#### Test Execution and Documentation
- Before and after each import operation, run `find` on both the source and destination directories to capture the file structure:
  - Example: `find ./test > test/source_before.txt`
  - Example: `find ./test/sorted > test/dest_before.txt`
- Log all results, including script output and directory listings, into a dedicated log file for each test.
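
The before/after capture can be wrapped in a small helper so that every test records its state the same way. The function name `snapshot_tree` is hypothetical, and sorting the listing is an added assumption that makes the before and after files easier to compare with `diff`.

```bash
# Hypothetical helper: snapshot_tree <dir> <output-file>
snapshot_tree() {
    local dir="$1" out="$2"
    if [ -d "$dir" ]; then
        # Sorted output keeps before/after listings directly comparable.
        find "$dir" | sort > "$out"
    else
        # The destination may not exist yet before the first import.
        : > "$out"
    fi
}
```

For example, `snapshot_tree ./test/sorted test/dest_before.txt` before the import and `snapshot_tree ./test/sorted test/dest_after.txt` afterwards produce two listings that can be compared line by line.
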
#### Test Report Format
Each test must generate a comprehensive Markdown report in `test/test_report.md` with the following structure:
```markdown
# Test Report: [Test Name/Scenario]
## Test Information
- **Date**: $(date)
- **Scenario**: [Brief description of what is being tested]
- **Objective**: [What specific functionality/behavior is being verified]
- **Files Used**: [List of test files and their characteristics]
## Pre-Test State
### Source Directory Structure
\`\`\`
[Contents of source_before.txt]
\`\`\`
### Destination Directory Structure
\`\`\`
[Contents of dest_before.txt]
\`\`\`
## Test Execution
### Command Used
\`\`\`bash
[Exact command executed]
\`\`\`
### Script Output
\`\`\`
[Full script output from import_log.txt]
\`\`\`
## Post-Test State
### Source Directory Structure
\`\`\`
[Contents of source_after.txt]
\`\`\`
### Destination Directory Structure
\`\`\`
[Contents of dest_after.txt]
\`\`\`
## Analysis and Verification
### Expected Results
- [List what should happen]
### Actual Results
- [List what actually happened]
### Issues Found
- [Any problems, errors, or unexpected behavior]
- [Include error messages, incorrect file placements, etc.]
### Protections Verified
- [ ] Destination exclusion working
- [ ] Move confirmation functional
- [ ] No data loss detected
- [ ] UTC conversion correct (for QuickTime files)
- [ ] Unimportable files handling (if applicable)
## Corrective Actions
### Issues Identified
- [Detailed description of problems found]
### Fixes Applied
- [Code changes made]
- [Configuration adjustments]
- [Process improvements]
### Re-test Results
- [Results after applying fixes]
## Conclusion
### Test Result
- [ ] PASSED
- [ ] FAILED
- [ ] PARTIAL (with notes)
### Notes
[Any additional observations, recommendations, or follow-up actions needed]
### Files Generated
- `test/source_before.txt` - Pre-test source structure
- `test/dest_before.txt` - Pre-test destination structure
- `test/source_after.txt` - Post-test source structure
- `test/dest_after.txt` - Post-test destination structure
- `test/import_log.txt` - Full script execution log
- `test/test_report.md` - This report
```
#### Automated Test Runner
A comprehensive test runner script (`test_runner.sh`) is available to automate the testing process:
```bash
./test_runner.sh
```
The script provides:
- **Pre-configured test scenarios** for common use cases
- **Automatic report generation** in Markdown format
- **State capture** before and after test execution
- **Protection verification** with checkboxes
- **Custom test support** for specific scenarios
#### Test Categories
The test runner provides the following pre-configured test scenarios:
1. **Basic Functionality Test**: Tests processing of files with valid EXIF data to verify correct sorting and organization
2. **Unimportable Files Test**: Tests handling of files without EXIF data in both the root and subfolders, without the `--collect-unimportable` flag
3. **Mixed Content Test**: Tests processing of sortable and unimportable files in separate folders to verify cleanup behavior
4. **Safety Protections Test**: Tests destination exclusion and move confirmation mechanisms to prevent data loss
5. **UTC Conversion Test**: Tests UTC timestamp conversion for QuickTime/Apple EXIF data
6. **Subdirectory Processing Test**: Tests processing of files in nested subdirectories to ensure recursive file discovery
7. **Source Only Test**: Tests automatic `source/sorted` destination behavior
8. **Destination Inside Source Test**: Verifies destination exclusion and prevents `sorted/sorted` recursion
9. **Verify Mode Test**: Verifies default `size` verification and explicit `strict` mode
10. **Timestamp Collision No-Overwrite Test**: Reproduces GoPro-style chapters with identical `CreateDate` values and verifies unique destination filenames with no overwrite collapse
11. **GoPro Sidecar Metadata Sync Test**: Verifies GoPro MP4 imports use THM start times and automatically write the corrected timestamp into destination metadata
#### Test Result Persistence
The test runner includes automatic result persistence:
- **Archival Location**: Test results are saved as individual Markdown files in the `test_reports/` directory
- **Naming Convention**: `{s/f}_{YYYYMMDD_HHMMSS}_{TestName}.md`, where:
  - `s` = success, `f` = failure
  - The prefix is followed by the timestamp and the test name (spaces converted to underscores)
- **Contents Preserved**: Single self-contained Markdown file with:
  - Complete test information and directory structures
  - Full script execution output embedded inline (ANSI codes stripped for readability)
  - Import log content included directly in the report
- **Excluded Files**: No separate files; everything is consolidated in the Markdown report
- **Historical Tracking**: Maintains complete test history for debugging and regression testing
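
As an illustration of the naming convention, a report file name could be assembled as shown below. The function `report_filename` is a hypothetical helper, not part of `test_runner.sh`; it assumes that an exit status of `0` means success and accepts an optional timestamp so the result is reproducible.

```bash
# Hypothetical helper: report_filename <exit_status> <test name> [timestamp]
report_filename() {
    local status="$1" name="$2" stamp="${3:-}"
    local prefix="f"
    # Assumption: exit status 0 marks a successful test.
    if [ "$status" -eq 0 ]; then
        prefix="s"
    fi
    # Default to the current time in YYYYMMDD_HHMMSS form.
    if [ -z "$stamp" ]; then
        stamp=$(date +%Y%m%d_%H%M%S)
    fi
    # Spaces in the test name become underscores.
    printf '%s_%s_%s.md\n' "$prefix" "$stamp" "$(printf '%s' "$name" | tr ' ' '_')"
}
```

With these assumptions, `report_filename 0 "Verify Mode Test" 20260517_092000` prints `s_20260517_092000_Verify_Mode_Test.md`.
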
#### Cleanup
- Review the test report and verify all aspects are documented
- Clean up the `test` directory after each test run to ensure a fresh environment for subsequent tests
- Archive important test reports in a `test_reports/` directory for future reference
## 3. Changelog
### 2026-05-16 00:10 - GoPro Chapter Overwrite Postmortem
- Documented the 2026-05-15 GoPro import data-loss incident in `INCIDENTS.md`.
- Root cause: multiple chaptered GoPro MP4 files shared the same timestamp-derived destination filename; the script warned but continued, allowing each copy to overwrite the previous destination before deleting the current source after verification.
- Clarified that QuickTime UTC-to-local conversion likely explains the visible `19:20` vs `22:20` timestamp difference, but was not the data-loss mechanism.
- Recorded surviving evidence: only the final full-resolution chapter remained in destination; `.THM` and `.LRV` sidecars remained on the card and preserved the chapter timeline.
- Added operational guidance for future GoPro imports: start with `--date-source filesystem --sync-metadata --dry-run -v` and verify that GoPro MP4 dates come from matching `.THM` sidecars.
- Added regression rule: future changes to date extraction, naming, conflict handling, copy/move, or GoPro/Garmin/Varia behavior must run `./test_runner.sh timestamp-collision`.
---
### 2026-05-17 08:35 - Automatic GoPro Metadata Timestamp Correction
- Changed GoPro import behavior so `GX/GH/GP*.MP4` files with matching `.THM` sidecars prefer the THM filesystem timestamp even in default `auto` date mode.
- GoPro imports that use THM/filesystem dates now automatically sync destination metadata to the corrected clip start time; `--sync-metadata` is no longer required for this GoPro path.
- Changed move flow for metadata-corrected imports to copy and verify first, write destination metadata, verify the metadata timestamp, and only then delete the source file.
- Added `GoPro_Sidecar_Metadata_Sync` regression coverage in `test_runner.sh`.
- Verified the real `/Volumes/GOPRO` dry-run maps the current files to `19:03:20`, `19:15:08`, `19:26:56`, and `19:38:44`, all from matching THM sidecars.
---
### 2026-05-17 09:20 - Explicit Destination Conflict Policy
- Added `--unattended`; unattended/non-interactive runs never prompt and resolve existing destinations with numeric suffixes (`_1`, `_2`, ...).
- Interactive runs now ask on destination conflicts and offer suffix once/all, skip once/all, or abort, so long imports do not require repeated decisions.
- Dry-run reserves planned destinations, so collisions within the same run are visible before writing.
- Updated `Timestamp_Collision_No_Overwrite` to assert numeric suffixes and reject the legacy `__GX...` suffix form.
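
The numeric-suffix policy can be illustrated with the sketch below. `unique_destination` is a hypothetical helper that only inspects the filesystem; it does not model the reservation of planned destinations during a dry run, and it is not the resolution code used by the script.

```bash
# Hypothetical helper: unique_destination <path>
# Prints <path> if it is free, otherwise the first free <stem>_N<.ext>.
unique_destination() {
    local path="$1"
    if [ ! -e "$path" ]; then
        printf '%s\n' "$path"
        return 0
    fi

    local dir base stem ext n candidate
    dir=$(dirname "$path")
    base=$(basename "$path")

    # Split off the extension only when the name has a stem before the dot.
    case "$base" in
        ?*.*) stem="${base%.*}"; ext=".${base##*.}" ;;
        *)    stem="$base"; ext="" ;;
    esac

    # Try _1, _2, ... until an unused name is found.
    n=1
    while :; do
        candidate="$dir/${stem}_${n}${ext}"
        [ -e "$candidate" ] || break
        n=$((n + 1))
    done
    printf '%s\n' "$candidate"
}
```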
---
### 2026-05-17 12:05 - Long Import UX Feedback
- Added per-file progress lines (`Processing [n/total]`) before long copy/move work starts.
- Replaced integer `files/sec` reporting with decimal throughput that shows `files/min` for large slow imports and keeps `MB/sec` precision.
- Added average time per processed file to the final report.
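
A decimal throughput line of this kind could be produced with `awk`, as in the sketch below. The helper name `format_throughput`, the rule of switching to `files/min` when the rate drops below one file per second, and the use of 1 MB = 1048576 bytes are all assumptions made for illustration.

```bash
# Hypothetical helper: format_throughput <file_count> <total_bytes> <elapsed_seconds>
format_throughput() {
    awk -v files="$1" -v bytes="$2" -v secs="$3" 'BEGIN {
        # Guard against division by zero on very short runs.
        if (secs <= 0) secs = 1
        rate = files / secs
        # Assumption: slow imports read better as files per minute.
        if (rate < 1) printf "%.2f files/min", rate * 60
        else printf "%.2f files/sec", rate
        # Data rate plus average time per processed file.
        printf ", %.2f MB/sec, %.2f sec/file\n", bytes / 1048576 / secs, (files > 0 ? secs / files : 0)
    }'
}
```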
---
### 2026-05-17 12:20 - GoPro Filesystem Date Fallback
- GoPro auto date extraction now stays on filesystem timestamps even when THM is missing.
- Fallback order is matching `THM`, matching `LRV`, then the MP4 file's own filesystem mtime.
- Automatic metadata sync now applies to all GoPro filesystem date sources, including the MP4 fallback.
- Extended `GoPro_Sidecar_Metadata_Sync` to cover THM, LRV-only, and no-sidecar GoPro imports.
- Added `GoPro_No_Sidecar_Reimport` to verify no-sidecar MP4 fallback plus safe numeric suffixing when the same GoPro file is imported again.
- Excluded macOS AppleDouble `._*` files from source discovery so NAS metadata forks are not treated as importable media.
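
The fallback order (`THM`, then `LRV`, then the MP4 itself) can be sketched as a lookup that returns the file whose filesystem timestamp should be used. `gopro_date_source` is a hypothetical name, and "matching" is simplified here to "same base name with a different extension", which may not hold for every camera's naming scheme.

```bash
# Hypothetical helper: gopro_date_source <path/to/clip.MP4>
# Prints the file whose filesystem mtime should supply the clip date.
gopro_date_source() {
    local mp4="$1"
    local stem="${mp4%.*}"
    local candidate

    # Assumption: sidecars share the MP4 base name and use upper-case extensions.
    for candidate in "$stem.THM" "$stem.LRV"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done

    # No sidecar found: fall back to the MP4 file's own timestamp.
    printf '%s\n' "$mp4"
}
```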
---
### 2026-05-13 21:10 - Critical Data Loss Incident + Hardening
- Observed incident: running with explicit source on a problematic/near-full card led to writes on source media and source content removal without strong post-write confirmation.
- Impact: major data loss risk on unstable storage media (including read-only/IO edge cases).
- Clarified behavior kept as valid: destination may be inside source (`source/sorted`) and must be excluded from scanning.
- Reworked move safety in `media-importer.sh`: direct `mv` path replaced by verified flow `copy -> verify -> delete source`.
- Added verification modes via `--verify-mode`:
  - `size` (default): destination exists + size match + metadata/date validation
  - `strict`: adds byte-for-byte content validation (`cmp`)
  - `none`: disables verification (explicit opt-out)
- Applied the same verified move logic to the unsortable-file collection path.
- Added regression coverage in `test_runner.sh`:
  - Destination-inside-source test (`-s source -d source/sorted`) to verify destination exclusion and no `sorted/sorted` recursion.
  - Verify-mode test to confirm default `size` and explicit `strict` behavior.
- Updated ignore/staging hygiene to avoid committing generated test artifacts and `.DS_Store`.
---
### 2025-09-07 21:15 - Test 2 and 3 Enhancements
- Updated Test 2 (Unimportable Files Test) to include files in both root and subfolder
- Removed --collect-unimportable flag from Test 2 to test default behavior
- Updated Test 3 (Mixed Content Test) to use separate folders for sortable vs unimportable files
- Test 3 now verifies that folders with only sortable files are cleaned up while folders with unimportable files are preserved
- Updated menu descriptions to reflect the changes
- Tests now verify proper handling of unimportable files without collection flag
---
### 2025-09-07 21:25 - Documentation Enhancement
- Added comprehensive documentation for --collect-unimportable flag in README.md
- Added Example 4 showing how to use --collect-unimportable flag
- Updated Features section to mention unimportable files handling
- Updated Configuration section to explain default behavior for unimportable files
- Added usage example for --collect-unimportable in Basic Usage section
---
### 2025-09-07 21:30 - Git Ignore Enhancement
- Added test_reports/ to .gitignore to exclude generated test reports from version control
- Test reports are generated files that don't need to be tracked in Git
- Prevents large numbers of timestamped report files from cluttering the repository
- Added sample/ to .gitignore to exclude test media files from version control
---
### 2025-09-07 20:40 - Source Only Test Addition
- Added Test 8: Source Only Test to test runner
- Tests processing with only source parameter (creates sorted subdirectory automatically)
- Verifies that when no destination is specified, files are sorted into source/sorted/
- Updated menu and command line options for new test
---
### 2025-09-07 20:45 - Test 7 Refinement
- Updated Test 7 to test --keep-empty-dirs functionality instead of cleanup
- Since cleanup is now default behavior, Test 7 now verifies empty directory preservation
- Renamed from "Cleanup Empty Directories Test" to "Keep Empty Directories Test"
- Updated test scenario to validate --keep-empty-dirs flag behavior
- Added command line option "keep-empty-dirs" for test 7
---
### 2025-09-07 19:30 - Test Runner Directory Separation
- Adapted test runner to use separate source and destination directories
- Changed from test/ as source to test/source/ and test/destination/
- Updated all test functions to use proper directory separation
- Improved test isolation and clarity
---
### 2025-09-07 19:00 - Default Cleanup Behavior
- Made --cleanup-empty-dirs the default behavior (implicit option)
- Added --keep-empty-dirs flag to disable cleanup if needed
- Updated help text and configuration display to reflect new default
- Cleanup now runs automatically unless explicitly disabled
---
### 2025-09-07 18:56 - Cleanup Empty Directories Feature
- Added --cleanup-empty-dirs option to remove empty directories from source after processing
- Added cleanup_empty_directories() function with safe empty directory detection
- Updated final report to show cleanup status
- Maintains safety by not removing source root directories
- Works correctly with dry-run mode
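
For reference, the safe-cleanup idea can be expressed with `find`, as in the sketch below. This is not the actual `cleanup_empty_directories()` implementation: the function name, the `true`/`false` dry-run argument, and the exact `find` options are assumptions. `-mindepth 1` keeps the source root itself, and `-depth` visits children first so that nested empty directories are removed in one pass.

```bash
# Hypothetical sketch: cleanup_empty_dirs_sketch <source_root> [dry_run]
cleanup_empty_dirs_sketch() {
    local root="$1" dry_run="${2:-false}"
    if [ "$dry_run" = "true" ]; then
        # Dry run: only list directories that are empty right now.
        find "$root" -mindepth 1 -depth -type d -empty -print
    else
        # rmdir refuses to remove non-empty directories, which adds a safety net.
        find "$root" -mindepth 1 -depth -type d -empty -exec rmdir {} \;
    fi
}
```

In dry-run mode the listing shows only directories that are already empty; parents that would become empty after their children are removed are not reported by this sketch.
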
## 4. Todo
Key areas for future development:
- GPS metadata integration for timezone detection
- Enhanced duplicate detection
- Performance optimizations for large file sets
- Additional organization patterns