Development Guide
1. Objectives
The Media Importer project aims to provide a robust, efficient solution for organizing media files by date with proper timezone handling and conflict resolution.
Key objectives:
- Reliable EXIF/metadata extraction and date parsing
- Proper UTC time conversion for QuickTime/Apple media files
- Flexible organization patterns (year/month/day/hour)
- Safe file operations with dry-run capabilities
- Cross-platform compatibility (macOS/Linux)
2. Guide
Development Workflow
When making changes to the project, follow this structured approach:
Changelog Entries Format
All changelog entries should follow this format:
- Date and time (YYYY-MM-DD HH:MM)
- Bug/Feature description
- Changes made
Example:
- 2025-09-07 10:30
- Fixed data loss issue when processing already-sorted folders
- Added exclusion patterns for sorted/organized/processed folders
- Added --include-sorted flag to override exclusions when needed
File Move Confirmation
Every file move operation should be confirmed:
- After moving a file, the script must check that the file exists at the destination.
- If the file is not present at the destination after the move operation, the script should immediately stop and report an error.
- This ensures data integrity and prevents silent data loss.
Implementation note (2026-05):
- Move flow now uses copy -> verify -> delete source.
- Verification is controlled by --verify-mode with modes size (default), strict, and none.
  - size validates destination existence, size match, and metadata/date consistency.
  - strict adds byte-for-byte validation (cmp) on top of size checks.
  - none disables verification (explicit opt-out).
- Source deletion is allowed only after successful verification.
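The copy -> verify -> delete flow described above can be sketched as follows. This is a minimal illustration, not the code in media-importer.sh: verified_move and file_size are hypothetical names, and the metadata/date consistency check that the size mode performs is left out.

```shell
#!/bin/sh
# Hypothetical sketch of a verified move: copy -> verify -> delete source.
# Names and argument order are assumptions, not the script's real interface.

file_size() {
    # Portable byte count (works with both GNU and BSD userlands).
    wc -c < "$1" | tr -d '[:space:]'
}

# verified_move SRC DEST [MODE]   MODE is size (default), strict, or none.
verified_move() {
    local src="$1" dest="$2" mode="${3:-size}"

    # Reject unknown modes before touching any file.
    case "$mode" in
        size|strict|none) ;;
        *) echo "ERROR: unknown verify mode: $mode" >&2; return 2 ;;
    esac

    # 1. Copy, preserving timestamps; the source stays untouched for now.
    cp -p "$src" "$dest" || { echo "ERROR: copy failed: $src" >&2; return 1; }

    # 2. Verify the copy unless verification was explicitly disabled.
    if [ "$mode" != none ]; then
        [ -f "$dest" ] \
            || { echo "ERROR: missing at destination: $dest" >&2; return 1; }
        [ "$(file_size "$src")" = "$(file_size "$dest")" ] \
            || { echo "ERROR: size mismatch: $dest" >&2; return 1; }
        if [ "$mode" = strict ]; then
            # Byte-for-byte comparison on top of the size check.
            cmp -s "$src" "$dest" \
                || { echo "ERROR: content mismatch: $dest" >&2; return 1; }
        fi
    fi

    # 3. Delete the source only after verification succeeded.
    rm "$src"
}
```

Because every failure path returns before the final rm, a failed copy or a failed check leaves the source file in place.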
Destination Inside Source Handling
- Given the script's purpose, the destination folder may be inside the source folder.
- In this case, all files within the destination folder must be excluded from scanning and processing.
- For extra safety: before processing any file, if its path matches (or is inside) the destination path, the script must report an error and stop immediately.
- This prevents accidental re-processing or moving of files that have already been sorted, ensuring data integrity.
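One way to picture the two layers of protection (exclusion during scanning, plus a hard stop per file) is the sketch below. The function names are hypothetical and the path test is purely lexical: it assumes source and destination are written with the same prefix (for example ./test and ./test/sorted) and that the destination path contains no glob characters, since find -path treats its argument as a pattern.

```shell
#!/bin/sh
# Hypothetical helpers illustrating destination-inside-source handling.

# is_inside DIR PATH: succeeds when PATH equals DIR or lies beneath it.
is_inside() {
    local dir="${1%/}" path="${2%/}"
    case "$path" in
        "$dir"|"$dir"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# list_source_files SOURCE DEST: print regular files under SOURCE,
# pruning the DEST subtree so already-sorted files are never scanned.
list_source_files() {
    local source="${1%/}" dest="${2%/}"
    find "$source" -path "$dest" -prune -o -type f -print
}

# guard_not_in_dest DEST FILE: second line of defence before processing.
# The caller is expected to stop the whole run when this returns non-zero.
guard_not_in_dest() {
    if is_inside "$1" "$2"; then
        echo "ERROR: refusing to process file inside destination: $2" >&2
        return 1
    fi
}
```

The trailing slash in the "$dir"/* pattern matters: without it, a sibling such as sorted_old would be mistaken for a path inside sorted.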
Testing
Testing is essential to ensure the script's reliability and data safety. The following methodology should be used:
Test Environment Setup
- The samples directory contains a variety of media files for testing.
- Create a dedicated working directory named test for each test run.
- Copy selected files from samples into the test directory to simulate real-world scenarios.
- Perform import operations using the script, targeting the test directory as the source and a subdirectory (e.g., test/sorted) as the destination.
Test Execution and Documentation
- Before and after each import operation, run find on both the source and destination directories to capture the file structure:
  - Example: find ./test > test/source_before.txt
  - Example: find ./test/sorted > test/dest_before.txt
- Log all results, including script output and directory listings, into a dedicated log file for each test.
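The capture steps above could be wrapped in a small helper such as the one below. The snapshot function is a hypothetical name, and the commented invocation assumes the importer is called as media-importer.sh with the -s and -d options mentioned in the changelog.

```shell
#!/bin/sh
# Hypothetical helper for capturing directory listings around a test run.

# snapshot DIR OUTFILE: write a sorted listing of DIR to OUTFILE.
# The listing is collected before OUTFILE is created, so the snapshot file
# does not appear in its own listing even when it lives inside DIR.
snapshot() {
    local dir="$1" out="$2" listing
    if [ -d "$dir" ]; then
        listing=$(find "$dir" | sort)
        printf '%s\n' "$listing" > "$out"
    else
        : > "$out"   # directory does not exist yet: record an empty listing
    fi
}

# Assumed usage around one import (paths follow the layout described above):
#   snapshot test        test/source_before.txt
#   snapshot test/sorted test/dest_before.txt
#   ./media-importer.sh -s test -d test/sorted 2>&1 | tee test/import_log.txt
#   snapshot test        test/source_after.txt
#   snapshot test/sorted test/dest_after.txt
```

Sorting the listing keeps before/after files directly comparable with diff, since find does not guarantee a stable traversal order.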
Test Report Format
Each test must generate a comprehensive Markdown report in test/test_report.md with the following structure:
# Test Report: [Test Name/Scenario]
## Test Information
- **Date**: $(date)
- **Scenario**: [Brief description of what is being tested]
- **Objective**: [What specific functionality/behavior is being verified]
- **Files Used**: [List of test files and their characteristics]
## Pre-Test State
### Source Directory Structure
\`\`\`
[Contents of source_before.txt]
\`\`\`
### Destination Directory Structure
\`\`\`
[Contents of dest_before.txt]
\`\`\`
## Test Execution
### Command Used
\`\`\`bash
[Exact command executed]
\`\`\`
### Script Output
\`\`\`
[Full script output from import_log.txt]
\`\`\`
## Post-Test State
### Source Directory Structure
\`\`\`
[Contents of source_after.txt]
\`\`\`
### Destination Directory Structure
\`\`\`
[Contents of dest_after.txt]
\`\`\`
## Analysis and Verification
### Expected Results
- [List what should happen]
### Actual Results
- [List what actually happened]
### Issues Found
- [Any problems, errors, or unexpected behavior]
- [Include error messages, incorrect file placements, etc.]
### Protections Verified
- [ ] Destination exclusion working
- [ ] Move confirmation functional
- [ ] No data loss detected
- [ ] UTC conversion correct (for QuickTime files)
- [ ] Unimportable files handling (if applicable)
## Corrective Actions
### Issues Identified
- [Detailed description of problems found]
### Fixes Applied
- [Code changes made]
- [Configuration adjustments]
- [Process improvements]
### Re-test Results
- [Results after applying fixes]
## Conclusion
### Test Result
- [ ] PASSED
- [ ] FAILED
- [ ] PARTIAL (with notes)
### Notes
[Any additional observations, recommendations, or follow-up actions needed]
### Files Generated
- `test/source_before.txt` - Pre-test source structure
- `test/dest_before.txt` - Pre-test destination structure
- `test/source_after.txt` - Post-test source structure
- `test/dest_after.txt` - Post-test destination structure
- `test/import_log.txt` - Full script execution log
- `test/test_report.md` - This report
Automated Test Runner
A comprehensive test runner script (test_runner.sh) is available to automate the testing process:
./test_runner.sh
The script provides:
- Pre-configured test scenarios for common use cases
- Automatic report generation in Markdown format
- State capture before and after test execution
- Protection verification with checkboxes
- Custom test support for specific scenarios
Test Categories
The test runner provides the following pre-configured test scenarios:
- Basic Functionality Test: Tests processing of files with valid EXIF data to verify correct sorting and organization
- Unimportable Files Test: Tests handling of files without EXIF data in both root and subfolders, without the --collect-unimportable flag
- Mixed Content Test: Tests processing of sortable and unimportable files in separate folders to verify cleanup behavior
- Safety Protections Test: Tests destination exclusion and move confirmation mechanisms to prevent data loss
- UTC Conversion Test: Tests UTC timestamp conversion for QuickTime/Apple EXIF data
- Subdirectory Processing Test: Tests processing of files in nested subdirectories to ensure recursive file discovery
- Source Only Test: Tests automatic source/sorted destination behavior
- Destination Inside Source Test: Verifies destination exclusion and prevents sorted/sorted recursion
- Verify Mode Test: Verifies default size verification and explicit strict mode
- Timestamp Collision No-Overwrite Test: Reproduces GoPro-style chapters with identical CreateDate values and verifies unique destination filenames with no overwrite collapse
- GoPro Sidecar Metadata Sync Test: Verifies GoPro MP4 imports use THM start times and automatically write the corrected timestamp into destination metadata
Test Result Persistence
The test runner includes automatic result persistence:
- Archival Location: Test results are saved as individual Markdown files in the test_reports/ directory
- Naming Convention: {s/f}_{YYYYMMDD_HHMMSS}_{TestName}.md format, where:
  - s = success, f = failure
  - Followed by timestamp and test name (spaces converted to underscores)
- Contents Preserved: Single self-contained Markdown file with:
  - Complete test information and directory structures
  - Full script execution output embedded inline (ANSI codes stripped for readability)
  - Import log content included directly in the report
- Excluded Files: No separate files - everything is consolidated in the Markdown report
- Historical Tracking: Maintains complete test history for debugging and regression testing
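As an illustration of the naming convention and the ANSI stripping described above, a sketch along these lines would produce matching file names. Both function names are hypothetical and are not taken from test_runner.sh; the optional timestamp argument exists only to make the example reproducible.

```shell
#!/bin/sh
# Hypothetical helpers illustrating report naming and output cleanup.

# report_name EXIT_STATUS TEST_NAME [TIMESTAMP]
# Prints {s/f}_{YYYYMMDD_HHMMSS}_{TestName}.md with spaces turned into underscores.
report_name() {
    local status="$1" name="$2" ts="${3:-$(date +%Y%m%d_%H%M%S)}" prefix
    if [ "$status" -eq 0 ]; then prefix=s; else prefix=f; fi
    printf '%s_%s_%s.md\n' "$prefix" "$ts" "$(printf '%s' "$name" | tr ' ' '_')"
}

# strip_ansi: remove ANSI color/style sequences (ESC [ ... m) from stdin.
strip_ansi() {
    local esc
    esc=$(printf '\033')
    sed "s/${esc}\[[0-9;]*m//g"
}
```

For example, report_name 0 "Basic Functionality Test" 20250907_103000 prints s_20250907_103000_Basic_Functionality_Test.md.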
Cleanup
- Review the test report and verify all aspects are documented
- Clean up the test directory after each test run to ensure a fresh environment for subsequent tests
- Archive important test reports in a test_reports/ directory for future reference
3. Changelog
2026-05-16 00:10 - GoPro Chapter Overwrite Postmortem
- Documented the 2026-05-15 GoPro import data-loss incident in INCIDENTS.md.
- Root cause: multiple chaptered GoPro MP4 files shared the same timestamp-derived destination filename; the script warned but continued, allowing each copy to overwrite the previous destination before deleting the current source after verification.
- Clarified that QuickTime UTC-to-local conversion likely explains the visible 19:20 vs 22:20 timestamp difference, but was not the data-loss mechanism.
- Recorded surviving evidence: only the final full-resolution chapter remained in destination; .THM and .LRV sidecars remained on the card and preserved the chapter timeline.
- Added operational guidance for future GoPro imports: start with --date-source filesystem --sync-metadata --dry-run -v and verify that GoPro MP4 dates come from matching .THM sidecars.
- Added regression rule: future changes to date extraction, naming, conflict handling, copy/move, or GoPro/Garmin/Varia behavior must run ./test_runner.sh timestamp-collision.
2026-05-17 08:35 - Automatic GoPro Metadata Timestamp Correction
- Changed GoPro import behavior so GX/GH/GP*.MP4 files with matching .THM sidecars prefer the THM filesystem timestamp even in default auto date mode.
- GoPro imports that use THM/filesystem dates now automatically sync destination metadata to the corrected clip start time; --sync-metadata is no longer required for this GoPro path.
- Changed move flow for metadata-corrected imports to copy and verify first, write destination metadata, verify the metadata timestamp, and only then delete the source file.
- Added GoPro_Sidecar_Metadata_Sync regression coverage in test_runner.sh.
- Verified the real /Volumes/GOPRO dry-run maps the current files to 19:03:20, 19:15:08, 19:26:56, and 19:38:44, all from matching THM sidecars.
2026-05-17 09:20 - Explicit Destination Conflict Policy
- Added --unattended; unattended/non-interactive runs never prompt and resolve existing destinations with numeric suffixes (_1, _2, ...).
- Interactive runs now ask on destination conflicts and offer suffix once/all, skip once/all, or abort, so long imports do not require repeated decisions.
- Dry-run reserves planned destinations, so collisions within the same run are visible before writing.
- Updated Timestamp_Collision_No_Overwrite to assert numeric suffixes and reject the legacy __GX... suffix form.
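For orientation, the numeric-suffix resolution used in unattended runs might look roughly like the sketch below. unique_dest is a hypothetical name, and placing the suffix before the file extension is an assumption; the sketch also checks only the filesystem and does not model the dry-run reservation of planned destinations.

```shell
#!/bin/sh
# Hypothetical sketch of numeric-suffix conflict resolution.

# unique_dest PATH: print PATH if it is free, otherwise the first free
# variant PATH_1, PATH_2, ... (suffix inserted before the extension).
unique_dest() {
    local path="$1" dir base stem ext candidate n=1
    dir=$(dirname "$path")
    base=$(basename "$path")
    case "$base" in
        ?*.*) stem="${base%.*}"; ext=".${base##*.}" ;;
        *)    stem="$base"; ext="" ;;
    esac
    candidate="$path"
    while [ -e "$candidate" ]; do
        candidate="$dir/${stem}_${n}${ext}"
        n=$((n + 1))
    done
    printf '%s\n' "$candidate"
}
```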
2026-05-17 12:05 - Long Import UX Feedback
- Added per-file progress lines (Processing [n/total]) before long copy/move work starts.
- Replaced integer files/sec reporting with decimal throughput that shows files/min for large slow imports and keeps MB/sec precision.
- Added average time per processed file to the final report.
2026-05-17 12:20 - GoPro Filesystem Date Fallback
- GoPro auto date extraction now stays on filesystem timestamps even when THM is missing.
- Fallback order is matching THM, matching LRV, then the MP4 file's own filesystem mtime.
- Automatic metadata sync now applies to all GoPro filesystem date sources, including the MP4 fallback.
- Extended GoPro_Sidecar_Metadata_Sync to cover THM, LRV-only, and no-sidecar GoPro imports.
- Added GoPro_No_Sidecar_Reimport to verify no-sidecar MP4 fallback plus safe numeric suffixing when the same GoPro file is imported again.
- Excluded macOS AppleDouble ._* files from source discovery so NAS metadata forks are not treated as importable media.
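The fallback order in this entry (THM, then LRV, then the MP4 itself) can be pictured with the sketch below. gopro_date_source is a hypothetical name, and treating a "matching" sidecar as a file with the same stem and a different extension is an assumption; the real matching rule in media-importer.sh may differ.

```shell
#!/bin/sh
# Hypothetical sketch of the GoPro filesystem date fallback order.

# gopro_date_source MP4_PATH: print the path of the file whose filesystem
# mtime should date the clip: matching THM, else matching LRV, else the MP4.
gopro_date_source() {
    local mp4="$1" stem candidate
    stem="${mp4%.*}"   # assumption: sidecars share the MP4's stem
    for candidate in "$stem.THM" "$stem.LRV"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    printf '%s\n' "$mp4"
}
```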
2026-05-18 08:40 - Report Table Cleanup
- Aligned final report labels with a shared formatter.
- Removed file-per-time throughput and average-per-file metrics from the final report.
- Kept elapsed time and data throughput in MB/sec, which better reflects transfer performance.
- Made test report cleanup portable on macOS by removing GNU find -printf and negative head usage.
2026-05-13 21:10 - Critical Data Loss Incident + Hardening
- Observed incident: running with explicit source on a problematic/near-full card led to writes on source media and source content removal without strong post-write confirmation.
- Impact: major data loss risk on unstable storage media (including read-only/IO edge cases).
- Clarified behavior kept as valid: destination may be inside source (source/sorted) and must be excluded from scanning.
- Reworked move safety in media-importer.sh: direct mv path replaced by verified flow copy -> verify -> delete source.
- Added verification modes via --verify-mode:
  - size (default): destination exists + size match + metadata/date validation
  - strict: adds byte-for-byte content validation (cmp)
  - none: disables verification (explicit opt-out)
- Applied the same verified move logic to the unsortable collection path.
- Added regression coverage in test_runner.sh:
  - Destination-inside-source test (-s source -d source/sorted) to verify destination exclusion and no sorted/sorted recursion.
  - Verify-mode test to confirm default size and explicit strict behavior.
- Updated ignore/staging hygiene to avoid committing generated test artifacts and .DS_Store.
2025-09-07 21:15 - Test 2 and 3 Enhancements
- Updated Test 2 (Unimportable Files Test) to include files in both root and subfolder
- Removed --collect-unimportable flag from Test 2 to test default behavior
- Updated Test 3 (Mixed Content Test) to use separate folders for sortable vs unimportable files
- Test 3 now verifies that folders with only sortable files are cleaned up while folders with unimportable files are preserved
- Updated menu descriptions to reflect the changes
- Tests now verify proper handling of unimportable files without collection flag
2025-09-07 21:25 - Documentation Enhancement
- Added comprehensive documentation for --collect-unimportable flag in README.md
- Added Example 4 showing how to use --collect-unimportable flag
- Updated Features section to mention unimportable files handling
- Updated Configuration section to explain default behavior for unimportable files
- Added usage example for --collect-unimportable in Basic Usage section
2025-09-07 21:30 - Git Ignore Enhancement
- Added test_reports/ to .gitignore to exclude generated test reports from version control
- Test reports are generated files that don't need to be tracked in Git
- Prevents large numbers of timestamped report files from cluttering the repository
- Added sample/ to .gitignore to exclude test media files from version control
2025-09-07 20:40 - Source Only Test Addition
- Added Test 8: Source Only Test to test runner
- Tests processing with only source parameter (creates sorted subdirectory automatically)
- Verifies that when no destination is specified, files are sorted into source/sorted/
- Updated menu and command line options for new test
2025-09-07 20:45 - Test 7 Refinement
- Updated Test 7 to test --keep-empty-dirs functionality instead of cleanup
- Since cleanup is now default behavior, Test 7 now verifies empty directory preservation
- Renamed from "Cleanup Empty Directories Test" to "Keep Empty Directories Test"
- Updated test scenario to validate --keep-empty-dirs flag behavior
- Added command line option "keep-empty-dirs" for test 7
2025-09-07 19:30 - Test Runner Directory Separation
- Adapted test runner to use separate source and destination directories
- Changed from test/ as source to test/source/ and test/destination/
- Updated all test functions to use proper directory separation
- Improved test isolation and clarity
2025-09-07 19:00 - Default Cleanup Behavior
- Made --cleanup-empty-dirs the default behavior (implicit option)
- Added --keep-empty-dirs flag to disable cleanup if needed
- Updated help text and configuration display to reflect new default
- Cleanup now runs automatically unless explicitly disabled
2025-09-07 18:56 - Cleanup Empty Directories Feature
- Added --cleanup-empty-dirs option to remove empty directories from source after processing
- Added cleanup_empty_directories() function with safe empty directory detection
- Updated final report to show cleanup status
- Maintains safety by not removing source root directories
- Works correctly with dry-run mode
4. Todo
Key areas for future development:
- GPS metadata integration for timezone detection
- Enhanced duplicate detection
- Performance optimizations for large file sets
- Additional organization patterns