# Development Guide

## 1. Objectives

The Media Importer project aims to provide a robust, efficient solution for organizing media files by date with proper timezone handling and conflict resolution.

- **Naming Convention**: `{s/f}_{YYYYMMDD_HHMMSS}_{TestName}.md` format where:
  - `s` = success, `f` = failure
  - Followed by timestamp and test name (spaces converted to underscores)

Key objectives:
- Reliable EXIF/metadata extraction and date parsing
- Proper UTC time conversion for QuickTime/Apple media files
- Flexible organization patterns (year/month/day/hour)
- Safe file operations with dry-run capabilities
- Cross-platform compatibility (macOS/Linux)

## 2. Guide

### Development Workflow

When making changes to the project, follow the conventions described in the subsections below.

### Changelog Entries Format

All changelog entries should follow this format:

```
- Date time
- Bug/Feature description  
- Changes made
```

Example:
```
- 2025-09-07 10:30
- Fixed data loss issue when processing already-sorted folders
- Added exclusion patterns for sorted/organized/processed folders
- Added --include-sorted flag to override exclusions when needed
```

### File Move Confirmation

Every file move operation should be confirmed:
- After moving a file, the script must check that the file exists at the destination.
- If the file is not present at the destination after the move operation, the script should immediately stop and report an error.
- This ensures data integrity and prevents silent data loss.
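
The confirmation rule above could be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `confirmed_move` is hypothetical, and it assumes the second argument is the full destination file path rather than a directory.

```bash
# Hypothetical helper: move a file, then confirm it arrived.
# $1 = source file, $2 = full destination file path (not a directory).
confirmed_move() {
    local src="$1" dest="$2"

    if [ -d "$dest" ]; then
        echo "ERROR: destination is a directory: $dest" >&2
        return 1
    fi

    if ! mv -- "$src" "$dest"; then
        echo "ERROR: mv failed: $src -> $dest" >&2
        return 1
    fi

    # Confirmation step: the file must now exist at the destination.
    if [ ! -f "$dest" ]; then
        echo "ERROR: file missing after move: $dest" >&2
        return 1
    fi
}
```

A caller would stop the run on the first failure, for example `confirmed_move "$file" "$target" || exit 1`.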

### Destination Inside Source Handling

- Given the script's purpose, the destination folder may be inside the source folder.
- In this case, all files within the destination folder must be excluded from scanning and processing.
- For extra safety: before processing any file, if its path matches (or is inside) the destination path, the script must report an error and stop immediately.
- This prevents accidental re-processing or moving of files that have already been sorted, ensuring data integrity.
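
One way to express both protections is sketched below. The function names are hypothetical, and the sketch assumes both paths are written with the same prefix (for example both absolute), have no trailing slash, and contain no glob characters, since `find -path` treats its argument as a pattern.

```bash
# Hypothetical helper: succeed when $1 is the destination or lies inside it.
is_inside_destination() {
    local path="$1" dest="$2"
    case "$path" in
        "$dest"|"$dest"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: list files under the source while pruning the
# destination subtree, so already-sorted files are never scanned.
list_source_files() {
    local source_dir="$1" dest_dir="$2"
    find "$source_dir" -path "$dest_dir" -prune -o -type f -print
}
```

The scan uses `-prune` to skip the destination, while `is_inside_destination` serves as the second, per-file guard: if it ever succeeds for a file about to be processed, the script should report an error and exit.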

### Testing

Testing is essential to ensure the script's reliability and data safety. The following methodology should be used:

#### Test Environment Setup
- The `samples` directory contains a variety of media files for testing.
- Create a dedicated working directory named `test` for each test run.
- Copy selected files from `samples` into the `test` directory to simulate real-world scenarios.
- Perform import operations using the script, targeting the `test` directory as the source and a subdirectory (e.g., `test/sorted`) as the destination.
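
The setup steps above might be scripted roughly like this. The helper name `setup_test_dir` and the `TEST_DIR` override are assumptions made for illustration; by default the working directory is `test`, as described above.

```bash
# Hypothetical helper: rebuild the test directory from selected sample files.
# Usage: setup_test_dir FILE...
setup_test_dir() {
    local dir="${TEST_DIR:-test}"

    if [ "$#" -eq 0 ]; then
        echo "usage: setup_test_dir FILE..." >&2
        return 1
    fi

    # Start from a fresh directory so earlier runs cannot leak into this one.
    rm -rf -- "$dir"
    mkdir -p -- "$dir"

    # -p preserves modification times of the copied samples.
    cp -p -- "$@" "$dir"/
}
```

The files passed in are copies, so the originals in `samples` stay untouched no matter what the import does.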

#### Test Execution and Documentation
- Before and after each import operation, run `find` on both the source and destination directories to capture the file structure:
  - Example: `find ./test > test/source_before.txt`
  - Example: `find ./test/sorted > test/dest_before.txt`
- Log all results, including script output and directory listings, into a dedicated log file for each test.
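
A small wrapper can make the before/after capture repeatable. The sketch below is an assumption-laden illustration: `capture_state` is a hypothetical name, the default paths mirror the examples above, and the listings are sorted so that a later `diff` is stable.

```bash
# Hypothetical helper: record sorted listings of source and destination.
# $1 = label ("before" or "after"); $2, $3, $4 optionally override the
# source directory, destination directory, and output directory.
capture_state() {
    local label="$1" src="${2:-./test}" dest="${3:-./test/sorted}" out="${4:-test}"

    find "$src" | sort > "$out/source_${label}.txt"

    # The destination may not exist before the first run; record an empty listing.
    if [ -d "$dest" ]; then
        find "$dest" | sort > "$out/dest_${label}.txt"
    else
        : > "$out/dest_${label}.txt"
    fi
}
```

Typical use is `capture_state before`, then the import, then `capture_state after`, followed by `diff test/source_before.txt test/source_after.txt`. With the default paths the snapshot files live inside the source directory, so they appear in the listings themselves.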

#### Test Report Format
Each test must generate a comprehensive Markdown report in `test/test_report.md` with the following structure:

```markdown
# Test Report: [Test Name/Scenario]

## Test Information
- **Date**: $(date)
- **Scenario**: [Brief description of what is being tested]
- **Objective**: [What specific functionality/behavior is being verified]
- **Files Used**: [List of test files and their characteristics]

## Pre-Test State
### Source Directory Structure
\`\`\`
[Contents of source_before.txt]
\`\`\`

### Destination Directory Structure
\`\`\`
[Contents of dest_before.txt]
\`\`\`

## Test Execution
### Command Used
\`\`\`bash
[Exact command executed]
\`\`\`

### Script Output
\`\`\`
[Full script output from import_log.txt]
\`\`\`

## Post-Test State
### Source Directory Structure
\`\`\`
[Contents of source_after.txt]
\`\`\`

### Destination Directory Structure
\`\`\`
[Contents of dest_after.txt]
\`\`\`

## Analysis and Verification
### Expected Results
- [List what should happen]

### Actual Results
- [List what actually happened]

### Issues Found
- [Any problems, errors, or unexpected behavior]
- [Include error messages, incorrect file placements, etc.]

### Protections Verified
- [ ] Destination exclusion working
- [ ] Move confirmation functional
- [ ] No data loss detected
- [ ] UTC conversion correct (for QuickTime files)
- [ ] Unimportable files handling (if applicable)

## Corrective Actions
### Issues Identified
- [Detailed description of problems found]

### Fixes Applied
- [Code changes made]
- [Configuration adjustments]
- [Process improvements]

### Re-test Results
- [Results after applying fixes]

## Conclusion
### Test Result
- [ ] PASSED
- [ ] FAILED
- [ ] PARTIAL (with notes)

### Notes
[Any additional observations, recommendations, or follow-up actions needed]

### Files Generated
- `test/source_before.txt` - Pre-test source structure
- `test/dest_before.txt` - Pre-test destination structure
- `test/source_after.txt` - Post-test source structure
- `test/dest_after.txt` - Post-test destination structure
- `test/import_log.txt` - Full script execution log
- `test/test_report.md` - This report
```

#### Automated Test Runner
A comprehensive test runner script (`test_runner.sh`) is available to automate the testing process:

```bash
./test_runner.sh
```

The script provides:
- **Pre-configured test scenarios** for common use cases
- **Automatic report generation** in Markdown format
- **State capture** before and after test execution
- **Protection verification** with checkboxes
- **Custom test support** for specific scenarios

#### Test Categories
The test runner provides the following pre-configured test scenarios:

1. **Basic Functionality Test**: Tests processing of files with valid EXIF data to verify correct sorting and organization
2. **Unimportable Files Test**: Tests handling of files without EXIF data in both root and subfolders, without the --collect-unimportable flag
3. **Mixed Content Test**: Tests processing of sortable and unimportable files in separate folders to verify cleanup behavior
4. **Safety Protections Test**: Tests destination exclusion and move confirmation mechanisms to prevent data loss
5. **UTC Conversion Test**: Tests UTC timestamp conversion for QuickTime/Apple EXIF data
6. **Subdirectory Processing Test**: Tests processing of files in nested subdirectories to ensure recursive file discovery
7. **Custom Test**: Allows user-defined test scenarios with custom file sets and commands

#### Test Result Persistence
The test runner includes automatic result persistence:

- **Archival Location**: Test results are saved as individual Markdown files in `test_reports/` directory
- **Naming Convention**: `{TestName}_{YYYYMMDD_HHMMSS}.md` format for easy identification
- **Contents Preserved**: Single self-contained Markdown file with:
  - Complete test information and directory structures
  - Full script execution output embedded inline (ANSI codes stripped for readability)
  - Import log content included directly in the report
- **Excluded Files**: No separate files - everything is consolidated in the Markdown report
- **Historical Tracking**: Maintains complete test history for debugging and regression testing
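
For illustration, the report path and the ANSI stripping could look like the sketch below. Both function names are hypothetical and this is not taken from `test_runner.sh`; replacing spaces in the test name with underscores is an assumption, and `strip_ansi` only removes color/style sequences of the form `ESC [ ... m`.

```bash
# Hypothetical helper: build test_reports/{TestName}_{YYYYMMDD_HHMMSS}.md
report_path() {
    local name
    name=$(printf '%s' "$1" | tr ' ' '_')   # assumption: spaces become underscores
    printf 'test_reports/%s_%s.md\n' "$name" "$(date +%Y%m%d_%H%M%S)"
}

# Hypothetical helper: strip ANSI color/style sequences from stdin.
strip_ansi() {
    local esc
    esc=$(printf '\033')
    sed "s/${esc}\[[0-9;]*m//g"
}
```

Script output would be piped through `strip_ansi` before being embedded in the file named by `report_path "Basic Functionality Test"`.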

#### Cleanup
- Review the test report and verify all aspects are documented
- Clean up the `test` directory after each test run to ensure a fresh environment for subsequent tests
- Archive important test reports in a `test_reports/` directory for future reference

## 3. Changelog

### 2025-09-07 21:15 - Test 2 and 3 Enhancements
- Updated Test 2 (Unimportable Files Test) to include files in both root and subfolder
- Removed --collect-unimportable flag from Test 2 to test default behavior
- Updated Test 3 (Mixed Content Test) to use separate folders for sortable vs unimportable files
- Test 3 now verifies that folders with only sortable files are cleaned up while folders with unimportable files are preserved
- Updated menu descriptions to reflect the changes
- Tests now verify proper handling of unimportable files without collection flag

---

### 2025-09-07 21:25 - Documentation Enhancement
- Added comprehensive documentation for --collect-unimportable flag in README.md
- Added Example 4 showing how to use --collect-unimportable flag
- Updated Features section to mention unimportable files handling
- Updated Configuration section to explain default behavior for unimportable files
- Added usage example for --collect-unimportable in Basic Usage section

---

### 2025-09-07 21:30 - Git Ignore Enhancement
- Added test_reports/ to .gitignore to exclude generated test reports from version control
- Test reports are generated files that don't need to be tracked in Git
- Prevents large numbers of timestamped report files from cluttering the repository
- Added sample/ to .gitignore to exclude test media files from version control

---

### 2025-09-07 20:40 - Source Only Test Addition
- Added Test 8: Source Only Test to test runner
- Tests processing with only source parameter (creates sorted subdirectory automatically)
- Verifies that when no destination is specified, files are sorted into source/sorted/
- Updated menu and command line options for new test

---

### 2025-09-07 20:45 - Test 7 Refinement
- Updated Test 7 to test --keep-empty-dirs functionality instead of cleanup
- Since cleanup is now default behavior, Test 7 now verifies empty directory preservation
- Renamed from "Cleanup Empty Directories Test" to "Keep Empty Directories Test"
- Updated test scenario to validate --keep-empty-dirs flag behavior
- Added command line option "keep-empty-dirs" for test 7

---

### 2025-09-07 19:30 - Test Runner Directory Separation
- Adapted test runner to use separate source and destination directories
- Changed from test/ as source to test/source/ and test/destination/
- Updated all test functions to use proper directory separation
- Improved test isolation and clarity

---

### 2025-09-07 19:00 - Default Cleanup Behavior
- Made --cleanup-empty-dirs the default behavior (implicit option)
- Added --keep-empty-dirs flag to disable cleanup if needed
- Updated help text and configuration display to reflect new default
- Cleanup now runs automatically unless explicitly disabled

---

### 2025-09-07 18:56 - Cleanup Empty Directories Feature
- Added --cleanup-empty-dirs option to remove empty directories from source after processing
- Added cleanup_empty_directories() function with safe empty directory detection
- Updated final report to show cleanup status
- Maintains safety by not removing source root directories
- Works correctly with dry-run mode
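
As a rough illustration of the behavior described in this entry (not the project's actual `cleanup_empty_directories()` implementation), empty-directory cleanup can be sketched with `find`. The function name and the dry-run argument are hypothetical, and `-mindepth`, `-empty`, and `-delete` are assumed to be available, as they are in GNU and BSD `find`.

```bash
# Hypothetical sketch: remove empty directories below a source root.
# $1 = source root, $2 = "true" for dry-run (default: false).
remove_empty_dirs() {
    local root="$1" dry_run="${2:-false}"

    if [ "$dry_run" = "true" ]; then
        # Report only; lists directories that are empty right now.
        find "$root" -mindepth 1 -type d -empty -print
    else
        # -delete implies depth-first traversal, so nested empty
        # directories collapse bottom-up; -mindepth 1 protects the root.
        find "$root" -mindepth 1 -type d -empty -delete
    fi
}
```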

## 4. Todo

Key areas for future development:
- GPS metadata integration for timezone detection
- Enhanced duplicate detection
- Performance optimizations for large file sets
- Additional organization patterns