Module compare_and_list

compare_and_list.py


This script compares files in the local directory structure (source) against a destination folder (remote copy). It checks for:
- missing files
- "obsolete" files (based on file size, last modification date, or both)

NEW FEATURES:
- For MISSING files, display the date (YYYY-MM-DD) and size of the source file.
- Third argument:
  - "missing" => only show missing files
  - "update" => show missing and obsolete files (default)

Usage

./compare_and_list.py <destination_folder> [comparison_mode] [operation_mode] [create_missing_folders_flag]

Examples

./compare_and_list.py /path/to/remote
  # Default: comparison_mode=date, operation_mode=update, no folder creation
./compare_and_list.py /path/to/remote size missing yes
  => Compare by size only; display only missing files; create missing folders.

Production Examples:
./compare_and_list.py $HOME/onedriveOV_AgroParisTech/Han/dev/Pizza3

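OUTPUT FORMAT (examples):
(+2.3 KB, +45.0 min) OBSOLETE: /path/to/source -> /path/to/dest
(source: 2025-01-05, 12.0 KB) MISSING: /path/to/source -> /path/to/dest

For OBSOLETE lines, the first value is the size difference and the second is the time difference (source minus destination).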


# Documentation

## Overview

This script, *compare_and_list.py*, inspects a local *Pizza3* project directory (which we call the **source**) and a remote copy (which we call the **destination**). It reports files that are:
- **Missing** in the destination.
- **Obsolete** in the destination, based on one of three modes:
  - **date**: The source file is newer than the destination file by more than one hour (`TIME_DIFF_TOLERANCE = 3600` seconds).
  - **size**: The source file is larger than the destination file.
  - **both**: Either the source is newer or the source is larger.

By default, *compare_and_list.py* compares using the **date** mode.
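
The decision rule above can be sketched as a small function. This is a minimal illustration rather than the script's exact code: the name `is_obsolete` and its parameters are hypothetical, and the one-hour tolerance mirrors the `TIME_DIFF_TOLERANCE` constant from the source listing.

```python
TIME_DIFF_TOLERANCE = 3600  # seconds; mirrors the constant in the source listing


def is_obsolete(src_mtime, src_size, dst_mtime, dst_size, mode="date"):
    """Return True if the destination copy should be replaced (hypothetical helper)."""
    newer = (src_mtime - dst_mtime) > TIME_DIFF_TOLERANCE
    larger = src_size > dst_size
    if mode == "date":
        return newer
    if mode == "size":
        return larger
    if mode == "both":
        return newer or larger
    raise ValueError(f"unknown comparison mode: {mode!r}")


# Source is two hours newer than the destination but has the same size.
print(is_obsolete(10_000, 500, 2_800, 500, "date"))  # True
print(is_obsolete(10_000, 500, 2_800, 500, "size"))  # False
```

Unlike this sketch, the script does not raise an error on an unknown mode; it keeps the default.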

It also checks the **inclusion** and **exclusion** patterns defined in the script, mirroring a backup configuration. Only files that match an inclusion pattern and do **not** match an exclusion pattern will be considered for comparison.

## Prerequisites

1. **Script location**: The script must reside in `$mainfolder/utils/` and must be run from there.
2. **Name of the current folder**: The script checks that the current directory’s name is `utils`. If not, it fails.
3. **Presence of pdocme.sh**: The script requires a file named `pdocme.sh` in the `utils/` folder to confirm the environment is correct.
4. **Destination folder**: The script requires a mandatory first argument indicating the destination folder. It checks that this folder has a `utils/` subfolder containing `pdocme.sh`, ensuring it is a remote copy of the same structure.
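
The marker-file check applied to both the source and the destination can be sketched as follows. The helper name `looks_like_mainfolder` is hypothetical, and this sketch uses `os.path.isfile`, whereas the script itself relies on `os.path.exists`.

```python
import os
import tempfile


def looks_like_mainfolder(folder):
    """Return True if `folder` contains utils/pdocme.sh (hypothetical helper)."""
    return os.path.isfile(os.path.join(folder, "utils", "pdocme.sh"))


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    print(looks_like_mainfolder(tmp))  # False: no utils/pdocme.sh yet
    os.makedirs(os.path.join(tmp, "utils"))
    open(os.path.join(tmp, "utils", "pdocme.sh"), "w").close()
    print(looks_like_mainfolder(tmp))  # True
```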

## Command-Line Arguments

The script takes up to four arguments:

1. **destination_folder** (mandatory)  
The path to the remote/copy of the main folder. If relative, the script automatically converts it to an absolute path.  
Example: `$HOME/onedriveOV_AgroParisTech/Han/dev/Pizza3`

2. **comparison_mode** (optional)  
Specifies the comparison mode:  
- **date** : Compare the modification times (source newer => replace needed).  
- **size** : Compare the sizes (source bigger => replace needed).  
- **both** : Compare both criteria (if source is either newer or bigger => replace needed).  
If not provided, the default is **date**.

3. **operation_mode** (optional)  
- **missing**: Only list missing files.  
- **update** (default): List missing and obsolete.

4. **create_missing_folders_flag** (optional)  
- **yes** : If set to 'yes', the script will automatically create any missing folders in the destination when a missing file is found.  
- **no**  : No folder creation if the file’s destination path doesn’t exist.  
The default is **no**.
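
The optional arguments are positional but somewhat flexible: an operation mode is accepted in the second position, and the folder flag in the third. A compact sketch of that behavior, assuming the parsing logic of the source listing (the function name `parse_args` is hypothetical):

```python
def parse_args(argv):
    """Return (destination, comparison_mode, operation_mode, create_folders)."""
    comparison_mode, operation_mode, create_folders = "date", "update", False
    extra = [a.lower() for a in argv[2:5]]  # at most three optional arguments
    for position, arg in enumerate(extra):
        if position == 0 and arg in ("date", "size", "both"):
            comparison_mode = arg
        elif position <= 1 and arg in ("missing", "update"):
            operation_mode = arg
        elif position >= 1 and arg in ("yes", "no"):
            create_folders = (arg == "yes")
        # Anything else is ignored and the defaults are kept.
    return argv[1], comparison_mode, operation_mode, create_folders


print(parse_args(["compare_and_list.py", "/path/to/remote", "size", "yes"]))
# ('/path/to/remote', 'size', 'update', True)
```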

## Usage Examples

### Example 1: Compare only by date, no folder creation

```bash
cd $HOME/han/dev/Pizza3/utils/
./compare_and_list.py $HOME/onedriveOV_AgroParisTech/Han/dev/Pizza3
```

1. Change into the *utils* directory:  
*$HOME/han/dev/Pizza3/utils/*
2. Run the script with only one argument: the destination folder.  
3. The script uses the default comparison mode (**date**).  
4. It does **not** create missing folders.

Any files missing or “obsolete by date” in the destination are listed in the output, which can be redirected as needed, for example:

```bash
./compare_and_list.py $HOME/onedriveOV_AgroParisTech/Han/dev/Pizza3 > missing_or_obsolete.txt
```

### Example 2: Compare by size and create missing folders

```bash
cd $HOME/han/dev/Pizza3/utils/
./compare_and_list.py $HOME/onedriveOV_AgroParisTech/Han/dev/Pizza3 size yes
```

1. The script is again run from the same local directory (*utils*).
2. **destination_folder** is `$HOME/onedriveOV_AgroParisTech/Han/dev/Pizza3`.  
3. **comparison_mode** is `size`.  
4. **create_missing_folders_flag** is `yes`, so any required subfolders that do not exist in the destination will be created. The flag is accepted in third position here; **operation_mode** keeps its default, `update`.

The script outputs lines like:

- `MISSING: source_file_path -> destination_file_path`  
- `OBSOLETE: source_file_path -> destination_file_path`

You can filter or redirect this output to logs or another program.

## Explanation of Output Lines

- **MISSING:** A file was found in the source but is absent in the destination.  
- **OBSOLETE:** A file exists in both locations, but the source is considered “newer” or “larger” (depending on the mode), meaning a replacement is appropriate.

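Each line starts with a parenthesized summary (source date and size for MISSING; size and time differences for OBSOLETE), followed by the **full path** of the source file and the **corresponding path** in the destination. This format is convenient for other scripts or tooling to parse and handle necessary copies.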
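
As an illustration of downstream parsing, the sketch below splits a report line into its parts with a regular expression. The names `LINE_RE` and `parse_line` are hypothetical, and the pattern assumes that paths do not themselves contain the ` -> ` separator.

```python
import re

# Optional "(...)" summary, then the status keyword, then "source -> destination".
LINE_RE = re.compile(
    r"^(?:\((?P<details>[^)]*)\) )?"
    r"(?P<status>MISSING|OBSOLETE): (?P<src>.+) -> (?P<dst>.+)$"
)


def parse_line(line):
    """Return a dict with details/status/src/dst, or None for other lines."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


record = parse_line("(source: 2025-01-05, 12.0 KB) MISSING: /src/a.py -> /dst/a.py")
print(record["status"], record["src"], record["dst"])
print(parse_line("Comparison completed."))  # None: not a report line
```

Lines such as `Created missing folder: ...` and the final `Comparison completed.` do not match and can be skipped.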

## Inclusion/Exclusion Patterns

- **include_patterns**: List of filename patterns (wildcards) that must match for the file to be considered (e.g., `*.py`, `*.m`, `*.txt`, etc.).  
- **exclude_files**: Specific file patterns to exclude (e.g., `*.log`, `backupme.README.md`, etc.).  
- **exclude_folders_rel**: Relative folders to exclude (e.g., `./old`, `./tmp`).  
- **exclude_folders_abs**: Absolute folders to exclude, relevant to `$mainfolder` (e.g., `$mainfolder/release`).

The script automatically ignores files and folders that match any exclusion pattern.
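
The file-level filter can be sketched with the standard `fnmatch` module, which is also what the source listing uses. The pattern lists are abbreviated for the example, and the combined helper `is_candidate` is hypothetical.

```python
import fnmatch

include_patterns = ["*.py", "*.m", "*.txt", "*.md"]   # abbreviated for the example
exclude_files = ["*.log", "backupme.README.md"]       # abbreviated for the example


def is_candidate(filename):
    """True if the name matches an inclusion pattern and no exclusion pattern."""
    included = any(fnmatch.fnmatch(filename, pat) for pat in include_patterns)
    excluded = any(fnmatch.fnmatch(filename, pat) for pat in exclude_files)
    return included and not excluded


for name in ["run.py", "README.md", "run.log", "backupme.README.md"]:
    print(name, is_candidate(name))
```

Here `backupme.README.md` matches `*.md` but is still rejected, because exclusions take precedence over inclusions.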

## Logging and Creation of Folders

- When *create_missing_folders_flag* is set to **yes**, the script attempts to create the destination folder structure if it is missing. Upon successful creation, it logs a line such as:  
`Created missing folder: /path/to/new/folder`

- In all cases, the script **never** modifies the source files or folders. It only prints actions that might be needed in the destination directory.
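
A minimal sketch of the folder-creation step, assuming it only ever touches the destination side. The helper name `ensure_parent_folder` is hypothetical; the log message follows the format quoted above.

```python
import os
import tempfile


def ensure_parent_folder(dest_file):
    """Create the parent folder of dest_file if needed; return True if created."""
    dest_dir = os.path.dirname(dest_file)
    if os.path.isdir(dest_dir):
        return False
    os.makedirs(dest_dir, exist_ok=True)
    print(f"Created missing folder: {dest_dir}")
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "sub", "deep", "file.py")
    ensure_parent_folder(target)  # creates <tmp>/sub/deep and logs it
    ensure_parent_folder(target)  # second call: nothing left to create
```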

## Error Handling

1. **Script not in utils**: The script will refuse to run if the current directory name is not “utils”.  
2. **pdocme.sh missing**: The script checks for this file as a marker that `$mainfolder/utils` is correct.  
3. **Destination folder not found**: The script exits with an error if the user-supplied destination folder path does not exist.  
4. **Invalid remote copy**: If the destination path lacks `utils/pdocme.sh`, the script fails.  
5. **Invalid comparison mode**: If the user supplies an unrecognized mode, the script silently ignores it and keeps the default “date” mode; no warning is printed.

---

Author:

INRAE\Olivier Vitrac
Email: olivier.vitrac@agroparistech.fr
Last Revised: 2025-01-09

Expand source code
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
'''
    compare_and_list.py

----------------------------------------------------------------------
This script compares files in the local directory structure (source)
against a destination folder (remote copy). It checks for:
  - missing files
  - "obsolete" files (based on file size, last modification date, or both)

NEW FEATURES:
  - For MISSING files, display (YYYY-MM-DD) and size of the source file.
  - Third argument "missing" => only show missing files
                "update"  => show missing and obsolete files (default).

Usage:
  ./compare_and_list.py <destination_folder> [comparison_mode] [operation_mode] [create_missing_folders_flag]

Examples:
  ./compare_and_list.py /path/to/remote  # Default: comparison_mode=date, operation_mode=update, no folder creation
  ./compare_and_list.py /path/to/remote size missing yes
    => Compare by size only; display only missing files; create missing folders.

Production Examples:
    ./compare_and_list.py $HOME/onedriveOV_AgroParisTech/Han/dev/Pizza3

OUTPUT FORMAT (examples):
  (+2.3 KB, +45.0 min) OBSOLETE: /path/to/source -> /path/to/dest
  (source: 2025-01-05, 12.0 KB) MISSING: /path/to/source -> /path/to/dest

----------------------------------------------------------------------

    # Documentation

    ## Overview

    This script, *compare_and_list.py*, inspects a local *Pizza3* project directory (which we call the **source**) and a remote copy (which we call the **destination**). It reports files that are:
    - **Missing** in the destination.
    - **Obsolete** in the destination, based on one of three modes:
      - **date**: The source file is newer than the destination file by more than one hour (`TIME_DIFF_TOLERANCE = 3600` seconds).
      - **size**: The source file is larger than the destination file.
      - **both**: Either the source is newer or the source is larger.

    By default, *compare_and_list.py* compares using the **date** mode.

    It also checks the **inclusion** and **exclusion** patterns defined in the script, mirroring a backup configuration. Only files that match an inclusion pattern and do **not** match an exclusion pattern will be considered for comparison.

    ## Prerequisites

    1. **Script location**: The script must reside in `$mainfolder/utils/` and must be run from there.
    2. **Name of the current folder**: The script checks that the current directory’s name is `utils`. If not, it fails.
    3. **Presence of pdocme.sh**: The script requires a file named `pdocme.sh` in the `utils/` folder to confirm the environment is correct.
    4. **Destination folder**: The script requires a mandatory first argument indicating the destination folder. It checks that this folder has a `utils/` subfolder containing `pdocme.sh`, ensuring it is a remote copy of the same structure.

    ## Command-Line Arguments

    The script takes up to four arguments:

    1. **destination_folder** (mandatory)  
    The path to the remote/copy of the main folder. If relative, the script automatically converts it to an absolute path.  
    Example: `$HOME/onedriveOV_AgroParisTech/Han/dev/Pizza3`

    2. **comparison_mode** (optional)  
    Specifies the comparison mode:  
    - **date** : Compare the modification times (source newer => replace needed).  
    - **size** : Compare the sizes (source bigger => replace needed).  
    - **both** : Compare both criteria (if source is either newer or bigger => replace needed).  
    If not provided, the default is **date**.

    3. **operation_mode** (optional)  
    - **missing**: Only list missing files.  
    - **update** (default): List missing and obsolete.  

    4. **create_missing_folders_flag** (optional)  
    - **yes** : If set to 'yes', the script will automatically create any missing folders in the destination when a missing file is found.  
    - **no**  : No folder creation if the file’s destination path doesn’t exist.  
    The default is **no**.

    ## Usage Examples

    ### Example 1: Compare only by date, no folder creation

    ```bash
    cd $HOME/han/dev/Pizza3/utils/
    ./compare_and_list.py $HOME/onedriveOV_AgroParisTech/Han/dev/Pizza3
    ```

    1. Change into the *utils* directory:  
    *$HOME/han/dev/Pizza3/utils/*
    2. Run the script with only one argument: the destination folder.  
    3. The script uses the default comparison mode (**date**).  
    4. It does **not** create missing folders.

    Any files missing or “obsolete by date” in the destination are listed in the output, which can be redirected as needed, for example:

    ```bash
    ./compare_and_list.py $HOME/onedriveOV_AgroParisTech/Han/dev/Pizza3 > missing_or_obsolete.txt
    ```

    ### Example 2: Compare by size and create missing folders

    ```bash
    cd $HOME/han/dev/Pizza3/utils/
    ./compare_and_list.py $HOME/onedriveOV_AgroParisTech/Han/dev/Pizza3 size yes
    ```

    1. The script again is run from the same local directory (*utils*).
    2. **destination_folder** is `$HOME/onedriveOV_AgroParisTech/Han/dev/Pizza3`.  
    3. **comparison_mode** is `size`.  
    4. **create_missing_folders_flag** is `yes`, so any required subfolders that do not exist in the destination will be created.

    The script outputs lines like:

    - `MISSING: source_file_path -> destination_file_path`  
    - `OBSOLETE: source_file_path -> destination_file_path`  

    You can filter or redirect this output to logs or another program.

    ## Explanation of Output Lines

    - **MISSING:** A file was found in the source but is absent in the destination.  
    - **OBSOLETE:** A file exists in both locations, but the source is considered “newer” or “larger” (depending on the mode), meaning a replacement is appropriate.

    Each line contains the **full path** of the source file and the **corresponding path** in the destination. This format is convenient for other scripts or tooling to parse and handle necessary copies.

    ## Inclusion/Exclusion Patterns

    - **include_patterns**: List of filename patterns (wildcards) that must match for the file to be considered (e.g., `*.py`, `*.m`, `*.txt`, etc.).  
    - **exclude_files**: Specific file patterns to exclude (e.g., `*.log`, `backupme.README.md`, etc.).  
    - **exclude_folders_rel**: Relative folders to exclude (e.g., `./old`, `./tmp`).  
    - **exclude_folders_abs**: Absolute folders to exclude, relevant to `$mainfolder` (e.g., `$mainfolder/release`).  

    The script automatically ignores files and folders that match any exclusion pattern.

    ## Logging and Creation of Folders

    - When *create_missing_folders_flag* is set to **yes**, the script attempts to create the destination folder structure if it is missing. Upon successful creation, it logs a line such as:  
    `Created missing folder: /path/to/new/folder`

    - In all cases, the script **never** modifies the source files or folders. It only prints actions that might be needed in the destination directory.

    ## Error Handling

    1. **Script not in utils**: The script will refuse to run if the current directory name is not “utils”.  
    2. **pdocme.sh missing**: The script checks for this file as a marker that `$mainfolder/utils` is correct.  
    3. **Destination folder not found**: The script exits with an error if the user-supplied destination folder path does not exist.  
    4. **Invalid remote copy**: If the destination path lacks `utils/pdocme.sh`, the script fails.  
    5. **Invalid comparison mode**: If the user supplies an unrecognized mode, the script silently ignores it and keeps the default “date” mode; no warning is printed.

    --- 

**Author:**
---------
INRAE\Olivier Vitrac  
Email: olivier.vitrac@agroparistech.fr  
Last Revised: 2025-01-09
'''

import os
import sys
import fnmatch
import time
import datetime

# ----------------------------------------------------------------------
# Global Patterns for Inclusions and Exclusions
# ----------------------------------------------------------------------
include_patterns = [
    "*.m",       # MATLAB files
    "*.asv",     # Auto-saved files
    "*.m~",      # Backup files
    "*.ipynb",   # Jupyter notebooks
    "*.py",      # Python scripts
    "*.sh",      # Shell scripts
    "*.txt",     # Text files
    "*.md",      # Markdown files
    "*.html",    # HTML files
    "*.json",    # JSON files
    "*.css",     # CSS files
    "*.manifest" # Manifest files
]

exclude_folders_rel = [
    "./old",
    "./tmp",
    "./sandbox",
    "./debug",
    "./obsolete",
    "./.git",
    "./.vscode",
    "./.spyproject",
    "./__all__",
    "./__pycache__"
]

exclude_folders_abs = [
    # e.g. "/absolute/path/to/history",
    # e.g. "/absolute/path/to/release",
]

exclude_files = [
    "*.log",
    "*.zip",
    "backupme.README.md"
]

# ----------------------------------------------------------------------
# Additional constants
# ----------------------------------------------------------------------
TIME_DIFF_TOLERANCE = 3600  # e.g., 1 hour

def is_included(filename):
    """
    Checks if a file should be included based on the include_patterns.
    """
    return any(fnmatch.fnmatch(filename, pat) for pat in include_patterns)

def is_excluded_file(filename):
    """
    Checks if a file is explicitly excluded based on exclude_files.
    """
    return any(fnmatch.fnmatch(filename, pat) for pat in exclude_files)

def is_excluded_folder(path, mainfolder):
    """
    Checks if a folder should be excluded based on:
      1) The relative folder exclusions (exclude_folders_rel)
      2) The absolute folder exclusions (exclude_folders_abs)
    """
    abs_path = os.path.abspath(path)
    
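    # Check relative folder patterns against the folder's own name.
    # Only a literal "./" prefix is removed: lstrip("./") strips a character
    # set and would turn "./.git" into "git".
    folder_name = os.path.basename(abs_path)
    for rel_excl in exclude_folders_rel:
        excl_name = rel_excl[2:] if rel_excl.startswith("./") else rel_excl
        if folder_name == excl_name:
            return True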
    
    # Check absolute folder patterns
    for abs_excl in exclude_folders_abs:
        if abs_path.startswith(abs_excl):
            return True
    
    return False

def usage():
    """
    Prints usage instructions and exits.
    """
    print("Usage:")
    print("  ./compare_and_list.py <destination_folder> [comparison_mode] [operation_mode] [create_missing_folders_flag]")
    print("")
    print("Where:")
    print("  <destination_folder> : Mandatory. Path to remote/copy of mainfolder.")
    print("  [comparison_mode]    : Optional. 'date', 'size', or 'both'. Default is 'date'.")
    print("  [operation_mode]     : Optional. 'missing' or 'update'. Default is 'update'.")
    print("      'missing' => only show missing files")
    print("      'update'  => show missing and obsolete files")
    print("  [create_missing_folders_flag] : Optional. 'yes' or 'no'. Default is 'no'.")
    sys.exit(1)

# ----------------------------------------------------------------------
# Helper functions to format size/time differences
# ----------------------------------------------------------------------
def format_size(bytes_val):
    """
    Return a short string for a file size in bytes, e.g. '12 B', '1.2 KB', '3.4 MB', etc.
    """
    if bytes_val < 1024:
        return f"{bytes_val} B"
    elif bytes_val < 1024**2:
        return f"{bytes_val/1024:.1f} KB"
    else:
        return f"{bytes_val/(1024**2):.1f} MB"

def format_size_diff(size_diff_bytes):
    """
    Return a signed, human-readable size difference, e.g. '+12 B', '-1.2 KB', '+3.4 MB'.
    """
    sign = "+" if size_diff_bytes >= 0 else "-"
    abs_val = abs(size_diff_bytes)
    if abs_val < 1024:
        return f"{sign}{abs_val} B"
    elif abs_val < 1024**2:
        return f"{sign}{abs_val/1024:.1f} KB"
    else:
        return f"{sign}{abs_val/(1024**2):.1f} MB"

def format_time_diff(time_diff_seconds):
    """
    Return a signed, human-readable time difference in s, min, h, or days, e.g. '+45.0 min'.
    """
    sign = "+" if time_diff_seconds >= 0 else "-"
    abs_diff = abs(time_diff_seconds)
    # if < 60 seconds
    if abs_diff < 60:
        return f"{sign}{int(abs_diff)} s"
    elif abs_diff < 3600:
        mins = abs_diff / 60.0
        return f"{sign}{mins:.1f} min"
    elif abs_diff < 86400:
        hours = abs_diff / 3600.0
        return f"{sign}{hours:.1f} h"
    else:
        days = abs_diff / 86400.0
        return f"{sign}{days:.1f} days"

def format_date(timestamp):
    """
    Convert an epoch timestamp (float or int) to YYYY-MM-DD.
    """
    dt = datetime.datetime.fromtimestamp(timestamp)
    return dt.strftime("%Y-%m-%d")

def main():
    # -------------------------------------------------
    # Step 0: Preliminary checks on script location
    # -------------------------------------------------
    current_dir = os.path.basename(os.getcwd())
    if current_dir != "utils":
        print("Error: You must run this script from the 'utils' folder.")
        sys.exit(1)
    # Check presence of pdocme.sh
    if not os.path.exists("pdocme.sh"):
        print("Error: 'pdocme.sh' is not found in the current folder. Execution aborted.")
        sys.exit(1)
    
    # -------------------------------------------------
    # Step 1: Parse arguments
    # -------------------------------------------------
    if len(sys.argv) < 2:
        usage()
    destination_folder_input = sys.argv[1]
    comparison_mode = "date"  # default
    operation_mode = "update" # default
    create_missing_folders = False
    
    # Arg2: comparison_mode
    if len(sys.argv) >= 3:
        arg2 = sys.argv[2].lower()
        if arg2 in ["date", "size", "both"]:
            comparison_mode = arg2
        elif arg2 in ["missing", "update"]:
            # If user provided 'missing' or 'update' as second arg, treat that as operation_mode
            operation_mode = arg2
        else:
            # not recognized => keep default comparison_mode
            pass
    
    # Arg3: either operation_mode or create_missing_folders_flag
    if len(sys.argv) >= 4:
        arg3 = sys.argv[3].lower()
        if arg3 in ["missing", "update"]:
            operation_mode = arg3
        elif arg3 in ["yes", "no"]:
            create_missing_folders = (arg3 == "yes")
        else:
            # not recognized
            pass
    
    # Arg4: possibly leftover for create_missing_folders_flag
    if len(sys.argv) >= 5:
        arg4 = sys.argv[4].lower()
        if arg4 in ["yes", "no"]:
            create_missing_folders = (arg4 == "yes")
    
    # -------------------------------------------------
    # Step 2: Validate destination folder
    # -------------------------------------------------
    destination_folder = os.path.abspath(destination_folder_input)
    
    if not os.path.exists(destination_folder):
        print(f"Error: Destination folder '{destination_folder}' does not exist.")
        sys.exit(1)
    
    # Check that destination_folder includes 'utils' subfolder and 'pdocme.sh'
    utils_subfolder = os.path.join(destination_folder, "utils")
    pdocme_file = os.path.join(utils_subfolder, "pdocme.sh")
    if not os.path.exists(utils_subfolder) or not os.path.exists(pdocme_file):
        print("Error: Destination folder is not a valid remote copy of the mainfolder. "
              "It must contain 'utils/pdocme.sh'.")
        sys.exit(1)
    
    # -------------------------------------------------
    # Step 3: Walk through the source folder ($mainfolder)
    # -------------------------------------------------
    mainfolder = os.path.dirname(os.path.abspath(os.getcwd()))
    
    for root, dirs, files in os.walk(mainfolder):
        rel_path = os.path.relpath(root, mainfolder)
        if rel_path == ".":
            rel_path = ""
        
        if is_excluded_folder(root, mainfolder):
            dirs[:] = []
            continue
        
        for filename in files:
            source_file = os.path.join(root, filename)
            
            if not is_included(filename):
                continue
            if is_excluded_file(filename):
                continue
            
            dest_file = os.path.join(destination_folder, rel_path, filename)
            source_stat = os.stat(source_file)
            
            # Check if file is missing
            if not os.path.exists(dest_file):
                if operation_mode == "update" or operation_mode == "missing":
                    # We do display missing if mode is 'missing' or 'update'
                    # Show date + size of the source
                    source_date = format_date(source_stat.st_mtime)
                    source_size = format_size(source_stat.st_size)
                    msg = f"(source: {source_date}, {source_size}) MISSING: {source_file} -> {dest_file}"
                    print(msg)
                    if create_missing_folders:
                        dest_dir = os.path.dirname(dest_file)
                        if not os.path.exists(dest_dir):
                            os.makedirs(dest_dir, exist_ok=True)
                            print(f"Created missing folder: {dest_dir}")
                # No need to check obsolete logic because it's missing
                continue
            
            # If operation_mode == "missing", we skip checking obsolete
            if operation_mode == "missing":
                continue
            
            # operation_mode == "update": check if it is obsolete
            dest_stat = os.stat(dest_file)
            replace_required = False
            
            # Evaluate time difference
            time_diff = source_stat.st_mtime - dest_stat.st_mtime
            # Evaluate size difference
            size_diff = source_stat.st_size - dest_stat.st_size
            
            # Check date
            if comparison_mode in ["date", "both"]:
                if time_diff > TIME_DIFF_TOLERANCE:
                    replace_required = True
            
            # Check size
            if comparison_mode in ["size", "both"]:
                if source_stat.st_size > dest_stat.st_size:
                    replace_required = True
            
            if replace_required:
                size_diff_str = format_size_diff(size_diff)
                time_diff_str = format_time_diff(time_diff)
                print(f"({size_diff_str}, {time_diff_str}) OBSOLETE: {source_file} -> {dest_file}")
    
    print("Comparison completed.")

if __name__ == "__main__":
    main()

Functions

def format_date(timestamp)

Convert an epoch timestamp (float or int) to YYYY-MM-DD.

def format_size(bytes_val)

Return a short string for a file size in bytes, e.g. '12 B', '1.2 KB', '3.4 MB', etc.

def format_size_diff(size_diff_bytes)

Return a signed, human-readable size difference, e.g. '+12 B', '-1.2 KB', '+3.4 MB'.

def format_time_diff(time_diff_seconds)

Return a signed, human-readable time difference in s, min, h, or days, e.g. '+45.0 min'.

def is_excluded_file(filename)

Checks if a file is explicitly excluded based on exclude_files.

def is_excluded_folder(path, mainfolder)

Checks if a folder should be excluded based on:
1. The relative folder exclusions (exclude_folders_rel)
2. The absolute folder exclusions (exclude_folders_abs)
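
A minimal sketch of matching relative exclusions by exact folder name, assuming that is the intended behavior. The helper names are hypothetical and the exclusion list is abbreviated. Only a literal `./` prefix is removed, because `str.lstrip("./")` strips a set of characters and would turn `./.git` into `git`.

```python
import os

exclude_folders_rel = ["./old", "./.git", "./__pycache__"]  # abbreviated for the example


def strip_dot_slash(pattern):
    """Remove a leading './' only; leave any other leading dots untouched."""
    return pattern[2:] if pattern.startswith("./") else pattern


def is_excluded_by_name(path):
    """True if the last component of `path` equals one of the relative exclusions."""
    name = os.path.basename(os.path.normpath(path))
    return name in {strip_dot_slash(p) for p in exclude_folders_rel}


print("./.git".lstrip("./"))                   # git
print(strip_dot_slash("./.git"))               # .git
print(is_excluded_by_name("/proj/.git"))       # True
print(is_excluded_by_name("/proj/threshold"))  # False
```

Comparing whole folder names keeps a directory such as `threshold` from being excluded merely because its name ends in `old`.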

def is_included(filename)

Checks if a file should be included based on the include_patterns.

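def main()

Runs the comparison: checks the script location, parses the command-line arguments, validates the destination folder, then walks the source tree and prints one MISSING or OBSOLETE line per affected file.
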
def usage()

Prints usage instructions and exits.
