deplicate.github.io

Advanced Duplicate File Finder for Python



Status

[Badges: Travis CI build status, Requirements status, Codacy badge, Scrutinizer code quality; PyPI status, version, supported Python versions, license]

Description

deplicate is a high-performance, multi-filter duplicate file finder written in pure Python, with low memory impact and several advanced features.

It finds all the duplicate files in one or more directories, and it can also scan a given set of files directly. The latest releases let you remove the detected duplicates and/or apply a custom action to them.

As far as we know, it is currently the most complete and fastest duplicate finder tool for Python.

Features

Installation

Type the following in your command shell, with administrator/root privileges:

pip install deplicate

On Unix-based systems, this is generally achieved by prefixing the command with sudo:

sudo pip install deplicate

If the above commands fail, consider installing it with the option --user:

pip install --user deplicate

Note: You can install it together with its Command Line Interface by typing pip install deplicate[cli]

If the command pip is not available on your system, but you have the Python interpreter and the setuptools package (>=20.8.1) installed, you can try to install it from source, in this way (the steps are also sketched as shell commands below):

  1. Download the latest source code archive, in ZIP or TAR format.
  2. Extract the downloaded archive.
  3. From the extracted directory, run the command python setup.py install.
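
A rough command-line sketch of the steps above (the archive and directory names are purely illustrative; use the ones matching the archive you actually downloaded):

unzip deplicate-master.zip    # or: tar xf deplicate-master.tar.gz
cd deplicate-master
python setup.py install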

Usage

Import the module duplicate in your script.

import duplicate

Call its function find if you want to know which files are duplicates:

duplicate.find('/path')

Or purge if you also want to remove them:

duplicate.purge('/path')

In both cases, you’ll get a duplicate.ResultInfo object with the following properties:

Note: By default directory paths are scanned recursively.

Note: By default files smaller than 100 MiB or bigger than 100 GiB are not scanned.

Note: File paths are returned in canonical form.

Note: Tuples of duplicate files are sorted in descending order according to input priority, file modification time and name length.
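
A minimal sketch of how the result might be consumed (the dups property used here, assumed to hold the tuples of duplicate paths, is an illustration; check the ResultInfo documentation in the API Reference below for the actual property names):

import duplicate

result = duplicate.find('/path/to/dir')

# `dups` is assumed to be the ResultInfo property holding the tuples of
# duplicate file paths; see the API Reference for the exact attributes.
for group in result.dups:
    print(group)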

Quick Examples

Scan a single directory for duplicates:

import duplicate

duplicate.find('/path/to/dir')

Scan two files (at least) for duplicates:

import duplicate

duplicate.find('/path/to/file1', '/path/to/file2')

Scan a single directory for duplicates and move them to the trash/recycle bin:

import duplicate

duplicate.purge('/path/to/dir')

Scan a single directory for duplicates and delete them:

import duplicate

duplicate.purge('/path/to/dir', trash=False)

Scan multiple directories together:

import duplicate

duplicate.find('/path/to/dir1', '/path/to/dir2', '/path/to/dir3')

Scan from iterable:

import duplicate

iterable = ['/path/to/dir1', '/path/to/dir2', '/path/to/dir3']

duplicate.find.from_iterable(iterable)
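
A minimal sketch, assuming from_iterable accepts any iterable of paths (not only lists), so the paths can also be produced lazily, for example from a glob pattern (the pattern is illustrative):

import glob
import duplicate

# glob.iglob returns an iterator, so the paths are generated lazily
paths = glob.iglob('/path/to/dir*')
duplicate.find.from_iterable(paths)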

Scan ignoring the minimum file size threshold:

import duplicate

duplicate.find('/path/to/dir', minsize=0)

Advanced Examples

Scan without recursing into directories:

import duplicate

duplicate.find('/path/to/file1', '/path/to/file2', '/path/to/dir1',
               recursive=False)

Note: In non-recursive mode, as in the case above, directory paths are simply ignored.

Scan checking file names and hidden files:

import duplicate

duplicate.find('/path/to/file1', '/path/to/dir1',
               comparename=True, scanhidden=True)

Scan excluding files with extension .doc:

import duplicate

duplicate.find('/path/to/dir', exclude="*.doc")

Scan including file links:

import duplicate

duplicate.find('/path/to/file1', '/path/to/file2', '/path/to/file3',
               scanlinks=True)

Scan for duplicates, handling errors with a custom action (printing):

import duplicate

def error_callback(exc, filename):
    print(filename)

duplicate.find('/path/to/dir', onerror=error_callback)

Scan for duplicates and apply a custom action (printing), instead of purging:

import duplicate

def purge_callback(filename):
    print(filename)
    raise duplicate.SkipException  # tell purge() to skip removing this file

duplicate.purge('/path/to/dir', ondel=purge_callback)

Scan for duplicates, apply a custom action (printing) and move them to the trash/recycle bin:

import duplicate

def purge_callback(filename):
    print(filename)

duplicate.purge('/path/to/dir', ondel=purge_callback)

Scan for duplicates, handling errors with a custom action (printing), and apply a custom action (moving to path), instead of purging:

import shutil
import duplicate

def error_callback(exc, filename):
    print(filename)

def purge_callback(filename):
    shutil.move(filename, '/path/to/custom-dir')
    raise duplicate.SkipException

duplicate.purge('/path/to/dir',
                ondel=purge_callback, onerror=error_callback)
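
If you need the failures afterwards, rather than just printing them, the error callback can simply collect them into a plain Python list (no additional deplicate API is involved; the callback signature is the same one shown above):

import duplicate

errors = []

def error_callback(exc, filename):
    # remember which files could not be processed, and why
    errors.append((filename, exc))

duplicate.purge('/path/to/dir', onerror=error_callback)

print(errors)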

API Reference

Exceptions

Classes

Functions


© 2017 Walter Purcaro <vuolter@gmail.com>