Advanced Duplicate File Finder for Python
deplicate is a high-performance, multi-filter duplicate file finder written in pure Python, with low memory impact and several advanced features.
Find all the duplicate files in one or more directories; you can also scan a set of files directly. Recent releases let you remove the detected duplicates and/or apply a custom action to them.
As far as we know, it is currently the most complete and fastest duplicate finder tool for Python.
Type in your command shell with administrator/root privileges:
pip install deplicate
On Unix-based systems, this is generally achieved by prefixing the command with sudo:
sudo pip install deplicate
If the above commands fail, consider installing with the --user option:
pip install --user deplicate
Note: You can install it together with its command-line interface by typing:
pip install deplicate[cli]
If the pip command is not found on your system, but you have the Python interpreter and the setuptools package (>=20.8.1) installed, you can try installing from the sources:
python setup.py install
Import the module duplicate in your script:
import duplicate
Call its find function to list the duplicate files:
duplicate.find('/path')
Or call purge to remove them as well:
duplicate.purge('/path')
In both cases, you’ll get a duplicate.ResultInfo object with the following properties:
dups – Tuples of paths of duplicate files.
deldups – Tuple of paths of purged duplicate files.
duperrors – Tuple of paths of files not filtered due to errors.
scanerrors – Tuple of paths of files not scanned due to errors.
delerrors – Tuple of paths of files not purged due to errors.
Note: By default, directory paths are scanned recursively.
Note: By default, files smaller than 100 KiB or bigger than 100 GiB are not scanned.
Note: File paths are returned in canonical form.
Note: Tuples of duplicate files are sorted in descending order according to input priority, file modification time and name length.
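The properties listed above can be read like those of any named tuple. The sketch below is only an illustration: summarize is a hypothetical helper, and the ResultInfo defined here is a stand-in with the documented field names, used so the snippet runs without scanning anything.

```python
from collections import namedtuple

# Stand-in with the field names documented for duplicate.ResultInfo;
# a real result would come from duplicate.find() or duplicate.purge().
ResultInfo = namedtuple('ResultInfo',
                        'dups deldups duperrors scanerrors delerrors')


def summarize(result):
    """Return (number of duplicate groups, number of redundant files)."""
    groups = len(result.dups)
    # In each group, every file after the first is a redundant copy.
    redundant = sum(len(group) - 1 for group in result.dups)
    return groups, redundant


sample = ResultInfo(
    dups=(('/a/x.bin', '/b/x.bin', '/c/x.bin'), ('/a/y.bin', '/b/y.bin')),
    deldups=(), duperrors=(), scanerrors=(), delerrors=())
print(summarize(sample))  # (2, 3)
```

The same helper should work on a real result, since it only relies on the dups property.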
Scan a single directory for duplicates:
import duplicate
duplicate.find('/path/to/dir')
Scan two (or more) files for duplicates:
import duplicate
duplicate.find('/path/to/file1', '/path/to/file2')
Scan a single directory for duplicates and move them to the trash/recycle bin:
import duplicate
duplicate.purge('/path/to/dir')
Scan a single directory for duplicates and delete them:
import duplicate
duplicate.purge('/path/to/dir', trash=False)
Scan several directories together:
import duplicate
duplicate.find('/path/to/dir1', '/path/to/dir2', '/path/to/dir3')
Scan from an iterable:
import duplicate
iterable = ['/path/to/dir1', '/path/to/dir2', '/path/to/dir3']
duplicate.find.from_iterable(iterable)
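Because from_iterable takes an iterable, the paths can also be produced lazily. A minimal sketch, assuming you want to drop candidates that are not existing directories; existing_dirs is a hypothetical helper, and the deplicate call is left as a comment so the snippet runs without the package installed.

```python
import os
import tempfile


def existing_dirs(candidates):
    """Yield only the candidates that are existing directories."""
    for path in candidates:
        if os.path.isdir(path):
            yield path


with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, 'dir1')
    os.mkdir(real)
    missing = os.path.join(tmp, 'missing')
    found = list(existing_dirs([real, missing]))
    # With deplicate installed, the generator could be passed directly:
    # duplicate.find.from_iterable(existing_dirs([real, missing]))

print(len(found))  # 1
```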
Scan ignoring the minimum file size threshold:
import duplicate
duplicate.find('/path/to/dir', minsize=0)
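Size limits are plain byte counts, so a custom threshold is easy to derive. The snippet below only checks the arithmetic behind the documented defaults (102400 and 107374182400) and shows one way to express another limit; KIB, MIB and GIB are names introduced here for illustration.

```python
KIB = 1 << 10  # bytes in one kibibyte
MIB = 1 << 20  # bytes in one mebibyte
GIB = 1 << 30  # bytes in one gibibyte

default_minsize = 100 * KIB
default_maxsize = 100 * GIB

print(default_minsize)  # 102400
print(default_maxsize)  # 107374182400

# A custom threshold, e.g. only files of at least 5 MiB:
custom_minsize = 5 * MIB
print(custom_minsize)  # 5242880
# With deplicate installed:
# duplicate.find('/path/to/dir', minsize=custom_minsize)
```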
Scan without recursing into directories:
import duplicate
duplicate.find('/path/to/file1', '/path/to/file2', '/path/to/dir1',
               recursive=False)
Note: In non-recursive mode, as in the case above, directory paths are simply ignored.
Scan checking file names and hidden files:
import duplicate
duplicate.find('/path/to/file1', '/path/to/dir1',
               comparename=True, scanhidden=True)
Scan excluding files with the .doc extension:
import duplicate
duplicate.find('/path/to/dir', exclude="*.doc")
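To get a feel for what a wildcard pattern such as *.doc selects, the standard fnmatch module can be used. This only illustrates shell-style wildcards; whether deplicate matches patterns in exactly this way (for example, against the file name rather than the full path) is an assumption here.

```python
from fnmatch import fnmatchcase

names = ['report.doc', 'report.docx', 'notes.txt', 'old.doc']

# Names that a shell-style "*.doc" pattern matches (case-sensitive).
excluded = [name for name in names if fnmatchcase(name, '*.doc')]
print(excluded)  # ['report.doc', 'old.doc']
```

Note that report.docx is not matched, so a broader pattern would be needed to cover both extensions.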
Scan including file links:
import duplicate
duplicate.find('/path/to/file1', '/path/to/file2', '/path/to/file3',
               scanlinks=True)
Scan for duplicates, handling errors with a custom action (printing):
import duplicate
def error_callback(exc, filename):
    print(filename)
duplicate.find('/path/to/dir', onerror=error_callback)
Scan for duplicates and apply a custom action (printing), instead of purging:
import duplicate
def purge_callback(filename):
    print(filename)
    raise duplicate.SkipException
duplicate.purge('/path/to/dir', ondel=purge_callback)
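A dry run can be built on the same idea: record what would be purged, then raise the skip exception so nothing is touched. A minimal sketch; make_dry_run is a hypothetical helper that receives the exception class as an argument, and FakeSkip merely stands in for duplicate.SkipException so the snippet runs on its own.

```python
def make_dry_run(skip_exception):
    """Return (callback, log); the callback logs a path, then skips it."""
    log = []

    def callback(filename):
        log.append(filename)
        raise skip_exception

    return callback, log


class FakeSkip(Exception):
    """Stand-in for duplicate.SkipException (illustration only)."""


callback, would_purge = make_dry_run(FakeSkip)
for path in ['/path/to/dup1', '/path/to/dup2']:
    try:
        callback(path)
    except FakeSkip:
        pass  # the file is left alone

print(would_purge)  # ['/path/to/dup1', '/path/to/dup2']
# With deplicate installed:
# callback, would_purge = make_dry_run(duplicate.SkipException)
# duplicate.purge('/path/to/dir', ondel=callback)
```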
Scan for duplicates, apply a custom action (printing) and move them to the trash/recycle bin:
import duplicate
def purge_callback(filename):
    print(filename)
duplicate.purge('/path/to/dir', ondel=purge_callback)
Scan for duplicates, handling errors with a custom action (printing), and apply a custom action (moving to path), instead of purging:
import shutil
import duplicate
def error_callback(exc, filename):
    print(filename)
def purge_callback(filename):
    shutil.move(filename, '/path/to/custom-dir')
    raise duplicate.SkipException
duplicate.purge('/path/to/dir',
                ondel=purge_callback, onerror=error_callback)
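Instead of printing, an error callback can collect failures for later inspection. A minimal sketch, assuming the two-argument form (exception, filename) described for onerror; the ErrorCollector class is hypothetical and is exercised by hand here rather than by a real scan.

```python
class ErrorCollector:
    """Callable that stores (filename, error type name) pairs."""

    def __init__(self):
        self.errors = []

    def __call__(self, exc, filename):
        self.errors.append((filename, type(exc).__name__))


collector = ErrorCollector()
# Simulate two failures the way a scanner might report them.
collector(PermissionError('denied'), '/path/to/locked')
collector(FileNotFoundError('gone'), '/path/to/missing')

print(collector.errors)
# [('/path/to/locked', 'PermissionError'), ('/path/to/missing', 'FileNotFoundError')]
# With deplicate installed:
# duplicate.find('/path/to/dir', onerror=collector)
```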
SkipException(*args, **kwargs)
Exception type; its parameters, properties and methods are the same as those of Exception.
Cache(maxlen=DEFAULT_MAXLEN)
maxlen – Maximum number of entries stored.
DEFAULT_MAXLEN – 128.
clear(self) – Returns True if the cache was cleared, otherwise False.
Deplicate(paths,
          minsize=DEFAULT_MINSIZE, maxsize=DEFAULT_MAXSIZE,
          include=None, exclude=None,
          comparename=False, comparemtime=False, comparemode=False,
          recursive=True, followlinks=False, scanlinks=False,
          scanempties=False,
          scansystem=True, scanarchived=True, scanhidden=True)
paths – Iterable of directory and/or file paths.
minsize – (optional) Minimum size in bytes of files to include in scanning.
maxsize – (optional) Maximum size in bytes of files to include in scanning.
include – (optional) Wildcard pattern of files to include in scanning.
exclude – (optional) Wildcard pattern of files to exclude from scanning.
comparename – (optional) Check file name.
comparemtime – (optional) Check file modification time.
compareperms – (optional) Check file mode (permissions).
recursive – (optional) Scan directory recursively.
followlinks – (optional) Follow symbolic links pointing to directory.
scanlinks – (optional) Scan symbolic links pointing to file (hard-links included).
scanempties – (optional) Scan empty files.
scansystems – (optional) Scan OS files.
scanarchived – (optional) Scan archived files.
scanhidden – (optional) Scan hidden files.
DEFAULT_MINSIZE – 102400.
DEFAULT_MAXSIZE – 107374182400.
result – Result of the find or purge invocation (None by default), of type duplicate.ResultInfo.
find(self, onerror=None, notify=None)
onerror – (optional) Callback function called with two arguments, exception and filename, when an error occurs during file scanning or filtering.
notify – (internal) Notifier callback.
purge(self, trash=True, ondel=None, onerror=None, notify=None)
trash – (optional) Move duplicate files to trash/recycle bin, instead of deleting.
ondel – (optional) Callback function called with one argument, filename, before purging a duplicate file.
onerror – (optional) Callback function called with two arguments, exception and filename, when an error occurs during file scanning, filtering or purging.
notify – (internal) Notifier callback.
ResultInfo(dupinfo, delduplist, scnerrlist, delerrors)
collections.namedtuple('ResultInfo', 'dups deldups duperrors scanerrors delerrors').
dupinfo – (internal) Instance of duplicate.structs.DupInfo.
delduplist – (internal) Iterable of purged files (deleted or trashed).
scnerrlist – (internal) Iterable of files not scanned (due to errors).
delerrors – (internal) Iterable of files not purged (due to errors).
Other properties and methods are those of collections.namedtuple.
find(*paths,
     minsize=duplicate.Deplicate.DEFAULT_MINSIZE,
     maxsize=duplicate.Deplicate.DEFAULT_MAXSIZE,
     include=None, exclude=None,
     comparename=False, comparemtime=False, comparemode=False,
     recursive=True, followlinks=False, scanlinks=False,
     scanempties=False,
     scansystem=True, scanarchived=True, scanhidden=True,
     onerror=None, notify=None)
Returns a duplicate.ResultInfo.
paths – Iterable of directory and/or file paths.
minsize – (optional) Minimum size in bytes of files to include in scanning.
maxsize – (optional) Maximum size in bytes of files to include in scanning.
include – (optional) Wildcard pattern of files to include in scanning.
exclude – (optional) Wildcard pattern of files to exclude from scanning.
comparename – (optional) Check file name.
comparemtime – (optional) Check file modification time.
compareperms – (optional) Check file mode (permissions).
recursive – (optional) Scan directory recursively.
followlinks – (optional) Follow symbolic links pointing to directory.
scanlinks – (optional) Scan symbolic links pointing to file (hard-links included).
scanempties – (optional) Scan empty files.
scansystems – (optional) Scan OS files.
scanarchived – (optional) Scan archived files.
scanhidden – (optional) Scan hidden files.
onerror – (optional) Callback function called with two arguments, exception and filename, when an error occurs during file scanning or filtering.
notify – (internal) (optional) Notifier callback.
purge(*paths,
      minsize=duplicate.Deplicate.DEFAULT_MINSIZE,
      maxsize=duplicate.Deplicate.DEFAULT_MAXSIZE,
      include=None, exclude=None,
      comparename=False, comparemtime=False, comparemode=False,
      recursive=True, followlinks=False, scanlinks=False,
      scanempties=False,
      scansystem=True, scanarchived=True, scanhidden=True,
      trash=True, ondel=None, onerror=None, notify=None)
Returns a duplicate.ResultInfo.
paths – Iterable of directory and/or file paths.
minsize – (optional) Minimum size in bytes of files to include in scanning.
maxsize – (optional) Maximum size in bytes of files to include in scanning.
include – (optional) Wildcard pattern of files to include in scanning.
exclude – (optional) Wildcard pattern of files to exclude from scanning.
comparename – (optional) Check file name.
comparemtime – (optional) Check file modification time.
compareperms – (optional) Check file mode (permissions).
recursive – (optional) Scan directory recursively.
followlinks – (optional) Follow symbolic links pointing to directory.
scanlinks – (optional) Scan symbolic links pointing to file (hard-links included).
scanempties – (optional) Scan empty files.
scansystems – (optional) Scan OS files.
scanarchived – (optional) Scan archived files.
scanhidden – (optional) Scan hidden files.
trash – (optional) Move duplicate files to trash/recycle bin, instead of deleting.
ondel – (optional) Callback function called with one argument, filename, before purging a duplicate file.
onerror – (optional) Callback function called with two arguments, exception and filename, when an error occurs during file scanning, filtering or purging.
notify – (internal) (optional) Notifier callback.