How to identify same-content files on Linux
Copies of files sometimes represent a big waste of disk space and can cause confusion if you want to make updates. Here are six commands to help you identify these files.
In a recent post, we looked at how to identify and locate files that are hard links (i.e., that point to the same disk content and share inodes). In this post, we'll check out commands for finding files that have the same content, but are not otherwise connected.
Hard links are helpful because they allow files to exist in multiple places in the file system while not taking up any additional disk space. Copies of files, on the other hand, sometimes represent a big waste of disk space and run some risk of causing confusion if you want to make updates. In this post, we're going to look at multiple ways to identify these files.
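As a quick refresher, you can see the difference for yourself with ln, cp, and ls -i; the file names below are just placeholders for this sketch:

$ ln report.txt report-link.txt      # hard link: same inode, no extra data stored
$ cp report.txt report-copy.txt      # copy: new inode, content stored twice
$ ls -i report.txt report-link.txt report-copy.txt   # the first two names share an inode number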
Comparing files with the diff command
Probably the easiest way to compare two files is to use the diff command. The output will show you the differences between the two files. The < and > signs indicate whether the extra lines are in the first (<) or second (>) file provided as arguments. In this example, the extra lines are in backup.html.
$ diff index.html backup.html
2438a2439,2441
> <pre>
> That's all there is to report.
> </pre>
If diff shows no output, that means the two files are the same.
$ diff home.html index.html
$
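If you only want to know whether two files match, without seeing the actual differences, you can lean on the exit status instead of the output. A small sketch using diff's -q option and the cmp command, with the same example files:

$ diff -q home.html index.html && echo "same content"
$ cmp -s home.html index.html || echo "files differ"

Both commands exit with status 0 when the files match, which makes them easy to use in scripts.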
The only drawbacks to diff are that it can only compare two files at a time, and you have to identify the files to compare. Some commands we will look at in this post can find the duplicate files for you.
Using checksums
The cksum (checksum) command computes checksums for files. Checksums are a mathematical reduction of the contents to a lengthy number (like 2819078353 228029). While not absolutely unique, the chance that files that are not identical in content would result in the same checksum is extremely small.
$ cksum *.html
2819078353 228029 backup.html
4073570409 227985 home.html
4073570409 227985 index.html
In the example above, you can see how the second and third files yield the same checksum and can be assumed to be identical.
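With more than a handful of files, sorting the cksum output numerically makes any matching checksums land on adjacent lines, which is easier to scan by eye. A minimal sketch, using the same example files:

$ cksum *.html | sort -n
2819078353 228029 backup.html
4073570409 227985 home.html
4073570409 227985 index.html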
Using the find command
While the find command doesn't have an option for finding duplicate files, it can be used to search for files by name or type and run the cksum command. For example:
$ find . -name "*.html" -exec cksum {} \;
4073570409 227985 ./home.html
2819078353 228029 ./backup.html
4073570409 227985 ./index.html
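You can take this a step further and have the extra copies picked out for you. In this sketch, the awk filter prints any line whose checksum (the first field) has already been seen, so only the repeats are listed; filenames containing spaces or newlines would need more careful handling:

$ find . -name "*.html" -exec cksum {} + | sort -n | awk 'seen[$1]++'
4073570409 227985 ./index.html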
Using the fslint control
The fslint command can be used to specifically find duplicate files. Note that we give it a starting location. The command can take quite some time to complete if it needs to run through a large number of files. Here's output from a very small search. Note how it lists the duplicate files and also looks for other issues, such as empty directories and bad IDs.
$ fslint .
-----------------------------------file name lint
-------------------------------Invalid utf8 names
-----------------------------------file case lint
----------------------------------DUPlicate files   <==
home.html
index.html
-----------------------------------Dangling links
--------------------redundant characters in links
------------------------------------suspect links
--------------------------------Empty Directories
./.gnupg
----------------------------------Temporary Files
----------------------duplicate/conflicting Names
------------------------------------------Bad ids
-------------------------Non Stripped executables
You may have to install fslint on your system. You will probably have to add it to your search path, as well:
$ export PATH=$PATH:/usr/share/fslint/fslint
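On a Debian or Ubuntu style system, installing and running it might look something like the sketch below; the package name and scan directory are assumptions, and fslint is no longer carried in every distribution's repositories:

$ sudo apt install fslint                        # package name assumed; not available everywhere
$ export PATH=$PATH:/usr/share/fslint/fslint     # make the fslint scripts easy to run
$ fslint ~/public_html                           # point it at the directory you want to scan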
Using the rdfind command
The rdfind command will also look for duplicate (same content) files. The name stands for "redundant data find," and the command is able to determine, based on file dates, which files are the originals. That's helpful if you choose to delete the duplicates, as it will remove the newer files.
$ rdfind ~
Now scanning "/home/shark", found 12 files.
Now have 12 files in total.
Removed 1 files due to nonunique device and inode.
Total size is 699498 bytes or 683 KiB
Removed 9 files due to unique sizes from list. 2 files left.
Now eliminating candidates based on first bytes: removed 0 files from list. 2 files left.
Now eliminating candidates based on last bytes: removed 0 files from list. 2 files left.
Now eliminating candidates based on sha1 checksum: removed 0 files from list. 2 files left.
It seems like you have 2 files that are not unique
Totally, 223 KiB can be reduced.
Now making results file results.txt
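The results.txt file mentioned in the last line is worth a look before acting on anything; it lists each set of duplicates and indicates which file rdfind treated as the original (the first occurrence):

$ cat results.txt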
You can also run this command in "dryrun" mode (i.e., only report the changes that might otherwise be made).
$ rdfind -dryrun true ~
(DRYRUN MODE) Now scanning "/home/shark", found 12 files.
(DRYRUN MODE) Now have 12 files in total.
(DRYRUN MODE) Removed 1 files due to nonunique device and inode.
(DRYRUN MODE) Total size is 699352 bytes or 683 KiB
Removed 9 files due to unique sizes from list. 2 files left.
(DRYRUN MODE) Now eliminating candidates based on first bytes: removed 0 files from list. 2 files left.
(DRYRUN MODE) Now eliminating candidates based on last bytes: removed 0 files from list. 2 files left.
(DRYRUN MODE) Now eliminating candidates based on sha1 checksum: removed 0 files from list. 2 files left.
(DRYRUN MODE) It seems like you have 2 files that are not unique
(DRYRUN MODE) Totally, 223 KiB can be reduced.
(DRYRUN MODE) Now making results file results.txt
The rdfind command also provides options for things such as ignoring empty files (-ignoreempty) and following symbolic links (-followsymlinks). Check out the man page for explanations.
-ignoreempty       ignore empty files
-minsize           ignore files smaller than specified size
-followsymlinks    follow symbolic links
-removeidentinode  remove files referring to identical inode
-checksum          identify checksum type to be used
-deterministic     determines how to sort files
-makesymlinks      turn duplicate files into symbolic links
-makehardlinks     replace duplicate files with hard links
-makeresultsfile   create a results file in the current directory
-outputname        provide name for results file
-deleteduplicates  delete/unlink duplicate files
-sleep             set sleep time between reading files (milliseconds)
-n, -dryrun        display what would have been done, but don't do it
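One handy combination, for example, is replacing duplicates with hard links so the content is stored only once without anything disappearing. A small sketch using the options above; the directory is just an assumed example, and running a dryrun first is a good habit:

$ rdfind -dryrun true -makehardlinks true ~/Documents    # preview what would change
$ rdfind -makehardlinks true ~/Documents                 # replace duplicate files with hard links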
Note that the rdfind command offers an option to delete duplicate files with the -deleteduplicates true setting. Hopefully the command's small problem with grammar won't irritate you. ;-)
$ rdfind -deleteduplicates true .
...
Deleted 1 files.    <==
You will likely have to install the rdfind command on your system. It's probably a good idea to experiment with it to get comfortable with how it works.
Using the fdupes command
The fdupes command also makes it easy to identify duplicate files and provides a large number of useful options, like -r for recursion. In its simplest form, it groups duplicate files together like this:
$ fdupes ~
/home/shs/UPGRADE
/home/shs/mytwin

/home/shs/lp.txt
/home/shs/lp.man

/home/shs/penguin.png
/home/shs/penguin0.png
/home/shs/hideme.png
Here's an example using recursion. Note that many of the duplicate files are important (users' .bashrc and .profile files) and should clearly not be deleted.
# fdupes -r /home
/home/shark/home.html
/home/shark/index.html

/home/dory/.bashrc
/home/eel/.bashrc

/home/nemo/.profile
/home/dory/.profile
/home/shark/.profile

/home/nemo/tryme
/home/shs/tryme

/home/shs/arrow.png
/home/shs/PNGs/arrow.png

/home/shs/11/files_11.zip
/home/shs/ERIC/file_11.zip

/home/shs/penguin0.jpg
/home/shs/PNGs/penguin.jpg
/home/shs/PNGs/penguin0.jpg

/home/shs/Sandra_rotated.png
/home/shs/PNGs/Sandra_rotated.png
The fdupes command's many options are listed below. Use the fdupes -h command, or read the man page for more details.
-r --recurse       recurse
-R --recurse:      recurse through specified directories
-s --symlinks      follow symlinked directories
-H --hardlinks     treat hard links as duplicates
-n --noempty       ignore empty files
-f --omitfirst     omit the first file in each set of matches
-A --nohidden      ignore hidden files
-1 --sameline      list matches on a single line
-S --size          show size of duplicate files
-m --summarize     summarize duplicate files information
-q --quiet         hide progress indicator
-d --delete        prompt user for files to preserve
-N --noprompt      when used with --delete, preserve the first file in set
-I --immediate     delete duplicates as they are encountered
-p --permissions   don't consider files with different owner/group or permission bits as duplicates
-o --order=WORD    order files according to specification
-i --reverse       reverse order while sorting
-v --version       display fdupes version
-h --help          displays help
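Combining a few of these options covers most everyday uses. A short sketch; the directories are just assumed examples:

$ fdupes -r -S ~            # recurse through your home directory and show duplicate file sizes
$ fdupes -r -m ~            # just summarize how much space the duplicates are taking up
$ fdupes -r -d ~/PNGs       # prompt for which copy in each set of matches to keep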
The fdupes command is another one that you're likely to have to install and work with for a while to become familiar with its many options.
Wrap-up
Linux systems provide a good selection of tools for locating and potentially removing duplicate files, along with options for where you want to run your search and what you want to do with duplicate files when you find them.
Copyright © 2019 IDG Communications, Inc.
Source: https://www.networkworld.com/article/3390204/how-to-identify-same-content-files-on-linux.html