Finding and removing duplicate files with different names

Question 1

There is a program for exactly this, and it is called rdfind.

SYNOPSIS
   rdfind [ options ] directory1 | file1 [ directory2 | file2 ] ...

DESCRIPTION
   rdfind  finds duplicate files across and/or within several directories.
   It calculates checksum only if necessary.  rdfind  runs  in  O(Nlog(N))
   time with N being the number of files.

   If  two  (or  more) equal files are found, the program decides which of
   them is the original and the rest are considered  duplicates.  This  is
   done  by  ranking  the  files  to each other and deciding which has the
   highest rank. See section RANKING for details.

It can delete the duplicates or replace them with symbolic or hard links.
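As a rough sketch of how that might look in practice: the options below are the ones listed in the rdfind man page, while the wrapper name preview_dups is made up for this example. It only runs a dry run, so nothing is changed on disk.

```shell
#!/bin/sh
# preview_dups is a hypothetical wrapper name, not part of rdfind.
# It runs rdfind in dry-run mode, so no file is deleted or relinked.
preview_dups() {
    if ! command -v rdfind >/dev/null 2>&1; then
        echo "rdfind is not installed" >&2
        return 127
    fi
    rdfind -dryrun true -deleteduplicates true "$@"
}

# Once the preview looks right, one of these would apply the change
# (DIR stands for the directory to scan):
#   rdfind -deleteduplicates true DIR   # remove the duplicates
#   rdfind -makehardlinks true DIR      # replace duplicates with hard links
#   rdfind -makesymlinks true DIR       # replace duplicates with symbolic links
```

Calling preview_dups ~/Music first and reading its output before running any of the commented commands seems the safer order.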

Question 2

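Bummer. I had just finished developing a one-liner that lists all duplicates, in order to answer a question that turned out to be a duplicate of this one. How meta. Well, it would be a shame to waste it, so I'll post it anyway, even though rdfind sounds like the better solution.

This at least has the advantage of being the "real" Unix way of doing it. ;)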

find -name '*.mp3' -print0 | xargs -0 md5sum | sort | uniq -Dw 32

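Breaking the pipeline down:

find -name '*.mp3' -print0 finds all mp3 files in the subtree starting from the current directory and prints their names, separated by NUL.

xargs -0 md5sum reads the NUL-separated list and computes a checksum for each file.

You know what sort does.

uniq -Dw 32 compares the first 32 characters of the sorted lines and prints only the lines that share a hash with another line.

So you end up with a list of all the duplicates. You can then manually whittle that down to the files you want to delete, strip the hashes, and pipe the list to rm.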
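The last manual step could be sketched roughly like this. It assumes GNU find, xargs, md5sum and uniq, and file names that contain no newlines or backslashes (md5sum changes its output format for those). The function names list_dups and extra_copies are invented for the example, and it matches every regular file rather than only *.mp3. Nothing is deleted by the functions themselves.

```shell
#!/bin/sh
# list_dups and extra_copies are hypothetical helper names.

# Print "hash  path" lines for every file under $1 whose MD5 sum
# matches that of at least one other file.
list_dups() {
    find "$1" -type f -print0 | xargs -0 -r md5sum | sort | uniq -Dw 32
}

# Read "hash  path" lines on stdin, skip the first line of each hash
# group (the copy to keep), and print the remaining paths without the
# 32-character hash and its two separator characters.
extra_copies() {
    awk '{ h = substr($0, 1, 32) }
         h == prev { print substr($0, 35) }
         { prev = h }'
}

# Review first:
#   list_dups . | extra_copies
# Then, only if the list is right:
#   list_dups . | extra_copies | tr '\n' '\0' | xargs -0 rm --
```

Which copy counts as "first" is simply whichever path sorts earliest within a hash group, so this is no substitute for looking at the list before piping it to rm.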

Answer