Test

Question 1

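If you just want to use a command-line tool rather than write a shell script, most distributions provide a program called fdupes that does this.

There is also fslint, a GUI-based tool with the same functionality.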

Question 2

This solution finds duplicates in O(n) time. A checksum is generated for each file, and each file in turn is compared against the set of known checksums via an associative array.

#!/bin/bash
#
# Usage:  ./delete-duplicates.sh  [<files...>]
#
declare -A filecksums

# No args, use files in current directory
test 0 -eq $# && set -- *

for file in "$@"
do
    # Files only (also no symlinks)
    [[ -f "$file" ]] && [[ ! -h "$file" ]] || continue

    # Generate the checksum
    cksum=$(cksum <"$file" | tr ' ' _)

    # Have we already got this one?
    if [[ -n "${filecksums[$cksum]}" ]] && [[ "${filecksums[$cksum]}" != "$file" ]]
    then
        echo "Found '$file' is a duplicate of '${filecksums[$cksum]}'" >&2
        echo rm -f "$file"
    else
        filecksums[$cksum]="$file"
    fi
done

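If you don't specify any files (or wildcards) on the command line, the set of files in the current directory is used. The script compares files across multiple directories if you name them, but it does not recurse into directories itself.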
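Since the script does not recurse, one way to feed it a whole tree is to build the file list first. The following is a minimal sketch, assuming find and sort support the NUL-delimited options -print0 and -z (true for the GNU tools); collect_files and the files array are hypothetical names, not part of the original script.

```bash
#!/bin/bash
# Sketch: gather every regular file under a directory into the array "files".
# collect_files is a hypothetical helper, not part of the original script.
collect_files() {
    files=()
    local f
    # NUL-delimited output keeps names with spaces or newlines intact;
    # sorting makes the order, and so the "first" file, predictable.
    while IFS= read -r -d '' f; do
        files+=("$f")
    done < <(find "$1" -type f -print0 | sort -z)
}
```

After calling collect_files some/dir, the list could be passed on as ./delete-duplicates.sh "${files[@]}".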

The "first" file in the set is always treated as the definitive version. File times, permissions, and ownership are not taken into account. Only the content is considered.

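Once you are sure it does what you want, remove the echo from the rm -f "$file" line. Alternatively, you could change that line to ln -f "${filecksums[$cksum]}" "$file" to hard-link the content instead. That saves the same disk space without losing any file names.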
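The hard-link variant described above might look like the following. This is only a sketch of the same loop wrapped in a function: link_duplicates and seen are hypothetical names, and it assumes all files live on one filesystem, since hard links cannot cross filesystems.

```bash
#!/bin/bash
# Sketch: replace each duplicate with a hard link to the first file
# that had the same checksum. link_duplicates is a hypothetical helper.
link_duplicates() {
    local -A seen=()
    local file sum
    for file in "$@"; do
        # Regular files only, skipping symlinks
        [[ -f "$file" && ! -h "$file" ]] || continue
        sum=$(cksum <"$file" | tr ' ' _)
        if [[ -n "${seen[$sum]}" && "${seen[$sum]}" != "$file" ]]; then
            # Already the same inode (e.g. on a second run): nothing to do
            [[ "$file" -ef "${seen[$sum]}" ]] && continue
            ln -f -- "${seen[$sum]}" "$file"
        else
            seen[$sum]=$file
        fi
    done
}
```

Calling link_duplicates * in a directory keeps every name but leaves identical files sharing one inode.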

Question 3

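The main problem in your script seems to be that i takes the actual file names as values, while j is just a number. Putting the names in an array and using both i and j as indices should work.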

files=(*)
count=${#files[@]}
for (( i=0 ; i < count ;i++ )); do 
    for (( j=i+1 ; j < count ; j++ )); do
        if diff -q "${files[i]}" "${files[j]}"  >/dev/null ; then
            echo "${files[i]} and ${files[j]} are the same"
        fi
    done
done

(This seems to work in Bash and in ksh/ksh93 on Debian.)

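The assignment a=(this that) initializes the array a with the two elements this and that (indices 0 and 1). Word splitting and globbing work as usual, so files=(*) initializes files with the names of all files in the current directory (except dotfiles). "${files[@]}" expands to all elements of the array, and the hash sign asks for a length, so ${#files[@]} is the number of elements in the array. (Note that ${files} would be the first element of the array, and ${#files} is the length of that first element, not of the array!)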
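The difference between these expansions is easy to check directly. A small sketch using the a=(this that) array from above; the variable names count, first, and first_len are only for illustration.

```bash
#!/bin/bash
a=(this that)

count=${#a[@]}     # number of elements in the array
first=${a}         # same as ${a[0]}: the first element
first_len=${#a}    # length of the first element, not of the array

printf 'elements: %s\n' "$count"         # prints "elements: 2"
printf 'first: %s\n' "$first"            # prints "first: this"
printf 'first length: %s\n' "$first_len" # prints "first length: 4"
```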

for i in `/folder/*`

Surely the backticks here are a typo? With them, you would be running the first file as a command and passing the rest of the files to it as arguments.
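Without the backticks, the glob expands to the file names themselves. A minimal sketch of that form; list_entries is a hypothetical helper that takes the directory as an argument instead of hard-coding /folder.

```bash
#!/bin/bash
# Sketch: loop over the entries of a directory using a plain glob.
# list_entries is a hypothetical helper name.
list_entries() {
    local dir=$1 i
    for i in "$dir"/*; do
        # If nothing matches, the pattern is left as-is; skip it
        [[ -e "$i" ]] || continue
        printf '%s\n' "$i"
    done
}
```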

Question 4

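By the way, using checksums or hashes is a good idea. My script doesn't use them. However, if the files are small and there aren't many of them (say 10 to 20 files), this script runs quite fast. With 100 or more files of 1000 lines each, it takes more than 10 seconds.

Usage: ./duplicate_removing.sh files/*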

#!/bin/bash

for target_file in "$@"; do
    shift
    for candidate_file in "$@"; do
        # Use diff's exit status (0 means identical) so that errors are not mistaken for matches
        if diff -q -- "$target_file" "$candidate_file" >/dev/null 2>&1; then
            echo the "$target_file" is a copy "$candidate_file"
            echo rm -v "$candidate_file"
        fi
    done
done

Test

Generate random files: ./creating_random_files.sh

#!/bin/bash

file_amount=10
files_dir="files"

mkdir -p "$files_dir"

while ((file_amount)); do
    content=$(shuf -i 1-1000)
    echo "$RANDOM" "$content" | tee "${files_dir}/${file_amount}".txt{,.copied} > /dev/null
    ((file_amount--))
done

Run ./duplicate_removing.sh files/* and you get this output:

the files/10.txt is a copy files/10.txt.copied
rm -v files/10.txt.copied
the files/1.txt is a copy files/1.txt.copied
rm -v files/1.txt.copied
the files/2.txt is a copy files/2.txt.copied
rm -v files/2.txt.copied
the files/3.txt is a copy files/3.txt.copied
rm -v files/3.txt.copied
the files/4.txt is a copy files/4.txt.copied
rm -v files/4.txt.copied
the files/5.txt is a copy files/5.txt.copied
rm -v files/5.txt.copied
the files/6.txt is a copy files/6.txt.copied
rm -v files/6.txt.copied
the files/7.txt is a copy files/7.txt.copied
rm -v files/7.txt.copied
the files/8.txt is a copy files/8.txt.copied
rm -v files/8.txt.copied
the files/9.txt is a copy files/9.txt.copied
rm -v files/9.txt.copied
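The checksum idea mentioned at the start of this answer can be combined with a byte-for-byte check, so each file is read once for its checksum and compared only against a candidate with the same checksum. This is a sketch under stated assumptions, not a drop-in replacement: find_duplicates and first_seen are hypothetical names, and a file whose checksum collides with a different file is only ever compared with the first file seen for that checksum.

```bash
#!/bin/bash
# Sketch: bucket files by checksum, then confirm a match with cmp.
# find_duplicates is a hypothetical helper; it only reports, it removes nothing.
find_duplicates() {
    local -A first_seen=()
    local file sum
    for file in "$@"; do
        # Regular files only, skipping symlinks
        [[ -f "$file" && ! -h "$file" ]] || continue
        sum=$(cksum <"$file" | tr ' ' _)
        if [[ -z "${first_seen[$sum]}" ]]; then
            # First file with this checksum becomes the reference copy
            first_seen[$sum]=$file
        elif cmp -s -- "${first_seen[$sum]}" "$file"; then
            # Same checksum and identical bytes: a real duplicate
            printf '%s is a copy of %s\n' "$file" "${first_seen[$sum]}"
        fi
    done
}
```

It would be called the same way as the script above, for example find_duplicates files/*.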
