Finding and listing duplicate directories

Answer 1

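I have the same problem with my music collection.

Special characters, spaces, and symbols make this tricky. The strategy is to compute an MD5 sum over the sorted file names together with the parent directory name, so that the script can sort the hashes and find duplicate entries. The child file names have to be sorted because find does not guarantee that files in two different directories are listed in the same order.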

Bash script (Debian 10):

#!/bin/bash

# usage: ./find_duplicates tunes_dir
# output: c547c3bcf85b9c578a1a52dd20665343 - /mnt/tunes/soul brothers/Motherlode
# MD5 is generated from all sorted children filenames + album folder name
# sort list by MD5, then list entries sharing a hash (32 hex characters) as duplicate albums
# Album/CD1/... Album/CD2/... will show (3) results if Album is duplicated
# CD1/2 example is indistinguishable from Discography/Album/Song.mp3

if [ $# -eq 0 ]; then
    echo "Please supply tunes directory as first arg" >&2
    exit 1
fi

# Using absolute path of tunes_dir param
find "$(readlink -f -- "$1")" -type d | while IFS= read -r line
do
    # Skip directories that cannot be entered
    cd "$line" || continue
    children=$(find . -type f | sort)
    base=$(basename "$line")
    # Quote the expansions so spaces and glob characters are hashed literally
    sum=$(printf '%s\n' "$children" "$base" | md5sum)
    # md5sum prints "<hash>  -"; keep only the hash
    printf '%s - %s\n' "${sum%% *}" "$line"
done | sort | uniq -D -w 32

Directory structure:

user@pc:~/test# find . -type d
./super soul brothers
./super soul brothers/Stritch's Brew
./super soul brothers/Fireball!
./super soul brothers/Motherlode
./car_tunes
./car_tunes/Fireball!

Example output:

user@pc:~# ./find_duplicates  test/
07b0f79429663685f4005486af20247a - /root/test/car_tunes/Fireball!
07b0f79429663685f4005486af20247a - /root/test/super soul brothers/Fireball!
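
The signature step of the script can be isolated to see why two copies of an album collide. This is a minimal sketch, assuming GNU find and md5sum are available; the function name dir_signature and the demo directory names are hypothetical and not part of the original script. The signature depends only on the sorted relative file names plus the directory's base name, not on where the directory lives.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): print the MD5 signature
# of a directory, built from its sorted relative file names and its base name.
dir_signature() {
    local dir=$1 children base sum
    # cd happens inside the command substitution, so the caller's cwd is unchanged
    children=$(cd "$dir" && find . -type f | LC_ALL=C sort) || return 1
    base=$(basename "$dir")
    sum=$(printf '%s\n' "$children" "$base" | md5sum)
    # md5sum prints "<hash>  -"; keep only the hash
    printf '%s\n' "${sum%% *}"
}

# Demo on a throwaway tree: the same album stored under two parents
tmp=$(mktemp -d)
mkdir -p "$tmp/a/Stritch's Brew" "$tmp/b/Stritch's Brew" "$tmp/a/Motherlode"
touch "$tmp/a/Stritch's Brew/01 intro.mp3" \
      "$tmp/b/Stritch's Brew/01 intro.mp3" \
      "$tmp/a/Motherlode/01 intro.mp3"

dir_signature "$tmp/a/Stritch's Brew"
dir_signature "$tmp/b/Stritch's Brew"   # same hash as the line above
dir_signature "$tmp/a/Motherlode"       # different hash: the base name differs
rm -rf "$tmp"
```

Because only names are hashed, two directories with identical file names but different file contents still count as duplicates under this scheme.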

Answer 2

Use bash version 4 or later. On macOS the default bash is too old, but a newer one can be installed through the Homebrew package manager.

# Make glob patterns disappear rather than remain unexpanded
# if they don't match (nullglob).
# Make glob patterns also match hidden names (dotglob).
shopt -s nullglob dotglob

# Create an associative array that holds the number of times
# a directory's name has been seen (the basename of the directory's
# pathname is the key into this array).
declare -A count

# Set the positional parameters ($1, $2, etc.) to the pathnames
# of the directories that we're interested in.
set -- Top_Dir/*/*/

# Loop over our directory paths,
# and count how many times each basename occurs.
for dirpath do
        name=$( basename "$dirpath" )
        count["$name"]=$(( ${count["$name"]:-0} + 1 ))
done

# Loop over the directory paths again, but this time
# output each directory whose basename occurs more than once.
for dirpath do
        name=$( basename "$dirpath" )
        [[ ${count["$name"]} -gt 1 ]] && printf '%s\n' "$dirpath"
done
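
Since declare -A only exists in bash 4 and later, a guard at the top of the script can fail early with a clear message instead of a confusing error further down. The following is a minimal sketch using bash's built-in BASH_VERSINFO array; the function name bash_at_least is an assumption and not part of the original answer.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the running bash has at least
# the given major version number.
bash_at_least() {
    local required=$1
    (( ${BASH_VERSINFO[0]:-0} >= required ))
}

# Associative arrays (declare -A) need bash 4 or later
if ! bash_at_least 4; then
    echo "This script needs bash 4 or later (found ${BASH_VERSION:-unknown})" >&2
    exit 1
fi
```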

Testing:

$ tree -F
.
|-- Top_Dir/
|   |-- Level_1_Dir/
|   |   |-- standard_cat/
|   |   |-- standard_dog/
|   |   `-- standard_snake/
|   |-- Level_2_Dir/
|   |   |-- standard_cat/
|   |   |-- standard_moon/
|   |   `-- standard_sun/
|   `-- Level_3_Dir/
|       |-- standard_man/
|       |-- standard_moon/
|       `-- standard_woman/
`-- script

13 directories, 1 file

$ bash script
Top_Dir/Level_1_Dir/standard_cat/
Top_Dir/Level_2_Dir/standard_cat/
Top_Dir/Level_2_Dir/standard_moon/
Top_Dir/Level_3_Dir/standard_moon/

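To support older versions of bash, you can instead store the unique base names of the directories and the number of occurrences of each base name in two separate ordinary arrays. This requires a linear search through the arrays in each loop.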

shopt -s nullglob dotglob

set -- Top_Dir/*/*/

names=()
counts=()
for dirpath do
        name=$( basename "$dirpath" )

        found=false
        for i in "${!names[@]}"; do
                if [[ ${names[i]} == "$name" ]]; then
                        found=true
                        break
                fi
        done

        if "$found"; then
                counts[i]=$(( counts[i] + 1 ))
        else
                names+=( "$name" )
                counts+=( 1 )
        fi
done

for dirpath do
        name=$( basename "$dirpath" )

        for i in "${!names[@]}"; do
                if [[ ${names[i]} == "$name" ]]; then
                        [[ ${counts[i]} -gt 1 ]] && printf '%s\n' "$dirpath"
                        break
                fi
        done
done
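
Both scripts hard-code the Top_Dir/*/*/ pattern. As a rough sketch of how the counting approach could be reused for any top-level directory, the bash 4 variant can be wrapped in a function. The name list_duplicate_names is hypothetical; the body runs in a subshell so the shopt settings do not leak into the caller.

```shell
#!/bin/bash
# Hypothetical wrapper (not from the original answer): print every
# second-level directory under "$1" whose base name occurs more than once.
# The parentheses make the function body a subshell, so nullglob/dotglob
# and the variables below stay local to the call. Needs bash 4 or later.
list_duplicate_names() (
    shopt -s nullglob dotglob
    declare -A count=()
    top=${1%/}

    # First pass: count how often each base name occurs
    for dirpath in "$top"/*/*/; do
        name=$(basename "$dirpath")
        count["$name"]=$(( ${count["$name"]:-0} + 1 ))
    done

    # Second pass: print the paths whose base name was seen more than once
    for dirpath in "$top"/*/*/; do
        name=$(basename "$dirpath")
        if (( ${count["$name"]} > 1 )); then
            printf '%s\n' "$dirpath"
        fi
    done
)
```

Calling list_duplicate_names Top_Dir on the test tree above should print the same four paths as the original script.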

Answer 3

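This can be done on Ubuntu using bash. It matches only duplicated directories, regardless of how deep they are in the tree. The $() part builds a pipe-separated list of directory names from the duplicated entries found in the last column of ls -l. That pipe-separated list is then used as a grep pattern to filter the full directory listing. Other files are not taken into account, and no whole-word matching or similar is used.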

> ls -lR Top_Dir/ | grep -E "$(ls -lR Top_Dir/ | grep ^d | rev | cut -d" " -f1 | rev | sort | uniq -d | head -c -1 | tr '\n' '|')" | grep -v ^d | sed 's/://'

Top_Dir/Level_1_Dir/standard_cat
Top_Dir/Level_2_Dir/standard_cat
Top_Dir/Level_2_Dir/standard_moon
Top_Dir/Level_3_Dir/standard_moon
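
Parsing ls output breaks on directory names that contain spaces, because cut splits on them. As an alternative sketch, which is not the method used in this answer, a similar listing can be produced with find and awk by counting the last path component of every directory. The function name find_duplicate_dirs is hypothetical, and directory names containing newlines are still not handled.

```shell
#!/bin/bash
# Hypothetical alternative to the ls pipeline: print every directory below
# "$1", at any depth, whose base name occurs more than once in the tree.
find_duplicate_dirs() {
    local top=$1
    # -mindepth 1 excludes the top directory itself from the listing
    find "$top" -mindepth 1 -type d | LC_ALL=C sort |
        awk -F/ '
            # $NF is the last path component, i.e. the base name
            { count[$NF]++; path[NR] = $0; name[NR] = $NF }
            END {
                for (i = 1; i <= NR; i++)
                    if (count[name[i]] > 1)
                        print path[i]
            }'
}
```

Run as find_duplicate_dirs Top_Dir/, this prints whole paths, so names with spaces survive intact. Unlike the grep pattern, it compares complete base names rather than substrings.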
