Remove the first duplicate line based on the first column

Answer 1

If there are no lines with a unique first column, try this awk command:

awk -F, 'pre==$1 { print; next }{ pre=$1 }' infile
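
As a quick check of the one-liner above, here is a minimal sketch on made-up data in which every first-column value occurs at least twice. The temporary file and its contents are assumptions for illustration only.

```shell
# Hypothetical sample: sorted on column 1, every key occurs at least twice.
pairs=$(mktemp)
printf '%s\n' 2,b 2,c 3,d 3,e 3,f > "$pairs"

# Print a line only when its first field equals that of the previous line,
# which drops the first line of every group.
out=$(awk -F, 'pre==$1 { print; next } { pre=$1 }' "$pairs")
printf '%s\n' "$out"
rm -f "$pairs"
```

A line whose first field occurs only once never matches pre==$1, so this form would drop it. That is why it only suits input without unique first columns.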

Or, for the general case, change it to:

awk -F, 'pre==$1 { print; is_uniq=0; next }
                 # the current and previous lines share the same first column:
                 # print the current line and clear is_uniq, since a duplicate was found

         is_uniq { print temp }
                 # the previous line (saved in temp) turned out to be unique
                 # on its first column, so print it

                 { pre=$1; temp=$0; is_uniq=1 }
                 # save the first column in pre and the whole line in temp,
                 # and set is_uniq=1 on the assumption that this line may be unique

END{ if(is_uniq) print temp }' infile
    # if the last line of the input is unique, print it too

The same script without comments:

awk -F, 'pre==$1 { print; is_uniq=0; next }
         is_uniq { print temp }
                 { pre=$1; temp=$0; is_uniq=1 }
END{ if(is_uniq) print temp }' infile
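
To see what the script does, the sketch below runs it on the nine-line sample file shown in the Perl answer on this page. The temporary file is an assumption for illustration; the awk program is the one above, unchanged.

```shell
# Sample data, already sorted on the first field.
sample=$(mktemp)
printf '%s\n' 1,a 2,b 2,c 3,d 3,e 3,f 4,g 4,h 5,i > "$sample"

result=$(awk -F, 'pre==$1 { print; is_uniq=0; next }
         is_uniq { print temp }
                 { pre=$1; temp=$0; is_uniq=1 }
END{ if(is_uniq) print temp }' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

The lines 2,b, 3,d and 4,g (the first line of each duplicated group) are dropped, while the unique lines 1,a and 5,i are kept.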

Note: this assumes that the input file infile is sorted on the first field. If it is not, pass a sorted copy of the file to awk:

awk ... <(sort -t, -k1,1 infile)
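
A minimal sketch of the sorted variant, using a pipe instead of process substitution so that it also runs in a plain POSIX shell. The sample data is made up. The -s (stable) option is an addition that is not in the command above and is not required by POSIX, though GNU and BSD sort accept it. Without it, sort may break ties using the rest of the line, which can change which line counts as the "first" duplicate.

```shell
# Hypothetical unsorted input; 2,z appears before 2,b.
unsorted=$(mktemp)
printf '%s\n' 3,d 1,a 2,z 3,e 2,b > "$unsorted"

# -s keeps lines with equal keys in their input order (assumed to be available).
sorted_out=$(LC_ALL=C sort -s -t, -k1,1 "$unsorted" |
  awk -F, 'pre==$1 { print; is_uniq=0; next }
           is_uniq { print temp }
                   { pre=$1; temp=$0; is_uniq=1 }
           END { if (is_uniq) print temp }')
printf '%s\n' "$sorted_out"
rm -f "$unsorted"
```

The output comes back in sorted order rather than in the original line order.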

Answer 2

Assuming the CSV is well formed (no commas or newlines inside quoted fields, no doubled double quotes "", and so on), this works:

awk -F ',' 'NR==FNR{seen1[$1]++;next};
            seen1[$1]==1||seen2[$1]++ {print(NR,$0)}' infile infile

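The only way to know which lines are repeated, wherever they sit in the file, is to count how often each first field occurs. The first scan does that with seen1. On the second scan, a line is printed if its count is 1 (no duplicates) or if its first field has already been seen during that scan (tracked with seen2), so the first line of each duplicated group is skipped. Note that print(NR,$0) also prefixes each line with NR, the cumulative record number across both scans; use a plain print to output only the lines.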
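
The two-scan idea can be tried on unsorted input with the sketch below. The sample data is made up, and the action is reduced to a plain print so that only the lines are shown. The pattern and its action are kept on one line, because awk treats a pattern followed by a newline as a complete rule of its own.

```shell
# Hypothetical unsorted input.
mixed=$(mktemp)
printf '%s\n' 3,d 1,a 2,b 3,e 2,c 3,f > "$mixed"

# Scan 1 counts each first field; scan 2 prints unique lines and every
# line of a duplicated group except the first one encountered.
kept=$(awk -F, 'NR==FNR { seen1[$1]++; next }
                seen1[$1]==1 || seen2[$1]++ { print }' "$mixed" "$mixed")
printf '%s\n' "$kept"
rm -f "$mixed"
```

Here 3,d and 2,b are removed and the remaining lines keep their original order.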

If the file is sorted on the first field, use @devWeek's solution.

Answer 3

$ cat file
1,a
2,b
2,c
3,d
3,e
3,f
4,g
4,h
5,i

The aim is to remove the lines "2,b", "3,d" and "4,g".

perl -F, -anE '
    push $lines{$F[0]}->@*, $_ 
  } END { 
    for $key (sort keys %lines) {
        shift $lines{$key}->@* if (scalar($lines{$key}->@*) > 1); # remove the first
        print join "", $lines{$key}->@*;
    }
' file

1,a
2,c
3,e
3,f
4,h
5,i
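
For comparison, here is a loose awk analogue of the same group-then-trim idea. It is a sketch, not a translation of the Perl code. The array names count, line and order are made up for illustration, and unlike the Perl version it emits groups in first-seen order instead of sorting the keys.

```shell
sample=$(mktemp)
printf '%s\n' 1,a 2,b 2,c 3,d 3,e 3,f 4,g 4,h 5,i > "$sample"

grouped=$(awk -F, '
  !($1 in count) { order[++n] = $1 }          # record each key the first time it appears
  { count[$1]++; line[$1, count[$1]] = $0 }   # store every line under its key
  END {
    for (i = 1; i <= n; i++) {
      k = order[i]
      first = (count[k] > 1) ? 2 : 1          # skip line 1 of any group with duplicates
      for (j = first; j <= count[k]; j++) print line[k, j]
    }
  }' "$sample")
printf '%s\n' "$grouped"
rm -f "$sample"
```

Like the Perl script, this holds the whole input in memory, so it needs no sorted input but is less suited to very large files.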
