Find the lines in one file that are new compared to another file [duplicate]

Answer 1

start cmd:> awk 'FNR == NR { oldfile[$0]=1; }; 
  FNR != NR { if(oldfile[$0]==0) print; }' file1 file2
delta
omega
rho
phi

Answer 2

I would use grep:

grep -Fxvf oldfile newfile

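-F: treat the patterns as fixed strings (no regex metacharacters).

-x: match whole lines only (not substrings).

-f oldfile: read the patterns to match from oldfile.

-v: invert the match, i.e. print the lines that are not found in oldfile.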
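
To make the combined effect of these four options concrete, here is a minimal Python sketch of the same filter. The function name grep_fxv and the sample lines are made up for illustration; it only imitates the behaviour described above and says nothing about how grep works internally.

```python
def grep_fxv(old_lines, new_lines):
    """Return the lines of new_lines that do not occur in old_lines.

    Hypothetical helper imitating `grep -Fxvf oldfile newfile`.
    """
    # -f oldfile: every line of the old file is one pattern.
    patterns = set(old_lines)
    # -F and -x: plain string equality on the whole line (no regex, no substrings).
    # -v: keep only the lines that match none of the patterns.
    return [line for line in new_lines if line not in patterns]


# Illustrative data, not taken from the question.
old = ["alpha", "a.c"]
new = ["alpha", "abc", "a.c", "xa.cx", "delta"]
print(grep_fxv(old, new))  # prints ['abc', 'xa.cx', 'delta']
```

Here "abc" survives because the dot in "a.c" is taken literally (-F), and "xa.cx" survives because a pattern must equal the whole line (-x).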

Answer 3

A shorter awk command:

awk 'NR==FNR{a[$0];next}!($0 in a)' file1 file2

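If file1 can be empty, replace NR==FNR with FILENAME==ARGV[1]. With an empty file1, NR==FNR stays true for every line of file2, so nothing would be printed.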

grep -Fxvf file2 file1 is very slow for large files:

$ jot -r 10000 1 100000 >file1;jot -r 10000 1 100000 >file2
$ time awk 'NR==FNR{a[$0];next}!($0 in a)' file1 file2 >/dev/null
0.015
$ time grep -Fxvf file2 file1 >/dev/null
36.758
$ time comm -13 <(sort file1) <(sort file2)>/dev/null
0.173

If you also need to remove duplicate lines, use:

awk 'NR==FNR{a[$0];next}!b[$0]++&&!($0 in a)' file1 file2

or

comm -13 <(sort file1) <(sort -u file2)
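
The logic of the deduplicating awk command can be read as the following Python sketch. The function name new_unique_lines and the sample data are illustrative assumptions, not part of the original answer.

```python
def new_unique_lines(old_lines, new_lines):
    """Yield each line of new_lines that is absent from old_lines, once, in order."""
    old = set(old_lines)  # plays the role of awk's array a
    seen = set()          # plays the role of awk's array b
    for line in new_lines:
        if line in seen:
            continue      # a repeated line is never printed again
        seen.add(line)
        if line not in old:
            yield line


# Illustrative data, not taken from the question.
old = ["alpha", "beta"]
new = ["delta", "alpha", "delta", "omega", "omega"]
print(list(new_unique_lines(old, new)))  # prints ['delta', 'omega']
```

Unlike the comm variant, this keeps the lines in their original order and needs no sorting.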

Answer 4

If you want a Python way to do this:

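#!/usr/bin/env python3

# Read each file into a set of its lines (surrounding whitespace stripped).
# The with statement closes each file once it has been read.
with open('/tmp/tmp.Q3JiYGY6fs/oldfile') as oldfp:
    old = {x.strip() for x in oldfp}
with open('/tmp/tmp.Q3JiYGY6fs/newfile') as newfp:
    new = {x.strip() for x in newfp}

# Set difference: the lines that occur in newfile but not in oldfile.
print('Lines that are present only in newfile are \n{}\n\n{} '.format(42*'-', '\n'.join(new - old)))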

The output is as follows (sets are unordered, so the order of the lines may vary):

Lines that are present only in newfile are 
------------------------------------------

phi
rho
omega
delta
