Filter a .CSV file based on the values in column 5 and print those records to a new file

Question 1

awk -F '","'  'BEGIN {OFS=","} { if (toupper($5) == "STRING 1")  print }' file1.csv > file2.csv

Output

"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""

I think this is what you want.
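
As a quick check of why comparing $5 works with this separator, the sketch below builds a small file in the same fully quoted layout and prints the first, fifth, and last fields. The file names sample.csv and filtered.csv and both rows are invented for illustration, and it assumes every field is quoted and contains no "," sequence or embedded newline.

```shell
# Two made-up rows in the same fully quoted style as file1.csv.
cat > sample.csv <<'EOF'
"1","a","x , y","b","string 1","USD"
"2","c","z","d","string 2","EUR"
EOF

# The separator "," swallows the quotes between fields, so $5 is bare,
# while $1 keeps its opening quote and $NF keeps its closing quote.
awk -F '","' '{ printf "%s|%s|%s\n", $1, $5, $NF }' sample.csv
# prints:
# "1|string 1|USD"
# "2|string 2|EUR"

# Same filter as above, written as a pattern with the default print action.
awk -F '","' 'toupper($5) == "STRING 1"' sample.csv > filtered.csv
cat filtered.csv
# prints:
# "1","a","x , y","b","string 1","USD"
```

Because only the outer fields keep a quote, a test on column 1 or on the last column would need to allow for that stray quote.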

Question 2

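The problem with CSV is that there is no single, universally followed standard. If you need to process CSV-formatted data often, you may want a more robust method than simply using "," as the field separator. In this case, Perl's Text::CSV modules from CPAN are well suited to the task.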

$ perl -mText::CSV_XS -WlanE '
    BEGIN {our $csv = Text::CSV_XS->new;} 
    $csv->parse($_); 
    my @fields = $csv->fields(); 
    print if $fields[4] =~ /string 1/i;
' file1.csv
"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""

Question 3

csvgrep from csvkit

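The most robust way to do this with awk appears to be FPAT (a GNU awk feature), as described at: https://stackoverflow.com/questions/45420535/whats-the-most-robust-way-to-efficiently-parse-csv-using-awk/45420607#45420607 Unfortunately, even FPAT cannot handle literal newline characters inside quotes.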
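
A minimal sketch of the FPAT approach, using the field pattern given in the GNU awk manual. The file names fpat_sample.csv and fpat_filtered.csv and the sample rows are made up here. FPAT describes what a field looks like instead of what separates fields, so a quoted field may contain commas. Each field keeps its quotes, which is why the comparison includes them.

```shell
# Two made-up rows; the third field of the first row contains a comma.
cat > fpat_sample.csv <<'EOF'
"1","a","x , y","b","string 1","USD"
"2","c","z","d","string 2","EUR"
EOF

# FPAT is a GNU awk extension, so check for gawk before using it.
if command -v gawk >/dev/null 2>&1; then
    # A field is either a run of non-comma characters
    # or a double-quoted string that may contain commas.
    gawk 'BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")" }
          toupper($5) == "\"STRING 1\""' fpat_sample.csv > fpat_filtered.csv
    cat fpat_filtered.csv
else
    echo "gawk not found: FPAT needs GNU awk" >&2
fi
```

With gawk installed, this should write only the first row to fpat_filtered.csv. Records are still read line by line, so a quoted field that spans several lines is not handled.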

Alternatively, if you want something smarter, there are various CSV CLI tools. One that is very easy to install via pip (although, being Python-based, not necessarily the fastest) is csvgrep from csvkit.

pip install csvkit

You can then get the matching rows with:

csvgrep -H -c5 -r '^string 1$' mytest.csv

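Explanation of the options:

-H: the first row is not a header row
-i: invert the match (used in a later example)
-c5: operate on the 5th column
-r: match the following regular expression

Concrete example: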

printf '00,01,02,03,string 1,"04,\n""05"\n10,11,12,13,string 2,"14,\n""15"\n' > nohead.csv
printf 'col1,col2,col3,col4,col5,col6\n00,01,02,03,string 1,"04,\n""05"\n10,11,12,13,string 2,"14,\n""15"\n' > head.csv

Then:

csvgrep -H -c5 -r '^string 1$' nohead.csv | tail -n+2

Output:

00,01,02,03,string 1,"04,
""05"

We pipe to tail because -H adds an annoying dummy header:

a,b,c,d,e,f
00,01,02,03,string 1,"04,
""05"

We can invert the match with -i:

csvgrep -H -i -c5 -r '^string 1$' nohead.csv | tail -n+2

Output:

10,11,12,13,string 2,"14,
""15"

If the file has a header, you can use column names:

csvgrep -c col5 -r '^string 1$' head.csv

Output:

col1,col2,col3,col4,col5,col6
00,01,02,03,string 1,"04,
""05"

Tested with csvkit 1.0.7 on Ubuntu 23.04.
