How to capture only the "csv" lines that have five values

I want to capture only the "csv" lines that have exactly five values, according to this pattern:

"","","","",""

Example:

more conf.csv

"linux02","cluster26","api2-thrift-apiconf","api.driver.memory",
"linux02","cluster26","api2-thrift-apiconf","api.executor.cores"
"linux02","cluster26","api.executor.instances","2"

"linux02","cluster26","api2-thrift-apiconf","api.driver.memory","2"
"linux02","cluster26","api2-thrift-apiconf","api.executor.cores","2"
"linux02","cluster26","api2-thrift-apiconf","api.executor.instances","2"
"linux02","cluster26","api2-thrift-apiconf","api.executor.memory","2"
"linux02","cluster26","api2-thrift-apiconf","api.sql.shuffle.partitions","141"
"linux02","cluster26","api2-thrift-apiconf","api.dynamicAllocation.enabled","true"

"linux02","cluster26","api2-thrift-apiconf","api.driver.memory","api2-thrift-apiconf","api.executor.memory"
"linux02","cluster26","api2-thrift-apiconf","api.executor.cores"
"linux02","cluster26","api.executor.instances","2"

Expected output:

"linux02","cluster26","api2-thrift-apiconf","api.driver.memory","2"
"linux02","cluster26","api2-thrift-apiconf","api.executor.cores","2"
"linux02","cluster26","api2-thrift-apiconf","api.executor.instances","2"
"linux02","cluster26","api2-thrift-apiconf","api.executor.memory","2"
"linux02","cluster26","api2-thrift-apiconf","api.sql.shuffle.partitions","141"
"linux02","cluster26","api2-thrift-apiconf","api.dynamicAllocation.enabled","true"

Answer 1

Use:

awk -F "," 'NF==5 {print $0}' conf.csv

This prints the lines that contain five fields. However, for a line like this:

"linux02","cluster26","api2-thrift-apiconf","api.driver.memory",

the trailing comma fools awk into counting an empty fifth field, so that line is printed as well.
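One way to avoid the trailing-comma problem is to also require that the fifth field is not empty. The following is a minimal sketch, not a full CSV parser: it assumes that no quoted field contains a comma, and the temporary sample file is only an illustration built from three of the question's lines.

```shell
# Hypothetical sample file with three lines taken from the question.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
"linux02","cluster26","api2-thrift-apiconf","api.driver.memory",
"linux02","cluster26","api2-thrift-apiconf","api.driver.memory","2"
"linux02","cluster26","api.executor.instances","2"
EOF

# Keep lines with exactly five comma-separated fields
# whose fifth field is not the empty string.
result=$(awk -F',' 'NF == 5 && $5 != ""' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

With this sample, only the second line should be printed. A quoted empty field ("") would still pass this test, because awk sees the two quote characters as the field's content.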

Answer 2

To handle CSV correctly, use a CSV parser:

ruby -rcsv -e '
  # Print each row that has exactly five fields, none of them nil
  # (an unquoted empty field is parsed as nil).
  CSV.foreach(ARGV.shift) do |row|
    if row.size == 5 && row.none?(&:nil?)
      puts CSV.generate_line(row, force_quotes: true)
    end
  end
' conf.csv

Answer 3

grep -E '(".+",){4}".+"' Csv.file
"linux02","cluster26","api2-thrift-apiconf","api.driver.memory","2"
"linux02","cluster26","api2-thrift-apiconf","api.executor.cores","2"
"linux02","cluster26","api2-thrift-apiconf","api.executor.instances","2"
"linux02","cluster26","api2-thrift-apiconf","api.executor.memory","2"
"linux02","cluster26","api2-thrift-apiconf","api.sql.shuffle.partitions","141"
"linux02","cluster26","api2-thrift-apiconf","api.dynamicAllocation.enabled","true"
"linux02","cluster26","api2-thrift-apiconf","api.driver.memory","api2-thrift-apiconf","api.executor.memory"

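The -E option enables extended regular expressions; the pattern matches ".+", four times, followed by one ".+". The pattern is not anchored, so the six-field line matches too, as the last line of the output shows. In future, please also include what you have tried.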

Note: .+ matches a non-empty string. If you also want lines whose five fields may be empty, replace + with *:

grep -E '(".*",){4}".*"' Csv.file
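Because grep matches anywhere in a line and . also matches quote characters, the patterns above accept lines with more than five fields. A possible refinement, sketched here under the assumption that fields never contain embedded double quotes, is to match the whole line with -x and use [^"] inside each field. The temporary sample file is hypothetical and reuses three lines from the question.

```shell
# Hypothetical sample file: one 4-field line with a trailing comma,
# one 5-field line, and one 6-field line.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
"linux02","cluster26","api2-thrift-apiconf","api.driver.memory",
"linux02","cluster26","api2-thrift-apiconf","api.driver.memory","2"
"linux02","cluster26","api2-thrift-apiconf","api.driver.memory","api2-thrift-apiconf","api.executor.memory"
EOF

# -x makes the pattern match the entire line;
# [^"]+ is a non-empty run of characters that cannot cross a closing quote.
result=$(grep -Ex '("[^"]+",){4}"[^"]+"' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

With this sample, only the five-field line should be printed. Replacing each + with * would also accept quoted empty fields.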

Answer 4

Using Miller (mlr) to output all records with exactly five fields from a header-less, ragged (varying number of fields per record) CSV file:

$ mlr --csv -N --ragged filter 'NF == 5' file
linux02,cluster26,api2-thrift-apiconf,api.driver.memory,
linux02,cluster26,api2-thrift-apiconf,api.driver.memory,2
linux02,cluster26,api2-thrift-apiconf,api.executor.cores,2
linux02,cluster26,api2-thrift-apiconf,api.executor.instances,2
linux02,cluster26,api2-thrift-apiconf,api.executor.memory,2
linux02,cluster26,api2-thrift-apiconf,api.sql.shuffle.partitions,141
linux02,cluster26,api2-thrift-apiconf,api.dynamicAllocation.enabled,true

Because the given input contains a record with an empty fifth field, we get one more record than in the expected output.

To exclude records whose fifth field is empty, and to quote all fields in the output:

$ mlr --csv -N --ragged --quote-all filter 'NF == 5 && !is_empty($5)' file
"linux02","cluster26","api2-thrift-apiconf","api.driver.memory","2"
"linux02","cluster26","api2-thrift-apiconf","api.executor.cores","2"
"linux02","cluster26","api2-thrift-apiconf","api.executor.instances","2"
"linux02","cluster26","api2-thrift-apiconf","api.executor.memory","2"
"linux02","cluster26","api2-thrift-apiconf","api.sql.shuffle.partitions","141"
"linux02","cluster26","api2-thrift-apiconf","api.dynamicAllocation.enabled","true"
