別の列値から始めてCSVファイルの列値を読みますか?

別の列値から始めてCSVファイルの列値を読みますか?

OECD統計からいくつかのデータを抽出しようとしています。これは以下のようにCSVファイルとして提供されます(抜粋)。

"COUNTRY","Country","DAGEGR","Age groups","DSEX","Gender","DSTATUS","Status of population","YEAR","Year","Value","Flag Codes","Flags"
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2002","2002",19640979,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2003","2003",19872646,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2004","2004",20091504,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2005","2005",20339759,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2006","2006",20605488,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2007","2007",21015042,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2008","2008",21431781,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2009","2009",21874920,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2010","2010",22342398,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2011","2011",22620554,,
"AUS","Australia","900000","All ages","90","Total males+females","90","All population","2012","2012",22683573,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2002","2002",444050,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2003","2003",448300,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2004","2004",451600,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2005","2005",455000,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2006","2006",469086,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2007","2007",476187,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2008","2008",483799,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2009","2009",493500,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2010","2010",502066,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2011","2011",511840,,
"LUX","Luxembourg","900000","All ages","90","Total males+females","90","All population","2012","2012",524853,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2002","2002",40409330,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2003","2003",41550584,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2004","2004",42345342,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2005","2005",43038035,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2006","2006",43758250,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2007","2007",44474631,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2008","2008",45283259,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2009","2009",45828172,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2010","2010",45989016,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2011","2011",46152926,,
"ESP","Spain","900000","All ages","90","Total males+females","90","All population","2012","2012",46818221,,

私が望むのは、2007年にスペイン(ESP)にどれだけの住民がいたかを自動的に読み取ることです。これを達成するには、次のコマンドを使用します。

cat RPOP_16012023145346836.csv | cut -d "," -f 1,9,11 | sort

このコマンドは、国、年、居住者の数を返します。例:

"AUS","2002",19640979
"AUS","2003",19872646
"AUS","2004",20091504
"AUS","2005",20339759
"AUS","2006",20605488
"AUS","2007",21015042
"AUS","2008",21431781
"AUS","2009",21874920
"AUS","2010",22342398
"AUS","2011",22620554
"AUS","2012",22683573
"AUT","2002",8139310
"AUT","2003",8067289
"AUT","2004",8140122
"AUT","2005",8206524
"AUT","2006",8265925
"AUT","2007",8298923
"AUT","2008",8331930
"AUT","2009",8355260
"AUT","2010",8375290
"AUT","2011",8404252
"AUT","2012",8443018
"ESP","2002",40409330
"ESP","2003",41550584
"ESP","2004",42345342
"ESP","2005",43038035
"ESP","2006",43758250
"ESP","2007",44474631
"ESP","2008",45283259
"ESP","2009",45828172
"ESP","2010",45989016
"ESP","2011",46152926
"ESP","2012",46818221
"LUX","2002",444050
"LUX","2003",448300
"LUX","2004",451600
"LUX","2005",455000
"LUX","2006",469086
"LUX","2007",476187
"LUX","2008",483799
"LUX","2009",493500
"LUX","2010",502066
"LUX","2011",511840
"LUX","2012",524853

パイプを使用して、この結果を国(列1 = ESP)と年(列2 = 2007)に基づいて3番目の列(列3 =人口)を返すステートメントに渡したいと思います。残念ながら、これを行うための適切なコマンドはありません。誰かが私を助けることができますか?

予想される出力は次のとおりです。44474631

答え1

Miller()を使用して最初に2つの名前付きフィールドを使用し、データをmlrフィルタリングして2007年にスペインに関連するレコードのみを見つけ、そのレコードからフィールドを削除します。出力はタイトルなしで表示されます。COUNTRYYEARValue

mlr --csv --headerless-csv-output \
    filter '$COUNTRY == "ESP" && $YEAR == 2007' then \
    cut -f Value \
    RPOP_16012023145346836.csv

質問のデータが与えられると、以下が出力されます。

44474631

フィルタ式の代わりに

$COUNTRY == "ESP" && $YEAR == 2007

...あなたが利用できる

$Country == "Spain" && $Year == 2007

...名前が指定されたフィールドも存在するためです。

答え2

実際に単に数値を返すことが目的であれば、元の44474631入力ファイルですべての操作を使用できます。awk

awk -F, '$1=="\"ESP\"" && $9=="\"2007\"" {print $11}' RPOP_16012023145346836.csv

これはフィールド区切り文字を設定し、,列11の内容を印刷します。ただし、列1が同じ"ESP"(引用符を含む)列9が同じ場合"2007"(引用符を含む)にのみ適用されます。

答え3

awkを使用してこれを行うことができます。

$ sed 's/"//g' input_file | awk -F, '($1=="ESP" && $9==2007){print $1,$9,$11}'
ESP 2007 44474631

関連情報