Replacing double quotes using awk

Question 1

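You don't really say much about the source of the data or the expected format. If the exercise can be reframed as replacing "( with chr(34)( and )" with )chr(34), or as replacing "(tst)" with chr(34)(tst)chr(34), either of the following two sed commands will do: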

$ sed -e 's/"(/chr(34)(/' -e 's/)"/)chr(34)/' file
"this is txt1","this is txt2",3,"this txt3","txt4 chr(34)(tst)chr(34)"

$ sed 's/"\((tst)\)"/chr(34)\1chr(34)/' file
"this is txt1","this is txt2",3,"this txt3","txt4 chr(34)(tst)chr(34)"

Note that the text cannot be parsed as a CSV record, because the last field is malformed. The correctly quoted version of that field would be "txt4 ""(tst)""".
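
As a rough illustration of that quoting rule (a sketch, not part of the original answer): a CSV field is quoted by doubling every embedded double quote and then wrapping the whole value in double quotes. The helper name csv_quote is made up for this example, and it assumes the value fits on a single line.

```shell
#!/bin/sh
# csv_quote is a hypothetical helper (not from the original post).
# It doubles each embedded double quote, then wraps the value in quotes.
# Assumption: the value contains no newline.
csv_quote() {
    printf '%s\n' "$1" | sed -e 's/"/""/g' -e 's/^/"/' -e 's/$/"/'
}

csv_quote 'txt4 "(tst)"'
# prints: "txt4 ""(tst)"""
```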

Question 2

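Here we can see that the valid CSV field quotes are at the start of the line, at the end of the line, or next to a comma. So look at each quote together with the character on either side of it. If neither of those is a comma, the quote gets doubled.

This is not absolutely true: valid CSV can have commas inside the quotes (for example, "one field","here"), so a quote next to a comma is not necessarily a field delimiter. It does hold for your data, though.

Test: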

Paul--) ./awkFixCsv

"this is txt1","this is txt2",3,"this txt3","txt4 "(tst)"" <<< Input
"this is txt1","this is txt2",3,"this txt3","txt4 ""(tst)""" <<< Output

"this is txt1","this is txt2",3,"this txt3","txt4 "(tst)"",""","""","done" <<< Input
"this is txt1","this is txt2",3,"this txt3","txt4 ""(tst)""","""","""""","done" <<< Output

One,Two,"3","Four","Five "and" Six",Seven and Eight,"Nine" <<< Input
One,Two,"3","Four","Five ""and"" Six",Seven and Eight,"Nine" <<< Output
Paul--)

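Here is the code, with the test data in a here-document and the logic turned into a function. If you are not sure how to integrate this into your script, leave a comment.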

#! /bin/bash

AWK='

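# Double every quote that has no comma on either side.
# The names after Local are scratch variables (awk has no local declarations).
function Fix (s, Local, t, u, x) {
    # Find the next quote (octal \042) with one character on each side.
    while (match (s, ".\042.")) {
        u = substr (s, RSTART, RLENGTH);
        # x is 0 for a delimiter quote (comma before or after it), else 1.
        x = (u ~ /..,/ || u ~ /,../) ? 0 : 1;
        # Copy the text before the quote; with x = 1 the quote is copied too,
        # and because s restarts at the quote it ends up doubled.
        t = t substr (s, 1, RSTART + x);
        s = substr (s, RSTART + 1);
    }
    return (t s);
}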

{ print "\n" $0 " <<< Input"; }
{ $0 = Fix( $0); }
{ print $0 " <<< Output"; }
'
    awk "${AWK}" <<[][]
"this is txt1","this is txt2",3,"this txt3","txt4 "(tst)""
"this is txt1","this is txt2",3,"this txt3","txt4 "(tst)"",""","""","done"
One,Two,"3","Four","Five "and" Six",Seven and Eight,"Nine"
[][]
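
One way the function might be folded into a larger script is as a plain filter, without the Input/Output labels used for the test run. The following is only a sketch: the wrapper name fix_csv and the file names in the comments are placeholders, and the awk function is the same Fix as in the script above.

```shell
#!/bin/bash
# fix_csv is a hypothetical wrapper name. It reads CSV lines from the files
# given as arguments (or from standard input when there are none) and writes
# the repaired lines to standard output.
fix_csv() {
    awk '
    function Fix (s, Local, t, u, x) {
        while (match (s, ".\042.")) {
            u = substr (s, RSTART, RLENGTH);
            x = (u ~ /..,/ || u ~ /,../) ? 0 : 1;
            t = t substr (s, 1, RSTART + x);
            s = substr (s, RSTART + 1);
        }
        return (t s);
    }
    { print Fix($0); }
    ' "$@"
}

# Used as a filter in a pipeline:
printf '%s\n' 'One,"Five "and" Six",Seven' | fix_csv
# prints: One,"Five ""and"" Six",Seven

# Used on a file (placeholder names): fix_csv input.csv > fixed.csv
```

The same caveat applies as for the original script: a quote that sits next to a comma inside a field is taken for a delimiter and left alone.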

Answer