Reformatting a text file into CSV format

Sample input:

0bef-82-46-8a-9a0b.xml "Fruits/Mango Apple /Plum cherry date">1446815.ABC
0bef-82-46-8a-9a0b 5da-0-ba-c1-1a9 "Fruits/Pear Banana/Plum orange mango"
0bef-82-46-8a-9a0b ac-94-4ab-91-23 "Fruits/Pear Banana/Plum orange mango"
0bef-82-46-8a-9a0b 5z-94-ab-92-2f3 "Fruits/Pear Banana/Plum orange mango"

952f-82-46-8a-9a0b.xml "Fruits/Mango"1244115.ABC
3cff-82-46-8a-9a0b.xml "Fruits/Big Mango/Not Sweet ">905499.ABC
6m0k-82-46-8a-9a0b.xml "Fruits/Big Pear/Very Sweet">855499.ABC

17a-42-df-c24.xml "Fruits Market/Big Apple/Sweet "1483415.ABC
17a-42-df-c24 54-ba-4411-9-3d8 "Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24 2da5-0-4a-b1-e89 "Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24 b7-94-4db-92-2f3 "Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24 4d-67c-446-b5-ac "Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24 2-8b-4det-87-769 "Veg/Radish /Radish Carrot Celery Onion"

Expected output:

0bef-82-46-8a-9a0b.xml,"Fruits/Mango Apple /Plum cherry date",0bef-82-46-8a-9a0b,5da-0-ba-c1-1a9,"Fruits/Pear Banana/Plum orange mango"
0bef-82-46-8a-9a0b.xml,"Fruits/Mango Apple /Plum cherry date",0bef-82-46-8a-9a0b,ac-94-4ab-91-23,"Fruits/Pear Banana/Plum orange mango"
0bef-82-46-8a-9a0b.xml,"Fruits/Mango Apple /Plum cherry date",0bef-82-46-8a-9a0b,5z-94-ab-92-2f3,"Fruits/Pear Banana/Plum orange mango"

952f-82-46-8a-9a0b.xml,"Fruits/Mango",,
3cff-82-46-8a-9a0b.xml,"Fruits/Big Mango/Not Sweet ",,
6m0k-82-46-8a-9a0b.xml,"Fruits/Big Pear/Very Sweet",,


17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,54-ba-4411-9-3d8,"Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,2da5-0-4a-b1-e89,"Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,b7-94-4db-92-2f3,"Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,4d-67c-446-b5-ac,"Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,2-8b-4det-87-769,"Veg/Radish /Radish Carrot Celery Onion"

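Notes on the raw input:

  1. No line has leading or trailing spaces.
  2. There are no blank lines between the lines. The blank lines shown above are only there to make the sample easier to read. The final output does not need blank lines either.
  3. Several lines are missing the ">" character. This is not a typo.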

How can I reformat this with a bash/shell script (sed, awk, etc.)? I am lost.

Answer 1

Using awk:

awk '{
  if (sub(/\.xml /, ".xml,")){      # replace `.xml ` with `.xml,`
    if (NR>1 && is_processed != 1){ # xml line was not printed?
       print xml","                 # print previous xml line + `,`
    }
    sub(/>?[0-9]+\.ABC$/, ",") # replace strings `>1446815.ABC` or `1244115.ABC` with `,`
    xml=$0                     # save line in variable `xml`
    is_processed=0             # clear flag
  }
  else {
    if (!NF) next  # skip empty line
    sub(/ /, ",")  # replace 1st ` ` with `,`
    sub(/ /, ",")  # replace 2nd ` ` with `,`
    print xml$0    # print xml line + current line
    is_processed=1 # set flag
  }
}
END {
  # print possible remaining line
  if (is_processed != 1) print xml","
}' filein > fileout

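The if-block handles the lines that contain ".xml " and saves each one in the variable xml. The else-block handles the "child" lines that follow an xml line: it prints the saved xml line followed by the current line, with the first two spaces of the current line replaced by commas. Empty lines are skipped.

If an xml line has no "children", it is printed with an extra trailing comma, either in the if-block when the next xml line arrives (provided the line number is greater than 1) or in the END block.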
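To watch both cases on something small, here is a minimal sketch of the same idea wrapped in a shell function. The name reformat, the variables parent, have_parent and has_child, and the two-record sample are made up for illustration; they are not part of the answer above. The sketch uses pattern-action rules instead of if/else, but it should behave the same way on the sample input.

```shell
# Hypothetical sample: the first parent line has one child, the second has none.
sample='a1.xml "Fruits/Mango">111.ABC
a1 c1 "Fruits/Pear Banana"

b2.xml "Fruits/Plum"222.ABC'

# reformat: reads the raw text on stdin, writes CSV lines to stdout.
reformat() {
  awk '
    /\.xml / {                        # parent (xml) line
      # the previous parent had no children: print it with an extra comma
      if (have_parent && !has_child) print parent ","
      sub(/\.xml /, ".xml,")          # `.xml ` becomes `.xml,`
      sub(/>?[0-9]+\.ABC$/, ",")      # drop the trailing id, with or without `>`
      parent = $0; have_parent = 1; has_child = 0
      next
    }
    NF {                              # non-empty child line
      sub(/ /, ","); sub(/ /, ",")    # first two spaces become commas
      print parent $0
      has_child = 1
    }
    END { if (have_parent && !has_child) print parent "," }
  '
}

printf '%s\n' "$sample" | reformat
```

On the sample this should print a1.xml,"Fruits/Mango",a1,c1,"Fruits/Pear Banana" followed by b2.xml,"Fruits/Plum",, which covers a parent with a child and a childless parent flushed in the END block. For a real file, reformat < filein > fileout would be the equivalent call.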

Output (fileout):

0bef-82-46-8a-9a0b.xml,"Fruits/Mango Apple /Plum cherry date",0bef-82-46-8a-9a0b,5da-0-ba-c1-1a9,"Fruits/Pear Banana/Plum orange mango"
0bef-82-46-8a-9a0b.xml,"Fruits/Mango Apple /Plum cherry date",0bef-82-46-8a-9a0b,ac-94-4ab-91-23,"Fruits/Pear Banana/Plum orange mango"
0bef-82-46-8a-9a0b.xml,"Fruits/Mango Apple /Plum cherry date",0bef-82-46-8a-9a0b,5z-94-ab-92-2f3,"Fruits/Pear Banana/Plum orange mango"
952f-82-46-8a-9a0b.xml,"Fruits/Mango",,
3cff-82-46-8a-9a0b.xml,"Fruits/Big Mango/Not Sweet ",,
6m0k-82-46-8a-9a0b.xml,"Fruits/Big Pear/Very Sweet",,
17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,54-ba-4411-9-3d8,"Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,2da5-0-4a-b1-e89,"Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,b7-94-4db-92-2f3,"Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,4d-67c-446-b5-ac,"Veg/Radish /Radish Carrot Celery Onion"
17a-42-df-c24.xml,"Fruits Market/Big Apple/Sweet ",17a-42-df-c24,2-8b-4det-87-769,"Veg/Radish /Radish Carrot Celery Onion"

Answer 2

Using Miller (https://github.com/johnkerl/miller) and sed:

<input.csv sed -r 's|^(.+")(.?[0-9]+.+)$|\1 "\2"|g' | \
mlr --csv -N --ifs " " put 'if($1=~"xml") {$4=$1;$5=$2}' \
then unsparsify \
then fill-down -f 4,5  \
then count-similar -g 4 \
then filter '($1=~"xml" && $count==1) || ($1!=~"xml" && $count>1)' \
then reorder -f 4,5,1,2,3 \
then put 'if($2=~"Fru"){$1="";$2="";$3=""}' \
then cut -x -f count

you will get (shown here as a table):

+------------------------+--------------------------------------+--------------------+------------------+----------------------------------------+
| 0bef-82-46-8a-9a0b.xml | Fruits/Mango Apple /Plum cherry date | 0bef-82-46-8a-9a0b | 5da-0-ba-c1-1a9  | Fruits/Pear Banana/Plum orange mango   |
| 0bef-82-46-8a-9a0b.xml | Fruits/Mango Apple /Plum cherry date | 0bef-82-46-8a-9a0b | ac-94-4ab-91-23  | Fruits/Pear Banana/Plum orange mango   |
| 0bef-82-46-8a-9a0b.xml | Fruits/Mango Apple /Plum cherry date | 0bef-82-46-8a-9a0b | 5z-94-ab-92-2f3  | Fruits/Pear Banana/Plum orange mango   |
| 952f-82-46-8a-9a0b.xml | Fruits/Mango                         | -                  | -                | -                                      |
| 3cff-82-46-8a-9a0b.xml | Fruits/Big Mango/Not Sweet           | -                  | -                | -                                      |
| 6m0k-82-46-8a-9a0b.xml | Fruits/Big Pear/Very Sweet           | -                  | -                | -                                      |
| 17a-42-df-c24.xml      | Fruits Market/Big Apple/Sweet        | 17a-42-df-c24      | 54-ba-4411-9-3d8 | Veg/Radish /Radish Carrot Celery Onion |
| 17a-42-df-c24.xml      | Fruits Market/Big Apple/Sweet        | 17a-42-df-c24      | 2da5-0-4a-b1-e89 | Veg/Radish /Radish Carrot Celery Onion |
| 17a-42-df-c24.xml      | Fruits Market/Big Apple/Sweet        | 17a-42-df-c24      | b7-94-4db-92-2f3 | Veg/Radish /Radish Carrot Celery Onion |
| 17a-42-df-c24.xml      | Fruits Market/Big Apple/Sweet        | 17a-42-df-c24      | 4d-67c-446-b5-ac | Veg/Radish /Radish Carrot Celery Onion |
| 17a-42-df-c24.xml      | Fruits Market/Big Apple/Sweet        | 17a-42-df-c24      | 2-8b-4det-87-769 | Veg/Radish /Radish Carrot Celery Onion |
+------------------------+--------------------------------------+--------------------+------------------+----------------------------------------+

Note: as input I used the file with the blank lines removed.
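If your copy of the data still has the blank lines, they can be dropped first. A minimal sketch, assuming plain grep is available; the function name strip_blank_lines and the three-line demo input are only for illustration:

```shell
# Drop empty and whitespace-only lines; reads stdin, writes stdout.
strip_blank_lines() {
  grep -v '^[[:space:]]*$'
}

# Demo on a hypothetical input with one blank line in the middle.
printf 'a.xml "x">1.ABC\n\nb.xml "y"2.ABC\n' | strip_blank_lines
```

Something like strip_blank_lines < filein > input.csv would then produce the file that the sed and mlr pipeline above reads.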
