Capturing multi-line ranges defined by a start pattern and an end pattern

2024-6-11

I want to print the middle part of a file (the lines between a start pattern and an end pattern) and colorize certain lines.

Here is sample text from one such file.

## Beginning of file

Some text and code

## FAML [ASMB] KEYWORD
##  Some information.
##  Some other text.
##  Blu:
##  Some text in blue.
## END OF FAML [ASMB]

## Other text

More text and code

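The text between `## FAML [ASMB] KEYWORD` and `## END OF FAML [ASMB]` is extracted (without the leading `##`) and passed to the function luciferin, which prints the multi-line text appropriately.

Text outside the blocks is discarded. Subsequent blocks are handled the same way: the middle region is extracted and printed by calling luciferin(rec). luciferin prints in color.

The input string for luciferin would be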

Some information.
Some other text.
Blu:
Some text in blue.
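
To make the requirement concrete, here is a minimal sketch of how that string could be produced from the sample file. It is not the script from the question: the helper name strip_block is hypothetical, the markers are matched with simplified literal patterns, and every line inside a block is assumed to start with `##` followed by blanks.

```shell
# strip_block: hypothetical helper. Reads a file on stdin and prints the
# lines between the FAML start and end markers without the leading "##".
strip_block() {
  awk '
    /^## END OF FAML / { display = 0; next }   # end marker: stop capturing
    /^## FAML /        { display = 1; next }   # start marker: begin capturing
    display            { sub(/^##[ \t]*/, ""); print }   # drop "##" and blanks
  '
}
```

Run as `strip_block < file`, this should print only the block bodies, one line per captured line.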

This is the awk script that captures the middle region.

BEGIN {
  beg_ere = "## [[:alnum:]]+ [[][[:alnum:]]+[]]"
  end_ere = "## END OF [[:alnum:]]+ [[][[:alnum:]]+[]]"
 }

match($0, beg_ere, paggr) { display = 1 }
$0 ~ end_ere { display = 0 ; next }
display { print }

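luciferin is the function that takes the string and prints it in color. Here, cpt is a color escape sequence taken from tseq, and astr[i] is the i-th line of the multi-line input string.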

function luciferin(mstr) {
  cpt = tseq["Grn:"]
  nlines = split(mstr, astr, "\n")
  for (i = 1; i <= nlines; i++) {
    for ( knam in tseq ) {
      if ( knam == astr[i] ) { cpt = tseq[knam] ; break }
     }
    if (knam == str) { print "" } else { print cpt astr[i] rst }
   }

 }

Answer 1

Without a minimal complete code example, or enough sample input/output to test with, this is an untested guess, but it looks like you need to change

display { print }

to

display { rec = rec $0 ORS }

and

$0 ~ end_ere { display = 0 ; next }

to

$0 ~ end_ere { luciferin(rec); rec = ""; display = 0 ; next }

or similar, with luciferin adjusted to strip the extra trailing newline from its argument before printing.
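
One way that adjustment might look is sketched below. It is untested against the real script: the wrapper name luci_demo is hypothetical, the "Luci:" prefix stands in for the real colored print, and for brevity the whole of stdin is treated as a single record.

```shell
# luci_demo: hypothetical wrapper. Collects all of stdin into rec, then
# hands it to a luciferin variant that strips the trailing newline first.
luci_demo() {
  awk '
    function luciferin(mstr,    nlines, astr, i) {
      sub(/\n$/, "", mstr)                # remove the final newline added by ORS
      nlines = split(mstr, astr, "\n")    # one array element per remaining line
      for (i = 1; i <= nlines; i++)
        print "Luci:", astr[i]            # stand-in for the colored output
    }
    { rec = rec $0 ORS }                  # accumulate every input line
    END { luciferin(rec) }
  '
}
```

Without the sub() call, split() would return one extra empty element for the text after the last newline, and the loop would print a spurious empty line.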

As for how to improve this question and future questions from the OP, here is what a complete, minimal code example in the question might look like:

$ cat tst.awk
$2 == "FAML" { display = 1 ; next }
$2 == "END" { display = 0 ; next }
display { print }

function luciferin(mstr) {
    nlines = split(mstr, astr, "\n")
    for (i = 1; i <= nlines; i++) {
        print "Luci:", astr[i]
    }
}

and here is some sample input to demonstrate and test your requirements:

$ cat input
## Beginning of file

Some text and code

## FAML [ASMB] KEYWORD
##  Some information.
##  Some other text.
## END OF FAML [ASMB]

## Other text

## FAML [ASMB] KEYWORD
##  Some other information.
##  Even more text.
## END OF FAML [ASMB]

More text and code

and then the expected output given that input would be:

Luci: ##  Some information.
Luci: ##  Some other text.
Luci: ##  Some other information.
Luci: ##  Even more text.

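The fact that your real code does coloring or anything else inside luciferin() is irrelevant to the problem you need help with.

Given a clear, simple example like that, we can quickly show a solution: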

$ cat tst.awk
$2 == "FAML" { display = 1 ; next }
$2 == "END" { luciferin(rec); rec = ""; display = 0 ; next }
display { rec = rec $0 ORS }

function luciferin(mstr) {
    nlines = split(mstr, astr, "\n")
    for (i = 1; i < nlines; i++) {
        print "Luci:", astr[i]
    }
}

$ awk -f tst.awk input
Luci: ##  Some information.
Luci: ##  Some other text.
Luci: ##  Some other information.
Luci: ##  Even more text.

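You can then apply the concept to your real code.

Answer 2

It is certainly possible to solve this problem in awk. But you seem to be making it awfully hard on yourself. Perl, borrowing from the sed feature mentioned in the comments, offers direct language support for such ranges.

Let's paint spring blue.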

$ cat months.txt | perl -ane 'print "blue" if /Mar/../May/; print "\t$_"'
        January
        February
blue    March
blue    April
blue    May
        June

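Adapt this to your use case by using the FAML / ASMB keywords in those regular expressions.

Even if you need fancier processing than this, it makes a good initial stage of a pipeline.

Later stages then need not worry about line ranges. They can use the first field to check whether a line is in range and process the rest of the line accordingly.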
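
As a rough sketch of such a later stage, the function below reads the tab-separated output of the Perl command and colors the tagged lines. The name color_stage and the ANSI escape codes are assumptions chosen for illustration, not part of the original answer.

```shell
# color_stage: hypothetical later pipeline stage. Expects each input line as
# "<tag><TAB><text>", where the tag is "blue" for in-range lines, else empty.
color_stage() {
  awk -F '\t' '
    BEGIN { blu = "\033[34m"; rst = "\033[0m" }        # ANSI blue and reset
    { text = substr($0, index($0, "\t") + 1) }         # everything after the first tab
    $1 == "blue" { print blu text rst; next }          # in range: print in blue
                 { print text }                         # out of range: print plain
  '
}
```

It would be appended to the Perl command as one more pipe stage, so the range logic stays in one place and the coloring in another.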
