Extract a pattern from multiple lines

Answer 1

grep -zPo '\\author{\K[^}]*' ex1.tex | tr '\0\n' '\n '

Some brief explanations:

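-z : input and output records ("lines") are separated by NUL (\0). The whole TeX file therefore becomes a single record.
-P : use the Perl-compatible (PCRE) regular expression syntax.
-o : print only the parts of a record that match the regexp.
\\author{\K : the left context. \K discards everything matched so far, so "\author{" must be present but is not printed.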

tr '\0\n' '\n ' changes the output record separator (\0 to \n) and replaces the newlines inside the names with spaces (\n to a space).
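For readers without GNU grep, here is a rough Python analogue of the same idea, offered as a sketch rather than an exact port. Python's re module has no \K, so a fixed-width lookbehind (?<=\\author\{) stands in for it; reading the whole file into one string mirrors -z, and replacing newlines with spaces mirrors the tr step. The sample input TEX and the function name authors_like_grep_z are illustrative assumptions.

```python
import re

# Sample input modeled on the question's test file (assumption: \author{...} form only).
TEX = r"""\documentclass{scrartcl}
\title{Test}
\author{Author 1,
Author 2,
Author 3}
\begin{document}
\end{document}
"""


def authors_like_grep_z(text):
    """Rough analogue of: grep -zPo '\\author{\K[^}]*' | tr '\0\n' '\n '"""
    # The whole text is one record (like grep -z), so [^}]* may span newlines.
    # The lookbehind requires "\author{" before the match but does not return it,
    # which is the effect \K has in the grep pattern.
    matches = re.findall(r"(?<=\\author\{)[^}]*", text)
    # Like tr, turn the newlines inside each author list into spaces.
    return [m.replace("\n", " ") for m in matches]


for entry in authors_like_grep_z(TEX):
    print(entry)
```

Like the grep command, this only handles the \author{...} form, not \author[...].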

Answer 2

#!/bin/bash

sed -nr '
/\\author/ {
    :ending
    /[]}]$/! {
        N
        b ending
    }
    s/\\author(\{|\[)(.*)(}|])/\2/p
}
' test.tex

Explanation (the same code with comments added):

#!/bin/bash

sed -nr '
# if the line contains the \author string, process it.
/\\author/ {

    ##### this part handles entries that span multiple lines

    # put a label here. We keep returning to this point
    # until the pattern space ends with } or ].
    :ending

    # if the pattern space does not end with } or ],
    # the entry continues on the next line.
    /[]}]$/! {

        # read the next line and append it to the pattern space
        # (sed joins the two with a newline).
        N

        # go back to the ":ending" label
        b ending
    }

    ##### end of multi-line pattern processing

    # remove the \author command and its brackets, then print the result
    s/\\author(\{|\[)(.*)(}|])/\2/p
}
' test.tex

Test file

\documentclass{scrartcl}
\usepackage{graphicx}
\title{Test}
\author{Author 1, Author 2, Author 3}
\author[Author 1, Author 2, Author 3]
\author{Author 1,
Author 2,
Author 3}
\author[Author 1,
Author 2,
Author 3]
\begin{document}
\end{document}

Output

Author 1, Author 2, Author 3
Author 1, Author 2, Author 3
Author 1,
Author 2,
Author 3
Author 1,
Author 2,
Author 3
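The label/N loop in the sed script is a small state machine: start collecting at an \author line, keep appending lines until the collected text ends with } or ], then strip the command. As a sketch for readers less familiar with sed, the same logic in Python might look like this. The function name extract_authors is hypothetical, and unlike the sed s///p command it returns only the text between the brackets rather than the whole modified pattern space.

```python
import re

# Test file from the question (copied verbatim).
TEST_TEX = r"""\documentclass{scrartcl}
\usepackage{graphicx}
\title{Test}
\author{Author 1, Author 2, Author 3}
\author[Author 1, Author 2, Author 3]
\author{Author 1,
Author 2,
Author 3}
\author[Author 1,
Author 2,
Author 3]
\begin{document}
\end{document}
"""


def extract_authors(lines):
    """Rough Python equivalent of the sed script (hypothetical helper)."""
    results = []
    buffer = None  # stands in for sed's pattern space while lines are joined
    for line in lines:
        line = line.rstrip("\n")
        if buffer is None:
            if "\\author" not in line:
                continue  # the /\\author/ address does not match this line
            buffer = line
        else:
            buffer += "\n" + line  # what sed's N command does
        if re.search(r"[}\]]$", buffer):  # same test as /[]}]$/
            match = re.search(r"\\author([{\[])(.*)([}\]])", buffer, re.S)
            if match:
                results.append(match.group(2))
            buffer = None  # start over, like sed moving on to the next cycle
    return results


for entry in extract_authors(TEST_TEX.splitlines()):
    print(entry)
```

Printing each entry reproduces the output shown above, including the line breaks inside the multi-line entries.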

Answer 3

This seems to do the job:

egrep -o '[\[{]?Author' | sed -E 's/[\[{]//'

Examples:

1)

echo "\documentclass{scrartcl}
\usepackage{graphicx}
\title{Test}
\author[Author 1,
Author 2
Author 3 ] " | egrep -o '[\[{]?Author' | sed -E 's/[\[{]//'
Author
Author
Author

2)

echo "\documentclass{scrartcl}
\usepackage{graphicx}
\title{Test}
\author[Author 1, Author 2, Author 3]
\begin{document}
\end{document}" | egrep -o '[\[{]?Author' | sed -E 's/[\[{]//'
Author
Author
Author

3)

echo "\documentclass{scrartcl}
\usepackage{graphicx}
\title{Test}
\author{Author 1, Author 2, Author 3}
\begin{document}
\end{document}" | egrep -o '[\[{]?Author' | sed -E 's/[\[{]//'
Author
Author
Author

You could probably do this with grep alone by using something like a lookbehind. Personally, I have no problem piping the grep output into sed.
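As a hedged illustration of the lookbehind idea, here is a Python sketch (Python's re supports fixed-width lookbehind). This variant is my own and goes a step further than the pipeline above: instead of printing the literal word Author, it captures everything after the first \author[ or \author{ up to the closing bracket and splits it into names. The names EX1 and author_names are illustrative assumptions.

```python
import re

# Input from example 1 above.
EX1 = r"""\documentclass{scrartcl}
\usepackage{graphicx}
\title{Test}
\author[Author 1,
Author 2
Author 3 ] """


def author_names(text):
    # Lookbehind: start right after "\author[" or "\author{" (both alternatives
    # have the same width, as Python's re requires) and stop before ] or }.
    match = re.search(r"(?<=\\author[\[{])[^\]}]*", text)
    if match is None:
        return []
    # Split on commas and/or newlines, dropping blanks and surrounding spaces.
    return [name.strip() for name in re.split(r"[,\n]", match.group()) if name.strip()]


print(author_names(EX1))
```

Only the first \author command is handled; re.finditer with the same pattern would be needed for several.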

Answer 4

Python

Using the input file given in the question, you can run a one-liner like this:

$ python -c 'import sys,re;f=open(sys.argv[1],"r");a=tuple(l for l in f.readlines() if l.startswith("\\author") );print("\n".join(re.split(", |,|{|}",a[0].strip())[1:]))' input.tex      
Author 1
Author 2
Author 3

And here is the script:

#!/usr/bin/env python

import re
import sys

# read the document and find the first line starting with \author
line = ""
with open(sys.argv[1]) as f:
    for l in f:
        if l.startswith("\\author"):
            line = l.strip()
            break

# split at multiple separators; drop the leading "\author" token and
# any empty tokens (e.g. the one left after the closing "}")
author_list = [a for a in re.split(r", |,|{|}", line)[1:] if a]

# print one author per line
print("\n".join(author_list))

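There are two main steps: read the file and find the line that starts with the string \author, then split that line into a list of tokens at multiple separators and build a newline-separated string from that token list. I also tried to account for the possibility that you need to split on , as well as ,<space>.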
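On the point about , versus ,<space>: a single separator pattern \s*,\s* covers both cases. A minimal sketch, assuming the names sit between the first { and the last } of a single \author{...} line; split_authors is a hypothetical helper, not part of the script above. Unlike splitting the whole line on ", |,|{|}", it does not produce an empty token after the closing brace.

```python
import re


def split_authors(line):
    # Take the text between the first "{" and the last "}" of an \author{...} line.
    start, end = line.find("{"), line.rfind("}")
    if start == -1 or end <= start:
        return []
    inner = line[start + 1:end]
    # "\s*,\s*" matches "," and ", " alike (and stray spaces before a comma).
    return [name for name in re.split(r"\s*,\s*", inner.strip()) if name]


print("\n".join(split_authors(r"\author{Author 1, Author 2, Author 3}")))
```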
