Using sed to remove characters from an expression

2024-5-29

shell-script shell text-processing sed regular-expression

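I have a string of the form

|a Some text, letters or numbers. | Some other text letters or numbers |b some other part of text |c some other letters or numbers

The bar can appear alone, as in "numbers. | Some", or together with a letter, as in "|a", "|b", "|c", possibly up to "|z".

But the string could also be

|a Title without any other bars

In other words, we do not know how many bars there are.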

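I need to find two regular expressions to use with sed.

The first should find all the text between |a and |b, or between |b and |c, and so on.

For example, in the first string above,

finding all the text after |a and before |b gives:

Some text, letters or numbers. | Some other text letters or numbers

In the same example, finding all the text after |b and before |c gives:

some other part of text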

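The second expression should find all the text after |a without stopping at |b. It should delete the bars, whether they stand alone (|) or carry a letter, so that |a, |b, |c, and so on are all removed.

For example, for the first string above:

Some text, letters or numbers. Some other text letters or numbers some other part of text some other letters or numbers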

Answer 1

Assuming GNU utilities and a data file named data:

grep -Po '(?<=\|a).*(?=\|b)' data

 Some text, letters or numbers. | Some other text letters or numbers

sed -r -e 's/^.*\|a//' -e 's/\|[a-z]?//g' data

 Some text, letters or numbers.  Some other text letters or numbers  some other part of text  some other letters or numbers 
 Title without any other bars

Change |a and |b to |c, |d, and so on as needed.
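Instead of editing the letters by hand, the pair can be passed in as parameters. The following is a minimal Python sketch, not part of the original answer: the function name `between` and its arguments are hypothetical, and it only rebuilds the lookbehind/lookahead pattern of the grep command for an arbitrary pair of letters.

```python
import re

def between(line, start, stop):
    """Return the text between `|start` and `|stop`, or None if no match."""
    # Same idea as the grep pattern: a lookbehind for the opening marker
    # and a lookahead for the closing marker, so neither is part of the match.
    pattern = r'(?<=\|' + re.escape(start) + r').*(?=\|' + re.escape(stop) + r')'
    match = re.search(pattern, line)
    return match.group(0) if match else None

line = ("|a Some text, letters or numbers. | Some other text letters or numbers "
        "|b some other part of text |c some other letters or numbers")

print(repr(between(line, 'a', 'b')))
print(repr(between(line, 'b', 'c')))
```

As with grep, the surrounding spaces are kept, and `.*` is greedy, so if the closing marker occurs more than once the match runs to its last occurrence.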

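Neither the grep nor the sed command above removes the spaces around the |x markers, so the text keeps leading and trailing spaces (which cannot be shown here). To strip them as well, include the spaces in the patterns: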

grep -Po '(?<=\|a ).*(?= \|b)' data
sed -r -e 's/^.*\|a ?//' -e 's/ ?\|([a-z] ?)?//g' data

As written, this sed command joins the separate sections together. If you want a space between them, change the final // to / /.
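For comparison, the two substitutions of the space-aware sed command can be sketched with Python's `re.sub`. This is an illustration rather than part of the original answer: `strip_markup` is a hypothetical name, and its `sep` parameter plays the role of the replacement text in the second sed expression (empty for `//`, a single space for `/ /`).

```python
import re

def strip_markup(line, sep=''):
    """Hypothetical helper mirroring the two sed expressions."""
    # First expression: drop everything up to and including '|a',
    # plus one optional space after it.
    line = re.sub(r'^.*\|a ?', '', line, count=1)
    # Second expression: replace each bar, its optional lowercase letter,
    # and the single spaces next to the marker with the separator.
    return re.sub(r' ?\|([a-z] ?)?', sep, line)

line = ("|a Some text, letters or numbers. | Some other text letters or numbers "
        "|b some other part of text |c some other letters or numbers")

print(strip_markup(line))           # sections joined directly
print(strip_markup(line, sep=' '))  # sections separated by a space
print(strip_markup("|a Title without any other bars"))
```

With the empty separator the lettered sections run into each other ("numberssome"), which is the joining behavior described above. A bare bar followed by a space keeps that following space, so with `sep=' '` two spaces appear at that position.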

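Answer 2

It is not clear whether the letters in the delimiters must be consecutive, so I will assume you want to handle the more difficult case in which they must be (i.e. |a pairs with |b but not with |c). I am not sure this can be done with regular expressions alone (at least not without a very verbose one). In any case, here is a simple Python script that handles that situation: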

#!/usr/bin/env python3
# -*- coding: ascii -*-
"""parse.py"""

import sys
import re

def extract(string):
    """Returns the text between delimiters of the form `|START` and `|STOP`
    where START is a single ASCII letter and STOP is the next sequential
    ASCII character (e.g. `|a` and `|b` if START=a and STOP=b or
    `|x` and `|y` if START=x and STOP=y)."""

    # Find the opening delimiter (e.g. '|a' or '|b')
    start_match = re.search(r'\|[a-z]', string)
    if start_match is None:
        # No delimiter at all: nothing to extract
        return ''
    start_index = start_match.start()
    start_letter = string[start_index+1]

    # Find the matching closing delimiter, searching after the opening one
    stop_letter = chr(ord(start_letter) + 1)
    stop_index = string.find('|' + stop_letter, start_index+2)

    # Without a closing delimiter, take everything up to the end of the line
    if stop_index == -1:
        stop_index = len(string.rstrip('\n'))

    # Extract and return the substring
    return string[start_index+2:stop_index]

def remove(string):
    """Drops everything up to and including the first `|LETTER` delimiter,
    then deletes every remaining bar, with or without a trailing letter."""

    # Find the opening delimiter (e.g. '|a' or '|b')
    start_match = re.search(r'\|[a-z]', string)

    # Remove everything up to and including the opening delimiter, if any
    if start_match is not None:
        string = string[start_match.end():]

    # Remove the remaining bars and delimiters
    string = re.sub(r'\|[a-z]?', '', string)

    # Return the updated string
    return string

if __name__=="__main__":
    input_string = sys.stdin.readline()
    sys.stdout.write(extract(input_string) + '\n')
    sys.stdout.write(remove(input_string))
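
If every lettered section is wanted at once, rather than only the first pair, the line can instead be split on the delimiters. The following is a minimal sketch and not part of the script above: `split_sections` is a hypothetical name, and it assumes each `|LETTER` marker is followed by whitespace or the end of the line. It does not check that the letters are consecutive.

```python
import re

def split_sections(line):
    """Map each delimiter letter to the text that follows it."""
    # re.split with one capture group yields
    # [prefix, letter, text, letter, text, ...].
    # The lookahead keeps a bare bar glued to a word (e.g. '|some') from
    # being treated as a delimiter; this is an assumption about the data.
    parts = re.split(r'\|([a-z])(?=\s|$)', line)
    return {letter: text.strip() for letter, text in zip(parts[1::2], parts[2::2])}

line = ("|a Some text, letters or numbers. | Some other text letters or numbers "
        "|b some other part of text |c some other letters or numbers")

for letter, text in split_sections(line).items():
    print(letter, '->', text)
```

A section that contains a bare bar, such as the one after |a here, keeps that bar, and a letter that appears twice would overwrite the earlier entry.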
