Replacing strange characters with the sed command

I would like to build a sed command that removes all of the strange characters from a particular document.

sed -n 's/\|®MD-IT¯\|®MD\+BO¯\|®MDNM¯®LL\.8LI,0LI¯\|®LL0LI,0LI¯\|®MD\+IT¯\|®LL.8LI,0LI¯®MDIT¯\|®MDNM¯®FL¯®LL.8LI,0LI¯\|®FL¯®MD-BO¯\|®FL¯®MD-BO¯\|®MD-BO¯\|¯®OF1IN,1IN¯®FC¯®LL1LI,0LI¯\|\|®SF1,1¯\|®FM1FT=0LI,LR=1;\|®MDSU¯®FN1¯\|®MDNM¯¯\|®IV-RTF\|\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\.\|¯®BF0¯\|®FS1\|-------------------------------------\|¯®FW1\|\|//gp'

All of these codes were generated by another application, Nota Bene. I have many files containing this type of code, and I would like to convert them to plain text or Markdown.

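The problem is that the characters are not replaced. I tried the same thing in Sublime Text, and its search and replace (regex) cleaned up the document successfully. For this task, though, I would rather build a sed script than use Sublime.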

I also tried it with Ed, and no replacements were made there either.

Here is a sample nb file as it appears when opened in Sublime Text:

®SSDEFAULTS¯®LR1¯®JU¯®MD+BO¯®UFTimes New Roman¯®SZ12Pt¯Glossary®MD+BO¯®TS.5IN,1IN,1.5IN,2IN,2.5IN,3IN,3.5IN,4IN,4.5IN,5IN,5.5IN,6IN¯    ®MD-BO¯
®NJ¯®LR1¯®LL.5LI,0LI¯®MD+BO¯®LL0LI,0LI¯®MDNM¯®LR1¯®LL.5LI,0LI¯A fortiori proposition: If X is true, then how much greater is Y true? To move logically from a stronger argument to establish a weaker argument. The weaker argument is sometimes presented by the speaker as the stronger argument.
®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Accusative of motion/direction - Indicates movement to the noun marked by the accusative and is to be distinguished from the accusative of local determination which indicates location without motion (Joüon and Muraoka 2006, 428).
Anadiplosis - A figure of speech in which the word that a colon ends with, or a like sounding word, is the word that begins the next colon ®GC|CI:R#=47;AU=Brown, Raymond E.;YR=1990;TI=New Jerome biblical commentary;PG=245;XT=;F[=;F]=;F#=;ID=;XX=Print;CT=;FL=¯(Brown, Fitzmyer, Murphy, et al. 1990, 245)®GC¯.
®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Anaphoric use of the article - When the article is used to indicate that the word to which it is attached is the one previously mentioned (Williams and Beckman 2007, 36). 
®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Anaptyxis - The insertion of a vowel into a word to avoid a consonant cluster.
®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Aoristic perfect - I use the phrase 'aoristic perfect' to refer to one of the ways the qatal form can be rendered into English. Aoristic perfect denotes a past situation the implications of which are no longer felt in the present. The situation may have extended over a period of time and it may have occurred more than once. It may have occurred in the recent or distant past but from the standpoint of the speaker it is to be regarded as a fact having occurred and hence as a fact belonging to the past (Joüon and Muraoka 2006, 337; Driver 1998, 12). The term 'aoristic perfect' and indeed the other categorizations of perfect in this grammar, all relate to the interpretation of qatal verbs in their given contexts. The qatal form in and of itself does not convey these meanings. 
®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯®LL0LI,0LI¯®LR1¯®LL.5LI,0LI¯Beth essentiae - ®LAHebrew¯ÿHá®LAEnglish¯ that is used to indicate the predicate of a clause or a word used predicatively (Joüon and Muraoka 2006, 458).

This is how I would like the text to read:

Glossary    
A fortiori proposition: If X is true, then how much greater is Y true? To move logically from a stronger argument to establish a weaker argument. The weaker argument is sometimes presented by the speaker as the stronger argument.
Accusative of motion/direction - Indicates movement to the noun marked by the accusative and is to be distinguished from the accusative of local determination which indicates location without motion (Joüon and Muraoka 2006, 428).
Anadiplosis - A figure of speech in which the word that a colon ends with, or a like sounding word, is the word that begins the next colon (Brown, Fitzmyer, Murphy, et al. 1990, 245).
Anaphoric use of the article - When the article is used to indicate that the word to which it is attached is the one previously mentioned (Williams and Beckman 2007, 36). 
Anaptyxis - The insertion of a vowel into a word to avoid a consonant cluster.
Aoristic perfect - I use the phrase 'aoristic perfect' to refer to one of the ways the qatal form can be rendered into English. Aoristic perfect denotes a past situation the implications of which are no longer felt in the present. The situation may have extended over a period of time and it may have occurred more than once. It may have occurred in the recent or distant past but from the standpoint of the speaker it is to be regarded as a fact having occurred and hence as a fact belonging to the past (Joüon and Muraoka 2006, 337; Driver 1998, 12). The term 'aoristic perfect' and indeed the other categorizations of perfect in this grammar, all relate to the interpretation of qatal verbs in their given contexts. The qatal form in and of itself does not convey these meanings. 
|> sed -n l Glossary.NB
\256SSDEFAULTS\257\256LR1\257\256JU\257\256MD+BO\257\256UFTimes New R\
oman\257\256SZ12Pt\257Glossary\256MD+BO\257\256TS.5IN,1IN,1.5IN,2IN,2\
.5IN,3IN,3.5IN,4IN,4.5IN,5IN,5.5IN,6IN\257\t\256MD-BO\257\r$
\256NJ\257\256LR1\257\256LL.5LI,0LI\257\256MD+BO\257\256LL0LI,0LI\257\
\256MDNM\257\256LR1\257\256LL.5LI,0LI\257A fortiori proposition: If X\
 is true, then how much greater is Y true? To move logically from a s\
tronger argument to establish a weaker argument. The weaker argument \
is sometimes presented by the speaker as the stronger argument.\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Accusative of motion/direction - Indicates mov\
ement to the noun marked by the accusative and is to be distinguished\
 from the accusative of local determination which indicates location \
without motion (Jo\374on and Muraoka 2006, 428).\r$
Anadiplosis - A figure of speech in which the word that a colon ends \
with, or a like sounding word, is the word that begins the next colon\
 \256GC|CI:R#=47;AU=Brown, Raymond E.;YR=1990;TI=New Jerome biblical \
commentary;PG=245;XT=;F[=;F]=;F#=;ID=;XX=Print;CT=;FL=\257(Brown, Fit\
zmyer, Murphy, et al. 1990,\240245)\256GC\257.\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Anaphoric use of the article - When the articl\
e is used to indicate that the word to which it is attached is the on\
e previously mentioned (Williams and Beckman 2007, 36). \r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Anaptyxis - The insertion of a vowel into a wo\
rd to avoid a consonant cluster.\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Aoristic perfect - I use the phrase 'aoristic \
perfect' to refer to one of the ways the qatal form can be rendered i\
nto English. Aoristic perfect denotes a past situation the implicatio\
ns of which are no longer felt in the present. The situation may have\
 extended over a period of time and it may have occurred more than on\
ce. It may have occurred in the recent or distant past but from the s\
tandpoint of the speaker it is to be regarded as a fact having occurr\
ed and hence as a fact belonging to the past (Jo\374on and Muraoka 20\
06, 337; Driver 1998, 12). The term 'aoristic perfect' and indeed the\
 other categorizations of perfect in this grammar, all relate to the \
interpretation of qatal verbs in their given contexts. The qatal form\
 in and of itself does not convey these meanings. \r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Beth essentiae - \256LAHebrew\257\377H\341\256\
LAEnglish\257 that is used to indicate the predicate of a clause or a\
 word used predicatively (Jo\374on and Muraoka 2006, 458).\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Classic perfect - I use the phrase 'classic pe\
rfect' to refer to one of the ways the qatal form can be rendered int\
o English. Classic perfect refers to the continuing present relevance\
 of a past situation from the perspective of the speaker (Comrie 1976\
, 52). By perfect I do not necessarily imply that a previous situatio\
n has resulted in a state but that the situation has implications rel\
evant to the present. The situation is not merely past and over but s\
omehow persists and continues to intrude into the present. Such verbs\
 are usually translated into English using the perfect or present ten\
se. I have included under this definition quasi-stative verbs which r\
efer to attributes which were acquired before, but which are assumed \
to continue in some way up to the present moment (Driver 1998, 11; Jo\
\374on and Muraoka 2006, 333; Waltke and O'Connor 1990, 487). In some\
 grammars these are treated separately. However, that creates too man\
y functions for the one perfect form. The term 'classic perfect' and \
indeed the other categorizations of perfect in this grammar all relat\
e to the \256MD+IT\257interpretation \256MD-IT\257of qatal verbs in t\
heir given contexts. The qatal form by itself does not convey these m\
eanings.\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Cohortative of praise. The cohortative is ofte\
n used in Psalms to indicate that praise, freely undertaken, has begu\
n. This usage is close to the cohortative of resolve but not identica\
l with it. The emphasis falls not on what the writer is intending to \
do, but what he has already undertaken. \r$
Cohortative of resolve - The cohortative mood normally expresses the \
will of the speaker, but when the speaker has the ability to carry ou\
t what he wants it takes on the coloring of resolve (Van der Merwe et\
 al. 1997, 152; Waltke and O'Connor 1990, 573).\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Concluding \256LAHebrew\257\377h\353\377H\351\
\256LAEnglish\257 - A special use of the word \256LAHebrew\257\377h\
\353\377H\351\256LAEnglish\257 found towards the end of several Psalm\
s and approximating in meaning to: the conclusion of the matter is th\
at\205\r$
\256LL0LI,0LI\257\256LR1\257\256LL.5LI,0LI\257\256LL0LI,0LI\257\256LR\
1\257\256LL.5LI,0LI\257Conjunctive waw - Waw used to connect clauses \

Answer 1

sed can also be run as a script, which is easier to develop. Create a file named "nb2txt":

#!/usr/bin/sed -Ef

s/®[^¯]*¯//g
s/-{20,}//g
s/\.{20,}//g

Then:

$ chmod 755 nb2txt
$ ./nb2txt file.nb
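
The same rules can be tried inline before saving them to a file. This is only a sketch: it assumes the input has already been converted to UTF-8, and the last rule (stripping a trailing carriage return, which the `\r$` endings in the question's listing suggest is present) is an addition of mine, not part of the script above. The function name `nb2txt_inline` is made up for the example.

```shell
#!/bin/sh
# Sketch only: the same three rules as nb2txt, run inline with -e.
# The last rule (strip a trailing carriage return) is an addition that is
# not part of the answer; it assumes the files use CRLF line endings.
cr=$(printf '\r')

nb2txt_inline() {
    sed -E \
        -e 's/®[^¯]*¯//g' \
        -e 's/-{20,}//g' \
        -e 's/\.{20,}//g' \
        -e "s/${cr}\$//"
}

# One sample line (already UTF-8) with a CRLF ending.
result=$(printf '%s\r\n' '®LL0LI,0LI¯®LR1¯Anaptyxis - The insertion of a vowel.' | nb2txt_inline)
printf '%s\n' "$result"
```

The single dash in "Anaptyxis - The" survives because `-{20,}` only matches runs of twenty or more dashes.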

Answer 2

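Your regular expression uses \| (alternation in GNU sed, a literal bar in most other sed implementations) and \+ (one or more of the preceding item in GNU sed, a literal + in most other sed implementations). If you are using GNU sed, this pattern removes ®MD-IT¯ and similar codes, but ®MD\+BO¯ matches strings such as ®MDDDDDBO¯ rather than the literal ®MD+BO¯. If you are using another sed implementation, it will probably not find any match at all.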

It is better to use extended regular expressions, which most versions of sed have supported for many years:

sed -nE 's/®MD-IT¯|®MD\+BO¯|®MDNM¯®LL\.8LI,0LI¯|®LL0LI,0LI¯|… and so on
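
One thing to watch with -E: an unescaped + is a quantifier there, so the literal plus inside a code such as ®MD+BO¯ needs a backslash. A small sketch of the difference, using a made-up test line that contains both ®MD+BO¯ and the artificial string ®MDDDBO¯:

```shell
#!/bin/sh
# Made-up test line: one real code (®MD+BO¯) and one artificial string
# (®MDDDBO¯) that only the unescaped pattern should match.
line='®MD+BO¯Glossary®MDDDBO¯'

# Escaped plus: matches the literal code ®MD+BO¯.
escaped=$(printf '%s\n' "$line" | sed -E 's/®MD\+BO¯//g')

# Unescaped plus: means "one or more D", so it matches ®MDDDBO¯ instead
# and leaves the real code in place.
unescaped=$(printf '%s\n' "$line" | sed -E 's/®MD+BO¯//g')

printf 'escaped:   %s\n' "$escaped"
printf 'unescaped: %s\n' "$unescaped"
```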

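I would also recommend removing the empty alternatives (the \| at the start and at the end of the pattern), although they do not cause a problem in this case.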

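Instead of the endless \.\.\.\.\.\.\.\.\.\.\.\. and ---- runs, use a repetition count with the actual number of dots or dashes, such as \.{42} or -{23}. Or simply delete any run of 10 or more that appears: \.{10,} for dots and -{10,} for dashes.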
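
As a rough illustration of the repetition counts, here is a sketch on a made-up line with a run of twelve dots and a run of twelve dashes. The threshold of 10 is just the one suggested above; pick whatever suits your files.

```shell
#!/bin/sh
# Made-up sample: a dot leader, a dash rule, plus short runs that
# should be left alone ("etc..." and the hyphen in "well-known").
line='Glossary............12 ------------ etc... well-known'

# Remove only runs of 10 or more dots or dashes.
result=$(printf '%s\n' "$line" | sed -E -e 's/\.{10,}//g' -e 's/-{10,}//g')
printf '%s\n' "$result"
```

The three dots after "etc" and the hyphen in "well-known" are shorter than the threshold, so they are kept.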

Answer 3

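The sed -n l listing makes it clear that the file contains many bytes with the values 174 (decimal, or 256 octal) and 175 (decimal, or 257 octal). They are listed as \256 and \257. Interpreted as single-byte characters, they correspond to the Unicode characters ® (hex code ae, i.e. \xae, or octal 256) and ¯ (hex code af, i.e. \xaf, or octal 257). Treated as plain single-byte characters: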

$ printf '\256 \257 \n' | iconv -f WINDOWS-1252 -t utf8
® ¯

(This assumes UTF-8 is your default encoding, which is common on Linux.)
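
To see the difference between the raw bytes and their UTF-8 form, a hex dump helps. This is a small sketch, assuming the file really is in a single-byte encoding such as WINDOWS-1252 (a guess based on the listing, not something the file itself states):

```shell
#!/bin/sh
# The two marker bytes as they appear in the raw file.
raw_hex=$(printf '\256\257' | od -An -tx1 | tr -d ' \n')

# The same two characters after conversion to UTF-8: each becomes two bytes.
utf8_hex=$(printf '\256\257' | iconv -f WINDOWS-1252 -t UTF-8 | od -An -tx1 | tr -d ' \n')

printf 'raw:   %s\n' "$raw_hex"
printf 'utf-8: %s\n' "$utf8_hex"
```

This matters for sed: a ® typed into a script in a UTF-8 terminal is the two-byte sequence, so it will not match the single raw byte unless the file is converted first.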

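These appear to be the start and end markers of some internal encoding used in the .nb file. So deleting every string that starts with \xae and ends with \xaf seems to come close to what you asked for: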

$ sed 's/®[^¯]*¯//g' test
Glossary    
A fortiori proposition: If X is true, then how much greater is Y true? To move logically from a stronger argument to establish a weaker argument. The weaker argument is sometimes presented by the speaker as the stronger argument.
Accusative of motion/direction - Indicates movement to the noun marked by the accusative and is to be distinguished from the accusative of local determination which indicates location without motion (Joüon and Muraoka 2006, 428).
Anadiplosis - A figure of speech in which the word that a colon ends with, or a like sounding word, is the word that begins the next colon (Brown, Fitzmyer, Murphy, et al. 1990, 245).
Anaphoric use of the article - When the article is used to indicate that the word to which it is attached is the one previously mentioned (Williams and Beckman 2007, 36). 
Anaptyxis - The insertion of a vowel into a word to avoid a consonant cluster.
Aoristic perfect - I use the phrase 'aoristic perfect' to refer to one of the ways the qatal form can be rendered into English. Aoristic perfect denotes a past situation the implications of which are no longer felt in the present. The situation may have extended over a period of time and it may have occurred more than once. It may have occurred in the recent or distant past but from the standpoint of the speaker it is to be regarded as a fact having occurred and hence as a fact belonging to the past (Joüon and Muraoka 2006, 337; Driver 1998, 12). The term 'aoristic perfect' and indeed the other categorizations of perfect in this grammar, all relate to the interpretation of qatal verbs in their given contexts. The qatal form in and of itself does not convey these meanings. 
Beth essentiae - ÿHá that is used to indicate the predicate of a clause or a word used predicatively (Joüon and Muraoka 2006, 458).
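
If the file has not been converted to UTF-8, the same deletion can be done on the raw bytes instead. A minimal sketch, assuming the markers are the single bytes \256 and \257 as in the listing; the function name `strip_nb_bytes` is made up for the example:

```shell
#!/bin/sh
# Build the two marker bytes with printf so the script itself stays ASCII.
open=$(printf '\256')
close=$(printf '\257')

# LC_ALL=C makes sed treat every byte as one character, so the raw
# markers match regardless of the terminal's encoding.
strip_nb_bytes() {
    LC_ALL=C sed "s/${open}[^${close}]*${close}//g"
}

# Sample line made of raw bytes, as in the original file.
result=$(printf '\256LL0LI,0LI\257\256LR1\257Anaptyxis\n' | strip_nb_bytes)
printf '%s\n' "$result"
```

The remaining text is still in the original single-byte encoding, so characters such as ü would need a pass through iconv afterwards.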
