大容量ファイルで複数行式で sed を使用する場合のメモリ不足

Question 1

最初の3つのコマンドが犯人です。

:a
N
$!ba

これにより、ファイル全体が一度にメモリに読み込まれます。次のスクリプトは、一度に1つのセグメントのみをメモリに保持できます。

% cat test.sed
#!/usr/bin/sed -nf

# Append this line to the hold space. 
# To avoid an extra newline at the start, replace instead of append.
1h
1!H

# If we find a paren at the end...
/)$/{
    # Bring the hold space into the pattern space
    g
    # Remove the newlines
    s/\n//g 
    # Print what we have
    p
    # Delete the hold space
    s/.*//
    h
}
% cat test.in
a
b
c()
d()
e
fghi
j()
% ./test.sed test.in
abc()
d()
efghij()

このawkソリューションは各行を印刷するため、一度にメモリに1行だけ残ります。

% awk '/)$/{print;nl=1;next}{printf "%s",$0;nl=0}END{if(!nl)print ""}' test.in
abc()
d()
efghij()

Answer

最初の3つのコマンドが犯人です。

:a
N
$!ba

これにより、ファイル全体が一度にメモリに読み込まれます。次のスクリプトは、一度に1つのセグメントのみをメモリに保持できます。

% cat test.sed
#!/usr/bin/sed -nf

# Append this line to the hold space. 
# To avoid an extra newline at the start, replace instead of append.
1h
1!H

# If we find a paren at the end...
/)$/{
    # Bring the hold space into the pattern space
    g
    # Remove the newlines
    s/\n//g 
    # Print what we have
    p
    # Delete the hold space
    s/.*//
    h
}
% cat test.in
a
b
c()
d()
e
fghi
j()
% ./test.sed test.in
abc()
d()
efghij()

このawkソリューションは各行を印刷するため、一度にメモリに1行だけ残ります。

% awk '/)$/{print;nl=1;next}{printf "%s",$0;nl=0}END{if(!nl)print ""}' test.in
abc()
d()
efghij()

Question 2

完璧にするために、Perlソリューションは次のとおりです。perl -p -e '/)$/ || chomp'

対称の場合：スクリプトをループで囲み、1行ずつ読み込み、印刷します。 / script-p式は行末で一致し、一致しない場合（一致がfalse）、改行を削除し続けます。終わり。-e)chomp

Answer

完璧にするために、Perlソリューションは次のとおりです。perl -p -e '/)$/ || chomp'

対称の場合：スクリプトをループで囲み、1行ずつ読み込み、印刷します。 / script-p式は行末で一致し、一致しない場合（一致がfalse）、改行を削除し続けます。終わり。-e)chomp

Question 3

これを使用してください：

sed -i -z -u 's/\n/ /g' reallyBigFile.log

-z, --null-data
NUL 文字で行を区切ります。

-u、--unbufferedは、
入力ファイルから最小限のデータをロードし、出力バッファをより頻繁にフラッシュします。

Answer

これを使用してください：

sed -i -z -u 's/\n/ /g' reallyBigFile.log

-z, --null-data
NUL 文字で行を区切ります。

-u、--unbufferedは、
入力ファイルから最小限のデータをロードし、出力バッファをより頻繁にフラッシュします。

大容量ファイルで複数行式で sed を使用する場合のメモリ不足

答え1

答え2

答え3

関連情報