Problem with a Perl script that needs to remove strings, listed one per line in one file, from another file

Answer 1

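In addition to the problem you asked about, your script has a major flaw: it makes a complete pass over "foo" for every line in "remove.txt". That is very inefficient. A better approach is to read "remove.txt" once, build one long regular expression from it, and use that to edit "foo" in a single pass.

The easiest way to do this is to push the search strings into an array, then join() the array with the "|" character (regex "or") to produce a string that can be used as a regular expression.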
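A minimal sketch of the join() technique in isolation, assuming the search strings are already in an array. The helper name build_alternation and the sample strings are hypothetical. Each string goes through quotemeta so characters such as ( or . match literally, and the pieces are joined with | into one pattern that is compiled once and then tested against each line in a single pass.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of literal strings into one compiled
# regex that matches any of them.
sub build_alternation {
    my @strings = @_;
    # quotemeta escapes regex metacharacters; join('|') builds the "or".
    my $alt = join '|', map { quotemeta } @strings;
    return qr/$alt/;
}

my $re = build_alternation( 'foo', 'a.b', '(x)' );
print "$re\n";

# One pass over the input: each line is tested once against the combined
# pattern instead of once per search string.
for my $line ( 'foo bar', 'aXb', 'a.b', 'has (x) in it' ) {
    printf "%-15s %s\n", $line, ( $line =~ $re ? 'match' : 'no match' );
}
```

Because of quotemeta, "aXb" does not match: the dot in "a.b" is treated as a literal dot, not as "any character".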

Here is a script that does this and also fixes your original problem.

#! /usr/bin/perl 

use strict;
use warnings;

# first construct a regular expression containing every
# line that needs to be removed.  This is so we only have
# to run a single pass through $infile rather than one
# pass per line in $removefile.
my @remove = ();

my $removefile = 'remove.txt';
open(my $rem_fh, '<', $removefile) or die "couldn't open $removefile: $!\n";
while (<$rem_fh>) {
    chomp;
    next if /^\s*$/;              # skip blank lines
    push @remove, quotemeta($_);  # escape regex metacharacters such as ( ) . * /
}
close($rem_fh);

# choose one of the following two lines depending on
# whether you want to remove only entire lines or text
# within a line:
my $remove = '^(' . join("|",@remove) . ')$';
#my $remove = join("|",@remove);

# now remove the unwanted text from all lines in $infile
my $infile = 'foo';
system('perl','-p','-i','-e',"s/$remove//g",$infile);

# if you want to delete matching lines, try this instead:
#system('perl','-n','-i','-e',"print unless /$remove/",$infile);
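To see how the two $remove choices and the delete-lines variant differ without touching a real file, here is an in-memory sketch of the same substitutions. The sample strings and lines are hypothetical, and the work is done in-process rather than through system().

```perl
use strict;
use warnings;

# Hypothetical sample data (not from the question).
my $alt        = join '|', map { quotemeta } ( 'apple', 'pear' );
my $whole_line = qr/^($alt)$/;    # first choice: entire lines only
my $anywhere   = qr/$alt/;        # second choice: text within a line
my @lines      = ( "apple\n", "pear tree\n", "banana\n" );

my ( @emptied, @stripped, @kept );
for my $line (@lines) {
    # Anchored pattern: "apple" becomes an empty line (the newline
    # survives) and "pear tree" is untouched.
    ( my $e = $line ) =~ s/$whole_line//g;
    push @emptied, $e;

    # Unanchored pattern: the text is removed wherever it occurs.
    ( my $s = $line ) =~ s/$anywhere//g;
    push @stripped, $s;

    # The "print unless" variant: matching lines are dropped entirely.
    push @kept, $line unless $line =~ $whole_line;
}

print "emptied:\n",  @emptied;
print "stripped:\n", @stripped;
print "kept:\n",     @kept;
```

Only the unanchored pattern touches "pear tree", and only the delete-lines variant removes the "apple" line instead of leaving an empty line behind.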

Answer 2

When building the command with qq(), you need to escape the regex metacharacters ( and ) in $bad_string:

            my $bad_string = "\\($line\\)";
            system( qq( perl -p -i -e 's/$bad_string//g' foo ) );
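An alternative to escaping each parenthesis by hand is \Q...\E, the in-pattern form of quotemeta, which escapes every metacharacter in the text it encloses. The sketch below compares the two in-process rather than through system(), assuming, as the snippet above suggests, that the text to remove is $line wrapped in parentheses. The values of $line and $text are hypothetical.

```perl
use strict;
use warnings;

my $line = 'foo';                          # hypothetical entry from remove.txt
my $text = "keep (foo) and foo here\n";    # hypothetical line from foo

# Hand-escaped, as in the answer: "\\(" yields the regex \( (a literal paren).
my $bad_string = "\\($line\\)";
( my $manual = $text ) =~ s/$bad_string//g;

# \Q...\E escapes every metacharacter in between, including the
# parentheses and anything special inside $line itself.
( my $quoted = $text ) =~ s/\Q($line)\E//g;

# Unescaped, ( and ) form a capture group, so every "foo" is removed
# and the parentheses are left behind.
( my $unescaped = $text ) =~ s/($line)//g;

print $manual, $quoted, $unescaped;
```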

Answer 3

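There are three parts to your question:

1. Build an "exclude list". "Special" characters in the exclude list can cause problems.
2. Read the file and exclude each line that "matches".
3. Write out a new file.

There are also a few things going on in your code that I'd call "bad style":

- Opening lexical filehandles with three-argument open is good style.
- Calling perl via system from inside perl is inefficient.
- Interpolating variables into quoted command strings is messy and best avoided.
- You reprocess the output file over and over, which is very inefficient. (Remember: disk I/O is about the slowest thing your system does.)

With that in mind, this is how I'd do it: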

#!/usr/bin/env perl
use strict;
use warnings;

my $infile = "remove.txt";
open( my $pattern_fh, '<', $infile ) or die "cannot open $infile: $!";

#read the exclude list, dropping blank lines and trailing newlines
my @patterns = grep { /\S/ } <$pattern_fh>;
chomp(@patterns);
close($pattern_fh);

#quotemeta escapes meta characters that'll break your pattern matching.
my $regex = join( '|', map {quotemeta} @patterns );
#compile the regex
$regex = qr/^($regex)$/;    #whole lines

print "Using regular expression: $regex\n"; 

open( my $input_fh,  '<', "foo" )     or die $!;
open( my $output_fh, '>', "foo.new" ) or die $!;

#tell print where to print by default. 
#could instead print {$output_fh} $_; 
select($output_fh);
while (<$input_fh>) {
    print unless m/$regex/;
}
close($input_fh);
close($output_fh);

#rename/copy if it worked

(Note: not thoroughly tested. If you can provide sample data, I'll test and update as needed.)
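The script above leaves the "rename/copy if it worked" step open. One way to fill it in is sketched below, assuming it is acceptable to overwrite foo once foo.new has been fully written and closed. The sub name filter_file is hypothetical, and rename() only replaces the file atomically when both paths are on the same filesystem.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $file to "$file.new" without the lines that
# match $regex, then replace $file only if every step succeeded.
sub filter_file {
    my ( $file, $regex ) = @_;
    my $tmp = "$file.new";

    open( my $in,  '<', $file ) or die "cannot open $file: $!";
    open( my $out, '>', $tmp )  or die "cannot open $tmp: $!";
    while ( my $l = <$in> ) {
        print {$out} $l unless $l =~ $regex;
    }
    close($in);

    # close() on a write handle reports buffered write errors (e.g. disk full),
    # so check it before replacing the original.
    close($out) or die "cannot write $tmp: $!";

    rename( $tmp, $file ) or die "cannot rename $tmp to $file: $!";
}
```

With the compiled $regex from the script above, the call would be filter_file('foo', $regex).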
