Why is egrep [wW][oO][rR][dD] faster than grep -i word?

Answer

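grep -i 'a' is equivalent to grep '[Aa]' in an ASCII-only locale. In a Unicode locale, character equivalences and conversions can be complex, so grep may have to do extra work to determine which characters are equivalent. The relevant locale setting is LC_CTYPE, which determines how bytes are interpreted as characters.

In my experience, GNU grep can be slow when invoked in a UTF-8 locale. If you know that you are searching for ASCII characters only, invoking it in an ASCII-only locale may be faster. I expect that

time LC_ALL=C grep -iq "thats" testfile
time LC_ALL=C egrep -q "[tT][hH][aA][tT][sS]" testfile

would produce indistinguishable timings.

That said, I can't reproduce your result with GNU grep on Debian jessie (but you didn't specify your test file). If I set an ASCII locale (LC_ALL=C), grep -i is faster. The effect depends on the exact nature of the string; for example, a string with repeated characters reduces performance (which is to be expected).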
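
The claim that the two forms match the same lines in an ASCII-only locale can be checked before timing anything. The following is a minimal sketch, not part of the original answer: the sample file contents and the variable names (tmpfile, count_i, count_class) are made up for illustration, and it uses grep -E in place of egrep.

```shell
# Build a small sample file (hypothetical contents, for illustration only).
tmpfile=$(mktemp)
printf '%s\n' 'That is one line' 'THATS another' 'nothing here' 'xthatsx' > "$tmpfile"

# Count matching lines with each form of the pattern, forcing the C locale
# so that bytes are treated as plain ASCII characters.
count_i=$(LC_ALL=C grep -ic 'thats' "$tmpfile")
count_class=$(LC_ALL=C grep -Ec '[tT][hH][aA][tT][sS]' "$tmpfile")

echo "grep -i: $count_i, bracket form: $count_class"
rm -f "$tmpfile"
```

If the two counts agree on your real test file as well, any timing difference comes from how grep processes the pattern, not from the two commands matching different lines.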

Answer

Out of curiosity, I tested this on an Arch Linux system:

$ uname -r
4.4.5-1-ARCH
$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           3.9G  720K  3.9G   1% /tmp
$ dd if=/dev/urandom bs=1M count=1K | base64 > foo
$ df -h .                                         
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           3.9G  1.4G  2.6G  35% /tmp
$ for i in {1..100}; do /usr/bin/time -f '%e' -ao grep.log grep -iq foobar foo; done
$ for i in {1..100}; do /usr/bin/time -f '%e' -ao egrep.log egrep -q '[fF][oO][oO][bB][aA][rR]' foo; done

$ grep --version
grep (GNU grep) 2.23
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.

And some statistics, courtesy of "Is there a way to get the min, max, median, and average of a list of numbers in a single command?":

$ R -q -e "x <- read.csv('grep.log', header = F); summary(x); sd(x[ , 1])"
> x <- read.csv('grep.log', header = F); summary(x); sd(x[ , 1])
       V1       
 Min.   :1.330  
 1st Qu.:1.347  
 Median :1.360  
 Mean   :1.362  
 3rd Qu.:1.370  
 Max.   :1.440  
[1] 0.02322725
> 
> 
$ R -q -e "x <- read.csv('egrep.log', header = F); summary(x); sd(x[ , 1])"
> x <- read.csv('egrep.log', header = F); summary(x); sd(x[ , 1])
       V1       
 Min.   :1.330  
 1st Qu.:1.340  
 Median :1.360  
 Mean   :1.365  
 3rd Qu.:1.380  
 Max.   :1.430  
[1] 0.02320288
> 
>

My locale is en_GB.utf8, and even so the timings are nearly indistinguishable.
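
If R is not installed, roughly the same summary can be produced with sort and awk. This is a sketch under assumptions, not what the answer above used: the five sample timings are invented for illustration, and in practice the input would be a log such as grep.log with one elapsed time per line.

```shell
# Hypothetical sample of elapsed times; replace with grep.log or egrep.log.
log=$(mktemp)
printf '%s\n' 1.33 1.36 1.44 1.35 1.37 > "$log"

# Sort numerically, then compute min, median, mean and max in one awk pass.
# LC_ALL=C keeps "." as the decimal separator regardless of the user's locale.
stats=$(LC_ALL=C sort -n "$log" | LC_ALL=C awk '
  { v[NR] = $1; sum += $1 }
  END {
    # Middle value for an odd count, average of the two middle values otherwise.
    median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
    printf "min=%.3f median=%.3f mean=%.3f max=%.3f", v[1], median, sum / NR, v[NR]
  }')
echo "$stats"
rm -f "$log"
```

Running it once per log file gives two lines that can be compared side by side, much like the R summaries above, though without the quartiles and standard deviation.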
