Edited 2017/01/30

Question 1

I would use perl, with something like this:

perl -MFile::Find -MClone=clone -lne '
  # parse the strings.txt input, here looking for the sequences of
  # 0 or more characters (.*?) in between two " characters
  for (/"(.*?)"/g) {
    # @needle is an array of associative arrays whose keys
    # are the "strings" for each line.
    $needle[$n]{$_} = undef;
  }
  $n++;

  END{
    sub wanted {
      return unless -f; # only regular files
      # per-file working copy: strings are deleted from it as they are found
      my $needle_clone = clone(\@needle);
      # lexical filehandle and line variable, so the read loop does not
      # overwrite the $_ (current file name) set by File::Find
      if (open my $fh, "<", $_) {
        LINE: while (my $line = <$fh>) {
          # read the file line by line
          for (my $i = 0; $i < $n; $i++) {
            for my $s (keys %{$needle_clone->[$i]}) {
              if (index($line, $s) >= 0) {
                # if the string is found, we delete it from the associative
                # array.
                delete $needle_clone->[$i]{$s};
                unless (%{$needle_clone->[$i]}) {
                  # if the associative array is empty, that means we have
                  # found all the strings for that $i, that means we can
                  # stop processing, and the file matches
                  print $File::Find::name;
                  last LINE;
                }
              }
            }
          }
        }
        close $fh;
      }
    }
    find(\&wanted, ".")
  }' /path/to/strings.txt

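This is meant to minimize the number of string searches: once a string has been found in a file, it is deleted from that file's working copy of the list and is not searched for again.

Here the files are processed line by line. If the files are very small, processing each one as a whole could make things a bit simpler and might improve performance.

The list file is expected to be in this format: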

 "surveillance data" "surveillance technology" "cctv camera"
 "social media" "surveillance techniques" "enforcement agencies"
 "social control" "surveillance camera" "social security"
 "surveillance data" "security guards" "social networking"
 "surveillance mechanisms" "cctv surveillance" "contemporary surveillance"

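That is, each line contains a number of strings (not necessarily 3) enclosed in double quotes. The quoted strings themselves cannot contain double-quote characters, and the double quotes are not part of the text being searched for. So if the list file contains: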

"A" "B"
"1" "2" "3"

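it reports the paths of all regular files in the current directory and below that contain either

both A and B
or (inclusive or) all of 1, 2 and 3

anywhere in them (the strings of a group do not have to be on the same line).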
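If the Clone module is not installed, the same rule (a file matches when every quoted string of at least one list line occurs somewhere in it) can be sketched in plain bash. This is a swapped-in, less efficient technique: it runs one grep -qF per string per file instead of perl's single pass over each file. The function names list_strings, file_matches and report_matches are hypothetical, and the sketch assumes bash 3.2 or later and a POSIX grep.

```shell
#!/usr/bin/env bash
# Sketch (an alternative to the perl command, not the same implementation):
# a file matches when every quoted string of at least one list line occurs
# somewhere in it. Assumes bash 3.2+ and a POSIX grep (-q, -F, -e).

# Print the "..." strings of one list line, one per line, without the quotes
# (same idea as the perl regex /"(.*?)"/g).
list_strings() {
  local rest=$1 re='"([^"]*)"(.*)'
  while [[ $rest =~ $re ]]; do
    printf '%s\n' "${BASH_REMATCH[1]}"
    rest=${BASH_REMATCH[2]}
  done
}

# Succeed if FILE contains all strings of at least one line of LIST.
# Usage: file_matches LIST FILE
file_matches() {
  local list=$1 file=$2 line s found
  while IFS= read -r line; do
    found=0
    while IFS= read -r s; do
      found=1
      # one grep per string; stop at the first string that is missing
      grep -qF -e "$s" -- "$file" || { found=0; break; }
    done < <(list_strings "$line")
    (( found )) && return 0
  done < "$list"
  return 1
}

# Print every regular file under DIR (default: .) that matches LIST.
# Usage: report_matches LIST [DIR]
report_matches() {
  local list=$1 dir=${2:-.} f
  while IFS= read -r -d '' f; do
    file_matches "$list" "$f" && printf '%s\n' "$f"
  done < <(find "$dir" -type f -print0)
}
```

For example, report_matches /path/to/strings.txt . should list the same paths as the perl command, possibly in a different order. One difference: find -type f skips symbolic links to files, whereas perl's -f test follows them.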

Question 2

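Since agrep does not seem to be available on your system, here is an alternative based on sed and awk that applies grep in AND mode, with the patterns read from a local file.

PS: Since you are on OS X, I am not sure whether your awk version supports the following usage.

awk can simulate grep with several patterns in AND mode (all patterns must match the same line):
awk '/pattern1/ && /pattern2/ && /pattern3/'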

So you can convert your pattern file from this:

$ cat ./tmp/d1.txt
"surveillance data" "surveillance technology" "cctv camera"
"social media" "surveillance techniques" "enforcement agencies"
"social control" "surveillance camera" "social security"
"surveillance data" "security guards" "social networking"
"surveillance mechanisms" "cctv surveillance" "contemporary surveillance"

to this:

$ sed 's/" "/\/ \&\& \//g; s/^"/\//g; s/"$/\//g' ./tmp/d1.txt
/surveillance data/ && /surveillance technology/ && /cctv camera/
/social media/ && /surveillance techniques/ && /enforcement agencies/
/social control/ && /surveillance camera/ && /social security/
/surveillance data/ && /security guards/ && /social networking/
/surveillance mechanisms/ && /cctv surveillance/ && /contemporary surveillance/

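PS: You can then redirect the output to another file with >anotherfile, or use sed's -i option to make the change in place in the same pattern file (GNU sed accepts a plain -i, while BSD/macOS sed requires a backup-suffix argument, e.g. sed -i '' ...).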

Then feed awk with the awk-style patterns from the converted pattern file:

$ while IFS= read -r line;do awk "$line" *.txt;done<./tmp/d1.txt #d1.txt = my test pattern file

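Alternatively, you can leave the raw pattern file unchanged and convert its patterns on the fly by applying sed to each line of the source pattern file, like this: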

while IFS= read -r line;do 
  line=$(sed 's/" "/\/ \&\& \//g; s/^"/\//g; s/"$/\//g' <<<"$line")
  awk "$line" *.txt
done <./tmp/d1.txt

Or as a one-liner:

$ while IFS= read -r line;do line=$(sed 's/" "/\/ \&\& \//g; s/^"/\//g; s/"$/\//g' <<<"$line"); awk "$line" *.txt;done <./tmp/d1.txt

The commands above return the correct AND results on my test files:

$ cat d2.txt
This guys over there have the required surveillance technology to do the job.
The other guys not only have efficient surveillance technology, but they also gather surveillance data by one cctv camera.

$ cat d3.txt
All surveillance data are locked.
All surveillance data are locked and guarded by security guards.
There are several surveillance mechanisms (i.e cctv surveillance, contemporary surveillance, etv)

Results:

$ while IFS= read -r line;do awk "$line" *.txt;done<./tmp/d1.txt
#or while IFS= read -r line;do line=$(sed 's/" "/\/ \&\& \//g; s/^"/\//g; s/"$/\//g' <<<"$line"); awk "$line" *.txt;done <./tmp/d1.txt
The other guys not only have efficient surveillance technology, but they also gather surveillance data by one cctv camera.
There are several surveillance mechanisms (i.e cctv surveillance, contemporary surveillance, etv)

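Update:
The awk solution above prints the matching lines of the .txt files.
To print the file names instead of the contents, use this awk where needed (it prints a file name once per matching line, so append | sort -u after the loop if you want each file listed only once):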

awk "$line""{print FILENAME}" *.txt
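The loop above starts one awk process per pattern line, and the sed conversion turns every search string into a regular expression, so characters such as . or / inside a string are interpreted by awk (the dot in "i.e", for example, matches any character). As a swapped-in technique, a single awk program can read the quoted pattern file first and then test each line of the data files with index(), which matches literally. Like the approach above, it requires all strings of a pattern line to appear on the same line of a data file. The function name same_line_and is hypothetical; it only uses POSIX awk features (match, substr, index, FNR/NR), so it should also work with the awk on macOS, but treat that as an assumption.

```shell
#!/usr/bin/env bash
# Sketch: print "file: line" for every data line that contains all quoted
# strings of at least one pattern-file line. Strings are matched literally
# with index(), not as regular expressions.
# Usage: same_line_and PATTERNFILE FILE...
# Assumes a POSIX awk and a non-empty pattern file (FNR == NR is only true
# while awk is reading the first file).
same_line_and() {
  awk '
    FNR == NR {                          # first file: the pattern list
      n = 0
      s = $0
      while (match(s, /"[^"]*"/)) {      # each "..." group on the line
        pat[NR, ++n] = substr(s, RSTART + 1, RLENGTH - 2)
        s = substr(s, RSTART + RLENGTH)
      }
      cnt[NR] = n
      npat = NR
      next
    }
    {                                    # remaining files: the data
      for (i = 1; i <= npat; i++) {
        if (cnt[i] == 0) continue
        ok = 1
        for (j = 1; j <= cnt[i]; j++)
          if (index($0, pat[i, j]) == 0) { ok = 0; break }
        if (ok) { print FILENAME ": " $0; next }
      }
    }
  ' "$@"
}
```

For example, same_line_and ./tmp/d1.txt *.txt, run against the original quoted d1.txt rather than the sed-converted one.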

Question 3

This problem is a bit tricky, but it can be solved like this:

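tr -d '"' < patterns | while read -r one two three four five six
  do grep -lF "$one $two" *files* | xargs grep -lF "$three $four" | xargs grep -lF "$five $six"
done | sort -u

This assumes that the pattern file has exactly six words per line (three patterns of two words each); tr -d '"' strips the double quotes first so that they do not become part of the search strings. The AND logic is implemented by chaining three consecutive grep filters. This is not particularly efficient; an awk solution might be faster. Also note that xargs splits its input on whitespace, so file names containing spaces will break the pipeline.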
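The three-grep pipeline can be generalized to any number of quoted strings per line by narrowing a list of candidate files one string at a time, which is the same successive-filter idea. This is a sketch assuming bash 3.2+ and a grep with -l, -F and -e; keeping the candidates in a bash array (read one name per line) avoids the xargs whitespace problem, although file names containing newlines are still not handled. The function name narrow_by_line is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: for one pattern line ("a" "b" ...), print those FILE arguments that
# contain every quoted string, anywhere in the file.
# Usage: narrow_by_line LINE FILE...
narrow_by_line() {
  local rest=$1 re='"([^"]*)"(.*)' s f matched_any=0
  shift
  local -a cand keep
  cand=("$@")
  while [[ $rest =~ $re ]]; do
    s=${BASH_REMATCH[1]}
    rest=${BASH_REMATCH[2]}
    matched_any=1
    (( ${#cand[@]} )) || return 0        # no candidates left to narrow
    keep=()
    # keep only the candidates that contain this string (grep -l order)
    while IFS= read -r f; do keep+=("$f"); done \
      < <(grep -lF -e "$s" -- "${cand[@]}")
    cand=("${keep[@]}")
  done
  (( matched_any )) && (( ${#cand[@]} )) && printf '%s\n' "${cand[@]}"
  return 0
}
```

To run it for every line of the quoted pattern file: while IFS= read -r line; do narrow_by_line "$line" *.txt; done < patterns | sort -u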

Question 4

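Here is another method that seems to work in my tests.

To avoid grepping the strings file itself, I copied the strings-file data into a file named d1.txt and moved it to a separate directory (./tmp).

Then I inserted a semicolon between the search terms of this strings file (d1.txt in my case) with this command (GNU sed syntax): sed -i 's/" "/";"/g' ./tmp/d1.txt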

$ cat ./tmp/d1.txt
"surveillance data" "surveillance technology" "cctv camera"
"social media" "surveillance techniques" "enforcement agencies"
"social control" "surveillance camera" "social security"
"surveillance data" "security guards" "social networking"
"surveillance mechanisms" "cctv surveillance" "contemporary surveillance"
$ sed -i 's/" "/";"/g' ./tmp/d1.txt
$ cat ./tmp/d1.txt
"surveillance data";"surveillance technology";"cctv camera"
"social media";"surveillance techniques";"enforcement agencies"
"social control";"surveillance camera";"social security"
"surveillance data";"security guards";"social networking"
"surveillance mechanisms";"cctv surveillance";"contemporary surveillance"

Next, I removed the double quotes with sed 's/"//g' ./tmp/d1.txt (run with -i in the transcript below). PS: This may not actually be necessary, but I removed the double quotes for my tests.

$ sed -i 's/"//g' ./tmp/d1.txt && cat ./tmp/d1.txt
surveillance data;surveillance technology;cctv camera
social media;surveillance techniques;enforcement agencies
social control;surveillance camera;social security
surveillance data;security guards;social networking
surveillance mechanisms;cctv surveillance;contemporary surveillance
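The two in-place sed edits above can also be combined into one sed call that writes to standard output, which leaves the original strings file untouched and sidesteps the GNU/BSD difference in sed -i. A minimal sketch; the function name to_agrep_patterns is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: turn lines like  "a b" "c d"  into  a b;c d  (agrep AND syntax)
# without modifying the input. Reads stdin when no FILE is given.
# Usage: to_agrep_patterns [FILE...]
to_agrep_patterns() {
  # 1) each  " "  separator between two quoted strings becomes ";"
  # 2) the remaining outer double quotes are removed
  sed 's/" "/;/g; s/"//g' "$@"
}
```

For example: to_agrep_patterns ./tmp/d1.txt | while IFS= read -r line; do agrep "$line" *; done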

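Now you can grep all the files in the current directory with agrep, a program designed to provide multi-pattern grep with AND operations.

agrep requires multiple patterns to be separated by semicolons (;) for them to be evaluated with AND.

For my tests I created two sample files with the following contents: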

$ cat d2.txt
This guys over there have the required surveillance technology to do the job.

The other guys not only have efficient surveillance technology, but they also gather surveillance data by one cctv camera.

$ cat d3.txt
All surveillance data are locked.
All surveillance data are locked and guarded by security guards.
There are several surveillance mechanisms (i.e cctv surveillance, contemporary surveillance, etv)

Running agrep in the current directory returns the correct lines (with AND) along with the file names:

$ while IFS= read -r line;do agrep "$line" *;done<./tmp/d1.txt
d2.txt: The other guys not only have efficient surveillance technology, but they also gather surveillance data by one cctv camera.
d3.txt: There are several surveillance mechanisms (i.e cctv surveillance, contemporary surveillance, etv)
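If agrep is not installed, a rough stand-in for this particular use (one semicolon-separated AND pattern, matched literally, printing "file: line") can be sketched with awk. It does not reproduce agrep's other features such as approximate matching. The function name and_grep and the environment variable AND_GREP_PAT are hypothetical; the pattern is passed through ENVIRON rather than awk -v because -v interprets backslash escapes.

```shell
#!/usr/bin/env bash
# Sketch: print "file: line" for lines containing every ;-separated term.
# Terms are matched literally with index().
# Usage: and_grep 'term1;term2;...' FILE...
and_grep() {
  local pattern=$1
  shift
  # pass the pattern via the environment (no backslash-escape processing)
  AND_GREP_PAT=$pattern awk '
    BEGIN { n = split(ENVIRON["AND_GREP_PAT"], term, ";") }
    {
      for (i = 1; i <= n; i++)
        if (index($0, term[i]) == 0) next   # a term is missing: skip line
      print FILENAME ": " $0
    }
  ' "$@"
}
```

In the loop above it would take agrep's place: while IFS= read -r line; do and_grep "$line" *.txt; done < ./tmp/d1.txt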

Edited 2017/01/29
