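I need to parse drug names out of Medline abstracts. I tried doing this with grep -wf, then importing the output of grep -owf and combining the two with paste, but the outputs do not line up, because grep -owf produces one output line for each match, even when several matches are on the same line.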
Pattern file:
DrugA
DrugB
DrugC
DrugD
File to parse:
In our study, DrugA and DrugB were found to be effective. DrugA was more effective than DrugB.
In our study, DrugC was found to be effective
In our study, DrugX was found to be effective
Desired output:
DrugA In our study, DrugA and DrugB were found to be effective. DrugA was more effective than DrugB.
DrugB In our study, DrugA and DrugB were found to be effective. DrugA was more effective than DrugB.
DrugC In our study, DrugC was found to be effective
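The mismatch described in the question can be reproduced on the sample data. The sketch below is only an illustration: the temporary file names are made up for the example, and it simply counts what each grep invocation emits.

```shell
#!/bin/sh
# Illustration only: the temporary files below are hypothetical stand-ins
# for the pattern file and the file to parse shown above.
tmp=$(mktemp -d)
printf '%s\n' DrugA DrugB DrugC DrugD > "$tmp/patterns"
cat > "$tmp/data" <<'EOF'
In our study, DrugA and DrugB were found to be effective. DrugA was more effective than DrugB.
In our study, DrugC was found to be effective
In our study, DrugX was found to be effective
EOF

# grep -wf prints each matching line once, however many names it contains.
lines=$(grep -wf "$tmp/patterns" "$tmp/data" | grep -c '')
# grep -owf prints every individual match on a line of its own.
matches=$(grep -owf "$tmp/patterns" "$tmp/data" | grep -c '')

echo "matching lines: $lines"
echo "individual matches: $matches"
rm -rf "$tmp"
```

On the sample data this reports 2 matching lines but 5 individual matches, so paste cannot pair the two outputs row by row.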
Answer 1
Maybe an awk approach?
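awk '
# First file (pattern_file.txt): store each drug name and track the longest one.
NR == FNR {
    a[$0] = 1
    n = length($0)
    w = n > w ? n : w
    next
}
# Second file (data_file.txt): print every stored name that matches the line,
# left-justified to the width of the longest name.
# Note: $0 ~ i treats the name as a regex, so it also matches inside longer words.
{
    for (i in a)
        if ($0 ~ i)
            printf "%-*s %s\n", w, i, $0
}
' pattern_file.txt data_file.txt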
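The awk answer above treats each name as a regular expression, so it also matches inside longer words, and for (i in a) visits the names in an unspecified order. A minimal sketch of a variant follows, assuming names should match as whole words (roughly as with grep -w), be compared as fixed strings, and be reported in pattern-file order with a tab separator. The names tag_lines and has_word and the temporary sample files are hypothetical, introduced only for this example.

```shell
#!/bin/sh
# tag_lines PATTERN_FILE DATA_FILE  (hypothetical helper)
tag_lines() {
    awk '
    # Return 1 if word occurs in line with no letter, digit or underscore
    # directly before or after it.
    function has_word(line, word,    s, start, p, at, before, after) {
        s = " " line " "
        start = 1
        while ((p = index(substr(s, start), word)) > 0) {
            at = start + p - 1
            before = substr(s, at - 1, 1)
            after = substr(s, at + length(word), 1)
            if (before !~ /[A-Za-z0-9_]/ && after !~ /[A-Za-z0-9_]/)
                return 1
            start = at + 1
        }
        return 0
    }
    # First file: keep the names in the order they were listed.
    NR == FNR { if ($0 != "") name[++n] = $0; next }
    # Second file: emit name, tab, line for every name found in the line.
    {
        for (k = 1; k <= n; k++)
            if (has_word($0, name[k]))
                printf "%s\t%s\n", name[k], $0
    }
    ' "$1" "$2"
}

# Sample run on throwaway copies of the files from the question.
tmp=$(mktemp -d)
printf '%s\n' DrugA DrugB DrugC DrugD > "$tmp/pattern_file.txt"
printf '%s\n' \
    'In our study, DrugA and DrugB were found to be effective. DrugA was more effective than DrugB.' \
    'In our study, DrugC was found to be effective' \
    'In our study, DrugX was found to be effective' > "$tmp/data_file.txt"
tag_lines "$tmp/pattern_file.txt" "$tmp/data_file.txt"
rm -rf "$tmp"
```

Because the names are compared with index() rather than as regular expressions, characters such as . or + in a drug name need no escaping in this variant.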
Answer 2
Not strictly grep on its own, but the following works:
while IFS= read -r pattern; do
grep "$pattern" input | awk -v drug="$pattern" 'BEGIN {OFS="\t"} { print drug,$0}'
done < "patterns"
Answer 3
A sed approach:
sed 's|.*|/&/{h;s/^/&\\t/p;g}|' pattern_file | sed -nf - input
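The first sed in this pipeline does not touch the data at all: it rewrites each pattern into a small sed command, and the second sed then runs those commands on the input. The sketch below only prints the generated script for two sample names so that its shape is visible. The function name make_sed_script is hypothetical, and GNU sed is assumed for the \t escape in the generated commands and for reading the script from standard input with -f -.

```shell
#!/bin/sh
# make_sed_script (hypothetical helper): turn each pattern read from
# standard input into one sed command, using the same substitution as
# the one-liner above.
make_sed_script() {
    sed 's|.*|/&/{h;s/^/&\\t/p;g}|'
}

printf '%s\n' DrugA DrugB | make_sed_script
# What each generated command does for lines matching the name:
#   h             save the original line in the hold space
#   s/^/NAME\t/p  prepend the name and a tab, then print the result
#   g             restore the original line for the next command
```

For DrugA the generated command is /DrugA/{h;s/^/DrugA\t/p;g}. Names that contain / or regular-expression metacharacters would need escaping before being fed through this step.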