I have several text files like the following.
sample1.tsv
## index of replication (iRep) - thresholds: min cov. = 5, min wins. = 0.98, min r^2 = 0.9, max fragments/Mbp = 175, GC correction min r^2 = 0.0
# genome ./bam/18097D-02-01.sam.sorted.sam
18097D-02-01_bin.11.fa 1.295179372
18097D-02-01_bin.13.fa 1.284880274
18097D-02-01_bin.15.fa 1.339609918
#
## un-filtered index of replication (iRep)
# genome ./bam/18097D-02-01.sam.sorted.sam
18097D-02-01_bin.11.fa 1.295179372
18097D-02-01_bin.13.fa 1.284880274
18097D-02-01_bin.15.fa 1.339609918
#
## raw index of replication (no GC bias correction)
# genome ./bam/18097D-02-01.sam.sorted.sam
18097D-02-01_bin.11.fa 1.298934455
18097D-02-01_bin.13.fa 1.2885746
#
sample2.tsv
## index of replication (iRep) - thresholds: min cov. = 5, min wins. = 0.98, min r^2 = 0.9, max fragments/Mbp = 175, GC correction min r^2 = 0.0
# genome ./bam/18097D-02-02.sam.sorted.sam
18097D-02-02_bin.11.fa 1.59665286
18097D-02-02_bin.13.fa 1.332990306
18097D-02-02_bin.14.fa 1.499196606
18097D-02-02_bin.6.fa 1.323465715
18097D-02-02_bin.9.fa 1.583302299
#
## un-filtered index of replication (iRep)
# genome ./bam/18097D-02-02.sam.sorted.sam
18097D-02-02_bin.11.fa 1.59665286
18097D-02-02_bin.13.fa 1.332990306
18097D-02-02_bin.14.fa 1.499196606
18097D-02-02_bin.6.fa 1.323465715
18097D-02-02_bin.9.fa 1.583302299
#
## raw index of replication (no GC bias correction)
# genome ./bam/18097D-02-02.sam.sorted.sam
18097D-02-02_bin.11.fa 1.603339021
18097D-02-02_bin.13.fa 1.366124796
18097D-02-02_bin.14.fa 1.502052999
18097D-02-02_bin.6.fa 1.324573575
18097D-02-02_bin.9.fa 1.618136032
#
I want to extract everything between the first ## and the second ## from every file. That is, I want this part of each file:
## index of replication (iRep) - thresholds: min cov. = 5, min wins. = 0.98, min r^2 = 0.9, max fragments/Mbp = 175, GC correction min r^2 = 0.0
# genome ./bam/18097D-02-01.sam.sorted.sam
**18097D-02-01_bin.11.fa 1.295179372
18097D-02-01_bin.13.fa 1.284880274
18097D-02-01_bin.15.fa 1.339609918**
#
## un-filtered index of replication (iRep)
## index of replication (iRep) - thresholds: min cov. = 5, min wins. = 0.98, min r^2 = 0.9, max fragments/Mbp = 175, GC correction min r^2 = 0.0
# genome ./bam/18097D-02-02.sam.sorted.sam
**18097D-02-02_bin.11.fa 1.59665286
18097D-02-02_bin.13.fa 1.332990306
18097D-02-02_bin.14.fa 1.499196606
18097D-02-02_bin.6.fa 1.323465715
18097D-02-02_bin.9.fa 1.583302299**
#
## un-filtered index of replication (iRep)
I tried this, but it produces no output:
sed -n '/##=/{s/#=//;s/\S=.*//;p}' *.tsv > ../test.tsv
What I actually want is:
18097D-02-01_bin.11.fa 1.295179372
18097D-02-01_bin.13.fa 1.284880274
18097D-02-01_bin.15.fa 1.339609918
18097D-02-02_bin.11.fa 1.59665286
18097D-02-02_bin.13.fa 1.332990306
18097D-02-02_bin.14.fa 1.499196606
18097D-02-02_bin.6.fa 1.323465715
18097D-02-02_bin.9.fa 1.583302299
Thanks.
Answer 1
Using awk:
awk 'FNR==1{p=0}
FNR==1 && NR>1 {print ""}
$0 ~ /^##/ {p++}
p==1 && $0 !~ /^#/
' sample*
Output:
18097D-02-01_bin.11.fa 1.295179372
18097D-02-01_bin.13.fa 1.284880274
18097D-02-01_bin.15.fa 1.339609918

18097D-02-02_bin.11.fa 1.59665286
18097D-02-02_bin.13.fa 1.332990306
18097D-02-02_bin.14.fa 1.499196606
18097D-02-02_bin.6.fa 1.323465715
18097D-02-02_bin.9.fa 1.583302299
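Explanation:
FNR==1{p=0}
Resets the counter p to 0 at the start of each new file (FNR is the line number within the current file).
FNR==1 && NR>1 {print ""}
Prints a blank line before every file except the first.
$0 ~ /^##/ {p++}
Increments the counter whenever a line starts with ##.
p==1 && $0 !~ /^#/
Prints the line if the counter is 1 (that is, the line lies between the first and second ##) and the line does not start with #.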
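To check the counter logic without touching real data, here is a minimal sketch that runs the same awk program on two tiny stand-in files. The file names (demo1.tsv, demo2.tsv) and their contents are made up for illustration; only the awk program comes from the answer.

```shell
# Build two tiny stand-in files (made-up names and values) in a temp directory.
tmp=$(mktemp -d)
printf '%s\n' '## filtered' '# genome a.sam' 'binA 1.1' 'binB 1.2' '#' \
              '## un-filtered' '# genome a.sam' 'binA 9.9' '#' > "$tmp/demo1.tsv"
printf '%s\n' '## filtered' '# genome b.sam' 'binC 1.3' '#' \
              '## un-filtered' '# genome b.sam' 'binC 9.9' '#' > "$tmp/demo2.tsv"

# Same program as in the answer: p counts the "##" headers seen in the current file.
result=$(awk 'FNR==1{p=0}
FNR==1 && NR>1 {print ""}
$0 ~ /^##/ {p++}
p==1 && $0 !~ /^#/
' "$tmp"/demo*.tsv)

printf '%s\n' "$result"
rm -r "$tmp"
```

Only the rows under the first ## header of each file should appear, with a blank line separating the two files. If the blank separator is not wanted, dropping the FNR==1 && NR>1 rule should be enough.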
Answer 2
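#!/bin/bash
# Directory holding the .tsv files, and the file that collects the result.
sample_dir="/C/Users/testuser/Desktop/sample"
out_file="/C/Users/testuser/Desktop/sample/output.tsv"
for file in "$sample_dir"/*
do
    # The output file lives in the same directory, so skip it on reruns.
    [[ "$file" == "$out_file" ]] && continue
    count=0
    while IFS= read -r line; do
        # "#"* also matches lines that start with "##".
        if [[ "$line" == "#"* ]]; then
            count=$((count+1))
            # The 4th comment line is the second "##" header: end this file's block.
            if [[ $count -ge 4 ]]; then
                echo "" >> "$out_file"
                continue 2
            fi
            continue
        else
            printf '%s\n' "$line" >> "$out_file"
        fi
    done < "$file"
done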
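The loop above stops at the fourth line beginning with #, which fits these files because exactly two other comment lines come before the second ## header. As a hedged variant, the sketch below counts only ## headers, so it should not depend on how many single-# lines a file contains. The function name extract_first_block and the demo file are hypothetical, not part of the answer.

```shell
# extract_first_block is a hypothetical helper: it prints the data rows that
# sit between the first and second "##" header of the file given as $1.
extract_first_block() {
    headers=0
    while IFS= read -r line; do
        case $line in
            '##'*)
                headers=$((headers + 1))
                # Second "##" header reached: the first block is finished.
                if [ "$headers" -ge 2 ]; then break; fi
                ;;
            '#'*)
                : # any other comment line is skipped
                ;;
            *)
                if [ "$headers" -eq 1 ]; then printf '%s\n' "$line"; fi
                ;;
        esac
    done < "$1"
}

# Made-up demo file with an extra comment line in the first block.
tmp=$(mktemp -d)
printf '%s\n' '## filtered' '# genome a.sam' '# extra note' 'binA 1.1' '#' \
              '## un-filtered' 'binA 9.9' > "$tmp/demo.tsv"
result=$(extract_first_block "$tmp/demo.tsv")
printf '%s\n' "$result"
rm -r "$tmp"
```

Calling the function once per file and redirecting the whole loop to a single output file would give the same layout as the script above.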