Extracting information from a hard .log file

I have a large .log file containing a lot of information, but I want to extract only a small part of it and save it to separate output files.

Some excerpts from the .log file:

.....
New Water Solv 104: solv=  1.635

Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb


Water:   1 AtId: 3021 ResId: 316    OH2n: -8.922    OH2s: -6.900    CRY: -0.640 ENTR: -0.321    SOLV:  0.000    DG:  6.636  CLASS: 1
Water:   2 AtId: 3013 ResId: 308    OH2n: -8.331    OH2s: -7.364    CRY: -0.885 ENTR: -0.321    SOLV:  0.000    DG:  6.453  CLASS: 1
Water:   3 AtId: 3009 ResId: 304    OH2n: -7.424    OH2s: -7.321    CRY:  5.000 ENTR: -0.036    SOLV:  0.577    DG:  5.450  CLASS: 1
Water:   4 AtId: 3064 ResId: 359    OH2n: -9.779    OH2s: -8.778    CRY: -1.187 ENTR: -0.804    SOLV:  0.000    DG:  3.279  CLASS: 1
Water: 103 AtId: 2996 ResId: 291    OH2n: -14.725   OH2s: -10.556   CRY: -1.060 ENTR: -0.607    SOLV:  0.962    DG: -0.849  CLASS: 5
Water: 104 AtId: 3004 ResId: 299    OH2n: -14.237   OH2s: -11.215   CRY: -1.197 ENTR: -0.500    SOLV:  1.635    DG: -1.185  CLASS: 5

Water Network Score Contributions:

Total       OH2n: -731.606  OH2s: -368.197  CRY: -30.908    ENTR: -94.714   DG:  28.882
Average     OH2n: -12.835   OH2s: -6.460    CRY: -0.542 ENTR: -1.662    DG:  0.507
Summary:     28.882 ( -10.345  39.228 )


Saved WATERFLAP_REFINED2_SCORED_OH2s_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_OH2n_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_DRY_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_CRY_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_ENTROPY_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_DG_WAT_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_CLASS_H2O.pdb

Saved WATERFLAP_REFINED2_SCORED_CLASS_COMPLEX.pdb

Saved WATERFLAP_REFINED2_SCORED.PDB

Saved WATERFLAP_REFINED2_SCORED_DG_WAT_H2O_ele.pdb

Saved WATERFLAP_REFINED2_SCORED_CLASS_H2O_ele.pdb

---------------------------
WaterFLAP summary of delta DG between apo and complex

Water: 1 AtId: 2994 ResId: 289  DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water: 2 AtId: 2995 ResId: 290  DG_APO: -1.789 DG_COMPLEX: -2.014 DDG: -0.225 CLASS: 3
Water: 3 AtId: 2996 ResId: 291  DG_APO: -0.841 DG_COMPLEX: -0.849 DDG: -0.008 CLASS: 3
Water: 121 AtId: 3138 ResId: 433    DG_APO:  0.000 DG_COMPLEX:  0.000 DDG:  0.000 CLASS: 3
Water: 122 AtId: 3143 ResId: 438    DG_APO:  0.000 DG_COMPLEX:  0.000 DDG:  0.000 CLASS: 3
Water_USED: 1 AtId: 2994 ResId: 289 OH2n: -15.983   OH2s: -15.953   CRY: -1.934 ENTR: -0.250    DG: -7.026  CLASS: 4
Water_USED: 2 AtId: 2995 ResId: 290 OH2n: -12.808   OH2s: -11.344   CRY: -0.291 ENTR: -0.411    DG: -2.014  CLASS: 4
Water_BOUNDARY: 3 AtId: 2996 ResId: 291 OH2n: -14.725   OH2s: -10.556   CRY: -1.060 ENTR: -0.607    DG: -0.849  CLASS: 5
Water_USED: 4 AtId: 2997 ResId: 292 OH2n: -14.971   OH2s: -14.678   CRY: -2.085 ENTR: -0.375    DG: -4.170  CLASS: 4
Water_BOUNDARY: 122 AtId: 3143 ResId: 438   OH2n:  5.000    OH2s: -0.110    CRY: -0.064 ENTR: -4.875    DG:  0.000  CLASS: 5

Saved WATERFLAP_Delta_DG_CLASS_H2O_ele.pdb

Saved WATERFLAP_Delta_DG_DG_WAT_H2O_ele.pdb

Saved WATERFLAP_Delta_DG_CLASS_H2O.pdb

Saved WATERFLAP_Delta_DG_DG_WAT_H2O.pdb


---------------------------

Apo:        DG:  35.441 DH: -3.791  -TDS:  39.232
Complex:    DG:  28.882 DH: -10.345 -TDS:  39.228

-------------
Net:        DG: -6.559  DH: -6.555  -TDS: -0.004


---------------------------
DG Displaced:   13.760
DDG Perturbed:  6.605
DG Disp-Pert:   7.155

---------------------------

WARNING: Setting ATOM parms from HETATM table
Atm:   CA  Q: 0.08
WARNING: Setting ATOM parms from HETATM table
Atm:   CA  Q: 0.08
....

The structure of the file is always the same, but the number of lines, the IDs, and the numbers can change.

From this I would like to get 4 different outputs. (Maybe it is possible to use constant "strings" that are always present, such as "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb", "Water Network Score Contributions", "WaterFLAP summary of delta DG between apo and complex", "Water Network Score Contributions:"???)

Output 1 (between "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb" and "Water Network Score Contributions"):

Water:   1 AtId: 3021 ResId: 316    OH2n: -8.922    OH2s: -6.900    CRY: -0.640 ENTR: -0.321    SOLV:  0.000    DG:  6.636  CLASS: 1
Water:   2 AtId: 3013 ResId: 308    OH2n: -8.331    OH2s: -7.364    CRY: -0.885 ENTR: -0.321    SOLV:  0.000    DG:  6.453  CLASS: 1
Water:   3 AtId: 3009 ResId: 304    OH2n: -7.424    OH2s: -7.321    CRY:  5.000 ENTR: -0.036    SOLV:  0.577    DG:  5.450  CLASS: 1
Water:   4 AtId: 3064 ResId: 359    OH2n: -9.779    OH2s: -8.778    CRY: -1.187 ENTR: -0.804    SOLV:  0.000    DG:  3.279  CLASS: 1
Water: 103 AtId: 2996 ResId: 291    OH2n: -14.725   OH2s: -10.556   CRY: -1.060 ENTR: -0.607    SOLV:  0.962    DG: -0.849  CLASS: 5
Water: 104 AtId: 3004 ResId: 299    OH2n: -14.237   OH2s: -11.215   CRY: -1.197 ENTR: -0.500    SOLV:  1.635    DG: -1.185  CLASS: 5

Output 2 (between "WaterFLAP summary of delta DG between apo and complex" and the first line starting with WATER_USED or WATER_BOUNDARY):

Water: 1 AtId: 2994 ResId: 289  DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water: 2 AtId: 2995 ResId: 290  DG_APO: -1.789 DG_COMPLEX: -2.014 DDG: -0.225 CLASS: 3
Water: 3 AtId: 2996 ResId: 291  DG_APO: -0.841 DG_COMPLEX: -0.849 DDG: -0.008 CLASS: 3
Water: 121 AtId: 3138 ResId: 433    DG_APO:  0.000 DG_COMPLEX:  0.000 DDG:  0.000 CLASS: 3
Water: 122 AtId: 3143 ResId: 438    DG_APO:  0.000 DG_COMPLEX:  0.000 DDG:  0.000 CLASS: 3

Output 3 (starting from the lines containing WATER_USED or WATER_BOUNDARY and ending before "Saved WATERFLAP_Delta_DG_CLASS_H2O_ele.pdb"):

Water_USED: 1 AtId: 2994 ResId: 289 OH2n: -15.983   OH2s: -15.953   CRY: -1.934 ENTR: -0.250    DG: -7.026  CLASS: 4
Water_USED: 2 AtId: 2995 ResId: 290 OH2n: -12.808   OH2s: -11.344   CRY: -0.291 ENTR: -0.411    DG: -2.014  CLASS: 4
Water_BOUNDARY: 3 AtId: 2996 ResId: 291 OH2n: -14.725   OH2s: -10.556   CRY: -1.060 ENTR: -0.607    DG: -0.849  CLASS: 5
Water_USED: 4 AtId: 2997 ResId: 292 OH2n: -14.971   OH2s: -14.678   CRY: -2.085 ENTR: -0.375    DG: -4.170  CLASS: 4
Water_BOUNDARY: 122 AtId: 3143 ResId: 438   OH2n:  5.000    OH2s: -0.110    CRY: -0.064 ENTR: -4.875    DG:  0.000  CLASS: 5
   

Output 4:

Apo:        DG:  35.441 DH: -3.791  -TDS:  39.232
Complex:    DG:  28.882 DH: -10.345 -TDS:  39.228
Net:        DG: -6.559  DH: -6.555  -TDS: -0.004
DG Displaced:   13.760
DDG Perturbed:  6.605
DG Disp-Pert:   7.155

All outputs must be .txt files, and the delimiter between columns (defined as "space" in the input file) must be a simple "," (the input uses "space").

I have no idea how to do this. Can anyone help me with this very hard challenge?

Answer 1

$ cat tst.awk
/^Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb/ { out = "output" 1; next }
/^Water Network Score Contributions/            { out = ""; next }
/^WaterFLAP summary of delta DG/                { out = "output" 2; next }
/^Water_(USED|BOUNDARY)/                        { out = ""; print > ("output" 3) }
/^(Apo|Complex|Net|DD?G)/                       { print > ("output" 4) }
out && NF { print > out }

$ awk -f tst.awk file

$ head out*
==> output1 <==
Water:   1 AtId: 3021 ResId: 316    OH2n: -8.922    OH2s: -6.900    CRY: -0.640 ENTR: -0.321    SOLV:  0.000    DG:  6.636  CLASS: 1
Water:   2 AtId: 3013 ResId: 308    OH2n: -8.331    OH2s: -7.364    CRY: -0.885 ENTR: -0.321    SOLV:  0.000    DG:  6.453  CLASS: 1
Water:   3 AtId: 3009 ResId: 304    OH2n: -7.424    OH2s: -7.321    CRY:  5.000 ENTR: -0.036    SOLV:  0.577    DG:  5.450  CLASS: 1
Water:   4 AtId: 3064 ResId: 359    OH2n: -9.779    OH2s: -8.778    CRY: -1.187 ENTR: -0.804    SOLV:  0.000    DG:  3.279  CLASS: 1
Water: 103 AtId: 2996 ResId: 291    OH2n: -14.725   OH2s: -10.556   CRY: -1.060 ENTR: -0.607    SOLV:  0.962    DG: -0.849  CLASS: 5
Water: 104 AtId: 3004 ResId: 299    OH2n: -14.237   OH2s: -11.215   CRY: -1.197 ENTR: -0.500    SOLV:  1.635    DG: -1.185  CLASS: 5

==> output2 <==
Water: 1 AtId: 2994 ResId: 289  DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water: 2 AtId: 2995 ResId: 290  DG_APO: -1.789 DG_COMPLEX: -2.014 DDG: -0.225 CLASS: 3
Water: 3 AtId: 2996 ResId: 291  DG_APO: -0.841 DG_COMPLEX: -0.849 DDG: -0.008 CLASS: 3
Water: 121 AtId: 3138 ResId: 433    DG_APO:  0.000 DG_COMPLEX:  0.000 DDG:  0.000 CLASS: 3
Water: 122 AtId: 3143 ResId: 438    DG_APO:  0.000 DG_COMPLEX:  0.000 DDG:  0.000 CLASS: 3

==> output3 <==
Water_USED: 1 AtId: 2994 ResId: 289 OH2n: -15.983   OH2s: -15.953   CRY: -1.934 ENTR: -0.250    DG: -7.026  CLASS: 4
Water_USED: 2 AtId: 2995 ResId: 290 OH2n: -12.808   OH2s: -11.344   CRY: -0.291 ENTR: -0.411    DG: -2.014  CLASS: 4
Water_BOUNDARY: 3 AtId: 2996 ResId: 291 OH2n: -14.725   OH2s: -10.556   CRY: -1.060 ENTR: -0.607    DG: -0.849  CLASS: 5
Water_USED: 4 AtId: 2997 ResId: 292 OH2n: -14.971   OH2s: -14.678   CRY: -2.085 ENTR: -0.375    DG: -4.170  CLASS: 4
Water_BOUNDARY: 122 AtId: 3143 ResId: 438   OH2n:  5.000    OH2s: -0.110    CRY: -0.064 ENTR: -4.875    DG:  0.000  CLASS: 5

==> output4 <==
Apo:        DG:  35.441 DH: -3.791  -TDS:  39.232
Complex:    DG:  28.882 DH: -10.345 -TDS:  39.228
Net:        DG: -6.559  DH: -6.555  -TDS: -0.004
DG Displaced:   13.760
DDG Perturbed:  6.605
DG Disp-Pert:   7.155
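
The question asks for .txt outputs, while the script above writes to output1 through output4. As a minimal sketch of the same idea (a variable `out` remembers which block is currently open), the variant below uses .txt names and runs against a tiny made-up sample. The file name sample.log, the shortened log lines, and the output names are assumptions for illustration only; the column delimiter is left as in the input.

```shell
# Build a tiny, made-up log that contains the same marker lines as the question.
cat > sample.log <<'EOF'
Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb

Water:   1 AtId: 3021 ResId: 316    DG:  6.636  CLASS: 1

Water Network Score Contributions:
Total       OH2n: -731.606
WaterFLAP summary of delta DG between apo and complex

Water: 1 AtId: 2994 ResId: 289  DG_APO: -6.921 DG_COMPLEX: -7.026 DDG: -0.105 CLASS: 3
Water_USED: 1 AtId: 2994 ResId: 289 DG: -7.026  CLASS: 4
Saved WATERFLAP_Delta_DG_CLASS_H2O_ele.pdb
Apo:        DG:  35.441 DH: -3.791  -TDS:  39.232
DG Displaced:   13.760
EOF

awk '
/^Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O\.pdb/ { out = "output1.txt"; next }  # open block 1
/^Water Network Score Contributions/             { out = ""; next }             # close block 1
/^WaterFLAP summary of delta DG/                 { out = "output2.txt"; next }  # open block 2
/^Water_(USED|BOUNDARY)/                         { out = ""; print > "output3.txt"; next }  # close block 2, collect block 3
/^(Apo|Complex|Net|DD?G)/                        { print > "output4.txt"; next }  # summary lines
out && NF                                        { print > out }                # non-empty lines of the open block
' sample.log
```

Each marker line only switches `out` on or off, so the number of data lines between the markers can vary freely from one log to the next.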

Answer 2

grep "string here" file.log

For example:

grep "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb" big.log

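If you also want the lines above or below the keyword:

-A NUM means NUM lines after the match, -B NUM means NUM lines before it, and -C NUM means NUM lines both before and after.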
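
To see what the three context options do, here is a small sketch on a made-up five-line file (demo.log and its contents are assumptions for illustration). These options are available in GNU and BSD grep, although POSIX does not require them.

```shell
# A made-up five-line file; "KEY" is the line being searched for.
printf '%s\n' one two KEY four five > demo.log

grep -A 1 "KEY" demo.log   # the KEY line plus 1 line after it
grep -B 1 "KEY" demo.log   # 1 line before the KEY line, then the KEY line
grep -C 1 "KEY" demo.log   # 1 line before and 1 line after the KEY line
```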

To send the result to a file, use ">" to redirect stdout to a txt file.

For example:

grep -A 5 "Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb" big.log > text.txt
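
Because the question says the number of lines can change, a fixed count such as -A 5 may cut the block short or pull in extra lines. One possible alternative, sketched here on a made-up file, is to select everything between the two marker lines with a sed address range and then keep only the data lines. The names big_demo.log and text_demo.txt and the shortened log lines are assumptions for illustration.

```shell
# Made-up sample: the block between the two markers can have any number of lines.
cat > big_demo.log <<'EOF'
noise
Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O.pdb
Water:   1 AtId: 3021
Water:   2 AtId: 3013
Water:   3 AtId: 3009
Water Network Score Contributions:
more noise
EOF

# Print from the start marker through the end marker, keep only the
# "Water:" data lines, and redirect the result to a txt file.
sed -n '/^Saved WATERFLAP_REFINED2_SCORED_SOLV_H2O\.pdb/,/^Water Network Score Contributions/p' big_demo.log |
  grep '^Water:' > text_demo.txt
```

The final grep drops both marker lines, since neither of them starts with "Water:".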
