Grep lines based on the first column

I have a file in xyz format, like the one below:

   55
FINAL HEAT OF FORMATION =     0.000000
 C    -1.602726     0.220926     0.289897
 C    -1.486393     1.490851    -0.581098
 C    -0.269002     2.434576    -0.276060
 C     1.010307     1.687714     0.217781
 C     1.485345     0.603160    -0.764139
 C    -1.564938     1.114872    -2.078306
 O    -2.879135     0.437518    -2.475109
 C    -0.550397     3.726131     0.624425
 C    -1.962009     3.939190     1.255790
 C    -2.367687     2.809316     2.219183
 C     0.020100     4.998947    -0.121715
 C    -0.978418     5.719489    -1.074614
 C    -1.616282     4.795344    -2.118148
 C     2.215398     2.612417     0.464811
 C     0.729644     5.994046     0.844547
 C     2.143005     6.393766     0.406166
 C    -2.045078     5.240181     2.079386
 C    -0.323618     6.897509    -1.813043
 C    -1.401212     2.359346    -2.899572
 H    -2.385338     2.081960    -0.396254
 H     0.010153     2.832497    -1.252999
 H     0.084959     3.605123     1.504509
 H     0.809530     4.617245    -0.774572
 H     0.128704     6.897394     0.976132
 H     0.798102     5.548850     1.839117
 H     2.585871     7.101059     1.112504
 H     2.797551     5.521179     0.355908
 H     2.147260     6.862273    -0.578875
 H    -1.790477     6.132728    -0.470526
 H    -1.045932     7.372355    -2.481810
 H     0.046563     7.666682    -1.135188
 H     0.516095     6.553772    -2.424710
 H    -2.319681     5.356939    -2.738366
 H    -0.857441     4.374340    -2.783635
 H    -2.163844     3.970151    -1.668818
 H    -2.716285     4.004456     0.465662
 H    -3.243256     3.107916     2.800711
 H    -2.619420     1.880831     1.722495
 H    -1.559685     2.604091     2.928503
 H    -3.049833     5.345765     2.495339
 H    -1.345140     5.209610     2.919004
 H    -1.835946     6.139276     1.507938
 H     0.771267     1.206903     1.173131
 H     3.062594     2.025264     0.827489
 H     2.528865     3.099255    -0.462839
 H     2.024108     3.390184     1.201447
 H     2.402236     0.135542    -0.396501
 H     0.758861    -0.189092    -0.920031
 H     1.710054     1.046723    -1.738905
 H    -1.261942     0.377911     1.311479
 H    -2.639613    -0.116300     0.341732
 H    -1.021606    -0.605553    -0.121261
 H    -1.454026     2.110341    -3.938854
 H    -2.181199     3.050993    -2.658440
 H    -0.451621     2.804428    -2.687258

I have the following code to convert the .xyz file to the molecule input format:

CARBONS=$(grep -ow "C" $1 | wc -l)
HYDROGENS=$(grep -ow "H" $1 | wc -l)
OXYGENS=$(grep -ow "O" $1 | wc -l)

ATYPES=0
ARRAY=($CARBONS $HYDROGENS $OXYGENS)

for i in "${ARRAY[@]}"
do
        if [ $i -gt 0 ]; then
                ((ATYPES+=1))
        fi
done

echo "BASIS"
echo "co2"
echo ""
echo ""
echo "Atomtypes="$ATYPES" Generators=0 Integrals=1.00D-15 Angstrom"
echo "Charge=6.0 Atoms="$CARBONS""
grep "C" $1
echo "Charge=1.0 Atoms="$HYDROGENS""
grep "H" $1
if [ $OXYGENS -gt 0 ]; then
    echo "Charge=8.0 Atoms="$OXYGENS""
    grep "O" $1
fi

The output is:

BASIS
co2


Atomtypes=3 Generators=0 Integrals=1.00D-15 Angstrom
Charge=6.0 Atoms=18
 C    -1.602726     0.220926     0.289897
 C    -1.486393     1.490851    -0.581098
 C    -0.269002     2.434576    -0.276060
 C     1.010307     1.687714     0.217781
 C     1.485345     0.603160    -0.764139
 C    -1.564938     1.114872    -2.078306
 C    -0.550397     3.726131     0.624425
 C    -1.962009     3.939190     1.255790
 C    -2.367687     2.809316     2.219183
 C     0.020100     4.998947    -0.121715
 C    -0.978418     5.719489    -1.074614
 C    -1.616282     4.795344    -2.118148
 C     2.215398     2.612417     0.464811
 C     0.729644     5.994046     0.844547
 C     2.143005     6.393766     0.406166
 C    -2.045078     5.240181     2.079386
 C    -0.323618     6.897509    -1.813043
 C    -1.401212     2.359346    -2.899572
Charge=1.0 Atoms=36
FINAL HEAT OF FORMATION =     0.000000
 H    -2.385338     2.081960    -0.396254
 H     0.010153     2.832497    -1.252999
 H     0.084959     3.605123     1.504509
 H     0.809530     4.617245    -0.774572
 H     0.128704     6.897394     0.976132
 H     0.798102     5.548850     1.839117
 H     2.585871     7.101059     1.112504
 H     2.797551     5.521179     0.355908
 H     2.147260     6.862273    -0.578875
 H    -1.790477     6.132728    -0.470526
 H    -1.045932     7.372355    -2.481810
 H     0.046563     7.666682    -1.135188
 H     0.516095     6.553772    -2.424710
 H    -2.319681     5.356939    -2.738366
 H    -0.857441     4.374340    -2.783635
 H    -2.163844     3.970151    -1.668818
 H    -2.716285     4.004456     0.465662
 H    -3.243256     3.107916     2.800711
 H    -2.619420     1.880831     1.722495
 H    -1.559685     2.604091     2.928503
 H    -3.049833     5.345765     2.495339
 H    -1.345140     5.209610     2.919004
 H    -1.835946     6.139276     1.507938
 H     0.771267     1.206903     1.173131
 H     3.062594     2.025264     0.827489
 H     2.528865     3.099255    -0.462839
 H     2.024108     3.390184     1.201447
 H     2.402236     0.135542    -0.396501
 H     0.758861    -0.189092    -0.920031
 H     1.710054     1.046723    -1.738905
 H    -1.261942     0.377911     1.311479
 H    -2.639613    -0.116300     0.341732
 H    -1.021606    -0.605553    -0.121261
 H    -1.454026     2.110341    -3.938854
 H    -2.181199     3.050993    -2.658440
 H    -0.451621     2.804428    -2.687258
Charge=8.0 Atoms=1
FINAL HEAT OF FORMATION =     0.000000
 O    -2.879135     0.437518    -2.475109

But a line like FINAL HEAT OF FORMATION = 0.000000 should not be there. I think grep picks that line up because some of its words start with the letters H and O. The correct output is:

BASIS
co2


Atomtypes=3 Generators=0 Integrals=1.00D-15 Angstrom
Charge=6.0 Atoms=18
 C    -1.602726     0.220926     0.289897
 C    -1.486393     1.490851    -0.581098
 C    -0.269002     2.434576    -0.276060
 C     1.010307     1.687714     0.217781
 C     1.485345     0.603160    -0.764139
 C    -1.564938     1.114872    -2.078306
 C    -0.550397     3.726131     0.624425
 C    -1.962009     3.939190     1.255790
 C    -2.367687     2.809316     2.219183
 C     0.020100     4.998947    -0.121715
 C    -0.978418     5.719489    -1.074614
 C    -1.616282     4.795344    -2.118148
 C     2.215398     2.612417     0.464811
 C     0.729644     5.994046     0.844547
 C     2.143005     6.393766     0.406166
 C    -2.045078     5.240181     2.079386
 C    -0.323618     6.897509    -1.813043
 C    -1.401212     2.359346    -2.899572
Charge=1.0 Atoms=36
 H    -2.385338     2.081960    -0.396254
 H     0.010153     2.832497    -1.252999
 H     0.084959     3.605123     1.504509
 H     0.809530     4.617245    -0.774572
 H     0.128704     6.897394     0.976132
 H     0.798102     5.548850     1.839117
 H     2.585871     7.101059     1.112504
 H     2.797551     5.521179     0.355908
 H     2.147260     6.862273    -0.578875
 H    -1.790477     6.132728    -0.470526
 H    -1.045932     7.372355    -2.481810
 H     0.046563     7.666682    -1.135188
 H     0.516095     6.553772    -2.424710
 H    -2.319681     5.356939    -2.738366
 H    -0.857441     4.374340    -2.783635
 H    -2.163844     3.970151    -1.668818
 H    -2.716285     4.004456     0.465662
 H    -3.243256     3.107916     2.800711
 H    -2.619420     1.880831     1.722495
 H    -1.559685     2.604091     2.928503
 H    -3.049833     5.345765     2.495339
 H    -1.345140     5.209610     2.919004
 H    -1.835946     6.139276     1.507938
 H     0.771267     1.206903     1.173131
 H     3.062594     2.025264     0.827489
 H     2.528865     3.099255    -0.462839
 H     2.024108     3.390184     1.201447
 H     2.402236     0.135542    -0.396501
 H     0.758861    -0.189092    -0.920031
 H     1.710054     1.046723    -1.738905
 H    -1.261942     0.377911     1.311479
 H    -2.639613    -0.116300     0.341732
 H    -1.021606    -0.605553    -0.121261
 H    -1.454026     2.110341    -3.938854
 H    -2.181199     3.050993    -2.658440
 H    -0.451621     2.804428    -2.687258
Charge=8.0 Atoms=1
 O    -2.879135     0.437518    -2.475109

I tried changing the grep command to grep -w "^C" $1 and to grep -x "C" $1, but neither helped. How can I solve this?

Answer 1

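^C doesn't work because C is not at the start of the input line: there is a space in front of it. Likewise, -x "C" only matches lines that consist of nothing but C. grep '^ C' "$1" should do what you want.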
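A small demonstration of why each pattern behaves as described, run against four lines copied from the question's file. The variable sample and the helper function count are illustrative names, not part of the original answer.

```shell
#!/usr/bin/env bash
# Four lines copied from the question's xyz file (sample is just an illustrative name).
sample=$' C    -1.602726     0.220926     0.289897\nFINAL HEAT OF FORMATION =     0.000000\n H    -2.385338     2.081960    -0.396254\n O    -2.879135     0.437518    -2.475109'

# count ARGS...: number of sample lines that grep matches when given ARGS.
count() { printf '%s\n' "$sample" | grep -c "$@"; }

echo "grep 'H'     -> $(count 'H')"     # 2: the FINAL HEAT OF FORMATION line matches too
echo "grep -w '^C' -> $(count -w '^C')" # 0: every atom line starts with a space, not C
echo "grep -x 'C'  -> $(count -x 'C')"  # 0: -x needs the entire line to be just "C"
echo "grep '^ C'   -> $(count '^ C')"   # 1: anchored, including the leading space
echo "grep '^ H'   -> $(count '^ H')"   # 1: the header line is no longer matched
```

The anchor ^ only helps once the leading space is part of the pattern; without it, nothing in this file starts with a letter except the header line.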

(By the way, you can use grep -c instead of grep | wc -l. Oh, and the quoting in the echo lines is a bit odd; just put the variables inside the quotes.)
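Putting those suggestions together, the question's script might look roughly like this. It is a sketch, not the answerer's code: it is wrapped in a hypothetical function xyz_to_mol so it can be sourced, and it assumes atom lines have exactly one leading space and a space after the element symbol. That trailing space in the patterns goes slightly beyond the answer; it keeps lines such as " Cl ..." out of the carbon count.

```shell
#!/usr/bin/env bash
# Question's script with the answer's fixes: anchored patterns, grep -c, quoted variables.
# xyz_to_mol is a hypothetical name; the function wrapper only makes the code reusable.
xyz_to_mol() {
    local file=$1 carbons hydrogens oxygens n atypes=0

    if [ ! -r "$file" ]; then
        echo "cannot read: $file" >&2
        return 1
    fi

    # Atom lines look like " C    x  y  z": one leading space, the symbol, then a space.
    carbons=$(grep -c '^ C ' "$file")
    hydrogens=$(grep -c '^ H ' "$file")
    oxygens=$(grep -c '^ O ' "$file")

    # Count how many of the three element types actually occur.
    for n in "$carbons" "$hydrogens" "$oxygens"; do
        if [ "$n" -gt 0 ]; then
            atypes=$((atypes + 1))
        fi
    done

    echo "BASIS"
    echo "co2"
    echo ""
    echo ""
    echo "Atomtypes=$atypes Generators=0 Integrals=1.00D-15 Angstrom"
    echo "Charge=6.0 Atoms=$carbons"
    grep '^ C ' "$file"
    echo "Charge=1.0 Atoms=$hydrogens"
    grep '^ H ' "$file"
    if [ "$oxygens" -gt 0 ]; then
        echo "Charge=8.0 Atoms=$oxygens"
        grep '^ O ' "$file"
    fi
}

# When executed (not sourced) with a file argument, convert that file.
if [ "${BASH_SOURCE[0]}" = "$0" ] && [ "$#" -gt 0 ]; then
    xyz_to_mol "$1"
fi
```

Saved as a script, it would be run as ./script.sh input.xyz > output.mol; the FINAL HEAT OF FORMATION line no longer appears because none of the patterns can match it.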

Answer 2

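Try doing it entirely in awk. For example, the following script uses two arrays: atoms, which remembers every input line for a given element, and count, which holds the number of atoms of each element. After reading the whole input file, it prints the data in the desired format.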

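# Run as: awk -f script.awk input.xyz  (the script file name is up to you)
# Atom lines start with one space followed by an element symbol.
/^ [[:alpha:]]/ {
  # atoms[sym] collects every input line for element sym, separated by newlines.
  if (count[$1] == 0) {
    atoms[$1] = $0;
  } else {
    atoms[$1] = atoms[$1] "\n" $0;
  }
  # count[sym] is the number of atoms of element sym.
  count[$1]++
}

END {
  # Number of distinct elements. length(count) also works in GNU awk,
  # but length() on an array is not supported by every awk.
  atypes = 0;
  for (sym in count) atypes++;

  print "BASIS\nco2\n\n"

  print "Atomtypes=" atypes " Generators=0 Integrals=1.00D-15 Angstrom"

  print "Charge=6.0 Atoms=" count["C"]
  print atoms["C"]

  print "Charge=1.0 Atoms=" count["H"]
  print atoms["H"]

  if (count["O"] > 0) {
    print "Charge=8.0 Atoms=" count["O"]
    print atoms["O"]
  }
}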
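A minimal sketch, assuming bash and an awk that supports [[:alpha:]], of driving the element charges from a lookup table instead of hard-coding C, H and O. The function name xyz_to_mol_awk, the short element list, and the print order (C, H, then the rest, chosen to reproduce the question's layout) are all illustrative assumptions; the charge written is the atomic number as "Z.0", matching the question's output.

```shell
#!/usr/bin/env bash
# Hypothetical generalization of the awk answer with a charge lookup table.
# Only a few elements are listed; extend both split() strings together as needed.
xyz_to_mol_awk() {
    awk '
    BEGIN {
        # Print order: C and H first, then the others (arbitrary beyond that).
        n = split("C H N O F P S Cl Br I", order, " ")
        split("6 1 7 8 9 15 16 17 35 53", z, " ")
        for (i = 1; i <= n; i++) charge[order[i]] = z[i]
    }
    # Atom lines: one space, then an element symbol.
    /^ [[:alpha:]]/ {
        if (!($1 in charge)) {
            printf("unknown element %s on line %d\n", $1, NR) > "/dev/stderr"
            bad = 1
            next
        }
        atoms[$1] = ($1 in atoms) ? atoms[$1] "\n" $0 : $0
        count[$1]++
    }
    END {
        if (bad) exit 1
        # Count distinct elements before the loop below touches count[].
        atypes = 0
        for (e in count) atypes++
        print "BASIS\nco2\n\n"
        print "Atomtypes=" atypes " Generators=0 Integrals=1.00D-15 Angstrom"
        for (i = 1; i <= n; i++) {
            e = order[i]
            if (count[e] > 0) {
                printf("Charge=%.1f Atoms=%d\n", charge[e], count[e])
                print atoms[e]
            }
        }
    }' "$1"
}
```

Usage would be xyz_to_mol_awk input.xyz > output.mol. An element missing from the table produces an error and a non-zero exit status rather than silently dropping atoms.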

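A lookup table of the charge associated with each element would improve this a lot and turn it into a general-purpose conversion script, but that isn't really necessary: a conversion utility already exists. It's called obabel, from the Open Babel project (Open Babel: The Open Source Chemistry Toolbox), and it can convert between many chemical file formats, including xyz and dalmol: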

obabel -i xyz input.xyz -o dalmol -O output.dalmol

Run obabel -L formats | less to get the full list of supported formats.

If you're running Debian or a Debian derivative (Ubuntu, Mint, etc.), you can install it with apt-get install openbabel. The package description is:

Package: openbabel
Version: 3.1.1+dfsg-6
Installed-Size: 630
Maintainer: Debichem Team <[email protected]>
Architecture: amd64
Depends: libc6 (>= 2.14), libgcc-s1 (>= 3.0), libopenbabel7 (>= 3.1.1+dfsg), libstdc++6 (>= 5.2)
Description-en: Chemical toolbox utilities (cli)
 Open Babel is a chemical toolbox designed to speak the many languages of
 chemical data. It allows one to search, convert, analyze, or store data from
 molecular modeling, chemistry, solid-state materials, biochemistry, or related
 areas.  Features include:
 .
  * Hydrogen addition and deletion
  * Support for Molecular Mechanics
  * Support for SMARTS molecular matching syntax
  * Automatic feature perception (rings, bonds, hybridization, aromaticity)
  * Flexible atom typer and perception of multiple bonds from atomic coordinates
  * Gasteiger-Marsili partial charge calculation
 .
 File formats Open Babel supports include PDB, XYZ, CIF, CML, SMILES, MDL
 Molfile, ChemDraw, Gaussian, GAMESS, MOPAC and MPQC.
 .
 This package includes the following utilities:
  * obabel: Convert between various chemical file formats
  * obenergy: Calculate the energy for a molecule
  * obminimize: Optimize the geometry, minimize the energy for a molecule
  * obgrep: Molecular search program using SMARTS pattern
  * obgen: Generate 3D coordinates for a molecule
  * obprop: Print standard molecular properties
  * obfit: Superimpose two molecules based on a pattern
  * obrotamer: Generate conformer/rotamer coordinates
  * obconformer: Generate low-energy conformers
  * obchiral: Print molecular chirality information
  * obrotate: Rotate dihedral angle of molecules in batch mode
  * obprobe: Create electrostatic probe grid
