Explain

My question could be answered in a couple of ways, but I am hoping for an answer that uses sed.

I have lines like the following, each with a different ID:

ID1_TRINITY_DN120587_c0_g1::TRINITY_DN120587_c0_g1_i1::g.8298::m.8298

I want:

TRINITY_DN120587_c0_g1_i1[ID1]

Answer 1

sed -e '
   s/::/\n/;s//\n/
   s/^\([^_]*\)_.*\n\(.*\)\n.*/\2[\1]/
   ;#  |--1---|      |-2-|
' ID.data

Place markers around the ID string, capture the part before the first underscore, and replace the whole line with those two values. Output:

TRINITY_DN120587_c0_g1_i1[ID1]

Explanation

              ID1_TRINITY_DN120587_c0_g1::TRINITY_DN120587_c0_g1_i1::g.8298::m.8298
              |-|                         |-----------------------|

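Say we want to extract the ID that sits between the first and second occurrences of ::, along with the prefix before the first underscore.

Step 1: Place markers (usually \n) around the region of interest.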

       s/::/\n/;s//\n/

   This is how the pattern space looks after the above transformation:

              ID1_TRINITY_DN120587_c0_g1\nTRINITY_DN120587_c0_g1_i1\ng.8298::m.8298
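Step 1 can be run on its own to see the markers. This is only a sketch and it assumes GNU sed, because turning \n in the replacement text into a newline is not guaranteed by other sed implementations. The input is the sample line from the question, and the variable names are just for illustration.

```shell
line='ID1_TRINITY_DN120587_c0_g1::TRINITY_DN120587_c0_g1_i1::g.8298::m.8298'

# Replace the first "::" with a newline, then reuse the same regex
# (the empty pattern in s//.../) to replace the next "::" as well.
step1=$(printf '%s\n' "$line" | sed -e 's/::/\n/;s//\n/')

# Each embedded newline shows up as a line break when printed.
printf '%s\n' "$step1"
```

With GNU sed the three pieces appear on separate lines, and the last piece still contains its own ::, since only the first two occurrences were replaced.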

Step 2: Extract the ID between the two \n markers and the string to the left of the first underscore.

                    s/^\([^_]*\)_.*\n\(.*\)\n.*/\2[\1]/
                    ;#  |------|      |---|
                    ;#     \1           \2

   [^_]       => matches any char but an underscore

   [^_]*      => matches 0 or more non underscore char(s)

   \([^_]*\)  => store what was matched into a memory, recallable as \1

   ^\([^_]*\) => anchor your matching from the start of the string

   .*\n       => go up to the rightmost \n you can see in the string

   \n\(.*\)\n => Ooops!! we see another \n, hence we need to backtrack to
                 the previous \n position and from there start moving right again
                 and stop at the rightmost \n. Whatever is between these positions
                 is the string ID and is recallable as \2. Since the \ns fall outside
                 the \(...\), hence they wouldn't be stored in \2.

   .*         => This is a catchall: starting just after the rightmost \n, it
                 consumes the rest of the string, and we do nothing with it.

 So the regex engine has matched the whole input string in the pattern
 space and stored two pieces of it in memory:

 \1 => stores the string portion between the beginning of the pattern
       space and the 1st occurrence of the underscore.

 \2 => stores the string portion between the 1st and 2nd occurrences
       of :: in the pattern space.

                      \1 = ID1
                      \2 = TRINITY_DN120587_c0_g1_i1
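One way to check what each capture holds is to label the groups in the replacement instead of building the final string. A minimal sketch, again assuming GNU sed for the \n markers; the labels group1 and group2 are made up for this illustration.

```shell
line='ID1_TRINITY_DN120587_c0_g1::TRINITY_DN120587_c0_g1_i1::g.8298::m.8298'

# Same two steps as the answer's command, but the replacement prints
# each captured group next to a label.
groups=$(printf '%s\n' "$line" | sed -e '
   s/::/\n/;s//\n/
   s/^\([^_]*\)_.*\n\(.*\)\n.*/group1=\1 group2=\2/
')
printf '%s\n' "$groups"
```

For the sample line this should print group1=ID1 group2=TRINITY_DN120587_c0_g1_i1.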

 Now comes the replacement part. Remember that the regex matched the whole
 of the pattern space from beginning to end, hence the replacement
 will affect the whole of the pattern space.

 \2[\1] => We replace the matched portion of the pattern space (in our case it
           happens to be the entire string) with the contents of \2, a literal [,
           the contents of \1, and a literal ],
           leading to what we see below:

                  TRINITY_DN120587_c0_g1_i1[ID1]

In other words, you have just managed to turn the pattern space from:

              ID1_TRINITY_DN120587_c0_g1::TRINITY_DN120587_c0_g1_i1::g.8298::m.8298

into the following:

                  TRINITY_DN120587_c0_g1_i1[ID1]

Answer 2

awk solution:

awk -F'::' '{ print $2"[" substr($1,1,index($1,"_")-1) "]"}' file

Output:

TRINITY_DN120587_c0_g1_i1[ID1]

  • -F'::' - sets the field separator to ::

  • substr($1,1,index($1,"_")-1) - extracts a substring of the first field, starting at position 1 and ending just before the first underscore (i.e. ID1)
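To make the pieces of the awk one-liner visible, the intermediate values can be printed one per line. This is an illustrative sketch using the sample line from the question; the variable pos and the output labels are not part of the original answer.

```shell
line='ID1_TRINITY_DN120587_c0_g1::TRINITY_DN120587_c0_g1_i1::g.8298::m.8298'

parts=$(printf '%s\n' "$line" | awk -F'::' '{
    pos = index($1, "_")    # position of the first underscore in field 1
    print "field1=" $1
    print "index=" pos
    print "prefix=" substr($1, 1, pos - 1)
    print "field2=" $2
}')
printf '%s\n' "$parts"
```

Here index returns 4, so substr takes the first 3 characters of the first field, which is ID1, and the second field is the ID that ends up in front of the brackets.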

Answer 3

Assuming your pattern stays the same, this single sed command should work:

sed -n "s/^\([^_]*\)_[^:]*::\([^:]*\)::.*/\2\[\1\]/p" filename

Output for the example input:

TRINITY_DN120587_c0_g1_i1[ID1]

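Explanation: match from the start of the line up to the first underscore and save that text in the first group with [^_]*, then match the text between the first and second double colons as the second group with [^:]*. The line is replaced with the desired output format, and p prints the modified line.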
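Because -n suppresses automatic printing and p prints only after a successful substitution, lines that do not fit the pattern are dropped. The sketch below shows this with the sample line plus one made-up line that has no :: separators. It uses single quotes so the shell leaves the backslashes alone, and it writes the brackets without escaping, which is a small variation on the command above rather than the answer's exact text.

```shell
# First line is the sample from the question; the second is a made-up
# line with no "::" separators, used only to show that it is dropped.
result=$(printf '%s\n' \
    'ID1_TRINITY_DN120587_c0_g1::TRINITY_DN120587_c0_g1_i1::g.8298::m.8298' \
    'hypothetical_line_without_separators' |
    sed -n 's/^\([^_]*\)_[^:]*::\([^:]*\)::.*/\2[\1]/p')
printf '%s\n' "$result"
```

Only the transformed first line is printed. Unlike Answer 1, this version does not put \n in the replacement text, so it should not depend on GNU sed.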
