Multi-field extraction

I am trying to extract fields from a file that contains multiple lines like the following:

alert tcp $HOME_NET any -> $EXTERNAL_NET 8074 (msg:"ET CHAT GaduGadu Chat Client Login Packet"; flowbits:isset,ET.gadu.welcome; flow:established,to_server; dsize:<50; content:"|15 00 00 00|"; depth:4; flowbits:set,ET.gadu.loginsent; reference:url,piotr.trzcionkowski.pl/default.asp?load=/programy/pppgg_protokol.html; reference:url,doc.emergingthreats.net/2008298; classtype:policy-violation; sid:2008298; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)

alert tcp any [21,25,110,143,443,465,587,636,989:995,5061,5222] -> $HOME_NET any (msg:"ET EXPLOIT FREAK Weak Export Suite From Server (CVE-2015-0204)"; flow:established,from_server; content:"|16 03|"; depth:2; byte_test:1,<,4,0,relative; content:"|02|"; distance:3; within:1; byte_jump:1,37,relative; content:"|00 19|"; within:2; fast_pattern; threshold:type limit,track by_dst,count 1,seconds 1200; reference:url,blog.cryptographyengineering.com/2015/03/attack-of-week-freak-or-factoring-nsa.html; reference:cve,2015-0204; reference:cve,2015-1637; classtype:bad-unknown; sid:2020661; rev:3; metadata:created_at 2015_03_10, updated_at 2015_03_10;)

alert tcp $HOME_NET any -> $EXTERNAL_NET 8074 (msg:"ET CHAT GaduGadu Chat Send Message"; flowbits:isset,ET.gadu.loggedin; flow:established,to_server; content:"|0b 00 00 00|"; depth:4; reference:url,piotr.trzcionkowski.pl/default.asp?load=/programy/pppgg_protokol.html; reference:url,doc.emergingthreats.net/2008302; classtype:policy-violation; sid:2008302; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)

alert tcp $EXTERNAL_NET 8074 -> $HOME_NET any (msg:"ET CHAT GaduGadu Chat Receive Message"; flowbits:isset,ET.gadu.loggedin; flow:established,from_server; content:"|0a 00 00 00|"; depth:4; reference:url,piotr.trzcionkowski.pl/default.asp?load=/programy/pppgg_protokol.html; reference:url,doc.emergingthreats.net/2008303; classtype:policy-violation; sid:2008303; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)

alert tcp $HOME_NET any -> $EXTERNAL_NET 8074 (msg:"ET CHAT GaduGadu Chat Keepalive PING"; flowbits:isset,ET.gadu.loggedin; flow:established,to_server; content:"|08 00 00 00|"; depth:4; reference:url,piotr.trzcionkowski.pl/default.asp?load=/programy/pppgg_protokol.html; reference:url,doc.emergingthreats.net/2008304; classtype:policy-violation; sid:2008304; rev:3; metadata:created_at 2010_07_30, updated_at 2010_07_30;)

alert http $EXTERNAL_NET any -> $HOME_NET any (msg:"ET EXPLOIT CVE-2016-0189 Common Construct M2"; flow:established,from_server; file_data; content:"triggerBug"; nocase; content:"Dim "; nocase; distance:0; content:".resize"; nocase; pcre:"/^\s*\x28/Rs";  content:"Mid"; pcre:"/^\s*?\(x\s*,\s*1,\s*24000\s*\x29/Rs"; reference:url,theori.io/research/cve-2016-0189; reference:cve,2016-0189; classtype:attempted-user; sid:2022972; rev:2; metadata:affected_product Windows_XP_Vista_7_8_10_Server_32_64_Bit, attack_target Client_Endpoint, deployment Perimeter, signature_severity Major, created_at 2016_07_15, performance_impact Low, updated_at 2016_07_15;)

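I can extract individual fields, but I do not know how to extract, for example, sid, msg, classtype, and the contents of metadata:created_at and updated_at, list them on one comma-separated line, and then do the same for the other lines in the file.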

Expected output for the first entry:

2008298,ET CHAT GaduGadu Chat Client Login Packet,policy-violation,2010_07_30,2010_07_30

created_at and updated_at always appear after metadata:, but they may be at a different position or in a different order within it.

I am working in Bash on GNU/Linux.

Answer 1

A simple script that produces the desired output:

#!/usr/bin/env bash

# Assumptions: the file name is always passed, and points to a valid file,
# hence no error handling has been implemented. (for script simplicity)

# let the first argument to the script be the file name.
filename="$1"

# read one line at a time, extracting the required fields
while read -r line
do
    # skip blank lines
    if [[ ${#line} -gt 0 ]]; then
        sid=$(echo "$line"|grep -o 'sid:[^;]*'| awk -F ':' '{print $2}')
        msg=$(echo "$line"|grep -o 'msg:[^;]*'| awk -F '"' '{print $2}')
        classType=$(echo "$line"|grep -o 'classtype:[^;]*'| awk -F ':' '{print $2}')
        # stop at "," or ";" so the order of the metadata entries does not matter
        cDate=$(echo "$line"|grep -o 'created_at[^,;]*'|awk '{print $2}')
        uDate=$(echo "$line"|grep -o 'updated_at[^,;]*'|awk '{print $2}')

        echo "$sid,$msg,$classType,$cDate,$uDate"
    fi
done < "$filename"

Run the script:

./scriptName fileName

Output:

2008298,ET CHAT GaduGadu Chat Client Login Packet,policy-violation,2010_07_30,2010_07_30
2020661,ET EXPLOIT FREAK Weak Export Suite From Server (CVE-2015-0204),bad-unknown,2015_03_10,2015_03_10
2008302,ET CHAT GaduGadu Chat Send Message,policy-violation,2010_07_30,2010_07_30
2008303,ET CHAT GaduGadu Chat Receive Message,policy-violation,2010_07_30,2010_07_30
2008304,ET CHAT GaduGadu Chat Keepalive PING,policy-violation,2010_07_30,2010_07_30
2022972,ET EXPLOIT CVE-2016-0189 Common Construct M2,attempted-user,2016_07_15,2016_07_15
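
The script above starts several grep and awk processes for every input line. As a rough sketch only, the same extraction can be done with Bash's built-in regex matching ([[ =~ ]] and BASH_REMATCH). The function name extract_fields and the re_* variable names are made up for this illustration, and the patterns assume that sid is numeric, that msg contains no embedded double quotes, and that each metadata value ends at a comma or semicolon.

```shell
#!/usr/bin/env bash
# Sketch: extract sid, msg, classtype, created_at and updated_at using only
# Bash built-ins. extract_fields is a hypothetical helper name; it reads
# rule lines from standard input and prints one CSV line per rule.
extract_fields() {
    local line sid msg classtype created updated
    # Patterns live in variables so that their quoting stays readable.
    local re_sid='sid:([0-9]+);'
    local re_msg='msg:"([^"]*)"'
    local re_class='classtype:([^;]+);'
    local re_created='created_at ([^,;]+)'
    local re_updated='updated_at ([^,;]+)'

    while IFS= read -r line; do
        [[ -n $line ]] || continue              # skip blank lines
        sid= msg= classtype= created= updated=  # reset for each rule
        [[ $line =~ $re_sid ]]     && sid=${BASH_REMATCH[1]}
        [[ $line =~ $re_msg ]]     && msg=${BASH_REMATCH[1]}
        [[ $line =~ $re_class ]]   && classtype=${BASH_REMATCH[1]}
        [[ $line =~ $re_created ]] && created=${BASH_REMATCH[1]}
        [[ $line =~ $re_updated ]] && updated=${BASH_REMATCH[1]}
        printf '%s,%s,%s,%s,%s\n' "$sid" "$msg" "$classtype" "$created" "$updated"
    done
}
```

After sourcing the function, it could be called as extract_fields < fileName. A field that is missing from a rule is printed as an empty column rather than causing an error.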

Answer 2

Here is a general approach using GNU awk for FPAT:

$ cat tst.awk
BEGIN {
    FPAT="[[:alnum:]_]+:(\"[^\"]+\"|[^;]+)"
    OFS = ","
}
{
    delete f
    for (i=1; i<=NF; i++) {
        tag = val = $i
        sub(/:.*/,"",tag)
        sub(/[^:]+:/,"",val)
        gsub(/"/,"",val)
        f[tag] = val
        if ( tag == "metadata" ) {
            numSubFlds = split(val,md,/, */)
            for (j=1; j<=numSubFlds; j++) {
                subTag = subVal = md[j]
                sub(/ .*/,"",subTag)
                sub(/[^ ]+ /,"",subVal)
                f[tag":"subTag] = subVal
            }
        }
    }

    # uncomment this to see all tags and values
    # for (idx in f) { print idx "=" f[idx] }

    # print
    print f["sid"], f["msg"], f["classtype"], f["metadata:created_at"], f["metadata:updated_at"]
}

$ gawk -f tst.awk file
2008298,ET CHAT GaduGadu Chat Client Login Packet,policy-violation,2010_07_30,2010_07_30
2020661,,bad-unknown,2015_03_10,2015_03_10
2008302,ET CHAT GaduGadu Chat Send Message,policy-violation,2010_07_30,2010_07_30
2008303,ET CHAT GaduGadu Chat Receive Message,policy-violation,2010_07_30,2010_07_30
2008304,ET CHAT GaduGadu Chat Keepalive PING,policy-violation,2010_07_30,2010_07_30
2022972,ET EXPLOIT CVE-2016-0189 Common Construct M2,attempted-user,2016_07_15,2016_07_15

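The second input line produces different output (an empty msg) because its format differs from the others. The port list in its header contains the range 989:995, which matches the tag:value pattern in FPAT, so that field runs on to the first semicolon and swallows msg.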
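
One possible way to get msg for the second line as well is to drop the rule header (everything up to and including the first opening parenthesis) before the fields are examined, so header text such as 989:995 is never treated as a tag. The following is only a sketch under that assumption: it wraps the same FPAT logic in a shell function with the made-up name extract_rule_fields, requires GNU awk, and assumes the header itself never contains a parenthesis.

```shell
#!/usr/bin/env bash
# Sketch: the FPAT approach with the rule header removed first.
# extract_rule_fields is a hypothetical helper name; it reads rules on stdin.
extract_rule_fields() {
    gawk '
    BEGIN {
        FPAT = "[[:alnum:]_]+:(\"[^\"]+\"|[^;]+)"
        OFS = ","
    }
    NF {
        # Remove everything up to and including the first "(".
        # Changing $0 makes gawk split the record again using FPAT.
        sub(/^[^(]*\(/, "")
        delete f
        for (i = 1; i <= NF; i++) {
            tag = val = $i
            sub(/:.*/, "", tag)
            sub(/[^:]+:/, "", val)
            gsub(/"/, "", val)
            f[tag] = val
            if (tag == "metadata") {
                n = split(val, md, /, */)
                for (j = 1; j <= n; j++) {
                    subTag = subVal = md[j]
                    sub(/ .*/, "", subTag)
                    sub(/[^ ]+ /, "", subVal)
                    f[tag ":" subTag] = subVal
                }
            }
        }
        print f["sid"], f["msg"], f["classtype"], f["metadata:created_at"], f["metadata:updated_at"]
    }'
}
```

It could be used as extract_rule_fields < file. Lines on which FPAT finds no fields, such as blank lines, are skipped by the NF condition.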
