How to compare only two columns from two different files and add the missing lines

Answer 1

Perhaps something like this?

cat file2 | awk '!(1 in f) {if ((getline l < "-") == 1) split(l, f)} $3!=f[3] {print;next} {print l; delete f}' file1 | column -t

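The script expects file1 to be the "reference" file, passed to awk as an argument, while it expects file2, the "real" file, on awk's standard input. I used a "useless use of cat" to make this more obvious, but of course you can supply file2 through a redirection (< file2) instead. You could actually put the file name in the script itself, i.e. "file2" in place of "-" in the getline call, but reading standard input is more flexible.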

In addition, the two files are expected to be "in sync" with respect to their field3 values, or, if it suits your use case, file2 may start "further ahead" than file1.
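To make the expectations above concrete, here is a small sketch that runs the same one-liner with file2 redirected instead of piped through cat. The sample data, the temporary directory and the `result` variable are assumptions made up for illustration; `column -t` is left out so that only awk is needed.

```shell
# Hypothetical sample data: whitespace-separated fields, field 3 is the key.
dir=$(mktemp -d)
printf '%s\n' 'a x 1 0' 'b x 2 0' 'c x 3 0' 'd x 4 0' > "$dir/file1"   # "reference" file
printf '%s\n' 'a y 1 10' 'c y 3 30' > "$dir/file2"                      # "real" file

# Same awk program as above; file2 arrives on standard input via a redirection.
result=$(awk '!(1 in f) {if ((getline l < "-") == 1) split(l, f)} $3!=f[3] {print;next} {print l; delete f}' "$dir/file1" < "$dir/file2")

printf '%s\n' "$result"
rm -rf "$dir"
```

With this data the lines for keys 1 and 3 come from file2 and the lines for keys 2 and 4 are filled in from file1, because file2 lists its keys in the same order as file1.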

Here is the script split up for readability, with detailed comments:

# Check whether our `real_fields` array does not exist yet.
# NOTE: we use the `<index> in <array>` construct
# to force awk to treat the `real_fields` name as an
# array (instead of as a scalar, as it would by default)
# and to create it in an empty state
!(1 in real_fields) {
    # get the next line (if any) from the "real" file
    if ((getline real_line < "-") == 1)
        # split that line in separate fields populating
        # our `real_fields` array
        split(real_line, real_fields)
        # awk split function creates an array with numeric
        # indexes for each field found as per FS separator
}
# if field3 of the current line of the "reference" file does
# not match field3 of the current line of the "real" file...
$3!=real_fields[3] {
    # print current line of "reference" file
    print
    # move on to the next line of the "reference" file,
    # thus skipping the final awk pattern
    next
}
# final awk pattern, we get here only if the pattern
# above did not match, i.e. if field3 values from both
# files match
{
    # print current line of "real" file
    print real_line
    # delete our real_fields array, thus triggering
    # the fetching of the next line of "real" file as
    # performed by the first awk pattern
    delete real_fields
}
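As noted earlier, the file name can also be given to the script instead of reading "-". The following is only a sketch of that variant: the awk variable `real`, the sample data and the `result` shell variable are hypothetical names chosen for this example, not part of the original script.

```shell
# Hypothetical sample data, field 3 is the key.
dir=$(mktemp -d)
printf '%s\n' 'a x 1 0' 'b x 2 0' 'c x 3 0' > "$dir/file1"   # "reference" file
printf '%s\n' 'b y 2 20' > "$dir/file2"                       # "real" file

# The "real" file name is passed in through -v, so standard input is not used.
result=$(awk -v real="$dir/file2" '
    # fetch the next "real" line only when none is pending
    !(1 in real_fields) {
        if ((getline real_line < real) == 1)
            split(real_line, real_fields)
    }
    # keys differ: keep the "reference" line
    $3 != real_fields[3] { print; next }
    # keys match: print the "real" line and mark it as consumed
    { print real_line; delete real_fields }
' "$dir/file1")

printf '%s\n' "$result"
rm -rf "$dir"
```

Here only the line with key 2 is taken from file2; the lines with keys 1 and 3 are printed from file1. Reading from standard input, as in the original, remains the more flexible choice when the data comes from a pipeline.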

Answer 2

You need to set the array traversal order; otherwise awk will reorder the lines.

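#!/usr/bin/awk -f
# Requires GNU awk: PROCINFO["sorted_in"] is a gawk extension.

BEGIN {
    # traverse array indices in ascending string order
    PROCINFO["sorted_in"] = "@ind_str_asc"
}
# first file (file1): store each line under "counter, field 3"
NR==FNR {
    # zero-pad the counter so that string order matches input
    # order ("10" would otherwise sort before "2")
    a[sprintf("%06d", i++),$3]=$0
    next
}
# second file (file2)
{
    for (c in a) {
        split(c, s, SUBSEP)
        if (s[2] == $3) {
            # field 3 matches: print the file2 line and advance
            print $0
            getline
        } else {
            # no match: print the stored file1 line
            print a[c]
        }
    }
}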

./script.awk file1 file2
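If gawk is not available, the ordering problem can be sidestepped rather than configured. The sketch below is not the script above but a swapped-in approach: it keeps the file1 lines in arrays indexed by a plain counter and walks them with that counter, which should work in any POSIX awk. The array names `line` and `key`, the counters `n` and `pos`, and the sample data are assumptions for illustration. It also assumes that every field 3 value in file2 occurs in file1 and in the same order.

```shell
# Hypothetical sample data, field 3 is the key.
dir=$(mktemp -d)
printf '%s\n' 'a x 1 0' 'b x 2 0' 'c x 3 0' 'd x 4 0' > "$dir/file1"
printf '%s\n' 'b y 2 20' 'c y 3 30' > "$dir/file2"

result=$(awk '
    # first file: remember every line and its key in input order
    NR == FNR { line[++n] = $0; key[n] = $3; next }
    # second file
    {
        # emit file1 lines until the one whose key matches this file2 line
        while (pos < n && key[pos + 1] != $3)
            print line[++pos]
        if (pos < n)
            pos++          # skip the matched file1 line
        print              # print the file2 line in its place
    }
    # file1 lines left over after file2 is exhausted
    END { while (pos < n) print line[++pos] }
' "$dir/file1" "$dir/file2")

printf '%s\n' "$result"
rm -rf "$dir"
```

Because the counter defines the order, no sorting of array indices is involved, so the output keeps the order of file1 regardless of how many lines it has.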
