シミュレートされた値を使用して、2つの列の順序を同時に1000回ランダムに変更するには？

Question 1

すべての場合に、| column -t以下を追加して出力を視覚的にソートします。

1）乱数を含む「simulation1」と「simulation2」の2つの列を作成する必要があります。

$ cat tst.awk
BEGIN { srand(seed) }
{ print $0, r(), r() }
function r() { return rand() * 100001 / 1000 }

$ awk -f tst.awk file | column -t
231   0.12    85.5574  23.7444
432   0.32    23.558   65.5853
11    0.0003  59.2486  50.3799
134   0.33    27.8248  45.7872
2334  0.553   45.7947  13.1887
12    0.33    51.6042  99.55
100   0.331   88.0281  17.4515
1008  1.6     1.37974  65.5945
223   -0.81   14.6773  97.6476
998   -3.001  87.888   31.97

2）その後、「simulation1」列の値に基づいて「ID」列と「pheno」列を並べ替えます。

$ awk -f tst.awk file | sort -k3,3n | column -t
1008  1.6     1.37974  65.5945
223   -0.81   14.6773  97.6476
432   0.32    23.558   65.5853
134   0.33    27.8248  45.7872
2334  0.553   45.7947  13.1887
12    0.33    51.6042  99.55
11    0.0003  59.2486  50.3799
231   0.12    85.5574  23.7444
998   -3.001  87.888   31.97
100   0.331   88.0281  17.4515

3）次に、行の最初の40％の平均「フェノ」を計算します。

$ cat tst2.awk
{ vals[NR] = $2 }
END {
    max = NR * 40 / 100
    for (i=1; i<=max; i++) {
        sum += vals[i]
    }
    print sum / max
}

$ awk -f tst.awk file | sort -k3,3n | awk -f tst2.awk
0.36

残りはあなたが理解できることを願っています。上記では、出力が同じになるように、すべての呼び出しに同じシードをawkに渡すことで、計算ステップ全体を簡単に追跡できます。呼び出しごとに異なる乱数が生成さtst.awkれるように、呼び出しをに変更します。awk -v seed="$RANDOM" -f tst.awk

Answer

すべての場合に、| column -t以下を追加して出力を視覚的にソートします。

1）乱数を含む「simulation1」と「simulation2」の2つの列を作成する必要があります。

$ cat tst.awk
BEGIN { srand(seed) }
{ print $0, r(), r() }
function r() { return rand() * 100001 / 1000 }

$ awk -f tst.awk file | column -t
231   0.12    85.5574  23.7444
432   0.32    23.558   65.5853
11    0.0003  59.2486  50.3799
134   0.33    27.8248  45.7872
2334  0.553   45.7947  13.1887
12    0.33    51.6042  99.55
100   0.331   88.0281  17.4515
1008  1.6     1.37974  65.5945
223   -0.81   14.6773  97.6476
998   -3.001  87.888   31.97

2）その後、「simulation1」列の値に基づいて「ID」列と「pheno」列を並べ替えます。

$ awk -f tst.awk file | sort -k3,3n | column -t
1008  1.6     1.37974  65.5945
223   -0.81   14.6773  97.6476
432   0.32    23.558   65.5853
134   0.33    27.8248  45.7872
2334  0.553   45.7947  13.1887
12    0.33    51.6042  99.55
11    0.0003  59.2486  50.3799
231   0.12    85.5574  23.7444
998   -3.001  87.888   31.97
100   0.331   88.0281  17.4515

3）次に、行の最初の40％の平均「フェノ」を計算します。

$ cat tst2.awk
{ vals[NR] = $2 }
END {
    max = NR * 40 / 100
    for (i=1; i<=max; i++) {
        sum += vals[i]
    }
    print sum / max
}

$ awk -f tst.awk file | sort -k3,3n | awk -f tst2.awk
0.36

残りはあなたが理解できることを願っています。上記では、出力が同じになるように、すべての呼び出しに同じシードをawkに渡すことで、計算ステップ全体を簡単に追跡できます。呼び出しごとに異なる乱数が生成さtst.awkれるように、呼び出しをに変更します。awk -v seed="$RANDOM" -f tst.awk

Question 2

更新されたスクリプトは数字bcの先頭記号とうまく機能しません。awkdomath

また、shuf固定配列を使用する方が簡単なので、繰り返しごとに混合するために配列インデックスを使用するように変更されました。

#!/bin/bash

function domath {
    #do the math using the 4 indices into the pheno array
    awk '{print ($1+$2+$3+$4)/4}' <<<"${ph[$1]} ${ph[$2]} ${ph[$3]} ${ph[$4]}"
}

function iterate {
    #randomise the indices and get the first 4
    shuf -e 0 1 2 3 4 5 6 7 8 9 | head -n 4
}

#number of iterations
nits=100

#read the pheno values into an array
ph=($(tail -n +3 data | awk '{print $2}'))


echo -e row'\t'sim1'\t'sim2'\t'diff
for (( row=1; row<=$nits; row++ )); do
    #calculate simulation1 
    first=$(printf "%+.3f" $(domath $(iterate)))
    #calculate simulation 2
    second=$(printf "%+.3f" $(domath $(iterate)))
    #calculate the difference
    diff=$(printf "%+.3f" $(awk '{print $2-$1}' <<<"$first $second"))
    #and print
    echo -e $row'\t'$first'\t'$second'\t'$diff
done

Answer

更新されたスクリプトは数字bcの先頭記号とうまく機能しません。awkdomath

また、shuf固定配列を使用する方が簡単なので、繰り返しごとに混合するために配列インデックスを使用するように変更されました。

#!/bin/bash

function domath {
    #do the math using the 4 indices into the pheno array
    awk '{print ($1+$2+$3+$4)/4}' <<<"${ph[$1]} ${ph[$2]} ${ph[$3]} ${ph[$4]}"
}

function iterate {
    #randomise the indices and get the first 4
    shuf -e 0 1 2 3 4 5 6 7 8 9 | head -n 4
}

#number of iterations
nits=100

#read the pheno values into an array
ph=($(tail -n +3 data | awk '{print $2}'))


echo -e row'\t'sim1'\t'sim2'\t'diff
for (( row=1; row<=$nits; row++ )); do
    #calculate simulation1 
    first=$(printf "%+.3f" $(domath $(iterate)))
    #calculate simulation 2
    second=$(printf "%+.3f" $(domath $(iterate)))
    #calculate the difference
    diff=$(printf "%+.3f" $(awk '{print $2-$1}' <<<"$first $second"))
    #and print
    echo -e $row'\t'$first'\t'$second'\t'$diff
done

シミュレートされた値を使用して、2つの列の順序を同時に1000回ランダムに変更するには？

答え1

答え2

関連情報