AWK for data smoothing

Question 1

For statistical data processing, R is generally easier to use.

It is simpler to load all the data into memory. Awk isn't the best language for that (though it is certainly possible). Here is a quick Python script.

#!/usr/bin/env python3
import sys

n = int(sys.argv[1])  # the smoothing parameter n must be passed as the first argument to the script
sys.stdout.write(sys.stdin.readline())  # print the header line unchanged

def split_line(line):
    # split a line into its first 3 fields (kept as one string), then a list of numbers
    fields = line.rstrip("\n").split(";")
    return [";".join(fields[:3])] + [float(x) for x in fields[3:7]]

rows = [split_line(line) for line in sys.stdin]

def avg(i, j):
    # average of column j over rows i-n, i and i+n
    return (rows[i - n][j] + rows[i][j] + rows[i + n][j]) / 3

for i in range(n, len(rows) - n):
    print(";".join([rows[i][0]] + [str(avg(i, j)) for j in range(1, 5)]))
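
To make the arithmetic concrete, here is a minimal sketch of the same three-point average applied to a single column. The helper name smooth_column and the sample values are made up for illustration; they are not part of the script above.

```python
def smooth_column(values, n):
    """Average each value with the values n positions before and after it.

    Hypothetical helper for illustration. Positions that lack a
    neighbour n steps away on either side (the first and last n) are dropped.
    """
    return [
        (values[i - n] + values[i] + values[i + n]) / 3
        for i in range(n, len(values) - n)
    ]


sample = [3.0, 6.0, 9.0, 12.0, 15.0]  # made-up data
print(smooth_column(sample, 1))  # [6.0, 9.0, 12.0]
print(smooth_column(sample, 2))  # [9.0]
```

Note that for n greater than 1 this is not an average over the whole window: only three values, spaced n rows apart, contribute to each result.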

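If the data is really large, I think the following awk script does the trick. It keeps a window of 2*n+1 rows: the numeric values of the previous 2*n rows are stored in prev[1] through prev[2*n], plus the current row $0. For each new row it prints the average for the middle row of the window (row n+1 of the window, n rows back), computed from prev[2*n], prev[n] and the current row.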

awk -F ';' -v OFS=';' -v n="${1-1}" '
    function avg(i) { return (prev[2*n, i] + prev[n, i] + $i) / 3; }
    NR == 1 { print; next }         # title line: print and skip the rest
    NR >= 2*n+2 {
        # once 2*n+1 data lines are available: print the average of the
        # current line and the lines n and 2*n back, with the labels of the middle line
        print labels[n], avg(4), avg(5), avg(6), avg(7);
    }
    {
        # shift the saved values by 1 position
        for (i=4; i<=7; i++) {
            for (k=2*n; k>1; k--) prev[k, i] = prev[k-1, i];
            prev[1, i] = $i;
        }
        # save the labels of this line to print in the next round
        for (k=n; k>1; k--) labels[k] = labels[k-1];
        labels[1] = $1 ";" $2 ";" $3;
    }
'
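
For readers who prefer Python, the same sliding-window idea can be sketched with collections.deque, which drops the oldest row automatically once the window is full. This is only a sketch under assumptions: the function name smooth_stream and the sample rows are hypothetical, and the input is assumed to have a header line followed by rows with 3 label fields and 4 numeric fields, as in the scripts above.

```python
from collections import deque


def smooth_stream(lines, n=1, sep=";"):
    """Yield smoothed lines while holding only 2*n+1 rows in memory.

    Hypothetical helper: assumes a header line, then rows made of
    3 label fields followed by 4 numeric fields.
    """
    lines = iter(lines)
    header = next(lines, None)
    if header is None:
        return
    yield header.rstrip("\n")
    window = deque(maxlen=2 * n + 1)  # the oldest row falls out automatically
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        window.append((sep.join(fields[:3]), [float(x) for x in fields[3:7]]))
        if len(window) == 2 * n + 1:
            oldest, middle, newest = window[0], window[n], window[-1]
            averages = [
                (a + b + c) / 3
                for a, b, c in zip(oldest[1], middle[1], newest[1])
            ]
            # the labels come from the middle row of the window
            yield sep.join([middle[0]] + [str(v) for v in averages])


sample = [  # made-up data
    "a;b;c;w;x;y;z\n",
    "r1;u;v;1;2;3;4\n",
    "r2;u;v;4;5;6;7\n",
    "r3;u;v;7;8;9;10\n",
]
for out in smooth_stream(sample, n=1):
    print(out)
# a;b;c;w;x;y;z
# r2;u;v;4.0;5.0;6.0;7.0
```

To process a real file, passing sys.stdin (or an open file object) as lines should work the same way, since the function only iterates over it once. The number formatting differs slightly from awk, which would print 4 rather than 4.0.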

Question 2

Here is the script I wrote after learning about this.

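#!/usr/bin/bash
awk 'BEGIN { FS = OFS = ";" }
    {
        # store every field of the file in a 2D array indexed by [row, column]
        if (max_nf < NF)
            max_nf = NF
        max_nr = NR
        for (x = 1; x <= NF; ++x)
            vector[NR, x] = $x
    }
    END {
        # header row: keep the first three titles, wrap the others in Average(...)
        row = 1
        printf "%s", vector[row, 1] OFS vector[row, 2] OFS vector[row, 3]
        for (col = 4; col < max_nf; ++col)    # note: col < max_nf leaves out the last field
            printf "%s", OFS "Average(" vector[row, col] ")"
        printf "%s", ORS

        # data rows: average each value with the rows directly above and below
        # note: for row 2 the row above is the header, whose non-numeric text counts as 0
        for (row = 2; row < max_nr; ++row) {
            printf "%s", vector[row, 1] OFS vector[row, 2] OFS vector[row, 3]
            for (col = 4; col < max_nf; ++col)
                printf "%s", OFS (vector[row, col] + vector[row-1, col] + vector[row+1, col]) / 3
            printf "%s", ORS
        }
    }' File_IN.csv > File_OUT.csv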

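The script stores the CSV file as a 2D array. For the first row it prints the first three titles unchanged and, from column 4 on, Average(title). From the second row to the end of the file, it prints the first, second and third columns followed by the averages for the columns from 4 on. Although @gilles's Python script works correctly, I am choosing this script as the answer because it uses AWK, as requested.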
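
For comparison, the whole-file approach described here could be sketched in Python as follows. The function name smooth_table and the sample data are hypothetical. The sketch also departs from the awk script in two ways, both of which are assumptions on my part rather than the author's method: every field from the fourth on is averaged, including the last one, and a row is only averaged when it has a data row both above and below it, so the header never enters the arithmetic.

```python
def smooth_table(text, sep=";"):
    """Return CSV text with columns 4+ replaced by a 3-row centred average.

    Hypothetical helper; assumes one header row and numeric data from
    the fourth field onward.
    """
    table = [line.split(sep) for line in text.splitlines() if line]
    header, data = table[0], table[1:]
    out = [sep.join(header[:3] + [f"Average({title})" for title in header[3:]])]
    # only rows with a data row above and below get an average
    for r in range(1, len(data) - 1):
        averages = [
            (float(data[r - 1][c]) + float(data[r][c]) + float(data[r + 1][c])) / 3
            for c in range(3, len(header))
        ]
        out.append(sep.join(data[r][:3] + [str(v) for v in averages]))
    return "\n".join(out) + "\n"


sample = "d;t;id;x;y\n1;a;p;1;10\n2;b;q;2;20\n3;c;r;6;30\n"  # made-up data
print(smooth_table(sample), end="")
# d;t;id;Average(x);Average(y)
# 2;b;q;3.0;20.0
```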
