Xeon Phi swap backed by the host's tmpfs

2024-5-18

I would like to use the host system's RAM as swap space for the Xeon Phi (mic0).

From the host machine:

# free -m
             total       used       free     shared    buffers     cached
Mem:        129022      60312      68710          0       1092      50078
-/+ buffers/cache:       9141     119880
Swap:            0          0          0

On the host machine, I run the following commands:

# mount -t ramfs ramfs /mnt/ramfs/
# dd bs=512M if=/dev/zero of=/mnt/ramfs/ram1 count=48
# echo /mnt/ramfs/ram1 >/sys/class/mic/mic0/virtblk_file
# df -a | grep ramfs
/mnt/ramfs              0          0          0    - /mnt/ramfs
# vim  /etc/mpss/default.conf # add:
ExtraCommandLine "vfs_read_optimization=on" 
ExtraCommandLine "vfs_write_optimization=on"
# service mpss stop
# micctrl --resetconfig
# service mpss start
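The `dd` step fills the backing file with zeros, so ramfs allocates all 48 × 512 MiB = 24 GiB of host memory up front instead of on the card's first write to each block (ramfs pages are never swapped out on the host). For illustration only, a minimal C sketch of what that step does; `create_zero_file` is a hypothetical helper, and `dd` remains the simpler tool:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: write `count` blocks of `bs` zero bytes to `path`,
   roughly what `dd bs=... if=/dev/zero of=path count=...` does.
   Returns 0 on success, -1 on any error. */
int create_zero_file(const char *path, size_t bs, size_t count)
{
    char *zeros = calloc(1, bs);            /* one zero-filled block */
    if (zeros == NULL)
        return -1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        free(zeros);
        return -1;
    }

    for (size_t i = 0; i < count; i++) {
        size_t done = 0;
        while (done < bs) {                 /* handle short writes */
            ssize_t n = write(fd, zeros + done, bs - done);
            if (n < 0) {
                close(fd);
                free(zeros);
                return -1;
            }
            done += (size_t)n;
        }
    }

    free(zeros);
    return close(fd);                       /* 0 on success, -1 on error */
}
```

Called as `create_zero_file("/mnt/ramfs/ram1", 512UL << 20, 48)`, it would produce the same 24 GiB file as the `dd` command above, assuming enough free host RAM.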

Then, on mic0, I run:

# modprobe mic_virtblk 
# mkswap /dev/vda
# swapon /dev/vda
# free -m
             total       used       free     shared    buffers     cached
Mem:          7697        574       7123          0          0        145
-/+ buffers/cache:        428       7268
Swap:        24575          0      24575
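As a sanity check on these numbers: the backing file is 48 × 512 MiB = 24 GiB = 24576 MiB, yet `free -m` on mic0 reports 24575 MB of swap. A minimal sketch of the arithmetic, assuming 4 KiB pages, that the swap header written by `mkswap` occupies exactly one page that is not usable as swap, and that `free -m` truncates; `expected_swap_mib` is a hypothetical helper, not part of MPSS:

```c
#include <stdint.h>

/* Hypothetical helper: expected swap size in MiB (truncated) for a swap
   device of `bs` bytes * `count` blocks. Assumes the first page holds the
   mkswap header and is not counted as usable swap. */
uint64_t expected_swap_mib(uint64_t bs, uint64_t count, uint64_t page_size)
{
    uint64_t total_bytes = bs * count;                 /* size of the dd file */
    uint64_t pages = total_bytes / page_size;          /* whole pages */
    if (pages == 0)
        return 0;
    uint64_t usable_bytes = (pages - 1) * page_size;   /* minus header page */
    return usable_bytes / (1024 * 1024);               /* truncating division */
}
```

With `bs` = 512 MiB, `count` = 48 and 4 KiB pages this gives 6291455 usable pages, i.e. 24575.996 MiB, which truncates to the 24575 shown above.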

How can I verify that this swap is actually backed by the host's RAM?
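One partial check on the host side is to confirm that the file handed to `virtblk_file` really lives on a RAM-backed filesystem, using `statfs()` and the filesystem magic numbers from `<linux/magic.h>`; on the card, `cat /proc/swaps` shows whether `/dev/vda` is the active swap device. A minimal sketch, with the hypothetical helper `is_ram_backed`; note that this only shows where the backing file lives, not how the virtblk driver moves data between card and host:

```c
#include <sys/vfs.h>   /* statfs() */

/* Values as defined in <linux/magic.h>; repeated here so the sketch
   compiles without kernel headers installed. */
#ifndef RAMFS_MAGIC
#define RAMFS_MAGIC 0x858458f6UL
#endif
#ifndef TMPFS_MAGIC
#define TMPFS_MAGIC 0x01021994UL
#endif

/* Hypothetical helper: returns 1 if `path` is on ramfs or tmpfs,
   0 if it is on another filesystem, -1 if statfs() fails. */
int is_ram_backed(const char *path)
{
    struct statfs sfs;
    if (statfs(path, &sfs) != 0)
        return -1;
    unsigned long type = (unsigned long)sfs.f_type;
    return type == RAMFS_MAGIC || type == TMPFS_MAGIC;
}
```

On the host, `is_ram_backed("/mnt/ramfs/ram1")` would be expected to return 1 after the `mount -t ramfs` step above.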

Why is swapping on the Xeon Phi so slow, given that the swap device is backed by the host's RAM?

Test code:

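#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Elapsed time between t1 and t2 in milliseconds.
   Note: clock() measures CPU time used by this process, not wall-clock time. */
long timediff(clock_t t1, clock_t t2) {
    return (long)(((double)t2 - t1) / CLOCKS_PER_SEC * 1000);
}

int main(int argc, char** argv) {
    clock_t t1, t2;
    int max = 100;    /* number of chunks to allocate (100 x 256 MB = 25 GB) */
    int mb = 0;       /* chunks allocated so far */
    int size = 256;   /* chunk size in MB */
    size_t bytes = (size_t)size * 1024 * 1024;
    char* buffer;

    if (argc > 1)
        max = atoi(argv[1]);

    t1 = clock();
    /* Allocate and touch chunks until max is reached or malloc fails.
       memset forces every page to be backed, so once physical RAM is
       exhausted the kernel has to push pages out to swap.
       Buffers are intentionally never freed. */
    while (mb != max && (buffer = malloc(bytes)) != NULL) {
        memset(buffer, 0, bytes);
        ++mb;
        t2 = clock();
        printf("Allocated %.2f GB in %ld ms\n", mb * size / 1024.0, timediff(t1, t2));
        t1 = t2;
    }
    return 0;
}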
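One caveat about the measurement: `clock()` reports processor time consumed by the process, so time the test spends blocked waiting on swap I/O may not be counted. A minimal sketch of a wall-clock alternative, assuming POSIX `clock_gettime(CLOCK_MONOTONIC, &ts)` is called before and after each chunk; `elapsed_ms` is a hypothetical replacement for `timediff`, not part of the original test:

```c
#include <time.h>

/* Hypothetical replacement for timediff(): wall-clock milliseconds between
   two timestamps taken with clock_gettime(CLOCK_MONOTONIC, ...), so time
   spent blocked on swap I/O is included. */
long elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    long long ns = (long long)(end->tv_sec - start->tv_sec) * 1000000000LL
                 + (end->tv_nsec - start->tv_nsec);
    return (long)(ns / 1000000);   /* nanoseconds -> milliseconds */
}
```

In the loop, `t1` and `t2` would become `struct timespec` values filled by `clock_gettime(CLOCK_MONOTONIC, &t1)` and `clock_gettime(CLOCK_MONOTONIC, &t2)`, and `timediff(t1, t2)` would become `elapsed_ms(&t1, &t2)`.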

Compiled with: icc swaptest.c -o swaptest -mmic

Results:

# ./swaptest 
Allocated 0.25 GB in 260 ms
Allocated 0.50 GB in 269 ms
...
Allocated 6.75 GB in 269 ms
Allocated 7.00 GB in 260 ms
Allocated 7.25 GB in 470 ms
Allocated 7.50 GB in 1819 ms
Allocated 7.75 GB in 2060 ms
Allocated 8.00 GB in 2420 ms
Allocated 8.25 GB in 2820 ms
Allocated 8.50 GB in 2750 ms
Allocated 8.75 GB in 2300 ms
Allocated 9.00 GB in 1380 ms
Allocated 9.25 GB in 1530 ms
Allocated 9.50 GB in 3400 ms
Allocated 9.75 GB in 3800 ms
Allocated 10.00 GB in 3940 ms
Allocated 10.25 GB in 3579 ms
Allocated 10.50 GB in 5050 ms
Allocated 10.75 GB in 5029 ms
Allocated 11.00 GB in 5130 ms
Allocated 11.25 GB in 4770 ms
Allocated 11.50 GB in 3719 ms
Allocated 11.75 GB in 2300 ms
Allocated 12.00 GB in 3619 ms

and so on.

For comparison, on the host system:

$ ./a.out 
Allocated 0.25 GB in 140 ms
Allocated 0.50 GB in 170 ms
Allocated 0.75 GB in 160 ms
Allocated 1.00 GB in 160 ms
...
Allocated 23.75 GB in 130 ms
Allocated 24.00 GB in 130 ms
Allocated 24.25 GB in 130 ms
Allocated 24.50 GB in 130 ms
Allocated 24.75 GB in 130 ms
Allocated 25.00 GB in 120 ms

Before swapping starts: 256 MB in 269 ms ≈ 951 MB/s

With swap: 256 MB in 5.13 s ≈ 49.9 MB/s, which is far slower than the benchmark in https://software.intel.com/en-us/blogs/2014/01/07/improving-file-io-performance-on-intel-xeon-phi (at least 360 MB/s). Is this expected?
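The throughput figures come from dividing the 256 MB chunk size by the per-chunk time printed by `swaptest`. A minimal sketch of that arithmetic; `throughput_mb_per_s` is a hypothetical helper, and "MB" follows the test program's convention of 1 MB = 1024 × 1024 bytes:

```c
/* Hypothetical helper: throughput in MB/s for `chunk_mb` megabytes
   written in `ms` milliseconds. */
double throughput_mb_per_s(double chunk_mb, double ms)
{
    return chunk_mb / (ms / 1000.0);
}
```

With the numbers from the runs above, 256 MB in 269 ms gives about 951.7 MB/s and 256 MB in 5130 ms about 49.9 MB/s, roughly a 19× slowdown once the card starts swapping.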

I'm using a Xeon Phi 5110P (rev 11) with mpss-3.2.1 and icc (ICC) 14.0.2 20140120 (parallel_studio_xe_2013_sp1_update2).
