Which of two methods of obtaining perf stats for a multithreaded process is correct?

2024-6-3


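I am studying the performance of a multithreaded database server. I have a particular workload that takes about 61 seconds to run on a particular system. When I run perf against the workload, the database process has pid 79894.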

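In addition to the database server's own software threads, there are many Linux-related threads that are generally dormant on an idle system but become active while the workload runs. So I want to use perf's -a option as well as the -p option.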

I ran perf in two ways and got different results from each.

The first way is to run the following perf command in one window:

perf stat -p 79894 -a

and immediately run the database workload in another window. When the workload finishes, I press Ctrl-C to stop perf and get the following results:

    Performance counter stats for process id '79894':

              1,842,359.55 msec cpu-clock                 #   30.061 CPUs utilized          
                 3,798,673      context-switches          #    0.002 M/sec                  
                   153,995      cpu-migrations            #    0.084 K/sec                  
                16,038,992      page-faults               #    0.009 M/sec                  
         4,939,131,149,436      cycles                    #    2.681 GHz                    
         3,924,220,386,428      stalled-cycles-frontend   #   79.45% frontend cycles idle   
         3,418,137,943,654      instructions              #    0.69  insn per cycle         
                                                          #    1.15  stalled cycles per insn
           402,389,588,237      branches                  #  218.410 M/sec                 
             5,137,510,170      branch-misses             #    1.28% of all branches  


     61.28834199 seconds time elapsed

The second way is to run

perf stat  -a  sleep 61

and immediately run the database workload in another window. After 61 seconds, both perf and the workload finish, and perf produces the following results:

 Performance counter stats for 'system wide':

      4,880,317.67 msec cpu-clock                 #   79.964 CPUs utilized          
         8,274,996      context-switches          #    0.002 M/sec                  
           202,832      cpu-migrations            #    0.042 K/sec                  
        14,605,246      page-faults               #    0.003 M/sec                  
 5,022,298,186,711      cycles                    #    1.029 GHz                    
 7,599,517,323,727      stalled-cycles-frontend   #  151.32% frontend cycles idle   
 3,421,512,233,294      instructions              #    0.68  insn per cycle         
                                                  #    2.22  stalled cycles per insn
   402,726,487,019      branches                  #   82.521 M/sec                  
     5,124,543,680      branch-misses             #    1.27% of all branches        

      61.031494851 seconds time elapsed

Since I used -a in both versions, I expected to get roughly the same results.

However, with the sleep version:

cpu-clock is 2.5 times what you get with the -p version, 
context-switches are double what you get with the -p version  
and the other values are more or less the same
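
To put numbers on that comparison, here is a minimal Python sketch that divides each counter from the sleep run by the same counter from the -p run. The counter values are copied from the two outputs above; the dictionary layout and variable names are just mine for illustration.

```python
# Counter values copied from the two perf stat runs: (-p run, sleep run).
counters = {
    "cpu-clock (msec)": (1_842_359.55, 4_880_317.67),
    "context-switches": (3_798_673, 8_274_996),
    "cpu-migrations": (153_995, 202_832),
    "page-faults": (16_038_992, 14_605_246),
    "cycles": (4_939_131_149_436, 5_022_298_186_711),
    "stalled-cycles-frontend": (3_924_220_386_428, 7_599_517_323_727),
    "instructions": (3_418_137_943_654, 3_421_512_233_294),
    "branches": (402_389_588_237, 402_726_487_019),
    "branch-misses": (5_137_510_170, 5_124_543_680),
}

# Ratio of the sleep run to the -p run for every counter.
ratios = {name: sleep / pid for name, (pid, sleep) in counters.items()}

for name, ratio in ratios.items():
    print(f"{name:26s} {ratio:6.2f}x")
```

Running this, cycles, instructions, branches and branch-misses land within a few percent of 1x, while cpu-clock, context-switches and stalled-cycles-frontend are each roughly two to two and a half times larger in the sleep run.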

Two questions:

    (1) which set of results do I believe?
and 
    (2) how can there be more stalled-cycles-frontend than cycles in the sleep version?
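
As a sanity check on the right-hand columns, I tried recomputing them from the raw counters. This is only a sketch: the formulas in `derived` are my assumption about how perf arrives at those columns (I have not checked perf's source), and the function and key names are mine.

```python
# Raw counters copied from the two perf stat outputs above.
runs = {
    "pid": {
        "cpu_clock_msec": 1_842_359.55,
        "cycles": 4_939_131_149_436,
        "stalled_frontend": 3_924_220_386_428,
        "instructions": 3_418_137_943_654,
        "elapsed_sec": 61.28834199,
    },
    "system_wide": {
        "cpu_clock_msec": 4_880_317.67,
        "cycles": 5_022_298_186_711,
        "stalled_frontend": 7_599_517_323_727,
        "instructions": 3_421_512_233_294,
        "elapsed_sec": 61.031494851,
    },
}


def derived(c):
    """Recompute the right-hand columns (assumed formulas, not taken from perf)."""
    cpu_sec = c["cpu_clock_msec"] / 1000.0
    return {
        "cpus_utilized": cpu_sec / c["elapsed_sec"],
        "ghz": c["cycles"] / cpu_sec / 1e9,
        "frontend_idle_pct": 100.0 * c["stalled_frontend"] / c["cycles"],
        "insn_per_cycle": c["instructions"] / c["cycles"],
    }


for name, counters in runs.items():
    print(name, {k: round(v, 3) for k, v in derived(counters).items()})
```

Under these assumed formulas the recomputed values agree with what perf printed for both runs, so the 151.32% figure is just stalled-cycles-frontend divided by cycles. That tells me the percentage is consistent with the raw counts, but not why the stalled count itself can exceed the cycle count.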
