
c++ - MPI_Wtime changes when using OpenMP - Stack Overflow


I am trying to run some code which uses MPI + OpenMP, but when I measure the time that the program takes I see a huge difference when I comment out the OpenMP part. I don't know if I am measuring the time wrong or if there is another problem.

This is my code:

#include <iostream>
#include <cstdlib>
#include <omp.h>
#include <mpi.h>

int main(int argc, char *argv[]) {

    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided != MPI_THREAD_MULTIPLE) {
        throw std::logic_error("Cannot provide threaded MPI\n");
    }

    try {

        omp_set_num_threads(28);

        int wr, ws;
        MPI_Comm_rank(MPI_COMM_WORLD, &wr);
        MPI_Comm_size(MPI_COMM_WORLD, &ws);

        MPI_Barrier(MPI_COMM_WORLD);
        double starttime = MPI_Wtime();

        #pragma omp parallel
        {
            #pragma omp for
            for (long long i = 0; i < 1400000000/ws; i++) {
            }
        }

        double endtime = MPI_Wtime();
        MPI_Barrier(MPI_COMM_WORLD);

        if (wr == 0) {
            std::cout << "Total time " << endtime-starttime << " seconds\n";
        }

    }

    catch (std::exception& e) {
        std::cout << e.what() << std::endl;
        return 1;
    }

    MPI_Finalize();

    return 0;
}

When I run it with the OpenMP part active, I get the following times for np = 1, 2, 4 respectively (where np is the value passed to mpirun -np # ./main): 0.0944848, 0.171973, 0.121193 seconds. However, when I comment out that part I get times on the order of 10^-7 seconds.

Do you have any suggestions on where the problem is? Thanks!


asked Feb 14 at 19:02 by LMo, edited Feb 14 at 19:06
  • You have an empty loop. Maybe the compiler is not able to remove an empty loop if it's OpenMP. – Victor Eijkhout Commented Feb 14 at 19:35
  • 10-7? Do you mean 7, 8, 9 or 10, or 10^-7? – 463035818_is_not_an_ai Commented Feb 14 at 20:12
  • A couple of notes to improve the question: 1) improve the indentation style, it helps with reading the code; 2) share how you compile and run your code, compilation flags matter (like -O1 vs -O3); 3) share info about the system you run your code on, since you are setting the OMP thread count to 28 the system spec matters (is it a high-end server with 28 cores or a low-end one?). – mgNobody Commented Feb 14 at 20:21
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Bot Commented Feb 14 at 20:21
  • Recompile with -O0 and run again. – Gilles Gouaillardet Commented Feb 15 at 2:41
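Following up on the last two comments (this example is not part of the original thread): whether dead-code elimination is responsible can be checked by rebuilding without optimizations, for example with commands along these lines, assuming the source file is named main.cpp:

$ mpicxx -fopenmp -O0 main.cpp -o main
$ mpirun -np 2 ./main

At -O0 the compiler keeps the empty loop, so the run time should no longer be on the order of 10^-7 seconds even with the OpenMP pragmas commented out.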

1 Answer


As some comments already pointed out, the compiler can optimize out the whole loop, because it is empty and therefore dead code. Contrary to what some comments suggest, the compiler also optimizes out the loop iterations in the OpenMP case. What it does not optimize out is the code that creates the parallel region and distributes the iterations to the threads. If you change the loop bound to 100, the timing does not change. The difference between execution with OpenMP and without OpenMP is the time needed to create the threads for the parallel region.
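One way to see that the remaining time is dominated by thread startup is to time an empty parallel region on its own. The following minimal sketch is not part of the original answer; it assumes compilation with -fopenmp and uses only omp_get_wtime:

#include <cstdio>
#include <omp.h>

int main() {
    // Time an empty parallel region: no work is distributed, so the measured
    // interval is dominated by creating and synchronizing the OpenMP threads.
    double t0 = omp_get_wtime();
    #pragma omp parallel
    {
        // intentionally empty
    }
    double t1 = omp_get_wtime();
    std::printf("Parallel region overhead: %g seconds\n", t1 - t0);
    return 0;
}

The measured overhead gives a baseline for how much of the timings in the question is just parallel-region startup.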

If you add some code into the loop, that the compiler cannot optimize out, you see expected timing results:

#include <iostream>
#include <mpi.h>

int main(int argc, char *argv[]) {

  int provided;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

  if (provided != MPI_THREAD_MULTIPLE) {
    std::cerr << "Cannot provide threaded MPI\n" << std::endl;
    MPI_Abort(MPI_COMM_SELF, 1);
  }

  int wr, ws;
  MPI_Comm_rank(MPI_COMM_WORLD, &wr);
  MPI_Comm_size(MPI_COMM_WORLD, &ws);

  double starttime = MPI_Wtime();
  double total = 0;

#pragma omp parallel
  {
    // The reduction makes 'total' observable, so the compiler cannot
    // eliminate the loop as dead code.
#pragma omp for reduction(+ : total)
    for (long long i = 0; i < 1400000000 / ws; i++) {
      total += .2 * i;
    }
  }

  double endtime = MPI_Wtime();
  MPI_Barrier(MPI_COMM_WORLD);

  if (wr == 0) {
    std::cout << "Total time " << endtime - starttime << " seconds\n";
    std::cout << "Result: " << total << std::endl;
  }

  MPI_Finalize();

  return 0;
}

Because the OpenMP functions and the omp.h include were removed, the code can simply be compiled with and without the -fopenmp flag. The number of threads is selected with the OMP_NUM_THREADS environment variable:

$ mpicxx -O3 openmp-loop-time.cpp -o serial
$ mpirun -np 2 env OMP_NUM_THREADS=1 ./serial 
Total time 0.370651 seconds
Result: 4.9e+16
$ mpirun -np 2 env OMP_NUM_THREADS=24 ./serial 
Total time 0.369488 seconds
Result: 4.9e+16
$ mpicxx -fopenmp -O3 openmp-loop-time.cpp -o parallel
$ mpirun -np 2 env OMP_NUM_THREADS=24 ./parallel
Total time 0.0188746 seconds
Result: 4.9e+16
$ mpirun -np 2 env OMP_NUM_THREADS=1 ./parallel 
Total time 0.369074 seconds
Result: 4.9e+16

Here, execution with 24 threads takes about 1/20 of the single-threaded execution time, and the serial time is the same whether or not the code was compiled with OpenMP.
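As a quick sanity check on the printed result (this arithmetic is not part of the original answer): with np = 2, each rank runs N = 1400000000 / 2 = 7·10^8 iterations, and the reduction computes the sum of 0.2·i for i = 0 to N-1, which is 0.2·N(N-1)/2 ≈ 0.1·N^2 = 4.9·10^16. That matches the printed Result: 4.9e+16 and confirms the full iteration range was actually executed rather than optimized away.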
