I have matrix multiplication code that computes Matrix A * Matrix B = Matrix C as follows:
for(j=1;j<=n;j++) {
for(l=1;l<=k;l++) {
for(i=1;i<=m;i++) {
C[i][j] = C[i][j] + B[l][j]*A[i][l];
}
}
}
Now I want to turn it into a multithreaded matrix multiply, and my code is as follows:
I use a struct
struct ij
{
int rows;
int columns;
};
my method is
void *MultiplyByThread(void *t)
{
struct ij *RowsAndColumns = t;
double total=0;
int pos;
for(pos = 1;pos<k;pos++)
{
fprintf(stdout, "Current Total For: %10.2f",total);
fprintf(stdout, "%d\n\n",pos);
total += (A[RowsAndColumns->rows][pos])*(B[pos][RowsAndColumns->columns]);
}
D[RowsAndColumns->rows][RowsAndColumns->columns] = total;
pthread_exit(0);
}
and inside my main is
for(i=1;i<=m;i++) {
for(j=1;j<=n;j++) {
struct ij *t = (struct ij *) malloc(sizeof(struct ij));
t->rows = i;
t->columns = j;
pthread_t thread;
pthread_attr_t threadAttr;
pthread_attr_init(&threadAttr);
pthread_create(&thread, &threadAttr, MultiplyByThread, t);
pthread_join(thread, NULL);
}
}
But I can't seem to get the same result as the first matrix multiply (which is correct). Can someone point me in the right direction?
Try the following:
#pragma omp parallel for private(i, l, j)
for(j=1;j<=n;j++) {
for(l=1;l<=k;l++) {
for(i=1;i<=m;i++) {
C[i][j] = C[i][j] + B[l][j]*A[i][l];
}
}
}
While Googling for the GCC compiler switch to enable OpenMP, I actually came across this blog post that describes what happens better than I could, and also contains a better example.
OpenMP is supported by most mainstream compilers for multicore machines; see the OpenMP web site for more information.
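To make the OpenMP suggestion concrete, here is a minimal, self-contained sketch. The function name multiply_omp, the fixed sizes M, K and N, and the 0-based indexing are my assumptions for illustration; the question uses 1-based loops and its own globals. It also zeroes C first, which the original loop does not do.

```c
#define M 2   /* rows of A and C (assumed size) */
#define K 3   /* columns of A, rows of B (assumed size) */
#define N 2   /* columns of B and C (assumed size) */

/* Hypothetical helper: computes C = A * B, splitting the loop over the
   columns of C across the OpenMP thread team. */
void multiply_omp(double A[M][K], double B[K][N], double C[M][N])
{
    int i, j, l;

    /* "parallel for" creates the thread team and divides the j loop.
       Each thread owns whole columns of C, so no two threads ever
       write the same element. j is private automatically. */
    #pragma omp parallel for private(i, l)
    for (j = 0; j < N; j++) {
        for (i = 0; i < M; i++) {
            C[i][j] = 0.0;
        }
        for (l = 0; l < K; l++) {
            for (i = 0; i < M; i++) {
                C[i][j] += A[i][l] * B[l][j];
            }
        }
    }
}
```

With GCC, OpenMP is enabled with -fopenmp. Without that flag the pragma is ignored and the loop simply runs serially, producing the same result.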
Your threading code does not actually run in parallel. You create a thread and then immediately wait for it to finish, because pthread_join is called right after pthread_create. Instead, create an m x n array of threads, start them all, and only then join them all. Apart from that, the thread function computes almost the same thing as the loop, with one difference: its loop condition is pos<k, so it skips the last term (l = k) that the serial loop includes. What is the exact discrepancy in the results?
Example (note, not compiled):
pthread_t threads[m + 1][n + 1]; /* Threads that will execute in parallel; sized for the 1-based loops */
and then in the main:
for(i=1;i<=m;i++) {
for(j=1;j<=n;j++) {
struct ij *t = (struct ij *) malloc(sizeof(struct ij));
t->rows = i;
t->columns = j;
pthread_attr_t threadAttr;
pthread_attr_init(&threadAttr);
pthread_create(&threads[i][j], &threadAttr, MultiplyByThread, t);
}
}
/* join all the threads */
for(i=1;i<=m;i++) {
for(j=1;j<=n;j++) {
pthread_join(threads[i][j], NULL);
}
}
(More or less: just don't call pthread_join for each thread inside the creation loop.)
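For reference, a compilable sketch of the same create-all-then-join-all pattern might look like the following. The fixed sizes, the sample matrices, the flat thread array and the names multiply_by_thread and multiply_threaded are my assumptions, not the asker's code. It uses 0-based indexing, and the inner loop covers all K terms so that it matches the serial version.

```c
#include <pthread.h>

#define M 2   /* rows of A and D (assumed size) */
#define K 3   /* columns of A, rows of B (assumed size) */
#define N 2   /* columns of B and D (assumed size) */

/* Sample inputs, chosen only for illustration. */
double A[M][K] = {{1, 2, 3}, {4, 5, 6}};
double B[K][N] = {{7, 8}, {9, 10}, {11, 12}};
double D[M][N];

struct ij {
    int row;
    int column;
};

/* Computes one element of D: row of A times column of B. */
static void *multiply_by_thread(void *arg)
{
    struct ij *cell = arg;
    double total = 0.0;
    int pos;

    for (pos = 0; pos < K; pos++) {   /* all K terms, none skipped */
        total += A[cell->row][pos] * B[pos][cell->column];
    }
    D[cell->row][cell->column] = total;
    return NULL;
}

/* Returns 0 on success, -1 if some thread could not be created. */
int multiply_threaded(void)
{
    pthread_t threads[M * N];
    struct ij cells[M * N];   /* one argument slot per thread, no malloc */
    int started = 0;
    int t;

    /* Phase 1: start every thread without waiting for any of them. */
    for (t = 0; t < M * N; t++) {
        cells[t].row = t / N;
        cells[t].column = t % N;
        if (pthread_create(&threads[t], NULL, multiply_by_thread, &cells[t]) != 0) {
            break;
        }
        started++;
    }

    /* Phase 2: wait for every thread that was actually started. */
    for (t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
    }
    return started == M * N ? 0 : -1;
}
```

One thread per element is only reasonable for tiny matrices like this one. For larger sizes, giving each thread a whole row or a block of rows keeps the thread count near the number of cores.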