User:Szha/Notes/CMSC714/note0911

  1. Office Hours
  2. MPI project
  3. bug cluster account info

MPI

goals

standardize previous message-passing systems
  • PVM, P4, NX (Intel), MPL (IBM), ...
support copy-free message passing
portable to many platforms - defines an API, not an implementation

features

point-to-point messaging
group/collective communications
profiling interface: every function has a name-shifted version (allows wrappers and metrics)
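
For illustration, a minimal sketch of a wrapper built on the name-shifted (PMPI_) interface; the interposed MPI_Send records a metric and forwards to the real implementation:

#include <mpi.h>

static long send_count = 0;    /* metric collected by the wrapper */

/* interposed MPI_Send; link this ahead of the MPI library so it
   shadows the real one, then forward to the name-shifted PMPI_Send */
int MPI_Send(void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    send_count++;
    return PMPI_Send(buf, count, type, dest, tag, comm);
}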

buffering (in standard mode)

no guarantee that buffers exist, or of their size
a send may block until the matching receive is posted
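
A classic consequence, sketched below: if both processes post a large standard-mode send before either posts a receive, and the implementation buffers neither message, both sends block and the program deadlocks (buf and BIG are hypothetical):

/* unsafe pattern: both ranks send before either receives;
   completes only if the implementation buffers the messages */
if (myrank == 0) {
    MPI_Send(buf, BIG, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    MPI_Recv(buf, BIG, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
} else {
    MPI_Send(buf, BIG, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    MPI_Recv(buf, BIG, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
}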

delivery order

two sends from the same process to the same destination will arrive in order
no guarantee of fairness between processes when receiving with wildcards

MPI communicators

provide a named set of processes for communication

plus a context - a system-allocated unique tag

all processes within a communicator can be named

a communicator is a group of processes plus a context
processes are numbered 0..n-1

allows libraries to be constructed

application creates communicators
library uses them
prevents problems with posting wildcard receives
  • adds a communicator scope to each receive

all programs start with MPI_COMM_WORLD

functions for creating communicators from other communicators (split, duplicate, etc.) - see the sketch below
  • duplicate: to get a different tag (context)
functions for finding out about processes within a communicator (size, my_rank, ...)
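
A minimal sketch, assuming a run with an even number of processes, of splitting MPI_COMM_WORLD in half and duplicating it for a library:

MPI_Comm half, libcomm;
int world_rank, world_size, sub_rank;

MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
MPI_Comm_size(MPI_COMM_WORLD, &world_size);

/* same color -> same new communicator; key orders ranks within it */
MPI_Comm_split(MPI_COMM_WORLD, world_rank < world_size / 2,
               world_rank, &half);
MPI_Comm_rank(half, &sub_rank);    /* renumbered 0..n/2-1 */

/* duplicate: same group, new context, so library traffic cannot
   match application sends or wildcard receives */
MPI_Comm_dup(MPI_COMM_WORLD, &libcomm);

MPI_Comm_free(&libcomm);
MPI_Comm_free(&half);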

non-blocking point-to-point functions

two parts

post the operation
wait for results

also includes a poll/test option

checks whether the operation has finished

semantics

must not alter the buffer while the operation is pending (i.e., until wait returns or test returns true)
and data is not valid for a receive until the operation completes
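
A sketch of the post/test/wait pattern (N, other, and the overlapped work are placeholders):

#define N 1024                    /* assumed buffer size */

double buf[N];
int other, flag;
MPI_Request req;
MPI_Status status;

/* post the receive; buf must not be touched while it is pending */
MPI_Irecv(buf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &req);

/* ... computation that does not use buf ... */

MPI_Test(&req, &flag, &status);   /* poll: has it finished? */

MPI_Wait(&req, &status);          /* block; buf is valid afterwards */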

collective communication

communicator specifies the process group that participates

various operations, which may be optimized in an MPI implementation

barrier synchronization
broadcast
gather/scatter (with one destination, or all in group)
reduction operations - predefined and user-defined
  • also with one destination or all in group
scan - prefix reductions
  • for processes p1..p5 producing values x1..x5, an exclusive scan yields 0, x1, x1+x2, x1+x2+x3, x1+x2+x3+x4 (MPI_Scan itself is the inclusive form)
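
A sketch of the reduction variants and a scan, using the predefined MPI_SUM operation:

int myrank, value, total, everyone, prefix;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
value = myrank + 1;                 /* each process contributes x_i */

/* one destination: root 0 receives x_1 + ... + x_n */
MPI_Reduce(&value, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

/* all in group: every process receives the sum */
MPI_Allreduce(&value, &everyone, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

/* inclusive prefix scan: process i receives x_1 + ... + x_{i+1} */
MPI_Scan(&value, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);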

collective operations may or may not synchronize

up to the implementation, so the application can't make assumptions

MPI calls

include <mpi.h> in C/C++ programs

for every process

first call MPI_Init(&argc, &argv)

MPI_Comm_rank(MPI_COMM_WORLD, &myrank)

myrank is set to the id of this process (in the range 0 to P-1)

MPI_Wtime()

returns wall-clock time
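
A common timing pattern, sketched:

double t0, t1;

t0 = MPI_Wtime();
/* ... region to time ... */
t1 = MPI_Wtime();
printf("elapsed: %f seconds\n", t1 - t0);   /* needs <stdio.h> */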

at the end, call MPI_Finalize()

no MPI calls are allowed after this

MPI communication

parameters of various calls

var - a variable (pointer to memory)
num - number of elements in the variable to use
type - {MPI_INT, MPI_REAL, MPI_BYTE, ...}
root - rank of the process at the root of a collective operation
src/dest - rank of the source/destination process
status

calls (all return a code - check for MPI_SUCCESS)

MPI_Send(var, num, type, dest, tag, MPI_COMM_WORLD)
MPI_Recv(var, num, type, src, MPI_ANY_TAG, MPI_COMM_WORLD, &status)
MPI_Bcast(var, num, type, root, MPI_COMM_WORLD)
  all processes call Bcast (the same call) with compatible parameters ("compatible" allows, e.g., different buffer pointers)
MPI_Barrier(MPI_COMM_WORLD)
  all processes must call the barrier, otherwise the processes that did call it hang
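
A sketch of the broadcast call pattern: every process executes the same line, and afterwards all of them hold root's value:

int value;

if (myrank == 0)
    value = 42;                /* root fills the buffer before the call */

/* every rank calls MPI_Bcast; non-roots have value overwritten */
MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);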

MPI Misc.

MPI Types

all messages are typed
  • base/primitive types are pre-defined
int, double, real, {unsigned}{short, char, long}
  • can construct user-defined types
includes non-contiguous data types
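
A sketch of a user-defined non-contiguous type: one column of a row-major N x N matrix via MPI_Type_vector (N, dest, and tag are assumptions):

#define N 8

double a[N][N];
MPI_Datatype column;

/* N blocks of 1 double, each N elements apart = one matrix column */
MPI_Type_vector(N, 1, N, MPI_DOUBLE, &column);
MPI_Type_commit(&column);

MPI_Send(&a[0][0], 1, column, dest, tag, MPI_COMM_WORLD);
MPI_Type_free(&column);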

processor topologies

allows construction of Cartesian and arbitrary graph topologies
may allow some systems to run faster
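
A sketch of a 2-D Cartesian topology; reorder = 1 lets the implementation renumber ranks to match the hardware, which is where the potential speedup comes from:

MPI_Comm grid;
int dims[2] = {0, 0};          /* let MPI pick the factorization */
int periods[2] = {0, 0};       /* non-periodic in both dimensions */
int coords[2], nprocs, gridrank;

MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
MPI_Dims_create(nprocs, 2, dims);

MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid);
MPI_Comm_rank(grid, &gridrank);
MPI_Cart_coords(grid, gridrank, 2, coords);   /* my (row, col) */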

language bindings for C, Fortran, C++, ...

What's not in MPI-1

process creation
I/O
one-sided communication (get/put)

sample MPI program

#include "mpi.h"

/* fragment from main */

/* assumed constants, not in the original fragment */
#define MESSAGESIZE 1024
#define MSG_TAG 100
#define ITERATIONS 10

int myrank, friendRank;
char message[MESSAGESIZE];
int i, tag = MSG_TAG;
MPI_Status status;

/* initialize, no spawning necessary */
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
/* pair up: rank 0 talks to rank 1, and vice versa */
if (myrank == 0) {
    friendRank = 1;
}
else {
    friendRank = 0;
}
MPI_Barrier(MPI_COMM_WORLD);
if (myrank == 0) {
    /* rank 0 fills the message buffer */
    for (i = 0; i < MESSAGESIZE; i++) {
        message[i] = '1';
    }
}

/* ping-pong: rank 0 sends first, rank 1 receives first */
for (i = 0; i < ITERATIONS; i++) {
    if (myrank == 0) {
        MPI_Send(message, MESSAGESIZE, MPI_CHAR, friendRank, tag, MPI_COMM_WORLD);
        MPI_Recv(message, MESSAGESIZE, MPI_CHAR, friendRank, tag, MPI_COMM_WORLD, &status);
    }
    else {
        MPI_Recv(message, MESSAGESIZE, MPI_CHAR, friendRank, tag, MPI_COMM_WORLD, &status);
        MPI_Send(message, MESSAGESIZE, MPI_CHAR, friendRank, tag, MPI_COMM_WORLD);
    }
}
MPI_Finalize();
exit(0);