NAME

     bsp_put, bsp_hpput - deposit data into  a  remote  processes
     memory



C SYNOPSIS

     #include "bsp.h"

     void bsp_put(int pid,const void *src,void *dst,int offset,int nbytes);

     void bsp_hpput(int pid,const void *src,void *dst,int offset,int nbytes);


FORTRAN SYNOPSIS

     SUBROUTINE  bspput(pid,src,dst,offset,nbytes)
       INTEGER, intent(IN)  :: pid,offset,nbytes
       <TYPE>,  intent(IN)  :: src
       <TYPE>,  intent(OUT) :: dst

     SUBROUTINE  bsphpput(pid,src,dst,offset,nbytes)
       INTEGER, intent(IN)  :: pid,offset,nbytes
       <TYPE>,  intent(IN)  :: src
       <TYPE>,  intent(OUT) :: dst


DESCRIPTION

     The aim of bsp_put(3) and  bsp_hpput(3)  is  to  provide  an
     operation  akin  memcpy(3C) available in the Unix <string.h>
     library.  Both operations copy a specified number of  bytes,
     from  a byte addressed data structure in the local memory of
     one process into contiguous memory locations  in  the  local
     memory of another process. The distinguishing factor between
     these operations is provided by the buffering choice.

     The semantics buffered on source,  buffered  on  destination
     (see  drma(3))  is used for bsp_put(3) communications. While
     the semantics is clean and safety  is  maximised,  puts  may
     unduly  tax  the memory resources of a system. Consequently,
     bsplib(3) also provides a  high  performance  put  operation
     bsp_hpput(3)  whose  semantics  is  {unbuffered  on  source,
     unbuffered on  destination  .  The  use  of  this  operation
     requires  care  as  correct data delivery is only guaranteed
     if: (1) no communications alter the source area; (2) no sub-
     sequent  local  computations  alter  the source area; (3) no
     other communications alter the destination area; and (4)  no
     computation  on  the  remote  process alters the destination
     area during the entire superstep. The main advantage of this
     operation  is  its economical use of memory. It is therefore
     particularly  useful  for  applications   which   repeatedly
     transfer large data sets.



EXAMPLES

     1)   The  reverse  function  shown  below   highlights   the
          interaction  between  registration  and  put communica-
          tions. This example defines a simple collective commun-
          ication  operation, in which all processes have to call
          the function within the same superstep. The  result  of
          the  function  on  process  i  will be the value of the
          parameter x from process bsp_nprocs()-i-1.

          int reverse(int x) {
            bsp_push_reg(&x,sizeof(int));
            bsp_sync();

            bsp_put(bsp_nprocs()-bsp_pid()-1,&x,&x,0,sizeof(int));
            bsp_sync();
            bsp_pop_reg(&x);
            return x;
          }

          The function would be defined in Fortran as:

                  INTEGER FUNCTION reverse(x)
                    INCLUDE 'fbsp.h'
                    INTEGER x

                    CALL bsppushreg(x,BSPINT)
                    CALL bspsync()

                    CALL bspput(bspnprocs()-bsppid()-1,x,x,0,BSPINT)
                    CALL bspsync()
                    CALL bsppopreg(x)
                    reverse=x
                  END

          By the end of the first superstep,  identified  by  the
          first   bsp_sync(3),   all   the  processes  will  have
          registered the  parameter  x  as  being  available  for
          remote access by any subsequent drma(3) operation. Dur-
          ing the second superstep, each  process  transfers  its
          local copy of the variable x into a remote copy on pro-
          cess bsp_nprocs()-bsp_pid()-1 . Although communications
          occur  to  and  from  the same variable within the same
          superstep, the algorithm does not suffer from  problems
          of  concurrent  assignment  because  of the buffered on
          source,   buffered   on   destination   semantics    of
          bsp_put(3).  This  buffering ensures conflict-free com-
          munication between the outgoing communication  from  x,
          and  any incoming transfers from remote processes.  The
          popregister at the end of the function  reinstates  the
          registration  properties  that  were active on entry to
          the function at the next bsp_sync(3) encountered during
          execution.

          As can be seen from the Fortran code  above,  bsplib(3)
          defines   a  collection  of  constant  parameters  that
          specify the size, in  bytes,  of  the  various  Fortran
          data-types.   BSPWORD,   BSPINT,  BSPLOGICAL,  BSPREAL,
          BSPDOUBLE, and BSPCOMPLEX can be used by including  the
          file fbsp.h within a compilation unit. The relationship
          between the sizes of each of these types  follows  that
          of Fortran 77. i.e.,

          BSPWORD   = BSPINT    = BSPLOGICAL = BSPREAL
          2*BSPWORD = BSPDOUBLE = BSPCOMPLEX


     2)   The procedure put_array shown  below  has  a  semantics
          assignment:

          forall i in {0,..n-1} xs[xs[i]] := xs[i]

          Conceptually, the algorithm manipulates a global  array
          xs  of  n  elements  that  are  distributed  among  the
          processes.  The role of bsplib(3)  is  to  provide  the
          infrastructure  for  the  user to take care of the data
          distribution, and any implied  communication  necessary
          to manipulate parts of the data structure that are on a
          remote process. Therefore, if the user distributes  the
          global array in a block-wise manner (i.e., process zero
          gets elements 0 to n/(p-1), process  one  gets  n/p  to
          (2n)/(p-1), etc.) with each process owning an n/p chunk
          of elements, then the bsplib(3)  communications  neces-
          sary  to  perform  the  concurrent assignment are shown
          below.

          void put_array(int *xs, int n) {
            int i,pid,local_idx,n_over_p= n/bsp_nprocs();
            if ((n % bsp_nprocs()) != 0)
              bsp_abort("{put_array} n=%d not divisible by p=%d",
                        n,bsp_nprocs());
            bsp_push_reg(xs,n_over_p*sizeof(int));
            bsp_sync();

            for(i=0;i<n_over_p;i++) {
              pid       = xs[i]/n_over_p;
              local_idx = xs[i]%n_over_p;
              bsp_put(pid,&xs[i],xs,local_idx*sizeof(int),sizeof(int));
            }
            bsp_sync();
            bsp_pop_reg(xs);
          }

          Similarly, the function can be defined in Fortran as:

                  SUBROUTINE putarray(xs,n)
                    INCLUDE 'fbsp.h'
                    INTEGER xs(*),n
                    INTEGER i,pid,localidx,noverp

                    noverp=n/bspnprocs()
                    IF (MOD(n,bspnprocs()) .NE. 0) THEN
                      CALL bspabort('N not divisible by p')
                    END IF
                    CALL bsppushreg(xs,noverp*BSPINT)
                    CALL bspsync()

                    DO i=1,noverp
                      pid      = xs(i)/noverp
                      localidx = MOD(xs(i),noverp)
                      CALL bspput(pid,xs(i),xs,localidx*BSPINT,BSPINT)
                    END DO
                    CALL bspsync()
                    CALL bsppopreg(xs)
                  END

          The procedure highlights the use  of  bsp_abort(3)  and
          the  offset  parameter  in  bsp_put(3).  Each process's
          local section of the array  xs  is  registered  in  the
          first superstep. Next, n/p puts are performed, in which
          the global numbering  used  in  the  distributed  array
          (i.e., indices in the range 0 through to n-1), are con-
          verted into  pairs  of  process  identifier  and  local
          numbering  in  the range 0 to n/(p-1). Once the conver-
          sion from the global scheme to  process-id/local  index
          has  been  performed,  elements  of  the  array  can be
          transferred into the correct index on a remote process.
          It  should  be  noted that if the value of the variable
          pid is the same as bsp_pid(3), then a local  assignment
          (i.e., memory copy) will occur at the end of the super-
          step.


     3)   Consider a function cyclic_shift executed on each  pro-
          cess,  that  takes  an  integer  x as its argument, and
          returns the value of x on its  left  neighbouring  pro-
          cess.

          int cyclic_shift(int x) {
            bsp_push_reg(&x,sizeof(int));
            bsp_sync();
            bsp_put( (bsp_pid() + 1) % bsp_nprocs(),
                     &x,
                     &x,0,
                     sizeof(int));
            bsp_sync();
            bsp_pop_reg(&x);
            return x;
          }

     4)   An alternative definition of cyclic shift that  uses  a
          high performance put.

          int cyclic_shift(int x) {
            int result;
            bsp_push_reg(&result,sizeof(int));
            bsp_sync();
            bsp_hpput( (bsp_pid() + 1) % bsp_nprocs(),
                       &x,
                       &result,0,
                       sizeof(int));
            bsp_sync();
            bsp_pop_reg(&result);
            return result;
          }


     5)   Consider a function  bsp_allsums  that  calculates  the
          running  sums of p values stored on p processors. i.e.,
          if x_i is stored on process i, then the result on  each
          processor is x_0 + .. x_i.

          #include "bsp.h"
          #include <stdio.h>

          int bsp_allsums(int x) {
            int i, left, right;

            bsp_push_reg(&left,sizeof(int));
            bsp_sync();

            right=x;
            for(i=1;i<bsp_nprocs();i*=2) {
              if (bsp_pid()+i < bsp_nprocs())
                bsp_put(bsp_pid()+i,&right,&left,0,sizeof(int));
              bsp_sync();
              if (bsp_pid()>=i) right=left+right;
            }
            bsp_pop_reg(&left);
            return right;
          }

          void main() {
            int y;
            bsp_begin(bsp_nprocs());
            y = bsp_pid()+1;
            printf("y=%d sums=%d\n",y,bsp_allsums(y));
            bsp_end();
          }

          A compilation, and an example run on four processors is
          shown below:
          pippin> bspcc allsums.c
          pippin> ./a.out
          y=4 sums=10
          y=2 sums=3
          y=1 sums=1
          y=3 sums=6



SEE ALSO

     drma(3),   bsp_push_reg(3),   bsp_pop_reg(3),    bsp_get(3),
     bsp_hpget(3)

     ``BSPlib: The BSP Programming Library'' Jonathan M. D. Hill,
     Bill  McColl,  Dan  C.  Stefanescu,  Mark W. Goudreau, Kevin
     Lang, Satish B. Rao, , Torsten Suel, Thanasis Tsantilas, and
     Rob  Bisseling.  Parallel  Computing,  to  appear  1998. See
     http://www.bsp-worldwide.org for more details.



NOTES

     i    The destination memory area used in a  put  has  to  be
          registered.  It  is an error to communicate into a data
          structure that has not been registered.


     ii   The source of a put does not have to be registered.


     iii  If the destination memory area dst is  registered  with
          size  x,  then it is a bounds error to perform the com-
          munication bsp_put(pid,src,dst},o,n) if o+n>x.


     iv   A communication of zero bytes does nothing.


     v    A process can communicate into its own memory if pid  =
          bsp_pid(). However, for bsp_put(3), due to the buffered
          at destination semantics, the memory  copy  only  takes
          effect at the end of the superstep.


     vi   The process numbering and offset parameter  start  from
          zero, even for the FORTRAN bindings of the operations.



BUGS

     Problems  and  bug  reports  should  be  mailed  to  bsplib-
     bugs@comlab.ox.ac.uk




AUTHORS

     The Oxford BSP Toolset implementation of BSPlib was  written
     by Jonathan.Hill@comlab.ox.ac.uk
     http://www.comlab.ox.ac.uk/oucl/people/jonathan.hill.html

















































Man(1) output converted with man2html