BSP machine parameters (table of contents)

  1. Summary of machines profiled
  2. Table of the BSP machine parameters
  3. What is the BSP parameter s?
  4. What is the BSP parameter l?
  5. What is the BSP parameter g?

Summary of machines profiled

MANUFACTURER Machine CONFIGURATION
BSP cluster Eight 400Mhz Pentium IIs with 128Mbytes of memory, connected by a Cisco 2916XL 100Mbps Ethernet switch, with a dedicated BSP device driver for a 3Com 3C905B Ethernet card.
Silicon Graphics Power Challenge Four 75Mhz R8000 processors with 4Mbytes unified secondary cache, 0.5Gbytes main memory.
Origin 2000 Eight 195Mhz R10000 with 4Mbytes unified secondary cache, 2Gbytes main memory.
Cray Research T3D Five hundred and twelve 150Mhz DECchip 21064 with 64Mbytes per processor
T3E Twenty four 300Mhz DECchip 21164 with 128Mbytes of main memory per processor.
IBM SP2 (+switch) Eight 66.7Mhz P2SC Thin1 nodes with 1Mbytes secondary cache and 128Mbytes of main memory per processor.
SP2 (+Ethernet) Eight 66.7Mhz P2SC Thin1 nodes with 1Mbytes secondary cache and 128Mbytes of main memory per processor.
Intel Pentium Pro NOW Seventeen 266Mhz Pentium Pros with 512K cache connected by 10Mbit Ethernet
Digital Digital 8400 Six 300Mhz Alpha EV5 with 2Gbytes main memory.
Alpha farm
Sun Microsystems Sparcstation-20 SMP
Sparcstation-5 NOW Sixteen Sparcstation-5s connected by 10Mbit Ethernet
Hitachi SR2001
Convex Exemplar
Parsytec GC

Table of the BSP machine parameters

The benchmark program that was used to calculate the BSP machine parameters can be found here. A more detailed list of the BSP machine parameters, including communication rates in usecs (which are useful for comparing the absolute performance of communication across architectures) can be found here.

MACHINE s (Mflop/s) p (no. procs) l (flops) g (flops/word)
BSP cluster88 11281.3
2565433.5
41175931.5
81834730.9
SGI PowerChallenge74 12260.50
2113210.2
314969.5
419029.3
Origin 2000100.7 12861.36
28048.26
313138.36
4178910.24
5247411.06
6296312.25
7386714.28
Cray T3E47 1862.14
22692.61
32962.11
43571.77
85061.64
95521.57
167511.66
208801.63
2410131.70
Cray T3D12 1680.3
21641.0
41680.8
81750.8
93831.2
161811.0
254861.5
322011.4
641481.7
1283011.8
2563872.4
IBM SP2 (Switch)26 12441.3
219037.8
435838.0
8541211.4
Pentium Pro NOW61 1851.0
252745484.5
41399811128.5
85391591994.1
108260542436.3
1628842733614.6
Digital 840024 1280.3
22862.5
33625.3
Sun Sparc-5 NOW7.7 1822.4
272966128.8
4369771266.2
8835281322.3
Multiprocessor Sun Sparc-2010 1240.4
2543.4
3744.1
41184.1
Hitachi SR20015.4 1310.2
211653.0
422993.0
838443.1
1646383.0
3269064.9
Convex Exemplar10.5 1600.16
2213738.3
4644579.2
819447611.3
Digital Alpha farm10.1 1290.3
21720281.1
33435683.0
44710981.3
Parsytec GC19.3 1981.0
26309113
423538143
829080254
16224977342
32130527658
IBM SP2 (Ethernet)26 12411.3
218759183.6
439025628.2
8887951224.1

The following description of the BSP parameters s, l, and g is taken from the paper:


What is the BSP parameter s?

The values of the g and l parameters are normalised by the instruction rate, s, of each processor. Because this instruction rate depends heavily upon the kind of computations being done, the average of two different measured values is used:
  1. Lower-bound for s: measures the cost of an inner product, where O(n) operations are performed on a data structure of size n. The value of n is chosen to be far greater than the cache size on each processor. This benchmark therefore gives a lower-bound megaflop rate for the processor as each arithmetic operation induces a cache miss.
  2. Upper-bound for s: measures the cost of a dense matrix multiplication, where O(n^3) operations are performed on a data structures of size n^2. Because a large percentage of the computation can be kept in cache, this benchmark gives an upper-bound megaflop rate for the processor.

The values of s given in the table above is an average of the upper and lower bound values for s.


What is the BSP parameter l?

The cost of a barrier synchronisation comes in two parts:

The parameter l captures the latter of these costs. The diameter of the communication network, or at least the length of the longest path that allows state to be moved from one processor to another clearly imposes a lower bound on l. However, it is also affected by many other factors, so that, in practice, an accurate value of l for each parallel architecture is obtained empirically.


What is the BSP parameter g?

The ability of a communication network to deliver data is captured by a BSP parameter, g, that measures the permeability of the network to continuous traffic addressed to uniformly-random destinations. As we have seen, BSP programs approximate such traffic. The parameter g is defined such that an h-relation will be delivered in time hg. Subject to some small provisos, hg is an accurate measure of communication performance over a large range of architectures. The value of g is normalised with respect to the clock rate of each architecture so that it is in the same units as the time for executing sequences of instructions.

Sending a message of length m clearly takes longer than sending a message of size 1. BSP does not distinguish between a message of length m and m messages of length 1 --- the cost in either case is mhg (refer to the Q&A paper to see why this isn't a problem). So messages of varying lengths may either be costed using the form mhg where h is the number of messages, or the message lengths can be folded into h, so that it becomes the number of units of data to be transferred.

The parameter g is related to the bisection bandwidth of the communication network but they are not equivalent --- g also depends on factors such as:

So g is bounded below by the ratio of p to the bisection bandwidth, suitablly normalised, but may be much larger because of these other factors. Only a very unusual network would have a bisection bandwidth that grew faster than p, so g is a monotonically increasing function of p. The precise value of g is, in practice, determined empirically for each parallel computer, by running suitable benchmarks.

Note that g is not the single-word delivery time, but the single-word delivery time under continuous traffic conditions. This difference is subtle but crucial.

An example of the BSP parameters being used in the analysis of a broadcasting algorithm can be found here.


Jonathan Hill
Last updated: September 24th 1997