Effects of Communication Characteristics on Task Mapping
Download Effects of Communication Characteristics on Task Mapping
Preview text
Effects of Communication Characteristics on Task Mapping Quality on a 2D Mesh with Wormhole Routing
S.Y. Lee
Department of Electrical and Computer Engineering Auburn University
Auburn, AL 36849
[email protected]
ABSTRACT
Given that computational load is well balanced, task mapping is mainly concerned with reducing communication overhead. How much communication time can be reduced by optimizing allocation of tasks on a multiprocessor system would be dependent on several factors. In this study, dependency of improvement by task mapping on communication pattern is investigated on a mesh with wormhole routing. Communication pattern may be described by a set of parameters such as the total number of messages, the number of sources, the number of destinations, and distribution of message size. Through extensive simulation, it has been shown that the communication pattern has a significant effect on reduction in communication overhead that can be achieved by task mapping. Effects of individual communication parameters have been analyzed.
1. INTRODUCTION
As more parallel computing systems, tightlycoupled multiprocessors or networks of workstations, are made commercially available, how those systems can be efficiently used for various applications has become an important issue. A specific issue of task mapping concerns with assigning partitioned tasks onto processing elements (PE's). Task mapping has been extensively studied by many researchers for a long time [1][2][3][4][5]. However, as a whole, it is still an open problem. The term, task mapping, is to be distinguished from task scheduling which determines the order in which a given set of tasks are to be executed on a single or multiple PE's. Task mapping (as used in this paper) allocates tasks, which are communicating concurrently and are to be executed at the same time, onto multiple PE's. With task partitioning fixed, task mapping mainly affects communication overhead. As the ratio of communication time to computation time in parallel or distributed computing increases, it is required that a mapping scheme take communication over
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and or fee. SAC'00 March 1921 Como, Italy (e) 2000 ACM 1581132395/00/003>...>$5.00
head into account. This increased ratio is due to a larger number of PE's now available for an application, a larger size of shared data involved in many recentlyemerging applications, the everincreasing processor speed, etc.
One may attempt to reduce communication overhead by optimizing "when PE's send out messages", i.e., scheduling communication with a given (spatial) mapping. However, the achievable improvement can be limited since a spatial assignment (mapping) of communicating tasks is fixed. That is, mapping would help scheduling to achieve a better result (a smaller communication overhead). Also, this approach would require more information to deal with such as timing information. Furthermore, it can be more sensitive to dynamic variation of communication.
Communication pattern among PE's in a multiprocessor system depends on where communicating tasks are assigned and shared data are allocated. Depending on the relative positions of communicating tasks, the resulting communication overhead may vary significantly. In some cases, random mapping may work just as well as any elaborated mapping. Therefore, whether one should concern about mapping at all and what should be optimized in mapping are to be addressed.
There have been significant research efforts in optimizing communication in order to minimize communication overhead including [7][8]. Various communication patterns such as multicasting, broadcasting, alltoall broadcasting, etc. were investigated. Hambrusch et. al. [9] considered "StoP" broadcasting problem on meshes, where S is the number of source PE's and P is the total number of PE's in a mesh, 1 < S < P. A set of particular source distribution patterns were examined.
In this paper, how much communication overhead can be reduced by optimizing locations of communicating tasks (task mapping) on a 2D mesh with wormhole routing depending on the characteristics of communication pattern is analyzed for general (or random) communication patterns. It is well known that communication time for a pair of PE's is almost independent of the distance between them in a wormhole routing network while it is proportional to the distance in a packet switching or message switching network [10][11]. Nevertheless, it should not be "extrapolated" to: communication overhead in a wormhole routing multiprocessor system does not depend on what PE's communicating tasks axe mapped onto. Conflict among messages is dependent on locations of communicating tasks while communication
633
I J
sending flit buffer
S
receiving flit buffer
Figure h A processing element (PE) in the system model. A flit bypasses the receiving buffer to one of the sending buffers unless the current PE is its destination.
time for an isolated message can be said to be distanceindependent. Note that such conflict directly affects communication time in general. This study shows that communication overhead on a wormhole routing multiprocessor system is not "independent" of locations of communicating tasks, and attempts to analyze how it is affected by communication pattern which in turn depends on locations of tasks. The communication characteristics to be considered are the number and distribution of sources, the number and distribution of destinations, the total number of messages, distribution of message size, etc., which are referred to as communication parameters. An extensive simulation has been performed to analyze effects of the communication parameters on the quality of a mapping result that can be achieved. Two different objective functions are considered in the optimization procedure of task mapping and are compared in their effects on mapping results.
2. MODELS
2.1 System Model
In this study, a square mesh of size N × N is employed as a system model onto which a problem graph is mapped. Each PE (except those along the boundary) in a mesh is connected to its four neighboring PE's through four fullduplex channels as shown in Figure 1. T h a t is, over each channel between two adjacent PE's, simultaneous communication in both directions is possible. For other types of channels such as single bidirectional or unidirectional channels, the general performance behaviors would be similar though they can be quantitatively different. A data transfer from a source node to a destination node is carried out by wormhole routing. Each channel between two adjacent PE's has flit buffers at both ends (one each). Given a pair of source and destination, the path to be followed is specified by the deterministic XY routing. Each PE has a message queue where messages to be sent to other PE's are temporarily stored until they are transmitted
to their destinations. The message at the top of the queue is sent first. Once the tail flit of the message being transferred leaves the queue, transmission of next message in the queue may be initiated. Therefore, multiple messages from a PE may be travelling through the network at the same time (while only one message can be transmitted from a PE at any given time). When a message which has left its source node is blocked by another before its header flit arrives at its destination, it stays in the network occupying the flit buffers along the path from the source node to the current node (at which it is blocked) until the blocking flit buffer is released. In the case that multiple messages contend for a flit buffer, the one destined for the farthest node from the current node is given the right to proceed first. Each PE has multiple, two to four, incoming channels and can receive multiple messages arriving at the same time. It is assumed that virtual channel is not implemented. Therefore, channel contention is equivalent to flit buffer contention. In this paper, the terms PE and node are interchangeably used. Whenever necessary, the terms, system node and problem node are used for clarity.
2.2 Communication Model
A problem (computation) which has been partitioned may be described by a graph to be referred to as a problem graph. In the problem graph, a node represents a task and an edge between two nodes represents communication between the corresponding tasks. In this study, it is assumed that tasks are to communicate with each other concurrently. When communication patterns are regular, in most cases, efficient (optimal) algorithms to perform communication between nodes have been developed for various interconnection topologies. Also, the best task mapping is known or task mapping is unnecessary in many cases. On the other hand, when the communication pattern is irregular, an elaborate task mapping scheme has been employed in order to reduce communication time. In order to obtain "general" results, a statistical approach is taken rather than considering a few particular problem graphs. That is, communication pattern (problemgraph) is "random", where certain characteristics, i.e., communication parameters defined below, are controlled.
Population, P, indicates communication intensity in terms of the total number of messages to be exchanged between nodes during a communication period. Each PE that has any messages to send to other PE's places them in the message queue (refer to Figure 1). All PE's with any messages in the queue attempt to transmit messages concurrently. The communication period lasts until delivery of all messages in the queue at all PE's is completed.
Source spread, S, is defined to be the number of sources, i.e., sending nodes. This parameter quantifies the spatial density of sending nodes. Note that 1 _~ S _~ N 2 since a N x N mesh is considered in this study. P ~ N 2 doesn't necessarily mean S = N 2 because a PE may have more than one message to send. In order to have more control over communication pattern, an auxiliary parameter is used, which is the maximum number of messages, M~, that can be sent by a PE.
634
. .... . / ( /
Figure 2: An e x a m p l e for communication parameters where P = 8, S = 4 ( M ~ : 3 ) , and D = 3 (Md=4). A positive number indicates the number of messages to be sent while a negative number does the number of messages to be received.
D e s t i n a t i o n s p r e a d , D, is defined to be the number of destinations, i.e., receiving nodes where 1 < D _< N 2. This parameter indicates the spatial density of receiving nodes. As in the source spread, P _> N 2 doesn't always imply D = N 2. T h e m a x i m u m number of messages that can be destined to (received by) a PE is denoted by M~.
Uniformity specifies the distribution of message size, in terms of its mean, Urn, and standard deviation, Ud. For example, Ud = 0 m e a n s that all messages have the same size.
Obviously, communication overhead depends on the population of messages, e.g., more messages usually results in a higher overhead. How much the overhead can be reduced by mapping (i.e., optimizing locations of communicating tasks) must vary with population. A larger S given P means that messages to be sent are distributed over a larger number of PE's and therefore a larger number of messages exist being routed simultaneously while a smaller S indicates that fewer PE's are to send a larger number of messages. A similar interpretation can be given to D. Therefore, it is expected that, for the same population, source and destination spreads would affect the room for optimization of locations of communicating tasks, i.e., effectiveness of task mapping. Also, the size of message is a factor to be considered especially in the case of wormhole routing, which determines how long the flit buffers at the intermediate PE's (including source and destination PE's) are occupied by a message. Though this set of parameters may not be inclusive, this should be sufficient to characterize a communication pattern for this study.
3. TASK MAPPING
In task mapping, one determines the correspondence between a set of tasks (problem graph) and a group of PE's (system graph) such t h a t a certain objective is achieved. In defining what to be achieved in task mapping, one may employ a cost or objective function. Depending on the objective function, the final mapping can be significantly different
even for the same problem and system graphs. Also, given an objective function, the quality of mapping that can be achieved would vary with optimization method employed. The objective of this study is not to develop a new mapping scheme. One of the issues to be addressed is whether a mapping, optimization in paxticulax, is needed or not given an application (problem graph) and a multiprocessor system. If it is needed or not would depend on how much we want to improve assignment of tasks (and how much room we have for optimization) over random mapping. Also, another related issue is what (objective function) is to be optimized in mapping. In the case of wormhole routing, as long as the size of a message is sufficiently large, the time required to send it from a node to another is almost distanceindependent and is mainly proportional to the message size. However, this is true only when there is no conflict among messages (network contention) such that a message is not delayed by others. When there are multiple messages being transferred in a network, a message may experience blocking. Then, the actual delivery time of a message may be significantly longer than the "ideal" one which is mainly dependent on the message size only (independent of distance). The purpose of this study is to analyze the effect of the communication parameters on the total communication time, T, that can be achieved by mapping and also to examine what type of objective function is to be employed in optimizing mapping. A more accurate objective function (quantifying more realistically what is to be optimized) would require more parameters to be considered and a more complex formulation and therefore usually takes a longer time to evaluate. That is, a proper objective function should be selected considering the acceptable accuracy and complexity.
3.1 Two Objective Functions
There axe two common components found in most objective functions used for task mapping: distance between communicating nodes (PE's) and edge conflict (congestion). Based on these two, the following two simple objective functions are employed in this study. One, to be referred to as OFd, is the maximum of weighted distances between all communicating nodes. The weighted distance is the product of the size of a message and the number of hops the message is to travel. The other, to be referred to as OF~, is the maximum of edge congestions for all system edges (channels). Edge congestion is defined as the sum of sizes of messages to be transferred over a system edge. These objective functions are general enough for wide application and can be easily evaluated for fast mapping. Let Gp and Gs be a problem graph and a system graph, respectively. Gp = {Vp, Ep} where Vp is t h e problem node set of which elements are subtasks, Vpi, i.e., Vp = {vpi} for 1 < i < N when there axe N subtasks, and Ep is the problem edge set of which elements correspond to communication between subtasks. Specifically, epij is the edge between Vpi and vpj and lepij[ denotes the c o m m u n i c a t i o n intensity (i.e., message size) between vpi and vpj that communicate with each other, for 1 _< i , j < N (i ~ j). Similarly, Gs = {V~,E~}. Vs = {yak} where yak is a system node corresponding to a P E k , and E~ = {epkl} where epm denotes a channel or link between vsk and vsl that axe adjacent to each other. Suppose t h a t Vpi and vpj are m a p p e d onto vsk(i) and vs,(j),
635
respectively, where tile notation x(y) means that vpv is mapped
onto v ~ . Let's denote the weighted distance between yak(i)
and v~t(j) by dk(i)l(j). Then, the objective function, OFd,
can be defined as follows.
OFd = maxdk(i)t(j)
(1)
Now, let Rk(1)l(j) be the set of system edges along the routing
path from v~k(O to v~t(j). Define [esmn[ to be ~,:,j lepij[ for
e~m,~ C Rk(i)l(i) and all i, j. Note that if [epi3l = 1 for all
i, j, [e~m~l denotes the n u m b e r of messages which share the link e. . . . Then, OF~ can be expressed as follows.
OF~ : m a x le~m~l
(2)
rr~ ,n
OFd represents a class of objective functions by which dis
tances between communicating tasks are minimized. On
the other hand, OF~ is a typical formulation of objective
function in which messages are to be spread over as many different channels as possible (independent of distances be
tween tasks). Note that OFd would be a reasonable choice
for a system where the communication overhead is mainly
distancedependent while OF~ is for a system where avail
ability of resources such as channels and buffers determines communication time (resourcedependent). However, in practice, unless there is no or very low contention, both types of dependency exist.
3.2 Effects of Task Mapping
Objective Function, OFf
One effect of task mapping with OFd is the decreased dis
tance between communicating nodes and therefore a shorter
propagation delay. Also, it (a smaller OFd) increases the
probability that a message is not blocked by other messages since it travels a shorter distance. That is, the network latency is usually reduced. In the case of wormhole routing, the total communication time heavily depends on the "degree" of contention (blocking) in addition to the size of message.
Objective Function, OF~
Minimizing OF~ is qualitatively equivalent to minimizing
the number of communicating nodes (source and destination) in each row and column of a mesh. That is, it tends to spread the communicating nodes uniformly over rows and columns. Decreasing the number of communicating nodes in a row would reduce the possibility of contention during routing in the X dimension. Similarly, a smaller number of communicating nodes per column lead to less contention during the Y dimension routing.
3.3 Optimization
An essential procedure in task mapping is optimization. Several different approaches were employed in the past, from a simple heuristic method to a timeconsuming global optimization technique. The emphasis of this study is not on mapping scheme itself. Therefore, a simple optimization technique is employed to emphasize practicality for both of the objective functions.
The optimization strategy employed in this study is the pairwise exchange. Starting from the random (initial) assignment, optimization is carried out through iterations. In each iteration, a pair of PE's (system nodes) on which problem nodes are to be swapped temporarily are randomly selected. If the swapping doesn't increase the (value of) objective function, it is granted. Otherwise, no change is made to the current mapping. This iterative optimization process
continues until the number of consecutive iterations (neon)
which would increase the objective function exceeds a cer
tain limit (Nlimit) or the total number of iterations (n~ot~)
reaches an allowed maximum (Nma~).
4. SIMULATION RESULTS AND DISCUSSION
4.1 Simulation
An extensive computer simulation has been carried out to analyze effects of communication patterns on task mapping. Random problem graphs axe used for this simulation. Generation of problem graphs are controlled by a given set of communication parameters. For each problem graph, the three mappings are evaluated in terms of T on a mesh with
wormhole routing. The first is a random mapping, i.e., no
task mapping effort is made. The other two are obtained by task mapping with the two objective functions described in Equations (1) and (2). Then, improvement (reduction in time) achieved by each of the two task mappings over the random assignment is considered.
Communication Parameters
The size of mesh employed is 8 x 8, nonwrappedaround. The population, P, varies from 10 up to 120. For each P, both of pS_and ~D are between 1 and 1. The size of a rues
sage, Urn, is changed from 5 up to 30 flits. Note that the
maximum distance between two PE's in a 8 x 8 mesh is 15 and therefore the m a x i m u m message size of 30 flits should be long enough to examine cases where messages occupy whole paths from sources to destinations during transmission. Also, the standard deviation of the message size is varied from 0 to 5 for U m = 20.
Simulation Results
Reduction factor, which is defined below, is used for perfor
mance analysis.
Rd (Tr Td) x 100 Tr
where Rd and Td are the reduction factor and the communication time achieved by mapping with OFd, respectively,
and Tr is the communication time achieved by the random mapping. The reduction factor indicates how much reduction in communication time one can expect by task mapping on a mesh
with wormhole routing. Note that Rd, expressed in percent
age, approaches to 100 as Td to 0. The reduction factor
for mapping with OFt is defined similarly. That is, Re =
(T,TT~,)× 100 where T~ is the communication time achieved
by mapping with OF~.
In Figures 3 and 4, the reduction factor is observed as the population, P, varies for different source (S) and destination (D) spreads. In Figures 5, 6 and 7, the effects of S and
636
D on the reduction factor achieved are mainly examined. In Figures 8 and 9, the message size is varied for different populations to analyze the message size dependency of reduction factor. In Figure 10, variation of message size is considered. In each case of the simulation, multiple problem graphs with
the same c o m m u n i c a t i o n parameters are used and the average reduction factor is reported.
4.2 Discussions
As can be seen in Figures 3, 4, 5, 6 and 7, it is clear that task mapping can have a significant effect on the communication time for concurrently communicating tasks on meshes with wormhole routing and the degree of effect varies with the communication parameters.
Population
For a relatively light traffic (a small P), one can expect a significant reduction in communication time by a simple task mapping scheme as shown in Figures 3 and 4. As P increases, however, the reduction factor tends to decrease. When only a few messages are in a system, there would not be much interaction among them. Therefore, it is easy to
improve b o t h of OFa and OFe by a large margin. Also,
time required for a message would be more dependent on distance than on edge (buffer) conflict. This is why the
reduction fmztor is larger for OFa t h a n for OF~ when P is
small as can be seen in Figures 3 and 4. As P increases, messages interact with each other more. Therefore, improving (reducing the value of) either objective function for a part of problem graph is highly likely to have a negative effect on mapping of other parts. Hence, a smaller reduction in the communication time is expected. Since more buffer contention which increases network latency in a wormhole routing system occurs, mapping with
OF~ has a b e t t e r chance to reduce the communication time more (refer to Figures 3 and 4). Note t h a t minimizing OFe
reduces buffer conflict nmre explicitly.
Source~Destination Spread
For the same population, the reduction factor depends on distribution of messages over nodes. In Figures 5, 6, and 7, it can be seen that a smaller source spread (S) results in a smaller reduction in communication time. This observation may be explained as follows. A smaller spread means that messages are concentrated on fewer nodes (PE's), i.e, the number of messages per node is larger. Consider the following two cases. In one case, a node (say node A) needs to send a message to multiple (other) nodes. Therefore, the maximum of distances (for example) between node A and other nodes is to be minimized. In the other case, there are independent multiple pairs of nodes, and nodes in each pair need to communicate each other. Then, the distance between two nodes in each individual pair should be minimized. In general, minimization in the former case is harder. Hence, when messages are not spread over many nodes, communication time is reduced less by mapping. It is also noticed that the reduction factor, R, is less sensitive to D than to S. Remember that it is assumed that each PE can transmit only one message at any given time while it may receive multiple messages arriving at different buffers simultaneously. That is, receiving capacity is higher than sending capacity. Therefore, variation in the destination
spread is less influential to the reduction factor than that in the source spread.
OF~ and OF~
It can be observed that m a p p i n g with OF~ performs better than with OFd as both of S and D decrease with a fixed P.
When both of the source and destination spreads decrease, (the same number of) messages are exchanged between a smaller number of nodes. This leads to the increased number of messages from or to a node. Therefore, in order to minimize network latency, it would be more effective to reduce channel (buffer) conflict rather than distance between source and destination. When the number of messages per node is large, the increase in network latency due to a path
conflict would be larger. Therefore, in such cases, OF~ is
the one to be employed in mapping.
Message Size
In Figures 8 and 9, message size dependency is analyzed. As the message size increases, the reduction factor one can expect to achieve decreases. A longer message occupies flit buffers along its path for a longer time. Therefore, the probability of reducing delay experienced by a message by a certain amount would be lower for longer messages. This is true especially when there is a larger number of such messages in a system as can be seen by comparing Figures 8 and 9. It is also confirmed in these figures that the larger the source spread is the larger the reduction factor becomes.
Variation of Message Size
As shown in Figure 10, variation (standard deviation, Ua)
in the message size has a very marginal effect on reduction factor in most cases. The reason for this observation is that the overall behavior is determined by the average message
size rather than its variance which m a y change instantaneous
behavior. However, a slight decrease in reduction factor can be observed as Ua increases.
5. SUMMARY
The simulation results may be summarized as follows:
• In a wormhole routing multiprocessor system, mapping (spatial assignment of communicating tasks) has a significant effect on communication overhead.
• Unless population is too high, a significant reduction in communication time is achievable by a simple task mapping.
• A higher percentage of reduction is obtained for a smaller population.
• A smaller spread tends to result in a less reduction.
• Minimizing channel (buffer) conflict is more effective than minimizing distance when population is relatively high or when the spread is smaller.
It is observed that the simulation results reported in this paper do not always match with the above summary. This observation is believed to be largely due to the local search algorithm employed in optimization step of mapping. While the algorithm is simple to implement and fast to execute, it
637
IOO
90
0" Rd
80
4 Re
I~
90 80
~ ~o
Rd +Re
Rd
~0
4" Re
g 50
30 20
50
"~ 30'
"~'~
20
~
g ~°
10
30
120
Population (P)
(a)
0
10
30
120
Population (P)
(b)
0
Io
30
120
Population (P)
(c)
Figure 3: Reduction factor as a function of population when N = 8, D = 16, Um = 10 and Ud = O. (a) S = 16 (b) S = 8 (c) S=2.
100
90
Rd
80
+ Re
90 Rd
80
+ Re
70
60
~
40
"~ 30
g .50
".'~ 40
~
"~ 30
20
20
IO I
,
0
'
1o
30
120
Population(P)
(a)
10
o
1o
30
12o
Population(P)
(b)
9O Rd
~0
+ Re
7O
60
50
40
30
20
io
30
12o
Population(P)
(c)
Figure 4: Reduction factor as a function of population when N = 8, D = 2, U m = 10 and Ud = O. (a) S = 16 (b) S = 8 (c) S2.
must have found local (rather than global) optimum solutions (mapping) in some cases. Also, the unavoidable imperfectness of simulation is a minor factor contributing to this deviation.
6. REFERENCES
[1] H. S. Stone. "Multiprocessor scheduling with the aid of network flow algorithms", IEEE Transactions on Software Engineering, SE3(1):8593, January 1977.
[2] S. H. Bokhari, "On the Mapping Problem", IEEE Transactions on Computers, C30(3), pp207214, March 1981.
[3] S. W. Bollinger and S. F. Midkiff. "Heuristic technique for processor and link assignment in multicomputers", IEEE Transactions on Computers, 40(3):325333, March 1991.
[4] G. Cybenko. "Dynamic load balancing for distributed memory multiprocessors", Journal of Parallel and Dis'tribUted Computing, 7:279301, 1989.
[5] S.Y.Lee and J.K.Aggarwal, "A Mapping Strategy for Parallel Processing", IEEE Transactions on Computers, C36, no. 4, pp433442, April 1987.
[6] J.C. Jacob and S.Y. Lee, "Task Spreading and Shrinking on a Network of Workstations with Various Edge Classes", Proceedings of IEEE International Conference on Parallel Processing, vol III, pp174181, August 1996.
[7] S. E. Hambrusch, F. Hameed, and A. A. Khokhar. "Communication Operations on CoarseGrained Mesh Architectures", Parallel Computing, vol. 21, pages 731751, 1995.
[8] S. Ranka, R. Shankar and K. Alsabti. "ManytoMany Communication with Bounded Traffic", Proe. off Syrup. on the Frontiers of Massively Parallel Computation, 1995.
[9] S. E. Hambrusch, A. A. Khokhar and Y. Liu. "Scalable StoP Broadcasting on MessagePassing MPPs", Proe. of International Conference on Parallel Processing, vol. I, pages 6976, 1996.
[10] W. Dally and C. Seitz, "The Torus Routing Chip", J. Distributed Computing, vol.1, No.3, pp187196, 1986.
[11] L. Ni and P. McKinley, "A Survey of Wormhole Routing Techniques in Direct Networks", IEEE Computer, pp6276, vol.26, No.2, February 1993.
638
I00
90
4~ Rd
80
.q Re
70
50
4O
3O
20
10
0
I (16)
2 (8)
4 (4)
8 (2)
16 ( l )
Source Spread(Ms)
(a)
I{1O
9O Rd
8O
Re
g 50
4O
3O
20
10
0
i
q
*

I (16)
2 (B)
4 (4)
8 (2)
16 (1)
Source Spread (Ms)
(b)
100
90 Rd
80
q Re
70
60
50
40
30
20
l0
I (16)
2 (8)
4 14)
8 (2)
16(I)
Source Spread (Ms)
(c)
Figure 5: Reduction factor as a function of source spread when N = 8, P = 10, Um= parentheses is Ms. (a) D = 16, M~ = 1 (b) D = 4, M~ = 4 (e) D = 1, M, = 16.
10 a n d Ud = O. T h e n u m b e r in
I00 9O 80 70 60
g 50
Rd ~ Re
30
2O 10
0 1(32)
2(16)
. 4181
.
.
g(4)
. 16(21
Source Spread (Ms)
(a)
32(11
16O 9O 80 70 6O
g 50
O Rd + Re
31
2
I0
0 I(32) 2(16) 4(8) 8(4) 16(2) 32(I)
Source Spread (Ms)
(b)
I00 9O 80 7O 60
g
4O
~ Rd ~r Re
I(32)
2116) 4(8)
B(4)
16(2)
Source Spread (Ms)
(c)
3211)
Figure 6: Reduction factor as a function of source spread when N = 8, P = 30, Um= parentheses is Me. (a) D = 32, ~[~ = 1 (b) D = 8, Mr = 4 (c) D = 1, M~. = 32.
10 a n d Ud = 0. T h e n u m b e r in
I00
9o ~ Rd
XO
t Re
~ 6o
~g 5o
"~ 4O
20 IO
2(64) 4(32) 8(16) 16(8) 32(4) 64(2)
Source Spread (Ms)
(a)
100 90 80 70 6O Lt~
g 5O .g 40
4k Rd + Re
2O I0
~(b41 4(32) 8(16) 16(8) 32(4) 64(2)
Source Spread (Ms)
(b)
16O
90
80
Re
70
60
50
40
30
20
10
0 2 (('A) 4(32)
8(16)
16(8) 32(4)
64 (2)
Source Spread (Ms)
(c)
Figure 7: Reduction factor as a function of source spread when N = 8, P = 120, Um= parentheses is Ms. (a) 19 = 64, M~ = 2 (b) D = 16, Mr = 8 (c) D = 2, Mr = 64.
10 a n d Ud = 0. T h e n u m b e r in
639
Ioo
100
90 O Rd
80
+ Re
9O
Rd
80
+ Re
90 Rd
80
4 Re
70 60
g 50 . f 4O. I
g 30
~_______+___.~
7O
~ (60
g ~
50
"~ 40
"~ 30
70
6O L~
g 50
40
g 30 !
2O
10
0
IO
20
40
60
Message Size
(a)
20 10
I0
20
40
(60
Message Size
(b)
2o i
o "~~'e

÷
•
1o
20
40
60
Message Size
(c)
Figure 8: Reduction factor as a function of message size when N = 8, P = 10, Ud = 0, D = 16 and Mr = 1. (a) S = 16, M8 = 1 (b) S=4, M8 =4 (c) S= 1, M~ = 16.
ICO
90
Rd
80
+ Re
70
60 L~ 50
40
g 30
20 ~+______.
10
0
Message Size
(a)
1oo
IOO
90
80
,~ ~o
~ (60
Rd + Re
9o
80
~ ~o ~. 6o
Rd + Rc
~g 50
"~ (60
'~ 3o
20 1
~
50
'.~ 40
"~ 30
20
I0
10
20
4o
(60
0 ' "' ~
,
IO
20
40
~o
Message Size
(b)
Message Size
(c)
Figure 9: Reduction factor as a function of message size when N = 8, P = 30, Ud = 0, D = 32 and Mr = 1. (a) S = 32, Ms = 1 (b) S=4, Ms =8 (c) S= 1, Ms =32.
Ioo
90
0 Rd
80
tRe
~ ~o
N ~~ 5o
m I
0
I
2
3
5
StandardDeviation of Message Size
(a)
100
90
•4 b Rd
80
b Re
70
~~ 60
Lg~ 50
40
30"
20
10'
0
1
2
3
5
Standard Deviation of Message Size
(b)
1oo 90 80
~ 7o
Rd 4" R¢
~ 5o
~ 3o
2O
I
2
3
Standard Deviation of Message Size
(c)
Figure 10: Reduction factor as a function of variation of message size when N = 8, P = 10, Um= 20, D = 16 and Mr = 2. (a) S=32, Ms = 1 (b) S=4, Ms =8(c) S=2, Ms = 16.
640
S.Y. Lee
Department of Electrical and Computer Engineering Auburn University
Auburn, AL 36849
[email protected]
ABSTRACT
Given that computational load is well balanced, task mapping is mainly concerned with reducing communication overhead. How much communication time can be reduced by optimizing allocation of tasks on a multiprocessor system would be dependent on several factors. In this study, dependency of improvement by task mapping on communication pattern is investigated on a mesh with wormhole routing. Communication pattern may be described by a set of parameters such as the total number of messages, the number of sources, the number of destinations, and distribution of message size. Through extensive simulation, it has been shown that the communication pattern has a significant effect on reduction in communication overhead that can be achieved by task mapping. Effects of individual communication parameters have been analyzed.
1. INTRODUCTION
As more parallel computing systems, tightlycoupled multiprocessors or networks of workstations, are made commercially available, how those systems can be efficiently used for various applications has become an important issue. A specific issue of task mapping concerns with assigning partitioned tasks onto processing elements (PE's). Task mapping has been extensively studied by many researchers for a long time [1][2][3][4][5]. However, as a whole, it is still an open problem. The term, task mapping, is to be distinguished from task scheduling which determines the order in which a given set of tasks are to be executed on a single or multiple PE's. Task mapping (as used in this paper) allocates tasks, which are communicating concurrently and are to be executed at the same time, onto multiple PE's. With task partitioning fixed, task mapping mainly affects communication overhead. As the ratio of communication time to computation time in parallel or distributed computing increases, it is required that a mapping scheme take communication over
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and or fee. SAC'00 March 1921 Como, Italy (e) 2000 ACM 1581132395/00/003>...>$5.00
head into account. This increased ratio is due to a larger number of PE's now available for an application, a larger size of shared data involved in many recentlyemerging applications, the everincreasing processor speed, etc.
One may attempt to reduce communication overhead by optimizing "when PE's send out messages", i.e., scheduling communication with a given (spatial) mapping. However, the achievable improvement can be limited since a spatial assignment (mapping) of communicating tasks is fixed. That is, mapping would help scheduling to achieve a better result (a smaller communication overhead). Also, this approach would require more information to deal with such as timing information. Furthermore, it can be more sensitive to dynamic variation of communication.
Communication pattern among PE's in a multiprocessor system depends on where communicating tasks are assigned and shared data are allocated. Depending on the relative positions of communicating tasks, the resulting communication overhead may vary significantly. In some cases, random mapping may work just as well as any elaborated mapping. Therefore, whether one should concern about mapping at all and what should be optimized in mapping are to be addressed.
There have been significant research efforts in optimizing communication in order to minimize communication overhead including [7][8]. Various communication patterns such as multicasting, broadcasting, alltoall broadcasting, etc. were investigated. Hambrusch et. al. [9] considered "StoP" broadcasting problem on meshes, where S is the number of source PE's and P is the total number of PE's in a mesh, 1 < S < P. A set of particular source distribution patterns were examined.
In this paper, how much communication overhead can be reduced by optimizing locations of communicating tasks (task mapping) on a 2D mesh with wormhole routing depending on the characteristics of communication pattern is analyzed for general (or random) communication patterns. It is well known that communication time for a pair of PE's is almost independent of the distance between them in a wormhole routing network while it is proportional to the distance in a packet switching or message switching network [10][11]. Nevertheless, it should not be "extrapolated" to: communication overhead in a wormhole routing multiprocessor system does not depend on what PE's communicating tasks axe mapped onto. Conflict among messages is dependent on locations of communicating tasks while communication
633
I J
sending flit buffer
S
receiving flit buffer
Figure h A processing element (PE) in the system model. A flit bypasses the receiving buffer to one of the sending buffers unless the current PE is its destination.
time for an isolated message can be said to be distanceindependent. Note that such conflict directly affects communication time in general. This study shows that communication overhead on a wormhole routing multiprocessor system is not "independent" of locations of communicating tasks, and attempts to analyze how it is affected by communication pattern which in turn depends on locations of tasks. The communication characteristics to be considered are the number and distribution of sources, the number and distribution of destinations, the total number of messages, distribution of message size, etc., which are referred to as communication parameters. An extensive simulation has been performed to analyze effects of the communication parameters on the quality of a mapping result that can be achieved. Two different objective functions are considered in the optimization procedure of task mapping and are compared in their effects on mapping results.
2. MODELS
2.1 System Model
In this study, a square mesh of size N × N is employed as a system model onto which a problem graph is mapped. Each PE (except those along the boundary) in a mesh is connected to its four neighboring PE's through four fullduplex channels as shown in Figure 1. T h a t is, over each channel between two adjacent PE's, simultaneous communication in both directions is possible. For other types of channels such as single bidirectional or unidirectional channels, the general performance behaviors would be similar though they can be quantitatively different. A data transfer from a source node to a destination node is carried out by wormhole routing. Each channel between two adjacent PE's has flit buffers at both ends (one each). Given a pair of source and destination, the path to be followed is specified by the deterministic XY routing. Each PE has a message queue where messages to be sent to other PE's are temporarily stored until they are transmitted
to their destinations. The message at the top of the queue is sent first. Once the tail flit of the message being transferred leaves the queue, transmission of next message in the queue may be initiated. Therefore, multiple messages from a PE may be travelling through the network at the same time (while only one message can be transmitted from a PE at any given time). When a message which has left its source node is blocked by another before its header flit arrives at its destination, it stays in the network occupying the flit buffers along the path from the source node to the current node (at which it is blocked) until the blocking flit buffer is released. In the case that multiple messages contend for a flit buffer, the one destined for the farthest node from the current node is given the right to proceed first. Each PE has multiple, two to four, incoming channels and can receive multiple messages arriving at the same time. It is assumed that virtual channel is not implemented. Therefore, channel contention is equivalent to flit buffer contention. In this paper, the terms PE and node are interchangeably used. Whenever necessary, the terms, system node and problem node are used for clarity.
2.2 Communication Model
A problem (computation) which has been partitioned may be described by a graph to be referred to as a problem graph. In the problem graph, a node represents a task and an edge between two nodes represents communication between the corresponding tasks. In this study, it is assumed that tasks are to communicate with each other concurrently. When communication patterns are regular, in most cases, efficient (optimal) algorithms to perform communication between nodes have been developed for various interconnection topologies. Also, the best task mapping is known or task mapping is unnecessary in many cases. On the other hand, when the communication pattern is irregular, an elaborate task mapping scheme has been employed in order to reduce communication time. In order to obtain "general" results, a statistical approach is taken rather than considering a few particular problem graphs. That is, communication pattern (problemgraph) is "random", where certain characteristics, i.e., communication parameters defined below, are controlled.
Population, P, indicates communication intensity in terms of the total number of messages to be exchanged between nodes during a communication period. Each PE that has any messages to send to other PE's places them in the message queue (refer to Figure 1). All PE's with any messages in the queue attempt to transmit messages concurrently. The communication period lasts until delivery of all messages in the queue at all PE's is completed.
Source spread, S, is defined to be the number of sources, i.e., sending nodes. This parameter quantifies the spatial density of sending nodes. Note that 1 _~ S _~ N 2 since a N x N mesh is considered in this study. P ~ N 2 doesn't necessarily mean S = N 2 because a PE may have more than one message to send. In order to have more control over communication pattern, an auxiliary parameter is used, which is the maximum number of messages, M~, that can be sent by a PE.
634
. .... . / ( /
Figure 2: An e x a m p l e for communication parameters where P = 8, S = 4 ( M ~ : 3 ) , and D = 3 (Md=4). A positive number indicates the number of messages to be sent while a negative number does the number of messages to be received.
D e s t i n a t i o n s p r e a d , D, is defined to be the number of destinations, i.e., receiving nodes where 1 < D _< N 2. This parameter indicates the spatial density of receiving nodes. As in the source spread, P _> N 2 doesn't always imply D = N 2. T h e m a x i m u m number of messages that can be destined to (received by) a PE is denoted by M~.
Uniformity specifies the distribution of message size, in terms of its mean, Urn, and standard deviation, Ud. For example, Ud = 0 m e a n s that all messages have the same size.
Obviously, communication overhead depends on the population of messages, e.g., more messages usually results in a higher overhead. How much the overhead can be reduced by mapping (i.e., optimizing locations of communicating tasks) must vary with population. A larger S given P means that messages to be sent are distributed over a larger number of PE's and therefore a larger number of messages exist being routed simultaneously while a smaller S indicates that fewer PE's are to send a larger number of messages. A similar interpretation can be given to D. Therefore, it is expected that, for the same population, source and destination spreads would affect the room for optimization of locations of communicating tasks, i.e., effectiveness of task mapping. Also, the size of message is a factor to be considered especially in the case of wormhole routing, which determines how long the flit buffers at the intermediate PE's (including source and destination PE's) are occupied by a message. Though this set of parameters may not be inclusive, this should be sufficient to characterize a communication pattern for this study.
3. TASK MAPPING
In task mapping, one determines the correspondence between a set of tasks (problem graph) and a group of PE's (system graph) such t h a t a certain objective is achieved. In defining what to be achieved in task mapping, one may employ a cost or objective function. Depending on the objective function, the final mapping can be significantly different
even for the same problem and system graphs. Also, given an objective function, the quality of mapping that can be achieved would vary with optimization method employed. The objective of this study is not to develop a new mapping scheme. One of the issues to be addressed is whether a mapping, optimization in paxticulax, is needed or not given an application (problem graph) and a multiprocessor system. If it is needed or not would depend on how much we want to improve assignment of tasks (and how much room we have for optimization) over random mapping. Also, another related issue is what (objective function) is to be optimized in mapping. In the case of wormhole routing, as long as the size of a message is sufficiently large, the time required to send it from a node to another is almost distanceindependent and is mainly proportional to the message size. However, this is true only when there is no conflict among messages (network contention) such that a message is not delayed by others. When there are multiple messages being transferred in a network, a message may experience blocking. Then, the actual delivery time of a message may be significantly longer than the "ideal" one which is mainly dependent on the message size only (independent of distance). The purpose of this study is to analyze the effect of the communication parameters on the total communication time, T, that can be achieved by mapping and also to examine what type of objective function is to be employed in optimizing mapping. A more accurate objective function (quantifying more realistically what is to be optimized) would require more parameters to be considered and a more complex formulation and therefore usually takes a longer time to evaluate. That is, a proper objective function should be selected considering the acceptable accuracy and complexity.
3.1 Two Objective Functions
There axe two common components found in most objective functions used for task mapping: distance between communicating nodes (PE's) and edge conflict (congestion). Based on these two, the following two simple objective functions are employed in this study. One, to be referred to as OFd, is the maximum of weighted distances between all communicating nodes. The weighted distance is the product of the size of a message and the number of hops the message is to travel. The other, to be referred to as OF~, is the maximum of edge congestions for all system edges (channels). Edge congestion is defined as the sum of sizes of messages to be transferred over a system edge. These objective functions are general enough for wide application and can be easily evaluated for fast mapping. Let Gp and Gs be a problem graph and a system graph, respectively. Gp = {Vp, Ep} where Vp is t h e problem node set of which elements are subtasks, Vpi, i.e., Vp = {vpi} for 1 < i < N when there axe N subtasks, and Ep is the problem edge set of which elements correspond to communication between subtasks. Specifically, epij is the edge between Vpi and vpj and lepij[ denotes the c o m m u n i c a t i o n intensity (i.e., message size) between vpi and vpj that communicate with each other, for 1 _< i , j < N (i ~ j). Similarly, Gs = {V~,E~}. Vs = {yak} where yak is a system node corresponding to a P E k , and E~ = {epkl} where epm denotes a channel or link between vsk and vsl that axe adjacent to each other. Suppose t h a t Vpi and vpj are m a p p e d onto vsk(i) and vs,(j),
635
respectively, where tile notation x(y) means that vpv is mapped
onto v ~ . Let's denote the weighted distance between yak(i)
and v~t(j) by dk(i)l(j). Then, the objective function, OFd,
can be defined as follows.
OFd = maxdk(i)t(j)
(1)
Now, let Rk(1)l(j) be the set of system edges along the routing
path from v~k(O to v~t(j). Define [esmn[ to be ~,:,j lepij[ for
e~m,~ C Rk(i)l(i) and all i, j. Note that if [epi3l = 1 for all
i, j, [e~m~l denotes the n u m b e r of messages which share the link e. . . . Then, OF~ can be expressed as follows.
OF~ : m a x le~m~l
(2)
rr~ ,n
OFd represents a class of objective functions by which dis
tances between communicating tasks are minimized. On
the other hand, OF~ is a typical formulation of objective
function in which messages are to be spread over as many different channels as possible (independent of distances be
tween tasks). Note that OFd would be a reasonable choice
for a system where the communication overhead is mainly
distancedependent while OF~ is for a system where avail
ability of resources such as channels and buffers determines communication time (resourcedependent). However, in practice, unless there is no or very low contention, both types of dependency exist.
3.2 Effects of Task Mapping
Objective Function, OFf
One effect of task mapping with OFd is the decreased dis
tance between communicating nodes and therefore a shorter
propagation delay. Also, it (a smaller OFd) increases the
probability that a message is not blocked by other messages since it travels a shorter distance. That is, the network latency is usually reduced. In the case of wormhole routing, the total communication time heavily depends on the "degree" of contention (blocking) in addition to the size of message.
Objective Function, OF~
Minimizing OF~ is qualitatively equivalent to minimizing
the number of communicating nodes (source and destination) in each row and column of a mesh. That is, it tends to spread the communicating nodes uniformly over rows and columns. Decreasing the number of communicating nodes in a row would reduce the possibility of contention during routing in the X dimension. Similarly, a smaller number of communicating nodes per column lead to less contention during the Y dimension routing.
3.3 Optimization
An essential procedure in task mapping is optimization. Several different approaches were employed in the past, from a simple heuristic method to a timeconsuming global optimization technique. The emphasis of this study is not on mapping scheme itself. Therefore, a simple optimization technique is employed to emphasize practicality for both of the objective functions.
The optimization strategy employed in this study is the pairwise exchange. Starting from the random (initial) assignment, optimization is carried out through iterations. In each iteration, a pair of PE's (system nodes) on which problem nodes are to be swapped temporarily are randomly selected. If the swapping doesn't increase the (value of) objective function, it is granted. Otherwise, no change is made to the current mapping. This iterative optimization process
continues until the number of consecutive iterations (neon)
which would increase the objective function exceeds a cer
tain limit (Nlimit) or the total number of iterations (n~ot~)
reaches an allowed maximum (Nma~).
4. SIMULATION RESULTS AND DISCUSSION
4.1 Simulation
An extensive computer simulation has been carried out to analyze effects of communication patterns on task mapping. Random problem graphs axe used for this simulation. Generation of problem graphs are controlled by a given set of communication parameters. For each problem graph, the three mappings are evaluated in terms of T on a mesh with
wormhole routing. The first is a random mapping, i.e., no
task mapping effort is made. The other two are obtained by task mapping with the two objective functions described in Equations (1) and (2). Then, improvement (reduction in time) achieved by each of the two task mappings over the random assignment is considered.
Communication Parameters
The size of mesh employed is 8 x 8, nonwrappedaround. The population, P, varies from 10 up to 120. For each P, both of pS_and ~D are between 1 and 1. The size of a rues
sage, Urn, is changed from 5 up to 30 flits. Note that the
maximum distance between two PE's in a 8 x 8 mesh is 15 and therefore the m a x i m u m message size of 30 flits should be long enough to examine cases where messages occupy whole paths from sources to destinations during transmission. Also, the standard deviation of the message size is varied from 0 to 5 for U m = 20.
Simulation Results
Reduction factor, which is defined below, is used for perfor
mance analysis.
Rd (Tr Td) x 100 Tr
where Rd and Td are the reduction factor and the communication time achieved by mapping with OFd, respectively,
and Tr is the communication time achieved by the random mapping. The reduction factor indicates how much reduction in communication time one can expect by task mapping on a mesh
with wormhole routing. Note that Rd, expressed in percent
age, approaches to 100 as Td to 0. The reduction factor
for mapping with OFt is defined similarly. That is, Re =
(T,TT~,)× 100 where T~ is the communication time achieved
by mapping with OF~.
In Figures 3 and 4, the reduction factor is observed as the population, P, varies for different source (S) and destination (D) spreads. In Figures 5, 6 and 7, the effects of S and
636
D on the reduction factor achieved are mainly examined. In Figures 8 and 9, the message size is varied for different populations to analyze the message size dependency of reduction factor. In Figure 10, variation of message size is considered. In each case of the simulation, multiple problem graphs with
the same c o m m u n i c a t i o n parameters are used and the average reduction factor is reported.
4.2 Discussions
As can be seen in Figures 3, 4, 5, 6 and 7, it is clear that task mapping can have a significant effect on the communication time for concurrently communicating tasks on meshes with wormhole routing and the degree of effect varies with the communication parameters.
Population
For a relatively light traffic (a small P), one can expect a significant reduction in communication time by a simple task mapping scheme as shown in Figures 3 and 4. As P increases, however, the reduction factor tends to decrease. When only a few messages are in a system, there would not be much interaction among them. Therefore, it is easy to
improve b o t h of OFa and OFe by a large margin. Also,
time required for a message would be more dependent on distance than on edge (buffer) conflict. This is why the
reduction fmztor is larger for OFa t h a n for OF~ when P is
small as can be seen in Figures 3 and 4. As P increases, messages interact with each other more. Therefore, improving (reducing the value of) either objective function for a part of problem graph is highly likely to have a negative effect on mapping of other parts. Hence, a smaller reduction in the communication time is expected. Since more buffer contention which increases network latency in a wormhole routing system occurs, mapping with
OF~ has a b e t t e r chance to reduce the communication time more (refer to Figures 3 and 4). Note t h a t minimizing OFe
reduces buffer conflict nmre explicitly.
Source~Destination Spread
For the same population, the reduction factor depends on distribution of messages over nodes. In Figures 5, 6, and 7, it can be seen that a smaller source spread (S) results in a smaller reduction in communication time. This observation may be explained as follows. A smaller spread means that messages are concentrated on fewer nodes (PE's), i.e, the number of messages per node is larger. Consider the following two cases. In one case, a node (say node A) needs to send a message to multiple (other) nodes. Therefore, the maximum of distances (for example) between node A and other nodes is to be minimized. In the other case, there are independent multiple pairs of nodes, and nodes in each pair need to communicate each other. Then, the distance between two nodes in each individual pair should be minimized. In general, minimization in the former case is harder. Hence, when messages are not spread over many nodes, communication time is reduced less by mapping. It is also noticed that the reduction factor, R, is less sensitive to D than to S. Remember that it is assumed that each PE can transmit only one message at any given time while it may receive multiple messages arriving at different buffers simultaneously. That is, receiving capacity is higher than sending capacity. Therefore, variation in the destination
spread is less influential to the reduction factor than that in the source spread.
OF~ and OF~
It can be observed that m a p p i n g with OF~ performs better than with OFd as both of S and D decrease with a fixed P.
When both of the source and destination spreads decrease, (the same number of) messages are exchanged between a smaller number of nodes. This leads to the increased number of messages from or to a node. Therefore, in order to minimize network latency, it would be more effective to reduce channel (buffer) conflict rather than distance between source and destination. When the number of messages per node is large, the increase in network latency due to a path
conflict would be larger. Therefore, in such cases, OF~ is
the one to be employed in mapping.
Message Size
In Figures 8 and 9, message size dependency is analyzed. As the message size increases, the reduction factor one can expect to achieve decreases. A longer message occupies flit buffers along its path for a longer time. Therefore, the probability of reducing delay experienced by a message by a certain amount would be lower for longer messages. This is true especially when there is a larger number of such messages in a system as can be seen by comparing Figures 8 and 9. It is also confirmed in these figures that the larger the source spread is the larger the reduction factor becomes.
Variation of Message Size
As shown in Figure 10, variation (standard deviation, Ua)
in the message size has a very marginal effect on reduction factor in most cases. The reason for this observation is that the overall behavior is determined by the average message
size rather than its variance which m a y change instantaneous
behavior. However, a slight decrease in reduction factor can be observed as Ua increases.
5. SUMMARY
The simulation results may be summarized as follows:
• In a wormhole routing multiprocessor system, mapping (spatial assignment of communicating tasks) has a significant effect on communication overhead.
• Unless population is too high, a significant reduction in communication time is achievable by a simple task mapping.
• A higher percentage of reduction is obtained for a smaller population.
• A smaller spread tends to result in a less reduction.
• Minimizing channel (buffer) conflict is more effective than minimizing distance when population is relatively high or when the spread is smaller.
It is observed that the simulation results reported in this paper do not always match with the above summary. This observation is believed to be largely due to the local search algorithm employed in optimization step of mapping. While the algorithm is simple to implement and fast to execute, it
637
IOO
90
0" Rd
80
4 Re
I~
90 80
~ ~o
Rd +Re
Rd
~0
4" Re
g 50
30 20
50
"~ 30'
"~'~
20
~
g ~°
10
30
120
Population (P)
(a)
0
10
30
120
Population (P)
(b)
0
Io
30
120
Population (P)
(c)
Figure 3: Reduction factor as a function of population when N = 8, D = 16, Um = 10 and Ud = O. (a) S = 16 (b) S = 8 (c) S=2.
100
90
Rd
80
+ Re
90 Rd
80
+ Re
70
60
~
40
"~ 30
g .50
".'~ 40
~
"~ 30
20
20
IO I
,
0
'
1o
30
120
Population(P)
(a)
10
o
1o
30
12o
Population(P)
(b)
9O Rd
~0
+ Re
7O
60
50
40
30
20
io
30
12o
Population(P)
(c)
Figure 4: Reduction factor as a function of population when N = 8, D = 2, U m = 10 and Ud = O. (a) S = 16 (b) S = 8 (c) S2.
must have found local (rather than global) optimum solutions (mapping) in some cases. Also, the unavoidable imperfectness of simulation is a minor factor contributing to this deviation.
6. REFERENCES
[1] H. S. Stone. "Multiprocessor scheduling with the aid of network flow algorithms", IEEE Transactions on Software Engineering, SE3(1):8593, January 1977.
[2] S. H. Bokhari, "On the Mapping Problem", IEEE Transactions on Computers, C30(3), pp207214, March 1981.
[3] S. W. Bollinger and S. F. Midkiff. "Heuristic technique for processor and link assignment in multicomputers", IEEE Transactions on Computers, 40(3):325333, March 1991.
[4] G. Cybenko. "Dynamic load balancing for distributed memory multiprocessors", Journal of Parallel and Dis'tribUted Computing, 7:279301, 1989.
[5] S.Y.Lee and J.K.Aggarwal, "A Mapping Strategy for Parallel Processing", IEEE Transactions on Computers, C36, no. 4, pp433442, April 1987.
[6] J.C. Jacob and S.Y. Lee, "Task Spreading and Shrinking on a Network of Workstations with Various Edge Classes", Proceedings of IEEE International Conference on Parallel Processing, vol III, pp174181, August 1996.
[7] S. E. Hambrusch, F. Hameed, and A. A. Khokhar. "Communication Operations on CoarseGrained Mesh Architectures", Parallel Computing, vol. 21, pages 731751, 1995.
[8] S. Ranka, R. Shankar and K. Alsabti. "ManytoMany Communication with Bounded Traffic", Proe. off Syrup. on the Frontiers of Massively Parallel Computation, 1995.
[9] S. E. Hambrusch, A. A. Khokhar and Y. Liu. "Scalable StoP Broadcasting on MessagePassing MPPs", Proe. of International Conference on Parallel Processing, vol. I, pages 6976, 1996.
[10] W. Dally and C. Seitz, "The Torus Routing Chip", J. Distributed Computing, vol.1, No.3, pp187196, 1986.
[11] L. Ni and P. McKinley, "A Survey of Wormhole Routing Techniques in Direct Networks", IEEE Computer, pp6276, vol.26, No.2, February 1993.
638
I00
90
4~ Rd
80
.q Re
70
50
4O
3O
20
10
0
I (16)
2 (8)
4 (4)
8 (2)
16 ( l )
Source Spread(Ms)
(a)
I{1O
9O Rd
8O
Re
g 50
4O
3O
20
10
0
i
q
*

I (16)
2 (B)
4 (4)
8 (2)
16 (1)
Source Spread (Ms)
(b)
100
90 Rd
80
q Re
70
60
50
40
30
20
l0
I (16)
2 (8)
4 14)
8 (2)
16(I)
Source Spread (Ms)
(c)
Figure 5: Reduction factor as a function of source spread when N = 8, P = 10, Um= parentheses is Ms. (a) D = 16, M~ = 1 (b) D = 4, M~ = 4 (e) D = 1, M, = 16.
10 a n d Ud = O. T h e n u m b e r in
I00 9O 80 70 60
g 50
Rd ~ Re
30
2O 10
0 1(32)
2(16)
. 4181
.
.
g(4)
. 16(21
Source Spread (Ms)
(a)
32(11
16O 9O 80 70 6O
g 50
O Rd + Re
31
2
I0
0 I(32) 2(16) 4(8) 8(4) 16(2) 32(I)
Source Spread (Ms)
(b)
I00 9O 80 7O 60
g
4O
~ Rd ~r Re
I(32)
2116) 4(8)
B(4)
16(2)
Source Spread (Ms)
(c)
3211)
Figure 6: Reduction factor as a function of source spread when N = 8, P = 30, Um= parentheses is Me. (a) D = 32, ~[~ = 1 (b) D = 8, Mr = 4 (c) D = 1, M~. = 32.
10 a n d Ud = 0. T h e n u m b e r in
I00
9o ~ Rd
XO
t Re
~ 6o
~g 5o
"~ 4O
20 IO
2(64) 4(32) 8(16) 16(8) 32(4) 64(2)
Source Spread (Ms)
(a)
100 90 80 70 6O Lt~
g 5O .g 40
4k Rd + Re
2O I0
~(b41 4(32) 8(16) 16(8) 32(4) 64(2)
Source Spread (Ms)
(b)
16O
90
80
Re
70
60
50
40
30
20
10
0 2 (('A) 4(32)
8(16)
16(8) 32(4)
64 (2)
Source Spread (Ms)
(c)
Figure 7: Reduction factor as a function of source spread when N = 8, P = 120, Um= parentheses is Ms. (a) 19 = 64, M~ = 2 (b) D = 16, Mr = 8 (c) D = 2, Mr = 64.
10 a n d Ud = 0. T h e n u m b e r in
639
Ioo
100
90 O Rd
80
+ Re
9O
Rd
80
+ Re
90 Rd
80
4 Re
70 60
g 50 . f 4O. I
g 30
~_______+___.~
7O
~ (60
g ~
50
"~ 40
"~ 30
70
6O L~
g 50
40
g 30 !
2O
10
0
IO
20
40
60
Message Size
(a)
20 10
I0
20
40
(60
Message Size
(b)
2o i
o "~~'e

÷
•
1o
20
40
60
Message Size
(c)
Figure 8: Reduction factor as a function of message size when N = 8, P = 10, Ud = 0, D = 16 and Mr = 1. (a) S = 16, M8 = 1 (b) S=4, M8 =4 (c) S= 1, M~ = 16.
ICO
90
Rd
80
+ Re
70
60 L~ 50
40
g 30
20 ~+______.
10
0
Message Size
(a)
1oo
IOO
90
80
,~ ~o
~ (60
Rd + Re
9o
80
~ ~o ~. 6o
Rd + Rc
~g 50
"~ (60
'~ 3o
20 1
~
50
'.~ 40
"~ 30
20
I0
10
20
4o
(60
0 ' "' ~
,
IO
20
40
~o
Message Size
(b)
Message Size
(c)
Figure 9: Reduction factor as a function of message size when N = 8, P = 30, Ud = 0, D = 32 and Mr = 1. (a) S = 32, Ms = 1 (b) S=4, Ms =8 (c) S= 1, Ms =32.
Ioo
90
0 Rd
80
tRe
~ ~o
N ~~ 5o
m I
0
I
2
3
5
StandardDeviation of Message Size
(a)
100
90
•4 b Rd
80
b Re
70
~~ 60
Lg~ 50
40
30"
20
10'
0
1
2
3
5
Standard Deviation of Message Size
(b)
1oo 90 80
~ 7o
Rd 4" R¢
~ 5o
~ 3o
2O
I
2
3
Standard Deviation of Message Size
(c)
Figure 10: Reduction factor as a function of variation of message size when N = 8, P = 10, Um= 20, D = 16 and Mr = 2. (a) S=32, Ms = 1 (b) S=4, Ms =8(c) S=2, Ms = 16.
640
Categories
You my also like
Concept Mapping, Mind Mapping and Argument Mapping
369.5 KB42.3K21.1KCompiler Error Messages Considered Unhelpful: The Landscape of
2.1 MB27.4K9.6KBlackBerry® Wireless Handheld
1.2 MB4.2K1.5KMessage Reference, Volume 2
1.8 MB8.9K4.2KGenerating LR Syntax Error Messages from Examples
100.7 KB2K531Voicemail Menu Guide
134.9 KB66.5K23.9KMJIC Message Switch Remote Server Interface Control Document
618.6 KB29.2K9.6KDB2 Version 9 Linux, UNIX, Windows
5.5 MB39.3K5.9KCalifornia State University, Northridge A Visual Hospital System
2.5 MB20.4K9K