HOME


DSP         Embedded Systems         GNU Tutorials         SW Development        


Introduction

Parallel Processing

Processor Architecture

Memory

Peripherals

Booting

System Design

System Development

System Debugging

Memory Devices

Real World Examples

Previous Page                                 Index                                 Next Page


PARALLEL PROCESSING

Computer Programs and Parallelism:
A computer program consists of a sequence of instructions. Each of these instructions performs some operations on certain data operands. So, computer program can be considered as a sequence of operations on certain data operands. A computer program is said to exhibit parallelism, if these operations (to be performed on data operands by the given set of instructions) can be performed in any order. If the operations could be performed out of order, and multiple processing elements (or ALUs) are available, then it would be possible to perform these operations in parallel (since the order of execution does not matter) at the same time. That is where the world “parallelism” coins from. The processors which provide multiple ALUs on a single chip (to exploit the program parallelism) are called parallel processors (and the process is known as parallel processing) . There could be different kind of parallelism in a program.

Instruction Level Parallelism
Data Level Parallelism
Thread Level Parallelism

Instruction Level Parallelism:

Consider the following instruction sequence

data1 = data2+data3;
data4 = data2+data5;
data6 = data3;

Changing the order of above instruction does not effect the final result (values of data1, data4 and data6 will remain same). Since the instructions can be executed out of order, we can execute them on three different ALUs (provided that the processor on which you are executing your instructions has multiple ALUs available) in parallel. This is called Instruction Level Parallelism. There are no dependencies in the above instruction. However certain dependencies amongst the instructions can exist. In such a scenario, the instructions depend on each other and can not be executed in parallel (certain dependencies can still be resolved using some techniques). Consider the following sequence of instructions:

data1 = data2+data3;
data2 = data4;
data5 = data2+data1;
Changing the order of above instructions, will change the program results (final values of data1, data2 and data5).

There can be three kind of dependencies:
True Dependencies (or Read after Write)
Anti Dependencies (or Write After Read)
Output Dependencies (or Write after Write)

        True Dependencies: (Read after Write)
data1 = data2+data3;
data4 = data1 + data5;

Here the content of data1 are being read after write to data1. This is a true dependency and can not be resolved.

        Anti-Dependencies: (Write after Read)
data1 = data2 + data3;
data2 = data4;

Here first instruction depends on the second instruction. If the second instruction is executed before the first, result of first instruction will change (because the value of data2 will be different). Since first instruction here depends on the result of a future instruction, it is said to anti-depend on the future instruction. The anti-dependencies can be resolved using “renaming”. As shown below, renaming the conflicting variable (data2 in this case), resolves this dependency.

data1 = data2 + data3;
new.data2 = data4;

        Output-Dependencies: (Write after Write)
Exists when ordering of instructions will affect the final output value of a variable.

data1 = data2 + data3;
data4 = data1/data7;
data1 = data5+data6;;

Here the re-ordering of instruction-1 and instruction-3 will affect the final value of output variable data4 (instruction-2). Similar to anti-dependencies, the output-dependencies can also be resolved using renaming.
data1.new = data2 + data3;
data4 = data1.new/data7;
data1 = data5+data6;;

Data Level Parallelism

A Data Processing application is said to exhibit “Data Level Parallelism”, if different data blocks can be executed out of order. One such example is “Image Processing” applications. In Image processing applications, entire image is divided in to smaller macro-blocks and each macro-block can be processed (generally mathematical transforms) individually. Since each data blocks has to undergo the same operations (same processing), such application can be efficiently run on SIMD (Single Instruction Multiple Data) Architectures, as we will discuss in next section.

Thread level parallelism
Sometimes (data intensive applications), the performance of application is limited by the I/O bandwidth (the rate at which data can flow in and out of the system) rather than the processing power of the system. In such systems, dividing the complete application in to multiple independent (independent at least to some extent) threads, can improve the overall performance. When one task has been blocked due to I/O dependency, the other task can be executed (and first task can take over the CPU resources when it is ready again). In modern days, some companies provide multiple-core CPUs (this is different from a CPU having multiple ALUs). A multi-threaded application can exploit the multi-core design efficiently, since each thread can run on a different core.


Previous Page                                 Index                                 Next Page






HOME

Copyright : Kunal Singh

Content of this site shall not be reused without my written permission

This page is XHTML Certified