Chapter 1 (continued)
I. Introduction, motivation, terminology
- An informal definition of distributed computing:
multiple separate machines coordinated to
"service requests"; machines typically have
clearly distinct roles. Examples: client-server; peers in a P2P system.
A "request" may itself be a parallelizable computing task
in the traditional sense.
- Comparing parallel and distributed computing --- some characteristics and
concerns.
- For parallel computing
- Homogeneous problems (typically one parallelized algorithm,
or a composition of parallelized algorithms that solve a single problem,
e.g. a system of linear equations, a sorting/search problem, etc.)
- Homogeneous machines: same processor model, all
processors considered equal (some pragmatic
exceptions discussed in class).
- Scalable problem sizes: small problem instances are solved on
a single CPU. Large problem instances are easy to generate
and are solved on multiple CPUs.
- Parallelism may be needed for performance. Example:
real-time weather forecasting, where the result is needed by
the end of the day or sooner, depending on the situation.
- Parallelism may be needed for additional memory or
disk space, i.e. the problem may not "fit" in one machine.
In this case we are interested in scaled speedup
from parallelism, i.e. solve a p times larger problem
on p processors in the time it takes to solve the
original problem on one processor.
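The scaled-speedup idea above can be made concrete with a small calculation. The sketch below is illustrative only: the function names and all timing numbers are hypothetical, not measurements from the course. It contrasts fixed-size speedup (same problem, more processors) with the scaled case (p times larger problem on p processors), where a ratio of 1.0 would mean the larger problem takes exactly as long as the original did on one processor.

```python
def speedup(t_serial, t_parallel):
    """Fixed-size speedup: same problem on 1 processor vs. p processors."""
    return t_serial / t_parallel


def scaled_efficiency(t_base_on_1, t_scaled_on_p):
    """Time for the original problem on 1 processor divided by the time
    for the p-times-larger problem on p processors (1.0 is ideal)."""
    return t_base_on_1 / t_scaled_on_p


# Hypothetical timings in seconds (assumed for illustration only).
t_n_on_1 = 100.0    # original problem, 1 processor
t_n_on_8 = 16.0     # same problem, 8 processors
t_8n_on_8 = 125.0   # 8x larger problem, 8 processors

fixed = speedup(t_n_on_1, t_n_on_8)
scaled = scaled_efficiency(t_n_on_1, t_8n_on_8)
print(f"fixed-size speedup on 8 processors: {fixed:.2f}")
print(f"scaled efficiency on 8 processors:  {scaled:.2f}")
```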
- For distributed computing
- Inherently (physically) distributed machines. Network
is usually the Internet.
- Involves multiple heterogeneous computers (PCs, servers,
*berrys, mobile phones, etc.).
- Asynchronous, event-driven processing: the client posts a request, which triggers a response from the server.
- Fault-tolerance and security are critical, especially
within the boundaries of a transaction.
- May involve specialized devices. Example: machine collecting temperature sensor data.
- Software engineering, interoperability, safety,
reliability, and portability are all as important as, or more important than, performance.
- For both parallel and distributed computing
- Communication performance is critical
- Concurrency, synchronization
- Shared resources, contention.
- Data and problem decomposition.
- High-level programming models appealing.
- Hard to debug and test. Hard to replay "programs". Why?
Global state.
- Motivation for parallel computing
- Solve a given problem faster, e.g., to meet some real-time
constraint or so that you can now solve many instances of that problem;
or
- Solve a bigger problem in roughly the same amount of time.
Computational science and engineering (CSE) is a leading
source of these kinds of "scalable" problems.
- CSE is becoming an important complement to the traditional
approaches to doing natural science and engineering, namely
theory and experiment.
- A famous set of CSE problems is the so-called
Grand Challenge problems.
- CSE applications are characterized by:
- legacy codes (they've been around a long time!)
- floating point
- modest polynomial time complexity, i.e.,
O(n^k) for some small value of k.
- accuracy grows with problem size
- problem size can grow very large.
- Examples of big CSE (grand challenge) problems:
- Fluid dynamics
- Environmental modeling, weather prediction
- Molecular biology
- Material design
- Astrophysics
- There are many other sources of motivating problems
for parallel computing, e.g., information retrieval
(big databases, data mining), cryptography,
signal processing, games, ... Opportunities arise at
different time scales and in different problem domains (e.g. real-time).
- Where is parallelism to be found?
- Simple real-life example: laundry.
- Algebra example: matrix-vector operations.
- Algorithmic examples: sorting, searching.
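As a sketch of the matrix-vector case above: each entry of y = A x depends on only one row of A, so the rows can be processed as independent tasks. The code below assumes Python's standard concurrent.futures module as a stand-in for a real parallel runtime; the names row_dot and parallel_matvec are hypothetical. In standard CPython, threads illustrate the decomposition rather than deliver a real speedup for CPU-bound work like this.

```python
from concurrent.futures import ThreadPoolExecutor


def row_dot(row, x):
    """One independent task: dot product of a single matrix row with x."""
    return sum(a * b for a, b in zip(row, x))


def parallel_matvec(A, x, workers=4):
    """Compute y = A x with one task per row; rows do not depend on each other."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves row order, so y[i] corresponds to A[i].
        return list(pool.map(lambda row: row_dot(row, x), A))


A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
x = [1, 0, -1]
y = parallel_matvec(A, x)
print(y)
```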
- How is parallelism formulated?
- Dependence graph.
- Task parallelism.
- Data parallelism.
- Pipelining.
- Hierarchical organizations of parallelism.
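One way to read a dependence graph is to group tasks into levels: every task in a level depends only on tasks in earlier levels, so the tasks within one level could run concurrently. The following is a minimal sketch under that reading; the function name parallel_levels and the example task graph are hypothetical.

```python
def parallel_levels(deps):
    """Group tasks into levels of mutually independent tasks.

    deps maps each task to the set of tasks it must wait for.
    """
    remaining = {task: set(d) for task, d in deps.items()}
    levels = []
    while remaining:
        # Tasks whose dependences have all completed are ready now.
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cycle in dependence graph")
        levels.append(ready)
        for t in ready:
            del remaining[t]
        for d in remaining.values():
            d.difference_update(ready)
    return levels


# Hypothetical task graph: c needs a and b; d needs c; e needs a.
deps = {
    "a": set(),
    "b": set(),
    "c": {"a", "b"},
    "d": {"c"},
    "e": {"a"},
}
levels = parallel_levels(deps)
print(levels)
```

The number of levels bounds how much the graph can be shortened by adding processors: here five tasks collapse into three steps at best.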
- How is parallelism expressed in a program?
- Explicitly: define tasks, work decomposition, data decomposition,
communication, synchronization.
- Implicitly: define tasks only, with the rest implied; or define tasks and
work decomposition, with the rest implied.
- MPI is a library for fully explicit parallelization.
Other high-level parallel programming models/languages fall
somewhere between fully implicit and fully explicit. We will
later consider OpenMP, which is a mostly implicit model.
- Not expressed at all: bury it all in the compiler.
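The difference between explicit and implicit expression can be sketched by analogy, here using Python's standard concurrent.futures module rather than MPI or OpenMP. In the explicit version the programmer chooses the data decomposition, hands out the chunks, and waits for the results; in the more implicit version only the per-element task is defined and the library handles scheduling. All names below (square, sum_squares_chunk, and so on) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def square(v):
    return v * v


def sum_squares_chunk(chunk):
    return sum(square(v) for v in chunk)


data = list(range(10))
workers = 3

# Explicit style: the programmer decides the data decomposition,
# hands each chunk to a worker, and synchronizes on the results.
chunks = [data[i::workers] for i in range(workers)]  # cyclic decomposition
with ThreadPoolExecutor(max_workers=workers) as pool:
    futures = [pool.submit(sum_squares_chunk, c) for c in chunks]
    explicit_total = sum(f.result() for f in futures)  # wait for every worker

# More implicit style: only the per-element task is defined;
# the library decides how items are scheduled onto workers.
with ThreadPoolExecutor(max_workers=workers) as pool:
    implicit_total = sum(pool.map(square, data))

print(explicit_total, implicit_total)
```

The explicit version is longer but gives control over which data each worker sees; the implicit version is shorter but leaves placement to the library.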
- How parallelism is expressed raises interesting questions
about the trade-off between productivity and performance in parallel
programming.
- Ideally, the compiler can automatically parallelize the program.
Is this easy?
- If parallelism is expressed implicitly, who controls the
placement of tasks and data? Is this placement optimal for
the algorithm?
- If parallelism is expressed explicitly, how much effort does
the programmer need to distribute and communicate data?
- What is the right model for each architecture? Cluster? SMP?
CS 4234,
Dimitris Nikolopoulos,
latest update: