3 posts tagged 'SPLASH-2'

Papers presented on June 18

Lab.work 2009. 7. 1. 21:14
An OS-Based Alternative to Full Hardware Coherence on Tiled CMPs (HPCA 2008) 
PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites (by 신민정) 

Posted by 민둥

PARSEC vs. SPLASH-2

Architecture 2009. 6. 16. 16:24
PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites on Chip Multiprocessors
Princeton University Technical Report TR-818-08, March 2008

ABSTRACT
We examine the differences and similarities between the SPLASH-2 and PARSEC benchmark suites.
Standard statistical methods and machine learning are used to analyze redundancy and overlap on CMPs.

1. INTRODUCTION
PARSEC: Princeton Application Repository for Shared-Memory Computers
The result of a joint venture between Intel and Princeton University; a collection of recent workloads for CMPs.

How does PARSEC differ from other benchmarks?
SPLASH-2 and SPEC OMP2001 also cover several domains, but focus on High-Performance Computing.
BioParallel consists of bioinformatics programs.
ALPBench is a suite for multimedia workloads.
Minebench targets data mining.

SPLASH-2 is currently the most widely used suite for scientific studies (of parallel machines with shared memory).
Like PARSEC, it is not limited to a single application domain.
However, PARSEC offers more recent programs and a broader range of application domains than SPLASH-2.

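This paper:
- Compares SPLASH-2 and PARSEC: how much do the programs overlap?
- Identifies how similar the two suites are.
- Asks whether current technology trends are changing programs, in light of the proliferation of CMPs and the massive growth of the world's data.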

2. OVERVIEW
SPLASH-2 is one of the most widely used collections of multithreaded workloads.
However, it was released in the 1990s, when parallel machines were still expensive and rare,
so the majority of its workloads are confined to the High-Performance Computing domain.

PARSEC was released in 2008 and follows these five design goals:
- Multithreaded Applications: parallelized to take advantage of multiprocessor computers with shared memory
- Emerging Workloads: focuses on new applications that demand large amounts of processing power
- Diverse: covers a wide range of application domains
- Employ State-of-the-Art Techniques: includes the most recent algorithms and techniques in each field
- Support Research: provides infrastructure that allows instrumentation and manipulation

PARSEC includes input sets that reflect current computing problems.
Because of its age, SPLASH-2 no longer reflects current problem sizes.

3. METHODOLOGY
A set of interesting characteristics
Execution-driven simulation to obtain the relevant data
Standard statistical method to compute the similarity of the workloads

3.1 Program Characteristics
The characteristics were chosen to reflect how threads communicate and how data is shared on a CMP.
The first four characteristics describe what kind of program it is (its instruction mix). The remaining five reflect data usage and communication: total and shared working set size, how intensively the program uses shared data, and so on.

Characteristics related to cache usage can change with cache size, so the study is limited to 8 cache sizes from 1MB to 128MB.
This gives a total of 44 characteristics for each of the 26 workloads (14 from SPLASH-2, 12 from PARSEC).
- Instruction Mix: 4 characteristics
- Working Sets: 8 characteristics (1 x 8 cache sizes)
- Sharing: 32 characteristics (4 x 8 cache sizes)
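The per-workload count follows from the description above: 4 characteristics that do not depend on cache size, plus 5 that are measured once per cache size. A minimal sketch of that arithmetic, assuming the 8 cache sizes are the powers of two from 1MB to 128MB (the constant and function names are made up for illustration):

```python
# Counts taken from the notes above; the power-of-two cache sizes are an assumption.
INSTRUCTION_MIX = 4                            # independent of cache size
CACHE_SIZES_MB = [1 << i for i in range(8)]    # 1, 2, 4, ..., 128 MB
WORKING_SET_METRICS = 1                        # measured once per cache size
SHARING_METRICS = 4                            # measured once per cache size


def characteristics_per_workload():
    """Return the number of characteristics collected for one workload."""
    per_cache_size = WORKING_SET_METRICS + SHARING_METRICS
    return INSTRUCTION_MIX + per_cache_size * len(CACHE_SIZES_MB)


if __name__ == "__main__":
    print(CACHE_SIZES_MB)
    print(characteristics_per_workload())
```

The sharing group is a multiple of 8 because each of its 4 metrics is repeated for every cache size.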

3.2 Experimental Setup
Simulate abstract cache hierarchy with CMP$im
Preprocess chosen characteristics with Principal Component Analysis (PCA) to eliminate correlation
Compute similarity with hierarchical clustering
Visualize results with dendrograms and scatter plots

3.3 Removing Correlated Data
PCA (Principal Component Analysis) is needed to remove correlated information.
PCA is a method commonly used for redundancy analysis.
PC: linear combinations of the original variables
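To make "linear combinations of the original variables" concrete, here is a rough pure-Python sketch that standardizes a small table of characteristics and finds the first principal component by power iteration. This is not the paper's tooling. The function names and the toy data are hypothetical, and a real analysis would use a numerical library and keep several components.

```python
import math


def standardize(rows):
    """Scale each column to zero mean and unit variance (equal weight per characteristic)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def covariance(rows):
    """Covariance matrix of rows that are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]


def first_component(cov, iterations=200):
    """Approximate the dominant eigenvector of cov by power iteration."""
    d = len(cov)
    # Uneven start vector; power iteration fails only if it is orthogonal to the answer.
    v = [1.0 / (i + 1) for i in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def project(rows, v):
    """Score of each row on the component v (a linear combination of its columns)."""
    return [sum(x * a for x, a in zip(row, v)) for row in rows]


if __name__ == "__main__":
    # Toy data: three hypothetical workloads, two perfectly correlated characteristics.
    table = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
    z = standardize(table)
    pc1 = first_component(covariance(z))
    print(pc1, project(z, pc1))
```

In the toy data the two columns carry the same information, so a single component captures all of the variance. This is the kind of redundancy PCA is used to remove before distances are computed.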

3.4 Measuring Similarity
Euclidean distance is used to measure the similarity of programs.
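A small sketch of how Euclidean distance can feed hierarchical clustering, the step that produces the data behind a dendrogram. Single linkage is an assumption here, since the notes do not say which linkage the paper uses. The workload names `w1`..`w3` and their coordinates are made up.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Straight-line distance between two points in characteristic (or PC) space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster(points):
    """Agglomerative clustering with single linkage (assumed).

    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    clusters = [[name] for name in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Single linkage: distance between the closest pair of members.
            d = min(euclidean(points[p], points[q]) for p in a for q in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b, d))
    return merges


if __name__ == "__main__":
    toy = {"w1": (0.0, 0.0), "w2": (3.0, 4.0), "w3": (0.0, 1.0)}
    for a, b, d in cluster(toy):
        print(a, b, round(d, 3))
```

Workloads that merge at a small distance are candidates for redundancy. A workload that only joins the rest at a large distance is unusual within the collection.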

4. REDUNDANCY ANALYSIS RESULTS
- Diversity measured from total variance
SPLASH-2: 19.55, PARSEC: 18.98 (nearly the same)
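One way to read "total variance" is as the sum of the variances of all characteristics across a suite's workloads. The sketch below assumes that reading and uses the sample variance (dividing by n - 1). The notes do not state how the paper normalizes its data, so the toy numbers are unrelated to the 19.55 and 18.98 above.

```python
def total_variance(rows):
    """Sum of per-column sample variances; rows are workloads, columns are characteristics."""
    n = len(rows)
    total = 0.0
    for column in zip(*rows):
        mean = sum(column) / n
        total += sum((x - mean) ** 2 for x in column) / (n - 1)
    return total


if __name__ == "__main__":
    # Two hypothetical suites with two workloads and two characteristics each.
    suite_a = [[0.0, 0.0], [2.0, 2.0]]   # workloads differ in both characteristics
    suite_b = [[1.0, 1.0], [1.0, 3.0]]   # workloads differ in only one
    print(total_variance(suite_a), total_variance(suite_b))
```

A larger value means the workloads are more spread out overall, but a single number cannot show whether that spread comes from a few outliers or from the whole suite.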

- direct comparison
Analysis with a single PCA (giving all characteristics equal weight).
PARSEC is far more diverse than SPLASH-2.
Many SPLASH-2 programs are highly redundant (e.g., the two versions of lu and water). Only the ocean codes differ noticeably.
Apart from non-contiguous ocean, most are quite similar.
Seven SPLASH-2 workloads fall within a distance of d ≈ 0.42.
The workloads near the top are about 0.72 away from the other clusters and can therefore be considered unique within the program collection.
Among the PARSEC programs, only bodytrack and vips resemble SPLASH-2.

4.1 Multiple Differences
Instruction Mix Differences
Working Set Differences
Sharing Behavior Differences
= No single source for the differences of the two suites.





The SPLASH-2 Programs: Characterization and Methodological Considerations (1995)

SPLASH-2 (vs. SPLASH)

Represent a wider range of computations in the scientific, engineering and graphics domains.
Use better algorithms and implementations.
Are more architecturally aware.

■ Characteristics and Approach

□ Axes of Characterization

Concurrency and load balancing: How many processors can be effectively utilized by that program, assuming a perfect memory system and communication architecture.
Working set: Program’s temporal locality
Communication to computation ratio: Potential impact of communication latency on performance
Spatial locality: Spatial locality and false sharing in the programs

□ Approach to Characterization

Experimental environment
Execution-driven simulation. Simulate a cache-coherent shared address space multiprocessor with physically distributed memory and one processor per node. Each processor has a single-level cache, using a directory-based protocol.
All memory references complete in a single cycle (regardless of hits or misses)
Data are distributed among the processing nodes according to the guidelines.
Data Sets and Scaling
The data sets are small enough to simulate in a reasonable time, yet large enough to be of interest in their problem domain in practice. We fix the number of processors at 32 for most of our characterization.
Inherent versus Practical Characteristics
Focus on these realistic memory system parameters while still trying to approach inherent properties and avoid too many artifacts.

■ The SPLASH-2 Application Suite

It has 8 complete applications and 4 kernels

Barnes: Simulates the interaction of a system of bodies in three dimensions over a number of time-steps, using the Barnes-Hut hierarchical N-body method.
Cholesky: Factors a sparse matrix into the product of a lower triangular matrix and its transpose.
FFT: The FFT kernel is a complex 1-D version of the radix-√n six-step FFT algorithm.
FMM: Simulates a system of bodies over a number of time-steps. Interactions are in two dimensions, using a different hierarchical N-body method called the adaptive Fast Multipole Method.
LU: Factors a dense matrix into the product of a lower triangular and an upper triangular matrix.
Ocean: Studies large-scale ocean movements based on eddy and boundary currents.
Radiosity: Computes the equilibrium distribution of light in a scene using the iterative hierarchical diffuse radiosity method.
Radix: Integer radix sort kernel.
Raytrace: Renders a three-dimensional scene using ray tracing.
Volrend: Renders a three-dimensional volume using a ray casting technique.
Water-Nsquared: Evaluates forces and potentials that occur over time in a system of water molecules.
Water-Spatial: Solves the same problem as Water-Nsquared, but uses a more efficient algorithm.
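As a point of reference for the Radix entry in the list above, here is a sequential least-significant-digit radix sort for non-negative integers. It only illustrates the algorithm family. It is not the parallel SPLASH-2 kernel, and the function name and the `bits_per_pass` parameter are chosen for this sketch.

```python
def radix_sort(values, bits_per_pass=8):
    """LSD radix sort for non-negative integers; returns a new sorted list."""
    if not values:
        return []
    radix = 1 << bits_per_pass
    mask = radix - 1
    out = list(values)
    largest = max(out)
    shift = 0
    # One stable bucketing pass per digit, from least to most significant.
    while (largest >> shift) > 0:
        buckets = [[] for _ in range(radix)]
        for v in out:
            buckets[(v >> shift) & mask].append(v)
        out = [v for bucket in buckets for v in bucket]
        shift += bits_per_pass
    return out


if __name__ == "__main__":
    print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```

Each pass is a stable redistribution of all keys by one digit. In a parallel version that redistribution is the step that makes processors exchange data.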

■ Concurrency and Load Balance

Concurrency and load balance: how they change with problem size and number of processors
Study how the computational load balance scales with the number of processors by measuring speedups on a PRAM architectural model.

Figure 1: the PRAM speedups for the SPLASH-2 programs for up to 64 processors
Figure 2: the time spent waiting at synchronization points for 32-processor executions of each application.

The reasons for sub-linear speedups: the sizes of the input data sets.
(load imbalance, not-completely parallelized prefix computation, …)
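A back-of-the-envelope sketch of why load imbalance alone caps speedup under a PRAM-style model. It assumes every operation costs one cycle, no serial section, and that the run ends when the most heavily loaded processor finishes. The function name and the work counts are hypothetical.

```python
def pram_speedup(work_per_processor):
    """Speedup over one processor when run time equals the largest per-processor load."""
    total = sum(work_per_processor)      # cycles a single processor would need
    critical = max(work_per_processor)   # cycles the busiest processor needs
    return total / critical


if __name__ == "__main__":
    print(pram_speedup([25, 25, 25, 25]))   # perfectly balanced
    print(pram_speedup([40, 20, 20, 20]))   # one processor holds extra work
```

With a fixed, small input there may not be enough independent work to divide evenly among many processors, which is one way the input size limits speedup.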

■ Working Sets and Temporal Locality

......
