
Papers presented on June 18

Lab.work 2009. 7. 1. 21:14
An OS-Based Alternative to Full Hardware Coherence on Tiled CMPs (HPCA 2008) 
PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites (by 신민정) 

민둥


PARSEC vs. SPLASH-2

Architecture 2009. 6. 16. 16:24
PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites on Chip Multiprocessors
Princeton University Technical Report TR-818-08, March 2008

ABSTRACT
We study the differences and similarities between the SPLASH-2 and PARSEC benchmark suites.
Standard statistical methods and machine learning are used to analyze redundancy and overlap between the suites on chip multiprocessors (CMPs).

1. INTRODUCTION
PARSEC: Princeton Application Repository for Shared-Memory Computers
The result of a joint venture between Intel and Princeton University: a collection of up-to-date workloads for CMPs.

How does PARSEC differ from other benchmark suites?
SPLASH-2 and SPEC OMP2001 also cover several domains, but they focus on High-Performance Computing.
BioParallel consists of bioinformatics programs.
ALPBench is a suite of multimedia workloads.
MineBench targets data mining.

SPLASH-2 is currently the most widely used suite for scientific studies of parallel machines with shared memory.
Like PARSEC, it is not limited to a single application domain.
However, PARSEC offers more recent programs and a broader range of application domains than SPLASH-2.

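This paper:
- Compares SPLASH-2 and PARSEC: how much do the programs overlap?
- Identifies how similar the two suites are
- Examines whether current technology trends are changing programs, in view of the proliferation of CMPs and the massive growth of the world's data.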

2. OVERVIEW
SPLASH-2 is one of the most widely used collections of multithreaded workloads.
However, it was released in the 1990s, when parallel machines were still expensive and rare,
so the majority of its workloads are confined to the High-Performance Computing domain.

PARSEC was released in 2008 and follows these five requirements:
- Multithreaded Applications: parallelized to take advantage of multiprocessor computers with shared memory
- Emerging Workloads: focuses on new applications that require a lot of processing power
- Diverse: covers a wide range of application domains
- Employ State-of-Art Techniques: includes the latest algorithms and techniques in each field
- Support Research: provides an infrastructure that allows instrumentation and manipulation, to support research

PARSEC includes input sets that reflect current computing problems.
Because of its age, SPLASH-2 no longer reflects current problem sizes.

3. METHODOLOGY
A set of interesting characteristics
Execution-driven simulation to obtain the relevant data
Standard statistical method to compute the similarity of the workloads

3.1 Program Characteristics
The chosen characteristics reflect how threads communicate and how data is shared on a CMP.
The first four characteristics (the instruction mix) describe what kind of program it is. The remaining five reflect data usage and communication, such as the total and shared working sets and how intensively the program uses shared data.

Characteristics related to cache usage can change with the cache size. We restrict the study to 8 cache sizes from 1 MB to 128 MB.
This gives 44 characteristics in total (4 + 5 x 8) for each of the 26 workloads (14 from SPLASH-2, 12 from PARSEC):
- Instruction Mix: 4 characteristics
- Working Sets: 8 characteristics (1 x 8 cache sizes)
- Sharing: 32 characteristics (4 x 8 cache sizes)
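The count can be checked by enumerating the feature vector of one workload. This is a minimal sketch: the characteristic labels below are hypothetical placeholders, and it assumes that the four instruction-mix values are cache-independent while the other five are measured once per cache size, with cache sizes doubling from 1 MB to 128 MB.

```python
# Hypothetical labels for illustration only; the paper's exact names may differ.
CACHE_SIZES_MB = [2 ** k for k in range(8)]  # 1, 2, 4, ..., 128 MB

INSTRUCTION_MIX = ["flops", "alu_ops", "branches", "mem_refs"]  # cache-independent
WORKING_SET = ["miss_rate"]                                       # one value per cache size
SHARING = ["shared_lines", "shared_written_lines",
           "shared_accesses", "shared_writes"]                    # one value per cache size


def feature_names():
    """Return the flat list of feature names for a single workload."""
    names = list(INSTRUCTION_MIX)
    for group in (WORKING_SET, SHARING):
        for characteristic in group:
            for size in CACHE_SIZES_MB:
                names.append(f"{characteristic}@{size}MB")
    return names


names = feature_names()
print(len(CACHE_SIZES_MB), "cache sizes,", len(names), "characteristics per workload")
```

Under these assumptions each of the 26 workloads becomes one row of a 26 x 44 matrix, which is the input to the later PCA and clustering steps.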

3.2 Experimental Setup
Simulate abstract cache hierarchy with CMP$im
Preprocess chosen characteristics with Principal Component Analysis (PCA) to eliminate correlation
Compute similarity with hierarchical clustering
Visualize results with dendrograms and scatter plots

3.3 Removing Correlated Data
Principal Component Analysis (PCA) is needed to remove correlated information from the data.
PCA is a method commonly used for redundancy analysis.
Principal components (PCs) are linear combinations of the original variables.
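To make the idea concrete, the sketch below standardizes a small matrix and finds its first principal component. It is not the paper's procedure: a full PCA computes all components, whereas this example swaps in plain power iteration to obtain only the dominant one, and the toy data (column 2 is exactly twice column 1) is made up to show how two correlated variables collapse onto one component.

```python
import math


def standardize(rows):
    """Scale every column to zero mean and unit variance (constant columns become 0)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] if stds[j] else 0.0 for j in range(d)]
            for r in rows]


def covariance(rows):
    """Covariance matrix of rows whose columns are already centred."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]


def first_component(cov, iterations=200):
    """Power iteration: converges to the direction of largest variance."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy data (an assumption, not measurements): column 2 duplicates column 1.
rows = [[1, 2, 5], [2, 4, 3], [3, 6, 6], [4, 8, 2]]
cov = covariance(standardize(rows))
pc1 = first_component(cov)
variance_along_pc1 = sum(pc1[i] * cov[i][j] * pc1[j]
                         for i in range(3) for j in range(3))
print(pc1, variance_along_pc1 / 3)  # share of the total variance captured by PC1
```

The first two entries of `pc1` come out equal because the two correlated columns carry the same information, which is exactly the redundancy PCA is meant to remove.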

3.4 Measuring Similarity
The Euclidean distance is used to measure the similarity of programs.
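A minimal sketch of how Euclidean distances feed hierarchical clustering follows. The single-linkage rule, the labels A, B, C and their coordinates are all assumptions made for illustration; the notes do not record which linkage the paper used.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(points):
    """Single-linkage clustering; returns merges as (members_a, members_b, distance)."""
    clusters = [(name,) for name in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(euclidean(points[p], points[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Made-up 2-D coordinates standing in for workloads in PCA space.
points = {"A": (0.0, 0.0), "B": (0.3, 0.4), "C": (3.0, 4.0)}
merges = agglomerate(points)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

The list of merges, ordered by increasing distance, is the information a dendrogram draws: A and B join early because they are close, while C joins last at a much larger distance.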

4. REDUNDANCY ANALYSIS RESULTS
- Measuring diversity from the total variance
SPLASH-2: 19.55, PARSEC: 18.98. The two are almost the same.

- Direct comparison
The analysis uses a single PCA over both suites (with all characteristics weighted equally).
PARSEC is much more diverse than SPLASH-2.
Many SPLASH-2 programs are highly redundant (e.g., the two versions of lu and water). Only the ocean codes differ noticeably.
Apart from non-contiguous ocean, most of them are quite similar.
Seven SPLASH-2 workloads lie within a distance of about d = 0.42.
The workloads at the top are at a distance of about 0.72 from the other clusters and can therefore be considered unique within the program collection.
Only bodytrack and vips from PARSEC are similar to SPLASH-2.
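The total-variance diversity measure from the start of this section can be sketched as the sum of the per-characteristic variances of a suite. This is only an illustration: the use of population variance and the two toy suites are assumptions, and the figures 19.55 and 18.98 cannot be reproduced from it.

```python
from statistics import pvariance


def total_variance(rows):
    """Sum of the variances of every column (one row per workload)."""
    return sum(pvariance(column) for column in zip(*rows))


# Made-up suites: one spread out, one tightly packed.
spread_suite = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
tight_suite = [[1.0, 1.0], [1.0, 1.0], [1.0, 2.0]]

print(total_variance(spread_suite), total_variance(tight_suite))
```

A suite whose workloads sit close together in characteristic space scores low, while a suite whose workloads differ widely scores high, which is why similar totals suggest similar overall diversity.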

4.1 Multiple Differences
Instruction Mix Differences
Working Set Differences
Sharing Behavior Differences
= No single source for the differences of the two suites.





The PARSEC Benchmark Suite: Characterization and Architectural Implications (2008)

PARSEC Wiki
http://wiki.cs.princeton.edu/index.php/PARSEC

A benchmark suite for studies of CMPs (Chip-Multiprocessors)
Diverse in working set, locality, data sharing, synchronization, and off-chip traffic.

Existing benchmark suites cannot be considered adequate to describe future CMP applications.

■ Motivation

□ Requirements for a Benchmark Suite

Multi-threaded Applications
Emerging Workloads
Diverse
Employ State-of-Art Techniques
Support Research

□ Limitations of Existing Benchmark Suites

SPLASH-2
Program collection is skewed towards HPC and graphics programs.
Does not include parallelization models such as the pipeline model.
SPEC CPU2006 and OMP2001
Provide a snapshot of current scientific and engineering applications
Workloads such as systems programs and parallelization models which employ the producer-consumer model are not included.
SPEC CPU2006 is a suite of serial programs.
Other Benchmark Suites
Designed to study a specific program area and limited to a single application domain.

■ The PARSEC Benchmark Suite

9 applications and 3 kernels (+PARSEC 2.0 includes RayTrace)

□ Input Sets
test, simdev, simsmall, simmedium, simlarge, native

□ Workloads

Blackscholes (Financial Analysis)
This application is an Intel RMS benchmark. It calculates the prices for a portfolio of European options analytically with the Black-Scholes partial differential equation (PDE). For European options the PDE has a closed-form solution, the Black-Scholes formula, which the benchmark evaluates for each option; for most other option types no closed-form expression exists and the price must be computed numerically.
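For European options without dividends, the price follows from the standard Black-Scholes formula. The sketch below is not the benchmark's source code: the function and parameter names are hypothetical, and it uses `math.erf` for the normal CDF as a convenient stand-in for whatever approximation a given implementation chooses.

```python
import math


def norm_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _d1_d2(spot, strike, rate, volatility, years):
    d1 = (math.log(spot / strike) + (rate + 0.5 * volatility ** 2) * years) / (
        volatility * math.sqrt(years))
    return d1, d1 - volatility * math.sqrt(years)


def european_call(spot, strike, rate, volatility, years):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1, d2 = _d1_d2(spot, strike, rate, volatility, years)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


def european_put(spot, strike, rate, volatility, years):
    """Black-Scholes price of a European put on a non-dividend-paying asset."""
    d1, d2 = _d1_d2(spot, strike, rate, volatility, years)
    return strike * math.exp(-rate * years) * norm_cdf(-d2) - spot * norm_cdf(-d1)


call = european_call(100.0, 100.0, 0.05, 0.2, 1.0)
put = european_put(100.0, 100.0, 0.05, 0.2, 1.0)
print(round(call, 4), round(put, 4))
```

Every option in a portfolio is priced independently, so the work can be split across threads with no communication between them, which makes this kind of workload easy to parallelize.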
Bodytrack (Computer Vision)
This computer vision application is an Intel RMS workload which tracks a human body with multiple cameras through an image sequence. This benchmark was included due to the increasing significance of computer vision algorithms in areas such as video surveillance, character animation and computer interfaces.
Canneal (Engineering)
This kernel was developed by Princeton University. It uses cache-aware simulated annealing (SA) to minimize the routing cost of a chip design. Canneal uses fine-grained parallelism with a lock-free algorithm and a very aggressive synchronization strategy that is based on data race recovery instead of avoidance.
Dedup (Enterprise Storage)
This kernel was developed by Princeton University. It compresses a data stream with a combination of global and local compression that is called 'deduplication'. The kernel uses a pipelined programming model to mimic real-world implementations. The reason for the inclusion of this kernel is that deduplication has become a mainstream method for new-generation backup storage systems.
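The combination of global and local compression can be sketched in a few lines. This is a simplified illustration and not the kernel's implementation: fixed-size chunks, SHA-256 digests and zlib are assumptions chosen for brevity, and the stages run sequentially here rather than as a pipeline.

```python
import hashlib
import zlib


def deduplicate(data, chunk_size=4096):
    """Split data into chunks, store each unique chunk once (global),
    and compress the stored chunks individually (local)."""
    store = {}   # digest -> compressed chunk
    recipe = []  # ordered digests needed to rebuild the stream
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        if digest not in store:
            store[digest] = zlib.compress(chunk)
        recipe.append(digest)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original byte stream from the chunk store and the recipe."""
    return b"".join(zlib.decompress(store[digest]) for digest in recipe)


data = b"A" * 4096 * 3 + b"B" * 4096
store, recipe = deduplicate(data)
print(len(recipe), "chunks,", len(store), "unique")
```

Three of the four chunks in the example are identical, so only two compressed chunks are stored while the recipe still lists four entries.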
Facesim (Animation)
This Intel RMS application was originally developed by Stanford University. It computes a visually realistic animation of the modeled face by simulating the underlying physics. The workload was included in the benchmark suite because an increasing number of animations employ physical simulation to create more realistic effects.
Ferret (Similarity Search)
This application is based on the Ferret toolkit which is used for content-based similarity search. It was developed by Princeton University. The reason for the inclusion in the benchmark suite is that it represents emerging next-generation search engines for non-text document data types. In the benchmark, we have configured the Ferret toolkit for image similarity search. Ferret is parallelized using the pipeline model.
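The pipeline model assigns each processing stage to its own thread and connects the stages with queues, so that different items are in different stages at the same time. The sketch below is generic: the stage functions are hypothetical placeholders and do not correspond to the stages of the Ferret toolkit.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage the input stream has ended


def stage(func, inbox, outbox):
    """Apply func to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Run items through funcs, one thread per stage, preserving item order."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    workers = [threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(funcs)]
    for worker in workers:
        worker.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(STOP)
    results = []
    while True:
        out = queues[-1].get()
        if out is STOP:
            break
        results.append(out)
    for worker in workers:
        worker.join()
    return results


results = run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 10, str])
print(results)
```

With one thread per stage and FIFO queues the output order matches the input order. Real pipelined workloads often use several threads per stage, which requires extra care to keep results ordered.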
Fluidanimate (Animation)
This Intel RMS application uses an extension of the Smoothed Particle Hydrodynamics (SPH) method to simulate an incompressible fluid for interactive animation purposes. It was included in the PARSEC benchmark suite because of the increasing significance of physics simulations for animations.
Freqmine (Data Mining)
This application employs an array-based version of the FP-growth (Frequent Pattern-growth) method for Frequent Itemset Mining (FIMI). It is an Intel RMS benchmark which was originally developed by Concordia University. Freqmine was included in the PARSEC benchmark suite because of the increasing use of data mining techniques.
Raytrace (added in PARSEC 2.0; Graphics?)
The Intel RMS application uses a version of the raytracing method that would typically be employed for real-time animations such as computer games. It is optimized for speed rather than realism. The computational complexity of the algorithm depends on the resolution of the output image and the scene.
Streamcluster (Data Mining)
This RMS kernel was developed by Princeton University and solves the online clustering problem. Streamcluster was included in the PARSEC benchmark suite because of the importance of data mining algorithms and the prevalence of problems with streaming characteristics.
Swaptions (Financial Analysis)
The application is an Intel RMS workload which uses the Heath-Jarrow-Morton (HJM) framework to price a portfolio of swaptions. Swaptions employs Monte Carlo (MC) simulation to compute the prices.
Vips (Media Processing)
This application is based on the VASARI Image Processing System (VIPS) which was originally developed through several projects funded by European Union (EU) grants. The benchmark version is derived from a print on demand service that is offered at the National Gallery of London, which is also the current maintainer of the system. The benchmark includes fundamental image operations such as an affine transformation and a convolution.
X264 (Media Processing)
This application is an H.264/AVC (Advanced Video Coding) video encoder. H.264 describes the lossy compression of a video stream and is also part of ISO/IEC MPEG-4. The flexibility and wide range of application of the H.264 standard and its ubiquity in next-generation video systems are the reasons for the inclusion of x264 in the PARSEC benchmark suite.

■ Methodology

Parallelization
Working sets and locality
Communication to computation ratio and sharing
Off-chip traffic


