Posts tagged 'benchmark'

The PARSEC Benchmark Suite

Architecture 2009. 6. 5. 14:44

The PARSEC Benchmark Suite: Characterization and Architectural Implications (2008)

PARSEC Wiki: http://wiki.cs.princeton.edu/index.php/PARSEC

A benchmark suite for studies of CMPs (Chip-Multiprocessors)

Diverse in working set, locality, data sharing, synchronization, and off-chip traffic.

Existing benchmark suites cannot be considered adequate to describe future CMP applications.

■ Motivation

□ Requirements for a Benchmark Suite

Multi-threaded Applications

Emerging Workloads

Diverse

Employ State-of-the-Art Techniques

Support Research

□ Limitations of Existing Benchmark Suites

SPLASH-2

Program collection is skewed towards HPC and graphics programs.

Does not include parallelization models such as the pipeline model.

SPEC CPU2006 and OMP2001

Provide a snapshot of current scientific and engineering applications

Workloads such as systems programs and parallelization models which employ the producer-consumer model are not included.

SPEC CPU2006 is a suite of serial programs.

Other Benchmark Suites

Designed to study a specific program area and limited to a single application domain.

■ The PARSEC Benchmark Suite

9 applications and 3 kernels (PARSEC 2.0 adds Raytrace)

□ Input Sets

test, simdev, simsmall, simmedium, simlarge, native

□ Workloads

Blackscholes (Financial Analysis)

This application is an Intel RMS benchmark. It calculates the prices for a portfolio of European options analytically with the Black-Scholes partial differential equation (PDE). For European options the equation has a closed-form solution, but that formula contains the cumulative normal distribution function, which has no closed-form expression and must be approximated numerically.
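A minimal sketch of the kind of per-option computation Blackscholes performs, using the standard Black-Scholes formula for a European call and put. The numerical part is the normal CDF, evaluated here with `math.erf`. The function and parameter names are my own illustrative choices and are not taken from the PARSEC source.

```python
import math


def norm_cdf(x):
    """Standard normal CDF, evaluated numerically through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _d1_d2(spot, strike, rate, volatility, years):
    """The two auxiliary terms shared by the call and put formulas."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * volatility ** 2) * years) / (
        volatility * math.sqrt(years)
    )
    return d1, d1 - volatility * math.sqrt(years)


def black_scholes_call(spot, strike, rate, volatility, years):
    """Price of a European call option."""
    d1, d2 = _d1_d2(spot, strike, rate, volatility, years)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


def black_scholes_put(spot, strike, rate, volatility, years):
    """Price of a European put option."""
    d1, d2 = _d1_d2(spot, strike, rate, volatility, years)
    return strike * math.exp(-rate * years) * norm_cdf(-d2) - spot * norm_cdf(-d1)
```

Every option in a portfolio can be priced independently with such a function, which is why the workload parallelizes easily by splitting the portfolio among threads.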

Bodytrack (Computer Vision)

This computer vision application is an Intel RMS workload which tracks a human body with multiple cameras through an image sequence. This benchmark was included due to the increasing significance of computer vision algorithms in areas such as video surveillance, character animation and computer interfaces.

Canneal (Engineering)

This kernel was developed by Princeton University. It uses cache-aware simulated annealing (SA) to minimize the routing cost of a chip design. Canneal uses fine-grained parallelism with a lock-free algorithm and a very aggressive synchronization strategy that is based on data race recovery instead of avoidance.
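To make the annealing step concrete, here is a minimal sequential sketch of simulated annealing on a toy placement problem. It is not the Canneal algorithm itself: there is no cache-awareness, no lock-free swapping and no data race recovery. The one-dimensional cost model and all names (`routing_cost`, `anneal`, the cooling parameters) are assumptions made for illustration.

```python
import math
import random


def routing_cost(position, nets):
    """Toy routing cost: total distance between connected elements."""
    return sum(abs(position[a] - position[b]) for a, b in nets)


def anneal(position, nets, start_temp=10.0, cooling=0.95,
           steps_per_temp=200, min_temp=0.01, seed=0):
    """Swap element locations at random, keeping the best placement seen."""
    rng = random.Random(seed)
    position = list(position)
    cost = routing_cost(position, nets)
    best, best_cost = list(position), cost
    temp = start_temp
    while temp > min_temp:
        for _ in range(steps_per_temp):
            i, j = rng.sample(range(len(position)), 2)
            position[i], position[j] = position[j], position[i]
            new_cost = routing_cost(position, nets)
            delta = new_cost - cost
            # Always accept improvements; accept worse moves with a
            # probability that shrinks as the temperature drops.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = list(position), cost
            else:
                # Rejected: undo the swap.
                position[i], position[j] = position[j], position[i]
        temp *= cooling
    return best, best_cost
```

In the parallel kernel many threads would perform such swaps concurrently on shared data, which is where the synchronization strategy described above comes in.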

Dedup (Enterprise Storage)

This kernel was developed by Princeton University. It compresses a data stream with a combination of global and local compression that is called 'deduplication'. The kernel uses a pipelined programming model to mimic real-world implementations. The reason for the inclusion of this kernel is that deduplication has become a mainstream method for new-generation backup storage systems.
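A minimal sketch of the deduplication idea, under my reading that "global" compression means storing each distinct chunk only once and "local" compression means compressing each stored chunk on its own. Fixed-size chunks, SHA-256 fingerprints and zlib are assumptions chosen for brevity, and the names `deduplicate` and `restore` are hypothetical. The real kernel also runs these steps as pipeline stages, which is omitted here.

```python
import hashlib
import zlib


def deduplicate(data, chunk_size=4096):
    """Split data into chunks and store every distinct chunk only once."""
    store = {}    # fingerprint -> compressed chunk
    recipe = []   # ordered fingerprints needed to rebuild the stream
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in store:
            # First time this content is seen: compress and keep it.
            store[fingerprint] = zlib.compress(chunk)
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original byte stream from the store and the recipe."""
    return b"".join(zlib.decompress(store[fp]) for fp in recipe)
```

For a stream with many repeated chunks, `store` holds far fewer entries than `recipe`, which is the saving a backup system is after.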

Facesim (Animation)

This Intel RMS application was originally developed by Stanford University. It computes a visually realistic animation of the modeled face by simulating the underlying physics. The workload was included in the benchmark suite because an increasing number of animations employ physical simulation to create more realistic effects.

Ferret (Similarity Search)

This application is based on the Ferret toolkit which is used for content-based similarity search. It was developed by Princeton University. The reason for the inclusion in the benchmark suite is that it represents emerging next-generation search engines for non-text document data types. In the benchmark, we have configured the Ferret toolkit for image similarity search. Ferret is parallelized using the pipeline model.
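The pipeline model can be sketched as one thread per stage, connected by queues. This is a generic illustration and not Ferret's actual stage layout. The helper names (`stage`, `run_pipeline`) and the sentinel-based shutdown are assumptions of this sketch, and worker functions are assumed not to raise.

```python
import queue
import threading

_DONE = object()  # sentinel that tells a stage the input is exhausted


def stage(worker, inbox, outbox):
    """Apply worker to every item from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # pass the shutdown signal downstream
            return
        outbox.put(worker(item))


def run_pipeline(items, workers):
    """Run items through the worker functions, one thread per stage."""
    queues = [queue.Queue() for _ in range(len(workers) + 1)]
    threads = [
        threading.Thread(target=stage, args=(w, queues[i], queues[i + 1]))
        for i, w in enumerate(workers)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

With a single thread per stage the output order matches the input order. Different stages work on different items at the same time, which is the property that SPLASH-2-style data-parallel programs do not exercise.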

Fluidanimate (Animation)

This Intel RMS application uses an extension of the Smoothed Particle Hydrodynamics (SPH) method to simulate an incompressible fluid for interactive animation purposes. It was included in the PARSEC benchmark suite because of the increasing significance of physics simulations for animations.

Freqmine (Data Mining)

This application employs an array-based version of the FP-growth (Frequent Pattern-growth) method for Frequent Itemset Mining (FIMI). It is an Intel RMS benchmark which was originally developed by Concordia University. Freqmine was included in the PARSEC benchmark suite because of the increasing use of data mining techniques.

Raytrace (added in PARSEC 2.0, Graphics?)

The Intel RMS application uses a version of the raytracing method that would typically be employed for real-time animations such as computer games. It is optimized for speed rather than realism. The computational complexity of the algorithm depends on the resolution of the output image and the scene.

Streamcluster (Data Mining)

This RMS kernel was developed by Princeton University and solves the online clustering problem. Streamcluster was included in the PARSEC benchmark suite because of the importance of data mining algorithms and the prevalence of problems with streaming characteristics.

Swaptions (Financial Analysis)

The application is an Intel RMS workload which uses the Heath-Jarrow-Morton (HJM) framework to price a portfolio of swaptions. Swaptions employs Monte Carlo (MC) simulation to compute the prices.
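The HJM framework is too involved for a short example, so the sketch below swaps in a much simpler model (a single asset following geometric Brownian motion, priced as a European call) purely to show the Monte Carlo pattern: simulate many random outcomes, average the payoff, discount. It is not what Swaptions computes, and all names and parameters are illustrative assumptions.

```python
import math
import random


def monte_carlo_call(spot, strike, rate, volatility, years,
                     trials=100_000, seed=0):
    """Monte Carlo estimate of a European call price (simplified model)."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * volatility ** 2) * years
    diffusion = volatility * math.sqrt(years)
    payoff_sum = 0.0
    for _ in range(trials):
        # Draw one possible asset price at expiry.
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        payoff_sum += max(terminal - strike, 0.0)
    # Average payoff, discounted back to today.
    return math.exp(-rate * years) * payoff_sum / trials
```

Each trial is independent of the others, so the trials (or, in Swaptions, the individual swaptions of the portfolio) can be divided among threads with very little communication.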

Vips (Media Processing)

This application is based on the VASARI Image Processing System (VIPS) which was originally developed through several projects funded by European Union (EU) grants. The benchmark version is derived from a print on demand service that is offered at the National Gallery of London, which is also the current maintainer of the system. The benchmark includes fundamental image operations such as an affine transformation and a convolution.
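As a small illustration of one of the image operations mentioned above, here is a plain 2-D convolution over a list-of-lists image. It is a sketch of the operation only and has nothing to do with how VIPS implements it. The function name and the "valid" border handling (output shrinks by the kernel size minus one) are assumptions.

```python
def convolve2d(image, kernel):
    """'Valid' 2-D convolution of an image with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    # Convolution flips the kernel in both directions before sliding it.
    flipped = [row[::-1] for row in kernel[::-1]]
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * flipped[ky][kx]
            row.append(acc)
        output.append(row)
    return output
```

Every output pixel depends only on a small neighborhood of input pixels, so an image can be cut into regions that are processed in parallel.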

X264 (Media Processing)

This application is an H.264/AVC (Advanced Video Coding) video encoder. H.264 describes the lossy compression of a video stream and is also part of ISO/IEC MPEG-4. The flexibility and wide range of application of the H.264 standard and its ubiquity in next-generation video systems are the reasons for the inclusion of x264 in the PARSEC benchmark suite.

■ Methodology

Parallelization

Working sets and locality

Communication to computation ratio and sharing

Off-chip traffic


민둥


The SPLASH-2 Programs

Architecture 2009. 6. 5. 13:21

The SPLASH-2 Programs: Characterization and Methodological Considerations (1995)

SPLASH-2 (vs. SPLASH)

Represent a wider range of computations in the scientific, engineering and graphics domains.

Use better algorithms and implementations.

Are more architecturally aware.

■ Characteristics and Approach

□ Axes of Characterization

Concurrency and load balancing: How many processors can be effectively utilized by that program, assuming a perfect memory system and communication architecture.

Working set: Program’s temporal locality

Communication to computation ratio: Potential impact of communication latency on performance

Spatial locality: Spatial locality and false sharing in the programs

□ Approach to Characterization

Experimental environment

Execution-driven simulation. Simulate a cache-coherent shared address space multiprocessor with physically distributed memory and one processor per node. Each processor has a single-level cache, using a directory-based protocol.

All memory references complete in a single cycle (regardless of hits or misses)

Data are distributed among the processing nodes according to the guidelines.

Data Sets and Scaling

The data sets are small enough to simulate in a reasonable time, yet large enough to be of interest in their problem domain in practice. We fix the number of processors at 32 for most of our characterization.

Inherent versus Practical Characteristics

Focus on realistic memory system parameters while still trying to approach inherent properties and avoid too many artifacts.

■ The SPLASH-2 Application Suite

It has 8 complete applications and 4 kernels

Barnes: Simulates the interaction of a system of bodies in three dimensions over a number of time-steps, using the Barnes-Hut hierarchical N-body method.

Cholesky: Factors a sparse matrix into the product of a lower triangular matrix and its transpose.

FFT: A complex 1-D version of the radix-√n six-step FFT algorithm.

FMM: Simulates a system of bodies over a number of timesteps. Interactions are simulated in two dimensions using a different hierarchical N-body method, the adaptive Fast Multipole Method.

LU: Factors a dense matrix into the product of a lower triangular and an upper triangular matrix.

Ocean: Studies large-scale ocean movements based on eddy and boundary currents.

Radiosity: Computes the equilibrium distribution of light in a scene using the iterative hierarchical diffuse radiosity method.

Radix: Integer radix sort kernel.

Raytrace: Renders a three-dimensional scene using ray tracing.

Volrend: Renders a three-dimensional volume using a ray casting technique.

Water-Nsquared: Evaluates forces and potentials that occur over time in a system of water molecules.

Water-Spatial: Solves the same problem as Water-Nsquared, but uses a more efficient algorithm.
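Of the kernels listed above, Radix is the easiest to show in a few lines. The following is a minimal sequential sketch of a least-significant-digit integer radix sort. It is not the SPLASH-2 code, and the function name and the `radix_bits` parameter are assumptions. Each pass builds a histogram of digit values, turns it into starting offsets with a prefix sum, and then moves the keys.

```python
def radix_sort(values, radix_bits=8):
    """Sort non-negative integers, one digit of radix_bits bits per pass."""
    if not values:
        return []
    if min(values) < 0:
        raise ValueError("this sketch only handles non-negative integers")
    radix = 1 << radix_bits
    mask = radix - 1
    result = list(values)
    largest = max(result)
    shift = 0
    while (largest >> shift) > 0:
        # Histogram of the current digit.
        counts = [0] * radix
        for v in result:
            counts[(v >> shift) & mask] += 1
        # Prefix sum: where each digit value starts in the output.
        offsets = [0] * radix
        total = 0
        for digit in range(radix):
            offsets[digit] = total
            total += counts[digit]
        # Stable permutation of the keys into their new positions.
        output = [0] * len(result)
        for v in result:
            digit = (v >> shift) & mask
            output[offsets[digit]] = v
            offsets[digit] += 1
        result = output
        shift += radix_bits
    return result
```

In a parallel version each processor would build a histogram for its own share of the keys, and the histograms would then have to be combined through a prefix computation across processors before the keys can be moved.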

■ Concurrency and Load Balance

Concurrency and load balance: how they change with problem size and number of processors

Study how the computational load balance scales with the number of processors by measuring speedups on a PRAM architectural model.

Figure 1: the PRAM speedups for the SPLASH-2 programs for up to 64 processors

Figure 2: the time spent waiting at synchronization points for 32-processor executions of each application.

The reasons for sub-linear speedups: the sizes of the input data sets.

(load imbalance, not-completely parallelized prefix computation, …)
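A back-of-the-envelope sketch of how load imbalance alone limits speedup on a PRAM-like model. It assumes every operation costs one cycle and that all processors meet at a single barrier at the end. These assumptions and the function names are mine, not the paper's methodology.

```python
def pram_speedup(work_per_processor):
    """Speedup over one processor when every operation costs one cycle."""
    sequential_time = sum(work_per_processor)
    # The slowest processor determines when the parallel run finishes.
    parallel_time = max(work_per_processor)
    return sequential_time / parallel_time


def barrier_wait(work_per_processor):
    """Cycles each processor idles at the final barrier."""
    finish = max(work_per_processor)
    return [finish - work for work in work_per_processor]
```

With perfectly balanced work, e.g. `[100, 100, 100, 100]`, the speedup equals the processor count (4.0). If one processor gets twice the work, `[100, 100, 100, 200]`, the speedup drops to 2.5 and the other three processors each wait 100 cycles, even though memory is perfect.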

■ Working Sets and Temporal Locality

......


Wednesday, May 27

Lab.work 2009. 5. 27. 17:02

Inter-core Communication/Performance/Power/Reliability

Application & Architecture <- focus here ★

Coherence

power, verification, less overhead, cost efficient

★★ MapReduce!!

Reference: "Evaluating MapReduce for Multi-core and Multiprocessor Systems"

Implement the applications described in the paper and work toward running them on multicore

Gems Simulator (http://www.cs.wisc.edu/gems/)

Benchmark: find and read the paper for each one

SPLASH-2: Stanford Parallel Applications for Shared Memory (SPLASH)

"The SPLASH-2 Programs: Characterization and Methodological Considerations"

PARSEC: The PARSEC Benchmark Suite

"The PARSEC Benchmark Suite: Characterization and Architectural Implications"

"PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites on Chip-Multiprocessors"

Look into the Simics license/site license and apply for it (http://www.virtutech.com/)

Garnet: Detailed Interconnection Network Model inside a Full-system Simulation Framework

Orion: A Power-Performance Simulator for Interconnection Networks

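Video conference every Tuesday at 9 PM

Progress Report due every Monday evening

Get a cluster account

Buy a webcam & microphone

Seminar paper presentation

1. Problem definition

2. The authors' approach

3. Differences from existing methods

4. Analysis of results

5. Pros and cons (my own thoughts)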
