6 posts tagged 'Gems'

[Gems] Torus Topology

Lab.work 2010. 7. 4. 20:41
According to the GEMS wiki, the PT_TO_PT and FILE_SPECIFIED topologies are recommended for CMP protocols.
TORUS_2D is probably fine for a rough model of a single CMP. However, the auto-generated
TORUS_2D network does not create a hierarchy (as a multiple-CMP system would need) and does
nothing special for the directory/memory controllers.
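To make the "no hierarchy, nothing special" point concrete, here is a minimal Python sketch that enumerates the links of a k x k 2D torus, the kind of regular wraparound network an auto-generated TORUS_2D topology describes. The function names and the row-major node numbering are assumptions for illustration; this is not the GEMS/Ruby topology generator itself.

```python
def torus_2d_links(k):
    """Return the set of undirected links in a k x k 2D torus.

    Nodes are numbered row-major: node = y * k + x (an assumption for
    illustration). Each node connects to its +x and +y neighbor with
    wraparound, so for k > 2 every node ends up with exactly four links.
    """
    links = set()
    for y in range(k):
        for x in range(k):
            node = y * k + x
            east = y * k + (x + 1) % k      # wraparound in the x dimension
            south = ((y + 1) % k) * k + x   # wraparound in the y dimension
            links.add(frozenset((node, east)))
            links.add(frozenset((node, south)))
    return links


def degree(links, node):
    """Number of links attached to a node."""
    return sum(1 for link in links if node in link)


if __name__ == "__main__":
    links = torus_2d_links(4)
    print(len(links))                             # 32 links for a 4 x 4 torus
    print([degree(links, n) for n in range(16)])  # every node has degree 4
```

Every node, including any that would host a directory or memory controller, gets the same four neighbors, which is why a flat torus cannot express a multi-CMP hierarchy without a hand-written (FILE_SPECIFIED) topology.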


민둥


November 11 Meeting

Lab.work 2009. 11. 11. 21:22
To do by next Monday (November 16)!

@ Phoenix

1) Extract the src + dest traffic for the ideal (p2p) network and for the mesh, and compare them!
2) Why are the heavily used nodes used so much? Find out the reason.
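A minimal sketch of item 1, assuming the traffic is available as a list of (src, dest) packet pairs and that the mesh uses XY dimension-order routing with row-major node numbering (both are assumptions, not taken from the Phoenix or GEMS code). In the ideal point-to-point network a packet touches only its source and destination, while in the mesh every router on its path sees it; that difference is one candidate explanation for item 2, since central routers lie on more XY paths.

```python
from collections import Counter


def endpoint_traffic(packets):
    """Ideal p2p network: count each packet once at its source and once at its destination."""
    counts = Counter()
    for src, dst in packets:
        counts[src] += 1
        counts[dst] += 1
    return counts


def xy_route(src, dst, k):
    """Routers visited in a k x k mesh under XY routing (move in x first, then in y)."""
    x, y = src % k, src // k
    dx, dy = dst % k, dst // k
    path = [src]
    while x != dx:
        x += 1 if dx > x else -1
        path.append(y * k + x)
    while y != dy:
        y += 1 if dy > y else -1
        path.append(y * k + x)
    return path


def mesh_traffic(packets, k):
    """Mesh network: count every router a packet passes through, endpoints included."""
    counts = Counter()
    for src, dst in packets:
        counts.update(xy_route(src, dst, k))
    return counts


if __name__ == "__main__":
    # Hypothetical trace: all-to-all traffic on a 4 x 4 mesh.
    k = 4
    packets = [(s, d) for s in range(k * k) for d in range(k * k) if s != d]
    print(endpoint_traffic(packets).most_common(3))
    print(mesh_traffic(packets, k).most_common(3))
```

Running both counters over the same real trace and sorting by count gives the per-node comparison; nodes that rank high only in `mesh_traffic` are busy because of routing, not because the application addresses them.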

@ CS710 ICN

1) Prepare the presentation:
"Low-Cost Router Microarchitecture", John Kim, to appear in MICRO 2009, New York, NY, December 2009

2) Reimplement the paper's design using Garnet ★★★★★

@ Booksim

1) Make the simulation stop after 3000 cycles.
Report results using only the packets up to that point.
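A toy sketch of the intended behavior, written in Python rather than BookSim's C++: the loop stops at a fixed cycle limit, and statistics are computed only over packets delivered by then. The packet model (a fixed per-packet latency) and every name here are hypothetical placeholders, not BookSim internals.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_CYCLES = 3000  # stop the simulation after this many cycles


@dataclass
class Packet:
    inject_cycle: int
    latency: int                       # hypothetical fixed network latency
    arrive_cycle: Optional[int] = None


def run(packets: List[Packet], max_cycles: int = MAX_CYCLES) -> List[Packet]:
    """Advance cycle by cycle (0 .. max_cycles - 1) and record arrivals."""
    for cycle in range(max_cycles):
        for p in packets:
            if p.arrive_cycle is None and p.inject_cycle + p.latency == cycle:
                p.arrive_cycle = cycle
    # Packets still in flight at the cutoff are excluded from the statistics.
    return [p for p in packets if p.arrive_cycle is not None]


def average_latency(delivered: List[Packet]) -> float:
    """Mean latency over delivered packets only."""
    if not delivered:
        return 0.0
    return sum(p.arrive_cycle - p.inject_cycle for p in delivered) / len(delivered)
```

The key design point is the filter at the end of `run`: packets injected near the cutoff that have not arrived are dropped rather than counted with a partial latency.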


PROGRESS REPORT (8/10)

Lab.work 2009. 8. 11. 01:02
■ THIS WEEK  (8/4~8/10)

□ Phoenix 2.0
I ran Phoenix on GEMS (Simics + Ruby).
So far I have tested only wordcount and pca.
Both applications still have a few assertion errors to fix, but they ran.
However, with Ruby enabled, the simulation takes too long to follow the whole run.

□ Booksim 2.0
I modified the Bernoulli injection function and implemented a bimodal injection function.
The results are shown below.
(booksim_uniform_bimodal.pdf)
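A minimal sketch of the two injection-related pieces, assuming Python and a seeded random generator. The Bernoulli process injects a packet in each cycle independently with probability `rate`. "Bimodal" is interpreted here as a packet-size distribution with two modes (short and long packets); that interpretation, the sizes 1 and 5, and all function names are assumptions for illustration, not the modified Booksim code.

```python
import random


def bernoulli_inject(rate, rng):
    """Bernoulli injection: inject a packet this cycle with probability `rate`."""
    return rng.random() < rate


def bimodal_size(rng, p_short=0.5, short=1, long=5):
    """Hypothetical bimodal packet size: `short` flits with probability p_short, else `long`."""
    return short if rng.random() < p_short else long


def simulate_source(rate, cycles, seed=0):
    """Return the sizes of the packets one source injects over `cycles` cycles."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(cycles):
        if bernoulli_inject(rate, rng):
            sizes.append(bimodal_size(rng))
    return sizes
```

With `p_short=0.5` the mean packet size is 3 flits, so a packet injection rate of 0.1 corresponds to an offered load of about 0.3 flits/cycle; keeping packet rate and flit rate apart matters when comparing the uniform and bimodal curves.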

I also drew the traffic patterns correctly.
However, we still need to look through the Booksim code and analyze each traffic pattern.
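For the traffic-pattern analysis, it may help to write down the destination functions of a few standard synthetic patterns. The sketch below assumes N = 2^b nodes, uses the textbook definitions of bit complement and transpose (for even b), and draws uniform-random destinations from a seeded generator; the function names are illustrative and are not Booksim's.

```python
import random


def bit_complement(src, b):
    """Bit complement: every address bit is inverted (d_i = NOT s_i)."""
    return ~src & ((1 << b) - 1)


def transpose(src, b):
    """Transpose (b even): swap the upper and lower halves of the address bits."""
    half = b // 2
    mask = (1 << b) - 1
    return ((src >> half) | (src << half)) & mask


def uniform_random(b, rng):
    """Uniform random: every one of the 2^b nodes is an equally likely destination."""
    return rng.randrange(1 << b)
```

On a 4 x 4 mesh with row-major numbering (x in the low two bits, y in the high two), transpose maps node (x, y) to (y, x), for example node 1 to node 4, so all of its traffic crosses the diagonal; bit complement sends every packet to the opposite corner region, which stresses the bisection.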

□ sigMP
A Novel Cache Architecture with Enhanced Performance and Security (MICRO 2008)


■ NEXT WEEK  (8/11~8/17)

□ Phoenix 
Run all applications on GEMS with different configurations.

□ Booksim 2.0
Read the code in detail and understand the simulation process.
Test and compare the results, and analyze the characteristics of each traffic pattern.


The PARSEC Benchmark Suite: Characterization and Architectural Implications (2008)

PARSEC Wiki
http://wiki.cs.princeton.edu/index.php/PARSEC

A benchmark suite for studies of CMPs (Chip-Multiprocessors)
Diverse in working set, locality, data sharing, synchronization, and off-chip traffic.

Existing benchmark suites cannot be considered adequate to describe future CMP applications.

■ Motivation

□ Requirements for a Benchmarks Suite

Multi-threaded Applications
Emerging Workloads
Diverse
Employ State-of-the-Art Techniques
Support Research

□ Limitations of Existing Benchmark Suites

SPLASH-2
Program collection is skewed towards HPC and graphics programs.
Does not include parallelization models such as the pipeline model.
SPEC CPU2006 and OMP2001
Provide a snapshot of current scientific and engineering applications
Workloads such as systems programs and parallelization models which employ the producer-consumer model are not included.
SPEC CPU2006 is a suite of serial programs.
Other Benchmark Suites
Designed to study a specific program area and limited to a single application domain.

■ The PARSEC Benchmark Suite

9 applications and 3 kernels (PARSEC 2.0 adds Raytrace)

□ Input Sets
test, simdev, simsmall, simmedium, simlarge, native

□ Workloads

Blackscholes (Financial Analysis)
This application is an Intel RMS benchmark. It calculates the prices for a portfolio of European options analytically with the Black-Scholes partial differential equation (PDE). There is no closed-form expression for the Black-Scholes equation and as such it must be computed numerically.
Bodytrack (Computer Vision)
This computer vision application is an Intel RMS workload which tracks a human body with multiple cameras through an image sequence. This benchmark was included due to the increasing significance of computer vision algorithms in areas such as video surveillance, character animation and computer interfaces.
Canneal (Engineering)
This kernel was developed by Princeton University. It uses cache-aware simulated annealing (SA) to minimize the routing cost of a chip design. Canneal uses fine-grained parallelism with a lock-free algorithm and a very aggressive synchronization strategy that is based on data race recovery instead of avoidance.
Dedup (Enterprise Storage)
This kernel was developed by Princeton University. It compresses a data stream with a combination of global and local compression that is called 'deduplication'. The kernel uses a pipelined programming model to mimic real-world implementations. The reason for the inclusion of this kernel is that deduplication has become a mainstream method for new-generation backup storage systems.
Facesim (Animation)
This Intel RMS application was originally developed by Stanford University. It computes a visually realistic animation of the modeled face by simulating the underlying physics. The workload was included in the benchmark suite because an increasing number of animations employ physical simulation to create more realistic effects.
Ferret (Similarity Search)
This application is based on the Ferret toolkit, which is used for content-based similarity search. It was developed by Princeton University. It was included in the benchmark suite because it represents emerging next-generation search engines for non-text document data types. In the benchmark, the Ferret toolkit is configured for image similarity search. Ferret is parallelized using the pipeline model.
Fluidanimate (Animation)
This Intel RMS application uses an extension of the Smoothed Particle Hydrodynamics (SPH) method to simulate an incompressible fluid for interactive animation purposes. It was included in the PARSEC benchmark suite because of the increasing significance of physics simulations for animations.
Freqmine (Data Mining)
This application employs an array-based version of the FP-growth (Frequent Pattern-growth) method for Frequent Itemset Mining (FIMI). It is an Intel RMS benchmark which was originally developed by Concordia University. Freqmine was included in the PARSEC benchmark suite because of the increasing use of data mining techniques.
Raytrace (added in PARSEC 2.0; Graphics?)
This Intel RMS application uses a version of the raytracing method that would typically be employed for real-time animations such as computer games. It is optimized for speed rather than realism. The computational complexity of the algorithm depends on the resolution of the output image and on the scene.
Streamcluster (Data Mining)
This RMS kernel was developed by Princeton University and solves the online clustering problem. Streamcluster was included in the PARSEC benchmark suite because of the importance of data mining algorithms and the prevalence of problems with streaming characteristics.
Swaptions (Financial Analysis)
The application is an Intel RMS workload which uses the Heath-Jarrow-Morton (HJM) framework to price a portfolio of swaptions. Swaptions employs Monte Carlo (MC) simulation to compute the prices.
Vips (Media Processing)
This application is based on the VASARI Image Processing System (VIPS) which was originally developed through several projects funded by European Union (EU) grants. The benchmark version is derived from a print on demand service that is offered at the National Gallery of London, which is also the current maintainer of the system. The benchmark includes fundamental image operations such as an affine transformation and a convolution.
X264 (Media Processing)
This application is an H.264/AVC (Advanced Video Coding) video encoder. H.264 describes the lossy compression of a video stream and is also part of ISO/IEC MPEG-4. The flexibility and wide range of application of the H.264 standard and its ubiquity in next-generation video systems are the reasons for the inclusion of x264 in the PARSEC benchmark suite.
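To make the Blackscholes kernel more concrete: each option in the portfolio is priced independently with the Black-Scholes formula for a European option. In this sketch the piece that has to be evaluated numerically is the cumulative normal distribution N(·), which has no elementary closed form and is computed here with `math.erf`. This assumes a non-dividend-paying underlying and is illustrative only; the benchmark's own code may approximate N(·) differently.

```python
import math


def norm_cdf(x):
    """Standard normal CDF N(x), evaluated numerically via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(spot, strike, rate, vol, t, is_call=True):
    """Price a European option on a non-dividend-paying asset."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    discounted_strike = strike * math.exp(-rate * t)
    if is_call:
        return spot * norm_cdf(d1) - discounted_strike * norm_cdf(d2)
    return discounted_strike * norm_cdf(-d2) - spot * norm_cdf(-d1)


# Each option is independent of the others, which is what makes the
# workload data-parallel: the portfolio can be split evenly across threads.
portfolio = [(100.0, 100.0, 0.05, 0.2, 1.0), (42.0, 40.0, 0.10, 0.2, 0.5)]
prices = [black_scholes(*opt) for opt in portfolio]
```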

■ Methodology

Parallelization
Working sets and locality
Communication to computation ratio and sharing
Off-chip traffic



The SPLASH-2 Programs: Characterization and Methodological Considerations (1995)

Compared with the original SPLASH suite, the SPLASH-2 programs:

Represent a wider range of computations in the scientific, engineering and graphics domains.
Use better algorithms and implementations.
Are more architecturally aware.

■ Characteristics and Approach

□ Axes of Characterization

Concurrency and load balancing: How many processors can be effectively utilized by that program, assuming a perfect memory system and communication architecture.
Working set: Program’s temporal locality
Communication to computation ratio: Potential impact of communication latency on performance
Spatial locality: Spatial locality and false sharing in the programs

□ Approach to Characterization

Experimental environment
Execution-driven simulation. Simulate a cache-coherent shared address space multiprocessor with physically distributed memory and one processor per node. Each processor has a single-level cache, using a directory-based protocol.
All memory references complete in a single cycle (regardless of hits or misses)
Data are distributed among the processing nodes according to the guidelines.
Data Sets and Scaling
The data sets are small enough to simulate in a reasonable time, yet large enough to be of interest in their problem domains in practice. The number of processors is fixed at 32 for most of the characterization.
Inherent versus Practical Characteristics
Focus on realistic memory-system parameters while still trying to approach inherent properties and avoid too many artifacts.

■ The SPLASH-2 Application Suite

The suite has 8 complete applications and 4 kernels.

Barnes: Simulates the interaction of a system of bodies in three dimensions over a number of time-steps, using the Barnes-Hut hierarchical N-body method.
Cholesky: Factors a sparse matrix into the product of a lower triangular matrix and its transpose.
FFT: A complex 1-D version of the radix-√n six-step FFT algorithm.
FMM: Simulates a system of bodies in two dimensions over a number of time-steps, using a different hierarchical N-body method called the adaptive Fast Multipole Method.
LU: Factors a dense matrix into the product of a lower triangular and an upper triangular matrix.
Ocean: Studies large-scale ocean movements based on eddy and boundary currents.
Radiosity: Computes the equilibrium distribution of light in a scene using the iterative hierarchical diffuse radiosity method.
Radix: Integer radix sort kernel.
Raytrace: Renders a three-dimensional scene using ray tracing.
Volrend: Renders a three-dimensional volume using a ray casting technique.
Water-Nsquared: Evaluates forces and potentials that occur over time in a system of water molecules.
Water-Spatial: Solves the same problem as Water-Nsquared, but uses a more efficient algorithm.
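The Radix kernel is an integer radix sort. Below is a minimal sequential Python sketch of least-significant-digit radix sort with radix 2^r; the digit width r is an arbitrary choice here. The parallel SPLASH-2 version additionally splits the keys across processors and combines per-processor digit histograms with a parallel prefix computation, which this sketch does not attempt.

```python
def radix_sort(keys, r=4):
    """LSD radix sort of non-negative integers, processing r bits per pass."""
    if not keys:
        return []
    num_buckets = 1 << r
    mask = num_buckets - 1
    max_key = max(keys)
    out = list(keys)
    shift = 0
    while (max_key >> shift) > 0:
        # 1) Histogram of the current r-bit digit.
        counts = [0] * num_buckets
        for k in out:
            counts[(k >> shift) & mask] += 1
        # 2) Exclusive prefix sum gives each digit's starting offset.
        offsets, total = [], 0
        for c in counts:
            offsets.append(total)
            total += c
        # 3) Stable permutation of the keys into the next array.
        nxt = [0] * len(out)
        for k in out:
            d = (k >> shift) & mask
            nxt[offsets[d]] = k
            offsets[d] += 1
        out = nxt
        shift += r
    return out
```

Step 2 is the prefix computation; in the parallel kernel it is the step that becomes a communication and synchronization point.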

■ Concurrency and Load Balance

Concurrency and load balance: how they change with problem size and number of processors
Study how the computational load balance scales with the number of processors by measuring speedups on a PRAM architectural model.

Figure 1: the PRAM speedups for the SPLASH-2 programs for up to 64 processors
Figure 2: the time spent waiting at synchronization points for 32-processor executions of each application.

Reasons for sub-linear speedups: mainly the sizes of the input data sets
(e.g., load imbalance, a prefix computation that is not completely parallelized, …)
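The load-balance argument can be made concrete with a simple model. This is an assumption for illustration, not the paper's simulator: under PRAM-style perfect memory, a program is a sequence of barrier-separated phases, each phase lasts as long as its busiest processor, and the sequential time is the total work. Speedup is then total work divided by the sum of per-phase maxima, so any imbalance inside a phase turns directly into synchronization wait time.

```python
def pram_speedup(phases):
    """phases[p][i] = work (cycles) done by processor i in barrier phase p."""
    sequential = sum(sum(work) for work in phases)
    parallel = sum(max(work) for work in phases)   # each phase waits for its slowest processor
    return sequential / parallel


def sync_wait(phases):
    """Total processor-cycles spent idle at barriers waiting for the slowest processor."""
    return sum(max(work) * len(work) - sum(work) for work in phases)
```

For example, with 32 processors, if one processor has twice the average work in a phase, that phase achieves a speedup of only 16, regardless of how the remaining work is spread.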

■ Working Sets and Temporal Locality

......


Wednesday, May 27

Lab.work 2009. 5. 27. 17:02
Communication/Performance/Power/Reliability between cores
Focus on Application & Architecture

Coherence
power, verification, less overhead, cost efficient

★★ MapReduce!!
Reference: "Evaluating MapReduce for Multicore and Multiprocessor Systems"
Implement the applications from the paper, aiming to run them on a multicore system.
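Word count is one of the applications evaluated in that paper. The sketch below models the map, group-by-key, and reduce phases in Python, using a thread pool for the map phase. It only illustrates the programming style; it is not the Phoenix runtime, which is a C/C++ library driven by user-supplied map and reduce functions. Chunking by lines and all function names are assumptions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map: emit (word, 1) for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_counts(key, values):
    """Reduce: sum all counts emitted for one word."""
    return key, sum(values)


def word_count(text, workers=4):
    chunks = text.splitlines()
    # Map phase: chunks are handed to worker threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    # Shuffle: group intermediate pairs by key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    # Reduce phase: one reduce call per distinct key.
    return dict(reduce_counts(k, v) for k, v in groups.items())
```

The threads here show the structure only; because of CPython's GIL they will not speed up this CPU-bound work, whereas Phoenix runs map and reduce tasks on native threads.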



Benchmarks: find and read the paper for each one

"The SPLASH-2 Programs: Characterization and Methodological Considerations"

"The PARSEC Benchmark Suite: Characterization and Architectural Implications"
"PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites on Chip-Multiprocessors"

Look into a Simics license/site license and apply for one (http://www.virtutech.com/)
Garnet: Detailed Interconnection Network Model inside a Full-system Simulation Framework
Orion: A Power-Performance Simulator for Interconnection Networks



Video conference every Tuesday at 9 p.m.
Progress report due every Monday evening

Get a cluster account
Buy a webcam & microphone

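Seminar paper presentation
1. Problem definition
2. The authors' approach
3. Differences from existing methods
4. Analysis of the results
5. Strengths and weaknesses (my own opinion)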
