23 posts tagged 'phoenix'

Progress Report (9/8)

Lab.work 2009. 9. 8. 14:57
■ THIS WEEK  (9/1~9/7)

□ Booksim

I tested the same process with randperm traffic, but the results are nearly identical.
I am now printing the hot-spot latency and the other latency separately so I can compare them.

□ Phoenix

* degradation of map time
I thought that the cause of the degradation was disk I/O time.
For a shared-memory system, the cache miss rate increases as the number of processors increases.

* test with the sunT2 stats
Using an 8KB private L1 data cache for each processor and a 4MB shared L2 cache,
the speedup degradation appears beyond 16 cores,
and the map time usually dominates the whole processing time.

* test with a perfect cache
Using a 100MB private L1 data cache and a 400MB shared L2 cache (other configs are the same as sunT2),
the L1 read hit rate of each processor is almost 99% (not 100% because of cold misses),
but the write hit rate decreases slightly as the number of processors increases.
The overall performance of the processors increases, but the speedup pattern is the same:
the degradation still appears beyond 16 cores, and the map time is the dominant part.

I think I need to test other applications and compare the results,
focusing on the working set size of each application.
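For reference, the speedup numbers above come from the usual S(p) = T(1)/T(p) ratio; a minimal sketch (the runtimes in the usage example are hypothetical, chosen only to show the "flattens after 16 cores" shape):

```python
def speedups(times_by_cores):
    """Speedup relative to the single-core run: S(p) = T(1) / T(p).
    times_by_cores maps core count -> measured runtime in seconds."""
    t1 = times_by_cores[1]
    return {p: t1 / t for p, t in sorted(times_by_cores.items())}

# Hypothetical runtimes illustrating the pattern described above:
# near-linear scaling up to 16 cores, then degradation.
times = {1: 160.0, 8: 21.0, 16: 11.0, 32: 10.0}
print(speedups(times))
```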


PROGRESS REPORT (9/1)

Lab.work 2009. 9. 1. 09:38
■ THIS WEEK  (8/27~9/1)

□ Booksim

* 50% of packets: dest 0, 50%: uniform random
VOQ throughput never exceeds the no-VOQ result.
I counted the number of times a VC was full so that no flit could be sent to it:
with VOQ, many flits end up waiting for VC 0.
Even when the buffer size is infinite, VOQ throughput is still low,
because the waiting time at VC 0 is very long.
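The traffic mix above can be sketched as a destination picker (a hypothetical stand-in for illustration, not the actual Booksim traffic function):

```python
import random

def pick_destination(num_nodes=64, hotspot=0, hotspot_frac=0.5, rng=random):
    """Half of the packets target the hot-spot node (dest 0 above);
    the rest pick a destination uniformly at random."""
    if rng.random() < hotspot_frac:
        return hotspot
    return rng.randrange(num_nodes)
```

With half the offered load converging on one node, the queue (or VOQ VC) for node 0 saturates first, which matches the long waits observed at VC 0.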

□ Phoenix

* degradation of map time
I think the degradation is caused by disk I/O time.
For a shared-memory system, the cache miss rate can increase as the number of processors increases,
and the degradation can grow as the input size gets bigger.
In fact, the L1 cache miss rate increases beyond 8 threads.


PROGRESS REPORT (8/26)

Lab.work 2009. 8. 26. 10:17
■ THIS WEEK  (8/17~8/26)

□ Booksim

* single router
A single router has 5 input ports and 5 output ports, so it has only sources 0~4 and destinations 0~4.
The number of VCs does not affect the result when VOQ is not used.

When the number of VCs is 5, the result is identical to the 64-VC result.
I think that because the single router has only 5 destinations (not 64),
the result is the same whenever the number of VCs >= 5.
As you said, the number of VCs for VOQ should equal the number of ports.
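A sketch of the VC-selection rule this implies (hypothetical, not the Booksim implementation): with VOQ each destination gets its own VC, so any VCs beyond the destination count are simply never used, which is why 5 and 64 VCs give the same result on a 5-port router.

```python
def voq_vc(dest, num_vcs):
    """Virtual-output-queueing: flits for destination d wait in VC d.
    VCs with index >= the number of destinations go unused."""
    if dest >= num_vcs:
        raise ValueError("VOQ needs at least one VC per destination")
    return dest

# On a 5-destination router, only VCs 0-4 are ever selected,
# regardless of how many VCs are configured.
used = {voq_vc(d, 64) for d in range(5)}
```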

* mesh88 uniform
I changed the traffic manager to use VOQ for injection, so there is no blocking any more.
Overall throughput increases slightly, but the graph pattern remains the same.

□ Phoenix

pthread speedup is almost equal to the number of processors when the network size is < 24,
but the degradation rates with 32 and 64 CPUs are all different.

For proc = 1, mapreduce takes significantly more time than pthread does for some applications.
I am verifying whether the mapreduce code works correctly.

=============================================================

■ NEXT WEEK 

□ Phoenix
Test all applications and verify whether the mapreduce code works correctly.

□ Booksim
Determine why VOQ (single) performs poorly on a multi-flit network.


PROGRESS REPORT (8/17)

Lab.work 2009. 8. 17. 15:49
■ THIS WEEK  (8/11~8/16)

□ Phoenix 2.0
I ran a simple test of wordcount on Simics.
There is still degradation after 64 nodes,
but the speedup increases up to ~250x (using 64 nodes).
[test_on_gems.pdf]

□ Booksim 2.0
I ran Booksim with bimodal uniform traffic.
Using 64 VCs, one per destination (8x8 mesh), the throughput is lower than the original.
When the buffer size is 1, the drop in throughput is huge.
As the buffer size grows, the degradation gets smaller,
but the throughput never exceeds the result that does not use destination info.
I'm not quite sure, but I think it is because of under-utilization of the VCs.
[booksim_bimodal_voq.pdf]

□ sigMP
I presented the paper at the sigMP seminar.
A Novel Cache Architecture with Enhanced Performance and Security (MICRO 2008)


■ NEXT WEEK  (8/17~8/23)

□ Phoenix 
run all applications on gems with different configurations.

□ Booksim 2.0
Test more with different VOQ options.
Implement the output-queued router and analyze the results.


PROGRESS REPORT (8/10)

Lab.work 2009. 8. 11. 01:02
■ THIS WEEK  (8/4~8/10)

□ Phoenix 2.0
I ran Phoenix on GEMS (Simics + Ruby).
I only tested wordcount and pca.
Both applications still have a few assertion errors to fix, but they worked.
With Ruby, however, it takes too much time to watch the process.

□ Booksim 2.0
I changed the Bernoulli function and made a bimodal injection function.
The result is as below.
(booksim_uniform_bimodal.pdf)
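A sketch of what a bimodal variant of the Bernoulli injection process could look like (the mode lengths and probabilities here are illustrative assumptions, not the values used in the run):

```python
import random

def bimodal_inject(rate=0.1, short_len=1, long_len=8, long_frac=0.5, rng=random):
    """Bernoulli process: inject a packet this cycle with probability `rate`.
    Bimodal part: the packet length is drawn from one of two modes.
    Returns the packet length, or 0 when nothing is injected."""
    if rng.random() >= rate:
        return 0  # no packet this cycle
    return long_len if rng.random() < long_frac else short_len
```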

I also drew the traffic patterns correctly,
but we still need to look into the Booksim code and analyze each traffic pattern.

□ sigMP
A Novel Cache Architecture with Enhanced Performance and Security (MICRO 2008)


■ NEXT WEEK  (8/11~8/17)

□ Phoenix 
run all applications on gems with different configurations.

□ Booksim 2.0
Read the code in detail and understand the process.
Test and compare the results, and analyze the characteristics of each traffic pattern.



PROGRESS REPORT (8/3)

Lab.work 2009. 8. 4. 10:27
■ THIS WEEK  (7/28~8/3)

□ Phoenix 2.0
I analyzed the histogram results.
The average task processing time increases as the number of cores increases,
but it does not increase as dramatically as wordcount's does.
(new_result_histogram.pdf)

□ Booksim 2.0
I read the book and drew each traffic pattern.
I'm not sure this is correct, but there are some strange things.
Bit complement (D_i = ~S_i) and bit reverse (D_i = S_{b-i-1}) look exactly the same.
For shuffle (D_i = S_{i-1 mod b}) and bit rotation (D_i = S_{i+1 mod b}), only the traffic direction is different.
And when the mesh size is 4x4, transpose and neighbor are the same.
(traffic_patterns.jpg)
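To sanity-check the observations above, the four bit-permutation patterns can be written out directly (a small sketch over b-bit node addresses; b = 6 for an 8x8 mesh is my assumption). In particular, shuffle and rotation are inverse rotations, which is exactly "the same traffic with the direction reversed":

```python
B = 6  # bits per address: 8x8 mesh -> 64 nodes (assumption for illustration)
MASK = (1 << B) - 1

def complement(s):   # D_i = ~S_i
    return ~s & MASK

def reverse(s):      # D_i = S_{b-i-1}
    return sum(((s >> i) & 1) << (B - 1 - i) for i in range(B))

def shuffle(s):      # D_i = S_{i-1 mod b}: rotate bits left by one
    return ((s << 1) | (s >> (B - 1))) & MASK

def rotation(s):     # D_i = S_{i+1 mod b}: rotate bits right by one
    return ((s >> 1) | ((s & 1) << (B - 1))) & MASK

# shuffle and rotation undo each other: same pairs, opposite direction.
assert all(rotation(shuffle(s)) == s for s in range(1 << B))
```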

We are checking these observations against the results.

□ sigMp
no sigMp seminar this week.


■ NEXT WEEK  (8/4~8/10)

□ Phoenix 
Run a couple more applications and compare the results with wordcount.

□ Booksim 2.0
Test and compare the results, and analyze the characteristics of each traffic pattern.

□ sigMp
Jaehong asked me to cover his turn next week (8/14),
so please recommend a paper for me to present.


PROGRESS REPORT

Lab.work 2009. 7. 21. 10:39
■ THIS WEEK  (7/15~7/20)

□ Phoenix 2.0

about the degradation after 32 threads:
first, I conjectured that the cause was the skewed distribution of words,
so I randomly generated an input file using a-z and whitespace,
but slight degradation still remains.

I have now divided the start_worker function into 7 parts to find which part degrades.
My work is attached.
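The split-and-time approach can be sketched like this (Phoenix itself is C; this is a language-neutral illustration of the idea, and the part name in the usage line is hypothetical):

```python
import time

def timed(label, timings, fn, *args, **kwargs):
    """Run one part of a pipeline and record its wall-clock time."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    timings[label] = time.perf_counter() - t0
    return result

# Usage: wrap each of the parts start_worker was split into, then
# compare timings across thread counts to see which part degrades.
timings = {}
timed("split_input", timings, lambda: sum(range(1000)))
```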

□ Booksim 2.0

The simulation script is still running.
The result file for uniform traffic is attached.
I will analyze the results when the other data come out.


■ NEXT WEEK  (7/21~7/27)

□ Phoenix 
Test more with a huge data set and re-plot the speedup graph.
Find a way to determine the working set size and the L2 size.

□ Simics & Gems
This will come a little later.

□ Booksim 2.0
Plot all graphs using the other traffic patterns and analyze the results.


PROGRESS REPORT

Lab.work 2009. 7. 7. 11:38
■ THIS WEEK  (6/29~7/6)

□ Phoenix 2.0
I re-plotted the result graph with a new x-axis.
There is no degradation after 32 nodes when the data set is huge.
The optimal chunk size is dependent on the application.

□ Booksim 2.0
I tested a 2D mesh network with different numbers of virtual channels and buffer sizes.


■ NEXT WEEK  (7/7~7/13)

□ Phoenix 
Test more with a huge data set and re-plot the speedup graph.
Find a way to determine the working set size and the L2 size.

□ Simics & Gems
run and test PARSEC on Simics and Gems.

□ SigMP
SigMP seminar will be on Thursday. 


Monday, June 29

Lab.work 2009. 6. 29. 22:21
■ Sun Machine

Look into cable prices.
If it keeps crashing, we may have to reinstall the OS.

■ Phoenix

check ** the case where the L2 cannot hold the working set..!!

Phoenix on Simics! ㅠ_ㅠ
Run everything on the real machine first, then try it on Simics.

Why does wordcount show super-linear speedup at first and then drop off?
Check the wordcount working set and the L2 size.
How does chunk size affect the other applications?

Redraw the speedup graphs.
Figure out exactly what effect chunk size has + apply the best chunk size.

What is the difference between linux x86 and sun sparc? Why?
Hyper-threading? Some other kind of threading? Whether throughput is more important, whether it is optimized for computation, etc.
+ issues such as caches shared between threads.
For Sun, is it because everything is on-chip communication!?
For linux, everything is on a single board... so?

Look up the Sun system config.

■ Booksim

Talk again on Thursday!
He explained Booksim to me again.
Briefly look up the topologies, and skim the code!

networks.cpp
iq_router.cpp
The most high-level is trafficmanager.cpp <= everything runs from here.

Look into the "wait_for_tail_credit" config.
booksim_config.cpp

■ Gems

Run GEMS before Thursday.

-------------------------------------

Meet again Thursday at 8:00, Korea time.



PROGRESS REPORT (6/23)

Lab.work 2009. 6. 23. 22:03
■ THIS WEEK  (6/16~6/23)

□ Phoenix 
The result files are attached.

□ SigMP seminar (6/18)
An OS-Based Alternative to Full Hardware Coherence on Tiled CMPs (HPCA 2008) (by 김대훈)
PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites (by 신민정) 

□ Simics & Gems
Compiled the PARSEC binaries on the linux machine.



■ NEXT WEEK  (6/23~6/29)

□ SigMP seminar
no sigMP seminar

□ Simics & Gems
run and test PARSEC on Simics and Gems.

