test_rpt_thread_blk.txt.bak
High computing density:
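// Squares the value REPEAT_TIME times, keeping only the low 16 bits each round.
// REPEAT_TIME is defined outside this excerpt.
// __host__ __device__ makes the function callable from the kernel and from host code.
__host__ __device__ int get_value(const int c)
{
    int out = c;
    for (int loop = 0; loop < REPEAT_TIME; loop++)
    {
        out = (out * out) & 0xFFFF;
    }
    return out;
}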
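// Each thread starts at its global index and advances by the total thread count.
// THREAD_NUM and THREAD_BLK_NUM_LOG2 are defined outside this excerpt.
// The do-while body runs before the first bounds check, so size must be at least
// the total thread count. For __syncthreads() to be safe, every thread of a block
// must run the same number of iterations.
__global__ void init_array(int *values, int *out, int size)
{
    int i = (THREAD_NUM * blockIdx.x) + threadIdx.x;
    do {
        // 1) Copy the input element into a local variable.
        int v = values[i];
        int o = 0;
        // 2) Synchronization
        __syncthreads();
        // 3) Process
        o = get_value(v);
        // 4) Write the result to device memory.
        //values[i] = o;
        out[i] = o;
        // 5) Synchronization again
        __syncthreads();
        i += (THREAD_NUM << THREAD_BLK_NUM_LOG2); // THREAD_NUM * THREAD_BLK_NUM
    } while (i < size);
}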
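The strided indexing used by the kernel can be checked on the host. The following is a minimal sketch, not part of the original test program: `count_visits` and `covers_each_element_once` are hypothetical helpers, `threads_per_block` stands in for THREAD_NUM, and `blocks_log2` stands in for THREAD_BLK_NUM_LOG2. It uses a bounds-checked loop, so unlike the kernel's do-while it never touches an index before checking it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical host-side model of the kernel's index pattern.
// Returns, for each element, how many (block, thread) pairs would process it.
std::vector<int> count_visits(int size, int threads_per_block, int blocks_log2)
{
    assert(size > 0 && threads_per_block > 0 && blocks_log2 >= 0);
    const int blocks = 1 << blocks_log2;
    const int stride = threads_per_block << blocks_log2;  // total threads in the grid
    std::vector<int> visits(size, 0);
    for (int block = 0; block < blocks; ++block) {
        for (int thread = 0; thread < threads_per_block; ++thread) {
            // Same start index and stride as the kernel, but checked before use.
            for (int i = threads_per_block * block + thread; i < size; i += stride) {
                ++visits[i];
            }
        }
    }
    return visits;
}

// True when every element in [0, size) is handled by exactly one thread.
bool covers_each_element_once(int size, int threads_per_block, int blocks_log2)
{
    for (int v : count_visits(size, threads_per_block, blocks_log2)) {
        if (v != 1) return false;
    }
    return true;
}
```

Calling `covers_each_element_once(size, THREAD_NUM, THREAD_BLK_NUM_LOG2)` with the values used in a run would confirm that no element is skipped or written twice.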
//////////////////////////////////////////////// Note: Yesterday's Test Result
Integer number=32(M) multiple times =100
2-way memcopy : 200.60
kernel computing: 1446.45
Time to 2-way memcpy & execute:
non-streamed: 1648.10 (1647.05 expected = 1446.45 + 200.60)
8 streams: 1534.57 (1471.52 expected = 1446.45 + 200.60/8)
Time taken by GPU using clock()= 1375 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 23063 ms
GPU is 16.00 (23063/1375) times faster than CPU.
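The "expected" figures in the log follow a simple model: without streams the copy and the kernel run back to back, and with n streams only 1/n of the copy time is left exposed. A minimal sketch of that arithmetic is below. The function names are hypothetical and are not taken from the test program. The last helper computes the speedup in floating point; the logged "16.00" for 23063/1375 is consistent with an integer division that truncates the fraction, although the source of the test program is not shown here to confirm it.

```cpp
#include <cassert>
#include <cmath>

// Non-streamed model: the full copy time plus the full kernel time.
double expected_non_streamed(double kernel, double memcpy_2way)
{
    return kernel + memcpy_2way;
}

// Streamed model read off the log: the full kernel time plus 1/streams of the copy time.
double expected_streamed(double kernel, double memcpy_2way, int streams)
{
    assert(streams > 0);
    return kernel + memcpy_2way / streams;
}

// Speedup as a floating-point ratio, keeping the fractional part.
double speedup(double cpu_ms, double gpu_ms)
{
    assert(gpu_ms > 0.0);
    return cpu_ms / gpu_ms;
}

// The same ratio with the fraction dropped, matching the figure printed in the log.
double truncated_speedup(double cpu_ms, double gpu_ms)
{
    return std::floor(speedup(cpu_ms, gpu_ms));
}
```

With the numbers above, `expected_non_streamed(1446.45, 200.60)` gives 1647.05, `expected_streamed(1446.45, 200.60, 8)` gives about 1471.52, and `speedup(23063, 1375)` gives about 16.77 rather than 16.00.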
//////////////////////////////////////////////// Note: with thread_blk_num==1 the result is the same as before
Integer number=32(M) multiple times =100 thread_blk_num=1
2-way memcopy : 200.63
kernel computing: 1446.20
Time to 2-way memcpy & execute:
non-streamed: 1646.65 (1646.83 expected = 1446.20 + 200.63)
8 streams: 1547.35 (1471.28 expected = 1446.20 + 200.63/8)
Time taken by GPU using clock()= 1458 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 22111 ms
GPU is 15.00 (22111/1458) times faster than CPU.
//////////////////////////////////////////////// Note: 8800GT contains 14 multi-processors
Integer number=32(M) multiple times =100 thread_blk_num=8
2-way memcopy : 200.73
kernel computing: 181.89 (Note: 1446.20 / 8 = 180.78)
Time to 2-way memcpy & execute:
non-streamed: 380.34 (382.62 expected = 181.89 + 200.73)
8 streams: 207.07 (206.98 expected = 181.89 + 200.73/8)
Time taken by GPU using clock()= 141 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 22105 ms
GPU is 156.00 (22105/141) times faster than CPU.
////////////////////////////////////////////////
Integer number=32(M) multiple times =100 thread_blk_num=16
2-way memcopy : 200.66
kernel computing: 181.66
Time to 2-way memcpy & execute:
non-streamed: 290.75 (382.33 expected = 181.66 + 200.66)
8 streams: 206.17 (206.75 expected = 181.66 + 200.66/8)
Time taken by GPU using clock()= 125 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 22088 ms
GPU is 176.00 (22088/125) times faster than CPU.
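The kernel times for thread_blk_num = 1, 8 and 16 can be compared by how much faster the kernel runs than with one block, and how much of that gain each block contributes. The sketch below is only arithmetic on the logged kernel times. The helper names are hypothetical and it makes no claim about why the times behave as they do.

```cpp
#include <cassert>

// Speedup of the kernel relative to the single-block run.
double kernel_speedup(double t_one_block, double t_n_blocks)
{
    assert(t_n_blocks > 0.0);
    return t_one_block / t_n_blocks;
}

// Fraction of ideal linear scaling achieved: speedup divided by the block count.
double block_efficiency(double t_one_block, double t_n_blocks, int blocks)
{
    assert(blocks > 0);
    return kernel_speedup(t_one_block, t_n_blocks) / blocks;
}
```

Using the logged values, `kernel_speedup(1446.20, 181.89)` is about 7.95 for 8 blocks (efficiency about 0.99), while `kernel_speedup(1446.20, 181.66)` is about 7.96 for 16 blocks (efficiency about 0.50), so doubling the block count from 8 to 16 left the kernel time essentially unchanged in these runs.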