📄 test_rpt.txt.bak
High computing density:
int get_value(const int c)
{
    int out = c;
    /* Square and keep the low 16 bits, REPEAT_TIME times. */
    for (int loop = 0; loop < REPEAT_TIME; loop++)
    {
        out = (out * out) & 0xFFFF;
    }
    return out;
}
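A note on the arithmetic in get_value: after masking, out can be as large as 0xFFFF, and 0xFFFF * 0xFFFF does not fit in a signed 32-bit int, so the multiply formally overflows in C. The low 16 bits are the same on two's-complement hardware, which is presumably why the CPU and GPU results agree. A minimal host-side sketch that does the multiply in unsigned arithmetic follows; the name get_value_u and the REPEAT_TIME value of 100 are assumptions for illustration, not taken from the test program.

```c
#include <stdint.h>

#ifndef REPEAT_TIME
#define REPEAT_TIME 100  /* assumed value; the report varies it per run */
#endif

/* Hypothetical overflow-safe variant of get_value (not from the report). */
int get_value_u(const int c)
{
    uint32_t out = (uint32_t)c;
    for (int loop = 0; loop < REPEAT_TIME; loop++)
    {
        /* Only the low 16 bits of out affect the masked square, so mask
           first: (0xFFFF * 0xFFFF) = 0xFFFE0001 fits in 32 unsigned bits. */
        out &= 0xFFFFu;
        out = (out * out) & 0xFFFFu;
    }
    return (int)out;
}
```

With REPEAT_TIME >= 1 the result always lies in the range 0 to 0xFFFF.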
////////////////////////////////////////////////
Integer number=8(M) multiple times =100
2-way memcopy : 50.11
kernel computing: 361.24
Time to 2-way memcpy & execute:
non-streamed: 415.44 (411.35 expected = 361.24 + 50.11)
8 streams: 371.28 (367.50 expected = 361.24 + 50.11/8)
Time taken by GPU using clock()= 344 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 5641 ms
GPU is 16.40 (5641/344) times faster than CPU.
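The "expected" figures in each run follow a simple model: without streams the copies and the kernel run back to back, and with N streams all but 1/N of the copy time is assumed to overlap with the kernel. A small sketch of that arithmetic, with function names that are illustrative rather than taken from the test program:

```c
/* Expected time without streams: copy time and kernel time simply add. */
double expected_non_streamed(double kernel, double memcpy_time)
{
    return kernel + memcpy_time;
}

/* Expected time with n_streams streams, under the report's model that
   only 1/n_streams of the copy time is left exposed. */
double expected_streamed(double kernel, double memcpy_time, int n_streams)
{
    return kernel + memcpy_time / n_streams;
}
```

Called with kernel = 361.24 and memcpy_time = 50.11, these reproduce the 411.35 and 367.50 (8 streams) expectations printed for the 8(M) run.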
////////////////////////////////////////////////
Integer number=16(M) multiple times =100
2-way memcopy : 100.65
kernel computing: 723.04
Time to 2-way memcpy & execute:
non-streamed: 823.49 (823.69 expected = 723.04 + 100.65)
8 streams: 767.51 (735.62 expected = 723.04 + 100.65/8)
Time taken by GPU using clock()= 703 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 11406 ms
GPU is 16.22 (11406/703) times faster than CPU.
////////////////////////////////////////////////
Integer number=32(M) multiple times =100
2-way memcopy : 200.60
kernel computing: 1446.45
Time to 2-way memcpy & execute:
non-streamed: 1648.10 (1647.05 expected = 1446.45 + 200.60)
8 streams: 1534.57 (1471.52 expected = 1446.45 + 200.60/8)
Time taken by GPU using clock()= 1375 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 23063 ms
GPU is 16.77 (23063/1375) times faster than CPU.
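The speedup figures are the ratio of the CPU time to the GPU time, both taken from clock(). A minimal sketch of that calculation, assuming the two times are held as integer milliseconds (the function name is hypothetical):

```c
/* Speedup of GPU over CPU from two integer millisecond readings. */
double speedup(long cpu_ms, long gpu_ms)
{
    /* Convert before dividing; cpu_ms / gpu_ms on its own is integer
       division and would drop the fractional part of the ratio. */
    return (double)cpu_ms / (double)gpu_ms;
}
```

Dividing in floating point keeps the fractional part, which matters when comparing runs whose ratios differ by less than one.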
////////////////////////////////////////////////
Integer number=64(M) multiple times =100
Out of memory at:
int *d_a = 0, *d_o = 0; // pointers to data and init value in the device memory
CUDA_SAFE_CALL( cudaMalloc((void**)&d_a, nbytes) );
CUDA_SAFE_CALL( cudaMalloc((void**)&d_o, nbytes) );
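A rough size check for this case, assuming "M" means 2^20 elements and that each integer occupies 4 bytes on the device (neither is stated in the report). The helper names are illustrative only:

```c
#include <stdint.h>

/* Bytes needed for one buffer of n_mega * 2^20 32-bit integers. */
uint64_t buffer_bytes(uint64_t n_mega)
{
    return n_mega * 1024u * 1024u * sizeof(int32_t);
}

/* Total for the two device buffers, d_a and d_o. */
uint64_t total_device_bytes(uint64_t n_mega)
{
    return 2u * buffer_bytes(n_mega);
}
```

Under these assumptions the 64(M) run asks for two 256 MiB allocations, 512 MiB in total, against 256 MiB in total for the 32(M) runs that succeeded.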
*************************************************************
////////////////////////////////////////////////
Integer number=32(M) multiple times =50
2-way memcopy : 200.85
kernel computing: 749.84
Time to 2-way memcpy & execute:
non-streamed: 951.78 (950.70 expected = 749.84 + 200.85)
8 streams: 852.35 (774.95 expected = 749.84 + 200.85/8)
Time taken by GPU using clock()= 703 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 11734 ms
GPU is 16.69 (11734/703) times faster than CPU.
////////////////////////////////////////////////
Integer number=32(M) multiple times =10
2-way memcopy : 200.67
kernel computing: 196.36
Time to 2-way memcpy & execute:
non-streamed: 396.45 (397.04 expected = 196.36 + 200.67)
8 streams: 221.74 (221.45 expected = 196.36 + 200.67/8)
Time taken by GPU using clock()= 62 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 2156 ms
GPU is 34.77 (2156/62) times faster than CPU. (???)
GPU is 9.7 (2156/221.74) times faster than CPU. (???)
////////////////////////////////////////////////
Integer number=32(M) multiple times =1
2-way memcopy : 200.65
kernel computing: 71.28
Time to 2-way memcpy & execute:
non-streamed: 271.75 (271.93 expected = 71.28 + 200.65)
8 streams: 202.12 (96.36 expected = 71.28 + 200.65/8)
Time taken by GPU using clock()= 47 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 406 ms
GPU is 8.64 (406/47) times faster than CPU.
GPU is 2.01 (406/202) times faster than CPU.
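In this last run the copy time (200.65) exceeds the kernel time (71.28), and the measured 8-stream time (202.12) is far from the kernel + memcpy/8 estimate (96.36). One possible alternative estimate, offered as an assumption and not something the test program computes, treats the longer of the two phases as the floor and adds one stream's share of the shorter phase:

```c
/* Hypothetical overlap estimate: the longer phase cannot be hidden,
   and only 1/n_streams of the shorter phase stays exposed. */
double pipelined_estimate(double kernel, double memcpy_time, int n_streams)
{
    double longer  = kernel > memcpy_time ? kernel : memcpy_time;
    double shorter = kernel > memcpy_time ? memcpy_time : kernel;
    return longer + shorter / n_streams;
}
```

For kernel = 71.28 and memcpy_time = 200.65 with 8 streams this gives roughly 209.56, which is nearer the measured 202.12. When the kernel dominates, it reduces to the report's own kernel + memcpy/8 formula.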