test_rpt.txt.bak (project: md5_cuda programming)

High computing density: 

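/* Repeatedly square the value, keeping only the low 16 bits each time. */
int get_value(const int c)
{
	/* Square in unsigned arithmetic: 0xFFFF * 0xFFFF does not fit in a
	   signed int, and signed overflow is undefined behavior in C.
	   The low 16 bits of the product are the same either way. */
	unsigned int out = (unsigned int)c;

	for (int loop = 0; loop < REPEAT_TIME; loop++)
	{
		out = (out * out) & 0xFFFF;
	}

	return (int)out;
}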
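
For reference, a minimal sketch of the CPU side of the test ("VERIFY USING CPU", timed with clock()). The helper names cpu_reference_ms and results_match are hypothetical, and REPEAT_TIME = 100 is an assumption taken from the "multiple times =100" runs; the original host code is not shown in this report.

```c
#include <stddef.h>
#include <time.h>

#define REPEAT_TIME 100 /* assumption: the "multiple times" value of a run */

/* Same computation as get_value, done in unsigned arithmetic so the
   squaring cannot overflow a signed int. */
int get_value_ref(int c)
{
    unsigned int out = (unsigned int)c;
    for (int loop = 0; loop < REPEAT_TIME; loop++) {
        out = (out * out) & 0xFFFFu;
    }
    return (int)out;
}

/* Hypothetical helper: compute out[i] for every element on one CPU thread
   and return the elapsed processor time in milliseconds. */
double cpu_reference_ms(const int *in, int *out, size_t n)
{
    clock_t start = clock();
    for (size_t i = 0; i < n; i++) {
        out[i] = get_value_ref(in[i]);
    }
    clock_t stop = clock();
    return 1000.0 * (double)(stop - start) / (double)CLOCKS_PER_SEC;
}

/* Hypothetical helper: returns 1 when both result arrays agree
   (the "Test PASSED" case), 0 otherwise. */
int results_match(const int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i]) {
            return 0;
        }
    }
    return 1;
}
```

The GPU results copied back to the host would be passed to results_match together with the CPU array.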


////////////////////////////////////////////////
Integer count=8(M), repeat times=100
2-way memcopy : 50.11
kernel computing: 361.24
Time to 2-way memcpy & execute:
non-streamed:   415.44 (411.35 expected = 361.24 + 50.11)
8 streams:      371.28 (367.50 expected = 361.24 + 50.11/8)
Time taken by GPU using clock()= 344 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 5641 ms
GPU is 16.00 (5641/344) times faster than CPU.
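
The "expected" figures in the two timing lines follow directly from the formulas printed next to them. A small sketch of that model; the function names are hypothetical, the formulas are the ones in the log:

```c
/* Serial model: copies and kernel run back to back, nothing overlaps. */
double expected_non_streamed_ms(double kernel_ms, double memcpy_ms)
{
    return kernel_ms + memcpy_ms;
}

/* Overlap model used by the log: with n_streams streams, only
   1/n_streams of the copy time is assumed to stay exposed. */
double expected_streamed_ms(double kernel_ms, double memcpy_ms, int n_streams)
{
    return kernel_ms + memcpy_ms / (double)n_streams;
}
```

With kernel = 361.24 and memcpy = 50.11 this gives 411.35 and, for 8 streams, about 367.50, matching the values above.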

////////////////////////////////////////////////
Integer count=16(M), repeat times=100
2-way memcopy : 100.65
kernel computing: 723.04
Time to 2-way memcpy & execute:
non-streamed:   823.49 (823.69 expected = 723.04 + 100.65)
8 streams:      767.51 (735.62 expected = 723.04 + 100.65/8)
Time taken by GPU using clock()= 703 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 11406 ms
GPU is 16.00 (11406/703) times faster than CPU.

////////////////////////////////////////////////
Integer count=32(M), repeat times=100
2-way memcopy : 200.60
kernel computing: 1446.45
Time to 2-way memcpy & execute:
non-streamed:   1648.10 (1647.05 expected = 1446.45 + 200.60)
8 streams:      1534.57 (1471.52 expected = 1446.45 + 200.60/8)
Time taken by GPU using clock()= 1375 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 23063 ms
GPU is 16.00 (23063/1375) times faster than CPU.
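
Note on the speedup lines: every ratio in this report is printed as a whole number (16.00, 34.00, 8.00) although the quotients are not whole (5641/344 is about 16.40). One plausible cause, assumed here and not confirmed by the source, is that the two clock() values are divided as integers before being printed as a float. A sketch of the difference, with hypothetical function names:

```c
/* Integer division first, conversion afterwards: the fraction is lost. */
double speedup_truncated(long cpu_ms, long gpu_ms)
{
    return (double)(cpu_ms / gpu_ms);
}

/* Convert first, then divide: the fraction is kept. */
double speedup_exact(long cpu_ms, long gpu_ms)
{
    return (double)cpu_ms / (double)gpu_ms;
}
```

speedup_truncated reproduces the printed values for the timings in this report; speedup_exact gives roughly 16.40, 16.22 and 16.77 for the three runs above.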

////////////////////////////////////////////////
Integer count=64(M), repeat times=100

Out of memory at:
    int *d_a = 0, *d_o = 0;             // pointers to data and init value in the device memory
    CUDA_SAFE_CALL( cudaMalloc((void**)&d_a, nbytes) );
    CUDA_SAFE_CALL( cudaMalloc((void**)&d_o, nbytes) );
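
A rough estimate of the device memory these two allocations request. This is a sketch under two assumptions not stated in the report: "(M)" means 2^20 elements, and nbytes is the element count times sizeof(int). The function names are hypothetical.

```c
#include <stddef.h>

/* Bytes in one buffer of mega_elements * 2^20 ints (assumed meaning of "(M)"). */
size_t buffer_bytes(size_t mega_elements)
{
    return mega_elements * 1024u * 1024u * sizeof(int);
}

/* The code above allocates two such buffers on the device: d_a and d_o. */
size_t device_bytes_needed(size_t mega_elements)
{
    return 2u * buffer_bytes(mega_elements);
}
```

With a 4-byte int, the 32(M) runs need 256 MiB on the device and the 64(M) run needs 512 MiB, which would explain why the first fits and the second does not. The report does not state the card's memory size or which of the two calls failed.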

*************************************************************
////////////////////////////////////////////////
Integer count=32(M), repeat times=50
2-way memcopy : 200.85
kernel computing: 749.84
Time to 2-way memcpy & execute:
non-streamed:   951.78 (950.70 expected = 749.84 + 200.85)
8 streams:      852.35 (774.95 expected = 749.84 + 200.85/8)
Time taken by GPU using clock()= 703 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 11734 ms
GPU is 16.00 (11734/703) times faster than CPU.


////////////////////////////////////////////////
Integer count=32(M), repeat times=10
2-way memcopy : 200.67
kernel computing: 196.36
Time to 2-way memcpy & execute:
non-streamed:   396.45 (397.04 expected = 196.36 + 200.67)
8 streams:      221.74 (221.45 expected = 196.36 + 200.67/8)
Time taken by GPU using clock()= 62 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 2156 ms
GPU is 34.00 (2156/62) times faster than CPU. (???)
GPU is 2156/221.74 =9.7 times faster than CPU. (???)
 

////////////////////////////////////////////////
Integer count=32(M), repeat times=1
2-way memcopy : 200.65
kernel computing: 71.28
Time to 2-way memcpy & execute:
non-streamed:   271.75 (271.93 expected = 71.28 + 200.65)
8 streams:      202.12 (96.36 expected = 71.28 + 200.65/8)
Time taken by GPU using clock()= 47 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 406 ms
GPU is 8.00 (406/47) times faster than CPU.
GPU is 2.00 (406/202) times faster than CPU.
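
The last two runs also compare the CPU time against the streamed total (copies included) rather than the clock() figure alone. A sketch of both quantities, with hypothetical function names; it only restates arithmetic on numbers already in this report:

```c
/* Share of the non-streamed total that is spent copying data. */
double transfer_fraction(double kernel_ms, double memcpy_ms)
{
    return memcpy_ms / (kernel_ms + memcpy_ms);
}

/* Speedup measured against the full GPU path: copy in, compute, copy out. */
double end_to_end_speedup(double cpu_ms, double gpu_total_ms)
{
    return cpu_ms / gpu_total_ms;
}
```

For 32(M) integers the copy takes about 200 in every run, so its share grows from roughly 12% at 100 repeats to roughly 74% at 1 repeat. The end-to-end speedup drops accordingly: about 9.7 at 10 repeats (2156/221.74) and about 2.0 at 1 repeat (406/202.12).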
