📄 test_rpt_thread_blk.txt.bak

High computing density: 

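// Compute-heavy per-element work: square the value REPEAT_TIME times,
// keeping only the low 16 bits after each step.
// REPEAT_TIME is defined elsewhere in the project.
// __device__ makes the function callable from the kernel; __host__ also
// lets host code reuse it.
__host__ __device__ int get_value(const int c)
{
	unsigned int out = (unsigned int)c;

	for (int loop = 0; loop < REPEAT_TIME; loop++)
	{
		// Unsigned arithmetic: out*out can exceed INT_MAX, and signed
		// overflow is undefined. The low 16 bits are unchanged.
		out = (out * out) & 0xFFFFu;
	}

	return (int)out;
}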
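
For checking results on the host, the same computation can be sketched in plain C. This is a minimal sketch, not the project's verification code: the name get_value_ref and the explicit repeat parameter (standing in for REPEAT_TIME) are assumptions made for illustration.

```c
/* Hypothetical host-side reference for get_value.
   'repeat' stands in for the REPEAT_TIME macro (an assumption). */
int get_value_ref(int c, int repeat)
{
    unsigned int out = (unsigned int)c;

    for (int loop = 0; loop < repeat; loop++)
    {
        /* Unsigned multiply wraps modulo 2^32, so the low 16 bits are
           well defined even when out*out does not fit in an int. */
        out = (out * out) & 0xFFFFu;
    }

    return (int)out;
}
```

For example, get_value_ref(3, 2) squares 3 twice (3, 9, 81) and returns 81, while get_value_ref(256, 1) returns 0 because 65536 has no bits set in the low 16.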

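// THREAD_NUM (threads per block) and THREAD_BLK_NUM_LOG2 (log2 of the block
// count) are defined elsewhere in the project.
__global__ void init_array(int *values, int *out, int size)
{
	// Total number of threads in the grid: THREAD_NUM * THREAD_BLK_NUM.
	const int stride = THREAD_NUM << THREAD_BLK_NUM_LOG2;

	// The bound is checked before each access. The __syncthreads() calls
	// below require every thread of a block to run the same number of
	// iterations, i.e. size must be a multiple of stride.
	for (int i = (THREAD_NUM * blockIdx.x) + threadIdx.x; i < size; i += stride)
	{
		// 1) Load the input element.
		int v = values[i];

		// 2) Synchronization
		__syncthreads();

		// 3) Process
		int o = get_value(v);

		// 4) Write the result to device memory.
		out[i] = o;

		// 5) Synchronization again
		__syncthreads();
	}
}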
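
The indexing scheme in init_array (start at THREAD_NUM*blockIdx.x + threadIdx.x, then advance by THREAD_NUM << THREAD_BLK_NUM_LOG2) can be checked on the host without a GPU. The sketch below is an illustrative model only: the function name and its parameters are assumptions, and it checks the bound before each access rather than after.

```c
#include <stdlib.h>

/* Hypothetical host-side model of the kernel's index pattern.
   Returns 1 if every index in [0, size) is visited exactly once
   by some (block, thread) pair, 0 otherwise. Expects size > 0. */
int covers_each_index_once(int thread_num, int blk_num_log2, int size)
{
    int blk_num = 1 << blk_num_log2;         /* number of thread blocks */
    int stride = thread_num << blk_num_log2; /* total threads in the grid */
    unsigned char *hits = (unsigned char *)calloc((size_t)size, 1);
    int ok = 1;

    if (hits == NULL)
        return 0;

    /* Replay the loop of every thread in every block. */
    for (int b = 0; b < blk_num; b++)
        for (int t = 0; t < thread_num; t++)
            for (int i = thread_num * b + t; i < size; i += stride)
                hits[i]++;

    for (int i = 0; i < size; i++)
        if (hits[i] != 1)
            ok = 0;

    free(hits);
    return ok;
}
```

Calling covers_each_index_once(256, 3, 4096) models 8 blocks of 256 threads over 4096 elements; the values 256, 3 and 4096 are sample inputs, not the project's settings.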

//////////////////////////////////////////////// Note: Yesterday's Test Result
Integer number=32(M) multiple times =100 
2-way memcopy : 200.60
kernel computing: 1446.45
Time to 2-way memcpy & execute:
non-streamed:   1648.10 (1647.05 expected = 1446.45 + 200.60)
8 streams:      1534.57 (1471.52 expected = 1446.45 + 200.60/8)
Time taken by GPU using clock()= 1375 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 23063 ms
GPU is 16.00 (23063/1375) times faster than CPU.
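
The "expected" figures in these logs follow a simple model: without streams, the copies and the kernel run back to back; with n streams, only 1/n of the copy time is assumed to stay exposed. A minimal sketch of that arithmetic, with hypothetical function names:

```c
/* Expected non-streamed time in ms: copy and kernel run back to back. */
double expected_non_streamed(double kernel_ms, double memcpy_ms)
{
    return kernel_ms + memcpy_ms;
}

/* Expected streamed time in ms under the report's model:
   only 1/n_streams of the copy time is not hidden behind the kernel. */
double expected_streamed(double kernel_ms, double memcpy_ms, int n_streams)
{
    if (n_streams < 1)
        n_streams = 1; /* treat invalid counts as "no streaming" */
    return kernel_ms + memcpy_ms / n_streams;
}
```

With kernel_ms = 1446.45 and memcpy_ms = 200.60, these give 1647.05 and about 1471.52 for 8 streams, matching the expected values in the log above.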

//////////////////////////////////////////////// thread_blk_num==1: test result is the same as before
Integer number=32(M) multiple times =100 thread_blk_num=1
2-way memcopy : 200.63
kernel computing: 1446.20
Time to 2-way memcpy & execute:
non-streamed:   1646.65 (1646.83 expected = 1446.20 + 200.63)
8 streams:      1547.35 (1471.28 expected = 1446.20 + 200.63/8)
Time taken by GPU using clock()= 1458 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 22111 ms
GPU is 15.00 (22111/1458) times faster than CPU.

//////////////////////////////////////////////// Note: 8800GT contains 14 multi-processors
Integer number=32(M) multiple times =100 thread_blk_num=8
2-way memcopy : 200.73
kernel computing: 181.89 (Note: 1446.20 / 8 = 180.78)
Time to 2-way memcpy & execute:
non-streamed:   380.34 (382.62 expected = 181.89 + 200.73)
8 streams:      207.07 (206.98 expected = 181.89 + 200.73/8)
Time taken by GPU using clock()= 141 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 22105 ms
GPU is 156.00 (22105/141) times faster than CPU.


////////////////////////////////////////////////
Integer number=32(M) multiple times =100 thread_blk_num=16
2-way memcopy : 200.66
kernel computing: 181.66
Time to 2-way memcpy & execute:
non-streamed:   290.75 (382.33 expected = 181.66 + 200.66)
8 streams:      206.17 (206.75 expected = 181.66 + 200.66/8)
Time taken by GPU using clock()= 125 ms
------------VERIFY USING CPU-------------------
Test PASSED
Time taken by single threaded CPU using clock()= 22088 ms
GPU is 176.00 (22088/125) times faster than CPU.
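
The speedup figures in these logs look truncated to whole numbers: 23063/1375 is about 16.77, 22111/1458 about 15.17, 22105/141 about 156.77, and 22088/125 about 176.70. That pattern is consistent with an integer division before formatting, though the reporting code is not shown here. A minimal sketch of a real-valued ratio, with a hypothetical function name:

```c
/* Speedup as a real-valued ratio of CPU time to GPU time.
   Returns 0.0 if gpu_ms is not positive. */
double speedup(long cpu_ms, long gpu_ms)
{
    if (gpu_ms <= 0)
        return 0.0;
    /* Convert before dividing so the fractional part is kept. */
    return (double)cpu_ms / (double)gpu_ms;
}
```

For the last run, speedup(22088, 125) evaluates to 176.704 rather than 176.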


