How can I estimate the CUDA performance of cards I don't own, i.e. new cards? For instance, I found an incomplete CUDA example, and its author wrote that it takes 0.7 s on his GeForce 8600 GT.
But on my Quadro it takes 1.7 s. My question is: is the code I used to fill in the gaps faulty, or is the GeForce 8600 GT really more than twice as fast? The kernel is memory-bound, but my card has a higher memory bandwidth.
I don't know what conclusions to draw from this. Here are the specifications of both cards:
Name                  Quadro FX 580   GeForce 8600 GT
CUDA cores            32              32
Core clock (MHz)      450             540
Memory clock (MHz)    400             700
Memory BW (GB/s)      25.6            22.4
Shader clock (MHz)    ?               1180
Tags: cuda, gpu, gpgpu, nvidia. Asked Sep 21 '11 at 17:46 by Framester.
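As a rough back-of-the-envelope check on the table, here is a sketch that assumes both cards use a 128-bit bus with double-data-rate memory (the bus width is an assumption; it is not in the table). Peak bandwidth is then the bus width in bytes times two transfers per memory clock. For the GeForce 8600 GT this reproduces the listed 22.4 GB/s, but for the Quadro FX 580 the listed 400 MHz gives 12.8 GB/s rather than 25.6 GB/s (25.6 GB/s would need 800 MHz), so one of those two entries may be worth double-checking. If a memory-bound kernel reaches the same fraction of peak on both cards, runtime scales inversely with bandwidth:

```latex
\begin{aligned}
B_{\text{peak}} &= \frac{w_{\text{bus}}}{8} \cdot 2\, f_{\text{mem}} \\
B_{\text{8600 GT}} &= \frac{128}{8}\,\text{B} \cdot 2 \cdot 700\,\text{MHz} = 22.4\,\text{GB/s} \\
B_{\text{FX 580}} &= \frac{128}{8}\,\text{B} \cdot 2 \cdot 400\,\text{MHz} = 12.8\,\text{GB/s} \\[4pt]
t_{\text{FX 580}} &\approx t_{\text{8600 GT}} \cdot \frac{B_{\text{8600 GT}}}{B_{\text{FX 580}}} \\
t_{\text{FX 580}}\big|_{B = 25.6} &\approx 0.7\,\text{s} \cdot \frac{22.4}{25.6} \approx 0.61\,\text{s} \\
t_{\text{FX 580}}\big|_{B = 12.8} &\approx 0.7\,\text{s} \cdot \frac{22.4}{12.8} \approx 1.23\,\text{s}
\end{aligned}
```

Under these assumptions, neither figure predicts the measured 1.7 s, which points toward differences in what is being timed or in the code itself rather than in raw bandwidth alone.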
In my experience, performance should be quite similar between these two GPUs. Differences in hardware or software configuration could cause the performance gap. A dedicated GPGPU card can perform much better than a GPU that is simultaneously driving a display (especially if Windows Aero or Compiz is running).
Also, how is the time measured? Overall, posting some code and a more detailed description of your PC configuration might help, though it's quite difficult to make accurate estimates in your head without running tests or profiling. – aland Sep 21 '11 at 19:42
I just want to give you some pointers to possible sources of error. First, use cudaEvents to time your code rather than the CUDA profiler, as cudaEvents is more accurate. Second, check what the author is measuring: is he talking only about the computation time, or is he also including the time to transfer data to and from the GPU?
Are you measuring the same time?
Third, the CUDA architecture is changing quite fast. For example, for cards with compute capability 1.x, it is recommended to use shared memory to get better performance; however, cards with compute capability 2.x have an L1 cache on each multiprocessor that can make global memory accesses considerably faster. So you may also want to compare the architectures of the two cards and their compute capabilities.
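A minimal sketch of the cudaEvents approach, under stated assumptions: the `scale` kernel, the buffer size, and the helpers `peakBandwidthGBs` and `effectiveBandwidthGBs` are hypothetical illustrations, not the original author's code. It queries the device's compute capability, memory clock and bus width (so the table figures can be checked on a card you own), then times the kernel alone and the kernel plus host/device copies separately, since those two numbers are easy to confuse when comparing against someone else's result. The peak-bandwidth formula assumes double-data-rate memory, and running it on compute capability 1.x devices requires an older CUDA toolkit, because recent toolkits have dropped support for them.

```cuda
// Sketch: query a device and time a memory-bound kernel with cudaEvents.
// The kernel `scale`, the buffer size, and the helper functions are hypothetical.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Return EXIT_FAILURE from main() with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                         cudaGetErrorString(err_));                     \
            return EXIT_FAILURE;                                        \
        }                                                               \
    } while (0)

// Theoretical peak bandwidth in GB/s, ASSUMING double-data-rate memory:
// (bus width in bytes) * (memory clock in Hz) * 2 transfers per clock.
double peakBandwidthGBs(int memClockKHz, int busWidthBits) {
    return (busWidthBits / 8.0) * (memClockKHz * 1e3) * 2.0 / 1e9;
}

// Effective bandwidth in GB/s from bytes moved and elapsed milliseconds.
double effectiveBandwidthGBs(double bytes, float ms) {
    return bytes / (ms * 1e-3) / 1e9;
}

// Memory-bound kernel: one float read and one float written per element.
__global__ void scale(const float* in, float* out, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a * in[i];
}

int main() {
    int dev = 0, major = 0, minor = 0, memClockKHz = 0, busWidthBits = 0;
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, dev));
    CHECK(cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, dev));
    CHECK(cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, dev));
    CHECK(cudaDeviceGetAttribute(&memClockKHz, cudaDevAttrMemoryClockRate, dev));
    CHECK(cudaDeviceGetAttribute(&busWidthBits, cudaDevAttrGlobalMemoryBusWidth, dev));
    std::printf("%s: compute capability %d.%d, %d-bit bus, peak %.1f GB/s\n",
                prop.name, major, minor, busWidthBits,
                peakBandwidthGBs(memClockKHz, busWidthBits));

    const int n = 1 << 22;  // 4M floats (16 MB per buffer); grid stays under 65535 blocks
    const size_t bytes = n * sizeof(float);
    float* h = static_cast<float*>(std::malloc(bytes));
    if (!h) return EXIT_FAILURE;
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *dIn, *dOut;
    CHECK(cudaMalloc(&dIn, bytes));
    CHECK(cudaMalloc(&dOut, bytes));
    CHECK(cudaMemset(dIn, 0, bytes));
    const int threads = 256, blocks = (n + threads - 1) / threads;

    // Warm-up launch so one-time setup costs are not included in the timings.
    scale<<<blocks, threads>>>(dIn, dOut, 2.0f, n);
    CHECK(cudaDeviceSynchronize());

    cudaEvent_t start, kStart, kStop, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&kStart));
    CHECK(cudaEventCreate(&kStop));
    CHECK(cudaEventCreate(&stop));

    // Events bracket the copies and the kernel separately in the default stream.
    CHECK(cudaEventRecord(start));
    CHECK(cudaMemcpy(dIn, h, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(kStart));
    scale<<<blocks, threads>>>(dIn, dOut, 2.0f, n);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(kStop));
    CHECK(cudaMemcpy(h, dOut, bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));

    float kernelMs = 0.0f, totalMs = 0.0f;
    CHECK(cudaEventElapsedTime(&kernelMs, kStart, kStop));
    CHECK(cudaEventElapsedTime(&totalMs, start, stop));
    std::printf("kernel only:           %.3f ms (%.1f GB/s effective)\n",
                kernelMs, effectiveBandwidthGBs(2.0 * bytes, kernelMs));
    std::printf("kernel + H2D/D2H copy: %.3f ms\n", totalMs);

    cudaEventDestroy(start); cudaEventDestroy(kStart);
    cudaEventDestroy(kStop); cudaEventDestroy(stop);
    cudaFree(dIn); cudaFree(dOut); std::free(h);
    return EXIT_SUCCESS;
}
```

Compiled with something like `nvcc bench.cu -o bench` (file name hypothetical), it prints both timings; cudaEventElapsedTime reports milliseconds. If one person quotes the kernel-only time and the other quotes the time including transfers, that alone could account for a large part of a gap like 0.7 s vs 1.7 s.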