
Message boards : Number crunching : CPU Projects

PappaLitto
Message 48110 - Posted: 8 Nov 2017 | 19:33:28 UTC

I was wondering: why do so many BOINC projects waste their time writing a CPU-only application for their project? GPUs are inherently more parallel and always have much higher FLOPS. CPUs struggle to get anywhere near the 1 TeraFLOPS range while most gaming GPUs are 5+ TeraFLOPS.

You have the ability to have multiple GPUs per system, making the FLOPS per watt higher, increasing efficiency and minimizing cost for things like the power supply and hard drive.

They don't even make CPU coprocessors on expansion cards for consumers, so why would projects like LHC, which have more data to crunch than just about anyone, spend the time to develop a CPU-only project?

kain
Message 48111 - Posted: 8 Nov 2017 | 21:03:27 UTC

So... have you ever written at least one line of code? No? That's what I thought... ;)

A CPU core can be imagined as a math professor, a GPU as a thousand 8th graders. So when you have to count the trees in a forest, the bunch of kids would do it faster. But try to use them to calculate the optimal path for a rocket going to the moon ;)

In short: not every algorithm can be run on a GPU (in fact, most can't).
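
To make that professor-vs-kids distinction concrete, here is a minimal C++ sketch (an illustrative example, not code from any BOINC project): the first loop is "counting trees" - every element is independent - while the second is a recurrence where each step needs the previous result, so extra cores don't help.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// "Counting trees": every element is independent, so the work can be spread
// over thousands of GPU threads (or CPU cores) with no coordination.
std::vector<double> square_all(const std::vector<double>& x) {
    std::vector<double> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)   // iterations are independent
        out[i] = x[i] * x[i];
    return out;
}

// "Path to the moon": a recurrence. Step i cannot start until step i-1 has
// finished, so adding more cores does not make it any faster.
double integrate_serial(double x0, double dt, int steps) {
    double x = x0;
    for (int i = 0; i < steps; ++i)
        x = x + dt * std::sin(x);                // each value needs the previous one
    return x;
}
```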

PappaLitto
Message 48113 - Posted: 8 Nov 2017 | 23:01:52 UTC

Calculating the optimal path of a rocket is millions of trials done by the processor. A CPU does this in serial, while a GPU does each trial more slowly but runs millions in parallel. The math professor vs. 8th graders comparison might not be the best, as 8th graders will never be able to come to an accurate conclusion, while a GPU core can, just much more slowly.

Ray tracing is a good example of parallel trajectory calculations. This activity is far too demanding for a serial-based CPU to produce a real-time result.
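
As a rough sketch of why ray tracing maps so well onto a GPU (a toy example, not the renderer of any real project): every pixel can be shaded independently of every other pixel.

```cpp
#include <cstddef>
#include <vector>

struct Color { float r, g, b; };

// Toy per-pixel "shader": a simple gradient standing in for real ray tracing.
Color trace_ray(int x, int y, int width, int height) {
    return { float(x) / float(width), float(y) / float(height), 0.5f };
}

// Every pixel is independent of every other pixel, so a GPU can assign one
// thread per pixel and shade millions of them concurrently; a single CPU
// core has to walk through them one after another.
std::vector<Color> render(int width, int height) {
    std::vector<Color> image(static_cast<std::size_t>(width) * height);
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            image[static_cast<std::size_t>(y) * width + x] = trace_ray(x, y, width, height);
    return image;
}
```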

We need to step away from single-threaded code. Every new project has a GPU application; XANSONS4COD and Drugdiscovery@home are projects that realize this potential. I understand code is written in a very serial fashion, but this simply means we need to rethink how we operate and code.

Retvari Zoltan
Message 48114 - Posted: 8 Nov 2017 | 23:12:07 UTC - in response to Message 48110.
Last modified: 8 Nov 2017 | 23:22:00 UTC

I was wondering: why do so many BOINC projects waste their time writing a CPU-only application for their project?
Generally speaking it's easier to develop a CPU application than a massively parallel GPU application.

GPUs are inherently more parallel and always have much higher FLOPS.
That's true, but not all calculations can be efficiently run in parallel.

CPUs struggle to get anywhere near the 1 TeraFLOPS range while most gaming GPUs are 5+ TeraFLOPS.
That's true, but Intel's x86 architecture is much older, so it has a bigger codebase and programmer 'community'. The original (16-bit) x86 CPU (the i8086) was released in 1978, while the first programmable GPU (the G80-based NVidia GeForce 8800 GTX) was released on 8 November 2006. The 8-bit predecessor of the i8086 (the i8080) was released in 1974. Other important landmarks in the evolution of the x86 architecture are the release of the i80386DX in 1985, the first 32-bit Intel CPU, and the release of the i80486DX in 1989, the first Intel CPU containing an FPU (Floating Point Unit). Before that CPU, you had to buy a separate chip (the x87 co-processor) - just like a GPU now - to speed up scientific calculations. Their roles are similar: an x87 couldn't run without an x86, just like a GPU can't run without a CPU.

You have the ability to have multiple GPUs per system, making the FLOPS per watt higher, increasing efficiency and minimizing cost for things like power supply and hard drive.
The current goal of CPU development is to achieve similar efficiency, but the complexity of the x86 (and x64) architecture makes CPUs less energy efficient. However, the use of state-of-the-art programming techniques (such as the AVX instruction set) makes CPUs more competitive against GPUs. The latest generation of Intel server CPUs has 28 cores, and you can put 8 such CPUs in a single server (however, they cost about $13,000 each). If I understand it right, the top models of the latest generation of server CPUs have no upper limit on the number of CPUs that can be used in a single system.
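
For readers who haven't seen it, this is roughly what AVX-style SIMD code looks like on the CPU side - a minimal sketch using standard Intel intrinsics (compile with -mavx or equivalent); real scientific code is of course far more involved.

```cpp
#include <immintrin.h>  // AVX intrinsics

// Add two float arrays 8 elements at a time using 256-bit AVX registers.
// For brevity this sketch assumes n is a multiple of 8.
void add_avx(const float* a, const float* b, float* out, int n) {
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);  // load 8 floats
        __m256 vb = _mm256_loadu_ps(b + i);  // load 8 more
        __m256 vc = _mm256_add_ps(va, vb);   // 8 additions in one instruction
        _mm256_storeu_ps(out + i, vc);       // store 8 results
    }
}
```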

They don't even make CPU coprocessors on expansion cards for consumers
You're wrong. See the Xeon Phi; the pricing is here. Some pictures.

so why would projects like LHC, which have more data to crunch than just about anyone, spend the time to develop a CPU-only project?
All of the above, plus perhaps their original code was developed on CPUs in the past and could not easily be ported to GPUs, so they would basically need to develop and maintain a second set of apps for GPUs. GPUs are not as backward compatible as CPUs are, so it's harder to maintain a GPU app than a CPU app.

3de64piB5uZAS6SUNt1GFDU9d...
Message 48115 - Posted: 8 Nov 2017 | 23:17:44 UTC

CPU and GPU are different in architecture and purpose. CPUs are designed as CISC, which means they have a rather "Complex Instruction Set", making them more versatile in application and easier to program. However, simultaneity is limited, and the instruction set has ballooned because they were kept downwards compatible, e.g. with 16- or 32-bit operation. But the common personal computer wouldn't have started its triumph without that.

Modern GPUs are designed in a completely different way; their main purpose is maximum concurrency for tasks of a similar kind. Their instruction set is limited and therefore reminiscent of RISC, whereas the concurrency somewhat reminds me of the old transputer concept. (Although the idea of passing tasks from one node to the next was not followed up, unfortunately.)

Translating CPU code to GPU code (e.g. CUDA) is not straightforward for the reasons mentioned above. And it does not always make sense anyway. There are sometimes calculations with small substeps and a lot of dependencies, which means there is little to decompose and not many tasks to parallelize. In that case, the GPU might even be slower than a CPU.
____________
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.

3de64piB5uZAS6SUNt1GFDU9d...
Message 48116 - Posted: 8 Nov 2017 | 23:46:33 UTC
Last modified: 8 Nov 2017 | 23:49:27 UTC

A good example of a very successful CISC concept is the human brain ... little parallelism among a limited number of specialized brain areas, which still show an astounding neuroplasticity. In fact, the brain grows more and more complex over the years, and it's the perfect control unit for mastering the challenges of life. Evolution and natural selection prove that.

Guess how successful our brain would be with (GPU-like) 256 olfactory centres and 64 visual centres. You could smell 256 different odors at the same time ... but you would still hardly complete a grammar school education that way.
____________
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.

kain
Message 48117 - Posted: 8 Nov 2017 | 23:51:45 UTC - in response to Message 48113.

Calculating the optimal path of a rocket is millions of trials done by the processor. A CPU does this in serial, while a GPU does each trial more slowly but runs millions in parallel. The math professor vs. 8th graders comparison might not be the best, as 8th graders will never be able to come to an accurate conclusion, while a GPU core can, just much more slowly.

Ray tracing is a good example of parallel trajectory calculations. This activity is far too demanding for a serial-based CPU to produce a real-time result.

We need to step away from single-threaded code. Every new project has a GPU application; XANSONS4COD and Drugdiscovery@home are projects that realize this potential. I understand code is written in a very serial fashion, but this simply means we need to rethink how we operate and code.


Nope, you don't understand it at all. A GPU, like any 8th grader, CAN'T do complex operations. It can do millions of simple ones at once, but not even a single complex one. So the "path to the moon" is a very good example. It is not guessing; a math professor would do it perfectly without "trying", while a child wouldn't even be able to start. But it's just an example, so ray tracing has nothing to do with it. The ray-tracing algorithm is very simple and very easy to parallelize. Most are not.

Want an example? Calculate x = (1+2)*3. It can be done in serial and it can be done in parallel. Guess which way is faster ;)
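
A tiny C++ sketch of that point (hypothetical timing code, not from any project): the multiplication cannot start until the addition has finished, so forcing the addition onto another thread only buys you thread-launch and synchronization overhead.

```cpp
#include <chrono>
#include <cstdio>
#include <future>

int main() {
    using clock = std::chrono::steady_clock;

    // Serial: the whole dependency chain (1+2) -> *3 is resolved immediately.
    auto t0 = clock::now();
    volatile int x = (1 + 2) * 3;
    auto t1 = clock::now();

    // "Parallel": offloading the addition to another thread changes nothing
    // about the dependency - the multiply still has to wait for the sum -
    // but now we also pay for launching and joining a thread.
    auto sum = std::async(std::launch::async, [] { return 1 + 2; });
    volatile int y = sum.get() * 3;
    auto t2 = clock::now();

    std::printf("serial: %lld ns, \"parallel\": %lld ns (x=%d, y=%d)\n",
                (long long)std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count(),
                (long long)std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count(),
                (int)x, (int)y);
}
```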

And no, we don't "need to rethink" anything, we need to be a little more humble and stop telling others how they should do their work. Especially when we have no idea about it :)

WPrion
Message 48119 - Posted: 9 Nov 2017 | 16:45:56 UTC - in response to Message 48115.

"... whereas the concurrency somewhat reminds me of the old transputer concept..."

Wow - there's a term I haven't heard in an age. INMOS, T800, C004, B012, Occam...good times!

Win

Bedrich Hajek
Message 48121 - Posted: 9 Nov 2017 | 23:31:58 UTC

CPUs and GPUs are both just dumb machines. All they can do is crunch ones and zeroes. A programmer writes the code to convert these useless ones and zeroes into something useful. It is easier to write code for CPUs than GPUs, okay, I got that. It is not impossible to write code for GPUs (this and other GPU projects are proof of that), just more difficult.

Calculating 27 is a bad example to use. From a human perspective, if that is all you have to calculate, either will do fine. The human brain will also do fine. However, if you have a vast quantity of simple calculations (even complicated equations are reduced to ones and zeroes by the computer), which machine will do them faster? And how much human time and effort will it take to write the code to turn these ones and zeroes into something useful? At some point, the GPU will win out. The more there is to crunch, the larger the GPU's victory. Again, this and other GPU projects are examples of that.

Oh, don't underestimate 8th graders, some of them are smarter than professors, and they grow up and some become even smarter. We were all once 8th graders.

This is all logic and common sense, no computer talk mumbo jumbo.



3de64piB5uZAS6SUNt1GFDU9d...
Message 48122 - Posted: 10 Nov 2017 | 1:06:18 UTC - in response to Message 48121.

CPUs and GPUs are both just dumb machines. All they can do is crunch ones and zeroes. A programmer writes the code to convert these useless ones and zeroes into something useful. It is easier to write code for CPUs than GPUs, okay, I got that. It is not impossible to write code for GPUs (this and other GPU projects are proof of that), just more difficult.

Calculating 27 is a bad example to use. From a human perspective, if that is all you have to calculate, either will do fine. The human brain will also do fine. However, if you have a vast quantity of simple calculations (even complicated equations are reduced to ones and zeroes by the computer), which machine will do them faster? And how much human time and effort will it take to write the code to turn these ones and zeroes into something useful? At some point, the GPU will win out. The more there is to crunch, the larger the GPU's victory. Again, this and other GPU projects are examples of that.

Oh, don't underestimate 8th graders, some of them are smarter than professors, and they grow up and some become even smarter. We were all once 8th graders.

This is all logic and common sense, no computer talk mumbo jumbo.



Seems that your logic differs from mine significantly.

A machine does NOT simply crunch „ones and zeroes“, no more than human beings just speak dumb „sounds“ or „letters“. In fact, sounds form letters, several letters yield words, words form sentences in a well defined structure and finally sentences result in a meaning, language and communication.

That language is a perfect match to our way of thinking, our lips, our voice, our ears, our lung and our whole anatomy and senses. We don‘t communicate in ultrasound, because our anatomy simply does not permit that. So language and anatomy are inextricably linked, and the same is true of computer architecture and language, instruction set and syntax.

Again, a programmer does NOT convert 0 and 1 into something useful… the programmer has to use a computer language for that purpose, which is largely defined by the underlying architecture. A poet does not convert „letters“ into something useful either… how come? He must use a language to compose his poem. And if that language does not comprise words for sadness, despair and poverty, how can he write a tragedy? He could try to paraphrase his emotions, but it will never be successful.
____________
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.

kain
Message 48123 - Posted: 10 Nov 2017 | 2:06:50 UTC - in response to Message 48121.
Last modified: 10 Nov 2017 | 2:11:43 UTC

CPUs and GPUs are both just dumb machines. All they can do is crunch ones and zeroes. A programmer writes the code to convert these useless ones and zeroes into something useful. It is easier to write code for CPUs than GPUs, okay, I got that. It is not impossible to write code for GPUs (this and other GPU projects are proof of that), just more difficult.

Calculating 27 is a bad example to use. From a human perspective, if that is all you have to calculate, either will do fine. The human brain will also do fine. However, if you have a vast quantity of simple calculations (even complicated equations are reduced to ones and zeroes by the computer), which machine will do them faster? And how much human time and effort will it take to write the code to turn these ones and zeroes into something useful? At some point, the GPU will win out. The more there is to crunch, the larger the GPU's victory. Again, this and other GPU projects are examples of that.

Oh, don't underestimate 8th graders, some of them are smarter than professors, and they grow up and some become even smarter. We were all once 8th graders.

This is all logic and common sense, no computer talk mumbo jumbo.





You have absolutely no idea what you are talking about.
That is math, not "mumbo jumbo". I see that you are not an 8th grader (or an underestimated one), so I hope you can understand this:
https://en.wikipedia.org/wiki/Amdahl%27s_law - very important!
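
For reference, Amdahl's law from that link, written out: if a fraction p of the work can be parallelized and N processors are used, the overall speedup is bounded by

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

So even if 90% of a program parallelizes perfectly, no amount of GPU hardware can make the whole thing more than 10x faster.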

https://en.wikipedia.org/wiki/Analysis_of_parallel_algorithms
"Some problems cannot be split up into parallel portions, as they require the results from a preceding step to effectively carry on with the next step – these are called inherently serial problems." Examples:
https://en.wikipedia.org/wiki/Three-body_problem
https://en.wikipedia.org/wiki/Newton%27s_method
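
Newton's method from that last link is a good picture of an inherently serial problem: each iterate needs the previous one, so the loop below (a generic sketch, not tied to any project's code) cannot be split across GPU threads.

```cpp
#include <cstdio>

// Newton's method for sqrt(2), i.e. the root of f(x) = x^2 - 2.
// Each iterate x_{n+1} = x_n - f(x_n)/f'(x_n) depends on the previous one,
// so the iterations cannot run in parallel.
int main() {
    double x = 1.0;                          // initial guess
    for (int n = 0; n < 8; ++n)
        x = x - (x * x - 2.0) / (2.0 * x);   // serial dependency chain
    std::printf("sqrt(2) ~= %.15f\n", x);
}
```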

And last but not least - let's say that every algorithm can be effectively parallelized (not true) and can be computed on an SP core of a GPU (not true):
My CPU has 28 cores and 16 GB of RAM, so 0.57 GB per core. My GPU has 640 SPs and 2 GB of RAM, so 0.003 GB per SP (a 1080 Ti has the same ratio). Can you see the funny little problem here?

This is my favourite part: "Again, this and other GPU projects are examples of that." That's about as logical as: fish can run very fast, cheetahs are proof of that!

wiyosaya
Message 48124 - Posted: 10 Nov 2017 | 3:14:33 UTC - in response to Message 48114.
Last modified: 10 Nov 2017 | 3:30:15 UTC

The latest generation of Intel server CPUs has 28 cores, and you can put 8 such CPUs in a single server (however, they cost about $13,000 each).

The latest AMD server CPUs have 32 cores each. I do not think there is a limit to the number that can be put on a motherboard, but AFAIK, current MBs only support two. They cost $3,000 each.

CPUs are better at iterative (serial) tasks where the input for the next iteration is the output of the previous iteration such as an integration that might be used to calculate a path to the Moon.

GPUs are better at tasks that have multiple sets of data where each data set requires the same calculations.

Both AMD and Intel CPUs have special instructions commonly termed "SIMD", for "Single Instruction, Multiple Data" (MMX (IIRC) and its successive generations), with which they are capable of performing the same instruction on multiple sets of data, like GPUs do.
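
A small illustration of that SIMD idea (a generic sketch; with optimizations enabled, a compiler will typically auto-vectorize a loop like this into SSE/AVX instructions):

```cpp
// One instruction stream, many data elements: the same multiply-add is
// applied to every element, so the compiler can pack several floats into
// a single SIMD register and process them per instruction.
void scale_and_shift(const float* in, float* out, int n, float a, float b) {
    for (int i = 0; i < n; ++i)
        out[i] = a * in[i] + b;
}
```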

Each has its place, as I see it, and I have also heard that there are BOINC projects out there where both GPU and CPU applications exist and they both take the same amount of time to run.

I program and maintain an FEA application professionally. Only portions of that application can be done in parallel. It is not as easy a task as one might think to compute large amounts of data in parallel. For the application I work with, it is basically not possible to program the solver portion as a massively parallel portion of the code because each successive step requires the output from the previous step as input. The exception in that part of the code is that once Gaussian elimination is done, back-substitution can be done in parallel.

The post-processor portion of the code is a different story. Multiple load cases can run in parallel - where the input for the post-processor is the output from the solver step.
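
That post-processing step is the easy kind of parallelism: independent load cases can simply be farmed out to threads. A generic C++ sketch, assuming the cases share no state (this is not the actual FEA code):

```cpp
#include <thread>
#include <vector>

// Hypothetical stand-in for post-processing one load case from solver output.
void post_process_case(int case_id) {
    (void)case_id;  // ... read this case's solver results, compute stresses, write reports ...
}

int main() {
    const int num_cases = 8;                         // assumed number of independent load cases
    std::vector<std::thread> workers;
    for (int c = 0; c < num_cases; ++c)              // each case is independent,
        workers.emplace_back(post_process_case, c);  // so all of them can run concurrently
    for (auto& t : workers)
        t.join();
}
```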

PappaLitto
Message 48163 - Posted: 13 Nov 2017 | 0:44:55 UTC

Would FPGAs ever be used in a project like this or are GPUs still faster?

wiyosaya
Message 48179 - Posted: 13 Nov 2017 | 17:44:04 UTC - in response to Message 48163.

Would FPGAs ever be used in a project like this or are GPUs still faster?

The thing about FPGAs is that they are primarily used in low-volume production situations or for low-volume specialized applications. They would require specialized programming, and, in the case of a project like this, the people like you and me who run this project would need to have one, or a board that contains one, in their computer in order to run the project.

Contrast this with GPUs and CPUs, which are commonly available. Virtually everyone who has a computer has a CPU and a GPU inside it, although some GPUs are not acceptable to this project.

What makes the GPU/CPU aspect of BOINC attractive to a project like this is that virtually no one has to spend extra money on a specialized board to put in their computer to run the project. If we did have to have a specialized board, outside of the GPU requirements of the project, I doubt there would be as many people running this project, because chances are it would be significantly more expensive for the average computer user to run this, or any, project that required a specialized FPGA device.

This is not to say that a project like this could not be run on an FPGA. It could. However, at least as I see it, the project would constrain itself significantly and might get very few people to run it.

To sum it up:

GPU/CPU = widely available, virtually every computer has at least one GPU/CPU - requires no specialized hardware, requires little end-user knowledge.

FPGA = limited availability, project runners would have to spend extra money to buy one, requires specialized programming knowledge that the project would have to support, may require that the project runner (you and me) have specialized knowledge.

JStateson
Message 48257 - Posted: 26 Nov 2017 | 13:27:02 UTC

Not everyone in the world has a high-performance GPU, and many enthusiasts still dial in with a modem to get online. Even users with the latest and greatest still have spare CPU time that could be used by CPU-app projects.

If you are running GPUGRID you are using between 35-45% of a single "CPU"; Milkyway (ATI) uses about 10% and SETI (nVidia) under 10%. Any system working for these clients can afford to pick up some of the rather poorer-paying but worthwhile CPU projects. Rosetta and SETI come to mind. Rosetta of course has no GPU apps, while SETI has a whole slew of CPU apps custom-made for various CPUs, but their GPU apps don't pay very well.

Here is an example of what could be done: I was recently given a first-generation i7 system by one of my kids. It was my first "i-anything", as I use Core 2 systems and upgrade only the GPUs every now and then. The i7-920 ran too hot, so I replaced it with a really cheap Xeon X5675, which dropped the power by 35 watts and added 2 more cores for a total of 12 hyperthreads. I moved my GTX 1070 to it, enabled all processors for use, and added the SETI "CPU only" project apps. So while I am racking up points helping GPUGRID, the SETI project gets a lot of units crunched that otherwise would not be done.
