Experimental QMML WUs

Message boards : News : Experimental QMML WUs

Author	Message
Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48432 - Posted: 19 Dec 2017 \| 15:19:45 UTC
	We are experimenting with CPU workunits. Right now they are Linux only. Please note that you may need to install "gcc" on your machine for them to work. More details in the Multicore forum!

Keith Myers Send message Joined: 13 Dec 17 Posts: 1288 Credit: 5,105,956,959 RAC: 9,103,887 Level Scientific publications	Message 48435 - Posted: 19 Dec 2017 \| 17:37:15 UTC Last modified: 19 Dec 2017 \| 18:09:51 UTC
	I have not been able to get any of the QC tasks for a week now. I have GCC installed. A post said the most recent tasks have had their priority lowered to allow "low reliability" hosts to get work. I am a new member with no credits, since I have been unable to get any work. Is my no-credit status still preventing me from getting work? [Edit] I guess this place is just like SETI: make a complaint in the forums, and the servers read it and make adjustments. I have a Long GPU and a QC task now.

mmonnin Send message Joined: 2 Jul 16 Posts: 332 Credit: 4,019,346,065 RAC: 12,557,672 Level Scientific publications	Message 48443 - Posted: 20 Dec 2017 \| 3:38:26 UTC - in response to Message 48435.
	Well, GPU work is scarce. I run out nearly every day. And CPU work is still under testing. At the moment there is no work for anyone. http://www.gpugrid.net/server_status.php

gianni Send message Joined: 11 Jul 08 Posts: 18 Credit: 105,098 RAC: 0 Level Scientific publications	Message 48447 - Posted: 20 Dec 2017 \| 9:38:51 UTC - in response to Message 48443.
	As soon as it is working properly, there will be lots of work.

Mathieu Send message Joined: 16 Nov 16 Posts: 2 Credit: 65,554,798 RAC: 0 Level Scientific publications	Message 48450 - Posted: 20 Dec 2017 \| 11:34:21 UTC
	Well, I am running GPUGRID and Milkyway@home on Windows to heat my bedroom in a less wasteful way than a stupid electric resistance heater. Unfortunately, since the beginning of this experiment GPUGRID doesn't seem to use much GPU on Windows ^^ So as of this morning, and until further notice, I am switching fully to Milkyway@home. I am freezing here!

PappaLitto Send message Joined: 21 Mar 16 Posts: 511 Credit: 4,642,617,755 RAC: 1,961,666 Level Scientific publications	Message 48452 - Posted: 20 Dec 2017 \| 12:03:24 UTC - in response to Message 48450.
	I've been heating my home with BOINC for years, and I've actually found GPUGrid to have a higher power draw than most projects. There are multiple ways to increase GPU utilization, which increases not only GPU power consumption but also CPU usage a bit. Enabling SWAN_SYNC can yield somewhat better results: search for "environment variables" in Windows search, click Environment Variables at the bottom right, add a variable named SWAN_SYNC, and change its value from 0 to 1. Milkyway@home uses the GPU in a very efficient way, taking data only from the GPU cache rather than the GDDR5 memory. The memory uses 15+ watts and is basically unused by the Milkyway application. If you want more heat from a backup project, consider switching to Einstein@home. Einstein also uses quite a bit of CPU to support the GPU computing, so this will help with heating. Another option is to buy another NVIDIA GPU, if your motherboard is large enough to support another card. This will definitely be enough to heat your room.
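	As a quick check that the variable described above is visible to newly started programs, a minimal Python sketch could look like the following. The helper name `swan_sync_enabled` is made up for illustration, and treating the value "1" as "enabled" simply follows the description in this post, not any official GPUGRID documentation.

```python
import os


def swan_sync_enabled(env=None):
    """Return True if SWAN_SYNC is set to "1" in the given environment mapping.

    Hypothetical helper: "1" meaning "enabled" is an assumption taken from
    the forum post, not from project documentation.
    """
    if env is None:
        env = os.environ
    return env.get("SWAN_SYNC", "").strip() == "1"


if __name__ == "__main__":
    state = "set to 1" if swan_sync_enabled() else "not set to 1"
    print(f"SWAN_SYNC is {state} for this process")
```

	A process only sees the environment variables that existed when it was started, so the BOINC client most likely needs to be restarted after the change before tasks pick it up.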

mmonnin Send message Joined: 2 Jul 16 Posts: 332 Credit: 4,019,346,065 RAC: 12,557,672 Level Scientific publications	Message 48457 - Posted: 20 Dec 2017 \| 13:46:58 UTC
	PrimeGrid will heat up a GPU. The GPU boost clock on my Pascal cards is lowest there, meaning it runs the card hard and hits thermal limits. Math projects tend to be the best, as they can easily fork to utilize many cores. Linux will utilize a processor better in many instances. And if the GPU is fast enough, two GPUGrid tasks can be run in parallel. The longest GPUGrid task at the moment lasts only 16-17 hours at 2x on a 1070. MW is double precision, so some of the best cards for MW are still older AMD cards such as the 7970 and 280X. Those are heaters for sure. I wonder how well the V100 will run DP projects.

Mathieu Send message Joined: 16 Nov 16 Posts: 2 Credit: 65,554,798 RAC: 0 Level Scientific publications	Message 48470 - Posted: 21 Dec 2017 \| 10:09:32 UTC Last modified: 21 Dec 2017 \| 10:13:18 UTC
	Thanks for the responses, PappaLitto and mmonnin. Everything seems to work again this morning for no particular reason. I did some tests, and GPUGRID + Milkyway@home seems a good mix for me: GPUGRID draws more GPU power but uses little CPU (~10%), and its GPU usage doesn't get over 80% (maybe caused by memory fetch delays during computation?), while Milkyway@home uses nearly 100% of both CPU and GPU but draws less GPU power, for the reason PappaLitto pointed out. It may be a hardware problem with my GPU: last winter I had no problem running BOINC for weeks, but now from time to time I get screen freezes and automatic restarts without any error message.
	+-----------------------------------------------------------------------------+
	| NVIDIA-SMI 388.59                 Driver Version: 388.59                    |
	|-------------------------------+----------------------+----------------------+
	| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
	| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
	|===============================+======================+======================|
	|   0  GeForce GTX 970    WDDM  | 00000000:01:00.0  On |                  N/A |
	| 75%   80C    P2   135W / 180W |   1345MiB /  4096MiB |     91%      Default |
	+-------------------------------+----------------------+----------------------+

Failboat Send message Joined: 24 Apr 16 Posts: 2 Credit: 35,317,258 RAC: 0 Level Scientific publications	Message 48479 - Posted: 21 Dec 2017 \| 23:58:30 UTC
	Had an 8-core QC unit named c53-TONI_QMML314rst-0-1-RND8160. After 2 hours elapsed and 15.5 hours of CPU time it was at 50% complete, but the time remaining incremented upwards by 1 with every elapsed second, rather than decreasing. Aborted it.
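	One possible explanation for the climbing estimate, assuming the remaining time is a simple linear extrapolation from elapsed time and fraction done (BOINC's real estimator may well differ): if progress is frozen at 50%, the estimate equals the elapsed time, so it grows by one second per second. The function name below is illustrative only.

```python
def remaining_seconds(elapsed_s, fraction_done):
    """Linear extrapolation: total = elapsed / fraction, remaining = total - elapsed."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1.0 - fraction_done) / fraction_done


# Progress stuck at 50% while the clock keeps running.
two_hours = 2 * 3600
for extra in (0, 1, 2):
    estimate = remaining_seconds(two_hours + extra, 0.5)
    print(f"elapsed {two_hours + extra} s -> remaining {estimate:.0f} s")
```

	Under the same assumption, a task frozen at a higher percentage would still show a rising estimate, only more slowly.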

talister Send message Joined: 4 Aug 09 Posts: 1 Credit: 625,371,576 RAC: 80,858 Level Scientific publications	Message 48481 - Posted: 22 Dec 2017 \| 18:34:19 UTC - in response to Message 48479.
	Had two units (c71-TONI_QMML314rst-0-1-RND2036_2 and c95-TONI_QMML314rst-0-1-RND7455_2) which reached 69.568% and then stalled there, for total runtimes of 26 and 21 hours respectively. This is on an 8-thread i7-6700K running CentOS (effectively RHEL) 7.4. Have aborted them.

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48482 - Posted: 22 Dec 2017 \| 20:02:44 UTC - in response to Message 48481.
	I have found that I have to run only one work unit at a time to prevent them from stalling.

Trotador Send message Joined: 25 Mar 12 Posts: 103 Credit: 9,919,689,893 RAC: 8,198,806 Level Scientific publications	Message 48490 - Posted: 24 Dec 2017 \| 6:42:50 UTC
	Tasks seem to return to 0% completion after a BOINC restart on Ubuntu.

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48491 - Posted: 24 Dec 2017 \| 7:52:21 UTC - in response to Message 48490.
	They return to 1% complete after a reboot for me on Ubuntu. But then, after about half an hour, they go back to where they left off, more or less. I think it is just a startup period that they have to get through first.

Toni Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 9 Dec 08 Posts: 1006 Credit: 5,068,599 RAC: 0 Level Scientific publications	Message 48492 - Posted: 24 Dec 2017 \| 8:05:09 UTC - in response to Message 48491. Last modified: 24 Dec 2017 \| 8:05:40 UTC
	@jim thanks! I think I understand now. There is a startup phase (actually two), during which the latest version of the software is checked online. This should be relatively fast, but occurs at low %. Then the loop over the molecules contained in the task is resumed. The progress bar is currently updated only at the end of each completed molecule. So I can confirm that, apart from the progress bar, the behaviour is correct and does not imply that the task is repeating work already done.
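	The behaviour described above can be pictured with a toy model. This is only a sketch under assumptions: the function `run_task`, the checkpoint dictionary, and the 1% startup figure are invented for illustration and are not the actual QMML application code.

```python
def run_task(molecules, checkpoint, report):
    """Toy model: resume a per-molecule loop, reporting progress per molecule.

    `checkpoint` is a dict holding the count of molecules already finished.
    `report` is called with the fraction shown on the progress bar.
    """
    # Startup phase: the bar shows a low value regardless of saved progress.
    report(0.01)

    done = checkpoint.get("done", 0)
    for index in range(done, len(molecules)):
        _ = sum(ord(c) for c in molecules[index])  # stand-in for the real work
        checkpoint["done"] = index + 1
        # Progress is only updated after a molecule is fully completed.
        report(checkpoint["done"] / len(molecules))
    return checkpoint


# Restart a four-molecule task whose checkpoint says two are already finished.
shown = []
state = run_task(["H2O", "CH4", "NH3", "CO2"], {"done": 2}, shown.append)
print(shown)
```

	In this model the bar drops to the startup value after a restart and only jumps back past the old position once the next molecule finishes, even though the first two molecules are never recomputed.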

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48495 - Posted: 24 Dec 2017 \| 12:38:07 UTC - in response to Message 48492.
	By the way, I have found that while running two work units at once seems to always cause problems with stuck work units and errors, running only one at a time does not solve all problems. They often still get stuck, but a reboot fixes it. So it seems to be a necessary but not sufficient condition for my machines to work properly.

Trotador Send message Joined: 25 Mar 12 Posts: 103 Credit: 9,919,689,893 RAC: 8,198,806 Level Scientific publications	Message 48505 - Posted: 25 Dec 2017 \| 6:51:35 UTC
	Units stuck either at 78.698% or 69.568% with over 1 day processed time and remaining processing time increasing. Should I let them continue or abort?

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48507 - Posted: 25 Dec 2017 \| 13:36:50 UTC - in response to Message 48505. Last modified: 25 Dec 2017 \| 13:53:04 UTC
	A reboot usually fixes them, but as noted below they will return to a low value before you see any progress past the point where you left them. EDIT: But I am wondering whether that is "elapsed time", in which case one day is probably too long, or else "CPU time" (shown in parentheses in BoincTasks). One day of CPU time is not unreasonably long for those percentages, and I would just wait a couple of hours to see if you make progress past those points. They do get stuck there for a while, but only temporarily. (That is the problem with running this project. You never know if it is working normally or not.)
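	To make the elapsed-time versus CPU-time distinction concrete: on a multithreaded task, CPU time accumulates across all busy threads, so it can run far ahead of wall-clock time. A small sketch, using the 2 hours elapsed and 15.5 hours of CPU time reported earlier in this thread for an 8-core unit (the function name is illustrative, and real tasks will not keep a constant ratio):

```python
def average_busy_threads(cpu_hours, elapsed_hours):
    """Average number of threads kept busy = CPU time / elapsed time."""
    if elapsed_hours <= 0:
        raise ValueError("elapsed_hours must be positive")
    return cpu_hours / elapsed_hours


busy = average_busy_threads(15.5, 2.0)
print(f"about {busy:.2f} threads busy on average")

# At that ratio, one day of CPU time corresponds to far less wall-clock time.
print(f"24 h of CPU time is roughly {24 / busy:.1f} h elapsed")
```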

Jim1348 Send message Joined: 28 Jul 12 Posts: 819 Credit: 1,591,285,971 RAC: 0 Level Scientific publications	Message 48508 - Posted: 25 Dec 2017 \| 16:14:11 UTC - in response to Message 48507.
	I just rebooted one myself that was stuck at 78.698% too long, and it failed. You never know what you will get. http://www.gpugrid.net/result.php?resultid=16790478

Tiger Send message Joined: 30 Jan 15 Posts: 7 Credit: 402,017,837 RAC: 0 Level Scientific publications	Message 48566 - Posted: 31 Dec 2017 \| 16:38:01 UTC - in response to Message 48450.
	:D
