Message boards : Number crunching : GPUGRID WUs hang!

w1hue
Joined: 28 Sep 09
Posts: 13
Credit: 54,140,850
RAC: 69,581
Message 51055 - Posted: 20 Dec 2018 | 0:08:03 UTC

I have been having problems with one machine -- Intel CPU, Nvidia GTX 750 Ti GPU running Win10. GPUGRID WUs run for a while and then hang -- GPU temp drops and load stays at zero. Sometimes it will recover after a short while; other times it can remain in this state for days! Suspending the WU and then letting it run again after an hour or so allows it to recover from the point where it hung -- for a while, anyway. Resetting the project did not help. I have seen this happen occasionally on my two other machines with GT 730 GPUs, but only rarely. All three are running the same NVIDIA driver software.

I am about ready to remove the project from the problem machine as it is making poor use of its compute capability. No problem running other GPU projects on that machine.



w1hue
Joined: 28 Sep 09
Posts: 13
Credit: 54,140,850
RAC: 69,581
Message 51065 - Posted: 22 Dec 2018 | 20:48:16 UTC

So... No one else has experienced this problem? Really??

Zalster
Joined: 26 Feb 14
Posts: 139
Credit: 3,199,874,135
RAC: 981,078
Message 51066 - Posted: 22 Dec 2018 | 20:58:25 UTC - in response to Message 51065.

Have you tried downgrading the drivers to see if that resolves the issues?

kksplace
Joined: 4 Mar 18
Posts: 25
Credit: 5,192,825
RAC: 0
Message 51067 - Posted: 22 Dec 2018 | 22:56:26 UTC - in response to Message 51065.

This sounds similar to a problem I had when starting GPUGrid with a Windows machine and a 1070 card. As with you, the problem was only with GPUGrid, and not other projects.

In my case the problem was with the BOINC "Suspend when non-BOINC usage is above.." setting. Several people on this forum (see link below) recommended to completely disable this feature. Put simply, it worked for me.

I also unchecked the "Suspend when computer is in use" option. If I need to, I just "Snooze GPU" in BOINC when necessary. This seems to hold the GPUGrid WUs nicely and let them restart vs. the constant interruption with the "Suspend when..." features.

http://www.gpugrid.net/forum_thread.php?id=4699#48749

...and, no, I do not know why the GPUGrid WUs seem to be more prone to this.

Hopefully this also helps your setup.
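For anyone who would rather set this outside the BOINC Manager dialog, here is a minimal sketch that builds a local preferences override with both "Suspend when..." features turned off. The tag names (`run_if_user_active`, `run_gpu_if_user_active`, `suspend_cpu_usage`) and the file name `global_prefs_override.xml` are as I recall them from the BOINC client documentation, and the function name `build_override` is made up for this example, so please check them against your client version before relying on it.

```python
def build_override(run_if_user_active=True,
                   run_gpu_if_user_active=True,
                   suspend_cpu_usage=0.0):
    """Return the text of a BOINC local preferences override.

    Assumptions (verify against your BOINC version):
      - suspend_cpu_usage is the "Suspend when non-BOINC CPU usage is
        above X%" threshold, and 0 means "no limit".
      - run_if_user_active / run_gpu_if_user_active set to 1 mean
        "keep computing while the computer is in use".
    """
    lines = [
        "<global_preferences>",
        f"   <run_if_user_active>{int(run_if_user_active)}</run_if_user_active>",
        f"   <run_gpu_if_user_active>{int(run_gpu_if_user_active)}</run_gpu_if_user_active>",
        f"   <suspend_cpu_usage>{suspend_cpu_usage:.6f}</suspend_cpu_usage>",
        "</global_preferences>",
    ]
    # One tag per line, which keeps the file easy to read and edit by hand.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the text only; saving it into the BOINC data directory is left
    # to the user, since that location differs between installations.
    print(build_override())
```

The output would be saved as `global_prefs_override.xml` in the BOINC data directory, after which the client has to be told to re-read its local preferences (or simply be restarted). Passing `suspend_cpu_usage=80` instead would keep the feature on but with a higher threshold.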

w1hue
Joined: 28 Sep 09
Posts: 13
Credit: 54,140,850
RAC: 69,581
Message 51069 - Posted: 23 Dec 2018 | 15:56:58 UTC - in response to Message 51067.

In my case the problem was with the BOINC "Suspend when non-BOINC usage is above.." setting. Several people on this forum (see link below) recommended to completely disable this feature. Put simply, it worked for me.

Thanks -- I'll give that a try!

w1hue
Joined: 28 Sep 09
Posts: 13
Credit: 54,140,850
RAC: 69,581
Message 51070 - Posted: 23 Dec 2018 | 15:58:11 UTC - in response to Message 51066.

Have you tried downgrading the drivers to see if that resolves the issues?

Yep -- down, up and sideways. No difference.

Zalster
Joined: 26 Feb 14
Posts: 139
Credit: 3,199,874,135
RAC: 981,078
Message 51072 - Posted: 23 Dec 2018 | 19:53:59 UTC - in response to Message 51070.

Is the graphic card overclocked other than factory settings?

Retvari Zoltan
Joined: 20 Jan 09
Posts: 1984
Credit: 13,752,772,069
RAC: 12,340,195
Message 51073 - Posted: 23 Dec 2018 | 23:57:48 UTC - in response to Message 51072.

Is the graphic card overclocked other than factory settings?

Even a too bold factory overclock could cause the GPUGrid tasks to fail or hang.

w1hue
Joined: 28 Sep 09
Posts: 13
Credit: 54,140,850
RAC: 69,581
Message 51074 - Posted: 24 Dec 2018 | 19:45:12 UTC - in response to Message 51073.

Is the graphic card overclocked other than factory settings?

Even a too bold factory overclock could cause the GPUGrid tasks to fail or hang.

The 750 Ti and 730 cards are mildly overclocked, but the hangs are so intermittent that I doubt that is the cause. I followed an earlier suggestion to not suspend BOINC when the computer is in use or under heavy usage, and so far so good -- but it probably needs to run for a few weeks before I really know if that solves the problem. If I still get hang-ups, I'll restore one of the 730s to factory settings and see what that does.

Thanks

Erich56
Joined: 1 Jan 15
Posts: 515
Credit: 2,697,368,969
RAC: 2,555,433
Message 51118 - Posted: 27 Dec 2018 | 15:04:03 UTC - in response to Message 51067.

This sounds similar to a problem I had when starting GPUGrid with a Windows machine and a 1070 card. As with you, the problem was only with GPUGrid, and not other projects.

In my case the problem was with the BOINC "Suspend when non-BOINC usage is above.." setting. Several people on this forum (see link below) recommended to completely disable this feature. Put simply, it worked for me.

I also unchecked the "Suspend when computer is in use" option. If I need to, I just "Snooze GPU" in BOINC when necessary. This seems to hold the GPUGrid WUs nicely and let them restart vs. the constant interruption with the "Suspend when..." features.

http://www.gpugrid.net/forum_thread.php?id=4699#48749

...and, no, I do not know why the GPUGrid WUs seem to be more prone to this.

Hopefully this also helps your setup.


On one of my PCs with a GTX 750 Ti inside, I had a long-standing problem: all of a sudden the GPUGRID task would pause, for unknown reasons. To get it going again, I had to put the task on "pause" and then "continue" after about 2 minutes.
When I tried other GPU tasks (like Folding@Home), this was not the case.

Rather by coincidence I found out that "Suspend when non-BOINC usage is above..." was set at 50%. After I raised it to 80%, the problem did not reoccur.

kain
Joined: 3 Sep 14
Posts: 143
Credit: 346,401,591
RAC: 537,771
Message 51119 - Posted: 27 Dec 2018 | 15:36:52 UTC

This problem is very old and was reported many times by many people. With no working solution. I came up with one - I'm crunching GPUGRID on Ubuntu, and the Windows machines are working on folding@home... Suspending and resuming the task works, but it has to be done manually. My computers are headless, so I can't do that.
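The manual suspend/resume described in this thread could in principle be scripted on a headless box. What follows is only a rough sketch under several assumptions: that `nvidia-smi` and `boinccmd` are on the PATH, that `boinccmd` is allowed to talk to the local client (it may need to be run from the BOINC data directory or be given the RPC password), and that a GPU sitting near 0% load for a number of consecutive polls means the task is hung. The function names, the 5% idle cutoff, the polling interval and the 120-second pause are all illustrative choices, not anything the project recommends.

```python
import subprocess
import time
from collections import deque


def gpu_utilization(gpu_id=0):
    """Return the GPU load in percent, as reported by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={gpu_id}",
         "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip().splitlines()[0])


def looks_hung(samples, window, idle_below=5):
    """True if the last `window` samples are all below `idle_below` percent."""
    recent = list(samples)[-window:]
    return len(recent) == window and all(s < idle_below for s in recent)


def kick_task(project_url, task_name, pause_s=120, run=subprocess.run):
    """Suspend the task, wait, then resume it (the manual fix, automated)."""
    run(["boinccmd", "--task", project_url, task_name, "suspend"], check=True)
    time.sleep(pause_s)
    run(["boinccmd", "--task", project_url, task_name, "resume"], check=True)


def watchdog(project_url, task_name, poll_s=60, window=10):
    """Poll GPU load forever; kick the task after `window` idle polls in a row."""
    samples = deque(maxlen=window)
    while True:
        samples.append(gpu_utilization())
        if looks_hung(samples, window):
            kick_task(project_url, task_name)
            samples.clear()  # start counting afresh after the kick
        time.sleep(poll_s)
```

It would be started with something like `watchdog(project_url, task_name)`, where both arguments are placeholders you have to fill in: the project URL exactly as your client lists it, and the name of the running task. Since the task name changes with every WU, a real version would also have to look up the current name first (for example from the output of `boinccmd --get_tasks`), and it cannot tell a hang apart from a GPU that is idle simply because there is no work.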

Erich56
Joined: 1 Jan 15
Posts: 515
Credit: 2,697,368,969
RAC: 2,555,433
Message 51120 - Posted: 27 Dec 2018 | 15:53:05 UTC - in response to Message 51119.

This problem is very old and was reported many times by many people. With no working solution.

well, as I mentioned: after I had raised the percentage in the BOINC settings, the problem did not reoccur.

kain
Joined: 3 Sep 14
Posts: 143
Credit: 346,401,591
RAC: 537,771
Message 51124 - Posted: 27 Dec 2018 | 23:52:31 UTC - in response to Message 51120.

This problem is very old and was reported many times by many people. With no working solution.

well, as I mentioned: after I had raised the percentage in the BOINC settings, the problem did not reoccur.


Ok, with no working solution for me. I have 100% all the time.

w1hue
Joined: 28 Sep 09
Posts: 13
Credit: 54,140,850
RAC: 69,581
Message 51126 - Posted: 28 Dec 2018 | 8:14:59 UTC

Ok, with no working solution for me. I have 100% all the time.

Sounds like a different problem. In my case, GPU usage goes to 0 and it just sits there. But I no longer suspend BOINC tasks when the computer is in use or when CPU usage exceeds a certain level -- and so far, it appears to be working.
