Author |
Message |
Sphynx
Joined: 7 Dec 10 Posts: 18 Credit: 519,717,317 RAC: 3,198 Level
Scientific publications
|
Hope someone can help with this issue. I recently replaced an Asus ENGGTS450 TOP with a reference EVGA GTX 570. The 450 ran without a problem, but I couldn't resist the 570. Current nvidia drivers are installed. The 570 is in the pci-e x16 slot and works fine by itself. Since letting the 450 sit idle seemed like a waste when it could also be working on a gpugrid WU, I decided to install it in the spare pci-e x4 slot. Both GPUs work fine for about 6 - 7 hours, then I get a BSOD that names dxgmms1.sys as the problem. This happens consistently.
I've uninstalled and reinstalled the nvidia drivers, even rolled them back....no change. I've updated the bios. prime95 and memtest came back with no errors. No heat-related issues (mid to high 60c on both cards). When I remove the 450 everything works perfectly. Any ideas?
Al
AMD Phenom 970 BE
GigaByte GA-870A-UD3
Liquid Cooling
WD 640gb SATA-III
8gb G.Skill DDR3/1600MHz
Corsair HX850W
Evga GTX 570
Asus ENGGTS450 TOP (sometimes)
Win 7
|
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0 Level
Scientific publications
|
Try here,
http://nzgeek.org/blog/dxgmms1-sys-crash-blue-screen-of-death/
"There have been a number of issues reported experiencing Blue Screen Of Death (BSOD) crash on Vista and Windows 7 related to dxgmms1.sys and nvlddmkm.sys files..." |
|
|
Sphynx
|
No luck so far. The only thing I haven't tried is allowing windows to install default drivers. Would doing so cause a problem with cuda or running WU using cuda? |
|
|
skgiven
|
Do not use the Windows default drivers!
Your last 9 tasks completed without error and I didn’t see any of these types of messages,
# Device 0: "�"
# Clock rate: 0.00 GHz
So if you are still getting blue screens I would say your driver is probably not the issue.
What else are you doing?
Gaming, watching movies, rendering...
What else are you crunching? Alphas, Betas, heavy tasks... and on how many CPU cores?
Any bespoke Boinc settings?
Have you performed a disk check any time recently?
Is any part of your system overclocked or overvolted, especially the RAM?
Have you checked the RAM using the Win7 disk (or by creating a Microsoft RAM checking disk and booting to it)?
What are the CPU and chipset temps?
While the GTX570 mostly exhausts the heat, your GTS450 expels heat into the case so there is a fair chance the system is heating up elsewhere. You could leave the case door off for a while to see if that helps. If it does then you have a cooling issue, probably on another component.
PS. The best way to do a driver installation when you have multiple cards is to uninstall the driver, restart into safe mode, install the latest (or chosen) driver, and restart normally. Sometimes a driver removal tool helps to delete all the files fully. |
|
|
Sphynx
|
I don't do anything with my system except distributed computing and normal home stuff. No gaming or movies.
I crunch Seti, GPUGRID, Collatz and Milkyway - all types of WU.
Disk check done 3 days ago - no errors.
All overclocking has been returned to stock and never overvolted, including ram.
I've used memtest, prime95 and the windows performance checker on ram and cpus, no errors.
I'm using all 4 cpus for crunching and temps never exceed 60c - gpu temps vary but rarely get over 69c.
I'll try leaving the case open to see if that helps with a possible overheating issue I'm not aware of.
Do you think it would help if I just did GPUGRID to see if the BSOD returns and then add projects one at a time to see if the problem returns?
Frustrating! Thanks for the suggestions and we'll see what happens.
|
|
|
skgiven
|
"Do you think it would help if I just did GPUGRID to see if the BSOD returns and then add projects one at a time to see if the problem returns?"
Yes, I think that's a good idea, but if you already left the door open, leave things as they are for a while and see if that improves the situation. Doing so might confirm that there is a heat problem.
It's highly likely that some combinations of tasks increase heat. Depending on the tasks, the power requirement of a system can change dramatically; I have observed at least a 30W increase at times from the CPU alone, and a similar difference between lightweight and more intensive GPU tasks. So by limiting your system to GPUGrid-only tasks you will remove any possibility of the CPU producing a peak of heat at the same time as the GPU.
Using swan_sync=0 and freeing up a CPU core for every GPU can also improve stability as well as performance.
Good luck, |
|
|
Sphynx
|
Here's a bit of an update. I left the case open and temps did drop a bit, but by no more than 5c. Got another BSOD running all tasks after about 16 hours...an improvement of 10 hours. It had run all night without problems and I thought I was in the clear. Let me say, you wouldn't have wanted to be here when I got home and found the BSOD looking at me...it wasn't pretty!
I then suspended all tasks except for gpugrid and ran for 36 hours without a BSOD. I then started 1 project (along with gpugrid), Collatz Conjecture, which immediately started cpu and cuda wu's, and got the blue screen within 2 minutes. I have suspended all Collatz tasks and am now trying Einstein along with gpugrid. We are 2 hours in and everything is okay.
My plan is to add 1 project every 24 hours until I get another BSOD. At this point I don't believe Collatz made the cut.
I realize this is probably the wrong forum to ask this, but have you heard of this happening before?
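The one-project-at-a-time plan described in this post can be sketched in Python. This is only an illustration: `isolate_culprits` and the `is_stable` callback are hypothetical names, not part of BOINC. In practice `is_stable` stands for "run this set of projects for about 24 hours and note whether a BSOD occurs", which is a manual observation.

```python
from typing import Callable, Iterable, List, Tuple


def isolate_culprits(
    baseline: List[str],
    candidates: Iterable[str],
    is_stable: Callable[[List[str]], bool],
) -> Tuple[List[str], List[str]]:
    """Add one project at a time to a known-good baseline.

    A project that makes the trial set unstable is recorded as a culprit
    and left out; the remaining candidates are still tried afterwards.
    """
    active = list(baseline)      # projects known to run together without a crash
    culprits: List[str] = []     # projects whose addition caused a crash
    for project in candidates:
        trial = active + [project]
        if is_stable(trial):     # hypothetical check: e.g. 24 hours without a BSOD
            active = trial
        else:
            culprits.append(project)
    return active, culprits


if __name__ == "__main__":
    # Stand-in for the manual observation, mirroring what the thread reports.
    def observed(active: List[str]) -> bool:
        return "Collatz" not in active

    stable, bad = isolate_culprits(
        ["GPUGRID"], ["Collatz", "Einstein", "SETI", "Milkyway"], observed
    )
    print("stable set:", stable)
    print("suspect projects:", bad)
```

Adding a single project per step keeps each failure attributable to exactly one change, at the cost of one observation period per project.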
|
|
|
skgiven
|
Thanks for the update Al.
My guess is that Collatz is overtaxing the GPU, probably crashing the system because it requires too much voltage. If you have a power meter/tester it would be useful to see what difference, if any, there is in the power required to run the Collatz tasks compared with the GPUGrid tasks. Alternatively, you would observe higher GPU utilization and higher temperatures when running tasks that use more power, so you can check this indirectly using a tool such as EVGA Precision.
You should be aware, however, that GPUGrid runs several types of task and power usage can vary. The GIANNI_DHFR1000 tasks use the most power at GPUGrid; they have the highest GPU utilization and are the fastest.
You might consider overvolting (or underclocking) your GTS450. Some manufacturers cut the voltages too thin. This has been observed several times at GPUGrid and for other GPU projects.
Good luck, |
|
|
Sphynx
|
Just an update: Voltage doesn't seem to be a huge issue. With 2 GPUGRID tasks running I sit at about 400 watts. With 2 Collatz cuda23 tasks about 430...the 30 watts you mentioned. I don't feel this is a power issue, but I could be wrong. I have 2 P4 Win XP systems, each with a GT 430, and they run fine with Collatz. I ran my system (minus the Collatz cuda23 wu's) for several days without a problem. As soon as I allowed a cuda23 wu it failed again. Argh!
Anyway, the easiest solution is not always the most elegant. I just won't allow Collatz cuda23 on my GTX 570 or GTS 450; I'll dedicate them to GPUGRID.
Thanks for all the suggestions.
Al |
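The wall readings quoted in this post can be set against the power supply's rating with a little arithmetic. The sketch below assumes the Corsair HX850W is rated for 850 W of output (taken from the model name) and treats each meter reading as an upper bound on the DC load, since draw measured at the wall includes the PSU's own conversion losses. The function and constant names are made up for illustration.

```python
PSU_RATED_WATTS = 850  # assumption: rating inferred from the "HX850W" model name

# Approximate wall readings reported in the post above.
READINGS_WATTS = {
    "2 GPUGRID tasks": 400,
    "2 Collatz cuda23 tasks": 430,
}


def load_percent(wall_watts: float, rated_watts: float = PSU_RATED_WATTS) -> float:
    """Wall draw as a percentage of the PSU rating (an upper bound on DC load)."""
    return 100.0 * wall_watts / rated_watts


if __name__ == "__main__":
    for label, watts in READINGS_WATTS.items():
        print(f"{label}: {watts} W, about {load_percent(watts):.0f}% of rating")
    delta = READINGS_WATTS["2 Collatz cuda23 tasks"] - READINGS_WATTS["2 GPUGRID tasks"]
    print(f"difference between workloads: {delta} W")
```

Under these assumptions both workloads sit at roughly half of the rated capacity, which fits Al's view that total power draw is unlikely to be the limiting factor. A wall meter says nothing about per-card voltage, so it cannot rule that out.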
|
|
Sphynx
|
BTW - I did try underclocking both gpus prior to allowing the cuda23 tasks and it still failed. |
|
|
skgiven
|
The problem is the Collatz 2.3 WU's; Fermi cards don't run cuda 2.3 tasks.
|
|
|
Sphynx
|
Wow...that was easy. Thanks |
|
|