
Message boards : Number crunching : gtx1070 upgrade from 670 and getting too many errors

Author Message
JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,331,546,800
RAC: 0
Message 46205 - Posted: 17 Jan 2017 | 5:08:27 UTC

I thought the problem was SLI but even after removing the SLI cable on this host I am still getting errors.

It seems I even get credit when an error occurs. Why?

I upgraded to driver 376.33 and BOINC Manager 7.6.33 but still got errors. The last valid credit was on 375.95, but that version also had errors.

Of the 10 valid credits, every one has an error message:
-------

SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1965.
-------
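For anyone who wants to count how many of their results carry this message, here is a minimal Python sketch that scans saved stderr text for the SWAN line quoted above. The regular expression is built from that single line only, so the general message format is an assumption, and `find_swan_errors` is a hypothetical helper name, not part of BOINC or GPUGRID.

```python
import re

# Pattern inferred from the one error line quoted in this post.
# Assumption: other SWAN messages may be formatted differently.
SWAN_ERROR = re.compile(
    r"SWAN\s*:\s*FATAL\s*:\s*Cuda driver error (\d+) in file '([^']+)' in line (\d+)"
)


def find_swan_errors(stderr_text):
    """Return a list of (error_code, source_file, line_number) tuples."""
    return [
        (int(code), source, int(line))
        for code, source, line in SWAN_ERROR.findall(stderr_text)
    ]


if __name__ == "__main__":
    sample = "SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1965."
    print(find_swan_errors(sample))
```

Paste the stderr output of a task into a string (or read it from a saved file) and pass it to `find_swan_errors`; an empty list means the pattern was not found.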

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Message 46206 - Posted: 17 Jan 2017 | 9:46:48 UTC - in response to Message 46205.

It sounds like driver corruption. If a simple system restart does not help, then you should:
1. Download Display Driver Uninstaller
2. Suspend all projects in BOINC manager
3. Uninstall the NVIDIA driver through "Programs and Features" (in Windows 10 you can right click on the start button, and it's at the top)
4. Restart your PC in safe mode: hold Shift and click Restart. When your PC restarts to the "Choose an option" screen, select Troubleshoot -> Advanced options -> Startup Settings -> Restart. After your PC restarts again, you'll see a list of options. Press 4 or F4 to start your PC in Safe Mode, or 5 or F5 for Safe Mode with Networking if you need the Internet.
5. Start Display Driver Uninstaller, and remove all NVIDIA drivers
6. Restart your PC in normal mode
7. Install the latest NVIDIA driver
8. Restart your PC
9. Resume all projects in BOINC manager
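After the steps above, it can help to confirm which driver version each board actually reports. The following is a rough Python sketch that calls `nvidia-smi` with its CSV query options and parses the result. It assumes `nvidia-smi` is on the PATH, which may not be the case on every Windows install; `parse_gpu_lines` and `query_gpus` are hypothetical helper names.

```python
import shutil
import subprocess


def parse_gpu_lines(csv_text):
    """Parse 'name, driver_version' lines into (name, driver_version) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so a comma inside the GPU name is tolerated.
        name, _, version = line.rpartition(",")
        gpus.append((name.strip(), version.strip()))
    return gpus


def query_gpus():
    """Ask nvidia-smi for each GPU's name and driver version."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_lines(result.stdout)


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        # Assumption: the tool is reachable via PATH; otherwise give up politely.
        print("nvidia-smi not found on PATH")
    else:
        for name, version in query_gpus():
            print(f"{name}: driver {version}")
```

If the reported version does not match the driver you just installed, the clean re-install probably did not take effect.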

JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,331,546,800
RAC: 0
Message 46207 - Posted: 17 Jan 2017 | 13:57:54 UTC - in response to Message 46206.

Going to run one board at a time. Possibly a bad board. Should have tested each board first.
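Testing one board at a time can also be approximated in software, without pulling a card. As far as I know, the BOINC client accepts an `ignore_nvidia_dev` option in `cc_config.xml` that tells it to skip a given NVIDIA device (numbering starts at 0) after a client restart. Below is a small Python sketch that builds such a fragment; `build_cc_config` is a hypothetical helper, and the output should be merged by hand into any existing `cc_config.xml` rather than written over it.

```python
import xml.etree.ElementTree as ET


def build_cc_config(ignored_devices):
    """Build a cc_config.xml fragment that ignores the given NVIDIA device numbers."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    for dev in ignored_devices:
        # One <ignore_nvidia_dev> element per device the client should skip.
        ET.SubElement(options, "ignore_nvidia_dev").text = str(dev)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Example: leave device 0 active and ignore device 1.
    print(build_cc_config([1]))
```

This only hides a board from BOINC. It does not rule out a slot or power problem the way physically swapping the boards does.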

3de64piB5uZAS6SUNt1GFDU9d...
Joined: 20 Apr 15
Posts: 285
Credit: 1,102,216,607
RAC: 0
Message 46209 - Posted: 17 Jan 2017 | 15:43:35 UTC
Last modified: 17 Jan 2017 | 15:44:47 UTC

As Zoltan already wrote, a clean re-install of the drivers is essential. Having said this, and I don't know if it applies here as well, I have had a lot of trouble with Folding@Home, my GTX 1070 and recent NVIDIA drivers. The driver crashed all the time. Stepping back to driver 368.xx was the answer to the problem.

Well, GPUGRID is not F@H so maybe it's comparing apples and oranges ... but worth a try even so.
____________
I would love to see HCF1 protein folding and interaction simulations to help my little boy... someday.

PappaLitto
Joined: 21 Mar 16
Posts: 511
Credit: 4,617,042,755
RAC: 0
Message 46210 - Posted: 17 Jan 2017 | 16:00:14 UTC - in response to Message 46207.

Going to run one board at a time. Possibly a bad board. Should have tested each board first.


Were you running SLI 1070s?

JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,331,546,800
RAC: 0
Message 46227 - Posted: 18 Jan 2017 | 17:11:18 UTC - in response to Message 46210.
Last modified: 18 Jan 2017 | 17:11:55 UTC

Yes, I had SLI enabled but I pulled the cable connecting the two boards after reading about the problem. Maybe I had to go to the control panel and make additional changes before pulling the cable and rebooting??

Anyway, I completed and validated two WUs and have just swapped boards to check out the other 1070.

I also had a problem with Einstein on the 1070s running that older driver, and they suggested upgrading to the latest.

This system, an EVGA 132-YW-E178-FTW, has two PCIe x16 "ver 2" slots and one PCIe x16 "ver 1" slot, in slots 1, 3 and 2 respectively, according to the documentation. The 1070 is "ver 3" but seems to work OK in the "ver 2" slots, and supposedly the mobo can run three boards in SLI. This system will not boot unless the two video boards are in the "ver 2" slots: I get the error message "SLI must be in slots 1 and 3" and the BIOS stays in POST. This occurs whether or not the SLI cable is connected. I do not use SLI. I do a lot of video conversions, and the programs I use require CUDA to perform the conversions.
