Message boards : Graphics cards (GPUs) : More bad WUs? ------ KASHIF_HIVPR_auto_spawn

Beyond
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level: Tyr
Message 17094 - Posted: 17 May 2010 | 18:17:43 UTC

Seems like many if not most of the KASHIF_HIVPR_auto_spawn WUs are failing :-(

Betting Slip
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level: Phe
Message 17095 - Posted: 17 May 2010 | 18:54:44 UTC - in response to Message 17094.

Seems like many if not most of the KASHIF_HIVPR_auto_spawn WUs are failing :-(


Would you like to point to a few to back up that statement?


____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

Beyond
Message 17096 - Posted: 17 May 2010 | 19:20:06 UTC - in response to Message 17095.

Seems like many if not most of the KASHIF_HIVPR_auto_spawn WUs are failing :-(

Would you like to point to a few to back up that statement?

Looks like they were just released today, but here's some of the results I've found so far:

http://www.gpugrid.net/workunit.php?wuid=1483784
http://www.gpugrid.net/workunit.php?wuid=1483852
http://www.gpugrid.net/workunit.php?wuid=1483846
http://www.gpugrid.net/workunit.php?wuid=1483936
http://www.gpugrid.net/workunit.php?wuid=1483879
http://www.gpugrid.net/workunit.php?wuid=1483947
http://www.gpugrid.net/workunit.php?wuid=1483953
http://www.gpugrid.net/workunit.php?wuid=1483863
http://www.gpugrid.net/workunit.php?wuid=1483862
http://www.gpugrid.net/workunit.php?wuid=1483861
http://www.gpugrid.net/workunit.php?wuid=1483787
http://www.gpugrid.net/workunit.php?wuid=1483792
http://www.gpugrid.net/workunit.php?wuid=1483799
http://www.gpugrid.net/workunit.php?wuid=1483805

Seems like whenever someone reports a problem on this forum, people get all defensive. These bad WUs are simple to find if you take 5 minutes to look. They're so new that most of them have only one failure so far, but I've only found 2 that completed. I'd post a lot more but it's such a pain to add URLs on this forum...


Betting Slip
Message 17097 - Posted: 17 May 2010 | 19:40:30 UTC - in response to Message 17096.
Last modified: 17 May 2010 | 19:42:57 UTC

Thanks for that. They all appear to have failed within a few seconds. I've got one on one of my machines that has been running for over 3 hours now, so we'll see what happens. It had been to someone else first and failed within a few seconds there as well.

Here

Beyond
Message 17098 - Posted: 17 May 2010 | 19:51:13 UTC - in response to Message 17097.

Could be that it's a big coincidence but thought I'd report it since pretty much all I was finding was failed ones.

Betting Slip
Message 17099 - Posted: 17 May 2010 | 19:53:02 UTC - in response to Message 17096.

Just had a quick look at 6 of those units you posted and the machines that errored are not great examples of anything other than they produce a lot of errored WU's of all types.
Couldn't be bothered to look through the whole list.




Beyond
Message 17100 - Posted: 17 May 2010 | 20:02:35 UTC

The reason I noticed it at all was that the one that failed for me was on a GPU that seldom ever fails, so started looking at the results from the top RAC machines. BTW, the above comment about "defensiveness" was not aimed at you :-)

Betting Slip
Message 17101 - Posted: 17 May 2010 | 20:12:47 UTC - in response to Message 17100.

It's okay, it didn't bother me, as I'm the last person to defend a project if they've got it wrong. :)

I'll let you know if I have any problems with these WU's.




skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level: His
Message 17102 - Posted: 17 May 2010 | 22:30:01 UTC - in response to Message 17101.

I for one have had a nightmare when it comes to 6.72 WU failures.
We eventually got it sorted, well sort of - the 197.xx drivers were to blame on my Vista x64 machine with four GT240s, but I tried so many different drivers on my GTX260sp216 that in the end I just gave up.
It is on Win7 x64 - the same as your two GTX275's!
My GTX260 is a good card, and kept failing immediately, even when natively clocked and temps under 60 deg C. I see you dropped your clocks as well, just in case.

I guess you are having a similar problem.

Beyond
Message 17104 - Posted: 17 May 2010 | 22:56:32 UTC - in response to Message 17102.

I for one have had a nightmare when it comes to 6.72 WU failures.
We eventually got it sorted, well sort of - the 197.xx drivers were to blame on my Vista x64 machine with four GT240s, but I tried so many different drivers on my GTX260sp216 that in the end I just gave up.
It is on Win7 x64 - the same as your two GTX275's!
My GTX260 is a good card, and kept failing immediately, even when natively clocked and temps under 60 deg C. I see you dropped your clocks as well, just in case.

I guess you are having a similar problem.

Neither of us has a GTX 275, let alone 2. But if you'd like to send me a couple I'd be happy to accept them as a gift :-)

Betting Slip
Message 17105 - Posted: 17 May 2010 | 23:48:11 UTC - in response to Message 17102.

My 2 GT240's are working fine with that setup SK but my 2 remote machines (core2 E4300 and Dual Core E6300) are giving me problems. Why is it always remote machines you have problems with?

You started OK with your 4 GT240's, so I don't know what the problem could be. I have mine clocked at Core 640, Memory 2000, and Shaders 1580 for my 2 GT240 GDDR5 on my Quad, and Core 630, Memory 840, and Shaders 1580 on my GDDR3 machines. All clocks are as GPU-Z shows them.



Betting Slip
Message 17109 - Posted: 18 May 2010 | 6:47:04 UTC

1 success and 1 failure to date for these units "KASHIF_HIVPR_auto_spawn"

Fail

Success




Snow Crash
Joined: 4 Apr 09
Posts: 450
Credit: 539,316,349
RAC: 0
Level: Lys
Message 17112 - Posted: 18 May 2010 | 8:47:21 UTC

My really stable 295 failed 12 of these with the following error:

ERROR: file ntnbrlist.cpp line 63: Insufficent memory available for pairlists. Set pairlistdist to match the cutoff


It also failed 2 TONI WUs (also 6.72) with an error:
SWAN: FATAL : swanMalloc failed

I am now working on 2 more TONIs which are OK after 2+ hours so I think they will be fine.


____________
Thanks - Steve

ftpd
Joined: 6 Jun 08
Posts: 152
Credit: 328,250,382
RAC: 0
Level: Asp
Message 17113 - Posted: 18 May 2010 | 8:55:00 UTC

Task 2347063 OK with gtx470.
____________
Ton (ftpd) Netherlands

skgiven
Message 17116 - Posted: 18 May 2010 | 9:59:44 UTC - in response to Message 17104.
Last modified: 18 May 2010 | 10:18:58 UTC

Beyond, I meant GTX260. On which we were/are both having problems with 6.72 tasks under Win7.
I tried several drivers and clients but the 6.72 WUs still failed (running native).

KASHIF_HIVPR WU are also failing for me on my GT240s. This is after going back to the 19621 driver (which is still working for all TONI_HERG 6.72 WUs).

Beyond
Message 17123 - Posted: 18 May 2010 | 15:18:30 UTC - in response to Message 17116.

Beyond, I meant GTX260. On which we were/are both having problems with 6.72 tasks under Win7.
I tried several drivers and clients but the 6.72 WUs still failed (running native).

Actually if you look at that machine, it really has a GTX 260 and a GT 240. For some reason the BOINC server code lists 2 GTX 260s. I reported the problem on the BOINC Dev list. It's running XP64 (not Win7) and NV 197.45. Also it's not having problems with v6.72 WUs. There have been 86 successful, 2 failures on the GT 240 and 2 on the GTX 260. Those were my fault as I was experimenting with higher shader clocks. For me the v6.72 WUs have been FAR more reliable than v6.03. I've been following your messages about various drivers/OSes/BOINC vers with the GT 240. I have 3 of the GDDR5 models running for a long time and have had no issues with any of them with any BOINC version. They've run on XP32, XP64, Win7-32 and Win7-64 machines at various times with no problems. They've run on NV v195.62, 196.21 and v197.45 with no problems. They've run on a large variety of BOINC clients from 6.10.18 up to v6.10.45 with no problems. Will be going to v6.10.56 today on some. Have you tried pulling one of your GT 240 cards from that 4x machine yet?

KASHIF_HIVPR WU are also failing for me on my GT240s. This is after going back to the 19621 driver (which is still working for all TONI_HERG 6.72 WUs).

I notice that most of the KASHIF_HIVPR_auto_spawn WUs that I listed above are now validating with subsequent machines so I probably jumped the gun by posting this thread. Like I said before, trying to be proactive. Seems they run fine on some machines, not so fine on others.

liveonc
Joined: 1 Jan 10
Posts: 292
Credit: 41,567,650
RAC: 0
Level: Val
Message 17126 - Posted: 18 May 2010 | 16:00:18 UTC
Last modified: 18 May 2010 | 16:07:54 UTC

What's the temps on these failed WU's? I know that there is a need for speed & it's always nicer to get these WU's done faster. But it's "my personal opinion" that GPUGRID is becoming more & more elitist. It's Enthusiast friendly, Consumer unfriendly when WU's become so aggressive that even stock clocked GPU's are failing due to overheating. Not everybody has water cooled systems. I've moved around just about every cable to enhance the airflow of my PC's, but now that Linux is so good & Windows is so slow, I'm almost forced to stop using Linux because my GPU's are running at 80-90 degrees on Linux no matter how they're clocked or how great the airflow is. I've got 2 PC's in enclosures that aren't meant for this, & the only way to improve airflow there, is to get new enclosures.

If GPUGRID keeps on improving their WU's for Windows, I soon won't be able to use Windows either. But that's no reason not to. I'm just itching for an option to throttle down the use of GPU, as is possible to set a max CPU use of x%.

Snow Crash
Message 17127 - Posted: 18 May 2010 | 16:27:23 UTC - in response to Message 17126.

Heat is not a problem for me ... the 295 that has received lots of errors is running a cool 75 degrees C.

The errors as I posted above appear to be memory allocation issues ... with
1896 MB shared between both cards I doubt it really is "insufficient".

Perhaps CUDA or the driver is reporting incorrectly, or there is a bug at some internal condition? That will not be easy for the GPUGrid devs to identify, but because the errors are always the same that might point in a particular direction for investigation.

When HERG WUs fail (some do pass) it is always the same error =
"SWAN: FATAL : swanMalloc failed"

When KASHIF WUs fail (I have not had any success yet with them) it is always the same error =
"ERROR: file ntnbrlist.cpp line 63: Insufficent memory available for pairlists. Set pairlistdist to match the cutoff."

This is a dedicated cruncher so the only other thing going on is WCG which should not matter.

Beyond
Message 17129 - Posted: 18 May 2010 | 16:39:17 UTC - in response to Message 17126.
Last modified: 18 May 2010 | 16:41:03 UTC

What's the temps on these failed WU's? I know that there is a need for speed & it's always nicer to get these WU's done faster. But it's "my personal opinion" that GPUGRID is becoming more & more elitist. It's Enthusiast friendly, Consumer unfriendly when WU's become so aggressive that even stock clocked GPU's are failing due to overheating. Not everybody has water cooled systems. I've moved around just about every cable to enhance the airflow of my PC's, but now that Linux is so good & Windows is so slow, I'm almost forced to stop using Linux because my GPU's are running at 80-90 degrees on Linux no matter how they're clocked or how great the airflow is. I've got 2 PC's in enclosures that aren't meant for this, & the only way to improve airflow there, is to get new enclosures.

If GPUGRID keeps on improving their WU's for Windows, I soon won't be able to use Windows either. But that's no reason not to. I'm just itching for an option to throttle down the use of GPU, as is possible to set a max CPU use of x%.

Actually it looks like your Win7 GTX 260 machine is running the v6.72 WUs at a higher credit/hour rate than your similar cards in Linux. There would be even a larger difference if you were running XP. I'm currently using Win7 and XP for my GPUGRID crunching, all cards are at 54C - 64C, max GPU fan is 57%. I use Antec 300 cases which can often be had for around $50 and add two extra low to mid speed 120mm fans: 1 in the front and 1 on the side, both blowing inward. Most are running 2 GPUs. An efficient PSU is also helpful.

The XP machines are running at 93-96% GPU versus 80-85% in Win7, undoubtedly the reason that XP is markedly faster in GPUGRID. I'm hoping that the client can be changed so that Win7 will run in the 90%+ range.

I do hear what you're saying though. GPU computing of any kind produces considerable heat and the stock Dell, HP, etc. machines are not built to handle the loads. Better to build our own machines:-)

Bikermatt
Joined: 8 Apr 10
Posts: 37
Credit: 3,839,902,185
RAC: 0
Level: Arg
Message 17138 - Posted: 18 May 2010 | 19:45:24 UTC

They have all failed on my system within 10 seconds. I am running BOINC 6.10.43 and the 197.13 driver on 3 GT 240s.
-Matt




Richard Haselgrove
Joined: 11 Jul 09
Posts: 1576
Credit: 5,602,236,851
RAC: 8,763,004
Level: Tyr
Message 17139 - Posted: 18 May 2010 | 19:51:36 UTC

Just finished one of these on a Fermi (470) - 285-KASHIF_HIVPR_auto_spawn_2_90_ba1-0-100-RND3939_2.

Two errors before mine, on a GT 240M and a GTX 295, which possibly suggests that they're tough. Also, I saw a lot more screen freezing while it was running, for up to 10 seconds at a time: made the computer almost unusable. I'm running with Swan_Sync=0, and BOINC restricted to 7 out of 8 cores.

BTW, although this is a "stock Dell", I don't think cooling is going to be the issue. It's a Precision 490 workstation, with two factory-fitted front case fans, and even a separate dinky little fan angled onto the RAM. Even though it's three and a half years old, it took Windows 7 and the Fermi with no problems at all.

skgiven
Message 17140 - Posted: 18 May 2010 | 20:04:30 UTC - in response to Message 17139.
Last modified: 18 May 2010 | 20:29:20 UTC

This is my present situation:

My GTX470 is working well on XP x86. It worked slowly on Win7, and as far as I know this is the case for everyone using a Fermi on Win7.

My GTX260sp216 works well for 6.03 tasks on Win7 x64, but will not crunch any 6.72 WU’s of any kind - Tried many versions of Boinc and many drivers. It is a solid card and has been working well for many months on many other WUs and in several systems. I will move that card into a different system at some stage; when the 6.03 WU’s run out. I expect a much earlier driver would do the trick, but I have had enough with drivers for one week.

My single GT240 systems are reasonably stable. These are on XP, and Win7.
One Win 7 card occasionally drops the shaders to 400MHz, and then misses the bonus deadline (working on that), but I did solve the intermittent connection problem it had; It is networked using a wireless USB dongle, and the system very occasionally disabled the USB ports in some sort of power saving effort (apparently randomly and despite high performance power mode being selected). It took a Bios update and a re-configuration of the advanced Power Saving features (USB) – so much for plug and play!
This Win7 card (and the other) can crunch the TONI_HERG and IBUCH 6.72 WU’s but not the KASHIF_HIVPR 6.72 WU’s (using 196.34 & 196.21).

The XP x86 card crunches everything perfectly (197.45, Boinc 6.10.51).

My four card system is doing reasonably well on Vista x64:
It crunches the TONI_HERG 6.72 WU’s very well, no failures with these since 6.10.21 went on. However, it fails ALL the KASHIF_HIVPR 6.72 WU’s (usually in about 8 seconds).
Using Boinc 6.10.56. My RAC with that system is 54K and rising towards the potential 70K. Pulling a card won’t make any difference; I have already demonstrated that the system can crunch 6.72 WU’s and that the initial problem was the driver, and not power. Perhaps this is a different driver issue or a WU issue. Perhaps the KASHIF_HIVPR 6.72 WU’s just don’t like any of my XFX GT240 DDR5 cards, or my Gigabyte DDR5 card or my Gigabyte DDR3 card, or my GTX260?
I’m getting the impression that Vista & Win7 don’t like KASHIF_HIVPR 6.72 WU’s.

My temperatures are all fine:
GT240’s all below 65 deg C.
GTX260 60 deg C (native)
GTX470 is about 78 deg C, but is OC’d.

liveonc, if you can, try removing a back plate close to the GPU, it sometimes helps a bit. Could you turn the GPU fan speed up?

I also doubt that 475MB of RAM is too little to run the tasks. I suspect that the Boinc code is looking for exactly 512MB of video RAM, and the latest drivers are reporting less. As this only seems to be the case with Vista & Win 7, I suspect the drivers are reserving some of it for Aero, to stop other applications from trying to use it and crashing apps and systems. The code might be treating the difference between two values reported by the drivers as an error. Perhaps the drivers report what the card has and then what is available, and as these don’t match (because of Aero or something else) Boinc decides that some of the RAM is faulty and ends the tasks early?

Paul D. Buck
Joined: 9 Jun 08
Posts: 1050
Credit: 37,321,185
RAC: 0
Level: Val
Message 17144 - Posted: 18 May 2010 | 20:55:56 UTC - in response to Message 17140.

skgiven,

BOINC 6.10.51 will randomly stop running tasks on your GPUs, it also has a memory leak... move up to 6.10.56 which has neither of these problems ... don't know if it will cure any of the issues you are having for sure, but it cannot hurt to use the better version ... all the versions between 6.10.45 and 6.10.55 have these two issues at the very least ... I know, I tried most of them ...

Beyond
Message 17204 - Posted: 21 May 2010 | 15:25:32 UTC - in response to Message 17123.

KASHIF_HIVPR WU are also failing for me on my GT240s. This is after going back to the 19621 driver (which is still working for all TONI_HERG 6.72 WUs).

I notice that most of the KASHIF_HIVPR_auto_spawn WUs that I listed above are now validating with subsequent machines so I probably jumped the gun by posting this thread. Like I said before, trying to be proactive. Seems they run fine on some machines, not so fine on others.

An update on the original topic. The KASHIF_HIVPR_auto_spawn WUs are running fine on my GTX 260 but so far have not worked on any of my GT 240 cards. The theory posted above by Richard that these WUs are "tough" may be the answer. Has anyone had success with them on a GT 240?

Beyond
Message 17212 - Posted: 21 May 2010 | 20:04:08 UTC

This is what I'm getting on every KASHIF_HIVPR_auto_spawn on any GT 240, doesn't matter what driver or OS:


<core_client_version>6.10.56</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
# Clock rate: 1.55 GHz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 12
# Number of cores: 96
ERROR: file ntnbrlist.cpp line 63: Insufficent memory available for pairlists. Set pairlistdist to match the cutoff.
called boinc_finish

</stderr_txt>
]]>

Seems the KASHIF_HIVPR_auto_spawn WUs are asking for more memory than 512MB?
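
For anyone who wants to check the figure in the log, here is a minimal Python sketch, assuming the stderr text has been pasted into a string. The function names are made up for illustration and are not part of BOINC or the GPUGRID application. It pulls out the reported global memory, converts it to MiB, and checks for the pairlist error quoted above.

```python
import re

# Sample stderr text copied from the report above.
STDERR = """\
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
# Clock rate: 1.55 GHz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 12
# Number of cores: 96
ERROR: file ntnbrlist.cpp line 63: Insufficent memory available for pairlists. Set pairlistdist to match the cutoff.
called boinc_finish
"""


def global_memory_mib(stderr_text):
    """Return the reported global memory in MiB, or None if the line is absent."""
    match = re.search(r"Total amount of global memory:\s*(\d+)\s*bytes", stderr_text)
    if match is None:
        return None
    return int(match.group(1)) / (1024 * 1024)


def is_pairlist_memory_error(stderr_text):
    """True if the log contains the pairlist allocation error quoted in this thread."""
    # "Insufficent" is spelled exactly as the application prints it.
    return "Insufficent memory available for pairlists" in stderr_text


print(global_memory_mib(STDERR))         # 512.0
print(is_pairlist_memory_error(STDERR))  # True
```

So the card is reporting 512 MiB in total. The log does not say how much of that was free when the task started.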

skgiven
Message 17213 - Posted: 21 May 2010 | 23:47:37 UTC - in response to Message 17212.

Only one GT240 works for me:
XP Pro SP3 x86, driver 19745, was Boinc 6.10.51, now moved to 6.10.56.

I have had no failures on that card for any task, but Boinc did eventually lock up, hence the late move to 6.10.56.
Shaders at 1.6GHz working fine on a DDR3 card.

http://www.gpugrid.net/result.php?resultid=2349243

Looks like we are on our own when it comes to drivers, so please, post up any working specs!

Betting Slip
Message 17217 - Posted: 22 May 2010 | 11:52:30 UTC - in response to Message 17213.

Those units succeed on all my gt240 1 gig gddr3 cards but fail on my gt240's with 512mb gddr5 cards.

Richard Haselgrove
Message 17278 - Posted: 25 May 2010 | 8:56:09 UTC

WU 1496749 looks like a bad job, by any standard and on any card. Just failed on my Fermi.

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Level: Ser
Message 17286 - Posted: 25 May 2010 | 11:51:24 UTC - in response to Message 17278.

Hopefully they fail immediately and don't cause credit loss.

skgiven
Message 17289 - Posted: 25 May 2010 | 12:02:42 UTC - in response to Message 17217.

Those units succeed on all my gt240 1 gig gddr3 cards but fail on my gt240's with 512mb gddr5 cards.


That seems to be the case with Win7.
Vista is a similar picture.

I have a 512MB GDDR3 GT240 on Win XP x86 and it has still not had one error for any task including the KASHIF_HIVPR WU's. I might try the GTX260 in that system, with an early driver to see how it fares, as it is sitting idle.

Richard Haselgrove
Message 17325 - Posted: 26 May 2010 | 7:47:02 UTC - in response to Message 17286.

Hopefully they fail immediately and don't cause credit loss.

Yes, they do fail quickly, like today's WU 1496749, but they still cause research data loss ;-)

skgiven
Message 17882 - Posted: 5 Jul 2010 | 11:03:48 UTC - in response to Message 17289.

My GTX260 worked well on XP and on Linux.
Unfortunately all the HIV WUs still seem to fail after about 13sec for me on systems with a GT240 and only 512MB RAM. This Vista System for example.
Fortunately the other WUs run fine and I am picking up enough of them.

KASHIF_HIVPR does not run on these systems,
19621 drivers, 4xGT240 (512MB GDDR5), Vista x64, Phenom II 940, 4GB RAM, 1TB Drive.
19562 drivers also fail on Microsoft Windows Server 2008 R2 x64 with a GT240 (512MB GDDR5).
19621 drivers also fail on Win 7 x64, again with a GT240 (512MB).

KASHIF_HIVPR does work on these Windows setups,
19745 drivers work for Win XP x86, again with a GT240 (512MB), as do other drivers.
19634 drivers work on Win7 for a GT240 (1024MB).

It is clear (as reported before) that the issue is with the amount of RAM on the card, and the operating system; if it has 512MB it will not complete KASHIF_HIVPR tasks on Vista, Win 7 or 2008 R2 Server with any driver, but it will work with XP and Linux. This is with a range of drivers from 19562 through to 19745. I have not put the latest drivers on, as these further slow crunching down on Vista and Win7.
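
To make the pattern easier to see, here is a rough Python sketch that just tabulates the setups listed above. The data are copied from this post only (driver numbers written as in the post), the helper names are hypothetical, and it is a way of summarising the reports rather than a rule the project applies.

```python
# Each tuple: (OS, driver as written in this post, card memory in MB,
#              whether KASHIF_HIVPR tasks completed)
OBSERVATIONS = [
    ("Vista x64", "19621", 512, False),
    ("Server 2008 R2 x64", "19562", 512, False),
    ("Win7 x64", "19621", 512, False),
    ("XP x86", "19745", 512, True),
    ("Win7", "19634", 1024, True),
]


def failing_memory_sizes(observations):
    """Return the set of card memory sizes (MB) with at least one failing setup."""
    return {mem for _os, _driver, mem, ok in observations if not ok}


def results_by_os(observations, memory_mb):
    """Map OS name -> True/False for cards with the given memory size."""
    return {
        os_name: ok
        for os_name, _driver, mem, ok in observations
        if mem == memory_mb
    }


print(failing_memory_sizes(OBSERVATIONS))  # {512}
print(results_by_os(OBSERVATIONS, 512))
```

Only the 512MB rows contain failures, and among those only the XP row succeeds, which is the combination of card memory and operating system described above.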

If the 1GB cards succeed while the 512MB cards fail (depending on driver and OS) then this might continue into the future, and we are about to see another wave of Fermi cards with varying RAM amounts – just to make things more complicated.

Perhaps the servers could be made to distinguish between cards that fail and succeed on each task type and allocate accordingly. It would reduce the Internet overhead and improve performance slightly. At the moment this is done per system on the overall task failure rate, rather than per card on the failure rate for each individual task type.

If I have a GT240 that under a given OS can run one task type perfectly but fails others, I would like it just to pick up the tasks that it will run successfully. Picking up tasks randomly can lead to picking up no tasks. Obviously with new tasks there will be a learning period, but you could send out a few new tasks compared to many known working tasks. Crunchers don’t have the option to select projects so we cannot do this for ourselves, and automated systems tend to be more foolproof.

Doing this might also map good drivers to cards for specific work units. Something that could perhaps be published on the site from time to time.
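
The allocation idea could be sketched roughly like this in Python. This is only an illustration of the suggestion, not how the BOINC or GPUGRID scheduler actually works; the class name, the card identifier, and the thresholds (`min_samples`, `max_failure_rate`) are all assumptions picked for the example.

```python
class TaskTypeStats:
    """Tracks successes and failures per (card, task type) pair."""

    def __init__(self, min_samples=5, max_failure_rate=0.5):
        # Below min_samples results we are still in the "learning period".
        self.min_samples = min_samples
        self.max_failure_rate = max_failure_rate
        self._counts = {}  # (card_id, task_type) -> [successes, failures]

    def record(self, card_id, task_type, success):
        """Store the outcome of one finished task."""
        counts = self._counts.setdefault((card_id, task_type), [0, 0])
        counts[0 if success else 1] += 1

    def should_send(self, card_id, task_type):
        """Decide whether to give this card another task of this type."""
        successes, failures = self._counts.get((card_id, task_type), (0, 0))
        total = successes + failures
        if total < self.min_samples:
            return True  # not enough history yet, keep trying
        return failures / total <= self.max_failure_rate


stats = TaskTypeStats()
for _ in range(5):
    stats.record("gt240-512mb-vista", "KASHIF_HIVPR", success=False)
    stats.record("gt240-512mb-vista", "TONI_HERG", success=True)

print(stats.should_send("gt240-512mb-vista", "KASHIF_HIVPR"))  # False
print(stats.should_send("gt240-512mb-vista", "TONI_HERG"))     # True
```

Keying the counts on the card and the task type together, rather than on the whole system, is the point of the suggestion: a card that always fails one task type can still keep receiving the types it completes.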


Beyond
Message 17884 - Posted: 5 Jul 2010 | 13:28:52 UTC - in response to Message 17882.

Unfortunately all the HIV WUs still seem to fail after about 13sec for me on systems with a GT240 and only 512MB RAM. This Vista System for example.
Fortunately the other WUs run fine and I am picking up enough of them.

KASHIF_HIVPR does not run on these systems,
19621 drivers, 4xGT240 (512MB GDDR5), Vista x64, Phenom II 940, 4GB RAM, 1TB Drive.
19562 drivers also fail on Microsoft Windows Server 2008 R2 x64 with a GT240 (512MB GDDR5).
19621 drivers also fail on Win 7 x64, again with a GT240 (512MB).

Same here. All my GT 240 cards fail on the KASHIF_HIVPR_auto_spawn WUs. They're also all 512MB and fail both in Win7 & XP. This is always the message:

- exit code 98 (0x62)
</message>
<stderr_txt>
# Using device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GT 240"
# Clock rate: 1.50 GHz
# Total amount of global memory: 536870912 bytes
# Number of multiprocessors: 12
# Number of cores: 96
ERROR: file ntnbrlist.cpp line 63: Insufficent memory available for pairlists. Set pairlistdist to match the cutoff.
called boinc_finish

Looks like the WUs are asking for too much memory.

Betting Slip
Message 17885 - Posted: 5 Jul 2010 | 13:51:07 UTC - in response to Message 17884.

Upgrade to the 257.21 drivers and they will work OK




skgiven
Message 17887 - Posted: 5 Jul 2010 | 14:35:14 UTC - in response to Message 17885.
Last modified: 5 Jul 2010 | 14:41:40 UTC

Betting Slip, well spotted!

The latest drivers facilitate CUDA 3.1 tasks, even for CC1.2 cards.
So the KASHIF_HIVPR WU's compiled using CUDA 3.1 (6.09) work for GT240 cards with 512MB RAM, while the older 6.05 WU's do not work for these cards with earlier CUDA 3.0 drivers.
I'm not sure that if you install the latest drivers you will be able to crunch 6.05 KASHIF_HIVPR WUs, but as long as you just pick up the 6.09 task the problem is solved.

However, we took a big speed hit from the last few drivers.
Fortunately GDF said he knows why some WU's are running slower under 3.1, and in a few days (probably) they will manage to correct it. I'm not sure if this just applies to Fermi tasks or if the tasks will also speed up for CC1.1, CC1.2 and CC1.3 cards?
I think I will move one system over at a time.

skgiven
Message 17893 - Posted: 5 Jul 2010 | 17:46:00 UTC - in response to Message 17887.

On second thoughts, I think I will sit tight, and wait this one out.
Good luck,
