Posted 1 hour ago by Profile Retvari Zoltan*
I'd like to add an important notice for those who replace the original heatsink / cooler assembly on their GPU:
The metal frame, which acts as the memory / VRM FET heatsink, also acts as reinforcement, making the card rigid.
Without this frame, when the card is fitted horizontally, its PCB can easily be warped by its own weight (plus the weight of the PCIe power cables), shortening the lifespan of every solder joint, and thus of the whole card.
Some factory OC-ed cards are manufactured without such a frame, so this applies to those cards too.
To avoid warping the PCB, I prop up the inner end of the card (at the PCIe power connectors) with a properly sized stick (actually I've cut a bamboo skewer to the needed length). There are other (more professional) accessories for this purpose.
Posted 13 hours ago by Profile Retvari Zoltan*
It seems that reducing the clock to the nominal frequency did not help :(
I would reduce the GPU clock a little more, and also the RAM frequency (by 100MHz). At some point it will get stable.
Your GPU is a bit hot, so perhaps cleaning the fans and heatsinks with compressed air would help.

It is not clear, though, why the log says that the frequency is still the same, while NVidia Inspector and GPU-Z show that the video card is running at the nominal frequencies. Should I reflash the BIOS? :(
That's simply because this info comes from the BIOS. You don't have to reflash the BIOS for testing.

All the same, I have a strong feeling that Firefox is to blame. Does really no one have similar problems with the combination of Firefox + GPUGrid?
I use Chrome most of the time, but I haven't noticed that Firefox could make GPUGrid tasks crash.
Posted 1 day ago by Profile Retvari Zoltan*
I've recently changed the standard cooler of my GTX 780Ti to this Raijintek Morpheus cooler, with two Scythe Kaze jyu 100mm 2000 rpm fans (SY1025SL12M), and I'm quite amazed that the card is running at 56°C instead of 75°C, with much less noise.
You can witness the change in this task's stderr.
Encouraged by this, I've also changed the cooler of my standard GTX980, and it runs at 45°C instead of 70°C.
The cons of this cooler are:
1. It's very wide (especially with two standard 25mm thick fans), so it's not really applicable for multi-GPU systems
2. The aluminium heat sink pieces are not fully compatible with these cards: I had to cut the VRM heatsink in half for the GTX 780Ti, and use the leftover memory heat sinks for the GTX 980, because the FETs of its VRM have a quite different arrangement, and the supplied VRM heatsink can't be fitted to a GTX980. Moreover, the RAM chips in the GTX980 are closer than on the GTX780Ti, so I had to cut the edges of the RAM heat sinks to keep them from touching the heat pipes of the main (GPU) heat sink.
3. The supplied thermal twin adhesive didn't stick to the RAM chips (only to the heat sink), so I had to use leftover thermal adhesive from Arctic Cooling Xtreme.
Posted 2 days ago by Profile Retvari Zoltan*
BTW is there a place to look at my current queue outside of the BOINC console?
Here is the task list of your host.
To get it, click on your user name at the top, then on "show computers from this account", then on "tasks" (or "results"?), and finally on "show names".
Posted 2 days ago by Profile Retvari Zoltan*
I have finally figured out why the errors occur in the GPU computing applications.
Firefox is to blame. When I use it, the job often fails, but when it is not running and I use Chrome instead, there is no problem.

Both browsers use graphics acceleration for displaying contents of webpages; dynamic content (video, flash, java, html5, vr or 3d engines) could push your card over its limits.

Please tell me, has anybody faced something similar, and what could help?

Link to host job http://www.gpugrid.net/results.php?hostid=170640

Could the development team offer a debug version of the job, so that together we can fix the error?

P.S. I do not want to stop participating in the project because of errors :(

You are getting "Simulation Has Become Unstable" messages in your output files, which usually means that you or the factory have overclocked the GPU or the memory or both too much. Reset it to stock speeds and see how it goes.

According to NVidia specifications, the GPU of this GTX 660 is overclocked:
GPU Engine Specs:
CUDA Cores: 960
Base Clock: 980 MHz
Boost Clock: 1033 MHz
Texture Fill Rate: 78.4 billion/sec
Memory Specs:
Memory Clock: 6.0 Gbps
Standard Memory Config: 2048 MB GDDR5
Memory Interface Width: 192 bit GDDR5
Memory Bandwidth: 144.2 GB/sec

Here's an excerpt from the stderr.txt of a successful task:
# GPU [GeForce GTX 660] Platform [Windows] Rev [3212] VERSION [65]
# SWAN Device 0 :
# Name : GeForce GTX 660
# ECC : Disabled
# Global mem : 2048MB
# Capability : 3.0
# PCI ID : 0000:01:00.0
# Device clock : 1097MHz
# Memory clock : 3004MHz
# Memory width : 192bit
# Driver version : r355_00 : 35582
# GPU 0 : 69C
# GPU 0 : 70C
# Simulation unstable. Flag 10 value 773
# The simulation has become unstable. Terminating to avoid lock-up
# The simulation has become unstable. Terminating to avoid lock-up (2)
# Attempting restart (step 2765000)
The GPU clock of this card (1097 MHz) is 64 MHz higher than the standard boost clock (1033 MHz), so it should be "underclocked" from its factory settings to reach NVidia's original settings. Use MSI Afterburner or NVidia Inspector to do that.
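For anyone who wants to repeat this comparison on their own tasks, here is a minimal Python sketch. It assumes the stderr contains a "Device clock : ...MHz" line like the excerpt above; the function names are made up for illustration, and the reference boost clock has to be looked up in NVidia's specification for the card in question.

```python
import re

# Reference boost clock of the GTX 660, from the NVidia specification quoted above.
REFERENCE_BOOST_MHZ = 1033


def device_clock_mhz(stderr_text):
    """Return the 'Device clock' value in MHz, or None if the line is missing."""
    match = re.search(r"Device clock\s*:\s*(\d+)\s*MHz", stderr_text)
    return int(match.group(1)) if match else None


def overclock_mhz(stderr_text, reference_boost_mhz=REFERENCE_BOOST_MHZ):
    """Return how many MHz the reported clock is above the reference boost clock."""
    clock = device_clock_mhz(stderr_text)
    if clock is None:
        return None
    return clock - reference_boost_mhz


# A fragment of the stderr excerpt above.
sample = "# Name : GeForce GTX 660 # Device clock : 1097MHz # Memory clock : 3004MHz"
print(overclock_mhz(sample))  # 64
```

A positive result means the card runs above the reference boost clock; it says nothing about whether that particular card is stable at that speed.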
Posted 3 days ago by Profile Retvari Zoltan*
The free space on my hosts has dropped below 5*10^9 bytes again, but this time I've found the reason: the Google Chrome browser has the bad habit of keeping all of its previous installers (~44MB each) in its installation folder, so the size of this folder keeps growing. On one of my hosts it was wasting 1.5GB. I haven't found any setting in Chrome that would limit the size of this folder, so I've manually deleted everything except the latest one.
I still think this is an order of magnitude higher than reasonable.
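To find out which folder is eating the disk space, something like the following Python sketch can help. It only measures sizes and deletes nothing. The function names are my own, and the folder to inspect (for example the browser's installation folder) is something you have to supply yourself, as its location differs between systems.

```python
import os


def folder_size_bytes(path):
    """Sum the sizes of all files below 'path', skipping unreadable ones."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # The file vanished or is not accessible; ignore it.
                pass
    return total


def largest_subfolders(path, top_n=5):
    """Return up to 'top_n' (name, size in bytes) pairs, largest first."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((entry.name, folder_size_bytes(entry.path)))
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top_n]
```

Calling largest_subfolders() on the suspicious folder lists its biggest subfolders, which should make a pile of old installers easy to spot.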
Posted 7 days ago by Profile Retvari Zoltan*
Before you start listing the worst hosts, it would be a good idea to set up proper criteria for this.
My first thought was to exclude from the statistics only the obviously bad hosts, which fail on every task (for example: host 255774, 180977) or only occasionally finish a task (their error rate is above, say, 90%, for example: host 179830, 74100). But a more sophisticated statistical algorithm could be used as well.
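The simple version of this filter could look like the Python sketch below. It is only an illustration of the idea, not how the server computes anything: the function name, the data layout and the host names and counts in the example are all made up, and the 90% threshold is the one suggested above.

```python
def normalized_error_rate(hosts, max_host_error_rate=0.9):
    """Overall error rate, ignoring hosts whose own error rate exceeds the threshold.

    'hosts' maps a host id to a (failed_tasks, total_tasks) pair.
    """
    failed = total = 0
    for host_failed, host_total in hosts.values():
        if host_total == 0:
            continue  # no tasks, nothing to count
        if host_failed / host_total > max_host_error_rate:
            continue  # obviously bad host: leave it out of the statistics
        failed += host_failed
        total += host_total
    return failed / total if total else 0.0


# Hypothetical counts, for illustration only.
hosts = {
    "host_a": (50, 50),  # fails on every task
    "host_b": (19, 20),  # only occasionally finishes a task (95% errors)
    "host_c": (2, 40),
    "host_d": (1, 60),
}
raw_rate = sum(f for f, _ in hosts.values()) / sum(t for _, t in hosts.values())
print(round(raw_rate, 3), round(normalized_error_rate(hosts), 3))
```

In this made-up example two bad hosts push the raw error rate above 40%, while the remaining hosts fail on only 3% of their tasks.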

Errors can be put into 2 categories: host errors and non-host errors (like a bad batch of WUs, or the server canceling the units), so make sure that hosts are labeled on the basis of host errors only. OK, this is obvious, but I don't want to be labeled with a scarlet letter because of a bad batch.
By "most errors per day" I meant "most errors in the past 24 hours", so this list would be refreshed automatically and fixed hosts would be cleared from it.
The purpose of filtering out the worst hosts is to avoid putting a scarlet letter on a batch because the worst hosts fail workunits from a more demanding batch (presumably because the host's GPU is overclocked above its maximum), which results in misleading percentages.
A "scarlet letter" on a batch could be dangerous, as it could make some crunchers selectively cancel workunits from the (mistakenly) worst batches, making the whole process worse.
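The rolling 24-hour list could be sketched in Python like this. It assumes the input is already restricted to host-side errors (bad batches and server cancellations filtered out beforehand); the function name, the event layout and the host names in the example are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def worst_hosts(error_events, now, window=timedelta(hours=24), top_n=10):
    """Rank hosts by the number of errors inside the time window ending at 'now'.

    'error_events' is an iterable of (host_id, timestamp) pairs.
    """
    cutoff = now - window
    counts = Counter(host for host, when in error_events if when >= cutoff)
    return counts.most_common(top_n)


# Hypothetical events, for illustration only.
now = datetime(2015, 1, 2, 12, 0)
events = [
    ("host_a", now - timedelta(hours=1)),
    ("host_a", now - timedelta(hours=5)),
    ("host_b", now - timedelta(hours=2)),
    ("host_b", now - timedelta(hours=30)),  # too old, not counted
    ("host_c", now - timedelta(hours=48)),  # fixed since then, drops off the list
]
print(worst_hosts(events, now))  # [('host_a', 2), ('host_b', 1)]
```

Because only the last 24 hours are counted, a host that has been fixed disappears from the list by itself after a day.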

Also, errors that happened a while back (which are mostly bad batch errors) should not count either. I would think that cleaning these errors out of the database should be a prerequisite.
That's a good idea anyway.
Posted 10 days ago by Profile Retvari Zoltan*
Beta version released on the server_status page.
Thank you!

I tweaked the original idea a bit, but you can see the information you wanted.
Well, that's the point of it. :)
Another purpose of this list is to "announce" new batches.

I was a bit surprised by the error rate at first, but later I realised how many errors a single client can produce, so it is not that bad after all.
Oh, another idea popped into my mind while I read your words:
There should be a top list of the worst hosts (most errors per day) on the performance chart.
Is there a way to make a "normalized" error rate column by filtering out these worst hosts? Would such a column be more conclusive?

Hopefully this data will enhance the ability of detecting corrupted batches.
Hopefully you've already had something like this for internal use before, right? :)
Posted 10 days ago by Profile Retvari Zoltan*
I have registered and used the windows 7 expansion many times,....
Do you mean Windows 7 upgrade?

but is there a point where I don't need to install xp for it to work?
If it's a Windows 7 upgrade, then you'll need the base product's installation media & product key every time you install the upgrade version.

I ask because I will be buying either a GPU or an OS next check and need to know. P.S. I still have the XP key, just not the CD.
That's bad news...

I also plan on buying an AMD card for Folding@home, but I have been on here for many years and like your help a lot more. Maybe some card recommendations?
Always buy the latest & greatest. Folding@home can be run on NVidia cards too.
Posted 12 days ago by Profile Retvari Zoltan*
Is it possible to display the batches (with the number of workunits) in the queues, not just the total number of the tasks?
I think about something like this:

Application                             unsent  in progress  valid  invalid  error
Short runs (2-3 hours on fastest card)       1           48    130        5     17
  NOELIA_467x                                1           48    130        5     17
Long runs (8-12 hours on fastest card)     114        2,295   1400       34    420
  GERARD_FXCXCL12_LIG                       34        1,865    970       39     40
  GERARD_PTCL2_CTL_IPZ1                     15          320    294       22     34
  GERARD_PTCL2_CTL_PRZ1                     10          201    231       12     22
  GERARD_PTCL_CTL_IPZ2                      11          181    175       10     12
  GERARD_PTCL_CTL_PRZ1                       9          117    111        7      9
  GERARD_PTCTL_LFE_AIN3                     14          104     94        5      8
  GERARD_PTCTL_LFE_IBP2                      7           97     55        2      5
  GERARD_PTCTL_PLA2_AIN3                    12           86     64        3      6
  GERARD_PTCTL_PLA2_IBP1                    67           34     73        4      9
  GERARD_VACXCL12_LIG                       42          309    211       20     19
  SDOERR_ntl9evSSXX3                         3           15     15        0      1

These are *not* the actual numbers, so they won't add up.
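With real data, the queue rows would simply be the column-wise sums of their batch rows. A minimal Python sketch of that aggregation follows; the function name, the data layout and the batch names are placeholders of mine, and the counts are illustrative numbers like the ones above, not real server data.

```python
from collections import defaultdict

COLUMNS = ("unsent", "in_progress", "valid", "invalid", "error")


def queue_totals(batches):
    """Sum the per-batch counts into one total row per queue.

    'batches' is an iterable of (queue, batch_name, counts) entries,
    where 'counts' is a tuple ordered like COLUMNS.
    """
    totals = defaultdict(lambda: [0] * len(COLUMNS))
    for queue, _batch_name, counts in batches:
        for index, value in enumerate(counts):
            totals[queue][index] += value
    return {queue: tuple(values) for queue, values in totals.items()}


# Placeholder batch names with illustrative counts.
batches = [
    ("Short runs", "BATCH_A", (1, 48, 130, 5, 17)),
    ("Long runs", "BATCH_B", (34, 1865, 970, 39, 40)),
    ("Long runs", "BATCH_C", (15, 320, 294, 22, 34)),
]
totals = queue_totals(batches)
print(totals["Long runs"])  # (49, 2185, 1264, 61, 74)
```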

