Message boards : News : New beta Nvidia application 60% faster.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 14947 - Posted: 3 Feb 2010 | 10:02:00 UTC

The new Nvidia application is now out in beta. It is available by accepting beta work from GPUGRID.

scrap
Joined: 3 Jan 10
Posts: 3
Credit: 29,391,330
RAC: 0
Level
Val
Message 14948 - Posted: 3 Feb 2010 | 10:39:07 UTC - in response to Message 14947.

Really good news!

Profile ocgbargas
Joined: 18 Jun 09
Posts: 12
Credit: 4,327,530
RAC: 0
Level
Ala
Message 14952 - Posted: 3 Feb 2010 | 14:52:07 UTC - in response to Message 14948.
Last modified: 3 Feb 2010 | 14:56:32 UTC

Will this solve the problem with the GTX260?

Here is what I wrote in a post on my team's forum:
"A great many of us have complained on their forums about the GTX260, but there seems to be no will to fix it.
I have managed to complete 4 work units: 3 beta and 1 normal. But 4 betas and 1 normal have also failed on me. I have not been crunching for this project for more than 6 months.
It is hard to believe that people who want to support the project and crunch for it get no response from the project admins. And don't give me excuses, because my statistics show more than 386,000 points with GPUGRID, and ever since they changed something in their work units (when they moved to CUDA 2.3) that was it: I cannot crunch. I wonder whether they are really interested in pursuing their stated goal, because if I were the admin of a project and had, say, 2000 people wanting to lend me their graphics cards to advance my project (the only medical one on BOINC right now), I would bend over backwards to sort it out for them, since they are giving up their own money (graphics card + electricity + time).
Meanwhile, Folding is getting that work instead, at least from me. I would also like to earn more points in BOINC overall, but I can't."

I don't mean to stir up bad feeling or anything like that, but I think a solution should at least be sought.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 14955 - Posted: 3 Feb 2010 | 15:06:20 UTC - in response to Message 14952.
Last modified: 3 Feb 2010 | 15:07:25 UTC

The GTX260 problem is caused by a bug in the Nvidia FFT library, which Nvidia does not have the time to fix. It is probably very rare. We don't know exactly what it is because there is no source code available for the Nvidia FFT. In any case, we could not do anything more about it.

Does it work with the new application? We don't know. There have been so many changes that the specific situation which causes the problem may have disappeared.
Some users are now trying the beta application with GTX260.

gdf

ignasi
Joined: 10 Apr 08
Posts: 254
Credit: 16,836,000
RAC: 0
Level
Pro
Message 14956 - Posted: 3 Feb 2010 | 15:19:55 UTC - in response to Message 14955.

We submitted 1000 production WUs to the beta application.

Profile ocgbargas
Joined: 18 Jun 09
Posts: 12
Credit: 4,327,530
RAC: 0
Level
Ala
Message 14961 - Posted: 3 Feb 2010 | 17:16:19 UTC
Last modified: 3 Feb 2010 | 17:18:14 UTC

And now it is suddenly an Nvidia bug? As I said in my message, I have more than 386,000 points earned with the project without a single error.
The change to CUDA is what put an end to my crunching.

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 14964 - Posted: 3 Feb 2010 | 17:47:17 UTC - in response to Message 14961.

ocgbargas,
I think the FFT bug is related to Cuda version 2.3 and only caused some errors in some cards when using some applications to run some tasks.

There are things you can do to reduce the likelihood of errors on older cards. For example:
Configure BOINC not to use the GPU while the computer is in use, and to resume only after 1 minute of idle time.
If you are watching video, either shut BOINC down and start it up again later, or use the Snooze button.
You only need to do these things on some older cards (G92 cards and 65nm G200 first release cards)!
As the applications have changed, the FFT bug might not be such a problem any more because the way the application uses CUDA has changed.
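
For the first suggestion above, here is a minimal sketch in Python of generating a local preferences override. It assumes the client reads a `global_prefs_override.xml` file and that the tag names `run_gpu_if_user_active` and `idle_time_to_run` apply to your BOINC version; neither is stated in this thread, so check your client's documentation. The helper name `build_override` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_override(idle_minutes: float = 1.0) -> str:
    """Build override XML that keeps the GPU idle while the PC is in use.

    Tag names are assumptions about the BOINC client, not taken from this thread.
    """
    root = ET.Element("global_preferences")
    # 0 = do not run GPU tasks while the user is active (assumed tag name)
    ET.SubElement(root, "run_gpu_if_user_active").text = "0"
    # Minutes of no keyboard/mouse input before the computer counts as idle
    ET.SubElement(root, "idle_time_to_run").text = f"{idle_minutes:.1f}"
    return ET.tostring(root, encoding="unicode")


override_xml = build_override()
print(override_xml)
```

The same two settings can normally be changed from the BOINC Manager preferences dialog, which is the safer route if you are unsure about the file.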

Profile Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 46,573,744
RAC: 837,894
Level
Val
Message 14966 - Posted: 3 Feb 2010 | 18:30:49 UTC - in response to Message 14964.

You only need to do these things on some older cards (G92 cards and 65nm G200 first release cards)!


To be a bit more specific, it doesn't appear to be all 65nm G200 GPUs; I've only seen people reporting trouble with GTX260 cards. I haven't seen any reports of errors with the GTX280, which uses the same 65nm G200 chips, but with all 240 shaders enabled. The GTX 260 uses chips that have some faulty cores or shaders and are cut down to either 192 or 216 shaders. The GTX 280 uses the 65nm G200 chips that are fully functional.

Considering it's the same chip, I find it odd that the problem doesn't seem to affect the GTX 280.

FWIW, I have a GTX 280 and have never had a problem. Mine's factory over-clocked too, so it's running a bit hotter and faster than nominal.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 14967 - Posted: 3 Feb 2010 | 18:45:08 UTC - in response to Message 14966.

The FFT bug only affected the GTX260 (mostly the old one), and only from driver 182 onwards. So CUDA 2.1 was fine; CUDA 2.2 was not.

Let's see if the beta by luck has improved the situation.

gdf

Profile ocgbargas
Joined: 18 Jun 09
Posts: 12
Credit: 4,327,530
RAC: 0
Level
Ala
Message 14978 - Posted: 4 Feb 2010 | 7:30:11 UTC
Last modified: 4 Feb 2010 | 7:31:18 UTC

Well, first unit tested, and it errored out after running for 9 hours. It happened overnight, when nobody was touching the PC. Here are the details as shown by the Win7 64 error viewer.

Faulting application name: acemdbeta_6.08_windows_intelx86__cuda, version: 0.0.0.0, time stamp: 0x4b680f5f
Faulting module name: acemdbeta_6.08_windows_intelx86__cuda, version: 0.0.0.0, time stamp: 0x4b680f5f
Exception code: 0x40000015
Fault offset: 0x0003274d
Faulting process id: 0x1b24
Faulting application start time: 0x01caa51e0458ad34
Faulting application path: D:\boinc\programData\projects\www.gpugrid.net\acemdbeta_6.08_windows_intelx86__cuda
Faulting module path: D:\boinc\programData\projects\www.gpugrid.net\acemdbeta_6.08_windows_intelx86__cuda
Report Id: d4517c31-1125-11df-845c-00158307a3d0

Profile ocgbargas
Joined: 18 Jun 09
Posts: 12
Credit: 4,327,530
RAC: 0
Level
Ala
Message 14987 - Posted: 4 Feb 2010 | 14:41:32 UTC

The second unit errored too. I'm giving up on it as impossible.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1957
Credit: 629,356
RAC: 0
Level
Gly
Message 14988 - Posted: 4 Feb 2010 | 15:22:47 UTC - in response to Message 14987.

Yes, your errors are due to the FFT bug, which therefore did not go away by simple luck.
We are reimplementing the FFT with our own code, so you will have to wait for that to get rid of this problem.

gdf

Profile ocgbargas
Joined: 18 Jun 09
Posts: 12
Credit: 4,327,530
RAC: 0
Level
Ala
Message 14993 - Posted: 4 Feb 2010 | 18:34:35 UTC

But why does it fail with GPUGRID, when these failures don't occur with, for example, Collatz, Milky or Folding?

Profile skgiven
Volunteer moderator
Volunteer tester
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Message 14999 - Posted: 5 Feb 2010 | 0:33:04 UTC - in response to Message 14993.

They use different code, and they have their own problems!

Profile robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 727,920,933
RAC: 155,858
Level
Lys
Message 15022 - Posted: 5 Feb 2010 | 15:29:59 UTC

I've noticed that ACEMD beta 6.08 (cuda) workunits leave my 9800 GT GPU running about 2 °C cooler than the older application. Not sure whether to consider that an improvement or a problem.

Profile robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 727,920,933
RAC: 155,858
Level
Lys
Message 15023 - Posted: 5 Feb 2010 | 15:34:28 UTC - in response to Message 14966.

You only need to do these things on some older cards (G92 cards and 65nm G200 first release cards)!


To be a bit more specific, it doesn't appear to be all 65nm G200 GPUs; I've only seen people reporting trouble with GTX260 cards. I haven't seen any reports of errors with the GTX280, which uses the same 65nm G200 chips, but with all 240 shaders enabled. The GTX 260 uses chips that have some faulty cores or shaders and are cut down to either 192 or 216 shaders. The GTX 280 uses the 65nm G200 chips that are fully functional.

Considering it's the same chip, I find it odd that the problem doesn't seem to affect the GTX 280.

FWIW, I have a GTX 280 and have never had a problem. Mine's factory over-clocked too, so it's running a bit hotter and faster than nominal.


Could it be a problem in handling the way faulty shaders or cores are deactivated? If so, I would not expect to see the same problem on a GTX 280, since none are deactivated.

Profile robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 727,920,933
RAC: 155,858
Level
Lys
Message 15024 - Posted: 5 Feb 2010 | 15:39:19 UTC - in response to Message 14967.
Last modified: 5 Feb 2010 | 16:24:53 UTC

The FFT bug only affected the GTX260 (mostly the old one), and only from driver 182 onwards. So CUDA 2.1 was fine; CUDA 2.2 was not.

Let's see if the beta by luck has improved the situation.

gdf


Is it possible to do a separate build for the GTX260 (and any other machines using the old drivers) using the CUDA 2.1 SDK, but with the same source code otherwise?

You could then test whether this build, combined with the newer drivers, works on the problem GTX260s, or whether machines with the problem GTX260s need to switch back to the older drivers. I'd expect similar results for any other types of Nvidia GPU boards that the newer CUDA versions do not support well enough.

If your server can detect which driver the machine is using, it could use this information to determine which application version to include; otherwise, you could just include both application versions and add a wrapper to determine which one is actually used. That would allow people with a GTX260, but one without the problem, to use the other application program instead if it runs faster.
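
The selection rule described above might look roughly like this on the server side. This is only a sketch in Python: the build labels, the `Host` record and `pick_build` are hypothetical names, and the "GTX260 with driver 182 or later" condition is taken from GDF's description of when the FFT bug appears, not from any real scheduler code. Whether a CUDA 2.1 build actually avoids the bug on newer drivers is exactly what would need testing.

```python
from dataclasses import dataclass

# Hypothetical build labels; the thread does not name real ones.
LEGACY_BUILD = "acemd_cuda21"
CURRENT_BUILD = "acemd_current"

# First driver series reported in this thread as showing the FFT bug.
FIRST_AFFECTED_DRIVER = 182


@dataclass(frozen=True)
class Host:
    gpu_model: str     # e.g. "GeForce GTX 260"
    driver_major: int  # e.g. 190 for driver 190.38


def pick_build(host: Host) -> str:
    """Return the build label to send to this host."""
    # Normalise "GTX 260" / "gtx260" spellings before matching.
    model = host.gpu_model.upper().replace(" ", "")
    affected_card = "GTX260" in model
    affected_driver = host.driver_major >= FIRST_AFFECTED_DRIVER
    if affected_card and affected_driver:
        return LEGACY_BUILD
    return CURRENT_BUILD


print(pick_build(Host("GeForce GTX 260", 190)))
print(pick_build(Host("GeForce GTX 280", 190)))
```

A wrapper on the client could apply the same rule locally if the server cannot see the driver version.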

Another idea you could try, if you think the problem is in a DLL, is to try some builds with a mixture of CUDA 2.1 DLLs and more recent CUDA version DLLs.

Another thought: Is it possible to mark parts of the results with which cores and shaders were used to calculate them? If so, you could then have the server build a file to specify additional cores and shaders that a particular machine should avoid using for any future workunits.

Profile Michael Goetz
Joined: 2 Mar 09
Posts: 124
Credit: 46,573,744
RAC: 837,894
Level
Val
Message 15026 - Posted: 5 Feb 2010 | 16:49:16 UTC - in response to Message 15024.

Another thought: Is it possible to mark parts of the results with which cores and shaders were used to calculate them? If so, you could then have the server build a file to specify additional cores and shaders that a particular machine should avoid using for any future workunits.


I don't think that's possible under CUDA. You just tell CUDA, "Here, run these 10,000 copies of this kernel and tell me when you're done". All the fun stuff of assigning all that work to the individual multi-processors is done under the hood by the CUDA libraries. I don't recall ever seeing anything that would let you know where something actually ran.
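
As a rough analogy only (this is plain Python, not CUDA), the "launch many copies and let the runtime place them" model resembles handing work to a thread pool: the caller says how many copies to run and collects the results, but never chooses which worker handles which copy. The names `kernel` and `launch` are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def kernel(i: int) -> int:
    """Stand-in for one copy of a GPU kernel: squares its own index."""
    return i * i


def launch(n_copies: int, n_workers: int = 8) -> list:
    # The caller only states how many copies to run. The pool decides
    # which worker runs each one, loosely like the scheduling described above.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in index order, regardless of which
        # worker actually executed each call.
        return list(pool.map(kernel, range(n_copies)))


results = launch(10_000)
print(len(results), results[:5])
```

Nothing in the returned results records which worker ran a given copy, which mirrors the point that the caller does not normally see where the work ran.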

Profile robertmiles
Joined: 16 Apr 09
Posts: 503
Credit: 727,920,933
RAC: 155,858
Level
Lys
Message 15027 - Posted: 5 Feb 2010 | 17:05:04 UTC - in response to Message 15026.
Last modified: 5 Feb 2010 | 17:12:52 UTC

Then you may need to use a different CUDA library, such as one from the CUDA 2.1 SDK.

You could also look for a way to tell the software to exclude some of the cores and shaders marked as usable from actually being used. Going from the results of doing that to determining which ones to exclude on a particular machine would be slower, though.
