Author |
Message |
GDF (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist)
Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0
|
The new Nvidia application is now out in beta. It is available by accepting beta work from GPUGRID. |
|
|
scrap
Joined: 3 Jan 10 Posts: 3 Credit: 29,391,330 RAC: 0
|
Really good news !
|
|
|
|
Will this solve the problem with the GTX260?
Here is what I wrote in a post on my team's forum:
"A great many of us have complained in their forums about the GTX260, but there seems to be no will to fix it.
I have managed to complete 4 work units: 3 beta and 1 normal. But 4 betas and 1 normal have also failed on me. I have not been able to crunch for this project for more than 6 months.
It is hard to believe that people who want to support the project and crunch for it get no response from the project admins. And don't give me stories, because my statistics show more than 386,000 points with GPUGRID, and ever since they changed something in their work units (when they moved to CUDA 2.3) it has been over: I cannot crunch. I wonder whether they really care about reaching their goal, because if I were the admin of a project and had, say, 2000 people who wanted to lend me their graphics cards to advance my project (the only medical one on BOINC right now), I would bend over backwards to fix it for them, since they are giving up their own money (graphics card + electricity + time).
Meanwhile Folding is getting that work instead, at least from me, but I would also like to earn more points on BOINC overall, and I can't."
I don't mean to stir up bad feeling or anything like that, but I think a solution should at least be looked for. |
|
|
GDF (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist)
Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0
|
The GTX260 problem is caused by a bug in the Nvidia FFT library, which Nvidia does not have time to fix. It is probably very rare. We don't know exactly what it is, because the source code of the Nvidia FFT is not available. In any case, there is nothing more we can do about it.
Does it work with the new application? We don't know. There have been so many changes that the specific situation which causes the problem may have disappeared.
Some users are now trying the beta application with GTX260.
gdf |
|
|
ignasi
Joined: 10 Apr 08 Posts: 254 Credit: 16,836,000 RAC: 0
|
We submitted 1000 production WUs to the beta application. |
|
|
|
And suddenly it is an Nvidia bug? As I said in my message, I have more than 386,000 points earned with the project without a single error.
The change of CUDA version came, and that was the end of my crunching. |
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
ocgbargas,
I think the FFT bug is related to Cuda version 2.3 and only caused some errors in some cards when using some applications to run some tasks.
There are things you can do to reduce the likelihood of errors in older cards. For example,
Configure Boinc not to use the GPU when the computer is in use. Use after 1min.
If you are watching video either shut Boinc down and start it up later or use the Snooze button.
You only need to do these things on some older cards (G92 cards and 65nm G200 first release cards)!
As the applications have changed, the FFT bug might not be such a problem any more because the way the application uses CUDA has changed. |
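For anyone who prefers a file to the BOINC Manager dialog, the two settings suggested above (no GPU work while the computer is in use, resume after 1 minute idle) can be sketched as a small script that prints an override fragment. This is only a sketch: the element names `run_gpu_if_user_active` and `idle_time_to_run` are what I believe the BOINC client reads from `global_prefs_override.xml`, so check them against your client's documentation before relying on them. The function name `build_override` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_override(idle_minutes=1.0):
    """Return a preferences fragment as text (hypothetical helper).

    Assumption: the BOINC client honours these two element names in
    global_prefs_override.xml; verify before using.
    """
    root = ET.Element("global_preferences")
    # 0 = do not run GPU tasks while the user is active
    ET.SubElement(root, "run_gpu_if_user_active").text = "0"
    # minutes of inactivity before the computer counts as idle
    ET.SubElement(root, "idle_time_to_run").text = f"{idle_minutes:g}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Only prints the fragment; it does not touch any BOINC files.
    print(build_override())
```

The script deliberately writes nothing to disk, so it cannot overwrite existing preferences. The same two options are available in the BOINC Manager preferences dialog.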
|
|
|
You only need to do these things on some older cards (G92 cards and 65nm G200 first release cards)!
To be a bit more specific, it doesn't appear to be all 65nm G200 GPUs; I've only seen people reporting trouble with GTX260 cards. I haven't seen any reports of errors with the GTX280, which uses the same 65 nm G200 chip, but with all 240 shaders activated. The GTX 260 uses chips that have some faulty cores or shaders and are degraded to either 192 or 216 shaders. The GTX 280 uses the 65nm G200 chips that are fully functional.
Considering it's the same chip, I find it odd that the problem doesn't seem to affect the GTX 280.
FWIW, I have a GTX 280 and have never had a problem. Mine's factory over-clocked too, so it's running a bit hotter and faster than nominal. |
|
|
GDF (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist)
Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0
|
The FFT bug only affected the GTX260 (mostly the older version), and only from driver 182 onwards. So CUDA 2.1 was fine, CUDA 2.2 was not.
Let's see whether the beta has, by luck, improved the situation.
gdf |
|
|
|
Well, the first unit I tried failed after running for 9 hours. It happened overnight, when nobody was touching the PC. Here are the details as shown by the Windows 7 64-bit event viewer.
Faulting application name: acemdbeta_6.08_windows_intelx86__cuda, version: 0.0.0.0, time stamp: 0x4b680f5f
Faulting module name: acemdbeta_6.08_windows_intelx86__cuda, version: 0.0.0.0, time stamp: 0x4b680f5f
Exception code: 0x40000015
Fault offset: 0x0003274d
Faulting process id: 0x1b24
Faulting application start time: 0x01caa51e0458ad34
Faulting application path: D:\boinc\programData\projects\www.gpugrid.net\acemdbeta_6.08_windows_intelx86__cuda
Faulting module path: D:\boinc\programData\projects\www.gpugrid.net\acemdbeta_6.08_windows_intelx86__cuda
Report Id: d4517c31-1125-11df-845c-00158307a3d0 |
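The hexadecimal time fields in a report like this can be decoded, which helps when matching a crash against a particular task. The sketch below assumes the usual conventions for Windows application error events: the application/module time stamp is the executable's build time in seconds since 1970, and the application start time is a Windows FILETIME (100 ns ticks since 1601-01-01 UTC). The helper names are invented for illustration.

```python
from datetime import datetime, timedelta, timezone


def from_unix_hex(value):
    """Decode a hex string holding seconds since 1970-01-01 UTC."""
    return datetime.fromtimestamp(int(value, 16), tz=timezone.utc)


def from_filetime_hex(value):
    """Decode a hex string holding a FILETIME (100 ns ticks since 1601)."""
    ticks = int(value, 16)
    epoch_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)
    # 10 ticks per microsecond; sub-microsecond precision is dropped
    return epoch_1601 + timedelta(microseconds=ticks // 10)


# Values copied from the error report above
build_time = from_unix_hex("0x4b680f5f")
start_time = from_filetime_hex("0x01caa51e0458ad34")

if __name__ == "__main__":
    print("application built:  ", build_time.isoformat())
    print("application started:", start_time.isoformat())
```

Both values come out in UTC, so they need shifting to local time before comparing them with the BOINC event log.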
|
|
|
The 2nd unit failed as well. I give up; it's hopeless. |
|
|
GDF (Volunteer moderator, Project administrator, Project developer, Project tester, Volunteer developer, Volunteer tester, Project scientist)
Joined: 14 Mar 07 Posts: 1957 Credit: 629,356 RAC: 0
|
Yes,
your errors are due to the FFT bug, which therefore did not go away by simple luck.
We are reimplementing the FFT with our own code, so you will have to wait for that to get rid of this problem.
gdf |
|
|
|
But why does it fail with GPUGRID, when these failures don't happen with, for example, Collatz, Milkyway or Folding? |
|
|
skgiven (Volunteer moderator, Volunteer tester)
Joined: 23 Apr 09 Posts: 3968 Credit: 1,995,359,260 RAC: 0
|
They use different code, and they have their own problems! |
|
|
|
I've noticed that ACEMD beta 6.08 (cuda) workunits leave my 9800 GT GPU running about 2 °C cooler than the older application. Not sure whether to consider that an improvement or a problem. |
|
|
|
You only need to do these things on some older cards (G92 cards and 65nm G200 first release cards)!
To be a bit more specific, it doesn't appear to be all 65nm G200 GPUs; I've only seen people reporting trouble with GTX260 cards. I haven't seen any reports of errors with the GTX280, which uses the same 65 nm G200 chip, but with all 240 shaders activated. The GTX 260 uses chips that have some faulty cores or shaders and are degraded to either 192 or 216 shaders. The GTX 280 uses the 65nm G200 chips that are fully functional.
Considering it's the same chip, I find it odd that the problem doesn't seem to affect the GTX 280.
FWIW, I have a GTX 280 and have never had a problem. Mine's factory over-clocked too, so it's running a bit hotter and faster than nominal.
Could it be a problem in handling the way faulty shaders or cores are deactivated? If so, I would not expect to see the same problem on a GTX 280, since none are deactivated. |
|
|
|
The FFT bug only affected the GTX260 (mostly the older version), and only from driver 182 onwards. So CUDA 2.1 was fine, CUDA 2.2 was not.
Let's see whether the beta has, by luck, improved the situation.
gdf
Is it possible to do a separate build for the GTX260 (and any other machines using the old drivers) using the CUDA 2.1 SDK, but with the same source code otherwise?
You could then test whether this build, combined with the newer drivers, works on the problem GTX260s, or whether machines with the problem GTX260s need to switch back to the older drivers. I'd expect similar results for any other types of Nvidia GPU boards that the newer CUDA versions do not support well enough.
If your server can detect which driver the machine is using, it could use this information to determine which application version to include; otherwise, you could just include both application versions and add a wrapper to determine which one is actually used. That would allow people with a GTX260, but one without the problem, to use the other application program instead if it runs faster.
Another idea you could try, if you think the problem is in a DLL, is to try some builds with a mixture of CUDA 2.1 DLLs and more recent CUDA version DLLs.
Another thought: Is it possible to mark parts of the results with which cores and shaders were used to calculate them? If so, you could then have the server build a file to specify additional cores and shaders that a particular machine should avoid using for any future workunits. |
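The driver-based selection suggested above could look roughly like the sketch below, whether it runs on the server or in a client-side wrapper. Everything in it is hypothetical: the build labels, the `Host` fields and the `pick_build` function are made up for illustration, and the rule (GTX 260 with driver 182 or later gets the CUDA 2.1 build) simply encodes what was reported in this thread. Whether a CUDA 2.1 build would actually avoid the errors is exactly what would need testing.

```python
from dataclasses import dataclass

# Assumption taken from this thread: problems were reported on GTX 260
# cards running driver 182 or later.
FIRST_AFFECTED_DRIVER = 182.0
AFFECTED_MODELS = ("GTX 260", "GTX260")


@dataclass(frozen=True)
class Host:
    gpu_name: str          # e.g. "GeForce GTX 260"
    driver_version: float  # e.g. 190.38


def pick_build(host):
    """Return a hypothetical build label for this host."""
    name = host.gpu_name.upper()
    is_affected_model = any(model in name for model in AFFECTED_MODELS)
    if is_affected_model and host.driver_version >= FIRST_AFFECTED_DRIVER:
        return "cuda21"        # build made with the older SDK
    return "cuda_current"      # default build for everyone else


if __name__ == "__main__":
    for host in (Host("GeForce GTX 260", 190.38),
                 Host("GeForce GTX 280", 190.38),
                 Host("GeForce GTX 260", 181.22)):
        print(host.gpu_name, host.driver_version, "->", pick_build(host))
```

A per-host override would still be needed so that owners of a GTX260 that does not show the problem can opt into the faster build.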
|
|
|
Another thought: Is it possible to mark parts of the results with which cores and shaders were used to calculate them? If so, you could then have the server build a file to specify additional cores and shaders that a particular machine should avoid using for any future workunits.
I don't think that's possible under CUDA. You just tell CUDA, "Here, run these 10,000 copies of this kernel and tell me when you're done". All the fun stuff of assigning all that work to the individual multi-processors is done under the hood by the CUDA libraries. I don't recall ever seeing anything that would let you know where something actually ran. |
|
|
|
Then you may need to use a different CUDA library, such as one from the CUDA 2.1 SDK.
You could also look for a way to tell the software to exclude some of the cores and shaders marked as usable from actually being used. Going from the results of doing that to determining which ones to exclude on a particular machine would be slower, though. |
|
|