1) Message boards : Graphics cards (GPUs) : Any benefit from CUDA 4.1 drivers? (Message 23142)
Posted 945 days ago by Ketzer7
Saw this today:

http://www.techpowerup.com/159439/NVIDIA-Releases-CUDA-Toolkit-4.1.html

So CUDA 4.1 is now a GA release I guess.

Is there any advantage or improvement in using this version's dev driver versus the older CUDA 4.0 dev driver? (4.1 = 285.05.32 vs. 4.0 = 270.41.19, for Linux anyways)
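
As a side note on telling which dev driver is the newer one: the dotted version strings compare more reliably as numbers than as text. Below is a minimal sketch in Python; the helper names `parse_version` and `is_newer` are made up for illustration and are not part of any NVIDIA or BOINC tool.

```python
def parse_version(text):
    """Split a dotted version string such as '285.05.32' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_newer(candidate, baseline):
    """Return True when candidate is a strictly higher version than baseline."""
    return parse_version(candidate) > parse_version(baseline)


# The two Linux dev driver versions mentioned in the post.
CUDA_41_DRIVER = "285.05.32"
CUDA_40_DRIVER = "270.41.19"

if __name__ == "__main__":
    print(is_newer(CUDA_41_DRIVER, CUDA_40_DRIVER))
```

Comparing tuples of integers avoids the trap where plain string comparison ranks "6.10.9" above "6.10.58". This only says which release is newer, not whether it is any better for crunching.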

I've had good success running the CUDA 4.0 dev driver in the past and am leery of trying an upgrade to any newer ones on Linux. Just want to see if there would be any tangible gains from using the newer one.

Thanks.
2) Message boards : Graphics cards (GPUs) : Befuddled by CC1.3 cards on Linux x64 -- help/advice (Message 21806)
Posted 1120 days ago by Ketzer7
Thanks for the replies guys. As it turned out, the installer finally came up on the Skulltrail system after a REALLY long time sitting at the blinking cursor, so not sure what's up with that. It was all for naught though, because as it was going through the install, it also crapped out with the same failed to install bootloader to /dev/sda error just like I got on fortress. So that is twice I've had this specific problem happen on two entirely different systems.

@Dagorath:

The ISO I downloaded of Ubuntu looks good. I checked it against md5sum, sha1sum, and sha256sum, and each of them returned a result matching the appropriate value for the Ubuntu 11.04 x64 (amd64) ISO. I had a hell of a time finding the sums files for Ubuntu on their site. Had to finally dig into some FTP mirror to get them.
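
For anyone who wants to repeat the same check in one pass instead of running the three tools separately, here is a rough sketch using Python's standard `hashlib`. The function names and the idea of passing the published sums in as a dict are assumptions for illustration; the expected values still have to be copied by hand from the sums files on the mirror.

```python
import hashlib


def file_digests(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Return {algorithm: hex digest} for a file, reading it in chunks
    so a large ISO never has to fit in memory."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_published(path, published):
    """Compare a file against published sums.

    `published` maps an algorithm name to the expected hex digest,
    as copied from the distro's sums files (hypothetical input format).
    """
    actual = file_digests(path, tuple(published))
    return all(
        actual[name] == expected.strip().lower()
        for name, expected in published.items()
    )
```

Calling `matches_published("ubuntu.iso", {"sha256": "<value from the sums file>"})` would return True only if every listed digest matches; the filename here is just a placeholder. A matching sum only proves the download is intact, it says nothing about the burn or the installer itself.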

I also used ImgBurn when I created the disk and had it verify the burn after it was done, and that came back good as well.

Last night I started wondering the same thing about the RAID1 array and if maybe that is causing a problem. Ubuntu's drivers don't seem to have an issue with picking it up and manipulating it during the install, but who knows. What made me start thinking this is that both of the boxes I tried installing it to, skulltrail and fortress, hit that error of failing to install the bootloader the first time around. The only thing common to both of them is that they are running RAID1 arrays on an Intel SB RAID controller.

Unfortunately neither of these boxes has on-board video on the mobo, so I'd have to leave at least one card in, but I could try that as well.

@skg:

I noticed something similar the first time I tried installing it on fortress. I got that failed to install bootloader error, but then I started the install over again (right OTT like you said) and the second time is when it seemed to complete Ok, but then the system wouldn't boot up afterwards. So I'm not sure what is up with that. Dagorath may be right and I need to scrub the disks somehow to cleanse them of their Red Hat ways, but dunno.

I'm going to give it one more try with Ubuntu and see how far I get, but in the background I've been trying to get a WinXP Pro x64 installer from work and just go with that. Unfortunately, it seems the dark side is better in this regard to getting the cards up and running with grid.
3) Message boards : Graphics cards (GPUs) : Befuddled by CC1.3 cards on Linux x64 -- help/advice (Message 21803)
Posted 1121 days ago by Ketzer7
Well, between any reinstalls I have done, I always go into the Intel south bridge RAID controller and delete the array I have defined and then recreate it so that it should wipe out all of the MBR and partition data, at least that's what I always thought it did, but maybe not..
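
One way to test that assumption rather than guess would be to look at the first 512-byte sector of the disk after recreating the array. The sketch below parses a classic MBR sector in Python: boot code in bytes 0-445, four 16-byte partition entries from byte 446, and the 0x55 0xAA signature at bytes 510-511. The function name and the returned dict are made up for illustration, and whether the Intel RAID BIOS actually clears this sector is not something this code can say in advance.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446
PARTITION_ENTRY_SIZE = 16


def describe_mbr(sector):
    """Summarise the first sector of a disk or disk image (classic MBR layout)."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need at least 512 bytes")

    partitions = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + PARTITION_ENTRY_SIZE * index
        entry = sector[start:start + PARTITION_ENTRY_SIZE]
        status, part_type = entry[0], entry[4]
        # Bytes 8-15 hold the starting LBA and sector count, little-endian.
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if part_type != 0:  # type 0 means the slot is unused
            partitions.append({
                "index": index,
                "bootable": status == 0x80,
                "type": part_type,
                "start_lba": start_lba,
                "sectors": num_sectors,
            })

    return {
        "has_signature": sector[510:512] == b"\x55\xaa",
        "boot_code_present": any(sector[:PARTITION_TABLE_OFFSET]),
        "partitions": partitions,
    }
```

Feeding it the first 512 bytes read from the disk device (for example `/dev/sda`, opened read-only as root from a live CD) would show whether old boot code or partition entries survived. If everything comes back empty, the sector really was wiped; if not, leftovers from the previous install are still there.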
4) Message boards : Graphics cards (GPUs) : Befuddled by CC1.3 cards on Linux x64 -- help/advice (Message 21801)
Posted 1121 days ago by Ketzer7
heh I would if I could get it to install and run..

I just tried installing Ubuntu 11.04 to both of my crunching boxes with GT200 cards in them.

The first one (fortress) got most of the way through the install without any problems and then just crapped out with an error telling me it couldn't install GRUB to /dev/sda. I was like wtfbbq? So I killed it, tried again from scratch, and then it finished installing fine, came up with the little screen I needed to reboot, pushed reboot, and it basically locked at that screen. So reset the box manually, and now when it comes up, it acts like there is no bootloader, it just sits there. <sigh>

So I went to my other machine which has a Skulltrail motherboard in it to see if it was something specific to the first box. CD starts booting up and it gets to the point where I think it should be detecting the hardware (just a flashing cursor in the upper left corner of the monitor after a purple splash screen that shows the accessibility icon at the bottom), and then it just sits there and doesn't do anything else. X-[

Ugh..the fight continues...
5) Message boards : Graphics cards (GPUs) : Befuddled by CC1.3 cards on Linux x64 -- help/advice (Message 21799)
Posted 1121 days ago by Ketzer7
Thanks skg!

Yeah, I am really not sure what is going on with it.

When looking at my two computers, the only difference between the two of them is really the CPU and the GPUs; haf-x runs perfectly with the GTX 480s, but fortress doesn't at all on the 280s with the same OS, same driver, and same BOINC version. The installs of the OS and driver were basically identical between the two as well.

I'm fairly certain that there are no problems with the memory or disks as when I first built fortress, I think I let memtest86+ run for 24+ hours with no errors. Also, I keep all of my crunching boxes on stock clock speeds and timings to avoid any corruption errors that could be caused by overclocking them. The disks are also configured in a RAID1 array so that if there's a problem with one of them and/or it dies, the other should be able to take over and keep the system going until I can repair it.

The cards themselves don't seem to be overheating either as when I look at the Nvidia control panel app in Gnome, the thermal monitor shows all of them are basically sitting at idle temperatures, like they aren't even running a task at all.

From the example system you showed, it appears it is running Ubuntu based on the kernel version. I'm downloading it right now and will give it a try to see if it makes any difference. I had similar problems to this with the CC1.3 cards back in March when I was also running Fedora 14 x64, so maybe it is a Red Hat based distro related thing? I noticed that the top GPUGrid volunteer is running Linux (albeit with CC2.0 cards) but judging from his kernel versions, is also using Ubuntu, so it might be worth a shot.

I'll try some of your other suggestions as well. Like you said, hopefully some of the more experienced Linux users will have some ideas. I just want to get all of the cards running again. :-(

Thanks again!
6) Message boards : Graphics cards (GPUs) : Befuddled by CC1.3 cards on Linux x64 -- help/advice (Message 21797)
Posted 1121 days ago by Ketzer7
Hi all!

I have two boxes in the lab here that I have been trying to get running with GPUGrid again.

One has 3 GTX 280s installed in it, and the other has 2 GTX 295s and a GTX 280. Previously (maybe 6 months+ ago), all of these cards were able to run GPUGrid just fine and return good results. No crashes or other stability problems with any of them; they're all plain vanilla EVGA cards.

The problem I am having now is that no matter what I seem to try, I can't get any of these cards to finish a WU without getting a computation error. I've observed two different behaviors, however, which I'll try outlining.

1. I installed CentOS 6 x64 with all of the necessary libraries, and installed the CUDA 4.0 dev driver for Linux x64, version 270.41.19, with BOINC 6.10.58. With this setup, I found that one of the GTX 280s seemed to be able to run and complete WUs without error (the card with monitor attached), but the other two cards would just fail within seconds of starting and report computation errors.

2. I tried starting over by doing a fresh reinstall of CentOS 6 x64, but this time with the recommended CUDA 3.1 dev driver, 256.40, again with BOINC 6.10.58. In this combination, now I find that none of the GTX 280s is able to complete a WU without failing on a computation error within seconds of starting.

If memory serves me correctly, all of these cards worked with the 6.12 and 6.13 GPUGrid clients a couple of months ago, but I'm noticing that now the client version is 6.14.

Is there something new in 6.14 that breaks CC1.3 cards on Linux x64? It doesn't seem to be an OS/driver/BOINC version problem as I have another box in the lab with 3 GTX 480s in it, and that box runs like a champ with all cards crunching and returning good results. I've seen other volunteers that seem to still be running CC1.3 cards without problems, but I think all of them have been running on Windows unfortunately. So I'm beginning to wonder if it's something to do with the version of GPUGrid client for Linux.

In any case, I'm hoping maybe someone has experience or insight with this and can offer some help. I hate to let these cards go to waste if I am just doing something wrong. Although I get the impression that for Linux x64 I may be SOL unless I am running all CC2.0 cards.

Sorry for the long post. Many thanks in advance.
7) Message boards : Graphics cards (GPUs) : 6.10.36 versus 6.10.58/.60? (Message 20815)
Posted 1247 days ago by Ketzer7
On the Join page for the project, it lists that BOINC client version 6.10.36 is recommended for use.

Is there any particular reason why this is the case, or better yet, any reason why someone should _not_ use 6.10.58/.60 with GPUGrid?

I had been running 6.10.58 previously on one of my crunching boxes, and then out of the blue it up and started failing every WU after ~March 10-11.

On a second go around on my newer box, I intentionally installed 6.10.36 to see if it made any difference, and so far so good, but no guarantee I guess.

So I'm just trying to find out for my own edification which BOINC client version is preferable to use.

Sorry if this has been asked before, and thanks in advance for any input here.