1) Message boards : News : PYSCFbeta: Quantum chemistry calculations on GPU (Message 61471)
Posted 4 days ago by Skip Da Shu
Steve replied:
Yes, this makes sense, unfortunately. In the previous round of "inputs_v3**" it was calculating things incorrectly for any molecule containing iodine. This is the heaviest element in our dataset. The computational cost of this QM method scales with the size of the elements (it depends on the number of electrons). We are resending the incorrect calculations for iodine-containing molecules in this round of "v4" work units. Therefore the v4 set is a subset of the previous v3 WUs containing heavier elements, hence there are more OOM errors.


Any change in this situation?

I got my 12GB card back and my haphazard data collection shows it under a 9% error rate, with the very last grab showing 5.85%.


Something's coming around... error rates for the 10GB cards are now under 13% and the 12GB card is at ~3%.

Skip
2) Message boards : News : PYSCFbeta: Quantum chemistry calculations on GPU (Message 61470)
Posted 9 days ago by Skip Da Shu
Steve replied:
Yes, this makes sense, unfortunately. In the previous round of "inputs_v3**" it was calculating things incorrectly for any molecule containing iodine. This is the heaviest element in our dataset. The computational cost of this QM method scales with the size of the elements (it depends on the number of electrons). We are resending the incorrect calculations for iodine-containing molecules in this round of "v4" work units. Therefore the v4 set is a subset of the previous v3 WUs containing heavier elements, hence there are more OOM errors.


Any change in this situation?

I got my 12GB card back and my haphazard data collection shows it under a 9% error rate, with the very last grab showing 5.85%.

The 8GB & 10GB cards are still on NNW (other than 3 WUs I let through on the 10GB cards; they completed).

Skip
3) Message boards : News : PYSCFbeta: Quantum chemistry calculations on GPU (Message 61456)
Posted 15 days ago by Skip Da Shu
Thank you. You probably just saved me hours of wasted time.

Error %
AVG ALL: 29.1
AVG – last 3: 59.0

8GB – last 2 72.76
10GB – last 2 66.52
12GB – last 2 3.55 (card out for a week)
4) Message boards : News : PYSCFbeta: Quantum chemistry calculations on GPU (Message 61454)
Posted 15 days ago by Skip Da Shu
Error rates skyrocketed on me for this app... even on the 10GB cards (12GB card will be back on Thursday). This started late on April 7th.

Error rate now over 50% so I will have to NNW till I can figure it out.

Skip
5) Message boards : News : PYSCFbeta: Quantum chemistry calculations on GPU (Message 61404)
Posted 46 days ago by Skip Da Shu
Download error causing the zip file to be corrupted because it is missing the end of file signature.

I was getting that on a Google Drive zip archive a couple of days ago. Switching browsers let me download the archive correctly so it would unpack.


Well after 100+ of these errors I finally got 3 good ones out of that box after a reboot for a different reason.

Thanx, Skip
6) Message boards : News : PYSCFbeta: Quantum chemistry calculations on GPU (Message 61402)
Posted 46 days ago by Skip Da Shu
Anyone have insight into this error:

<stderr_txt>
09:06:00 (130033): wrapper (7.7.26016): starting
[x86_64-pc-linux-gnu__cuda1121.zip]
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
unzip: cannot find zipfile directory in one of x86_64-pc-linux-gnu__cuda1121.zip or
x86_64-pc-linux-gnu__cuda1121.zip.zip, and cannot find x86_64-pc-linux-gnu__cuda1121.zip.ZIP, period.
boinc_unzip() error: 9

It looks like every WU since the afternoon of the 7th (Zulu) is getting this but only on my single 12GB 4070S

Skip
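The unzip message above means the archive has no end-of-central-directory (EOCD) record, which is what a truncated download typically looks like. A minimal sketch of how one might check a downloaded archive before blaming the app, assuming Python is available on the host; `check_zip` is a hypothetical helper name, not part of BOINC or the wrapper:

```python
import zipfile

# Marker that starts the end-of-central-directory record in a zip file.
EOCD_SIGNATURE = b"PK\x05\x06"
# The EOCD record is 22 bytes plus an optional comment of up to 65535 bytes,
# so it must start within the last 65557 bytes of the file.
EOCD_SEARCH_WINDOW = 22 + 65535


def check_zip(path):
    """Return (ok, reason) for the archive at `path` (hypothetical helper)."""
    with open(path, "rb") as fh:
        data = fh.read()
    if EOCD_SIGNATURE not in data[-EOCD_SEARCH_WINDOW:]:
        return False, "end-of-central-directory signature missing (truncated download?)"
    try:
        with zipfile.ZipFile(path) as zf:
            bad_member = zf.testzip()  # name of first member failing its CRC check, or None
    except zipfile.BadZipFile as exc:
        return False, f"bad zip file: {exc}"
    if bad_member is not None:
        return False, f"CRC mismatch in member {bad_member}"
    return True, "ok"
```

Running `check_zip("x86_64-pc-linux-gnu__cuda1121.zip")` against the copy in the project directory would show whether the file on disk is incomplete; if it is, the download needs to be repeated rather than the task retried.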
7) Message boards : News : PYSCFbeta: Quantum chemistry calculations on GPU (Message 61401)
Posted 46 days ago by Skip Da Shu
to be expected with 8-10GB cards.

might get better context if you split the graphs up by card type. so you can see the relative error rate vs different VRAM sizes. I'm guessing most errors come from the 8GB cards.


They do:

8GB – last 2 checks of 2 cards 44.07
10GB – last 2 checks of 2 cards 30.80
12GB – last 2 checks of 1 card 7.62

But I need to look at the last day or two as rates have been going up.
8) Message boards : News : PYSCFbeta: Quantum chemistry calculations on GPU (Message 61363)
Posted 52 days ago by Skip Da Shu



Going the wrong direction :-(

Skip
9) Message boards : News : PYSCFbeta: Quantum chemistry calculations on GPU (Message 61360)
Posted 53 days ago by Skip Da Shu
https://imgur.com/evCBB73

GPUGRID error rate across 2x 3070 8GB, 2x 3080 10GB & 1 4070 Super 12GB (the early part is with 3x 3070 8GB, one of which was replaced by the 4070S on 2/20).

Skip
10) Message boards : News : PYSCFbeta: Quantum chemistry calculations on GPU (Message 61295)
Posted 68 days ago by Skip Da Shu
between your systems and mine, looking at the error rates;

~23% of tasks need more than 8GB
~17% of tasks need more than 10GB
~4% of tasks need more than 12GB
<1% of tasks need more than 16GB

me personally, i wouldn't run these (as they are now) with less than 12GB VRAM.
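The figures above can be read as a rough lookup from card VRAM to the share of tasks likely to run out of memory. A small sketch under that reading; the table values are the approximate percentages quoted above (with "<1%" entered as 1.0, so it is an upper bound), and `expected_oom_share` is a hypothetical name:

```python
# (VRAM threshold in GB, approx. % of tasks needing more than that threshold)
# Values are the rough estimates quoted above; "<1%" is stored as 1.0.
NEED_MORE_THAN = [(8, 23.0), (10, 17.0), (12, 4.0), (16, 1.0)]


def expected_oom_share(vram_gb):
    """Upper-bound estimate of the % of tasks that exceed `vram_gb`.

    Uses the largest listed threshold that does not exceed the card's VRAM,
    so a card between two thresholds gets the figure of the lower one.
    """
    best = None
    for threshold, pct in NEED_MORE_THAN:
        if threshold <= vram_gb:
            best = pct
    if best is None:
        raise ValueError("the quoted figures give no estimate below 8 GB")
    return best


for size in (8, 10, 12, 16):
    print(f"{size} GB: up to ~{expected_oom_share(size):.0f}% of tasks may fail with OOM")
```

These are only the estimates from this thread at one point in time; the batch mix changes, so the real error rate on a given card can be well above or below them.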


Not sure why but...

Error rates seemed to start dropping after 5pm (23:00 Zulu) today. The overall error average since 2/11 across my 5 Nvidia cards was 26.7%, slowly creeping down over time. Early on, a little bit of this was the result of lowering clocks to eliminate the occasional segfault (0x8b).

The average of the last two captures today across the 5 cards was 20.5%

For the last 6 hour period I just checked, my 10GB card average error rate dropped to 17.3% (15.92 & 18.7) and the 8GB card error rate was at 21.3%.
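For anyone reproducing these numbers, the 10GB figure is consistent with a plain average of the two per-card rates. A short sketch, assuming an unweighted mean (the post does not say how the average was taken); `pooled_error_rate` is a hypothetical alternative for when per-card task counts are known:

```python
def mean_error_rate(rates):
    """Unweighted arithmetic mean of per-card error rates, in percent."""
    if not rates:
        raise ValueError("need at least one rate")
    return sum(rates) / len(rates)


def pooled_error_rate(counts):
    """Error rate in percent from (errors, total_tasks) pairs, one per card.

    Weights each card by how many tasks it ran, which differs from the
    unweighted mean when the cards processed different numbers of tasks.
    """
    errors = sum(e for e, _ in counts)
    total = sum(t for _, t in counts)
    if total == 0:
        raise ValueError("no tasks counted")
    return 100.0 * errors / total


# The two 10GB per-card rates quoted above.
ten_gb = mean_error_rate([15.92, 18.7])
print(f"10GB average: {ten_gb:.1f}%")  # prints "10GB average: 17.3%"
```

If one card ran noticeably more tasks than the other in the window, the pooled rate is the more representative of the two.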

Skip


IGNORE... all went to crap the next day (today)

