1) Message boards : Number crunching : GERARD_CXCL12LOCKMONO (Message 43816)
Posted 2 days ago by skgiven
Server says:

Detailed computing status

Application            unsent  in progress  success  error rate
GERARD_CXCL12LOCKMON   234     170          0        100%


http://www.gpugrid.net/server_status.php - scroll down

If it's a new batch this is to be expected: lots of immediate failures will return before any completed tasks. The first completed task might not report until ~7h after it was sent.
The first healthy-looking LOCKMONO task I'm running is now at 10%, and I've another on W10 at 1.3% after 11 mins. GPU usage is 80% at 73% power with temp throttling enabled, using 1GB GDDR at present - that all looks quite normal.
2) Message boards : Number crunching : cant get any new tasks... why? (Message 43815)
Posted 2 days ago by skgiven
The 10 max errors might prove to be a problem with these GERARD_CXCL12LOCKMONO tasks.
3) Message boards : Number crunching : Ghost workunits (Message 43814)
Posted 2 days ago by skgiven
Each ghost task would probably need to be manually purged from the server - not a job I'd fancy doing from a Terminal!
4) Message boards : Number crunching : GTX 1080 WU (Message 43812)
Posted 3 days ago by skgiven
I can't recompile the ACEMD app using the CUDA 8 Toolkit for them, so this is just a work-around to help you until the people who can, do.

Anyone in this 'mixed-GPU' situation can also exclude just the long app and allow any Betas that might turn up some time down the line to run on their Pascals.

<exclude_gpu>
<url>http://www.gpugrid.net/</url>
<type>NVIDIA</type>
<device_num>1</device_num>
<app>acemdlong</app>
</exclude_gpu>

There is a short task queue too, but that's empty ATM. Perhaps the devs would want to use it for testing also? Note that I've had no advice that the devs are working on this or how long it might take them to get to Beta. Last I heard they didn't have a Pascal to even start to work with.
5) Message boards : Number crunching : GTX 1080 WU (Message 43810)
Posted 3 days ago by skgiven
You can edit your cc_config.xml file to "exclude GPU" by device number against a project. That prevents the 1080 from receiving work here, but still allows the 980Ti to run tasks here and the 1080 to be used elsewhere.

https://boinc.berkeley.edu/wiki/Client_configuration

    <exclude_gpu>
    Don't use the given GPU for the given project. If <device_num> is not specified, exclude all GPUs of the given type. <type> is required if your computer has more than one type of GPU; otherwise it can be omitted. <app> specifies the short name of an application (i.e. the <name> element within the <app> element in client_state.xml). If specified, only tasks for that app are excluded. You may include multiple <exclude_gpu> elements. If you change GPU exclusions, you must restart the BOINC client for these changes to take effect.

    <exclude_gpu>
    <url>project_URL</url>
    <device_num>N</device_num>
    <type>NVIDIA|ATI|intel_gpu</type>
    <app>appname</app>
    </exclude_gpu>




For example, if your 1080 is device 1 (according to the BOINC log), then the following should prevent its use at GPUGrid if you add it to your cc_config.xml file and save it with Save As, choosing 'All file types' (so that it keeps the .xml extension):

<exclude_gpu>
<url>http://www.gpugrid.net/</url>
<device_num>1</device_num>
</exclude_gpu>
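
As I understand the Client_configuration page linked above, the <exclude_gpu> block belongs inside the <options> element of cc_config.xml. If you would rather not hand-edit the file, here is a minimal Python sketch that adds the block for you. The function name add_exclude_gpu and the one-line starting config are made up for illustration; it only works on a string, so reading and writing your real file is left to you.

```python
import xml.etree.ElementTree as ET


def add_exclude_gpu(config_xml, url, device_num=None, gpu_type=None, app=None):
    """Return cc_config.xml text with one extra <exclude_gpu> block in <options>.

    Hypothetical helper, not part of BOINC. Optional elements are only
    written when a value is given.
    """
    root = ET.fromstring(config_xml)
    if root.tag != "cc_config":
        raise ValueError("expected <cc_config> as the root element")

    # Create <options> if the file does not have one yet.
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")

    exclude = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(exclude, "url").text = url
    if device_num is not None:
        ET.SubElement(exclude, "device_num").text = str(device_num)
    if gpu_type is not None:
        ET.SubElement(exclude, "type").text = gpu_type
    if app is not None:
        ET.SubElement(exclude, "app").text = app

    # Pretty-print where available (ET.indent exists from Python 3.9 on).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Assumed starting point: an otherwise empty config file.
minimal = "<cc_config><options></options></cc_config>"
updated = add_exclude_gpu(minimal, "http://www.gpugrid.net/", device_num=1)
print(updated)
```

Passing app="acemdlong" as well would exclude only the long app on that device. Either way the BOINC client has to be restarted before the exclusion takes effect, as the quoted documentation says.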

6) Message boards : Number crunching : GERARD_CXCL12LOCKMONO (Message 43809)
Posted 3 days ago by skgiven
Most are failing on Linux too. Errors:

Stderr output

<core_client_version>7.6.31</core_client_version>
<![CDATA[
<message>
process exited with code 176 (0xb0, -80)
</message>
<stderr_txt>
# SWAN Device 0 :
# Name : GeForce GTX 970
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:01:00.0
# Device clock : 1215MHz
# Memory clock : 3505MHz
# Memory width : 256bit
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 5000)

</stderr_txt>
]]>

Stderr output

<core_client_version>7.6.31</core_client_version>
<![CDATA[
<message>
process exited with code 199 (0xc7, -57)
</message>
<stderr_txt>
# SWAN Device 0 :
# Name : GeForce GTX 970
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:01:00.0
# Device clock : 1215MHz
# Memory clock : 3505MHz
# Memory width : 256bit
SWAN : FATAL : Cuda driver error 700 in file 'swanlibnv2.cpp' in line 1963.
# SWAN swan_assert -57

</stderr_txt>
]]>

Stderr output

<core_client_version>7.6.31</core_client_version>
<![CDATA[
<message>
process exited with code 176 (0xb0, -80)
</message>
<stderr_txt>
# SWAN Device 0 :
# Name : GeForce GTX 970
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:01:00.0
# Device clock : 1215MHz
# Memory clock : 3505MHz
# Memory width : 256bit
# The simulation has become unstable. Terminating to avoid lock-up (1)
# Attempting restart (step 5000)

</stderr_txt>
]]>


Stderr output

<core_client_version>7.6.31</core_client_version>
<![CDATA[
<message>
process exited with code 199 (0xc7, -57)
</message>
<stderr_txt>
# SWAN Device 0 :
# Name : GeForce GTX 970
# ECC : Disabled
# Global mem : 4095MB
# Capability : 5.2
# PCI ID : 0000:01:00.0
# Device clock : 1215MHz
# Memory clock : 3505MHz
# Memory width : 256bit
SWAN : FATAL : Cuda driver error 700 in file 'swanlibnv2.cpp' in line 1963.
# SWAN swan_assert -57

</stderr_txt>
]]>
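
For anyone collecting a pile of these, a rough Python sketch for tallying the exit codes from pasted stderr text follows. The regular expression only assumes the "process exited with code ..." line format shown above; the function names and the sample string are made up for illustration. It also shows that the negative number in brackets is just the same exit code read as a signed byte (176 - 256 = -80, 199 - 256 = -57).

```python
import re
from collections import Counter

# Matches lines such as: process exited with code 176 (0xb0, -80)
EXIT_RE = re.compile(r"process exited with code (\d+) \((0x[0-9a-fA-F]+), (-?\d+)\)")


def tally_exit_codes(text):
    """Count how often each exit code appears in the pasted stderr text."""
    return Counter(int(m.group(1)) for m in EXIT_RE.finditer(text))


def as_signed_byte(code):
    """Interpret an exit code in 0..255 as a signed 8-bit value."""
    return code - 256 if code >= 128 else code


# Sample built from the message lines quoted above.
sample = """
process exited with code 176 (0xb0, -80)
process exited with code 199 (0xc7, -57)
"""

counts = tally_exit_codes(sample)
for code, n in sorted(counts.items()):
    print(code, hex(code), as_signed_byte(code), n)
```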

Maybe these are designed to 'fail early' if they are likely to fail at all?

Task 15166493 has reached 5.5% after 1h on my Linux system, so the odd one appears to be running normally.

1x39-GERARD_CXCL12LOCKMONO-0-3-RND8941_0
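
A back-of-envelope run-time estimate from a progress figure like the 5.5% after 1h above, as a small Python sketch. It assumes progress is linear over the whole task, which is my assumption and not something the app guarantees; the function name is made up.

```python
def estimate_total_hours(percent_done, hours_elapsed):
    """Extrapolate total run time, assuming progress is linear (an assumption)."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return hours_elapsed * 100.0 / percent_done


elapsed = 1.0                      # hours, from the post above
total = estimate_total_hours(5.5, elapsed)
remaining = total - elapsed
print(f"{total:.1f} h total, {remaining:.1f} h remaining")
```

On those figures the task would take roughly 18h in total if it keeps going at the same rate.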
7) Message boards : Graphics cards (GPUs) : GTX 1080 (Message 43804)
Posted 3 days ago by skgiven
When they get a GTX1080/GTX1070 to work with, they need to recompile the app using the latest CUDA 8 dev kit, then perform some in-house alpha testing and then Beta test it here. There might also be compatibility issues with previous generations which would need to be ironed out. My guess is several weeks minimum, but possibly after the summer. I don't know if they can run multiple app versions, but it would be best to.
8) Message boards : Graphics cards (GPUs) : CUDA 7.5 drivers are faster than CUDA 8 on GTX 970 (Message 43803)
Posted 3 days ago by skgiven
Yes, thanks, should have said 362 took 6% longer & it does look like CUDA 8 drivers are slower at running CUDA 6.5 apps than CUDA 7.5 drivers.
Maybe I'm wrong, but I presumed CUDA 8 was developed for Pascal and then patched for backward compatibility with earlier CUDA versions? Whatever the route, I don't see why NV would dev for older products, so I would not expect any performance gain from CUDA 8 for 28nm GTX cards, though they might improve upon their earliest CUDA 8 drivers somewhat. As usual, I suggest people stick to the CUDA 7.5 drivers they have unless they need new drivers or want to test and report on them. Such reports are useful if they confirm that CUDA 7.5 drivers are faster than CUDA 8 for GTX 900 series GPUs, or show any change to the 6% to 10% performance drop.
9) Message boards : Graphics cards (GPUs) : CUDA 7.5 drivers are faster than CUDA 8 on GTX 970 (Message 43769)
Posted 13 days ago by skgiven
This has been seen before here.
Drivers become bloated with patches and it slows some apps down. Operating systems get bloated in a similar way. Basically each fix introduces an additional subroutine or set of subroutines.
The difference observed by Jim1348 was 10% while the difference observed by ETA was 6%, so perhaps subsequent drivers (ETA's being slightly newer) or different setups change things.

The latest apps here are CUDA 6.5, so any drivers that support 6.5 should suffice (unless there have been reported issues).
To test any differences you do need a steady supply of the same work type, and to stick to the same setup.
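
If you do collect run times for the same work type under two drivers, the comparison itself is simple. A small Python sketch follows; the run times are invented purely to show the arithmetic and are not measurements from anyone's system.

```python
from statistics import mean


def percent_slower(baseline_runtimes, test_runtimes):
    """Percentage by which the mean test run time exceeds the mean baseline."""
    base = mean(baseline_runtimes)
    return (mean(test_runtimes) - base) / base * 100.0


# Hypothetical run times in seconds for the same work type on the same setup.
old_driver = [10000, 10200, 9800]
new_driver = [10600, 10812, 10388]

slowdown = percent_slower(old_driver, new_driver)
print(f"new driver is {slowdown:.1f}% slower")
```

With only a handful of tasks per driver the result will be noisy, so the more tasks of the same type you can average over, the better.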
10) Message boards : News : WU: OPM995 simulations (Message 43731)
Posted 19 days ago by skgiven
Just after correcting my remedial English :)

PS. Looks like it's backup-project time again 🕒

