
Message boards : Number crunching : New "testX-RAIMIS" WUs, all erroring

Author Message
Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 579
Credit: 8,927,737,024
RAC: 17,236,548
Message 55722 - Posted: 11 Nov 2020 | 17:55:12 UTC

New "testX-RAIMIS" tasks (Anaconda Python 3 Environment v4.01 (cuda 100)) arrived today. I received four tasks on four of my Linux hosts, and all four finished with an error after several minutes:
test2-RAIMIS_PYTEST6-0-1-RND6445_4
test2-RAIMIS_NNPMM-0-1-RND9981_4
test3-RAIMIS_NNPMM-0-1-RND4429_1
test4-RAIMIS_NNPMM-0-1-RND7976_7
I watched some strange behavior in the last of them: it progressed quickly, reaching 100% after about 3 minutes, then dropped back to 10% for a while, then went to 100% again for a couple of minutes, and finally finished with an error... (?)

Keith Myers
Joined: 13 Dec 17
Posts: 1332
Credit: 7,085,167,459
RAC: 14,961,402
Message 55723 - Posted: 12 Nov 2020 | 1:43:16 UTC

I got one of these too that errored out after 1200 seconds or so.

I wonder if the tasks really are completely self-contained within the wrapper.

From the errors in the stderr.txt output, I wonder if the problem is the app trying to run within the system's original Python environment, which is deprecated in modern distros.

I can't run Python apps with a bare "python" command anymore. I always have to remember to run them as python3, since python2 was deprecated.
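
For anyone who wants to check which interpreter names actually resolve on a host, here is a minimal sketch using only the standard library. The helper name resolve_interpreters is made up for illustration, and the list of command names is an assumption about what a distro might provide.

```python
import shutil


def resolve_interpreters(names=("python", "python2", "python3")):
    """Map each command name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print one line per command name so missing ones are easy to spot.
    for name, path in resolve_interpreters().items():
        print(f"{name:8s} -> {path or 'not found'}")
```

On a distro that has dropped Python 2, "python" and "python2" would be expected to show as not found unless a compatibility package is installed.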

Toni
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Dec 08
Posts: 1006
Credit: 5,068,599
RAC: 0
Message 55729 - Posted: 12 Nov 2020 | 18:06:09 UTC - in response to Message 55723.

We are debugging those workunits. They don't depend on the system's Python. There is a relatively large initial download which fetches a full environment (which should then be reused).
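
As an illustration of what "not depending on the system's Python" means in practice, a script can check whether the interpreter running it lives inside a given environment directory. This is only a sketch: runs_inside is a hypothetical helper, the directory in the usage example is a placeholder, and nothing here is taken from the actual GPUGRID wrapper.

```python
import os
import sys


def runs_inside(env_dir, prefix=None):
    """Return True if the interpreter prefix is env_dir or lives beneath it.

    prefix defaults to sys.prefix, the installation root of the
    interpreter that is executing this code.
    """
    prefix = os.path.realpath(prefix if prefix is not None else sys.prefix)
    env_dir = os.path.realpath(env_dir)
    # commonpath compares whole path components, so "/opt/env" does not
    # match "/opt/envother".
    return os.path.commonpath([prefix, env_dir]) == env_dir


if __name__ == "__main__":
    # Placeholder path: substitute the directory of the bundled environment.
    env_dir = "/path/to/bundled/environment"
    print("interpreter:", sys.executable)
    print("inside bundled environment:", runs_inside(env_dir))
```

If the check prints False while the task is running, the script was started by some other interpreter, for example the distro's own python3.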

Profile ServicEnginIC
Joined: 24 Sep 10
Posts: 579
Credit: 8,927,737,024
RAC: 17,236,548
Message 55768 - Posted: 17 Nov 2020 | 22:22:35 UTC - in response to Message 55729.
Last modified: 17 Nov 2020 | 22:23:37 UTC

In fact, I received these other two tasks, and both succeeded:
test15-RAIMIS_NNPMM-1-2-RND4792_0
test7-RAIMIS_NNPMM-0-1-RND7929_0
Well done!

rod4x4
Joined: 4 Aug 14
Posts: 266
Credit: 2,219,935,054
RAC: 0
Message 55770 - Posted: 18 Nov 2020 | 0:40:04 UTC - in response to Message 55768.
Last modified: 18 Nov 2020 | 1:06:38 UTC

In fact, I received these other two tasks, and both succeeded:
test15-RAIMIS_NNPMM-1-2-RND4792_0
test7-RAIMIS_NNPMM-0-1-RND7929_0
Well done!

There is a lot of info in the stderr output.

A lot of the output describes the atoms, time period, calculations, box size, etc.

What I found interesting was that it can identify the GPU:
#Platform properties:
# DeviceIndex: 2
# DeviceName: GeForce GTX 1650


It checks the platforms, both OpenCL and CUDA:
# WARNING: there is no library for "OpenCL" plugin
# Available platforms
# CPU
# CUDA


Then it states the calculation platform:
# Setting up platform: CUDA


Dare I say GPUGRID is preparing an OpenCL version... AMD compatibility?

EDIT:
Or maybe not.
Prior to the OpenCL warning, there is this:
# Plugin directory: /var/lib/boinc-client/slots/4/gpugridpy/lib/acemd3
# Loaded plugins
# CPU
# PME
# CUDA
# CudaCompiler

The wrapper has only preloaded CUDA plugins; no OpenCL plugins are packaged with the app. So I think I got excited too early.
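
To compare these sections across several task results, the log lines quoted above can be pulled out with a small parser. This is a rough sketch only: parse_platform_info is a made-up name, SAMPLE is stitched together from the fragments quoted in this post rather than a complete stderr file, and the rule that a list section ends at the first blank line or line containing a colon is an assumption about the log layout.

```python
import re

# Assembled from the fragments quoted in this post; a real stderr file
# contains many more lines between these.
SAMPLE = """\
# Plugin directory: /var/lib/boinc-client/slots/4/gpugridpy/lib/acemd3
# Loaded plugins
# CPU
# PME
# CUDA
# CudaCompiler
# WARNING: there is no library for "OpenCL" plugin
# Available platforms
# CPU
# CUDA
# Setting up platform: CUDA
"""

SECTION_HEADERS = {
    "Loaded plugins": "loaded_plugins",
    "Available platforms": "available_platforms",
}


def parse_platform_info(text):
    """Collect plugin and platform names from '#'-prefixed log lines."""
    info = {
        "loaded_plugins": [],
        "available_platforms": [],
        "missing_plugins": [],
        "selected_platform": None,
    }
    section = None
    for raw in text.splitlines():
        line = raw.lstrip("#").strip()
        if not line:
            section = None  # a blank line ends any open list
        elif line in SECTION_HEADERS:
            section = SECTION_HEADERS[line]
        elif ":" in line:
            section = None  # "key: value" lines end any open list
            missing = re.search(r'no library for "([^"]+)" plugin', line)
            if missing:
                info["missing_plugins"].append(missing.group(1))
            elif line.startswith("Setting up platform:"):
                info["selected_platform"] = line.split(":", 1)[1].strip()
        elif section:
            info[section].append(line)
    return info


if __name__ == "__main__":
    for key, value in parse_platform_info(SAMPLE).items():
        print(f"{key}: {value}")
```

A result where OpenCL shows up under missing_plugins but never under available_platforms matches the reading above: the plugin is looked for but not shipped.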
