
Message boards : News : New multicore app and WUs

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 591
Credit: 4,273,184
RAC: 0
Message 48127 - Posted: 10 Nov 2017 | 14:58:07 UTC
Last modified: 10 Nov 2017 | 15:10:10 UTC

Dears,

we would like to test our new CPU multicore application for quantum chemistry tasks ("QC"). Since it’s the first time we have a CPU app out, I’ll test the behavior of GPUGRID with a relatively large batch that you will see soon. Workunits are named "*QC309big*".

Here are some features of the app, in short (subject to change):

* Platform: Linux only for now, generic x64.
* Threads: as many as Boinc decides. I guess it depends on your machine, your preferences, and other running tasks in ways which are obscure to me…
* Run time: about 1 CPU hour per WU (so, shorter if multithreading)
* Credit: computed with the default algorithm (tasks are short, don’t expect much). Bonus mechanism for fast turnaround is still on.
* Known bugs: restarts and checkpoints. This should be mitigated with the “keep in memory when suspended” option. Sorry about that, it’s outside of our control.
* Network behavior: the first time you get a WU of this kind it downloads a Python interpreter (miniconda) and then some open-source packages, and installs them in the project directory. The installation is reused whenever possible.
* Disk usage: could go around 1 GB, perhaps more when tasks are running. Resetting the project should remove everything.
* Memory usage: should be around 1 GB when running.

Depending on the results of this test, we’ll start thinking about other platforms.

Thanks and nice crunching!

Toni

Sergey Kovalchuk
Joined: 18 Feb 16
Posts: 5
Credit: 1,009,912
RAC: 41
Message 48130 - Posted: 10 Nov 2017 | 15:37:26 UTC - in response to Message 48127.

The client does not receive WUs, although there are almost a thousand of them and the host meets the requirements (Linux x64). Earlier, this host was able to receive test tasks for QC and Python.

Please post the exact requirements (memory, disk, OS) specified when generating the tasks.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 591
Credit: 4,273,184
RAC: 0
Message 48131 - Posted: 10 Nov 2017 | 15:41:06 UTC - in response to Message 48130.
Last modified: 10 Nov 2017 | 15:44:30 UTC

Can you check which applications you are accepting in your preferences?

By the way, the requirements are currently set as follows:


<rsc_fpops_est>3e12</rsc_fpops_est>
<rsc_fpops_bound>250e15</rsc_fpops_bound>
<rsc_disk_bound>4e9</rsc_disk_bound>
<rsc_memory_bound>1e9</rsc_memory_bound>

Sergey Kovalchuk
Joined: 18 Feb 16
Posts: 5
Credit: 1,009,912
RAC: 41
Message 48132 - Posted: 10 Nov 2017 | 16:05:30 UTC - in response to Message 48131.

All apps selected & "accept work from other"


Preferences:
max memory usage when active: 1900.76MB
max memory usage when idle: 1980.80MB
max disk usage: 6.71GB (4.47 free)
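
A rough sketch of how the posted bounds compare with these preferences. It is only a naive size comparison, not the scheduler's actual logic, and the unit conversion (1 GB = 1e9 bytes, 1 MB = 1e6 bytes) is an assumption; the helper name `fits` is made up for illustration.

```shell
#!/bin/sh
# fits LIMIT AVAILABLE: succeed when AVAILABLE (bytes) is at least LIMIT (bytes).
# awk is used because values such as 4e9 are not valid shell integers.
fits() {
    LC_ALL=C awk -v need="$1" -v have="$2" 'BEGIN { exit !(have + 0 >= need + 0) }'
}

# Bounds quoted earlier in the thread (bytes).
disk_bound=4e9
mem_bound=1e9

# Host preferences from this post, converted under the assumption
# that 1 GB = 1e9 bytes and 1 MB = 1e6 bytes.
disk_free=4.47e9
mem_active=1900.76e6

fits "$disk_bound" "$disk_free" && echo "disk: ok" || echo "disk: too small"
fits "$mem_bound" "$mem_active" && echo "memory: ok" || echo "memory: too small"
```

Under these assumptions both checks pass, which suggests that the disk and memory bounds alone do not explain the missing work.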

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 591
Credit: 4,273,184
RAC: 0
Message 48133 - Posted: 10 Nov 2017 | 16:09:28 UTC - in response to Message 48132.

Another boinc mystery...

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 591
Credit: 4,273,184
RAC: 0
Message 48134 - Posted: 10 Nov 2017 | 16:44:59 UTC - in response to Message 48133.

Jobs only seem to go to a subset of eligible machines. If anybody out there has a clue as to the reason, I'll be glad to hear it.

klepel
Joined: 23 Dec 09
Posts: 136
Credit: 1,829,755,020
RAC: 1,443,951
Message 48135 - Posted: 10 Nov 2017 | 17:25:12 UTC

All error out with this:
Stderr output

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
12:19:41 (31019): wrapper (7.7.26016): starting
12:19:41 (31019): wrapper (7.7.26016): starting
12:19:41 (31019): wrapper: running ../../projects/www.gpugrid.net/Miniconda3-4.3.30-Linux-x86_64.sh (-b -f -p /var/lib/boinc-client/projects/www.gpugrid.net/miniconda)
Python 3.6.3 :: Anaconda, Inc.
12:19:49 (31019): miniconda-installer exited; CPU time 6.649529
12:19:49 (31019): wrapper: running /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/python (pre_script.py)
12:19:59 (31019): $PROJECT_DIR/miniconda/bin/python exited; CPU time 7.101246
12:19:59 (31019): wrapper: running /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/psi4 (-n 14 -i psi4.in -o psi4.out)
/var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/psi4: 3: /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/psi4: readlink: not found
/var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/psi4: 9: /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/psi4: /bin/psi4.bin: not found
12:20:00 (31019): $PROJECT_DIR/miniconda/bin/psi4 exited; CPU time 0.001541
12:20:00 (31019): app exit status: 0x7f
12:20:00 (31019): called boinc_finish(195)

</stderr_txt>
]]>
It is this computer:
http://www.gpugrid.net/show_host_detail.php?hostid=420971

Profile Stoneageman
Joined: 25 May 09
Posts: 211
Credit: 12,238,523,896
RAC: 8,910,972
Message 48136 - Posted: 10 Nov 2017 | 17:53:41 UTC

All error out after a few seconds on AMD and Intel machines

<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
17:27:46 (14006): wrapper (7.7.26016): starting
17:27:46 (14006): wrapper (7.7.26016): starting
17:27:46 (14006): wrapper: running ../../projects/www.gpugrid.net/Miniconda3-4.3.30-Linux-x86_64.sh (-b -f -p /var/lib/boinc-client/projects/www.gpugrid.net/miniconda)
Python 3.6.3 :: Anaconda, Inc.
17:27:54 (14006): miniconda-installer exited; CPU time 6.648000
17:27:54 (14006): wrapper: running /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/python (pre_script.py)
17:28:05 (14006): $PROJECT_DIR/miniconda/bin/python exited; CPU time 7.584000
17:28:05 (14006): wrapper: running /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/psi4 (-n 15 -i psi4.in -o psi4.out)
/var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/psi4: 3: /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/psi4: readlink: not found
/var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/psi4: 9: /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/psi4: /bin/psi4.bin: not found
17:28:06 (14006): $PROJECT_DIR/miniconda/bin/psi4 exited; CPU time 0.000000
17:28:06 (14006): app exit status: 0x7f
17:28:06 (14006): called boinc_finish(195)

Profile [AF] fansyl
Joined: 26 Sep 13
Posts: 1
Credit: 566,531,404
RAC: 936,523
Message 48137 - Posted: 10 Nov 2017 | 18:09:57 UTC
Last modified: 10 Nov 2017 | 18:11:15 UTC

Hello,

error on my computer: Ubuntu mate 16.04/kernel 4.13.11/Ryzen 5 1400

Stderr output

<core_client_version>7.6.31</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
19:00:23 (31619): wrapper (7.7.26016): starting
19:00:23 (31619): wrapper (7.7.26016): starting
19:00:23 (31619): wrapper: running ../../projects/www.gpugrid.net/Miniconda3-4.3.30-Linux-x86_64.sh (-b -f -p /var/lib/boinc-client/projects/www.gpugrid.net/miniconda)
Python 3.6.3 :: Anaconda, Inc.
19:00:33 (31619): miniconda-installer exited; CPU time 8.382948
19:00:33 (31619): wrapper: running /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/python (pre_script.py)
19:03:37 (31619): $PROJECT_DIR/miniconda/bin/python exited; CPU time 63.497739
19:03:37 (31619): wrapper: running /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/psi4 (-n 7 -i psi4.in -o psi4.out)
/var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/psi4: 3: /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/psi4: readlink: not found
/var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/psi4: 9: /var/lib/boinc-client/projects/www.gpugrid.net/miniconda/bin/psi4: /bin/psi4.bin: not found
19:03:38 (31619): $PROJECT_DIR/miniconda/bin/psi4 exited; CPU time 0.002335
19:03:38 (31619): app exit status: 0x7f
19:03:38 (31619): called boinc_finish(195)

</stderr_txt>
]]>


Good luck with the debugging.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 591
Credit: 4,273,184
RAC: 0
Message 48138 - Posted: 10 Nov 2017 | 18:21:56 UTC - in response to Message 48137.

Dears, all three errors mention a missing "readlink" executable. That is surprising, because it's a fairly basic command, but please check whether you can run "readlink" in a terminal. If it is not installed, it should be in the "coreutils" package.

Profile Stoneageman
Joined: 25 May 09
Posts: 211
Credit: 12,238,523,896
RAC: 8,910,972
Message 48139 - Posted: 10 Nov 2017 | 18:40:21 UTC

It is installed

readlink --version
readlink (GNU coreutils) 8.26
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.

klepel
Joined: 23 Dec 09
Posts: 136
Credit: 1,829,755,020
RAC: 1,443,951
Message 48140 - Posted: 10 Nov 2017 | 19:01:39 UTC

Same here. readlink version 8.26 is installed.

Profile Daniel
Joined: 17 Sep 16
Posts: 4
Credit: 101,955,747
RAC: 2,163,426
Message 48141 - Posted: 10 Nov 2017 | 19:12:06 UTC

I also have a problem getting new WUs on some of my machines. It looks like the ones with an Nvidia card get work, and the ones without do not get anything.

PappaLitto
Joined: 21 Mar 16
Posts: 271
Credit: 1,305,108,981
RAC: 5,303,877
Message 48142 - Posted: 10 Nov 2017 | 19:21:43 UTC

Is there a particular reason this is a CPU application and not a GPU one?

mmonnin
Joined: 2 Jul 16
Posts: 38
Credit: 48,937,070
RAC: 1,364,637
Message 48144 - Posted: 10 Nov 2017 | 23:02:51 UTC - in response to Message 48140.

Same here. readlink version 8.26 is installed.



Same here.

NNW until there's a fix.

Profile Daniel
Joined: 17 Sep 16
Posts: 4
Credit: 101,955,747
RAC: 2,163,426
Message 48145 - Posted: 11 Nov 2017 | 0:06:59 UTC - in response to Message 48144.
Last modified: 11 Nov 2017 | 0:19:28 UTC

Same here. readlink version 8.26 is installed.



Same here.

NNW until there's a fix.

On Linux CentOS 7.4 it works fine. I suspect that boinc is not able to find or execute the readlink command. Please try executing the following commands:

which readlink
ls -l `which readlink`
sudo -iu boinc bash -c 'which readlink'
sudo -iu boinc bash -c 'ls -l `which readlink`'
sudo -iu boinc readlink /lib/libz.so.1


On my CentOS machine they return the following results:

# which readlink
/usr/bin/readlink
# ls -l `which readlink`
-rwxr-xr-x. 1 root root 41800 2016-11-05 /usr/bin/readlink
# sudo -iu boinc bash -c 'which readlink'
/bin/readlink
# sudo -iu boinc bash -c 'ls -l `which readlink`'
-rwxr-xr-x. 1 root root 41800 2016-11-05 /bin/readlink
# sudo -iu boinc readlink /lib/libz.so.1
libz.so.1.2.7


Profile Stoneageman
Joined: 25 May 09
Posts: 211
Credit: 12,238,523,896
RAC: 8,910,972
Message 48146 - Posted: 11 Nov 2017 | 1:25:35 UTC

# which readlink
/bin/readlink


# ls -l `which readlink`
-rwxr-xr-x 1 root root 43192 Oct 4 20:56 /bin/readlink

The following return nothing
# sudo -iu boinc bash -c 'which readlink'
# sudo -iu boinc bash -c 'ls -l `which readlink`'
# sudo -iu boinc readlink /lib/libz.so.1

mmonnin
Joined: 2 Jul 16
Posts: 38
Credit: 48,937,070
RAC: 1,364,637
Message 48147 - Posted: 11 Nov 2017 | 2:51:59 UTC - in response to Message 48146.

Commands do not work for me either.

Trotador
Joined: 25 Mar 12
Posts: 83
Credit: 1,070,639,199
RAC: 147,866
Message 48148 - Posted: 11 Nov 2017 | 9:13:37 UTC

So, I copied the readlink program to /usr/bin and now it is working on my Ubuntu hosts.

Profile [VENETO] sabayonino
Joined: 4 Apr 10
Posts: 47
Credit: 545,196,862
RAC: 424,804
Message 48149 - Posted: 11 Nov 2017 | 10:27:00 UTC - in response to Message 48148.
Last modified: 11 Nov 2017 | 10:35:56 UTC

The readlink path is usually /usr/bin, but it depends on the packaging and configuration provided by the distro.

Don't copy the file from /bin to /usr/bin (or wherever).

Just create a symlink. If for some reason readlink is updated, the file you've copied will not be updated.

$ sudo ln -sf /bin/readlink /usr/bin/readlink


PS : my readlink path is
$ which readlink
/usr/bin/readlink

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 591
Credit: 4,273,184
RAC: 0
Message 48150 - Posted: 11 Nov 2017 | 10:51:23 UTC - in response to Message 48149.
Last modified: 11 Nov 2017 | 10:55:02 UTC

I'll add /bin to the path in the next app update. That may work, unless there is some weird sandboxing thing going on. You shouldn't need to tweak your system: just let them fail (they should fail fast, so no CPU loss).

Concerning why some hosts are not receiving WUs, it's baffling me. It's not a matter of hosts already having GPUs because my own machine does and it did not get tasks. It may be related to the "reliable hosts" classification.
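
A minimal sketch of why the same command can be found in a terminal but not by a task, assuming the task ran with a search path that did not include the directory holding readlink (an inference from the fix described above, not something the logs state directly). The helper name `find_in_path` is hypothetical; it mimics how a shell walks a colon-separated path list.

```shell
#!/bin/sh
# find_in_path TOOL PATHLIST: print the first executable file named TOOL found
# in the colon-separated PATHLIST; return 1 if no directory contains it.
find_in_path() {
    tool=$1
    rest=$2:
    while [ -n "$rest" ]; do
        dir=${rest%%:*}          # first directory in the remaining list
        rest=${rest#*:}          # drop it from the list
        [ -n "$dir" ] || dir=.   # an empty entry means the current directory
        if [ -f "$dir/$tool" ] && [ -x "$dir/$tool" ]; then
            printf '%s\n' "$dir/$tool"
            return 0
        fi
    done
    return 1
}

# With only /usr/bin in the list, hosts that ship /bin/readlink alone report a miss.
find_in_path readlink "/usr/bin" || echo "readlink: not found in /usr/bin"
# With /bin added, those hosts resolve the command.
find_in_path readlink "/usr/bin:/bin" || echo "readlink: not found in /usr/bin:/bin"
```

On distros where /bin and /usr/bin are the same directory both lookups succeed, which would fit the reports that some hosts never saw the error.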

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 591
Credit: 4,273,184
RAC: 0
Message 48151 - Posted: 11 Nov 2017 | 11:38:12 UTC - in response to Message 48150.

@Daniel: can you list one of your hosts which gets QC tasks and one which doesn't?

Thanks

Profile Daniel
Joined: 17 Sep 16
Posts: 4
Credit: 101,955,747
RAC: 2,163,426
Message 48152 - Posted: 11 Nov 2017 | 12:09:40 UTC - in response to Message 48151.

@Daniel: can you list one of your hosts which gets QC tasks and one which doesn't?

Thanks

Hosts which get tasks: 449991, 449992, 391907
Hosts which did not get any: 444456, 452231

John C MacAlister
Joined: 17 Feb 13
Posts: 178
Credit: 132,357,411
RAC: 2,487
Message 48153 - Posted: 11 Nov 2017 | 12:28:20 UTC - in response to Message 48127.

Many thanks for this: I look forward to the Windows version!

____________
John

Profile Conan
Joined: 25 Mar 09
Posts: 5
Credit: 85,320
RAC: 16
Message 48154 - Posted: 11 Nov 2017 | 22:48:37 UTC
Last modified: 11 Nov 2017 | 22:51:20 UTC

Two of my computers have received tasks and processed them with no trouble.
Both run Fedora (16 and 21), host ids are 192138 and 189186.
My 8 core (16 thread) computer (running Fedora 25) has yet to receive a task.

Host 192138 is a 6 core computer and Host 189186 is a four core computer.

The 6 core has shorter Run times per task and more CPU times than the 4 core.

This is as expected due to core count, however the 4 core computer gets higher credit per task than the 6 core, this does not make sense.

6 core getting around 1,500 sec Run time, 8,600 CPU time and about 66 credits.

4 core getting around 3,200 sec Run time, 6,900 CPU time and about 85+ credits.

A bit odd perhaps?

Conan

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 591
Credit: 4,273,184
RAC: 0
Message 48155 - Posted: 11 Nov 2017 | 23:04:48 UTC - in response to Message 48154.
Last modified: 11 Nov 2017 | 23:29:30 UTC

Credit assignment logic has historically been problematic (see here) to the point that I am inclined to think that it has no best solution. For the time being the credit algorithm is the old default one from boinc. I think it relies heavily on the self-computed FLOPS and yes that seems paradoxical.

el_gallo_azul
Joined: 14 Jun 14
Posts: 8
Credit: 28,088,602
RAC: 0
Message 48156 - Posted: 12 Nov 2017 | 2:36:31 UTC

I haven't been able to successfully process a WU on my computer. I've received many, but they've all resulted in "Computation error".

See screenshot: https://imgur.com/z0vLkoh

mmonnin
Joined: 2 Jul 16
Posts: 38
Credit: 48,937,070
RAC: 1,364,637
Message 48157 - Posted: 12 Nov 2017 | 3:14:36 UTC - in response to Message 48156.

I haven't been able to successfully process a WU on my computer. I've received many, but they've all resulted in "Computation error".

See screenshot: https://imgur.com/z0vLkoh


You'll have to try one of the suggestions posted by Daniel or [VENETO] sabayonino above. I'm waiting for more WUs to try myself.

gianni
Joined: 11 Jul 08
Posts: 16
Credit: 105,098
RAC: 0
Message 48158 - Posted: 12 Nov 2017 | 4:39:28 UTC - in response to Message 48142.

we are not aware of fast and free gpu qm applications. if you know one, let us know.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 591
Credit: 4,273,184
RAC: 0
Message 48159 - Posted: 12 Nov 2017 | 8:40:07 UTC - in response to Message 48157.
Last modified: 12 Nov 2017 | 9:30:30 UTC

Please do not tweak your system. The current application (QC 3.10) should solve the problem.

eXaPower
Joined: 25 Sep 13
Posts: 265
Credit: 1,043,017,367
RAC: 1,762,909
Message 48160 - Posted: 12 Nov 2017 | 13:28:59 UTC - in response to Message 48158.

we are not aware of fast and free gpu qm applications. if you know one, let us know.


@UF & @UNC developed ANAKIN-ME to create fast, accurate quantum mechanical simulations. See the demo at #SC17 http://nvda.ws/2zyBhKj


https://twitter.com/NVIDIADC

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1895
Credit: 629,356
RAC: 0
Message 48161 - Posted: 12 Nov 2017 | 16:14:10 UTC - in response to Message 48160.

Yes, we have that and it is nice, but limited and not a QM code.

mmonnin
Joined: 2 Jul 16
Posts: 38
Credit: 48,937,070
RAC: 1,364,637
Message 48167 - Posted: 13 Nov 2017 | 13:14:17 UTC

I completed one this morning in Ubuntu.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 591
Credit: 4,273,184
RAC: 0
Message 48169 - Posted: 13 Nov 2017 | 14:03:22 UTC - in response to Message 48167.

The new app has 0% failure rate. However, only a handful of hosts are receiving it, for reasons utterly obscure.

This is the only indication i found in the logs:

2017-11-10 20:06:33.9454 [PID=182743] [quota] Overall limits on jobs in progress:
2017-11-10 20:06:33.9454 [PID=182743] [quota] CPU: base 2 scaled 112 njobs 0
2017-11-10 20:06:33.9454 [PID=182743] [quota] GPU: base 2 scaled 0 njobs 0


That "njobs 0" seems to prevent result sending. Any clue hugely appreciated...

Richard Haselgrove
Joined: 11 Jul 09
Posts: 788
Credit: 1,422,060,845
RAC: 1,410,932
Message 48170 - Posted: 13 Nov 2017 | 14:19:55 UTC - in response to Message 48169.

The new app has 0% failure rate. However, only a handful of hosts are receiving it, for reasons utterly obscure.

This is the only indication i found in the logs:

2017-11-10 20:06:33.9454 [PID=182743] [quota] Overall limits on jobs in progress:
2017-11-10 20:06:33.9454 [PID=182743] [quota] CPU: base 2 scaled 112 njobs 0
2017-11-10 20:06:33.9454 [PID=182743] [quota] GPU: base 2 scaled 0 njobs 0


That "njobs 0" seems to prevent result sending. Any clue hugely appreciated...

The only reading material I can suggest is http://boinc.berkeley.edu/trac/wiki/ProjectOptions#Joblimits, but I imagine you know that already. Remember to read the following 'Job limits (advanced)' section too.

captainjack
Joined: 9 May 13
Posts: 112
Credit: 820,718,399
RAC: 1,200,373
Message 48171 - Posted: 13 Nov 2017 | 14:44:30 UTC

For those interested in controlling the number of threads used by the multicore app, the following app_config.xml entries seem to work.

<app_config>
  <app>
    <name>QC</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app_version>
    <app_name>QC</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>9</avg_ncpus>
    <cmdline>--nthreads 9</cmdline>
  </app_version>
</app_config>

The <avg_ncpus> entry tells BOINC the number of threads to reserve for the app.

The <cmdline> entry tells the app the number of threads available for processing.
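
A small sketch for producing the same entries with a different thread count, so that the two numbers always match. The function name `write_app_config` is made up for illustration, and the enclosing `<app_config>` element is the usual root of a BOINC app_config.xml rather than something shown in the post; the app name, plan class and `--nthreads` option are taken from the entries above.

```shell
#!/bin/sh
# write_app_config THREADS: print an app_config.xml that reserves THREADS
# CPUs for the QC app and passes the same count on the command line.
write_app_config() {
    threads=$1
    cat <<EOF
<app_config>
  <app>
    <name>QC</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app_version>
    <app_name>QC</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>$threads</avg_ncpus>
    <cmdline>--nthreads $threads</cmdline>
  </app_version>
</app_config>
EOF
}

# Example: print a 4-thread configuration to the terminal for review.
write_app_config 4
```

The output would normally be saved as app_config.xml in the project directory (the logs in this thread show /var/lib/boinc-client/projects/www.gpugrid.net on some hosts) and picked up after the client re-reads its config files.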

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 591
Credit: 4,273,184
RAC: 0
Message 48172 - Posted: 13 Nov 2017 | 14:49:59 UTC - in response to Message 48171.

Can anybody comment on the suspend/resume behavior under a variety of conditions (ie. with and without "keep in memory")? I expect the calculation to restart from scratch, but not crash.

Profile bormolino
Joined: 16 May 13
Posts: 17
Credit: 18,355,071
RAC: 29,938
Message 48173 - Posted: 13 Nov 2017 | 15:17:03 UTC

Like many others I don't get any WUs on my linux machines.

captainjack
Joined: 9 May 13
Posts: 112
Credit: 820,718,399
RAC: 1,200,373
Message 48174 - Posted: 13 Nov 2017 | 15:32:43 UTC

Can anybody comment on the suspend/resume behavior under a variety of conditions (ie. with and without "keep in memory")? I expect the calculation to restart from scratch, but not crash.


When I suspended a task with LAIM on, BOINC manager showed that it was suspended, but the system monitor showed that the task was still busy using all the threads that were allocated to it.

When I suspended a task with LAIM off, BOINC manager showed that the task was suspended and the task disappeared from the system monitor. When the task was resumed, it restarted from 0 and appears to be running normally.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 591
Credit: 4,273,184
RAC: 0
Message 48175 - Posted: 13 Nov 2017 | 15:54:35 UTC - in response to Message 48174.

@captainjack - thanks, appreciated.

klepel
Joined: 23 Dec 09
Posts: 136
Credit: 1,829,755,020
RAC: 1,443,951
Message 48176 - Posted: 13 Nov 2017 | 16:03:57 UTC

I just wanted to report back:
My host ID: 420971 gets work and finishes the latest version successfully!
My host ID: 452211 does not get any work. Message is: There is no work available. This host does not have any GPU and runs from a USB stick.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 591
Credit: 4,273,184
RAC: 0
Message 48177 - Posted: 13 Nov 2017 | 16:15:25 UTC - in response to Message 48176.
Last modified: 13 Nov 2017 | 16:21:37 UTC

Working/not working pairs are useful for debugging indeed (if they have the same preferences, that is). It was suggested that it was the presence of a GPU, but there are GPU-less counter-examples, like this. The scheduler is a software nightmare...

I'll resume tests later this week. In the meantime, there are 1000 more CPU WUs (QC310big).

Jim1348
Joined: 28 Jul 12
Posts: 460
Credit: 1,130,761,180
RAC: 18,722
Message 48180 - Posted: 13 Nov 2017 | 17:45:45 UTC
Last modified: 13 Nov 2017 | 17:52:18 UTC

Today is my lucky day. I just enabled the multicore app, and immediately picked up two of them on my i7-3770 machine running Ubuntu 16.04.3 (Linux 4.10.0.38), and BOINC 7.8.3. They run on 7 cores, with one core reserved for GPU support as set by BOINC preferences, not in the app_config (though I use one for other purposes).

However, suspending them does not shut them down with LAIM enabled, as noted before. I have not tried the non-LAIM case.

If it matters, this machine was attached to GPUGrid earlier, and I had run a few GPU work units on the GTX 980, though I am requesting only the CPU work now. But maybe that has something to do with why I am getting them.

EDIT: Also, I have "Run test applications?" enabled, though I don't know if that is necessary in this case.

Profile Conan
Joined: 25 Mar 09
Posts: 5
Credit: 85,320
RAC: 16
Message 48183 - Posted: 13 Nov 2017 | 22:42:44 UTC

My two computers that are getting or have gotten cpu work, have both been connected before.
The new computer I attached does not get work but says "No work available" even when there is plenty.

Conan

el_gallo_azul
Joined: 14 Jun 14
Posts: 8
Credit: 28,088,602
RAC: 0
Message 48184 - Posted: 14 Nov 2017 | 0:19:29 UTC - in response to Message 48157.
Last modified: 14 Nov 2017 | 0:20:27 UTC

OK, thanks @mmonnin.

I've just run

which readlink

followed by

sudo ln -sf /bin/readlink /usr/bin/readlink

and am now waiting for some more WUs.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 591
Credit: 4,273,184
RAC: 0
Message 48187 - Posted: 14 Nov 2017 | 9:55:22 UTC - in response to Message 48184.

Do not make symlinks. The problem is already solved.

Profile Coleslaw
Joined: 24 Jul 08
Posts: 35
Credit: 142,697,792
RAC: 127,574
Message 48192 - Posted: 14 Nov 2017 | 20:09:33 UTC
Last modified: 14 Nov 2017 | 20:11:47 UTC

Since it’s the first time we have a CPU app out, I’ll test the behavior of GPUGRID with a relatively large batch that you will see soon.


I just started reading this thread. I thought I would point out that there was a multi-threaded CPU application back in 2014. It just wasn't necessarily for Quantum Chemistry.

Profile Conan
Joined: 25 Mar 09
Posts: 5
Credit: 85,320
RAC: 16
Message 48198 - Posted: 16 Nov 2017 | 7:21:52 UTC - in response to Message 48192.

Since it’s the first time we have a CPU app out, I’ll test the behavior of GPUGRID with a relatively large batch that you will see soon.


I just started reading this thread. I thought I would point out that there was a multi-threaded CPU application back in 2014. It just wasn't necessarily for Quantum Chemistry.


Yes I ran that one on both Windows 32 bit and Linux 64 bit, which is where nearly all my points came from, as I had to stop GPU use a few years ago so I ran the CPU app instead.

Conan

Dayle Diamond
Joined: 5 Dec 12
Posts: 69
Credit: 1,066,554,365
RAC: 978,405
Message 48228 - Posted: 23 Nov 2017 | 16:47:19 UTC
Last modified: 23 Nov 2017 | 17:05:04 UTC

On a 1950x it's reserving all 32 threads but not running them near the maximum.
It seems to be switching which cores are active - my System Monitor CPU usage chart looks like a long line of infinity symbols.

If you divide the CPU time by the run time, you'll see an average usage of about seventeen cores. Everything else is going to waste.

Task  Work unit  Host  Sent  Reported  Status  Run time (s)  CPU time (s)  Credit  Application
16713948  12878079  453935  23 Nov 2017 | 12:59:03 UTC  23 Nov 2017 | 16:09:15 UTC  Completed and validated  680.18  11,586.25  67.70  Quantum Chemistry v3.10 (mt)
16713947  12878078  453935  23 Nov 2017 | 12:59:03 UTC  23 Nov 2017 | 14:12:17 UTC  Completed and validated  761.12  12,984.46  267.57  Quantum Chemistry v3.10 (mt)
16713946  12878077  453935  23 Nov 2017 | 12:59:03 UTC  23 Nov 2017 | 15:11:46 UTC  Completed and validated  702.76  11,639.75

PS. It's running at top priority over World Community Grid, but they've got similar deadlines. Is this intentional?
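
The division mentioned above can be sketched as follows, using the run time and CPU time of the three tasks listed in this post. The helper name `avg_cores` is made up for illustration, and reading the first number as run time and the second as CPU time is an assumption based on the usual column order of the task list.

```shell
#!/bin/sh
# avg_cores CPU_SECONDS RUN_SECONDS: print the average number of busy cores,
# i.e. CPU time divided by wall-clock run time, to one decimal place.
avg_cores() {
    LC_ALL=C awk -v cpu="$1" -v run="$2" 'BEGIN { printf "%.1f\n", cpu / run }'
}

avg_cores 11586.25 680.18   # prints 17.0
avg_cores 12984.46 761.12   # prints 17.1
avg_cores 11639.75 702.76   # prints 16.6
```

On a host that reserves 32 threads per task, roughly 17 busy cores corresponds to a little over half of the reserved capacity.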

dfygrvty
Joined: 21 Nov 17
Posts: 2
Credit: 434,738
RAC: 16,941
Message 48229 - Posted: 23 Nov 2017 | 17:53:43 UTC - in response to Message 48127.

getting a ton of quantum chemistry tasks on my aws ec2 p2.xlarge instance.
a47-toni_qc310k-0-1-* are the names of the tasks. Are these the new multicore tasks you talked about? The machine takes a task to 66% in 2 seconds and then sits at that percentage for ~10 minutes.

I think the task stops reporting progress @ 66%? bug? I compiled the boinc client on the ec2 instance, so it could definitely be user error as well.

klepel
Send message
Joined: 23 Dec 09
Posts: 136
Credit: 1,829,755,020
RAC: 1,443,951
Level
His
Message 48230 - Posted: 23 Nov 2017 | 18:05:48 UTC

Same here, stuck at 66%. I'll go to lunch and see whether it has finished in the meantime.

dfygrvty
Joined: 21 Nov 17
Posts: 2
Credit: 434,738
RAC: 16,941
Level

Message 48231 - Posted: 23 Nov 2017 | 18:28:37 UTC - in response to Message 48229.

they finish about 10-15 minutes after they 'hang' on my ec2 instance.

klepel
Joined: 23 Dec 09
Posts: 136
Credit: 1,829,755,020
RAC: 1,443,951
Level
His
Message 48232 - Posted: 23 Nov 2017 | 18:50:00 UTC

Here as well! The run times differ between my computers in line with the thread count and clock frequency of each.

Dayle Diamond
Joined: 5 Dec 12
Posts: 69
Credit: 1,066,554,365
RAC: 978,405
Level
Met
Message 48233 - Posted: 23 Nov 2017 | 20:44:43 UTC

I'm using Ubuntu's bundled system monitor to display CPU usage graphs. That 66% thing is just a bug with the work unit time estimation, but my cores really were gradually rising and falling from 0 to 100%. Like a helix on its side, but with 32 lines.

(It's not thermal throttling.)

If at all possible, consider limiting each multicore task to four threads. The thread count of almost every modern CPU is divisible by four, so no thread would go to waste and throughput would be as high as it can be.
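
To make the arithmetic behind this suggestion concrete, here is a small sketch that counts how many fixed-size tasks fit on a host and how many threads would be left idle. The function name and the list of thread counts are illustrative assumptions, not anything the project uses.

```python
from typing import Tuple


def pack_tasks(total_threads: int, threads_per_task: int = 4) -> Tuple[int, int]:
    """Return (concurrent tasks, idle threads) for fixed-size multithreaded tasks."""
    return divmod(total_threads, threads_per_task)


for threads in (4, 8, 12, 16, 32):
    tasks, idle = pack_tasks(threads)
    print(f"{threads:2d} threads: {tasks} task(s) of 4 threads, {idle} idle")
```

A host with 6 threads is a counterexample: one task fits and two threads stay idle unless work from another project fills them.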

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 591
Credit: 4,273,184
RAC: 0
Level
Ala
Message 48234 - Posted: 23 Nov 2017 | 21:57:07 UTC - in response to Message 48233.
Last modified: 23 Nov 2017 | 22:00:00 UTC

The 66% is due to our using the boinc wrapper for an app which doesn't report its progress. There are three steps in the WU (install, update, compute) and the third is the long one, hence the 2/3.
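
The 2/3 can be reproduced with a minimal sketch, assuming the three steps are weighted equally and that progress only advances when a whole step finishes. Both are assumptions for illustration; the actual job file is not shown in this thread.

```python
def fraction_done(steps_finished: int, total_steps: int) -> float:
    """Progress when only completed steps are visible, with equal weight per step."""
    return steps_finished / total_steps


STEPS = ["install", "update", "compute"]

# While "compute" is running, two of the three steps have finished, so the
# displayed progress stays at 2/3 until the whole task ends.
print(f"{fraction_done(2, len(STEPS)):.1%}")  # prints 66.7%
```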

If I figure out how, I'll try to limit the number of CPUs requested. I think the client has some control over it as well.

Petr Kriz
Joined: 22 Feb 09
Posts: 1
Credit: 9,642
RAC: 0
Level

Message 48235 - Posted: 23 Nov 2017 | 22:46:53 UTC

I just tried to run a few tasks and am still getting the same error:

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
23:27:04 (6871): wrapper (7.7.26016): starting
23:27:04 (6871): wrapper (7.7.26016): starting
23:27:04 (6871): wrapper: running ../../projects/www.gpugrid.net/Miniconda3-4.3.30-Linux-x86_64.sh (-b -f -p /var/lib/boinc/projects/www.gpugrid.net/miniconda)
Python 3.6.3 :: Anaconda, Inc.
23:33:01 (6871): task miniconda-installer reached time limit 360
23:33:01 (6871): wrapper: running /var/lib/boinc/projects/www.gpugrid.net/miniconda/bin/python (pre_script.py)
Traceback (most recent call last):
  File "pre_script.py", line 1, in <module>
    import conda.cli
ModuleNotFoundError: No module named 'conda'
23:33:02 (6871): $PROJECT_DIR/miniconda/bin/python exited; CPU time 0.025285
23:33:02 (6871): app exit status: 0x1
23:33:02 (6871): called boinc_finish(195)

</stderr_txt>
]]>

Any idea how to solve it?

klepel
Joined: 23 Dec 09
Posts: 136
Credit: 1,829,755,020
RAC: 1,443,951
Level
His
Message 48236 - Posted: 24 Nov 2017 | 1:56:57 UTC

This one hung for about 6 hours:
http://www.gpugrid.net/result.php?resultid=16717461

el_gallo_azul
Joined: 14 Jun 14
Posts: 8
Credit: 28,088,602
RAC: 0
Level
Val
Message 48237 - Posted: 24 Nov 2017 | 4:44:33 UTC

Since I had 100% errors (Message 48156 - Posted: 12 Nov 2017 | 2:36:31 UTC) on my first batch of these CPU tasks, I created a symlink as instructed, then deleted the symlink as subsequently instructed, but I have never received a single task since my 12 Nov 2017 post.

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1895
Credit: 629,356
RAC: 0
Level
Gly
Message 48239 - Posted: 24 Nov 2017 | 11:10:37 UTC - in response to Message 48237.

OK, we will start production mode next week. Unfortunately we will need more than 50x the current number of CPUs, but it is just the start now, so it is ok.

gdf

mmonnin
Joined: 2 Jul 16
Posts: 38
Credit: 48,937,070
RAC: 1,364,637
Level
Val
Message 48241 - Posted: 24 Nov 2017 | 11:17:28 UTC - in response to Message 48228.

On a 1950x it's reserving all 32 threads but not running them near the maximum.
It seems to be switching which cores are active - my System Monitor CPU usage chart looks like a long line of infinity symbols.

If you divide the CPU time by the run time, you'll see an average usage of about seventeen threads. Everything else is going to waste.

16713948 12878079 453935 23 Nov 2017 | 12:59:03 UTC 23 Nov 2017 | 16:09:15 UTC Completed and validated 680.18 11,586.25 67.70 Quantum Chemistry v3.10 (mt)

16713947 12878078 453935 23 Nov 2017 | 12:59:03 UTC 23 Nov 2017 | 14:12:17 UTC Completed and validated 761.12 12,984.46 267.57 Quantum Chemistry v3.10 (mt)

16713946 12878077 453935 23 Nov 2017 | 12:59:03 UTC 23 Nov 2017 | 15:11:46 UTC Completed and validated 702.76 11,639.75

PS. It's running at top priority over World Community Grid, but they've got similar deadlines. Is this intentional?


It is pretty typical of multithreaded apps (in any BOINC project) that they do not scale well past 4-8 cores. I typically use an app_config to limit mt apps such as LHC, Cosmology, yafu, etc. to 4 cores.

Bedrich Hajek
Joined: 28 Mar 09
Posts: 340
Credit: 3,819,818,009
RAC: 929,462
Level
Arg
Message 48242 - Posted: 24 Nov 2017 | 11:18:30 UTC - in response to Message 48239.

OK, we will start production mode next week. Unfortunately we will need more than 50x the current number of CPUs, but it is just the start now, so it is ok.

gdf



You will need a Windows app for this.


Profile bormolino
Joined: 16 May 13
Posts: 17
Credit: 18,355,071
RAC: 29,938
Level
Pro
Message 48243 - Posted: 24 Nov 2017 | 12:19:44 UTC - in response to Message 48237.
Last modified: 24 Nov 2017 | 12:20:25 UTC

Since I had 100% errors (Message 48156 - Posted: 12 Nov 2017 | 2:36:31 UTC) on my first batch of these CPU tasks, I created a symlink as instructed, then deleted the symlink as subsequently instructed, but I have never received a single task since my 12 Nov 2017 post.


Same here ...

mmonnin
Joined: 2 Jul 16
Posts: 38
Credit: 48,937,070
RAC: 1,364,637
Level
Val
Message 48244 - Posted: 24 Nov 2017 | 13:12:43 UTC

I received some yesterday on a new install of Ubuntu 17.10. No symlink or anything and they completed.

PappaLitto
Joined: 21 Mar 16
Posts: 271
Credit: 1,305,108,981
RAC: 5,303,877
Level
Met
Message 48245 - Posted: 24 Nov 2017 | 16:05:33 UTC

If you need that many CPUs, you will definitely need a Windows app.

klepel
Joined: 23 Dec 09
Posts: 136
Credit: 1,829,755,020
RAC: 1,443,951
Level
His
Message 48246 - Posted: 25 Nov 2017 | 0:07:38 UTC

Will the app name stay "*QC309big*" or will it change for the real stuff? So we might make an app_config file or, better still, might you propose an app_config file to limit CPU cores per work unit to X cores?

@PappaLitto: I am quite happy with a Linux-only app!

It is time to make some Linux USB sticks (16GB USB3.0, 10 USD): I work with Lubuntu 17.10 on various computers I do not otherwise use, and it works great! Or try BOINCOS v2.0 Beta Release, where everything is pre-configured.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 591
Credit: 4,273,184
RAC: 0
Level
Ala
Message 48249 - Posted: 25 Nov 2017 | 10:28:08 UTC - in response to Message 48246.

Making a windows app will probably need one of the following two solutions. Neither is perfect (by far).

* The "Windows Subsystem for Linux" from Microsoft. It's unfortunately W10 only (as far as I can tell), and probably we'd be the first BOINC project to use it (=headaches).
* A VirtualBox app. Its downsides are known I think.

By the way, question for the gurus: when you run a vbox app, is virtualbox automatically installed on your system?

Richard Haselgrove
Joined: 11 Jul 09
Posts: 788
Credit: 1,422,060,845
RAC: 1,410,932
Level
Met
Message 48250 - Posted: 25 Nov 2017 | 12:40:39 UTC - in response to Message 48249.

By the way, question for the gurus: when you run a vbox app, is virtualbox automatically installed on your system?

No. The user has to install it themselves, and usually some VBox extensions are recommended as well.

There are two ways of installing VBox for Windows:

1) Via a combined single-click installer for both VBox and BOINC, available from BOINC. The simplicity is attractive, but there are downsides - there is no control over e.g. installation location, and the version of VBox included is usually several steps behind the current release.

2) Direct from the Oracle VBox site. BOINC will still recognise this - there's no special BOINC code in the combined VBox installer.

Any VBox extensions desired will always have to be downloaded from Oracle. There may be other adjustments required to the host computer, such as enabling virtualisation in the BIOS, which might be unfamiliar to the casual user.

captainjack
Joined: 9 May 13
Posts: 112
Credit: 820,718,399
RAC: 1,200,373
Level
Glu
Message 48251 - Posted: 25 Nov 2017 | 16:26:24 UTC

klepel asked:


So we might make an app_config file or, better still, might you propose an app_config file to limit CPU cores per work unit to X cores?


<app_config>
  <app>
    <name>acemdlong</name>
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>2</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>acemdshort</name>
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>2</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>QC</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app_version>
    <app_name>QC</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>4</avg_ncpus>
    <cmdline>--nthreads 4</cmdline>
  </app_version>
</app_config>


This will limit the QC (quantum chemistry) app to 4 threads per task and a maximum of 1 task at a time. You can adjust to your preferences.
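
If you want to try different thread counts without editing the XML by hand, the QC part of the file can be generated. This is a sketch only: the app name "QC", the "mt" plan class and the --nthreads flag are taken from the file above, whether the QC app actually honours --nthreads is not confirmed in this thread, and the function name is made up.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, threads: int, max_concurrent: int) -> str:
    """Build an app_config.xml string that limits one multithreaded app."""
    root = ET.Element("app_config")

    # <app> block: how many tasks of this app may run at once.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    # <app_version> block: how many CPUs each task is budgeted.
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = "mt"
    ET.SubElement(version, "avg_ncpus").text = str(threads)
    ET.SubElement(version, "cmdline").text = f"--nthreads {threads}"

    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9 or later
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


print(build_app_config("QC", threads=4, max_concurrent=2))
```

The printed text would go into app_config.xml in the GPUGRID project directory, after which the client has to re-read its config files (or be restarted) to pick it up.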

Hope that helps.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 591
Credit: 4,273,184
RAC: 0
Level
Ala
Message 48252 - Posted: 25 Nov 2017 | 19:39:08 UTC - in response to Message 48251.

Thanks to both!

mmonnin
Joined: 2 Jul 16
Posts: 38
Credit: 48,937,070
RAC: 1,364,637
Level
Val
Message 48253 - Posted: 25 Nov 2017 | 20:25:19 UTC

I think if the CPU app goes to VBox for Windows/Linux then there will be less user support than just Linux.

More than one concurrent task will need to be allowed for efficient CPU usage.

Bedrich Hajek
Joined: 28 Mar 09
Posts: 340
Credit: 3,819,818,009
RAC: 929,462
Level
Arg
Message 48254 - Posted: 25 Nov 2017 | 22:41:06 UTC

Even when you get the Windows app going, it looks like you will still fall short on the number of crunchers by more than half, based on the server status page, which currently shows 821 users crunching long units in the last 24 hours, while 34 are crunching quantum chemistry. (821/34 = 24.15)

So, in order to meet the 50x target, you will eventually have to create an app for multi CPU-GPU.
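
The comparison above, written out as a quick calculation. This is a sketch under a loud assumption: it counts users rather than CPUs, so it is only a proxy for the "50x the current number of CPUs" target.

```python
long_run_users = 821   # users crunching long GPU units in the last 24 hours
qc_users = 34          # users crunching quantum chemistry in the same period
target_multiple = 50   # "more than 50x the current number of CPUs"

# Best case: every long-run user also starts running QC tasks.
reachable_multiple = long_run_users / qc_users
share_of_target = reachable_multiple / target_multiple

print(f"{reachable_multiple:.2f}x reachable, {share_of_target:.0%} of the 50x target")
```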

Quantum chemistry has a long way to go.

In the meantime, you can't make the Windows app too difficult for the crunchers to set up, because most of us are not computer gurus and you would end up with only a few more crunchers.

This is a big undertaking. Good luck guys!!





Jim1348
Joined: 28 Jul 12
Posts: 460
Credit: 1,130,761,180
RAC: 18,722
Level
Met
Message 48255 - Posted: 25 Nov 2017 | 22:56:34 UTC - in response to Message 48254.

Even when you get the Windows app going, it looks like you will still fall short on the number of crunchers by more than half, based on the server status page, which currently shows 821 users crunching long units in the last 24 hours, while 34 are crunching quantum chemistry. (821/34 = 24.15)

So, in order to meet the 50x target, you will eventually have to create an app for multi CPU-GPU.

I am all in favor of GPU, but as noted on many project forums, it doesn't work for most problems. But don't write off Linux on the CPU yet. It is just in the startup phase. I have even taken my machines off until the production version is released. Once the word gets around (be sure to post a note on the BOINC forum), you will get lots of help. And CPUs are getting more cores all the time.

mmonnin
Joined: 2 Jul 16
Posts: 38
Credit: 48,937,070
RAC: 1,364,637
Level
Val
Message 48256 - Posted: 26 Nov 2017 | 4:17:47 UTC - in response to Message 48254.

Even when you get the Windows app going, it looks like you will still fall short on the number of crunchers by more than half, based on the server status page, which currently shows 821 users crunching long units in the last 24 hours, while 34 are crunching quantum chemistry. (821/34 = 24.15)

So, in order to meet the 50x target, you will eventually have to create an app for multi CPU-GPU.

Quantum chemistry has a long way to go.

In the meantime, you can't make the Windows app too difficult for the crunchers to set up, because most of us are not computer gurus and you would end up with only a few more crunchers.

This is a big undertaking. Good luck guys!!


Oh, there will be many more if there is consistent work. The inconsistent supply of GPU work pushes many away. Compare the support for pogs vs duchamp: similar projects, but duchamp requires vbox.

Profile Coleslaw
Joined: 24 Jul 08
Posts: 35
Credit: 142,697,792
RAC: 127,574
Level
Cys
Message 48259 - Posted: 28 Nov 2017 | 18:57:05 UTC

I would say don't mess with a virtualbox application if it would replace the Linux application. Too many headaches. If someone is running Windows, they could easily set up their own virtualbox VM and run the standard app under Linux in it. Win-win for everyone that way. It also gives the user more control over the VM. Just my thoughts on it. Also, more and more people are migrating to Windows 10 and it is the direction all new machines are following. So, might as well prepare for the future.

Jim1348
Joined: 28 Jul 12
Posts: 460
Credit: 1,130,761,180
RAC: 18,722
Level
Met
Message 48260 - Posted: 28 Nov 2017 | 19:33:44 UTC - in response to Message 48259.

I would say don't mess with a virtualbox application if it would replace the Linux application. Too many headaches. If someone is running Windows, they could easily set up their own virtualbox VM and run the standard app under Linux in it. Win-win for everyone that way. It also gives the user more control over the VM. Just my thoughts on it. Also, more and more people are migrating to Windows 10 and it is the direction all new machines are following. So, might as well prepare for the future.

My guess is that they could leave the Linux application as it is, and just add a VirtualBox application for Windows. I have had no particular problems with VBox on either Windows or Linux machines recently. I run LHC, Cosmology and sometimes others on it. I would prefer that they set it up so that I don't have to configure my own machine. All you really have to do is first ensure that running a virtual machine is enabled in your BIOS. A good primer is on the Cosmology site:
http://www.cosmologyathome.org/faq.php#vtx

There is a much more elaborate checklist (if you need it) by Yeti on LHC:
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=4161&postid=29359#29359

After that, you just install VirtualBox and attach to the project. It is all set up from there.
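
On a Linux host, the BIOS point above can be roughly pre-checked by looking for the vmx (Intel VT-x) or svm (AMD-V) flag in /proc/cpuinfo. This is a sketch with made-up function names; treat a positive result as a hint only, since the firmware setting may still need to be checked, and it does not apply to Windows hosts.

```python
def has_virtualization_flag(cpuinfo_text: str) -> bool:
    """True if any 'flags' line lists vmx (Intel VT-x) or svm (AMD-V)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = value.split()
            if "vmx" in flags or "svm" in flags:
                return True
    return False


def check_host(path: str = "/proc/cpuinfo") -> bool:
    """Read the kernel's CPU description and look for the flag (Linux only)."""
    with open(path, encoding="utf-8") as handle:
        return has_virtualization_flag(handle.read())
```

Calling check_host() on the machine in question returns True when one of the two flags is listed.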

Profile Coleslaw
Joined: 24 Jul 08
Posts: 35
Credit: 142,697,792
RAC: 127,574
Level
Cys
Message 48261 - Posted: 28 Nov 2017 | 20:01:05 UTC - in response to Message 48260.
Last modified: 28 Nov 2017 | 20:05:05 UTC

There is actually more to it than that. I have run every VM project as well, on a whole slew of different hardware and software setups. There is a big reason why those projects do not get much support. LHC only has a large user base now because it merged its projects with the original Six Track project. Even so, the virtualbox applications remain less popular. Keep in mind also that these projects have problems with new releases of Virtualbox, as they do right now. If I make my own VM using the latest release, it does not suffer from the same problems. The only advantage of a vbox application is that the scientists have an easier time compiling a single application. This may sound great to them, but the time lost by end users far exceeds the time the scientists save.

Also, for reference if it helps, GPUGrid attempted vbox applications back in 2014. Discussion started in 2013 http://www.gpugrid.net/forum_thread.php?id=3542#33874

Jim1348
Joined: 28 Jul 12
Posts: 460
Credit: 1,130,761,180
RAC: 18,722
Level
Met
Message 48262 - Posted: 28 Nov 2017 | 20:03:21 UTC - in response to Message 48261.
Last modified: 28 Nov 2017 | 20:04:40 UTC

If you have big problems, don't do it. That is not a reason they should not offer it.

LHC would not exist in its present form without VirtualBox; neither would Cosmology and others.

And if you prefer to set up your own VirtualBox machine, you can still do that in order to run the Linux version from a Windows machine in any case. I think you are arguing the wrong point.

Profile Coleslaw
Joined: 24 Jul 08
Posts: 35
Credit: 142,697,792
RAC: 127,574
Level
Cys
Message 48263 - Posted: 28 Nov 2017 | 20:09:00 UTC
Last modified: 28 Nov 2017 | 20:14:47 UTC

I'm actually arguing for keeping the Linux version rather than replacing it. Telling me not to bother because you like them isn't acceptable to me. You play it off as if they run great because you had little issue with them. You can scour their forums over and over and find that the average user does not agree. You are right about LHC not being in its present form: it would still be Six Track running traditional work, with the others doing it in house or eventually adapting things differently. Cosmology would just be down one application as well. I don't see how that is relevant. Either way, my vote is to not embrace virtualbox if it means pulling non-virtualbox work.

Jim1348
Joined: 28 Jul 12
Posts: 460
Credit: 1,130,761,180
RAC: 18,722
Level
Met
Message 48264 - Posted: 28 Nov 2017 | 20:16:40 UTC - in response to Message 48263.

I'm actually arguing for keeping the Linux version rather than replacing it. Telling me not to bother because you like them isn't acceptable to me. You play it off as if they run great because you had little issue with them. You can scour their forums over and over and find that the average user does not agree. You are right about LHC not being in its present form: it would still be Six Track running traditional work, with the others doing it in house or eventually adapting things differently. Cosmology would just be down one application as well. I don't see how that is relevant. Either way, my vote is to not embrace virtualbox if it means pulling non-virtualbox work.

I won't bother responding to fiction.

Profile Coleslaw
Joined: 24 Jul 08
Posts: 35
Credit: 142,697,792
RAC: 127,574
Level
Cys
Message 48265 - Posted: 28 Nov 2017 | 20:19:49 UTC - in response to Message 48264.

Good. Just don't be delusional in the process...

Profile Coleslaw
Joined: 24 Jul 08
Posts: 35
Credit: 142,697,792
RAC: 127,574
Level
Cys
Message 48266 - Posted: 28 Nov 2017 | 20:41:22 UTC
Last modified: 28 Nov 2017 | 20:42:40 UTC

Also, if this is under serious consideration, keep in mind that the latest 5.2 versions of Virtualbox need updated vbox wrappers.

https://www.rechenkraft.net/forum/viewtopic.php?f=75&t=16780&start=12

http://www.cosmologyathome.org/forum_thread.php?id=7517#21579

https://sourcefinder.theskynet.org/duchamp/forum_thread.php?id=229#864

I can confirm the one box I had upgraded fails every work unit at the moment in my testing at Source Finder.

mmonnin
Joined: 2 Jul 16
Posts: 38
Credit: 48,937,070
RAC: 1,364,637
Level
Val
Message 48267 - Posted: 28 Nov 2017 | 22:48:10 UTC - in response to Message 48263.

I'm actually arguing for keeping the Linux version rather than replacing it. Telling me not to bother because you like them isn't acceptable to me. You play it off as if they run great because you had little issue with them. You can scour their forums over and over and find that the average user does not agree. You are right about LHC not being in its present form: it would still be Six Track running traditional work, with the others doing it in house or eventually adapting things differently. Cosmology would just be down one application as well. I don't see how that is relevant. Either way, my vote is to not embrace virtualbox if it means pulling non-virtualbox work.


I agree that VBox projects/apps get much less support. That's supported by data. It's even more evident when there are competitions and people do not have VBox already setup so it will run with BOINC. They just end up running the non-VBox apps.

LHC may not exist w/o VBox. Maybe they wanted to keep their stuff a secret or whatever. They could definitely get some more support if the rest of the apps were not Vbox.

I received two tasks today and they both worked. Two at once as well.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 591
Credit: 4,273,184
RAC: 0
Level
Ala
Message 48269 - Posted: 29 Nov 2017 | 13:25:32 UTC - in response to Message 48267.

Multiple tasks at once are the way we intend to go for QC (consistent with your preferences of course). The idea is to limit the number of cores to 4, and the BOINC client should manage the available capacity.

mmonnin
Joined: 2 Jul 16
Posts: 38
Credit: 48,937,070
RAC: 1,364,637
Level
Val
Message 48270 - Posted: 29 Nov 2017 | 16:25:57 UTC - in response to Message 48269.

Multiple tasks at once are the way we intend to go for QC (consistent with your preferences of course). The idea is to limit the number of cores to 4, and the BOINC client should manage the available capacity.


I wasn't at home to see them run. I just happened to notice I had two tasks on my account page. They definitely used more than 4 cores. Looks like a little more than 8 threads.
Run time CPU Time Credit
3,745.85 30,947.64 138.80
3,501.03 30,948.79 129.71

Completed an hour apart. Another client was running some other CPU work so run time could have been better.

[VENETO] boboviz
Joined: 10 Sep 10
Posts: 40
Credit: 80,281
RAC: 0
Level

Message 48333 - Posted: 10 Dec 2017 | 19:25:18 UTC

No news about cpu wus??
