Message boards : News : More Acemd3 tests

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 813
Credit: 4,294,282
RAC: 0
Message 52582 - Posted: 6 Sep 2019 | 12:23:10 UTC

We've uploaded Windows and Linux apps named "acemd3". If things go as expected, they will become the new simulation engine. They should be an improvement in many respects, especially maintainability and compatibility with RTX cards.

There were a few short test workunits (TONI_TEST). Larger ones should come soon. Please be patient while we iron out the details.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 813
Credit: 4,294,282
RAC: 0
Message 52583 - Posted: 6 Sep 2019 | 13:05:32 UTC - in response to Message 52582.

By the way: things we'd need a comment on:

1. do PCs with multiple GPUs work as expected?
2. does suspend/restart work as expected?

eXaPower
Joined: 25 Sep 13
Posts: 280
Credit: 1,449,553,667
RAC: 12,768
Message 52584 - Posted: 6 Sep 2019 | 13:21:41 UTC - in response to Message 52582.

By the way: things we'd need a comment on:

1. do PCs with multiple GPUs work as expected?
2. does suspend/restart work as expected?


1. No. The app does not allow 2/3/4/5 GPUs to run concurrently: only one GPU runs at a time, while the other Turing cards error out.

http://www.gpugrid.net/results.php?hostid=208061
http://www.gpugrid.net/workunit.php?wuid=16748681
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
08:39:48 (1632): wrapper (7.9.26016): starting
08:39:48 (1632): wrapper: running acemd3.exe (--boinc input --device 1)
# Engine failed: Illegal value for DeviceIndex: 1
08:39:49 (1632): acemd3.exe exited; CPU time 0.000000
08:39:49 (1632): app exit status: 0x1
08:39:49 (1632): called boinc_finish(195)

2. Yes and no. Suspend/restart worked, but the WU errored out as soon as it restarted.

http://www.gpugrid.net/result.php?resultid=21350515

<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code 195 (0xc3)</message>
<stderr_txt>
08:55:41 (4032): wrapper (7.9.26016): starting
08:55:41 (4032): wrapper: running acemd3.exe (--boinc input --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {1845} normal block at 0x0000005BE25C15C0, 8 bytes long.
Data: < M [ > 00 00 4D E2 5B 00 00 00
..\lib\diagnostics_win.cpp(417) : {203} normal block at 0x0000005BE25C43B0, 1080 bytes long.
Data: < > 04 0C 00 00 CD CD CD CD EC 00 00 00 00 00 00 00
Object dump complete.
09:09:55 (3728): wrapper (7.9.26016): starting
09:09:55 (3728): wrapper: running acemd3.exe (--boinc input --device 0)
# Engine failed: The periodic box size has decreased to less than twice the nonbonded cutoff.
09:09:58 (3728): acemd3.exe exited; CPU time 0.000000
09:09:58 (3728): app exit status: 0x1
09:09:58 (3728): called boinc_finish(195)

Frank [RKN]
Joined: 30 Sep 17
Posts: 1
Credit: 36,342,900
RAC: 263,466
Message 52585 - Posted: 6 Sep 2019 | 13:32:14 UTC - in response to Message 52583.

Hi Toni,
I got 2 of them.
The first one was at 32% when I suspended it, and the second one started.
I resumed the first WU, and when I suspended the second WU to let the first continue, it exited with an error.
Then the second one (still at 0.0%) started by itself.

At 37% I suspended and resumed it, and it also exited with an error.

You can find my WUs here

GTX 1660 Ti and Windows 10

I hope it helps you to improve the app.

PappaLitto
Joined: 21 Mar 16
Posts: 504
Credit: 4,286,395,351
RAC: 611,023
Message 52586 - Posted: 6 Sep 2019 | 15:10:05 UTC

Hey Toni,

The one I received errored out with only one 2080 Ti and no other cards in the system, on Windows:

http://www.gpugrid.net/result.php?resultid=21344094

STARBASEn
Joined: 17 Feb 09
Posts: 68
Credit: 1,001,833,751
RAC: 8,656
Message 52587 - Posted: 6 Sep 2019 | 18:04:26 UTC
Last modified: 6 Sep 2019 | 18:05:52 UTC

http://www.gpugrid.net/workunit.php?wuid=16749264
This WU ran concurrently with E@H without problems (2x GTX 1060). I suspended it once with "leave WU in memory" and it restarted fine from where it left off. Same with another ACEMD 2.06 WU on another machine with only one GTX 1060.

Bedrich Hajek
Joined: 28 Mar 09
Posts: 371
Credit: 4,753,856,989
RAC: 1,033,057
Message 52589 - Posted: 6 Sep 2019 | 23:25:22 UTC - in response to Message 52583.

By the way: things we'd need a comment on:

1. do PCs with multiple GPUs work as expected?
2. does suspend/restart work as expected?

I managed to get one of these units on my Windows 7 computer, which has one RTX 2080 Ti card. It took nearly a minute from the time it started running for the "elapsed" time to start moving, and about another minute for the "progress" % to start moving. I let it run for about 5 minutes before suspending it (it was about 20% complete). It stopped within a couple of seconds. I waited about 30 seconds before resuming it, and it crashed within a few seconds. During its run time, GPU usage was low (under 65%), and usage on all 6 CPU cores was jumping up and down between 0 and 100%, according to Afterburner. I have never seen that before.

I didn't get a chance to run it on a multiple GPU computer, but send out more units and I will let you know what happens.

See link:

http://www.gpugrid.net/result.php?resultid=21352529

Bedrich Hajek
Joined: 28 Mar 09
Posts: 371
Credit: 4,753,856,989
RAC: 1,033,057
Message 52592 - Posted: 7 Sep 2019 | 2:46:54 UTC

I ran 2 of the units on my Windows 10 machine. This machine has a GTX 980 Ti, which was running a long-run unit, while the RTX 2080 Ti was running the new ACEMD v2.06 (cuda100) unit. When I let the test unit run from start to finish without interruption, it finishes successfully, but when I suspend it and then resume it, it crashes within a few seconds. GPU usage on this machine was 80% at most, compared to 90% for the long run on the 980 Ti.

http://www.gpugrid.net/results.php?hostid=263612&offset=0&show_names=1&state=0&appid=32


clemmo
Joined: 24 Jun 12
Posts: 2
Credit: 2,333,050
RAC: 2,100
Message 52596 - Posted: 7 Sep 2019 | 12:39:31 UTC
Last modified: 7 Sep 2019 | 12:48:20 UTC

I've also had a test task error out after being suspended and then resumed. I currently have one running and will let it go to see whether it runs to completion.

The workunit seems to be using 1 full CPU core and 92% GPU load.
My CPU is an i7-4790 and my GPU is a GTX 1660.

kksplace
Joined: 4 Mar 18
Posts: 35
Credit: 66,269,825
RAC: 409,087
Message 52598 - Posted: 7 Sep 2019 | 13:32:54 UTC - in response to Message 52583.

This test WU was suspended twice (once using Suspend, once using Suspend GPU in BOINC Manager) and successfully restarted and completed.

http://www.gpugrid.net/result.php?resultid=21354832

Bedrich Hajek
Joined: 28 Mar 09
Posts: 371
Credit: 4,753,856,989
RAC: 1,033,057
Message 52599 - Posted: 7 Sep 2019 | 14:19:53 UTC - in response to Message 52598.

This test WU was suspended twice (once using Suspend, once using Suspend GPU in BOINC Manager) and successfully restarted and completed.

http://www.gpugrid.net/result.php?resultid=21354832


You're running Linux with a GTX 1080 card, while I am running Windows with an RTX card. This is either an OS problem or a card-type problem. To determine which, we need to run these WUs on a non-RTX card with Windows and/or an RTX card on Linux.

Bedrich Hajek
Joined: 28 Mar 09
Posts: 371
Credit: 4,753,856,989
RAC: 1,033,057
Message 52600 - Posted: 7 Sep 2019 | 14:43:50 UTC - in response to Message 52599.

This test WU was suspended twice (once using Suspend, once using Suspend GPU in BOINC Manager) and successfully restarted and completed.

http://www.gpugrid.net/result.php?resultid=21354832


You're running Linux with a GTX 1080 card, while I am running Windows with an RTX card. This is either an OS problem or a card-type problem. To determine which, we need to run these WUs on a non-RTX card with Windows and/or an RTX card on Linux.

It looks like it is a Windows problem. I ran this unit on a GTX 980 Ti on Windows 10, suspended and resumed it, and it crashed a few seconds after resuming.

http://www.gpugrid.net/result.php?resultid=21355024

Billy Ewell 1931
Joined: 22 Oct 10
Posts: 16
Credit: 257,493,445
RAC: 384,864
Message 52601 - Posted: 7 Sep 2019 | 15:14:24 UTC

9/7/2019 9:44:40 AM | GPUGRID | task a70-TONI_TESTDHFR206b-9-30-RND0994_0

This task was assigned to my i7 Windows 10 machine with an RTX 2080. It processed for about 1 minute, was suspended for 30 seconds, and failed immediately after resuming.

mmonnin
Joined: 2 Jul 16
Posts: 260
Credit: 647,762,639
RAC: 1,739
Message 52602 - Posted: 7 Sep 2019 | 16:37:08 UTC
Last modified: 7 Sep 2019 | 16:37:22 UTC

Running OK on a 2-GPU system:
https://www.gpugrid.net/results.php?hostid=475308

The results show --device 0 or --device 1.

clemmo
Joined: 24 Jun 12
Posts: 2
Credit: 2,333,050
RAC: 2,100
Message 52603 - Posted: 8 Sep 2019 | 0:15:42 UTC

Just received another test task and decided to check suspend/resume. Still a computation error on resume. GTX 1660

Keith Myers
Joined: 13 Dec 17
Posts: 274
Credit: 234,046,463
RAC: 3
Message 52604 - Posted: 8 Sep 2019 | 0:34:18 UTC

I continue to have no luck getting any of these new test tasks.

Billy Ewell 1931
Joined: 22 Oct 10
Posts: 16
Credit: 257,493,445
RAC: 384,864
Message 52605 - Posted: 9 Sep 2019 | 16:55:58 UTC - in response to Message 52583.
Last modified: 9 Sep 2019 | 16:59:14 UTC

TONI:

The GPUGrid configuration (below) is set specifically to accommodate my i7 Windows 10 machine with an RTX 2080. I briefly selected both the short- and long-run ACEMD tasks, and two failed immediately in sequence.
Do you want us to continue the pause-then-resume tests on ACEMD3 and other special test tasks for RTX cards?

My three other machines, with Windows and GTX 750 Ti and 1060 cards, sit idle as far as GPUGrid is concerned.

ACEMD short runs (2-3 hours on fastest card): no
ACEMD long runs (8-12 hours on fastest GPU): no
ACEMD3: yes
Quantum Chemistry (CPU): no
Quantum Chemistry (CPU, beta): no
Python Runtime: no
If no work for selected applications is available, accept work from other applications? no

Retvari Zoltan
Joined: 20 Jan 09
Posts: 2037
Credit: 14,773,051,869
RAC: 2,156,019
Message 52606 - Posted: 9 Sep 2019 | 22:45:03 UTC - in response to Message 52605.

The GPUGrid configuration (below) is set specifically to accommodate my i7 Windows 10 machine with an RTX 2080. I briefly selected both the short- and long-run ACEMD tasks, and two failed immediately in sequence.
These were downloaded from the "long" queue, which has only the old client, and that client is not compatible with Turing cards (RTX, GTX 1660, 1650). For now, you should select only the ACEMD3 queue for Turing cards.

My three other machines, with Windows and GTX 750 Ti and 1060 cards, sit idle as far as GPUGrid is concerned.
You should set up two different venues (one with ACEMD3 only, for Turing cards; one with short+long, for older cards) and assign your hosts to these venues according to their GPUs.

Keith Myers
Joined: 13 Dec 17
Posts: 274
Credit: 234,046,463
RAC: 3
Message 52607 - Posted: 10 Sep 2019 | 0:48:49 UTC

Since I have been unable to get any of these new acemd3 tasks, is it valid to say that only the Windows hosts are having issues? And that the Linux hosts continue to not have any issues with the new app or tasks? I've only seen one post from a Linux user saying they had no issues.

I was hoping to test the new apps and higher-rewarding tasks myself. I had no issues with the previous beta and tasks back in July. So far I've had no luck retrieving any of the new apps or tasks.

klepel
Joined: 23 Dec 09
Posts: 161
Credit: 2,804,071,738
RAC: 772,044
Message 52608 - Posted: 10 Sep 2019 | 1:01:14 UTC - in response to Message 52607.

And that the Linux hosts continue to not have any issues with the new app or tasks? I've only seen one post from a Linux user saying they had no issues.

It seems to me that Linux hosts do not have issues with the new app (ACEMD3). My three hosts work just fine whenever they receive WUs (once a day).
Since I have been unable to get any of these new acemd3 tasks, is it valid to say that only the Windows hosts are having issues?

Only one of my Windows hosts with a Turing card has received WUs. The first finished successfully. The second one I stopped at the one-minute mark; after restarting, it crashed within 2 seconds: http://www.gpugrid.net/result.php?resultid=21364023

From my small sample size, I would think Linux works fine and we might start regular production (Toni?); Windows does not work yet.

Billy Ewell 1931
Joined: 22 Oct 10
Posts: 16
Credit: 257,493,445
RAC: 384,864
Message 52609 - Posted: 10 Sep 2019 | 2:55:56 UTC - in response to Message 52606.

My three other machines, with Windows and GTX 750 Ti and 1060 cards, sit idle as far as GPUGrid is concerned.
You should set up two different venues (one with ACEMD3 only, for Turing cards; one with short+long, for older cards) and assign your hosts to these venues according to their GPUs.

Thank you for the instruction, but unfortunately I do not know how to accomplish the task you outline. When I access my GpuGrid account and select Preferences and subsequently GpuGrid Preferences, whatever I select as to applications to run has always applied to all of the four computers I have attached to GpuGrid. If I change a preference, such as selecting only ACEMD3, then only tasks designed for my Turing card will be downloaded to its computer. But if I additionally select both ACEMD long and short, those will be downloaded not only to the three non-Turing computers but also to the Turing 2080, where immediate failure will occur. Your recommendation seems to be the perfect solution, and I am frustrated that I do not know how to accomplish it. Most appreciative!

klepel
Joined: 23 Dec 09
Posts: 161
Credit: 2,804,071,738
RAC: 772,044
Message 52610 - Posted: 10 Sep 2019 | 4:30:08 UTC - in response to Message 52609.

When I access my GpuGrid account and select Preferences and subsequently GpuGrid Preferences, whatever I select as to applications to run has always applied to all of the four computers I have attached to GpuGrid.

You are looking in the right direction: under "GPUGRID Preferences" you can set preferences for 4 different locations:
Default
Home
School
Work.
After that, assign a location to each host: under "computers under this account", open the host's Details and choose the location you want to assign to the computer; the location setting is at the bottom of the page.
That way you can assign one location to your Turing machine and another to the machines with older cards.
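The rule being described boils down to "Turing cards get ACEMD3 only, older cards get short+long". Below is a minimal Python sketch of that mapping, purely as an illustration: the helper names, the regex, and the choice of "Home" for Turing hosts are assumptions, and the actual assignment is still done by hand on the website.

```python
import re

# Illustrative mapping only: "Home" and "Default" are two of the four GPUGRID
# preference locations; which one gets which application set is up to you.
VENUE_APPS = {
    "Home": {"ACEMD3"},                                   # Turing hosts
    "Default": {"ACEMD short runs", "ACEMD long runs"},   # pre-Turing hosts
}

# Turing consumer cards as described in this thread: RTX 20x0, GTX 1660, GTX 1650.
# Hypothetical pattern; other Turing products (e.g. Titan RTX) are not covered.
TURING_PATTERN = re.compile(r"RTX\s*20[0-9]0|GTX\s*16[56]0", re.IGNORECASE)


def is_turing(gpu_name: str) -> bool:
    """Return True if the card name looks like one of the Turing cards above."""
    return TURING_PATTERN.search(gpu_name) is not None


def pick_venue(gpu_name: str) -> str:
    """Hypothetical helper: the location a host with this GPU should be assigned to."""
    return "Home" if is_turing(gpu_name) else "Default"


for name in ["GeForce RTX 2080 Ti", "GeForce GTX 1660 Ti", "GeForce GTX 1060 6GB"]:
    venue = pick_venue(name)
    print(f"{name:<22} -> {venue:<8} {sorted(VENUE_APPS[venue])}")
```

Because the location is set per host, a machine that mixes a Turing card with older cards can only follow one of the two rules.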
Hope this helps!

Billy Ewell 1931
Joined: 22 Oct 10
Posts: 16
Credit: 257,493,445
RAC: 384,864
Message 52621 - Posted: 12 Sep 2019 | 15:43:11 UTC - in response to Message 52583.

By the way: things we'd need a comment on:

2. does suspend/restart work as expected?


Toni (or anyone else): do you still want us to test suspend/restart?

An interesting observation: yesterday, before I sorted out my GPUGrid preferences, my RTX 2080 machine downloaded two old (non-ACEMD3) ACEMD tasks. One of the longer-running tasks processed for over 2 hours and 40 minutes on the Turing card before failing, but the second task apparently failed immediately.

Steve Jones
Joined: 28 Oct 18
Posts: 1
Credit: 22,048,150
RAC: 2,502
Message 52631 - Posted: 14 Sep 2019 | 11:34:02 UTC
Last modified: 14 Sep 2019 | 11:56:08 UTC

http://gpugrid.net/result.php?resultid=21378124

Linux machine with a GeForce GTX 1050 Ti (ran GPUGrid task) and GeForce GTX 660 (running another project). Survived two suspends, including a machine reboot, and finished happily.

Seems that others weren't so lucky with the same work unit though.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 813
Credit: 4,294,282
RAC: 0
Message 52637 - Posted: 18 Sep 2019 | 10:51:15 UTC - in response to Message 52631.

Dears, thanks for the reports and patience. A small update thanks to your testing:

We found a problem with the WINDOWS CUDA 10.1 app: it's slower than it should be. Possibly related: restart fails most of the time. We are working on it.

Aurum
Joined: 12 Jul 17
Posts: 105
Credit: 7,254,814,243
RAC: 2,082,691
Message 52646 - Posted: 18 Sep 2019 | 17:46:07 UTC
Last modified: 18 Sep 2019 | 17:46:28 UTC

I lucked out and checked in just when a new batch of acemd3 WUs popped up. My 2080 Ti caught two sets of two WUs and they ran fine.
My 2080 Ti has no problem running 4 Einstein WUs at once, but I've yet to get 4 acemd3 WUs at the same time to test this. Is there a limit set on the server side???

Keith Myers
Joined: 13 Dec 17
Posts: 274
Credit: 234,046,463
RAC: 3
Message 52647 - Posted: 18 Sep 2019 | 18:05:02 UTC

Sigh . . . . still have never caught a single one of the new tasks or applications.

Aurum
Joined: 12 Jul 17
Posts: 105
Credit: 7,254,814,243
RAC: 2,082,691
Message 52648 - Posted: 18 Sep 2019 | 18:08:30 UTC - in response to Message 52647.

Keith it took me a while before I caught my first. Are you sure you have acemd3 checked in preferences and short & long unchecked???

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 813
Credit: 4,294,282
RAC: 0
Message 52649 - Posted: 18 Sep 2019 | 18:11:37 UTC - in response to Message 52648.

I could send more. Unfortunately, they also go to Linux hosts (which we don't need to test). Please follow up in the "server" forum.

Keith Myers
Joined: 13 Dec 17
Posts: 274
Credit: 234,046,463
RAC: 3
Message 52651 - Posted: 18 Sep 2019 | 18:27:55 UTC - in response to Message 52648.

Keith it took me a while before I caught my first. Are you sure you have acemd3 checked in preferences and short & long unchecked???

Yes. I still have the new acemd3 app checked, as I did back in July, and the acemd2 app unchecked.

I see from Toni's comment that he does not want Linux hosts to participate. So I guess I can just forget about the project again.

klepel
Joined: 23 Dec 09
Posts: 161
Credit: 2,804,071,738
RAC: 772,044
Message 52652 - Posted: 18 Sep 2019 | 20:08:02 UTC

I set my three Linux hosts to "no new work", so that Keith can pick up a Linux WU ;-)

Be patient: it seems to me that we are nearing production status with the new app.

ServicEnginIC
Joined: 24 Sep 10
Posts: 36
Credit: 1,195,170,276
RAC: 1,224,436
Message 52654 - Posted: 18 Sep 2019 | 20:31:48 UTC
Last modified: 18 Sep 2019 | 20:32:39 UTC

I set my three Linux hosts to "no new work", so that Keith can pick up a Linux WU ;-)


I've also just configured my Linux systems not to accept ACEMD3 tasks.
My XP and W10 systems keep waiting for them...

Keith Myers
Joined: 13 Dec 17
Posts: 274
Credit: 234,046,463
RAC: 3
Message 52655 - Posted: 18 Sep 2019 | 22:07:58 UTC - in response to Message 52652.

I set my three Linux hosts to "no new work", so that Keith can pick up a Linux WU ;-)

Be patient: it seems to me that we are nearing production status with the new app.

I've been patient since February, but my patience is wearing thin. I see other Linux users getting some of the new work, and I just wonder what miracle method they used so I can duplicate it.

I hope that Toni can get the Windows app working correctly very soon so he will release enough work for ALL hosts to participate.

JStateson
Joined: 31 Oct 08
Posts: 144
Credit: 2,479,694,928
RAC: 3,842,126
Message 52667 - Posted: 20 Sep 2019 | 3:29:11 UTC
Last modified: 20 Sep 2019 | 3:29:54 UTC

Just realized I successfully processed three of the "new" tasks on a Linux system with one 1660 Ti and five 1060s. They all completed successfully. I didn't realize they were running, so I didn't do a stop/start to test suspend.

I also noticed that the GPU was not identified, so I don't know whether they all ran on gpu0 or on any of the other 5 cards in this mining system, which uses risers for all cards. One of them was faster, so I suspect it ran on the 1660 Ti.

http://www.gpugrid.org/results.php?hostid=509037


It would be interesting to compare with other systems that have full x16 bandwidth instead of my x1.

Keith Myers
Joined: 13 Dec 17
Posts: 274
Credit: 234,046,463
RAC: 3
Message 52668 - Posted: 20 Sep 2019 | 6:16:35 UTC

I believe other Linux users have already tested the new acemd3 app with stops, suspends and restarts without issues. Their tasks ran through to completion, even on different cards, I believe.

That is what the Windows app needs to achieve.

rod4x4
Joined: 4 Aug 14
Posts: 85
Credit: 1,558,639,269
RAC: 2,160,906
Message 52672 - Posted: 20 Sep 2019 | 11:15:19 UTC - in response to Message 52667.

One of them was faster, so I suspect it ran on the 1660 Ti.

http://www.gpugrid.org/results.php?hostid=509037

It would be interesting to compare with other systems that have full x16 bandwidth instead of my x1.

For comparison, another volunteer crunched this task on a Linux host with a GTX 1660 Ti:
http://www.gpugrid.net/result.php?resultid=21381326

JStateson
Joined: 31 Oct 08
Posts: 144
Credit: 2,479,694,928
RAC: 3,842,126
Message 52674 - Posted: 20 Sep 2019 | 12:33:11 UTC - in response to Message 52672.
Last modified: 20 Sep 2019 | 13:03:41 UTC

One of them was faster, so I suspect it ran on the 1660 Ti.

http://www.gpugrid.org/results.php?hostid=509037

It would be interesting to compare with other systems that have full x16 bandwidth instead of my x1.

For comparison, another volunteer crunched this task on a Linux host with a GTX 1660 Ti:
http://www.gpugrid.net/result.php?resultid=21381326

From the above two links, plus my GTX 1070 Ti system:


PCIe  OS            GPU          Run time (s)  Relative performance (%)
----  ------------  -----------  ------------  ------------------------
x16   Ubuntu 18.04  GTX 1660 Ti       1831.00                       100
x1    Ubuntu 18.04  GTX 1660 Ti       2189.79                        84
x16   Windows 10    GTX 1070 Ti       2268.68                        81


There is a 16% loss in performance due to x1, but on the other hand, Windows with a 1070 Ti on a full x16 slot is slightly slower than the 1660 Ti hanging on an x1 riser under Ubuntu! Both of my systems have SWAN_SYNC enabled and both run CUDA 10.0. Not sure about the other user.
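The percentages in the table can be reproduced from the run times alone. The sketch below assumes "performance" means throughput relative to the x16 Ubuntu 1660 Ti run (baseline time divided by run time); the function names are just for illustration.

```python
# Run times in seconds, copied from the table above (PCIe link, OS, GPU, seconds).
RUNS = [
    ("x16", "Ubuntu 18.04", "GTX 1660 Ti", 1831.00),
    ("x1", "Ubuntu 18.04", "GTX 1660 Ti", 2189.79),
    ("x16", "Windows 10", "GTX 1070 Ti", 2268.68),
]


def relative_performance(baseline_s: float, run_s: float) -> float:
    """Throughput relative to the baseline run, in percent (higher is faster)."""
    return 100.0 * baseline_s / run_s


def extra_time(baseline_s: float, run_s: float) -> float:
    """How much longer the run took than the baseline, in percent."""
    return 100.0 * (run_s / baseline_s - 1.0)


baseline = RUNS[0][3]
for pcie, os_name, gpu, seconds in RUNS:
    print(
        f"{pcie:<4} {os_name:<13} {gpu:<12} {seconds:8.2f} s "
        f"perf {relative_performance(baseline, seconds):5.1f}% "
        f"extra time {extra_time(baseline, seconds):5.1f}%"
    )
```

Measured as throughput, the x1 run comes out at about 84% (the "16% loss"); measured as wall-clock time, the same run took roughly 20% longer than the x16 baseline. Both describe the same gap, so it helps to say which one is meant when comparing hosts.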

roryd
Joined: 9 Aug 11
Posts: 2
Credit: 4,821,361
RAC: 0
Message 52676 - Posted: 20 Sep 2019 | 14:08:29 UTC
Last modified: 20 Sep 2019 | 14:12:25 UTC

Hi all, noob here :-)

I have three 2080 Ti cards (on Windows) and I'm trying to get work units, but BOINC has kept saying none are available since I started yesterday:

20-Sep-2019 14:31:26 [GPUGRID] Sending scheduler request: Requested by project.
20-Sep-2019 14:31:26 [GPUGRID] Requesting new tasks for NVIDIA GPU
20-Sep-2019 14:31:27 [GPUGRID] Scheduler request completed: got 0 new tasks
20-Sep-2019 14:31:27 [GPUGRID] No tasks sent
20-Sep-2019 14:31:27 [GPUGRID] No tasks are available for New version of ACEMD
20-Sep-2019 14:31:27 [GPUGRID] Project has no tasks available


My project settings are:

ACEMD short runs (2-3 hours on fastest card): no
ACEMD long runs (8-12 hours on fastest GPU): no
ACEMD3: yes
Quantum Chemistry (CPU): no
Quantum Chemistry (CPU, beta): no
Python Runtime: no


In my projects folder, the only executable is acemd-923-80.exe.

Am I missing something here?

TIA

Aurum
Joined: 12 Jul 17
Posts: 105
Credit: 7,254,814,243
RAC: 2,082,691
Message 52677 - Posted: 20 Sep 2019 | 14:44:15 UTC - in response to Message 52676.

roryd, I just go ahead and check the acemd project as well. The test WUs come in small packs so it's catch as catch can. Keep an eye on the Server Status page.

roryd
Joined: 9 Aug 11
Posts: 2
Credit: 4,821,361
RAC: 0
Message 52678 - Posted: 20 Sep 2019 | 14:58:57 UTC - in response to Message 52677.

roryd, I just go ahead and check the acemd project as well. The test WUs come in small packs so it's catch as catch can. Keep an eye on the Server Status page.

Hi Aurum,

I tried that yesterday, but they all failed and then I got messages saying
19-Sep-2019 16:23:50 [GPUGRID] This computer has finished a daily quota of 4 tasks


As all the GPUs are RTX, should I enable the acemd anyway?

ServicEnginIC
Joined: 24 Sep 10
Posts: 36
Credit: 1,195,170,276
RAC: 1,224,436
Message 52679 - Posted: 20 Sep 2019 | 15:30:36 UTC - in response to Message 52674.

There is a 16% loss in performance due to x1, but on the other hand, Windows with a 1070 Ti on a full x16 slot is slightly slower than the 1660 Ti hanging on an x1 riser under Ubuntu! Both of my systems have SWAN_SYNC enabled and both run CUDA 10.0. Not sure about the other user.

As seen in the table at the following link, a GTX 1660 Ti with SWAN_SYNC enabled demands 33% of PCIe x16 bandwidth in my system.
https://www.gpugrid.net/forum_thread.php?id=4987&nowrap=true#52633

Aurum
Joined: 12 Jul 17
Posts: 105
Credit: 7,254,814,243
RAC: 2,082,691
Message 52680 - Posted: 20 Sep 2019 | 18:34:19 UTC - in response to Message 52678.

Sorry, my bad. I'm talking about 1080 Tis and you're running 2080 Tis. No, acemd does not work on Turing GPUs.

I get confused because my single 2080 Ti is on a Linux computer.

rod4x4
Joined: 4 Aug 14
Posts: 85
Credit: 1,558,639,269
RAC: 2,160,906
Message 52684 - Posted: 21 Sep 2019 | 1:14:58 UTC

Received a TEST work unit a43-TONI_TESTDHFR207c-23-30-RND4156_0 on a Win10 Host with GTX1060 GPU.

Applied the following test:
Let work unit run for 11 minutes 13 seconds
suspended for 1 minute 20 seconds (approx)
Resumed work unit.

Results:
Work unit had computational error several seconds after resuming

Observations:
The work unit predicted a run time of 36 minutes. This is an improvement on work unit a89-TONI_TESTDHFR206b-23-30-RND6008_0, which had a run time of 66 minutes, so the speed issue seems to have improved.
The ACEMD3 task and the wrapper task disappeared from Task Manager after suspending the task.
After resumption/failure, the run time reverted to 2 minutes 12 seconds.
The stderr output timeline reflects the full run time of 11 minutes, but the run time summary only shows 2 minutes 12 seconds.
nvidia-smi reported 78% GPU utilization, which is in line with CUDA80 tasks on this host.
nvidia-smi reported power usage similar to CUDA80 tasks on this host.

Link to Work unit here:
http://gpugrid.net/result.php?resultid=21396885

rod4x4
Joined: 4 Aug 14
Posts: 85
Credit: 1,558,639,269
RAC: 2,160,906
Message 52686 - Posted: 21 Sep 2019 | 2:07:53 UTC

Received another TEST work unit a6-TONI_TESTDHFR207-2-3-RND1704

Same testing method as in my last post; this time I let the work unit run for 40 minutes 37 seconds before suspending (54% complete).
The task failed after resuming 1 minute later.

The run time may not have improved as my last post indicated: after 40 minutes 37 seconds the task was only 54% complete. So Windows 10 tasks still seem to have a speed issue compared to Linux ACEMD3 tasks.
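A rough way to check that reasoning is to extrapolate the total run time from the elapsed time and the fraction completed. This is only a sketch under the assumption that BOINC's progress percentage grows linearly with time, which it may not (another report in this thread noted progress only starts moving after a minute or two), so treat the result as an estimate.

```python
def projected_total_minutes(elapsed_min: float, fraction_done: float) -> float:
    """Extrapolate total run time, assuming progress grows linearly with time."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_min / fraction_done


elapsed = 40 + 37 / 60  # 40 minutes 37 seconds, as reported above
projected = projected_total_minutes(elapsed, 0.54)
print(f"projected total: {projected:.0f} min")  # prints "projected total: 75 min"
```

With these numbers the projection is roughly 75 minutes, about twice the 36-minute prediction and above the 66 minutes of the earlier TONI_TESTDHFR206b unit.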

Additionally, I noticed that the ACEMD3 task and the wrapper task did reappear in Task Manager for a few seconds before the task failed.

All other observations consistent with last post.

Failed task here:
http://gpugrid.net/result.php?resultid=21397083

Erich56
Joined: 1 Jan 15
Posts: 585
Credit: 3,043,306,644
RAC: 1,676,293
Message 52687 - Posted: 21 Sep 2019 | 5:09:34 UTC - in response to Message 52686.

Failed task here:
http://gpugrid.net/result.php?resultid=21397083

what caught my eye:
in line 8 of the stderr it says "Detected memory leaks!" - whatever this means.

rod4x4
Joined: 4 Aug 14
Posts: 85
Credit: 1,558,639,269
RAC: 2,160,906
Message 52689 - Posted: 21 Sep 2019 | 10:24:03 UTC - in response to Message 52687.

what caught my eye:
in line 8 of the stderr it says "Detected memory leaks!" - whatever this means.


It indicates a programming error: memory that was allocated is not being de-allocated (freed) correctly.

This is the suspend/resume bug they are looking to fix.
