ATM: Free Energy Calculations new application

Message boards : Number crunching : ATM: Free Energy Calculations new application

Author	Message
Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59751 - Posted: 18 Jan 2023 \| 19:50:34 UTC Last modified: 18 Jan 2023 \| 20:30:40 UTC
	Just starting the thread for discussion of this new application. ATM = AToM.
	after a little snafu with the first batch (incorrect config files), the latest batch seems to run on my system. no idea for runtime yet or if it will finish successfully.
	This is another Python-based application. the package ships with the python environment similar to how the PythonGPU Reinforcement Learning (RL) app does.
	Test Bench:
	Xeon E5-2697Av4 (16c/32t)
	64GB DDR4-2400 RDIMM (ECC)
	RTX 3060 12GB
	Ubuntu 22.04.1
	So far observed behavior:
	-uses ~97% of the GPU core, ~45% GPU memory bus, ~0-1% PCIe bus, close to full power use.
	-about 400-500MB VRAM used (low, like acemd3)
	-does not like to be paused and resumed, or BOINC stopped and restarted. it causes the task to fail
	unknown total runtime expectation since the one task I had failed when I restarted BOINC lol.

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59752 - Posted: 18 Jan 2023 \| 20:36:00 UTC - in response to Message 59751.
	about the restart failure. looks like it fails trying to create a directory that already exists.
	mkdir: cannot create directory 'atm_tmp': File exists
	needs some work to allow for that.
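A minimal sketch of the kind of fix the restart failure calls for, assuming the setup step could be done from Python (the thread only shows a shell `mkdir` error, so how the app really creates `atm_tmp` is not known). The helper name `ensure_scratch_dir` is hypothetical. In a shell script the equivalent would be `mkdir -p atm_tmp`.

```python
import os
import tempfile


def ensure_scratch_dir(path: str) -> str:
    """Create the scratch directory if needed; do nothing if it already exists."""
    # exist_ok=True makes the call idempotent, so a restarted task does not
    # fail the way a bare `mkdir atm_tmp` does on the second run.
    os.makedirs(path, exist_ok=True)
    return path


# Calling it twice simulates a task that is stopped and then restarted.
with tempfile.TemporaryDirectory() as slot_dir:
    scratch = os.path.join(slot_dir, "atm_tmp")
    ensure_scratch_dir(scratch)
    ensure_scratch_dir(scratch)  # second call must not raise
    print(os.path.isdir(scratch))
```

Making the directory step idempotent only removes this one error; whether the task can then resume from its checkpoint is a separate question.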

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59754 - Posted: 18 Jan 2023 \| 20:55:38 UTC
	another quality of life improvement would be adding a <weight> line to the main task in the job.xml file. right now with 2 tasks in the file, and no weights defined, I'm guessing it splits it 50/50 and it thinks the task is 50% done once the extraction phase is complete.
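To illustrate the guess above, here is a rough model of how per-task weights could turn into the reported progress. It assumes progress is simply the weight of the finished tasks divided by the total weight and ignores any progress reported inside a running task; the function name and the 1/99 split are made up for illustration.

```python
def fraction_done(weights, tasks_completed):
    """Share of the total weight carried by the tasks that have finished."""
    total = sum(weights)
    return sum(weights[:tasks_completed]) / total


# Two tasks with equal (default) weights: extraction done means "50% done".
print(fraction_done([1, 1], 1))   # 0.5

# Hypothetical weights that give the main task almost all of the progress bar.
print(fraction_done([1, 99], 1))  # 0.01
```

With a small weight on the extraction step, the progress bar would stay near zero until the main computation actually advances.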

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59755 - Posted: 18 Jan 2023 \| 21:33:51 UTC Last modified: 18 Jan 2023 \| 21:34:03 UTC
	task ran to completion in about an hour. but hit an error and threw it all away because the file size is too big.
	upload failure:
	<file_xfer_error>
	<file_name>T11_4-RAIMIS_TEST_ATM-0-1-RND7054_2_0</file_name>
	<error_code>-131 (file size too big)</error_code>
	</file_xfer_error>
	what a waste.
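A small sketch of a pre-flight size check that would catch this before an hour of work is thrown away. The limit used here is invented; the real one is set by the project (to my understanding through a `max_nbytes` value in the output file template, but treat that as an assumption). The function name is hypothetical.

```python
import os
import tempfile


def exceeds_upload_limit(path: str, max_nbytes: int) -> bool:
    """Return True if the file is larger than the allowed upload size."""
    return os.path.getsize(path) > max_nbytes


# Demonstration with a throwaway 2048-byte file and a made-up 1024-byte limit.
with tempfile.NamedTemporaryFile(delete=False) as fh:
    fh.write(b"\0" * 2048)
    result_path = fh.name

print(exceeds_upload_limit(result_path, 1024))  # True
os.remove(result_path)
```

A check like this does not fix anything on its own; it only shows the comparison that produces the -131 error, which has to be resolved on the project side by raising the limit or shrinking the output.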

Keith Myers Send message Joined: 13 Dec 17 Posts: 1288 Credit: 5,091,256,959 RAC: 8,853,655 Level Scientific publications	Message 59771 - Posted: 19 Jan 2023 \| 18:06:13 UTC
	Have over a dozen quick-failing ATM tasks. The wrapper does not have a correctly named tar file or something.
	02:56:29 (1242346): wrapper: running /bin/tar (xf input.tar.bz2)
	/bin/tar: This does not look like a tar archive
	bzip2: (stdin) is not a bzip2 file.
	/bin/tar: Child returned status 2
	/bin/tar: Error is not recoverable: exiting now
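For anyone who wants to check a downloaded input file by hand, the sketch below uses Python's standard `tarfile.is_tarfile` to tell a real (optionally compressed) tar archive from something else that merely carries the name. The file names and contents are made up for the demonstration; what was actually inside the failing `input.tar.bz2` is not known from the log.

```python
import os
import tarfile
import tempfile


def looks_like_tar_archive(path: str) -> bool:
    """True if the file exists and the tarfile module can open it as an archive."""
    return os.path.isfile(path) and tarfile.is_tarfile(path)


with tempfile.TemporaryDirectory() as work:
    # A genuine bzip2-compressed tar archive with one small member.
    member = os.path.join(work, "config.txt")
    with open(member, "w") as fh:
        fh.write("example\n")
    good = os.path.join(work, "good.tar.bz2")
    with tarfile.open(good, "w:bz2") as tar:
        tar.add(member, arcname="config.txt")

    # A file with the right name but the wrong content.
    bad = os.path.join(work, "input.tar.bz2")
    with open(bad, "w") as fh:
        fh.write("this is not an archive\n" * 50)

    print(looks_like_tar_archive(good))  # True
    print(looks_like_tar_archive(bad))   # False
```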

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59792 - Posted: 24 Jan 2023 \| 16:01:40 UTC - in response to Message 59755.
	looks like the small batch of tasks that went out today is better set up. ran for about an hour and completed successfully without the file size issue at the end. great :) still would like a little more background info on these tasks, what they are doing, and the goal of the research.

FritzB Send message Joined: 7 Apr 15 Posts: 11 Credit: 2,025,513,600 RAC: 4,810,655 Level Scientific publications	Message 59899 - Posted: 10 Feb 2023 \| 22:25:09 UTC Last modified: 10 Feb 2023 \| 22:25:52 UTC
	This one https://www.gpugrid.net/workunit.php?wuid=27399736 has been running for about 11 hours and has been stuck at 66.666% for at least 4 hours now. There is almost no load on the GPU. Just a few percent (3-5) once in a while, but constantly some load on the memory controller (10-30). Hope it will finish some day :)

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59936 - Posted: 16 Feb 2023 \| 16:24:10 UTC
	still no official communication from the project about these tasks. the recent batches have been very hit or miss and behave very differently from what I described in my initial post.
	"TL2" tasks ran for hours and hours with little to no GPU or CPU use. I aborted them and moved on.
	"TL3" tasks yesterday also had little to no GPU or CPU use, but did complete in about 30 mins.
	"TL4" tasks today seem like a repeat of TL2. no GPU use, runs for hours with no progress.
	also weights need to be defined in the job.xml file so the tasks don't jump to 75% after a few seconds and then sit there for hours doing nothing.

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1576 Credit: 5,761,236,851 RAC: 8,640,111 Level Scientific publications	Message 59937 - Posted: 16 Feb 2023 \| 17:01:14 UTC
	Just been sent a TL4 from WU 27405970. I see you've aborted two previous tasks from the same WU, Ian, on two different machines. Did you get any CPU usage figures from previous runs? I think I'll start it up with the GTX 1660 plus one core, but I'll probably abort it myself if it doesn't show much response.

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59938 - Posted: 16 Feb 2023 \| 17:04:18 UTC - in response to Message 59937.
	they spin up multiple processes like the Python tasks do. but i didn't catch them at the very beginning to see if they spike in use or anything like that. once they get going, they basically sit idle as far as the GPU and CPU go. little to no use at all. i just killed them rather than letting them sit there for hours occupying my GPU.

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1576 Credit: 5,761,236,851 RAC: 8,640,111 Level Scientific publications	Message 59939 - Posted: 16 Feb 2023 \| 17:30:46 UTC
	OK, I've set 3 CPUs for continuity from the current Python task, and I've put weights of 1-1-1-97 in the job file so I can see what's happening. My normal remote monitoring console shows the current average CPU usage, and I've put nvidia-smi on a five second loop. If either of those drops to zero, I'll abort it. Chocks away!
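A rough sketch of the "nvidia-smi on a five second loop" idea for anyone who wants timestamps next to the readings. It assumes `nvidia-smi` is on the PATH and prints plain integers for `utilization.gpu`; the function names are made up. The plain command-line route, `nvidia-smi -l 5`, does the same job without any script.

```python
import subprocess
import time


def parse_gpu_utilization(csv_text: str) -> list:
    """Parse nvidia-smi CSV output (no header, no units): one integer per GPU."""
    # Assumes every line is a plain number; a GPU that reports "[N/A]"
    # would need extra handling.
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def read_gpu_utilization() -> list:
    """Query the current GPU core utilization (percent) for every GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_utilization(out)


def watch(interval_s: float = 5.0, samples: int = 12) -> None:
    """Print a timestamped utilization reading every interval_s seconds."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), read_gpu_utilization())
        time.sleep(interval_s)


# Parsing can be tried without a GPU; call watch() on a machine with one.
print(parse_gpu_utilization("41\n0\n"))  # [41, 0]
```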

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1576 Credit: 5,761,236,851 RAC: 8,640,111 Level Scientific publications	Message 59941 - Posted: 16 Feb 2023 \| 18:15:43 UTC
	I see what you mean. Nearly half an hour in, CPU usage is showing around 25% of a single core, and GPU usage spiked once, to 41%, after about a quarter of an hour. It's one way of saving electricity, but I'd rather be doing something useful. Aborting.

Aurum Send message Joined: 12 Jul 17 Posts: 399 Credit: 13,024,116,132 RAC: 122,853 Level Scientific publications	Message 59961 - Posted: 22 Feb 2023 \| 17:28:34 UTC
	1.13 ATM running fine for me. Keep aborting them and I'll run them for you.

zombie67 [MM] Send message Joined: 16 Jul 07 Posts: 207 Credit: 1,669,151,456 RAC: 719,040 Level Scientific publications	Message 59964 - Posted: 23 Feb 2023 \| 3:08:08 UTC
	FWIW, the first task I received completed successfully. http://www.gpugrid.net/workunit.php?wuid=27410175 ____________ Reno, NV Team: SETI.USA

FritzB Send message Joined: 7 Apr 15 Posts: 11 Credit: 2,025,513,600 RAC: 4,810,655 Level Scientific publications	Message 59967 - Posted: 23 Feb 2023 \| 8:18:47 UTC - in response to Message 59964. Last modified: 23 Feb 2023 \| 8:19:26 UTC
	I've also finished one: https://www.gpugrid.net/workunit.php?wuid=27410166 We're both using Linux Mint. It seems to crash on Win 10 machines (computer #600532 is mine, too).

zombie67 [MM] Send message Joined: 16 Jul 07 Posts: 207 Credit: 1,669,151,456 RAC: 719,040 Level Scientific publications	Message 59968 - Posted: 23 Feb 2023 \| 15:33:06 UTC
	Overnight, I had 4 of these tasks cancelled by the server. ____________ Reno, NV Team: SETI.USA

KAMasud Send message Joined: 27 Jul 11 Posts: 137 Credit: 523,901,354 RAC: 3 Level Scientific publications	Message 59969 - Posted: 23 Feb 2023 \| 16:50:20 UTC - in response to Message 59961.
	Quote (Aurum): 1.13 ATM running fine for me. Keep aborting them and I'll run them for you.
	Same here. I quite enjoy completing these WUs. There should be a way to analyse why these WUs fail on certain machines. We are mostly running the same hardware and OS. It would be fun to see the results.

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59970 - Posted: 23 Feb 2023 \| 17:27:24 UTC Last modified: 23 Feb 2023 \| 17:29:36 UTC
	keep in mind these are beta tasks, and the batch being sent NOW is not necessarily the same as the batch sent last week and won't be the same as whatever is sent sometime in the future, until they get all the bugs worked out.
	the tasks last week basically ran with no perceived use of the GPU or CPU, so what were they doing? who knows. no official word from the project about these tasks at all. I wasn't willing to let the GPU/CPU be occupied for hours on end with the task spinning its wheels when they could be doing something more useful.
	maybe this current batch has been tweaked from last week and that's why they are working OK. for those that have completed this latest batch, did they have any meaningful use of the GPU or CPU?
	it also seems this batch was released with a new Windows application (they were Linux only before) for testing.

KAMasud Send message Joined: 27 Jul 11 Posts: 137 Credit: 523,901,354 RAC: 3 Level Scientific publications	Message 59971 - Posted: 24 Feb 2023 \| 5:28:26 UTC - in response to Message 59970.
	Quote (Ian&Steve C.): keep in mind these are beta tasks, and the batch being sent NOW is not necessarily the same as the batch sent last week and won't be the same as whatever is sent sometime in the future, until they get all the bugs worked out. the tasks last week basically ran with no perceived use of the GPU or CPU, so what were they doing? who knows. no official word from the project about these tasks at all. I wasn't willing to let the GPU/CPU be occupied for hours on end with the task spinning its wheels when they could be doing something more useful. maybe this current batch has been tweaked from last week and that's why they are working OK. for those that have completed this latest batch, did they have any meaningful use of the GPU or CPU? it also seems this batch was released with a new Windows application (they were Linux only before) for testing.
	Well, most of us know that Abouh reads every word written on these threads and, without much song and dance, makes changes. He is the only admin on all the projects who diligently attends. Maybe, quite possibly. No arguments with your tweaking statement.

Erich56 Send message Joined: 1 Jan 15 Posts: 1090 Credit: 6,603,906,926 RAC: 3,160,162 Level Scientific publications	Message 59972 - Posted: 24 Feb 2023 \| 8:36:54 UTC - in response to Message 59971.
	Quote (KAMasud): Well, most of us know that Abouh reads every word written on these threads and, without much song and dance, makes changes. He is the only admin on all the projects who diligently attends. Maybe, quite possibly. No arguments with your tweaking statement.
	well, Abouh is the only one from the project team who actively communicates with us volunteers - which is great. All others obviously don't care, and it has been like this for years, unfortunately.
	For example: 9 days ago I asked in the ACEMD 4 thread when new ACEMD 4 tasks will be around, or whether this subproject is dead. No reply so far, although a reply could be very simple, no longer than a single line :-(
	You know what I want to say ... it's kind of disappointing at times :-(

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59973 - Posted: 24 Feb 2023 \| 13:27:38 UTC - in response to Message 59971.
	Quote (Ian&Steve C.): keep in mind these are beta tasks, and the batch being sent NOW is not necessarily the same as the batch sent last week and won't be the same as whatever is sent sometime in the future, until they get all the bugs worked out. the tasks last week basically ran with no perceived use of the GPU or CPU, so what were they doing? who knows. no official word from the project about these tasks at all. I wasn't willing to let the GPU/CPU be occupied for hours on end with the task spinning its wheels when they could be doing something more useful. maybe this current batch has been tweaked from last week and that's why they are working OK. for those that have completed this latest batch, did they have any meaningful use of the GPU or CPU? it also seems this batch was released with a new Windows application (they were Linux only before) for testing.
	Quote (KAMasud): Well, most of us know that Abouh reads every word written on these threads and, without much song and dance, makes changes. He is the only admin on all the projects who diligently attends. Maybe, quite possibly. No arguments with your tweaking statement.
	that's great and all, but abouh is not the researcher working with this application. Abouh deals with the research with the Python RL tasks. These ATM tasks look to be run by Raimis. (the researcher names are in the filenames of the WUs)

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59974 - Posted: 24 Feb 2023 \| 13:30:55 UTC
	https://gpugrid.net/result.php?resultid=33321222 ran for 10+ hours, failed due to file size limit after an otherwise successful computation. :(

Erich56 Send message Joined: 1 Jan 15 Posts: 1090 Credit: 6,603,906,926 RAC: 3,160,162 Level Scientific publications	Message 59975 - Posted: 24 Feb 2023 \| 14:39:26 UTC - in response to Message 59974.
	Quote (Ian&Steve C.): ... failed due to file size limit :(
	I am just trying to remember with which other application we've had the same problem some time ago - last year or 2 years ago ???

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59976 - Posted: 24 Feb 2023 \| 14:51:34 UTC - in response to Message 59975.
	Quote (Ian&Steve C.): ... failed due to file size limit :(
	Quote (Erich56): I am just trying to remember with which other application we've had the same problem some time ago - last year or 2 years ago ???
	it's happened a few times in the past with acemd3 tasks. see here from July 2021: https://www.gpugrid.net/forum_thread.php?id=5239#57117

Aurum Send message Joined: 12 Jul 17 Posts: 399 Credit: 13,024,116,132 RAC: 122,853 Level Scientific publications	Message 59977 - Posted: 24 Feb 2023 \| 18:19:21 UTC
	Yea, I got my first ATM checkpoint :-) Now my list of ATM ULs are stuck.

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59978 - Posted: 24 Feb 2023 \| 18:43:59 UTC - in response to Message 59977.
	Quote (Aurum): Yea, I got my first ATM checkpoint :-) Now my list of ATM ULs are stuck.
	the uploads are nearly 700MB in size, and it's likely the same problem from my link that we saw over a year ago. their server can't accept something that big. I don't think they ever figured out how to adjust the settings of their file server and just tried to keep the file sizes below the limit, which they seem to have forgotten about. nothing you do will get them to upload.
	I've disabled ATM until they get it together with them.

ServicEnginIC Send message Joined: 24 Sep 10 Posts: 566 Credit: 6,093,327,024 RAC: 8,533,371 Level Scientific publications	Message 59979 - Posted: 24 Feb 2023 \| 21:20:56 UTC - in response to Message 59978.
	On the last occasion, I bet and lost. Currently, I'm only processing ACEMD tasks, when available. I happened to catch one this morning.

Aurum Send message Joined: 12 Jul 17 Posts: 399 Credit: 13,024,116,132 RAC: 122,853 Level Scientific publications	Message 59980 - Posted: 24 Feb 2023 \| 23:20:56 UTC
	GDF, Should I Abort these 12 completed ATM WUs that won't upload or is there a reasonable chance you'll fix it?

zombie67 [MM] Send message Joined: 16 Jul 07 Posts: 207 Credit: 1,669,151,456 RAC: 719,040 Level Scientific publications	Message 59981 - Posted: 25 Feb 2023 \| 1:18:39 UTC
	Well, I just achieved my 100 hours, which was my 1st priority. I will abort and reset (if necessary) the completed tasks I have. If/when the project gets its act together, I'll be back. ____________ Reno, NV Team: SETI.USA

gemini8 Send message Joined: 3 Jul 16 Posts: 31 Credit: 1,266,550,176 RAC: 2,737,000 Level Scientific publications	Message 59986 - Posted: 26 Feb 2023 \| 11:04:13 UTC
	For me it's just this:
	So 26 Feb 2023 11:57:00 CET | GPUGRID | Started upload of TL9_55-RAIMIS_TEST_ATM-0-1-RND1804_0_0
	So 26 Feb 2023 11:57:02 CET | GPUGRID | Backing off 04:12:16 on upload of TL9_55-RAIMIS_TEST_ATM-0-1-RND1804_0_0
	So 26 Feb 2023 11:57:19 CET | GPUGRID | Started upload of TL9_55-RAIMIS_TEST_ATM-0-1-RND1804_0_0
	So 26 Feb 2023 11:57:22 CET | GPUGRID | Backing off 05:10:06 on upload of TL9_55-RAIMIS_TEST_ATM-0-1-RND1804_0_0
	No message about the size, just about backing off. Hooray!
	____________ Greetings, Jens

FritzB Send message Joined: 7 Apr 15 Posts: 11 Credit: 2,025,513,600 RAC: 4,810,655 Level Scientific publications	Message 59988 - Posted: 26 Feb 2023 \| 11:56:45 UTC - in response to Message 59986.
	I just aborted the upload (not the workunit) and then it was reported as valid. https://www.gpugrid.net/results.php?hostid=604029

gemini8 Send message Joined: 3 Jul 16 Posts: 31 Credit: 1,266,550,176 RAC: 2,737,000 Level Scientific publications	Message 59989 - Posted: 26 Feb 2023 \| 13:55:33 UTC - in response to Message 59988.
	Quote (FritzB): I just aborted the upload (not the workunit) and then it was reported as valid. https://www.gpugrid.net/results.php?hostid=604029
	Indeed, this worked out for me as well. But is there a result that can be used?
	____________ Greetings, Jens

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59990 - Posted: 26 Feb 2023 \| 14:03:51 UTC - in response to Message 59986.
	Quote (gemini8): For me it's just this:
	So 26 Feb 2023 11:57:00 CET | GPUGRID | Started upload of TL9_55-RAIMIS_TEST_ATM-0-1-RND1804_0_0
	So 26 Feb 2023 11:57:02 CET | GPUGRID | Backing off 04:12:16 on upload of TL9_55-RAIMIS_TEST_ATM-0-1-RND1804_0_0
	So 26 Feb 2023 11:57:19 CET | GPUGRID | Started upload of TL9_55-RAIMIS_TEST_ATM-0-1-RND1804_0_0
	So 26 Feb 2023 11:57:22 CET | GPUGRID | Backing off 05:10:06 on upload of TL9_55-RAIMIS_TEST_ATM-0-1-RND1804_0_0
	No message about the size, just about backing off. Hooray!
	There won’t be any message about why it failed until you enable debugging messages. See the previous link I posted about when this issue happened 1.5 years ago.
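A hedged sketch of what "enable debugging messages" could look like in practice. It builds a `cc_config.xml` fragment with two BOINC client log flags, `file_xfer_debug` and `http_debug`; which flags the linked post actually used is an assumption on my part. The function only returns the XML as a string, so an existing configuration is not overwritten; merge the flags into your own file by hand.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags) -> str:
    """Return a cc_config.xml document that switches on the given log flags."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        # Each flag is an element whose text "1" means "enabled".
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


print(build_cc_config(["file_xfer_debug", "http_debug"]))
```

After editing the file, the client has to re-read its config files (or be restarted) before the extra messages show up in the event log.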

kksplace Send message Joined: 4 Mar 18 Posts: 53 Credit: 1,462,276,749 RAC: 3,437,004 Level Scientific publications	Message 59991 - Posted: 26 Feb 2023 \| 14:12:44 UTC - in response to Message 59988.
	Quote (FritzB): I just aborted the upload (not the workunit) and then it was reported as valid.
	Partially successful for me. I attempted with two of these and one ended up as "Upload failed" while the other "Completed and validated".

fzs600 Send message Joined: 14 Nov 10 Posts: 2 Credit: 439,287,557 RAC: 1,012,095 Level Scientific publications	Message 59992 - Posted: 26 Feb 2023 \| 15:57:47 UTC - in response to Message 59988.
	Quote (FritzB): I just aborted the upload (not the workunit) and then it was reported as valid. https://www.gpugrid.net/results.php?hostid=604029
	Indeed, this worked out for me as well.

mikey Send message Joined: 2 Jan 09 Posts: 292 Credit: 2,238,153,615 RAC: 10,956,134 Level Scientific publications	Message 59993 - Posted: 26 Feb 2023 \| 16:27:45 UTC - in response to Message 59992.
	Quote (FritzB): I just aborted the upload (not the workunit) and then it was reported as valid. https://www.gpugrid.net/results.php?hostid=604029
	Quote (fzs600): Indeed, this worked out for me as well.
	It worked on multiple PCs for me too

Speedy Send message Joined: 19 Aug 07 Posts: 42 Credit: 28,391,082 RAC: 0 Level Scientific publications	Message 60008 - Posted: 4 Mar 2023 \| 3:36:01 UTC - in response to Message 59755.
	Quote (Ian&Steve C.): task ran to completion in about an hour. but hit an error and threw it all away because the file size is too big. upload failure: <file_xfer_error> <file_name>T11_4-RAIMIS_TEST_ATM-0-1-RND7054_2_0</file_name> <error_code>-131 (file size too big)</error_code> </file_xfer_error> what a waste.
	No, it's not a waste in my opinion, because you found something out. You found that "the file size was too big", so it can be corrected and hopefully doesn't happen again. :-)

Erich56 Send message Joined: 1 Jan 15 Posts: 1090 Credit: 6,603,906,926 RAC: 3,160,162 Level Scientific publications	Message 60010 - Posted: 4 Mar 2023 \| 6:31:15 UTC
	this is now also a topic in this thread: https://www.gpugrid.net/forum_thread.php?id=5379 which has been opened by the developer Quico

Magiceye04 Send message Joined: 1 Apr 09 Posts: 24 Credit: 67,905,687 RAC: 0 Level Scientific publications	Message 60178 - Posted: 25 Mar 2023 \| 13:28:56 UTC
	How can I get ATM? The server status tells me there are more than a hundred WUs ready to send at the moment. Boinc Manager tells me:
	Sa 25 Mär 2023 14:20:07 CET | GPUGRID | No tasks are available for ATM: Free energy calculations of protein-ligand binding
	The PC is running Ubuntu 20 LTS with a GeForce 1070 Ti and driver 470.16

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 60179 - Posted: 25 Mar 2023 \| 13:38:05 UTC - in response to Message 60178.
	you need to enable beta/test applications in your project preferences

Magiceye04 Send message Joined: 1 Apr 09 Posts: 24 Credit: 67,905,687 RAC: 0 Level Scientific publications	Message 60180 - Posted: 25 Mar 2023 \| 13:40:43 UTC Last modified: 25 Mar 2023 \| 14:06:37 UTC
	Ah, thanks. I had missed the "test application" setting. Now I have to wait some hours for the download to finish.

ServicEnginIC Send message Joined: 24 Sep 10 Posts: 566 Credit: 6,093,327,024 RAC: 8,533,371 Level Scientific publications	Message 60313 - Posted: 12 Apr 2023 \| 22:11:05 UTC
	So far, I have noticed abnormal progress reporting on ATM tasks. Progress usually jumped from 0% to 0.199% in a short first step, and then directly to 100% in a long second step, staying there until task completion. During this second step, the estimated time remaining was not shown ("---" shown instead).
	Example of wrong progress notification:
	CDK2_29_26_5-QUICO_ATM_OFF_STEPS-2-5-RND5867_0
	Today, I caught two ATM tasks showing linear progress and an accurate estimated time remaining. At this moment, both of them are still in progress.
	Examples of right progress notification:
	Tyk2_jmc_23_jmc_27_2-QUICO_ATM_OFF12_STEPS-0-5-RND0292_2
	Tyk2_jmc_23_ejm_55_5-QUICO_ATM_OFF12_STEPS-0-5-RND1896_3

Keith Myers Send message Joined: 13 Dec 17 Posts: 1288 Credit: 5,091,256,959 RAC: 8,853,655 Level Scientific publications	Message 60314 - Posted: 13 Apr 2023 \| 2:12:35 UTC
	There is still a mix of old, broken progress tasks along with fixed progress tasks in rotation. Just depends on whether you get a new _0 or an older _x wingman task.

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1576 Credit: 5,761,236,851 RAC: 8,640,111 Level Scientific publications	Message 60315 - Posted: 13 Apr 2023 \| 7:04:17 UTC - in response to Message 60314.
	No, it's not the replication number. The clue is in the task name: one with "STEPS-0-5" will show normal progress, one with any other "STEPS-n-5" will jump quickly to 100%. The old, very long running, tasks processed 341 samples all in one go. The new shorter ones have been split into five shorter runs, processing 70 samples each (confusingly numbered 0 to 4 - I've never seen a 'steps-5-5'). Number zero - the first in the chain - processes samples 1 to 70, which is what the progress display expects. The second processes samples 71 to 140 - so it starts beyond the finishing point. And so on.
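The explanation above can be sketched in a few lines. This is only an illustration of the arithmetic, not the application's real progress code, which is not shown anywhere in the thread; the function names are invented and the 70 samples per step come from the post.

```python
SAMPLES_PER_STEP = 70  # from the post: each of the five runs handles 70 samples


def naive_progress(sample_index: int) -> float:
    """Progress as the display seems to compute it: absolute sample over 70."""
    return min(sample_index / SAMPLES_PER_STEP, 1.0)


def offset_progress(sample_index: int, step: int) -> float:
    """Progress measured from the first sample of this step in the chain."""
    first = step * SAMPLES_PER_STEP  # samples already done by earlier steps
    return min(max(sample_index - first, 0) / SAMPLES_PER_STEP, 1.0)


print(naive_progress(35))        # 0.5  step 0, halfway: looks right
print(naive_progress(71))        # 1.0  step 1, first sample: already "100%"
print(offset_progress(105, 1))   # 0.5  step 1, halfway once the offset is removed
```

Subtracting the starting sample of the current step is one way a later step in the chain could report linear progress again.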

ServicEnginIC Send message Joined: 24 Sep 10 Posts: 566 Credit: 6,093,327,024 RAC: 8,533,371 Level Scientific publications	Message 60317 - Posted: 13 Apr 2023 \| 10:36:53 UTC - in response to Message 60315.
	Nice explanation. That fully accounts for the behavior.

Post to thread

Message boards : Number crunching : ATM: Free Energy Calculations new application

	About	Science	Volunteers	Performance	Forum	Join us	Donate

Author	Message
Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59751 - Posted: 18 Jan 2023 \| 19:50:34 UTC Last modified: 18 Jan 2023 \| 20:30:40 UTC
	Just starting the thread for discussion of this new application. ATM = AToM. after a little snafu with the first batch (incorrect config files), the latest batch seems to run on my system. no idea for runtime yet or if it will finish successfully. This is another Python-based application. the package ships with the python environment similar to how the PythonGPU Reinforcement Learning (RL) app does. Test Bench: Xeon E5-2697Av4 (16c/32t) 64GB DDR4-2400 RDIMM (ECC) RTX 3060 12GB Ubuntu 22.04.1 So far observed behavior: -uses ~97% of the GPU core, ~45% GPU memory bus, ~0-1% PCIe bus, close to full power use. -about 400-500MB VRAM used (low, like acemd3) -does not like to be paused and resumed, or BOINC stopped and restarted. it causes the task to fail unknown total runtime expectation since the one task I had failed when I restarted BOINC lol. ____________
	ID: 59751 \| Rating: 0 \| rate: / Reply Quote

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59752 - Posted: 18 Jan 2023 \| 20:36:00 UTC - in response to Message 59751.
	about the restart failure. looks like it fails trying to create a directory that already exists. mkdir: cannot create directory 'atm_tmp': File exists needs some work to allow for that. ____________
	ID: 59752 \| Rating: 0 \| rate: / Reply Quote

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59754 - Posted: 18 Jan 2023 \| 20:55:38 UTC
	another quality of life improvement should be adding a <weight> line to the main task in the job.xml file. right now with 2 tasks in the file, and no weights defined, I'm guessing it splits it 50/50 and it thinks the task is 50% done once the extraction phase is complete. ____________
	ID: 59754 \| Rating: 0 \| rate: / Reply Quote

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59755 - Posted: 18 Jan 2023 \| 21:33:51 UTC Last modified: 18 Jan 2023 \| 21:34:03 UTC
	task ran to completion in about an hour. but hit an error and threw it all away because the file size is too big. upload failure: <file_xfer_error> <file_name>T11_4-RAIMIS_TEST_ATM-0-1-RND7054_2_0</file_name> <error_code>-131 (file size too big)</error_code> </file_xfer_error> what a waste. ____________
	ID: 59755 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1288 Credit: 5,091,256,959 RAC: 8,853,655 Level Scientific publications	Message 59771 - Posted: 19 Jan 2023 \| 18:06:13 UTC
	Have over a dozen of quick-failing ATM tasks. The wrapper does not have a correctly name tar file or something. 02:56:29 (1242346): wrapper: running /bin/tar (xf input.tar.bz2) /bin/tar: This does not look like a tar archive bzip2: (stdin) is not a bzip2 file. /bin/tar: Child returned status 2 /bin/tar: Error is not recoverable: exiting now
	ID: 59771 \| Rating: 0 \| rate: / Reply Quote

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59792 - Posted: 24 Jan 2023 \| 16:01:40 UTC - in response to Message 59755.
	looks like the small batch of tasks that went out today are better setup. ran for about an hour and completed successfully without the file size issue when complete. great :) still would like a little more background info on these tasks, what they are doing, and the goal of the research. ____________
	ID: 59792 \| Rating: 0 \| rate: / Reply Quote

FritzB Send message Joined: 7 Apr 15 Posts: 11 Credit: 2,025,513,600 RAC: 4,810,655 Level Scientific publications	Message 59899 - Posted: 10 Feb 2023 \| 22:25:09 UTC Last modified: 10 Feb 2023 \| 22:25:52 UTC
	This one https://www.gpugrid.net/workunit.php?wuid=27399736 is runnig for about 11 hours and it is stuck at 66,666% for at least 4 hours now. There is almost no load on the GPU. Just a few percent (3-5) once in a while, but constantly some load on the memory controller (10-30). Hope it will finish some day :)
	ID: 59899 \| Rating: 0 \| rate: / Reply Quote

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59936 - Posted: 16 Feb 2023 \| 16:24:10 UTC
	still no official communication from the project about these tasks. the recent batches have been very hit or miss and exhibit much different behavior than my initial post. "TL2" tasks, ran for hours and hours with little to no GPU or CPU use. I aborted them and moved on. "TL3" tasks yesterday, also had little to no GPU or CPU use, but did complete in about 30 mins. "TL4" tasks today seem like a repeat of TL2. no GPU use, runs for hours with no progress. also weights need to be defined in the jobs.xml file so the tasks don't jump to 75% after a few seconds and then sit there for hours doing nothing. ____________
	ID: 59936 \| Rating: 0 \| rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1576 Credit: 5,761,236,851 RAC: 8,640,111 Level Scientific publications	Message 59937 - Posted: 16 Feb 2023 \| 17:01:14 UTC
	Just been sent a TL4 from WU 27405970. I see you've aborted two previous tasks from the same WU, Ian, on two different machines. Did you get any CPU usage figures from previous runs? I think I'll start it up with the GTX 1660 plus one core, but I'll probably abort it myself if it doesn't show much response.
	ID: 59937 \| Rating: 0 \| rate: / Reply Quote

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59938 - Posted: 16 Feb 2023 \| 17:04:18 UTC - in response to Message 59937.
	they spin up multiple processes like the Python tasks do. but i didnt catch them at the very beginning to see if they spike in use or anything like that. once they get going, they basically sit idle as far as the GPU and CPU go. little to no use at all. i just killed them rather than letting them sit there for hours occupying my GPU. ____________
	ID: 59938 \| Rating: 0 \| rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1576 Credit: 5,761,236,851 RAC: 8,640,111 Level Scientific publications	Message 59939 - Posted: 16 Feb 2023 \| 17:30:46 UTC
	OK, I've set 3 CPUs for continuity from the current Python task, and I've put weights of 1-1-1-97 in the job file so I can see what's happening. My normal remote monitoring console shows the current average CPU usage, and I've put nvidia-smi on a five second loop. If either of those drops to zero, I'll abort it. Chocks away!
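To illustrate why the weights matter, here is a minimal Python sketch of weighted progress reporting. It assumes the wrapper reports progress as the summed weight of finished tasks plus the running task's share, divided by the total weight; the function name `fraction_done` and the four-task layout are hypothetical, chosen to match the 1-1-1-97 weights and the 75% jump mentioned in this thread.

```python
def fraction_done(weights, finished, current_fraction=0.0):
    """Weighted progress across the tasks listed in a job file.

    weights          -- one weight per task, in execution order
    finished         -- how many tasks have already completed
    current_fraction -- progress (0..1) of the task now running
    """
    total = sum(weights)
    done = sum(weights[:finished])
    if finished < len(weights):
        done += weights[finished] * current_fraction
    return done / total


# Four tasks, three quick setup steps finished, main task just started.
equal_weights = [1, 1, 1, 1]      # assumed default when no weights are set
tuned_weights = [1, 1, 1, 97]     # the values quoted in the post above

print(f"{fraction_done(equal_weights, 3):.0%}")   # 75%
print(f"{fraction_done(tuned_weights, 3):.0%}")   # 3%
```

With equal weights the display reaches 75% as soon as the three short steps finish; with 1-1-1-97 it shows 3% at the same point, so the remaining 97% tracks the long-running step.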
	ID: 59939 \| Rating: 0 \| rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1576 Credit: 5,761,236,851 RAC: 8,640,111 Level Scientific publications	Message 59941 - Posted: 16 Feb 2023 \| 18:15:43 UTC
	I see what you mean. Nearly half an hour in, CPU usage is showing around 25% of a single core, and GPU usage spiked once, to 41%, after about a quarter of an hour. It's one way of saving electricity, but I'd rather be doing something useful. Aborting.
	ID: 59941 \| Rating: 0 \| rate: / Reply Quote

Aurum Send message Joined: 12 Jul 17 Posts: 399 Credit: 13,024,116,132 RAC: 122,853 Level Scientific publications	Message 59961 - Posted: 22 Feb 2023 \| 17:28:34 UTC
	1.13 ATM running fine for me. Keep aborting them and I'll run them for you.
	ID: 59961 \| Rating: 0 \| rate: / Reply Quote

zombie67 [MM] Send message Joined: 16 Jul 07 Posts: 207 Credit: 1,669,151,456 RAC: 719,040 Level Scientific publications	Message 59964 - Posted: 23 Feb 2023 \| 3:08:08 UTC
	FWIW, the first task I received completed successfully. http://www.gpugrid.net/workunit.php?wuid=27410175 ____________ Reno, NV Team: SETI.USA
	ID: 59964 \| Rating: 0 \| rate: / Reply Quote

FritzB Send message Joined: 7 Apr 15 Posts: 11 Credit: 2,025,513,600 RAC: 4,810,655 Level Scientific publications	Message 59967 - Posted: 23 Feb 2023 \| 8:18:47 UTC - in response to Message 59964. Last modified: 23 Feb 2023 \| 8:19:26 UTC
	I've also finished one: https://www.gpugrid.net/workunit.php?wuid=27410166 We're both using Linux Mint. It seems to crash on Win 10 machines (computer #600532 is mine, too).
	ID: 59967 \| Rating: 0 \| rate: / Reply Quote

zombie67 [MM] Send message Joined: 16 Jul 07 Posts: 207 Credit: 1,669,151,456 RAC: 719,040 Level Scientific publications	Message 59968 - Posted: 23 Feb 2023 \| 15:33:06 UTC
	Over night, I had 4 of these tasks cancelled by server. ____________ Reno, NV Team: SETI.USA
	ID: 59968 \| Rating: 0 \| rate: / Reply Quote

KAMasud Send message Joined: 27 Jul 11 Posts: 137 Credit: 523,901,354 RAC: 3 Level Scientific publications	Message 59969 - Posted: 23 Feb 2023 \| 16:50:20 UTC - in response to Message 59961.
	1.13 ATM running fine for me. Keep aborting them and I'll run them for you. _______________ Same here. I quite enjoy completing these WUs. There should be a way to analyse these WUs as to why it is happening on certain machines. We are mostly running the same hardware and OS. It would be fun to see the results. -
	ID: 59969 \| Rating: 0 \| rate: / Reply Quote

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59970 - Posted: 23 Feb 2023 \| 17:27:24 UTC Last modified: 23 Feb 2023 \| 17:29:36 UTC
	keep in mind these are beta tasks, and the batch being sent NOW is not necessarily the same as the batch sent last week and won't be the same as whatever is sent sometime in the future, until they get all the bugs worked out. the tasks last week basically ran with no perceived use of the GPU or CPU, so what were they doing? who knows. no official word from the project about these tasks at all. I wasn't willing to let the GPU/CPU be occupied for hours on end with the task spinning its wheels when they could be doing something more useful. maybe this current batch has been tweaked from last week and that's why they are working OK, for those that have completed this latest batch, did they have any meaningful use of the GPU or CPU? it also seems this batch was released with a new Windows application (they were Linux only before) for testing. ____________
	ID: 59970 \| Rating: 0 \| rate: / Reply Quote

KAMasud Send message Joined: 27 Jul 11 Posts: 137 Credit: 523,901,354 RAC: 3 Level Scientific publications	Message 59971 - Posted: 24 Feb 2023 \| 5:28:26 UTC - in response to Message 59970.
	keep in mind these are beta tasks, and the batch being sent NOW is not necessarily the same as the batch sent last week and won't be the same as whatever is sent sometime in the future, until they get all the bugs worked out. the tasks last week basically ran with no perceived use of the GPU or CPU, so what were they doing? who knows. no official word from the project about these tasks at all. I wasn't willing to let the GPU/CPU be occupied for hours on end with the task spinning its wheels when they could be doing something more useful. maybe this current batch has been tweaked from last week and that's why they are working OK, for those that have completed this latest batch, did they have any meaningful use of the GPU or CPU? it also seems this batch was released with a new Windows application (they were Linux only before) for testing. _______________________ Well, most of us know that Abouh reads every word written on these threads and without much song and dance, makes changes. He is the Only Admin on all the projects who diligently attends. Maybe, quite possibly. No arguments with your tweaking statement.
	ID: 59971 \| Rating: 0 \| rate: / Reply Quote

Erich56 Send message Joined: 1 Jan 15 Posts: 1090 Credit: 6,603,906,926 RAC: 3,160,162 Level Scientific publications	Message 59972 - Posted: 24 Feb 2023 \| 8:36:54 UTC - in response to Message 59971.
	Well, most of us know that Abouh reads every word written on these threads and without much song and dance, makes changes. He is the Only Admin on all the projects who diligently attends. Maybe, quite possibly. No arguments with your tweaking statement. well, Abouh is the only one from the project team who actively communicates with us volunteers - which is great. All others obviously don't care, and this has been like this over the years, unfortunately. For example: 9 days ago I asked in the ACEMD 4 thread when new ACEMD 4 tasks will be around, or whether this subproject is dead. No reply so far; whereas a reply could be very simple, not longer than just a line :-( You know what I want to say ... it's kind of disappointing at times :-(
	ID: 59972 \| Rating: 0 \| rate: / Reply Quote

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59973 - Posted: 24 Feb 2023 \| 13:27:38 UTC - in response to Message 59971.
	keep in mind these are beta tasks, and the batch being sent NOW is not necessarily the same as the batch sent last week and won't be the same as whatever is sent sometime in the future, until they get all the bugs worked out. the tasks last week basically ran with no perceived use of the GPU or CPU, so what were they doing? who knows. no official word from the project about these tasks at all. I wasn't willing to let the GPU/CPU be occupied for hours on end with the task spinning its wheels when they could be doing something more useful. maybe this current batch has been tweaked from last week and that's why they are working OK, for those that have completed this latest batch, did they have any meaningful use of the GPU or CPU? it also seems this batch was released with a new Windows application (they were Linux only before) for testing. _______________________ Well, most of us know that Abouh reads every word written on these threads and without much song and dance, makes changes. He is the Only Admin on all the projects who diligently attends. Maybe, quite possibly. No arguments with your tweaking statement. that's great and all, but abouh is not the researcher working with this application. Abouh deals with the research with the Python RL tasks. These ATM tasks look to be run by Raimis. (the researcher names are in the filenames of the WUs) ____________
	ID: 59973 \| Rating: 0 \| rate: / Reply Quote

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59974 - Posted: 24 Feb 2023 \| 13:30:55 UTC
	https://gpugrid.net/result.php?resultid=33321222 ran for 10+hours, failed due to file size limit after an otherwise successful computation. :( ____________
	ID: 59974 \| Rating: 0 \| rate: / Reply Quote

Erich56 Send message Joined: 1 Jan 15 Posts: 1090 Credit: 6,603,906,926 RAC: 3,160,162 Level Scientific publications	Message 59975 - Posted: 24 Feb 2023 \| 14:39:26 UTC - in response to Message 59974.
	... failed due to file size limit :( I am just trying to remember with which other application we've had the same problem some time ago - last year or 2 years ago ???
	ID: 59975 \| Rating: 0 \| rate: / Reply Quote

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59976 - Posted: 24 Feb 2023 \| 14:51:34 UTC - in response to Message 59975.
	... failed due to file size limit :( I am just trying to remember with which other application we've had the same problem some time ago - last year or 2 years ago ??? it's happened a few times in the past with acemd3 tasks. see here from July 2021: https://www.gpugrid.net/forum_thread.php?id=5239#57117 ____________
	ID: 59976 \| Rating: 0 \| rate: / Reply Quote

Aurum Send message Joined: 12 Jul 17 Posts: 399 Credit: 13,024,116,132 RAC: 122,853 Level Scientific publications	Message 59977 - Posted: 24 Feb 2023 \| 18:19:21 UTC
	Yea, I got my first ATM checkpoint :-) Now my list of ATM ULs are stuck.
	ID: 59977 \| Rating: 0 \| rate: / Reply Quote

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59978 - Posted: 24 Feb 2023 \| 18:43:59 UTC - in response to Message 59977.
	Yea, I got my first ATM checkpoint :-) Now my list of ATM ULs are stuck. the uploads are nearly 700MB in size, and likely the same problem from my link that we saw over a year ago. their server can't accept something that big, I don't think they ever figured out how to adjust the settings of their file server and just tried to keep the file sizes below the limit, which they seem to have forgotten about. nothing you do will get them to upload. I've disabled ATM until they get it together with them. ____________
	ID: 59978 \| Rating: 0 \| rate: / Reply Quote

ServicEnginIC Send message Joined: 24 Sep 10 Posts: 566 Credit: 6,093,327,024 RAC: 8,533,371 Level Scientific publications	Message 59979 - Posted: 24 Feb 2023 \| 21:20:56 UTC - in response to Message 59978.
	On past chance, I bet and lost. Currently, I'm only processing ACEMD tasks, when available. I happened to catch one this morning.
	ID: 59979 \| Rating: 0 \| rate: / Reply Quote

Aurum Send message Joined: 12 Jul 17 Posts: 399 Credit: 13,024,116,132 RAC: 122,853 Level Scientific publications	Message 59980 - Posted: 24 Feb 2023 \| 23:20:56 UTC
	GDF, Should I Abort these 12 completed ATM WUs that won't upload or is there a reasonable chance you'll fix it?
	ID: 59980 \| Rating: 0 \| rate: / Reply Quote

zombie67 [MM] Send message Joined: 16 Jul 07 Posts: 207 Credit: 1,669,151,456 RAC: 719,040 Level Scientific publications	Message 59981 - Posted: 25 Feb 2023 \| 1:18:39 UTC
	Well, I just achieved my 100 hours, which was my 1st priority. I will abort and reset (if necessary) the completed tasks I have. If/when the project gets its act together, I'll be back. ____________ Reno, NV Team: SETI.USA
	ID: 59981 \| Rating: 0 \| rate: / Reply Quote

gemini8 Send message Joined: 3 Jul 16 Posts: 31 Credit: 1,266,550,176 RAC: 2,737,000 Level Scientific publications	Message 59986 - Posted: 26 Feb 2023 \| 11:04:13 UTC
	For me it's just this: So 26 Feb 2023 11:57:00 CET \| GPUGRID \| Started upload of TL9_55-RAIMIS_TEST_ATM-0-1-RND1804_0_0 So 26 Feb 2023 11:57:02 CET \| GPUGRID \| Backing off 04:12:16 on upload of TL9_55-RAIMIS_TEST_ATM-0-1-RND1804_0_0 So 26 Feb 2023 11:57:19 CET \| GPUGRID \| Started upload of TL9_55-RAIMIS_TEST_ATM-0-1-RND1804_0_0 So 26 Feb 2023 11:57:22 CET \| GPUGRID \| Backing off 05:10:06 on upload of TL9_55-RAIMIS_TEST_ATM-0-1-RND1804_0_0 No message about the size, just about backing off. Hooray! ____________ Greetings, Jens
	ID: 59986 \| Rating: 0 \| rate: / Reply Quote

FritzB Send message Joined: 7 Apr 15 Posts: 11 Credit: 2,025,513,600 RAC: 4,810,655 Level Scientific publications	Message 59988 - Posted: 26 Feb 2023 \| 11:56:45 UTC - in response to Message 59986.
	I just aborted the upload (not the workunit) and then it was reported as valid. https://www.gpugrid.net/results.php?hostid=604029
	ID: 59988 \| Rating: 0 \| rate: / Reply Quote

gemini8 Send message Joined: 3 Jul 16 Posts: 31 Credit: 1,266,550,176 RAC: 2,737,000 Level Scientific publications	Message 59989 - Posted: 26 Feb 2023 \| 13:55:33 UTC - in response to Message 59988.
	I just aborted the upload (not the workunit) and then it was reported as valid. https://www.gpugrid.net/results.php?hostid=604029 Indeed, this worked out for me as well. But is there a result that can be used? ____________ Greetings, Jens
	ID: 59989 \| Rating: 0 \| rate: / Reply Quote

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 59990 - Posted: 26 Feb 2023 \| 14:03:51 UTC - in response to Message 59986.
	For me it's just this: So 26 Feb 2023 11:57:00 CET \| GPUGRID \| Started upload of TL9_55-RAIMIS_TEST_ATM-0-1-RND1804_0_0 So 26 Feb 2023 11:57:02 CET \| GPUGRID \| Backing off 04:12:16 on upload of TL9_55-RAIMIS_TEST_ATM-0-1-RND1804_0_0 So 26 Feb 2023 11:57:19 CET \| GPUGRID \| Started upload of TL9_55-RAIMIS_TEST_ATM-0-1-RND1804_0_0 So 26 Feb 2023 11:57:22 CET \| GPUGRID \| Backing off 05:10:06 on upload of TL9_55-RAIMIS_TEST_ATM-0-1-RND1804_0_0 No message about the size, just about backing off. Hooray! There won’t be any message about why it failed until you enable debugging messages. See the previous link I posted about when this issue happened 1.5 years ago. ____________
	ID: 59990 \| Rating: 0 \| rate: / Reply Quote

kksplace Send message Joined: 4 Mar 18 Posts: 53 Credit: 1,462,276,749 RAC: 3,437,004 Level Scientific publications	Message 59991 - Posted: 26 Feb 2023 \| 14:12:44 UTC - in response to Message 59988.
	I just aborted the upload (not the workunit) and then it was reported as valid. Partially successful for me. I attempted with two of these and one ended up as "Upload failed" while the other "Completed and validated".
	ID: 59991 \| Rating: 0 \| rate: / Reply Quote

fzs600 Send message Joined: 14 Nov 10 Posts: 2 Credit: 439,287,557 RAC: 1,012,095 Level Scientific publications	Message 59992 - Posted: 26 Feb 2023 \| 15:57:47 UTC - in response to Message 59988.
	I just aborted the upload (not the workunit) and then it was reported as valid. https://www.gpugrid.net/results.php?hostid=604029 Indeed, this worked out for me as well.
	ID: 59992 \| Rating: 0 \| rate: / Reply Quote

mikey Send message Joined: 2 Jan 09 Posts: 292 Credit: 2,238,153,615 RAC: 10,956,134 Level Scientific publications	Message 59993 - Posted: 26 Feb 2023 \| 16:27:45 UTC - in response to Message 59992.
	I just aborted the upload (not the workunit) and then it was reported as valid. https://www.gpugrid.net/results.php?hostid=604029 Indeed, this worked out for me as well. It worked on multiple PCs for me too
	ID: 59993 \| Rating: 0 \| rate: / Reply Quote

Speedy Send message Joined: 19 Aug 07 Posts: 42 Credit: 28,391,082 RAC: 0 Level Scientific publications	Message 60008 - Posted: 4 Mar 2023 \| 3:36:01 UTC - in response to Message 59755.
	task ran to completion in about an hour. but hit an error and threw it all away because the file size is too big. upload failure: <file_xfer_error> <file_name>T11_4-RAIMIS_TEST_ATM-0-1-RND7054_2_0</file_name> <error_code>-131 (file size too big)</error_code> </file_xfer_error> what a waste. No it's not a waste in my opinion because you found something out. You found that "the file size was too big" so it can be corrected so it doesn't happen again hopefully. :-)
	ID: 60008 \| Rating: 0 \| rate: / Reply Quote

Erich56 Send message Joined: 1 Jan 15 Posts: 1090 Credit: 6,603,906,926 RAC: 3,160,162 Level Scientific publications	Message 60010 - Posted: 4 Mar 2023 \| 6:31:15 UTC
	this now is a topic also on this thread: https://www.gpugrid.net/forum_thread.php?id=5379 which has been opened by the developer Quico
	ID: 60010 \| Rating: 0 \| rate: / Reply Quote

Magiceye04 Send message Joined: 1 Apr 09 Posts: 24 Credit: 67,905,687 RAC: 0 Level Scientific publications	Message 60178 - Posted: 25 Mar 2023 \| 13:28:56 UTC
	How can I get ATM? Server status tells me there are more than a hundred WUs ready to send at the moment. Boinc Manager tells me: Sa 25 Mär 2023 14:20:07 CET \| GPUGRID \| No tasks are available for ATM: Free energy calculations of protein-ligand binding The PC is running with Ubuntu 20LTS, Geforce1070ti and driver 470.16
	ID: 60178 \| Rating: 0 \| rate: / Reply Quote

Ian&Steve C. Send message Joined: 21 Feb 20 Posts: 1035 Credit: 36,941,282,483 RAC: 47,536,110 Level Scientific publications	Message 60179 - Posted: 25 Mar 2023 \| 13:38:05 UTC - in response to Message 60178.
	you need to enable beta/test applications in your project preferences ____________
	ID: 60179 \| Rating: 0 \| rate: / Reply Quote

Magiceye04 Send message Joined: 1 Apr 09 Posts: 24 Credit: 67,905,687 RAC: 0 Level Scientific publications	Message 60180 - Posted: 25 Mar 2023 \| 13:40:43 UTC Last modified: 25 Mar 2023 \| 14:06:37 UTC
	Ah, thanks. I had missed the "test application" setting. Now I have to wait some hours for the download to finish.
	ID: 60180 \| Rating: 0 \| rate: / Reply Quote

ServicEnginIC Send message Joined: 24 Sep 10 Posts: 566 Credit: 6,093,327,024 RAC: 8,533,371 Level Scientific publications	Message 60313 - Posted: 12 Apr 2023 \| 22:11:05 UTC
	So far, I have noticed abnormal progress reporting on ATM tasks. Progress usually jumped from 0% to 0.199% in a short first step, and then directly to 100% in a second long step, staying there until task completion. During this second step, estimated time remaining was not shown ("---" shown instead). Example of wrong progress notification: CDK2_29_26_5-QUICO_ATM_OFF_STEPS-2-5-RND5867_0 Today, I caught two ATM tasks showing a linear progression and accurate estimated time remaining. At this moment, both of them are still in progress. Examples of right progress notification: Tyk2_jmc_23_jmc_27_2-QUICO_ATM_OFF12_STEPS-0-5-RND0292_2 Tyk2_jmc_23_ejm_55_5-QUICO_ATM_OFF12_STEPS-0-5-RND1896_3
	ID: 60313 \| Rating: 0 \| rate: / Reply Quote

Keith Myers Send message Joined: 13 Dec 17 Posts: 1288 Credit: 5,091,256,959 RAC: 8,853,655 Level Scientific publications	Message 60314 - Posted: 13 Apr 2023 \| 2:12:35 UTC
	There is still a mix of old, broken progress tasks along with fixed progress tasks in rotation. Just depends on whether you get a new _0 or an older _x wingman task.
	ID: 60314 \| Rating: 0 \| rate: / Reply Quote

Richard Haselgrove Send message Joined: 11 Jul 09 Posts: 1576 Credit: 5,761,236,851 RAC: 8,640,111 Level Scientific publications	Message 60315 - Posted: 13 Apr 2023 \| 7:04:17 UTC - in response to Message 60314.
	No, it's not the replication number. The clue is in the task name: one with "STEPS-0-5" will show normal progress, one with any other "STEPS-n-5" will jump quickly to 100%. The old, very long running, tasks processed 341 samples all in one go. The new shorter ones have been split into five shorter runs, processing 70 samples each (confusingly numbered 0 to 4 - I've never seen a 'steps-5-5'). Number zero - the first in the chain - processes samples 1 to 70, which is what the progress display expects. The second processes samples 71 to 140 - so it starts beyond the finishing point. And so on.
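The effect described above can be sketched in a few lines of Python. This is not the application's actual code: it assumes the progress display divides the absolute sample number by the per-run sample count (70, as quoted above), and the function names and the offset-based alternative are hypothetical.

```python
SAMPLES_PER_RUN = 70  # per-run figure quoted in the post above


def naive_progress(sample, per_run=SAMPLES_PER_RUN):
    """Progress as assumed here: absolute sample number over per-run count."""
    return min(sample / per_run, 1.0)


def offset_progress(sample, run_index, per_run=SAMPLES_PER_RUN):
    """Hypothetical alternative: subtract the run's starting sample first."""
    start = run_index * per_run
    return min(max((sample - start) / per_run, 0.0), 1.0)


# Run 0 ("STEPS-0-5") covers samples 1..70, so the naive ratio behaves.
print(naive_progress(35))        # 0.5

# Run 1 ("STEPS-1-5") starts at sample 71, already past the end point.
print(naive_progress(71))        # 1.0

# Subtracting the run's offset restores a linear display.
print(offset_progress(105, 1))   # 0.5
```

Under that assumption, any run other than the first begins beyond 100%, which matches the jump reported for the "STEPS-n-5" tasks.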
	ID: 60315 \| Rating: 0 \| rate: / Reply Quote

ServicEnginIC Send message Joined: 24 Sep 10 Posts: 566 Credit: 6,093,327,024 RAC: 8,533,371 Level Scientific publications	Message 60317 - Posted: 13 Apr 2023 \| 10:36:53 UTC - in response to Message 60315.
	Nice explanation. That fully explains the behavior.
	ID: 60317 \| Rating: 0 \| rate: / Reply Quote