Message boards : Multicore CPUs : Simultaneously starting MCs
Author | Message |
---|---|
Problem with simultaneously starting Multicore CPU tasks has not been fixed yet! | |
ID: 49378 | Rating: 0 | rate:
"Problem with simultaneously starting Multicore CPU tasks has not been fixed yet!" I have heard about this error, but maybe I don't understand the symptoms. I have had three 4-thread WUs start at once; sometimes they work, sometimes they don't. Could this be the cause of my errors? Linked below are the tasks from the system: http://www.gpugrid.net/results.php?hostid=424454 | |
ID: 49382 | Rating: 0 | rate:
If two WUs start at the same time, in most cases one of them fails right at the start and throws a calculation error. This is very inconvenient if you often have to start more than one WU at the same time (in my case up to 20 WUs on my 80-thread machine). I would like to hear a statement from the developer(s), or simply get a bugfix in the WUs, because at the moment I do not see any suitable workaround. I don't think it's just your fault or mine, because several users have been reporting this issue for a while now. | |
ID: 49383 | Rating: 0 | rate:
If it is still failing, please provide a task number for me to check. | |
ID: 49614 | Rating: 0 | rate:
Toni, | |
ID: 49615 | Rating: 0 | rate:
@captainjack: please try two things for me: 1. Run the flock command and see if it gives an error (command not found) or a longer message. 2. Reset the project. Thanks | |
ID: 49616 | Rating: 0 | rate:
Toni, | |
ID: 49617 | Rating: 0 | rate:
:( | |
ID: 49618 | Rating: 0 | rate:
Statistically though the new app seems to have worked on other hosts. We went from 900 WU to 1500 WU in progress. | |
ID: 49620 | Rating: 0 | rate:
It is just the same with all my computers... Toni, would it help if I sent you, by private message, the remote-access credentials for one of my Linux machines, so you can test it yourself? | |
ID: 49621 | Rating: 0 | rate:
Thomas, that would help, but perhaps let me ask another thing first: | |
ID: 49622 | Rating: 0 | rate:
Sometimes they work and sometimes they don't. If two or more WUs start at the same time, they all throw calculation errors. That was the state until now. But with the new version you released today, they no longer seem to work at all. My operating system is Linux Mint 18.3 64-bit with a current Linux kernel. I installed boinc with "sudo apt-get install boinc". gcc-5 and g++-5 are installed, as is python-support. | |
ID: 49623 | Rating: 0 | rate:
Ok thanks. This will need a while to debug. As you can see there is no error info to help (as usual in boinc...). | |
ID: 49624 | Rating: 0 | rate:
All tasks fail with errors on my VM (with 4 virtual cores). <message> I have no app_config. Addendum: I also tried stopping all WUs and starting them manually one by one. Same error. | |
ID: 49625 | Rating: 0 | rate:
Version 3.20 is out | |
ID: 49626 | Rating: 0 | rate:
Also, you need the libc6-dev package: "sudo apt install libc6-dev" | |
ID: 49627 | Rating: 0 | rate:
All tasks exit with error: 14:19:18 (8806): wrapper: running /bin/bash (-c "flock /var/lib/boinc-client/projects/www.gpugrid.net/miniconda.lock ./miniconda-installer -b -u -p /var/lib/boinc-client/projects/www.gpugrid.net/miniconda") Please run using "bash" or "sh", but not "." or "source"\n14:19:19 (8806): /bin/bash exited; CPU time 0.001596 14:19:19 (8806): app exit status: 0x1 | |
ID: 49629 | Rating: 0 | rate:
These were 3.19. See with 3.20 | |
ID: 49630 | Rating: 0 | rate:
When I tried to start two at the same time, one of them runs okay and the other one aborts. | |
ID: 49631 | Rating: 0 | rate:
I already had libc6-dev installed... Earlier I got a bunch of new WUs, but all of them failed after several minutes of calculation (about 5 to 15 minutes). | |
ID: 49632 | Rating: 0 | rate:
Started two QC WUs simultaneously using app version 3.20 and one failed with this stderr report: | |
ID: 49634 | Rating: 0 | rate:
Yes, now simultaneous starts crash with "text file busy". Will look for yet another workaround. | |
ID: 49636 | Rating: 0 | rate:
Attempting a fix in version 3.21 | |
ID: 49640 | Rating: 0 | rate:
Just got a couple of errors from last night's WUs on version 3.20. | |
ID: 49641 | Rating: 0 | rate:
^^ These seem connection errors (network down or so) | |
ID: 49642 | Rating: 0 | rate:
"^^ These seem connection errors (network down or so)" Can the network affect computation after the WU has already been downloaded? | |
ID: 49643 | Rating: 0 | rate:
I have exactly the same problems, so I don't think this is connection-related. | |
ID: 49644 | Rating: 0 | rate:
Version 321 looks promising. Just finished two that started at the same time and they finished normally. Just started three at the same time and they are all processing as they should. | |
ID: 49645 | Rating: 0 | rate:
Success with two QC WUs started simultaneously using app 3.21. Both WUs are happily crunching, at 1.098% completed so far. | |
ID: 49646 | Rating: 0 | rate:
"^^ These seem connection errors (network down or so)" Yes. WUs check the latest version of conda packages/libraries right after start (from conda cloud). | |
ID: 49647 | Rating: 0 | rate:
"^^ These seem connection errors (network down or so)" Dang, is that only after start? If tasks were started, paused, another started, etc., could networking be disabled after they start? Or does the check happen at each start/resume? Not me, but I've heard of some people who set up schedules to allow downloads only at certain times of the day because of varying bandwidth costs. | |
ID: 49652 | Rating: 0 | rate:
Today I have had three successful simultaneous starts and returns without error on two different FX8350 machines. As far as I am concerned, this bug has been resolved at least for the Fedora distro. | |
ID: 49654 | Rating: 0 | rate:
Haven't gotten a single error with version 3.21 and it's been running all day. What did you change? | |
ID: 49655 | Rating: 0 | rate:
The main change was locking the miniconda directory upon initial installation/update. This in turn required some workarounds. May not be perfect but should be much better. | |
ID: 49656 | Rating: 0 | rate:
No more errors, but some strange behaviour. | |
ID: 49657 | Rating: 0 | rate:
Didn't get any response from my post in the cpu tasks thread. | |
ID: 49661 | Rating: 0 | rate:
Did you check: use CPU as well? You might not have allowed it. | |
ID: 49662 | Rating: 0 | rate:
Yes, CPU was checked for use in both places, and so was the QC app. Didn't get any QC tasks either time I tried. Just GPU tasks. The scheduler request was for both CPU and GPU work. I know there are plenty of CPU tasks to farm out. Couldn't explain why I didn't get any work. I'll have to try again without GPU work checked, I guess. | |
ID: 49663 | Rating: 0 | rate:
"Yes, CPU was checked for use in both places, and so was the QC app. Didn't get any QC tasks either time I tried. Just GPU tasks." That is your problem. The BOINC scheduler can get all mixed up when you select both CPU and GPU work on the same project, and you are eventually left high and dry on one or the other. There are several discussions of it at Einstein, which has the same problem since they do both CPU and GPU work. Here is one recent discussion, where the moderator explains why the requester is not getting GPU work. https://einsteinathome.org/content/not-getting-gpu-wus-anymore#comment-165295 I use separate machines for the CPU work and the GPU work on GPUGrid. | |
ID: 49664 | Rating: 0 | rate:
Thanks for the reply. I don't know. It worked a couple of months ago when the QC app and tasks first showed up. I was crunching both gpu and cpu at the same time. I know that shutting off a gpu request will probably work just to get some of the new QC tasks along with the latest 3.21 app. | |
ID: 49665 | Rating: 0 | rate:
I was just wondering how well the app works now with concurrent starts. Let's put it this way. I was running three 3.21 QC on my i7-8700 and rebooted. They all resumed normally without error. So it is solved enough. | |
ID: 49666 | Rating: 0 | rate:
My threadripper this night crunched 50 WU's, no issues. Good job :) | |
ID: 49667 | Rating: 0 | rate:
"Yes, CPU was checked for use in both places, and so was the QC app. Didn't get any QC tasks either time I tried. Just GPU tasks. The scheduler request was for both CPU and GPU work. I know there are plenty of CPU tasks to farm out. Couldn't explain why I didn't get any work. I'll have to try again without GPU work checked, I guess." At risk of stating trivialities: did you check in the log whether by chance it's a matter of disk space (either allocated to boinc, or actually free)? QC tasks are unusually demanding on disk space. Anyway: thanks for trying to make it work :) | |
ID: 49668 | Rating: 0 | rate:
Not an important thing, but are you planning to implement separate badges for CPU points? | |
ID: 49669 | Rating: 0 | rate:
Hello, I just got a new error with v3.21. Does anyone have any idea what could be causing it? | |
ID: 49673 | Rating: 0 | rate:
"Yes, CPU was checked for use in both places, and so was the QC app. Didn't get any QC tasks either time I tried. Just GPU tasks. The scheduler request was for both CPU and GPU work. I know there are plenty of CPU tasks to farm out. Couldn't explain why I didn't get any work. I'll have to try again without GPU work checked, I guess." Well, the disk space allotted to BOINC is 10GB, and I have about 8GB free for BOINC/project use. That wasn't the issue. It was probably the request for both GPU and CPU work at the same time. My repeated requests for CPU work, on top of my normal GPU work, just loaded me up with GPU work. I'm waiting until I clear out the GPU work so I can request CPU work only. Will see if that makes the scheduler send me CPU work. | |
ID: 49675 | Rating: 0 | rate:
Hello, I am having issues with a new Ubuntu 18.04 installation. I have installed the necessary packages but I am still getting errors. Can you let me know what the issue is? Errors linked below | |
ID: 49699 | Rating: 0 | rate:
Lots of things don't work in 18.04 the way they did in 16.04. I figure the changes in GTK and Python are the root cause of why compute doesn't work the same. | |
ID: 49700 | Rating: 0 | rate:
"Hello, I am having issues with a new Ubuntu 18.04 installation. I have installed the necessary packages but I am still getting errors. Can you let me know what the issue is? Errors linked below" Have you carried over the boinc dir from a previous installation? In any case, try resetting the project. | |
ID: 49701 | Rating: 0 | rate:
"Hello, I am having issues with a new Ubuntu 18.04 installation. I have installed the necessary packages but I am still getting errors. Can you let me know what the issue is? Errors linked below" Unfortunately, I tried resetting the project with no luck. This is a brand new installation. gcc and libc6-dev were reported as already being the most recent version after I ran "sudo apt update". Do I need to run "sudo apt upgrade" after "sudo apt update"? | |
ID: 49702 | Rating: 0 | rate:
Can you try installing python-support (if not already installed)? | |
ID: 49703 | Rating: 0 | rate:
CPU tasks do not fail on my HP laptop with SuSE Leap 42.3 which is constantly updated by SuSE. They fail instead on my SUN workstation, also with SuSE Linux 42.3 which is not updated by SuSE, I don't know why. On the other hand, GPU tasks run on the GTX 750 Ti board on the SUN, giving me huge credits. | |
ID: 49705 | Rating: 0 | rate:
"Can you try installing python-support (if not already installed)?" I tried installing python-support and was able to get different errors! Here is the link again to the error page: http://www.gpugrid.net/results.php?hostid=480159&offset=0&show_names=0&state=5&appid= The abandoned tasks are due to completely detaching and then re-adding the project, which didn't help. | |
ID: 49706 | Rating: 0 | rate:
From what I can tell the error is the same, the segmentation fault in pthread. Try to play around with your gcc installation, e.g. see if you have the latest one, g++, and so on. Ubuntu 18.04 is a widespread distro so it's surprising it doesn't work. | |
ID: 49710 | Rating: 0 | rate:
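
The fix described in this thread for version 3.21 (locking the miniconda directory during the initial installation or update, which also appears in the posted wrapper log as a flock call on miniconda.lock) can be illustrated with a minimal Python sketch. This is not the project's actual code: `exclusive_lock`, `install_once`, the "installed" marker file and the temporary directory are hypothetical stand-ins, threads stand in for simultaneously starting tasks, and the sketch assumes a Linux host where flock locks taken through separate opens of the same file exclude each other.

```python
import fcntl
import os
import tempfile
import threading
import time
from contextlib import contextmanager


@contextmanager
def exclusive_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path while the with-block runs."""
    # Each call opens the file separately, so each caller is a distinct lock owner.
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no one else holds the lock
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def install_once(install_dir, lock_path, log):
    """Hypothetical installer step: only one task at a time may enter it."""
    with exclusive_lock(lock_path):
        marker = os.path.join(install_dir, "installed")
        if os.path.exists(marker):
            log.append("skip")  # another task already finished the installation
            return
        time.sleep(0.1)  # simulate a slow installation
        with open(marker, "w") as handle:
            handle.write("done\n")
        log.append("install")


def run_simultaneous_starts(n_tasks=3):
    """Start n_tasks 'tasks' at once and report what each one did."""
    with tempfile.TemporaryDirectory() as install_dir:
        lock_path = os.path.join(install_dir, "miniconda.lock")
        log = []
        threads = [
            threading.Thread(target=install_once, args=(install_dir, lock_path, log))
            for _ in range(n_tasks)
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
        return log


if __name__ == "__main__":
    print(sorted(run_simultaneous_starts(3)))
```

Without the lock, all three tasks could see the marker missing and run the installation at the same time, which is the kind of collision reported for simultaneous starts. With the lock, the first task installs and the others wait, then find the work already done.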
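
Several checks suggested in this thread (whether the flock command exists, whether gcc and g++ are installed, and whether enough disk space is free) can be gathered into a small pre-flight script. The following Python sketch is only an illustration: the function names are made up for this example, and the `min_free_gib` threshold is an assumption, since the thread does not say how much disk space QC tasks actually need.

```python
import shutil


def find_missing_tools(tools=("flock", "gcc", "g++")):
    """Return the names from tools that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def free_gib(path="."):
    """Return the free disk space at path, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def preflight(path=".", min_free_gib=5.0, tools=("flock", "gcc", "g++")):
    """Collect warnings; an empty list means none of these checks failed."""
    warnings = [f"'{name}' not found on PATH" for name in find_missing_tools(tools)]
    available = free_gib(path)
    # min_free_gib is an assumed threshold, not a documented requirement.
    if available < min_free_gib:
        warnings.append(
            f"only {available:.1f} GiB free at {path}, below {min_free_gib} GiB"
        )
    return warnings


if __name__ == "__main__":
    for warning in preflight():
        print("WARNING:", warning)
```

To check the BOINC data directory rather than the current directory, pass its path as `path`, for example the /var/lib/boinc-client directory that appears in the wrapper log posted in this thread. Note that this only reports free space on the filesystem; the disk limit configured inside BOINC itself is a separate setting that this sketch does not read.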