Message boards : Number crunching : system crashing after missing "vcruntime140_1.dll" installed

Profile JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,331,546,800
RAC: 0
Message 57458 - Posted: 4 Oct 2021 | 16:42:11 UTC
Last modified: 4 Oct 2021 | 17:17:45 UTC

I added this project to an old but working (Einstein) system with a GTX 1070. After an hour or two I got the message about the missing vcruntime140_1.dll file. TechPowerUp has an "all in one" install, so I put that in, suspended gpugrid and rebooted the system. Boinc does not run on startup, so I had to start Boinc and then resume gpugrid. After the resume the system rebooted, i.e. it crashed.

I had been hoping to save the data file as I rarely get any gpugrid work units.

Anytime I start boinc the system now crashes.

I went to ProgramData\boinc\slots\0 looking for restart.chk to delete it, but it was not there. I did find other files, such as stderr.txt. Here is a fragment of it:

02:51:01 (6980): wrapper (7.9.26016): starting
02:51:01 (6980): wrapper: running bin/acemd3.exe (--boinc --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {322862} normal block at 0x000001FAC4196C70, 8 bytes long.
Data: < > 00 00 12 C4 FA 01 00 00


However, I suspect a memory leak is not the same as a missing dll.
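To see what is left in a stderr.txt like the fragment above once the leak dump is set aside, a small filter can help. This is only a rough sketch: the function name `strip_leak_dump` and the pattern list are made up for illustration, and the patterns are based solely on the lines quoted in this post, so a real dump may contain lines they do not cover.

```python
import re

# Patterns derived only from the stderr.txt fragment quoted above (assumption:
# a real leak dump may include other line shapes that are not matched here).
LEAK_PATTERNS = [
    re.compile(r"^Detected memory leaks!"),
    re.compile(r"^Dumping objects ->"),
    re.compile(r"normal block at 0x[0-9A-Fa-f]+"),
    re.compile(r"^\s*Data: <"),
]


def strip_leak_dump(text):
    """Return the non-blank lines of `text` that do not look like leak-dump output."""
    kept = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if any(p.search(line) for p in LEAK_PATTERNS):
            continue  # skip lines that match a leak-dump pattern
        kept.append(line)
    return kept


# Raw string so the backslashes in the Windows path are kept literally.
SAMPLE = r"""02:51:01 (6980): wrapper (7.9.26016): starting
02:51:01 (6980): wrapper: running bin/acemd3.exe (--boinc --device 0)
Detected memory leaks!
Dumping objects ->
..\api\boinc_api.cpp(309) : {322862} normal block at 0x000001FAC4196C70, 8 bytes long.
Data: < > 00 00 12 C4 FA 01 00 00
"""

if __name__ == "__main__":
    for line in strip_leak_dump(SAMPLE):
        print(line)
```

Run against the fragment, only the two wrapper lines remain, so nothing in it points at the DLL or at the cause of the crash.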

I remember seeing a similar crash problem before, and I had to boot in safe mode and delete all gpugrid files.

Any suggestions?

[edit] Also it would be helpful if the check for the missing DLL had been done when gpugrid started, not after a couple of hours.
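For anyone who wants to check for the DLL by hand before starting a work unit, something along these lines might do. It is a simplified sketch and not how gpugrid or Boinc checks anything: `find_dll` and `default_search_dirs` are hypothetical helpers, and the search only looks in System32 and the PATH directories, which is much less than the real Windows DLL search order.

```python
import os
from pathlib import Path


def find_dll(name, search_dirs):
    """Return the first path where `name` exists as a file in `search_dirs`, else None."""
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


def default_search_dirs():
    """System32 plus PATH entries (assumption: a simplification of the real search order)."""
    dirs = []
    system_root = os.environ.get("SystemRoot")
    if system_root:
        dirs.append(Path(system_root) / "System32")
    for entry in os.environ.get("PATH", "").split(os.pathsep):
        if entry:
            dirs.append(Path(entry))
    return dirs


if __name__ == "__main__":
    dll = "vcruntime140_1.dll"
    found = find_dll(dll, default_search_dirs())
    if found:
        print(f"{dll} found at {found}")
    else:
        print(f"{dll} NOT found in System32 or PATH")
```

A "not found" result here is only a hint, since the application may ship or locate the DLL some other way.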

[edit-2] I deleted all files in slot 0 but the system still rebooted when boinc started. Gpugrid is the only project enabled. Slot 0 was filled up with stuff again, but the stderr file only had 2 lines in it before the system crashed.
Going to poke around in boinc\projects\www.gpugrid.net and see if I can restart the work unit w/o crashing.

[edit-3] I deleted all "data" files that started with e5 as they seemed to have come from the big "zip" file. That caused the work unit to report an error and the error was uploaded to gpugrid. So I lost the WU.

Things are back to normal: Boinc starts fine, Einstein and WCG are running, and possibly I may get another gpugrid work unit. Keeping my fingers crossed.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1035
Credit: 36,941,282,483
RAC: 47,536,110
Message 57460 - Posted: 4 Oct 2021 | 17:28:24 UTC - in response to Message 57458.

The memory leaks message has been present in the Windows apps for a long time. It's benign and not an indication of any problem.

Profile JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,331,546,800
RAC: 0
Message 57461 - Posted: 4 Oct 2021 | 17:34:53 UTC - in response to Message 57460.

"The memory leaks message has been present in the Windows apps for a long time. It's benign and not an indication of any problem."



I guess it is like that "printer out of paper" error message I would see occasionally over at Einstein on some failed workunits.

Ian&Steve C.
Joined: 21 Feb 20
Posts: 1035
Credit: 36,941,282,483
RAC: 47,536,110
Message 57463 - Posted: 4 Oct 2021 | 18:01:39 UTC - in response to Message 57461.

The message is present in Valid workunits too.

Example: https://www.gpugrid.net/result.php?resultid=32648579

Profile JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,331,546,800
RAC: 0
Message 57464 - Posted: 4 Oct 2021 | 18:26:07 UTC
Last modified: 4 Oct 2021 | 18:28:27 UTC

Having more problems:

1 - A reset of gpugrid did not fix the problem. I got another work unit in and it crashed the system the same way. I then removed the project from ProgramData\Boinc, along with any files that were identified as gpugrid. I assume that is the same as a detach. I had to do this with File Explorer, as the system crashes if I bring up boinc and try to detach.

2 - The reboots (crashes) of the system led to a "project url is missing from the state file" error message (or some such wording) from WCG. I didn't think that was a problem, but unaccountably I cannot run any WCG work units for more than a few minutes before the system crashes. I was thinking this might have been the problem instead of gpugrid, but I remember the system crashing the exact moment I selected "resume" for the gpugrid app. WCG was not running at that time. Nothing is ever as simple as it seems.

Currently I am running Einstein and so far have processed several work units. I will look at starting WCG back up later.

This system is indoors with A/C and does not have the problem of my mining racks.

Profile JStateson
Joined: 31 Oct 08
Posts: 186
Credit: 3,331,546,800
RAC: 0
Message 57468 - Posted: 4 Oct 2021 | 21:54:53 UTC

Follow-up on the problems I posted about.

There was indeed a problem with the missing dll file, but the addition of gpugrid to the project list triggered other problems:

I ran x86 memtest for a long time with no errors until I realized it was only checking up to 8 GB, whereas the system had 24 GB (6x4) plugged in. I pulled memory sticks until I got 12 GB recognized and working. Not sure why memtest did not report a problem, but I assume this contributed to the system crashes when trying to run several large apps on a 12-thread system.
