31) Message boards : News : Experimental Python tasks (beta) - task description (Message 58340)
Posted 831 days ago by Profile Retvari Zoltan
I do the following dance whenever setting up BOINC from Ubuntu Software or LocutusOfBorg:
• Join the root group: sudo adduser (Username) root
• Join the BOINC group: sudo adduser (Username) boinc
• Allow access by all: sudo chmod -R 777 /etc/boinc-client
• Allow access by all: sudo chmod -R 777 /var/lib/boinc-client
By doing so, you nullify the security your system's different access-rights levels provide.
This practice should be avoided at all costs.
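For contrast, the group membership alone is usually enough; a sketch of a safer setup, assuming the Debian/Ubuntu boinc-client package layout (no root group, no chmod 777):

```shell
# Join only the boinc group; never the root group:
sudo usermod -aG boinc "$USER"
# Log out and back in so the new group membership takes effect, then
# restart the service; /var/lib/boinc-client keeps its default ownership:
sudo systemctl restart boinc-client
```

This way the client's data directory stays owned by the boinc user, and your account reads it through group permissions only.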
32) Message boards : Number crunching : Managing non-high-end hosts (Message 58070)
Posted 886 days ago by Profile Retvari Zoltan
I've noticed that the present acemd3 app does not use a full CPU core (thread) on Windows while it does on Linux. There's a discrepancy between the run time and the CPU time, also the CPU usage is lower on Windows.

hm, I actually cannot confirm, see here:

e7s141_e3s56p0f226-ADRIA_BanditGPCR_APJ_b1-0-1-RND0691_0 27100764 588817 10 Dec 2021 | 12:29:46 UTC 11 Dec 2021 | 5:50:18 UTC Completed and validated 31,250.27 31,228.75 420,000.00 New version of ACEMD v2.19 (cuda1121)
The discrepancy is smaller in some cases, perhaps it depends on more factors than the OS. Newer CPUs show less discrepancy. I'll test it with my E8500. Now I'm using Windows 11 on it, but I couldn't get a new workunit yet. My next attempt will be with Linux.
33) Message boards : Number crunching : The hardware enthusiast's corner (Message 58069)
Posted 886 days ago by Profile Retvari Zoltan
I find it odd how many computer trays or cases align the PSU screw holes to have the PSU fan suck in the hot air from the computer. I use open trays and always point my PSU fan away from the motherboard.
That odd PSU orientation is inherited from the era when the only active cooled component in a PC was the PSU. Back then the PSUs had about 60-70% efficiency, which was quite good compared to regulated linear PSUs. (not used in IBM PCs as far as I remember, but used in home computers like Sinclair ZX series and the comparable Commodore series.)
34) Message boards : Number crunching : Managing non-high-end hosts (Message 58065)
Posted 887 days ago by Profile Retvari Zoltan
The task finished successfully in 12h 35m 23s.
On a Core i3-4xxx it takes about 12h 1m 44s, so the Core2Duo took about 34m longer; it was only 4.6% slower than an i3-4xxx.
I've noticed that the present acemd3 app does not use a full CPU core (thread) on Windows while it does on Linux. There's a discrepancy between the run time and the CPU time, also the CPU usage is lower on Windows.
35) Message boards : Number crunching : Managing non-high-end hosts (Message 58055)
Posted 887 days ago by Profile Retvari Zoltan
I was lucky again, the host received another workunit and it's been running just fine for 90 minutes. (It needs another 12 hours to complete.)
The Core2Duo is definitely struggling to feed the GTX1080Ti (the GPU usage has frequent deep drops), but I don't think it will run into that "Error invoking kernel: CUDA_ERROR_LAUNCH_TIMEOUT (702)" error. We'll see. I've tried to maximize GPU usage by changing the process affinity and the priority of acemd3.exe, but it made little difference.
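For reference, on a Linux host the equivalent knobs are renice and taskset; a sketch of the same idea (the post did this on Windows via Task Manager), with the process name and core list as illustrative assumptions:

```shell
# Find the newest acemd3 process:
pid=$(pgrep -n acemd3)
# Raise its priority slightly (lower nice value = higher priority):
sudo renice -n -5 -p "$pid"
# Pin it to cores 0-1 so the GPU feeder thread keeps a warm cache:
taskset -cp 0-1 "$pid"
```

On a two-core CPU like a Core2Duo there is little headroom to shuffle, which matches the observation that it made little difference.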
36) Message boards : Number crunching : The hardware enthusiast's corner (Message 58054)
Posted 887 days ago by Profile Retvari Zoltan
double post...
37) Message boards : Number crunching : The hardware enthusiast's corner (Message 58053)
Posted 887 days ago by Profile Retvari Zoltan
I think you need a good oscilloscope and a set of artificial variable loads to test a PSU. When a PSU is about to fail (making the PC unreliable), its DC output voltages are probably still within the specification under constant load (a running PC is not a constant load, even when it's idle). The unreliability is caused by the larger spikes (transients) on the DC voltage when a load (for example a GPU) is turned on or off. Such test equipment (and the expertise to conduct the test) is way more expensive than the extra cost of a reliable PSU; however, if you have an oscilloscope, you can test a live system (very carefully, as in the worst case you can break the components, which is way more expensive than the PSU).
The unreliability of the PSU usually comes from the aging of the electrolytic (or similar) capacitors in it, as their capacitance degrades very quickly when the electrolyte leaks or evaporates (or both), making the switching elements switch more frequently, which raises the temperature inside the PSU, which in turn makes the electrolyte evaporate even faster. The leaking electrolyte usually leaves visible signs, so a technician can see which capacitors have to be replaced once the PSU is opened up (but that's a dangerous process, as there are high voltages inside the PSU even days after it's been unplugged from the wall outlet).
There are many DC-DC converters (similar to the PSU itself) on almost every component of a PC (on the motherboard for the CPU, the memory and the chipset; on the GPU; in the SSD drive, etc.), as modern chips need very low voltages (around 1V) at high current. To achieve the highest possible efficiency, these voltages are converted from the 12V rail (even the PSU itself supplies the 5V and the 3.3V through DC-DC converters from its own internal 12V rail). The electrolytic capacitors of these DC-DC converters (if they need such) age the same way as those inside a PSU, but they are much harder to replace (as motherboards are designed to spread heat very well, so it's impossible to desolder a single component from them without special equipment). So an aging motherboard can tolerate lower spikes (and ripple) than the same motherboard when it was brand new.
Some general advice for selecting PSUs for crunching:
The PSU should have at least a 5-year warranty (the longer, the better).
The PSU should have at least an 80+ Gold certification.
The maximum output power rating of the PSU should be 180-200% of the constant load of the cruncher PC.
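The 180-200% rule above works out like this; a minimal sketch, with the 350W load figure as an illustrative assumption (measure your own cruncher's constant draw):

```shell
load_watts=350                          # measured constant draw of the cruncher (example)
min_psu=$(( load_watts * 180 / 100 ))   # 180% lower bound of the sizing rule
max_psu=$(( load_watts * 200 / 100 ))   # 200% upper bound of the sizing rule
echo "Pick a PSU rated between ${min_psu}W and ${max_psu}W"
```

So a rig drawing a constant 350W would get a PSU rated around 630-700W, keeping the PSU near the middle of its efficiency curve and leaving headroom for load transients.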
38) Message boards : Number crunching : Managing non-high-end hosts (Message 58049)
Posted 887 days ago by Profile Retvari Zoltan
However, meanwhile my suspicion is that the old processor
Intel(R) Core(TM)2 Duo CPU E7400 @ 2.80GHz
may be the culprit.
I've reanimated (that was quite an adventure on its own) one ancient DQ45CB motherboard with a Core2Duo E8500 CPU in it, and I've put a GTX 1080Ti in it to test with GPUGrid, but there's no work available at the moment. You can follow the unfolding of this adventure here.
EDIT: I've managed to receive one task...
EDIT2: It failed because I forgot to install the Visual C++ runtime :(
39) Message boards : Number crunching : Managing non-high-end hosts (Message 58022)
Posted 891 days ago by Profile Retvari Zoltan
could it be that a GTX1650 is not able to crunch the current series of tasks?
I don't think so.

Today, for the first time I tried GPUGRID on my host with a GTX1650 inside, and the task failed after some 4 hours
I can suggest only the usual:
check your GPU temperatures, and lower the GPU frequency by 50 MHz.
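On a Linux host those two checks might look like the following sketch; the clock values are example numbers, not a recommendation for any specific card:

```shell
# Show the current temperature and SM clock:
nvidia-smi --query-gpu=temperature.gpu,clocks.sm --format=csv
# Lock the core clock roughly 50 MHz below the usual boost (example range):
sudo nvidia-smi --lock-gpu-clocks=300,1800
# Revert to default clock behaviour when done testing:
sudo nvidia-smi --reset-gpu-clocks
```

A small downclock often stabilizes a card that only errors out under hours of sustained compute load.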
40) Message boards : Graphics cards (GPUs) : Ampere 10496 & 8704 & 5888 fp32 cores! (Message 58017)
Posted 896 days ago by Profile Retvari Zoltan
... the GPUGRID app must be coded with a fair number of INT operations which cut into the FP cores available (half of the FP cores are actually shared FP/INT cores and can only do one type of operation at a time, while the other half are dedicated FP32).
Perhaps MD simulations don't rely on that many INT operations, so it's independent of the coder and of the cruncher's wishes (demands).

Einstein scales much better with FP core count on Ampere ...
The Einstein app is not a native CUDA application (it's OpenCL), so it's not good at utilizing previous-generation NVIDIA GPUs, making this comparison inconsequential regarding the GPUGrid app's performance improvement on Ampere. It's the Ampere architecture that saved the day for the Einstein app; if the Einstein app were coded (in CUDA) so that it also ran great on Turing, you would see the same (low) performance improvement on Ampere.

all depends on how the app is coded to use the hardware, and sometimes you can't make an app totally optimized for a certain GPU architecture depending on what kinds of computations you need to do or what coding methods you use.
Well, how the app is coded depends on the problem (the research area), the methodology of the given research, and the programming language, which is chosen according to the targeted range of hardware. The method is the reason the "the GPUGRID app must be coded with a fair number of INT operations" demand is impossible to meet. (FP32 is needed to calculate trajectories.) The targeted (broader) range of hardware is the reason Einstein is coded in OpenCL, resulting in lower NVIDIA GPU utilization on previous GPU generations.

