I have a very odd situation. I have two plotting systems that are absolutely the same. All parts, software and configuration are exactly the same (except Madmax). One of them does a plot in 2430 seconds and the other does it in 2622. The only difference that I can find is the version of Madmax. And the only reason I know there is a difference in the version is that on the slower one, the “Total plot creation time” line contains a minute conversion at the end (example: “Total plot creation time was 2622.49 sec (43.7082 min)”). The one that is running faster does not have that minute entry.
What is going on?
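If it helps to compare runs across the two machines, those summary lines can be parsed programmatically. A minimal sketch, assuming the log format matches the lines quoted above, with the optional “(NN min)” suffix being the only difference between versions:

```python
import re
from typing import Optional

# Regex for the MadMax summary line. The "(NN.N min)" suffix is optional,
# since (per this thread) only some builds append the minute conversion.
LINE_RE = re.compile(
    r"Total plot creation time was (?P<sec>[\d.]+) sec"
    r"(?:\s*\((?P<min>[\d.]+) min\))?"
)

def plot_seconds(line: str) -> Optional[float]:
    """Return the plot time in seconds, or None if the line doesn't match."""
    m = LINE_RE.search(line)
    return float(m.group("sec")) if m else None

# Both log variants normalize to the same unit:
print(plot_seconds("Total plot creation time was 2622.49 sec (43.7082 min)"))  # 2622.49
print(plot_seconds("Total plot creation time was 2430.00 sec"))                # 2430.0
```

Run over both machines’ logs, this gives directly comparable numbers regardless of which version wrote them.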
Just a few thoughts for debugging…
MadMax is still under rapid development; maybe just purge it on both and install the latest?
Did you check SSD / CPU temps while plotting? This could give you a hint as to why one would underperform.
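A quick way to sanity-check temps like this is to poll the sensors and flag anything near its throttle point. A minimal sketch; the threshold values below are illustrative assumptions, not vendor specs, and on Linux the actual readings could come from `lm-sensors` or `psutil.sensors_temperatures()`:

```python
def hot_sensors(temps_c: dict, limits_c: dict) -> list:
    """Return the names of sensors at or above their limit (degrees C).

    Sensors with no configured limit fall back to 85 C, which is an
    assumption here, not a spec value.
    """
    return [name for name, t in temps_c.items()
            if t >= limits_c.get(name, 85.0)]

# Hypothetical readings: NVMe drives often throttle well below CPU limits.
readings = {"cpu": 72.0, "nvme0": 78.0, "nvme1": 61.0}
limits = {"cpu": 90.0, "nvme0": 70.0, "nvme1": 70.0}
print(hot_sensors(readings, limits))  # ['nvme0']
```

Running something like this on both machines during a plot would quickly show whether one is quietly throttling.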
Yes, I have checked the temps. Both systems are running cool (CPU and NVMe). I would hate to load the latest version and find that now they both run slower, since the older version is the one currently running faster.
Then check out the older version on the slow machine and see if it speeds up. Then take a look at the git diff and see what changed.
Check the network connections of the machines. Maybe the slower one needs longer to transfer the plot? As plotting and copying are done in parallel, maybe the transfer slows down the plotting?
Are the processes running on both machines identical?
The network transfer times are the same. The processes are the same. One odd thing I just noticed when looking through task manager is that the slow one is using more GPU. The fast one is almost always at 0-1% GPU but the slow one is using 10-15% constantly. The process using the GPU is Desktop Window Manager. That process is also using 1.5% CPU on the slow one and next to nothing on the fast one. I just don’t know why it is different. There are no extra windows open. Like I have said, they are the same. The only window open is the Ubuntu CLI.
Check the bucket size between both versions. The new version uses 256 buckets and the old version uses 128 (as far as I know). This could be the important change.
Switching over to a real Linux system would be the first thing I would do…
The command running the plots is the same (buckets and all). But I think I just found the culprit in my GPU usage. Google Remote Desktop. I use it to control the systems from my home office and I had it open all the time. I guess the one on the slow system had a problem and started to use GPU/CPU. I closed the interface and it all cleared up when I opened it back up. I’m going to leave the remote interface closed today and see if that makes a difference. It is possible that the 1.5% CPU that was tied up by that process could have been making the difference.
Isn’t 200 seconds over 2600 seconds just noise? Maybe one machine was doing some background process that the other machine wasn’t?
Not when it is consistent. It is not just a 200-400 second swing that averages out to my stated number; it is constant. I will check in a couple of hours to see if closing that remote console helped.
I’ve got a second server coming very similar spec to the first one, will be interesting to see how they compare with each other.
Well crap. That was it. Google Remote Desktop was screwing me over. After I closed the remote session the process “Desktop Window Manager” dropped the CPU and GPU usage to 0 (or close to it). I left it running for a cycle and sure enough I am now getting the 2400 second plots that match the other system. Geez. Well, now I know. Don’t leave Google Remote Desktop running for days at a time.
Thank you to @GavtronZERO for pointing me in the right direction. I wouldn’t have keyed onto that without first walking through the process list. That is when I noticed the GPU usage.
@WolfGT glad you got it sorted!
What do people use to remotely monitor? I’m running remote desktop connections from a Windows PC to connect to multiple Ubuntu plotting PCs. I glance at the trace line of CPU usage in the System Monitor to make sure that the plotter is working.
It works perfectly fine, but I’m wondering if there is a better way.
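One lightweight alternative (just a sketch, not something anyone here has confirmed using) is to skip the graphical session entirely and poll each plotter over SSH, for example by reading `/proc/loadavg`. The parsing side is trivial:

```python
def parse_loadavg(text: str) -> tuple:
    """Parse the three load averages from a /proc/loadavg line,
    e.g. '2.41 2.37 2.30 3/612 48213' (standard Linux format)."""
    one, five, fifteen = text.split()[:3]
    return (float(one), float(five), float(fifteen))

# On each plotter you could run (e.g. over ssh):
#   with open("/proc/loadavg") as f:
#       print(parse_loadavg(f.read()))
print(parse_loadavg("2.41 2.37 2.30 3/612 48213"))  # (2.41, 2.37, 2.3)
```

Since this avoids a remote desktop session entirely, it also sidesteps the Desktop Window Manager overhead discussed above.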
Very interesting. I wouldn’t have expected a 10% hit from having Remote Desktop open. Was planning to use this, or something similar.
I have this problem all the time when GPU mining and remoting in to a desktop. Make sure you use the onboard video, and not the GPU, to power the remote display.
It is normally not an issue. I guess because I left the remote session open for days something went wrong. After closing it, everything is good. Even opening the remote session back up doesn’t cause an issue.
Check memory temp.
I had very similar symptoms.
One of the computers was slower or even got stuck.
I moved the memory closer to the fan, and now I get 2216.64 sec (old version) / 2235.31 sec (new version).
Both computers are absolutely identical.
Hm… I think that at high temperatures, ECC memory makes a few read/write retries when it finds errors. That takes time.