Questions about adding plots or maybe replotting existing

Finally had time to get the system installed with Linux and do all of the steps you provided. Got everything done, including missing the sudo part and then figuring out how to fix it. But when I tried running the command before I left for the day, I just got syntax errors. I will post the specific command I am trying, and hopefully you can help.

Posting this again for my own sanity as well; hopefully I have time tomorrow to give this another go and get it running.
~/bladebit/build/bladebit_cuda -c xch123 -f abcdef -z 4 -n 1 cudaplot --disk-128 -t1 /tmp /media/$USER/USB-DRIVE-LABEL/folder/

I also don’t have the 128GB in the system yet; it was still at home, but I am bringing it in tomorrow. I was OK with the system falling over once from not enough RAM, just to see whether it even outputs correctly or not. So all that should be left is confirming the system POSTs with the 128GB, and then figuring out the error in the command or my install.

I always open a terminal in the folder that contains Bladebit; from there you enter ./bladebit_cuda followed by the rest of your command.

Try running ~/bladebit/build/bladebit_cuda --help in a terminal to check that BB is compiled and executes.

Regarding the paths: the /tmp that I used in the example is your Linux OS temp path. Although it would likely work (if it has enough free space), it is not the best choice, as it is likely an Ext4 partition. You may want to format a dedicated partition on the NVMe with F2FS, as that gave me the best results.

As for the final output path at the end of the command, it can be anything as long as it is mounted and has enough space. In my example I am pointing to the default path where external (e.g. USB) media is mounted on Debian, which is /media/$USER/… ($USER is automatically replaced with the username of the currently logged-in user). The “USB-DRIVE-LABEL” is just a placeholder for the mount point, and it will be different depending on how the external HDD is set up. What I do is format external HDDs with Ext4 and assign a label to the single partition on the HDD; that label is then automatically used by the OS when mounting the drive.

You have to decide what FS to use on the final destination. That depends on whether it is an external drive, and if so, whether you want to plug it into a Windows PC at some point or just Linux. If you do want to use it with Windows, then I suggest formatting with exFAT (stay away from NTFS; you could mount it, but there is additional overhead with it on Linux). If formatting an external HDD, make sure you are using options for the chosen FS that waste the least amount of space. You do not need any reserved space on the FS hosting the plots, so with Ext4 I typically use the following when formatting the HDD: sudo mkfs.ext4 -m 0 -T largefile4 -L <partition label, ex: XCH01> /dev/<drive identifier>

To find the drive identifier, simply run sudo lsblk in a terminal and it will be in the first (NAME) column. Then, if you want more details about the partitions or drive layout, you can run sudo fdisk -l /dev/<drive identifier>. If you prefer a GUI, you can install GParted from the Software app and use it to prepare your external HDD, then use the Disks app to take ownership of the partition (required for Ext4 and other Linux-based FSes) and to mount it.
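Putting the commands above together, a typical terminal sequence for preparing an external HDD might look like the sketch below. The drive identifier is a placeholder you must fill in from your own lsblk output, and the XCH01 label is just the example from above; double-check the device name before running mkfs, as it destroys whatever is on the drive.

```shell
sudo lsblk                                  # find the drive identifier in the NAME column
sudo fdisk -l /dev/<drive identifier>       # optional: inspect the current partition layout
sudo mkfs.ext4 -m 0 -T largefile4 -L XCH01 /dev/<drive identifier>
                                            # -m 0: no reserved blocks, -T largefile4: large-file tuning
```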

The error suggests that you may not have enough VRAM (on the GPU), rather than RAM (in your box). Not sure what GPU you have; maybe it doesn’t have 6GB? Try installing nvtop, as it will show you how your GPU is being utilized.

As mentioned by others, the /tmp folder is (most likely) on your OS disk. It is preferable not to use the OS disk for the plotting tmp folder. A dedicated NVMe (on a PCIe card, if there is no slot on the mobo) will make a big difference for non-full-RAM plotting.

By the way, GH can plot with 64 GB RAM and 4 GB VRAM (the -S 2 switch). By default it wants 6 GB VRAM, and of course 128 GB RAM will make a huge difference.

It is a 3070, and the only drive in the system is a 2TB PCIe 4.0 Sabrent NVMe M.2; that should have more than enough speed and room to plot from?

A 3070 most likely has 8 GB, so that should be more than plenty. Not sure why the error suggests a VRAM problem. Install nvtop; it will give you some info about how your card is running (bus, PCIe lanes, etc., as well as how the VRAM is used).

That NVMe is also more than plenty. Could you run df -h to check what is behind that /tmp folder; does it point to the NVMe?
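As a quick sketch (output will differ per system; the point is to check whether the Filesystem column shows the NVMe device or the OS partition):

```shell
# Show size/usage and, importantly, which device backs /tmp:
df -h /tmp

# Or print just the backing device (GNU coreutils):
df --output=source /tmp
```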

Also, if you have problems with nvtop, that may suggest a driver issue with your card. Maybe that is the reason the VRAM allocation fails (if the error is what it appears to be).


It’s an 8GB card. When I shut down the system, it asked to update something; I said yes, and now Nvidia gives an error on boot and won’t let the system finish booting… Linux is so much fun. I’ll just do a complete reinstall tomorrow if the 128GB of RAM POSTs. If not, then I guess it’s back to the drawing board anyway.

Tell me what error you are seeing on boot. BTW, if you add a mount point to the fstab file and that location/drive is not found during boot (ex: a USB drive that is disconnected), you will get a hard stop at boot, as any mounts added to fstab are presumed to be fixed.
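If you do want to keep a removable drive in fstab anyway, the standard nofail mount option tells the system to skip the entry rather than hard-stop the boot when the drive is absent. The label and mount point below are placeholders for illustration:

```
# /etc/fstab entry for a removable plot drive; 'nofail' skips it if missing at boot
LABEL=XCH01  /mnt/xch01  ext4  defaults,nofail  0  2
```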

Regarding the 96GB kit, did you update the motherboard BIOS? Does your BIOS version support these new modules? A BIOS update is definitely required for 96GB support. If that was done but you are still having issues, try booting with just the 96GB in DIMM 1 (I think the one furthest from the CPU). If this is Ryzen, RAM training on first boot can take a long time (>10 min), so turn it on and come back in 30 minutes, as it may have just been training the RAM for the first time.

Regarding the BB memory error, it is 100% due to not enough system RAM, as it even tells you in the screenshot how much host RAM it will need. You can try the --disk-16 flag instead of --disk-128 to check whether the 16GB RAM mode works for you on your GPU, but it did not work for me on a GTX 1080. Lastly, you can add the --benchmark flag to test performance without writing to the final destination. This flag needs to be added right after bladebit_cuda (ex: ~/bladebit/build/bladebit_cuda --benchmark -c ...), and all the other parameters are required as-is, including the final destination, even though it will not be used.
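Applied to the command posted earlier in the thread (same placeholder keys as before), a benchmark run would look something like this; the key point is the flag placement:

```shell
# --benchmark goes immediately after the binary name; the final destination
# path is still required even though nothing will be written to it.
~/bladebit/build/bladebit_cuda --benchmark -c xch123 -f abcdef -z 4 -n 1 \
    cudaplot --disk-128 -t1 /tmp /media/$USER/USB-DRIVE-LABEL/folder/
```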

As for tmp, you can use the same NVMe as the OS, but I would partition it so that the OS takes 50% of the space, leaving the other 50% of your NVMe for a dedicated tmp. I have a 4TB NVMe set up exactly like that, with F2FS on the 50% used for plotting tmp.

I’ll have to check again tomorrow, but it was something like “Nvidia persistence module not found” or similar.

Yeah, it has the newest BIOS, and it is 2x48GB sticks plus 2x16GB sticks. Both kits work by themselves, but not together. DDR5 is weird for both AMD and Intel in 4-slot configs. So on top of that issue, I am hoping the problem is mixing the new memory modules with the old ones. I’ll try 4x32GB tomorrow when it comes in. I was trying to find 2x64GB modules, but with the new modules out now, I don’t think they make them anymore?! I think they now make something really big like 2x128GB, but that is excessively expensive compared to the 96GB or 128GB kits. (Below I posted Corsair’s DDR5 RAM kits by size and availability; I am sure G.Skill is similar.)

I have also been super sick the past few days, so I only have so much patience with stuff not working before I can’t deal with it anymore and go back to things that actually pay me lol. I really do appreciate you guys’ help, though.

Could not having a USB stick connected really break Linux? Maybe I’ll try that first, because the very first error is about something not mounted with USB, and then it goes into the Nvidia errors.

I should probably also disable the integrated graphics in the BIOS, so that it is not picked up by the install at all, maybe?

Here’s G.Skill (I only have 4 slots; no Threadripper or Xeon here lol)

You’re going to wear out your OS drive, and it will slow down the process; as the database grows, that will use up space too, and you still need to keep syncing…


Idk about DCtech, but my OS drive for the plotter is sacrificial at this point, with 1PB written already. It still benchmarks like the day it was brand new. I am not trying to get world-record plots from my plotting box, just to enable lower compression without going to NoSSD or giving up block rewards to the max.

My farm box is a completely separate computer.

This is exactly how Flash memory works. You cannot tell the health of Flash memory by running read/write benchmarks.

Writing (changing cell content) does not affect performance at all. The damage is at the gate level: gradual gate degradation causes increased cell charge leakage. Basically, Flash will hold data for much shorter periods (if not powered up). A plotting drive, even if it drops the data, will still perform the same as a brand new one once powered up and reformatted. For the OS drive, it quickly gets painful. Therefore, the TBW rating of an SSD doesn’t really reflect degradation in performance, but rather how long data will be kept on it if not powered up.

As far as that GPU driver, when you run apt update / upgrade (CLI), it will let you know whether some older drivers are still hanging on for dear life, and how to purge them. Once you run the purge and reboot, things may be smooth for some time.

As far as fstab, if drives declared there are missing, the OS may throw fits during the boot process. For me, this is the main reason not to have any plot drives in fstab, but rather to mount them from a script. I run all my Linux boxes headless (xRDP), and if anything like that happens, the box suddenly needs a mouse/keyboard/monitor to access the single-user terminal.


Good luck with that. I spent days trying to purge drivers, to the point that nothing showed up in the various checks I was advised to run. A fresh install solved the problem; given it’s just a plotter, this route is probably quicker and easier. It certainly was in my case.


This is just a plotter and not a farmer, plus my 4TB NVMe has a high TBW rating, so I have already finished re-plotting all my >9k plots.

And @Ronski, well, the system POSTs with the 4x32GB sticks. I’m just going to do a fresh install, because I am sure there is probably something I missed in the original instructions, and a reinstall sounds easier than trying to debug Linux with no knowledge of it.


I think we are talking about different things. You were manually removing and reinstalling drivers, whereas I was talking about doing an apt cleanup after an [auto] update, where no adding/removing of the currently active driver happens.

Also, if you recall, I couldn’t get the non-LTS Ubuntu working with the Nvidia drivers either. After that I installed the latest LTS, but that was basically an unstable system for the GH plotter. Finally, I ended up getting the one-before-last LTS version working (still on it).

What it also means is that straight Debian is potentially the least friendly for a new Linux user. Also, its Nvidia drivers may not be the latest (for stability reasons). Therefore, either Ubuntu or Mint would be a better choice. All those Debian-based distros are more or less the same on the CLI, so everything written in this thread will work. However, the GUI parts may be more friendly and forgiving to new users.


So I can load into recovery, where I can enter commands. What command do you want me to run before just redoing all of it? If it gets back to where I was, I would be ecstatic.

You gotta spell out the full command for me though lol.

Sure, first check whether you are up to date (sudo apt update). Below is what I got running it just now:

llama@llama:~$ sudo apt update
[sudo] password for llama:
Get:1 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB]
Hit:2 http://us.archive.ubuntu.com/ubuntu jammy InRelease
Get:3 http://us.archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB]
Get:4 http://security.ubuntu.com/ubuntu jammy-security/main amd64 Packages [1,109 kB]
Hit:5 http://us.archive.ubuntu.com/ubuntu jammy-backports InRelease
Get:6 http://us.archive.ubuntu.com/ubuntu jammy-updates/main amd64 Packages [1,325 kB]
Get:7 http://security.ubuntu.com/ubuntu jammy-security/main i386 Packages [393 kB]
Get:8 http://security.ubuntu.com/ubuntu jammy-security/main Translation-en [207 kB]
Get:9 http://security.ubuntu.com/ubuntu jammy-security/universe i386 Packages [588 kB]
Get:10 http://security.ubuntu.com/ubuntu jammy-security/universe amd64 Packages [837 kB]
Get:11 http://security.ubuntu.com/ubuntu jammy-security/universe Translation-en [160 kB]
Get:12 http://us.archive.ubuntu.com/ubuntu jammy-updates/main i386 Packages [560 kB]
Get:13 http://us.archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1,037 kB]
Fetched 6,446 kB in 2s (3,074 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
16 packages can be upgraded. Run 'apt list --upgradable' to see them.
llama@llama:~$

That takes a few secs. If you see that last line as well, run sudo apt list --upgradable

llama@llama:~$ sudo apt list --upgradable
Listing... Done
gjs/jammy-updates 1.72.4-0ubuntu0.22.04.1 amd64 [upgradable from: 1.72.2-0ubuntu1]
libgjs0g/jammy-updates 1.72.4-0ubuntu0.22.04.1 amd64 [upgradable from: 1.72.2-0ubuntu1]
libnss-systemd/jammy-updates 249.11-0ubuntu3.12 amd64 [upgradable from: 249.11-0ubuntu3.11]
libpam-systemd/jammy-updates 249.11-0ubuntu3.12 amd64 [upgradable from: 249.11-0ubuntu3.11]
libssl3/jammy-updates 3.0.2-0ubuntu1.13 amd64 [upgradable from: 3.0.2-0ubuntu1.12]
libssl3/jammy-updates 3.0.2-0ubuntu1.13 i386 [upgradable from: 3.0.2-0ubuntu1.12]
libsystemd0/jammy-updates 249.11-0ubuntu3.12 amd64 [upgradable from: 249.11-0ubuntu3.11]
libudev1/jammy-updates 249.11-0ubuntu3.12 amd64 [upgradable from: 249.11-0ubuntu3.11]
linux-firmware/jammy-updates,jammy-updates 20220329.git681281e4-0ubuntu3.26 all [upgradable from: 20220329.git681281e4-0ubuntu3.24]
openssl/jammy-updates 3.0.2-0ubuntu1.13 amd64 [upgradable from: 3.0.2-0ubuntu1.12]
python3-pil/jammy-updates,jammy-security 9.0.1-1ubuntu0.2 amd64 [upgradable from: 9.0.1-1ubuntu0.1]
systemd-oomd/jammy-updates 249.11-0ubuntu3.12 amd64 [upgradable from: 249.11-0ubuntu3.11]
systemd-sysv/jammy-updates 249.11-0ubuntu3.12 amd64 [upgradable from: 249.11-0ubuntu3.11]
systemd-timesyncd/jammy-updates 249.11-0ubuntu3.12 amd64 [upgradable from: 249.11-0ubuntu3.11]
systemd/jammy-updates 249.11-0ubuntu3.12 amd64 [upgradable from: 249.11-0ubuntu3.11]
udev/jammy-updates 249.11-0ubuntu3.12 amd64 [upgradable from: 249.11-0ubuntu3.11]
llama@llama:~$

In this one, do you see anything about Nvidia drivers? (Just the first word on each line should tell you.)

After that you can run sudo apt upgrade

llama@llama:~$ sudo apt upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
Get more security updates through Ubuntu Pro with 'esm-apps' enabled:
  libpostproc55 libavcodec58 libavutil56 libswscale5 xrdp libswresample3
  libavformat58 libavfilter7
Learn more about Ubuntu Pro at https://ubuntu.com/pro
The following packages have been kept back:
  gjs libgjs0g
The following packages will be upgraded:
  libnss-systemd libpam-systemd libssl3 libssl3:i386 libsystemd0 libudev1 linux-firmware openssl python3-pil systemd systemd-oomd systemd-sysv
  systemd-timesyncd udev
14 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.
1 standard LTS security update
Need to get 279 MB of archives.
After this operation, 21.6 MB of additional disk space will be used.
Do you want to continue? [Y/n]

Don’t press ‘y’ yet; just post what you get there. We will still be looking for Nvidia packages and whether there is something to be removed, especially on this line:

14 upgraded, 0 newly installed, 0 to remove and 2 not upgraded.

Anyway, once you hit that question about proceeding, press ‘y’ and let it go. For me, that has already helped a couple of times on the Ubuntu side (my system was not bricked, but the Nvidia drivers were screwed up).

If that doesn’t help, I would also strongly suggest giving up on Debian.

Looks like your Ethernet card may not be working. Could you try ‘ping google.com’?

Although I don’t know that much about Linux, maybe the Ethernet is not available in a recovery console.

Still, if you got to that console on reboot, could you get a screen capture of ‘cat /etc/fstab’? Maybe that one is holding you back.

I guess events like that are why most folks are so excited about running Linux, and why desktop penetration has already reached maybe 2% (after 20-30 years of work on it and countless distros). I do love headless Linux and think it is the best thing out there. However, I also cannot stomach the Linux desktop, regardless of flavor.