I think this is the correct command based on your suggestion. The results still look bad.
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=write
test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=425MiB/s][w=109k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=16392: Sat May 22 18:01:19 2021
write: IOPS=104k, BW=406MiB/s (426MB/s)(4096MiB/10091msec); 0 zone resets
bw ( KiB/s): min=254008, max=452392, per=99.95%, avg=415420.20, stdev=41903.25, samples=20
iops : min=63502, max=113098, avg=103855.15, stdev=10475.77, samples=20
cpu : usr=19.58%, sys=51.34%, ctx=976647, majf=0, minf=103
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=0,1048576,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
WRITE: bw=406MiB/s (426MB/s), 406MiB/s-406MiB/s (426MB/s-426MB/s), io=4096MiB (4295MB), run=10091-10091msec
Disk stats (read/write):
nvme0n1: ios=2525/1029235, merge=0/4, ticks=146/48270, in_queue=48429, util=99.07%
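As a sanity check on the figures above, here is a rough sketch that recomputes the headline numbers from the raw counters in the output. The values are copied from the run (bs=4k, 1048576 issued writes, 10091 msec); the function and variable names are my own, not anything fio provides.

```python
# Values copied from the fio output above; names are illustrative only.
BLOCK_SIZE_BYTES = 4096      # bs=4k
ISSUED_WRITES = 1_048_576    # "issued rwts: total=0,1048576,0,0"
RUNTIME_MS = 10_091          # "run=10091-10091msec"


def summarize(block_size, writes, runtime_ms):
    """Recompute total size, IOPS and bandwidth from the raw counters."""
    seconds = runtime_ms / 1000
    total_bytes = block_size * writes
    return {
        "total_mib": total_bytes / 2**20,            # binary mebibytes
        "iops": writes / seconds,                    # operations per second
        "bw_mib_s": total_bytes / 2**20 / seconds,   # MiB/s, as fio reports first
        "bw_mb_s": total_bytes / 10**6 / seconds,    # MB/s, the value in parentheses
    }


stats = summarize(BLOCK_SIZE_BYTES, ISSUED_WRITES, RUNTIME_MS)
for name, value in stats.items():
    print(f"{name}: {value:.0f}")
```

Rounded, this gives 4096 MiB written, roughly 104k IOPS, 406 MiB/s and 426 MB/s, which matches the report. So with bs=4k the bandwidth figure is simply the IOPS multiplied by 4 KiB.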