A note on NVMe performance measurements. I measured NVMe performance with the following three tools:
1. dd command
2. fio tool
3. vdbench tool
OS: CentOS 7.7
NVMe: Micron 9100 3.2TB
CPU: Xeon Gold 5117M 14-core x 2
Memory: 32GB 2400 x 6
System: 1029U-TN10RT (Supermicro)
Write command (result: 1.7 GB/s):
time dd if=/dev/zero of=/nvme/testfile bs=1024k count=8192 oflag=direct
Read command (result: 2.2 GB/s):
time dd if=/nvme/testfile of=/dev/null bs=1024k iflag=direct
[root@localhost Downloads]# time dd if=/dev/zero of=/nvme/testfile bs=1024k count=8192 oflag=direct
8589934592 bytes (8.6 GB) copied, 4.95339 s, 1.7 GB/s
[root@localhost Downloads]# time dd if=/nvme/testfile of=/dev/null bs=1024k iflag=direct
8589934592 bytes (8.6 GB) copied, 3.85802 s, 2.2 GB/s
[root@localhost Downloads]#
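dd issues one I/O at a time at queue depth 1, so the numbers above depend heavily on the block size. As a quick sanity check (the path and count here are only illustrative), you could also try a larger block size and see whether the throughput changes:
time dd if=/dev/zero of=/nvme/testfile bs=4M count=2048 oflag=direct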
On Ubuntu you can install fio with apt-get install fio.
CentOS does not seem to provide it via yum, so I pulled the package from the RIKEN mirror instead. If the libpmem package is missing you may get an error during installation, so install it first.
#yum install libpmem-devel
#wget http://ftp.riken.jp/Linux/fedora/epel/7/x86_64/Packages/f/fio-3.1-1.el7.x86_64.rpm
#rpm -ivh fio-3.1-1.el7.x86_64.rpm
#fio <Test configuration file>
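As a side note, fio can also be run without a job file by passing the parameters on the command line. A minimal sketch (the file path, sizes, and runtime below are just illustrative values) would be:
#fio --name=quicktest --filename=/nvme/ssd.test.file --rw=randread --bs=4k --iodepth=32 --numjobs=4 --size=1g --direct=1 --runtime=30 --group_reporting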
I made sample test configuration files as follows: random access uses 4K blocks with a deep queue depth, and sequential access uses 1MB blocks with a shallow queue depth.
File name: random.fio
[global]
bs=4k
ioengine=libaio
iodepth=32
size=1g
numjobs=16
direct=1
refill_buffers=1
runtime=60
directory=/nvme
group_reporting=1
filename=ssd.test.file
[rand-read]
rw=randread
stonewall
[rand-write]
rw=randwrite
stonewall
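To run this job file and keep the results around for later comparison, something like the following should work (the log file name is arbitrary):
#fio random.fio --output=random.log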
The results below are fast, but the CPU is almost 100% used, so how to keep this under control alongside the actual application may need some thought.
Random read: 419,000 IO/s
Random write: 331,000 IO/s
(output abridged)
rand-read: (groupid=0, jobs=16): err= 0: pid=101145: Tue Apr 21 14:27:33 2020
read: IOPS=419k, BW=1637MiB/s (1716MB/s)(95.9GiB/60001msec)
lat (usec) : 50=0.01%, 100=0.01%, 250=0.01%, 500=0.04%, 750=0.30%
lat (usec) : 1000=8.95%
lat (msec) : 2=90.27%, 4=0.43%, 10=0.01%, 20=0.01%
cpu : usr=4.36%, sys=91.72%, ctx=4582969, majf=0, minf=18859
rand-write: (groupid=1, jobs=16): err= 0: pid=101170: Tue Apr 21 14:27:33 2020
write: IOPS=331k, BW=1294MiB/s (1357MB/s)(75.8GiB/60001msec)
lat (usec) : 50=0.01%, 100=0.01%, 250=0.01%, 500=0.15%, 750=0.35%
lat (usec) : 1000=0.30%
lat (msec) : 2=97.12%, 4=2.04%, 10=0.04%
cpu : usr=5.20%, sys=92.32%, ctx=1623406, majf=0, minf=13469
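If you want to see how the drive behaves when the benchmark is confined to a few cores (the core range below is arbitrary, only an illustration), fio can pin its jobs by adding something like this to the [global] section:
cpus_allowed=0-3
cpus_allowed_policy=shared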
File name: seq.fio
[global]
bs=1024k
ioengine=libaio
iodepth=4
size=1g
numjobs=4
direct=1
refill_buffers=1
runtime=60
directory=/nvme
group_reporting=1
filename=ssd.test.file
[seq-read]
rw=read
stonewall
[seq-write]
rw=write
stonewall
These results are also fast, and they don't use much CPU, so this shouldn't be a problem for streaming or file-server use.
Sequential read: 3081 MB/s
Sequential write: 2235 MB/s
seq-read: (groupid=0, jobs=4): err= 0: pid=101279: Tue Apr 21 14:32:02 2020
read: IOPS=2938, BW=2939MiB/s (3081MB/s)(172GiB/60004msec)
lat (usec) : 1000=0.01%
lat (msec) : 2=0.08%, 4=19.51%, 10=79.91%, 20=0.43%, 50=0.08%
cpu : usr=0.34%, sys=25.11%, ctx=142386, majf=0, minf=13361
seq-write: (groupid=1, jobs=4): err= 0: pid=101291: Tue Apr 21 14:32:02 2020
write: IOPS=2131, BW=2132MiB/s (2235MB/s)(125GiB/60006msec)
lat (usec) : 750=0.01%, 1000=0.05%
lat (msec) : 2=2.48%, 4=15.78%, 10=74.68%, 20=7.00%
cpu : usr=23.27%, sys=10.63%, ctx=71113, majf=0, minf=16003
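As a sanity check, the bandwidth here is simply IOPS times the 1 MiB transfer size: roughly 2938 MiB/s for reads and 2131 MiB/s for writes, which matches the BW column (3081 MB/s and 2235 MB/s).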
What about reading/writing directly to the device instead of going through the file system? For that I use vdbench. Download it here: https://www.oracle.com/downloads/server-storage/vdbench-downloads.html. Java (a JRE) is also required, so download that as well. The latest version is probably fine, but Java-related things are always troublesome, so I stick with one that has a proven track record: jre-7u1-linux-x64.rpm
#rpm -ivh jre-7u1-linux-x64.rpm
#unzip vdbench.zip
#cd vdbench
#./vdbench -f <setting file>.prm
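If you want each run to end up in its own report directory (the directory name below is just an example), vdbench also accepts an -o option:
#./vdbench -f random.prm -o output_random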
The test configuration file random.prm is as follows. seekpct is the percentage of random seeks, so 100 means fully random and 0 means sequential, which is a bit counterintuitive.
compratio=1
*
sd=s1,lun=/dev/nvme0n1,align=4096,openflags=o_direct
*
* rdpct=100 is the random-read workload, rdpct=0 is the random-write workload (run one at a time)
wd=wd1,sd=(s1),xfersize=4KB,seekpct=100,rdpct=100
wd=wd1,sd=(s1),xfersize=4KB,seekpct=100,rdpct=0
*
rd=rd1,wd=wd*,iorate=max,forthreads=32,elapsed=300,interval=1
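For reference, a sequential version of the same single-drive test (still assuming the target is /dev/nvme0n1) would mainly change xfersize and seekpct, roughly like this:
sd=s1,lun=/dev/nvme0n1,align=4096,openflags=o_direct
wd=wd1,sd=(s1),xfersize=1024KB,seekpct=0,rdpct=100
rd=rd1,wd=wd1,iorate=max,forthreads=4,elapsed=300,interval=1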
Random read execution result (workload 1). Average: 219081.86 IO/s
Apr 21, 2020 interval i/o MB/sec bytes read resp read write resp resp queue cpu% cpu%
rate 1024**2 i/o pct time resp resp max stddev depth sys+u sys
18:06:23.050 33 188037.00 734.52 4096 100.00 0.024 0.024 0.000 0.971 0.000 4.4 9.0 3.3
18:06:24.054 34 216224.00 844.63 4096 100.00 0.024 0.024 0.000 1.337 0.000 5.2 10.3 4.3
18:06:25.046 35 217075.00 847.95 4096 100.00 0.024 0.024 0.000 1.878 0.000 5.3 10.1 4.0
18:06:26.054 36 224799.00 878.12 4096 100.00 0.024 0.024 0.000 2.129 0.000 5.5 10.5 4.3
18:06:50.048 avg_2-60 219081.86 855.79 4096 100.00 0.024 0.024 0.000 4.086 0.000 5.2 9.9 3.9
Random write execution result (workload 1). Average: 263525.95 IO/s
Apr 21, 2020 interval i/o MB/sec bytes read resp read write resp resp queue cpu% cpu%
rate 1024**2 i/o pct time resp resp max stddev depth sys+u sys
18:16:46.088 1 155367.00 606.90 4096 0.00 0.033 0.000 0.033 0.196 0.000 5.2 6.6 1.6
18:16:47.012 2 194912.00 761.38 4096 0.00 0.033 0.000 0.033 3.979 0.000 6.3 11.2 3.8
18:16:48.052 3 274287.00 1071.43 4096 0.00 0.041 0.000 0.041 1.344 0.000 11.2 13.2 5.6
18:17:45.061 avg_2-60 263525.95 1029.40 4096 0.00 0.036 0.000 0.036 5.737 0.000 9.6 12.2 4.7
vdbench saves its logs in HTML format in the output directory, which makes them easy to browse. You can also see the latency distribution as a histogram.
min(ms) < max(ms) count %% cum%% '+': Individual%; '+-': Cumulative%
0.000 < 0.020 7,578,813 58.6 58.6 +++++++++++++++++++++++++++++
0.020 < 0.040 4,734,960 36.6 95.3 ++++++++++++++++++-----------------------------
0.040 < 0.060 90,903 0.7 96.0 -----------------------------------------------
0.060 < 0.080 20,341 0.2 96.1 -----------------------------------------------
0.080 < 0.100 15,663 0.1 96.2 ------------------------------------------------
0.100 < 0.200 483,566 3.7 100.0 +------------------------------------------------
How about going all out and measuring with six NVMe drives at once?
The configuration file looks like this.
sd=s1,lun=/dev/nvme2n1,align=4096,openflags=o_direct
sd=s2,lun=/dev/nvme3n1,align=4096,openflags=o_direct
sd=s3,lun=/dev/nvme4n1,align=4096,openflags=o_direct
sd=s4,lun=/dev/nvme5n1,align=4096,openflags=o_direct
sd=s5,lun=/dev/nvme6n1,align=4096,openflags=o_direct
sd=s6,lun=/dev/nvme7n1,align=4096,openflags=o_direct
*
wd=wd1,sd=(s1),xfersize=1024KB,seekpct=0,rdpct=100
wd=wd2,sd=(s2),xfersize=1024KB,seekpct=0,rdpct=100
wd=wd3,sd=(s3),xfersize=1024KB,seekpct=0,rdpct=100
wd=wd4,sd=(s4),xfersize=1024KB,seekpct=0,rdpct=100
wd=wd5,sd=(s5),xfersize=1024KB,seekpct=0,rdpct=100
wd=wd6,sd=(s6),xfersize=1024KB,seekpct=0,rdpct=100
*
rd=rd1,wd=wd*,iorate=max,forthreads=4,elapsed=60,interval=1
The result: 18 GB/s! Exceeding 10 GB/s turns out to be surprisingly easy. It made me feel that a setup like this would handle 4K or even 8K editing nicely.
Apr 21, 2020 interval i/o MB/sec bytes read resp read write resp resp queue cpu% cpu%
rate 1024**2 i/o pct time resp resp max stddev depth sys+u sys
15:38:59.052 55 18509.00 18509.00 1048576 100.00 1.943 1.943 0.000 2.933 0.000 36.0 2.9 2.2
15:39:00.052 56 18592.00 18592.00 1048576 100.00 1.935 1.935 0.000 4.156 0.000 36.0 2.5 2.2
15:39:01.054 57 18552.00 18552.00 1048576 100.00 1.939 1.939 0.000 2.804 0.000 36.0 2.6 2.3
15:39:02.054 58 18590.00 18590.00 1048576 100.00 1.937 1.937 0.000 2.914 0.000 36.0 2.7 2.3
15:39:03.048 59 19280.00 19280.00 1048576 100.00 1.939 1.939 0.000 2.756 0.000 37.4 2.7 2.3
15:39:04.053 60 17825.00 17825.00 1048576 100.00 1.940 1.940 0.000 2.741 0.000 34.6 2.6 2.3
15:39:04.083 avg_2-60 18524.08 18524.08 1048576 100.00 1.941 1.941 0.000 6.164 0.000 36.0 2.7 2.2
Random performance also improves as you add workloads: with random read, over 3,500,000 IO/s! However, be careful, because the CPU goes straight into the red zone. When I pushed it too far, I got a warning.
Apr 21, 2020 interval i/o MB/sec bytes read resp read write resp resp queue cpu% cpu%
rate 1024**2 i/o pct time resp resp max stddev depth sys+u sys
15:32:41.052 55 3571729.00 13952.07 4096 100.00 0.050 0.050 0.000 19.540 0.000 179.5 87.8 61.1
15:32:42.052 56 3602147.00 14070.89 4096 100.00 0.050 0.050 0.000 14.770 0.000 180.5 87.6 61.7
15:32:43.051 57 3623401.00 14153.91 4096 100.00 0.050 0.050 0.000 18.074 0.000 181.1 87.7 61.9
15:32:44.051 58 3591990.00 14031.21 4096 100.00 0.050 0.050 0.000 19.873 0.000 179.6 87.6 61.6
15:32:45.051 59 3739079.00 14605.78 4096 100.00 0.050 0.050 0.000 12.645 0.000 186.6 87.7 61.7
15:32:46.053 60 3453817.00 13491.47 4096 100.00 0.050 0.050 0.000 14.438 0.000 172.7 87.8 61.5
15:32:46.096 avg_2-60 3613343.63 14114.62 4096 100.00 0.050 0.050 0.000 29.592 0.000 180.5 87.7 61.8
15:32:46.097 * Warning: average processor utilization 87.68%
15:32:46.097 * Any processor utilization over 80% could mean that your system
15:32:46.097 * does not have enough cycles to run the highest rate possible
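For reference, a sketch of the random-read version of the six-drive workload (assuming the same sd definitions as above and reusing the forthreads value from the single-drive random test) would look roughly like this:
wd=wd1,sd=(s1),xfersize=4KB,seekpct=100,rdpct=100
wd=wd2,sd=(s2),xfersize=4KB,seekpct=100,rdpct=100
wd=wd3,sd=(s3),xfersize=4KB,seekpct=100,rdpct=100
wd=wd4,sd=(s4),xfersize=4KB,seekpct=100,rdpct=100
wd=wd5,sd=(s5),xfersize=4KB,seekpct=100,rdpct=100
wd=wd6,sd=(s6),xfersize=4KB,seekpct=100,rdpct=100
rd=rd1,wd=wd*,iorate=max,forthreads=32,elapsed=60,interval=1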