The performance impact of the Meltdown and Spectre patches on software-based storage systems is not entirely known yet.
Reports range from minimal performance impact to a whopping 45% drop on some Lustre setups.
This article compares scrub performance on a ZFS pool with the patches enabled and disabled.
It's not an exhaustive test by any means and really only tests read-heavy performance. Comparing every IO profile will take some time.
Recent longterm maintenance Linux kernels have been patched to address the vulnerabilities.
The Meltdown issue is addressed by the kernel page-table isolation (KPTI) feature. This is the PAGE_TABLE_ISOLATION option when building the kernel.
The Spectre issue is addressed by the Retpoline feature. This is the RETPOLINE option when building the kernel.
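Many distributions save the build configuration of the running kernel under /boot, so both options can be checked before looking at the boot log. The following is a minimal sketch, assuming the config file lives at /boot/config-$(uname -r). That path and the helper name check_kernel_option are assumptions, not something the kernel provides, and some systems expose /proc/config.gz instead.

```shell
#!/bin/sh
# Report whether a kernel build option is set to "y" in a config file.
# check_kernel_option is a hypothetical helper, not part of any tool.
check_kernel_option() {
    option="$1"
    config="$2"
    if grep -q "^CONFIG_${option}=y" "$config" 2>/dev/null; then
        echo "${option}: enabled"
    else
        echo "${option}: not enabled"
    fi
}

# Assumed location of the running kernel's config; adjust for your distribution.
CONFIG_FILE="${CONFIG_FILE:-/boot/config-$(uname -r)}"
check_kernel_option PAGE_TABLE_ISOLATION "$CONFIG_FILE"
check_kernel_option RETPOLINE "$CONFIG_FILE"
```

A kernel built with an option can still have the feature switched off at boot, so this only shows what the kernel is capable of.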
To check for the patch to address the Meltdown vulnerability:
dmesg | grep -i 'page tables'
Kernel/User page tables isolation: enabled
To check for the patch to address the Spectre vulnerability:
dmesg | grep -i 'spectre'
Spectre V2 mitigation: Vulnerable: Minimal generic ASM retpoline
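The boot log can rotate out of the dmesg buffer on a long-running server. Kernels that include the sysfs vulnerabilities interface also report the same status under /sys/devices/system/cpu/vulnerabilities. The sketch below reads those files if they exist. Whether a given kernel build exposes them is an assumption you should verify, and show_mitigations is a hypothetical helper name.

```shell
#!/bin/sh
# Print the kernel's own summary for each vulnerability, if the sysfs
# interface is present. VULN_DIR can be overridden for testing.
VULN_DIR="${VULN_DIR:-/sys/devices/system/cpu/vulnerabilities}"

show_mitigations() {
    dir="$1"
    for name in meltdown spectre_v1 spectre_v2; do
        if [ -r "$dir/$name" ]; then
            printf '%s: %s\n' "$name" "$(cat "$dir/$name")"
        else
            printf '%s: status not reported by this kernel\n' "$name"
        fi
    done
}

show_mitigations "$VULN_DIR"
```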
Our test involved scrubbing a pool (basically a full integrity check of the data) on one of our internal backup servers.
The storage disks are 12 x 3.0 TB WD Red WD30EFRX. The pool layout is RAID-Z2 with a total of 20.7 TB of data.
All incoming traffic to the server was disabled while the test was running. Apart from the scrub itself, all IO was disabled.
Linux kernel 4.14.14 was used with ZFS on Linux 0.7.5.
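For anyone wanting to repeat the test, a scrub can be started and timed roughly as follows. This is a sketch rather than the exact procedure used here: scrub_and_wait is a hypothetical helper and the polling interval is an arbitrary choice. It polls zpool status because that command reports "scrub in progress" while a scrub is running.

```shell
#!/bin/sh
# Start a scrub and wait for it to finish, reporting elapsed seconds.
# POLL_SECONDS is an arbitrary default; raise it for multi-hour scrubs.
POLL_SECONDS="${POLL_SECONDS:-60}"

scrub_and_wait() {
    pool="$1"
    start=$(date +%s)
    zpool scrub "$pool" || return 1
    # zpool status reports "scrub in progress" while the scrub is running.
    while zpool status "$pool" | grep -q 'scrub in progress'; do
        sleep "$POLL_SECONDS"
    done
    end=$(date +%s)
    echo "scrub of $pool finished in $((end - start)) seconds"
}
```

Call it as scrub_and_wait followed by your pool name. When the scrub is done, zpool status also prints the total scrub time on its scan line, which is a useful cross-check.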
To disable KPTI, add the "nopti" option to Grub:
mkdir /etc/default/grub.d
echo 'GRUB_CMDLINE_LINUX="nopti"' > /etc/default/grub.d/disable-kpti.cfg
update-grub
Then reboot the server.
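After the reboot, it is worth confirming that the option actually reached the kernel. A minimal sketch that looks for nopti in /proc/cmdline follows; the helper name kpti_disabled_at_boot is made up for illustration.

```shell
#!/bin/sh
# Report whether "nopti" is on the kernel command line.
# CMDLINE_FILE can be overridden for testing; /proc/cmdline is the real source.
CMDLINE_FILE="${CMDLINE_FILE:-/proc/cmdline}"

kpti_disabled_at_boot() {
    # Split parameters onto separate lines and look for an exact "nopti" entry.
    tr ' ' '\n' < "$1" | grep -qx 'nopti'
}

if kpti_disabled_at_boot "$CMDLINE_FILE"; then
    echo "nopti is set: KPTI was disabled at boot"
else
    echo "nopti is not set"
fi
```

Running the dmesg check for 'page tables' again is a second way to confirm the change took effect.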
The scrub took just over 9 hours to complete.
The read values are in MB/s.
|Test||Time||Read Peak||Read Avg|
KPTI off. Extra 7 minutes of idle IO at the end to make the x-axis scale the same between the 2 graphs.
With KPTI enabled, the read speed was 1.54% slower.
This increased the time to complete the scrub by nearly 7 minutes.
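A percentage like this is the drop in average read speed relative to the unpatched run. If you want to compute the same figure from your own averages, the sketch below shows one way. It assumes that is how the figure is defined, pct_slower is a hypothetical helper, and it contains no measurements from this test.

```shell
#!/bin/sh
# Percentage drop in average read speed going from the baseline (KPTI off)
# to the patched run (KPTI on). pct_slower is a hypothetical helper.
pct_slower() {
    awk -v base="$1" -v patched="$2" \
        'BEGIN { printf "%.2f\n", (base - patched) / base * 100 }'
}
```

Call it as pct_slower with the KPTI-off average first and the KPTI-on average second, both in MB/s.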
KPTI is easily disabled should you find the performance drop unacceptable.
For now, we are going to enable the patches on our internal production servers.