Zetavault Blog

ZFS Performance With Meltdown And Spectre Patches

22nd January 2018

Introduction

The Linux kernel has recently been patched to address the Meltdown and Spectre vulnerabilities.

The performance impact on software-based storage systems is not entirely known yet.

Reports range from minimal performance impact to a whopping 45% drop on some Lustre setups.

This article compares scrub performance on a ZFS pool with the patches enabled and disabled.

It's not an exhaustive test by any means and really only tests read-heavy performance. Comparing every IO profile will take some time.

Checking The Kernel Is Patched

Recent longterm maintenance Linux kernels have been patched to address the vulnerabilities.

The Meltdown issue is addressed by the kernel page-table isolation (KPTI) feature. This is the PAGE_TABLE_ISOLATION option (CONFIG_PAGE_TABLE_ISOLATION) when building the kernel.

The Spectre issue (variant 2) is addressed by the Retpoline feature. This is the RETPOLINE option (CONFIG_RETPOLINE) when building the kernel.
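As a rough sketch, the two build options can also be looked up in the config file of the running kernel. The path /boot/config-<release> is an assumption (it is where Debian and Ubuntu style distributions install the config), and check_build_options is a hypothetical helper name used for illustration only.

```shell
#!/bin/sh
# Report whether the KPTI and Retpoline build options are set in a kernel
# config file. check_build_options is a hypothetical helper name.
check_build_options() {
    config_file="$1"
    for option in CONFIG_PAGE_TABLE_ISOLATION CONFIG_RETPOLINE; do
        if grep -q "^${option}=y" "$config_file"; then
            echo "${option}: enabled"
        else
            echo "${option}: not set"
        fi
    done
}

# Assumed location of the running kernel's config (Debian/Ubuntu convention).
config="/boot/config-$(uname -r)"
if [ -r "$config" ]; then
    check_build_options "$config"
else
    echo "No readable kernel config at $config"
fi
```

This only shows what the kernel was built with. Whether the mitigation is active at runtime is what the dmesg checks report.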


To check for the patch to address the Meltdown vulnerability:

dmesg | grep -i 'page tables'

Should output:

Kernel/User page tables isolation: enabled

To check for the patch to address the Spectre vulnerability:

dmesg | grep -i 'spectre'

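Should output something like the following. The exact wording depends on whether the compiler used to build the kernel supports retpoline; a kernel built with a retpoline-aware compiler reports "Mitigation: Full generic retpoline" instead: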

Spectre V2 mitigation: Vulnerable: Minimal generic ASM retpoline
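The dmesg buffer can be overwritten on a long-running server, so a second way to check is sketched below. It assumes the kernel exposes the /sys/devices/system/cpu/vulnerabilities directory, which was added alongside these patches; older or unpatched kernels will not have it. The show_mitigations function name is hypothetical.

```shell
#!/bin/sh
# Print the mitigation status the kernel reports through sysfs.
# The optional argument overrides the directory to read from.
show_mitigations() {
    vuln_dir="${1:-/sys/devices/system/cpu/vulnerabilities}"
    for name in meltdown spectre_v1 spectre_v2; do
        if [ -r "$vuln_dir/$name" ]; then
            printf '%s: %s\n' "$name" "$(cat "$vuln_dir/$name")"
        else
            printf '%s: not reported by this kernel\n' "$name"
        fi
    done
}

show_mitigations
```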

The Test

Our test involved scrubbing a pool (basically a full integrity check of the data) on one of our internal backup servers.
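For readers who want to repeat the test, a minimal sketch of starting a scrub and waiting for it to finish follows. The pool name "tank" and the function names are placeholders, not the names used on our server, and the loop simply polls zpool status once a minute.

```shell
#!/bin/sh
# Succeeds while the `zpool status` text on stdin reports a running scrub.
scrub_in_progress() {
    grep -q 'scrub in progress'
}

# Start a scrub on the given pool, wait for it to finish, then print the
# "scan:" summary line from zpool status.
run_scrub() {
    pool="$1"
    zpool scrub "$pool" || return 1
    while zpool status "$pool" | scrub_in_progress; do
        sleep 60
    done
    zpool status "$pool" | grep 'scan:'
}

# Usage (placeholder pool name): run_scrub tank
```

The final "scan:" line includes the elapsed time of the scrub, which is the figure to compare between runs.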

The storage disks are 12 x 3.0 TB WD Red WD30EFRX. The pool layout is RAID-Z2 with a total of 20.7 TB of data.

All incoming traffic to the server was disabled while the test was running, so the scrub was the only IO on the pool.

Linux kernel 4.14.14 was used with ZFS on Linux 0.7.5.

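To disable KPTI, add the "nopti" option to the kernel command line in GRUB:

# -p avoids an error if the directory already exists
mkdir -p /etc/default/grub.d
# Append nopti to any existing kernel options rather than replacing them
echo 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX nopti"' > /etc/default/grub.d/disable-kpti.cfg
update-grub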

Then reboot the server.
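After the reboot it is worth confirming that the option reached the kernel. The sketch below looks for "nopti" as a whole word in /proc/cmdline. The function name is hypothetical and the optional file argument exists only to make the check easy to try against a sample file.

```shell
#!/bin/sh
# Succeeds if "nopti" appears as a separate option on the kernel command line.
kpti_disabled_on_cmdline() {
    cmdline_file="${1:-/proc/cmdline}"
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'nopti'
}

if kpti_disabled_on_cmdline; then
    echo "nopti is on the kernel command line"
else
    echo "nopti is not on the kernel command line"
fi
```

With the option in place, the dmesg check for 'page tables' should no longer report isolation as enabled.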

Results

Each scrub took a little under nine and a half hours to complete.

The read values are in MB/s.

Test       Time (hh:mm:ss)   Read Peak (MB/s)   Read Avg (MB/s)
KPTI off   09:22:10          1695.33            645.81
KPTI on    09:29:02          1710.32            635.89

[Graph: KPTI off. An extra 7 minutes of idle IO is shown at the end so that the x-axis scale is the same in both graphs.]

[Graph: KPTI on.]

Conclusion

With KPTI enabled, the average read speed was 1.54% slower.

This increased the time to complete the scrub by nearly 7 minutes.
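The two figures above can be reproduced from the results table with a little arithmetic. The helper names in this sketch are made up for illustration; the input values are the ones in the table.

```shell
#!/bin/sh
# Convert hh:mm:ss to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Percentage by which the second value is lower than the first.
percent_slower() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", (base - new) / base * 100 }'
}

off_secs=$(to_seconds 09:22:10)   # KPTI off
on_secs=$(to_seconds 09:29:02)    # KPTI on
extra=$((on_secs - off_secs))

echo "Average read speed drop: $(percent_slower 645.81 635.89)%"
echo "Extra scrub time: $((extra / 60))m $((extra % 60))s"
```

This prints a drop of 1.54% and an extra scrub time of 6m 52s.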

KPTI is easily disabled should you find the performance drop unacceptable.

For now, we are going to enable the patches on our internal production servers.