Posted on 14 November 2016

Improving transfer speeds for ZFS send/receive in a local network

[Screenshot: atop running during the transfer]

I recently had to send a couple of ZFS file systems from one server to another, both running Ubuntu Linux 16.04. Since both servers were in the same trusted local network, there was no need for encryption. As various other websites will tell you, the fastest way to transfer data over a local network is with the netcat (nc) utility. Since nc doesn’t encrypt anything (like SSH would) and only acts as a simple pass-through, this should indeed give the best performance.

On the receiving end (the backup machine), I ran

nc -l 8000 | \
  mbuffer -q -s 128k -m 1G | \
  pv -rtab | \
  sudo zfs receive -vF backups/home

Here the -l option to nc tells it to listen on port 8000. In addition to nc I also added mbuffer [1] to act as a buffering layer. The -s option sets the block size of the buffer; given that ZFS’ default block size is 128 kB, I chose the same size. The -m option sets the buffer size. The 1 GB I specified is probably a bit big, but since both machines had ample RAM, this was of no concern. The pv tool displays the progress of the data transfer: the -r option shows the rate of transfer, the -t option shows the transfer time, the -a option shows the average transfer rate, and the -b option shows the total number of bytes sent.

On the sending side, I started the following to send an incremental series of snapshots:

sudo zfs send -I storage/home@2016-06-01_monthly storage/home@2016-09-01_monthly | \
   mbuffer -q -s 128k -m 1G | \
   pv -b | \
   nc 192.168.20.3  8000

I explicitly specified the IP address rather than the host name of the receiving server because, in addition to the normal gigabit network connection via a switch, both servers also have a direct 10 Gb/s link (two Mellanox ConnectX EN PCIe 2.0 cards). I wanted to make sure that this link was used for the transfer.
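A quick way to check which interface the kernel will actually use for a given address is ip route get (the interface name eth3 below is just the one used elsewhere in this post; yours may differ):

$ ip route get 192.168.20.3

The dev field in the output should name the 10 Gb/s interface; if it names the gigabit interface instead, the routing needs fixing before starting the transfer.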

I then started atop to observe how the transfer proceeded. This was somewhat disappointing, because transfer rates were stuck just above gigabit speeds, roughly between 990 Mbps and 1200 Mbps. However, on the sending end, CPU usage of nc was close to or at 100%. Since this is a rather low-spec machine with an old dual-core Intel Core i3 2100 CPU, this may not have been too surprising. A closer look at atop’s output showed that a lot of CPU time was actually spent servicing IRQs. So I needed to find ways to reduce the number of interrupts sent by the network card.
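One way to watch this more directly is to keep an eye on the card’s interrupt counters in /proc/interrupts (a rough check; how the IRQ lines are labelled depends on the driver, and the Mellanox mlx4 driver may label them per queue rather than with the interface name):

$ watch -n 1 "grep -E 'eth3|mlx4' /proc/interrupts"

Rapidly climbing counters on the sending machine confirm that interrupt handling is where the CPU time goes.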

Because I know from previous experience that optimising TCP/IP parameters can have a dramatic effect on transfer rates and CPU usage, I decided to see what I could optimise on this setup. The first thing that comes to mind is to increase the MTU of both network cards. The default is 1500, so I set it to 9000 on both cards:

sudo ip link set dev eth3 mtu 9000

This increased transfer rates to roughly 1200 to 1400 Mbps and reduced the CPU usage of nc a bit.
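Jumbo frames only help if both ends (and anything in between) accept them, so it is worth confirming that a full-size frame actually gets through. With 28 bytes of IP and ICMP headers, a non-fragmenting ping with an 8972-byte payload should succeed:

$ ping -M do -s 8972 -c 3 192.168.20.3

If ping complains that the message is too long, one of the interfaces is still running with the smaller MTU.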

Next, I looked at the transmit and receive buffers of the network cards:

$ ethtool -g eth3
Ring parameters for eth3:
Pre-set maximums:
RX:             8192
RX Mini:        0
RX Jumbo:       0
TX:             8192
Current hardware settings:
RX:             1024
RX Mini:        0
RX Jumbo:       0
TX:             512

Increasing these buffers should also reduce the number of interrupts:

$ sudo ethtool -G eth3 rx 2048 tx 2048
$ ethtool -g eth3
Ring parameters for eth3:
Pre-set maximums:
RX:             8192
RX Mini:        0
RX Jumbo:       0
TX:             8192
Current hardware settings:
RX:             2048
RX Mini:        0
RX Jumbo:       0
TX:             2048

This led to an increase in transfer rate to approximately 1800–2500 Mbps according to atop, and the CPU usage of nc now hovered between 30% and 91%. Moreover, atop showed that by this time the three disks comprising the RAIDZ1 volume on the sending server had become the bottleneck. The disks spent all their time reading the data, so there was no point in optimising the transfer rate even further.
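To confirm a disk bottleneck like this without atop, the extended statistics from iostat (part of the sysstat package) tell the same story: a %util column close to 100 for the pool’s member disks means they simply cannot read any faster.

$ iostat -dxm 5

The device names reported are of course specific to the sending server’s RAIDZ1 members.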

By this time, the output of pv on the receiving server showed me transfer peaks of about 240 MB/s, with an average transfer rate of 173 MB/s after 38 minutes of transfer time and 390 GiB transferred.

So the bottom line is: there is always a bottleneck; however, some bottlenecks are more easily overcome than others. By changing just two parameters, I roughly doubled the transfer rate for several TB of data.

Footnotes

  1. One thing I probably should have tried, in hindsight, is to remove nc from the equation altogether and let mbuffer handle the network transfer itself (see also here).
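     For reference, such an nc-less setup could look roughly like the following (a sketch only, based on mbuffer’s built-in network mode, where -I listens on a port and -O connects to a host and port; I did not test this). On the receiving side:

         mbuffer -I 8000 -q -s 128k -m 1G | pv -rtab | sudo zfs receive -vF backups/home

     and on the sending side:

         sudo zfs send -I storage/home@2016-06-01_monthly storage/home@2016-09-01_monthly | \
           pv -b | \
           mbuffer -q -s 128k -m 1G -O 192.168.20.3:8000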
