You are on page 1of 13

www.mellanox.

com
Performance Tuning Guidelines for
Mellanox Network Adapters
Mellanox Technologies
350 Oakmead Parkway, Suite 100
Sunnyvale, CA 94085
U.S.A.
www.mellanox.com
Tel: (408) 970-3400
Fax: (408) 970-3403
Mellanox Technologies, Ltd.
PO Box 586 Hermon Building
Yokneam 20692
Israel
Tel: +972-4-909-7200
Fax: +972-4-959-3245
Copyright 2010. Mellanox Technologies, Inc. All Rights Reserved.
Mellanox, BridgeX, ConnectX, InfiniBlast, InfiniBridge, InfiniHost, InfiniRISC, InfiniScale, InfiniPCI, PhyX
and Virtual Protocol Interconnect are registered trademarks of Mellanox Technologies, Ltd.
CORE-Direct and FabricIT are trademarks of Mellanox Technologies, Ltd.
All other marks and names mentioned herein may be trademarks of their respective companies.
Document Number: 3368
Rev 1.0
Mellanox Technologies 2
NOTE:
THIS INFORMATION IS PROVIDED BY MELLANOX FOR INFORMATIONAL PURPOSES
ONLY AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIM-
ITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL MELLANOX BE LIA-
BLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSE-
QUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS HARDWARE, EVEN
IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Performance Tuning Guidelines for Mellanox Network Adapters Rev 1.0
Mellanox Technologies 3
Table of Contents
Revision History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1 Relevant Mellanox Drivers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Chapter 2 General System Configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1 PCI Express (PCIe) Capabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 BIOS Power Management Settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3 Intel Hyper-Threading Technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Chapter 3 Performance Tuning for Linux. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.1 Tuning the Network Adapter for Improved IPv4 Traffic Performance. . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Tuning the Network Adapter for Improved IPv6 Traffic Performance. . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.3 Interrupt Moderation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.4 Interrupt Affinity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.4.1 Example: Script for Setting Interrupt Affinity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.5 Preserving Your Performance Settings After A Reboot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Chapter 4 Performance Tuning for Windows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.1 Warning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.2 Tuning the Network Adapter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.3 Tuning Windows Server 2003. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.3.1 Registry Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.3.2 Windows Firewall. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.4 Tuning for Windows Server 2008 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.4.1 Registry Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.4.2 Enable RSS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.4.3 Windows Firewall and Filtering Engine. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Rev 1.0
Mellanox Technologies 4
Revision History
Table 1 - Revision History
Revision No. Date Description
1.0 November 2010 First release
Performance Tuning Guidelines for Mellanox Network Adapters Rev 1.0
Mellanox Technologies 5
1 Introduction
Depending on the application of the user's system, it may be necessary to modify the default con-
figuration of network adapters based on the ConnectX or ConnectX-2 VPI / Ethernet adapters.
This document describes important tuning parameters and settings that can result in improved per-
formance for Mellanox drivers. Each setting, along with its potential effect, is described to help in
making an informed judgment concerning its relevance to the user's system, the system workload,
and the performance goals.
1.1 Relevant Mellanox Drivers
The tuning guidelines described in this document apply to the following Mellanox Softwar drivers:
On Linux: Mellanox Ethernet Driver MLNX_EN for Linux version 1.5.1.4
On Windows: Mellanox OFED for Windows MLNX_VPI version 2.1.2
General System Configurations Rev 1.0
Mellanox Technologies 6
2 General System Configurations
The following sections describe recommended configurations for system components and/or inter-
faces. Different systems may have different features, thus some recommendations below may not
be applicable.
2.1 PCI Express (PCIe) Capabilities
Note: For VPI / Ethernet adapters with ports configured to run 40Gb/s or above, it is recom-
mended to use an x16 PCIe slot to benefit from the additional buffers allocated by the
system.
2.2 BIOS Power Management Settings
Set BIOS power management to Maximum Performance
On Intel Processors Only: Disable C-states of PCI Express
Note that these performance optimizations may result in higher power consumption.
2.3 Intel Hyper-Threading Technology
Note: This section applies to Intel processors only supporting Hyper-Threading.
For latency and message rate sensitive applications, it is recommended to disable Hyper-Thread-
ing.
Table 2 - Recommended PCIe Configuration
PCIe Generation 2.0
Speed 5GT/s
Width x8
Max Payload size 256
Max Read Request 512
Performance Tuning Guidelines for Mellanox Network Adapters Rev 1.0
Mellanox Technologies 7
3 Performance Tuning for Linux
You can use the Linux sysctl command to modify default system network parameters that are set
by the operating system in order to improve IPv4 and IPv6 traffic performance. Note, however,
that changing the network parameters may yield different results on different systems. The results
are significantly dependent on the CPU and chipset efficiency.
3.1 Tuning the Network Adapter for Improved IPv4 Traffic Performance
The following changes are recommended for improving IPv4 traffic performance:
Disable the TCP timestamps option for better CPU utilization:
sysctl -w net.ipv4.tcp_timestamps=0
Disable the TCP selective acks option for better CPU utilization:
sysctl -w net.ipv4.tcp_sack=0
Increase the maximum length of processor input queues:
sysctl -w net.core.netdev_max_backlog=250000
Increase the TCP maximum and default buffer sizes using setsockopt():
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.core.rmem_default=16777216
sysctl -w net.core.wmem_default=16777216
sysctl -w net.core.optmem_max=16777216
Increase memory thresholds to prevent packet dropping:
sysctl -w net.ipv4.tcp_mem="16777216 16777216 16777216"
Increase Linuxs auto-tuning of TCP buffer limits. The minimum, default, and maximum num-
ber of bytes to use are:
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
3.2 Tuning the Network Adapter for Improved IPv6 Traffic Performance
The following changes are recommended for improving IPv6 traffic performance:
Disable the TCP timestamps option for better CPU utilization:
sysctl -w net.ipv4.tcp_timestamps=0
Disable the TCP selective acks option for better CPU utilization:
sysctl -w net.ipv4.tcp_sack=0
Performance Tuning for Linux Rev 1.0
Mellanox Technologies 8
3.3 Interrupt Moderation
Interrupt moderation is used to decrease the frequency of network adapter interrupts to the CPU.
Mellanox network adapters use an adaptive interrupt moderation algorithm by default. The algo-
rithm checks the transmission (Tx) and receive (Rx) packet rates and modifies the Rx interrupt
moderation settings accordingly.
To manually set Tx and/or Rx interrupt moderation, use the ethtool utility. For example, the fol-
lowing commands first show the current (default) setting of interrupt moderation on the interface
eth1, then turns off Rx interrupt moderation, and last shows the new setting.
> ethtool -c eth1
Coalesce parameters for eth1:
Adaptive RX: on TX: off
...
pkt-rate-low: 400000
pkt-rate-high: 450000
rx-usecs: 16
rx-frames: 88
rx-usecs-irq: 0
rx-frames-irq: 0
...
> ethtool -C eth1 adaptive-rx off rx-usecs 0 rx-frames 0
> ethtool -c eth1
Coalesce parameters for eth1:
Adaptive RX: off TX: off
...
pkt-rate-low: 400000
pkt-rate-high: 450000
rx-usecs: 0
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0
...
3.4 Interrupt Affinity
The affinity of an interrupt is defined as the set of processor cores that service that interrupt. To
improve application scalability and latency, it is recommended to distribute interrupt requests
(IRQs) between the available processor cores. To prevent the Linux IRQ balancer application from
interfering with the interrupt affinity scheme, the IRQ balancer must be turned off.
Performance Tuning Guidelines for Mellanox Network Adapters Rev 1.0
Mellanox Technologies 9
The following command turns off the IRQ balancer:
> /etc/init.d/irqbalancer stop
The following command assigns the affinity of a single interrupt vector:
> echo <hexadecimal bit mask> > /proc/irq/<irq vector>/smp_affinity
where bit i in <hexadecimal bit mask> indicates whether processor core i is in
<irq vector>s affinity or not.
3.4.1 Example: Script for Setting Interrupt Affinity
Note: On systems that support NUMA, it is recommended to set IRQs from different network
devices to processor cores that reside on different physical CPU sockets.
#! /bin/bash
CORES=$((`cat /proc/cpuinfo | grep processor | tail -1 | awk '{print $3}'`+1))
limit=1
while [ $CORES -gt 0 ]
do
limit=$((limit*2))
CORES=$((CORES-1))
done
if [ -z $1 ]; then
IRQS=$(cat /proc/interrupts | grep eth-mlx | awk '{print $1}' |
sed 's/://')
else
IRQS=$(cat /proc/interrupts | grep $1 | awk '{print $1}' | sed 's/://')
fi
echo Discovered irqs: $IRQS
mask=1 ; for IRQ in $IRQS ; do echo $(printf "%x" $mask) > /proc/irq/$IRQ/
smp_affinity ; mask=$(( mask * 2)) ; if [ $mask -ge $limit ] ; then mask=1 ; fi
; done
echo irqs were set OK.
3.5 Preserving Your Performance Settings After A Reboot
To preserve your performance settings after a reboot, you need to add them to the file
/etc/sysctl.conf as follows:
<sysctl name1>=<value1>
<sysctl name2>=<value2>
<sysctl name3>=<value3>
<sysctl name4>=<value4>
Performance Tuning for Linux Rev 1.0
Mellanox Technologies 10
For example, Section 3.1, Tuning the Network Adapter for Improved IPv4 Traffic Performance
listed the following setting to disable the TCP timestamps option:
sysctl -w net.ipv4.tcp_timestamps=0
In order to keep the TCP timestamps option disabled after a reboot, add the following line to
/etc/sysctl.conf:
net.ipv4.tcp_timestamps=0
Performance Tuning Guidelines for Mellanox Network Adapters Rev 1.0
Mellanox Technologies 11
4 Performance Tuning for Windows
4.1 Warning
This document describes how to modify Windows registry parameters in order to
improve performance. Please note that modifying the registry incorrectly might
lead to serious problems, including the loss of data, system hang, and you may
need to reinstall Windows. As such it is recommended to back up the registry on
your system before implementing recommendations included in this document.
If the modifications you apply lead to serious problems, you will be able to
restore the original registry state. For more details about backing up and restor-
ing the registry, please visit www.microsoft.com.
4.2 Tuning the Network Adapter
To improve the network adapter performance, activate the performance tuning tool as follows:
1. Click start-->Control Panel.
2. Open Network Connections.
3. Right click on one of the entries Mellanox ConnectX 10Gbit Ethernet Adapter and select
Properties.
4. Select the Performance tab.
5. Click on General Tuning button.
Clicking the General Tuning button will change several registry entries (described below), and
will check for system services that may decrease network performance. It will also generate a log
including the applied changes.
Users can view this log to restore the previous values. The log path is:
%HOMEDRIVE%\Windows\System32\LogFiles\PerformanceTunning.log
This tuning is needed on one adapter only, and only once after the installation (as long as these
entries are not changed directly in the registry, or by some other installation or script).
Please note that a reboot may be required for the changes to take effect.
4.3 Tuning Windows Server 2003
4.3.1 Registry Tuning
The registry entries that may be added/changed by this General Tuning procedure are described
below.
Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters:
Increase the TCP buffer size for TCP connections:
TcpWindowSize, type REG_DWORD, value set to 512K.
Performance Tuning for Windows Rev 1.0
Mellanox Technologies 12
GlobalMaxWindowSize, type REG_DWORD, value set to 512K.
Enable the Window scaling option:
Tcp1323Opts, type REG_DWORD, value set to 1.
Disable TCP selective acks option for better CPU utilization:
SackOpts, type REG_DWORD, value set to 0.
Enable Receive Side Scaling:
EnableRss, type REG_DWORD, value set to 1.
RssBaseCpu, type REG_DWORD, value set to 1.
Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters:
Enable fast datagram sending for UDP traffic:
FastSendDatagramThreshold, type REG_DWORD, value set to 64K.
4.3.2 Windows Firewall
It is recommended to disable the following service in order to enable TCP segmentation offload:
Windows Firewall/Internet Connection Sharing (ICS)
4.4 Tuning for Windows Server 2008
4.4.1 Registry Tuning
The registry entries that may be added/changed by this General Tuning procedure are:
Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters:
Disable TCP selective acks option for better cpu utilization:
SackOpts, type REG_DWORD, value set to 0.
Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters:
Enable fast datagram sending for UDP traffic:
FastSendDatagramThreshold, type REG_DWORD, value set to 64K.
Under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Ndis\Parameters:
Set RSS parameters:
RssBaseCpu, type REG_DWORD, value set to 1.
4.4.2 Enable RSS
Enabling Receive Side Scaling (RSS) is performed by means of the following command:
netsh int tcp set global rss = enabled
Performance Tuning Guidelines for Mellanox Network Adapters Rev 1.0
Mellanox Technologies 13
4.4.3 Windows Firewall and Filtering Engine
It is recommended to disable the following services:
Windows Firewall (MpsSvc)
Base Filtering Engine (BFE)

You might also like