Tuesday, 24 November 2009

VIO : Client path failover

Following a recent discussion with IBM L2 support, this post briefly explains the parameters that affect VIO client path failover.

# lsattr -El hdisk0
...
algorithm                    fail_over
hcheck_interval         60
hcheck_mode             nonactive
...

Currently, MPIO on the VIO client supports only failover from one VSCSI client adapter to another (the fail_over algorithm). Load balancing across multiple VSCSI client adapters is not supported.
The health check interval for each MPIO disk should be configured so that the path status is updated automatically. With hcheck_mode=nonactive, health check commands are sent down paths that have no active I/O, including paths in the "Failed" state. The hcheck_interval attribute defines how often, in seconds, the health check is performed. In the client partition, hcheck_interval for virtual SCSI devices defaults to 0, which means health checking is disabled.
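As a minimal sketch of the settings described above, the helper below builds the chdev command that enables health checking on a disk. The function name set_hcheck and the DRY_RUN variable are illustrative only and are not part of AIX. The -P flag is an assumption about the typical case: it records the change in the ODM so that it takes effect at the next reboot, which is usually needed when the disk is in use.

```shell
#!/bin/sh
# Hypothetical helper (not an AIX command): enable MPIO health checking on a disk.
# With DRY_RUN=1 (the default) the chdev command is only printed, not executed.
DRY_RUN=${DRY_RUN:-1}

set_hcheck() {
    disk=$1
    interval=${2:-60}   # seconds between health checks; 0 would disable them
    cmd="chdev -l $disk -a hcheck_interval=$interval -a hcheck_mode=nonactive -P"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

set_hcheck hdisk0 60
```

After the change is active, lsattr -El hdisk0 should show the new values, and lspath -l hdisk0 lists the state of each path.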

# lsattr -El vscsi2
vscsi_err_recov        fast_fail
vscsi_path_to            30


vscsi_path_to, when enabled, allows the virtual client adapter driver to determine the health of the VIO Server to improve and expedite path failover processing.
A value of 0 (default) disables it; any other value defines the number of seconds the VSCSI client adapter waits for commands issued to the VSCSI server adapter that have not yet been serviced. If that time is exceeded, the VSCSI client adapter retries the commands and waits up to 60 seconds before it fails the outstanding requests. An error is written to the error log and, if MPIO is used, another path to the disk is tried to service the requests. Therefore, this parameter should only be set in MPIO installations with dual VIO servers.

Similar to the attribute fc_error_recov for real FC adapters, the attribute vscsi_err_recov is used by the VSCSI adapter driver. When this parameter is set to fast_fail, the VIO client adapter sends a FAST_FAIL datagram to the VIO server, and the I/O is subsequently failed immediately rather than after a delay. This may help to improve MPIO failover.

vscsi_err_recov was added in AIX 5.3 TL9 (APAR IZ28537) and AIX 6.1 TL2 (APAR IZ28554).
It requires VIO server 2.1.
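
The two adapter attributes can be checked with a small script. The sketch below reads lsattr -El output on standard input and reports whether vscsi_path_to and vscsi_err_recov are set as described in this post. The function name check_vscsi is hypothetical, and the chdev line in the comments assumes the change is deferred to the next reboot with -P.

```shell
#!/bin/sh
# Hypothetical checker (not an AIX command): reads "lsattr -El <adapter>" output
# on stdin and reports whether the failover-related attributes are set.
check_vscsi() {
    awk '
        $1 == "vscsi_path_to"   { pto = $2 }
        $1 == "vscsi_err_recov" { rec = $2 }
        END {
            rc = 0
            if (pto == "" || pto == 0) { print "vscsi_path_to is disabled"; rc = 1 }
            if (rec != "fast_fail")    { print "vscsi_err_recov is not fast_fail"; rc = 1 }
            if (rc == 0) print "ok"
            exit rc
        }'
}

# Sample input taken from the lsattr output shown earlier in this post.
printf '%s\n' 'vscsi_err_recov        fast_fail' \
              'vscsi_path_to            30' | check_vscsi   # prints: ok

# On a VIO client:  lsattr -El vscsi2 | check_vscsi
# To set both values (active after reboot because of -P):
#   chdev -l vscsi2 -a vscsi_path_to=30 -a vscsi_err_recov=fast_fail -P
```

The function returns a non-zero exit status when either attribute deviates, so it can be used in a loop over all vscsi adapters of a client partition.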