.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
.. include:: <isonum.txt>

=======
Devlink
=======

:Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Contents
========

- `Info`_
- `Parameters`_
- `Health reporters`_

Info
====

The devlink info command reports the running and stored firmware versions of
the device. It also prints the device PSID, which represents the HCA board
type ID.

User command example::

   $ devlink dev info pci/0000:00:06.0
      pci/0000:00:06.0:
      driver mlx5_core
      versions:
         fixed:
            fw.psid MT_0000000009
         running:
            fw.version 16.26.0100
         stored:
            fw.version 16.26.0100

Parameters
==========

flow_steering_mode: Device flow steering mode
---------------------------------------------
The flow steering mode parameter controls the flow steering mode of the driver.
Two modes are supported:

1. 'dmfs' - Device managed flow steering.
2. 'smfs' - Software/Driver managed flow steering.

In DMFS mode, the HW steering entities are created and managed through the
firmware.
In SMFS mode, the HW steering entities are created and managed by the driver
directly in hardware, without firmware intervention.

SMFS mode is faster and provides a better rule insertion rate than the default
DMFS mode.

User command examples:

- Set SMFS flow steering mode::

    $ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime

- Read device flow steering mode::

    $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
      pci/0000:06:00.0:
      name flow_steering_mode type driver-specific
      values:
         cmode runtime value smfs

enable_roce: RoCE enablement state
----------------------------------
If the device supports RoCE disablement, RoCE enablement state controls device
support for RoCE capability. Otherwise, the control occurs in the driver stack.
When RoCE is disabled at the driver level, only raw Ethernet QPs are supported.

To change the RoCE enablement state, a user must change the driverinit cmode
value and run devlink reload.

User command examples:

- Disable RoCE::

    $ devlink dev param set pci/0000:06:00.0 name enable_roce value false cmode driverinit
    $ devlink dev reload pci/0000:06:00.0

- Read RoCE enablement state::

    $ devlink dev param show pci/0000:06:00.0 name enable_roce
      pci/0000:06:00.0:
      name enable_roce type generic
      values:
         cmode driverinit value true

esw_port_metadata: Eswitch port metadata state
----------------------------------------------
When applicable, disabling eswitch metadata can increase packet rate
up to 20% depending on the use case and packet sizes.

Eswitch port metadata state controls whether to internally tag packets with
metadata. Metadata tagging must be enabled for multi-port RoCE, failover
between representors and stacked devices.
By default, metadata is enabled on the supported devices in E-switch.
Metadata is applicable only for E-switch in switchdev mode, and
users may disable it when NONE of the use cases below will be in use:

1. HCA is in dual/multi-port RoCE mode.
2. VF/SF representor bonding (usually used for live migration).
3. Stacked devices.

When metadata is disabled, the above use cases will fail to initialize if
users try to enable them.
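
A minimal shell sketch of checking the current metadata state before enabling
one of the use cases above. ``param_value`` is a hypothetical helper, not part
of devlink; it only parses text laid out like the ``devlink dev param show``
output shown in this document, and the sample input stands in for real command
output.

```shell
#!/bin/sh
# Hypothetical helper (not part of devlink): read `devlink dev param show`
# output on stdin and print the value reported for the requested cmode.
param_value() {
    # $1: cmode to look for, e.g. "runtime" or "driverinit"
    awk -v cmode="$1" '
        $1 == "cmode" && $2 == cmode && $3 == "value" { print $4; exit }
    '
}

# Sample text in the layout of the esw_port_metadata example in this
# document; on a real system, pipe the devlink command output in instead.
sample='pci/0000:06:00.0:
  name esw_port_metadata type driver-specific
    values:
      cmode runtime value true'

# Prints the runtime value found in the sample.
printf '%s\n' "$sample" | param_value runtime
```

On a live system the helper could be fed from the command directly, for
example ``devlink dev param show pci/0000:06:00.0 name esw_port_metadata | param_value runtime``.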

- Show eswitch port metadata::

    $ devlink dev param show pci/0000:06:00.0 name esw_port_metadata
      pci/0000:06:00.0:
        name esw_port_metadata type driver-specific
          values:
            cmode runtime value true

- Disable eswitch port metadata::

    $ devlink dev param set pci/0000:06:00.0 name esw_port_metadata value false cmode runtime

- Change eswitch mode to switchdev mode after choosing the metadata value::

    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

Health reporters
================

tx reporter
-----------
The tx reporter is responsible for reporting and recovering from the following
two error scenarios:

- tx timeout
    Report on kernel tx timeout detection.
    Recover by searching for lost interrupts.
- tx error completion
    Report on error tx completion.
    Recover by flushing the tx queue and resetting it.

The tx reporter also supports an on-demand diagnose callback, which provides
real-time information about the status of its send queues.

User command examples:

- Diagnose send queues status::

    $ devlink health diagnose pci/0000:82:00.0 reporter tx

NOTE: This command has valid output only when the interface is up. Otherwise,
the command has empty output.

- Show the number of tx errors indicated, the number of recover flows that
  ended successfully, whether autorecover is enabled, and the grace period
  since the last recover::

    $ devlink health show pci/0000:82:00.0 reporter tx

rx reporter
-----------
The rx reporter is responsible for reporting and recovering from the following
two error scenarios:

- rx queues' initialization (population) timeout
    Population of rx queues' descriptors on ring initialization is done
    in napi context via triggering an irq. In case of a failure to get
    the minimum amount of descriptors, a timeout would occur, and
    descriptors could be recovered by polling the EQ (Event Queue).
- rx completions with errors (reported by HW on interrupt context)
    Report on rx completion error.
    Recover (if needed) by flushing the related queue and resetting it.

The rx reporter also supports an on-demand diagnose callback, which
provides real-time information about the status of its receive queues.

- Diagnose rx queues' status and corresponding completion queue::

    $ devlink health diagnose pci/0000:82:00.0 reporter rx

NOTE: This command has valid output only when the interface is up. Otherwise,
the command has empty output.

- Show the number of rx errors indicated, the number of recover flows that
  ended successfully, whether autorecover is enabled, and the grace period
  since the last recover::

    $ devlink health show pci/0000:82:00.0 reporter rx

fw reporter
-----------
The fw reporter implements `diagnose` and `dump` callbacks.
It follows symptoms of fw error, such as fw syndrome, by triggering a
fw core dump and storing it in the dump buffer.
The fw reporter diagnose command can be triggered at any time by the user to
check the current fw status.

User command examples:

- Check fw health status::

    $ devlink health diagnose pci/0000:82:00.0 reporter fw

- Read the FW core dump if already stored, or trigger a new one::

    $ devlink health dump show pci/0000:82:00.0 reporter fw

NOTE: This command can run only on the PF which has fw tracer ownership;
running it on another PF or on any VF returns "Operation not permitted".

fw fatal reporter
-----------------
The fw fatal reporter implements `dump` and `recover` callbacks.
It follows fatal error indications with a CR-space dump and a recover flow.
The CR-space dump uses the vsc interface, which is valid even if the FW command
interface is not functional, which is the case in most FW fatal errors.
The recover function runs a recover flow which reloads the driver and triggers
a fw reset if needed.
On firmware error, the health buffer is dumped into dmesg. The log
level is derived from the error's severity (given in the health buffer).

User command examples:

- Run fw recover flow manually::

    $ devlink health recover pci/0000:82:00.0 reporter fw_fatal

- Read the FW CR-space dump if already stored, or trigger a new one::

    $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal

NOTE: This command can run only on a PF.
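
The four reporters described above can be queried in one pass. The following
is a minimal sketch; ``health_show_cmds`` is a hypothetical helper that only
prints the ``devlink health show`` commands for the reporter names used in
this document, so nothing is executed unless its output is piped to a shell.

```shell
#!/bin/sh
# Reporter names as used in this document.
REPORTERS="tx rx fw fw_fatal"

# Hypothetical helper: print one `devlink health show` command per reporter
# for the given device handle. It does not run devlink itself.
health_show_cmds() {
    dev="$1"
    for r in $REPORTERS; do
        printf 'devlink health show %s reporter %s\n' "$dev" "$r"
    done
}

# Dry run: list the commands for the device used in the examples above.
health_show_cmds pci/0000:82:00.0
```

To actually run the commands on a system with an mlx5 device, the output can
be piped to a shell, for example ``health_show_cmds pci/0000:82:00.0 | sh``.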