.. SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
.. include:: <isonum.txt>

=======
Devlink
=======

:Copyright: |copy| 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Contents
========

- `Info`_
- `Parameters`_
- `Health reporters`_

Info
====

The devlink info command reports the running and stored firmware versions of
the device. It also prints the device PSID, which represents the HCA board type ID.

User command example::

   $ devlink dev info pci/0000:00:06.0
      pci/0000:00:06.0:
      driver mlx5_core
      versions:
         fixed:
            fw.psid MT_0000000009
         running:
            fw.version 16.26.0100
         stored:
            fw.version 16.26.0100
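
As a rough illustration of how the info output might be consumed by a script,
the sketch below compares the running and stored ``fw.version`` entries. The
helper name ``fw_versions_match`` is hypothetical and not part of devlink or
the driver, and the parsing assumes the plain-text layout shown in the example
above.

```shell
# Hypothetical helper (not part of devlink): reads `devlink dev info` text
# on stdin and succeeds only if running and stored fw.version are identical.
fw_versions_match() {
    awk '
        $1 == "fixed:" || $1 == "running:" || $1 == "stored:" {
            section = $1            # remember which versions block we are in
            next
        }
        $1 == "fw.version" { ver[section] = $2 }
        END {
            # Append "" to force a string comparison of the version fields.
            r = ver["running:"] ""
            s = ver["stored:"] ""
            exit((r != "" && r == s) ? 0 : 1)
        }
    '
}
```

For the output shown above, ``devlink dev info pci/0000:00:06.0 | fw_versions_match``
would return success, because both versions are 16.26.0100.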

Parameters
==========

flow_steering_mode: Device flow steering mode
---------------------------------------------
The flow steering mode parameter controls the flow steering mode of the driver.
Two modes are supported:

1. 'dmfs' - Device managed flow steering.
2. 'smfs' - Software/Driver managed flow steering.

In DMFS mode, the HW steering entities are created and managed through the
firmware.
In SMFS mode, the HW steering entities are created and managed by the driver,
which writes them directly to hardware without firmware intervention.

SMFS mode is faster and provides a better rule insertion rate than the default
DMFS mode.

User command examples:

- Set SMFS flow steering mode::

    $ devlink dev param set pci/0000:06:00.0 name flow_steering_mode value "smfs" cmode runtime

- Read device flow steering mode::

    $ devlink dev param show pci/0000:06:00.0 name flow_steering_mode
      pci/0000:06:00.0:
      name flow_steering_mode type driver-specific
      values:
         cmode runtime value smfs
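
A script may want to read the current mode before changing it. The following
is a minimal sketch, assuming the text layout of ``devlink dev param show``
shown above; ``param_value`` is a hypothetical helper name, not a devlink
subcommand.

```shell
# Hypothetical helper: print the value reported for one cmode ($1) from
# `devlink dev param show` text read on stdin; fail if the cmode is absent.
param_value() {
    awk -v want="$1" '
        $1 == "cmode" && $2 == want && $3 == "value" {
            print $4
            found = 1
            exit                    # jumps to END, which sets the status
        }
        END { exit(found ? 0 : 1) }
    '
}
```

With the output shown above,
``devlink dev param show pci/0000:06:00.0 name flow_steering_mode | param_value runtime``
would print ``smfs``.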

enable_roce: RoCE enablement state
----------------------------------
If the device supports RoCE disablement, the RoCE enablement state controls
device support for the RoCE capability. Otherwise, the control occurs in the
driver stack. When RoCE is disabled at the driver level, only raw Ethernet QPs
are supported.

To change the RoCE enablement state, a user must change the driverinit cmode
value and run devlink reload.

User command examples:

- Disable RoCE::

    $ devlink dev param set pci/0000:06:00.0 name enable_roce value false cmode driverinit
    $ devlink dev reload pci/0000:06:00.0

- Read RoCE enablement state::

    $ devlink dev param show pci/0000:06:00.0 name enable_roce
      pci/0000:06:00.0:
      name enable_roce type generic
      values:
         cmode driverinit value true
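
Because the new value only takes effect after a reload, the two commands are
normally issued together. The sketch below prints the pair for review instead
of running it; ``roce_set_cmds`` is a hypothetical helper name, and the
commands it emits are the ones from the example above.

```shell
# Hypothetical helper: print (do not run) the two commands needed to change
# the RoCE enablement state of device $1 to $2 ("true" or "false").
roce_set_cmds() {
    dev=$1 state=$2
    case $state in
        true|false) ;;
        *) echo "usage: roce_set_cmds <dev> true|false" >&2; return 2 ;;
    esac
    # The value is stored for driverinit and is only applied by the reload.
    printf 'devlink dev param set %s name enable_roce value %s cmode driverinit\n' "$dev" "$state"
    printf 'devlink dev reload %s\n' "$dev"
}
```

The printed lines can be inspected first and then piped to ``sh`` once they
look right, for example ``roce_set_cmds pci/0000:06:00.0 false | sh``.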

esw_port_metadata: Eswitch port metadata state
----------------------------------------------
When applicable, disabling eswitch metadata can increase packet rate
by up to 20%, depending on the use case and packet sizes.

Eswitch port metadata state controls whether to internally tag packets with
metadata. Metadata tagging must be enabled for multi-port RoCE, failover
between representors, and stacked devices.
By default, metadata is enabled on the supported devices in E-switch.
Metadata is applicable only for E-switch in switchdev mode, and
users may disable it when NONE of the use cases below will be in use:

1. HCA is in dual/multi-port RoCE mode.
2. VF/SF representor bonding (usually used for live migration).
3. Stacked devices.

When metadata is disabled, the above use cases will fail to initialize if
users try to enable them.

- Show eswitch port metadata::

    $ devlink dev param show pci/0000:06:00.0 name esw_port_metadata
      pci/0000:06:00.0:
        name esw_port_metadata type driver-specific
          values:
            cmode runtime value true

- Disable eswitch port metadata::

    $ devlink dev param set pci/0000:06:00.0 name esw_port_metadata value false cmode runtime

- Change eswitch mode to switchdev mode after choosing the metadata value::

    $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
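
The rule above (metadata may be disabled only when none of the three use cases
is needed, and the value is chosen before entering switchdev mode) can be
sketched as a small command generator. ``esw_switchdev_cmds`` and its yes/no
arguments are hypothetical; the function only prints the commands from the
examples above and does not run them.

```shell
# Hypothetical helper: $1 is the device; the next three arguments are
# "yes"/"no" answers to: multi-port RoCE? representor bonding? stacked devices?
esw_switchdev_cmds() {
    dev=$1 mp_roce=$2 rep_bond=$3 stacked=$4
    if [ "$mp_roce" = no ] && [ "$rep_bond" = no ] && [ "$stacked" = no ]; then
        # None of the listed use cases is needed, so metadata may be disabled.
        printf 'devlink dev param set %s name esw_port_metadata value false cmode runtime\n' "$dev"
    fi
    # The metadata value is chosen before the eswitch enters switchdev mode.
    printf 'devlink dev eswitch set %s mode switchdev\n' "$dev"
}
```

If any answer is anything other than ``no``, only the switchdev command is
printed and the metadata parameter is left at its current value.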

Health reporters
================

tx reporter
-----------
The tx reporter is responsible for reporting and recovering from the following two error scenarios:

- tx timeout
    Report on kernel tx timeout detection.
    Recover by searching for lost interrupts.
- tx error completion
    Report on error tx completion.
    Recover by flushing the tx queue and resetting it.

The tx reporter also supports an on-demand diagnose callback, which provides
real-time information on the status of its send queues.

User command examples:

- Diagnose send queues status::

    $ devlink health diagnose pci/0000:82:00.0 reporter tx

NOTE: This command has valid output only when the interface is up. Otherwise, the command has empty output.

- Show the number of tx errors indicated, the number of recover flows that
  ended successfully, whether autorecover is enabled, and the grace period
  since the last recover::

    $ devlink health show pci/0000:82:00.0 reporter tx
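
The counters and settings listed above can be picked out of the show output
with a small filter. The exact text layout of ``devlink health show`` differs
between iproute2 versions, so this sketch only assumes that each item is
printed as a name followed by its value on the same line. ``health_field`` is
a hypothetical helper name, and field names such as ``grace_period`` or
``auto_recover`` are assumptions that should be checked against the local
output.

```shell
# Hypothetical helper: print the token that follows the first occurrence of
# the name given in $1 in `devlink health show` text read on stdin.
health_field() {
    awk -v key="$1" '
        {
            for (i = 1; i < NF; i++)
                if ($i == key) { print $(i + 1); found = 1; exit }
        }
        END { exit(found ? 0 : 1) }     # fail if the name never appeared
    '
}
```

Assuming the output contains a line such as
``state healthy error 0 recover 0 grace_period 500 auto_recover true``, then
``devlink health show pci/0000:82:00.0 reporter tx | health_field grace_period``
would print ``500``.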

rx reporter
-----------
The rx reporter is responsible for reporting and recovering from the following two error scenarios:

- rx queues' initialization (population) timeout
    Population of rx queues' descriptors on ring initialization is done
    in NAPI context by triggering an IRQ. If the minimum number of
    descriptors is not obtained, a timeout occurs, and the
    descriptors can be recovered by polling the EQ (Event Queue).
- rx completions with errors (reported by HW in interrupt context)
    Report on rx completion error.
    Recover (if needed) by flushing the related queue and resetting it.

The rx reporter also supports an on-demand diagnose callback, which
provides real-time information on the status of its receive queues.

- Diagnose rx queues' status and corresponding completion queue::

    $ devlink health diagnose pci/0000:82:00.0 reporter rx

NOTE: This command has valid output only when the interface is up. Otherwise, the command has empty output.

- Show the number of rx errors indicated, the number of recover flows that
  ended successfully, whether autorecover is enabled, and the grace period
  since the last recover::

    $ devlink health show pci/0000:82:00.0 reporter rx

fw reporter
-----------
The fw reporter implements `diagnose` and `dump` callbacks.
It tracks symptoms of fw errors, such as a fw syndrome, by triggering a
fw core dump and storing it in the dump buffer.
The fw reporter diagnose command can be triggered at any time by the user to
check the current fw status.

User command examples:

- Check fw health status::

    $ devlink health diagnose pci/0000:82:00.0 reporter fw

- Read the FW core dump if already stored, or trigger a new one::

    $ devlink health dump show pci/0000:82:00.0 reporter fw

NOTE: This command can run only on the PF which has fw tracer ownership;
running it on another PF or any VF will return "Operation not permitted".

fw fatal reporter
-----------------
The fw fatal reporter implements `dump` and `recover` callbacks.
It responds to fatal error indications with a CR-space dump and a recover flow.
The CR-space dump uses the vsc interface, which is valid even if the FW command
interface is not functional, which is the case in most FW fatal errors.
The recover function runs a recover flow which reloads the driver and triggers
a fw reset if needed.
On firmware error, the health buffer is dumped to dmesg. The log
level is derived from the error's severity (given in the health buffer).

User command examples:

- Run fw recover flow manually::

    $ devlink health recover pci/0000:82:00.0 reporter fw_fatal

- Read the FW CR-space dump if already stored, or trigger a new one::

    $ devlink health dump show pci/0000:82:00.1 reporter fw_fatal

NOTE: This command can run only on a PF.
