Topology on SA1 ONC
Some assorted notes from looking into pitfalls when copying data between CUDA devices:
- Device topology,
NIC0 appears to be the InfiniBand adapter (mlx5_0 in the NIC legend)
[spbonc@exflong101 ~]$ nvidia-smi topo --matrix
        GPU0    GPU1    NIC0    CPU Affinity    NUMA Affinity
GPU0     X      SYS     SYS     0-31,64-95      0
GPU1    SYS      X      NODE    32-63,96-127    1
NIC0    SYS     NODE     X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
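
To make the matrix above easier to check on other nodes, here is a minimal Python sketch that turns the text into a lookup table and lists the device pairs for a given link type. The names `parse_topo_matrix`, `pairs_with_link` and `SAMPLE` are made up for illustration, and the parser assumes the whitespace-separated layout shown above; other driver versions may print additional columns or a different layout.

```python
import re

# Matches device labels such as GPU0 or NIC0 (but not "NIC0:" from the NIC legend).
DEVICE = re.compile(r"(GPU|NIC)\d+")


def parse_topo_matrix(text):
    """Map (row_device, column_device) to its link type, e.g. ("GPU0", "GPU1") -> "SYS"."""
    columns = []
    links = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or not DEVICE.fullmatch(tokens[0]):
            continue  # shell prompt, legend entry, or blank line
        if not columns:
            # The first device line is the header; keep only device names,
            # dropping the "CPU Affinity" / "NUMA Affinity" column titles.
            columns = [t for t in tokens if DEVICE.fullmatch(t)]
            continue
        cells = tokens[1:1 + len(columns)]
        if len(cells) == len(columns):
            for column, cell in zip(columns, cells):
                links[(tokens[0], column)] = cell
    return links


def pairs_with_link(links, link_type):
    """Return the sorted, unordered device pairs connected by the given link type."""
    return sorted({tuple(sorted(pair)) for pair, cell in links.items() if cell == link_type})


# Matrix copied from the output above (assumed layout).
SAMPLE = """\
        GPU0    GPU1    NIC0    CPU Affinity    NUMA Affinity
GPU0     X      SYS     SYS     0-31,64-95      0
GPU1    SYS      X      NODE    32-63,96-127    1
NIC0    SYS     NODE     X
"""

if __name__ == "__main__":
    links = parse_topo_matrix(SAMPLE)
    print("SYS pairs: ", pairs_with_link(links, "SYS"))
    print("NODE pairs:", pairs_with_link(links, "NODE"))
```

On this node the SYS pairs are the ones whose traffic crosses the interconnect between NUMA nodes, which is what the legend says for GPU0 to GPU1 and GPU0 to NIC0.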
- Support for peer-to-peer access,
w (write) is also supported; the output below is for r (read)
[spbonc@exflong101 ~]$ nvidia-smi topo -p2p r
        GPU0    GPU1
GPU0     X      OK
GPU1    OK       X
Legend:
X = Self
OK = Status Ok
CNS = Chipset not supported
GNS = GPU not supported
TNS = Topology not supported
NS = Not supported
U = Unknown
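
The same kind of check can be scripted for the peer-to-peer matrix. The sketch below is only an illustration: `p2p_problem_pairs` and `P2P_SAMPLE` are hypothetical names, and the parser assumes the whitespace-separated layout shown above. It reports every GPU pair whose status is anything other than OK.

```python
import re

# Matches GPU labels such as GPU0; legend lines like "GNS = ..." do not match.
GPU = re.compile(r"GPU\d+")


def p2p_problem_pairs(text):
    """Return (source, target, status) for every GPU pair whose status is not OK."""
    columns = []
    problems = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or not GPU.fullmatch(tokens[0]):
            continue  # shell prompt, legend entry, or blank line
        if not columns:
            # The first GPU line is the header row with the column labels.
            columns = [t for t in tokens if GPU.fullmatch(t)]
            continue
        for column, status in zip(columns, tokens[1:]):
            if status not in ("X", "OK"):
                problems.append((tokens[0], column, status))
    return problems


# Matrix copied from the output above (assumed layout).
P2P_SAMPLE = """\
        GPU0    GPU1
GPU0     X      OK
GPU1    OK       X
"""

if __name__ == "__main__":
    problems = p2p_problem_pairs(P2P_SAMPLE)
    if problems:
        for source, target, status in problems:
            print(f"{source} -> {target}: {status}")
    else:
        print("peer-to-peer reported OK for all GPU pairs")
```

In practice the text could come from `subprocess.run(["nvidia-smi", "topo", "-p2p", "r"], capture_output=True, text=True).stdout`, run once for `r` and once for `w`.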