Topology on SA1 ONC

While looking into pitfalls when copying data between CUDA devices, I collected a few assorted notes:

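  • Device topology: NIC0 appears to be the InfiniBand adapter (mlx5_0). It shares a NUMA node with GPU1 (NODE), while GPU0 reaches both GPU1 and NIC0 only across the inter-socket interconnect (SYS).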
[spbonc@exflong101 ~]$ nvidia-smi topo --matrix
	GPU0	GPU1	NIC0	CPU Affinity	NUMA Affinity
GPU0	 X 	SYS	SYS	0-31,64-95	0
GPU1	SYS	 X 	NODE	32-63,96-127	1
NIC0	SYS	NODE	 X 		

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  • Peer-to-peer access: reads are supported between the two GPUs (output below); the same query for writes (-p2p w) also reports OK.
[spbonc@exflong101 ~]$ nvidia-smi topo -p2p r
 	GPU0	GPU1	
 GPU0	X	OK	
 GPU1	OK	X	

Legend:

  X    = Self
  OK   = Status Ok
  CNS  = Chipset not supported
  GNS  = GPU not supported
  TNS  = Topology not supported
  NS   = Not supported
  U    = Unknown
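
To check such a matrix programmatically, for example to pick the GPU closest to the InfiniBand adapter before staging transfers, the output can be parsed with a few lines of Python. This is only a minimal sketch: it assumes the tab-separated layout shown above, the function names `parse_topo_matrix` and `nearest_gpu` are hypothetical, and the closeness ranking is my reading of the legend rather than anything nvidia-smi reports itself.

```python
import re

# Matrix part of `nvidia-smi topo --matrix` as pasted above (tab-separated).
SAMPLE = (
    "\tGPU0\tGPU1\tNIC0\tCPU Affinity\tNUMA Affinity\n"
    "GPU0\t X \tSYS\tSYS\t0-31,64-95\t0\n"
    "GPU1\tSYS\t X \tNODE\t32-63,96-127\t1\n"
    "NIC0\tSYS\tNODE\t X \t\t\n"
)

DEVICE = re.compile(r"^(GPU|NIC)\d+$")

# Assumed ordering from closest to farthest, based on the legend text.
CLOSENESS = ["PIX", "PXB", "PHB", "NODE", "SYS"]


def parse_topo_matrix(text):
    """Return {(row_device, col_device): link_type} for the matrix rows."""
    lines = text.splitlines()
    header = [cell.strip() for cell in lines[0].split("\t")]
    devices = [cell for cell in header if DEVICE.match(cell)]
    links = {}
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.split("\t")]
        if not DEVICE.match(cells[0]):
            break  # blank line or legend: the matrix has ended
        # Only the first len(devices) cells are link types; the rest are affinities.
        for col, link in zip(devices, cells[1:]):
            links[(cells[0], col)] = link
    return links


def nearest_gpu(links, nic):
    """Pick the GPU whose link to `nic` ranks closest; unknown types rank last."""
    rank = {name: i for i, name in enumerate(CLOSENESS)}
    gpus = sorted({row for row, _ in links if row.startswith("GPU")})
    return min(gpus, key=lambda gpu: rank.get(links[(gpu, nic)], len(rank)))


if __name__ == "__main__":
    topo = parse_topo_matrix(SAMPLE)
    print(topo[("GPU0", "GPU1")], nearest_gpu(topo, "NIC0"))
```

For the matrix on this node, the sketch reports SYS for the GPU0 to GPU1 link and selects GPU1 as the GPU nearest to NIC0. On a live system the text could come from `subprocess.run(["nvidia-smi", "topo", "--matrix"], capture_output=True, text=True).stdout`, though the real output may need extra cleanup that is not handled here.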

@esobolev @hammerd