Tuesday, September 15, 2015

Bash wrapper around iblinkinfo that shows IB switch names rather than their GUIDs

We have many IB switches in the InfiniBand network on our HPC cluster. I use iblinkinfo a lot to find out which node is connected to which port on which switch. The problem with iblinkinfo is that it shows only a generic name for unmanaged switches, and most of our switches are unmanaged. One of the Mellanox representatives gave us a script that made it possible to name a few of these unmanaged switches, but it didn't work for the switches in the Dell chassis. So I put in a bit of work, came up with a list of GUIDs for all the switches, and matched them with the names I wanted to see (I also put labels with the same names on the switches). Here is the script. Below it you can find the output of plain iblinkinfo and of the wrapper around it. It makes life much easier.
[2015-09-15 22:01:35:7793 root@soho post]# cat /usr/local/sbin/iblinkinfo_wrapper 
#!/bin/bash

# by Sreedhar Manchu

sw_guid=(0xf45214030095b2c0 0xf4521403009564e0 0x0002c9020048d260 0x0002c9020048d9b8 0x0002c9020048d940 0x0002c9020048d8e8 0x0002c9020048d240 0x0002c90200489d18 0x0002c902004a7998 0x0002c902004b5d00 0xf452140300f61d20 0xf4521403009571c0 0x0002c902004100f0 0x0002c9020040ff28 0x0002c90200422938 0x0002c902004239c8 0x0002c90200423950 0x0002c90200423a60 0x0002c9020040fe80 0x0002c9020040d668 0xf452140300868de0 0xf452140300680100 0xf452140300680180 0xf45214030067f800 0xf452140300680080 0x0002c9020041e098 0x0002c9020040c868 0x0002c90200422498 0x0002c90200422258 0x0002c9020041e0c0 0x0002c9020041e108 0x0002c903006be3f0 0x0002c903006bfa70 0x0002c903006bfb70 0x0002c903006bfe70 0x0002c903006be6f0 0x0002c903006be7f0 0x0002c903007b6a30 0x0002c903006bfaf0 0x0002c903006be670 0x0002c903006bf970)

sw_name=(ibswcore0 ibswcore1 ibswspine0 ibswspine1 ibswspine2 ibswspine3 ibswspine4 ibswspine5:spmercerib0 ibswspine6 ibswspine7:spmercerib1 ibswspine8:spmercerib3 ibswspine9:spmercerib4 ibswspine10:spboweryib ibswspine11 ibswspine12 ibswspine13 ibswspine14 ibswspine15 ibswspine16:splibb ibswspine17 ibswspine18:spmercerib2 ibswedge0 ibswedge1 ibswedge2 ibswedge3 ibswedge4 ibswedge5 ibswedge6 ibswedge7 ibswedge8 ibswedge9 ibswedge14 ibswedge15 ibswedge16 ibswedge17 ibswedge18 ibswedge19 ibswedge20 ibswedge21 ibswedge22 ibswedge23)

/usr/sbin/iblinkinfo > /tmp/iblinkinfo_wrapper_$$

echo
echo
for ((i=0;i<${#sw_guid[@]};i++));do
 awk -F: -v OFS=: -v ss="${sw_guid[$i]}" -v rs="${sw_name[$i]}" '$0 ~ ss {$2 = ss" "rs; print }' /tmp/iblinkinfo_wrapper_$$
done
echo
echo

for ((i=0;i<${#sw_guid[@]};i++));do
 lids[$i]=$(smpquery NI -G ${sw_guid[$i]} | awk 'NR == 1 {print $5}')
 sed -i "/${sw_guid[$i]}/s//& ${sw_name[$i]}/" /tmp/iblinkinfo_wrapper_$$
done

IFS=$'\n'
while read -r i
do
 slid1=$(echo "$i" | awk '{print $10}')
 slid2=$(echo "$i" | awk '{print $11}')
 for ((j=0;j<${#sw_guid[@]};j++))
 do
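  # Some lines without a peer (e.g. a port in Down/ Polling) have "(" here; print as-is.
  if [ "$slid1" = '(' ]
  then
   echo "$i"
   break
  # The peer LID belongs to a known switch: put our name in the quoted field.
  elif [ "$slid1" = "${lids[$j]}" ] || [ "$slid2" = "${lids[$j]}" ]
  then
   echo "$i" | awk -F'"' -v OFS=\" -v ss="${sw_name[$j]}" '{$2 = ss; print}'
   break
  # No switch matched after checking the whole list: print the line unchanged.
  elif [ $((j+1)) -eq ${#lids[@]} ]
  then
   echo "$i"
  fi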
 done
done< /tmp/iblinkinfo_wrapper_$$

rm -f /tmp/iblinkinfo_wrapper_$$
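As you can see, at the beginning of the script I put a list of switch GUIDs and a list of names in the same order, so each GUID is matched with its name by position. All the script does is replace the generic Mellanox names for these switches with my names. The Switch: header lines are matched by GUID; for the link lines, the script looks up each switch's LID from its GUID with smpquery and matches on that LID. Here is only part of the output without the wrapper.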
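The GUID-to-name matching for the header lines can also be sketched with a single table of pairs, which avoids keeping two 41-entry lists in the same order by hand. This is only a sketch under assumptions: the `sw_table` variable, the one-row-per-switch layout, and the `label_headers` function name are made up for this example and are not part of the wrapper. The two rows are copied from the lists in the script.

```shell
#!/bin/bash
# Sketch only. Each row pairs a GUID with its name, so the two cannot drift
# out of step the way two parallel arrays can. sw_table and label_headers
# are hypothetical names used just for this example.
sw_table='0xf45214030095b2c0 ibswcore0
0x0002c9020041e098 ibswedge4'

# label_headers: filter stdin to stdout, appending the switch name after the
# GUID on "Switch:" header lines. All other lines pass through unchanged.
label_headers() {
  SW_TABLE="$sw_table" awk '
    BEGIN {
      # Load the GUID -> name rows handed over through the environment.
      n = split(ENVIRON["SW_TABLE"], rows, "\n")
      for (i = 1; i <= n; i++) {
        if (split(rows[i], kv, " ") == 2) { name[kv[1]] = kv[2] }
      }
    }
    $1 == "Switch:" && ($2 in name) {
      # Edit $0 in place so the original spacing is preserved.
      sub($2, $2 " " name[$2])
    }
    { print }
  '
}

# Demo on one header line in the format iblinkinfo prints.
printf '%s\n' 'Switch: 0x0002c9020041e098 Infiniscale-IV Mellanox Technologies:' |
  label_headers
```

On a live fabric the same filter would sit behind the real command, as in `iblinkinfo | label_headers`. Headers whose GUID is not in the table are left as they are.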
[2015-09-15 22:51:28:7795 root@master post]# iblinkinfo
Switch: 0x0002c9020041e098 Infiniscale-IV Mellanox Technologies:
         395    1[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     342   15[  ] "Infiniscale-IV Mellanox Technologies" ( )
         395    2[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     342   17[  ] "Infiniscale-IV Mellanox Technologies" ( )
         395    3[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     342   13[  ] "Infiniscale-IV Mellanox Technologies" ( )
         395    4[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     325   13[  ] "Infiniscale-IV Mellanox Technologies" ( )
         395    5[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     325   15[  ] "Infiniscale-IV Mellanox Technologies" ( )
         395    6[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     325   17[  ] "Infiniscale-IV Mellanox Technologies" ( )
         395    7[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     424   15[  ] "Infiniscale-IV Mellanox Technologies" ( )
         395    8[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     424   13[  ] "Infiniscale-IV Mellanox Technologies" ( )
         395    9[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     424   17[  ] "Infiniscale-IV Mellanox Technologies" ( )
         395   10[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     301   13[  ] "Infiniscale-IV Mellanox Technologies" ( )
         395   11[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     301   15[  ] "Infiniscale-IV Mellanox Technologies" ( )
         395   12[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     301   17[  ] "Infiniscale-IV Mellanox Technologies" ( )
         395   13[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     289   26[  ] "Infiniscale-IV Mellanox Technologies" ( )
         395   14[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     289   25[  ] "Infiniscale-IV Mellanox Technologies" ( )
         395   15[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     442   26[  ] "spboweryib SW-1" ( )
         395   16[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     442   25[  ] "spboweryib SW-1" ( )
         395   17[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     276    1[  ] "compute-4-0 HCA-1" ( )
         395   18[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     279    1[  ] "compute-4-1 HCA-1" ( )
         395   19[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     277    1[  ] "compute-4-2 HCA-1" ( )
         395   20[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     283    1[  ] "compute-4-3 HCA-1" ( )
         395   21[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     281    1[  ] "compute-4-4 HCA-1" ( )
         395   22[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     278    1[  ] "compute-4-5 HCA-1" ( )
         395   23[  ] ==(                Down/Disabled)==>             [  ] "" ( )
         395   24[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     284    1[  ] "compute-4-7 HCA-1" ( )
         395   25[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     286    1[  ] "compute-4-8 HCA-1" ( )
         395   26[  ] ==(                Down/ Polling)==>             [  ] "" ( )
         395   27[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     290    1[  ] "compute-4-10 HCA-1" ( )
         395   28[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     292    1[  ] "compute-4-11 mlx4_0" ( )
         395   29[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     287    1[  ] "compute-4-12 HCA-1" ( )
         395   30[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     288    1[  ] "compute-4-13 HCA-1" ( )
         395   31[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     293    1[  ] "compute-4-14 HCA-1" ( )
         395   32[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     291    1[  ] "compute-4-15 HCA-1" ( )
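The Switch: header at the top of each block is where a switch's GUID appears, so a GUID list like the one in the wrapper could be gathered from this kind of output. The following is a minimal sketch and not necessarily how the list in the script was built; `list_switch_guids` is a name made up for this example.

```shell
#!/bin/bash
# Sketch only. list_switch_guids is a hypothetical helper: it reads
# iblinkinfo-style text on stdin and prints each switch GUID once.
list_switch_guids() {
  # The GUID is the second field of every "Switch:" header line.
  awk '$1 == "Switch:" { print $2 }' | sort -u
}

# Demo on sample header lines; on a live fabric this would be
#   iblinkinfo | list_switch_guids
printf '%s\n' \
  'Switch: 0x0002c9020041e098 Infiniscale-IV Mellanox Technologies:' \
  'Switch: 0xf45214030095b2c0 Infiniscale-IV Mellanox Technologies:' |
  list_switch_guids
```

The names still have to be assigned by hand, since an unmanaged switch reports only the generic description.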
With the wrapper, here is the same part of the output:
[2015-09-15 22:51:28:7795 root@master post]# bash /usr/local/sbin/iblinkinfo_wrapper
Switch: 0x0002c9020041e098 ibswedge4 Infiniscale-IV Mellanox Technologies:
         395    1[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     342   15[  ] "ibswspine15" ( )
         395    2[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     342   17[  ] "ibswspine15" ( )
         395    3[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     342   13[  ] "ibswspine15" ( )
         395    4[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     325   13[  ] "ibswspine14" ( )
         395    5[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     325   15[  ] "ibswspine14" ( )
         395    6[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     325   17[  ] "ibswspine14" ( )
         395    7[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     424   15[  ] "ibswspine13" ( )
         395    8[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     424   13[  ] "ibswspine13" ( )
         395    9[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     424   17[  ] "ibswspine13" ( )
         395   10[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     301   13[  ] "ibswspine12" ( )
         395   11[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     301   15[  ] "ibswspine12" ( )
         395   12[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     301   17[  ] "ibswspine12" ( )
         395   13[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     289   26[  ] "ibswspine11" ( )
         395   14[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     289   25[  ] "ibswspine11" ( )
         395   15[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     442   26[  ] "ibswspine10:spboweryib" ( )
         395   16[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     442   25[  ] "ibswspine10:spboweryib" ( )
         395   17[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     276    1[  ] "compute-4-0 HCA-1" ( )
         395   18[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     279    1[  ] "compute-4-1 HCA-1" ( )
         395   19[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     277    1[  ] "compute-4-2 HCA-1" ( )
         395   20[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     283    1[  ] "compute-4-3 HCA-1" ( )
         395   21[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     281    1[  ] "compute-4-4 HCA-1" ( )
         395   22[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     278    1[  ] "compute-4-5 HCA-1" ( )
         395   23[  ] ==(                Down/Disabled)==>             [  ] "" ( )
         395   24[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     284    1[  ] "compute-4-7 HCA-1" ( )
         395   25[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     286    1[  ] "compute-4-8 HCA-1" ( )
         395   26[  ] ==(                Down/ Polling)==>             [  ] "" ( )
         395   27[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     290    1[  ] "compute-4-10 HCA-1" ( )
         395   28[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     292    1[  ] "compute-4-11 mlx4_0" ( )
         395   29[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     287    1[  ] "compute-4-12 HCA-1" ( )
         395   30[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     288    1[  ] "compute-4-13 HCA-1" ( )
         395   31[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     293    1[  ] "compute-4-14 HCA-1" ( )
         395   32[  ] ==( 4X          10.0 Gbps Active/  LinkUp)==>     291    1[  ] "compute-4-15 HCA-1" ( )
As you can see, it took out the generic names and replaced them with the names we gave the switches and put on their labels. It comes in pretty handy.
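The renaming of the link lines can be sketched as a single awk pass keyed on the remote LID. This is a simplified illustration, not the wrapper itself: the wrapper gets each LID from smpquery, while here the `lid_table` rows are typed in by hand from the output above (LID 342 and LID 442), and `rename_peers` is a name made up for this example.

```shell
#!/bin/bash
# Sketch only. lid_table and rename_peers are hypothetical names. The rows
# pair a switch LID with its name, read off the wrapper output above.
lid_table='342 ibswspine15
442 ibswspine10:spboweryib'

# rename_peers: filter stdin to stdout, replacing the quoted peer name on
# link lines whose remote LID is in the table.
rename_peers() {
  LID_TABLE="$lid_table" awk '
    BEGIN {
      # Load the LID -> name rows handed over through the environment.
      n = split(ENVIRON["LID_TABLE"], rows, "\n")
      for (i = 1; i <= n; i++) {
        if (split(rows[i], kv, " ") == 2) { name[kv[1]] = kv[2] }
      }
    }
    {
      # The remote LID is the field right after the one ending in ")==>".
      lid = ""
      for (i = 1; i < NF; i++) {
        if ($i ~ /\)==>$/) { lid = $(i + 1); break }
      }
      if (lid in name) {
        # Swap only the first quoted string; spacing elsewhere is untouched.
        sub(/"[^"]*"/, "\"" name[lid] "\"")
      }
      print
    }
  '
}

# Demo on one link line in the format iblinkinfo prints.
printf '%s\n' \
  '   395    1[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>     342   15[  ] "Infiniscale-IV Mellanox Technologies" ( )' |
  rename_peers
```

Lines for compute nodes and for ports that are down pass through unchanged, because their remote LID field is not in the table.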
