BRKACI-2102
Mioljub Jovanovic, Technical Leader
Agenda
Introduction
Tools
Troubleshooting scenarios
Conclusion / Q&A
Example: interface error counters:
  6494649266 bytes
  0 short frame
  0 overrun
  0 underrun
  0 ignored
  72 input discard

@CiscoLive #clus, San Diego 2015

Example: 5-minute ingress packet statistics object:
  dn          : topology/pod-1/node-101/sys/phys-[eth1/34]/CDeqptIngrPkts5min
  unicastRate : 1742.12
Connect to APIC (diagram): from a web browser to the APIC UI, Visore, or the direct documentation URL (https://apic/doc/html/), or via CLI (ssh) to any controller in the APIC cluster (apic 1-3).

Connect to switch: the spines and leaves of the ACI fabric (spine 1-2, leaf 1-5).
Log in directly via the serial console port on the switch front panel, or SSH to the management IP (out-of-band or in-band) using username "admin".
Application Policy Infrastructure Controller
Monitoring building blocks: Statistics, Faults, Diagnostics, Thresholds, Health Scores.
Troubleshooting drill-downs: Stats, Atomic Counters, ELAM, SPAN, On-Demand Diagnostics, Switch NX-OS CLI.
The APIC dashboard provides you with an at-a-glance view of the system
health and fault counts.
API Inspector
enables you to see the REST API calls (GET, DELETE, POST) sent from the Web UI to the APIC
# fabric.OverallHealthHist5min
index          : 0
childAction    :
cnt            : 31
dn             : /topology/HDfabricOverallHealth5min-0
healthAvg      : 82
healthMax      : 82
healthMin      : 82
healthSpct     : 0
healthThr      :
healthTr       : 0
lastCollOffset : 310
modTs          : never
repIntvEnd     : 2015-04-10T19:24:03.530+01:00
repIntvStart   : 2015-04-10T19:18:53.442+01:00
rn             : HDfabricOverallHealth5min-0
status         :
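As a sanity check, the collection interval in the record above can be recomputed from repIntvStart and repIntvEnd; it agrees with lastCollOffset (310 seconds). A minimal sketch in Python:

```python
from datetime import datetime

# Timestamps from the fabric.OverallHealthHist5min record above.
start = datetime.fromisoformat("2015-04-10T19:18:53.442+01:00")
end = datetime.fromisoformat("2015-04-10T19:24:03.530+01:00")

offset_seconds = int((end - start).total_seconds())
print(offset_seconds)  # 310, matching lastCollOffset
```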
admin@apic1:~> moquery -c fabricLink
# fabric.Link
n1           : 203
s1           : 1
p1           : 1
n2           : 101
s2           : 1
p2           : 51
dn           : topology/pod-1/lnkcnt-101/lnk-203-1-1-to-101-1-51
lcOwn        : local
linkState    : ok
modTs        : 2015-03-13T14:26:39.526+01:00
monPolDn     : uni/fabric/monfab-default
rn           : lnk-203-1-1-to-101-1-51
status       :
wiringIssues :
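The link rn encodes both endpoints as node-slot-port triples. A small helper (name and shape are illustrative, not an APIC API) to split it:

```python
def parse_link_rn(rn: str):
    """Split a fabricLink rn like 'lnk-203-1-1-to-101-1-51' into its
    two (node, slot, port) endpoints."""
    side_a, side_b = rn[len("lnk-"):].split("-to-")
    n1, s1, p1 = side_a.split("-")
    n2, s2, p2 = side_b.split("-")
    return (n1, s1, p1), (n2, s2, p2)

# rn from the moquery output above: spine 203 port 1/1 to leaf 101 port 1/51
ends = parse_link_rn("lnk-203-1-1-to-101-1-51")
print(ends)  # (('203', '1', '1'), ('101', '1', '51'))
```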
# fabric.Node
adSt             : on
childAction      :
delayedHeartbeat : no
dn               : topology/pod-1/node-101
fabricSt         : active
id               : 101
lcOwn            : local
modTs            : 2015-04-08T14:38:44.546+02:00
model            : N9K-C9396PX
monPolDn         : uni/fabric/monfab-default
name             : bdsol-9396px-02
role             : leaf
serial           : SAL18CLUS15
status           :
uid              :
vendor           :
version          :
icurl 'http://apic/api/node/class/fabricNode.xml?query-target-filter=and(eq(fabricNode.id,"101"))'
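The query-target-filter in the icurl line above can be assembled programmatically. A minimal sketch (the helper is hypothetical, not part of any Cisco SDK):

```python
def class_query_url(host: str, cls: str, **attrs) -> str:
    """Build an APIC class-query URL with a query-target-filter,
    matching the icurl example above."""
    conds = ",".join(f'eq({cls}.{k},"{v}")' for k, v in attrs.items())
    return (f"http://{host}/api/node/class/{cls}.xml"
            f"?query-target-filter=and({conds})")

url = class_query_url("apic", "fabricNode", id="101")
print(url)
```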
The lower half of the screen shows node and tenant health. Move the sliders down to show only nodes/tenants with lower health.
Fault counts by type (config, environmental, etc.) and APIC cluster health.
Health Score: a number between 0 and 100.
Abstracted network troubleshooting tools:
- ping -> iping
- traceroute -> itraceroute
- atomic counters
- syslog
- statistics
- diagnostics (on-demand)
- SPAN
- ELAM
UI Tools
Health
Faults
Audits
Events
Statistics
Call-home
Syslogs
SNMP
Capacity Dashboard
ACI Optimizer
EP Tracker
Visualization
Example: directory listing of the MIT root (comp, dbgs, expcont, fwrepo, topology, uni):
  512 Apr 24 22:48 comp
  512 Apr 24 22:48 dbgs
  512 Apr 24 22:48 expcont
  512 Apr 24 22:48 fwrepo
  512 Apr 24 22:48 topology
  512 Apr 24 22:48 uni
# fabric.Node
adSt             : on
childAction      :
delayedHeartbeat : no
dn               : topology/pod-1/node-1
fabricSt         : unknown
id               : 1
lcOwn            : local
modTs            : 2015-04-08T14:27:16.290+02:00
model            : APIC
monPolDn         : uni/fabric/monfab-default
name             : apic1
rn               : node-1
role             : controller
serial           : SAL18CLUS15
status           :
uid              : 0
vendor           : Cisco Systems, Inc
version          :
...or simply use the Web UI.
Switch internals (diagram): NX-OS processes write faults, events, records, and health scores into a delegate local objectstore (shared memory).

The objectstore also acts as the OpFlex server (shared memory) for the external OpFlex element.
APIC logs:
  /var/log/dme/log
  /var/log/dme/oldlog

Switch logs:
  /var/log/dme/log
  /var/log/dme/oldlog
  /var/sysmgr/tmp_logs/

admin@apic1:~> cd /var/log/dme/log
admin@apic1:log> ls -altr *
admin@apic1:log> ls -al svc_ifc_policymgr.*

admin@apic1:~> cd /var/log/dme/log
admin@apic1:log> ls -altr *
admin@apic1:log> ls -al svc_ifc_policyelem.*
icurl 'http://localhost:7777/api/class/faultInfo.xml' > faultInfo.xml
icurl 'http://localhost:7777/api/class/faultRecord.xml' > faultRecord.xml
icurl 'http://localhost:7777/api/class/eventRecord.xml' > eventRecord.xml
icurl 'http://localhost:7777/api/class/aaaModLR.xml' > aaaModLR.xml
icurl 'http://localhost:7777/api/class/aaaSessionLR.xml' > aaaSessionLR.xml
cd /tmp
tar zcvf tac-655555555.tgz tac-655555555
cp tac-655555555.tgz /data/techsupport
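The collect-and-archive flow above can be sketched generically; the case directory name below is a placeholder, and on a real APIC the XML files would come from icurl as shown:

```shell
# Collect exported class dumps into a per-case directory, then archive it.
mkdir -p /tmp/tac-demo
echo '<imdata/>' > /tmp/tac-demo/faultRecord.xml   # stand-in for icurl output
cd /tmp
tar zcvf tac-demo.tgz tac-demo
# on a real APIC, copy the archive to /data/techsupport for download
```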
admin shell
/ - ishell root folder
/var/log/dme/log
/debug
/aci
/mit
/mgmt/log/scriptcontainer.log
Troubleshooting scenarios
Topology (diagram):
- 2 x spine
- 2 x leaf N9K-C9396PX (48 x 1/10G SFP+)
- 2 x leaf N9K-C93128TX (96 x 1/10G Base-T)
- 1 x leaf N9K-C9372PX
- 3 x APIC, attached at 10Gbps

(spine 1-2, leaf 1-5, apic 1-3 in the ACI fabric)
Troubleshooting Scenario
Double-click the specific request (e.g. http://apic/api/aaaListDomains.xml) to check its timing details.
DME
Debug URL
http://apic1/api/nginx/debug/tacacs.xml
Troubleshooting Scenario
Déjà vu? Change!
aaaModLR - an AAA audit log record, automatically generated whenever a user modifies an object.
Double-click a faultRecord in the GUI. We could also check: eventRecord, healthRecord.
descr : Configuration failed for EPG default due to Not Associated With Management Zone
descr : Datetime Policy Configuration for F5clock failed due to : access-epg-not-specified
descr : Failed to form relation to MO AbsGraph-VEStandAloneFuncProfile of class vnsAbsGraph
descr : Failed to form relation to MO fwP-default of class nwsFwPol in context uni/infra
descr : Ntp configuration on leaf leaf1 is Not Synchronized
descr : Ntp configuration on leaf leaf2 is Not Synchronized
descr : Ntp configuration on spine spine1 is Not Synchronized
descr : Power supply shutdown. (serial number DCB18CLUS15)
Troubleshooting Scenario
(Statistics table: counters are reported in packets, bytes, %, packets-per-second, and bytes-per-second.)
Troubleshooting Scenario
Troubleshooting:
APIC Faults / Visore / debug.log / LTM log
https://<APIC>/visore.html
APIC Faults
/data/devicescript/F5.BIGIP.1.1.0/logs/debug.log
/var/log/*
APIC Faults: double-click on the faults to see details.
APIC debug.log
Locate the APIC that contains the shard configuring the BIG-IP, then go to
the following location:
admin@apic1:~> cd /data/devicescript/F5.BIGIP.1.0.0/logs
You will see debug.log and periodic.log
admin@apic1:logs> ls -al
-rw-r--r-- 2 nobody nobody 52688 Sep 30 11:31 debug.log
-rw-r--r-- 2 nobody nobody 35492 Sep 30 11:30 periodic.log
You can tail -f debug.log to monitor the process
Example: mcpd
# cd /var/log
# ls ltm*
ltm.2.gz  ltm.3.gz  ltm.4.gz  ltm.5.gz  ltm.6.gz  ltm.7.gz  ltm.8.gz  ltm.9.gz

Example output:
Jul 19 11:57:53 apic-bigip2 notice mcpd[7439]: 01070638:5: Pool /apic_5668/apic_5668_webPool member /apic_5668/192.168.10.101%1295:80 monitor status
down. [ /apic_5668/apic_5668_webMonitor: down ] [ was up for 20hrs:55mins:46sec ]
Jul 19 11:57:54 apic-bigip2 notice mcpd[7439]: 01070638:5: Pool /apic_5668/apic_5668_webPool member /apic_5668/192.168.10.102%1295:80 monitor status
down. [ /apic_5668/apic_5668_webMonitor: down ] [ was up for 20hrs:55mins:47sec ]
Jul 19 11:57:54 apic-bigip2 notice mcpd[7439]: 01071682:5: SNMP_TRAP: Virtual /apic_5668/apic_5668_4096_Virtual-Server has become unavailable
Jul 19 11:57:54 apic-bigip2 err tmm[9357]: 01010028:3: No members available for pool /apic_5668/apic_5668_webPool
Jul 19 11:57:54 apic-bigip2 err tmm1[9357]: 01010028:3: No members available for pool /apic_5668/apic_5668_webPool
Jul 19 11:57:54 apic-bigip2 err tmm2[9357]: 01010028:3: No members available for pool /apic_5668/apic_5668_webPool
Jul 19 11:57:54 apic-bigip2 err tmm3[9357]: 01010028:3: No members available for pool /apic_5668/apic_5668_webPool
Jul 19 12:03:02 apic-bigip2 err iprepd[6725]: 015c0004:3: failed connect to 208.87.136.155 on 443
Jul 19 12:03:03 apic-bigip2 err iprepd[6725]: 015c0004:3: Certificate verification error: 18
Jul 19 12:03:03 apic-bigip2 err iprepd[6725]: 015c0004:3: nSendReceiveSsl failed SSL handshake
Jul 19 12:04:11 apic-bigip2 info pfmand[6925]: 01660009:6: Link: 2.1 is DOWN
Jul 19 12:04:11 apic-bigip2 info pfmand[6925]: 01660009:6: Link: 2.2 is DOWN
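When hunting through /var/log/ltm, the pool and member in each mcpd monitor-status line can be pulled out mechanically. A minimal parsing sketch (the regex is an assumption about the message shape, based on the lines above):

```python
import re

# One of the mcpd log lines shown above (truncated to the relevant part).
line = ("Jul 19 11:57:53 apic-bigip2 notice mcpd[7439]: 01070638:5: "
        "Pool /apic_5668/apic_5668_webPool member "
        "/apic_5668/192.168.10.101%1295:80 monitor status down.")

# Extract pool name, member, and the new monitor status.
m = re.search(r"Pool (\S+) member (\S+) monitor status (\w+)", line)
pool, member, status = m.groups()
print(pool, member, status)
```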
Access Encap to Fabric Encap
EP A to EP B - simplified (diagram):
1. Regular L2 packet (EP A to ingress leaf)
2. iVXLAN packet (leaf to leaf across the spines)
3. Regular L2 packet (egress leaf to EP B)
Linux VM A connected to the ACI fabric (diagram):
VM A - MAC 00:00:33:33:33:33, VLAN 3399
leaf 1 internals (diagram):
- Merchant ASIC (BCM): 48/96 x 10G ports toward servers/blade switches; EP A (MAC 00:00:33:33:33:33) on eth1/34
- Cisco ASIC: 8/12 x 40G ports toward the fabric
- (2) traffic is forwarded to the destination if it is known on the BCM
- (3) if the destination is not learned in the BCM forwarding table, it is sent to the Cisco ASIC
Linux view - VM MAC: 00:00:33:33:33:33

switch# bcm-shell-hw "l2 show"
  VLAN  MAC             Type     Port
* 53    0000.3333.3333  dynamic  eth1/34
* 53    5254.00b0.c481  dynamic  eth1/34
* 54    5254.00c3.b82c  dynamic  eth1/34
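The lookup we do by eye on that l2 table ("which port was this MAC learned on?") can be sketched in a few lines of Python; the rows are copied from the output above, and the helper name is illustrative:

```python
# Rows from the bcm-shell "l2 show" output above: (vlan, mac, type, port).
l2_table = [
    ("53", "0000.3333.3333", "dynamic", "eth1/34"),
    ("53", "5254.00b0.c481", "dynamic", "eth1/34"),
    ("54", "5254.00c3.b82c", "dynamic", "eth1/34"),
]

def port_for_mac(table, mac):
    """Return the port a MAC was learned on, or None if not in the table."""
    return next((port for vlan, m, typ, port in table if m == mac), None)

print(port_for_mac(l2_table, "0000.3333.3333"))  # eth1/34
```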
vlan_id:           53
hw_vlan_id:        57
vlan_type:         FD_VLAN
bd_vlan:           52
access_encap_type: 802.1q
access_encap:      3399
fabric_encap_type: VXLAN
fabric_encap:      9891
sclass:            16387
scope:
bd_vnid:           9891
untagged:
acess_encap_hex:   0xd47
fabric_enc_hex:    0x26a3
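The two hex fields are simply hex renderings of the decimal encap values above, which is a quick way to cross-check the mapping:

```python
# Values from the VLAN mapping output above.
access_encap = 3399   # 802.1q VLAN on the access side
fabric_encap = 9891   # VXLAN VNID on the fabric side

print(hex(access_encap))  # 0xd47  -> matches acess_encap_hex
print(hex(fabric_encap))  # 0x26a3 -> matches fabric_enc_hex
```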
Troubleshooting Scenario
iPing CLI

show vrf

usage: iping [-V vrf] [-c count] [-S source ip] host
options:
  -V : VRF to use for ping (management / overlay-1 / tenant VRF)
  -c : number of requests to send
  -i : interval between ICMP echo packets
  -t : timeout for responses
  -p : data pattern in payload
  -s : size
  -S : source interface name / IP address
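Assembling an iping command line from those options can be sketched as follows; the helper is purely illustrative (it is not part of the switch CLI), and the expected result matches the leaf1 example used later:

```python
def iping_cmd(host, vrf=None, source=None, count=None):
    """Build an iping command string from the options documented above."""
    parts = ["iping"]
    if vrf:
        parts += ["-V", vrf]
    if count is not None:
        parts += ["-c", str(count)]
    if source:
        parts += ["-S", source]
    parts.append(host)
    return " ".join(parts)

cmd = iping_cmd("64.101.1.22", vrf="tenant:vrf01", source="64.101.1.1")
print(cmd)  # iping -V tenant:vrf01 -S 64.101.1.1 64.101.1.22
```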
iping internals (diagram, run from leaf 1):
leaf1# iping -V tenant:vrf01 -S 64.101.1.1 64.101.1.22
iping internals (diagram, run from leaf 4):
leaf4# iping -V tenant:vrf01 -S 64.101.1.1 64.101.1.22
Troubleshooting Scenario
(Flattened statistics table: cumulative counters such as 368,075,657, 351,308,235, and 332,607,921 packets; per-interval deltas (+253, +264, +212, ...); and rates (84/s, 87/s, 70/s, 2/s).)
*try also
Troubleshooting Scenario
Capacity Dashboard
Troubleshooting Scenario
Start: we define a session name and select the endpoints we'd like to troubleshoot visually.
Troubleshooting Scenario
ELAM
What is ELAM?
(Diagram: in each pipeline, packets flow Parser Block -> Input ELAM -> Lookup Block -> Output ELAM -> Packet RW / Sideband, between the fabric and the BCM.)

North Star data path is divided into ingress and egress pipelines.
2 ELAMs are present in each pipeline (Input ELAM and Output ELAM).
These ELAMs sit at the beginning and end of the lookup block.

Limitations:
Packets can be captured based on either input select lines or output select lines, but not both.
ELAM Support

Input select lines supported:
  3 - outer L2 / outer L3 / outer L4
  4 - inner L2 / inner L3 / inner L4
  5 - outer L2 + inner L2
  6 - outer L3 + inner L3
  7 - outer L4 + inner L4

Output select lines supported:
  0 - Pktrw
  5 - Sideband

Note: only output select lines 0 and 5 are supported for capturing packets, at both output and input.
ELAM Configuration
1. Init   - initialize ELAM on the ASIC
2. Config - set the values to match in the packet
3. Arm    - arm the trigger
4. Read   - once the trigger has fired, read the report from hardware
5. Reset  - once the process is complete, reset the trigger
ELAM configuration
Show the trigger
The configured trigger can be verified using the show command
root@module-1(NS-elam-insel3)# show
adj_index
ol_encap_idx
sclass
src_tep_idx
sup_redirect
l2flood
fwddrop
bnce
ELAM Example
leaf 1 internals (diagram):
- Merchant ASIC (BCM): 48/96 x 10G ports toward servers/blade switches; EP A (MAC 00:25:b5:aa:00:0a) on eth1/10
- Cisco ASIC: 8/12 x 40G ports toward the fabric
- (2) traffic is forwarded to the destination if it is known on the BCM
- (3) if the destination is not learned in the BCM forwarding table, it is sent to the Cisco ASIC
ELAM Example (diagram):
step 1 - leaf1: input ingress, matching on the outer header (EP A to EP B across the fabric)
ELAM Example, step 1 - leaf1: input ingress, matching on the outer header:

vsh_lc
debug platform internal ns elam asic 0
trigger reset
trigger init ingress in-select 3 out-select 0
set outer l2 src_mac 00:25:b5:aa:00:0a
set outer l2 dst_mac ff:ff:ff:ff:ff:ff
start
status
report

(EP A - MAC 00:25:b5:aa:00:0a, EP B - MAC 00:25:b5:bb:00:0b)
ELAM configuration
leaf1# vsh_lc
module-1# debug platform internal ns elam asic 0
module-1(NS-elam)# trigger reset
module-1(NS-elam)# trigger init ingress in-select 3 out-select 0
module-1(NS-elam-insel3)# set outer l2 src_mac 00:25:b5:aa:00:0a
module-1(NS-elam-insel3)# set outer l2 dst_mac ff:ff:ff:ff:ff:ff
module-1(NS-elam-insel3)# start
module-1(NS-elam-insel3)# status
Status: Armed
module-1(NS-elam-insel3)# ?
report Show trigger report
module-1(NS-elam-insel3)# report
ELAM not triggered. No report available
We're looking to confirm whether a broadcast packet sourced from MAC 00:25:b5:aa:00:0a is reaching the Cisco ASIC.

NOTE:
1) Without the "reset" command, trigger buffers are never reset, other than by a reboot.
2) Users can move in and out of ELAM mode with no impact on the configured triggers.
ELAM report (excerpt):
GBL_C++: [INFO] ol_encap_idx: 2FF6
GBL_C++: [INFO] ol_ttl: 08
GBL_C++: [INFO] ol_segid: 2A8001
GBL_C++: [INFO] sclass: C005
GBL_C++: [INFO] sup_redirect: 0
GBL_C++: [INFO] mcast: 0

Field legend: ML = MET Last, TD = TTL Dec Disable, DV = Dst Valid, DT-PT = Dest Port, DT-NP = Dest Port Not-PC, ET = Encap Type, OP = Override PIF Pinning, HR = Higig DstMod RW, HG-MD = Higig DstMode, KV = Keep VNTAG.

Destination TEP address derived from the encap: 10.0.200.127. Check it in switch output; the APIC is not running the IS-IS protocol.
ELAM Example, step 2 - spine: input ingress, matching on the inner header (Cisco ASIC in spine):

vsh_lc
debug platform internal alp elam asic 0 | 1
trigger init ingress in-select 3 out-select 0
set inner l2 src_mac 00:25:b5:aa:00:0a
set inner l2 dst_mac 00:25:b5:bb:00:0b
start
status
report

(EP A - MAC 00:25:b5:aa:00:0a, EP B - MAC 00:25:b5:bb:00:0b)
ELAM Example, step 3 - leaf4: input egress, matching on the inner header ("egress" because we're egressing the fabric; Cisco ASIC in leaf):

vsh_lc
debug platform internal ns elam asic 0
trigger init egress in-select 3 out-select 0
set inner l2 src_mac 00:25:b5:aa:00:0a
set inner l2 dst_mac 00:25:b5:bb:00:0b
start
status
report

(host A - MAC 00:25:b5:aa:00:0a, host B - MAC 00:25:b5:bb:00:0b)
References
APIC resources
API Documentation
Python SDK
Online resources:
- ACI Toolkit: http://datacenter.github.io/acitoolkit/ and https://github.com/datacenter/acitoolkit
- ACI Diagram: https://github.com/cgascoig/aci-diagram
Troubleshooting Cisco ACI - available on GitHub
Designing Data Centers with Cisco's ACI - LiveLessons Networking Talks, ISBN: 978-1-58714-436-3
Call to Action
Thank you