Page: 1 of 5

After Action Review

Issue Description

Issue: MOL Application & CRXDE were down due to slow response of the MOL WEB API. AEM Publish Instance index files were corrupted due to online compaction.
Incident#: INC4177562
Issue Date: 08/01/2016
Impacted Applications/tracks/process: MOL web Application
Author Name & Email Id: Suneel ASM team & skarthik@truevalue.com

Release 1.0

Contents

1. Fault Description
1.1 Reported Problem
1.2 Actual Observation
2. Business Impact
3. Root Cause Analysis
3.1 Causal Analysis
4. Corrective Actions
5. Preventive Actions
6. Improvement Actions
7. Timelines - Sequence of Events

1. Fault Description
1.1 Reported Problem

The MOL web application is down. The issue is with Passport connecting to the HTML.
1.2 Actual Observation

No exceptions or errors were found in the AEM logs. The online compaction task had run (it generally runs every day between 1:00 A.M. and 2:00 A.M.). On checking the index files, the team noticed that they were corrupted. On further debugging by capturing thread dumps, the team found that the MOL WEB APIs were not responding.
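
The thread dump check described above can be sketched as a small script. This is only an illustration, assuming a jstack-style text dump: the frame names in BLOCKING_FRAMES are a guess at what a thread blocked on a socket read typically shows, and the sample dump, thread names and the com.example.mol.MolApiClient class are hypothetical, not taken from the incident.

```python
import re

# Frames that usually mean a thread is blocked reading from a network socket.
# This list is an assumption for illustration; adjust it to the JVM in use.
BLOCKING_FRAMES = (
    "java.net.SocketInputStream.socketRead0",
    "java.net.SocketInputStream.read",
)


def split_threads(dump_text):
    """Split a jstack-style dump into (thread_name, stack_text) pairs."""
    threads = []
    name, lines = None, []
    for line in dump_text.splitlines():
        match = re.match(r'^"([^"]+)"', line)
        if match:
            # A quoted name at the start of a line opens a new thread section.
            if name is not None:
                threads.append((name, "\n".join(lines)))
            name, lines = match.group(1), []
        elif name is not None:
            lines.append(line)
    if name is not None:
        threads.append((name, "\n".join(lines)))
    return threads


def threads_waiting_on_network(dump_text):
    """Return names of threads whose stack contains a blocking socket frame."""
    return [
        name
        for name, stack in split_threads(dump_text)
        if any(frame in stack for frame in BLOCKING_FRAMES)
    ]


# Hypothetical sample dump; the class com.example.mol.MolApiClient is made up.
SAMPLE_DUMP = '''
"request-thread-1" #31 prio=5 tid=0x01 nid=0x1a runnable
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:150)
        at com.example.mol.MolApiClient.call(MolApiClient.java:42)

"request-thread-2" #32 prio=5 tid=0x02 nid=0x1b waiting on condition
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
'''

if __name__ == "__main__":
    print(threads_waiting_on_network(SAMPLE_DUMP))
```

If most request threads show up in this list and their stacks end in the same backend client, that points to the backend rather than to the repository or its indexes.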
2. Business Impact

Members unable to access any of the Web Portals via MOL to submit orders.
3. Root Cause Analysis
3.1 Causal Analysis

Why Analysis

1st Why?
Question: Why was the index corrupted?
Answer: Due to online compaction.

2nd Why?
Question: Why was online compaction configured?
Answer: To reduce the application outage window for maintenance.

3rd Why?
Question: Why was the environment still down after rebuilding the index?
Answer: The index is not a root cause.

4th Why?
Question: Why is CRXDE down?
Answer: AEM is waiting for the MOL API response.
Root Cause
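
The last step of the analysis, a caller waiting indefinitely on a slow backend, is commonly addressed by bounding the wait. The sketch below only illustrates that idea in Python and is not the AEM or MOL code: call_with_deadline and slow_backend are hypothetical names, and the timeout values are arbitrary.

```python
import concurrent.futures
import time


def call_with_deadline(func, timeout_seconds, fallback):
    """Run func in a worker thread; return fallback if it misses the deadline."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # The caller gives up waiting instead of hanging with the backend.
        return fallback
    finally:
        # Do not block on the slow call; the worker finishes in the background.
        pool.shutdown(wait=False)


def slow_backend():
    """Hypothetical stand-in for a backend call that responds too slowly."""
    time.sleep(0.3)
    return "backend response"


if __name__ == "__main__":
    print(call_with_deadline(slow_backend, 0.05, "backend unavailable"))
```

In a Java service the same effect is usually achieved by setting connect and read timeouts on the HTTP client, so that request threads are released when a backend stops answering.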

4. Corrective Actions

Took a backup and restored the last snapshot version (Sunday 12:00 A.M.).
Disabled the online compaction.
Rebuilt the corrupted indexes.
Captured the thread dumps.
Replaced the production MOL WEB API with the UAT MOL WEB API on the AEM publish & author instances.

5. Preventive Actions
Columns: Sl No., Action, Owner, Remarks

1. Action: Disable online compaction (Author & Publish)
   Owner: AEM ASM Team
   Remarks: As recommended by Adobe, disable the online compaction; get approval from the business.

2.

6. Improvement Actions
Columns: #, Lesson Learnt/Action Item, Business Group, Owner, Due Date, Status

1. Upload Adobe log files on Daycare immediately when a ticket is raised.
2. Keep the Adobe AEM Sales engineer's number handy.
3. Ensure that the Adobe engineer does a proper handover to the colleague taking over when he/she leaves the call.
4. The MOL-Adobe AEM VM snapshot can be used, as was done during the prod issue.
5. Evaluate the suggestions recommended by Adobe during the outage.
6. Look for alternative ways to identify whether a backend system failure is causing the AEM instance to go down.

Owner: AEM ASM Team (all items)
Status, in the order listed: Done, Done, Done, WIP, WIP, Done
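
One possible way to approach the action item on detecting backend failures is a periodic probe with a hard timeout. The following is a minimal sketch under stated assumptions: the probe function, the thresholds and the health URL are hypothetical and do not describe an existing MOL or AEM endpoint.

```python
import time
import urllib.request

# Hypothetical health endpoint; the real MOL WEB API URL is not in this report.
MOL_API_HEALTH_URL = "http://mol-api.example.invalid/health"


def probe(url, timeout_seconds=3.0, slow_after_seconds=1.0,
          opener=urllib.request.urlopen):
    """Return (status, elapsed_seconds); status is 'up', 'slow' or 'down'."""
    started = time.monotonic()
    try:
        with opener(url, timeout=timeout_seconds) as response:
            healthy = 200 <= response.status < 300
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return "down", time.monotonic() - started
    elapsed = time.monotonic() - started
    if not healthy:
        return "down", elapsed
    if elapsed > slow_after_seconds:
        return "slow", elapsed
    return "up", elapsed


if __name__ == "__main__":
    status, elapsed = probe(MOL_API_HEALTH_URL)
    print(f"MOL WEB API is {status} after {elapsed:.2f}s")
```

The opener parameter exists only so the probe can be exercised without a network. Run on a schedule, a probe like this would report a slow or unreachable backend separately from any AEM symptoms.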

7. Timelines - Sequence of Events


Date        Time               Action
08/01/2016  10:45 P.M.         Replaced MOL PROD WEB API with UAT WEB API.
08/01/2016  05:30 P.M.         Captured the thread dumps and found that the MOL WEB API was not responding.
08/01/2016  2:04 - 5:30 P.M.   AEM team explained that the issue is with the indexing, which needs to be rebuilt.
                               3:25 PM: Index rebuild completed. Issue still persists.
                               4:26 PM CST: Andrew from Adobe joined. Index rebuild required.
                               05:00 PM: Index rebuild completed.
                               05:02: Issue still persists.

08/01       1:39 P.M.          Shared screen and logs with AEM for further investigation. Team confirmed for the recovery of the system.
08/01       10:50              AEM team joined the call.
08/01       09:53              ASM team raised a P1 ticket with AEM to check on the HTML issue with Passport.
08/01       09:20              ASM team investigating the Login Passport Expired Key.
08/01       08:20
08/01       07:00              MOL is down.
