
With the concept of "Serving with dedication and being committed to our customers", we keep striving to create new value for our customers. To provide users with the latest and most practical maintenance experience and methods, ZTE maintenance experts have compiled this pocket book of MSTP product maintenance experience. All the troubleshooting cases in this pocket book come from maintenance practice and can help you maintain ZTE MSTP products. We welcome your comments and suggestions. Thank you!

For any need or suggestion, please feel free to contact us.


Email: zheng.lijun@zte.com.cn
Office phone: (86) 755 26773776
Fax: (86) 755 26773778

Transmission and Power Supply Documentation Team


2010-11
Contents

Chapter 1 Hardware Faults

1.1 NE Off-management and Service Interruption Caused by the Replacement of O4CSD Board

1.2 NE Off-management Caused by S320 NCP Fault

1.3 2M Service of a Site’s S360 Equipment Reports for AIS Alarm

1.4 CSB Board Malfunction

1.5 All Optical Boards in Self-loop Report LOF or LOS

1.6 Optical Board Failure Leads to B1 Error

1.7 CSC16x16 Board Malfunction in Power-up

Chapter 2 Performance Faults

2.1 Optical Board Causes B1 Error Codes

Chapter 3 Data Configuration Faults

3.1 SEC Board Reports LFD and VC12 Extensible Markup Mismatching

3.2 Service Commissioning of TGE2B-E Board in an Office Fails

3.3 AU not Configured with Service in 10G Optical Board of ZXMP S390 Reports AU-AIS Alarm

Chapter 4 Power Faults

4.1 Service Boards in Some Slots Report Channel Alarm

4.2 Some Boards’ Service Failure

Chapter 5 Protection Faults

5.1 MS Switching Causes Temporary Break in Service

5.2 Timeslot Configuration Confusion Causes Path Protection Configuration Failure

5.3 Cross-connect Board Failure Causes MS-ring Switching Unsuccessful

5.4 LP16 Board Failure Causes MS Protective Switching Unsuccessful

5.5 ZXMP S360’s OL1 Board Fault Causes Path Ring Switching Failure

5.6 S360 MS Switching Causes Part Services Unstable

Chapter 6 NM Faults

6.1 E300 NM Alerts “Database Disconnected”

6.2 T31 NM’s Client Program Cannot Start up Normally

6.3 E300 NM S320 NEs’ Board Indicator Lights Cannot Flash

Chapter 7 ECC Faults

7.1 Board Reset with Telnet

7.2 NCP Board Fault Causes ECC Failure

Chapter 8 Clock Sync Faults

8.1 Clock Configuration Error Causes Unstable Clock

Chapter 9 ASON Faults

9.1 Call Connection Cannot Reply 1—Insufficient Bandwidth

9.2 Call Connection Cannot Reply 2—Restriction of Route Policy

Chapter 10 Interconnection Faults

10.1 10G Optical Boards of ZTE S390 Equipment and Marconi MSH64 Equipment Fails in Interconnection

10.2 C2 Byte Causes ATM Service Interconnection Failure
Hardware Faults

Chapter 1 Hardware Faults


1.1 NE Off-management and Service Interruption Caused by the Replacement of O4CSD Board
Fault Description

The Network Management (NM) software detects Regenerator Section (RS) and Multiplexing Section (MS) error codes on the O4CSD board of a site’s S320 equipment. The fault is traced to a failure of the O4CSD board.

After the O4CSD board is replaced on site, the network element (NE) goes off management and the service is interrupted. When the original O4CSD board is reinserted, NE monitoring and the service are both restored, but the RS and MS error codes persist.

Cause Analysis

• The O4CSD version configured in the NM software is inconsistent with the version of the replacement board.

• The replacement O4CSD board is faulty.

Troubleshooting

1. Confirm the following information on site:
MSTP Routine Troubleshooting Manual

• The PCB hardware version of the original O4CSD board is 20010900, and its software version is 20040716. In the NM software, all hardware versions are configured as 100, and the NCP software version is 20050728.

• The PCB version of the replacement O4CSD board is 20040900, and its software version is 20060905. A board of this version requires all hardware versions in the NM software to be configured as 200, and an NCP software version of 20061027 or later.

Based on the above, the fault is caused by an outdated NCP software version and by an NM configuration that does not meet the requirements of the new O4CSD board version. As a result, the equipment operates abnormally.

2. Reinstall the original O4CSD board and confirm that NE monitoring is restored.

3. Upgrade the NCP board software to version 20061027.

4. In the NM software, change the version setting of the O4CSD board to 200.

5. Remove the original O4CSD board and insert the new one.
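The version check that decided this case can be run before any board is swapped. The following is a minimal Python sketch, not a ZTE tool: the table layout, the function name check_o4csd_replacement, and the encoding of versions as YYYYMMDD integers are assumptions. Only the version values (PCB 20040900, NM hardware version 200, NCP 20061027 or later) come from this case.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BoardRequirement:
    nm_hw_version: int    # hardware version that must be set in the NM software
    min_ncp_version: int  # oldest acceptable NCP software version, as YYYYMMDD

# Hypothetical lookup table keyed by O4CSD PCB version. Only the entry
# described in this case is filled in; other versions need vendor data.
O4CSD_REQUIREMENTS = {
    "20040900": BoardRequirement(nm_hw_version=200, min_ncp_version=20061027),
}

def check_o4csd_replacement(pcb_version: str, nm_hw_setting: int,
                            ncp_version: int) -> list[str]:
    """Return a list of mismatches; an empty list means no known conflict."""
    req = O4CSD_REQUIREMENTS.get(pcb_version)
    if req is None:
        return [f"no compatibility data for PCB version {pcb_version}"]
    problems = []
    if nm_hw_setting != req.nm_hw_version:
        problems.append(
            f"NM hardware version is {nm_hw_setting}, board needs {req.nm_hw_version}")
    # YYYYMMDD integers of equal length compare in date order.
    if ncp_version < req.min_ncp_version:
        problems.append(
            f"NCP version {ncp_version} is older than {req.min_ncp_version}")
    return problems

# The situation found on site, then the state after steps 3 and 4.
print(check_o4csd_replacement("20040900", 100, 20050728))
print(check_o4csd_replacement("20040900", 200, 20061027))
```

The first call reports both mismatches found on site; the second returns an empty list once the NCP is upgraded and the NM setting is changed.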


Conclusion

When replacing an O4CSD board or NCP board, check whether the version of the new board is compatible with the version of the board on site and with the existing NM and NCP configuration.

1.2 NE Off-management Caused by S320 NCP Fault
Fault Description

The NE of a site’s S320 equipment goes off management, and so do all NEs below it. However, the services of each off-managed NE are normal, and the neighbor NEs on the ring do not report LOS or MS-RDI alarms.

Cause Analysis

• The neighbor NEs on the ring do not report LOS or MS-RDI alarms, which indicates that the optical path is normal.

• Simultaneous failure of several optical boards is an unlikely cause of the off-management. Therefore, the fault is most likely an NCP board that has hung or failed.

Troubleshooting

1. Tele-reset (remotely reset) this NCP board.

Because the NE is off management, its NCP cannot be reset directly from the NM software. However, since no DCC connection failure is reported, the NCP board can be tele-reset through a neighbor NE as follows:

(1) Telnet to any neighbor NE of the off-management NE.

Command format: telnet <IP address of the neighbor NE>

(2) Check the optical port connections by using the if command:

Command format: if -a

(3) Tele-reset the NCP board of the faulty NE by using the resetpeerncp command:

Command format: resetpeerncp 6 1

Command description: 6 is the slot number of the board connected to the faulty NE, and 1 is the number of the optical port connected to the faulty NE.

2. If the fault cannot be cleared remotely, it must be handled on site.

(1) Remove and reinsert the NCP board on site to see if the problem is solved. If not, proceed to the next step.

(2) Re-initialize the NCP board to see if the problem is solved. If not, the NCP board itself has failed. Proceed to the next step.

(3) Replace the NCP board. The problem is solved.
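The remote part of this procedure comes down to three commands, parameterized by the neighbor NE’s IP address and by the slot and optical port that face the faulty NE. The helper below is a hypothetical Python sketch that only assembles that checklist; it does not open a Telnet session. The example IP address is made up, and the rule that slot and port numbers start at 1 is an assumption.

```python
import ipaddress

def tele_reset_commands(neighbor_ip: str, slot: int, port: int) -> list[str]:
    """Build the command sequence for tele-resetting a peer NCP.

    neighbor_ip: address of a neighbor NE that is still under management.
    slot, port: slot and optical port on that neighbor facing the faulty NE.
    """
    ipaddress.ip_address(neighbor_ip)  # raises ValueError on a malformed address
    if slot < 1 or port < 1:           # assumption: numbering starts at 1
        raise ValueError("slot and port numbers must be positive")
    return [
        f"telnet {neighbor_ip}",        # step (1): log in to the neighbor NE
        "if -a",                        # step (2): check optical port connections
        f"resetpeerncp {slot} {port}",  # step (3): reset the NCP across the fiber
    ]

# Hypothetical neighbor address; slot 6, port 1 as in the example above.
for cmd in tele_reset_commands("192.168.10.21", 6, 1):
    print(cmd)
```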


Conclusion

If an NE suddenly goes off management and the upstream NE reports no alarm on the corresponding optical path, first try to tele-reset the NCP board. If that does not work, re-initialize the NCP board on site or replace it.

1.3 2M Service of a Site’s S360 Equipment Reports for AIS Alarm
Fault Description

All 2M services of the 2# EP1 board in a site’s S360 equipment report AIS, down time, and remote defect indication. The related channels on the ring also report AIS and down time, and the corresponding services at the central site report AIS and remote defect indication.

Cause Analysis

This fault is caused by a blocked path. The possible reasons are:

• The 2# EP1 board of the site is faulty.

• The slot configuration is abnormal.

• The cross-connect board or optical board of a site that the service passes through has failed.

Troubleshooting

1. Reset the 2# EP1 board. If the fault still exists, proceed to the next
step.

2. Using the network-wide service report in the NM software, check whether the service slot configuration is normal.

3. Upload and compare the timeslots of all services passing through the NE to check whether the service data in the NM software and in the NE’s NCP are consistent.

4. Using the dichotomy (bisection) method, drop the service to a tributary board hop by hop until the fault is located between two sites.

5. Switch over the cross-connect (DXC) boards of each of the two sites to see if the problem is solved. If not, proceed to the next step.

6. At the site where the service is dropped, perform a loopback at the terminal side of the optical port. If the alarm disappears, the optical board of this site is faulty.

7. Replace this optical board on site and restore the previous data. The problem is solved.
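Step 4 is a binary search over the sites on the service route. The Python sketch below shows the idea under stated assumptions: there is a single faulty segment, the source site is healthy, and a drop test at any site beyond the fault fails while a drop before it succeeds. The drop_is_clean callback stands in for the manual drop-and-check done with the NM software; the function name and site names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

def locate_faulty_segment(path: Sequence[str],
                          drop_is_clean: Callable[[str], bool]) -> Tuple[str, str]:
    """Bisect a service route down to the segment that breaks the service.

    path: sites in order along the route; path[0] is the source site,
          assumed healthy and never tested.
    drop_is_clean: drops the service to a tributary board at a site and
          returns True if it arrives without AIS.
    Returns (last clean site, first faulty site).
    """
    lo, hi = 0, len(path) - 1
    if hi < 1:
        raise ValueError("path needs at least two sites")
    if drop_is_clean(path[hi]):
        raise ValueError("service is clean at the far end; nothing to locate")
    # Invariant: path[lo] is clean and path[hi] is faulty.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if drop_is_clean(path[mid]):
            lo = mid
        else:
            hi = mid
    return path[lo], path[hi]

# Hypothetical six-site route with the fault between C and D.
sites = ["A", "B", "C", "D", "E", "F"]
print(locate_faulty_segment(sites, lambda s: s in {"A", "B", "C"}))
```

Halving the candidate range at each drop means the number of drop tests grows with the logarithm of the route length rather than linearly, which is why the method saves time on long rings.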

Conclusion

For path faults, dropping the service by dichotomy or using AU loopback is effective: it quickly narrows the problem down to one segment and saves considerable time.

1.4 CSB Board Malfunction


Fault Description

A ZXMP-S360 equipment malfunctions. A CSB cross-connect board fitted with a TCS4 time division module passes the power-on self-test (POST), but then its red and green indicators both go out and stay off.

Cause Analysis

When the red and green indicators go out together after POST, the time division module of the CSB board has failed to detect the clock signal sent by the clock board. The cause is a failed crystal oscillator on the clock board.

Troubleshooting

1. The initial judgment is that the CSB board has failed. Replace the CSB board (with the time division module), but the problem persists.

2. Replace it with a CSC board (with or without the time division module) to see if it works normally.

3. The test shows that the CSB board works normally without the time division module.

4. Debug using the CSB board without the time division module. The devices connected at both ends of the equipment report Loss Of Frame (LOF), and the local optical board also reports LOF when self-looped. The clock board is therefore judged to be faulty.

5. Replace the clock board. The problem is solved: with the time division module reinserted, the CSB board works normally.

Conclusion

Only a CSB board fitted with the time division module performs this clock signal check. If the cross-connect board works normally without the time division module, or a CSC board works normally (with or without the time division module), use the alarm raised by the time division module to locate the fault, which in this case is in the clock board. This also removes a potential hidden hazard in the equipment.

1.5 All Optical Boards in Self-loop Report LOF or LOS
Fault Description

All optical boards of an NE report LOF and LOS. When the optical boards are self-looped, the alarms persist.

Cause Analysis

All optical boards report LOS or LOF alarms, even after they are self-looped. It is unlikely that all optical boards are damaged at the same time.

First, check whether the NCP board, which locates faults and reports alarms, is faulty and reporting false alarms.

Then check whether the clock board is faulty. A faulty clock board leaves the whole system without a usable framing clock, so the signals transmitted by the optical boards cannot be framed.

Troubleshooting

1. Replace the NCP board to see if the problem is solved.

2. If not, replace the clock board. The problem is solved.

Conclusion

When self-looped optical boards still raise alarms, the cause may be the optical boards themselves, the NCP board, or the clock board.

1.6 Optical Board Failure Leads to B1 Error


Fault Description

ZTE ZXMP-S360 equipment is used in a local transmission network. The network consists of three ZXMP-S360 NEs that form an unprotected chain, with a transmission rate of 2.5 Gbit/s. The network architecture is shown in Figure 1-1. The central office is at NE A.

Figure 1-1 Network Architecture

The optical fiber connections are shown in Figure 1-1. 2M services run between the NEs.

The performance data monitored in the NM software is as follows.

Checking NE A:

• The services with NE B have a large number of lower-order V5 BBE errors in the tributary;

• The services with NE C have a large number of lower-order V5 BBE errors in the tributary;

• Some B1 BBEs are detected on the 5# OI16 line in every 15-minute performance period;

• LP16 has B2 BBE and B3 BBE errors.

Checking NE B:

• There are no errors on the 5# OI16 and 11# OI16 lines;

• 10# LP16 and 13# LP16 have B2 FEBBE and B3 FEBBE errors;

• The services with NE A have a large number of V5 FEBBE errors in the tributary;

• The services with NE C are normal.

Checking NE C:

• The services with NE A have a large number of V5 FEBBE errors in the tributary.

Cause Analysis

Analyze the line performance data first. Three overhead bytes monitor errors on the line: B1, B2, and B3. Each monitors the transmission quality between its own start point and end point.

• B1 monitors only the regenerator section (RS) between two sites, and its error codes terminate within the RS. That is, B1 error codes between NE A and NE B are not passed on to NE C.

• B2 monitors only the multiplexing section (MS) between two sites, and its error codes terminate within the MS. NE A and NE B are ADM-type NEs, so B2 error codes are not passed on to NE C.

• B3 monitors only the higher-order path between two sites. The route monitored by B3 therefore contains the routes monitored by B2 and B1, and the route monitored by B2 contains the route monitored by B1. Since the services of the same AU are dropped at NE B and NE C, B3 error codes generated between NE A and NE B are not passed on to NE C.

Based on the fault description, the error codes occur between NE A and NE B. However, it is hard to tell whether the cause is a receiving failure at NE A or a transmitting failure at NE B.

Troubleshooting

Locate the fault by eliminating sites one by one.

1. Measure the received optical power of NE A to see if it is normal (sensitivity is -18 dBm for short-distance optics and -28 dBm for long-distance optics).

2. Self-loop the 5# optical board of local site A. If error codes still appear at the local site, the fault is in NE A.

3. Replace the 5# optical board of NE A. If the error codes disappear from the whole network, the problem is solved.
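Step 1 compares a power-meter reading against the receiver sensitivity. A minimal Python sketch of that comparison, using only the two sensitivity figures quoted in step 1; the "short"/"long" keys, the function names, and the sample reading are assumptions. The sketch checks only the sensitivity floor; the receiver overload limit and any safety margin are not covered here.

```python
# Receiver sensitivity limits quoted in step 1 (dBm).
SENSITIVITY_DBM = {"short": -18.0, "long": -28.0}

def rx_margin_db(measured_dbm: float, reach: str) -> float:
    """Margin above sensitivity in dB; negative means the receiver is starved."""
    if reach not in SENSITIVITY_DBM:
        raise ValueError(f"unknown reach {reach!r}; expected 'short' or 'long'")
    return measured_dbm - SENSITIVITY_DBM[reach]

def rx_power_ok(measured_dbm: float, reach: str) -> bool:
    """True if the received power is at or above the sensitivity limit."""
    return rx_margin_db(measured_dbm, reach) >= 0.0

reading = -20.5  # hypothetical power-meter reading at NE A, in dBm
for reach in ("short", "long"):
    print(reach, rx_power_ok(reading, reach), rx_margin_db(reading, reach))
```

The same reading can be acceptable for a long-distance board and too low for a short-distance one, so the board type must be known before the reading is judged.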

Conclusion

If there are B1 error codes, the fault lies between two adjacent points. If the optical power is normal, the fault is in an optical board. Leave the B2, B3, and V5 error codes aside at first; after the B1 error codes are cleared, if problems remain, handle the B2, B3, and V5 error codes in turn.

Because the section monitored by B1 lies within the routes monitored by B2, B3, and V5, a fault that causes B1 error codes normally causes B2, B3, and V5 error codes as well. There are exceptions, such as errors in the RS overhead only, which produce B1 error codes without B2 or B3 error codes. However, this situation is very rare.

Fully understand how B1, B2, B3, and V5 error codes are generated and how they relate to one another. Error codes are normally related to the corresponding optical board, so locate the fault step by step, analyzing layer by layer according to the error generation mechanism of each layer. In this case, the fault was finally located in the optical board.
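The rule of handling B1 first and then re-checking B2, B3, and V5 amounts to sorting the observed error types by how far their monitored span reaches. A minimal Python sketch; the VC-4/VC-12 labels assume the usual SDH mapping of 2M services into VC-12 inside VC-4, and the function name handling_order is hypothetical.

```python
# Nested SDH error monitors, from the shortest monitored span to the longest.
MONITORS = [
    ("B1", "regenerator section (RS)"),
    ("B2", "multiplexing section (MS)"),
    ("B3", "higher-order path (VC-4)"),
    ("V5", "lower-order path (VC-12)"),
]

def handling_order(observed: set[str]) -> list[str]:
    """Order the observed error types so the innermost span is handled first."""
    known = {name for name, _ in MONITORS}
    unknown = observed - known
    if unknown:
        raise ValueError(f"unknown error types: {sorted(unknown)}")
    return [name for name, _ in MONITORS if name in observed]

# Error types seen at NE A in this case.
spans = dict(MONITORS)
for name in handling_order({"V5", "B3", "B1", "B2"}):
    print(f"{name}: check the {spans[name]}")
```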

Note:

Check the performance values in the NM software periodically and handle error codes as soon as they are detected. Otherwise, once the error codes accumulate to a certain level, normal service reception is affected, and in severe cases the service may be interrupted.

1.7 CSC16x16 Board Malfunction in Power-up


Fault Description

ZTE ZXMP-S360 equipment is used in a local transmission network. The network consists of six ZXMP-S360 NEs that form a path protection ring, with a transmission rate of 2.5 Gbit/s. The network architecture is shown in Figure 1-2. The central office is at NE A.

Figure 1-2 Network Architecture

The optical fiber connections are shown in Figure 1-2.

The original equipment used a CSC8x8 board, which has now been upgraded to a CSC16x16. Although the NM software is configured correctly, the CSC16x16 fails to operate after power-up: the NOM and ALM indicators stay on constantly, and the NM software reports a board type mismatch alarm.

Cause Analysis

The CSC board fails to function. The possible reasons are:

• The configuration in the NM software is incorrect.

• The NCP program version is too old.

• The CSC board is not seated properly.

• The TCS board is not seated properly.

• The CSC board is faulty.

Troubleshooting

1. The version of the NCP main program is confirmed to be v1.00.023. The hardware and software versions of the CSC16x16 are consistent with the NCP version.

2. Check the NM software configuration. It is correct.

3. Remove and reinsert the CSC board. The fault persists.

4. Remove the TCS 16x16 module mounted on the CSC board and reinsert it. The fault disappears.

Conclusion

The earliest NCP version that supports the CSC16x16 is V1.00.001, so NCP V1.00.023 has no compatibility problem with the CSC16x16. If the configuration in the NM software is correct, check whether the CSC16x16 is seated firmly or whether the board is faulty.
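Checking an NCP version against the earliest supported one is easy to get wrong by eye, because the versions in this case are written with both "v" and "V" and could have fields of different widths. A minimal Python sketch that compares versions as integer tuples; the version format (a letter v followed by dot-separated numbers) is assumed from the two examples above, and the function names are hypothetical.

```python
import re

def parse_ncp_version(text: str) -> tuple[int, ...]:
    """Turn 'v1.00.023' or 'V1.00.023' into (1, 0, 23)."""
    match = re.fullmatch(r"[vV](\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised NCP version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))

# Earliest NCP version that supports the CSC16x16, per this case.
EARLIEST_NCP_FOR_CSC16X16 = parse_ncp_version("V1.00.001")

def ncp_supports_csc16x16(version: str) -> bool:
    """True if the NCP version is at least the earliest supported one."""
    return parse_ncp_version(version) >= EARLIEST_NCP_FOR_CSC16X16

print(ncp_supports_csc16x16("v1.00.023"))
```

Comparing integer tuples instead of raw strings avoids case sensitivity and the misordering that string comparison produces when fields differ in width (for example "9" against "023").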

Note:

When replacing the CSC board, make sure that the TCS time division board mounted on it is seated firmly. Otherwise, the board may not function normally.

Performance Faults

Chapter 2 Performance Faults


2.1 Optical Board Causes B1 Error Codes
Fault Description

ZTE ZXMP-S360 equipment is used in a local transmission network. The network consists of four ZXMP-S360 NEs that form an MS protection ring, with a transmission rate of 2.5 Gbit/s. The network architecture is shown in Figure 2-1. The central office is at NE A.

Figure 2-1 Network Architecture

The optical fiber connections are as follows:

The 11# OI16 board of NE A connects to the 5# OI16 board of NE B, and the 11# OI16 board of NE B connects to the 5# OI16 board of NE C. There are 2M services between the NEs.

Querying the monitored performance data in the NM software shows B1 BBE error codes on the 5# OI16 of NE B, about a dozen in every 15-minute performance period. However, the 11# OI16 of NE A shows normal performance with no error codes.

Cause Analysis

A fault with only B1 errors is relatively easy to handle. B1 error codes can have the following causes:

• The optical board at the transmitting end malfunctions, introducing error codes into the transmitted signal.

• The optical board at the receiving end malfunctions, introducing error codes during processing even though the received signal is normal.

• The optical path fails: the optical power reaching the receiving end is too low, below the receiver sensitivity, which leads to B1 error codes.

• The clock fails.

Troubleshooting

Since the ring is an MS ring, MS switching between NE A and NE B is first performed through the NM software so that troubleshooting does not affect the service.

After confirming that the service has switched over normally, switch the clock board. The fault persists.

Check whether the received optical power is normal. If not, check whether the internal and external connections at the ODF rack are loose. If the ODF connections are sound, check the optical interface inside the optical board. Although this situation is very rare, it must be checked.

After the checks above rule out external optical power as the cause, replace the 11# OI16 board of site A and observe the performance values for B1 error codes. This locates the fault at the 11# OI16 board of site A.

Conclusion

The cause analysis lists four possible causes. In normal operation, a sudden drop in received optical power below the optical board’s sensitivity is very rare, so the third cause is the least likely. Of the transmitting-end and receiving-end optical boards, the transmitting end is more often at fault, so start troubleshooting at the transmitting end.
Data Configuration Faults

Chapter 3 Data Configuration Faults

3.1 SEC Board Reports LFD and VC12 Extensible Markup Mismatching
Fault Description

During new commissioning of ZXMP S390 equipment in an office, after services from the SEC board to the SFE8 board between sites are configured, the SEC board reports the VCG Loss Of Frame Delineation (LFD) and VC12 Extensible Signal Markup Mismatching alarms.

Cause Analysis

LFD: When the GFP frame header cannot be locked (the receiver is in the search or pre-sync state), the LFD alarm is reported. Once the header is locked, the alarm disappears. In this case, the alarms are caused by inconsistent encapsulation protocols at the two ends, so both ends must use the same GFP encapsulation. If a V1.0 SFE board is used at one end, it should be upgraded to V2.0.
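The search (hunt), pre-sync, and sync states behind the LFD alarm follow the GFP frame delineation process of ITU-T G.7041. The Python sketch below is a simplified model, not the SEC board’s firmware: each input element stands for one candidate core header and is True if its cHEC checks out. Real receivers hunt octet by octet, and optional header error correction in the sync state is not modelled; the parameter delta and the function names are assumptions of this sketch.

```python
from enum import Enum

class GfpState(Enum):
    HUNT = "hunt"
    PRESYNC = "pre-sync"
    SYNC = "sync"

def run_delineation(header_checks: list[bool], delta: int = 1) -> list[GfpState]:
    """Track the delineation state after each candidate header check.

    header_checks: True where a candidate core header has a valid cHEC.
    delta: further consecutive valid headers needed in PRESYNC before SYNC.
    """
    state = GfpState.HUNT
    valid_in_presync = 0
    states = []
    for ok in header_checks:
        if state is GfpState.HUNT:
            if ok:
                state, valid_in_presync = GfpState.PRESYNC, 0
        elif state is GfpState.PRESYNC:
            if ok:
                valid_in_presync += 1
                if valid_in_presync >= delta:
                    state = GfpState.SYNC
            else:
                state = GfpState.HUNT
        elif not ok:  # SYNC: this sketch drops back to HUNT on any bad header
            state = GfpState.HUNT
        states.append(state)
    return states

def lfd_alarm(states: list[GfpState]) -> bool:
    """LFD is active whenever delineation is not locked."""
    return not states or states[-1] is not GfpState.SYNC

# Matching GFP encapsulation at both ends: headers validate and lock.
print(run_delineation([True, True, True]))
# Mismatched encapsulation: no header ever validates, so LFD stays up.
print(lfd_alarm(run_delineation([False] * 5)))
```

With mismatched encapsulation (for example HDLC frames arriving at a GFP receiver) the receiver never leaves the hunt state, which matches the persistent LFD alarm in this case.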

Troubleshooting

The software version of the SFE8 on site is V2.0. Changing the encapsulation protocol to GFP solves the problem.

Conclusion

In case of this fault, first check whether the encapsulation protocols of the Ethernet boards at the two ends are consistent. If not, make them consistent.

Note:

The encapsulation mode of the S330 equipment’s SFE board is determined by the board’s logic software; that is, boards using GFP and boards using HDLC run different logic software. The configured encapsulation protocol must match the logic software in use.

3.2 Service Commissioning of TGE2B-E Board in an Office Fails
Fault Description

ZXMP S390 equipment is used in an office, mainly for 1000M Ethernet transparent transmission services. Recently, due to network expansion, a new batch of devices arrived. 1000M Ethernet transparent transmission services need to be commissioned between the new S390 sites and the old sites, but commissioning fails on site. The newly delivered devices and boards are not expected to have hardware faults. According to the on-site maintenance personnel, when the TGE2B-E board (hardware version: B030801) in the new equipment is replaced with an old TGE2B-E board (hardware version: B030300), the service is commissioned successfully.

Cause Analysis

Analysis shows that the problem lies not in the board software version or version matching, but in the interconnection settings between TGE2B-E boards of different hardware versions. The main difference between the two hardware versions is that B030801 supports the standard LCAS protocol, while B030300 does not. Therefore, when a service is established between boards of these two hardware versions, the LCAS function must not be enabled; otherwise, service setup fails. After the field data configuration is restored, the problem is solved.

Troubleshooting

Change the NM software configuration to disable the LCAS function at both ends. The service becomes normal, and the problem is solved.

Conclusion

The LCAS setting must be consistent at both ends: either enabled at both or disabled at both.
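The rule in this case can be checked before a service is created. A minimal Python sketch; the support table contains only the two TGE2B-E hardware versions named in this case, and the assumption that two B030801 boards can both run LCAS is not tested by this case. The function names are hypothetical.

```python
# LCAS support by TGE2B-E hardware version, as described in this case.
LCAS_SUPPORT = {"B030801": True, "B030300": False}

def lcas_allowed(hw_a: str, hw_b: str) -> bool:
    """LCAS may be enabled only if the boards at both ends support it."""
    for hw in (hw_a, hw_b):
        if hw not in LCAS_SUPPORT:
            raise ValueError(f"no LCAS data for hardware version {hw}")
    return LCAS_SUPPORT[hw_a] and LCAS_SUPPORT[hw_b]

def lcas_config_ok(lcas_a: bool, lcas_b: bool, hw_a: str, hw_b: str) -> bool:
    """Both ends must match, and LCAS must be off if either board lacks it."""
    if lcas_a != lcas_b:
        return False
    return not lcas_a or lcas_allowed(hw_a, hw_b)

# The failing setup in this case, then the fix applied on site.
print(lcas_config_ok(True, True, "B030801", "B030300"))
print(lcas_config_ok(False, False, "B030801", "B030300"))
```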


3.3 AU not Configured with Service in 10G Optical Board of ZXMP S390 Reports AU-AIS Alarm
Fault Description

The 10G optical boards of site A3 and site B3 in an office report AU-AIS alarms. However, the AU channel reporting these alarms is not configured with any service; it serves as the protection AU channel of the MS.

The equipment data is compared with the NM database, and they are consistent.

Cause Analysis

Checking the history alarms and NM settings by restoring the field data shows that the Idle AU Detection Setting item under the Alarm menu is enabled in the field NM data. This causes the AU channel without configured services to report the AU-AIS alarm.

In SNCI mode, the system sends AU-AIS on idle channels by default. For example, site A and site B are interconnected, and site A sends AU-AIS on an idle channel. If site B is configured with Idle AU Channel Detection, site B detects the AU-AIS and reports it.
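The behavior described above can be captured in a small model: received AU-AIS raises an alarm on a channel that carries service, and on an idle channel only when Idle AU Detection is enabled. This Python sketch is a simplification for illustration, not the NM’s alarm logic; the data class, the function name, and the channel numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AuChannel:
    number: int          # AU channel number on the 10G optical board
    has_service: bool    # True if a service is configured on this AU
    receiving_ais: bool  # True if AU-AIS arrives on this AU

def au_ais_alarms(channels: list[AuChannel], idle_au_detection: bool) -> list[int]:
    """Return the AU channels that would raise an AU-AIS alarm.

    Channels carrying service always report received AU-AIS; idle channels
    report it only when Idle AU Detection is enabled.
    """
    return [
        ch.number
        for ch in channels
        if ch.receiving_ais and (ch.has_service or idle_au_detection)
    ]

# Hypothetical board: AU 1 carries service, AU 9 is an idle MS protection AU
# on which the far end sends AU-AIS by default.
board = [AuChannel(1, True, False), AuChannel(9, False, True)]
print(au_ais_alarms(board, idle_au_detection=True))   # setting found in this case
print(au_ais_alarms(board, idle_au_detection=False))  # after deselecting the item
```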


Troubleshooting

Deselect the Idle AU Detection Setting item in the NM software to clear the alarm.

Conclusion

The AU channel without configured services reports AU-AIS because the Idle AU Detection Setting function is enabled in the NM software.
Power Faults

Chapter 4 Power Faults


4.1 Service Boards in Some Slots Report Channel Alarm
Fault Description

Service boards in some slots report channel alarms or pointer loss alarms.

Take the expanded subrack as an example. It is fitted with the boards shown in Figure 4-1.

Figure 4-1 Boards Inserted in Expanded Subrack

Service configuration: each AUG of the 7# OL4 optical board is configured to two EP1 boards, and the 32nd tributary of the second EP1 board is unused. The OL4 optical board has four AUGs in total, so ideally eight EP1 boards are needed to drop all of its services without occupying time division resources.
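The board count follows from simple arithmetic. This Python sketch assumes that each AUG carries one AU-4 with 2M services mapped into VC-12 (3 TUG-3 x 7 TUG-2 x 3 TU-12 = 63 E1 channels) and that an EP1 board has 32 E1 tributaries, which is inferred from the mention of the 32nd tributary rather than stated in this manual.

```python
import math

TU12_PER_AU4 = 3 * 7 * 3  # 3 TUG-3 x 7 TUG-2 x 3 TU-12 = 63 E1 channels per AU-4
E1_PER_EP1 = 32           # assumed from the "32nd tributary" mentioned above
AUGS_PER_OL4 = 4          # an STM-4 carries four AUG-1s

ep1_per_aug = math.ceil(TU12_PER_AU4 / E1_PER_EP1)
spare_ports_per_aug = ep1_per_aug * E1_PER_EP1 - TU12_PER_AU4
ep1_for_full_ol4 = ep1_per_aug * AUGS_PER_OL4

print(f"EP1 boards per AUG: {ep1_per_aug}")
print(f"Unused EP1 tributaries per AUG: {spare_ports_per_aug}")
print(f"EP1 boards to drop a full OL4: {ep1_for_full_ol4}")
```

The result (two boards per AUG with one unused tributary, eight boards in total) matches the configuration described above.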

However, the services that the first AUG of the 7# OL4 optical board drops to the two EP1 boards in slots 22# and 23# always report the TU12 alarm indication signal and TU12 loss of pointer alarms. The services of the tributary boards in other slots are all normal.

Cause Analysis

The subrack is fitted with two power boards. One of them has an abnormally low current output, so instead of supplying power it acts as a load. As a result, the power supply to the boards in certain slots is too low, and these boards cannot work normally.

Troubleshooting

Remove the power clock board with the low output, or replace it.

Conclusion

The stability of the power clock board’s current output affects the normal operation of all boards. Therefore, when more than one board malfunctions, first check the operating status of the power clock board.

4.2 Some Boards’ Service Failure


Fault Description

The S360 equipment is fitted with many boards but only one power board, which can cause service failures on some boards.

Cause Analysis

The subrack is fitted with many boards. Although the boards can work, the power supply and voltage are insufficient, so some high-power chips on some boards cannot work normally.

Troubleshooting

Configure the subrack with dual power clock boards.

Conclusion

When many boards are inserted in the subrack, the S360 equipment
should be configured with dual power clock boards.

Protection Faults

Chapter 5 Protection Faults


5.1 MS Switching Causes Temporary Break in Service
Fault Description

During networking of S360 devices, an alarm is inserted from NE A toward NE B to perform a Multiplex Section (MS) switching test, and the transmission service is interrupted for 7 seconds. When the alarm is inserted from NE B toward NE A to trigger switching, the test result is normal.

Cause Analysis

Reading the board versions through the NM software reveals differences, as shown in the table below.

Board    NE A              NE B
NCP      0X200106061030    0X200103051000
LP16     0X200204301640    0X200107021742
CSC      0X200107131058    0X200107131058

Judging from the board versions, this fault is likely caused by inconsistent software versions on the boards of NE A and NE B.

The software release times of NE A’s NCP board and LP16 board are too far apart, so the new LP16 software is incompatible with the old NCP software.
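The version codes in the table appear to be build timestamps in YYYYMMDDhhmm form behind a "0X" prefix; that reading is an assumption, not something this manual documents. Under that assumption, the following Python sketch measures how far apart the NCP and LP16 software releases are on each NE. The function names are hypothetical.

```python
from datetime import datetime, timedelta

def version_time(code: str) -> datetime:
    """Interpret a board version code such as '0X200106061030' as YYYYMMDDhhmm."""
    digits = code[2:] if code[:2].upper() == "0X" else code
    return datetime.strptime(digits, "%Y%m%d%H%M")

def version_gap(code_a: str, code_b: str) -> timedelta:
    """Absolute time between two version codes."""
    return abs(version_time(code_a) - version_time(code_b))

# Versions read from the NM software in this case.
boards = {
    "NE A": {"NCP": "0X200106061030", "LP16": "0X200204301640"},
    "NE B": {"NCP": "0X200103051000", "LP16": "0X200107021742"},
}
for ne, versions in boards.items():
    print(ne, "NCP-LP16 gap:", version_gap(versions["NCP"], versions["LP16"]))
```

On this reading, NE A’s NCP and LP16 software are 328 days apart and NE B’s are 119 days apart, consistent with the analysis above. The manual gives no threshold for "too far apart", so the sketch only reports the gap.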

Troubleshooting

Upgrade the software of the NCP, LP16 and CSC boards, and the fault
disappears.

Conclusion

Before the S360 device was discontinued, a final version was released. Old-version boards in the existing network should be upgraded to this final version wherever possible, to avoid faults caused by large version differences between the device’s boards.

5.2 Timeslot Configuration Confusion Causes Path Protection Configuration Failure
Fault Description

At the initial stage of a GSM engineering project, the SDH slot configuration of many BSCs at the equipment room side did not follow the standard (that is, Ts1 to Ts16 correspond to the 1st to 16th E1 of the ET1 tributary board, respectively). Many timeslots were dropped in a confused order, as shown in Figure 5-1.

Figure 5-1 Network Timeslot Configuration

However, after the network was reconstructed as an MS ring, path protection could not be configured, because the slots were dropped in a confused way in the two directions and slots were reused in direction A and direction B.

Since the E1 lines at the BTS side and the BSC DDF side were already completed and the site was already in commercial use, path protection had to be completed in the shortest possible time without remaking the E1 lines.

Cause Analysis

Provided that the E1 ports at the BSC side and BTS side remain unchanged, path protection can be configured by adjusting the slot configuration at the BSC side and BTS side.

Besides, since the network is already in commercial service, the configuration must be completed as quickly as possible to minimize the service interruption time. Therefore, first upload the timeslot configuration of the NEs to be configured with MS protection from their NCPs, so that the NE NCP data and the NM data are consistent. Then export the related service report to work out the port information, and match each BSC-side port to a BTS-side port one-to-one. Perform the timeslot configuration in off-line mode. After that, export the related service report again and compare it with the earlier port information to check that they are consistent.
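The before/after comparison described above amounts to comparing two port mappings. A minimal sketch, assuming each exported related-service report has already been reduced to a dict from a BSC-side port to a BTS-side port; the port labels in the example are made up, and the actual report layout is not described in this manual:

```python
def compare_port_reports(before: dict[str, str], after: dict[str, str]) -> dict[str, list]:
    """Compare port information exported before and after reconfiguration.

    Each argument maps a BSC-side port label to its BTS-side port label.
    Returns the ports that disappeared, appeared, or changed their peer.
    """
    missing = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    changed = sorted(
        (port, before[port], after[port])
        for port in set(before) & set(after)
        if before[port] != after[port]
    )
    return {"missing": missing, "added": added, "changed": changed}


before = {"BSC-E1-1": "BTS1-E1-1", "BSC-E1-2": "BTS2-E1-1"}
after = {"BSC-E1-1": "BTS1-E1-1", "BSC-E1-2": "BTS2-E1-2"}
print(compare_port_reports(before, after))
# {'missing': [], 'added': [], 'changed': [('BSC-E1-2', 'BTS2-E1-1', 'BTS2-E1-2')]}
```

A consistent reconfiguration is one where all three lists come back empty.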

Note:

When the timeslot configuration is delivered after reconfiguration, a momentary service break may occur.

Troubleshooting

1. Check that the sites on the ring are working normally, and observe the performance of the ring network, including the 15-minute analog/digital performance and the 24-hour analog/digital performance, to confirm that the network meets the requirements for configuring path protection.

2. Check that the optical fiber connections of the network are correct.

3. Upload the NCP configuration information of all sites on the ring, to ensure that the data is up to date.

4. Export the current timeslot and port configuration information (that is, the report in “related service query”) and save it.

5. Set all NEs on the ring to off-line. (This step can be performed outside the equipment room.)

6. Configure the BSC-side SDH according to the one-to-one correspondence between timeslots and ports, as shown in Figure 5-2.


Figure 5-2 Network Timeslot Configuration after Reconfiguration

7. At the BTS side, drop the corresponding timeslots in both directions according to the corresponding port information, and set all other timeslots to pass through. This completes the path protection configuration.

8. After configuration, export the service report and check that the modified port information is consistent with the previous port information.

9. Set the NEs on the ring to on-line.


10. Download the updated timeslot configuration to all NCP boards on the ring.

11. The path protection configuration is completed.

12. Check the operating status of the sites with the BSC engineers to confirm that path protection has been configured.

13. In direction A and direction B, insert MS-AIS step by step on the optical path connected to the BSC-side SDH, to verify that the path protection configuration is successful.

Conclusion

1. Pay attention to the timeslot configuration mode and method at the initial stage of an engineering project. Perform periodic checks and patrols so that problems can be detected and handled in time.

2. When the timeslot configuration is disordered, this method can serve as a reference.

3. Fully understand the relationship between ports and timeslots, as well as the method of configuring path protection.


5.3 Cross-connect Board Failure Causes MS-ring Switching Unsuccessful
Fault Description

A network consists of three ZTE ZXMP-S360 NEs forming a 2.5G MS protection ring, as shown in Figure 5-3.


Figure 5-3 Network Structure

The 5# OI16 of NE A reports B1 errors and B2 errors simultaneously.

The 10# LP16 of NE B reports a remote-end B2 error.

NE A and NE B report MS protection switching events simultaneously. However, the service from point A to point B is interrupted.

Cause Analysis

• Is the MS protection switching normal? Check the MS protection switching status of site A and site B. Choose Maintenance > Diagnosis > Protective Switching to query the status of site A’s 5# OI16 and site B’s 11# OI16; both show “Auto-switching completed, waiting for recovery”. This proves that the MS switching event has occurred. The switching status of site C shows no request.

• Is the MS protection configuration correct, and has it started up normally? Query register 0x50009 (2 bytes) of the 7# LP16 at each site; all display “0100”. This shows that they are configured with the two-fiber bidirectional MS protection protocol and are in the started state. Query register 0x40000 (3 bytes) of the 7# LP16; it displays the east-direction APS ID, the west-direction APS ID and this node’s APS ID, respectively. The configuration is correct. Query register 0x40005 (1 byte) of the 7# LP16: site A displays 01, site B displays 00 and site C displays 04. This means that switching occurs in the west direction at site A and in the east direction at site B, and no switching occurs at site C, which proves that the switching is successful. Query register address 3000 (4 bytes) of site A’s LP16 board to check the K1K2 bytes received in the west and east directions, and query 3001c (4 bytes) to check the transmitted K1K2 bytes. If all are normal, check the K1K2 bytes of site B and site C. If there is no error, the MS protection is normal.

• A board fault may affect the latter eight AU channels used for protection, so that the service fails after switching. This can only be determined by loopback. Check the service flow between point A and point B during switching: suppose that the service between point A and point B is added and dropped at the 2# EP1. During switching between A and B, the service flow from point A is: 2# EP1 of point A → 5# OI16 AU1 of point A → (switching) 11# OI16 AU9 of point A → 5# OI16 AU9 of point C → 11# OI16 AU9 of point C → 5# OI16 AU9 of point B → (switching) 11# OI16 AU1 of point B → 2# EP1 of point B. This completes the switching process.

• Based on this analysis, perform the corresponding loopback operations.
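The section-by-section loopback can be pictured as walking along the switched service path and looping back at successively farther points until the BER test first fails; the fault then lies between the last passing point and the first failing one. This is an illustrative sketch only: the hop list is copied from the flow above, while `loopback_passes` is a hypothetical stand-in for performing a loopback and reading the BER tester, not an NM function.

```python
from typing import Callable, Optional

# Switched service path from point A to point B, as listed in the analysis.
SWITCHED_PATH = [
    "2#EP1 of A",
    "5#OI16 AU1 of A",
    "11#OI16 AU9 of A",
    "5#OI16 AU9 of C",
    "11#OI16 AU9 of C",
    "5#OI16 AU9 of B",
    "11#OI16 AU1 of B",
    "2#EP1 of B",
]


def locate_fault(path: list[str],
                 loopback_passes: Callable[[str], bool]
                 ) -> Optional[tuple[Optional[str], str]]:
    """Loop back at each point in turn, nearest first.

    Returns (last passing point, first failing point); the first element is
    None if the very first loopback already fails. Returns None if every
    loopback passes.
    """
    for i, point in enumerate(path):
        if not loopback_passes(point):
            last_good = path[i - 1] if i > 0 else None
            return last_good, point
    return None


# Simulated readings: everything up to site B's 5# OI16 passes.
failing = {"11#OI16 AU1 of B", "2#EP1 of B"}
print(locate_fault(SWITCHED_PATH, lambda p: p not in failing))
# ('5#OI16 AU9 of B', '11#OI16 AU1 of B')
```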

Troubleshooting

1. Loopback operation: connect a BER tester to a tributary of the 2# EP1 at point A, and then loop back the AUs mentioned in the cause analysis section by section to locate the faulty board.

2. Loop back AU9 of the 5# OI16 at site B on the line side; the service is normal.

3. Loop back the 11# OI16 at site B on the terminal side; the service fails. The fault is therefore located at site B.

4. The faulty board might be the CSC, the 10# LP16 or the 4# LP16. Since the CSC board has a standby board, switch the cross-connect boards to move the service to the standby cross-connect board. The service is then recovered.

Conclusion

When MS protection switching is unsuccessful, the fault is normally sought in the LP16 board or the optical board, and the cross-connect board tends to be neglected. Sometimes a change in the line of thinking locates the fault faster.

5.4 LP16 Board Failure Causes MS Protective Switching Unsuccessful
Fault Description

The network consists of five ZXMP-S360 NEs forming a 2.5G MS protection ring, and the service is normal.

During network operation, the service is interrupted when an optical fiber is disconnected: the MS switching is unsuccessful.

Cause Analysis

• Abnormal MS configuration

• MS APS not started up normally

• LP16 board failure

• Optical board failure

• Cross-connect board failure


Troubleshooting

1. Check the MS configuration; it is normal.

2. Check whether the MS APS has started up normally; this can only be determined from the register. Read ROM register 50009 of the LP16 board in slot 7 with a read length of 2 bytes. The read-out value is 0100, which indicates that the MS configuration is correct and has started up normally (for an LP16F board, read register a0009).

3. Then read the K1K2 bytes of each site. Read ROM register 30000 of the LP16 board in slot 7 with a read length of 4 bytes. The four bytes are, respectively, the K1 byte received in the east direction, the K2 byte received in the east direction, the K1 byte received in the west direction and the K2 byte received in the west direction. In normal operation, the first four bits of the K1 byte and the last four bits of the K2 byte should all be 0. However, at one site these bits are not all 0. Therefore, the 7# LP16 of this site is suspect.

4. Replace the 7# LP16 board of this site as follows:

(1) Disconnect the 5# optical fiber of this site.

(2) Suspend the APS protocol of the two sites at either end of the disconnected fiber.


(3) Set register 77777 of the cross-connect boards at the two sites to 01.

(4) Pull out the 7# LP16 board and replace it.

(5) When the new LP16 board is working normally, start up the APS protocol of the two sites.

(6) Set register 77777 of the cross-connect boards at the two sites to 00.

(7) Reconnect the optical fiber.
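Steps 2 and 3 interpret register read-outs by hand. The sketch below encodes only the rules stated in this chapter, assuming the read-outs have been copied as hex strings: register 50009 (a0009 on LP16F) should read 0100, and in the 4-byte value of register 30000 the bytes are east K1, east K2, west K1 and west K2. The meanings of register 0x40005 are limited to the three values seen in section 5.3. How the registers are read is not shown, the function names are hypothetical, and the example values are invented.

```python
# Register 0x40005 values as interpreted in section 5.3 (only these three
# were observed; other codes are not documented here).
SWITCH_STATE_40005 = {
    "01": "switching in west direction",
    "00": "switching in east direction",
    "04": "no switching",
}


def aps_started(reg_50009_hex: str) -> bool:
    """2-byte read-out of register 50009 (a0009 on LP16F): 0100 means the
    MS configuration is correct and APS has started."""
    return reg_50009_hex.strip().lower() == "0100"


def abnormal_k_bytes(reg_30000_hex: str) -> list[str]:
    """Check the 4-byte read-out of register 30000.

    Bytes in order: east K1, east K2, west K1, west K2. In normal operation
    the first (upper) four bits of K1 and the last (lower) four bits of K2
    are 0. Returns the directions that break this rule.
    """
    raw = bytes.fromhex(reg_30000_hex)
    if len(raw) != 4:
        raise ValueError("expected a 4-byte read-out")
    problems = []
    for direction, (k1, k2) in (("east", raw[0:2]), ("west", raw[2:4])):
        if (k1 >> 4) != 0 or (k2 & 0x0F) != 0:
            problems.append(direction)
    return problems


print(aps_started("0100"))           # True
print(abnormal_k_bytes("01200340"))  # []
print(abnormal_k_bytes("0120B340"))  # ['west']
```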

Conclusion

Registers are frequently used during troubleshooting and are a very effective tool.

5.5 ZXMP S360’s OL1 Board Fault Causes Path Ring Switching Failure
Fault Description

A local transmission network uses ZTE ZXMP S360 equipment. The whole network consists of four ZXMP S360 NEs forming a path protection ring at a transmission rate of 155 Mbit/s.

The network structure is shown in Figure 5-4, and the central office is located at NE A.


Figure 5-4 Network Structure

The optical fiber connections are: 10# OL1 of NE A is connected to 7# OL1 of NE B, and 10# OL1 of NE B is connected to 7# OL1 of NE C.

Service configuration: there are 2M services from NE B, C and D to NE A. The working path from B to A runs from the 7# of NE B to the 10# of NE A, and the protection path runs over link B-C-D to the 7# optical fiber of NE A. The services of the three NEs are in the same AU.

One day, the fiber between NE A and NE B is disconnected, and all 2M services from NE B to NE A are interrupted.

Cause Analysis

1. For a path ring protection failure, first determine whether the service configuration is correct. The protection path is checked and found to be normal. To ensure that the NM data and NE data are consistent, re-deliver the timeslots to each site on the protection path; the fault still exists.

2. Check the timeslot configuration: the protection timeslots from NE B to NE A pass straight through NE C and NE D. Since the service of NE C also passes straight through NE D and is normal, the pass-through of NE D is judged to be normal. To confirm this further, configure the NE A-to-NE B service to NE C; the service is normal, which proves that the pass-through of NE D is normal.

3. With NE D cleared, check mainly NE C and NE B. Since the local service of NE C is normal and only the pass-through service fails, the fault may be in the EP1 board, cross-connect board or 7# OL1 board of NE C, or in the EP1 board, cross-connect board or 10# OL1 board of NE B itself. Since the service of NE B is broken, check NE B first.

4. To locate the faulty site, loop back the AU on the terminal side of NE B's 10# optical board; the tributary board alarm of NE B still exists. Loop back on the terminal side of NE C's 7# optical board; the alarm at NE A corresponding to NE B's service disappears. Thus the fault is definitely in NE B.


Troubleshooting

1. First, switch the cross-connect boards of NE B; the problem still exists.

2. Then replace the EP1 board; the problem still does not disappear.

3. Finally, replace the OL1 board; the fault disappears. The fault is therefore in the 10# OL1 board of NE B.

Conclusion

For a service break caused by protection switching, first check that the protection configuration and data are correct. Then locate the fault point and analyze the cause.

Narrow the fault range through methods such as switching back or changing the configuration.

5.6 S360 MS Switching Causes Part Services Unstable
Fault Description

Figure 5-5 shows a 2.5G MS-ring consisting of six S360 NEs. When the MS does not switch, all services are normal, and there is no abnormal alarm or performance on the service boards or optical boards.


One day, during an MS switching test on the ring, after switching between NE D and NE E, some services break briefly every 3 to 5 seconds. After switching back, the service recovers.

Figure 5-5 Network Structure

Cause Analysis

The service breaks only when protection switching occurs, so the problem lies in the protection channel. It can be analyzed from the following two aspects:

• NE D or NE E’s cross-connect board fault

• NE D or NE E’s LP16 board fault

Troubleshooting

In the STM-16 two-fiber MS protection (MSP) ring consisting of S360 NEs, services go through the working channel when there is no switching. The LP16 board near the cross-connect board processes the services of AU 1 to 8.

After MS switching occurs, some services pass through the protection channel, and the LP16 board used for protection (that is, the LP16 board farther from the cross-connect board) comes into use.
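In a two-fiber MS ring at STM-16, AU4s 1 to 8 carry working traffic and AU4s 9 to 16 are reserved for protection, with working AU4 n protected by AU4 n + 8 (the arrangement of ITU-T G.841 two-fiber MS-SPRing). This matches section 5.3, where AU1 is switched to AU9. A minimal sketch of that pairing and of which LP16 handles an AU4, following the text above; the "near"/"away" labels restate the text and are not NM terms:

```python
AU4_PER_STM16 = 16
WORKING_AU4 = AU4_PER_STM16 // 2  # AU4 1..8 working, 9..16 protection


def protection_au4(working_au4: int) -> int:
    """AU4 that carries working AU4 n after MS switching (n + 8)."""
    if not 1 <= working_au4 <= WORKING_AU4:
        raise ValueError("working AU4 must be in 1..8")
    return working_au4 + WORKING_AU4


def lp16_board_for(au4: int) -> str:
    """Which LP16 processes the AU4, following the description above."""
    if not 1 <= au4 <= AU4_PER_STM16:
        raise ValueError("AU4 must be in 1..16")
    if au4 <= WORKING_AU4:
        return "LP16 near the cross-connect board"
    return "LP16 away from the cross-connect board"


print(protection_au4(1))  # 9, as in the AU1 -> AU9 flow of section 5.3
print(lp16_board_for(9))  # LP16 away from the cross-connect board
```

So a fault that appears only after switching points at the LP16 handling AU4 9 to 16, which is why both NEs' far LP16 boards are suspects here.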

Troubleshoot the cross-connect boards and the LP16 boards as follows:

1. Switch the master/standby cross-connect boards of the above NEs in the NM software; the master/standby cross-connect boards of all sites are verified as normal.

2. The maintenance personnel provide an important clue: the air conditioning at NE E’s site malfunctioned earlier (it has since been repaired). Therefore, the LP16 hardware may have been damaged by high temperature. After the LP16 board is replaced, some services passing through the protection channel in the switched state still break, but only once every 2 minutes, a slight improvement. Since the problem is still not solved, the LP16 at NE D’s site is also suspect.

3. In the equipment room of NE D, the temperature is found to be slightly high. Check the fan and the dust screen: the dust screen is heavily clogged with dust, and the fan does not rotate even though it is powered on. The fan malfunction is suspected to be caused by high temperature or heavy dust. Cool the equipment with a fan; the three fans in the fan subrack then start working. When the temperature returns to normal, the fault still exists.

4. Replace the 24# LP16 board; the fault disappears. The fault of the 24# LP16 board was most likely caused by high temperature.

Conclusion

Check the operating environment of the equipment periodically. If the air conditioning in the equipment room fails, repair it immediately.

Clean the dust screen of the equipment periodically and check that the equipment fans work normally. If a fan is faulty, replace it in time.

NM Faults

Chapter 6 NM Faults
6.1 E300 NM Alerts “Database Disconnected”
Fault Description

While the E300 V3.18R2 NM software is running, a sudden power failure restarts the computer and the NM software, after which login to the NM client fails. The detail info table displays “Database disconnected”.

Checking the NM processes shows that the database service process dbsvr.exe is not started or disappears quickly after startup.

On the dbman tool page, execute 3 and then 1 to check the operating status of each database. The status of the config. database is suspend, and the object status of the config. database is unknown, as shown in Figure 6-1.


Figure 6-1 NM Process Page

Cause Analysis

1. Analysis of the log files shows that the config. database is suspended in the Sybase database, so the dbsvr.exe process cannot start up normally.

The faulty section in ..\db\dbsvr.log shows that the dbsvr.exe process keeps restarting but fails every time:

1 2008/10/14 13:35:07 20480
ZXONM E300 for NT DBSVR V3.18 R2P08a
COPYRIGHT(C) 2001-2007
2 2008/10/14 13:35:07 20481 [dbserver]
dbserver thread starts up
3 2008/10/14 13:35:07 20482 [dbserver]
dbserver thread exits
4 2008/10/14 13:35:07 28674 [dbserver]
DBSVR exits
1 2008/10/14 13:35:22 20480
ZXONM E300 for NT DBSVR V3.18 R2P08a
COPYRIGHT(C) 2001-2007
2 2008/10/14 13:35:22 20481 [dbserver]
dbserver thread starts up
3 2008/10/14 13:35:22 20482 [dbserver]
dbserver thread exits
4 2008/10/14 13:35:22 28674 [dbserver]
DBSVR exits

2. The faulty section in ..\db\dboperate_error.log shows that recovery of the config. database is first attempted and fails; after that, the log keeps reporting that the config. database is suspended:

Database 'TransDB' has not been recovered yet - please wait and try again.
Database 'TransDB' cannot be opened. An earlier attempt at recovery marked it 'suspect'. Check the SQL Server errorlog for information as to the cause.

3. The faulty section in ..\sybase\ASE-12_5\install\SQL_ZXONM.log contains no online record for TransDB, but contains the above recovery attempt and the suspect record.

Record of normal startup and online status:

00:00000:00001:2007/06/19 09:09:56.44 server Recovering database 'TransDB'.
00:00000:00001:2007/06/19 09:09:56.44 server Redo pass of recovery has processed 1 committed and 0 aborted transactions.
00:00000:00001:2007/06/19 09:09:56.51 server Checking external objects.
00:00000:00001:2007/06/19 09:09:56.52 server The transaction log in the database 'TransDB' will use I/O size of 2 Kb.
00:00000:00001:2007/06/19 09:09:56.52 server Database 'TransDB' is now online.

Record of recovery attempt and suspended status:

00:00000:00001:2008/10/14 08:16:24.31 server Database 'TransDB' has not been recovered yet - please wait and retry.
00:00000:00001:2008/10/14 10:11:30.89 server Database 'TransDB' cannot be opened. An earlier attempt at recovery marked it 'suspect'. Check the SQL Server errorlog for information as to the cause.

4. In summary, the config. database in Sybase fails after the sudden power-down and is suspended. The cause is that, in E300 V3.18R2 and later NM software, the internal database-write mode of Sybase was changed to asynchronous mode. Although this raises write efficiency, it greatly increases the risk of the database being suspended if power is lost suddenly during a write.

5. Conversely, in synchronous mode the NM software is inefficient when processing large volumes of data, which typically shows up as database error reports and loss of history data. Where a large amount of history data is written to the database every day, operations such as “wrap connection” and storage transfer of the database every six hours take up a great deal of CPU and memory, and the NM software cannot operate. The consequences are also quite serious.
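The log checks in items 1 to 3 come down to finding the last status message for TransDB in the Sybase errorlog. A rough sketch, assuming each message sits on one line in the layout of the SQL_ZXONM.log excerpts above (timestamp field, the word `server`, then the message); only the two messages quoted above are recognized, and the function name is hypothetical:

```python
import re

# Message texts taken from the SQL_ZXONM.log excerpts above.
ONLINE = re.compile(r"Database '(?P<db>\w+)' is now online")
SUSPECT = re.compile(r"Database '(?P<db>\w+)' cannot be opened.*'suspect'")


def last_status(log_lines: list[str], db: str = "TransDB") -> str:
    """Return 'online', 'suspect' or 'unknown' for the database, based on
    the last matching line in the log."""
    status = "unknown"
    for line in log_lines:
        for pattern, label in ((ONLINE, "online"), (SUSPECT, "suspect")):
            match = pattern.search(line)
            if match and match.group("db") == db:
                status = label
    return status


online = ("00:00000:00001:2007/06/19 09:09:56.52 server "
          "Database 'TransDB' is now online.")
suspect = ("00:00000:00001:2008/10/14 10:11:30.89 server Database 'TransDB' "
           "cannot be opened. An earlier attempt at recovery marked it 'suspect'. "
           "Check the SQL Server errorlog for information as to the cause.")
print(last_status([online, suspect]))  # suspect
```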

Troubleshooting

1. Ensure a reliable power supply for the NM computer and prevent sudden power failure while the NM software is running. If the power supply cannot be guaranteed, change the database-write mode for sites that have little history data or low requirements on history data.


2. If the problem occurs, a temporary solution can be adopted: use the dbman tool to recover the database in use to a readable state, and then re-activate the database.

3. Reinstall the NM software (upload the data after the latest backup data is ready or more recent data has been recovered).

Use the dbman tool to solve the problem of the Sybase database being suspended. The steps for re-creating the database after recovery are as follows:

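(1) Set the database to "bypass recovery" status. In isql, first allow updates to the system tables and switch to the master database:

sp_configure "allow updates", 1
go
use master
go

(2) Set the database to "readable" status. Replace database_name with the name of the suspended database (TransDB in the logs above), and then shut down the server:

update sysdatabases set status = -32768
where name = "database_name"
go
shutdown with nowait
go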

(3) Restart Sybase with the dbman tool (execute 2 and then 1: Start dataserver). Then use the dbman tool (execute 4 and then 1: Backup database) to back up the data of the current NM software (you may choose whether to back up history data).

(4) Re-create the database with the dbman tool (execute 3->3 for Drop database, and then 3->2 for Create database). After the database is created, restore the backup data with the dbman tool.

Conclusion

Note the above problem if E300 V3.18 or later NM software is used in the project. This solution recovers the data of the NM software in use and avoids reinstalling the NM software.

6.2 T31 NM’s Client Program Cannot Start up Normally
Fault Description

After the T31 client program is installed on a maintenance terminal, clicking the client program does not bring up the login page. The program is not loaded.

Cause Analysis

1. Task Manager of the operating system (OS) shows no client program loading and no related java process. This indicates that the client program is not executed by the OS and exits abnormally right away.

2. Close some running programs of the OS, such as antivirus software and the firewall, and then start the client. In some cases, the client program then starts normally.

Troubleshooting

1. Check the hardware configuration of the computer. If it has a single-core CPU, 1 GB of memory and integrated graphics, the problem may be caused by insufficient memory. Modify the configuration of the T31 client program to verify this.

2. Open the file \ums\clnt\bin\run.bat and search for ‘set JVM_MX=-Xmx512m’. If this value is too large, it places excessive demands on memory, and a computer with insufficient memory and a low hardware configuration may not run the program normally.
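Checking and lowering the heap limit can be scripted. This sketch only looks for a line of the form `set JVM_MX=-Xmx<size>m`, as quoted above; the function names are hypothetical, and the 256 MB value in the example is an arbitrary illustration, not a ZTE-recommended setting. Back up run.bat before editing it.

```python
import re
from typing import Optional

# Matches a line such as "set JVM_MX=-Xmx512m" (case-insensitive).
JVM_MX_LINE = re.compile(r"^(\s*set\s+JVM_MX=-Xmx)(\d+)(m)\s*$", re.IGNORECASE)


def read_heap_mb(bat_text: str) -> Optional[int]:
    """Return the -Xmx value in MB from run.bat text, or None if absent."""
    for line in bat_text.splitlines():
        match = JVM_MX_LINE.match(line)
        if match:
            return int(match.group(2))
    return None


def set_heap_mb(bat_text: str, new_mb: int) -> str:
    """Return run.bat text with every JVM_MX line set to new_mb megabytes.

    Lines are re-joined with CRLF, the usual line ending for .bat files.
    """
    lines = []
    for line in bat_text.splitlines():
        match = JVM_MX_LINE.match(line)
        if match:
            line = f"{match.group(1)}{new_mb}{match.group(3)}"
        lines.append(line)
    return "\r\n".join(lines) + "\r\n"


def patch_run_bat(path: str, new_mb: int) -> None:
    """Rewrite the file in place; latin-1 round-trips any byte unchanged."""
    with open(path, encoding="latin-1", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(set_heap_mb(text, new_mb))


sample = "@echo off\r\nset JVM_MX=-Xmx512m\r\n"
print(read_heap_mb(sample))                    # 512
print(read_heap_mb(set_heap_mb(sample, 256)))  # 256
```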

Conclusion

1. The T31 NM software requires a high hardware configuration. Even when only the client is used, insufficient memory may prevent the program from running normally.

2. Some other software may also cause the client program to run abnormally. Avoid installing programs such as firewalls and antivirus software on the OS of the computer.


6.3 E300 NM S320 NEs’ Board Indicator Lights Cannot Flash
Fault Description

In some S320 NEs in the E300 NM software, all board indicator lights are shown grey in the board view, unlike the board view of other S320 NEs (where they normally flash slowly in green).

Cause Analysis

1. NCP board down or S port blocked

2. Abnormal working of the NM software

3. Old NCP board version

Troubleshooting

1. The NM software performs an S port communication test, and the S ports of all boards test normal. NCP board down and S port blockage are eliminated.

2. The board performance is checked as normal in the NM software, and the indicator lights of other NEs are also normal. Abnormal working of the NM software is eliminated.

3. Check the version updates in the network. The E300 NM software was upgraded from version 3.16 to 3.18, but version 3.18 must work with a new NCP version to support the flashing of indicator lights in the NE board view. The NCP version is found to be old and cannot support the indicator light function.

4. After the version problem is confirmed, there are two solutions: (1) keep the status unchanged, since the board indicator light function does not affect normal maintenance; (2) upgrade the NCP version of the S320 equipment to make it consistent with the other NCPs. The second solution is recommended so that the indicator light function works.

Conclusion

After upgrading the NM software, check whether the functions of the new NM software can be used in the current network. If a problem is found, check it promptly and confirm whether some NEs with old versions need to be upgraded.

ECC Faults

Chapter 7 ECC Faults
7.1 Board Reset with Telnet
Fault Description

The IP address and ID of the newly deployed ZXMP-S320 equipment have already been set in the central equipment room, and service commissioning has not started yet. The engineering construction team installs the equipment at the subordinate sites and connects the optical fiber. In this way, opening a new service only requires setting data at the central site, with no need for testing at the subordinate sites.

However, after the engineering construction team completes the installation and leaves, when the service is to be opened, the newly established NE can be pinged successfully from the NM software, but the NCP time cannot be acquired.

Cause Analysis

• Since the NE can be pinged, the IP address configuration is correct.

• Check whether the ID setting is correct. After enquiry, the ID setting has no problem.

• Check whether the NCP state is normal by Telnetting to the NCP board.


Troubleshooting

1. After Telnetting to the NCP board, execute the resetmcu 1 command to reset the NCP board.

2. Acquire the NCP time in the NM software; it is acquired normally, and NCP monitoring is normal.

Conclusion

Method for judging whether a device supports the resetmcu command: after Telnetting to the NCP board, check with the Help command. If the resetmcu command is listed, the equipment supports it; otherwise, it does not.
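Once the Help output has been captured from the Telnet session, the check can be automated. A small sketch, assuming the captured text separates command names with whitespace or commas (the real NCP help layout may differ); the function name and the other command names in the example are made up:

```python
def supports_command(help_output: str, command: str = "resetmcu") -> bool:
    """True if `command` appears as a whole word in captured Help output."""
    # Treat commas and whitespace as separators; "resetmcux" does not count.
    return command in help_output.replace(",", " ").split()


print(supports_command("ping  resetmcu  showver"))   # True
print(supports_command("ping  resetmcux  showver"))  # False
```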

7.2 NCP Board Fault Causes ECC Failure
Fault Description

Sites A and B use ZXMP S360 equipment; sites C, D and E use ZXMP S320 equipment. Site A is the access NE, as shown in Figure 7-1. The NM software at site A can monitor all sites except site B.


Figure 7-1 Fault Analysis

Cause Analysis

1. Telnet to the NCP board at site A and check the connection status of the ports; the route in site B’s optical direction is already established.

2. The IP address of site B can be pinged from the NCP board of site A, but not from the NM computer.

3. Connect a laptop at site B; it can monitor the whole network normally. It is judged that site B is normal and the fault is at site A.

4. Telnet to the NCP board of site A and check the ECC route. It is normal, so the optical board has no fault.

5. Finally, the fault is located in the NCP board of site A.


Troubleshooting

Reset the NCP board of site A; the problem is solved. Keep observing, and replace the board if the problem recurs.

Conclusion

Become familiar with ECC-related commands and their usage, and solve problems based on judgments from several aspects.

Clock Sync Faults

Chapter 8 Clock Sync Faults
8.1 Clock Configuration Error Causes Unstable Clock
Fault Description

ZXMP S360 equipment is used to form a chain network, as shown in Figure 8-1. NE A and NE I are equipped with external clocks. Since commissioning, the clocks have always been unstable, with sudden AU pointer justification (AU PJ) events. When the link is broken, the clocks of some sites lose lock.

Figure 8-1 Network Structure

Cause Analysis

AU pointer adjustment may be caused by a clock sync problem. The rule for processing clock sync faults is: check whether there are B1 or B2 errors → check whether there is only TU pointer adjustment → process the AU pointer adjustment → switch the optical receiving direction → replace the clock board.

The clock board uses the external clock or the extracted line clock as the input of its phase-locking circuit for phase comparison. Therefore, the quality of this board’s crystal oscillator affects the quality of the clock.

Troubleshooting

1. Go to site I and check the clock configuration of each site.

• Clock setting at site I: first, the clock extracted from site H; then, the external clock.

• Clock setting at site A: first, the external clock; then, the clock extracted from site B.

• Clock setting at the other sites: the line clocks extracted from both sides.

2. Query the clock status of each site; all are locked. However, sites B, C and D extract the clock from the site A direction, site H extracts the clock from site I, and sites A and I use external clocks. Analysis shows that the clock instability is caused by a configuration error.

3. It is confirmed that site A's clock is a level-3 clock instead of a G.811 clock, and its clock level is lower than that of site I. Therefore, site I uses its own external clock, and the S1 byte is not enabled.


Note:

The S1 byte is not used only in ring networks. Here, line clocks are extracted from both sides because the S1 byte is not enabled.

4. Enable the S1 byte without changing the clock settings. Extract the clock from the equipment adjacent to site A (here, a G.811 clock) as the external clock of site A.

After the modification, all NEs in the network synchronize to site A, and the external clock of site I is used as the secondary standby clock. When the external clock of site A fails and site A enters auto-oscillation, it sends S1:0B. On receiving it, site I starts switching and sends S1:04.
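The switching behaviour above follows from SSM-based source selection: each NE prefers the usable source advertising the best quality level in S1. A minimal sketch, assuming the ITU-T G.781 option I codes 0x2 (PRC, G.811), 0x4 (SSU-A), 0x8 (SSU-B), 0xB (SEC, equipment clock such as a free-running one) and 0xF (do not use); site I's external clock is assumed to be SSU-A (0x4), which is consistent with site I sending S1:04 after it switches. Source names are illustrative, and real NEs also apply hold-off and wait-to-restore timers that are not modelled here.

```python
from typing import Optional

# ITU-T G.781 option I quality levels carried in S1 (lower rank = better).
# 0xF (do not use) and any other code are absent, so such sources are skipped.
QL_RANK = {
    0x2: 0,  # PRC (G.811)
    0x4: 1,  # SSU-A
    0x8: 2,  # SSU-B
    0xB: 3,  # SEC: equipment clock, e.g. free-running
}


def select_source(received_s1: dict[str, int], priority: list[str]) -> Optional[str]:
    """Return the usable source with the best quality level.

    received_s1 maps a source name to the S1 value currently received for it;
    priority is the configured order, used only to break ties.
    Returns None when no source is usable.
    """
    usable = [s for s in priority if received_s1.get(s) in QL_RANK]
    if not usable:
        return None
    # min() keeps the first of equally good sources, i.e. the higher priority.
    return min(usable, key=lambda s: QL_RANK[received_s1[s]])


site_i_priority = ["line from site H", "external clock"]
# Normal: the G.811 clock of site A arrives along the chain as S1 = 0x2.
print(select_source({"line from site H": 0x2, "external clock": 0x4}, site_i_priority))
# line from site H
# Site A's external clock fails and site A sends S1 = 0x0B.
print(select_source({"line from site H": 0xB, "external clock": 0x4}, site_i_priority))
# external clock
```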

Conclusion

For ZTE SDH series equipment, note the following points:

• Do not configure the internal clock if possible.

If the internal clock is not configured, the equipment enters 24-hour auto-oscillation; if the internal clock is configured, it switches to the internal clock directly. Besides, it is hard to switch from the internal clock to another clock. For instance, if external clock 1, external clock 2 and the internal clock are configured, external clock 1 and external clock 2 can switch between each other, but switching from the internal clock back to an external clock requires the NM software to resend the command.


• Adopt SSM

Currently, ZTE’s 10G equipment supports only the ITU-T standard mode. ZXMP S360 equipment supports both the ITU-T standard mode and a self-defined mode. For a ring network consisting of 10G and ZXMP S360 equipment, only the ITU-T standard mode can be used.

SSM protection cannot be formed in the following two situations:

(1) A ring network with two external clocks connected for active/standby protection.

(2) A ring network in which the accessed clock is the internal clock.

ASON Faults

Chapter 9 ASON Faults
9.1 Call Connection Cannot Reply 1—Insufficient Bandwidth
Fault Description

In a network, a line section is interrupted because of a cut-over. After the optical fiber is broken, some services fail. The route view of the call shows that its protection connection has not been established.

Cause Analysis

1. Check the connection status of the call whose setup failed; there is no abnormal attribute configuration or limiting policy.

2. Check the TE resources of the line: an NE at the broken fiber has only two STM-16 lines. In one optical direction, all 16 AU4s of the total bandwidth are already occupied and the idle bandwidth is 0, which indicates that all bandwidth is in use.

3. Because of insufficient bandwidth, other services cannot set up new call connections over this line.


Troubleshooting

1. Check the fully occupied line resources and identify the service paths passing through them.

2. Check whether the service paths can take other routes, and free up AU4 resources as much as possible.

3. Optimize rerouting manually and designate the path of the call service to recover the interrupted connection. Alternatively, select the service and send the start-recovery command to recover the service.

Conclusion

1. When configuring a mesh network, balance the services to prevent a large number of services from occupying one line’s resources. Otherwise, when the line is disconnected, there are not enough idle timeslots to allocate for connection recovery.

2. When a service is interrupted, check the lines it passes through to see whether there is idle bandwidth for newly established connections. If the bandwidth is insufficient, adjust some connections manually and recover the services by priority. It is recommended to use no more than 60% of network resources, so that the remaining resources are reserved for recovery.
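The 60% guideline can be checked per line from its AU4 counts. A small sketch, assuming each STM-16 line offers 16 AU4s (as in this case) and that the occupied count has been read from the TE resource view; the function name is hypothetical:

```python
AU4_PER_STM16 = 16
MAX_UTILIZATION = 0.60  # guideline above: keep 40% of resources for recovery


def line_report(used_au4: int, capacity: int = AU4_PER_STM16) -> dict:
    """Idle AU4s, utilization and guideline check for one line."""
    if not 0 <= used_au4 <= capacity:
        raise ValueError("used AU4 count out of range")
    utilization = used_au4 / capacity
    return {
        "idle_au4": capacity - used_au4,
        "utilization": utilization,
        "over_guideline": utilization > MAX_UTILIZATION,
        # Largest AU4 count that still respects the guideline (9 of 16).
        "max_used_au4": int(capacity * MAX_UTILIZATION),
    }


print(line_report(16))  # the fully occupied direction in this case: idle_au4 = 0
```

For a 16-AU4 line the guideline allows at most 9 occupied AU4s; the direction in this case, with all 16 occupied, leaves nothing for recovery.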


9.2 Call Connection Cannot Reply 2—Restriction of Route Policy
Fault Description

In a 1+1 SNCP protection service of a mesh network, after an optical path is broken, the protection route is shown as interrupted, and the protection recovery connection cannot be set up, as shown in Figure 9-1. In the figure, the broken protection route is shown in red.

Figure 9-1 1+1 SNCP Service Protection


Cause Analysis

1. Check the call attribute settings of the SNCP service and the TE links that the protection connection passes through. The route policy set for this call has both the "node irrelevant" and "link irrelevant" items selected. In other words, the protection connection must share no node and no link with the working connection.

2. The route policy already excludes the interrupted TE link. Every remaining candidate route for the protection connection would have to pass through a node on the working path, which violates "node irrelevant". Therefore, during route calculation, the control plane concludes that no route satisfies the conditions for this protection connection.
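The route calculation failure can be reproduced on a toy topology. This is a hypothetical illustration, not the ZTE control plane's algorithm. A plain breadth-first search stands in for constrained route calculation. "Node irrelevant" is modelled as excluding the working path's intermediate nodes, and "link irrelevant" as excluding its links. The topology and all names are invented for the example.

```python
from collections import deque


def find_route(links, src, dst, banned_nodes=frozenset(), banned_links=frozenset()):
    """Breadth-first search for a src-to-dst route avoiding banned nodes and links."""
    graph = {}
    for a, b in links:
        if frozenset((a, b)) in banned_links:
            continue  # TE link excluded by the route policy or broken
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in prev and nxt not in banned_nodes:
                prev[nxt] = node
                queue.append(nxt)
    return None  # no route satisfies the constraints


# Hypothetical topology: working path S-A-T; the original protection path S-B-T
# has lost link B-T to the fiber break.
LINKS = [("S", "A"), ("A", "T"), ("S", "B"), ("B", "T"), ("B", "A"), ("A", "C"), ("C", "T")]
WORKING = ["S", "A", "T"]
BROKEN = {frozenset(("B", "T"))}


def protection_route(node_irrelevant: bool, link_irrelevant: bool):
    """Compute a protection route under the selected route policy items."""
    working_links = {frozenset(pair) for pair in zip(WORKING, WORKING[1:])}
    banned_nodes = set(WORKING[1:-1]) if node_irrelevant else set()
    banned_links = BROKEN | (working_links if link_irrelevant else set())
    return find_route(LINKS, "S", "T", banned_nodes, banned_links)
```

`protection_route(True, True)` returns `None`, which matches the "no route" result. `protection_route(False, True)` returns `['S', 'B', 'A', 'C', 'T']`. That route shares node A with the working path, which is the trade-off discussed in the Conclusion.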

Troubleshooting

1. Edit the route policy of this 1+1 SNCP service and deselect the "node irrelevant" item. Then run rerouting optimization for the protection connection so that it is set up automatically.

2. Alternatively, keep the current route policy and wait until the broken line, and with it the protection route, is restored.

Conclusion

1. When configuring a 1+1 SNCP service, take the routes in the existing network into account, including the working and protection routes and possible recovery routes. If a section of optical path is broken, the working or protection connection should still be able to find a third independent route for recovery.


2. When configuring a 1+1 SNCP service, if "node irrelevant" or "link irrelevant" is not selected, the working and protection connections may pass through the same node or link after recovery. This risks interrupting both at the same time.

3. It is recommended to set the "reply" (revertive) attribute when configuring a 1+1 SNCP service. The service then automatically reverts to the original working/protection route once that route recovers from the fault.

Interconnection Faults

Chapter 10 Interconnection Faults
10.1 10G Optical Boards of ZTE S390 Equipment and Marconi MSH64 Equipment Fail to Interconnect
Fault Description

In a network, the 10G optical board of ZTE S390 equipment is interconnected with the 10G optical board of Marconi MSH equipment.

The interconnection still fails after several rounds of debugging through the NM software. The ZTE equipment reports an LOF alarm, and the Marconi equipment reports excessive RS bit errors.

Cause Analysis

Test the 10G optical board of the ZTE S390 equipment with an SDH analyzer (ONT50). No problem is found. With either port self-loop or VC4 timeslot self-loop, the test passes without any alarm, and the bit error count is 0.

Then test the 10G optical board of the Marconi MSH equipment. With a loopback set on the board, the test fails and alarms are always present. Capturing the frames transmitted by the equipment shows that the A2 byte does not match the international standard. Refer to Figure 10-1 and Figure 10-2.

Figure 10-1 SDH Frame Overhead 1


Figure 10-2 SDH Frame Overhead 2

Hint:

The SDH standard's explanation of the A1 and A2 bytes:

The framing bytes mark the starting point of a frame, so that the receiving end and the transmitting end keep frame synchronization. The first step in receiving an SDH code stream is to correctly select and separate each STM-N frame from the received signal stream. The receiver first locates the starting position of each STM-N frame. It then identifies the positions of the corresponding overhead and payload in that frame. The A1 and A2 bytes perform this framing function. Through them, the receiving end locates and separates STM-N frames in the information stream, and then finds a VC information packet in the frame through the pointer position.

How does the receiving end locate the frame through the A1 and A2 bytes? A1 and A2 have fixed values, that is, fixed bit patterns: A1 is 11110110 (F6H) and A2 is 00101000 (28H). The receiving end checks each byte in the signal stream. When 3N consecutive A1 (F6H) bytes are followed by 3N consecutive A2 (28H) bytes, it determines that it has found the start of an STM-N frame. An STM-1 frame, for example, has three A1 bytes and three A2 bytes. By locating the starting point of each frame, the receiving end distinguishes and separates successive frames.
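To illustrate the search described above, the sketch below scans a byte stream for 3N A1 bytes followed by 3N A2 bytes. It is a simplification under stated assumptions. A real framer works on the bit stream and confirms alignment over consecutive frames, because the same pattern can occur by chance in the payload. The synthetic frames in the usage note contain zero filler, not real STM-1 content, and the function names are illustrative.

```python
A1 = 0xF6  # 11110110
A2 = 0x28  # 00101000
STM1_FRAME_BYTES = 9 * 270  # 2430 bytes per STM-1 frame (9 rows x 270 columns)


def framing_pattern(n: int) -> bytes:
    """Return the STM-N framing pattern: 3N A1 bytes followed by 3N A2 bytes."""
    return bytes([A1] * 3 * n + [A2] * 3 * n)


def find_frame_starts(stream: bytes, n: int) -> list:
    """Return every offset in stream where the STM-N framing pattern begins."""
    pattern = framing_pattern(n)
    starts = []
    pos = stream.find(pattern)
    while pos != -1:
        starts.append(pos)
        pos = stream.find(pattern, pos + 1)
    return starts
```

Suppose two synthetic STM-1 frames (the pattern plus 2424 zero bytes each) follow 10 bytes of noise. In that case `find_frame_starts(stream, 1)` returns `[10, 2440]`. The 2430-byte spacing between the two starts is exactly one STM-1 frame.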

If correct A1 and A2 bytes are not received for five consecutive frames (625 μs), the receiving end cannot identify the frame boundaries. It then enters the out-of-frame (loss of frame alignment) state and raises the OOF alarm. If OOF lasts for 3 ms, the receiving end enters the loss of frame (LOF) state and the equipment raises the LOF alarm. The equipment then sends an AIS signal downstream, and the whole service is interrupted. In the LOF state, if the receiving end receives correct A1 and A2 bytes for more than 1 ms, the equipment returns to the in-frame (IF) state and resumes normal operation.
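The state transitions above can be sketched as a per-frame state machine. The thresholds come from the text. At 125 μs per frame, 625 μs is 5 frames, 3 ms is 24 frames and 1 ms is 8 frames. The rule for leaving OOF (two consecutive correct frames) is an assumption that the text does not state. The function and constant names are hypothetical.

```python
from enum import Enum

FRAME_US = 125                         # one SDH frame lasts 125 μs
OOF_AFTER_BAD = 5                      # 5 frames (625 μs) without correct A1/A2
LOF_AFTER_OOF = 3000 // FRAME_US       # OOF lasting 3 ms = 24 frames
LOF_CLEAR_GOOD = 1000 // FRAME_US      # correct framing for 1 ms = 8 frames
OOF_CLEAR_GOOD = 2                     # assumption: not stated in the text


class State(Enum):
    IF = "in-frame"
    OOF = "out-of-frame"
    LOF = "loss of frame"


def track_framing(framing_ok):
    """Return the receiver state after each frame, given per-frame A1/A2 check results."""
    state = State.IF
    bad = good = frames_in_oof = 0
    history = []
    for ok in framing_ok:
        # Count consecutive correct / incorrect framing patterns.
        good, bad = (good + 1, 0) if ok else (0, bad + 1)
        if state is State.IF and bad >= OOF_AFTER_BAD:
            state, frames_in_oof = State.OOF, 0
        elif state is State.OOF:
            frames_in_oof += 1
            if good >= OOF_CLEAR_GOOD:
                state = State.IF
            elif frames_in_oof >= LOF_AFTER_OOF:
                state = State.LOF  # LOF alarm; equipment sends AIS downstream
        elif state is State.LOF and good >= LOF_CLEAR_GOOD:
            state = State.IF
        history.append(state)
    return history
```

Feed in 30 bad frames followed by 8 good ones. The receiver goes OOF at the fifth frame and LOF 24 frames (3 ms) later. It returns to IF on the eighth good frame (1 ms).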

The 10G optical board of the Marconi MSH equipment does not use the A2 value defined in the standard, so it cannot interconnect with other manufacturers' equipment. The A2 value is a fixed bit pattern prescribed by the standard, and no manufacturer may change it on its own. For the same reason, ZTE cannot adapt its A2 value to Marconi's setting. Therefore, the 10G optical boards of the ZTE S390 equipment and the Marconi MSH equipment cannot interconnect.

Troubleshooting

There is no solution at present. The 10G optical board of the Marconi MSH equipment does not use the standard A2 value, so it cannot interconnect with other manufacturers' equipment.

10.2 C2 Byte Causes ATM Service Interconnection Failure
Fault Description

The original network, consisting of three ZTE S600 V2 NEs and one ZXMP S360 NE, forms a 622M ring, as shown in Figure 10-3. Among these NEs, a ZXMP S320 is connected to ATM equipment through a 155M optical path. The gateway NE is the ZXMP S360. It is interconnected with Huawei transmission equipment through 155M optical boards, and the Huawei equipment is connected to the ATM equipment at the other end.

The whole service channel passes straight through the 622M ring, and the service operates normally.

Figure 10-3 Former Network Structure

The gateway NE is then replaced: the ZXMP S360 is swapped for ZXMP S380 equipment. After this, the OL1 optical port of the ZXMP S380 equipment reports a VC4-RDI alarm, and the ATM data service fails, as shown in Figure 10-4.

Figure 10-4 Network Structure after Adjustment

Cause Analysis

The most likely cause is the way the ZXMP S380 equipment processes the C2 byte.

The fault may also be caused by inconsistent C2 values between the interconnected transmission equipment of the two manufacturers. When the ATM data equipment detects a C2 value other than 0x13 or 0x01, it treats the signal as invalid.


When the ZXMP S360 equipment is interconnected with the Huawei equipment, the C2 value received by the Huawei equipment is detected as 0x01. Therefore, the service works as long as the ZTE equipment also transmits a C2 value of 0x01.

Common optical boards process the C2 byte as follows:

• The OL1 optical board of the ZXMP S360 equipment terminates and regenerates C2. Its default transmit value is 0x01.

• The O4CSD optical board of the ZXMP S320 equipment terminates and regenerates C2. Its default transmit value is 0x02.

• The OIB1 optical board of the ZXMP S320 equipment terminates and regenerates C2. Its default transmit value is 0x01.

• The OL1 and OL4 optical boards of the ZXMP S380 equipment feed C2 through unchanged.

Therefore, interconnection works through the ZXMP S360 equipment, whose OL1 board regenerates C2 as 0x01. The failure appears after the gateway NE is replaced with the ZXMP S380 equipment, whose optical boards feed C2 through. The far end then receives the value 0x02 transmitted by the ZXMP S320 equipment, which is inconsistent with the C2 value of the interconnected equipment at the other end.
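The C2 reasoning above can be checked with a small path model. It is a sketch under stated assumptions. Each hop either terminates and regenerates C2 with the default value listed above, or feeds it through. The intermediate ring NEs and the Huawei port (in feed-through mode) are assumed not to change C2. The board labels are illustrative, not NM identifiers.

```python
from typing import Optional

# C2 behaviour per board, from the list above: an int is the default value the
# board transmits after terminating C2; None means the board feeds C2 through.
BOARD_C2 = {
    "S360 OL1": 0x01,
    "S320 O4CSD": 0x02,
    "S320 OIB1": 0x01,
    "S380 OL1": None,
    "S380 OL4": None,
    "Huawei port (feed-through)": None,  # assumption: higher order overhead fed through
}

# Per the cause analysis, the ATM equipment treats any other C2 value as invalid.
ATM_ACCEPTED_C2 = {0x13, 0x01}


def c2_delivered(path: list) -> Optional[int]:
    """Return the C2 value leaving the last hop of path (None if no hop sets it)."""
    c2 = None
    for board in path:
        regenerated = BOARD_C2[board]
        if regenerated is not None:
            c2 = regenerated  # board terminates C2 and transmits its own value
    return c2


def atm_accepts(path: list) -> bool:
    """True if the far-end ATM equipment would accept the delivered C2 value."""
    return c2_delivered(path) in ATM_ACCEPTED_C2
```

With the S360 gateway, the path `["S320 O4CSD", "S360 OL1", "Huawei port (feed-through)"]` delivers 0x01 and is accepted. The same path through `"S380 OL1"` delivers the S320's 0x02 and is rejected. Chaining an S360 OL1 board after the S380 restores 0x01.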


The Huawei equipment feeds the higher order overhead through, so the VC4-RDI alarm reported by the 2500E equipment is a remote alarm sent back from the ATM data equipment side. When the Huawei equipment terminates the C2 value instead, the alarm disappears.

Troubleshooting

1. At the OL1 optical port of the ZXMP S380 equipment, set a line-side loopback toward the ATM data equipment on the Huawei side. The service still fails.

2. Set a loopback toward the ATM equipment at the Huawei equipment. The service recovers. At this point, the Huawei optical port feeds the higher order overhead through.

3. Set the higher order overhead of the Huawei optical board used for the interconnection to terminated. The RDI alarm on the OL1 board of the ZTE ZXMP S380 equipment disappears, but the service still fails. The terminal-loopback service on the Huawei equipment side also fails.

4. Reading the C2 byte at the Huawei equipment shows that the ZXMP S380 equipment transmits 0x02. Reading it at the 2500E equipment shows that the Huawei equipment transmits 0x13.

5. To recover the service, interconnect the ATM service through the OL1 board of ZXMP S360 equipment. That is, chain the OL1 board of the II-model equipment between the ZXMP S380 equipment and the Huawei equipment.

Conclusion

The transmission equipment applies the following rules when detecting the trace identifier mismatch alarm (J0, J1, J2):

1. If an expected value is configured in the NM software, the equipment checks the received value against it.

2. Suppose no expected value is configured in the NM software, or the configured expected value is deleted. The board then treats the expected value as any character, so no received value is considered a trace ID mismatch.

Note:

Because the overhead setting is issued as a full command, no separate delete command is shown. The trace ID item is simply absent from the overhead setting command.
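The two detection rules reduce to a small decision function. It is a sketch only: the function name is hypothetical. `None` stands for "no expected value configured, or the configured value was deleted". A plain string comparison stands in for whatever matching the board actually performs.

```python
from typing import Optional


def trace_id_mismatch(received: str, expected: Optional[str]) -> bool:
    """Decide whether a J0/J1/J2 trace identifier mismatch should be raised.

    expected is None when the NM software has no expected value configured
    (or it was deleted); the board then accepts any received value.
    """
    if expected is None:
        return False  # rule 2: any received value is acceptable
    return received != expected  # rule 1: compare against the configured value
```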

The rules for setting J0, J1, and J2 on boards are as follows:

• Use the 16-byte frame format of E.164.

• The default value is UNITRANS.

• The last two positions are "0DH 0AH".

• Fill the other vacant positions with "20H" (space).

• The value set in the NM software excludes the check value.

• The transmit value and expected value set in the NM software must not exceed 13 characters.
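The layout rules above can be sketched as a frame builder. Some of it rests on assumptions, because the text does not spell them out.

- The check value that the NM software leaves out is taken to be the first byte of the 16-byte frame.
- That byte holds a frame-start bit and a CRC-7 with generator x^7 + x^3 + 1, computed over the frame with its CRC bits set to zero, as described in ITU-T G.707.
- The function names are hypothetical.

Thirteen characters plus CR LF plus the check byte gives exactly 16 bytes, which is consistent with the 13-character limit.

```python
def crc7(data: bytes) -> int:
    """CRC-7 with generator x^7 + x^3 + 1, MSB first, initial value 0."""
    crc = 0
    for byte in data:
        for i in range(7, -1, -1):
            feedback = ((crc >> 6) & 1) ^ ((byte >> i) & 1)
            crc = (crc << 1) & 0x7F
            if feedback:
                crc ^= 0x09  # x^3 + 1 (the x^7 term is implicit)
    return crc


def build_trace_frame(text: str = "UNITRANS") -> bytes:
    """Build a 16-byte J0/J1/J2 trace frame from at most 13 ASCII characters."""
    if len(text) > 13:
        raise ValueError("trace text must not exceed 13 characters")
    # Pad with 20H (space) and end with 0DH 0AH (CR LF), as the rules require.
    body = text.encode("ascii").ljust(13, b"\x20") + b"\x0d\x0a"
    # Assumed check byte: frame-start bit set, CRC-7 over the frame with CRC bits zeroed.
    frame = bytes([0x80]) + body
    return bytes([0x80 | crc7(frame)]) + body
```

For the default value, `build_trace_frame()` yields the check byte, "UNITRANS", five spaces, and CR LF. The NM software only sets the 13-character text; the check byte is left to the board.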

Explanation of the C2 byte:

• For time division and cross-connection, the transmit value of C2 is fixed at 0x02.

• For ET1, TT1, and data boards using TU11 and TU12, the transmit value of C2 is fixed at 0x02.

• For ET3, TT3, and VC4, the transmit value of C2 is fixed at 0x02. For VC3, C2 is fixed at 0x04.

• For ET4, the transmit value of C2 is 0x12.

• For data boards using VC4, the transmit value of C2 is 0x16 (HDLC/PPP encapsulation), 0x18 (LAPS encapsulation), or 0x1B (GFP encapsulation).

• For the ATM board, the transmit value of C2 is 0x13.
