
"Backend cleanup process for factory re-installation of VNX OE for File (NAS software for VNX File)"

ID: emc257758
Usage: 14
Date Created: 12/10/2010
Last Modified: 02/03/2012
STATUS: Approved
Audience: Support

Question:

Backend cleanup process for factory re-installation of VNX OE for File (NAS software for VNX File)

Environment: EMC SW: VNX Operating Environment (OE) for File
Environment: Product: VNX File/Unified
Environment: Backend cleanup using nas_raid -s cleanup
Environment: Factory re-installation using Express Installation DVD image on Control Station
Problem:

Requirements to perform a factory re-installation of the Operating Environment for File (that is, NAS code).

The nas_raid script fails if the system is part of a multi-domain configuration:
Cannot cleanup domain master. Please move master to another array.

Backend cleanup for factory re-installation of the File O/S.
Cautions:

- During the cleanup process, the Control LUNs are zeroed out to make a fresh reinstallation possible. Note that the Control LUNs and the default Storage Group (~filestorage) are now part of the FLARE private LUN space and are no longer directly accessible from the GUI or NaviCLI.
- After the cleanup process, verify that all Control LUNs are owned by SP A on Chain 0, or the installation process will fail.
- The cleanup script may not remove other Storage Groups, Storage Pools, and the like.
- The backend cleanup script does not remove the default ~filestorage HBA UID records; they must be removed manually.

VNX FILE/UNIFIED BACKEND CLEANUP PROCEDURE


Fix:

1. Deconfigure Proxy ARP. The main task here is to get the SPs back on the 128.221.252 and 128.221.253 networks:
# /nasmcd/sbin/clariion_mgmt -stop
Note: If you cannot stop Proxy ARP services or clean up the backend, see emc287103 for possible workarounds. LUNs 0 & 1 must be zeroed out in order to perform a fresh reinstall of File OE.

2. Verify that the storage processors (SPs) are up and running with the default internal network IP addresses:
# ping 128.221.252.200
PING 128.221.252.200 (128.221.252.200) 56(84) bytes of data.
64 bytes from 128.221.252.200: icmp_seq=1 ttl=128 time=0.535 ms
# ping 128.221.253.201
PING 128.221.253.201 (128.221.253.201) 56(84) bytes of data.
64 bytes from 128.221.253.201: icmp_seq=1 ttl=128 time=0.353 ms

3. Make sure the /tftpboot directory is available at the root of the system; untar it from /nas/tools if required:
# cd /
# tar zxvf /nas/tools/tftpboot.tar.gz

4. Unset the NAS_DB environment variable and stop NAS services:
Note: If running dual Control Stations, shut down CS1. If onsite, unplug the power cable from CS1 and leave it offline.
# unset NAS_DB
# /sbin/service nas stop

5. Run the cleanup script (which may take 15-20 minutes to complete):
# cd /tftpboot/setup_backend
# ./nas_raid -n ../bin/navicli -a 128.221.252.200 -b 128.221.253.201 -s cleanup
Do you want to clean up the system [yes or no]?: yes
Cleaning Storage Group "~filestorage"
Removing LUN
PXE boot slot 2...
Starting NBS on all control LUN
Zero LUN 1 with dd.
Finished with LUN 1.
Zero LUN 0 with dd.
Finished with LUN 0.
Removing diskgroup
The following storage groups still exist:
~filestorage
Removing spares
Security domain removed
Done
Note: If the nas_raid script fails with 'Cannot cleanup domain master', you will need to remove any other systems from the domain before the script will complete:
# /tftpboot/bin/navicli -h 128.221.252.200 domain -messner -remove 10.241.216.233
[SP IP of the other system to remove from the current domain]

6. Verify that the Control LUNs have been properly zeroed out:
# /sbin/fdisk -l | grep partition
Disk /dev/nda doesn't contain a valid partition table
Disk /dev/ndb doesn't contain a valid partition table
Disk /dev/ndc doesn't contain a valid partition table
Disk /dev/ndd doesn't contain a valid partition table
Disk /dev/ndf doesn't contain a valid partition table
Note: It is possible that you may not have NBS access to the backend LUNs from your Blades. If so, you must first PXE boot a blade in order to restore backend LUN access. The /dev/nde partition is not zeroed out.
Optional method for zeroing LUNs 0 & 1:
# /nas/sbin/t2pxe -force_pxe ALL -->Force a PXE boot of all servers; if it reports success, try to zero out LUNs 0 & 1
# dd if=/dev/zero of=/dev/nda bs=1MB count=134
# dd if=/dev/zero of=/dev/nde bs=1MB count=134
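As a cross-check on the fdisk output above, the first megabytes of a Control LUN device can be compared byte-for-byte against /dev/zero. This is a minimal sketch, not part of the official procedure: the `is_zeroed` helper name is ours, the sizes are illustrative, and the demo runs against a scratch file rather than a real LUN.

```shell
# Hypothetical helper, not an EMC tool: succeeds only if the first <mb>
# megabytes of the given device/file are all zero bytes.
is_zeroed() {
  dev="$1"; mb="$2"
  # GNU cmp -n limits the comparison to the first mb*1000000 bytes
  cmp -s -n $((mb * 1000000)) "$dev" /dev/zero
}

# Demo against a scratch file standing in for /dev/nda:
head -c 2000000 /dev/zero > /tmp/fakelun
is_zeroed /tmp/fakelun 2 && echo "zeroed" || echo "NOT zeroed"   # → prints "zeroed"
```

On a real system this would be invoked as, for example, `is_zeroed /dev/nda 134` after the dd commands above.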
7. Manually remove other Storage Groups, if necessary:
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 storagegroup -list
or
# /tftpboot/bin/navicli -h 128.221.252.200 storagegroup -list
Storage Group Name: SG_Celerra_c125
Storage Group UID: E2:12:0B:D6:F5:FC:DF:11:8F:CA:00:60:16:41:67:7D
HLU/ALU Pairs:
  HLU Number   ALU Number
  ----------   ----------
  0            3
  1            1
  2            0
  3            2
# /tftpboot/bin/navicli -h 128.221.252.200 storagegroup -destroy -gname SG_Celerra_c125
Destroy Storage Group SG_Celerra_c125 (y/n)? y
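When several leftover Storage Groups exist, the names can be scraped from the `storagegroup -list` output instead of being retyped one by one. A sketch, assuming the `Storage Group Name:` line format shown above; the `list_extra_sgs` helper name is ours, and the generated destroy commands are only printed for review, never executed.

```shell
# Print every Storage Group name except the default ~filestorage.
list_extra_sgs() {
  awk -F': *' '/^Storage Group Name:/ && $2 != "~filestorage" { print $2 }'
}

# Demo with captured output; on a live system, pipe in
# `/tftpboot/bin/navicli -h 128.221.252.200 storagegroup -list` instead.
sample='Storage Group Name: SG_Celerra_c125
Storage Group UID: E2:12:0B:D6:F5:FC:DF:11:8F:CA:00:60:16:41:67:7D
Storage Group Name: ~filestorage'
printf '%s\n' "$sample" | list_extra_sgs | while read -r sg; do
  echo "/tftpboot/bin/navicli -h 128.221.252.200 storagegroup -destroy -gname $sg"
done
```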

8. Manually remove Pool LUNs from the ~filestorage Storage Group, if required:
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 lun -destroy -l 13
Are you sure you want to perform this operation?(y/n): y
Cannot unbind LUN because it is contained in a Storage Group
Get the list of HLU numbers for the ~filestorage SG:
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 storagegroup -list -gname ~filestorage
Remove the HLU LUNs from ~filestorage:
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 storagegroup -removehlu -gname ~filestorage -hlu 18
Remove HLU 18 from ~filestorage
The specified operation will potentially affect a File System Storage configuration. Do you want to continue (y/n)? y

9. Manually destroy the Pool LUNs first, if necessary:
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 storagegroup -removehlu -gname ~filestorage -hlu 25 -->Remove the Pool LUN from the SG first
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 lun -list
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 lun -destroy -l 0 -->Syntax for removing Pool LUNs once the Storage Group has been destroyed
Are you sure you want to perform this operation?(y/n): y
10. Destroy the Storage Pool once the Pool LUNs are removed:
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 storagepool -list
Pool Name: Pool 0
Pool ID: 0
Raid Type: r_10
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 storagepool -destroy -id 0
Are you sure you want to perform this operation?(y/n): y
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 storagepool -list
11. Manually destroy RAID Group LUNs and RAID Groups, if necessary:
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 getrg -lunlist -->In this example, there are RAID Group LUNs and a RAID Group to destroy
RaidGroup ID: 1
List of luns: 7 8
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 unbind 7
Unbinding a LUN will cause all data stored on that LUN to be lost.
Unbind LUN 7 (y/n)? y
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 unbind 8
Unbinding a LUN will cause all data stored on that LUN to be lost.
Unbind LUN 8 (y/n)? y
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 removerg 1
MetaLUNs:
# /tftpboot/bin/navicli -h 128.221.252.200 metalun -list -->There may be metaLUNs [e.g., 8184, 8185, etc.] if layered apps were in use
# /tftpboot/bin/navicli -h 128.221.252.200 metalun -destroy -metalun 12 -->Select the metaLUN from the list, within the 8184 LUN
12. Verify whether any Control LUNs are trespassed from SP A to SP B:
# /nasmcd/sbin/t2tty -c 2 "camshowconfig"
CAM Devices on scsi-0:
TID 00: 0:d0+ 1:d1+ 2:d2+ 3:d3+ 4:d4- 5:d5- -->d4 & d5 are trespassed to SP B
CAM Devices on scsi-16:
TID 00: 0:d6- 1:d7- 2:d8- 3:d9- 4:d10+ 5:d11+
1291584475: ADMIN: 6: Command succeeded: camshowconfig
Note: Through the use of - and +, the above output shows that Control LUNs d4 and d5 are NOT on Chain 0 (SP A). These LUNs must be trespassed back to Chain 0 on all servers before a new install can succeed.
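The +/- flags in the camshowconfig output can be picked out mechanically rather than by eye. A sketch, assuming the `N:dM+`/`N:dM-` field format shown in step 12 (the `trespassed_on_chain0` helper name is ours); it prints the d-numbers flagged `-` on the scsi-0 (Chain 0 / SP A) line, i.e. the LUNs that step 13 must trespass back.

```shell
# List the dN devices marked '-' on the scsi-0 chain of camshowconfig output.
trespassed_on_chain0() {
  awk '/CAM Devices on scsi-0:/ { grab = 1; next }
       grab { for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9]+:d[0-9]+-$/) {
                  sub(/^[0-9]+:/, "", $i); sub(/-$/, "", $i); print $i
                }
              grab = 0 }'
}

# Demo with the sample output from step 12:
sample='CAM Devices on scsi-0:
TID 00: 0:d0+ 1:d1+ 2:d2+ 3:d3+ 4:d4- 5:d5-
CAM Devices on scsi-16:
TID 00: 0:d6- 1:d7- 2:d8- 3:d9- 4:d10+ 5:d11+'
printf '%s\n' "$sample" | trespassed_on_chain0   # → prints d4 and d5
```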


13. Trespass all Control LUNs back to SP A Chain 0 as required, using the following commands:
# /tftpboot/bin/t2tty -c 2 "camtrespass c0t0l4"
# /tftpboot/bin/t2tty -c 2 "camtrespass c0t0l5"
# /tftpboot/bin/t2tty -c 3 "camtrespass c0t0l4"
# /tftpboot/bin/t2tty -c 3 "camtrespass c0t0l5"
Note: In the above example, LUNs 4 & 5 were trespassed back to Chain 0 SP A on each of the two blades present on the system.
14. Verify the existing Data Mover WWN HBA UID records, remove the HBA UID records, and verify again:
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 storagegroup -list -gname ~filestorage | head -15
Storage Group Name: ~filestorage
Storage Group UID: 60:06:01:60:00:00:00:00:00:00:00:00:00:00:00:04
HBA/SP Pairs:
  HBA UID                                          SP Name  SPPort
  -----------------------------------------------  -------  ------
  50:06:01:60:C6:E0:14:97:50:06:01:69:46:E0:14:97  SP B     2
  50:06:01:60:C6:E0:14:97:50:06:01:61:46:E0:14:97  SP B     3
  50:06:01:60:C6:E0:14:97:50:06:01:68:46:E0:14:97  SP A     2
  50:06:01:60:C6:E0:14:97:50:06:01:60:46:E0:14:97  SP A     3
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 port -removehba -o -hbauid 50:06:01:60:C6:E0:14:97:50:06:01:69:46:E0:14:97
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 port -removehba -o -hbauid 50:06:01:60:C6:E0:14:97:50:06:01:61:46:E0:14:97
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 port -removehba -o -hbauid 50:06:01:60:C6:E0:14:97:50:06:01:68:46:E0:14:97
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 port -removehba -o -hbauid 50:06:01:60:C6:E0:14:97:50:06:01:60:46:E0:14:97
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 storagegroup -list -gname ~filestorage
Storage Group Name: ~filestorage
Storage Group UID: 60:06:01:60:00:00:00:00:00:00:00:00:00:00:00:04
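Rather than copying each WWN by hand, the HBA UIDs can be extracted from the `storagegroup -list` output and turned into the removehba commands. A sketch, assuming the HBA/SP Pairs layout shown above (a 47-character colon-separated UID followed by `SP A`/`SP B`); the `hbauids` helper name is ours, and the commands are printed for review, not executed.

```shell
# Print each HBA UID from storagegroup -list output (UID column followed by "SP x").
hbauids() {
  awk '$2 == "SP" && length($1) == 47 && $1 ~ /^[0-9A-Fa-f:]+$/ { print $1 }'
}

# Demo with two of the sample records; on a live system, pipe in the
# `storagegroup -list -gname ~filestorage` output instead.
sample='50:06:01:60:C6:E0:14:97:50:06:01:69:46:E0:14:97  SP B     2
50:06:01:60:C6:E0:14:97:50:06:01:68:46:E0:14:97  SP A     2'
printf '%s\n' "$sample" | hbauids | while read -r uid; do
  echo "/tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 port -removehba -o -hbauid $uid"
done
```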
15. Verify whether array security was destroyed. If this is not a shared system, manually destroy security:
# /tftpboot/bin/naviseccli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 domain -list
Security is not initialized. Security must be initialized before any domain operations can be performed in this system. Create a global administrator to initialize security.
Note: The above return indicates that no security domain remains (it has already been destroyed)--no further action is required.
# /tftpboot/bin/navicli -h 128.221.252.200 domain -list
Node: c250
IP Address: 128.221.253.201
Name: spb
Port: 80
Secure Port: 443
IP Address: 128.221.252.200 (Master)
Name: spa
Port: 80
Secure Port: 443
IP Address: 10.241.216.235
Name: c250
Port: 80
Secure Port: 443
Note: The above return indicates that a security domain does exist and must be destroyed:
# /tftpboot/bin/navicli -h 128.221.252.200 -user sysadmin -password sysadmin -scope 0 domain -messner -destroy
WARNING: You are about to destroy the local directories on the following systems: 128.221.252.200
Please note that this operation will not update the master directory database. Proceed? (y/n) y
# /tftpboot/bin/navicli -h 128.221.253.201 -user sysadmin -password sysadmin -scope 0 domain -messner -destroy
WARNING: You are about to destroy the local directories on the following systems: 128.221.253.201
Please note that this operation will not update the master directory database. Proceed? (y/n) y
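To tell at a glance whether any external system is still joined to the domain, the member IPs can be filtered against the internal SP networks. A sketch, assuming the member list is in the one-line `IP Address: <ip>` form; the `external_members` helper name is ours.

```shell
# Print domain-member IPs that are NOT on the 128.221.252/253 internal networks.
external_members() {
  awk '/^IP Address:/ { if ($3 !~ /^128\.221\.25[23]\./) print $3 }'
}

# Demo with the sample domain -list output from step 15:
sample='IP Address: 128.221.253.201
IP Address: 128.221.252.200 (Master)
IP Address: 10.241.216.235'
printf '%s\n' "$sample" | external_members   # → prints 10.241.216.235
```

Any IP printed here is a candidate for `domain -messner -remove` (step 5's note) before security is destroyed.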
16. Using the proper bootable Express Install media, reboot the Linux system and perform a "boot:install". For dual Control Station environments, make sure that CS1 is powered off during the factory install of CS0. See the "Notes" section below for a representative example of the questions and answers given for a typical Express Installation. Make sure to toggle the option from Yes to No when the screen for setting up the Control Station LAN IP address appears, since you DO NOT want to set the external IP address yet (you will use the VNX Installation Assistant after the installation is completed to set the Control Station name and IP address and to initialize the File/Unified system). Reboot CS0 after the File OE installation completes so as to generate the "Waiting for VIA..." initialization message.
17. Once CS0 has completed the software installation and reboot, perform the factory installation of CS1 using either the DVD media or CD2 media, keeping CS0 powered up during the CS1 installation. At the end of a successful File OE installation on CS1, reboot it and, via the serial console, ensure that it displays the "Waiting for VIA..." initialization message. At this point, the dual CS environment is ready to be initialized using the VNX Installation Assistant.
18. Before running the VIA, however, perform the following actions, depending on whether the system is a File-only or Unified configuration:
For File-only VNX systems:
a) A File-only installation should not have the -UnisphereBlock enabler installed--use navicli ndu -list to check.
b) A File-only installation should have the -UnisphereFile enabler installed--use navicli ndu -list to check.
c) Run the VNX Installation Assistant to complete the system initialization.
For Unified VNX systems:
a) A Unified installation should have both the -UnisphereBlock and -UnisphereFile enablers installed--use navicli ndu -list to check.
b) Set the Unified flag on the Control Station:
# /nas/sbin/nas_hw_upgrade -option -enable -clariionfc
c) Run the VNX Installation Assistant to complete the system initialization.
Notes:

Typical Express Installation questions, inputs, and/or answers:

1. Express Installation using DVD or 2-disc CD set:
boot:install
-----------------
Is this a Secondary Control Station (y/n/a)? n
-----------------
Is this a Control Station Fresh Install? yes
-----------------
Is this a Secondary Control Station? [yes or no]: no
-----------------
Accept the defaults for the "Primary Internal Network Setup", "IPMI Network Setup", and "Backup Internal Network Setup" screens.
-----------------
DO NOT SET UP THE EXTERNAL LAN NETWORKING AT THIS TIME (we will set up the Control Station external LAN using the VIA initialization wizard after the File OE reinstallation is completed).
For the Network Configuration screen, "Do you want to configure LAN (not dialup) networking for your installed system?", tab to "No" and press Enter.
-----------------
Detecting movers in cabinet: 2
Is this the expected number of movers in the cabinet? [yes or no]: yes
-----------------
Pick a NAS Administrator username
Username [default: nasadmin]: nasadmin
New UNIX password: nasadmin
Retype new UNIX password: nasadmin
-----------------
Do you wish to enable UNICODE? [yes or no]: yes
2. At the end of the File OE installation, log into the Control Station as nasadmin, su to root, then reboot the Control Station. When the following message is displayed at the Control Station serial console, initialize the Unified system using the VIA:
# reboot
-----------------
Waiting for VNX Installation Assistant to continue.......