How to restore an ASM-based OCR after complete loss of the CRS disk group
It is not possible to directly restore a manual or automatic OCR backup when the OCR is located in an ASM disk group. This is because the command 'ocrconfig -restore' requires ASM to be up and running in order to restore an OCR backup to an ASM disk group, yet ASM is only available once the CRS stack has started successfully. For the restore to succeed, the OCR also must not be in use (read/write), i.e. no CRS daemon may be running while the OCR is being restored. The workaround is to start the CRS stack in exclusive mode, as described below.
The following steps must be used to recover from a situation where the disks belonging to the ASM disk group +MGMT have been lost. The instructions assume that new/fixed disks are now available on the affected servers and that the following has already been carried out on both nodes of the cluster:
- Disks for the OCR and voting files are visible to the Oracle cluster
- The disk ownership has been changed from root:disk to oracle:dba, or the equivalent accounts
- The /etc/udev/rules.d/50-udev.rules file has been updated/verified to ensure the disks are listed with the correct ownership and access permissions (a sample rule is sketched below)
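For reference, a minimal udev rule of the kind described above might look like the following sketch. The emcpower* device pattern and the oracle:dba ownership are assumptions based on this environment; adjust both to match your own storage and accounts.
# /etc/udev/rules.d/50-udev.rules -- grant the grid owner access to the ASM disks (sketch)
KERNEL=="emcpower*", OWNER="oracle", GROUP="dba", MODE="0660"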
Steps
1. Locate the latest automatic OCR backup file
Run the following command to locate the automatic OCR backup files. This displays the files
representing the full, weekly, daily, and three 4-hourly backups. Ideally, choose the backup
file with the most recent timestamp, unless you suspect a recent change caused the problem
and want to go back to a previous known-good version.
$ ocrconfig -showbackup
[hostname] 2013/01/09 12:29:26 $ORACLE_CRS_HOME/cdata/[cluster_name]/backup00.ocr
[hostname] 2013/01/09 08:29:25 $ORACLE_CRS_HOME/cdata/[cluster_name]/backup01.ocr
[hostname] 2013/01/09 04:29:24 $ORACLE_CRS_HOME/cdata/[cluster_name]/backup02.ocr
[hostname] 2013/01/08 00:29:20 $ORACLE_CRS_HOME/cdata/[cluster_name]/day.ocr
[hostname] 2013/01/01 00:28:57 $ORACLE_CRS_HOME/cdata/[cluster_name]/week.ocr
[hostname] 2012/04/18 12:29:39 $ORACLE_CRS_HOME/cdata/[cluster_name]/backup_20120418_122939.ocr
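Because the remaining steps stop and restart the clusterware, it is worth copying the chosen backup file somewhere safe first. This precaution is an addition to the original procedure, and the destination path is only an example.
$ cp $ORACLE_CRS_HOME/cdata/[cluster_name]/backup00.ocr /home/oracle/backup00.ocr.save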
2. Make sure the GI is shut down across all nodes
Quite often you will find that the GI is stuck in a hung state and needs to be shut down across
all nodes. Run the following command on each node while logged in as the "root" user.
$ crsctl stop crs -f
Note: Sometimes the stop may take quite a long time. If you want to abort it, interrupt the
above command and run the following instead; this ensures the GI stays down after the reboot.
Ensure you re-enable CRS once the restore is complete (see step 11).
$ crsctl disable crs
$ reboot
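Before moving on, it is worth confirming that no GI daemons are left running on any node. A simple check, assuming the standard daemon names, is shown below; no output means the stack is down. 'crsctl check crs' should likewise report that it cannot contact Oracle High Availability Services.
$ ps -ef | grep -E 'ohasd|ocssd|crsd|evmd' | grep -v grep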
3. Start CRS stack in exclusive mode
Start the CRS stack in exclusive mode on the node that has the most recent or last known
good OCR backup file. The command below must be run as the "root" user.
Note: According to the Oracle documentation, it is important to use the "-nocrs" flag; if it
is not used, "then the failure to start ora.crsd resource will tear down
ora.cluster_interconnect.haip, this in turn will cause ASM to crash".
$ crsctl start crs -excl -nocrs
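To confirm that the lower stack has come up in exclusive mode, you can list the init-level resources as the "root" user; 'ora.cssd' and 'ora.asm' should show ONLINE. The command is standard in 11.2, though the exact resource list varies by version.
$ crsctl stat res -t -init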
4. Create the MGMT diskgroup in ASM
Once CRS is running in exclusive mode, we must recreate the MGMT disk group using sqlplus.
Run the following set of commands as the "oracle" user.
$ su - oracle
$ . oraenv    # change the environment to point to the ASM instance (e.g. +ASM1)
$ sqlplus / as sysasm
SQL> create diskgroup MGMT external redundancy disk
'/dev/[emc-disk-1]',
'/dev/[emc-disk-2]'
attribute 'compatible.asm'='11.2';
SQL> exit;
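If the CREATE DISKGROUP statement cannot find the disks, it can help to confirm that the ASM instance actually sees them. This query is an addition to the original procedure; the paths returned are environment-specific. Replacement disks normally show a HEADER_STATUS of CANDIDATE (or PROVISIONED), while FORMER indicates a disk that previously belonged to a disk group.
SQL> select path, header_status, os_mb from v$asm_disk;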
5. Restore the latest OCR backup
Now that the MGMT disk group has been created, we can restore the OCR backup using the file
identified by the "ocrconfig -showbackup" command in step 1. Run the following as the "root" user.
$ ocrconfig -restore [fullpath]
e.g. ocrconfig -restore /u01/app/product/11.2.0/grid/cdata/DTCN-DTRD/backup00.ocr
6. Recreate the voting file
Now that we have a new disk group and a restored OCR, the voting file must be recreated. Run
the following as the "root" user.
$ crsctl replace votedisk +MGMT
Sample Output
Successful addition of voting disk 00caa5b9c0f54f3abf5bd2a2609f09a9.
Successfully replaced voting disk group with +MGMT.
CRS-4266: Voting file(s) successfully replaced
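Both files can optionally be verified at this point. These checks are not part of the original write-up, but the commands are standard: 'ocrcheck' (run as "root") reports the OCR location and integrity, and the following lists the voting files now stored in +MGMT.
$ crsctl query css votedisk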
7. Recreate the ASM spfile
Since the ASM spfile resided in the +MGMT disk group, we need to recreate it. Run the
following set of commands as the "oracle" user.
$ vi /home/oracle/asm.pfile
Add the following lines into the file
*.asm_power_limit=1
*.asm_diskstring='/dev/emcpower*'
*.diagnostic_dest='/u01/app/oracle'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'
$ sqlplus / as sysasm
SQL> create spfile='+MGMT' from pfile='/home/oracle/asm.pfile';
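To confirm that the ASM instance has registered the new spfile, you can query its location as the "oracle" user. 'asmcmd spget' is a standard 11.2 command; the path it returns will be specific to your cluster.
$ asmcmd spget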
8. Shut down CRS
We must bring CRS out of exclusive mode and prepare to start it normally. Run the following
as the "root" user.
$ crsctl stop crs -f
9. Start CRS
As the "root" user, run the following on each node and wait for the command to complete.
$ crsctl start crs
10. Verify CRS is online
$ crsctl check cluster -all
Sample Output
**************************************************************
[cluster_node_1]:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
[cluster_node_2]:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
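For a fuller picture than the check above, you can also list all cluster resources and their states. The command is standard; the resource names shown will be specific to your environment.
$ crsctl stat res -t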
11. Enable CRS start-up if it was disabled earlier
If the CRS start-up was disabled as described in step 2, ensure it is re-enabled; otherwise
the GI will never start on its own after a reboot. Run the following as the "root" user.
$ crsctl enable crs
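To confirm the autostart setting, the following (run as "root") reports whether Oracle High Availability Services autostart is enabled.
$ crsctl config crs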