How to restore ASM based OCR after complete loss of the CRS diskgroup

A manual or automatic OCR backup cannot be restored directly when the OCR is located in an ASM disk group. The command 'ocrconfig -restore' requires ASM to be up and running in order to restore an OCR backup to an ASM disk group, but ASM only becomes available once the CRS stack has started successfully. In addition, the OCR must not be in use (r/w) during the restore, i.e. no CRS daemon may be running while the OCR is being restored.


The following steps recover from a situation where the disks belonging to the ASM diskgroup +MGMT have been lost. The instructions assume that new or repaired disks are now available on the affected servers and that the following have already been carried out on both nodes of the cluster.

- Disks for OCR and Voting are visible to the oracle cluster
- The disk ownership has been changed from root:disk to oracle:dba or the equivalent accounts
- The /etc/udev/rules/50-udev.rules file has been updated/verified so that the disks are listed with the correct ownership and access permissions.

Steps
1. Locate the latest automatic OCR backup file
    Run the following command to locate the automatic OCR backup files. It lists the files 
    representing the full, weekly, daily, and three four-hourly backups. Ideally choose the backup 
    file with the most recent timestamp, unless you suspect a recent change caused the problem 
    and want to go back to a previous known good version.

    $ ocrconfig -showbackup
[hostname] 2013/01/09 12:29:26 $ORACLE_CRS_HOME/cdata/[cluster_name]/backup00.ocr
[hostname] 2013/01/09 08:29:25 $ORACLE_CRS_HOME/cdata/[cluster_name]/backup01.ocr
[hostname] 2013/01/09 04:29:24 $ORACLE_CRS_HOME/cdata/[cluster_name]/backup02.ocr
[hostname] 2013/01/08 00:29:20 $ORACLE_CRS_HOME/cdata/[cluster_name]/day.ocr
[hostname] 2013/01/01 00:28:57 $ORACLE_CRS_HOME/cdata/[cluster_name]/week.ocr
[hostname] 2012/04/18 12:29:39 $ORACLE_CRS_HOME/cdata/[cluster_name]/backup_20120418_122939.ocr
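
Picking the newest entry by eye is easy to get wrong under pressure, so the selection can be scripted. The following is a minimal sketch, assuming every backup line has the form "host date time path" as in the listing above; the function name latest_ocr_backup is hypothetical and not part of any Oracle tool. Lines that do not match that shape are skipped.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the newest backup found in
# `ocrconfig -showbackup`-style text read from stdin.
# Assumes each backup line looks like: <host> <YYYY/MM/DD> <HH:MM:SS> <path>
latest_ocr_backup() {
    # Keep only lines whose second field looks like a date, emit "date time path",
    # sort newest first (the date format sorts correctly as plain text),
    # then print the path of the first line.
    awk 'NF >= 4 && $2 ~ /^[0-9][0-9][0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ { print $2, $3, $4 }' |
        sort -r |
        head -n 1 |
        awk '{ print $3 }'
}

# Usage (as root, on a node where the command works):
#   ocrconfig -showbackup | latest_ocr_backup
```

The result is only a suggestion: if a recent change is suspected to have caused the problem, an older known good backup is still the better choice.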

2. Make sure GI is shut down across all nodes
    Quite often you will find that GI is stuck in a hung state and needs to be shut down across all 
    nodes. Run the following command on each node while logged in as "root" user.
    $ crsctl stop crs -f

    Note: The stop can sometimes take a long time. If you want to abort it, interrupt the 
            above command and run the following instead. Disabling CRS ensures that GI stays 
            down after the reboot. Remember to enable CRS again once the restore is complete.
       $ crsctl disable crs
       $ reboot

3. Start CRS stack in exclusive mode
    Start the CRS stack in exclusive mode on the node that has the most recent or the last known 
    good OCR backup file.  The below command must be run as "root" user.

    Note: According to the Oracle documentation, it is important to use the "-nocrs" flag. If it 
            is not used, "then the failure to start ora.crsd resource will tear down 
            ora.cluster_interconnect.haip, this in turn will cause ASM to crash"
    $ crsctl start crs -excl -nocrs

4. Create the MGMT diskgroup in ASM
    Once CRS is running in exclusive mode, we must recreate the MGMT disk-group using sqlplus. 
    Run the following set of commands as "oracle" user
   $ su - oracle
   $ . oraenv     # set the environment to point to the ASM instance
   $ sqlplus / as sysasm
   SQL> create diskgroup MGMT external redundancy disk 
              '/dev/[emc-disk-1]',
              '/dev/[emc-disk-2]' 
        attribute 'compatible.asm'='11.2';
   SQL> exit;
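
When there are more than a couple of disks, typing the disk list by hand invites quoting mistakes. As a rough sketch, the statement above can be generated from a list of device paths. The function name build_create_dg_sql is hypothetical, and the redundancy level and the 'compatible.asm' value are simply copied from the statement above; review the output before pasting it into sqlplus.

```shell
#!/bin/sh
# Hypothetical helper: print a CREATE DISKGROUP statement for the given
# diskgroup name and device paths.
# Usage: build_create_dg_sql <diskgroup> <disk> [<disk> ...]
build_create_dg_sql() {
    if [ "$#" -lt 2 ]; then
        echo "usage: build_create_dg_sql <diskgroup> <disk> [<disk> ...]" >&2
        return 1
    fi
    dg=$1
    shift
    printf 'create diskgroup %s external redundancy disk\n' "$dg"
    sep=''
    for disk in "$@"; do
        # Every disk after the first is preceded by a comma and a newline.
        printf "%s    '%s'" "$sep" "$disk"
        sep=',
'
    done
    printf "\nattribute 'compatible.asm'='11.2';\n"
}

# Example: build_create_dg_sql MGMT /dev/[emc-disk-1] /dev/[emc-disk-2]
```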

5. Restore the latest OCR backup
    Now that the MGMT disk group has been created, we can restore the OCR backup using the file 
    identified by the "ocrconfig -showbackup" command in step 1. Run the following as "root" user
    $ ocrconfig -restore [fullpath]
    e.g. ocrconfig -restore /u01/app/product/11.2.0/grid/cdata/DTCN-DTRD/backup00.ocr
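
Before running the restore it is worth confirming that the chosen backup file is actually present on this node, since the automatic backups live on the local filesystem of whichever node wrote them. A minimal sketch follows; the function name check_ocr_backup is hypothetical, and it only checks that the file exists, is readable and is not empty. It does not validate the contents.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the backup file passed to `ocrconfig -restore`.
# Returns 0 if the file exists, is readable and is non-empty; 1 otherwise.
check_ocr_backup() {
    f=$1
    if [ ! -f "$f" ]; then
        echo "not a regular file: $f" >&2
        return 1
    fi
    if [ ! -r "$f" ]; then
        echo "not readable: $f" >&2
        return 1
    fi
    if [ ! -s "$f" ]; then
        echo "file is empty: $f" >&2
        return 1
    fi
    return 0
}

# Usage (as root):
#   check_ocr_backup [fullpath] && ocrconfig -restore [fullpath]
```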

6. Recreate the voting file
    Now that we have a new diskgroup, the voting file must be recreated. Run the following as 
    "root" user
   $ crsctl replace votedisk +MGMT
   
  Sample Output
  Successful addition of voting disk 00caa5b9c0f54f3abf5bd2a2609f09a9.
  Successfully replaced voting disk group with +MGMT.
  CRS-4266: Voting file(s) successfully replaced

9. Recreate the ASM spfile
     Since the ASM spfile resided in the +MGMT disk-group, it was lost with the disks and must be 
     recreated. Run the following set of commands as "oracle" user

   $ vi /home/oracle/asm.pfile

      Add the following lines into the file   

    *.asm_power_limit=1
    *.asm_diskstring='/dev/emcpower*'
    *.diagnostic_dest='/u01/app/oracle'
    *.instance_type='asm'
    *.large_pool_size=12M
    *.remote_login_passwordfile='EXCLUSIVE'

      $ sqlplus / as sysasm
      SQL> create spfile='+MGMT' from pfile='/home/oracle/asm.pfile';
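
If you would rather not edit the pfile by hand in vi, it can be written non-interactively. This is a sketch only: the function name write_asm_pfile is hypothetical, and the parameter values are the ones listed above, which must be adjusted to match your own environment (in particular the disk string and diagnostic_dest).

```shell
#!/bin/sh
# Hypothetical helper: write the ASM pfile shown above to the given path.
# Usage: write_asm_pfile <output-file> <asm_diskstring>
write_asm_pfile() {
    out=$1
    diskstring=$2
    # The here-document expands $diskstring but leaves '*' untouched.
    cat > "$out" <<EOF
*.asm_power_limit=1
*.asm_diskstring='$diskstring'
*.diagnostic_dest='/u01/app/oracle'
*.instance_type='asm'
*.large_pool_size=12M
*.remote_login_passwordfile='EXCLUSIVE'
EOF
}

# Usage (as oracle):
#   write_asm_pfile /home/oracle/asm.pfile '/dev/emcpower*'
```

Quote the disk string when calling the function so that the shell does not expand the wildcard before it reaches the file.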

10. Shut down CRS
      We must bring CRS out of its exclusive mode and prepare to start it normally. Run the 
      following as "root" user
    $ crsctl stop crs -f

11. Start CRS
     As root user run the following and wait for the command to complete
   $ crsctl start crs 

12. Verify CRS is online
   $ crsctl check cluster -all     

     Sample Output
   **************************************************************
   [cluster_node_1]:
   CRS-4537: Cluster Ready Services is online
   CRS-4529: Cluster Synchronization Services is online
   CRS-4533: Event Manager is online
   **************************************************************
   [cluster_node_2]:
   CRS-4537: Cluster Ready Services is online
   CRS-4529: Cluster Synchronization Services is online
   CRS-4533: Event Manager is online
   *************************************************************
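
On a cluster with more than a few nodes the check can be automated rather than read line by line. The sketch below assumes the output format shown in the sample above: one line per node ending in a colon, followed by the CRS-4537, CRS-4529 and CRS-4533 "online" messages. The function name cluster_all_online is hypothetical. It succeeds only if every node listed reports all three messages.

```shell
#!/bin/sh
# Hypothetical helper: read `crsctl check cluster -all`-style text from stdin
# and return 0 only if every node reports CRS, CSS and EVM online.
cluster_all_online() {
    awk '
        NF == 1 && $1 ~ /:$/ { nodes++ }   # node header line, e.g. "node1:"
        $1 == "CRS-4537:"    { crs++ }     # Cluster Ready Services is online
        $1 == "CRS-4529:"    { css++ }     # Cluster Synchronization Services is online
        $1 == "CRS-4533:"    { evm++ }     # Event Manager is online
        END { exit !(nodes > 0 && crs == nodes && css == nodes && evm == nodes) }
    '
}

# Usage:
#   crsctl check cluster -all | cluster_all_online && echo "all nodes online"
```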

13. Enable CRS start-up if it was disabled earlier
      If CRS start-up was disabled as described in step 2, make sure it is re-enabled. Otherwise 
      GI will not start on its own after a reboot. Run the following as "root" user
    $ crsctl enable crs

