Thursday, February 11, 2010

Recovering Oracle Block corruption

Occurence of Problem:
While trying to login to our application running on Oracle 10g R2 (10.2.0.4) we received the following error:

ORA-01115: IO error reading block from file 1 (block # 94697)
ORA-01110: data file 1: '(...)\SYSTEM01.DBF'
ORA-27091: unable to queue I/O
ORA-27070: async read/write failed
OSD-04006: ReadFile() failure, unable to read from file
O/S-Error: (OS 23) Data error (cyclic redundancy check).


Investigation:

  • Oracle's Metalink pointed out that the problem was a block corruption on the disk.

  • It is a 500GB Seagate SATA disk. The Event Viewer of Windows Server 2008 also gave the 'disk' block corrupt error.

  • A DBVERIFY was done on the corrupted datafile and it returned an O/S Error 0RA-01115.

  • A complete Check Disk was run on the disk and the corrupted block were fixed.

  • DBVERIFY now gave the following error for the datafile

    Corrupt block relative dba: 0x004171e8 (file 1, block 94696)
    Fractured block found during buffer read
    Data in bad block:
    type: 6 format: 2 rdba: 0x004171e8
    last change scn: 0x0000.01c61503 seq: 0x1 flg: 0x06
    spare1: 0x0 spare2: 0x0 spare3: 0x0
    consistency value in tail: 0x00000000
    check value in block header: 0xa06b
    computed block checksum: 0xdfa7
    Reread of rdba: 0x004171e8 (file 1, block 94696) found same corrupted data

    Same error also appears in the alert.log file

  • Attempting to login into the application gave a block corruption error in Oracle in the datafile.

    ORA-01578: ORACLE data block corrupted (file # 1, block # 94696)
    ORA-01110: data file 1: '(...)\SYSTEM01.DBF'

    Do note that it is the previous block ID from last error message

  • Alert.log contained the following information

    Corrupt Block Found
    TSN = 0, TSNAME = SYSTEM
    RFN = 1, BLK = 94696, RDBA = 4289000
    OBJN = 331, OBJD = 331, OBJECT = I_PROCEDUREPLSQL$, SUBOBJECT =
    SEGMENT OWNER = SYS, SEGMENT TYPE = Index Segment



Resolution:
Refer to Oracle's documentation on Detecting and Repairing Data Block Corruption.

  1. Logon to SYS user and create a REPAIR Table:

    BEGIN
    DBMS_REPAIR.ADMIN_TABLES (
    TABLE_NAME => 'REPAIR_TABLE',
    TABLE_TYPE => dbms_repair.repair_table,
    ACTION => dbms_repair.create_action,
    TABLESPACE => 'USERS');
    END;

  2. Check the Object for corruption

    DECLARE num_corrupt INT;
    BEGIN
    num_corrupt := 0;
    DBMS_REPAIR.CHECK_OBJECT (
    SCHEMA_NAME => 'SYS',
    OBJECT_NAME => 'I_PROCEDUREPLSQL$',
    OBJECT_TYPE => dbms_repair.index_object,
    REPAIR_TABLE_NAME => 'REPAIR_TABLE',
    CORRUPT_COUNT => num_corrupt);
    DBMS_OUTPUT.PUT_LINE('number corrupt: ' || TO_CHAR (num_corrupt));
    END;

    Note: If parameter Object Type is ignored, it takes the default as TABLE_OBJECT and the query will return a table or view not found error. The object turned out to be an INDEX OBJECT.

  3. Command - "select * from repair_object" will show the corrupt blocks and associated information

  4. Fix the errors using the following code:

    DECLARE num_fix INT;
    BEGIN
    num_fix := 0;
    DBMS_REPAIR.FIX_CORRUPT_BLOCKS (
    SCHEMA_NAME => 'SYS',
    OBJECT_NAME=> 'I_PROCEDUREPLSQL$',
    OBJECT_TYPE => dbms_repair.INDEX_OBJECT,
    REPAIR_TABLE_NAME => 'REPAIR_TABLE',
    FIX_COUNT=> num_fix);
    DBMS_OUTPUT.PUT_LINE('num fix: ' || TO_CHAR(num_fix));
    END;

    num fix: 0 was returned.
    Corrupt block could not be fixed

  5. It was a UNIQUE index on procedures. The index was dropped!

  6. Trying to recreate the index failed due to duplicate records

  7. From the object ID, the problematic procedure (from our application) was located and dropped.

  8. Index creation was successful.

  9. Application procedure which was dropped was re-created.

  10. Login to the application was successful.

  11. DB Verify still gives the same error informing that a block has been marked as corrupt and is not usable.