Monday, July 27, 2015

Database hang or may fail to OPEN in 12c IBM AIX - ORA-742

Oracle 12c introduces a new default feature of using multiple LGWRs which may lead to DEADLOCK / Database Hang or ORA-742 "Log read detects lost write" or ORA-600 [kcrfrgv_nextlwn_scn] during instance OPEN or ORA-600 [krr_process_read_error_2] during Recovery on IBM AIX.

DEADLOCK or ORA-742 "Log read detects lost write" or ORA-600 [kcrfrgv_nextlwn_scn] during instance OPEN or ORA-600 [krr_process_read_error_2] during Recovery.

PMON may terminate the instance while extensive block recovery is being performed.



Please disable the new feature of multiple LGWR slave processes by proactively setting _use_single_log_writer=true.  This is a temporary recommendation for IBM AIX installations until a formal fix is identified for this problem.
Setting _use_single_log_writer = true is a safe workaround; it is the behavior before 12c where multiple LGWR slave groups were not available. 
ALTER SYSTEM SET "_use_single_log_writer"=TRUE SID='*' SCOPE=SPFILE;
-- Restart the database or all instances of the RAC database
Note that while _use_single_log_writer=true is not set, then error ORA-600 [kcrfrgv_nextlwn_scn] might be produced avoiding the database to OPEN.  Once the problem is introduced, _use_single_log_writer=true may or may not fix it. _use_single_log_writer = true prevents inconsistencies in the redo log to be introduced which causes that error. 
If the parameter does not help, because the problem was already introduced when _use_single_log_writer=true had not been proactively set, then Point in Time Recovery (PITR) or Flashback Database are the options to recover from this situation.