RESOLVING SHUTDOWN IMMEDIATE HANG SITUATIONS

Many times Oracle DBA's are often faced with situation where shutting down the instance with shutdown immediate seems to take infinite time and gives an impression that the session in hung.
In most of the cases this is not a hang. Hang can be defined as a scenario when few sessions/processes are waiting on some action and in turn blocking other session/processes to complete their tasks. Now the original sessions could be waiting because of resource issues or Oracle Bugs.


Shutdown immediate can take long time to complete (appear to be hung) because of three reasons:

  1. Uncommitted transactions are being rolled back.
  2. SMON is cleaning temp segments or performing delayed block clean-outs.
  3. Processes still continue to be connected to the database and do not terminate.

1. Uncommitted transactions are being rolled back:


This is the case when the message 'Waiting for smon  to disable tx recovery' is posted in the alert log after we issue shutdown immediate.

There are two reasons for this:
  • A large query was running at the time of shutdown immediate.
  • A large transaction was running at the time of shutdown immediate. 
For large queries:

SQL > SELECT count(*) FROM v$session_longops where time_remaining > 0;

If it returns a value > 0 then we can do a shutdown abort and then startup restrict and then again shutdown immediate.

For large transactions: 

SQL > SELECT count(*) FROM v$fast_start_transaction; 

Decreasing count will show that recovery is going on and when the recovery is completed the database will be shutdown.

But it is not desirable under some circumstances such as, when we have very short maintenance window and we need to perform a shutdown immediate to do some work, in those cases we can use the following event and set in the init.ora file TEMPORARILY too disable transaction recovery:

even="10513 trace name context forever, level 2"

and bounce the instance and issue shutdown immediate to get complete without transaction recovery. SMON will not do a transaction recovery until this event is set in the init.ora file so it is necessary to remove this event whenever you get a chance to shutdown the database again, this time shutdown immediate can even take 3-5 hours (Just remove this even from pfile).

2. SMON is cleaning temp segments or performing delayed block cleanouts:


During a SHUTDOWN IMMEDIATE and SHUTDOWN NORMAL, SMON cleans up extends which are no longer needed and marking them as freed. It means that count from uet$ will decrease and count in fet$ will increase.

To verify that the temporary segments are decreasing have an active session available in SQL during the SHUTDOWN IMMEDIATE. Run query to ensure that database is not hanging, but is actually performing extent cleanup:

SQL > SELECT count(block#) FROM fet$;

SQL > SELECT count(block#) FROM uet$;

After some time, issue the query again and check the results.

If you do not have sufficient time to wait for this cleanup then you can set the following event and bounce the database and reissue shutdown immediate to skip this cleanup:

even="10061 trace name context forever, level 10"

It allows you to prevent SMON from cleaning up temporary segments. Again it is not recommended to set this event forever. Whenever you have large downtime remove this event and allow SMON to do its work.

3. Processes still continue to be connected to the database and do not terminate:


After issuing shutdown immediate, If we see entries in alert log file as:

SHUTDOWN: waiting for active calls to complete.

SHUTDOWN: Active sessions prevent database close operation

It shows that there are some active calls which PMON is not able to clear up. This message is due to the fact that database is waiting for PMON to clean up processes, but pmon is unable to clean them. The client connections to the server are causing the shutdown immediate or normal to hang. Do the following in this case:

1. Before shutdown immediate, shutdown the listener:

$ lsnrctl stop

2. Now check  if there are any connection present at the database as:

$ ps -ef | grep LOCAL

It will give you the OSPIDs of the client connected to database.

3. Manually kill them as:

# kill -9 <OSPID>

4. Issue shutdown immediate now.

Do not forget to bring up the listener after startup ✌✌

In addition to this you can set 10046 event in the session used to shutdown the instance. This will help to tell the event on which session is waiting

SQL > alter session set events '10046 trace name context forever, level 12'

SQL > shutdown immediate;

Look for the trace file user_dump_dest location. Also look at the alert.log for any other messages. They might be helpful in case the shutdown is experiencing hang situation.





Comments