A blog about my experience in the IT world.

ORA-00210 ORA-00202 - Missing controlfiles (and redologs)

November 8, 2017

Scenario: Database is shutdown in NOARCHIVELOG with no backups ever made and controlfiles are lost.

When starting the database the following message is shown:
ORA-00205: error in identifying control file, check alert log for more info

On alert log:
ALTER DATABASE   MOUNT
ORA-00210: cannot open the specified control file
ORA-00202: control file: '/oracle/fast_recovery_area/ORCL/control02.ctl'
ORA-27037: unable to obtain file status

Shutdown the database:
SQL> shutdown immediate;

The CONTROLFILE recreation script will need to be created manually, or based on a previous execution of:
SQL> alter database backup controlfile to trace;
Or browse the FS and locate all datafiles and redologs

Example of a CREATE CONTROLFILE script:
I'll use this as create_control.sql below
CREATE CONTROLFILE REUSE DATABASE "ORCL" NORESETLOGS  NOARCHIVELOG
    MAXLOGFILES 16
    MAXLOGMEMBERS 3
    MAXDATAFILES 100
    MAXINSTANCES 8
    MAXLOGHISTORY 4672
LOGFILE
  GROUP 1 '/oracle/oradata/ORCL/redo01.log'  SIZE 50M BLOCKSIZE 512,
  GROUP 2 '/oracle/oradata/ORCL/redo02.log'  SIZE 50M BLOCKSIZE 512,
  GROUP 3 '/oracle/oradata/ORCL/redo03.log'  SIZE 50M BLOCKSIZE 512
-- STANDBY LOGFILE
DATAFILE
  '/oracle/oradata/ORCL/system01.dbf',
  '/oracle/oradata/ORCL/sysaux01.dbf',
  '/oracle/oradata/ORCL/undotbs01.dbf',
  '/oracle/oradata/ORCL/users01.dbf',
  '/oracle/oradata/ORCL/test01.dbf'
CHARACTER SET AL32UTF8
;

Start the database in NOMOUNT a execute the script:

SQL> startup nomount;
SQL> @create_control.sql
Control file created.

Database is left in mount mode and can now be opened:
SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1: '/oracle/oradata/ORCL/system01.dbf'
SQL> recover database;
Media recovery complete.
SQL> alter database open;
Database altered.
Information will probably be shown that some file need recover, and RECOVER DATABASE command should help fixing the issue.

After this steps database is OPENED and everything should be OK!

BUT what if REDOLOGS were also lost (alas most times they are on the same mount points of CONTROLFILES) or someone tried to open with resetlogs!? 

Changed first line of create_control.sql to:
CREATE CONTROLFILE REUSE SET DATABASE "ORCL" RESETLOGS  NOARCHIVELOG

In this scenario, you probably won't be able to complete media recovered due to missing redologs and inexistent archives:

SQL> startup
ORA-00205: error in identifying control file, check alert log for more info
SQL> shutdown immediate;
ORA-01507: database not mounted
ORACLE instance shut down.
SQL> startup nomount;
SQL> @create_control.sql
Control file created.
SQL> alter database open resetlogs;
alter database open resetlogs
*
ERROR at line 1:
ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: '/oracle/oradata/ORCL/system01.dbf'
SQL> recover database;
ORA-00283: recovery session canceled due to errors
ORA-01610: recovery using the BACKUP CONTROLFILE option must be done
SQL> recover database using backup controlfile; -- noarchivelog database no luck
ORA-00279: change 76885912 generated at 11/08/2017 12:50:24 needed for thread 1
ORA-00289: suggestion : /oracle/oradata/arch1/1_1034_950889305.dbf
ORA-00280: change 76885912 for thread 1 is in sequence #1034
Specify log: {=suggested | filename | AUTO | CANCEL}
ORA-00308: cannot open archived log
'/oracle/oradata/arch1/1_1034_950889305.dbf'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

There is a step that can be taken as a last resort.
The hidden parameter _allow_resetlogs_corruption , the name is pretty self explanatory
SQL> alter system set "_allow_resetlogs_corruption" = true scope = spfile; -- last resort
System altered.
SQL> shutdown;
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
SQL> startup mount;
SQL> alter database open resetlogs;
Database altered.

On the alert log:
Wed Nov 08 12:59:22 2017
alter database open resetlogs
RESETLOGS is being done without consistancy checks. This may result
in a corrupted database. The database should be recreated.
RESETLOGS after incomplete recovery UNTIL CHANGE 76885912

This has a 50/50 percent change of working. In some test cases database opening ended up aborting after 01555 errors with no more change of recovery.

Verify consistency after it opens successfully reveals no problems:

RMAN> validate database;
...
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    OK     0              11751        89602           76886011
  File Name: /oracle/oradata/ORCL/system01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              29366
  Index      0              7094
  Other      0              41389
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2    OK     0              16330        232961          76886020
  File Name: /oracle/oradata/ORCL/sysaux01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              113974
  Index      0              94231
  Other      0              8425
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
3    OK     0              1            64000           76886013
  File Name: /oracle/oradata/ORCL/undotbs01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              0
  Other      0              63999
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
4    OK     0              0            1284            76081965
  File Name: /oracle/oradata/ORCL/users01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              528
  Index      0              414
  Other      0              338
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
5    OK     0              588          638976          76662763
  File Name: /oracle/oradata/ORCL/test01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              577853
  Index      0              57493
  Other      0              3042
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
including current control file for validation
including current SPFILE in backup set
channel ORA_DISK_1: validation complete, elapsed time: 00:00:01
List of Control File and SPFILE
===============================
File Type    Status Blocks Failing Blocks Examined
------------ ------ -------------- ---------------
SPFILE       OK     0              2
Control File OK     0              966
Finished validate at 2017-11-08 13:07:42

And do not forget to re-add the tempfiles:


SQL> select * from dba_temp_files;
no rows selected

SQL> alter tablespace temp add tempfile size 1g;
Tablespace altered.

SQL> select file_name, tablespace_name from dba_temp_files;
FILE_NAME                                                    TABLESPACE_NAME
------------------------------------------------------------ ------------------------------
/oracle/oradata/ORCL/datafile/o1_mf_temp_f05xdr2y_.tmp       TEMP

Connection Reset using ORACLE JDBC

June 14, 2017

Applies to:
     Oracle 11.2.0.4
          on Linux  Platform

Symptoms:

example 1:
Caused by: java.sql.SQLRecoverableException: Erreur d'E/S: Connection reset

example 2:
ORACLE CONNECTION PROBLEM (Attempt number 1 of 3) MESSAGE:Cannot create PoolableConnectionFactory (Erreur d'E/S: Connection reset)]

example 3:
return jpype.java.sql.DriverManager.getConnection(*driver_args)
jpype._jexception.SQLRecoverableExceptionPyRaisable: java.sql.SQLRecoverableException: Erreur d'E/S: Connection reset
 
Cause:

java.security.SecureRandom (#5 ) is a standard API provided by sun. Among various methods offered by this class void nextBytes(byte[]) is one. This method is used for generating random bytes. Oracle 11g JDBC drivers use this API to generate random number during login. Users using Linux have been encountering SQLException("Io exception: Connection reset").
The problem is two fold
  1. The JVM tries to list all the files in the /tmp (or alternate tmp directory set by -Djava.io.tmpdir) when SecureRandom.nextBytes(byte[]) is invoked. If the number of files is large the method takes a long time to respond and hence cause the server to timeout
  2. The method void nextBytes(byte[]) uses /dev/random on Linux and on some machines which lack the random number generating hardware the operation slows down to the extent of bringing the whole login process to a halt. Ultimately the the user encounters SQLException("Io exception: Connection reset")
Users upgrading to 11g can encounter this issue if the underlying OS is Linux which is running on a faulty hardware.
The cause of this has not yet been determined exactly. It could either be a problem in your hardware or the fact that for some reason the software cannot read from dev/random    
Solution:

Change the setup for your application, so you add the next parameter to the java command:

-Djava.security.egd=file:/dev/./urandom

Or edit $JAVA_HOME/jre/lib/security/java.security and add the line : securerandom.source=file:/dev/./urandom

Why not securerandom.source=file/dev/urandom (#4)
Note the "/./" characters in the value. They are needed to work around known Oracle JRE bug #6202721.
See also JDK Enhancement Proposal 123. It is known that implementation of SecureRandom was improved in Java 8 onwards.

Sources:

Change TFA default OSWATCHER parameters

May 26, 2017

By default, after TFA installation OSWATCHER runs with:

> ps -edf | grep -i osw
...  /bin/sh ./OSWatcher.sh 30 48 ...

Meaning that a snapshot is made every 30 minutes and kept for 48 hours.

Since this 48 hours seem to be insufficient for a problem detection/analysis so I wanted to double it, but nowhere on tfactl was I able to find any way to change it.

Looking into the file $TFA_BASE/$hostname/tfa_home/ext/oswbb/oswbb.pm

Where $TFA_BASE is the path you specified when install TFA:

Enter a location for installing TFA (/tfa will be appended if not supplied) [/root/tfa]

I found that a file called .osw.prop is being read to get the values for the variables:

my $tool = "oswbb";
my $tfa_base = tfactlshare_get_tfa_base($tfa_home);
..
my $tool_base = catfile($tfa_base, "suptools", "$hostname", $tool, $current_user);
my $osw_prop = catfile($tool_base, ".osw.prop");
..
    if ( -f $osw_prop )
    {
      open(ORF, $osw_prop);
      while()
      {
        if ( ! $interval && /interval=(\d+)/ )
        {
          $interval = $1;
        }
         elsif ( ! $hours && /hours=(\d+)/ )
        {
          $hours = $1;
        }
         elsif ( ! $zip && /zip=(.*)/ )

The location of the file is not clear since it is given by the function tfactlshare_get_tfa_base, so:

find . -name ".osw.prop"

And the file is found in:

> $TFA_BASE/repository/suptools/$hostname/oswbb/$current_user/.osw.prop

Note: after know this the file is easily found by checking the archive location of the running process:

[root@hostname root]# ps -edf | grep -i osw
root     31662     1  0 20:57 pts/0    00:00:00 /bin/sh ./OSWatcher.sh 30 48 NONE /oracle/products/tfa/repository/suptools/hostname/oswbb/root/archive


The .osw.prop resides in the parent directory.

Having located the file is just a matter of editing it:

> cat $TFA_BASE/repository/suptools/$hostname/oswbb//$current_user/.osw.prop
interval=30
hours=48
zip=NONE
runuser=oracle

> vi $TFA_BASE/repository/suptools/$hostname/oswbb//$current_user/.osw.prop

> cat $TFA_BASE/repository/suptools/$hostname/oswbb//$current_user/.osw.prop
interval=30
hours=96
zip=NONE
runuser=oracle

> tfactl stop oswbb
> tfactl start oswddb

And now the values are modified.

> ps -edf | grep -i osw
... /bin/sh ./OSWatcher.sh 30 96 ...

I think there should be a easier way to do this: Making this change in a cluster environment we'll need to run it in all nodes of the cluster.