Oracle7 Server Administrator's Guide


Fundamental Recovery Concepts and Strategies

Before recovering a database, familiarize yourself with the fundamental data structures, concepts, and strategies of Oracle recovery. This section describes basic recovery issues and covers the following topics: important recovery data structures, recovery operations, and recovery planning and strategies.

Important Recovery Data Structures

Table 24 - 1 describes important data structures involved in recovery processes. Be familiar with these data structures before starting any recovery procedure.

Data Structure | Description
Control File | The control file contains records that describe and maintain information about the physical structure of a database. The control file is updated continuously during database use, and must be available for writing whenever the database is open. If the control file is not accessible, the database will not function properly.
System Change Number (SCN) | The system change number is a clock value for the Oracle database that describes a committed version of the database. The SCN functions as a sequence generator for a database, and controls concurrency and redo record ordering. Think of the SCN as a timestamp that helps ensure transaction consistency.
Redo Records | A redo record is a group of change vectors describing a single, atomic change to the database. Redo records are constructed for all data block changes and saved on disk in the redo log. Redo records allow multiple database blocks to be changed so that either all changes occur or no changes occur, despite arbitrary failures.
Redo Logs | All changes to the Oracle database are recorded in redo logs, which consist of at least two redo log files that are separate from the datafiles. During database recovery from an instance or media failure, Oracle applies the appropriate changes in the database's redo log to the datafiles; this updates database data to the instant that the failure occurred.
Backup | A database backup consists of operating system backups of the physical files that constitute the Oracle database. To begin database recovery from a media failure, Oracle uses file backups to restore damaged datafiles or control files.
Checkpoint | A checkpoint is a data structure in the control file that defines a consistent point of the database across all threads of a redo log. Checkpoints are similar to SCNs, and also describe which threads exist at that SCN. Checkpoints are used by recovery to ensure that Oracle starts reading the log threads for the redo application at the correct point. For Parallel Server, each checkpoint has its own redo information.

Table 24 - 1. Important Recovery Data Structures

See Also: For more information about these and other data structures, see the Oracle7 Server Concepts manual.

Recovery Operations

Media recovery restores a database's datafiles to the most recent point-in-time before the disk failure, including committed data that was held in memory and lost because of the failure. The media recovery operations are:

1. Complete Media Recovery

2. Incomplete Media Recovery
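
As a rough illustration of how the two operations differ, the following Python sketch models a restored backup as a dictionary of block contents and the redo log as a list of records stamped with SCNs. It is a toy model under stated assumptions, not Oracle's implementation: the names `RedoRecord`, `apply_redo`, and `until_scn` are hypothetical, and the choice to stop just before `until_scn` is an assumption of this sketch. Complete recovery applies every redo record; incomplete recovery stops at a chosen point.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RedoRecord:
    """Toy redo record: one atomic change, stamped with an SCN (hypothetical type)."""
    scn: int
    changes: Dict[str, str]  # block name -> new contents (stand-in for change vectors)


def apply_redo(backup: Dict[str, str],
               redo_log: List[RedoRecord],
               until_scn: Optional[int] = None) -> Dict[str, str]:
    """Roll a restored backup forward by applying redo records in SCN order.

    until_scn=None models complete recovery (all redo is applied).
    Otherwise only records with scn < until_scn are applied, which
    models incomplete recovery to an earlier point (assumed cutoff rule).
    """
    datafile = dict(backup)  # work on a copy; the backup itself is left untouched
    for record in sorted(redo_log, key=lambda r: r.scn):
        if until_scn is not None and record.scn >= until_scn:
            break
        # All changes in one record are applied together.
        datafile.update(record.changes)
    return datafile


# Hypothetical sample data.
backup = {"block1": "a0", "block2": "b0"}
redo_log = [
    RedoRecord(scn=101, changes={"block1": "a1"}),
    RedoRecord(scn=102, changes={"block1": "a2", "block2": "b1"}),
    RedoRecord(scn=103, changes={"block2": "b2"}),
]

complete = apply_redo(backup, redo_log)                   # complete recovery
incomplete = apply_redo(backup, redo_log, until_scn=103)  # stops before SCN 103
```

In this model, `complete` reflects every recorded change, while `incomplete` omits the change made at SCN 103.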

Recovery Planning and Strategies

Before recovering a database, you should create a recovery plan or strategy. This section describes important issues to consider when defining your plan.

Test Backup and Recovery Strategies

Test your backup and recovery strategies in a test environment before moving to a production system, and continue to test them regularly afterward. Regular test recoveries verify that your archiving and backup procedures work. They also keep you familiar with recovery procedures, so that you are less likely to make mistakes in a crisis.

Determine What Type of Recovery Operation Is Appropriate

You can use the RECOVER command when faced with any of the following problems:

Before recovering a database, you must choose an appropriate recovery operation. Your answers to the following questions will determine the most appropriate operation.

See Also: For a detailed list of different problems that media failures can cause and the appropriate recovery operations, see [*].

Moving Datafiles

The goal of database recovery is to reopen a database for normal operation as soon as possible. If a media failure occurs because of a hardware problem, the damage should be repaired as soon as possible. However, database recovery does not depend on the resolution of long-lasting hardware problems. Table 24 - 2 lists sections in this Guide that contain procedures for restoring files from a damaged device to other storage devices.

Type of File | Section Name | See
Datafile | Renaming and Relocating Datafiles for Tablespace | [*]
Online Redo Log File | Renaming and Relocating Online Redo Log Members | [*]
Control File | Creating Additional Copies of the Control File, and Renaming or Relocating Control Files | [*]

Table 24 - 2. Damaged File Restoration

Coordinate Distributed Recovery

The Oracle distributed database architecture is autonomous in nature. Therefore, depending on the type of recovery operation selected for a single, damaged database, recovery operations may, or may not, have to be coordinated globally among all databases in the distributed database system. Table 24 - 3 summarizes the different types of recovery operations and whether coordination among nodes of a distributed database system is required.

Type of Recovery Operation | Implication for Distributed Database System
Restoring a full backup for a database that was never accessed (updated or queried) from a remote node | Use non-coordinated, autonomous database recovery.
Restoring a full backup for a database that was accessed by a remote node | Shut down all databases and restore them using the same coordinated full backup.
Complete media recovery of one or more databases in a distributed database | Use non-coordinated, autonomous database recovery.
Incomplete media recovery of a database that was never accessed by a remote node | Use non-coordinated, autonomous database recovery.
Incomplete media recovery of a database that was accessed by a remote node | Use coordinated, incomplete media recovery to the same global point-in-time for all databases in the distributed database.

Table 24 - 3. Database Recovery in a Distributed Database System
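
The rules in Table 24 - 3 reduce to a small decision function. The Python sketch below is only an illustration of the table; `Operation` and `needs_coordination` are hypothetical names and are not part of any Oracle interface.

```python
from enum import Enum


class Operation(Enum):
    """Recovery operation types listed in Table 24 - 3 (hypothetical enum)."""
    RESTORE_FULL_BACKUP = "restore full backup"
    COMPLETE_MEDIA_RECOVERY = "complete media recovery"
    INCOMPLETE_MEDIA_RECOVERY = "incomplete media recovery"


def needs_coordination(operation: Operation, accessed_by_remote_node: bool) -> bool:
    """Return True if Table 24 - 3 calls for coordinated recovery across nodes."""
    if operation is Operation.COMPLETE_MEDIA_RECOVERY:
        # The table lists complete media recovery as always non-coordinated.
        return False
    # For a full-backup restore or incomplete media recovery, the table
    # requires coordination only if a remote node accessed the database.
    return accessed_by_remote_node


# Example: incomplete recovery of a database that a remote node has accessed.
coordinated = needs_coordination(Operation.INCOMPLETE_MEDIA_RECOVERY, True)
```

Here `coordinated` is `True`, matching the last row of the table.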

Coordinate Time-Based and Change-Based Distributed Database Recovery

In special circumstances, one node in a distributed database may require recovery to a past point-in-time. To preserve global data consistency, it is often necessary to recover all other nodes in the system to the same point-in-time. This is called "coordinated, time-based, distributed database recovery." The following tasks should be performed with the standard procedures of time-based and change-based recovery described in this chapter.

To Coordinate Time-Based, Distributed Recovery Among Many Nodes in a Distributed Database System

Recover Database with Snapshots

If a master database is independently recovered to a past point in time (that is, coordinated, time-based distributed database recovery is not performed), any dependent remote snapshot that was refreshed in the interval of lost time will be inconsistent with its master table. In this case, the administrator of the master database should instruct the remote administrators to perform a complete refresh of any inconsistent snapshot.
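
The check described above can be sketched as a comparison of timestamps. The following Python example assumes the master administrator has collected each snapshot's last refresh time by some means outside this sketch; the function name, the snapshot names, and the dates are all hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def snapshots_needing_complete_refresh(recovered_to: datetime,
                                       last_refresh: Dict[str, datetime]) -> List[str]:
    """Return the snapshots refreshed after the master's recovery point.

    A snapshot refreshed in the interval of lost time (after recovered_to)
    may hold data that the recovered master no longer contains.
    """
    return sorted(name for name, refreshed_at in last_refresh.items()
                  if refreshed_at > recovered_to)


# Hypothetical example: the master was recovered to noon on 1 March 1996.
recovered_to = datetime(1996, 3, 1, 12, 0)
last_refresh = {
    "emp_snap": datetime(1996, 3, 1, 9, 0),     # refreshed before the recovery point
    "dept_snap": datetime(1996, 3, 1, 15, 30),  # refreshed during the lost interval
}
stale = snapshots_needing_complete_refresh(recovered_to, last_refresh)
```

In this example only `dept_snap` falls in the lost interval, so it is the only snapshot whose administrator would be asked to perform a complete refresh.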

