Wednesday, August 24, 2011

Is Oracle an ACID-compliant database?

I attended Jonathan Lewis's NOCOUG session on May 18th, 2011. Towards the end of the session, he demonstrated a test case: suspend the lgwr process using oradebug, issue a commit for an open transaction in a second session, and then read the same data from a third session. The read succeeds even though the second session's commit record has not yet been written to disk.

You can read more about it at:
http://tonyhasler.wordpress.com/2011/08/22/why-is-acid-important/
http://jonathanlewis.wordpress.com/2011/08/19/redo-2/
http://www.freelists.org/post/oracle-l/ACID

This basically means Oracle is not fully ACID compliant.

Even though some people categorize it as an architectural limitation, to the customer it is a bug.

In reality it is very rare to run into this problem because:

1) Quite often, the session (or process) that commits a transaction is also the one that moves on to finish other dependent transactions before returning an acknowledgment to the customer.
2) A session that reads commit-in-flight data usually goes on to initiate its own transaction, which will either get blocked by the same underlying redo problem or fail outright if redo writes are still blocked.
3) Redo write times are typically under a couple of milliseconds, so the exposure window is very narrow.
4) Distributed databases are not that widely used.
5) Event-based scheduling with changes spanning more than one database is not very common, i.e. a batch job that waits for an event to happen (it can be a simple change in a status table) and then runs.

Ironically, if an application just reads data from one database and makes decisions about transactions happening on other databases, then using an Active Data Guard (ADG) standby for the read part will solve the problem naturally, although the same problem can still happen in theory if both the primary and the ADG standby go down at the same time (it would be very interesting to test this out).

Here is my take on solving this problem:

When a query is executed, Oracle determines the current SCN and constructs read-consistent data as of that SCN. This is fundamental to the problem: does the query really have to run as of the current SCN?

I think it won't add too much overhead if lgwr posts, to a location in the SGA, the SCN up to which it has written redo from the log buffer to disk. Every session executing SQL statements can check this memory location to determine the on-disk SCN up to which redo has been flushed, and then execute the query as of that SCN. This will solve the problem.

It should not be done for DML, including SELECT ... FOR UPDATE statements, all of which need to read the data as of the current SCN.

By default, Oracle should still maintain the current behavior, but provide a session-, system-, or query-level parameter to make read-only queries execute as of the on-disk SCN.
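
To make the proposal above concrete, here is a minimal toy model in Python. It is only a sketch and has nothing to do with Oracle's actual internals: the class ToyDatabase, the fields current_scn, on_disk_scn and lgwr_suspended, and the durable_only flag are all names I made up for illustration. It also simplifies one thing: in a real instance the committing session would hang while lgwr is suspended, whereas here commit() simply returns.

```python
# Toy model only: NOT Oracle internals. Every name below is hypothetical.
class ToyDatabase:
    def __init__(self):
        self.current_scn = 0         # advanced on every commit
        self.on_disk_scn = 0         # highest SCN whose redo has been "flushed"
        self.versions = {}           # key -> list of (commit_scn, value)
        self.lgwr_suspended = False  # stands in for suspending lgwr via oradebug

    def flush_redo(self):
        """Pretend to be lgwr: publish the SCN up to which redo is on disk."""
        if not self.lgwr_suspended:
            self.on_disk_scn = self.current_scn

    def commit(self, key, value):
        """Record a new committed version, then try to flush its redo."""
        self.current_scn += 1
        self.versions.setdefault(key, []).append((self.current_scn, value))
        self.flush_redo()
        return self.current_scn

    def read(self, key, durable_only=False):
        """Read as of the current SCN (default) or as of the on-disk SCN."""
        as_of = self.on_disk_scn if durable_only else self.current_scn
        visible = [v for scn, v in self.versions.get(key, []) if scn <= as_of]
        return visible[-1] if visible else None


db = ToyDatabase()
db.commit("status", "PENDING")    # redo for this commit reaches disk
db.lgwr_suspended = True
db.commit("status", "DONE")       # commit record is stuck in the log buffer

print(db.read("status"))                     # default: sees the in-flight commit
print(db.read("status", durable_only=True))  # proposed: sees only flushed data

db.lgwr_suspended = False
db.flush_redo()                   # lgwr resumes and catches up
print(db.read("status", durable_only=True))
```

With the default read, the third session sees DONE while the commit is still in flight. With durable_only=True it keeps seeing PENDING until the flush completes, which is the behavior the proposed parameter would switch on.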

Since Oracle can write redo from the log buffer to the redo log files out of SCN order (the recovery process sorts it by SCN before applying the redo), determining the correct on-disk SCN may pose a challenge.
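
One way to think about that challenge: the SCN that is safe to publish is not the highest SCN written so far, but the highest SCN below which there are no gaps. The Python sketch below illustrates the idea. The function name safe_on_disk_scn and the notion of tracking "allocated" and "written" SCNs as plain collections are my own assumptions for illustration, not a description of how lgwr tracks anything.

```python
# Hypothetical illustration: compute a gap-free "on-disk SCN" watermark
# when redo may reach disk out of SCN order.
def safe_on_disk_scn(allocated_scns, written_scns, last_safe=0):
    """Return the highest SCN such that every allocated SCN up to and
    including it has been written; fall back to last_safe if none has."""
    written = set(written_scns)
    watermark = last_safe
    for scn in sorted(allocated_scns):
        if scn not in written:
            break                 # first gap: nothing beyond it is safe yet
        watermark = scn
    return watermark


allocated = [101, 102, 103, 104, 105]

# Redo for 104 and 105 reached disk before 103 did.
print(safe_on_disk_scn(allocated, [101, 102, 104, 105], last_safe=100))

# Once 103 is written, the watermark can jump straight to 105.
print(safe_on_disk_scn(allocated, [101, 102, 103, 104, 105], last_safe=100))
```

In the first call the watermark stays at 102 even though redo up to 105 is partly on disk, because a query run as of 104 or 105 could still depend on the unwritten change at 103.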