Tuesday, September 03, 2013

How to accomplish an on-commit trigger in Oracle

If you have a use case for an on-commit trigger, it is almost always because an application is using the database improperly. Fixing the application design can take time, especially for a third-party vendor application. In the spirit of knowing and keeping all possible levers at a DBA's disposal, the approach below can be used to define an on-commit trigger that enforces data integrity checks at the transaction level. This post is not meant to encourage anybody to use triggers; they should be avoided as much as possible.

The trick here is to leverage the userenv('commitscn') feature.

You can read more on-commit trigger research at http://technology.amis.nl/2006/08/07/the-hunt-for-the-oracle-database-on-commit-trigger/

Jonathan Lewis has great information about commitscn at http://www.jlcomp.demon.co.uk/commit.html

Here is the test case:

01:38:59 SQL> create table tr1(a number not null, b number, primary key(a));

Table created.

01:38:59 SQL> create table tr2(a number not null, b number);

Table created.

01:38:59 SQL> create table tr3(a number not null, b number, primary key(a));

Table created.

01:38:59 SQL>
01:38:59 SQL> create or replace trigger tr1_tr before insert on tr1
01:38:59   2  for each row
01:38:59   3  begin
01:38:59   4  insert into tr2 values (1,userenv('commitscn'));
01:38:59   5  end;
01:38:59   6  /

Trigger created.

01:38:59 SQL>
01:38:59 SQL> create or replace trigger tr2_tr before update on tr2
01:38:59   2  for each row
01:38:59   3  begin
01:38:59   4  insert into tr3 values (0,0);
01:38:59   5  end;
01:38:59   6  /

Trigger created.

01:38:59 SQL>
01:38:59 SQL> insert into tr1 values (1,1);

1 row created.

01:38:59 SQL> --insert into tr2 values (1, userenv('commitscn'));
01:38:59 SQL> select * from tr1;

                   A                    B
-------------------- --------------------
                   1                    1

01:38:59 SQL> select * from tr2;

                   A                    B
-------------------- --------------------
                   1             40208814

01:38:59 SQL> select * from tr3;

no rows selected

01:38:59 SQL> commit;

Commit complete.

01:38:59 SQL> select * from tr1;

                   A                    B
-------------------- --------------------
                   1                    1

01:38:59 SQL> select * from tr2;

                   A                    B
-------------------- --------------------
                   1             40208816

01:38:59 SQL> select * from tr3;

                   A                    B
-------------------- --------------------
                   0                    0

01:38:59 SQL>
01:38:59 SQL>
01:38:59 SQL> insert into tr1 values (2,2);

1 row created.

01:38:59 SQL> --insert into tr2 values (2, userenv('commitscn'));
01:38:59 SQL> select * from tr1;

                   A                    B
-------------------- --------------------
                   1                    1
                   2                    2

01:38:59 SQL> select * from tr2;

                   A                    B
-------------------- --------------------
                   1             40208816
                   1             40208819

01:38:59 SQL> select * from tr3;

                   A                    B
-------------------- --------------------
                   0                    0

01:38:59 SQL> commit;
commit
*
ERROR at line 1:
ORA-00001: unique constraint (TEST123.SYS_C003852) violated
ORA-06512: at "TEST123.TR2_TR", line 2
ORA-04088: error during execution of trigger 'TEST123.TR2_TR'

01:38:59 SQL> select * from tr1;

                   A                    B
-------------------- --------------------
                   1                    1

01:38:59 SQL> select * from tr2;

                   A                    B
-------------------- --------------------
                   1             40208816

01:38:59 SQL> select * from tr3;

                   A                    B
-------------------- --------------------
                   0                    0

TR1 can be a real application table; I introduced the TR2 and TR3 tables to leverage the commitscn feature, and all of this can be done without changing application code. By inserting one row into TR2 with the commitscn value, we get Oracle to automatically update that value in TR2 at commit time. Since we have a before-update trigger on TR2, it fires at commit time.

Any error during the execution of the TR2 trigger automatically causes the entire transaction to be rolled back (as seen above).
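In a real deployment, the TR2 trigger would run the actual transaction-level check rather than forcing a constraint violation through TR3. Here is a minimal sketch, assuming a hypothetical business rule that TR1.B must never be negative at commit time (the rule and error message are illustrative, not from the test case above):

create or replace trigger tr2_tr before update on tr2
for each row
declare
  v_bad number;
begin
  -- this trigger fires at commit time, when Oracle updates the commitscn
  -- value in TR2; run the transaction-level integrity check here
  select count(*) into v_bad from tr1 where b < 0;
  if v_bad > 0 then
    raise_application_error(-20001, 'TR1.B must be non-negative at commit');
  end if;
end;
/

Raising an error here rolls back the whole transaction, exactly as the unique constraint violation did in the test case.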

Monday, January 28, 2013

Oracle breaking read consistency by design

Read consistency is a feature the Oracle RDBMS offers by default, and it is one of the reasons Oracle is an industry leader, but there are a few types of queries where Oracle by design doesn't offer read consistency.

As part of read consistency, Oracle reads the current SCN and uses it as the query SCN for executing any given SQL query.
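For reference, the SCN a query issued right now would run as of can be observed like this (requires execute privilege on dbms_flashback; this snippet is mine, not part of the original demos):

select dbms_flashback.get_system_change_number as query_scn from dual;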

Here is the list of SQL query types where read consistency is not guaranteed by design:

1) Any queries on dynamic performance views (v$) and fixed tables (x$)
2) Any queries on table functions (including pipelined functions)
3) Any queries on external tables
4) Reading any global variables set in memory, such as context values set as part of a query, dbms_application_info, etc.

Demo for v$ views:

-- helper: sleeps 10 seconds, then returns 1; used to slow the query down
-- so the SCN has time to advance between rows
create or replace function sleep1 return number is
begin
sys.dbms_lock.sleep(10);
return 1;
end;
/

select a.sid, sleep1(), (select b.current_scn from v$database b where b.current_scn > a.sid) current_scn from v$session a;

  SID             SLEEP1()          CURRENT_SCN
----- -------------------- --------------------
    1                    1        6727855283317
  119                    1        6727855285974
  237                    1        6727855288603
  238                    1        6727855291192

The current_scn values in the above query are different for each record, hence the query is read-inconsistent. (The correlated predicate b.current_scn > a.sid is always true; it is only there to defeat scalar subquery caching and force the subquery to execute once per row, while sleep1 slows each row down so the SCN visibly advances.)

Demo for table functions:


create table t_piped as select 1 a, rownum b from dual connect by level <= 10;

create type o_tab as table of number;
/


create or replace function f_piped1(val1 in number) return o_tab
as
v2 o_tab := o_tab();
-- sleep1 delays the cursor by ~10 seconds; the predicate a.a >= b.b (always
-- true here) just forces sleep1 to be evaluated as part of the query
cursor c1 is select a.a from t_piped a, (select sleep1 b from dual) b where rownum=1 and a.a >= b.b;
begin
open c1;
fetch c1 bulk collect into v2;
close c1;  -- avoid leaking the cursor
return v2;
end;
/



Now run the below PL/SQL in one session.

begin
-- reset, then keep changing t_piped every 2 seconds while the
-- query runs in the other session
update t_piped set a=1;
commit;
for i in 1..10 loop
update t_piped set a=(a*(i+1));
commit;
dbms_lock.sleep(2);
end loop;
end;
/



Run the below query in a second session at the same time.
select a.a, b.column_value from t_piped a, table(f_piped1(1)) b  where rownum<=5;


         A COLUMN_VALUE
---------- ------------
        24           24
        24           24
        24          120
        24          120
        24          720



The above query selects the same column from the table and from the table function while the underlying table is being updated with a different value on each iteration. We would expect column "A" and column "COLUMN_VALUE" to match, but they differ: the table function accesses the t_piped table as of a different query SCN each time the function is invoked within the same query.

We can make the function deterministic, so that it executes only once for the whole query, and see what happens.


create or replace function f_piped2(val1 in number) return o_tab
deterministic
as
v2 o_tab := o_tab();
cursor c1 is select a.a from t_piped a, (select sleep1 b from dual) b where rownum=1 and a.a >= b.b;
begin
open c1;
fetch c1 bulk collect into v2;
close c1;
return v2;
end;
/

Run the below query and the PL/SQL update loop at the same time from different sessions.

select a.a, b.column_value, c.column_value from t_piped a, table(f_piped2(1)) b, table(f_piped2(1)) c  where rownum<=5;

         A COLUMN_VALUE COLUMN_VALUE
---------- ------------ ------------
     40320        40320       362880
     40320        40320       362880
     40320        40320       362880
     40320        40320       362880
     40320        40320       362880

Now we see the same "COLUMN_VALUE" for each record, but the values differ across the two table function invocations in the from clause. This means each invocation of a table function runs as of the current SCN at the moment each query within that function is executed.

The only way to guarantee read consistency at the query level when table functions are used is to use the "AS OF SCN" clause for the tables in the query and for each table accessed inside the table functions.



create or replace function f_piped3(val1 in number) return o_tab
as
v2 o_tab := o_tab();
--v4 number :=dbms_flashback.GET_SYSTEM_CHANGE_NUMBER();
--cursor c1 is select sleep1||(select b.a from t_piped as of scn v4 b where b.b=a.rn) from (select rn, dummy from
--(select rownum rn, dummy from dual connect by level <= 10) ) a ;
-- val1 carries the caller's query SCN; "as of scn val1" pins the read to it
cursor c1 is select a.a from t_piped as of scn val1 a, (select sleep1 b from dual) b where rownum=1 and a.a >= b.b;
begin
open c1;
fetch c1 bulk collect into v2;
close c1;
return v2;
end;
/

Run the below query and the PL/SQL update loop at the same time from different sessions.

set numwidth 20
column current_scn new_value queryscn
select current_scn from v$database;

select a.a, b.column_value, c.column_value from t_piped as of scn (&&queryscn) a,
table(f_piped3(&&queryscn)) b, table(f_piped3(&&queryscn)) c  where rownum<=5
/

                   A         COLUMN_VALUE         COLUMN_VALUE
-------------------- -------------------- --------------------
                 120                  120                  120
                 120                  120                  120
                 120                  120                  120
                 120                  120                  120
                 120                  120                  120

Based on the above query output, I recommend using the "AS OF SCN" clause for all queries within each table function if read consistency is needed for queries accessing table functions. This could be quite a bit of work, and it would be nice to have an option to execute table functions as of the query SCN. I'll be opening an enhancement request with Oracle Support for this.

Saturday, November 17, 2012

Deep dive into ASH

Below are the links for my presentation on "Deep dive into ASH" at IOUG 2012 on April 26th in Las Vegas.

Deep dive into ASH
Whitepaper
ASH demo scripts

Thursday, November 15, 2012

Internals of Active Dataguard

I presented the Internals of Active Dataguard session at the NoCoug winter conference on November 15th, 2012.

Here are the links for the presentation and demo scripts:

Internals of Active Dataguard
Demo Script
Demo Script
Demo Script
Demo Script
Demo Script

Wednesday, February 29, 2012

Internals of online index build presentation at Nocoug

I presented a session on Internals of online index build at Nocoug on 02/23/2012.

Here are the links for the presentation and demo scripts:

Internals of online index build powerpoint

Demo scripts


Thursday, September 08, 2011

Enhancing Oracle Active Data Guard

I think Active Data Guard (ADG) is one of the best features Oracle has ever released, and it is remarkably popular considering how easily read workload can be offloaded while guaranteeing data quality (thanks to the nature of binary replication; good luck to those who rely on logical replication for DR) and ensuring zero data loss.

Here are some enhancements that could make ADG even more adaptable:

1) Ability to create global temporary tables. Oracle doesn't have to generate redo for GTTs, so they could be supported on an ADG standby. This would enable batch and ETL jobs to run on the ADG standby.

2) Continue using parallel threads for redo apply when the redo stream encounters a new datafile addition. When you add a new datafile on the primary, redo apply serializes on creating the datafile on the ADG standby. When you add a lot of datafiles in quick succession on the primary, the standby can fall behind by several minutes.

3) When a new datafile addition fails on the standby for any reason, the manual fix requires the ADG standby to go into "MOUNT" mode, which is an outage. The most common reason for a datafile addition to fail is lack of space; once the space issue is addressed, the manual fix is to use the "create datafile as" command, which requires the standby to be in "MOUNT" mode (see the sketch after this list).

4) Ability to maintain a partial physical standby for ADG. When you use RAC for database consolidation, multiple schemas become part of one big physical database. Not all applications have the same business priority, so it is natural that only some of those schemas require read-only traffic offload to the ADG standby. Instead of creating a full physical standby, you may want a standby with only the tablespaces you are interested in, which today requires quite a bit of manual scripting and process to take care of new datafiles created on the primary that do not belong to the tablespaces of interest. Oracle could introduce init parameters like "KEEP_TABLESPACES" and "UNKEEP_TABLESPACES" on the standby to address this by managing new datafiles according to which parameter they fall under.

5) Block cleanout that happens as part of commit on the primary database is not logged (not recorded in the redo log), which means commit-time block cleanout can't happen on the standby. This causes all read queries on the ADG standby to keep visiting the undo transaction table (one additional undo block visit for every data segment block that is not cleaned out). I think Oracle should log commit-time block cleanout so the standby can see it too, which would resolve this problem.

6) A Data Guard role switchover from standby to primary takes the ADG standby from the open read-only state to "MOUNT" and then to "OPEN", causing all established connections to fail, which can extend the total database outage. Although it is not significant, every second counts for availability.

7) Like the "CURRENT_SCN" column in v$database, an additional column for the corresponding time would make it much easier to determine standby lag in real time. All other options currently available (see the sketch after this list) are either expensive to query repeatedly or not fully reliable.

8) When you enable supplemental logging (or force logging) on the primary database, it is only saved in the control file, and the change will not reach the standby unless you regenerate the standby control file from the primary or run the command explicitly on the standby too. This can cause a huge problem after you switch over to the standby if a logical standby or GoldenGate is in use. Oracle should record any change in the supplemental logging definition in the dictionary as well, not just in the control file.

9) Ability to determine datafiles impacted by NOLOGGING changes on the standby. Querying "UNRECOVERABLE_TIME" from v$datafile doesn't yield correct results there, and customers can falsely rely on that data. Since metadata about blocks impacted by a NOLOGGING change is already logged, redo apply on the standby could update the standby control file with this information, and v$datafile could then be queried reliably to determine NOLOGGING-impacted datafiles.

10) Ability to use multiple processes for redo thread merging as part of redo apply for RAC databases. As redo generation increases with each release of Oracle and customers increasingly use RAC for database consolidation, one process may not be able to merge redo from all the threads fast enough for the recovery slaves to consume. This can be a problem when a large number of RAC nodes are in use and redo generation in most of the threads is quite high.
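For point 3, the manual fix sequence looks roughly like this (a sketch; the UNNAMED file name and target path are illustrative):

shutdown immediate
startup mount
alter database create datafile '/u01/app/oracle/dbs/UNNAMED00042' as '/u02/oradata/stby/users05.dbf';
-- then restart redo apply and reopen the standby read only

For point 7, the kind of query we are left with today is something like the below against v$dataguard_stats, one of the options I find either expensive to poll or not fully reliable:

select name, value, time_computed from v$dataguard_stats where name in ('apply lag', 'transport lag');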

I've opened enhancement requests for some of the above points.

Wednesday, August 24, 2011

Is Oracle an ACID compliant database?

I attended Jonathan Lewis's NOCOUG session on May 18th, 2011. Towards the end of his session, he demonstrated a test case that suspends the lgwr process using oradebug, issues a commit for an open transaction in another session, and then successfully reads the same data from a third session even though the second session's commit record has not yet been persisted to disk.

You can read more about it at:
http://tonyhasler.wordpress.com/2011/08/22/why-is-acid-important/
http://jonathanlewis.wordpress.com/2011/08/19/redo-2/
http://www.freelists.org/post/oracle-l/ACID
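For reference, the demonstration goes roughly like this (a sketch from memory, with an illustrative table t; do not try this on a database you care about, since suspending lgwr hangs every committing session):

-- Session 1 (as sysdba): suspend lgwr
select pid from v$process where pname = 'LGWR';
oradebug setorapid <pid>   -- substitute the pid returned above
oradebug suspend

-- Session 2: commit an open transaction; the commit hangs on 'log file sync',
-- but the transaction is already marked committed in memory
update t set x = 1 where id = 42;
commit;

-- Session 3: can now read session 2's committed data, even though the commit
-- record never reached disk
select x from t where id = 42;

-- Session 1: resume lgwr, letting session 2's commit complete
oradebug resume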

This basically means Oracle is not fully ACID compliant.

Even though some people categorize it as an architectural limitation, to the customer it is a bug.

In reality it is very rare to run into this problem because:

1) Quite often the session (or process) committing a transaction is the one that moves on to finish other dependent transactions before returning an acknowledgment to the customer.
2) A session reading commit-in-flight data usually goes on to initiate its own transaction, which will either get blocked by the same underlying redo problem or fail entirely if redo writes are still blocked.
3) Redo write times are typically under a couple of milliseconds, so the exposure window is very narrow.
4) Distributed databases are not that widely used.
5) Event-based scheduling with changes spanning more than one database is not very common, i.e. a batch job waiting for an event to happen (which can be a simple change in a status table) and then running.

Ironically, if an application is just reading data from one database and making decisions about transactions happening on other databases, then using an Active Data Guard standby (ADG) for the read part solves the problem naturally, although the same problem can still occur in theory if both the primary and the ADG standby go down at the same time (it would be very interesting to test this out).

Here is my take on solving this problem:

When a query is executed, Oracle computes the current SCN and constructs read-consistent data as of that SCN. This is fundamental to the problem: does it really have to run the query as of the current SCN?

I think it wouldn't add too much overhead if lgwr posted to an SGA location the SCN up to which it has written redo from the log buffer to disk. Every session executing SQL statements could check this memory location to determine the on-disk SCN up to which redo was flushed, and then execute the query as of that SCN. This would solve the problem.
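The building blocks already exist in the form of flashback query; what's missing is an exposed "redo flushed to disk" SCN. A sketch of the idea, using the current SCN merely as a stand-in for that hypothetical on-disk SCN (the orders table is illustrative):

variable disk_scn number
-- stand-in: in the proposed design this value would come from the SGA
-- location posted by lgwr, not from the current SCN
exec :disk_scn := dbms_flashback.get_system_change_number;
select * from orders as of scn :disk_scn;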

It should not be done for DML, including select for update, all of which need to read the data as of the current SCN.

By default Oracle should still maintain the current behavior, but provide a session/system/query level parameter to make read-only queries execute as of the on-disk SCN.

Since Oracle can write redo out of SCN order from the log buffer to the redo log files (the recovery process sorts it by SCN before applying the redo), determining the correct on-disk SCN may pose a challenge.