Oracle Cache Fusion Explained – Part II

Jul 19th, 2010

Cache fusion potentially expose your database to many contention problem on resources. Typical contentions are:

  • Read/Read contention – a false problem as a shared storage can be deal this contention without GCS intervention
  • Write/Read contention – solved using consistent read technology
  • Write/Write contention – contention addresses to Cache Fusion technology solved shipping needed block across instances using interconnect

Consistent Read Server

How cache fusion can deal the write/read contention? Behavior used in past is well known. When an instance needs to read a block that was currently in transaction from another one, it has to request to DLM that requests to the owning instance to downgrade the lock. If owning instance can downgrade the lock write the block to disk and inform DLM, so the other instance is able to read block from disk. This process is in itself I/O intensive has you have to write and read from disk the block.

What is done with new cache fusion? Suppose that you need to modify a block that is recently modified by another instance…..Cache Fusion will do:

  1. Instance sends a message to lock manager to request a shared lock on the block
  2. Lock manager asks the owning instance to build a CR copy and ship it to requesting instance
  3. The owning instance builds up CR  version in own buffer cache and shit thought interconnect to requesting instance
  4. The owning instance informs DLM and requesting instance about block already shipped
  5. Requesting instance informs DLM that now it holds the block

This process is really faster than in past and does not involve, normally, disk I/O. In fact during build of CR copy if owning instance does not find block in his own buffer cache will refuse to build the CR copy.

Also owning instance will prevent itself from flooding requests of CR copies. After the fourth request it will release the lock, write block to disk and let requesting instance to read if from disk. This behavior may be controlled with the hidden parameter _fairness_threshold.

Write/Write Cache Fusion

Write/write cache fusion sometimes is called Cache Fusion II just to address this as a second new features. Background is fairly the same. An instance is requesting a block that is recently modified by another instance. To do so in past block would be written and read again from disk. What about with Cache Fusion? The operation is easy:

  1. Instance sends a message to lock manager to request a shared lock on the block
  2. Lock Manager asks the owning instance to release the lock
  3. Owning instance sends current block to requesting instance via interconnect. For recovery porpurses owning instance write all change vectors to online redolog. Also keep a past image of the block and inform DLM that block is sent to requesting instance.
  4. The requesting instance acquire the lock and when block is written to disk informs the GCS to broadcast the message to discard the past images of the block.

Sequence bug or undocumented feature of 11gR2

Jun 10th, 2010

I’ve hit this bug in my environments. See below:

SQL> create sequence ste_seq
  2  start with 1
  3  increment by 1
  4  nocache
  5  noorder;

Sequence created.

SQL> select ste_seq.nextval from dual;

   NEXTVAL
----------
         1

SQL> drop sequence ste_seq;

Sequence dropped.

Now, this is a normal sequence creation, use and drop. Now you sure expect that a new sequence creation and insertion in a table will start again from 1.  Now see below:

SQL> create sequence ste_seq start with 1 increment by 1 nocache noorder ;

Sequence created.

SQL> create table x(x number);

Table created.

SQL> insert into x(x) values(ste_seq.nextval) ;

1 row created.

SQL> select * from x;

         X
----------
         2

You see that sequence is starting from 2. Going to metalink….note 1050901.3 SEQUENCES NOT STARING WITH ’1′ WHEN USED IN INSERT … good. Two published bugs closed as ‘NOT A BUG’.
All is related to a feature called deferred segment creationg that is document in metalink note: 887962.1.
This feature permits you to defer the first segment creation when you create an empty table. Also helps to save space that sometimes may be wasted creating empty tables.
Better works when your developers create a table with INITIAL clause oversized that cause, in normal condition, oracle to allocate the first big extent for an empty table.

If you hit this bug and you can’t go ahead you can workaround setting

SQL> show parameter def

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
deferred_segment_creation            boolean     FALSE

Oracle Cache Fusion Explained – Part I

May 9th, 2010

A core component of Oracle RAC Database is the Cache Fusion. Cache Fusion should not be intended as a merged buffer cache across all RAC instances. This is what many people thinks and what you may see but the reality is that each node has a private buffer cache and thought Cache Fusion blocks needed by other instances are sent using private interconnect in a complex distributed locking environment.

Cache Fusion Components – PING
Ping is the word used to describe the transfer of a block from instance’s buffer cache to another. Process is quite interesting: when an instance needs a block sends a request to the lock master to obtain the desired lock type for that block. If requested block is already locked by other transaction/instance, master will ask to downgrade or release the current lock. This process is called blocking asynchronous trap, or somewhere simple BAST. An instance receiving the BAST usually try to downgrade or release the lock as soon as possibile and to do this many times block has to be flushed to disk. The whole process is called disk ping, or in some books, hard ping. However, from version 10g, when possible, instances try to send the block across private interconnect without write it to disk. Not always this operation is unexpensive. An high transfer of blocks across instances reveal a design problem and impact performances.

Deferred Ping
Deferred Ping is one method used by Oracle to reduce pinging. An instance receiving a BAST may defer to send the block or downgrade its current lock for a maximum of ten milliseconds. This time is used to complete the transaction and write at least block header. Doing this eliminate the need to check the transaction for the receiving instance. Consider that if you read a block header where is marked that a transaction is currently running the instance need to do some checks for example to undo data blocks and this another expensive operation.

Past Image Block (PI)
PI blocks are the copies of blocks recently modified that are sent to another instance due to a BAST. PI blocks are taken by the instance until blocks are flushed to disk by the current owner. When the instance currently owning the blocks write them to disk the Global Cache Services – GCS – tell to the other instances to discard the related PIs. Blocks are written to disk due to checkpoint. If checkpoint is issued from another instance is responsibility to GCS to find the most recent block image to write to disk.

Lock Mastering
Global Cache Services has a memory structure where stores informations about data blocks, this is called lock resource. Every instance tracks its locks and share them also with other instances. As this is a distributed process for each lock exists a Master node. Master node contains information about block, current user, locking request and PIs block. Whenever an instance need a block have to request a data block, it need to contact the master of lock resources. If master is not local, private interconnect will be used to send messages.
Lock combinations available are:

Lock type Description
NL0 Null Local and No past Images
SL0 Shared Local with no past image
XL0 Exclusive Local with no past image
NG0 Null Global – Instance owns current block image
SG0 Global Shared Lock – Instance owns current image
XG0 Global Exclusive Lock – Instance own current image
NG1 Global Null – Instance Owns the Past Image Block.
SG1 Shared Global – Instance owns past Image
XG1 Global Exclusive Lock – Instance owns Past Image.

Remember to query view gv$bh to get block status.

Tags:

Does column position affect performance?

May 2nd, 2010

Does column position affect performance? The answer is Yes, definitely. Even if many people does not think so or worse they ignore it. Let’s explain in few words how Oracle stores row for a table.

Oracle row design

For each row you have  a row header that contains usually general information about the data as block address, and type of segments. Also it contains the information about table and the actual row (address) which that holds the data.While Oracle reads each row starting, obviously, from the header and then walking to right till the needed column it’s clear that in a table with 300 columns, reading the 300th column is more expensive than reading the 10th. This cost may be computed in about 20 * column_position and as you may think is quite high.


SQL> @demoscripts\create300col.sql

Table create.

SQL> @demoscripts\populate300col.sql

SQL> select count(*) from t300col;

  COUNT(*)
----------
     10502                                                                                                                                                                                              

SQL> execute dbms_stats.gather_table_stats(user,'t300col');

PL/SQL procedure successfully completed.

SQL> -- now explain plan for select c1 ... c300 from t300col
SQL> -- then I gather information from plan table.
SQL> -- Costs will be computed for every 10 rows.
SQL> @demoscripts\exp300col.sql

PL/SQL procedure successfully completed.

SQL>SELECT statement_id, cpu_cost AS TOT_CPU_COST,
 2     cpu_cost-lag(cpu_cost) OVER (ORDER BY statement_id) AS CPU_10ROW_COSTS,
 3     io_cost
 4     FROM plan_table
 5     WHERE id = 0
 6     ORDER BY statement_id;

STATEMENT_ID                   TOT_CPU_COST   CPU_10ROWS_COSTS   IO_COST
------------------------------ -------------- ---------------- ----------
1                                   756494187                      23184
10                                  758384547         1890360      23184
20                                  760484947         2100400      23184
30                                  762585347         2100400      23184
40                                  764685747         2100400      23184
50                                  766786147         2100400      23184
60                                  768886547         2100400      23184
70                                  770986947         2100400      23184
80                                  773087347         2100400      23184
90                                  775187747         2100400      23184
100                                 777288147         2100400      23184
110                                 779388547         2100400      23184
120                                 781488947         2100400      23184
130                                 783589347         2100400      23184
140                                 785689747         2100400      23184
150                                 787790147         2100400      23184
160                                 789890547         2100400      23184
170                                 791990947         2100400      23184
180                                 794091347         2100400      23184
190                                 796191747         2100400      23184
200                                 798292147         2100400      23184
210                                 800392547         2100400      23184
220                                 802492947         2100400      23184
230                                 804593347         2100400      23184
240                                 806693747         2100400      23184
250                                 808794147         2100400      23184
260                                 810894547         2100400      23184
270                                 812994947         2100400      23184
280                                 815095347         2100400      23184
290                                 817195747         2100400      23184
300                                 900449897        83254150      23184

So you may see that for each 10 columns there’s the same difference in CPU costs and this is quite high. In my example all column were defined as varchar2(255) except for the column 1 that was NUMBER. Now it’s clear that this part of database design is very important and may really influence your performances. Many times you can change columns order of a table in online mode using dbms_redefinition but only if your software for each DML and query operation reference any single column. Any star query may not work anymore if columns order is changed.

Oracle OPAQUE_TRANSFORM undocumented hint

Apr 20th, 2010

OPAQUE_TRANSFOR comes as undocumented hint you can usually see in a data warehouse environment. Oracle rewrite query INSERT INTO..SELECT or CREATE TABLE..AS every time you refer to a remote instance using DBLink. OPAQUE_TRANSFORM appears every time query is distributed and involves the SELECT part from remote database.

SQL> insert into t1 select * from t1@lab11r2 ;

1099999 rows created.

Execution Plan
----------------------------------------------------------
Plan hash value: 1788691278

-----------------------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     | Inst   |IN-OUT|
-----------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT |      |  1074K|   409M|  2291   (1)| 00:00:28 |        |      |
|   1 |  REMOTE          | T1   |  1074K|   409M|  2291   (1)| 00:00:28 | LAB11~ | R->S |
-----------------------------------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------

 1 - SELECT /*+ OPAQUE_TRANSFORM */ "ID","DATA1","DATA2","DATA5" FROM "T1"
 "T1" (accessing 'LAB11R2' )

Metalink reports for the search string “OPAQUE_TRANSFORM” only one result as note id 780503.1 and of course two bugs: 5565178 and 7417255. Metalink says exactly:

The OPAQUE_TRANSFORM hint is to help with the transformation of datatype when certain type of operations are done within the database.

Don Burleson refers here to two theories: first suppose that OPAQUE_TRANSFORM is used by Oracle to transform query in an ‘earlier dialect’ to bypass the possibily to involve different Oracle version in a distributed query. The other theory refers to an error ORA-22997 that may outcome during conversion, in different Oracle version in a distributed query, converting abstract data types to LOB columns…

Metalink describes a method to disable OPAQUE_TRANSFORM hint. This way is to set the trace for event 22825 at level 1. Doing this will disable the automatic use of OPAQUE_TRANSFORM. Event 22825 is related to import/export functions and its actions is to enable logical transformation of opaque types.

Maybe this is a smoking undocumented hint… but why?

Tags: , , ,

VMFS Corruption and recover

Mar 3rd, 2010

These days I’ve dealt with a big problem. Consider an infrastructure where you have some ESX4 connected with 4 paths to a MetroCluster NetApp 3040. For some not really known reasons out storage came out with the following error:


Wed Feb 10 10:58:12 CET [nas1b: scsitarget.fcp.dump.warning:warning]: FCP target SRAM dump disabled for adapter 0b, isp2400fct_process_ctio: Invalid CTIO Status: S_ID: 10600, OX_ID: 4F, Status 0x8 (INVALID_RX), Cmdblk 0x0000000012620a00, state Wed Feb 10 10:58:12 CET [nas1b: scsitarget.ispfct.errorReset:CRITICAL]: Error processing target scsi command on Fibre Channel target adapter 0b. Resetting the adapter to clear INVALID_RX condition.
Wed Feb 10 10:58:12 CET [nas1b: scsitarget.ispfct.reset:notice]: Resetting Fibre Channel target adapter 0b.

This for all four paths. A complete resets. Obviously VMWare didn’t understand the situation and from its own side reported lost connectivity to the datastore. Which was the problem? Vmware lost some opened files. Worse. Some files and directories were corrupted. Also was not prossibile to remove them due to a filesystem inconsistency.

We’ve called VMWare that confirmed us that there’s no way to solve the issue. We had to remove datastore and create a new one. VMFS does not provide, publicly, a way to make a consistency check. Also, VMFS  and VMWare technology does not commit data always, so in case of connectivity lost many files will result opened and not corrupted.

Looking around there’s a feature, available with argv -R, that should check a VMFS. But reading more, this works only to unlock locked files after a crash of an ESX server. Does not repair consistency problem. Since VMFS is a proprietary filesystem I’m not able, and maybe there’s not, a document describing itself to understand how and why you may find it corrupted.

Tags:

Do you really need a RAC Database?

Jan 19th, 2010

Really you need an Oracle RAC database? I’m sure this question maybe a joke, but isn’t it. I’ve seens dozens of RAC database used simply to provide a clustering or a sort of failover and high availabilty service. But this is not true, definitly not.

First if you ask for an Oracle database high availabilty solution and your DBA or your consultat answer saying “RAC” please fight him.
RAC, Real Application Cluster, as acronymous is not a normal cluster but a particular database configuration and structure that only in some case may provide an high availabilty service. Usually many RAC are configured doubling the same hardware that many times is not good for an instance at all and using a low performance storage and a not efficient interconnect network.

The big attention should be paid to the application. An application that was built for a standalone single Oracle instance does not work obviously in a RAC environment. Today I think that only few application may run on a RAC gaining performance. Too many times let a normal application run on a RAC generate only a performance problem and also expose your database to some new, and sometimes interesting, wait events. I’ve to say that to take full advantage of the particular high availability services in a RAC enviroment few special considerations are to be checked with application and the TAF (Transparent Application Failover) protocol.

RAC can solve problem only if the whole environment is Oracle RAC friendly, otherwise you will face with many problems. Don’t trust a DBA that propose without any care a RAC solutions. You will only pay more to get less.

Tags: ,

How to track when an initialization parameter was change

Jan 9th, 2010

I was looking online for this issue: are you able to know when a parameter was changed in your database or also track all parameters changes in your database? Kerry Osborne has the solution. Even if you’re able to check in your alert log file for changes (Oracle logs every changes to parameters) you can also use AWR! Table WRH$_PARAMETER holds the answer.

select instance_number instance, snap_id, time, parameter_name, old_value, new_value from (
select a.snap_id,to_char(end_interval_time,'DD-MON-YY HH24:MI') TIME,  a.instance_number, parameter_name, value new_value,
lag(parameter_name,1) over (partition by parameter_name, a.instance_number order by a.snap_id) old_pname,
lag(value,1) over (partition by parameter_name, a.instance_number  order by a.snap_id) old_value ,
decode(substr(parameter_name,1,2),'__',2,1) calc_flag
from dba_hist_parameter a, dba_Hist_snapshot b , v$instance v
where a.snap_id=b.snap_id
and a.instance_number=b.instance_number
and parameter_name like nvl('&parameter_name',parameter_name)
and a.instance_number like nvl('&instance_number',v.instance_number)
)
where
new_value != old_value
and calc_flag not in (decode('&show_calculated','Y',3,2))
order by 1,2
/

Original query (by Kerry Osborne) is available here. For the whole explaination check the trackback.

Oracle undocumented trace parameters

Jan 9th, 2010

Do you know that there are more than thirty not documented and hidden parameters for tracing?
You can find them with this query:

select 	ksppinm "Parameter Name",
		ksppstvl "Value",
		ksppstdf "Default",
from	x$ksppi x, x$ksppcv y
where	x.indx = y.indx
and		ksppinm like '/_%trace%' escape '/' ;

Note that Oracle 9i has about 26 parameters (if I remember right). Oracle 10g has 35 parameters. Not yet test on 11g, I’ll do it. Also some parameters are only RAC related as _ges_trace (for GES/GCS trace)

Unable to run DBCONSOLE – OC4J Configuration issue

Dec 28th, 2009

Sometimes you can get this error:

[oracle@oratest03 scripts]$ emctl start dbconsole
Environment variable ORACLE_UNQNAME not defined. Please set ORACLE_UNQNAME to database unique name.
[oracle@oratest03 scripts]$
[oracle@oratest03 scripts]$ export ORACLE_UNQNAME=T11R2
[oracle@oratest03 scripts]$ emctl start dbconsole
OC4J Configuration issue. /app/oracle/product/11.2.0.1/oc4j/j2ee/OC4J_DBConsole_oratest03_T11R2 not found.

This is due to a misconfiguration of dbconsole during database setup. The fastest solution is usually to recreate repository and configuration.

First, deconfigure dbconsole:

[oracle@oratest03 bin]$ ./emca -deconfig dbcontrol db -repos drop

STARTED EMCA at Dec 28, 2009 12:15:33 PM
EM Configuration Assistant, Version 11.2.0.0.2 Production
Copyright (c) 2003, 2005, Oracle.  All rights reserved.

Enter the following information:
Database SID: T11R2
Listener port number: 1521
Password for SYS user:
Password for SYSMAN user:  

Do you wish to continue? [yes(Y)/no(N)]: Y
Dec 28, 2009 12:15:44 PM oracle.sysman.emcp.EMConfig perform
INFO: This operation is being logged at /app/oracle/cfgtoollogs/emca/T11R2/emca_2009_12_28_12_15_32.log.
Dec 28, 2009 12:15:44 PM oracle.sysman.emcp.EMDBPreConfig performDeconfiguration
WARNING: EM is not configured for this database. No EM-specific actions can be performed.
Dec 28, 2009 12:15:45 PM oracle.sysman.emcp.EMReposConfig invoke
INFO: Dropping the EM repository (this may take a while) ...
Dec 28, 2009 12:17:33 PM oracle.sysman.emcp.EMReposConfig invoke
INFO: Repository successfully dropped
Enterprise Manager configuration completed successfully
FINISHED EMCA at Dec 28, 2009 12:17:33 PM

Now, let’s configure it again

[oracle@oratest03 bin]$ ./emca -config dbcontrol db -repos create                                          

STARTED EMCA at Dec 28, 2009 12:17:43 PM
EM Configuration Assistant, Version 11.2.0.0.2 Production
Copyright (c) 2003, 2005, Oracle.  All rights reserved.

Enter the following information:
Database SID: T11R2
Listener port number: 1521
Listener ORACLE_HOME [ /app/oracle/product/11.2.0.1 ]:
Password for SYS user:
Password for DBSNMP user:
Password for SYSMAN user:
Email address for notifications (optional):
Outgoing Mail (SMTP) server for notifications (optional):
-----------------------------------------------------------------

You have specified the following settings

Database ORACLE_HOME ................ /app/oracle/product/11.2.0.1

Local hostname ................ oratest03
Listener ORACLE_HOME ................ /app/oracle/product/11.2.0.1
Listener port number ................ 1521
Database SID ................ T11R2
Email address for notifications ...............
Outgoing Mail (SMTP) server for notifications ............... 

-----------------------------------------------------------------
Do you wish to continue? [yes(Y)/no(N)]: Y
Dec 28, 2009 12:17:58 PM oracle.sysman.emcp.EMConfig perform
INFO: This operation is being logged at /app/oracle/cfgtoollogs/emca/T11R2/emca_2009_12_28_12_17_43.log.
Dec 28, 2009 12:17:59 PM oracle.sysman.emcp.EMReposConfig createRepository
INFO: Creating the EM repository (this may take a while) ...
Dec 28, 2009 12:23:44 PM oracle.sysman.emcp.EMReposConfig invoke
INFO: Repository successfully created
Dec 28, 2009 12:23:47 PM oracle.sysman.emcp.EMReposConfig uploadConfigDataToRepository
INFO: Uploading configuration data to EM repository (this may take a while) ...
Dec 28, 2009 12:24:41 PM oracle.sysman.emcp.EMReposConfig invoke
INFO: Uploaded configuration data successfully
Dec 28, 2009 12:24:43 PM oracle.sysman.emcp.util.DBControlUtil configureSoftwareLib
INFO: Software library configured successfully.
Dec 28, 2009 12:24:43 PM oracle.sysman.emcp.EMDBPostConfig configureSoftwareLibrary
INFO: Deploying Provisioning archives ...
Dec 28, 2009 12:25:13 PM oracle.sysman.emcp.EMDBPostConfig configureSoftwareLibrary
INFO: Provisioning archives deployed successfully.
Dec 28, 2009 12:25:13 PM oracle.sysman.emcp.util.DBControlUtil secureDBConsole
INFO: Securing Database Control (this may take a while) ...
Dec 28, 2009 12:25:54 PM oracle.sysman.emcp.util.DBControlUtil secureDBConsole
INFO: Database Control secured successfully.
Dec 28, 2009 12:25:54 PM oracle.sysman.emcp.util.DBControlUtil startOMS
INFO: Starting Database Control (this may take a while) ...
Dec 28, 2009 12:26:19 PM oracle.sysman.emcp.EMDBPostConfig performConfiguration
INFO: Database Control started successfully
Dec 28, 2009 12:26:19 PM oracle.sysman.emcp.EMDBPostConfig performConfiguration
INFO: >>>>>>>>>>> The Database Control URL is https://oratest03:1158/em <<<<<<<<<<<
Dec 28, 2009 12:26:25 PM oracle.sysman.emcp.EMDBPostConfig invoke
WARNING:
************************  WARNING  ************************

Management Repository has been placed in secure mode wherein Enterprise Manager data will be encrypted.  The encryption key has been placed in the file: /app/oracle/product/11.2.0.1/oratest03_T11R2/sysman/config/emkey.ora.   Please ensure this file is backed up as the encrypted data will become unusable if this file is lost. 

***********************************************************
Enterprise Manager configuration completed successfully
FINISHED EMCA at Dec 28, 2009 12:26:25 PM

That’s all. Remember, if you want or need, to backup you encryption key.