Author Archive

Theory of Constraints: the rise of NoSQL

July 9th, 2014



Why did NoSQL arise? The rise of NoSQL makes no sense if you read an article like “Why You Should Never Use MongoDB” and the ycombinator replies. Then I read something like “Why You Should Use MongoDB” and find no reasons, just ranting about why the first article was shoddy.

One theory, proposed by Martin Fowler, is that NoSQL arose as a way to get around the constraint of the DBA bottleneck. When developers want to change the schema or get copies of the source database, the DBAs are just seen as an obstacle. The obstacle of provisioning environments is told in nail-biting prose in Gene Kim’s “The Phoenix Project”.

Long before NoSQL and The Phoenix Project, about 13 years ago, I was the DBA at a startup, back in the first dot-com bubble. As a startup we were trying to get our product to market ASAP. The development team was working crazy hours and the schema was changing all the time. As the DBA, it was my responsibility to validate all the schema changes and apply them to the development database. Such validation and application could take a couple of hours considering all the other work on my plate. This couple-hour delay was considered such a hindrance to developers that they unilaterally decided to dump the structured schema and go to an EAV data store. EAV stands for the “Entity Attribute Value” model. From Wikipedia:

There are certain cases where an EAV schematic is an optimal approach to data modelling for a problem domain. However, in many cases where data can be modelled in statically relational terms an EAV based approach is an anti-pattern which can lead to longer development times, poor use of database resources and more complex queries when compared to a relationally-modelled data schema.
The EAV “schema” (more like schema-less) meant that performance was dreadful and the SQL was impossible to debug as it was completely obfuscated. An example of an EAV query might look like:
-- note: the join predicates were lost in the original post's formatting;
-- the f.fund_id = eavN.fund_id joins below are reconstructed for illustration
select f.symbol
from strings_eav  eav1,
     integer_eav  eav2,
     strings_eav  eav3,
     fundmstr     f
where f.fund_id = eav1.fund_id
 and f.fund_id = eav2.fund_id
 and f.fund_id = eav3.fund_id
 and eav1.attr_num = (select attr_num from attributes where attr_id = :"SYS_B_00")
 and eav1.attr_value = :"SYS_B_01"
 and eav1.act_date <= SYSDATE and eav1.end_date > SYSDATE
 and eav2.attr_num = (select attr_num from attributes where attr_id = :"SYS_B_02")
 and eav2.attr_value = :"SYS_B_03"
 and eav2.act_date <= SYSDATE and eav2.end_date > SYSDATE
 and ((f.def_attr_templ = :"SYS_B_04"
         and eav3.attr_num = (select attr_num from attributes where attr_id = :"SYS_B_05")
         and eav3.attr_value >= to_date(:"SYS_B_06")
         and eav3.act_date <= SYSDATE and eav3.end_date > SYSDATE
         and exists ( select null from integer_eav eav4
                      where f.fund_id = eav4.fund_id
                        and eav4.attr_num in ( select attr_num
                                               from attributes
                                               where attr_id = :"SYS_B_07"
                                                  or attr_id = :"SYS_B_08")
                        and eav4.attr_value = :"SYS_B_09"
                        and eav4.act_date <= SYSDATE and eav4.end_date > SYSDATE)
         and not exists ( select null from integer_eav eav5
                          where f.fund_id = eav5.fund_id
                            and eav5.attr_num = ( select attr_num
                                                  from attributes
                                                  where attr_id = :"SYS_B_10")
-- ... (the query continues; it was truncated in the original post)


Eventually, as development stabilized and performance remained abominable, we were able to get everyone on board and move the EAV data back into a structured schema. Once we went to a structured relational schema, the load on the database fell from completely saturated to almost idle.

Moral of the story

EAV can be good for quick development, but in production it’s completely unreadable, thus impossible to debug, and the performance is atrocious.
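To make the contrast concrete, here is a minimal sketch of the same data stored EAV-style versus relationally. The table and column names are illustrative, not the actual schema from the story:

```sql
-- EAV style: one generic table, attributes are rows
create table strings_eav (
  entity_id  number,          -- which entity this attribute belongs to
  attr_num   number,          -- key into an attributes lookup table
  attr_value varchar2(4000),  -- everything is a string (or a per-type table)
  act_date   date,            -- attribute validity window
  end_date   date
);
-- fetching one logical record costs one self-join per attribute needed

-- Relational style: attributes are columns, one row per entity
create table fund (
  fund_id  number primary key,
  symbol   varchar2(10),
  country  varchar2(30),
  ipo_date date
);
-- the same fetch is a single indexed, optimizer-friendly select
```

Every predicate in the EAV version goes through the generic `attr_num`/`attr_value` pair, which is why the query above needs a subselect per attribute and why the optimizer has so little to work with.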

Instead of using EAV, one could give each developer a virtual database and let them work as fast as they can, then let them test merging their changes in virtual databases before merging them into trunk, which is itself a virtual database that can be branched from and merged into.

Virtual databases take up almost no storage and can be made in minutes with a developer self service interface.







Avoiding Data Doomsday: succeed now or pay later

July 7th, 2014

 “Virtualized data is fast becoming the new norm. No one can justify using storage like before” – Tim Gorman

How to Prevent a Data Doomsday 
Gartner has famously noted the exponential growth of data and predicted a data doomsday. While IT organizations have been struggling against current data sizes for a while, they largely hobbled along using old approaches to data management. One of those approaches, snapshotting, is becoming increasingly prohibitive now for two reasons: the bureaucracy and the technology.

  • Bureaucracy: A task that might take an hour when someone is idle can take multiple days once that person is 95% busy or more. The more people involved – DBAs, sys admins, storage admins, network admins – the longer it takes to manage the snapshots, leading to days and weeks of delays.
  • Technology: All of the technical solutions traditionally used require specialized expertise, manual intervention and scripting around third-party tools.
As a result of these two major bottlenecks in data management, IT organizations are learning that a new approach of virtualized storage is stripping away the time spent in bureaucratic cycles and the difficulties in managing snapshotting technologies.
Virtualized storage is quickly becoming the new norm. CIOs can either get on board now to reap competitive advantage, or wait a few years and realize that the reason they’re having to play catch-up with their competitors is that their infrastructures have become obsolete.

Already the leading companies in their industry such as

  • Fortune #1 Walmart
  • #1 pharmaceutical Pfizer
  • #1 social Facebook
  • #1 US bank  Wells Fargo
  • #1 networking  Cisco
  • #1 cable provider Comcast
  • #1 auction site Ebay
  • #1 insurance New York Life
  • #1 chip manufacturer Intel
use data virtualization and are extending their lead in their markets by increasing application productivity 20-50% and sharpening business intelligence with fast access to the newest data. Those companies stuck in the old data management paradigms of cloning and copying data will fall farther and farther behind. Ultimately only those companies using data virtualization will manage to succeed.

Data virtualization is the most powerful method to improve data flow, development output and quality that I have seen in the 20+ years I’ve been working with IT departments. It improves businesses by eliminating the enormous infrastructure, bureaucracy and time drag it takes to provision databases and data for business intelligence groups, development environments, auditing requirements and backups. Development environments and business intelligence groups depend on having copies of production data and databases, and data virtualization allows provisioning in a few minutes with almost no storage overhead by sharing duplicate blocks among all the copies. As opposed to snapshot technologies, data virtualization automates the complete end-to-end process of collecting changed data from data sources, storing that data in a timeflow, and provisioning that data to target machines in minutes with a few clicks of a mouse. Because virtualized data is centrally managed, the full chain of custody can be audited and managed as well.

For more on data virtualization and the overall data supply chain process it enables, see how Delphix implements this technology.




kksfbc child completion

July 2nd, 2014

I’ve run into the wait “kksfbc child completion” a few times over the years but have found very little written about it. I don’t have an explanation, but I have something that might be as good – a way to reproduce it. By being able to reproduce it, I can at least test theories about it.
I ran a tight loop of

insert into t value(1);
See anything strange? I put “value” instead of “values”, so the insert errored out, causing a parse on every execute. This wait thus seems to be related to parsing. One could run further tests by correcting the statement and adding a different literal value to each insert to see what kind of waits result.
Another telltale sign that there was an error was that I also received “SQL*Net break/reset to client”, which is a sign of an error in the SQL statements the application is attempting to execute.
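A minimal sketch of the reproduction (the table name t and the iteration count are arbitrary choices of mine, not from the original test): every execution fails with a syntax error and so forces another parse.

```sql
create table t (n number);

-- run the failing statement in a tight loop; each attempt re-parses
begin
  for i in 1 .. 100000 loop
    begin
      execute immediate 'insert into t value(1)';  -- deliberate typo: "value"
    exception
      when others then null;  -- swallow the syntax error so the loop keeps going
    end;
  end loop;
end;
/
```

Watching v$session_event for the session while this runs should show the parse-related waits accumulating.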
oracle-l response was:
From: K Gopalakrishnan <>
Date: Sun, 11 Feb 2007 02:00:45 -0500
Message-ID: <[email protected]>
kks is Kompilation Layer function to Find the Bounded Cursor. Typically related to parsing (hard) which sometimes calls kksscl to scan the child list on those cursors when the cursors are shared.
Dion Cho says:
It’s literal meaning is “waiting till building the child cursor completes”, which is naturally related to the hard parse issue and multiple children problem.

The main reasons are known to be massive hard parses or library cache bugs. 


Redo log waits in Oracle

July 1st, 2014


Redo is written to disk when:

  • A user commits
  • The log buffer is 1/3 full (_log_io_size)
  • The log buffer reaches 1M of redo
  • Every 3 seconds
  • DBWR asks LGWR to flush redo

Sessions committing wait for LGWR.

Redo Log Wait Events

Log file Sync

Wait for a redo flush upon commit:

  • P1 = buffer# in the log buffer that needs to be flushed
  • P2 = not used
  • P3 = not used

Ways to reduce it:

  • Commit less – often possible in loops that commit every iteration; commit every 50 or 100 rows instead
  • Put redo on a dedicated disk
  • Use a raw device and/or direct IO
  • More radical options:
    • Consider RAM disks
    • Striping: can stripe if redo writes are comparable to the stripe size; striping shouldn’t hurt and can help (for example imp, which can issue large redo writes, can improve by 10-30%)
    • Alternate disks for redo and the archiving of redo
    • In 10gR2+, possibly use asynchronous commits, accepting that a commit could be lost if the machine crashes or there is an IO error
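As a sketch of the “commit less” advice (the table name and batch size are illustrative), batching commits in a loop cuts the number of log file sync waits roughly by the batch size:

```sql
-- instead of committing every row...
begin
  for i in 1 .. 10000 loop
    insert into t values (i);
    if mod(i, 100) = 0 then
      commit;   -- one log file sync wait per 100 rows instead of per row
    end if;
  end loop;
  commit;       -- pick up any remainder
end;
/
```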

log buffer space

Wait for space in the redo log buffer in the SGA. P1, P2, P3 carry no values.

  • Increase the log_buffer parameter in the init.ora
    • Above about 3M, increasing log_buffer has little effect; if the wait is still a problem, the bottleneck is at the disk level
  • Improve disk IO for redo
    • Faster disks
    • Raw devices
    • Direct IO
    • A dedicated disk
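One hedged way to check whether the log buffer is undersized is to watch the standard statistics below; steadily growing values suggest sessions are waiting on log buffer space:

```sql
-- both statistics should stay near zero on a healthy system
select name, value
from   v$sysstat
where  name in ('redo buffer allocation retries',
                'redo log space requests');
```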

log file switch (archiving needed)

  • No P1, P2, P3
  • The database “hangs” for transactions

A workaround:

archive log stop;
-- make room in log_archive_dest
archive log start;



log file switch (checkpoint incomplete)

  • No P1, P2, P3 args
  • Wait for a checkpoint to complete because all log files are full
  • Fixes: add more log files and/or increase the size of the log files

log file switch (private strand flush incomplete)

Similar to a “log file switch completion”.

log file switch completion

  • No P1, P2, P3
  • Wait for LGWR to switch log files when generating redo
  • Fix: increase the redo log file size, which reduces the frequency of switches

What happens when a log file switch occurs:

  1. Get the next log file from the control file
  2. Get the Redo Copy and Redo Allocation latches
  3. Flush redo
  4. Close the file
  5. Update the controlfile: set the new file to Current, set the old file to Active, and if in Archivelog mode add the file to the archive list
  6. Open all members of the new logfile group
  7. Write the SCN to the headers
  8. Enable redo log generation
  9. DBWR makes a list of blocks that need to be written out before the old redo log file can be overwritten

switch logfile command

Same as log file switch completion, but the switch is initiated by the DBA with:

alter system switch logfile;

Redo Log Sizing Concerns and Considerations

What happens to recovery time if I change my redo log file sizes?

  • Larger redo log sizes can increase recovery time, but there are init.ora parameters to limit this:
    • archive_lag_target – in seconds; limits the lag between primary and standby at the cost of more frequent log file switches
    • fast_start_mttr_target – target seconds to recover; easy and accurate, though it is overridden by FAST_START_IO_TARGET

alter system set fast_start_mttr_target=17 scope=both;
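To see whether the current log size is causing frequent switches, a query against the standard v$log_history view works (the one-day window is an arbitrary choice):

```sql
-- log switches per hour over the last day
select to_char(first_time, 'DD-MON HH24') hour,
       count(*)                           switches
from   v$log_history
where  first_time > sysdate - 1
group  by to_char(first_time, 'DD-MON HH24')
order  by 1;
```

More than a handful of switches per hour during normal load is usually the cue to grow the log files.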




Redo over multiple weeks

June 30th, 2014

I’ve always wanted some sort of calendar view of load where I could see patterns across the same days of the week and same hours of the day and then be able to pick different periods and diff them:

The first step of such a display would be selecting the data in a way that represents the graphic. A graphic would be harder to do than a similar, though less powerful, ASCII representation.

So here is a quick sketch of looking at redo generation over time. Right now I have to monitor how well redo can be archived, which depends on how fast it is generated. (Typically what I’m more interested in is general performance, which is what the above graphic addresses.)

The output below shows the redo generated per hour on average (picking the maximum hour in each bucket, which is 4 hours long below) and the maximum generation at the minute level. I may or may not be able to archive redo at the up-to-the-minute rate, but I do want to make sure I can keep up with each hour.

DAY    HOUR  MAX(MB)    w1     w2      w3       w4       w5
SAT     0       9               5       4       4       3
SAT     4       12              4       5       4       4
SAT     8       1               0       0       0       0
SAT     12      0               0       0       0       0
SAT     16      1               0       0       0       0
SAT     20      4               1       0       0       1
SUN     0       10              3       3       1       3
SUN     4       13              6       5       3       5
SUN     8       7               6       6       0       6
SUN     12      7               0       3       0       4
SUN     16      1               0       0       1       0
SUN     20      8               3       2       0       3
MON     0       8               3       2       3       2
MON     4       7               2       3       2       1
MON     8       5               1       0       2       0
MON     12      1               0       0       1       0
MON     16      1               0       0       0       0
MON     20      7               2       2       0       2
TUE     0       14              6       5       7       4
TUE     4       7               1       1       1       2
TUE     8       3               0       0       0       0
TUE     12      1       1       0       0       0       0
TUE     16      1       1       0       0       0       0
TUE     20      3       1       1       1       1       1
WED     0       8       3       2       3       2       2
WED     4       7       2       1       3       2       2
WED     8       8       1       0       0       2       3
WED     12      7       1       0       0       1       1
WED     16      1       1       0       0       0       1
WED     20      4       1       1       1       1       1
THU     0       15      7       8       6       6       6
THU     4       8       2       1       1       1       1
THU     8       1       1       0       0       0       1
THU     12      16      1       11      0       0       1
THU     16      1       1       0       0       0       1
THU     20      4       1       1       1       1       1
FRI     0       11      2       2       2       2       2
FRI     4       8       3       1       1       1       1
FRI     8       4       1       0       0       0       0
set heading off
set feedback off
set pagesize 0
-- note: the select/from lines were lost in the original post's formatting;
-- dba_hist_sysmetric_summary is the AWR view with these columns
with pivot_data as (
   select
          WW pivot_col
        , DY
        , D
        , HH
        , max(maxval) mv
        , max(average) av
   from
      ( select distinct
               4*trunc(to_char(begin_time,'HH24')/4)  HH,
               to_char(begin_time,'DY')               DY,
               mod(to_char(begin_time,'D'),7)         D,
               mod(to_char(begin_time,'WW'),5)        WW,
               average/(1024*1024)                    average,
               maxval/(1024*1024)                     maxval
        from   dba_hist_sysmetric_summary
        where  dbid=&&DBID
           and metric_name='Redo Generated Per Sec'
           and begin_time > sysdate - 31
      )
   group by HH,D,DY,WW
)
select DY ||'   '||
       HH ||'   '||
       round(max(mv),0)  ||'  '||
       round(max(w1),0)  ||'    '||
       round(max(w2),0)  ||'    '||
       round(max(w3),0)  ||'    '||
       round(max(w4),0)  ||'    '||
       round(max(w5),0)
from (
   select *
   from  pivot_data
   pivot ( avg(av)
           for pivot_col in ( 1 as w1,2 as w2,3 as w3,4 as w4,5 as w5 )
         )
)
group by DY,D,HH
order by D,HH;
set heading on
set feedback on
set pagesize 30

PS the above pivot is for 11g, for 10g here is the query without pivot

set pagesize 100
col DY for A4
col HH for 99
col mx for 99
col w1 for 99
col w2 for 99
col w3 for 99
col w4 for 99
col w5 for 99
with pivot_data as (
   select
          WW
        , DY
        , D
        , HH
        , max(maxval) mv
        , max(average) av
   from
      ( select distinct
               4*trunc(to_char(begin_time,'HH24')/4)  HH,
               to_char(begin_time,'DY')               DY,
               mod(to_char(begin_time,'D'),7)         D,
               mod(to_char(begin_time,'WW'),5)        WW,
               average/(1024*1024)                    average,
               maxval/(1024*1024)                     maxval
        from   dba_hist_sysmetric_summary  -- reconstructed, as above
        where  dbid=&&DBID
           and metric_name='Redo Generated Per Sec'
           and begin_time > sysdate - 31
      )
   group by HH,D,DY,WW
)
select DY, HH,
      round(max( mv ),0) mx,
      round(max( decode( WW, 0 , av, null ) ),0) w1,
      round(max( decode( WW, 1 , av, null ) ),0) w2,
      round(max( decode( WW, 2 , av, null ) ),0) w3,
      round(max( decode( WW, 3 , av, null ) ),0) w4,
      round(max( decode( WW, 4 , av, null ) ),0) w5
   from  pivot_data
group by DY,D,HH
order by D,HH;

Just seeing the redo generated per hour can be done more simply:

set pagesize 100
col stat_name format a30
col MB format 9,999.99
-- note: the select/from lines were lost in the original post's formatting;
-- dba_hist_sysstat and dba_hist_snapshot are the standard AWR views used here
select
       btime, stat_name,
       round((end_value-beg_value)/(1024*1024),2) MB
from (
   select
       to_char(s.BEGIN_INTERVAL_TIME,'DD-MON-YY HH24:MI')  btime,
       e.stat_name,
       e.value end_value,
       Lag (e.value)
              OVER( PARTITION BY e.stat_name ORDER BY s.snap_id) beg_value
   from   dba_hist_sysstat  e,
          dba_hist_snapshot s
   where  s.snap_id = e.snap_id
     and  s.begin_interval_time > sysdate -2
     and  e.stat_name = 'redo size'
)
where end_value-beg_value > 0
order by btime;


Big data or Agile data ?

June 27th, 2014

The big data phenomenon threatens to break the existing data supply chain (DSC) of many information providers, particularly those whose chains are neither flexible nor scalable and include too many error-prone, manual touch points.  - Cognizant

Big data is getting big hype, but what exactly is driving it? The driving force is the demand by businesses to answer revenue-driving questions. Questions that drive revenue-generating decisions depend on the ability to get the right data to the right people at the right time. That ability unfortunately remains elusive with big data: according to Gartner, 85% of Fortune 500 organizations in 2015 will still be unable to exploit big data for competitive edge. How can companies get the competitive edge? It can be had with a new technology called data virtualization, which uses existing relational databases but accelerates access to the data.

Data is a hot topic.

Big Data is an even hotter topic.

But data agility? I don’t hear much about it.

As Forbes magazine put it:

It is time to stop the stampede to create capacity to analyze big data and instead pursue a more balanced approach that focuses on finding more data sets and understanding how to use them to improve your business.

What are we all striving for? We are striving to access the data we want, when we want it, both quickly and efficiently. That’s what I call data agility.

Data agility is the big pink elephant in the room. Everyone is talking about big data, but no one is talking about how you get the data where you want it, when you want it. If you want to do big data, how do you get the terabytes (TB) of data onto your Hadoop cluster from wherever it was collected? How long does it take? How much work is it?

As Infoworld put it:

The bigger big data gets, the more challenging it becomes to manage and analyze to deliver actionable business insight. That’s a little ironic, given that the main promise of big data is the ability to make better business decisions based on compute-intensive analysis of massive data sets. The solution is to create a supply chain that identifies business goals from the start — and deploy the agile infrastructure necessary to make good on those objectives

Getting results from big data even after you have the data is difficult. Most fortune 500 companies don’t know how to get results from big data:

Through 2015, 85% of Fortune 500 orgs will be unable to exploit big data for competitive advantage.

Unlike with big data, most companies already have burning questions they know how to answer, if only they could get access to the data faster on their current relational databases. In those databases there are many clear questions and known methods for answering them; the problem is getting the right data to the right place at the right time. One of the main goals of ERP is getting data faster to answer critical business questions. Why do companies invest millions in ERP every year? Because they want answers to important business questions faster and with fresher data. Fresh data means getting the right data to the right people at the right time, which is data agility.

Data agility means getting the right data to the right people at the right time. Data is the life blood of more and more of the economy and the economy is becoming a data economy. Data agility is crucial for companies to succeed in the current and future data economy.

Agile data is the solution to getting the right data to the right place at the right time. Agile data’s technological core relies on data virtualization, which requires tracking data blocks at the storage level. By tracking data blocks at the storage level, duplicate blocks can be shared across many different copies of the data, while any changed block is stored in such a way that only the copy that made the change sees it. Tracking block changes and sharing duplicate data blocks is the core of data virtualization. That core has been around for almost 20 years in the form of storage system snapshots, but like the internet without the web or gasoline without a car, agile data can’t happen without an agile data platform.

A virtual data platform automates all the pieces of tracking and provisioning data, and encapsulates them in hardware-agnostic software with a user-friendly self-service interface.

For example, how would one supply data to

  • a developer who required a copy of production yesterday at noon when bugs were seen
  • a QA team who required a copy of the development database with all its current changes
  • a BI team who required 24×7 access to production for ETL batch jobs
  • a production database team who needs to extract data as it was before it was inadvertently deleted or incorrectly modified on production.

Problems with cloning a database via file system snapshots

Let’s look at a concrete example of cloning a database with file system snapshots. Most any experienced storage admin can take a storage snapshot of an Oracle database running on specialized storage capable of supplying snapshots. The snapshot is easy, though it may still be necessary to shut the database down before taking it to ensure the data is consistent. Even when the database can remain running during the snapshot, it may require specialized functionality or extra vendor packages if the database spans multiple LUNs that need synchronized snapshots. Once the snapshot is made, an experienced DBA, in coordination with the storage admin, can start up a database using it. Starting up such a database requires renaming the database and changing the location of files that were typically not part of the snapshot, such as log files, trace files and initialization files, and then the database has to be recovered. If the database is being started on some other machine, that machine might require that the snapshot files be made accessible over fibre channel or mounted via NFS. If the NFS or fibre channel configuration changes the paths to the datafiles, then the parameter files, and possibly other files, will need the datafile locations changed before starting the database. If the copy is required up to the most recent point in time, this may mean getting redo files from the source database and recovering the clone down to the last minute or second. All of that is the easy case.

In the easy case we have a clone database, a “thin copy”, that shares duplicate blocks with the original database but stores changed blocks separately. The problem is we now have a development, test, reporting or some other type of copy of the source database running on the same storage as the source database. If the source database is an important database, then our copy will be impacting its performance. Protecting the source database from performance impact is one of the huge motivations for creating copies of databases. If the storage snapshot is copy-on-write, the impact is even bigger, as every write induces a read and two writes (read the original block, write it somewhere else, then write out the new copy). To solve this we want to get the clone database files onto separate storage. We can copy the entire database to a separate filer, call it the development filer, and then make our thin copies there. The next problem arises when someone wants a copy of the source database as of today and all we have is a copy from yesterday. In this case we have to copy the entire source database across to the development storage array again, which defeats the purpose of thin cloning: fast, storage-efficient clones. How do we solve all these complications with thin cloning technology?

Solution to thin cloning: data virtualization

Thin cloning obstacles are solved using data virtualization. Data virtualization consists of three technologies. The first continuously collects all the changes from a data source and writes them to storage capable of file system snapshots. The second manages that storage, saving all changes within a time window and purging data older than the window that is no longer needed. The third harnesses the file system snapshots and the time window to provision database thin clones to target machines, either over fibre channel or NFS. All of this can be rolled into a software stack that runs on commodity hardware and maps a filesystem onto any storage. It can be cobbled together using a file system such as open source ZFS and scripting, or it can come repackaged as self-contained software such as Delphix.

Data Virtualization is exploding

Four years ago data virtualization technology was non-existent. Since then hundreds of companies have moved to virtual data platforms. When looking at agile data technology, some of the key functionalities to look for are:

  • Databases - supports cloning major databases such as Oracle, SQL Server, PostgreSQL
  • Applications - supports thin cloning of application stacks
  • Self Service - the interface should be easily usable by application developers, QA staff and BI teams to provision their own clones on demand
  • Branching - supports branching clones, meaning making clones of clones, which is crucial for multi-version development or even just patching a previous version
  • Synchronization - supports cloning multiple related databases such that each clone comes from the exact same point in time. Any Fortune 500 company that has multiple revenue tracking databases will need synchronized clones of each for analyzing financial close discrepancies
  • Cloud Ready - supports any storage and efficient low bandwidth replication across heterogenous storage types. Installs on any commodity hardware.
  • Any Time - make clones at any time, down to the second, within the timeflow window
  • Live Archive - save specific points in the timeflow forever to support compliance and auditing

Now that data virtualization is quickly maturing, the next frontier is quickly arriving which is data supply chain.

Data Supply Chain

It’s time to start treating your data less as a warehouse and more as a supply chain. Having identified your sources of data, you must corral it for analysis, in the same way that the various components come together on an assembly line. Recognize that the data won’t be static—it will be manipulated as it goes through the supply chain, added to other pieces of data, updated as more recent data comes along, and transformed into new forms as you look at different pieces of data in aggregate. – Syed Rasheed, Red Hat

Data Supply Chain features are quickly evolving but include:
  • Security
    • Masking
    • Chain of custody
  • Self Service
    • Login and Roles
    • Restrictions
  • Developer
    • Data Versioning and Branching
    • Refresh, Rollback
  • Audit
    • Live Archive
  • Modernization
    • Unix to Linux conversion
    • Data Center migration
    • Federated data cloning
    • Consolidation

Data Supply Chain re-invents data management and provisioning by virtualizing, governing, and delivering data on demand.

As Accenture put it:

Yes, data technologies are evolving rapidly, but most have been adopted in piecemeal fashion. As a result, enterprise data is vastly underutilized. Data ecosystems are complex and littered with data silos, limiting the value that organizations can get out of their own data by making it difficult to access. To truly unlock that value, companies must start treating data more as a supply chain, enabling it to flow easily and usefully through the entire organization—and eventually throughout each company’s ecosystem of partners too.






Importing AWR repositories from cloned databases

June 26th, 2014

I sometimes want to import AWR repositories from cloned databases to compare performance differences between databases that should have the same performance characteristics. Unfortunately the clones have the same DBID, thus importing their AWR repositories into a central AWR repository would be problematic, as I wouldn’t have a way to tell which database the data came from. So here is a procedure to change the DBID. The procedure requires importing the data into the awr_stage user, changing the DBID, then moving the data into the AWR repository:
(see the Oracle documentation for info on normal exporting and importing of AWR repositories)

The script uses awr_change_dbid.sql to defer the constraint on DBID, change the DBID, and then put the constraint back.

   create tablespace awr datafile '/home/oracle/oradata/AWR_01.dbf' size 200M;
   drop directory AWR_DMP;
   create directory AWR_DMP as '/home/oracle/awr_dumps';
-- create staging user
   drop user awr_stage cascade;
   create user awr_stage
     identified by awr_stage
     default tablespace awr
     temporary tablespace temp;
   grant connect to awr_stage;
   alter user awr_stage quota unlimited on awr;
-- load data
   begin
     dbms_swrf_internal.awr_load(schname => 'AWR_STAGE',
                                 dmpfile => 'my_awr_dump', -- file w/o .dmp extension
                                 dmpdir  => 'AWR_DMP');
   end;
   /
-- change dbid
   @awr_change_dbid.sql
-- move data
   def schema_name='AWR_STAGE'
   select '&schema_name' from dual;
   variable schname varchar2(30)
   begin
     :schname := '&schema_name';
     dbms_swrf_internal.move_to_awr(schname => :schname);
   end;
   /
-- verify the new DBID in the repository
   col host_name for a30
   select distinct dbid, db_name, instance_name, host_name
   from   dba_hist_database_instance;
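For reference, the DBID-change step can be sketched roughly as follows. This is a hypothetical outline, not the actual awr_change_dbid.sql: it assumes the staging tables in the AWR_STAGE schema each carry a DBID column and simply rewrites that column in every staging table; any constraints involving DBID would need to be deferred or disabled first and then restored, as described above.

   -- hypothetical sketch of awr_change_dbid.sql (not the actual script)
   declare
     new_dbid number := 1234567890;   -- made-up DBID; pick a unique one per clone
   begin
     for t in (select table_name
               from   all_tab_columns
               where  owner = 'AWR_STAGE'
               and    column_name = 'DBID') loop
       -- rewrite the DBID in each staging table
       execute immediate 'update awr_stage.' || t.table_name ||
                         ' set dbid = :1' using new_dbid;
     end loop;
     commit;
   end;
   /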


Number one bottleneck in IT ?

June 25th, 2014

“Any improvement not made at the constraint is an illusion.” – Gene Kim paraphrasing “The Theory of Constraints”

What is the constraint in IT?

The constraints in IT, in order, are:

  1. Provisioning environments for development
  2. Setting up test and QA environments
  3. Architecting development to facilitate easy changes in code
  4. Development speed
  5. Product management  input

Meaning, until the first constraint is eliminated, it is pointless and even potentially counterproductive to tune the constraints that follow.

The first constraint for most organizations to tackle is thus the speed and agility with which they can provision environments for development.

The above list comes from Gene Kim, the author of The Phoenix Project, who lays out these top constraints in this Gene Kim interview.

In the interview Gene Kim talks about what causes the biggest delays in application development in IT.  He says, starting around minute 6:45

“I’ve been trained in the theory of constraints and one of the things I think is so powerful is the notion of the constraint in the value stream. What is so provocative about that notion is that any improvement not made at the constraint is an illusion. If you fix something before the constraint you end up with more work piled up in front of the constraint. If you fix something after the constraint you will always be starved for work.

In most transformations, if you look at what’s really impeding flow, the fast flow of features, from development to operations to the customer, it’s typically IT operations.

Operations can never deliver environments upon demand.

People have to wait months or quarters to get a test environment. When that happens terrible things happen. People actually hoard environments. They invite people to their teams because they know they have a reputation for having a cluster of test environments, so people end up testing on environments that are years old, which doesn’t actually achieve the goal.

One of the most powerful things that organizations can do is to enable development and testing to get the environments they need when they need them.

After that it’s about test setup time and test run time.

After that is eliminated it’s typically about architecture. How do we make changes that don’t require 15 other changes simultaneously? How do we create looser couplings?

Then after that the constraint moves into development or product management.

It is a very technical and cultural obstacle: just making environments available for the people who need them, whether it’s production, development or test.”




It’s about: Data Supply Chain

June 24th, 2014

There have been a number of questions coming my way about Delphix versus snapshot technologies. The capabilities of Delphix can be differentiated from snapshot technologies through the following hierarchy:


  1. Data Supply Chain (Delphix approach to data management)
  2. Data Virtualization (end-to-end collection and provisioning of thin clones)
  3. Thin Cloning
  4. Storage Snapshots
On top we have the most powerful and advanced data management features, which enable fast, easy, secure, auditable data flow through organizations.
Data Supply Chain is built on top of other technologies. At the bottom we have the minimal building blocks, starting with storage snapshots. Storage snapshots can be used to make “thin clone” databases. Storage snapshots have been around for nearly two decades but have seen little use for database thin cloning due to the technical and managerial hurdles. Part of the difficulty is that creating thin clones requires work by multiple people and/or teams, such as DBAs, system admins, and storage admins.
In order to overcome the obstacles to creating thin clones, all the steps can be optimized and automated, a process called data virtualization.
Data Virtualization is just the first step in automation. The next step is adding all the processes, functionality and control to manage the virtual data, which is the Data Supply Chain.
File system snapshots such as ZFS address the very bottom of the hierarchy; that is, they only manage storage snapshots. They have no automated thin cloning of databases. Without automated thin cloning of databases there is no end-to-end processing of data from source to thin cloned target, i.e. data virtualization. Without data virtualization there is no data supply chain.
Data Supply Chain features, all of which are encompassed by Delphix, include:
  • Security
    • Masking
    • Chain of custody
  • Self Service
    • Login and Roles
    • Restrictions
  • Developer
    • Data Versioning and Branching
    • Refresh, Rollback
  • Audit
    • Live Archive
  • Modernization
    • Unix to Linux conversion
    • Data Center migration
    • Federated data cloning
    • Consolidation

Data Supply Chain re-invents data management and provisioning by virtualizing, governing, and delivering data on demand.

Most businesses manage data delivery with manual, ad hoc processes: users file change requests, then wait for DBAs, systems administrators, and storage administrators to push data from system to system, bogging down production applications, networks, and target systems with long load times. Data delays cost businesses billions a year in lost productivity and low utilization of systems and software resources.

As a result, there is an enormous opportunity to optimize data management with a data supply chain, yielding significant business impact:

  • Drive revenue, competitive differentiation with faster application time to market
  • Enable faster growth via better release management of enterprise applications
  • Improve customer intimacy, upsell, cross-sell with faster, more flexible analytics
  • Free budget for innovation by reducing IT maintenance costs
  • Reduce compliance risk through better governance, data security.

Businesses need to manage data as a strategic asset across their operations, applying the same rigor as supply chain optimization for manufacturing companies.

Data Supply Chain Transformation Process with Delphix

Delphix applies a three-step process to transform the data supply chain:

  • Analyze: survey systems, processes, teams across data supply chains
  • Transform: virtualize, automate data delivery with centralized governance
  • Leverage: drive business value via new data products, process optimization

Businesses typically manage multiple data supply chains simultaneously, all of which are targets for data chain optimization:

  • Compliance retention, reporting
  • Modernization, migration projects
  • Application projects and development
  • BI, analytics
  • Data protection.

Delphix re-invents the data supply chain with its Virtual Data Platform:

  • Install data engines in hours across all repositories, locations (including cloud)
  • Connect: non-disruptively sync data across sites, systems, architectures
  • Control: secure data, track release versions, preserve and prove data history
  • Deploy: automatically launch virtual data environments in 10x less space, time
  • Leverage data with self service refresh, reset, branching, bookmarks, integration.

According to an IDC study, Delphix pays for itself in IT savings, with an average payback of 4.3 months.




If I can’t picture it, I can’t understand it

June 23rd, 2014

“If I can’t picture it, I can’t understand it.” – Albert Einstein

Along the same lines

“It is impossible to even think without a mental picture.” – Aristotle

“Man’s mind cannot understand thoughts without images of them.” – Thomas Aquinas

“The evolution of images is a kind of intermediate between that of the perceptions and that of the intelligence.” – Jean Piaget

“Mathematics is a cognitive process, thinking, that requires the dual coding of imagery and language. Imagery is fundamental to the process of thinking with numbers. Albert Einstein, whose theories of relativity helped explain our universe, used imagery as the base for his mental processing and problem solving. Perhaps he summarized the importance of imagery best when he said, ‘If I can’t picture it, I can’t understand it.’ ” – Nanci Bell and Kimberly Tuley

One of my favorite examples of the power of graphics to easily, quickly and powerfully display quantitative information is Anscombe’s Quartet.


Anscombe’s Quartet

        I             II            III            IV
    x      y      x      y      x      y      x      y
  10.0   8.04   10.0   9.14   10.0   7.46    8.0   6.58
   8.0   6.95    8.0   8.14    8.0   6.77    8.0   5.76
  13.0   7.58   13.0   8.74   13.0  12.74    8.0   7.71
   9.0   8.81    9.0   8.77    9.0   7.11    8.0   8.84
  11.0   8.33   11.0   9.26   11.0   7.81    8.0   8.47
  14.0   9.96   14.0   8.10   14.0   8.84    8.0   7.04
   6.0   7.24    6.0   6.13    6.0   6.08    8.0   5.25
   4.0   4.26    4.0   3.10    4.0   5.39   19.0  12.50
  12.0  10.84   12.0   9.13   12.0   8.15    8.0   5.56
   7.0   4.82    7.0   7.26    7.0   6.42    8.0   7.91
   5.0   5.68    5.0   4.74    5.0   5.73    8.0   6.89


Edward Tufte uses this example from Anscombe to show four datasets of x and y that have the same mean, standard deviation, and regression line, but which are qualitatively different when plotted.
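The identity of those summary statistics is easy to check. The short script below, using only the Python standard library, computes the x and y means and the least-squares line for each of the four datasets from the table above; all four come out identical to two decimal places, even though the plots look nothing alike.

```python
from statistics import mean

# Anscombe's Quartet: the four (x, y) datasets from the table above.
# Datasets I-III share the same x values.
x123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8.0] * 7 + [19.0] + [8.0] * 3,
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summary(x, y):
    """Return (mean x, mean y, slope, intercept) of the least-squares fit, rounded."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return round(mx, 2), round(my, 2), round(slope, 2), round(intercept, 2)

for name, (x, y) in quartet.items():
    print(name, summary(x, y))   # each dataset prints (9.0, 7.5, 0.5, 3.0)
```

All four datasets share mean x = 9.0, mean y = 7.5, and the fitted line y = 3.0 + 0.5x, which is exactly why the table alone hides what a picture reveals instantly.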