<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Indexing Options for Change Data Capture</title>
	<atom:link href="http://oraclesponge.wordpress.com/2008/04/08/indexing-options-for-change-data-capture/feed/" rel="self" type="application/rss+xml" />
	<link>http://oraclesponge.wordpress.com/2008/04/08/indexing-options-for-change-data-capture/</link>
	<description>Oracle Data Warehouse Design and Architecture</description>
	<pubDate>Fri, 04 Jul 2008 18:52:28 +0000</pubDate>
	<generator>http://wordpress.org/?v=MU</generator>
		<item>
		<title>By: Joe Coffey</title>
		<link>http://oraclesponge.wordpress.com/2008/04/08/indexing-options-for-change-data-capture/#comment-51042</link>
		<dc:creator>Joe Coffey</dc:creator>
		<pubDate>Tue, 10 Jun 2008 04:02:23 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=367#comment-51042</guid>
		<description>Todor - 

Correct.  An interesting part of the product's history is that it "originally" had to mostly solve for what Siebel calls "remote" users.  Before version 7 (released in 2000 or 2001) the primary usage of Siebel was for mobile field sales users who worked disconnected on their laptops on a local database (Sybase SQL Anywhere).  Siebel had a neat little "store and forward" replication scheme and had to solve concurrency problems in this kind of architecture.

Its "connected client" architecture was 2-tier client-server, in which the SQL statements came directly from the client application on the user's computer.  Of course, the modification_num solution works for this architecture as well, but it really wasn't an option for the product to use a server database feature.

The mobile client architecture is still in use today - but Siebel's application server architecture (as well as the ubiquity of broadband connections compared to the end of the 20th century) makes it much less common in my experience.

There are some decisions that appear to have been made in the pursuit of "database agnosticism" (although none terribly limiting in practice), but this one I think they got pretty much right.</description>
		<content:encoded><![CDATA[<p>Todor - </p>
<p>Correct.  An interesting part of the product&#8217;s history is that it &#8220;originally&#8221; had to mostly solve for what Siebel calls &#8220;remote&#8221; users.  Before version 7 (released in 2000 or 2001) the primary usage of Siebel was for mobile field sales users who worked disconnected on their laptops on a local database (Sybase SQL Anywhere).  Siebel had a neat little &#8220;store and forward&#8221; replication scheme and had to solve concurrency problems in this kind of architecture.</p>
<p>Its &#8220;connected client&#8221; architecture was 2-tier client-server, in which the SQL statements came directly from the client application on the user&#8217;s computer.  Of course, the modification_num solution works for this architecture as well, but it really wasn&#8217;t an option for the product to use a server database feature.</p>
<p>The mobile client architecture is still in use today - but Siebel&#8217;s application server architecture (as well as the ubiquity of broadband connections compared to the end of the 20th century) makes it much less common in my experience.</p>
<p>There are some decisions that appear to have been made in the pursuit of &#8220;database agnosticism&#8221; (although none terribly limiting in practice), but this one I think they got pretty much right.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Todor Botev</title>
		<link>http://oraclesponge.wordpress.com/2008/04/08/indexing-options-for-change-data-capture/#comment-51039</link>
		<dc:creator>Todor Botev</dc:creator>
		<pubDate>Thu, 05 Jun 2008 08:28:10 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=367#comment-51039</guid>
		<description>So MODIFICATION_NUM is Siebel's mechanism for applying optimistic locking correctly.</description>
		<content:encoded><![CDATA[<p>So MODIFICATION_NUM is Siebel&#8217;s mechanism for applying optimistic locking correctly.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Aldridge</title>
		<link>http://oraclesponge.wordpress.com/2008/04/08/indexing-options-for-change-data-capture/#comment-51037</link>
		<dc:creator>David Aldridge</dc:creator>
		<pubDate>Thu, 05 Jun 2008 00:22:27 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=367#comment-51037</guid>
		<description>Ah, very interesting. Thanks for taking the time to write that up, Joe.</description>
		<content:encoded><![CDATA[<p>Ah, very interesting. Thanks for taking the time to write that up, Joe.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe Coffey</title>
		<link>http://oraclesponge.wordpress.com/2008/04/08/indexing-options-for-change-data-capture/#comment-51035</link>
		<dc:creator>Joe Coffey</dc:creator>
		<pubDate>Thu, 29 May 2008 16:14:56 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=367#comment-51035</guid>
		<description>Todor,

Perhaps.  I'm not knowledgeable enough about the SCN and when it increments to confirm.  But here's the scoop on MODIFICATION_NUM and why it is really there.

Each update statement is of the form:
UPDATE table set 
field1 = :bv1, 
field2 = :bv2, 
... 
modification_num = :bvN 
where row_id = :bvX 
and modification_num = :bvY

Why do such a thing?  Concurrency control.

Siebel creates a data object in its application server that it calls a "Business Component" (or BC for short).  It creates a BC from a select query against the database.  The user can then work with the data in the BC - and even update it.  The updates are staged in the BC object on the application server - and only written back to the database on certain run-time events (WriteRecord I believe - perhaps a few others).

Well, Siebel doesn't take out a FOR UPDATE lock on the database.  What it does do is store the MODIFICATION_NUM and ROW_ID for each record in each table in the BC.  When the update statement is executed, Siebel sets the MODIFICATION_NUM to CURR_MOD_NUM + 1 (bvN in my example above).  It also includes CURR_MOD_NUM in the update statement (bvY in my example).  

So - what happens if another BC object has updated that record on that table in the meantime?  As you can see, the update statement will not update any rows, since the filter in the where clause "fails".  Siebel traps this error, rolls back the user's change and complains.  

Note that this can happen between sessions (which is the more logical case) or within a user session (which is the more common one).  It is possible that a user session creates two different BC objects that contain the same record in the database - and then tries to update both of them in turn. This is almost always a "bug", but one that happens from time to time.
 
So - as you can see, MODIFICATION_NUM is not really intended to be a counter of the number of times a record has been updated in the server database.  But you can also see that it effectively is.</description>
		<content:encoded><![CDATA[<p>Todor,</p>
<p>Perhaps.  I&#8217;m not knowledgeable enough about the SCN and when it increments to confirm.  But here&#8217;s the scoop on MODIFICATION_NUM and why it is really there.</p>
<p>Each update statement is of the form:<br />
UPDATE table set<br />
field1 = :bv1,<br />
field2 = :bv2,<br />
&#8230;<br />
modification_num = :bvN<br />
where row_id = :bvX<br />
and modification_num = :bvY</p>
<p>Why do such a thing?  Concurrency control.</p>
<p>Siebel creates a data object in its application server that it calls a &#8220;Business Component&#8221; (or BC for short).  It creates a BC from a select query against the database.  The user can then work with the data in the BC - and even update it.  The updates are staged in the BC object on the application server - and only written back to the database on certain run-time events (WriteRecord I believe - perhaps a few others).</p>
<p>Well, Siebel doesn&#8217;t take out a FOR UPDATE lock on the database.  What it does do is store the MODIFICATION_NUM and ROW_ID for each record in each table in the BC.  When the update statement is executed, Siebel sets the MODIFICATION_NUM to CURR_MOD_NUM + 1 (bvN in my example above).  It also includes CURR_MOD_NUM in the update statement (bvY in my example).  </p>
<p>So - what happens if another BC object has updated that record on that table in the meantime?  As you can see, the update statement will not update any rows, since the filter in the where clause &#8220;fails&#8221;.  Siebel traps this error, rolls back the user&#8217;s change and complains.  </p>
<p>Note that this can happen between sessions (which is the more logical case) or within a user session (which is the more common one).  It is possible that a user session creates two different BC objects that contain the same record in the database - and then tries to update both of them in turn. This is almost always a &#8220;bug&#8221;, but one that happens from time to time.</p>
<p>So - as you can see, MODIFICATION_NUM is not really intended to be a counter of the number of times a record has been updated in the server database.  But you can also see that it effectively is.</p>
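<p>The version-check update described above can be sketched as follows. This is only a minimal illustration in Python with SQLite, not Siebel code: the table s_demo, the helper update_with_version_check and the sample values are all assumptions made up for the example, and the "error" is modelled simply as an update that touches zero rows.</p>

```python
import sqlite3


def update_with_version_check(conn, row_id, new_name, seen_mod_num):
    """Apply the update only if the row still has the version we read earlier."""
    cur = conn.execute(
        "UPDATE s_demo SET name = ?, modification_num = ? "
        "WHERE row_id = ? AND modification_num = ?",
        (new_name, seen_mod_num + 1, row_id, seen_mod_num),
    )
    if cur.rowcount == 0:
        # Another writer bumped modification_num first: discard our change.
        conn.rollback()
        return False
    conn.commit()
    return True


# Hypothetical demo table standing in for a Siebel base table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE s_demo (row_id TEXT PRIMARY KEY, name TEXT, modification_num INTEGER)"
)
conn.execute("INSERT INTO s_demo VALUES ('1-ABC', 'original', 0)")
conn.commit()

# Two in-memory copies of the same record both remember modification_num = 0.
first_ok = update_with_version_check(conn, "1-ABC", "first writer", 0)
second_ok = update_with_version_check(conn, "1-ABC", "second writer", 0)
final_row = conn.execute("SELECT name, modification_num FROM s_demo").fetchone()
```

<p>In this sketch the first write succeeds and moves modification_num from 0 to 1, so the second write, which still carries the stale value 0, matches no rows and is rejected. No row lock is held between the read and the write.</p>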
]]></content:encoded>
	</item>
	<item>
		<title>By: Todor Botev</title>
		<link>http://oraclesponge.wordpress.com/2008/04/08/indexing-options-for-change-data-capture/#comment-51034</link>
		<dc:creator>Todor Botev</dc:creator>
		<pubDate>Thu, 29 May 2008 13:29:11 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=367#comment-51034</guid>
		<description>Joe,

MODIFICATION_NUM sounds to me like SCN in the Siebel world - is it so?</description>
		<content:encoded><![CDATA[<p>Joe,</p>
<p>MODIFICATION_NUM sounds to me like SCN in the Siebel world - is it so?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Aldridge</title>
		<link>http://oraclesponge.wordpress.com/2008/04/08/indexing-options-for-change-data-capture/#comment-51033</link>
		<dc:creator>David Aldridge</dc:creator>
		<pubDate>Thu, 29 May 2008 02:48:43 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=367#comment-51033</guid>
		<description>Thanks Joe,

Yes, I think you might well be right on point 1. That certainly makes it simpler, though some of the basic challenges remain.

2 and 3 are very interesting also. I wish I could get that kind of insight from our own Siebel team :D</description>
		<content:encoded><![CDATA[<p>Thanks Joe,</p>
<p>Yes, I think you might well be right on point 1. That certainly makes it simpler, though some of the basic challenges remain.</p>
<p>2 and 3 are very interesting also. I wish I could get that kind of insight from our own Siebel team :D</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe Coffey</title>
		<link>http://oraclesponge.wordpress.com/2008/04/08/indexing-options-for-change-data-capture/#comment-51032</link>
		<dc:creator>Joe Coffey</dc:creator>
		<pubDate>Thu, 29 May 2008 02:37:12 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=367#comment-51032</guid>
		<description>David,

Great blog.  I've mostly been a lurker on this and other blogs for a while.  I figured I'd offer you some thoughts on this subject, though.

I see you have a solution - but one that sounds like you may revisit.  A few thoughts:

1.  It sounds like this is Siebel, so I assume that your update_dt field is really LAST_UPD.  Are you certain that this field is nullable in your implementation?  Every version I've ever inspected this on (including the 8.0 I just tested) populates both CREATED and LAST_UPD on the initial insert of the record.
2.  Again, assuming this is Siebel, I'd point you to MODIFICATION_NUM being compared to an image table (as someone else suggested) as a reliable source of an answer to "What records have been updated since I last ran?"  The date fields are just dates - and have issues others have commented on.  The MODIFICATION_NUM field is updated on each insert or update statement that goes through Siebel processes.
3.  Please verify that your filter is really based on updates to the table you are working with - and not on one or more of what Siebel calls a "Business Component."  For example, if you want to extract "all orders that have been updated", your solution is not likely to drive strictly off of the S_ORDER table, but also off of the S_ORDER_ITEM table (and possibly the variety of attribute and extended attribute tables for each).  Similarly, Siebel models the "primary address" as a part of the Account - but the S_ORG_EXT table won't be updated just because someone updates the zipcode on the primary address.</description>
		<content:encoded><![CDATA[<p>David,</p>
<p>Great blog.  I&#8217;ve mostly been a lurker on this and other blogs for a while.  I figured I&#8217;d offer you some thoughts on this subject, though.</p>
<p>I see you have a solution - but one that sounds like you may revisit.  A few thoughts:</p>
<p>1.  It sounds like this is Siebel, so I assume that your update_dt field is really LAST_UPD.  Are you certain that this field is nullable in your implementation?  Every version I&#8217;ve ever inspected this on (including the 8.0 I just tested) populates both CREATED and LAST_UPD on the initial insert of the record.<br />
2.  Again, assuming this is Siebel, I&#8217;d point you to MODIFICATION_NUM being compared to an image table (as someone else suggested) as a reliable source of an answer to &#8220;What records have been updated since I last ran?&#8221;  The date fields are just dates - and have issues others have commented on.  The MODIFICATION_NUM field is updated on each insert or update statement that goes through Siebel processes.<br />
3.  Please verify that your filter is really based on updates to the table you are working with - and not on one or more of what Siebel calls a &#8220;Business Component.&#8221;  For example, if you want to extract &#8220;all orders that have been updated&#8221;, your solution is not likely to drive strictly off of the S_ORDER table, but also off of the S_ORDER_ITEM table (and possibly the variety of attribute and extended attribute tables for each).  Similarly, Siebel models the &#8220;primary address&#8221; as a part of the Account - but the S_ORG_EXT table won&#8217;t be updated just because someone updates the zipcode on the primary address.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: David Aldridge</title>
		<link>http://oraclesponge.wordpress.com/2008/04/08/indexing-options-for-change-data-capture/#comment-51016</link>
		<dc:creator>David Aldridge</dc:creator>
		<pubDate>Wed, 14 May 2008 13:12:30 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=367#comment-51016</guid>
		<description>Well, the solution turns out to be the path of least resistance, which is a full table scan in the early hours of the morning. Frankly the bureaucratic overhead of trying to get changes into the system was more than I was prepared to take on :D

I think there's a flaw in your method though, as it relies on two key points:

i) A max_id that is growing sequentially.
ii) Having an index on the max_id.

Which is rather the same as the problem of having an index on the update_dt itself, I feel, with the same remediations being required.</description>
		<content:encoded><![CDATA[<p>Well, the solution turns out to be the path of least resistance, which is a full table scan in the early hours of the morning. Frankly the bureaucratic overhead of trying to get changes into the system was more than I was prepared to take on :D</p>
<p>I think there&#8217;s a flaw in your method though, as it relies on two key points:</p>
<p>i) A max_id that is growing sequentially.<br />
ii) Having an index on the max_id.</p>
<p>Which is rather the same as the problem of having an index on the update_dt itself, I feel, with the same remediations being required.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Todor Botev</title>
		<link>http://oraclesponge.wordpress.com/2008/04/08/indexing-options-for-change-data-capture/#comment-51015</link>
		<dc:creator>Todor Botev</dc:creator>
		<pubDate>Tue, 13 May 2008 18:25:43 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=367#comment-51015</guid>
		<description>Hi David,

Did you implement a solution already?

I’m a bit late with my proposal. But anyway - my idea comes down to: are there any other ways apart from create_dt to recognize the newly inserted records? Using create_dt for this seems like too much work to me, because we are normally interested only in the new records (hence we use only a very limited part of the information in the column and its eventual index).

For example, if the table has a surrogate increasing ID column (the new records get the highest ID), you might take advantage of it. What about a small log table where the latest “captured” ID is logged?

Then the logic would look like this (you will still need an index on update_dt):

&lt;code&gt;log the current MAX_ID;&lt;/code&gt;

&lt;code&gt;select …
from …
where (ID &#62; MAX_ID_PREVIOUS_RUN and ID &#60;= MAX_ID)
or (update_dt &#62;= trunc(sysdate)-1 and update_dt &#60; trunc(sysdate))
&lt;/code&gt;</description>
		<content:encoded><![CDATA[<p>Hi David,</p>
<p>Did you implement a solution already?</p>
<p>I’m a bit late with my proposal. But anyway - my idea comes down to: are there any other ways apart from create_dt to recognize the newly inserted records? Using create_dt for this seems like too much work to me, because we are normally interested only in the new records (hence we use only a very limited part of the information in the column and its eventual index).</p>
<p>For example, if the table has a surrogate increasing ID column (the new records get the highest ID), you might take advantage of it. What about a small log table where the latest “captured” ID is logged?</p>
<p>Then the logic would look like this (you will still need an index on update_dt):</p>
<p><code>log the current MAX_ID;</code></p>
<p><code>select …<br />
from …<br />
where (ID &gt; MAX_ID_PREVIOUS_RUN and ID &lt;= MAX_ID)<br />
or (update_dt &gt;= trunc(sysdate)-1 and update_dt &lt; trunc(sysdate))<br />
</code></p>
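<p>A runnable version of that logic might look like the sketch below. It is written in Python against SQLite purely for illustration: the tables src and capture_log, the function capture_changes and the sample rows are assumptions, and SQLite's date() on ISO date strings stands in for Oracle's trunc(sysdate).</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src (id INTEGER PRIMARY KEY, create_dt TEXT, update_dt TEXT);
CREATE TABLE capture_log (max_id INTEGER);
INSERT INTO capture_log VALUES (2);                      -- highest ID seen by the previous run
INSERT INTO src VALUES (1, '2008-05-01', '2008-05-12');  -- old row, updated yesterday
INSERT INTO src VALUES (2, '2008-05-02', NULL);          -- old row, never updated
INSERT INTO src VALUES (3, '2008-05-12', NULL);          -- new row above the logged ID
""")


def capture_changes(conn, today):
    """Return IDs that are new since the last run or were updated yesterday."""
    prev_max = conn.execute("SELECT max_id FROM capture_log").fetchone()[0]
    curr_max = conn.execute("SELECT MAX(id) FROM src").fetchone()[0]
    rows = conn.execute(
        "SELECT id FROM src "
        "WHERE (id > ? AND id <= ?) "
        "OR (update_dt >= date(?, '-1 day') AND update_dt < ?) "
        "ORDER BY id",
        (prev_max, curr_max, today, today),
    ).fetchall()
    # Log the current MAX_ID so the next run starts from here.
    conn.execute("UPDATE capture_log SET max_id = ?", (curr_max,))
    conn.commit()
    return [row[0] for row in rows]


changed = capture_changes(conn, "2008-05-13")
logged_max = conn.execute("SELECT max_id FROM capture_log").fetchone()[0]
```

<p>With these sample rows the run picks up row 1 (updated the previous day) and row 3 (inserted since the last logged ID), and skips row 2. The sketch still assumes that IDs only ever grow and that both id and update_dt can be searched efficiently.</p>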
]]></content:encoded>
	</item>
	<item>
		<title>By: David Aldridge</title>
		<link>http://oraclesponge.wordpress.com/2008/04/08/indexing-options-for-change-data-capture/#comment-50966</link>
		<dc:creator>David Aldridge</dc:creator>
		<pubDate>Wed, 23 Apr 2008 12:30:49 +0000</pubDate>
		<guid isPermaLink="false">http://oraclesponge.wordpress.com/?p=367#comment-50966</guid>
		<description>We don't have any stringent requirements for near real-time capture -- we're only interested in end-of-day results, not intraday. There's certainly a confusing range of options out there though.</description>
		<content:encoded><![CDATA[<p>We don&#8217;t have any stringent requirements for near real-time capture &#8212; we&#8217;re only interested in end-of-day results, not intraday. There&#8217;s certainly a confusing range of options out there though.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
