Independence

From slashdot I see that there is once again a new language in town.

Now then, there are a great many Java developers who have preached the benefits of implementing data logic (what they call business logic) in the application so-as to create RDBMS-independent code, including James Gosling who apparantly leads the pack in “not getting” SQL.

Would it not be ironic to see them now pushing all of those “database independent” applications to a new language?

In other news, PL/SQL is coming up to something like 20 years old. Code you wrote then could still be running now.

Another DBMS_Scheduler Chain Rule Issue

Following on from a recent adventure with a problem validating DBMS_Scheduler chain rules, I hit another issue today.

A rule was defined with a step name that does not exist. This happened because there is an odd limit of chain step name lengths (27 bytes I think), and the name of the step in the define_step call was modified from STP_EXP_SEND_FILES_TO_PROBES to STP_EXP_SEND_FILES_TO_PR after the rule code was written to reference STP_EXP_SEND_FILES_TO_PROBES.

When the job was executed the scheduler made repeated efforts to start a job_subname of STP_EXP_SEND_FILES_TO_PROBES_0 (that’s a zero on the end) approximately every 0.005 second, with the job run details reporting a status of FAILED and additional_info of:

CHAIN_LOG_ID=”576839″,STEP_NAME=”STP_EXP_SEND_FILES_TO_PROBES”, ORA-01403: no data found.

The job had to be stopped using FORCE=TRUE, dropped and recreated.

When the rule was corrected the job ran correctly.

Of course this has left us with rather large SYS.SCHEDULER$_JOB_RUN_DETAILS, SYS.SCHEDULER$_EVENT_LOG and SYS.SCHEDULER$_INSTANCE_PK segments, which is not ideal. We’ll see if support will let us run a shrink space on them.

So, to the lessons …

  1. Do not rely on Oracle validating rule and step modifications.
  2. Watch the first execution of a job on which you have modified the rules like a hawk in case you forgot point 1.

Solving ORA-24172: rule set %s.%s has errors

DBMS_Scheduler chains are a joy to use, until they stop being a joy and start being a real pain.

I modified the logic of a process, dropping one stored procedure and replacing it with another (instead of writing out a list of files to a control file so that a cron job can scp them to a set of other servers, we’re now executing scp as a DBMS_Scheduler executable-type job).

Calling the new procedure instead of the old one required dropping the obsolete DBMS_Scheduler program, the chain step, and the associated rules, and creating a new program, step and rules.

The old rules, step and program were dropped, and the new program and step were created.

However when creating the first of the new rules the following error message was raised:

begin
 DBMS_SCHEDULER.DEFINE_CHAIN_RULE (
 chain_name              => 'EXPORT',
 condition               => 'STP_EXP_SEND_FILES_TO_PR SUCCEEDED',
 action                  => 'START STP_RM_OLD_PROBE_FILES',
 rule_name               => 'START_RM_OLD_PROBE_FILES'
 );
end;
/

ORA-24172: rule set NCBI_ETL.SCHED_RULESET$4 has errors
ORA-06512: at "SYS.DBMS_ISCHED", line 1286
ORA-06512: at "SYS.DBMS_SCHEDULER", line 1257
ORA-06512: at line 2
24172. 00000 -  "rule set %s.%s has errors"
*Cause:    The rule references variables not in the evaluation context.
*Action:   Modify the rule condition to remove illegal reference.

The error is another of those where the majority of Google hits are dumb listings of the error code and message, and the “cause” and “action”.

The syntax and names in the call to DBMS_Scheduler.Define_Chain_Rule were all validated.

I exported the rule set associated with the chain:

select dbms_metadata.get_ddl('PROCOBJ','SCHED_RULESET$4')
from dual;

This told me the name of the evaluation context, which I then exported with:

select dbms_metadata.get_ddl('PROCOBJ','SCHED_EV_CTX$6')
from dual;

On checking the evaluation context I found that the newly-added chain step was present:

vt(10) := sys.re$variable_type('"STP_EXP_SEND_FILES_TO_PR"','sys.scheduler$_job_step_type','dbms_scheduler.get_job_step_cf',NULL);

Mysterious stuff!

I flirted with the idea of dropping the chain entirely, after exporting the chain and steps with:

select dbms_metadata.get_ddl('PROCOBJ','EXPORT')
from dual;

… and the rules (why are they not included in the chain definition?) with:

select 'exec DBMS_SCHEDULER.DEFINE_CHAIN_RULE ('''||chain_name||''','''||condition||''','''||action||''','''||rule_name||''','''||comments||''');'
from user_scheduler_chain_rules
where chain_name = 'EXPORT';

However, when I exported the rule set I noticed that there was still a rule defined with the name of the step I had previously dropped — it was the “what to do if anything fails” rule, and listed all the steps of the chain except the one that sends the chain failure email.

I dropped and created that rule with reference to failure of the new rule instead of the old one, and the new replacement rule that had previously failed now created correctly.

What I Learned

It appears that a chain step can be dropped in 10.2.0.4 even if it is referenced by a rule, but this may leave an “orphan” rule that will cause an ORA-24172 to be raised when rules are defined later on. Well, there may be other or more complete explanations, but the error that is raised may not be complaining about the rule you are trying to create — there may be another at fault.

Generating A Gap-free Series Of Numbers — Not Always A Problem

Well, it’s happened again. Someone has asked how to implement a requirement to generate a gap-free series of numbers and a swarm of nay-sayers have descended on them to say (and here I paraphrase slightly) that this will kill system performance, that’s it’s rarely a valid requirement, that whoever wrote the requirement is an idiot blah blah blah.

As I point out on the thread, it is sometimes a genuine legal requirement to generate gap-free series of numbers. Invoice numbers for the 2,000,000+ organisations in the UK that are VAT (sales tax) registered have such a requirement, and the reason for this is rather obvious: that it makes it more difficult to hide the generation of revenue from tax authorities. I’ve seen comments that it is a requirement in Spain and Portugal, and I’d not be surprised if it was not a requirement in many other countries.

So, if we accept that it is a valid requirement, under what circumstances are gap-free series* of numbers a problem? Group-think would often have you believe that it always is, but in fact it is only a potential problem under very particular circumstances.

  1. The series of numbers must have no gaps.
  2. Multiple processes create the entities to which the number is associated (eg. invoices).
  3. The numbers must be generated at the time that the entity is created.

If all of these requirements must be met then you have a point of serialisation in your application, and we’ll discuss that in a moment.

First let’s talk about methods of implementing a series-of-numbers requirement if you can drop any one of those requirements.

If your series of numbers can have gaps (and you have multiple processes requiring instant generation of the number) then use an Oracle Sequence object. They are very high performance and the situations in which gaps can be expected have been very well discussed. It is not too challenging to minimise the amount of numbers skipped by making design efforts to minimise the chance of a process failure between generation of the number and commiting the transaction, if that is important.

If you do not have multiple processes creating the entities (and you need a gap-free series of numbers that must be instantly generated), as might be the case with the batch generation of invoices, then you already have a point of serialisation. That in itself may not be a problem, and may be an efficient way of performing the required operation. Generating the gap-free numbers is rather trivial in this case. You can read the current maximum value and apply an incrementing value to every entity with a number of techniques. For example if you are inserting a new batch of invoices into your invoice table from a temporary working table you might:

insert into
  invoices
    (
    invoice#,
    ...)
with curr as (
  select Coalesce(Max(invoice#)) max_invoice#
  from   invoices)
select
  curr.max_invoice#+rownum,
  ...
from
  tmp_invoice
  ...

Of course you would protect your process so that only one instance can run at a time (probably with DBMS_Lock), and protect the invoice# with a unique key contrainst, and probably check for missing values with separate code if you really, really care.

If you do not need instant generation of the numbers (but you need them gap-free and multiple processes generate the entities) then you can allow the entities to be generated and the transaction commited, and then leave generation of the number to a single batch job. An update on the entity table, or an insert into a separate table.

So if we need the trifecta of instant generation of a gap-free series of numbers by multiple processes? All we can do is to try to minimise the period of serialisation in the process, and I offer the following advice, and welcome any additional advice (or counter-advice of course).

  1. Store your current values in a dedicated table. DO NOT use a sequence.
  2. Ensure that all processes use the same code to generate new numbers by encapsulating it in a function or procedure.
  3. Serialise access to the number generator with DBMS_Lock, making sure that each series has it’s own dedicated lock.
  4. Hold the lock in the series generator until your entity creation transaction is complete by releasing the lock on commit (using release_on_lock=true in DBMS_Lock.request)
  5. Delay the generation of the number until the last possible moment.
  6. Consider the impact of an unexpected error after generating the number and before the commit is completed — will the application rollback gracefully and release the lock, or will it hold the lock on the series generator until the session disconnects later? Whatever method is used, if the transaction fails then the series number(s)  must be “returned to the pool”.
  7. Can you encapsulate the whole thing in a trigger on the entity’s table? Can you encapsulate it in a table or other API call that inserts the row and commits the insert automatically?

I’m sure that there is more to add, so feel free to comment.

Finally, when someone asks “How do I generate a gap-free sequence?” can we please not all jump on the Bad Requirement Bandwagon? By all means question the need for it, advise that sequence objects are inappropriate, point out the potential problems, but please keep an open mind — sometimes a requirement really is a requirement.

* I can’t believe I actually looked up the plural of “series”, but I did. It’s “series”.

Working Around an Internal Error Based on Source Mnemonics

My frenzied hacking through the unexplored territory of Expression Filters was derailed (to mix my metaphors) by an internal error when using the Evaluate() function in a rather complex query that uses an inline view, analytic functions etc..

The following was extracted from the alert log:

ORA-07445: exception  encountered: core dump [kkqsCkLegalEqvExpCB()+199] [SIGSEGV] [Address not mapped  to object] [0x000000000] [] []

The mnemonic beginning “kkqs” in the first parameter apparantly indicates a fault raised from the query rewrite source module. Because query rewrite is not helpful to the execution of the query I coded an alter session to set query_rewrite_enabled to false, and executed the problem query again.

Success.

Interestingly, adding a NO_REWRITE hint to the query did not prevent the error occurring. A shame, because I would have preferred that approach.

According to the website query rewrite problems might also be indicated by any of the codes kkqg, kkqs, kkqs1, kkqs2, kkqs3, kkqu, kkqv, and kkqw, so disabling query rewrite might also be a good workaround for internal errors that reference mnemonics beginning with any of those strings.

It also prompts the thought that in some cases it might be possible to get guidance on working around internal erros based on the mnemonic. For example an error in “kkfd” probably indicates a parallel query fault, and the remediation might be to disable parallel query. “kxtr” … trigger problem?

And there I was thinking that all of these kernel code things were for nerds.

A Cause of ORA-38435 in Evaluate()

I’m part way through an implementation of some ELT code that uses expression filters to parse records and assign zero-to-many “inboxes” to each row, and as is often the case with the less well-explored parts of the system it often feels like I’m hacking through the undergrowth in unknown territory.

Under such conditions it’s not surprising to hit an error that few others seem to have hit before, and for which the majority of Google hits are for that irritating type of site that merely lists all of the Oracle errors in the documentation whether or not they have information on how to fix them.

Thus it is with “ORA-38435: missing elementary attribute value or invalid name-value pairs”.

Here’s how I found the root cause of the error in my case.

It was thrown when running a SELECT statement with the EVALUATE condition. The code looks something like this (slightly obfuscated for no real reason other than paranoia):

select
 i.inbox_name,
 D.*
from
 INBOX_EXPRESSIONS i,
 DATA_TABLE D
where
 evaluate (
   I.INBOX_EXPRESSION,
     My_Event(
       D.SERVICE,
       D.FILTER,
       ... blah blah blah ...
       D.SUMMARY,
       D.IDENTIFIER).getVarChar()
 )= 1
/

Firstly, the error was thrown in a predictable fashion after something like the 450th row of the result set had been returned. I re-executed it through SQL*Plus with an arraysize of 1, slightly modified to just return the ROWID of DATA_TABLE.

This result set showed ascending values of ROWID, which agreed with the execution plan that showed a full table scan of DATA_TABLE.

The error was thrown after the ROWID “AABonGAAJAAH1UeAAA” was returned.

I modified the query to something like:

select
 i.inbox_name,
 D.ROWID
from
 INBOX_EXPRESSIONS i,
 (SELECT * FROM DATA_TABLE WHERE ROWID = 'AABonGAAJAAH1UeAAA') D
where
 evaluate (
   I.INBOX_EXPRESSION,
     My_Event(
       D.SERVICE,
       D.FILTER,
       ... blah blah blah ...
       D.SUMMARY,
       D.IDENTIFIER).getVarChar()
 )= 1
/

This returned a correct row.

I then incremented the ROWID to ‘AABonGAAJAAH1UeAAB’ and re-executed, getting 0 rows. I kept incrementing until I reached ‘AABonGAAJAAH1UeAAF’ at which point the error was thrown.

Ah hah.

From the error stack it looked as if the error was raised by Evaluate, rather than the function My_Event.GatVarChar(), so I tested that by including the call to My_Event.GetVarchar() in a query against the columns of DATA_TABLE, and to see what it was returning for each row. Hopefully I’d spot something that looked like bad syntax on the error-raising row that might trip up the Evaluate() function.

And there it was:

SERVICE=>to_number('0'),FILTER=>'abc', ... ,SUMMARY=>'System's IPs are not responding.',...'

The inclusion of a single quote in the value of the column SUMMARY caused the GetVarchar function to construct an invalid string that was then passed to Evaluate().

The fix was fairly straightforward of course — replace every occurrence of a single single-quote with two single-quotes in the string columns.

select
 i.inbox_name,
 D.*
from
 INBOX_EXPRESSIONS i,
 DATA_TABLE D
where
 evaluate (
   I.INBOX_EXPRESSION,
     My_Event(
       D.SERVICE,
       Replace(D.FILTER,'''',''''''),
       ... blah blah blah ...
       Replace(D.SUMMARY,'''',''''''),
       Replace(D.IDENTIFIER,'''','''''')).getVarChar()
 )= 1
/

I’d not be surprised if there were other potential causes.

And by the way the handsome list of single-quotes reminds me of this:

James, while John had had “had”, had had “had had”; “had had” had had a better effect on the teacher

My_Event.GetVarchar()

Serious Space Savings

I knew this would happen before I even tried it.

I Request: “The SYSAUX tablespace is 500MB and it’s nearly full. The contents are mostly the default 30 days worth of DBMS_Scheduler logs. Can you increase it to 2GB?

A DBA Responds: “I’ve increased it to one GB for now“.

An entire gigabyte of disk space saved. Well done.

Now why didn’t I ask for 4GB?