Generating A Gap-free Series Of Numbers — Not Always A Problem

Well, it’s happened again. Someone has asked how to implement a requirement to generate a gap-free series of numbers, and a swarm of nay-sayers have descended on them to say (and here I paraphrase slightly) that this will kill system performance, that it’s rarely a valid requirement, that whoever wrote the requirement is an idiot, blah blah blah.

As I point out on the thread, it is sometimes a genuine legal requirement to generate gap-free series of numbers. Invoice numbers for the 2,000,000+ organisations in the UK that are VAT (sales tax) registered have such a requirement, and the reason for this is rather obvious: it makes it more difficult to hide the generation of revenue from the tax authorities. I’ve seen comments that it is a requirement in Spain and Portugal, and I’d not be surprised if it were a requirement in many other countries as well.

So, if we accept that it is a valid requirement, under what circumstances are gap-free series* of numbers a problem? Group-think would often have you believe that it always is, but in fact it is only a potential problem under very particular circumstances.

  1. The series of numbers must have no gaps.
  2. Multiple processes create the entities to which the number is associated (eg. invoices).
  3. The numbers must be generated at the time that the entity is created.

If all of these requirements must be met then you have a point of serialisation in your application, and we’ll discuss that in a moment.

First let’s talk about methods of implementing a series-of-numbers requirement if you can drop any one of those requirements.

If your series of numbers can have gaps (and you have multiple processes requiring instant generation of the number) then use an Oracle Sequence object. They are very high performance, and the situations in which gaps can be expected have been very well discussed. It is not too challenging to minimise the number of values skipped by making design efforts to reduce the chance of a process failure between generation of the number and committing the transaction, if that is important.
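As a minimal sketch of that case (the sequence and column names here are illustrative, not from any real schema), the whole mechanism is just:

    create sequence invoice#_seq;

    insert into invoices
      (invoice#, ...)
    values
      (invoice#_seq.nextval, ...);

Gaps appear when a transaction rolls back after calling nextval, or when cached sequence values are discarded (e.g. on instance shutdown), which is exactly why this approach only works when gaps are acceptable.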

If you do not have multiple processes creating the entities (and you need a gap-free series of numbers that must be instantly generated), as might be the case with the batch generation of invoices, then you already have a point of serialisation. That in itself may not be a problem, and may be an efficient way of performing the required operation. Generating the gap-free numbers is rather trivial in this case. You can read the current maximum value and apply an incrementing value to every entity with a number of techniques. For example if you are inserting a new batch of invoices into your invoice table from a temporary working table you might:

insert into
  invoices
    (
    invoice#,
    ...)
with curr as (
  select coalesce(max(invoice#), 0) max_invoice#
  from   invoices)
select
  curr.max_invoice# + rownum,
  ...
from
  curr,
  tmp_invoice
  ...

Of course you would protect your process so that only one instance can run at a time (probably with DBMS_Lock), protect the invoice# with a unique key constraint, and probably check for missing values with separate code if you really, really care.
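A sketch of that single-instance protection might look like the following (the lock name and error number are illustrative choices, not part of any standard):

    declare
      l_handle varchar2(128);
      l_result integer;
    begin
      -- map a lock name to a lock handle, once per session
      dbms_lock.allocate_unique('INVOICE_BATCH_LOCK', l_handle);

      l_result := dbms_lock.request(
                    lockhandle        => l_handle,
                    lockmode          => dbms_lock.x_mode,
                    timeout           => 0,      -- fail fast if another run holds it
                    release_on_commit => true);  -- lock released by the commit

      if l_result <> 0 then
        raise_application_error(-20001, 'Another invoice batch is already running');
      end if;

      -- ... perform the batch insert, then commit (which releases the lock)
    end;
    /

Requesting with a timeout of zero means a second concurrent run fails immediately rather than queueing behind the first, which is usually what you want for a batch job.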

If you do not need instant generation of the numbers (but you need them gap-free and multiple processes generate the entities) then you can allow the entities to be generated and the transaction committed, and then leave generation of the number to a single batch job: either an update on the entity table, or an insert into a separate table.
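As a sketch, such a batch job might number unassigned rows in creation order — assuming (and these are assumptions for illustration) that the entity rows are created with a null invoice# and carry a creation timestamp:

    declare
      l_next invoices.invoice#%type;
    begin
      select coalesce(max(invoice#), 0)
      into   l_next
      from   invoices;

      for r in (select rowid rid
                from   invoices
                where  invoice# is null
                order  by created_date) loop
        l_next := l_next + 1;
        update invoices
        set    invoice# = l_next
        where  rowid = r.rid;
      end loop;

      commit;
    end;
    /

The same single-instance protection described earlier applies to this job, of course.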

So what if we need the trifecta: instant generation of a gap-free series of numbers by multiple processes? All we can do is try to minimise the period of serialisation in the process. I offer the following advice, and welcome any additional advice (or counter-advice, of course).

  1. Store your current values in a dedicated table. DO NOT use a sequence.
  2. Ensure that all processes use the same code to generate new numbers by encapsulating it in a function or procedure.
  3. Serialise access to the number generator with DBMS_Lock, making sure that each series has its own dedicated lock.
  4. Hold the lock in the series generator until your entity creation transaction is complete by releasing the lock on commit (using release_on_commit => true in DBMS_Lock.Request).
  5. Delay the generation of the number until the last possible moment.
  6. Consider the impact of an unexpected error after generating the number and before the commit is completed — will the application roll back gracefully and release the lock, or will it hold the lock on the series generator until the session disconnects later? Whatever method is used, if the transaction fails then the series number(s) must be “returned to the pool”.
  7. Can you encapsulate the whole thing in a trigger on the entity’s table? Can you encapsulate it in a table or other API call that inserts the row and commits the insert automatically?
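Pulling points 1–5 together, the generator might look something like this sketch (the table, function, lock names and error handling are all illustrative assumptions — adjust to taste):

    create table series_numbers (
      series_name varchar2(30) primary key,
      last_value  number not null);

    create or replace function next_series_value (p_series in varchar2)
      return number
    is
      l_handle varchar2(128);
      l_result integer;
      l_value  series_numbers.last_value%type;
    begin
      dbms_lock.allocate_unique('SERIES_' || p_series, l_handle);

      l_result := dbms_lock.request(
                    lockhandle        => l_handle,
                    lockmode          => dbms_lock.x_mode,
                    timeout           => 10,
                    release_on_commit => true);  -- held until the caller commits or rolls back

      if l_result not in (0, 4) then  -- 4 = this session already holds the lock
        raise_application_error(-20002, 'Could not serialise on series ' || p_series);
      end if;

      update series_numbers
      set    last_value = last_value + 1
      where  series_name = p_series
      returning last_value into l_value;

      return l_value;
    end;
    /

The caller invokes the function as late as possible in its transaction (point 5); the commit that completes the entity creation also releases the lock, and a rollback both releases the lock and reverses the increment, returning the number to the pool (point 6).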

I’m sure that there is more to add, so feel free to comment.

Finally, when someone asks “How do I generate a gap-free sequence?” can we please not all jump on the Bad Requirement Bandwagon? By all means question the need for it, advise that sequence objects are inappropriate, point out the potential problems, but please keep an open mind — sometimes a requirement really is a requirement.

* I can’t believe I actually looked up the plural of “series”, but I did. It’s “series”.


12 thoughts on “Generating A Gap-free Series Of Numbers — Not Always A Problem”

  1. David,

    Many thanks for this post. I was waiting to see someone show the other side of this problem. It seems most of this bashing of the requirement for gap-free sequence numbers originates from a certain Oracle Expert (who was a VP at Oracle :) ). This is the problem when people start following somebody because of his/her reputation.

    p.s. BTW, I have utmost respect for that “Oracle Expert” but as he himself says “Never say NEVER, never say ALWAYS”.

    • I think that the problem is more with the followers than the followee — the instinct should always be to challenge the need for a gap-free series, since I don’t doubt that there are a great many that are specified unnecessarily. The points I make are just that (i) they are sometimes truly required, and (ii) they need not always be a disaster.

  2. Is it necessary to use dbms_lock?
    Oracle locking mechanism should be sufficient, I think. Or am I missing something?

    We could have table per “sequence”, or row per “sequence”:


    create table my_sequences(
    [sequence_name varchar2(x) primary key,]
    nextval number not null)
    [organization index];

    insert into my_sequences values (['my_seq',] 0);

    update my_sequences s
    set s.nextval = s.nextval + 1
    [where s.sequence_name = :x]
    returning s.nextval into :my_seq_nextval;

    this update will serialize concurrent transactions (and increases the probability of deadlock, of course, if the whole algorithm is not implemented correctly).

    • and maybe we could use

      select nextval
      from my_sequences [where sequence_name = :x]
      for update

      at the beginning of each such transaction, to serialize access to the “sequence”..

      And one more thought :
      For some people oracle advanced queuing may solve the problem (if they can defer id generation)
      We can defer id generation to some asynchronous process.

      Pseudocode:


      create table some_table(gap_non_free_id number, gap_free_id number, ...);

      procedure insert_row(....) is
      x some_table.gap_non_free_id%type;
      begin
      insert into some_table(gap_non_free_id, gap_free_id, ...)
      values (gap_non_free_id_seq.nextval, null, ...)
      returning gap_non_free_id into x;
      dbms_aq.enqueue(...., x);
      end;

      -- just pseudocode:
      procedure assign_id is
      begin
      dbms_aq.dequeue(all_ids_in_queue)
      -- bulk assignment of gap-free ids:
      -- we could generate ids into a collection indexed by the id from the queue and update the table with forall
      forall id in all_ids_in_queue
      update some_table
      set gap_free_id = all_ids_in_queue(id)
      where gap_non_free_id = id
      end;

      Or another solution for deferred generation (without AQ)


      create index index_only_not_assigned on some_table(case when gap_free_id is null then 1 else null end);

      select nextval into :x from my_sequences for update;

      update some_table s
      set s.gap_free_id = :x + rownum
      where
      case when gap_free_id is null then 1 else null end = 1;

      :affected := sql%rowcount;

      update my_sequences
      set nextval = nextval + :affected;

      • Yes, there are a few different ways of managing the serialisation. I have a fondness for using explicit methodologies like DBMS_Lock rather than implicit ones such as getting a row-level lock — just a matter of preference. One might do both.

  3. It happens in Brazil too. For a sequence of invoice numbers we cannot have a gap. Maybe it will change with digital invoices, but I am not sure.

    We have been using a table with the current number and a function to get the next number. Every part of the system that needs a new invoice uses this function, and this function provides the serialization. Of course, we need the serialization. :)

    Also, we use SELECT XXX FROM CONTROL_SEQUENCE_NUMBER WHERE SEQUENCE_ID = YYY FOR UPDATE to block the line of the table. We do not use dbms_lock, since the FOR UPDATE works fine. Using this approach we can have a lot of gap-free sequences, individualized and serialized. As you can imagine, the invoice stuff is not the only requirement for a gap-free sequence.
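      The approach described above might be sketched as follows (the column names and structure are illustrative guesses at the commenter’s design, not their actual code):

          create or replace function next_number (p_sequence_id in number)
            return number
          is
            l_value number;
          begin
            -- the FOR UPDATE row lock provides the serialisation,
            -- and is held until the caller's transaction commits or rolls back
            select current_value + 1
            into   l_value
            from   control_sequence_number
            where  sequence_id = p_sequence_id
            for update;

            update control_sequence_number
            set    current_value = l_value
            where  sequence_id = p_sequence_id;

            return l_value;
          end;
          /

      A rollback in the calling transaction reverses the update and releases the row lock, so the number is naturally returned to the pool.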

    BTW, I would not use a trigger for that. Since a trigger can fire twice, I would not take that risk when generating a gap-free sequence.

    Heitor

  4. I’ve seen this requirement a couple of times. One factor to consider is whether the transaction lasts long enough and occurs frequently enough to be worth ‘worrying’ about. The application I dealt with related to container ships entering ports. Container ships are pretty big things, and you can only move so many of them around at a time. With transactions lasting just a couple of seconds, I wasn’t in the least concerned about (5).

    • Well it’s not even the duration of the complete transaction I’d say, Gary, unless by “transaction” you mean just that part of the operation that would serialise. I do think that generation of the number should be deferred to be the last part of a transaction that takes an hour, and if the generation only takes a couple of milliseconds then that’s probably not a problem.

      Shipping was my first career, by the way — two years of study for an HNC in Marine engineering, then engineering cadet on a chemical gas carrier. In retrospect, those hours monitoring anhydrous ammonia loading or scraping burned diesel fuel from inside an engine really put the problems of office life into perspective.

  5. Well it would have to be rather close to the exact end of the overall business transaction since any rollback would leave you with that gap… your rule #6 makes any “Last value table” version a non-starter.

    It’s the group-think that’s the killer. I never saw anyone ask how many thousands of calls per second this function will get. Premature optimization is such a killer. I have guys spend weeks designing, no, hyper-designing a table with just the perfect indexes, datatypes, constraints… and they show it to me with pride and I ask how many thousands of queries per second they estimate on this table, and they tell me, oh no, it’s not high volume, we use it once a week to kick off a 17 hour batch job. Well Good’on’ya mate, nice job!

    When I answer questions on Stack Overflow, I wish there was a checkbox for either

    [] I’m either an enthusiast programmer making a recipe database
    [] I need the full-throttle, big-boy, enterprise-class answer cuz I’m gonna pound the cookies out of this system.

    I see lots of questions in the first category and lots of answers that require a level of sophistication that even the answerer isn’t prepared to bring to bear.

    It seems a silly way to detect unpaid taxes. Have you ever checked-out at a store and there’s a mistake made – maybe the same item twice… and to delete the extra item a manager comes over with a special key to unlock the function to delete it? So I do the same thing in my system. I create the invoice, get the sequential, super-special number, give the merch, take the ducats and then void the invoice.
