more on peoplesoft
As I mentioned before, we're in the middle of a major rollout of new functionality on our payroll systems. Mostly to do with finally letting folks access their own information as well as letting managers preview and better plan for annual and other leave, reviews, promotions, new employees, etcetc.
In simple words: self-service.
A not small part of all this is to ensure we can actually cover the performance and service levels expected of such a roll-out. Hence the focus on performance and improving the current process execution time.
So out came our Loadrunner software and on went some very heavy and complex load mixes of up to 200 users at low wait times, ramping up to 500 and over later on.
Highly recommended! Particularly if one wants to have any significant values out of dbms_stats.gather_system_stats()!
Like I mentioned previously, one of the major aspects of all this is to let the payroll folks be as pre-emptive as can be with pay calculations. Something not easily done due to the many problems with that function in Peoplesoft.
In the previous post I indicated where some of these problems can be found in the customer connection site: go there to find more details in various combinations of releases.
Well, I'm glad to report we have most of these under control. There are still one or two abnormal conditions that can cause blow-outs in execution time but I feel confident the approach we are following is sound and will produce a sensible result.
One of the problems we have is that with 10.2.0.2 Oracle and 8.48.10 Peopletools, we had to disable a lot of the functionality of the optimizer in order to get reliable results in our SQL. This is done by inclusion of so-called "undocumented" initialization parameters - usually the ones prefixed by an underscore "_".
Now, let me be very clear here: the ones we added, we did so BY SPECIFIC recommendation of Oracle Support! There!, before I get the usual cacophony about "thou shalt not use undocumented parameters". Hey, we were told by Oracle to USE these ones!
While I'm here, is it a good idea to use this type of parameter unless under duress? I am with the experts on this, 100% of the way: hate to have to use them.
Why? Well, first of all: there is a reason why they are mostly undocumented and "hidden". Mostly to do with the fact that they usually disable major sections of the inherent Oracle db functionality.
This can in turn have secondary effects on other pieces of functionality! And so on, the roller-coaster goes.
Second: when one upgrades to a later version or release, which of these parameters should then be taken away?
Yes, these, and many other similar problems.
But, in this case we were told by Support to use them, so we had to.
Of course, one of the side effects of the ones we use is that large tracts of the advanced CBO functionalitty of 10.2.0.2 were disabled!
With the result that our pay calcs became wildly unstable in performance. Mostly to do with the awfull SQL locked away in the pay calc Cobol program...
INSERT INTO ps_gp_rto_trgr_wrk
(emplid, country, trgr_event_id, trgr_effdt, trgr_create_ts, cal_run_id)
(SELECT DISTINCT a.emplid, c.country, a.trgr_event_id, a.trgr_effdt,
FROM ps_gp_rto_trgr a, ps_job b, ps_gp_cal_run_dtl c
WHERE a.cal_run_id = :1
AND b.emplid BETWEEN :2 AND :3
AND b.emplid = a.emplid
AND b.gp_paygroup = c.gp_paygroup
AND (b.pay_system_flg = 'GP'
OR b.absence_system_cd = 'AM')
AND b.empl_status IN ('A', 'P', 'Q', 'U')
AND b.effseq = (SELECT max(b1.effseq)
FROM ps_job b1
!-> WHERE b1.emplid BETWEEN :4 AND :5
!-> AND b1.emplid = b.emplid
AND b1.empl_rcd = b.empl_rcd
AND b1.effdt = b.effdt)
AND b.effdt <= c.prd_end_dt
AND a.trgr_status = 'I'
AND a.emplid BETWEEN :6 AND :7
AND a.country = :8
AND ('N' = :9
OR a.emplid IN (SELECT emplid
WHERE run_cntl_id = :10
AND oprid = :11))
AND ('A' = :12
OR 'S' = :13
AND a.emplid IN (SELECT emplid
FROM ps_gp_iter_trgr c
WHERE c.cal_run_id = a.cal_run_id
AND c.iter_trgr_status = 'U'
AND c.emplid BETWEEN :14 AND :15))
AND NOT EXISTS (SELECT 'X'
FROM ps_gp_rto_trg_wrk1 e
!-> WHERE e.emplid = a.emplid
!-> AND e.emplid BETWEEN :16 AND :17
AND e.trgr_event_id = a.trgr_event_id
AND e.trgr_effdt = a.trgr_effdt
AND e.trgr_create_ts = a.trgr_create_ts
AND e.country = c.country))
See those little "!->" flags I put at the start of those 4 lines? Yeah, that is the problem. Now, let me try to explain what is going on.
Those subqueries are what is called "co-related". IOW: for each row of the main query, the subquery will be executed and take the CURRENT and UNIQUE values of the columns from the main query and use those for its predicates. This type of query has been used in Oracle and SQL for decades now, nothing new here.
What is new is the BETWEEN predicate added by the nice Peoplesoft folks: it is completely and utterly unnecessary as the unique value for the CURRENT emplid is ALREADY provided by the main query! No need for a range scan!
Peoplesoft has an index to facilitate that subquery that looks like this:
1 ASC EMPLID
2 ASC EMPL_RCD
3 DESC SYS_NC00163$
4 DESC SYS_NC00164$
and the two DESC columns are EFFDT and EFFSEQ, of course.
All that BETWEEN predicate is doing is confusing the CBO out of its normal path. Because the column in the BETWEEN is the leading column of a concatenated index. And given that our CBO is already brain-damaged with about half a dozen "_" parameters suggested by Support, it doesn't take much for it to go South!
Indeed that was the case. We'd have this thing cached in SGA for a while and running fine, then all of a sudden the next execution would take 30 times longer to complete!
From there onwards, it'd stay bad until we did a flush of the shared_pool, thereby forcing all future statements to do a re-parse and recalculate the execution plan. Then things would go back to normal for a while, with luck. After a few hours, BANG again...
Jonathan was kind enough to help me look at this a few days ago and the pointers and ideas in his site for further investigation were most helpful. Thanks a million guv, if you happen to read this: I owe you a drink of your choice, next you're in Sydney!
As often happens, Tom Kyte had a recent and most relevant couple of entries in his blog that are also applicable here. Do yourself a favour and read both entries and the comments.
Basically, what all pointed out is that the plan as shown for the statement WITH the bind variables is different from the plan as executed WHEN the values for the bind variables are actually given.
And different again at some later stage when other sets of variable values were used.
Good old bind-variable peeking, as Gary alluded to in a comment to my last entry. Normally, not a problem at all. Unless the values being picked cause my handicapped CBO to go South with a redundant BETWEEN predicate, that is!
And indeed that was the problem here. With some combos of bind variables we got a horrible plan that took the BETWEEN predicate as the access path instead of the filter path. That basically stopped the selectivity of the index on the first column of the concatenation, instead of using all columns.
The symptom was a huge - 400000000!- bump in the BUFFER_GETS, caused by all the index range scanning taking place for every row in the main query.
Sometimes we'd be lucky enough to get the first execution to pick a good plan and it'd stay in memory until flushed out by normal load patterns. Then we'd get a bad plan on the next execution - or good, depending on which combination of bind values would cause the CBO to favour the wrong predicate to look at the index.
Turns out this SQL uses one of those WRK tables I mentioned in the previous post. So what I did to prevent this problem was really very simple: add that table to the ones that I clean the stats from and lock away from further gather_table_stats().
That way, this plan is re-examined by the dynamic sampling every single time this single SQL is executed. As it happens, the plan picked this way is invariably one that involves a full scan of the PS_GP_RTO_TRGR table, which in turn is the one that causes the BETWEEN predicate to be ignored and all the other index columns to be used for the access path.
"Overhead of the re-parsing!", I hear you complain? In the grand scheme of things it's not really that bad: about 100 execs a day. Enough to cause trouble if each execution time is 30 minutes, but not enough to cause trouble if I force a re-parse for every execution and it finishes in seconds!
And lo and behold, our pay calc is now an example of good behaviour: longest monthly run for all employees is 7 minutes, as opposed to the previous erratic executions of up to 1 hour!
Anyways, on to lighter things.
Quick peek at one of the locals:
There are two of them nesting in the pond across the park. Amazingly graceful creatures!
These little fellas always crack me up with their rowdy antics:
They are indeed gorgeous, happy birds.
Samuel at the recent NSW regionals, smoking everyone else by about 10 metres in the 4th 100M heat:
He ended up 4th in the finals. Dang, at 12 years of age he's obliterated my best ever 100m time: I feel really old now...