Measuring Off-Shelf Alerting Performance

My model is better than yours.  Or is it?  How do you really know?  When a company says that their tool is “more accurate” than others, what do they mean?  If they can get 90% accuracy, is that good?

There is no standard for how to measure the value of an off-shelf alerting tool, so it’s easy to be misled.  Furthermore, there is no single measure that will do everything you want it to.  We need at least two measures, and I would suggest a few more.

Let’s start with “accuracy”:  the proportion of audited alerts that were found to be correct.
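
In code it is just a ratio.  A minimal sketch in Python, with hypothetical audit counts purely for illustration:

# Accuracy: proportion of audited alerts confirmed as genuine off-shelf positions.
alerts_audited = 1_000     # hypothetical: alerts physically checked in store
alerts_confirmed = 850     # hypothetical: alerts found to be genuinely off-shelf

accuracy = alerts_confirmed / alerts_audited
print(f"Alert accuracy: {accuracy:.1%}")   # -> Alert accuracy: 85.0%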

  • Remember that an off-shelf alerting tool cannot create an alert until a significant period of no sales has elapsed.  It will never capture all or even most off-shelf events.  It should capture the more persistent (and costly) ones overlooked by routine store operations.  So I’m not going to worry about the accuracy of what is not flagged and assumed to be on shelf; these tools will not get that right.
  • Also, you should use the most recent alerts available when conducting an audit: if your field organization did not conduct the audit using the most recent alerts, you are not measuring the accuracy of the system but some combination of accuracy and data degradation.
  • Audits must have consistent rules as to what constitutes “on shelf and available for sale”.  To my mind it’s only on-shelf if:
    • The product is on the correct shelf
    • The product is visible (not hidden behind other products or at the back of high/low shelves)
    • The product is within reach of the average shopper.
    • The product is not damaged.

Accuracy is a useful metric, but it is not enough. 

Consider an OSA system set up so that it generates alerts for product-store combinations that normally sell 10 or more units a week but have sold nothing for at least a month.  Over that month, “what you should have sold” is way beyond our basic rule-of-thumb threshold for an alert, and such a system would generate very, very few alerts.  However, when it did fire an alert, you could be almost certain there was an issue.  Accuracy would probably be close to 100%, but with almost no alerts we would capture very little value.  We need at least one balancing metric.
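
To see why the accuracy would be so high, here is a rough sketch of the odds, assuming something like a Poisson model of daily demand (an assumption for illustration; real tools will have their own models):

import math

# Hypothetical item selling 10 units a week, roughly 1.43 units a day.
expected_daily_sales = 10 / 7

# Under a Poisson assumption, the chance that a genuinely on-shelf,
# normally-selling item records zero sales for 28 straight days:
p_zero_month = math.exp(-expected_daily_sales * 28)
print(f"P(a month of zero sales while on shelf) ~ {p_zero_month:.1e}")   # ~ 4.2e-18

With odds like that, almost every alert is real; the problem is how few alerts ever fire.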

Ideally we want to know what proportion of the actual off-shelf positions we are capturing, but unless you have a very good OSA measurement scheme in place (more on that later) you just don’t have that information available.  I suggest a proxy:

Scope: the proportion of item-store combinations examined that yield an alert.

This is not quite as good, but at least we have the data readily available.  If we test 10 million item-store combinations and generate 300,000 alerts, that’s 3% scope.  Not bad.
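
The same arithmetic as a sketch, using the illustrative counts above:

# Scope: proportion of item-store combinations examined that yield an alert.
combinations_tested = 10_000_000
alerts_generated = 300_000

scope = alerts_generated / combinations_tested
print(f"Scope: {scope:.1%}")   # -> Scope: 3.0%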

(Scope is not sufficient by itself either, of course: imagine a system that flags all item-store combinations as off-shelf every day.  Scope is now 100%, but with almost zero accuracy.)

The following chart shows the performance of three OSA systems (X, Y, and Z).

System Y is clearly the most accurate, 5 points better than the next best, system X.  However, it also has the least scope: only 0.5% of all points of distribution (item-store combinations) get alerts.  In comparison, system Z generates 4 times as many alerts (and perhaps 4 times the value) with only a 6-point loss in accuracy.

A great tool will let you trade off scope for accuracy.  Reduce the confidence level used to generate alerts and accuracy will go down, but scope will increase.  Increase the confidence level and accuracy will go up, but scope will go down.  You can flex this to gain trust in the system in the initial rollout (through higher accuracy), then reduce accuracy and increase scope to get more value once trust is established.
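
As a rough sketch of how that lever could work, assuming the same illustrative Poisson-style zero-sales test as above (the actual mechanism will vary by tool):

import math

def should_alert(expected_daily_sales, days_without_sales, confidence=0.95):
    # Fire an alert when a run of zero-sale days this long would be
    # improbable (at the chosen confidence) for an on-shelf item.
    p_zero_run = math.exp(-expected_daily_sales * days_without_sales)
    return p_zero_run < (1.0 - confidence)

# Raising the confidence level demands a longer silence before alerting:
# fewer alerts (lower scope), but each one is more likely to be real.
for conf in (0.90, 0.95, 0.99):
    days_needed = next(d for d in range(1, 60) if should_alert(0.5, d, conf))
    print(f"confidence={conf:.2f}: alert after {days_needed} zero-sale days")

The model in a real tool will be more sophisticated, but the scope-for-accuracy trade works the same way.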

Off-shelf alert accuracy balanced by scope is a great start for looking at performance, but we really want to look at value and ROI; more on this soon.

This post is the ninth in a series on On-Shelf Availability.