Visualizing Forecast Accuracy. When not to use the "start at zero" rule ?

I recently joined a discussion on Kaiser Fung's blog, Junk Charts (the post "When to use the start-at-zero rule"), concerning when charts should force a 0 into the Y-axis. BTW, if you have not done so, add his blog to your RSS feed: it's superb and I have become a frequent visitor.

On this particular post, I would completely agree with his thoughts were it not for one metric I have problems visualizing: Forecast Accuracy. Forecast Accuracy is a very, very widely used sales-forecasting metric that is based on a statistical one, so let's start there.

The statistical metric (Mean Absolute Percentage Error) looks at the average absolute forecast error as a percentage of actual sales.    Some of the errors will be positive and some negative but by taking the absolute value we lose the sign and just look at the magnitude of error.  (We handle optimism or pessimism in the forecast with a different "bias" metric).  

There is occasionally heated discussion in the sales forecasting community about exactly how this should be calculated but let's save that for another day as all forms I am familiar with have the same properties with regard to plotting results.

  • perfect forecasts would have no error and return 0% MAPE, this is our base.
  • there is no effective upper bound on the metric (you can have 400% MAPE or more)
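As a rough illustration of those two properties, here is a minimal sketch of one common form of the calculation (the mean of the absolute errors, each taken as a fraction of actual sales). As noted above, other forms are in use, so treat the function name and the sample numbers as made up for the example rather than as the definitive calculation.

```python
def mape(actuals, forecasts):
    """Mean Absolute Percentage Error as a fraction (0.2 means 20%).

    One common variant only: the average of |actual - forecast| / actual.
    A period with zero actual sales would divide by zero, so it would need
    special handling that is not shown here.
    """
    if not actuals or len(actuals) != len(forecasts):
        raise ValueError("need two non-empty series of equal length")
    errors = [abs(a - f) / a for a, f in zip(actuals, forecasts)]
    return sum(errors) / len(errors)


# Made-up numbers: errors of 10%, 25% and 0% average out to about 11.7%
print(f"{mape([100, 200, 50], [110, 150, 50]):.1%}")

# A perfect forecast sits at the 0% base ...
print(mape([100, 200], [100, 200]))

# ... and there is no effective upper bound: forecasting 60 against
# actual sales of 10 is a 500% error
print(mape([10], [60]))
```

Because 0% is the true base of this metric, bar lengths drawn from zero are proportional to the error itself.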

If we were to look at this across a range of product groups (A through K) it might look something like this. The Y-axis is forced to start at 0 and the length of the bars has meaning: Product D really does have almost twice the error rate of Product A. This plots out very nicely, it's hard to misunderstand, and the start-at-zero rule certainly does apply.

Now convert MAPE into a Forecast Accuracy with this simple calculation.

Forecast Accuracy = 1 - MAPE

I can only assume this metric was created in the sense of "bigger numbers are better".  It's in widespread use, it's part of the business forecasting language, and no, I can't change it.  Perfect forecasts are now at 100% and there is no lower bound on the metric, it can easily be negative.

This causes me a problem.  Check out the chart below: this is the same data as before but now expressed as Forecast Accuracy rather than MAPE in a standard Excel chart.  Excel is trying to help (bless it) and put the 0 value in without my help.  Work in supply-chain and you will see a lot of these.

The zero value has no special meaning on this metric, so starting at 0 is very misleading:  80% accuracy (20% MAPE) is not twice as good as 40% accuracy  (60% MAPE).
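To make that arithmetic explicit, here is a small sketch. The function names are invented for illustration; the only relationship taken from the text is Forecast Accuracy = 1 - MAPE.

```python
def forecast_accuracy(mape):
    """Forecast Accuracy as defined above: 1 - MAPE (both as fractions)."""
    return 1.0 - mape


def error_ratio(accuracy_a, accuracy_b):
    """How many times more error B carries than A, recovered from accuracies."""
    return (1.0 - accuracy_b) / (1.0 - accuracy_a)


good = forecast_accuracy(0.20)  # 80% accuracy
poor = forecast_accuracy(0.60)  # 40% accuracy

# Bars drawn from zero suggest a 2:1 comparison ...
print(round(good / poor, 2))

# ... but the error rates behind them differ by 3:1
print(round(error_ratio(good, poor), 2))

# With no lower bound, a 150% MAPE gives a negative accuracy
print(forecast_accuracy(1.50))
```

The ratio a zero-based bar invites the reader to see (2 to 1) is not the ratio of the errors (3 to 1), which is the comparison that actually matters.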

Allowing the minimum of the y-axis to float  does not solve this either (below)

I really don't know what this is trying to tell me... some product groups are better than others perhaps ?  Certainly, relative size is meaningless.

"Abandon it" you say  "go to a line chart".  Line charts often have floating axes and they do not emphasize relative size nearly as much as a bar-chart does (below).

Perhaps it's less confusing/misleading than the previous charts, but I still don't like it, because there is data I want to compare relative sizes for (the MAPE) and line charts seem most useful when trying to show patterns. I have no reason to expect a useful pattern to form from product categories: I just sorted them alphabetically.

My thanks to the contributors on Junk Charts for helping me clarify my thinking on this. I don't know that there is a great answer but as it's one I run into all the time I do want to find a better solution. (FYI: it has just hit me that there is another set of supply-chain metrics, for order fill-rates, that has the exact same problem.)

How about forcing the upper limit on the Y-axis to 100% and letting the lower limit float? I am trying to emphasize the negative space between the top of the bar and 100%, essentially the error rate.

I'm not entirely happy though, those heavy bars do draw the eye, and you would have to educate the user to read the negative space.  How about a dot-plot instead ?

You would still have to learn how to read it properly ...

Or how about this?  Inspiration or desperation?  I'm now plotting the bars down from the 100% mark, emphasizing MAPE while still using the Forecast Accuracy scale.  I'm not entirely sure yet, but I think I like it and if I generalize the "start at 0" idea to "start at base" it may even fit the rule.
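The "start at base" idea can be sketched as a tiny helper that works out where each bar should sit. This is only an illustration: the helper name and the accuracy values are hypothetical, and it computes the bar geometry without drawing anything.

```python
def bar_from_base(value, base=1.0):
    """Return (bottom, height) for a bar drawn between `base` and `value`.

    With base=1.0 (100% accuracy) the bar length is the error rate (MAPE),
    even though the axis is still labelled in Forecast Accuracy.
    """
    return min(value, base), abs(base - value)


# Hypothetical Forecast Accuracy values, for illustration only
accuracy = {"A": 0.80, "B": 0.40, "C": -0.20}

for product, acc in accuracy.items():
    bottom, height = bar_from_base(acc)
    print(f"{product}: bar from {bottom:.0%} up to 100%, length {height:.0%}")

# The conventional start-at-zero bar is just the special case base=0.0
print(bar_from_base(0.20, base=0.0))
```

In a plotting library these two numbers would typically be passed as the bar's starting position and its length (in matplotlib, for example, the `bottom` and `height` arguments of `bar`).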

What do you think?  Which version best handles the compromise between a user's desire to see the metric they know and my desire to show them relative error rates?  Have you a better idea?  I would love to hear it - this one really bugs me !  Can you think of any other examples of metrics where 0 is meaningless?

Data Visualization - are pie-charts evil ?

I'll be speaking next week at the Supply Chain Management Conference at the University of Arkansas on how data-visualization enables action.   

Good visualization is easy. Unfortunately, building bad visualizations that are hard to use, easy to misunderstand and that obscure and distort the data is even easier: many analysts can do it without trying.

In honor of the event, I'm resurrecting a post I created a couple of years ago "Are pie charts evil or just misunderstood".  I wrote this around the time I was moving away from a trial and error approach  (and 20 years of trial and error effort does get you cleaner visuals) to attempting to understand why some visuals so clearly work better than others.  

It turns out that there are some great frameworks to help in building better visuals.  Join me next week and we'll talk about human graphical perception, chart junk and non-data ink.

Enjoy !

Data Visualization - enabling action

I'll be speaking next week at the Supply Chain Management Research Center Conference at the University of Arkansas on how data-visualization enables action.

The basic premise (and one I firmly believe) is that the hardest part of any analytic project is not defining the problem, doing the analytics or finding the "solution", it's enabling action.

Far too many otherwise excellent analytic projects, tools and reports go unused because the results are presented in a way that is somewhere between difficult-to-understand and incomprehensible.

Managers typically do not have the time to just figure it out or double check their understanding, or re-work the results to something they can work with.

By making your analytics easy to consume (through good visualization practice) you make it possible for decision-makers to find what is important, understand it correctly and make good decisions, quickly.

Frankly many analytics providers don't try very hard to make their results easy to consume and their outputs are confusing, hard to use, easy to misunderstand and a long, long way from enabling decisions.

For those that do try, there is a tension between making things look "cool" or "interesting" and having them function well. Ideally we want both, but very few examples deliver well on both fronts. Indeed, a lot of the attempts to provide interest seem to be designed to obfuscate or distort meaning.

Here are some examples I plucked from a leading visualization vendor's web site. Each and every one of these charts is difficult to read because of limitations in our visual perception. We'll talk more about that in the conference next week.

And trying to make charts more interesting/attractive/eye-catching typically makes things worse. This "Funnel Chart" (below) is hilarious! It's being terribly misused and gets almost everything wrong. I defy you to use this and make sensible decisions.

  • Color serves no purpose
  • It's very unclear whether values are represented by length, area or volume (thank goodness they included numbers)
  • The top value is (visually) about 100 times bigger than the bottom one but actually less than 5 times bigger in value.
  • I need another legend to tell me where all these regions are
  • Why, exactly, is it a funnel ? What does that imply? The NorthEast feeds the South which feeds into Central...
  • It has no contextual information. Perhaps Northwest is the smallest because that is our smallest market ?

Here's an example we will be working with at the conference. It's very hard to read, slow to use, easy to make mistakes with and distinctly over-dressed.

And exactly the same data once it's been stripped bare (below). It's now easy/quick to read, practically error-proof, has no distracting "chart junk" and has contextual data (budget) to understand what "good" is.

My interest in visualization is in enabling action from my analytic work. As a consultant, you may think that I get paid whether a client implements my work or not. That may be true, but I like to get paid more than once by the same client.

If you're going to be at the conference next week, drop by and see me: Supply Chain, Analytics and Visualization are among my favorite discussion topics.

I'll be posting more on this over the next few months but if you're looking for more right now, here are some excellent resources:

Stephen Few's blog, Visual Business Intelligence

Kaiser Fung's blog, Junk Charts

Nathan Yau's blog, Flowing Data

Business Analytics - finding the balance between complexity and readability

In this blog I try to present analytic material for a non-analytic audience.  I focus on point of sale and supply chain analytics: it's a complex area and frankly, it's far too easy whether writing for a blog or presenting to a management-team to slip into the same language I would use with an expert.  

So, I was inspired by a recent post on Nathan Yau's excellent blog FlowingData to look at the "readability" of my own posts and apply some simple analytics to the results.

I've followed Nathan's blog for a couple of years now for the many and varied examples of data-visualization he builds and gathers from other sources. One that particularly caught  my eye was this one published by the  Guardian just before the recent State of the Union address in the United States (click to enlarge).

The Guardian plotted the Flesch-Kincaid grade levels for past addresses. Each circle represents a state of the union and is sized by the number of words used. Color is used to provide separation between presidents. For example, Obama's state of the union last year was around the eighth-grade level, and in contrast, James Madison's 1815 address had a reading level of 25.3.

Neither the original post nor Nathan's goes into much detail around why the linguistic standard has declined. Within this period, the nature of the address and the intended audience have certainly changed. Frankly, having scanned a few of the earlier addresses I think we can all be grateful not to be on the receiving end of one of them.

So, I was inspired to find out the reading level of my own blog. It's intended to present analytic concepts to a non-analytic audience. I can probably go a little higher than recent presidential addresses (8th-10th grades, roughly ages 13-15) but I don't want to be writing college-level material either.

All the books my kids read are graded in this (or a very similar) way but I had never thought about how such a grading system could be constructed. The Flesch-Kincaid grade level estimate is based on a simple formula:


0.39 \left ( \frac{\mbox{total words}}{\mbox{total sentences}} \right ) + 11.8 \left ( \frac{\mbox{total syllables}}{\mbox{total words}} \right ) - 15.59

That's just a linear combination of : 

  • average words per sentence;
  • average syllables per word
  • a constant term.
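The formula is simple enough to sketch in a few lines. The grade function below follows the formula as written. The syllable counter is a crude vowel-group heuristic added purely for illustration (an assumption, not how the online utility or any official implementation counts), so scores from raw text will only be approximate.

```python
import re


def fk_grade(total_words, total_sentences, total_syllables):
    """Flesch-Kincaid grade level from raw counts, per the formula above."""
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


def count_syllables(word):
    """Rough estimate: count vowel groups and discount a trailing silent 'e'.

    This is a heuristic assumption and will miscount some English words.
    """
    word = word.lower()
    groups = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and groups > 1:
        groups -= 1
    return max(1, groups)


def fk_grade_text(text):
    """Approximate grade for a non-empty passage, using the heuristic above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)  # hyphenated terms split in two
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade(len(words), len(sentences), syllables)


# 100 words in 5 sentences with 150 syllables:
# 0.39 * 20 + 11.8 * 1.5 - 15.59
print(round(fk_grade(100, 5, 150), 2))  # 9.91
```

Keeping the count-based function separate from the text handling means a different syllable counter, or a different readability index, can be swapped in without touching the rest.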

In fact (though I have not yet  found details of how it was constructed) it looks to be the result of a regression model.  (Simple) data science in action from the 1970's.

Note that Flesch-Kincaid says nothing about the length of the book or the nature of the vocabulary: it's all down to long sentences and the presence of multi-syllabic words.

(BTW - the preceding sentence has a Flesch-Kincaid grade score of 13.63, calculated with this online utility). Now that's pretty high, worthy of an early 1900's president and (supposedly) understandable by young college students. The sentence is longer than typical: 31 words vs. my average of 18 (see below), and words like "vocabulary", "sentences" and "multi-syllabic" are not helping me either.
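That score can be reproduced by hand. The 31 words come from the text (the two hyphenated terms count as two words each). The 45 syllables are a hand count and therefore an assumption, not a number reported by the utility.

```latex
\begin{aligned}
\mbox{grade} &= 0.39 \left( \frac{31}{1} \right) + 11.8 \left( \frac{45}{31} \right) - 15.59 \\
             &\approx 12.09 + 17.13 - 15.59 \\
             &\approx 13.63
\end{aligned}
```

At the blog's average of 18 words per sentence, with the same syllables per word, the first term drops to 0.39 × 18 = 7.02 and the grade to about 8.56. On these counts, sentence length alone accounts for roughly five grade levels.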

Approach

I could have copied and pasted each post into the online utility I used above, recorded the results in a spreadsheet and pulled some stats from that. That would work, but if I ever want to repeat the exercise or modify it, perhaps to use a different readability index, I must do all that work again. At the time of writing, there are 44 published posts on this blog - there must be a better way.

Actually there are probably many better ways but as I also wanted to flex some R-programming muscle I built a web-scraper in R to do the work for me and analyze the results (more on this in a later post).

Results

Let's start with some simple summaries of the results I collected.

Histograms showing the % of posts from this blog (prior to 2/14/13), the average (mean) value shown in red.

There is some variety in the grade reading level indicated by Flesch-Kincaid for my blog posts, averaging around 10 but ranging from 7 through 14.  I average about 750 words, but occasionally go much longer and have a number of very short "announcement" style posts.  Average words per sentence of 18.

OK, so now I know, but is that good? I don't know that I have a definitive source but according to at least one source the target range on Flesch-Kincaid for Technical or Industry readers is 7-12, so I'm feeling pretty good about that.

I did wonder whether there was any other, hidden, structure to the data though. I know the equation is based on words per sentence and syllables per word, so there is no point looking at those: obviously I'll find a relationship. But is my writing style influenced by anything else?

Flesch-Kincaid grade level vs. the number of words by post on this blog.

Other than a handful of long posts that rate lower, in the range 8-10, I don't see much going on here.

Flesch-Kincaid grade level vs. the publication date by post on this blog. 

 The size of each post (in words) is shown by the area of each point, color is used purely to help visually differentiate each of the points.  Apart from a couple of recent "complex" posts  this does seem to be showing a trend, so I added a regression line and labeled the more extreme posts.  Point (b) is a very short "announcement" style post (you can hardly see the point at all) and I could probably ignore it completely.  Point (e) is a more fun piece I did around using pie-charts that's probably not very representative of the general material either.

If you want to compare readability for yourself here are the top (and bottom) posts ranked by Flesch-Kincaid grade level

| Rank | Post | Flesch-Kincaid grade level | Words | Sentences |
|------|------|----------------------------|-------|-----------|
| 1 | Analytic tools "so easy a 10 year-old can use it" | 13.3 | 784 | 33 |
| 2 | Point of Sale Analytics - newsletter released | 13.1 | 82 | 4 |
| 3 | Point of Sale Data – Category Analytics | 12.8 | 676 | 29 |
| 4 | How to save real money in truckload freight (Part I) | 12.8 | 723 | 31 |
| 5 | The Primary Analytics Practitioner | 12.7 | 541 | 29 |
| 6 | Reporting is NOT Analytics | 12.4 | 891 | 43 |
| 7 | Point of Sale Data – Sales Analytics | 12.1 | 478 | 24 |
| 8 | Data handling - the right tool for the job | 11.9 | 762 | 38 |
| 9 | Data Cleansing: boring, painful, tedious and very, very important | 11.8 | 297 | 16 |
| 10 | Point of Sale Data – Supply Chain Analytics | 11.6 | 958 | 41 |
| ... | | | | |
| 35 | The right tools for (structured) BIG DATA handling | 9.0 | 1878 | 114 |
| 36 | Better Point of Sale Reports with "Variance Analysis": Velocity... | 8.9 | 1264 | 78 |
| 37 | Better Point of Sale Reports with Variance Analysis (update) | 8.5 | 177 | 10 |
| 38 | Better Business Reporting in Excel - XLReportGrids 1.0 released | 8.4 | 70 | 5 |
| 39 | What's driving your Sales? SNAP? | 8.3 | 651 | 42 |
| 40 | Do you need daily Point of Sale data?... | 8.2 | 1395 | 83 |
| 41 | SNAP Analytics (1) - Funding and spikes. | 8.1 | 531 | 32 |
| 42 | SNAP Analytics (2) - Purchase Patterns | 7.9 | 773 | 44 |
| 43 | Business Analytics - The Right Tool For The Job | 7.6 | 483 | 36 |
| 44 | Are pie charts truly evil or just misunderstood ? | 7.1 | 1097 | 70 |
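The word and sentence counts in the table are enough to see what drives the ranking. The sketch below uses only the figures listed above. It pools words per sentence for the ten highest- and ten lowest-graded posts, then inverts the grade formula to back out an implied syllables-per-word figure. That inversion is an approximation: the grades are rounded to one decimal and the scraper may count words differently from the formula's authors. The function names are made up for the example.

```python
# (rank, grade, words, sentences), copied from the table above
TOP_10 = [
    (1, 13.3, 784, 33), (2, 13.1, 82, 4), (3, 12.8, 676, 29),
    (4, 12.8, 723, 31), (5, 12.7, 541, 29), (6, 12.4, 891, 43),
    (7, 12.1, 478, 24), (8, 11.9, 762, 38), (9, 11.8, 297, 16),
    (10, 11.6, 958, 41),
]
BOTTOM_10 = [
    (35, 9.0, 1878, 114), (36, 8.9, 1264, 78), (37, 8.5, 177, 10),
    (38, 8.4, 70, 5), (39, 8.3, 651, 42), (40, 8.2, 1395, 83),
    (41, 8.1, 531, 32), (42, 7.9, 773, 44), (43, 7.6, 483, 36),
    (44, 7.1, 1097, 70),
]


def words_per_sentence(rows):
    """Pooled words per sentence across a group of posts."""
    return sum(r[2] for r in rows) / sum(r[3] for r in rows)


def implied_syllables_per_word(grade, words, sentences):
    """Solve the Flesch-Kincaid formula for syllables per word."""
    return (grade + 15.59 - 0.39 * words / sentences) / 11.8


print(round(words_per_sentence(TOP_10), 1))     # 21.5
print(round(words_per_sentence(BOTTOM_10), 1))  # 16.2

# Implied syllables per word for the highest- and lowest-ranked posts
for rank, grade, words, sentences in (TOP_10[0], BOTTOM_10[-1]):
    print(rank, round(implied_syllables_per_word(grade, words, sentences), 2))
```

On these numbers the higher-graded posts use both longer sentences and longer words, so neither factor alone explains the spread.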

Conclusions

It appears that my material is (largely) written at a level that should be accessible to the reader.

 And I am using more readable language in recent blogs which sounds like a good thing.

But there remains a key question for me that these stats can't really answer.

Am I getting better at explaining the complex (my goal) or just explaining simpler things ? What do you think ?

In case you are wondering, this post has a Flesch-Kincaid grade level of about 8.  So if you can follow the "State of the Union" address you should have been just fine with this.