Better Business Analytics - Christmas list

It's that time of year again: my kids have written, re-written and re-re-written their Christmas lists now so we all hope Santa will read them carefully and take notice.

With just a few days left before the holiday season hits I wanted to do something a little more fun, so I've pulled together a list of things that I think every Business Analyst should want.  Some are free to acquire, just costing your time to learn, others you may wish to ask Santa (or your CFO) to provide.

64 bit Operating system, applications and more memory

32 bit operating systems cannot recognize more than 4 GB of computer memory (RAM), regardless of how much you load onto your hardware.

 Forget "Big Data" for a moment - you can fill 4GB relatively easy with desktop tools - if you want to do any serious data processing on your laptop/desktop environment you will need more than that.  RAM is cheap, easy to upgrade and most modern laptops will take up to 8GB without any problem.

Max out your RAM

Note: 8GB of RAM for the laptop I am writing this on is currently $41.99 at www.crucial.com

Hard-drive upgrade

Solid State Disks (SSD) provide huge speed improvements over their spinning, hard-drive counterparts.  If you are are crunching numbers from file or a local database the hard-drive may well be slowing you down.  A modest investment (the 480 GB Sandisk SSD is currently available on Amazon for $360) can save you a lot of time.

Serious computing power

If you have access to a high powered server this may be of less use, but I was surprised recently to find out just how much computing power you can now get in a desktop workstation.  This beast with 2 CPUs (32 cores), 128 GB of RAM and 2TB of super-fast SSD hard-drive costs a little over $6000.  That's an awful lot of power (more than most servers I've worked with)  for the price of 2-3 good laptops.

Excel 2010 with PowerPivot.

Excel is a superb tool for interacting with your data for prototyping and occasionally for delivering models and results.  It can't do everything and for many problems you will need to turn to more specialized tools but any analyst that tells you they don't use it at all is probably not a very effective analyst.  With the ability to write VBA, to customize and embed your own calculations, it can be a very powerful modeling tool indeed.

With Excel 2010 and the free  PowerPivot  add-in, Excel now has the capability to embed  powerful BI capabilities too.  Integrate data from different sources, load enormous amounts of data, vastly more than the worksheet limit of about 1 million records and define more complex measures and functions to apply to this data with DAX (the Data Analytic eXpressions language).  

If you are not already there upgrade now to 2010 - make sure its the 64 bit version to blow past the 4GB memory limitation.

Note: Office 2013 may be just around the corner, but with Microsoft's latest offer if you buy now (Oct'12 thru April '13), you get Office 2013 for free when it's released.  Microsoft's Office Pre-Launch Offer

Time to learn R for statistical computing

OK - I know - those of you who are heavily invested in another package (SAS, SPSS, Statistica, ...) do not get this.  Those are great tools and why should you change?  Well, I'm not suggesting you swap one package for another - I'm suggesting that there is room in your head for more than one tool and R has a lot going for it.

Pros

  • According to Revolution Analytics R is now available as an embedded, in-database analytics engine for Oracle,  SAP HANA., Neteeza, TeraData and others.  This is a very exciting development allowing advanced statistical capabilities to be called from within the database from SQL. Handle routine analytics in-line and on-demand.
  • It's free - really - free.  R is open source tool  you can download from http://www.r-project.org/
  • It has packages to do an enormous variety of statistical analyses, you will probably not need or use even 90% of them.
  • It has great graphical capabilities for visualization.
  • It's callable from .NET languages too via RDotNet.  The interface is not well documented, but it does work.

Cons

  • The user interface is "ugly" unless you love command line applications - I don't.  RStudio does help with this.
  • If you are not used to vector based programming languages it may take you a while to grasp.  (try "The Art of R Programming" for help.)
  • There are a lot of commands to remember, I use my cheat sheet a lot.

Time to learn SQL

 In reality the time to learn  SQL was probably about 20 years ago, but if you have not yet done so, catch up quickly.  SQL (structured query language) is the programming language for relational databases.  Whenever you interact with a database it is some dialect of SQL that interacts with the database to select update or insert your data efficiently.

Most of your data will now come to you from a database of some form.  It is a common requirement to integrate data from multiple sources (without losing any), filter, aggregate and sort as a precursor to more advances analytic routines.  This is what the databases does superbly well - if you're doing this in Excel or indeed any non-SQL environment, you are working to hard.

While you are at it, get a good grasp of database design principles and normalization too.

2013 - the year for Column Oriented Databases ?

Column storage is not a new idea in relational databases, but perhaps it's an idea that is about to mature.  Regular, relational databases physically store rows of data together which is very efficient for transactional databases that update and retrieve a few rows at a time.

column-oriented database stores entire columns of data together.   This is very efficient for systems  where aggregates are computed over large numbers of similar rows - such as in data warehouse reporting and analytics Notably, SQL Server 2012 released its first direct support for column storage this year.  It's not perfect, particularly as the implementation limits how you can write to the database, but it is fast.  My own testing on a large database showed a 10-fold increase in speed on large aggregation queries and almost linear scaling as the number of fields chosen (and hence the number of columns that must be accessed) changed.

I've also had excellent performance from InfiniDb available as a free "community edition" though I suspect I have not fully tested it's capability with tables that only have a few hundred million records :-)

Column Storage is exceptionally fast and could allow for significant changes in system architecture for reporting and analytic tools.  How about POS analytics?  Straight from database to reports without the need for any custom-architecture, pre-aggregation, cubes or waiting on batch processing?  Want to update a few Item attributes and re-run your reports now?  No problem !  (Embed in-database analytic capability so you can do more than add/subtract/multiply/divide and you have a serious number-crushing platform).

The #1 item for your list - a (much) bigger dry erase board.

At times, it seems I can't think clearly without a dry-erase board and pen.  My own workhorse is a 8'x4' board that sees heavy daily use, but bigger ideas need more space.  Amazon has a dry-erase paint on offer that I may have to try - enough for about 30 linear feet of wall for $500.

Jumbo Whiteboard Dry Erase Paint Clear 240 Sq Ft

What do you think?

So what do you think should be on every good Business Analyst's Christmas list?   Am I missing this year's top gift?  What should a "naughty" analyst receive - more bad data?  Let me know in the comments section.

P.S.  Santa - if you're reading this, I rather like this T-shirt from CafePress.com too.