Tuesday, December 31, 2013

WebFOCUS and R for Dynamic Statistical Analysis

Part I
You can leverage the best of two powerful software products by combining the WebFOCUS Business Intelligence language with the R statistical programming language.

WebFOCUS from Information Builders provides you with robust BI capabilities such as web access, highly dynamic scripting, and far-reaching enterprise data access. The open-source R language adds sophisticated statistical analysis, data visualization, and access to web data content.

In this multi-part article, I will show you how easy it is to integrate the two products. 

To help explain this, let's look at a simple application where the two products work together.

I have a text analytics software product called the BI Analyzer, which we use in large legacy modernization initiatives as a preliminary assessment tool. The software mines the applications' text for important keywords and loads the scan results into an inventory database (imagine a small Google engine specializing in indexing your custom computer applications), from which WebFOCUS and R can perform analytics.

Using statistical analysis, I want to determine the complexity of each scanned procedure. I use WebFOCUS' auto-prompting features to generate a simple user interface of options (I could make this a nicer-looking web page if I wanted, but it meets my needs).

To determine a procedure's complexity, I look at various keywords found inside all of the procedures and, based on their "hit" frequencies, categorize each procedure into one of four buckets: Low, Somewhat Low, Medium, and High. 
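To make the bucketing idea concrete, here is a minimal sketch in Python (used only as a neutral illustration; the actual application does this work in WebFOCUS and R). It assumes the four buckets are split at the quartile cut points of the hit counts, which is one plausible reading of the approach rather than the exact rule used. The function name and the sample procedure names are hypothetical.

```python
import statistics

def complexity_buckets(hits_by_procedure):
    """Assign each procedure to one of four buckets using quartile cut points.

    hits_by_procedure maps a procedure name to its keyword hit count.
    Assumption: bucket boundaries are Q1, the median, and Q3.
    """
    counts = list(hits_by_procedure.values())
    # Three cut points split the hit counts into four groups.
    q1, q2, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    labels = {}
    for name, hits in hits_by_procedure.items():
        if hits <= q1:
            labels[name] = "Low"
        elif hits <= q2:
            labels[name] = "Somewhat Low"
        elif hits <= q3:
            labels[name] = "Medium"
        else:
            labels[name] = "High"
    return labels

# Hypothetical scan results: procedure name -> keyword hit count
sample = {"rpt_a": 2, "rpt_b": 5, "rpt_c": 9, "rpt_d": 14, "rpt_e": 40}
labels = complexity_buckets(sample)
```

With real data, ties at a cut point fall into the lower bucket because of the `<=` comparisons; a different tie rule would be just as defensible.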

In addition to showing this breakout in tabular form, I want to display a BoxPlot graph, which provides great quartile visualization. 
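For readers new to box plots, the sketch below computes the numbers such a graph is typically drawn from: the quartiles, the interquartile range, and any points beyond the common 1.5 x IQR fences. It is written in Python purely for illustration, the function name is hypothetical, and R's own boxplot() may compute its hinges slightly differently on some data sets.

```python
import statistics

def boxplot_summary(values):
    """Return the summary numbers a conventional box plot is built on."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Common convention: points more than 1.5 * IQR outside the box are outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "iqr": iqr,
        "outliers": outliers,
    }

# Hypothetical keyword hit counts for five procedures
summary = boxplot_summary([2, 5, 9, 14, 40])
```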

Using the options I choose from the web user interface, WebFOCUS dynamically generates and executes both WebFOCUS and R scripts. It then displays the consolidated results on the web page (again, this is a very simple application to which I could add more presentation sizzle if needed).  

Notice in the screenshot that I am searching for any Crystal Reports procedures that use JavaServer Pages (identified by a scanned keyword that contains ".jsp"). I only want to consider reports that are active, ignoring any in the scanned inventory database that were flagged as obsolete. I'm picking the BoxPlot option but have others such as Histogram and Plot.
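The selection logic behind those options can be sketched as a simple filter. The Python below is only an illustration: the record layout (procedure, tool, keyword, obsolete) is an assumed stand-in for the inventory database, whose real schema is not shown here, and the function name is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ScanHit:
    """One keyword hit from the scan (hypothetical record layout)."""
    procedure: str
    tool: str
    keyword: str
    obsolete: bool

def select_procedures(hits, tool, keyword_fragment):
    """Names of active procedures for one tool whose keywords contain the fragment."""
    fragment = keyword_fragment.lower()
    return sorted({
        h.procedure
        for h in hits
        if h.tool == tool
        and not h.obsolete               # ignore procedures flagged as obsolete
        and fragment in h.keyword.lower()
    })

# Hypothetical scan results
hits = [
    ScanHit("sales.rpt", "Crystal Reports", "viewer.jsp", False),
    ScanHit("old.rpt", "Crystal Reports", "legacy.jsp", True),
    ScanHit("hr.rpt", "Crystal Reports", "SELECT", False),
    ScanHit("inv.fex", "WebFOCUS", "menu.jsp", False),
]
selected = select_procedures(hits, "Crystal Reports", ".jsp")
```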

WebFOCUS produces a tabular report and then combines it with the R graph. 

Picture of WebFOCUS/R User Interface

Had I selected "Update" instead of "View," the procedure would have used the results of its quartile analysis to update the inventory database, flagging each of the scanned procedures referencing JavaServer Pages with its calculated complexity ranking. Later, other reporting features would use this information to calculate time and cost estimates for the legacy BI conversion project.

Here is a summary of what the WebFOCUS procedure does for me: 
  • Via a web browser, interacts with the user
  • Based on the user's selected options, extracts data
  • Dynamically builds a WebFOCUS report script based on the user's selections
  • Runs the script and creates a report
  • Dynamically builds an R graph script based on the user's selections
  • Calls R to run the script and create a graph (using the extracted WebFOCUS data)
  • Displays the WebFOCUS and R results on the screen
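
The "build an R script, then call R" steps above can be sketched as follows. This is not the WebFOCUS procedure itself; it is a Python stand-in showing the general pattern of generating script text from user selections and handing it to R's command-line front end, Rscript. The file names, the HITS column name, and the function names are assumptions made for illustration.

```python
import subprocess

# Map each user-interface option to the R call that draws it.
# HITS is a hypothetical column name in the extracted data.
GRAPH_CALLS = {
    "BoxPlot": 'boxplot(hits$HITS, main = "{title}")',
    "Histogram": 'hist(hits$HITS, main = "{title}")',
    "Plot": 'plot(hits$HITS, main = "{title}")',
}

def build_r_script(graph_type, csv_path, png_path, title):
    """Return R script text for the chosen graph type.

    Assumption: the inputs are trusted and contain no double quotes,
    so no escaping is attempted in this sketch.
    """
    if graph_type not in GRAPH_CALLS:
        raise ValueError(f"Unsupported graph type: {graph_type}")
    lines = [
        f'hits <- read.csv("{csv_path}")',   # load the extracted data
        f'png("{png_path}")',                # send graphics output to a PNG file
        GRAPH_CALLS[graph_type].format(title=title),
        "dev.off()",                         # close the file so it is written
    ]
    return "\n".join(lines) + "\n"

def run_r_script(script_path):
    """Run a saved script with Rscript, which must be on the PATH."""
    subprocess.run(["Rscript", script_path], check=True)

script_text = build_r_script("BoxPlot", "hits.csv", "complexity.png", "Complexity")
```

Keeping the script generation separate from the call to R makes the generated text easy to inspect or log before anything is executed.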

Visually, the interaction between WebFOCUS and R looks something like this: 

Picture of WebFOCUS and R Interaction

In Part II, I share the technical details of how WebFOCUS and R work together.

Friday, December 27, 2013

Free Business Intelligence Software: MicroStrategy Analytics Desktop

Here's a hot deal on Business Intelligence software. MicroStrategy is offering a free download of their Analytics Desktop product.

If you do not want to install software on-premises, MicroStrategy has another option: their Express SaaS product is free for the first year.

Wait, there's more! You can get their full-blown Enterprise product as a 30-day evaluation version with the free Reporting Suite.

Click on this link for more information.

Tuesday, December 10, 2013

You Too Can Be Sexy! Using R for Descriptive Statistics and Predictive Analytics

Information Builders has pushed back its deadline for submitting papers for the upcoming Summit 2014 being held in Orlando, Florida. If you are interested in presenting, you now have until the 20th of December to provide your topic to the BI software vendor. 

Submitting a presentation is easy. Just come up with a catchy title and subtitle such as: 
"You Too Can Be Sexy!: Integrating R with WebFOCUS for Descriptive and Predictive Statistics"

Follow that with a few paragraphs describing the main ideas of your presentation, making sure everything jibes with Information Builders' hot market topics of analytics, Big Data, Business Intelligence, etc. 

Here is a sample synopsis:  
"Experts call data analysis the sexiest job of the 21st century. In this era of Big Data, a command of statistics and analytic tools correlates with job security and high income. Yet within five years, the U.S. may struggle with over a million unfilled jobs needing deep analytic skills.

Here is your chance to augment your WebFOCUS know-how with the open-source R language, which is seeing a dramatic increase in job demand. Doug Lautzenheiser will cover industry trends, using R for descriptive and predictive statistics, integration with WebFOCUS, and resources to help you build R skills."

For a presenter bio, provide a line or two: 
"In 1992, Doug Lautzenheiser first started presenting at Information Builders' annual users' group conferences with his 'How to Get Your Maserati out of the Traffic Jam.' Over two decades later, Doug still loves to use catchy titles."

The real work will come after Information Builders approves your presentation. In preparation, have a good idea of the topics you plan to cover (remember, you only have one hour).

Here is an outline example to go with my "Be Sexy" theme:  
Industry Trends
  • Big Data
  • Predictive Analytics
The R product
  • History
  • Downloading and installing
  • Basics of the language
  • Reading in data
  • Summarizing the data
  • Source command for running scripts
  • Packages from Comprehensive R Archive Network (CRAN)
Descriptive Statistics and R
  • Terms (Population, Sample, Observation, Variable, Dataset)
  • Measures of Center: Mean, Median, Mode
  • Measures of Spread: Range, Interquartile Range (IQR), Variance, Standard Deviation
  • Distribution, p-value, Confidence Interval
  • Measures of Shape: Skewness (positive or negative), Kurtosis (mesokurtic, platykurtic, leptokurtic)
  • Bar graph, Scatter plot, Boxplot, Histogram
  • Central Limit Theorem
  • Tests of Association: Correlation, Regression
  • Tests of Inference: Chi-Square, t-test (independent samples, correlated samples), Analysis of Variance  
Predictive Statistics and R
  • The statistical and analytical techniques used to develop models that predict future events or behaviors
  • Examples
  • Linear regression, Logistic regression, decision trees
  • Steps
Integration with WebFOCUS 
  • Using RStat
  • Using Dialogue Manager  
Getting Help
  • Classes (e.g., Coursera)
  • Books (e.g., "Learning R" from O'Reilly)
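
To give a flavor of the "Descriptive Statistics" portion of the outline above, here is a small sketch of the measures of center and spread it lists. It uses Python's standard library only as a stand-in for illustration; in the presentation itself these would be shown with R. The function name and sample data are hypothetical.

```python
import statistics

def describe(values):
    """Measures of center and spread for a sample of numbers."""
    data = sorted(values)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": data[-1] - data[0],
        "variance": statistics.variance(data),  # sample variance (n - 1 denominator)
        "stdev": statistics.stdev(data),        # sample standard deviation
    }

stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
```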

You will need to finalize your presentation a month or two before the event, which starts the first week of June. Information Builders will contact you when the deadline approaches.

If you are not interested in presenting, be sure to attend and see my presentation (assuming it gets approved, of course). You too can be Sexy!

