Tuesday, December 31, 2013

WebFOCUS and R for Dynamic Statistical Analysis

Part I
You can leverage the best of two powerful software products by combining WebFOCUS Business Intelligence language with the R statistical programming language.

WebFOCUS from Information Builders provides you with robust BI capabilities such as web access, highly dynamic scripting, and far-reaching enterprise data access. With the R open-source offering, you get sophisticated statistical analysis, data visualization, and access to web data content.

In this multi-part article, I will show you how easy it is to integrate the two products. 

To help explain this, let's look at a simple application where the two products work together.

I have a text analytics software product called the BI Analyzer, which we use in large legacy modernization initiatives as a preliminary assessment tool. The software mines the applications' text for important keywords and loads the scan results into an inventory database--imagine a small Google engine specializing in indexing your custom computer applications--from which WebFOCUS and R can perform analytics.

Using statistical analysis, I want to determine the complexity of each scanned procedure. I use WebFOCUS' auto-prompting features to generate a simple user interface of options (if I wanted, I could make this a nicer looking web page, but it meets my needs).

To determine a procedure's complexity, I look at various keywords found inside all of the procedures and, based on their "hit" frequencies, categorize each procedure into one of four buckets: Low, Somewhat Low, Medium, and High. 

In addition to showing this breakout in tabular form, I want to display a BoxPlot graph which provides great quartile visualization. 

Using the options I choose from the web user interface, WebFOCUS dynamically generates and executes both WebFOCUS and R scripts. It then displays the consolidated results on the web page (again, this is a very simple application to which I could add more presentation sizzle if needed).  
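To make this concrete, here is a minimal sketch of the kind of R script WebFOCUS might generate and hand off. The file name, column name, and exact cut points are my own stand-ins; the real script is built dynamically from the user's selections and the extracted inventory data.

  # Hypothetical extract from WebFOCUS: one row per scanned procedure with a keyword "hit" count.
  hits <- read.csv("procedure_hits.csv", stringsAsFactors = FALSE)

  # Bucket each procedure into one of the four complexity groups by quartile.
  q <- quantile(hits$keyword_hits, probs = c(0, 0.25, 0.5, 0.75, 1))
  hits$complexity <- cut(hits$keyword_hits, breaks = q, include.lowest = TRUE,
                         labels = c("Low", "Somewhat Low", "Medium", "High"))

  # Tabular breakout plus the BoxPlot for quartile visualization.
  print(table(hits$complexity))
  png("complexity_boxplot.png")
  boxplot(hits$keyword_hits, main = "Keyword Hits per Procedure")
  dev.off()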

Notice in the screenshot that I am searching for any Crystal Reports procedures which use JavaServer Pages (identified by a scanned keyword that contains ".jsp"). I only want to consider those reports which are active, ignoring any in the scanned inventory database that were flagged as obsolete. I'm picking the BoxPlot option but have others available, such as Histogram and Plot.

WebFOCUS produces a tabular report and then combines it with the R graph. 

Picture of WebFOCUS/R User Interface


Had I selected "Update" instead of "View," the procedure would have used the results of its quartile analysis to update the inventory database, flagging each of the scanned procedures referencing JavaServer Pages with its calculated complexity ranking. Later, other reporting features would use this information to calculate time and cost estimates for the legacy BI conversion project.

Here is a summary of what the WebFOCUS procedure does for me: 
  • Via a web browser, interacts with the user
  • Based on the user's selected options, extracts data
  • Dynamically builds a WebFOCUS report script based on user's selections
  • Runs the script and creates a report
  • Dynamically builds an R graph script based on user's selections 
  • Calls R to run the script and create a graph (using the extracted WebFOCUS data)
  • Displays the WebFOCUS and R results on the screen


Visually, the interaction between WebFOCUS and R looks something like this: 

Picture of WebFOCUS and R Interaction



In Part II, I share the technical details of how WebFOCUS and R work together.

Before you read that next section, I'm curious as to what you think. Do you see value in using WebFOCUS and R together?

Please leave your comments here. 

Friday, December 27, 2013

Free Business Intelligence Software: MicroStrategy Analytics Desktop

Here's a hot deal on Business Intelligence software. MicroStrategy is offering a free download of their Analytics Desktop product.

If you do not want to install software on-premise, MicroStrategy has another option: their Express SaaS product free for the first year.

Wait, there's more! You can get their full-blown Enterprise product as a 30-day evaluation version with free Reporting Suite.

Click on this link for more information.


Tuesday, December 10, 2013

You Too Can Be Sexy! Using R for Descriptive Statistics and Predictive Analytics

Information Builders has pushed back its deadline for submitting papers for the upcoming Summit 2014 being held in Orlando, Florida. If you are interested in presenting, you now have until the 20th of December to provide your topic to the BI software vendor. 

Submitting a presentation is easy. Just come up with a catchy title and subtitle such as: 
"You Too Can Be Sexy!: Integrating R with WebFOCUS for Descriptive and Predictive Statistics"

Follow that with a few paragraphs describing the main ideas of your presentation, making sure everything jibes with Information Builders' hot market topics of analytics, Big Data, Business Intelligence, etc.

Here is a sample synopsis:  
"Experts call data analysis the sexiest job of the 21st century. In this era of Big Data, a command of statistics and analytic tools correlates with job security and high income. Yet within five years, the U.S. may struggle with over a million unfilled jobs needing deep analytic skills.

Here is your chance to augment your WebFOCUS know-how with the open-source R language, which is seeing a dramatic increase in job demand. Doug Lautzenheiser will cover industry trends, using R for descriptive and predictive statistics, integration with WebFOCUS, and resources to help you build R skills."

For a presenter bio, provide a line or two: 
"In 1992, Doug Lautzenheiser first started presenting at Information Builders annual users' group conferences with his 'How to Get Your Maserati out of the Traffic Jam.' Over two decades later, Doug still loves to use catchy titles."

The real work will come after Information Builders approves your presentation. In preparation, have a good idea of the topics you plan to cover (remember, you only have one hour).

Here is an outline example to go with my "Be Sexy" theme (a short R sketch follows the outline to illustrate a few of its bullets):
Industry Trends
  • Big Data
  • Predictive Analytics
The R product
  • History
  • Downloading and installing
  • Basics of the language
  • Reading in data
  • Summarizing the data
  • Source command for running scripts
  • Packages from Comprehensive R Archive Network (CRAN)
Descriptive Statistics and R
  • Terms (Population, Sample, Observation, Variable, Dataset)
  • Measures of Center: Mean, Median, Mode
  • Measures of Spread: Range, Interquartile Range (IQR), Variance, Standard Deviation
  • Distribution, P value, Confidence Interval
  • Measures of Shape: Skewness (positive or negative), Kurtosis (mesokurtic, platykurtic, leptokurtic)
  • Bar graph, Scatter plot, Boxplot, Histogram
  • Central Limit Theorem
  • Tests of Association: Correlation, Regression
  • Tests of Inference: Chi-Square, t-test (independent samples, correlated samples), Analysis of Variance  
Predictive Statistics and R
  • The statistical and analytical techniques used to develop models that predict future events or behaviors
  • Examples
  • Linear regression, Logistic regression, Decision trees
  • Steps
Integration with WebFOCUS 
  • Using RStat
  • Using Dialogue Manager  
Getting Help
  • Classes (e.g., Coursera)
  • Books (e.g., "Learning R" from O'Reilly)
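To give attendees a taste of the "Reading in data" and "Descriptive Statistics" bullets above, a demo script might look something like this rough sketch; the sales.csv file and its amount, units, and region columns are invented for illustration.

  # Reading in data and summarizing it (file and column names are made up).
  sales <- read.csv("sales.csv", stringsAsFactors = FALSE)
  summary(sales)

  # Measures of center and spread.
  mean(sales$amount); median(sales$amount)
  range(sales$amount); IQR(sales$amount); sd(sales$amount)

  # A couple of the plots from the outline.
  hist(sales$amount, main = "Distribution of Sales Amounts")
  boxplot(amount ~ region, data = sales, main = "Amount by Region")

  # Tests of association: correlation and a simple linear regression.
  cor(sales$amount, sales$units)
  summary(lm(amount ~ units, data = sales))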

You will need to finalize your presentation a month or two before the actual event, which starts the first week of June. Information Builders will contact you when that deadline hits.

If you are not interested in presenting, be sure to attend and see my presentation (assuming it gets approved, of course). You too can be Sexy!

Monday, September 9, 2013

Learn from Big Data How to Predict the Future

Experts believe our collection of Big Data will double every two years until 2020. 

Many of those digital artifacts come from people like you and me as we "Like" things on Facebook, buy books over the web, post blog entries, and share smartphone photos on Instagram. Yet only a fraction of this data is actually being used.

So what should we do with it?

Eric Siegel says the most valuable thing we can do with data is to "learn from it how to predict."

The founder of the Predictive Analytics World conference, Dr. Siegel is also the author of the bestselling book, "Predictive Analytics," with the catchy subtitle of "The Power to Predict Who Will Click, Buy, Lie, or Die." 

I read his work right on the heels of taking a Coursera MOOC on Data Analysis and was pleased to get Siegel's common-sense clarifications of the same academic topics.

Throughout the book, Siegel provides real-life examples of how organizations use data and software to infer something unknown, perhaps imperfectly but often with surprising accuracy.

For example, Siegel covers how the retail giant Target Corporation uses predictive analytics to decide which of its shoppers might be pregnant and how financial services giant Chase predicts which customers might pay off mortgages early (good for the homeowner but bad for Chase since they lose interest payments).

Siegel points out that after a predictive application provides insight, somebody still has to do something about it. Target needs to provide pregnancy-related coupons to pregnant customers. Chase needs to convince mortgage holders to stay.

Siegel's book focuses on five different "effects" of using data to infer some unknown situation:

The Prediction Effect
 "A little prediction goes a long way."

The Data Effect
"Data is always predictive."

The Induction Effect
"Art drives machine learning; when followed by computer programs, strategies designed in part by informal human creativity succeed in developing predictive models that perform well on new cases."

The Ensemble Effect
"When joined in an ensemble, predictive models compensate for one another's limitations, so the ensemble as a whole is more likely to predict correctly than its component models are."

The Persuasion Effect
"Although imperceivable, the persuasion of an individual can be predicted by uplift modeling, predictively modeling across two distinct training data sets that record, respectively, the outcomes of two competing treatments."


In addition to these five effects, Siegel covers the important Big Data topic of ethics. 

Imagine that your company could predict which of its customers were likely to die soon. What actions should it take? Who owns that powerful piece of information? Are there any obligations and responsibilities related to holding that insight? 

I found Siegel's book to be not only educational but also enjoyable; it was like a "Moneyball" for the business world. And it was not without a game; Siegel devoted an entire chapter to how IBM's Watson computer used predictive analytics to beat humans on Jeopardy!.

If you want a free copy of the book, just attend one of the upcoming PAWCon events. In September, there is one in Boston where you can hear Dr. Siegel during the keynote presentation.


Friday, July 26, 2013

Handling BI Modernization in Secure Environments

During BI modernization initiatives, we work with a variety of firms with differing security concerns.

The first step in our toll-gated DAPPER modernization methodology is always a quick Legacy Assessment to evaluate the existing state of BI affairs. With it, we can better understand how to automate the replacement and retirement of the legacy environment.

Why do a Legacy Assessment? 

I like to say that with this simple step, you will spend a little to learn a lot.

You may already feel you have a good handle on your current legacy reporting environment. Unfortunately, I do not; without a quick assessment, I cannot tell you how to save years and millions of dollars by automating the conversion of those applications.

In just a short time, we can tell you: 
  • How big and difficult is this BI modernization initiative? 
  • Where are the pitfalls that could cause this project to fail? 
  • How much money and time will we have to spend to modernize? 
  • Should we really do it? 
  • If we do decide to proceed, what is the best course of action? 

Our quick assessment creates a digital profile of your legacy reporting applications. With that picture, not only can we analyze the current state but we can better predict the future state and decide how to get there easiest.



Our software sifts through decades of legacy application code, assessing each individual file's purpose, complexity, grouping, redundancy, status (active or obsolete), and appropriate modernization approach.



Security Concerns about a Legacy Assessment

The first security concern is always data. At no time during the Legacy Assessment do we access our client's data; we only use legacy application code.

Many companies do not consider this source code to be highly sensitive and will transfer copies outside of their organization. We then need no access to the client's network whatsoever and can perform all work on our external computers that have the BI Modernization Workbench installed.

Not all firms view source code this way. If you work for a bank, a healthcare provider, or an insurance company, you may not be allowed to share legacy source code.

For clients who do consider legacy source code sensitive and cannot allow it to leave their environment, we can split the BI Modernization Workbench processing between the client's computers and our external ones. Unlike the scenario where we do all work remotely, we now need access to the client's network to perform the scanning and subsequent analysis.

On the client's computer, we install a small batch processing component that can read copies of the client's legacy source code. We scan on the client's computer and then transfer the results--load instructions of keyword counts--to the external BI Modernization Workbench computer. There, we perform the subsequent database loads and analysis.

This model requires clients to agree that the scanning results are safe to copy externally to the BI Modernization Workbench. Prior to the Legacy Assessment, we can share examples to provide assurance of security compliance. These files basically contain only the names of legacy programs and associated keyword counts. 
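As a rough illustration only (the keyword list, directory, and file names here are invented; the actual Workbench scanner is a separate batch component), the scan output amounts to something like this in R:

  # Hypothetical keyword scan: only procedure names and keyword counts leave the client site.
  keywords <- c("\\.jsp", "SELECT", "CROSSTAB")        # assumed keyword list
  files <- list.files("legacy_src", full.names = TRUE)

  counts <- t(sapply(files, function(f) {
    text <- readLines(f, warn = FALSE)
    sapply(keywords, function(k) sum(grepl(k, text)))
  }))

  results <- data.frame(procedure = basename(files), counts, row.names = NULL)
  write.csv(results, "scan_results.csv", row.names = FALSE)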

Keep in mind that this model can cause the Legacy Assessment to take longer since we will not have easy access to the legacy source code while performing analysis on the external computer. Customizing the scanning module for a client could be difficult since the BI Modernization Workbench's development environment will have no legacy code for testing.  

You may be wondering about a third option: just do everything within the client's environment. If a client is secure enough that they will not transfer copies of legacy reporting code, then they are probably going to be averse to having the entire BI Modernization Workbench stack installed on their network for use by external consultants.


Resource Concerns about a Legacy Assessment

Although not related to our topic of security, many firms are concerned that a Legacy Assessment might be resource intensive and their already over-worked technical staff will not have the time to devote to this engagement.

During a Legacy Assessment, we need very little time from the client's staff other than assistance with setup, pointing us in the right direction, and answering occasional questions. 

Don't Take the Hard Road

If you are converting legacy reporting tools such as SAP Business Objects, Crystal Reports, IBM Cognos, 4GLs (FOCUS, NOMAD, and RAMIS), or just plain old reporting tools to modern software products, and want to complete the initiative with minimal time, cost, risk, and skill-set requirements, contact me to discuss the best option.

Saturday, June 15, 2013

The Best Big Data Software

In the June 2013 issue of Software Development Times, the editors have once again selected the industry's leading firms. This is their 11th annual selection of the SD Times 100. You can download the entire issue here.

One of their selected categories grouped the software vendors in the hot Big Data and Business Intelligence market.

The magazine--one of my favorites--introduced these innovative companies with this quote:
"In this new category, the editors of SD Times recognize that with exponential data growth comes exponential problem growth. It also creates a storage problem, a retrieval problem, and a problem in understanding it all, so organizations 'doing' Big Data get actionable information that keeps them a step or two ahead of the competition. These vendors are the ones who've tackled the giant data problem with aplomb."

And the winners are:

10gen
10gen developed the open-source MongoDB NoSQL Big Data database. IBM and 10gen recently announced that they will jointly craft a new standard for enterprise databases specifically for the mobile market. See this article about the shake-up happening in the enterprise database market.

Apache Hadoop
The Apache Software Foundation is a community of developers working on open-source software. It is the poster child for successful open-source software.

If you are a software professional, it is highly unlikely that you do not use at least one of their projects: Apache web server, Ant application development tools, ActiveMQ message queuing, Derby relational database, Flex web browser application development platform, Lucene search engine, the OpenOffice productivity suite, the Struts web application development framework, Subversion source code management system, the Tomcat Java app server, and others.

Within the Big Data space, Apache has some of the leading technologies: Cassandra platform, HBase read/write access, Mahout machine learning library, the Hadoop distributed computing platform, and the Pig analytics tool. If you are not familiar with Hadoop, it was initially based on papers published around 2004 by Google on how it was handling massive amounts of data.

Cloudera
Some of the original Big Data software developers from the Apache Software Foundation and companies such as Yahoo and Google quickly formed their own firms to take advantage of this emerging market (smart guys!). Cloudera is one such vendor specializing in Apache Hadoop.

DataStax
DataStax was formed to specialize in the Apache Cassandra platform. You can read about them here.

FatCloud
FatCloud makes the FatDB NoSQL database for the Windows .NET platform.

Hortonworks
Hortonworks is another leading firm specializing in the Apache Hadoop software.

Objectivity
Objectivity is the maker of a graph database and Big Data database tools.

Pentaho
Pentaho is an open-source BI software platform that is actively pursuing the Big Data analytics space. As part of that offering, they not long ago announced their Instaview data visualization product.

Splunk
Splunk is targeting the Big Data niche of machine-generated data. Websites, servers, mobile phones, and other devices are constantly spitting out huge amounts of data. Splunk wants to help organizations unlock the hidden potential of this potentially actionable information.



As you read this list, the term "open-source" should have repeatedly jumped out at you. Either the editors of SD Times are in love with the OSS concept or there is a real revolution going on within the software industry (the real answer may be both, I suppose).

It is also interesting to see the Big Data power cluster of individuals originally associated with Apache projects: Cloudera, Hortonworks, and DataStax.

Along with mobile application development, today's hot space for software professionals is Big Data. 

Monday, June 10, 2013

Beginner's Guide to R Statistical Programming Language

In June of 2013, ComputerWorld published a nice overview of the R statistical programming language.

You can see it here.

Thursday, March 14, 2013

Online Degrees Done Dirt Cheap

While driving across Wisconsin yesterday, I learned of a great new place to get an online degree.

In the city of Middleton, stop at Poupon U and learn all about mustard. Even if you cannot visit the campus, you can still go online and earn an official diploma--for the low price of just eight dollars--in a variety of degree programs, such as:

  • M.D. degree (Mustard Doctor)
  • JD degree (Juris Dufus)
  • DDS degree (Doctor of Diddley-Squat)
  • MBA degree (Master of Bad Attitude)
  • CPA degree (Couch Potato Authority)


Each degree is authorized by Elvis (who moved to Wisconsin to be closer to the fresh cheese) and Buford, their resident village idiot.

And be sure to show your alumni pride by donning the collegiate apparel. Go Big Yellow!



Saturday, February 23, 2013

Adventures in MOOC Part 3

When I was in first grade, I won an art contest among my young peers. On a white poster board, I had drawn a puppy playing with a ball and titled the work, "I Love And Feed My Dog."

I had added streak marks to indicate a rapid movement of the ball and stretched out the puppy's body to give the impression he was moving toward the viewer. 

In an award ceremony in front of the entire school, my teacher escorted me and my puppy picture onto the gymnasium floor to meet the principal for a ribbon. The six-foot tall suit greeted us and, as he started to hand me the bright red ribbon, took a glance at my winning art. 

Rumored to have only one working eye, he probably could not appreciate the intended perspective of the art. 

The principal immediately withdrew the ribbon and turned his attention to my teacher, "Are you sure this kid drew this? Did one of his parents do it for him?" 

This glass-eyed Ruler of Children evaluated the work for five seconds and immediately called a six-year old a liar and cheat. 

My teacher came to my defense and convinced him that I had actually drawn the picture. Had she withered under the imposing Mr. Authority, it is hard to say what would have happened. With a finger pointed at the gym exit, Principal Meany probably would have banished me from the school forever.

I may not remember this event exactly but when he handed down the ribbon to the three-foot tall version of me, he growled. 

Which brings me to the sixth week of my online Coursera MOOC. You can read my earlier adventures here.

After the first Data Analysis written assignment, the professor posted a notice about a concern of plagiarism. More than likely, a handful of students are using the web to share their work. 

Here's the challenge for the Coursera professor. He cannot police thousands of students and evaluate their work. He has to allow peers to review others' work and make a decision. But here is the rub: he also does not have time to evaluate whether the peers did a good job with grading. 

If a peer marks down your work claiming that he or she suspects you did not do this work on your own, you have no mechanism for appealing that grade. There are just too many students. Here are his comments:
A charge of plagiarism is a very serious accusation and should only be made on the basis of strong evidence. It is currently very difficult to prove or disprove a charge of plagiarism in the MOOC peer assessment setting. As a result, I am not expecting you to police your classmates’ work for plagiarism. You should evaluate the work of your classmates on the merits of what they have submitted. You should only mark them down if you are absolutely 100% confident that their submission constitutes an act of egregious plagiarism. I am not in a position to evaluate whether or not a submission actually constitutes plagiarism, and I will not be able to entertain appeals or to alter any grades that have been assigned through the peer evaluation system. 

So imagine a peer reviews your work and thinks, "You could not have done this alone! I'm giving you, you little cheat, a piece of my mind and failing you on this assignment." Your evaluator points a bony finger to the door and sends you out into the street, dragging your poster board behind you and crushing your dreams for a red ribbon.

In a MOOC, your teacher is not there to defend you. The Coursera professor is too busy; there are too many students. You have no recourse on Coursera.  

Or maybe not. It would seem that if Coursera is to be taken seriously, they would need a fair grading process and formal mechanism for appealing decisions. After all, not all of the students are above average. The same people who are getting bad grades are also assigning grades.

Coursera announced there are now 2.7 million people enrolled in their courses. Would you trust your college grade to be crowd-sourced by a random sampling of millions of people around the world? 

Just like my one-eyed Principal was not a fair art judge, neither would be all of your peers in a MOOC. 

Wednesday, February 20, 2013

Gartner Slaps BI Vendor Birst for its Giddiness

Here is a copy of the e-mail that BI software vendor Birst sent out today, apologizing for an earlier communication that was not properly approved by industry analyst Gartner:


Dear Doug,
On Wednesday, February 13, 2013, you received an email from Wynn White on behalf of Birst promoting our inclusion in Gartner's 2013 Magic Quadrant for Business Intelligence and Analytics Platforms. That communication was not authorized by Gartner, nor was it in compliance with Gartner's Copyright and Quote Policy. Birst apologizes for this error. The following has been approved by Gartner.
Birst has been positioned by Gartner, Inc. in the Challengers quadrant of the 2013 Magic Quadrant for Business Intelligence and Analytics Platforms. The report presents a global view of Gartner's opinion of software vendors that should be considered by organizations seeking to use business intelligence (BI) platforms. After evaluating 38 different BI and analytics software vendors on 15 functional areas, Gartner analysts placed Birst in the Challengers quadrant based primarily on completeness of vision and ability to execute.
What does this all mean? We're stoked to be named a Challenger.
Get your complimentary copy of the Gartner 2013 Business Intelligence and Analytics Magic Quadrant.
Regards,
Wynn White
Vice President, Marketing
Birst

This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Birst. Gartner does not endorse any vendor; product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.



Who can blame Birst for its giddiness on being in the BI and Analytics Magic Quadrant? Congratulations to them on their success.  

Tuesday, February 19, 2013

Adventures in MOOC Part 2

I am now in the homestretch of a Coursera MOOC that lasts eight weeks. At the five-week milestone of a Data Analysis course using the R statistical programming language, I have survived four weekly quizzes and the first written assignment. See my original posting for the back story.

Seriously, I almost didn't take the very first quiz. It was that intimidating. But after passing three of them, I started to feel pretty good. Perhaps I was even ready for a sexy Data Scientist swagger.

But then a major setback shook my confidence: I saw the Data Analysis written assignment. You had to download data, sift through it for associations, generate graphs, and write a paper. Some of the other students were complaining in the discussion forum that completing it took them over twenty hours.

I fumbled with the data analysis for quite a while. I considered admitting defeat and just watching the lectures in the future, marking this Johns Hopkins class as an "audit" instead of getting a graded certificate.

I went to bed on Friday night thinking, "This is crazy! I don't have the time for this stuff. This course isn't that important."

Luckily for my scholastic endeavors, by 6:30 a.m. the next morning my attitude had improved and I knocked out the assignment within four hours. What had stopped me the previous night was dirty data. The higher-ed bums had deliberately made the data messy so that your first step had to be converting formats and cleaning things up.

In the morning, I had the presence of mind to look through the course discussion forums and read the posts of other disgruntled students who had already commented on the dirty-data trick.

You see, in real life, you have to deal with dirty data. Whether you are working with your internal ERP structured data, files from a third-party external source, or Big Data from around the globe, some of it will be dirty and need up-front handling before jumping into the data analysis. Through the use of dirty data in the assignment, the teacher was making a real-life point.
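For anyone wondering what that up-front handling looks like in practice, a first pass in R is typically something like this sketch; the file, columns, and formats are hypothetical rather than the actual assignment data.

  # Hypothetical dirty extract: numbers stored as text, odd date format, missing values.
  raw <- read.csv("loans_raw.csv", stringsAsFactors = FALSE)

  # Strip currency symbols and commas, then convert to numeric.
  raw$amount <- as.numeric(gsub("[$,]", "", raw$amount))

  # Standardize a date column and flag rows we could not parse.
  raw$issue_date <- as.Date(raw$issue_date, format = "%m/%d/%Y")
  bad_rows <- is.na(raw$amount) | is.na(raw$issue_date)

  # Keep the clean subset for analysis and report what was dropped.
  clean <- raw[!bad_rows, ]
  cat("Dropped", sum(bad_rows), "rows with unusable values\n")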

Whoever thinks participating in an online virtual college course is easier than sitting in a physical classroom is very mistaken. This is serious stuff.

Students in online education deserve our respect.

However, one major issue with the assignment was its timing. The teacher--Jeff Leek of Simply Statistics fame--assigned it during Week 3, but I wouldn't understand how to do it until after Week 4's lectures and quiz. I fumbled through the assignment before I knew what I was doing.

Because there are thousands of students, the teacher cannot grade the assignments. Instead, he needs the students to grade each other according to a standard Yes/No rubric where each evaluation question is worth zero to five points. Each student has to evaluate the work of four peers or receive a 20% deduction in their assignment grade.

After I submitted my assignment and started on the peer reviews, I immediately got that "oh crap" feeling. Looking at the work of others, I started to gain a better insight into what the teacher might have wanted. Compared to theirs, my data analysis seemed pretty simple. At the time, I didn't understand how to use R to program a linear model with confounding, interacting variables. Confound it!

Oh well, my data analysis report was formatted nicely and the R graphs looked pretty; I can only hope my peer graders will be gracious and overlook my simple statistics.

To get the positive karma flowing, I gave all of my peers perfect scores. 

Monday, February 11, 2013

2012 Top 1% of LinkedIn Profile Views

LinkedIn just notified me that I was in the top 1% of all LinkedIn profile views in 2012. With over 200 million users, that means I am just in the top two million.

If you and I are not yet connected on LinkedIn, be sure to send me an invitation.


Wednesday, February 6, 2013

Adventures in MOOC

I am now in my third week of a MOOC adventure. In case you are not familiar with the term, a MOOC is a massive open online course.

In particular, I am enrolled in an eight-week Data Analysis class from the Johns Hopkins Bloomberg School of Public Health, offered through the Coursera platform.

Professor Jeff Leek describes the R statistical programming course as:

"This course is an applied statistics course focusing on data analysis. The course will begin with an overview of how to organize, perform, and write-up data analyses. Then we will cover some of the most popular and widely used statistical methods like linear regression, principal components analysis, cross-validation, and p-values. Instead of focusing on mathematical details, the lectures will be designed to help you apply these techniques to real data using the R statistical programming language, interpret the results, and diagnose potential problems in your analysis. You will also have the opportunity to critique and assist your fellow classmates with their data analyses. Here is a post where I describe how data analysis fits in with other quantitative subjects: http://simplystatistics.org/2013/01/10/the-landscape-of-data-analysis/"

Being "massively open" means that there can be lots of students. I cannot find a good indicator of how many students there really are, but one discussion board forum has almost 3,000 page views. One student also put together a Google Map of where around the world people are located. While not everybody probably participated in the map, I guess there are at least hundreds if not thousands of students in class with me. 

From NY Times, Published: January 26, 2013
In a recent article, Thomas Friedman wrote about a Coursera "revolution" after learning there were almost two and a half million people taking Coursera courses. That is up from just 300,000 students less than one year ago. 

With thousands of people in each course, a teacher is not going to have time to get to know the students, grade their quizzes, answer their questions, or perform other typical teacher activities. Instead, the MOOC solution is to have the students do things themselves. 

To enable students, Coursera courses consist of many self-service components:
  • Online videos
  • Downloadable lecture notes
  • Self-scoring online quizzes
  • Lots of auxiliary reference material
  • Course Wiki 
  • Course discussion forums 
  • Course Meetups 
  • Assignments which are peer graded 

My first surprise was the weekly quizzes. They are along the lines of "Using the knife and bottle of whiskey under your seat, perform a self-appendectomy..."

Back in my college days, you had to have all of the quiz answers safely stored within your head. That is no longer true in today's world of MOOCs. 

Instead, the quiz question might ask you to download a comma-separated file from a website and load it into a "data frame" using the R programming language. From there, you need to follow a list of instructions such as creating a subset of the data, performing data "munging"--I have been doing IT for over thirty years and that was a new term to me--and then answering a final question like: "On rows 124, 246, and 368, what are the values of ABCVAR?"
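In R terms, a question along those lines boils down to a handful of lines like the following; the URL is a placeholder, and the ABCVAR column and row numbers simply mirror the made-up example above.

  # Download a comma-separated file and load it into a data frame.
  download.file("https://example.com/quizdata.csv", destfile = "quizdata.csv")
  quizdata <- read.csv("quizdata.csv", stringsAsFactors = FALSE)

  # Create a subset of the data (some munging step), then answer the row-specific question.
  munged <- quizdata[quizdata$ABCVAR > 0, ]
  quizdata$ABCVAR[c(124, 246, 368)]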

Once I realized the quiz expects you to search for answers, I was fine. 

Just like at my Christian college, there is an honor code. At Valparaiso University, I always wrote at the bottom of each test: "I have neither given nor received, nor will I tolerate, the use of unauthorized aid." Coursera has a check-box version of this oath.

If MOOC students get past the initial fear--I closed the first quiz for an entire day before venturing back for a second look--I think most will pass with flying colors. The reason is that you can take each quiz up to four times, with your recorded grade being the very last attempt.

Now, if the Coursera system gave you another set of random questions each time, it would be tough.

However, it graciously just gives you the same questions again and even shows you which ones you got right and wrong. They might put the multiple-choice answers in a different sequence and rewrite the question a little, but the quiz basically stays the same. 

So if you were given a question with four multiple-choice answers along with four attempts to solve it, I do believe most people would be able to pass the test (if not get 100%). These tests are half of the total score, so that half is pretty much a given.
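To put a rough number on that: even guessing blindly, four independent tries at a four-choice question succeed about two-thirds of the time, and since the quiz repeats the same questions and shows you which answers were wrong, simple elimination takes care of the rest.

  # Chance of hitting a four-choice question within four blind guesses (before using any feedback):
  1 - (3/4)^4    # about 0.68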

The two assignments that make up the other half of the score might be a different issue. Those are graded by your peers according to a published rubric. I apologize to my peers, but I am concerned about their ability to grade.

But that will be another blog posting. 

Sunday, February 3, 2013

Goodreads: The Millionaire Fastlane by DeMarco


The Millionaire Fastlane: Crack the Code to Wealth and Live Rich for a Lifetime! by M.J. DeMarco

My rating: 3 of 5 stars





Throughout his book, DeMarco yelled at me for being an idiot and not a millionaire like him. By the end, I felt like a disgusting loser for having a well-paying job instead of being self-made, living in a fancy mansion, and driving an expensive sports car.

If you too want to be ashamed of your lack of millions of dollars, by all means read DeMarco's book.

For much of the book, I kept thinking, "OK, DeMarco, stop rambling and just tell me your secret." After he disclosed his secret to millions, he then rambled on for way too long.

The last few chapters actually seem to be completely unneeded, only making (some) sense when DeMarco finally disclosed he had organized the book according to a big acrostic of FASTLANE SUPERCHARGED. Cute, but totally unnecessary.

One of his millionaire secrets is to never-ever-ever trade the hours in your days for money. If somebody is paying you an annual salary or by the hour, you are a fool, a big loser. To be a winner, you must be an independent producer of things that sell themselves while you snorkel in the Caribbean.

DeMarco's often crude style of writing implies he is a young kid who made it rich quickly and is now using book sales to keep his money rolling in.

Still, I liked DeMarco's book. He shares his Fastlane secrets to quick riches, makes some good points, and provides the reader with a fairly enjoyable book.

View all my reviews


Sunday, January 27, 2013

Difference between ETL and ELT

When interviewing for a BI job, you may get asked the tricky question, "What is the difference between ETL and ELT?"

Both are acronyms for a process that leads to the same result. ETL stands for Extract/Transform/Load while ELT is Extract/Load/Transform. You accomplish the same thing but do activities in a slightly different sequence. Your interviewer is going to want you to explain WHY.
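As a toy illustration of the WHY, here is a hedged sketch in R using an in-memory SQLite database (it assumes the DBI and RSQLite packages are installed, and the orders_raw.csv file and its columns are invented). In the ETL version the transformation happens in R before the load; in the ELT version the raw rows are loaded first and the target database does the transforming, which is the usual argument for ELT when the target engine is better suited to the heavy lifting.

  library(DBI)
  con <- dbConnect(RSQLite::SQLite(), ":memory:")
  raw <- read.csv("orders_raw.csv", stringsAsFactors = FALSE)   # hypothetical extract

  # ETL: Extract, Transform (in R), then Load the finished table.
  transformed <- aggregate(amount ~ region, data = raw, FUN = sum)
  dbWriteTable(con, "sales_by_region_etl", transformed)

  # ELT: Extract, Load the raw rows as-is, then Transform inside the target database.
  dbWriteTable(con, "orders_raw", raw)
  dbExecute(con, "CREATE TABLE sales_by_region_elt AS
                  SELECT region, SUM(amount) AS amount
                  FROM orders_raw GROUP BY region")

  dbDisconnect(con)

Either way you end up with the same summary table; the interview answer hinges on where the transformation work runs.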

For a nice explanation, see this Data Academy white paper.

Wednesday, January 2, 2013

2013 Designated the International Year of Statistics

In an earlier blog posting, I stated (tongue in cheek, of course) that 2013 was the Year of Legacy Application Modernization.

Actually, somebody beat me to the punch and already designated 2013 as the International Year of Statistics.

Before you smirk, remember that Harvard Business Review called the Data Scientist "the sexiest job of the 21st century" and that Gartner predicted Big Data will generate 4 million worldwide jobs by 2015.

I'm going to participate in this global event by signing up for a free Coursera class on Data Analysis using the R programming language! Jeff Leek at Johns Hopkins University will teach the online class starting on January 22, 2013.

The Coursera website says this is a first-year course and requires no investment other than three to five hours during each of the eight weeks. It does help to have the open-source R software installed before the class begins and to have a basic familiarity with the statistical programming language.

See the Simply Statistics blog for more details on the International Year of Statistics. 

2013 Resolution: Modernize Legacy Applications

Happy New Year! In case your calendar does not state it, 2013 is the year of legacy software application modernization.

Momentum has been building for over a decade and several factors are now coming together to force the issue for many organizations.

It is retirement time for the Baby Boomer generation that has been supporting these legacy applications. The COBOL, PL/1, and 4GL developers are no longer available to handle the workload. The new kids coming into the IT shop to replace the outgoing retirees have no knowledge of, nor any desire to shoulder, the burden of these legacy technologies.

The legacy software vendors are not making life easy for their customers either. Instead, it seems to be time to milk those licenses before those inevitable product end-of-life events happen.

A major emerging option is the Cloud. As more software gets hosted as a Cloud solution, more companies will dismiss their major hardware investments.

Many legacy applications are blocking the move from the expensive legacy platforms to more affordable solutions by being too difficult to port. That is why many different companies have jumped in to help.

In particular, the non-mainframe hardware vendors want to help mainframe customers move to their platforms. Here are some examples (with links to help you contact these firms).

Hardware/Software Vendors with Modernization Groups

Vendor locations:
  • Vancouver, British Columbia, Canada
  • Chicago, Illinois, USA
  • Plano, Texas, USA
  • Armonk, New York, USA
  • Newbury, Berkshire, England
  • Redmond, Washington, USA
  • Redwood Shores, California, USA



There are also pure-play modernization firms offering assistance in the form of both software products and services.

Vendor locations:
  • Dallas, Texas, USA
  • Herzlia, Israel
  • Budapest, Hungary
  • Austin, Texas, USA
  • Toronto, Ontario, Canada
  • Marietta, Georgia, USA
  • Kirkland, Washington, USA


The big player among these vendors appears to be BluePhoenix Solutions, a 20-year-old Israeli firm with about $20 million in annual revenue. Info-Tech Research did a white paper on these vendors and had this to say about BluePhoenix:
"Of the vendors evaluated, Blue Phoenix was the vendor all other vendors agreed was the competition; as well, they offered the most complete service associated with legacy modernization, including testing and training of the client development resource pool on the newly transformed system."

While these firms focus on the platform, database, and transaction systems, we at Partner Intelligence specialize in modernizing just the Business Intelligence applications. We have partnered with several of these companies and participated in their modernization projects. 

If you are interested in modernizing your legacy reporting applications, see these previous articles:

About Me

My photo

I am a project-based software consultant, specializing in automating transitions from legacy reporting applications into modern BI/Analytics to leverage Social, Cloud, Mobile, Big Data, Visualizations, and Predictive Analytics using Information Builders' WebFOCUS. Based on scores of successful engagements, I have assembled proven Best Practice methodologies, software tools, and templates.

I have been blessed to work with innovators from firms such as: Ford, FedEx, Procter & Gamble, Nationwide, The Wendy's Company, The Kroger Co., JPMorgan Chase, MasterCard, Bank of America Merrill Lynch, Siemens, American Express, and others.

I was educated at Valparaiso University and the University of Cincinnati, where I graduated summa cum laude. In 1990, I joined Information Builders and for over a dozen years served in regional pre- and post-sales technical leadership roles. Also, for several years I led the US technical services teams within Cincom Systems' ERP software product group and the Midwest custom software services arm of Xerox.

Since 2007, I have provided enterprise BI services such as: strategic advice; architecture, design, and software application development of intelligence systems (interactive dashboards and mobile); data warehousing; and automated modernization of legacy reporting. My experience with BI products includes WebFOCUS (vendor-certified expert), R, SAP Business Objects (WebI, Crystal Reports), Tableau, and others.