Saturday, October 27, 2012

Big Data is about to Change Your Life

Big Data is in the news because it is going to change our lives, both as consumers and workers.

To see what IBM is saying about Big Data, get their new study.

IBM says:
"Ultimately, big data is a combination of these characteristics [of volume, variety, velocity, and veracity] that creates an opportunity for organizations to gain competitive advantage in today’s digitized marketplace. It enables companies to transform the ways they interact with and serve their customers, and allows organizations – even entire industries – to transform themselves. Not every organization will take the same approach toward engaging and building its big data capabilities. But opportunities to utilize new big data technology and analytics to improve decisionmaking and performance exist in every industry."

Companies will use Big Data to better understand their customers and then to dynamically provide personalized offerings to each consumer.

Building these Big Data capabilities will take money and resources. The research people at Gartner think Big Data will drive over $220 billion of IT spending this year and in 2013.

They predict that before 2015 Big Data will create 4.4 million jobs. The US is not going to be able to provide enough talent, so only one in four of the new Big Data jobs is expected to go to Americans.

Big Data needs a new breed of technology. Gartner also pointed out that today's standard computer infrastructures are quickly becoming obsolete as we move to environments built on mobile and cloud platforms.

The ground is definitely shifting under our feet. On the positive side, those with technical backgrounds have an excellent ground-floor opportunity to use our skills in this new Big Data world. On the negative side, without ongoing effort to acquire new skills the world could pass us by.

Thursday, October 25, 2012

Using R to Analyze Hadoop Jobs

Using summarized data from the Indeed job board, I tried my hand at some R graphs. Considering the hot nature of Big Data, I chose job openings in the United States for Hadoop.

Here is a summary of the top jobs from Indeed:

City State JobPostings
San Francisco, CA California 568
New York, NY New York 402
Seattle, WA Washington 179
Sunnyvale, CA California 174
Palo Alto, CA California 158
San Jose, CA California 134
Mountain View, CA California 124
Boston, MA Massachusetts 119
Annapolis Junction, MD Maryland 119
Reston, VA Virginia 107
San Mateo, CA California 105
Chicago, IL Illinois 95
Redwood City, CA California 94
Los Angeles, CA California 89

It appears that Indeed didn't give me a complete summary of all of the Hadoop jobs, just the top fourteen cities. Oh well, let's look at those.

To visualize the data, I played with a variety of R plot options but ultimately settled on a Cleveland dot plot.

The graph shows eight cities within California posting a large number of jobs for Hadoop experience (Indeed summarized the cities with at least 89 postings). Of those eight, the largest volume of opportunities, almost 600 postings, was in San Francisco (not labeled on the graph, but you can spot it easily in the original data table).

Within these top US locations, every state other than California had its Hadoop opportunities concentrated in a single major city. As you might guess, these are happening places, such as NYC, Boston, and Seattle.

The largest clusters of Hadoop jobs were in San Francisco and New York City. Behind them were the California tech hot spots such as Sunnyvale, Palo Alto, San Jose, and Mountain View.

Working with R is slightly different from other programming languages. Instead of creating a program that you just run and get results, with R you interact within a workspace and examine the results as you go along.

To produce this graph, I first created the tab-delimited file of Indeed job postings you saw above. Then, I had to load that data into the R workspace's memory. Here are the commands for that:

setwd("C:/Users/Doug/My Documents/RLibrary/") 
HadoopJobs<-read.table("HadoopJobs2012Oct.txt", header=TRUE)

The first command sets my R working directory. The second creates an object called "HadoopJobs" in memory, which now contains the job posting counts. With that done, I just needed to produce the dot plot graph (showing job posting counts grouped by US state) and put a title on the top:

dotchart(HadoopJobs$JobPostings, groups=HadoopJobs$State)
title("Hadoop Jobs by State (2012 Oct)")

I find it impressive that R is able to do all of this work in just four simple statements. In full disclosure, I did have to add a couple of other statements. The ones I just showed you display the results on screen; to save the results to a JPEG picture file so that you could also view them, I had to reissue the graph commands sandwiched between two other R functions:

jpeg(file="HadoopJobs.jpg")
# ...the dotchart and title commands again...
dev.off()

If you don't have a copy of R, be sure to download a free open-source copy from the R Project website.

We may have to wait a while before demand for Big Data file repositories comes to Midwestern cities like Cincinnati, Ohio (in case you are interested, there are six Hadoop job postings here in town). 

Tuesday, October 16, 2012

Great Time to be a Mobile App Developer

You don't have to be an advanced statistician to spot a trend in this Indeed job posting graph:

The easy translation of these numbers: it's a great time to be a mobile app developer. 

Monday, October 15, 2012

C Programming Languages Continue Their Popularity

If you are just starting a software development career and wonder if you should specialize in any particular programming language, check out the Programming Community Index from TIOBE, the software quality tracking firm.

TIOBE states its purpose as:
"The index can be used to check whether your programming skills are still up to date or to make a strategic decision about what programming language should be adopted when starting to build a new software system."

TIOBE uses results from eight different search engines to find and rank the occurrences of fifty different computer programming languages. TIOBE classifies the languages as either "A" (mainstream) or "B" (non-mainstream) with minus signs for differentiation (e.g., "A-" and "A--").

The programming language that has consistently been the most popular for decades is C. As of October 2012, in fact, the top five languages are all C-related:
  1. C (the classic language from Bell Labs)
  2. Java (a scaled-back version of C++ to be safe for the web)
  3. Objective-C (C/Smalltalk from Steve Jobs' NeXTSTEP days, now used for Apple development)
  4. C++ (the object-oriented version of C)
  5. C# (the Microsoft managed version of C)

Last year, TIOBE named Objective-C its "Programming Language of the Year" for its rapid rise up the charts. Back in 1997, Objective-C was not even on the list. In 2007, it ranked 44th out of 50. Today, it is ranked at number three.

If you want to work on mission-critical, high-speed, server-based software, you can consider a C/C++ specialty. If you want a web-based server specialty, then Java (with a Linux/Unix alignment) or C# (with a Microsoft alignment) are good choices. If you want to develop mobile applications, you want Objective-C.

To back up TIOBE's rankings, be sure to look at the Indeed job trends as well. Here are the five programming languages on a chart showing them as a percentage of the total job postings:

Objective-C is a small percentage of the total jobs, so the graph above does not do it justice. If you separate it from the pack, however, you can see Objective-C's rapid growth in popularity.

This trend should of course look very similar to that for the Apple iPhone and iPad:

It looks like you won't go wrong if you pick a C language.

16 Oct 2012 Update: for a great summary of the TIOBE report, see this eWeek article

Goodreads: Blue Ocean Strategy

Blue Ocean Strategy: How To Create Uncontested Market Space And Make The Competition Irrelevant by W. Chan Kim

My rating: 2 of 5 stars

The authors (W. Chan Kim and Renee Mauborgne) provide case studies on how some companies left their "bloody-red" oceans of competition for completely open blue oceans where they were unique.

Some are well-known business stories, such as Southwest Airlines becoming a low-cost provider. However, the book provides details into Southwest's underlying business strategies that may not be well known. Other case studies gave new insight into various companies and their product strategies. One interesting story, for example, was [yellow tail], the Australian wine company that stepped outside of traditional wine marketing with a simpler offering targeting casual drinkers.

The book covers the "strategic canvas" for analyzing competitors and planning a new market space. It then outlines six principles for creating your own "blue ocean" strategy:

1) Reconstruct market boundaries
2) Focus on the big picture, not on the numbers
3) Reach beyond existing demand
4) Get the strategic sequence right
5) Overcome key organizational hurdles to put blue ocean strategy into action
6) Build execution into strategy from the start to build organizational trust and commitment

View all my reviews

Sunday, October 14, 2012

Goodreads: How to Measure Anything

How to Measure Anything: Finding the Value of "Intangibles" in Business by Douglas W. Hubbard

My rating: 3 of 5 stars

Douglas Hubbard provides an excellent layperson's overview of business statistics and analytics. The first half of the book is great; I skimmed through the second half, which seemed to contain only "oh, by the way" topics.

Hubbard is able to take the dreaded college Stats 101 course and cover the material simply in a way that explains "why" we do it without focusing on the scary mathematical "how."

View all my reviews

Saturday, October 13, 2012

Goodreads: Mining the Social Web by Matthew Russell

Mining the Social Web: Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites by Matthew A. Russell

My rating: 2 of 5 stars

This short book might have more appropriately been titled, "How I Personally Mined the Social Web using Python."

Without giving too much explanation, the author provides samples of his Python routines. Where another author might spend an entire chapter (if not the whole book) explaining a technological topic, Russell just makes a comment and moves on to his code examples. If you are comfortable with, "Install this, run that command, and now copy my code..." then this is an okay book.

This is basically a Python cookbook with Social Media recipes. It covers APIs useful for Google e-mail, Twitter, Facebook, and LinkedIn. As such, it was interesting reading to see how it is done, but this is not a primer on how to do it.

View all my reviews

Sunday, October 7, 2012

Today's Biggest Success Tip: Predictive Analytics using Big Data

Are you looking to change your life? If you answered "Absolutely!" then prepare yourself to take advantage of one of today's biggest opportunities: Predictive Analytics using Big Data.

Trends such as globalization, economic uncertainty, and rapidly changing technology have shaken up our society. Right now, we face several major disruptions within computer technology: Cloud, Personalized Assistants (tablets, smartphones, and other mobile devices), Social Media, Big Data, and Analytics.

Ten years from now, all of these trends will have transported us into a completely new business environment.

A connected population of billions of people generates a massive amount of data which, properly digested and analyzed, will lead to an explosion of knowledge. Smart people will leverage this knowledge to take beneficial action, either for themselves personally or for the good of others.

Technologies such as Hadoop, with its distributed file system for storage and its MapReduce framework for processing, emerged to enable working with large amounts of unstructured data. Companies will combine their traditional data warehouses containing structured enterprise data with unstructured mountains of external data and store the results in the cloud (using services such as Amazon AWS/EC2, Rackspace, Microsoft Azure, or Google App Engine).
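The core idea behind MapReduce is simple enough to sketch without a Hadoop cluster at all. Here is a toy word-count example in plain Python (the function names and sample documents are my own, purely for illustration): a map phase emits (word, 1) pairs, and a reduce phase groups them by word and sums the counts.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    # Map step: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.lower().split():
            yield (word, 1)

def reduce_phase(pairs):
    # Reduce step: group pairs by word and sum the counts in each group.
    # groupby requires its input to be sorted by the grouping key.
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

docs = ["big data big opportunity", "data drives decisions"]
counts = dict(reduce_phase(map_phase(docs)))
# counts is {'big': 2, 'data': 2, 'decisions': 1, 'drives': 1, 'opportunity': 1}
```

A real Hadoop job works the same way conceptually, except the map and reduce steps run in parallel across many machines, with the framework handling the sorting and shuffling between them.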

After storage comes the application of algorithms to make sense of the data. Computer programming languages (such as Java, Python, C/C++, and R) that can perform large data set techniques (e.g., regression, classification, natural language processing, clustering, collaborative filtering, and machine learning) will be enhanced with toolkits (such as Weka, OpenNLP, and NLTK) and evolve into entire frameworks for analyzing Big Data, finding patterns, reducing uncertainties, and enabling new decisions and actions.
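To make one of those techniques concrete, here is a minimal one-dimensional k-means clustering sketch in plain Python. This is a toy illustration under my own simplifying assumptions (fixed iteration count, naive initialization); the toolkits named above provide far more robust implementations.

```python
def kmeans_1d(points, k, iterations=10):
    # Naive initialization: use the first k distinct values as centroids.
    centroids = sorted(set(points))[:k]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious groups of values; k-means should find centers near 1.0 and 10.1.
centers = kmeans_1d([1.0, 1.2, 0.8, 9.8, 10.1, 10.4], k=2)
```

The same assign-then-update loop generalizes to many dimensions and millions of points, which is where the Big Data frameworks earn their keep.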

With all these technologies, our challenge will then be to identify the questions we want to ask of the data.

If you could solve a world problem (or make millions of dollars) with answers to just a few difficult questions, wouldn't you start gathering the data today?

About Me

My photo

I am a project-based software consultant, specializing in automating transitions from legacy reporting applications into modern BI/Analytics to leverage Social, Cloud, Mobile, Big Data, Visualizations, and Predictive Analytics using Information Builders' WebFOCUS. Based on scores of successful engagements, I have assembled proven Best Practice methodologies, software tools, and templates.

I have been blessed to work with innovators from firms such as: Ford, FedEx, Procter & Gamble, Nationwide, The Wendy's Company, The Kroger Co., JPMorgan Chase, MasterCard, Bank of America Merrill Lynch, Siemens, American Express, and others.

I was educated at Valparaiso University and the University of Cincinnati, where I graduated summa cum laude. In 1990, I joined Information Builders and for over a dozen years served in regional pre- and post-sales technical leadership roles. Also, for several years I led the US technical services teams within Cincom Systems' ERP software product group and the Midwest custom software services arm of Xerox.

Since 2007, I have provided enterprise BI services such as: strategic advice; architecture, design, and software application development of intelligence systems (interactive dashboards and mobile); data warehousing; and automated modernization of legacy reporting. My experience with BI products includes WebFOCUS (vendor certified expert), R, SAP BusinessObjects (WebI, Crystal Reports), Tableau, and others.