Here is a summary of the top jobs from Indeed:
|City|State|Job postings|
|San Francisco, CA|California|568|
|New York, NY|New York|402|
|Palo Alto, CA|California|158|
|San Jose, CA|California|134|
|Mountain View, CA|California|124|
|Annapolis Junction, MD|Maryland|119|
|San Mateo, CA|California|105|
|Redwood City, CA|California|94|
|Los Angeles, CA|California|89|
It appears that Indeed didn't give me a complete summary of all of the Hadoop jobs, just the top fourteen cities. Oh well, let's look at those.
To visualize the data, I played with a variety of R plot options, but ultimately settled on a Cleveland dot plot.
The graph shows there were eight cities within California posting a large number of jobs for Hadoop experience (Indeed summarized the cities with at least 89 postings). Of those eight, the largest volume of opportunities, almost 600 postings, was in San Francisco (not labeled on the graph, but you can spot it easily in the original data table).
Within these top US locations, no state other than California had a large number of Hadoop opportunities in more than one major city. As you might guess, those single cities are happening places, such as NYC, Boston, and Seattle.
The largest clusters of Hadoop jobs were in San Francisco and New York City. Behind them were California tech hot spots such as Sunnyvale, Palo Alto, San Jose, and Mountain View.
Working with R is slightly different from working with other programming languages. Instead of writing a program that you run once to get results, you interact with R in a workspace and examine the results as you go along.
To produce this graph, I first created the tab-delimited file of Indeed job postings you saw above. Then I had to load that data into the R workspace's memory. Here are the commands for that:
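As a minimal sketch of what two such commands could look like: `setwd()` and `read.delim()` are standard R functions, but the directory, the file name `hadoop_jobs.tsv`, and the column names `City`, `State`, and `Postings` are placeholders chosen for illustration. The sketch first writes the table above to a temporary directory so that it runs on its own.

```r
# Recreate the tab-delimited file from the table above so the example is
# self-contained. File name and column names are illustrative assumptions.
jobs <- data.frame(
  City = c("San Francisco, CA", "New York, NY", "Palo Alto, CA",
           "San Jose, CA", "Mountain View, CA", "Annapolis Junction, MD",
           "San Mateo, CA", "Redwood City, CA", "Los Angeles, CA"),
  State = c("California", "New York", "California",
            "California", "California", "Maryland",
            "California", "California", "California"),
  Postings = c(568, 402, 158, 134, 124, 119, 105, 94, 89)
)
data_dir <- tempdir()
write.table(jobs, file.path(data_dir, "hadoop_jobs.tsv"),
            sep = "\t", row.names = FALSE, quote = FALSE)

# Command 1: set the R working directory (here, the temporary directory).
setwd(data_dir)

# Command 2: read the tab-delimited file into an object in the workspace.
HadoopJobs <- read.delim("hadoop_jobs.tsv")

# Inspect what was loaded, in keeping with R's interactive style.
str(HadoopJobs)
```

Typing `HadoopJobs` at the prompt afterwards prints the whole table, which is a quick way to confirm the file was read as expected.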
The first command sets my R working directory. The second creates an object called "HadoopJobs" in memory, which now contains the job posting counts. With that done, I just needed to produce the dot plot graph (showing job posting counts grouped by US states) and put a title on the top:
title("Hadoop Jobs by State (2012 Oct)")
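For reference, here is a minimal sketch of the plotting step using base R's `dotchart()`, with the `groups` argument doing the grouping by state. The column names `City`, `State`, and `Postings`, the choice to label each dot with its city, and the axis label are assumptions made for illustration; the data frame is rebuilt inline so the sketch runs on its own.

```r
# Job posting counts from the table above (column names are assumptions).
HadoopJobs <- data.frame(
  City = c("San Francisco, CA", "New York, NY", "Palo Alto, CA",
           "San Jose, CA", "Mountain View, CA", "Annapolis Junction, MD",
           "San Mateo, CA", "Redwood City, CA", "Los Angeles, CA"),
  State = c("California", "New York", "California",
            "California", "California", "Maryland",
            "California", "California", "California"),
  Postings = c(568, 402, 158, 134, 124, 119, 105, 94, 89)
)

# Cleveland dot plot: one dot per city, with the cities grouped by state.
# dotchart() expects 'groups' to be a factor.
dotchart(HadoopJobs$Postings,
         labels = HadoopJobs$City,
         groups = factor(HadoopJobs$State),
         xlab = "Job postings")

# Put a title on top of the current plot.
title("Hadoop Jobs by State (2012 Oct)")
```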
I find it impressive that R is able to do all of this work in just four simple statements. For full disclosure, I did have to add a couple of other statements. The ones I just showed you put the results on the screen for me to see; in order to save the results to a JPEG picture file so that you could also view it, I had to reissue the graph commands sandwiched between the following two R functions:
..do the dot chart commands again...
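One common way to do this in base R is to open a JPEG graphics device with `jpeg()`, draw the plot, and close the device with `dev.off()`. The sketch below assumes that pair of functions; the output file name, the image size, and the column names are placeholders for illustration.

```r
# Job posting counts from the table above (column names are assumptions).
HadoopJobs <- data.frame(
  City = c("San Francisco, CA", "New York, NY", "Palo Alto, CA",
           "San Jose, CA", "Mountain View, CA", "Annapolis Junction, MD",
           "San Mateo, CA", "Redwood City, CA", "Los Angeles, CA"),
  State = c("California", "New York", "California",
            "California", "California", "Maryland",
            "California", "California", "California"),
  Postings = c(568, 402, 158, 134, 124, 119, 105, 94, 89)
)

# Hypothetical output location and size.
out_file <- file.path(tempdir(), "hadoop_jobs.jpg")

# Open a JPEG device: plotting commands now draw into the file, not the screen.
jpeg(out_file, width = 800, height = 600)

# Reissue the dot chart commands.
dotchart(HadoopJobs$Postings,
         labels = HadoopJobs$City,
         groups = factor(HadoopJobs$State),
         xlab = "Job postings")
title("Hadoop Jobs by State (2012 Oct)")

# Close the device so the image is written to disk.
dev.off()
```

Forgetting `dev.off()` is the usual reason an image file turns out empty or incomplete, since the device only finishes writing when it is closed.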
If you don't have a copy of R, be sure to download a free open-source copy at: http://www.r-project.org.
We may have to wait a while before demand for Big Data file repositories comes to Midwestern cities like Cincinnati, Ohio (in case you are interested, there are six Hadoop job postings here in town).