Wednesday, August 19, 2015

O'Reilly Announces Affordable Video Training for Big Data and More

O'Reilly, known for publishing technical books, has announced a new line of affordable training materials.

These "Learning Paths" consist of a group of videos all related to a particular topic:
  • Git source code management (5 courses with 22 hours of video training)
  • Beginning UX Design (3 courses with 10 hours)
  • Design for Mobile (4 courses with 12 hours)
  • Beginning JavaScript (3 courses with 14 hours)
  • Hadoop for Data Scientists (3 courses with 16 hours) 
  • Data Visualization (4 courses with 11 hours) 
  • Data Science with R (5 courses with 24 hours)
  • Beginning Java (4 courses with 26 hours)
  • Python for Data (4 courses with 19 hours)
  • Networking for Sysadmins (3 courses with 17 hours) 


Right now, O'Reilly is offering a special introductory price of $99 for each of the Learning Paths.

Click here to see their website

Thursday, July 9, 2015

Taking the Mystery out of Big Data

Today's companies have the potential to benefit from incredibly large amounts of data.

To shake off the mystery of this "Big Data," it's useful to know its history.

In the not-so-distant past, firms tracked their own internal transactions and master data (products, customers, employees, and so forth) but little else. Companies typically had very large databases only if their industry called for high-volume, high-speed applications such as telecommunications, shipping, or point of sale. Even then, those transactions were all formatted in a standard way and could be stored in relational databases of the kind IBM pioneered in the 1970s.

This was perfectly fine for corporate computing in the 1970s and 1980s. Then, in the middle of the 1990s, along came the World Wide Web, browsers, and e-commerce. Before the end of that decade, a web search engine company named Google was facing the challenge of tracking the constant changes happening across web pages around the globe. The traditional computing option would have been to scale up: get a bigger platform, a more powerful database engine, and more disk space.

But spending that kind of money wasn't a good option for a little operation like Google, which was still well behind established search engines such as Lycos, WebCrawler, AltaVista, Infoseek, Yahoo, and others.

Google decided on a strategy of scaling out instead of up. Using easily obtained commodity computers, they spread out not only the data but also the application processing. Instead of buying a big supercomputer, they used thousands of run-of-the-mill boxes all working together. On top of this distributed data framework, they built a processing engine using a technique known as Map-Shuffle-Reduce.
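
To make the pattern concrete, here is a toy sketch of the map-shuffle-reduce idea in R, counting words across documents (purely illustrative; Google's engine did this across thousands of machines):

  # Toy word count using the map-shuffle-reduce pattern
  docs <- c("big data", "big ideas", "data data")
  words <- unlist(strsplit(docs, " "))          # "map": emit one record per word
  groups <- split(rep(1, length(words)), words) # "shuffle": group the records by key
  counts <- sapply(groups, sum)                 # "reduce": total each group
  counts                                        # big: 2, data: 3, ideas: 1

On a real cluster, the map and reduce steps run in parallel on many machines, and the shuffle moves the intermediate records between them.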

Of course, a scale-out paradigm meant Google now had multiple places where a failure could happen when writing data or running a software process. One or more of those thousands of cheap computers could crash and mess up everything. To deal with this, Google added automated data replication and fail-over logic to handle bad situations under the covers and still make everything work as expected for the user.

In 2003, Google published a paper explaining its distributed data storage methods (the Google File System). The following year, it disclosed the details of its parallel-processing engine (MapReduce).

One reader of Google's white papers was Doug Cutting, who was working on Nutch, an Apache Software Foundation open-source web crawler and search engine. Like Google, Doug had run into issues handling the complexity and size of large-scale search problems. Within a couple of years, Doug applied Google's techniques to Nutch and had it scaling out dramatically.

Understanding its importance, Doug shared his success with others. In 2006, while working with Yahoo, Doug started an Apache project called "Hadoop," named after his son's stuffed toy elephant. By 2008, individuals familiar with this new Hadoop open-source product were forming companies to provide complementary products and services.

With our history lesson over, we are back to the present. Today, Hadoop is an entire "ecosystem" of offerings available not only from the Apache Software Foundation but from for-profit companies such as Cloudera, Hortonworks, MapR, and others. Volunteers and paid employees around the world work diligently and passionately on these open-source Big Data software offerings.

When you hear somebody say "Big Data," the speaker typically means the need to accumulate and analyze massive amounts of very diverse, often unstructured data that cannot fit on a single computer. Big Data is usually accomplished using the following:

  • Scale-out techniques to distribute data and process in parallel
  • Lots of commodity hardware
  • Open-source software (in particular, Apache Hadoop)




Large companies with terabytes of transactions stored in an enterprise data warehouse on appliances such as Teradata or Netezza are not doing Big Data. Sure, they have very large databases, but that's not "big" in today's sense of the word.

Big Data comes from the world around the company; it's generated rapidly from social media, server logs, machine interfaces, and so forth. Big Data doesn't follow any particular set of rules, so you will be challenged when trying to slap a static layout on top of it and make it conform. That's one big reason why traditional relational database management systems (RDBMSs) cannot handle Big Data.

The term "Hadoop" usually refers to several pieces of Big Data software:

  • The "Common" modules, handling features such as administration, management, and security
  • The distributed data engine, known as Hadoop Distributed File System (HDFS)
  • The parallel-processing engine (the traditional MapReduce framework, which now runs on the YARN resource manager, or the newer Spark engine)
  • A distributed database layered on top of HDFS (HBase); the separate Apache Cassandra project fills a similar role for active, operational needs


In addition to the basic Hadoop software, however, there are lots of other pieces. For putting data into Hadoop, for example, you have several options:

  • Programmatically, using languages (e.g., Java, Python, Scala, or R) with Application Programming Interfaces (APIs) or a Command Line Interface (CLI), as sketched just after this list
  • Streaming data using the Apache Flume software
  • Bulk transfers to and from relational databases using the Sqoop module
  • Messaging using the Kafka product
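
For example, here is a minimal sketch of the CLI route driven from R (the file and directory names are illustrative, and it assumes the Hadoop client is installed and on the PATH):

  # Load a local extract file into HDFS via the Hadoop command-line interface
  local_file <- "daily_extract.csv"
  hdfs_dir   <- "/landing/sales/"
  status <- system(paste("hadoop fs -put", local_file, hdfs_dir))
  if (status != 0) stop("HDFS upload failed")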


When pulling data out of Hadoop, you have other open-source options:

  • Programmatically with languages
  • HBase, Hive (with HiveQL), or Pig (with Pig Latin), which all provide easier access than writing MapReduce code against the underlying distributed file system
  • Elasticsearch or Solr for searching
  • Mahout for automated machine learning
  • Drill, an always-active "daemon" process, which acts as a query engine for data exploration


But why would you want the complexity of this "Big Data?"

It was obvious for Google and Nutch, search engines trying to scour and collect bytes from the entire World Wide Web. Handling Big Data was their business.

Most large firms sit at the other end of Google: they have web sites that people browse and use, quite probably arriving from Google's search results. One Big Data use case for most companies, therefore, is large-scale analysis of their web server logs; in particular, they can look for suspicious behavior that suggests some type of hacking attempt. Big Data can help protect your company from cybercrime.

If you offer products online, a common Big Data use case would be as a "recommendation engine." A smart Big Data application can provide each customer with personalized suggestions on what to buy. By understanding the customer as an individual, Big Data can improve engagement, satisfaction, and retention.

Big Data can be a more cost-effective method of extracting, transforming, and loading data into an enterprise data warehouse. Apache open-source software might replace and modernize your expensive proprietary COTS ETL package and database engines. Big Data could reduce the cost and time of getting your BI results.  

It's a jungle out there; fraud happens. You may have some bad customers with phony returns, a bad manager trying to game the system for bonuses, or entire groups of hackers actively scamming money from your company. Big Data can "score" financial activities and estimate how likely it is that an individual transaction is fraudulent.

Most companies have machine-generated data: time-and-attendance boxes, garage security gates, badge readers, manufacturing machines with logs, and so forth. These are examples of the emerging tsunami of "Internet of Things" (IoT) data. Capturing and analyzing time-series events from IoT devices can uncover high-value insights that would otherwise go unnoticed.

The real key to Big Data success is having specific business problems you need to solve and on which you would take immediate action.

One of my clients was great about focusing on problems and taking actions. They had pharmacies inside their retail stores and, each week, a simple generated report showed the top 10 reasons insurance companies rejected their pharmacy claims. Somebody was then responsible for making sure the processing problems behind the top reasons went away.

Likewise, the company's risk management system identified weekly the top 10 reasons customers got hurt in the stores (by the way, the next time you are in a grocery store, thank the worker sweeping up spilled grapes from the floor around the salad bar). This sounds simple, but you might be surprised at the business benefits obtained from constantly solving the problems at the top of a dynamic top-10 list.

Today, your company may be making the big mistake of ignoring the majority of data around it. Hadoop and its ecosystem of products and partners make it easier for everybody to get value from Big Data.

We are truly just at the beginning of this Big Data movement. Exciting things are still ahead. 

Saturday, May 23, 2015

Information Builders Talking Big Data at Summit 2015

In just a couple of weeks, Information Builders will hold its annual user conference in Kissimmee, Florida. Many of the topics at Summit 2015 will deal with Big Data.



Be sure to attend the following sessions:

Tom White, MapR Technologies
Eric Greisdorf, Information Builders
Sunday 2:00PM - 3:00PM
We've all heard that the market is demanding big data solutions that provide real-time insights on their data. With the countless claims of companies solving this problem, how can you discern fact from fiction? And how do these solutions support WebFOCUS in providing real-time insights? Join MapR Technologies, an Information Builders partner and provider of the leading Hadoop distribution, to learn how MapR and WebFOCUS deliver on the promise of true real-time data analytics.


John Thuma, Teradata Big Data Practice
Sunday 3:15PM - 4:15PM
Big data is not a product or a service. Big data is a movement. Understanding how you can leverage big data from within your enterprise may be a challenge. Business intelligence (BI) and data warehousing have matured into technology, process, and people. In this session, we will discuss how BI tools fit into this new big data zoo. The secret is, there is no secret. Don't forget what you already know.


Boris Evelson, Forrester Research
Monday 1:30PM - 2:30PM
Customer insight teams, agile business intelligence (BI) investments, and big data buzz have grown at breakneck rates as organizations try to capitalize on new data with limited success. To break through the data fog, technology leaders need new approaches to systematically link data directly to insight and action. In this session, Mr. Evelson will answer questions such as: (1) What will it take for organizations to start using more of its data for analysis and insights? Today, the average organization uses only 12 percent of its data. (2) Why is business agility a key success factor in the age of the customer, and what impact does it have on your earlier-generation data management and BI investments? (3) What are the key differences between earlier-generation BI and the leading-edge systems of insights? (4) What are the key components of the new-generation systems of insights (processes, people, technology)?


Howard Dresner, Dresner Advisory Services, LLC
Monday 4:00PM - 5:00PM
In this session, veteran industry analyst Howard Dresner shares the latest findings from his annual "Wisdom of Crowds Business Intelligence Market Study." He'll answer questions such as: Who's driving business intelligence (BI) within the organization? Who are the targeted users and how are they changing? Which organizations are most successful with BI and why? What do organizations hope to achieve with BI and how is that changing over time? Which technologies and initiatives are most important, which are climbing, and which are falling? What is the current state of data and how has this changed since last year? How are people sharing BI-derived insights within their organizations and has this improved since 2014? How has user adoption of BI changed in recent years and why?


Mark Smith, Ventana Research
Tony Cosentino, Ventana Research
Tuesday 11:00AM - 12:00PM
In today's applications, systems, and devices, there is data being generated every second of the day that can either overwhelm an organization, or improve its effectiveness. Smart organizations architect their enterprise to integrate and process data from any location, including cloud computing and the Internet of Things (IoT), and at any time to deliver analytics and business intelligence (BI) that improve performance. Using a business perspective on technology and IT is required to bring the right analytics and BI technology and skills to an organization. Moving beyond the hype on agile and self-service BI requires a focus on the metrics and information people need to be effective in their roles and responsibilities. Unveiling the latest in analytics and data research across business and IT, Ventana's Tony Cosentino and Mark Smith will provide best practices and steps to help any organization be effective in using big data for a strategic advantage in analytics and BI.


Stephen Mooney, Information Builders
Tuesday 11:00AM - 12:00PM
Are you interested in learning how iWay leverages the Hadoop ecosystem? Join us for an informative session on big data, where we will show you how iWay is harnessing the power of technologies like Sqoop, Flume, Kafka, Storm, and HDFS to provide a simplified and reliable data integration platform.


Clif Kranish, Information Builders
Wednesday 9:45AM - 10:45AM
Many organizations now rely on Hadoop for their big data initiatives. In this presentation, we will show you how data managed by Hadoop can be staged by DataMigrator and used by WebFOCUS. We will cover how to use the data adapter for Hive and when to use Impala or Tez. You will learn how arrays and other complex Hive data types are supported, and how to take advantage of alternatives to HDFS, such as MapR-FS. We will also introduce the new Phoenix adapter for the NoSQL database HBase, which is distributed with Hadoop.

Thursday, September 25, 2014

Lessons from Doorbell Replacement

How hard can it be to replace the button for a doorbell?

That was the task my wife gave me, and it appeared to be within my abilities. Surely, I could be done within an hour.

My wife had already spent thirty dollars on a brushed-metal doorbell that looked nice during the day, with a back-lit button you could see at night.

On Saturday, I jumped into action and removed the old buzzer button, a simple twenty-year-old plastic box. I pulled the doorbell away from the door frame and undid the twisted wires. Quite brittle, they broke easily. To get more length, I tried to pull the wires out farther, but they would not budge. If I wasn't careful, I would end up having to buy a wireless doorbell instead.

I examined the new doorbell as I took it out of the package. While the old one had simply lain on top of the wood, this one had an inch-long metal piece that was supposed to fit inside a 5/8-inch-wide hole. A search of my toolbox came up empty for that particular drill bit. Plus, I didn't really believe the door frame was deep enough.

"Look," I explained to my wife, "this isn't going to work" and presented various reasons to discard her plan.

With open disappointment, she agreed; I put the old button back in place. Ding dong, it still worked.

At the home improvement store, we bought a different thirty-dollar brushed-metal doorbell button; this one could lie flat against the door frame but did not have a light, a compromise.

On the next Saturday, I went back to work. How hard could this be?

I once again removed the old doorbell button. This new one had two pieces: a back that attached to the door frame with screws, and a front that snapped onto the back. The wires gave me grief, but I was finally able to attach the new doorbell button.

This one would not lie flat; some type of plastic protrusion on the back always got in the way. The previous doorbell had needed a hole, so I considered that a potential solution here. I got out a power drill and started poking little holes in the door frame.

Ultimately, I was able to get the button to lie flat. When I tried to snap on the front, however, it would not close; something was preventing the snap from catching. I completely removed the doorbell, busted some more holes behind it, hooked it up again, took it off, and repeated this several times.

By now I was frustrated. It wouldn't snap closed, so I decided to try to keep it shut with some Gorilla Glue.

No luck. With brown glue spots all over the door frame and a ruined doorbell, I had failed. Trying to remove the Gorilla Glue mess, I scrubbed off patches of door frame paint. I tossed the thirty-dollar buzzer in the trash and once again returned the old plastic one to its proper place. Ding dong, it still worked.

Okay, I needed to stop and think about this. What approach was best? I still had the original back-lit doorbell buzzer that my wife wanted. I needed the right tools to do the job.

I made another trip to the home improvement store and bought a 5/8-inch hole drill bit with diamond grit (coincidentally, another thirty dollars). While there, I picked up white paint to cover the Gorilla Glue fiasco.

On the third Saturday, I removed the old doorbell buzzer, drilled the hole, and put in the new buzzer; it just barely fit. With some silicone caulking around the button and some white paint to cover my mistakes, all was good. Ding dong!

After replacing this legacy piece of hardware, here are some of my personal insights:

  • I started without assessing the situation 
  • I never had a proper plan  
  • Having never done this before, I did not have the proper know-how, expertise, or skills
  • I did not have the proper tools to do the job 
  • It took longer than expected (especially without a plan, skills, or tools)
  • It cost more than expected 
  • I could have saved by hiring a professional



My personal experience with a doorbell buzzer is similar to companies replacing their legacy business systems. How hard could it be, for example, to get rid of old reporting applications and convert all of the existing procedures to newer technology?

Upper management already bought the new BI product, so you just assign the conversion effort to the college intern. How hard could it be? Surely, she can knock it out quickly.

Ding dong: no up-front assessment, no planning, no accurate expectations as to time and cost, no specialized skills or tools, minimal progress every Saturday.

You may consider a legacy system modernization initiative as a one-off project your team can just fumble through and then forget about. That can be the painful approach and you may have to cover up mistakes afterwards. Before you do that, consider there are professionals who have done modernizations before and who have developed methodologies and automated software to reduce the time, cost, and risk.

Don't be a ding dong. 

Wednesday, June 11, 2014

Wendy's Wins Big with BI/Analytics

Congratulations to my friends at The Wendy's Company for being honored yesterday with Information Builders' 2014 Award of Distinction.

IB wrote this about Wendy's enterprise dashboard:

"The Wendy’s Company, the world’s third largest quick-service hamburger company, created a BI portal and dashboard environment that integrates an enterprise point-of-sale system to deliver targeted reports with drill-down capabilities for decision-makers at every level of the company. WebFOCUS helps managers control costs and make informed decisions that improve the bottom line. Thousands of international and domestic franchises currently use WebFOCUS dashboards, helping Wendy’s to improve profit margins at hundreds of restaurants." 


At the beginning, the idea was for an "above-the-store" executive portal where a few individuals could see all of the company's KPIs related to revenue, speed of service, costs, and customer satisfaction. However, it did not take long before thousands of decision-makers at different levels of the quick-service restaurant (QSR) organization asked for access to that valuable information.

For more information, see IB's press release

Sunday, March 23, 2014

Developing BI/BA Web and Mobile Applications with WebFOCUS

WebFOCUS for BI/BA Application Development

WebFOCUS is a powerful enterprise Business Intelligence and Analytics platform well suited for today's complex and rapidly changing environment. The software is produced by the privately held American software vendor Information Builders.

As part of WebFOCUS, Information Builders provides two BI/BA application development tools, one for business developers and another for IT technical developers:

  • BI Portal and InfoAssist for business developers (web-based and mobile)
  • Windows-based Developer Studio for IT technical developers 

With WebFOCUS, companies can provide business users with self-service intelligence and analytics:

  • Dashboards and scorecards 
  • Self-service guided ad-hoc data exploration
  • Mobile BI with right-time data on any device 
  • InfoApps to easily analyze and manipulate information
  • Deep integration with desktop products and formats such as Microsoft Excel and Adobe PDF
  • Integration with open-source software such as R statistical programming language and Python 
  • Dynamic report scheduling and distribution, with real-time alerts
  • Integration with enterprise data and information management  


Note: Within the next month or two, Information Builders will release the next generation of Developer Studio, which will be called Application Studio. The new product's "look and feel" will be consistent with that of InfoAssist. For example, Application Studio will also have a ribbon-based user interface.

BI Portal and InfoAssist

The WebFOCUS BI Portal enables business users to easily create and share sophisticated portals, launch pages, reports, and graphs hosted within the corporate WebFOCUS environment or in the cloud. The users have no software to install.

InfoAssist, the report and graph layout tool, is appropriate for business users as well as for light-weight IT development. Information Builders will license BI Portal/InfoAssist based on a certain number of users (e.g., one hundred) or on an unlimited basis.

WebFOCUS 8 comes with sophisticated multi-tenant security enabling companies to open up web and mobile BI to not only their internal employees, but also to external partners and customers.

Picture: BI Portal with InfoAssist




Windows-based Developer Studio (IDE)

The Developer Studio is a Windows-based IDE enabling IT technical developers to build complex, interactive BI applications hosted within the corporate WebFOCUS environments or in the cloud. 

However, the Developer Studio product is not an appropriate tool for business users. For them, Information Builders provides the web-based and mobile InfoAssist product instead.

Picture: Developer Studio IDE 





Developer Studio Features

The Developer Studio provides an IT BI/BA developer with a variety of software development features, including: 
  • Metadata Management: tool for generating and maintaining the metadata layer 
  • HTML Composer: layout tool for designing and creating web launch pages 
  • Procedure Viewer: layout tool to visually display multiple-step procedural logic 
  • Report Painter: layout tool for designing and creating interactive and dynamic reports 
  • SQL Report Wizard: tool for building reports using SQL requests 
  • Define Dialog: tool for creating virtual columns using business rules 
  • Join Painter: tool for logically connecting tables across the enterprise 
  • HTMLForm: tool for embedding HTML commands inside WebFOCUS procedures 
  • Match Wizard: tool for logically matching tables across the enterprise for either join-like functionality or exceptions
  • For creating graphs, Developer Studio utilizes the InfoAssist product  

Developer Studio Editions

IT technical developers can purchase and install one of two different “editions” of the Developer Studio product:
  • Report Developer Edition
  • Full Edition: includes MAINTAIN development and full personal WebFOCUS image 

Information Builders offers a lower-priced Edition for IT technical developers who do not need to develop online database maintenance applications or have their own personal WebFOCUS environment. Typically, few IT technical developers need all of the features within the Full Edition.

Report Developer Edition

The Report Developer Edition of Developer Studio has all of the features needed to build web and mobile BI applications, including the Procedure Viewer, HTML Composer, Report Painter, Join Painter, Financial Modeling Language Painter, Reporting Server Management, and Change Management features. 

An IT technical developer can download and install this edition of the Developer Studio on his or her workstation for a reasonable one-time license fee plus on-going annual maintenance fee.

Full Developer Studio Edition

The Developer Studio IDE can also be purchased in the “Full Edition” version with its own personal, stand-alone WebFOCUS environment—a WebFOCUS Client Tier (web server, Java app server, client components, security repository, etc.) and a WebFOCUS Reporting Server. This provides a personal “sandbox” for testing WebFOCUS functionality. 

In addition, the full edition of Developer Studio comes with the MAINTAIN development product for creating web and mobile database maintenance applications. This full edition is limited to those IT technical developers who perform MAINTAIN development or to those few individuals with a valid business case for a personal BI environment.

Note: The IT technical developer’s version of Developer Studio must be kept in sync with the other components of WebFOCUS environments. In other words, the developer should not install a client release higher than the software release of the WebFOCUS web tier and BI server.

GUI Code Generators

These WebFOCUS tools enable a visual application development process by providing layout tools that can translate the design and generate the necessary computer programming instructions.

When developing BI applications with WebFOCUS, several languages are involved:
  • InfoAssist: generates the FOCUS 4GL 
  • Developer Studio’s Report Painter: generates the FOCUS 4GL 
  • Developer Studio’s HTML Composer: generates HTML, JavaScript, and XML 

Being a business user tool, InfoAssist does not provide access to the generated instructions. Developer Studio, on the other hand, lets the IT technical developer get to a text editor from where he or she can modify the generated instructions.

There are really only two situations in which the IT technical developer needs to type instructions manually: the FOCUS 4GL's Dialogue Manager and JavaScript within the HTML. Both are procedural scripting languages that accompany the main logic.

There is no reason for an IT technical developer to manually change any of the actual HTML/XML logic generated by the Developer Studio’s HTML Composer other than to add JavaScript. 

Likewise, the IT technical developer must be careful when modifying the 4GL non-procedural code used for creating reports, graphs, and extract files. It is possible to make manual changes that the Developer Studio parser does not recognize, which will cause the layout tools to fail. This typically happens when the developer creates a highly dynamic layout that cannot be painted.

WebFOCUS technical developers should consciously avoid developing BI/BA applications that cannot be opened in the graphical layout tools. While necessary in some complex situations, having “non-paintable” code dramatically increases the skill-set requirements, shifting from individuals who can use visual GUI development to those who must know how to hand-code the 4GL instructions.

Should You Use WebFOCUS?

A list of Information Builders' customers looks like a "Who's Who" of global businesses and government agencies. For over a decade, Gartner has included Information Builders in the Leaders quadrant for enterprise BI products.

WebFOCUS runs on a variety of platforms (e.g., Windows, UNIX, Linux, and mainframes) and can scale to handle millions of users. It provides super-secure web access to sensitive data, which can be stored in many diverse formats (Information Builders has always been a well-known provider of enterprise data adapters through their iWay Software brand, partnering with many of the other leading software vendors). 

WebFOCUS provides business users with easy-to-use yet robust BI/BA tools. It provides IT developers with a sophisticated IDE to produce robust self-service web and mobile applications. 

WebFOCUS is highly dynamic. Unlike old-fashioned report writers such as Crystal Reports, which were limited to a single static layout, a WebFOCUS procedure can dynamically restructure itself to produce thousands of different report versions.

Legacy tools can be modernized into WebFOCUS. For example, SQL routines can be embedded directly inside the WebFOCUS language to leverage legacy assets. For many other reporting tools, there are utilities to automatically assess legacy code and transform it into WebFOCUS, reducing the time and risk of a BI modernization initiative.

For more information, visit the WebFOCUS website

Saturday, March 1, 2014

How to Ask Smart BI/Analytics Questions

Many people rush when posting a "how-to" question in a software vendor's support case, a LinkedIn group, or some other web forum, as well as when sending an e-mail question to associates. Your odds of getting a good answer improve dramatically if you first structure the request properly.

For a great article on how to ask smart questions, see Eric Raymond's essay on that very topic, "How To Ask Questions The Smart Way." Eric has written several well-known books on open-source software and the Linux operating system, one of his most famous being "The Cathedral and the Bazaar."

Eric advocates that “Google is your friend” and you must always search before asking for help. 

He recommends that, before posting a question in a web forum, you try to find the answer yourself by going through these steps:

  1. Search the archives of the forum to which you plan to post
  2. Search the web
  3. Read the manual
  4. Read a FAQ
  5. Inspect or experiment
  6. Ask a skilled friend
  7. If you're a programmer, read the source code (e.g., the WebFOCUS JavaScript engine)

Eric also makes the point that "All diagnosticians are from Missouri," meaning that the people reading your question will ask you to "Show me your problem's symptoms in chronological sequence."

Following Eric's advice should help you get answers to your BI/Analytics questions.

Have you been frustrated by people asking for help (or perhaps, have you found yourself guilty of asking vague questions)? 

Sunday, February 16, 2014

Tableau Software Continues to Win Data Visualization Fight

The three big contenders for BI/Analytics data visualization are Tableau Software, QlikTech's QlikView, and TIBCO's Spotfire.

Using the job market as an indicator shows that demand for Tableau is growing much faster than that for the other two.

Picture (2014 February 16)



Do you agree with this simple assessment? Post your comments here if you have insight into the Data Visualization software market.

For more information on these three software products and some additional details on Tableau, see my earlier blog article.

Tuesday, February 4, 2014

Closer Look at R Statistics Scripts Called by WebFOCUS


Part IV
Let's take a closer look at the R scripts WebFOCUS generates within my sample text-mining application.



In this simple application, WebFOCUS generates one of three graph options, depending on the user's selection: box plot, histogram, or point plot.

CRAN
If you do not have R already installed, you can get a copy and full installation instructions from the Comprehensive R Archive Network (CRAN), a kind of app store for all things R. The hyperlink for CRAN is: http://www.r-project.org/.

For WebFOCUS to be able to communicate easily with R, we want to install R on either the same computer as the BI product or within its network of accessible drives. 



When you code in the R programming language, you are commonly using assignment statements and functions. 

For example, I want to assign the values in my WebFOCUS-generated text file to a named R data structure. Assignments are done with an operator that puts together a left angle bracket and a dash (<-); for example, x <- 42 stores the value 42 in a structure named x.

To pull out those values, I must first read the text file using a read function. Here is the basic command:
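
A minimal version, assuming the WebFOCUS extract was saved as keywords.txt (the file name is illustrative):

  # Read the comma-separated extract, with a header row, into "keywords"
  keywords <- read.table("keywords.txt", header = TRUE, sep = ",")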




I want a data structure called "keywords" to hold my WebFOCUS file. The read command has a variety of options; I am using the "table" option to read a simple tabular structure. My table has a row identifying the column headers, so I inform read of that with a parameter called header. My table values are separated by commas, so I tell read about that using the sep parameter.

After that command, R has loaded my WebFOCUS data table into its memory. 

Before I actually create an output graph, I need to tell R where to store it. Had I been working with R interactively, the graph would just show on the screen. Here, I need to save the graph into a bitmap file and later in the process tell WebFOCUS to display it on the screen. 

I do that with the R png command (PNG stands for Portable Network Graphics). Here is a sample:
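
It is along these lines (the file path and dimensions are illustrative):

  # Send the next graph to a PNG file instead of the screen
  png(filename = "C:/ibi/apps/rtest/keywords.png", width = 600, height = 400)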




Along with the output file's full path, I also include some optional parameters to tell R how tall and wide to make the graph. 

Two R commands down and just one more to make the graph! 

Boxplot
If the user requested a box plot, I would use the R function called, naturally enough, boxplot. My sample statement looks like this: 
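
A sketch consistent with the parameters described below:

  # Horizontal box plot of the Count column
  boxplot(keywords$Count, horizontal = TRUE, varwidth = TRUE, xlab = "Search Counts")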




The first parameter for the boxplot function tells it to use a column called "Count" in my keywords data structure. Notice that R uses the US dollar sign to separate the structure and column names (in this example, keywords$Count). I am also passing in some optional parameters to lay the box plot down horizontally, give R the option to vary its width, and set the X-axis label to say "Search Counts."

The boxplot function gives me a bitmap graph such as this:




Histogram
The other R graph functions look very similar. For example, my histogram command is simply: 
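
Something like this:

  # Histogram of the Count column, with the main title blanked out
  hist(keywords$Count, main = "")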




The difference here is a parameter called main where I am blanking out the main title. My output histogram might look like this:




Plot
Next is the simple graph of points plotted, whose R function looks like this: 
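
For example:

  # Simple plot of the points in the Count column
  plot(keywords$Count)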




This gives me a graph such as the following:





Quick Clean-Up!
Before the R script ends and returns control to WebFOCUS, I am going to tell R to close the output graph file (otherwise, R keeps control of the picture and I cannot open it). To do that, I call the dev.off function ("dev" is short for device), which turns off the graphics device connected to the output. It is simply this:
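
  # Close the graphics device so the PNG file is written out and released
  dev.off()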




One Last Thing
The three types of graphs used above are all standard R features. Some features, however, require an add-on "package" to be installed and referenced. You can read about this on the CRAN website.

For example, I might want WebFOCUS to create a file of keyword counts and then produce a word cloud. Within R, I would need to install the package containing that feature. Once installed, WebFOCUS needs to reference that package with the "library" function. 

Here is an example of WebFOCUS generating a word cloud R script: 
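
A sketch of what such a generated script could look like (the file names and the Keyword column are assumptions; Count follows the earlier examples):

  # Build a word cloud from keyword counts using the add-on package
  library(wordcloud)                            # reference the installed package
  keywords <- read.table("keywords.txt", header = TRUE, sep = ",")
  png(filename = "wordcloud.png", width = 600, height = 600)
  wordcloud(as.character(keywords$Keyword),     # the words to draw
            keywords$Count)                     # word size reflects its count
  dev.off()                                     # close the device and release the file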









If you do have R scripts that use a package, you must have WebFOCUS start R with the R_LIBS environment variable identifying where you installed the package. See an earlier article on how that works

Do You Have WebFOCUS and R? 
I hope you enjoyed this multi-part article on integrating WebFOCUS and the R statistical programming language. Do you have both products and, if so, do you use them together? Please leave a comment and let us know your experience. 

As always, if I can ever be of service, just let me know.

About Me


I am a project-based consultant, helping data-intensive firms use agile methods and automation tools to replace legacy reporting and bring in modern BI/Analytics to leverage Social, Cloud, Mobile, Big Data, Visualizations, and Predictive Analytics. For several world-class vendors, I led services teams specializing in providing software implementation and custom application development. Based on scores of successful engagements, I have assembled proven methodologies and automated software tools.

During twenty years of technical consulting, I have been blessed to work with smart people from some of the world's most respected organizations, including: FedEx, Procter & Gamble, Nationwide, The Wendy's Company, The Kroger Co., JPMorgan Chase, MasterCard, Bank of America Merrill Lynch, Siemens, American Express, and others.

I was educated at Valparaiso University and the University of Cincinnati, graduating summa cum laude. In 1990, I joined Information Builders, the vendor of WebFOCUS BI and iWay enterprise integration products, and for over a dozen years served in branch leadership roles. For several years, I also led technical teams within Cincom Systems' ERP software product group and the custom software services arm of Xerox.

Since 2007, I have provided enterprise BI services such as: strategic advice; architecture, design, and software application development of intelligence systems (interactive dashboards and mobile); data warehousing; and automated modernization of legacy reporting.