Saturday, February 12, 2011

Wordle Beautiful Word Clouds

Wordle is an interesting web application for producing pictures known as "word clouds" or "tag clouds".

In a word cloud, each text object is associated with a number. The bigger the number, the larger the word's font in the picture. A common use is to associate each word with its frequency in a text (for example, the words found in the King James Bible) to get a visual representation of that text.
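To make the word-to-number idea concrete, here is a minimal Python sketch of counting word frequencies. It is only an illustration, not how Wordle itself works; the function name `word_frequencies`, the sample sentence, and the stop-word list are all made up for this example.

```python
import re
from collections import Counter

def word_frequencies(text, stop_words=frozenset()):
    """Count how often each word appears, ignoring case and stop words."""
    # Split on anything that is not a letter or apostrophe.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

# Hypothetical sample text and stop-word list.
sample = "Just a test, just a word cloud test, just words."
freqs = word_frequencies(sample, stop_words={"a"})
print(freqs.most_common(2))
```

The resulting word-and-count pairs are exactly the kind of list a word cloud program takes as input.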

As an example, here is a word cloud of my recent blog postings (I'm surprised I use the word "just" so much). Click on the picture for a bigger and dynamically generated version.

[Image: Wordle: BI-Software.Blogspot.com 001]

In business intelligence applications, being able to visualize content is extremely important. You could stare at numbers all day long and not figure out a thing. But add some color and that might change--bad things in red and good things in green pop off the page at you.

Size is also useful for comparing data items at a glance. Often in reports, designers add a graphic bar whose length corresponds to the figure next to it. The bigger the number, the longer the bar (which could be colored as well to add a judgement of good or bad). These "peer graphics" help the user easily compare different objects on the report, such as Total Agricultural Revenue of Oklahoma versus that of New Jersey.

Visualization techniques based on color and relative size are already common in BI products, but a word cloud application such as Wordle could add another user interface feature.

I can see word clouds as front-ends for BI applications analyzing either text or data. Why limit data representation to columns and rows?

As a simple example, a sales executive could view geographic regions on the word cloud, with each region sized by its territory's current revenue, its percentage of quota achieved, or whatever measure he or she selects as the criterion.

For interactivity, we could activate hyperlink hotspots on each of the terms in the cloud and allow the user to drill down to BI details or drill to other related business topics.
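As a rough sketch of that drill-down idea, the Python below turns a set of region weights into HTML links whose font size follows the measure. Everything here is an assumption for illustration: the function name `cloud_html`, the `/reports/region/` URL pattern, the point sizes, and the revenue figures are invented, and the weights are assumed to be positive.

```python
from html import escape
from urllib.parse import quote

def cloud_html(weights, base_url="/reports/region/", min_pt=12, max_pt=48):
    """Render each term as a link sized in proportion to its weight."""
    top = max(weights.values())  # assumes at least one positive weight
    parts = []
    for term, value in weights.items():
        # Scale relative to the largest weight.
        size = round(min_pt + (value / top) * (max_pt - min_pt))
        # Hypothetical drill-down URL; a real BI tool defines its own.
        href = base_url + quote(term)
        parts.append(
            f'<a href="{escape(href)}" style="font-size:{size}pt">'
            f"{escape(term)}</a>"
        )
    return " ".join(parts)

# Made-up revenue figures, used only to show the scaling.
html = cloud_html({"Oklahoma": 200, "New Jersey": 100})
print(html)
```

Switching the measure (revenue, quota percentage, and so on) only means passing in a different set of weights; the drawing step stays the same.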

Check out Wordle for more information as well as some beautiful examples stored in the public gallery.

[Duh! Wordle uses Java on your desktop to dynamically display the word clouds. That doesn't work on the Apple iPhone or iPad. If that is a problem for your website, just save the cloud as an image.]


Update on 2011 April 21. SM Reilly, a blog reader with a background in qualitative market research, had questions about the value of these word clouds. SM wrote:

I am a qualitative market research consultant investigating the value of these for my clients.
An extra step or two can eliminate YOUR words from your analysis, fyi.
So far, I cannot ascertain that the visualizations are presented such that words are proximate to other words based on their relationship to one another so you can't learn anything about why "dishonest" is near or far from "bankers."
Further, it won't work for short phrases, so "good value" and "bad value" are just analyzed as "good," "bad" and "value" and the good and bad are lumped together with those same words used in other contexts, e.g., "good luck" and "good time." Not all thoughts and ideas are expressed as single words. Also, the richness of the English language provides many synonyms for the same concept. Just look up the word "nice" in a thesaurus.
It cannot distinguish between the same word with different meanings, whether as noun vs verb (weather, wait, buy).
It cannot correct for spelling, so if the people contributing content are poor spellers, then "recieve and receive," "thru and through" aren't associated.
With all of these limitations, can you provide me with support for the value of a word cloud? It's a serious question. 


I appreciate SM's questions. Let's take a look at the issues he or she raises.

To create a word cloud, you provide the drawing program with a list of words or phrases and their frequencies. The program draws the cloud, changing the size of each word according to the frequency count. The more occurrences, the bigger the font.
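Here is a minimal Python sketch of that sizing step. I am assuming a simple linear scale between a smallest and largest font size; I don't know what formula Wordle actually uses, and the name `font_sizes` and the point sizes are my own choices.

```python
def font_sizes(freqs, min_pt=10, max_pt=72):
    """Map each term's count to a font size by linear interpolation."""
    if not freqs:
        return {}
    lo, hi = min(freqs.values()), max(freqs.values())
    span = hi - lo
    sizes = {}
    for term, count in freqs.items():
        # If every count is equal, fall back to the midpoint size.
        ratio = (count - lo) / span if span else 0.5
        sizes[term] = round(min_pt + ratio * (max_pt - min_pt))
    return sizes

# Hypothetical counts for three terms.
sizes = font_sizes({"just": 30, "data": 20, "cloud": 10})
print(sizes)
```

Notice that the function never looks at what the words mean. It only reads the counts it is given.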

That is all that piece of software is meant to do. That program is doing its job exactly as it was coded. Nothing more, nothing less.

A program that was written to produce a word cloud will be good at only that specific task. Outside of drawing a cloud, it is basically dumb; everything else is outside of its intentionally limited scope of expertise. As SM says, the word cloud program does not know anything about the list of words you gave it; it just reads them and draws a picture.

A computer program that is limited in its function is far from useless. In fact, we intentionally design modules to perform very specific functions. As with Legos, we then put these modules together to make something bigger than the sum of the parts.

For example, consider the software module inside an e-book reader that only knows how to take text and display it on the screen. Without that limited functionality, the entire device is worthless.

So yes, the word cloud drawing program is limited in functionality. What that means is that you are responsible for providing the word cloud program with a "smart" list.

You need to use another computer program that can perform intelligent searches through a textual document. Inside that module, you can put any type of smarts that you need. For example, you can fix all of the spelling errors before counting terms. You can convert Japanese to German. You can identify sentence parts, such as subjects, objects, and verbs. You can consolidate synonyms.

It is up to you to make a smart list to give to the word cloud program.
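As a small illustration of what such a "smart" step might look like, the Python sketch below fixes known misspellings, merges synonyms, and keeps short phrases together before counting. The lookup tables are tiny hypothetical stand-ins (borrowing a few of SM's examples); a real project would more likely build them from a spell checker, a thesaurus, or domain knowledge, and the phrase matching here is deliberately naive.

```python
import re
from collections import Counter

# Hypothetical lookup tables, for illustration only.
SPELLING = {"recieve": "receive", "thru": "through"}
SYNONYMS = {"pleasant": "nice", "lovely": "nice"}
PHRASES = ["good value", "bad value"]

def smart_list(text):
    """Return term counts after phrase, spelling, and synonym cleanup."""
    text = text.lower()
    # Join known phrases with an underscore so they count as one term.
    for phrase in PHRASES:
        text = text.replace(phrase, phrase.replace(" ", "_"))
    counts = Counter()
    for token in re.findall(r"[a-z_']+", text):
        token = SPELLING.get(token, token)   # fix known misspellings
        token = SYNONYMS.get(token, token)   # fold synonyms together
        counts[token.replace("_", " ")] += 1
    return counts

counts = smart_list(
    "Good value! I recieve good value, thru and through. Lovely, nice."
)
print(counts.most_common(3))
```

The counts that come out of this step are what you hand to the word cloud program, which can stay as simple as it likes.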

Here is a personal example. For a major financial services company, I analyzed thousands of legacy computer programs to identify similarities in purpose. Basically, my client needed to spot functional redundancies so that the programs could be replaced efficiently.

I wrote a parsing program to scan all of the textual contents, examine any available clues, and store conclusions into a database. I looked at the names of the programs, figured out what data objects the programs were using, checked the names of the folders where the programs were stored, looked at which users accessed the programs, and so forth.

The process was like a Google index search, but instead of just counting keywords I had to consider the meanings of acronyms, abbreviations, and synonyms. My logic had to translate various words and consolidate things to a small number of abstract topics.

With all of those hints, I consolidated thousands of programs into twelve "buckets." When all was said and done, I could tell the client: "These 350 programs look like they deal with Fraud. These 500 look like Sales Reporting, these 625 seem to deal with Customer Service," and so forth.

Now that list of twelve terms and their counts might look very simple, but the work that went into the list was rather hard. The client could now visually see similarities and redundancies within thousands of complex documents and plan accordingly.
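To give a feel for the bucketing idea, here is a heavily simplified Python sketch. It is not the parsing program I wrote for the client; the hint table, the program names, and the clues are all invented, and the scoring is just "the bucket with the most matching clues wins."

```python
from collections import Counter

# Hypothetical hint table: keywords, acronyms, and abbreviations that
# suggest each bucket. None of these come from the real project.
BUCKET_HINTS = {
    "Fraud": {"fraud", "frd", "chargeback"},
    "Sales Reporting": {"sales", "sls", "revenue"},
    "Customer Service": {"customer", "cust", "csr", "support"},
}

def classify(clues):
    """Return the bucket whose hints match the most clues, or None."""
    scores = Counter()
    for clue in clues:
        for bucket, hints in BUCKET_HINTS.items():
            if clue.lower() in hints:
                scores[bucket] += 1
    return scores.most_common(1)[0][0] if scores else None

# Made-up programs with clues gathered from names, folders, and so on.
programs = {
    "FRD_RPT_01": ["frd", "chargeback", "rpt"],
    "SLS_SUMMARY": ["sls", "revenue", "cust"],
    "CSR_TICKETS": ["csr", "support"],
    "MISC_UTIL": ["util"],
}
labels = [classify(clues) for clues in programs.values()]
bucket_counts = Counter(label for label in labels if label is not None)
print(bucket_counts)
```

The bucket names and their counts form the short list of terms and frequencies that a word cloud can draw.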

SM Reilly, I hope this helps clarify the use of word clouds. If you are interested in learning more, contact me.


2 comments:

Anonymous said...

I am a qualitative market research consultant investigating the value of these for my clients.

An extra step or two can eliminate YOUR words from your analysis, fyi.

So far, I cannot ascertain that the visualizations are presented such that words are proximate to other words based on their relationship to one another so you can't learn anything about why "dishonest" is near or far from "bankers."

Further, it won't work for short phrases, so "good value" and "bad value" are just analyzed as "good," "bad" and "value" and the good and bad are lumped together with those same words used in other contexts, e.g., "good luck" and "good time." Not all thoughts and ideas are expressed as single words. Also, the richness of the English language provides many synonyms for the same concept. Just look up the word "nice" in a thesaurus.

It cannot distinguish between the same word with different meanings, whether as noun vs verb (weather, wait, buy).

It cannot correct for spelling, so if the people contributing content are poor spellers, then "recieve and receive," "thru and through" aren't associated.

With all of these limitations, can you provide me with support for the value of a word cloud? It's a serious question.


SM Reilly

Mr. Smith said...

Word clouds and tag clouds are everywhere now and people are making them for all kinds of things. I also think there are a lot better and easier to use tools out there now to create wordles, too. There's even some sites like wordle word cloud where you can display your word clouds, get comments, even make products displaying them. It's crazy how things can grow in this day and age.

About Me

Helping companies make better decisions via Business Intelligence. INTP working on the E&J. Traveler, reader, family guy, coffee drinker.