Thursday, May 28, 2009

The Semantic Web (aka Web 3.0)

I attended a meetup last night - the first one in Toronto to gather and discuss Web 3.0/the Semantic Web. Much of what was said I agreed with - however, the strongest feeling I had coming away was "What is the elevator pitch to a non-techy person?"

I have been interested in the issue of how we deal with this great volume of information now freely available (and growing at a ridiculous rate). Everyone is blogging, books and articles are online, and it's pretty clear that media is moving to a permanent home online (sorry MC - it's just too convenient). How do we dig through all of this to find what we want? I think many of the people at the meetup last night feel this same frustration...


The speaker last night, Greg Boutin, has blogged about the difference between "Finding" and "Discovering." The suggestion here is that the Semantic Web can absolutely help answer a specific question (e.g. Wolfram Alpha) - in seconds. The Google experience is "Discovering" - you are interested in a topic, and want to read more about it (minutes or hours). I would like to add one more search method - Mahalo. While I don't use it much, it went in the other direction: it uses actual people to create content pages about popular subjects. I like to think of it as "human algorithms" rather than "statistical algorithms" :)

So, let's look at these search engines in the context of different searches. Let's run through a simple one: "How Big Is Canada?"

Wolfram Alpha

Wolfram answers the question, Google will eventually get you there, and Mahalo gives me absolutely nothing (not even relevant alternate results that are easy to scan). Conclusion - if you have a simple question, Wolfram is better than the rest.

Let's try another search term - this time, just "Canada". Perhaps I'm planning a vacation to Canada and want to read up on it. Let's have a look:

Wolfram Alpha

Again, Wolfram gives me the basics - size, biggest cities, population density, etc. The searcher should no longer think "I'm going to Nova Scotia, then Quebec, then Toronto, Alberta, then BC in 1 week, and I'm driving..." :) What does Google tell us? Well, we get a nice link to the Government web site, the Wikipedia entry, etc. Now we start to open multiple tabs, and start digging. And how about Mahalo? Basically, the Google search results, but neatly formatted and easily digestible.

While I do enjoy the occasional "surfing" session, often I want answers, and I want them now. I don't have time to dig to page 27 on some forum, or page 5 in Google. Personally, I'm getting frustrated with Google these days - I often find myself doing advanced filtering (e.g. by date).

So - in summary, one of the benefits of the Semantic Web is that it makes your life easier: you will be able to find answers much more quickly, and they will make sense in the context of your query (e.g. "title" as a job title != title to land != a movie title, etc.)
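
To make the "title" example a bit more concrete, here is a minimal sketch in Python of how structured data could keep those meanings apart. Everything in it is an assumption made up for illustration: the `Triple` shape, the `job:title` / `land:title` / `movie:title` names and the sample values are hypothetical, and this is not how any particular Semantic Web tool actually stores its data.

```python
from collections import namedtuple

# A toy subject-predicate-object statement. The shape and all the names
# below are hypothetical, chosen only to illustrate the idea.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

TRIPLES = [
    Triple("person:alice", "job:title", "Editor"),
    Triple("property:lot42", "land:title", "Deed 1234"),
    Triple("film:example", "movie:title", "An Example Film"),
]


def find_by_predicate(triples, predicate):
    """Return only the values stored under one exact, qualified predicate."""
    return [t.obj for t in triples if t.predicate == predicate]


def keyword_search(triples, word):
    """Naive keyword match: every value whose predicate mentions the word."""
    return [t.obj for t in triples if word in t.predicate]


if __name__ == "__main__":
    # Keyword matching lumps all three meanings of "title" together.
    print(keyword_search(TRIPLES, "title"))
    # A qualified predicate returns only the meaning that was asked for.
    print(find_by_predicate(TRIPLES, "job:title"))
```

The point of the sketch is only that once the data says which kind of "title" it is, the query can ask for that kind and nothing else, instead of leaving the reader to sort it out.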

Here are some random questions/thoughts out of last night:

- How will content get tagged/structured/indexed correctly? Text analyzers? Manually? Standards that assist both?
- Is this inevitably just add-on tools like Zemanta? Or should it be part of the core fabric of how data is stored?
- ...and is Zemanta truly Web 3.0? Or just smart text analyzers? How is that really different from Google search?
- While there is clear benefit to new Software Apps, what is the benefit to the Content Owners? Don't tell them "increased readership through linking"! (In case you don't know, the major newspapers are blaming Google and Craigslist for their downfall.)
- PageRank has worked pretty well for quite a while (certainly better than Yahoo directories or AltaVista back in the day) - but will it work with the dramatic increases in data?

Final Point: the Semantic Web and Web 3.0 remain poorly defined, and no one can agree on what they really are or how they will really work. The meetups and the discussions happening around the globe on the net indicate a push is underway. There is definitely a market opportunity here. Now what is it...