My 9/07/08 Missoulian column
I’m old enough to remember how amazing the first Web pages were. At the time, the Web was such a new thing that few of us knew enough to want to be able to search Web pages. We were happy with the rudimentary subject catalogs of the Web created by people. And then Google took the idea of Web search that other companies had been working on and changed everything. Their wide-ranging search gives results that take into account every word on a Web page or in a document and their links to other documents and sites.
But what would happen if Internet search were even better than it is now? What if you were able to get perfectly relevant results every time for your search terms, not wacky suggestions of pages that are way off base? What if the search engine could almost read your mind and know the context of the words you searched for, not just the words themselves? Or: What would it be like to simply type in a question – in everyday language – and be presented with an answer that was almost perfectly relevant because the search engine could understand your question?
That’s called semantic search. Semantic search is all about context: where those words are, how they relate to the rest of the Web site or document they are in, and how they relate to what you are really searching for, not just the simple existence of those words on a Web page or in a document. Talking about semantic search is necessarily abstract, and somewhere there’s already a library full of Ph.D. dissertations on the subject. But the gist is that it goes beyond the raw indexes of words and Web pages that search engines – for the most part – use now.
Semantic search is the next Holy Grail of Internet search. Search engine experts think that semantic search will be as close to a perfect kind of search as is imaginable. If all the world’s information does indeed become indexed and cataloged and accessible to all through a Web browser, then better methods of search must already be in place. If not, the information will be as good as inaccessible, because any search will include so much information you’re not looking for that it will render the search useless.
To use an example from last week, say you want to search for information about grizzly bears. You type in those search terms, and with semantic search, you jump right to a list of Web sites that includes Montana Fish, Wildlife and Parks’ site on how to tell the difference between a grizzly and a black bear, a site on grizzly bear hibernation and biology, and another on endangered species status. You won’t get hits for football teams, the rock ‘n’ roll band or a manufacturer of industrial machinery, all of which share the same terms.
But how does semantic search work? It’s all in the metadata. What’s metadata? Data about data. In order for semantic search to work and put search results in context, there must be more information linked to the particular words you are searching for, enough to place them in the context of the words around them and in the context of all their possible meanings, too. That’s where metadata comes in: all that extra data. So semantic search depends on more than the Web page data itself, more than the words that are indexed; it depends on extra information that places those words in context.
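For readers who like to see an idea in code, here is a toy sketch of the grizzly bear example in Python. Everything in it is made up for illustration: the page list, the "topic" label standing in for metadata, and the two function names. Real search engines are vastly more complicated, and the topic you want is passed in by hand here, where a real engine would have to work it out from context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    title: str
    text: str   # the words a plain keyword index would see
    topic: str  # hypothetical metadata: what the page is really about


# Invented sample pages that all share the word "grizzly".
PAGES = [
    Page("Grizzly or black bear? How to tell", "grizzly bears black bears identification", "wildlife"),
    Page("Grizzly bear hibernation and biology", "grizzly bears hibernation biology", "wildlife"),
    Page("Grizzlies football schedule", "grizzly bears football schedule", "sports"),
    Page("Grizzly Bear tour dates", "grizzly bear band tour", "music"),
    Page("Grizzly industrial machinery catalog", "grizzly machinery lathes", "manufacturing"),
]


def keyword_search(query, pages):
    """Return every page whose text contains at least one query word."""
    terms = query.lower().split()
    return [p for p in pages if any(t in p.text.split() for t in terms)]


def semantic_search(query, wanted_topic, pages):
    """Keyword matches, narrowed by the topic metadata."""
    return [p for p in keyword_search(query, pages) if p.topic == wanted_topic]


if __name__ == "__main__":
    print(len(keyword_search("grizzly bears", PAGES)), "keyword hits")
    for page in semantic_search("grizzly bears", "wildlife", PAGES):
        print("semantic hit:", page.title)
```

The plain keyword search matches all five pages because each one contains the word "grizzly." Only the extra topic label lets the second function drop the football team, the band and the machinery maker.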
This metadata comes from different places. Some of it is offered by companies that have deep databases they want indexed and available to searchers, so this data has to be described in a way that search engines can understand and use to catalog it accurately. Some metadata comes from you, the user, in forms that you might not realize, such as from your own browsing habits and the “cookies” that the search engines store on your PC. Some of it comes from search engine companies themselves, recording your search history.
And all this metadata must be in a standard, agreed-upon format, or it won’t be useful for all search engines. These standards will be in the same vein as the rest of the data standards that run the Web. The World Wide Web Consortium defines it this way: “The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.”
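One way to picture a shared format is as a pile of three-part statements: a thing, a property and a value. That is the general shape used by the consortium's RDF standard. The Python sketch below only imitates that shape with plain tuples. The page names, the property names "hasTopic" and "mentions," and the match function are all invented for illustration and are not part of any real standard or library.

```python
# Each statement is (subject, property, value). All names are hypothetical.
TRIPLES = [
    ("page:bear-id", "hasTopic", "wildlife"),
    ("page:bear-id", "mentions", "grizzly bear"),
    ("page:tour", "hasTopic", "music"),
    ("page:tour", "mentions", "grizzly bear"),
]


def match(triples, subject=None, prop=None, value=None):
    """Return the statements that agree with every part that was given.

    A part left as None acts as a wildcard and matches anything.
    """
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (prop is None or t[1] == prop)
        and (value is None or t[2] == value)
    ]


if __name__ == "__main__":
    # Which pages say they are about wildlife?
    for subject, _, _ in match(TRIPLES, prop="hasTopic", value="wildlife"):
        print(subject)
```

Because every statement has the same three-part shape, any program that knows the shape can read the data, whichever company wrote it. That is the point of agreeing on a format.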
The Web and search company Yahoo! feels that the amount of metadata now available has reached a “tipping point” and has started to integrate it into its search engine; before that, the average Web user wouldn’t have seen it make a difference. Only now, one researcher told BBC News, is the technology being put in place to fulfill the vision of a semantic Web; before now, proposing a fully semantic Web was like “trying to build a jet plane when the world only had the technology for bicycles.”
But others think researchers are way ahead of themselves, and a semantic Web is out of reach for many years – maybe always – because of the incredible complexity of building an index that would have to parse and present enormous amounts of information in the context of language.
Even simpler versions of semantic search are one area where – surprisingly – Google might be at a disadvantage. Google has built Web search on pure indexing, without semantic metadata. They are trying new things to include metadata and to work with Web histories for users and other methods to make search results more relevant. But if other search companies make big strides in semantic search, can Google catch up?
Some people are already predicting Google will be left in the “semantic” dust if they don’t hurry up. I’ll wait a few years and see, but I think it will be more than a few years before the promises of semantic search actually pan out.