Search 3.0

by coelomic


“How could the world beat a path to your door when the path was uncharted, uncatalogued, and could be discovered only serendipitously?” — Paul Gilster, Digital Literacy

Nobody can deny the importance of search in improving the usability of the internet. Apart from hunting down information, search is an integral part of how we explore the internet; it is, in effect, the gateway. Millions of us use search engines to “enter” the internet. Typing in a few terms, we sit back and click away at the flat list of links served to us. In this column I rant about search and the internet in general.

We have all used Google, Yahoo or one of the other popular search sites. For many people, using search engines has become routine. Not bad for a technology that’s not even 20 years old. But how did search engines come into being? What are the origins of this entity that prowls the outer reaches of cyberspace? The history of internet search is very interesting.

In 1998 the last of the current search superpowers, and the most powerful to date, Google, was launched. It chose to rank pages using the concept of implied value from inbound links. This makes the web somewhat democratic, as each outgoing link is a vote. Google has become so popular that major portals such as AOL and Yahoo have used it, allowing that search technology to own the lion’s share of web searches. MSN Search was also launched in 1998, as were the Open Directory and Direct Hit.

That history is a study in the quest for greater granularity in finding information, more and more of which has become virtual as we “go digital” in all walks of life.

Find:

Though search is getting better day by day, trying to find the exact information that you want can still be deeply frustrating. Remember, “search” and “find” are two different concepts. I quote from this register.co.uk article:

Ten years ago the internet – one computer network amongst many – was brought to our attention with the promise that it would give us unlimited access to “all the world’s information.” The phrase still pops up when people refer to the internet in the public prints. We’re awash with information, but more hasn’t proved to be better. What we have is a typical tragedy of the commons, a space that more closely resembles a toxic wasteland. The promise hasn’t been fulfilled. Canonical databases and archives cost money; copyright is a fact of life, and clever licensing workarounds don’t address the underlying economic issues. Information costs money and most rights holders like to be paid. Lazy governments have cynically taken advantage of this. Technologists only see more technology as the answer, and they’ve sold the idea to politicians. In the United Kingdom, the administration has presided over the slow strangulation of the public library service, and now simply points parents and schools to the internet. Buy a PC and broadband, and you’ll have everything you want: and if the garbage flies at you at 500 times the speed it did on dial-up, then you’re experiencing the thrill of truly living in the “information age”!

All the world’s info:

Search engines try to give you the exact information that you need. To do that they have to build indexes that are scalable and always relevant, and that encompass every kind of “information”. Hence the stated purpose of companies such as Google is “to organise all the world’s information”. A lofty goal indeed. Organising all the world’s information means that all human knowledge has to be indexed first. Let us take a look at this concept and its inherent drawbacks.

It is evident to any one who takes a survey of the objects of human knowledge, that they are either ideas actually imprinted on the senses; or else such as are perceived by attending to the passions and operations of the mind; or lastly, ideas formed by help of memory and imagination – either compounding, dividing, or barely representing those originally perceived in the aforesaid ways. – By sight I have the ideas of light and colours, with their several degrees and variations. By touch I perceive hard and soft, heat and cold, motion and resistance, and of all these more and less either as to quantity or degree. Smelling furnishes me with odours; the palate with tastes; and hearing conveys sounds to the mind in all their variety of tone and composition. – And as several of these are observed to accompany each other, they come to be marked by one name, and so to be reputed as one THING. Thus, for example, a certain colour, taste, smell, figure and consistence having been observed to go together, are accounted one distinct thing, signified by the name apple; other collections of ideas constitute a stone, a tree, a book, and the like sensible things – which as they are pleasing or disagreeable excite the passions of love, hatred, joy, grief, and so forth. – Of the Principles of Human Knowledge (1710), Bishop George Berkeley

It is plainly evident that a huge portion of the human knowledge that exists today is of a metaphysical nature and not indexable. Even where it is indexable, I wonder how much of it is actually written down so that it can be indexed. That is problem number one. By nature, the world’s information is not text based. It lies in sounds, sights, and experiences. But hey, today we have search engines that index sound as audio files and visual elements as video. Then again, that is not exactly information, is it, unless all such audio and video were “howtos”!

Despite these drawbacks, the web is so much more usable these days thanks to companies like Google and Yahoo. Still, there is much wanting when you are on the quest for a particular piece of information. Below are some of the specific problems that I came across while using search engines.

Too much info:

The web started off as a network of interconnected computers. In its early days search was based on indexing everything in sight, a slow, painstaking process, until Google changed all that. Unfortunately, although each query now returns more relevant results, the user still has to come up with the perfect search query to find the right information! It is still up to the user to create search queries that will yield the best results.

The perfect page is out there somewhere. It’s the page that has exactly the information you’re looking for and to you it’s beautiful and unattainable like a faraway star. If only you had a super-sized net for capturing it!

Most people use a search engine by simply typing a few words into the query box and then scrolling through whatever comes up. Sometimes their choice of words narrows the search unduly, so they do not find what they’re looking for. More often the end result of the search is a haystack of off-target web pages that must be combed through. How often have you started a search on, say, “How a radio works” and had to wade through blogs of individuals who made a DIY radio, or sites that sold content or existed purely for advertising? The biggest problem people have with search engines (perhaps) is that they’re so good! You can type in a word and within a fraction of a second you’ll have 20,000 pages to look at. Most of those pages will not be exactly what you’re after, and you have to spend a load of time wading through the 19,993 that aren’t quite right.

I believe that the solution to the problem lies in categorisation of search results. Yes, categorisation! Rather than being faced with a flat list of links that Google thinks are relevant, what is needed is a set of categories into which the results of that particular query fit. For example, if I typed in the query “use radio”, the results may be presented as “buy radio”, “radio usage”, “radio DIY”, etc., so that at a glance I can weed out the categories that do not interest me and then delve deeper into the one with more relevant content. That way I can discard a lot of sites that contain the keyword but have uninteresting content. It would be up to the system to categorise the results, and not up to me to slice, dice and refine my keywords to home in on the information that I want. A more visual approach would relieve me of that thinking.

Some intriguing technologies are getting better at bringing order to all that chaos, and could revolutionize how people mine the Internet for information. Software now exists that analyzes search results and automatically sorts them into categories that, at a glance, present far more information than the typical textual list. A similar process powers Grokker, a downloadable program that not only sorts search results into categories but also “maps” the results in a holistic way, showing each category as a colorful circle. Within each circle, subcategories appear as more circles that can be clicked on and zoomed in on.
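
To make the categorisation idea concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: the category names, the keyword sets and the sample result titles are all made up, and a real clustering engine would derive its categories from the results themselves rather than from a hard-coded table.

```python
from collections import defaultdict

# Hypothetical category vocabulary (an assumption for illustration only).
CATEGORY_KEYWORDS = {
    "buy radio": {"buy", "price", "shop", "sale"},
    "radio DIY": {"diy", "build", "kit", "solder"},
    "radio usage": {"use", "tune", "listen", "guide"},
}

def categorise(results):
    """Group result titles under the category whose keywords they match most."""
    groups = defaultdict(list)
    for title in results:
        words = set(title.lower().split())
        # Count how many keywords of each category appear in the title.
        scores = {name: len(words & kws) for name, kws in CATEGORY_KEYWORDS.items()}
        best = max(scores, key=scores.get)
        # Titles that match no category at all fall into "other".
        groups[best if scores[best] > 0 else "other"].append(title)
    return dict(groups)

# Made-up result titles for the query "use radio".
results = [
    "Buy a portable radio at a low price",
    "Build your own crystal radio kit",
    "How to tune and use a shortwave radio",
    "History of broadcasting",
]
grouped = categorise(results)
for category, titles in grouped.items():
    print(category, "->", titles)
```

The point of the sketch is the shape of the output: the reader sees a handful of labelled groups first and opens only the group that matters, instead of scanning one flat list.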

Informed search:

The way search engines are constructed now means that you need a certain amount of search savvy, if I may use the word, to get results from the system. What I mean is that anyone can type words into Google and start a search, but to refine the results further one needs in-depth knowledge of constructing search queries. That is asking a lot of your core user group. Search engines have to be more accessible to people with less than stellar keyword construction abilities. A more democratic search engine is the way forward, one that serves the uninformed tramp as well as the university degree holder. Search engines have a social responsibility: more and more content is on the web these days, and search results are equally valuable to us all, irrespective of our backgrounds.

Second generation of search: via this fantastic website

It includes a group of search services that make use of technology that organizes search results by peer ranking, or clusters results by concept, site or domain. This is in contrast to the more long-standing method of term relevancy ranking. This newer type of ranking usually works in addition to term ranking and looks at “off the page” information to determine the retrieval and order of your search results. Search engines that employ this alternative may be thought of as second generation search services. For example:

* Google ranks by the number of links from the highest number of pages ranked high by the service
* Teoma ranks by the number of linking pages on the same subject as your search
* Vivisimo organizes results by keyword and/or concept
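
The idea of ranking by links from pages that are themselves ranked high can be sketched in a few lines of Python. This is a simplified power-iteration sketch in the spirit of link-based ranking, not Google's actual algorithm; the function name `link_rank`, the `damping` value and the tiny four-page web are assumptions chosen for illustration.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by inbound links, weighting each vote by the voter's own score.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical four-page web: three pages link to "hub", which links to "a".
web = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "hub": ["a"],
}
scores = link_rank(web)
top = max(scores, key=scores.get)
print(top, sorted(scores.items()))
```

In this toy web, "a" has only one inbound link, but because that link comes from the highly ranked "hub" it ends up scoring above "b" and "c", which nobody links to. That is the sense in which a link is a weighted vote.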

* The human element: concept processing. Second generation services such as Ask Jeeves and SurfWax apply different kinds of concept processing to a search statement to determine the probable intent of a search. This is often accomplished by the use of human generated indexes. With these services, the burden of coming up with precise or extensive terminology is shifted from the user to the engine. These services are therefore taking on the role of thesauri.
* The human element: “horizontal” presentation of results. Most search tools return results in one long, vertical list. In contrast to this, there is a growing group of search tools that use concept processing to return results in a horizontal organization. With these tools, you can first review concept categories retrieved by your search before examining the results within particular categories. This can make it easier to zero in on the aspects of your topic that interest you. Examples of these tools include All 4 One Metasearch, Clusty and Exalead.
* The human element: peer ranking. Search services such as Google and Teoma derive their results from the behavior and judgment of millions of Web developers.
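
The thesaurus role described under concept processing can be illustrated with a small Python sketch. Everything here is hypothetical: the `THESAURUS` table stands in for a human-generated index, and `expand_query` and `matches` are names invented for this example, not the API of any of the services mentioned.

```python
# Hypothetical hand-built thesaurus, standing in for a human-generated index.
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "repair": {"fix", "mend"},
}

def expand_query(query):
    """Return the query terms, each followed by any synonyms the thesaurus knows."""
    expanded = []
    for term in query.lower().split():
        expanded.append(term)
        expanded.extend(sorted(THESAURUS.get(term, ())))
    return expanded

def matches(document, query):
    """True if every query term, or one of its synonyms, occurs in the document."""
    words = set(document.lower().split())
    return all(
        bool(words & ({term} | THESAURUS.get(term, set())))
        for term in query.lower().split()
    )

print(expand_query("car repair"))
print(matches("How to fix an automobile engine", "car repair"))
print(matches("Bicycle maintenance tips", "car repair"))
```

The user types two plain words, and the engine does the work of knowing that "automobile" and "fix" mean the same thing, so the burden of precise terminology moves from the user to the service.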


The future:

It is interesting to note that certain forms of search are still not widely adopted by search engines. I would like, for example, a way to scan the bar codes of all the books that I own so that Google can provide me with their book search results and I can easily search through my own books. There should be no copyright trouble, as I did pay for the books. An interesting thought, isn’t it? It would vastly increase my productivity.

The other craze on the internet now is “interesting links”. More people are turning to the web as a source of entertainment, and part of the trend is to search for “cool links”. Unfortunately, typing “cool links” into Google does not give me what I mean. A whole array of services has sprung up based on the idea. Look at “Del.icio.us”, “Digg” and the like. It would be so easy for Google, Yahoo and the like to come up with a page of interesting links gleaned from their daily, monthly and annual searches.

What is markedly absent from my favourite search engine, Google, is the human element. They are not big fans of human involvement in providing search results. But isn’t a little bias a good thing when it comes to certain specific search results, say product recommendations? They may be able to add it to their Froogle search results. Yahoo has the right mix, investing in up-and-coming web communities such as Flickr and Del.icio.us. I would love Google to incorporate tag-based search, or even to refine the concept in some way.

New tech:

“The aim of the Semantic Web efforts is to be able to find and access Web sites and Web resources not by keywords, as Google does today, but by descriptions of their contents and capabilities,” says Jerry Hobbs, a computer scientist at the University of Southern California.

Right now, this kind of search capability is impossible because Web search engines require that users guess the right keywords to find what they seek. However, several maturing technologies are considered the most likely keys to fulfilling the goals of the Semantic Web project. These technologies, already tried and tested in research labs, will help make the Semantic Web a reality:

The Web Ontology Language (OWL), developed on top of XML, will help search engines discern whether two Web sites have the same content even if they are described using different terminology or metalanguage.
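
As a rough illustration of what “same content, different terminology” means in practice, here is a small Python sketch. It is plain Python rather than OWL: the `EQUIVALENT_TERMS` table only plays the role that equivalence statements in an ontology would play, and the site descriptions are invented for the example.

```python
# Hypothetical equivalence declarations: each set lists terms that are
# declared to mean the same thing, as an ontology might state.
EQUIVALENT_TERMS = [
    {"automobile", "car", "motorcar"},
    {"price", "cost"},
]

def canonical(term):
    """Map a term to one representative of its equivalence group."""
    for group in EQUIVALENT_TERMS:
        if term in group:
            # Use the alphabetically first member as the canonical name.
            return min(group)
    return term

def same_content(description_a, description_b):
    """Compare two term sets after replacing every term by its canonical form."""
    return {canonical(t) for t in description_a} == {canonical(t) for t in description_b}

# Made-up descriptions of three sites.
site_a = {"car", "price"}
site_b = {"automobile", "cost"}
site_c = {"bicycle", "price"}

print(same_content(site_a, site_b))
print(same_content(site_a, site_c))
```

A keyword match would treat the first two sites as unrelated because they share no words; once the declared equivalences are applied, they describe the same thing.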

These are exciting times for search. The possibilities are endless.