HITS Algorithm

  • HITS stands for Hyperlink-Induced Topic Search.
  • Given a search query, the HITS algorithm classifies the relevant web pages as authorities and hubs.
  • It is query dependent: the weights are computed on the set of pages related to the query, not once for the whole web.
  • It has been implemented by the Ask.com search engine.

The reason and concept behind the development of the HITS algorithm:

  • Before the HITS algorithm, ranking was purely text based.
  • For a given search query, pages were matched on keywords.
  • The documents with the most occurrences of the keyword appeared at the top of the results.
  • This often returned irrelevant pages.
  • It also could not handle synonyms.
  • For instance, a user may type “automobile” when they mean a car or vehicle.
  • Finally, a web page that merely repeated the word “automobile” a billion times would be returned as the first result, which shows how easily such a ranking could be fooled.
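
The weakness described above can be seen in a minimal sketch of keyword-count ranking. The function name `keyword_score` and the three sample documents are made up for illustration; this is not the code of any real search engine.

```python
from collections import Counter


def keyword_score(document: str, query: str) -> int:
    """Count how often the query terms occur in the document (case-insensitive)."""
    words = Counter(document.lower().split())
    return sum(words[term] for term in query.lower().split())


# Hypothetical pages: a genuine carmaker, a spam page and a review page.
documents = {
    "carmaker": "we build the car you want to drive every day",
    "spam": "automobile automobile automobile automobile automobile",
    "review": "an automobile review comparing each automobile maker",
}

# Rank the pages by raw keyword count, highest first.
ranking = sorted(
    documents,
    key=lambda name: keyword_score(documents[name], "automobile"),
    reverse=True,
)
print(ranking)  # ['spam', 'review', 'carmaker']
```

The spam page wins because it repeats the keyword most often, while the carmaker page scores zero because it only says “car”.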

To tackle these problems, the HITS algorithm was introduced. It uses the link structure of the web to discover and rank relevant pages. To understand this, let’s take an example:

  • Suppose a user wants to search for the top automobile manufacturers of the last 2 years.
  • The user probably expects a list of top car brands as the result.
  • From the user’s perspective an automobile is a car, but for the computer “automobile” is just the string “automobile”.
  • So a semantic mapping is required (automobile = car).
  • Even that is not enough, since the search is still text based and a car manufacturer may not use the word “automobile” in its description at all.
  • So a different ranking mechanism is needed.
  • This is why the concepts of hub and authority, which form the basis of HITS, were introduced.

Concepts

  • Page ‘i’ is called an authority for the query if it contains valuable information on the subject.
  • Official websites of carmakers can be considered AUTHORITATIVE.
  • These are the pages that are truly relevant to the query.
  • There is a SECOND category of pages that help in finding the AUTHORITATIVE pages; these are called HUBS.
  • A HUB’s role is to advertise the AUTHORITATIVE pages.
  • They point the search engine in the right direction.
  • HUBS can even be blogs describing automobiles.

The HITS algorithm identifies good authorities and hubs by assigning two weights to each page:

    a. Authority weight/ranking:

    • Pages with many in-links (many links pointing to that URL) get a high authority weight.
    • So do pages that are pointed to by pages with a HIGH HUB WEIGHT.

    b. Hub weight/ranking:

    • Pages with many out-links (pages that point or refer to other sites) get a high hub weight.
    • These pages serve as organisers of information on a topic.
    • Pages that point to pages with a HIGH AUTHORITY WEIGHT have a HIGH HUB WEIGHT.

AUTHORITIES and HUBS have a mutually reinforcing relationship: a good hub points to many good authorities, and a good authority is pointed to by many good hubs.
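
The mutual reinforcement can be sketched as a simple iteration: each page's authority weight becomes the sum of the hub weights of the pages linking to it, and each page's hub weight becomes the sum of the authority weights of the pages it links to. The code below is a minimal sketch under assumptions not stated in the post: a fixed number of iterations, normalisation by Euclidean length after every round, and a made-up four-page link graph. The names `hits`, `normalize` and `links` are illustrative.

```python
import math


def normalize(weights):
    """Scale the weights so that their squares sum to 1 (no-op if all are zero)."""
    norm = math.sqrt(sum(v * v for v in weights.values())) or 1.0
    return {page: v / norm for page, v in weights.items()}


def hits(links, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: add up the hub weights of all pages linking in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub update: add up the (new) authority weights of all pages linked to.
        new_hub = {p: sum(new_auth[d] for d in links.get(p, [])) for p in pages}
        auth, hub = normalize(new_auth), normalize(new_hub)

    return auth, hub


# Hypothetical graph: two blogs (hubs) pointing at two carmaker sites (authorities).
links = {
    "blog_a": ["carmaker_x", "carmaker_y"],
    "blog_b": ["carmaker_x"],
    "carmaker_x": [],
    "carmaker_y": [],
}

auth, hub = hits(links)
print(max(auth, key=auth.get))  # carmaker_x
print(max(hub, key=hub.get))    # blog_a
```

In this toy graph `carmaker_x` ends up as the top authority because both blogs point to it, and `blog_a` ends up as the top hub because it points to both carmaker sites.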

That’s it for this post. More on the working of the HITS algorithm and its advantages and disadvantages will be covered in my next post.

