Simplifying Concepts

Essence of Life is in Simplicity

Difference between Text Mining and Data Mining

Text Mining: It is the process of deriving high-quality information from plain text.
Data Mining: It involves discovering interesting patterns in large data sets.

Text Mining: It is a subset of data mining.
Data Mining: It is a superset of text mining.

Text Mining: It supports mining of text only.
Data Mining: It supports mining of mixed data.

Text Mining: It is concerned with the organisation and retrieval of information from a large number of text documents.
Data Mining: It draws on important aspects of artificial intelligence and machine learning.

Text Mining: It involves mining unstructured data such as documents.
Data Mining: It involves mining structured data.

Text Mining: Only a single column of text can be mined at one time.
Data Mining: It supports mining of more than one text column at once.

Text Mining: The patterns are extracted from natural-language text.
Data Mining: The patterns are extracted from structured databases.

Text Mining: Approaches of text mining:
a. Simple keyword-based approach
b. Tagging approach
c. Information extraction approach
Data Mining: Approaches of data mining are based on:
a. Kind of database mined
b. Kind of knowledge mined
c. Kind of techniques used
d. Application adopted

Text Mining: Example: Companies can use text mining to find the overall trend in bug reports for their software product, or in customer complaints, to understand the major issues their customers face.
Data Mining: Example: Companies can use data mining to identify which clusters their customers belong to, or to classify new customers into a particular group, and plan product development accordingly.
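
The simple keyword-based approach and the bug-report example above can be illustrated with a short Python sketch. The bug reports, the keyword list, and the function name `keyword_trends` are all hypothetical, made up for illustration; a real system would read reports from an issue tracker and would likely need stemming and a much larger vocabulary.

```python
import re
from collections import Counter

# Hypothetical bug reports; real input would come from an issue tracker.
BUG_REPORTS = [
    "App crashes on login when password is empty",
    "Login page is slow to load",
    "Crash when uploading a large file",
    "Payment fails after login timeout",
]

# Hypothetical keyword list: each surface word maps to an issue label.
KEYWORDS = {
    "crash": "crash",
    "crashes": "crash",
    "login": "login",
    "slow": "performance",
    "timeout": "performance",
    "payment": "payment",
}

def keyword_trends(reports, keywords):
    """Count how many reports mention each issue label at least once."""
    counts = Counter()
    for report in reports:
        tokens = re.findall(r"[a-z]+", report.lower())
        # A set ensures a label is counted once per report.
        labels = {keywords[t] for t in tokens if t in keywords}
        counts.update(labels)
    return counts

trends = keyword_trends(BUG_REPORTS, KEYWORDS)
for label, n in trends.most_common():
    print(label, n)
```

Here the input is unstructured text and the pattern (which issues dominate) is extracted by matching keywords, which is the simplest of the three text mining approaches listed.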


Note: The above differences have been derived through a proper understanding. So please share the link of this webpage as “sharing is a way of spreading knowledge”. But, please do not copy & paste it in other Website or Forums.

Difference between Classification and Clustering

Classification: It involves the task of assigning instances/items/points to pre-defined classes.
Clustering: It involves the task of grouping related points together without labelling them.

Classification: Labelling is an a priori activity.
Clustering: Labelling the groups of points is an a posteriori activity.

Classification: We have a training set containing data that has been previously categorised.
Clustering: We do not know the characteristics of similarity in the data in advance.

Classification: The classification algorithm requires training data.
Clustering: The clustering algorithm does not require training data.

Classification: Based on the training data, the algorithm finds the category that a new data point belongs to.
Clustering: It uses statistical concepts to split a dataset into subsets such that each subset contains similar data.

Classification: There is a response (decision) variable.
Clustering: There is no response variable.

Classification: Since a training set exists, it is described as supervised learning.
Clustering: Since no training set is used, it is described as unsupervised learning.

Classification: Example 1: An insurance company trying to assign customers to high-risk and low-risk categories.
Clustering: Example 1: An online shopping mart recommending books based on other customers who bought similar books in the past.

Classification: Example 2: Deciding whether a particular patient record can be associated with a specific disease.
Clustering: Example 2: Grouping patient records with similar symptoms without knowing what the symptoms indicate.
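
The contrast between the two tasks can be sketched in plain Python. This is only an illustration under stated assumptions: the classifier is a nearest-centroid rule and the clustering is a basic k-means with a naive initialisation, both chosen here for brevity rather than taken from the text above. The data points and the labels "low risk" and "high risk" are invented.

```python
import math

def centroid(points):
    """Mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def nearest(point, centres):
    """Index of the centre closest to point (Euclidean distance)."""
    return min(range(len(centres)), key=lambda i: math.dist(point, centres[i]))

# Classification: class labels are known in advance (training set).
def train_classifier(labelled):
    """labelled maps a class name to its training points."""
    return {label: centroid(pts) for label, pts in labelled.items()}

def classify(model, point):
    """Assign point to the class whose centroid is nearest."""
    labels = list(model)
    return labels[nearest(point, [model[name] for name in labels])]

# Clustering: no labels; groups are discovered from the data alone.
def k_means(points, k, iterations=10):
    """Return a cluster index for every point."""
    centres = list(points[:k])  # naive initialisation: first k points
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centres)].append(p)
        # Keep the old centre if a group ends up empty.
        centres = [centroid(g) if g else c for g, c in zip(groups, centres)]
    return [nearest(p, centres) for p in points]

# Hypothetical training data with a response variable (the class name).
training = {
    "low risk": [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)],
    "high risk": [(8.0, 8.0), (8.5, 9.0), (9.0, 8.5)],
}
model = train_classifier(training)
predicted = classify(model, (7.5, 8.0))

# The same points without labels: clustering returns only group indices.
unlabelled = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0),
              (8.5, 9.0), (2.0, 1.5), (9.0, 8.5)]
assignments = k_means(unlabelled, k=2)

print(predicted)
print(assignments)
```

The classifier returns a meaningful class name because the names were supplied up front. The clustering step returns only anonymous group indices, and deciding what each group means is left as an after-the-fact activity.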



Difference between Periodic and Incremental Crawler

Periodic Crawler: It visits the web until the desired number of pages is in the collection, and stops once the collection reaches its target size.
Incremental Crawler: It keeps visiting pages even after the collection reaches its target size, to incrementally update/refresh the local collection.

Periodic Crawler: It operates in batch mode.
Incremental Crawler: It operates in steady mode.

Periodic Crawler: It runs periodically and updates all the pages in each crawl.
Incremental Crawler: It runs continuously without pause and refreshes the local collection as it goes.

Periodic Crawler: It builds a brand-new collection and replaces the old collection with the new one.
Incremental Crawler: It refreshes existing pages in the collection and replaces less important pages with new, more important ones.

Periodic Crawler: It indexes/collects pages at long intervals, usually a week or a month.
Incremental Crawler: It indexes/collects pages in a timely fashion, say daily.

Periodic Crawler: It is easy to implement.
Incremental Crawler: It is relatively difficult to implement.

Periodic Crawler: It is less effective and appears to be less intelligent.
Incremental Crawler: It is more effective; the Google search engine crawler is an example.

Periodic Crawler: It has a fixed frequency.
Incremental Crawler: It has a variable frequency.

Periodic Crawler: Since the entire collection needs to be replaced with a new one, it imposes a heavy overhead/load on the network/server.
Incremental Crawler: Since only a part of the collection is replaced, it imposes less overhead/load on the network/server.

Periodic Crawler: The freshness of the collection is not stable.
Incremental Crawler: The freshness of the collection is stable.

Periodic Crawler: It can index a new page only after the next crawling cycle starts.
Incremental Crawler: It can index a new page right after it is found by or submitted to the search engine.
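
The difference in behaviour can be made concrete with a toy Python simulation. Everything here is an assumption made for illustration: the "web" is a dictionary from URL to version number, freshness is taken to be the fraction of stored pages that match the live version, and the function names are hypothetical. The sketch also simplifies the incremental crawler by adding new pages rather than evicting less important ones.

```python
def periodic_crawl(web, target_size):
    """Build a brand-new collection, stopping at target_size pages."""
    collection = {}
    for url, version in web.items():
        if len(collection) >= target_size:
            break
        collection[url] = version
    return collection

def incremental_step(web, collection, urls_to_refresh):
    """Refresh or add only the given pages in place; return pages fetched."""
    fetched = 0
    for url in urls_to_refresh:
        if url in web:
            collection[url] = web[url]
            fetched += 1
    return fetched

def freshness(web, collection):
    """Fraction of stored pages whose copy matches the live version."""
    if not collection:
        return 0.0
    fresh = sum(1 for url, v in collection.items() if web.get(url) == v)
    return fresh / len(collection)

# Toy web: URL -> version number (hypothetical data).
web = {"a": 1, "b": 1, "c": 1, "d": 1}

periodic = periodic_crawl(web, target_size=4)
incremental = dict(periodic)  # both crawlers start from the same snapshot

# The web changes between crawl cycles: one page is edited, one is new.
web["a"] = 2
web["e"] = 1

# The periodic collection stays stale until its next full cycle.
periodic_before = freshness(web, periodic)

# The incremental crawler fetches only the changed and the new page.
fetched = incremental_step(web, incremental, ["a", "e"])
incremental_after = freshness(web, incremental)

# The next periodic cycle rebuilds everything from scratch.
rebuilt = periodic_crawl(web, target_size=5)

print(periodic_before, fetched, incremental_after, len(rebuilt))
```

In this run the incremental crawler fetches two pages to become fully fresh and already holds the new page, while the periodic crawler must fetch all five pages in its next cycle before the new page appears in its collection.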


For further understanding and conceptual clarity on the topic, you can refer to:

http://ilpubs.stanford.edu:8090/376/1/1999-22.pdf : The paper at this link also includes a diagram of crawler freshness, which can serve as a further point of comparison between periodic and incremental crawlers.


Note: The above differences have been derived from the paper linked above.
