Simplifying Concepts

Essence of Life is in Simplicity

Apriori Algorithm

Apriori Algorithm

  • It is a scalable mining method that is it can be applied on large number of transactions.
  • It is mining algorithm designed to extract frequent itemsets.
  • The items that occur frequently in the transactions are knows as frequent items.
  • Apriori algorithm uses the prior knowledge of frequent itemset properties, hence apriori.

Certain terminologies

  • Ck: Candidate itemset of size k
  • Lk: Frequent itemset of size k
  • Transaction-Id: basically is reference to a transaction, like say receipt number of customer that would be stored in a database on buying certain items from shopping mart or mall.
  • Items: These are the commodities that are purchased like say bread, milk. In below example I1, I2… represent items.
  • Support:It states the minimum number of count of particular item or itemset that is permitted.

I would be soon explaining what Ck and Lk are once I give out the steps of the algorithm.For proper understanding we would directly dive into an example.In this case we are considering support to be 2(Support=2)

Table-1

Transaction-Id Items
T100 I1,I2,I5
T200 I2,I4
T300 I2,I3
T400 I1,I2,I4
T500 I1,I3
T600 I2,I3
T700 I1,I3
T800 I1,I2,I3,I5
T900 I1,I2,I3

(Don’t worry if you are not able to decipher these steps )

General Apriori Algorithm Steps

  • Step1: Find frequent itemset Lk+1.
  • Step2: This step is called join step. In this Ck is generated by performing join operation. A kind of cross multiply operation you may have seen in SQl. That is (Lk+1* Lk+1). No worries if you have not understood this would be clear once you start seeing the example.
  • Step3: Pruning step, in this we remove certain items based on their count.
  • Step4: A frequent itemset Lk has been achieved.

Now, moving back to our example, in this T100 represents transaction number.Initially the algorithm will scan the database and prepare a candidate set-1 (C1) table where k=1. This table basically contains the item count for each item. From Table-1 it is clear that there are five items namely I1, I2, I3, I4 and I5.

C1

Item Count
I1 6
I2 7
I3 6
I4 2
I5 2

In C1 basically we have count of the items that are present in all of the transactions. From this C1, L1 is computed or generated.

L1

Item Count
I1 6
I2 7
I3 6
I4 2
I5 2

In case if the C1 (candidate itemset1) had any item less than support count(less than 2) that would have been removed or pruned. Say if I5 were 1 it would be not there in the L1 table.

Now moving ahead C2 is generated from L1. C2 is obtained by performing join operation on L1. So the candidate set formed is candidate set of order which contains two items. In this a pair can appear only once for instance I1, I2 is considered same as I2, I1.

C2

Item Count
I1,I2 4
I1,I3 4
I1,I4 1
I1,I5 2
I2,I3 4
I2,I4 2
I2,I5 2
I3,I4 0
I3,I5 1
I4,I5 0

Now pruning is performed to obtain L2 from C2. Thus L2 contains frequent items with count >= 2.

L2

Item Count
I1,I2 4
I1,I3 4
I1,I5 2
I2,I3 4
I2,I4 2
I2,I5 2

The count is obtained from Table-1 that is how many times (I1, I2) have occurred together. So it is clear that {I4, I5}, {I1, I4}, {I3, I4}, {I3, I5} have count less than 2 so they are not placed in L2 table above.Now generate C3 that is candidate set of size 3. That is each itemset can now contain 3 items.

C3

Itemset Count
I1,I2,I3 2
I1,I2,I5 2
I1,I2,I4 1
I1,I3,I5 0
I2,I3,I4 1
I2,I4,I5 0
I1,I3,I4 0

C3 is built by join operation on L2 that is (L2*L2). So in this itemset of 3 is formed by joining (I1, I2) and (I1, I3) so itemset created would contain (I1, I2,I3). In other words in case if there is one common item we remove and if say we had (I1, I2) and (I3, I4) then it would not be put into C3 that is it would have been ignored. Similarly (I1, I2) and (I1, I5) are joined to form (I1, I2, I5) and its count is noted from Table-1.(I1, I2) and (I1, I3) are joined to form (I1, I2, I3) so its count is 2. In the similar fashion whole C3 is built.

Now L3 is built by pruning:
L3

Itemset Count
I1,I2,I3 2
I1,I2,I5 2

Proceeding further C4 is built from L3 through join operation. That is L3*L3. Since it is C4 it should contain four items in itemset.

C4

Itemset Count
I1,I2,I3,I5 1

Now L4 will have no records or itemset because count of (I1, I2, I3, I5) is 1. L4 is said to be null.The frequent itemset by definition is Lk-1 when Lk becomes null.So L4 is null the frequent itemset by definition is L3

So in other words itemset {I1, I2, I3} and (I1, I2, I5) are the most frequently brought items.This was the glimpse of Apriori algorithm. In the next post I would be giving insights on Pseudo code of Apriori algorithm and Generation of Association rules.”Click here”


Note: Hey guys please do share this knowledge because “Knowledge Grows on Sharing” but do not copy & paste to other forums or websites. If you like this do give a facebook share and fb like. Stay tuned !!!

HITS Algorithm

HITS Algorithm

  • HITS Algorithm stands for Hyperlink Induced Topic Search Algorithm.
  • HITS Algorithm classifies relevant web pages as authorities and hubs given a certain search query.
  • It is query independent.
  • It has been implemented by ASK.COM search engine.

Reason and concept behind the development of HITS Algorithm was:

  • Prior to HITS Algorithm there was a text based ranking system.
  • So, for a given query (search), keyword matches were done.
  • And the document with most occurrences of keyword appeared in the result.
  • This was ridiculous and often returned irrelevant data.
  • Also it lacked synonymic capacity.
  • For instance, User may refer “automobile” for a car or vehicle.
  • Finally, it would also return WebPages with just word “automobile” billion times as the first result. So we can imagine how ridiculous it was?

So in order to tackle these anomalies HITS Algorithm was introduced. It used the link structure of web in order to discover and rank relevant pages. To understand this lets take an example:

  • Consider an example that user wants to search for top automobile manufacturers in last 2 years.
  • The user may be expecting the list of top car brands as the result of search.
  • However from the perspective of user an automobile is car, but for computer automobile is just an automobile.
  • There needs to be mapping required semantically (automobile = car).
  • However again, it is useless since searching remains stills text based and further car manufacturer may not use automobile in their description.
  • So there needs to be different mechanism for ranking.
  • So the concept of Hub and Authority was introduced which forms the basis of HITS.

Concepts

  • Page ‘i’ is called an authority for the query if it contains valuable information on subject.
  • Official websites of carmakers can be considered AUTHORATIVE.
  • These are the ones that are truly relevant to the query.
  • There is SECOND category of pages relevant in process of finding the AUTHORATIVE pages, these are called HUBS.
  • HUB’s role is to advertise the AUTHORATIVE pages.
  • They point the search engine in right direction.
  • HUBS even can be blogs describing the automobiles.

HITS Algorithm identifies good Authorities and Hubs by assigning two weights to pages namely:

    a.Authority weight/ranking:

    • These are pages with many in-links (many links pointing to particular URL).
    • Also they are the pages pointed by pages with HIGH HUB WEIGHT.

    b.Hub weight/ranking:

    • These are the pages with many out-links. (Pointing or referrers to other sites).
    • These pages serve as organiser of information/topic.
    • Also pages that point to pages with HIGH AUTHORITY have HIGH HUB WEIGHT.

AUTHORATIVE and HUBS have mutual reinforcement relationship.

That’s it for this post more information regarding working of the HITS Algorithm and its Advantages & Disadvantages would be covered in my next post. “Click Here”.


Note: Hey guys please do share this knowledge because “Knowledge Grows on Sharing” but do not copy & paste to other forums or websites. If you like this do give a facebook share and fb like. Stay tuned !!!

Load and Stress Testing

Difference between Load and Stress Testing?
Compare Load and Stress Testing?

Differentiate/Compare/ Difference between Load and Stress Testing.

Load Testing

Stress Testing

Suppose, if an application is built for 500 users, then in load testing we check for up to 500 users.

Suppose, if an application is built for 500 users, then in stress testing we check for greater than 500 users

It helps us to study the behaviour of application under loads.

It helps us to study the breaking point of the system/ server.

It determines the reliability of the application.

It determines the stability of the application.

The parameter under focus is the response time.

The parameter under focus is the throughput.

Load testing is subset of Stress testing.

Stress testing is superset of Load testing.

Load testing is done to ensure that the system meets the specified load. (I.e. whether system has load bearing capacity.)

Stress testing ensures whether the application meets the unexpected loads.

Testing the app with maximum numbers of users and input is load testing.

Testing the app with more than maximum number of users & inputs is stress testing.

Load tests are more formal and usually predefined by client and domain experts.

It is informal and is needed to overcome & understand erroneous situations possible.

It is called operational capacity testing.

It is also sometimes called torture testing.

Load testing could be considered like a non destructive test.

Stress testing is considered like a destructive test.

It involves maximal utilization of resources.

It involves denial or unavailability of resources.


Note: The above differences have been derived through a proper understanding. So please share the link of this webpage as “sharing is a way of spreading knowledge”. But, please do not copy & paste it in other Website or Forums.

Hey guys we are eager to hear from you, about your views and your thoughts. Let us know how much you like the differences mentioned above. Please give your comments and if possible share because “Sharing is Caring”.

Static and Dynamic Testing

Difference between Static and Dynamic Testing?
Compare Static and Dynamic Testing?

Differentiate/Compare/ Difference between Static and Dynamic Testing.

Static Testing

Dynamic Testing

It is testing without executing the software.

It is testing that involves the execution of software.

It can start early in the software lifecycle.

It can start after the development of software component.

Software is examined manually and some static analysis tools are used.

In this by giving inputs, its outputs are examined & compared with what is expected.

Review, inspection and meetings come under static testing.

Unit, Integration, System & acceptance testing come under dynamic testing.

It involves finding bugs before compiling.

It involves finding bugs after compile & linking.

It is about prevention.(Preventing errors from being occuring)

It is about cure.(It allows to figure out errors and fix them)

It is more cost effective.

It is not cost effective.

It is done usually in verification stage.

It is done usually in validation stage.

It guarantees 100% statement coverage.

It does not guarantee complete code coverage.

More reviews and comments (on source code) are highly recommended for good quality software.

More defects (uncover) and its correction is highly recommended for the good quality software.

Missing requirement, design defect and syntax errors are the defects found in this testing.

Variable not constant, functionality missing are kind of defects found in dynamic testing.


Note: The above differences have been derived through a proper understanding. So please share the link of this webpage as “sharing is a way of spreading knowledge”. But, please do not copy & paste it in other Website or Forums.

Hey guys we are eager to hear from you, about your views and your thoughts. Let us know how much you like the differences mentioned above. Please give your comments and if possible share because “Sharing is Caring”.

Unit and Integration Testing

Difference between Unit and Integration Testing?
Compare Unit and Integration Testing?

Differentiate/Compare/ Difference between Unit and Integration Testing.

Unit Testing

Integration Testing

It is a testing method that verifies whether individual units of source code are working properly.

It is a level of testing in which individual software modules are combined and these combined modules are tested.

Thus unit test is done in isolation.

Integration test is done in modules.

Unit testing checks a single component of the application.

It checks if different units work together as an application.

The errors in unit testing are internal in nature.

The errors are by large external in nature.

Usually errors include bugs, syntax errors and logical errors.

Usually errors in integration testing include interfacing errors.

Various approaches/ strategies of unit testing include:
a. Static Unit Testing
b. Dynamic Unit Testing

Various approaches to Integration testing are:
a. Incremental
b. Top-Down approach
c. Bottom-Up approach
d. Sandwich
e. Big bang

Unit test cases are easier to understand and maintain.

Integration test cases are hard and difficult to maintain.

Usually units are built by a single person, so developers are the people who are responsible for unit testing.

Multiple units are built by different people, so testing is carried out by system integration tester.

Unit tested modules never fulfil the final functional requirements.

The outcome of Integration testing fulfils the functional requirements, but yet it requires system validation.

The developer can continue testing till his will or till the time he wants.

Testing continues till the complete combined version of the system with no interface errors is built.

It occurs before integration and system testing.

It occurs or is only possible once the unit testing is done.


Note: The above differences have been derived through a proper understanding. So please share the link of this webpage as “sharing is a way of spreading knowledge”. But, please do not copy & paste it in other Website or Forums.

Hey guys we are eager to hear from you, about your views and your thoughts. Let us know how much you like the differences mentioned above. Please give your comments and if possible share because “Sharing is Caring”.

error: Content is protected !!