The dataset contains 8124 descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota family. Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended; this last class is combined with the poisonous one.
The objective in this assignment is to mine the data and find association rules that can be used to identify the edibility of a mushroom.
For this assignment, the following libraries are used:
library(GGally)
library(ggplot2)
library(dplyr)
library(arules)
library(arulesViz)
library(gridExtra)
This assignment uses the mushrooms dataset, which can be found in the datasets folder.
dataset <- read.csv("../datasets/mushrooms.csv")
In order to work with the “arules” library, the data has to be in transaction format, so the data frame needs to be converted into an object of class transactions.
df <- as(dataset, "transactions")
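As a rough illustration of what this coercion produces, the sketch below rebuilds the item labels by hand in base R, without arules. The toy data frame and the helper name to_items are made up for this example (the values are copied from the first two transactions shown below); the real coercion also builds a sparse matrix, which is not reproduced here.

```r
# Toy data frame with two of the 23 columns.
toy <- data.frame(
  type = c("poisonous", "edible"),
  odor = c("pungent", "almond"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: turn one row into a vector of "variable=level" items.
to_items <- function(row_index, data) {
  values <- vapply(data, function(col) as.character(col[row_index]), character(1))
  paste(names(data), unname(values), sep = "=")
}

# One character vector of items per row, i.e. one "transaction" per mushroom.
items <- lapply(seq_len(nrow(toy)), to_items, data = toy)
print(items[[1]])
print(items[[2]])
```

Each categorical column contributes exactly one item per row, which is why every transaction in this dataset ends up with the same number of items.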
Now we can check the structure of each transaction.
inspect(df[1:2, ])
## items transactionID
## [1] {type=poisonous,
## cap_shape=convex,
## cap_surface=smooth,
## cap_color=brown,
## bruises=yes,
## odor=pungent,
## gill_attachment=free,
## gill_spacing=close,
## gill_size=narrow,
## gill_color=black,
## stalk_shape=enlarging,
## stalk_root=equal,
## stalk_surface_above_ring=smooth,
## stalk_surface_below_ring=smooth,
## stalk_color_above_ring=white,
## stalk_color_below_ring=white,
## veil_type=partial,
## veil_color=white,
## ring_number=one,
## ring_type=pendant,
## spore_print_color=black,
## population=scattered,
## habitat=urban} 1
## [2] {type=edible,
## cap_shape=convex,
## cap_surface=smooth,
## cap_color=yellow,
## bruises=yes,
## odor=almond,
## gill_attachment=free,
## gill_spacing=close,
## gill_size=broad,
## gill_color=black,
## stalk_shape=enlarging,
## stalk_root=club,
## stalk_surface_above_ring=smooth,
## stalk_surface_below_ring=smooth,
## stalk_color_above_ring=white,
## stalk_color_below_ring=white,
## veil_type=partial,
## veil_color=white,
## ring_number=one,
## ring_type=pendant,
## spore_print_color=brown,
## population=numerous,
## habitat=grasses} 2
The summary is as follows.
summary(df)
## transactions as itemMatrix in sparse format with
## 8124 rows (elements/itemsets/transactions) and
## 119 columns (items) and a density of 0.1932773
##
## most frequent items:
## veil_type=partial veil_color=white gill_attachment=free
## 8124 7924 7914
## ring_number=one gill_spacing=close (Other)
## 7488 6812 148590
##
## element (itemset/transaction) length distribution:
## sizes
## 23
## 8124
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 23 23 23 23 23 23
##
## includes extended item information - examples:
## labels variables levels
## 1 type=edible type edible
## 2 type=poisonous type poisonous
## 3 cap_shape=bell cap_shape bell
##
## includes extended transaction information - examples:
## transactionID
## 1 1
## 2 2
## 3 3
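The density reported in the summary can be checked with a little arithmetic. The sketch below uses only numbers printed above; the variable names are chosen for this example.

```r
n_transactions <- 8124   # rows reported by summary(df)
n_items <- 119           # distinct "variable=level" items (columns)
items_per_row <- 23      # every transaction has length 23

# Number of filled cells in the transaction matrix.
filled_cells <- n_transactions * items_per_row

# Density = filled cells / all cells.
density <- filled_cells / (n_transactions * n_items)

print(filled_cells)
print(round(density, 7))
```

Because every row has the same length, the density reduces to 23 / 119.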
Let’s make sure it’s in the correct format.
str(df)
## Formal class 'transactions' [package "arules"] with 3 slots
## ..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
## .. .. ..@ i : int [1:186852] 1 4 11 12 23 31 34 35 38 39 ...
## .. .. ..@ p : int [1:8125] 0 23 46 69 92 115 138 161 184 207 ...
## .. .. ..@ Dim : int [1:2] 119 8124
## .. .. ..@ Dimnames:List of 2
## .. .. .. ..$ : NULL
## .. .. .. ..$ : NULL
## .. .. ..@ factors : list()
## ..@ itemInfo :'data.frame': 119 obs. of 3 variables:
## .. ..$ labels : chr [1:119] "type=edible" "type=poisonous" "cap_shape=bell" "cap_shape=conical" ...
## .. ..$ variables: Factor w/ 23 levels "bruises","cap_color",..: 21 21 3 3 3 3 3 3 4 4 ...
## .. ..$ levels : Factor w/ 70 levels "abundant","almond",..: 20 50 5 16 17 27 34 62 24 33 ...
## ..@ itemsetInfo:'data.frame': 8124 obs. of 1 variable:
## .. ..$ transactionID: chr [1:8124] "1" "2" "3" "4" ...
In this plot we can see the distribution of items among a random sample of 100 transactions.
image(sample(df, 100))
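Apriori was one of the first algorithms developed for the discovery of association rules and continues to be one of the most widely used. It has two stages: first it identifies all itemsets whose support reaches a minimum threshold (the frequent itemsets), and then it generates from those itemsets the rules whose confidence reaches a minimum threshold.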
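A minimal base R sketch of the two stages usually described for Apriori (finding frequent itemsets, then deriving rules from them) is given here on made-up data. It is not the arules implementation: it only handles itemsets of size 1 and 2, whereas the real algorithm keeps growing candidates level by level. The transactions, thresholds and function names are assumptions for illustration.

```r
# Toy transactions (made-up data, not the mushroom dataset).
transactions <- list(
  c("a", "b", "c"),
  c("a", "b"),
  c("a", "b"),
  c("a", "c"),
  c("a", "d")
)
min_support <- 0.5
min_confidence <- 0.7

# Fraction of transactions that contain every item of `itemset`.
support <- function(itemset) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Stage 1: frequent itemsets (sizes 1 and 2 only in this sketch).
# Size-2 candidates are built only from frequent single items.
items <- sort(unique(unlist(transactions)))
frequent_1 <- Filter(function(s) support(s) >= min_support, as.list(items))
candidates_2 <- combn(unlist(frequent_1), 2, simplify = FALSE)
frequent_2 <- Filter(function(s) support(s) >= min_support, candidates_2)

# Stage 2: split each frequent pair into lhs => rhs and keep confident rules.
rules <- data.frame()
for (pair in frequent_2) {
  for (rhs in pair) {
    lhs <- setdiff(pair, rhs)
    confidence <- support(pair) / support(lhs)
    if (confidence >= min_confidence) {
      rules <- rbind(rules, data.frame(
        lhs = lhs, rhs = rhs,
        support = support(pair), confidence = confidence
      ))
    }
  }
}
print(rules)
```

In this toy data only the rule with lhs b and rhs a survives: every transaction containing b also contains a, while a appears in many transactions without b.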
rules <- apriori(df)
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 812
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 8124 transaction(s)] done [0.01s].
## sorting and recoding items ... [56 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10 done [3.78s].
## writing ... [3315185 rule(s)] done [0.34s].
## creating S4 object ... done [1.17s].
rules
## set of 3315185 rules
As we can see, applying the algorithm with its default parameters (support 0.1, confidence 0.8) gives an astonishing 3.3M rules, so we need to reduce them.
Before trying to get better results we can take a look at the summary of these rules:
summary(rules)
## set of 3315185 rules
##
## rule length distribution (lhs + rhs):sizes
## 1 2 3 4 5 6 7 8 9 10
## 5 354 5580 35965 132340 325154 579505 781023 809297 645962
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 7.000 8.000 8.091 9.000 10.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.1002 Min. :0.80 Min. :0.1002 Min. :0.8266
## 1st Qu.:0.1064 1st Qu.:1.00 1st Qu.:0.1064 1st Qu.:1.0849
## Median :0.1064 Median :1.00 Median :0.1064 Median :1.7110
## Mean :0.1238 Mean :0.99 Mean :0.1253 Mean :1.8713
## 3rd Qu.:0.1123 3rd Qu.:1.00 3rd Qu.:0.1182 3rd Qu.:2.1815
## Max. :1.0000 Max. :1.00 Max. :1.0000 Max. :6.8718
## count
## Min. : 814
## 1st Qu.: 864
## Median : 864
## Mean :1006
## 3rd Qu.: 912
## Max. :8124
##
## mining info:
## data ntransactions support confidence call
## df 8124 0.1 0.8 apriori(data = df)
Below is a function named run_apriori_with_loop that we can use to try different combinations of support, confidence and minimum length values. For every combination it returns a row in a data frame with the number of rules before and after removing the redundant ones.
run_apriori_with_loop <- function(df, supp_values, conf_values, minlen_values) {
  result_df <- data.frame()
  # Try every combination of support, confidence and minimum rule length.
  for (supp_val in supp_values) {
    for (conf_val in conf_values) {
      for (minlen_val in minlen_values) {
        # Only mine rules whose consequent is the mushroom type.
        rules <- apriori(
          df,
          parameter = list(
            supp = supp_val,
            conf = conf_val,
            minlen = minlen_val
          ),
          appearance = list(rhs = c("type=edible", "type=poisonous")),
          control = list(verbose = FALSE)
        )
        # All mined rules, redundant ones included.
        red_rules <- rules
        # Rules left after dropping the redundant ones.
        not_red_rules <- rules[!is.redundant(rules)]
        rules_conf <- sort(not_red_rules, by = "lift", decreasing = TRUE)
        result_df <- rbind(
          result_df,
          data.frame(
            Support = supp_val,
            Confidence = conf_val,
            MinLength = minlen_val,
            Redundant_Rules = length(red_rules),
            Non_Redundant_Rules = length(rules_conf)
          )
        )
      }
    }
  }
  return(result_df)
}
supp_values <- c(0.3, 0.4, 0.5)
conf_values <- c(0.6, 0.7, 0.8, 0.9)
minlen_values <- c(1, 2, 3)
result_rules <- run_apriori_with_loop(
df,
supp_values,
conf_values,
minlen_values
)
result_rules
## Support Confidence MinLength Redundant_Rules Non_Redundant_Rules
## 1 0.3 0.6 1 786 73
## 2 0.3 0.6 2 786 73
## 3 0.3 0.6 3 776 98
## 4 0.3 0.7 1 686 68
## 5 0.3 0.7 2 686 68
## 6 0.3 0.7 3 681 77
## 7 0.3 0.8 1 576 53
## 8 0.3 0.8 2 576 53
## 9 0.3 0.8 3 574 58
## 10 0.3 0.9 1 312 29
## 11 0.3 0.9 2 312 29
## 12 0.3 0.9 3 311 30
## 13 0.4 0.6 1 38 10
## 14 0.4 0.6 2 38 10
## 15 0.4 0.6 3 33 16
## 16 0.4 0.7 1 16 7
## 17 0.4 0.7 2 16 7
## 18 0.4 0.7 3 14 7
## 19 0.4 0.8 1 4 2
## 20 0.4 0.8 2 4 2
## 21 0.4 0.8 3 3 2
## 22 0.4 0.9 1 4 2
## 23 0.4 0.9 2 4 2
## 24 0.4 0.9 3 3 2
## 25 0.5 0.6 1 0 0
## 26 0.5 0.6 2 0 0
## 27 0.5 0.6 3 0 0
## 28 0.5 0.7 1 0 0
## 29 0.5 0.7 2 0 0
## 30 0.5 0.7 3 0 0
## 31 0.5 0.8 1 0 0
## 32 0.5 0.8 2 0 0
## 33 0.5 0.8 3 0 0
## 34 0.5 0.9 1 0 0
## 35 0.5 0.9 2 0 0
## 36 0.5 0.9 3 0 0
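The 36 rows above correspond to every combination of the three parameter vectors. As a side note, the same grid can be enumerated without nested loops using base R's expand.grid; the sketch below only builds the grid and does not run apriori, and the column names are copied from the table above.

```r
supp_values <- c(0.3, 0.4, 0.5)
conf_values <- c(0.6, 0.7, 0.8, 0.9)
minlen_values <- c(1, 2, 3)

# expand.grid varies its first argument fastest, so the arguments are given
# in reverse order and the columns are rearranged afterwards.
grid <- expand.grid(
  MinLength = minlen_values,
  Confidence = conf_values,
  Support = supp_values
)
grid <- grid[, c("Support", "Confidence", "MinLength")]

print(nrow(grid))
print(head(grid, 4))
```

The rows come out in the same order as the table, so a single loop over seq_len(nrow(grid)) could replace the three nested loops.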
As the results above show, removing the redundant rules makes a very large difference in the number of rules. In this case we think that the 7 non-redundant rules obtained with a support of 0.4 and a confidence of 0.7 are good enough for us: 10 rules seems like a lot to take into consideration when deciding whether a mushroom is poisonous or not, and 2 is too few.
Let’s take a closer look at the association rules obtained with a support value of 0.4, a confidence value of 0.7 and a minimum length of 2.
supp_val <- 0.4
conf_val <- 0.7
minlen_val <- 2
rules_2 <- apriori(
df,
parameter = list(supp = supp_val, conf = conf_val, minlen = minlen_val)
)
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.4 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 3249
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 8124 transaction(s)] done [0.01s].
## sorting and recoding items ... [21 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 done [0.00s].
## writing ... [1454 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
summary(rules_2)
## set of 1454 rules
##
## rule length distribution (lhs + rhs):sizes
## 2 3 4 5 6 7
## 111 396 538 321 81 7
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 3.000 4.000 3.922 5.000 7.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.4008 Min. :0.7032 Min. :0.4008 Min. :0.8751
## 1st Qu.:0.4185 1st Qu.:0.9195 1st Qu.:0.4343 1st Qu.:1.0000
## Median :0.4530 Median :0.9765 Median :0.4797 Median :1.0252
## Mean :0.4796 Mean :0.9470 Mean :0.5087 Mean :1.0789
## 3rd Qu.:0.4914 3rd Qu.:1.0000 3rd Qu.:0.5495 3rd Qu.:1.0541
## Max. :0.9754 Max. :1.0000 Max. :1.0000 Max. :1.8649
## count
## Min. :3256
## 1st Qu.:3400
## Median :3680
## Mean :3896
## 3rd Qu.:3992
## Max. :7924
##
## mining info:
## data ntransactions support confidence
## df 8124 0.4 0.7
## call
## apriori(data = df, parameter = list(supp = supp_val, conf = conf_val, minlen = minlen_val))
inspect(rules_2[1:6])
## lhs rhs support confidence
## [1] {bruises=yes} => {gill_spacing=close} 0.4027573 0.9691943
## [2] {bruises=yes} => {gill_attachment=free} 0.4155588 1.0000000
## [3] {bruises=yes} => {veil_color=white} 0.4155588 1.0000000
## [4] {bruises=yes} => {veil_type=partial} 0.4155588 1.0000000
## [5] {stalk_shape=enlarging} => {gill_attachment=free} 0.4069424 0.9402730
## [6] {stalk_shape=enlarging} => {veil_color=white} 0.4081733 0.9431172
## coverage lift count
## [1] 0.4155588 1.1558624 3272
## [2] 0.4155588 1.0265353 3376
## [3] 0.4155588 1.0252398 3376
## [4] 0.4155588 1.0000000 3376
## [5] 0.4327917 0.9652234 3306
## [6] 0.4327917 0.9669212 3316
rules_3 <- apriori(
df,
parameter = list(supp = supp_val, conf = conf_val, minlen = minlen_val),
appearance = list(rhs = c("type=edible", "type=poisonous"))
)
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.4 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 3249
##
## set item appearances ...[2 item(s)] done [0.00s].
## set transactions ...[119 item(s), 8124 transaction(s)] done [0.01s].
## sorting and recoding items ... [21 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 done [0.00s].
## writing ... [16 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules3_conf <- sort(rules_3, by = "lift", decreasing = TRUE)
inspect(rules3_conf)
## lhs rhs support confidence coverage lift count
## [1] {odor=none} => {type=edible} 0.4194978 0.9659864 0.4342688 1.864941 3408
## [2] {odor=none,
## veil_type=partial} => {type=edible} 0.4194978 0.9659864 0.4342688 1.864941 3408
## [3] {gill_size=broad,
## stalk_surface_above_ring=smooth} => {type=edible} 0.4155588 0.9398664 0.4421467 1.814514 3376
## [4] {gill_size=broad,
## stalk_surface_above_ring=smooth,
## veil_type=partial} => {type=edible} 0.4155588 0.9398664 0.4421467 1.814514 3376
## [5] {bruises=no,
## gill_attachment=free,
## ring_number=one} => {type=poisonous} 0.4007878 0.7722960 0.5189562 1.602179 3256
## [6] {bruises=no,
## gill_attachment=free,
## veil_type=partial,
## ring_number=one} => {type=poisonous} 0.4007878 0.7722960 0.5189562 1.602179 3256
## [7] {bruises=no,
## ring_number=one} => {type=poisonous} 0.4007878 0.7386570 0.5425899 1.532393 3256
## [8] {bruises=no,
## veil_type=partial,
## ring_number=one} => {type=poisonous} 0.4007878 0.7386570 0.5425899 1.532393 3256
## [9] {bruises=no,
## veil_color=white} => {type=poisonous} 0.4042344 0.7220756 0.5598227 1.497993 3284
## [10] {bruises=no,
## veil_type=partial,
## veil_color=white} => {type=poisonous} 0.4042344 0.7220756 0.5598227 1.497993 3284
## [11] {bruises=no,
## gill_attachment=free} => {type=poisonous} 0.4030034 0.7214632 0.5585918 1.496723 3274
## [12] {bruises=no,
## gill_attachment=free,
## veil_type=partial} => {type=poisonous} 0.4030034 0.7214632 0.5585918 1.496723 3274
## [13] {bruises=no,
## gill_attachment=free,
## veil_color=white} => {type=poisonous} 0.4020187 0.7209713 0.5576071 1.495702 3266
## [14] {bruises=no,
## gill_attachment=free,
## veil_type=partial,
## veil_color=white} => {type=poisonous} 0.4020187 0.7209713 0.5576071 1.495702 3266
## [15] {stalk_surface_above_ring=smooth} => {type=edible} 0.4480551 0.7032457 0.6371246 1.357692 3640
## [16] {stalk_surface_above_ring=smooth,
## veil_type=partial} => {type=edible} 0.4480551 0.7032457 0.6371246 1.357692 3640
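Several rules in the listing above differ only by the item veil_type=partial, which is present in all 8124 transactions and therefore adds no information. The sketch below applies a simplified redundancy test in base R to four of the edible rules: a rule is flagged when a rule with a smaller antecedent and the same consequent has at least the same confidence. This is an approximation written for this example (the function name is_redundant_sketch is made up), not the code behind is.redundant.

```r
# Four of the edible rules from the listing above (confidence copied from it).
rules <- data.frame(
  confidence = c(0.9659864, 0.9659864, 0.9398664, 0.7032457)
)
rules$lhs <- list(
  c("odor=none"),
  c("odor=none", "veil_type=partial"),
  c("gill_size=broad", "stalk_surface_above_ring=smooth"),
  c("stalk_surface_above_ring=smooth")
)

# Simplified check: rule i is redundant when another rule in the set has a
# strictly smaller lhs contained in lhs i and at least the same confidence.
# All four rules share the rhs {type=edible}, so the rhs is not compared.
is_redundant_sketch <- function(rules) {
  vapply(seq_len(nrow(rules)), function(i) {
    any(vapply(seq_len(nrow(rules)), function(j) {
      j != i &&
        length(rules$lhs[[j]]) < length(rules$lhs[[i]]) &&
        all(rules$lhs[[j]] %in% rules$lhs[[i]]) &&
        rules$confidence[j] >= rules$confidence[i]
    }, logical(1)))
  }, logical(1))
}

print(is_redundant_sketch(rules))
```

Only the second rule is flagged. The third rule is kept because adding gill_size=broad raises the confidence well above that of stalk_surface_above_ring=smooth alone.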
pruned_rules <- rules_3[!is.redundant(rules_3)]
summary(pruned_rules)
## set of 7 rules
##
## rule length distribution (lhs + rhs):sizes
## 2 3 4
## 2 4 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 2.500 3.000 2.857 3.000 4.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.4008 Min. :0.7032 Min. :0.4343 Min. :1.358
## 1st Qu.:0.4019 1st Qu.:0.7218 1st Qu.:0.4806 1st Qu.:1.497
## Median :0.4042 Median :0.7387 Median :0.5426 Median :1.532
## Mean :0.4131 Mean :0.7948 Mean :0.5276 Mean :1.595
## 3rd Qu.:0.4175 3rd Qu.:0.8561 3rd Qu.:0.5592 3rd Qu.:1.708
## Max. :0.4481 Max. :0.9660 Max. :0.6371 Max. :1.865
## count
## Min. :3256
## 1st Qu.:3265
## Median :3284
## Mean :3356
## 3rd Qu.:3392
## Max. :3640
##
## mining info:
## data ntransactions support confidence
## df 8124 0.4 0.7
## call
## apriori(data = df, parameter = list(supp = supp_val, conf = conf_val, minlen = minlen_val), appearance = list(rhs = c("type=edible", "type=poisonous")))
prun_rules3_conf <- sort(pruned_rules, by = "lift", decreasing = TRUE)
inspect(prun_rules3_conf)
## lhs rhs support confidence coverage lift count
## [1] {odor=none} => {type=edible} 0.4194978 0.9659864 0.4342688 1.864941 3408
## [2] {gill_size=broad,
## stalk_surface_above_ring=smooth} => {type=edible} 0.4155588 0.9398664 0.4421467 1.814514 3376
## [3] {bruises=no,
## gill_attachment=free,
## ring_number=one} => {type=poisonous} 0.4007878 0.7722960 0.5189562 1.602179 3256
## [4] {bruises=no,
## ring_number=one} => {type=poisonous} 0.4007878 0.7386570 0.5425899 1.532393 3256
## [5] {bruises=no,
## veil_color=white} => {type=poisonous} 0.4042344 0.7220756 0.5598227 1.497993 3284
## [6] {bruises=no,
## gill_attachment=free} => {type=poisonous} 0.4030034 0.7214632 0.5585918 1.496723 3274
## [7] {stalk_surface_above_ring=smooth} => {type=edible} 0.4480551 0.7032457 0.6371246 1.357692 3640
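The quality measures of the first rule, {odor=none} => {type=edible}, can be re-derived from the counts in the table. The sketch below uses only numbers printed above; the share of edible mushrooms is not printed in this report, so it is backed out from the reported lift and should be read as a derived value.

```r
n <- 8124               # total transactions
count_both <- 3408      # transactions with odor=none AND type=edible
coverage <- 0.4342688   # support of the lhs {odor=none}
lift_reported <- 1.864941

# support = P(lhs and rhs)
support <- count_both / n

# confidence = P(rhs | lhs) = count(lhs and rhs) / count(lhs)
count_lhs <- round(coverage * n)
confidence <- count_both / count_lhs

# lift = confidence / P(rhs), so P(rhs) can be recovered from the lift.
support_rhs <- confidence / lift_reported

print(round(support, 7))
print(count_lhs)
print(round(confidence, 7))
print(round(support_rhs * n))
```

A lift of about 1.86 means that a mushroom with no odor is roughly 1.86 times more likely to be edible than a mushroom picked at random from the dataset.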
In the following plots, we can observe the results obtained, both redundant and non-redundant, presented in scatter plots, graphs, and clusters:
grid.arrange(
plot(rules_2),
plot(rules_2, method = "two-key plot"),
ncol = 2
)
The following plots represent items and rules as vertices connecting them with directed edges. This representation focuses on how the rules are composed of individual items and shows which rules share items.
plot(rules_3, method = "graph")
plot(pruned_rules, method = "graph")
To visualize the grouped matrix we use a balloon plot with antecedent groups as columns and consequents as rows. The color of each balloon represents the lift and its size shows the support. Furthermore, the columns and rows in the plot are reordered in such a way that the most interesting group is placed in the top left corner. (Lift is decreasing from left to right.)
plot(pruned_rules, method = "grouped")
Eclat is a data mining algorithm used to extract frequent itemsets from transactional datasets. The name “Eclat” stands for “Equivalence Class Clustering and bottom-up Lattice Traversal”. In arules, eclat only returns frequent itemsets, so the association rules have to be induced from those itemsets in a separate step.
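Eclat is usually described as working on a vertical layout: for each item it keeps the list of transaction ids that contain it, and the support of an itemset is obtained by intersecting those lists. The base R sketch below illustrates that idea on made-up data; the names tid_lists and itemset_support are chosen for this example and it is not the arules implementation.

```r
# Toy transactions (made-up data); positions in the list act as transaction ids.
transactions <- list(
  c("a", "b", "c"),
  c("a", "b"),
  c("a", "c"),
  c("b", "c"),
  c("a", "b", "c")
)
n <- length(transactions)

# Vertical layout: for each item, the ids of the transactions containing it.
items <- sort(unique(unlist(transactions)))
tid_lists <- lapply(items, function(item) {
  which(vapply(transactions, function(t) item %in% t, logical(1)))
})
names(tid_lists) <- items

# Support of an itemset = size of the intersection of its tid-lists / n.
itemset_support <- function(itemset) {
  length(Reduce(intersect, tid_lists[itemset])) / n
}

print(tid_lists$a)
print(itemset_support(c("a", "b")))
print(itemset_support(c("a", "b", "c")))
```

Once the tid-lists are built, the original transactions are no longer scanned: larger itemsets only need further intersections.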
run_rule_induction <- function(df, support_values, min_confidence_values) {
  result_df <- data.frame()
  # Try every combination of support and confidence.
  for (min_support in support_values) {
    for (min_confidence in min_confidence_values) {
      # Mine the frequent itemsets, keeping the transaction ID lists.
      eclad_rules <- eclat(df,
        parameter = list(support = min_support, tidLists = TRUE),
        control = list(verbose = FALSE)
      )
      # Induce rules from the frequent itemsets.
      rulesf <- ruleInduction(eclad_rules, confidence = min_confidence)
      # All induced rules, whatever their consequent.
      red_rules <- rulesf
      rulesf <- rulesf[!is.redundant(rulesf)]
      # Keep only the non-redundant rules that predict the mushroom type.
      filtered_rules <- subset(
        rulesf,
        rhs %in% c("type=edible", "type=poisonous")
      )
      result_df <- rbind(
        result_df,
        data.frame(
          Support = min_support,
          Confidence = min_confidence,
          Redundant_Rules = length(red_rules),
          Non_Redundant_Rules = length(filtered_rules)
        )
      )
    }
  }
  return(result_df)
}
Using the function run_rule_induction we can obtain the results for every combination of the following support and confidence values:
Support: 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
Confidence: 0.5, 0.6, 0.7, 0.8, 0.9
support_values <- c(0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
min_confidence <- c(0.5, 0.6, 0.7, 0.8, 0.9)
results <- run_rule_induction(df, support_values, min_confidence)
print(results)
## Support Confidence Redundant_Rules Non_Redundant_Rules
## 1 0.4 0.5 1810 23
## 2 0.4 0.6 1625 10
## 3 0.4 0.7 1454 7
## 4 0.4 0.8 1370 2
## 5 0.4 0.9 1137 2
## 6 0.5 0.5 430 1
## 7 0.5 0.6 383 0
## 8 0.5 0.7 318 0
## 9 0.5 0.8 318 0
## 10 0.5 0.9 269 0
## 11 0.6 0.5 120 0
## 12 0.6 0.6 120 0
## 13 0.6 0.7 103 0
## 14 0.6 0.8 103 0
## 15 0.6 0.9 86 0
## 16 0.7 0.5 75 0
## 17 0.7 0.6 75 0
## 18 0.7 0.7 75 0
## 19 0.7 0.8 75 0
## 20 0.7 0.9 60 0
## 21 0.8 0.5 47 0
## 22 0.8 0.6 47 0
## 23 0.8 0.7 47 0
## 24 0.8 0.8 47 0
## 25 0.8 0.9 40 0
## 26 0.9 0.5 11 0
## 27 0.9 0.6 11 0
## 28 0.9 0.7 11 0
## 29 0.9 0.8 11 0
## 30 0.9 0.9 11 0
Now we select the configuration that yields 7 rules, with a support of 0.4 and a confidence of 0.7. Running Eclat with this support produces a set of 565 frequent itemsets. After inducing the rules, eliminating the redundant ones and keeping those that predict the type, we are left with 7 rules, as can be seen in the following output and plots:
eclad_rules <- eclat(df, parameter = list(support = 0.4, tidLists = TRUE))
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## TRUE 0.4 1 10 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 3249
##
## create itemset ...
## set transactions ...[119 item(s), 8124 transaction(s)] done [0.01s].
## sorting and recoding items ... [21 item(s)] done [0.00s].
## creating bit matrix ... [21 row(s), 8124 column(s)] done [0.00s].
## writing ... [565 set(s)] done [0.01s].
## Creating S4 object ... done [0.02s].
summary(eclad_rules)
## set of 565 itemsets
##
## most frequent items:
## veil_type=partial gill_attachment=free veil_color=white
## 283 270 268
## ring_number=one gill_spacing=close (Other)
## 222 190 694
##
## element (itemset/transaction) length distribution:sizes
## 1 2 3 4 5 6 7
## 21 97 185 170 76 15 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 3.000 3.000 3.411 4.000 7.000
##
## summary of quality measures:
## support count
## Min. :0.4008 Min. :3256
## 1st Qu.:0.4205 1st Qu.:3416
## Median :0.4609 Median :3744
## Mean :0.4906 Mean :3986
## 3rd Qu.:0.5180 3rd Qu.:4208
## Max. :1.0000 Max. :8124
##
## includes transaction ID lists: TRUE
##
## mining info:
## data ntransactions support
## df 8124 0.4
## call
## eclat(data = df, parameter = list(support = 0.4, tidLists = TRUE))
rulesf <- ruleInduction(eclad_rules, confidence = 0.7)
rulesf <- rulesf[!is.redundant(rulesf)]
filtered_rules <- subset(rulesf, rhs %in% c("type=edible", "type=poisonous"))
inspect(filtered_rules)
## lhs rhs support confidence lift
## [1] {odor=none} => {type=edible} 0.4194978 0.9659864 1.864941
## [2] {bruises=no,
## gill_attachment=free,
## ring_number=one} => {type=poisonous} 0.4007878 0.7722960 1.602179
## [3] {bruises=no,
## veil_color=white} => {type=poisonous} 0.4042344 0.7220756 1.497993
## [4] {bruises=no,
## gill_attachment=free} => {type=poisonous} 0.4030034 0.7214632 1.496723
## [5] {bruises=no,
## ring_number=one} => {type=poisonous} 0.4007878 0.7386570 1.532393
## [6] {gill_size=broad,
## stalk_surface_above_ring=smooth} => {type=edible} 0.4155588 0.9398664 1.814514
## [7] {stalk_surface_above_ring=smooth} => {type=edible} 0.4480551 0.7032457 1.357692
In the following plots, we can observe the results obtained, both redundant and non-redundant, presented in scatter plots, graphs, and clusters:
grid.arrange(
plot(rulesf),
plot(rulesf, method = "two-key plot"),
ncol = 2
)
plot(rulesf, method = "graph")
plot(filtered_rules, method = "graph")
plot(filtered_rules, method = "grouped")
To conclude this task, we can compare the two results obtained from the Apriori and Eclat algorithms, as shown below:
invisible(grid.arrange(
plot(pruned_rules, method = "grouped") + ggtitle("Apriori"),
plot(filtered_rules, method = "grouped") + ggtitle("Eclat"),
ncol = 2
))
As we can observe in both cases, we have achieved the same results with two different algorithms and a similar configuration.
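The two listings print the rules in a different order, so one simple way to confirm that they contain the same rules is to compare them as sets. The sketch below does this in base R on rule labels typed in by hand from the two inspect outputs of this report; the vector names are chosen for this example.

```r
# Rule labels copied from the pruned Apriori listing.
apriori_rules <- c(
  "{odor=none} => {type=edible}",
  "{gill_size=broad,stalk_surface_above_ring=smooth} => {type=edible}",
  "{bruises=no,gill_attachment=free,ring_number=one} => {type=poisonous}",
  "{bruises=no,ring_number=one} => {type=poisonous}",
  "{bruises=no,veil_color=white} => {type=poisonous}",
  "{bruises=no,gill_attachment=free} => {type=poisonous}",
  "{stalk_surface_above_ring=smooth} => {type=edible}"
)

# Rule labels copied from the filtered Eclat listing (different order).
eclat_rules <- c(
  "{odor=none} => {type=edible}",
  "{bruises=no,gill_attachment=free,ring_number=one} => {type=poisonous}",
  "{bruises=no,veil_color=white} => {type=poisonous}",
  "{bruises=no,gill_attachment=free} => {type=poisonous}",
  "{bruises=no,ring_number=one} => {type=poisonous}",
  "{gill_size=broad,stalk_surface_above_ring=smooth} => {type=edible}",
  "{stalk_surface_above_ring=smooth} => {type=edible}"
)

# Order does not matter for a set comparison.
print(setequal(apriori_rules, eclat_rules))
print(setdiff(apriori_rules, eclat_rules))
```

An empty setdiff in both directions means that neither algorithm produced a rule the other one missed.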
After the study we can conclude that the following seven association rules, found by both algorithms, are the ones that help us decide whether a mushroom is edible or poisonous.
{odor=none} => {type=edible}
{stalk_surface_above_ring=smooth} => {type=edible}
{gill_size=broad, stalk_surface_above_ring=smooth} => {type=edible}
{bruises=no, ring_number=one} => {type=poisonous}
{bruises=no, gill_attachment=free} => {type=poisonous}
{bruises=no, veil_color=white} => {type=poisonous}
{bruises=no, gill_attachment=free, ring_number=one} => {type=poisonous}