the active session. This is important in the context of matching active sessions with clusters in order to make recommendations. During an active session, given two matching clusters with the same non-normalized matching score, the larger cluster should be weighted less. This corresponds to the intuitive notion that we should see more of the user's active session before obtaining a better match with the larger cluster.


We can also impose a minimum threshold t on the matching score. In this case, we consider a cluster c as a matching cluster for the active session s only if match(s, c) ≥ t. The value of t is domain specific and may depend on the statistical properties of the access logs for the site.


Once the matching score is computed for an active session, we have to decide which of the URLs from these matching clusters are to be presented to the user. Thus, our next task is to construct a recommendation set for the user. To compute the recommendation value for a URL within a matching cluster, we take into account both the physical link distance to the current active session and the matching score for the cluster. Given a cluster c and an active session s, we compute a recommendation score for each URL within the cluster:


 


Note that if the URL u is in the current active session, then its recommendation value is zero because of the link distance factor. Finally, we can compute the recommendation set Recommend(s) for current active session s by collecting from each cluster all URLs whose recommendation score satisfies a minimum recommendation threshold r:


 


The URLs in the recommendation set are ranked according to their recommendation score when presented to the user. Furthermore, for each URL that is contributed by several clusters, we use its maximal recommendation score from all of the contributing clusters.
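
The selection steps described above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation. The scoring formula is not reproduced in this text, so the sketch takes both the matching function and the scoring function as caller-supplied arguments. The stand-ins `toy_match` and `toy_score`, the name `recommend`, and the cluster data are all hypothetical; the only properties taken from the text are the thresholds t and r, the zero score for URLs already in the session, the maximum over contributing clusters, and the ranking by score.

```python
from typing import Callable, Dict, List, Set, Tuple


def recommend(
    session: Set[str],
    clusters: Dict[str, Set[str]],
    match: Callable[[Set[str], Set[str]], float],
    rec_score: Callable[[Set[str], str, float], float],
    t: float,
    r: float,
) -> List[Tuple[str, float]]:
    """Return (url, score) pairs ranked by descending recommendation score."""
    best: Dict[str, float] = {}
    for urls in clusters.values():
        m = match(session, urls)
        if m < t:  # not a matching cluster for this session
            continue
        for u in urls:
            score = rec_score(session, u, m)
            if score < r:  # below the minimum recommendation threshold
                continue
            # A URL contributed by several clusters keeps its maximal score.
            best[u] = max(best.get(u, 0.0), score)
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


def toy_match(session: Set[str], urls: Set[str]) -> float:
    # Assumed stand-in: fraction of the session's URLs found in the cluster.
    return len(session & urls) / len(session)


def toy_score(session: Set[str], url: str, m: float) -> float:
    # Assumed stand-in: link distance factor is 0 inside the session, 1 outside.
    ldf = 0.0 if url in session else 1.0
    return m * ldf


clusters = {
    "c1": {"/research", "/faculty"},
    "c2": {"/research", "/grad-info", "/faculty"},
    "c3": {"/newsletter"},
}
ranked = recommend({"/research", "/grad-info"}, clusters,
                   toy_match, toy_score, t=0.4, r=0.3)
print(ranked)
```

In this toy run, /faculty is contributed by both c1 and c2 and keeps the larger of its two scores, while c3 is dropped because its matching score falls below t.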

  


Experimental Results 


We used the access logs from the University of Minnesota Computer Science Web server to test the three methods discussed earlier. The preprocessed log (for February of 1999) was converted into a session file comprising 14294 user transactions and a total of 4001 unique URLs (before support filtering). We provide a summary of the results below.


Recommendations Based on Usage Clusters 


In this experiment, the clustering of URLs was performed using the hypergraph partitioning algorithm, as modified by [CC99] to take frequent itemsets as input. The frequent itemsets were found using the tree projection algorithm described in [AAP99]. Each URL serves as a vertex in the hypergraph, and each edge represents a frequent itemset, with the weight of the edge taken as the interest for the set. Since interest increases dramatically with the number of items in a rule, the log of the interest is taken in order to prevent the larger rules from completely dominating the clustering process. For the recommendation process we chose a session window size of 2, since the average session size was 2.4. The recommendation results are given for the sample path


/research => /grad-info => /registration-info
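
The session windows produced along this path can be illustrated with a short Python sketch. It assumes the window simply holds the most recent requests, up to the chosen window size; the function name `session_windows` is hypothetical.

```python
from typing import List


def session_windows(path: List[str], size: int) -> List[List[str]]:
    """Return the active session window after each request in `path`."""
    return [path[max(0, i - size + 1): i + 1] for i in range(len(path))]


path = ["/research", "/grad-info", "/registration-info"]
windows = session_windows(path, 2)
for window in windows:
    print(window)
```

With a window size of 2, the first step sees only /research, and the last step no longer includes it.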


Each table below corresponds to one step in the user navigation through the path. In each case the current active session window is given along with the top recommendations. A cut-off value of 0.30 was used for the recommendation score.



Session Window      Recommendation                          Score
/research           /newsletter/newfaculty.html             0.73
                    /newsletter                             0.65
                    /faculty                                0.55
                    /research/cnmrg                         0.55
                    /research/softeng                       0.55
                    /research/airvl                         0.51
                    /research/mmdbms                        0.48
                    /research/agassiz                       0.47
                    /personal-pages                         0.37
                    /registration-info                      0.35
                    /registration-info/spring99.html        0.32
                    /grad-info                              0.30
                    /registration-info/schedule99-00.html   0.30
                    /grad-research                          0.30




Session Window      Recommendation                          Score
/research           /faculty                                0.59
/grad-info          /personal-pages                         0.52
                    /newsletter/newfac.html                 0.52
                    /newsletter                             0.46
                    /grad-info/grad-handbook.html           0.45
                    /grad-info/course-guide.html            0.45
                    /grad-info/prospective-grads.html       0.40
                    /registration-info                      0.39
                    /research/cnmrg                         0.39
                    /research/softeng                       0.39
                    /research/airvl                         0.36
                    /registration-info/spring99.html        0.35
                    /research/mmdbms                        0.34
                    /research/agassiz                       0.33
                    /registration-info/schedule99-00.html   0.33




Session Window      Recommendation                          Score
/grad-info          /faculty                                0.51
/registration-info  /personal-pages                         0.45
                    /grad-info/grad-handbook.html           0.45
                    /grad-info/course-guide.html            0.45
                    /grad-info/prospective-grads.html       0.40
                    /registration-info/spring99.html        0.36
                    /registration-info/schedule99-00.html   0.34


Note that in many cases the obvious recommendations associated with the URLs in the active session window rank lower than URLs for pages that are farther away in the site graph. This variation is mainly due to the link distance factor discussed earlier. In each case the recommendation set is composed of URLs from a number of matching clusters. When the /research page is requested, the URLs for a number of popular research groups in the department are added to the set. When /grad-info is requested, some of the frequently visited URLs associated with that page, as well as related class registration pages, rank higher.


Recommendations Using Transaction Clustering 


For this experiment the same session file was clustered into clusters of transactions. We used multivariate k-means clustering for this task. The transaction clusters were converted to URL clusters by computing the mean transaction for each cluster and assigning to each URL in the cluster its associated mean value. Again a cut-off recommendation value of 0.30 was used in the resulting recommendation sets.
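
The conversion from a transaction cluster to a URL cluster can be sketched as follows. The sketch assumes the k-means step has already grouped the transactions, and that each transaction is a binary vector over URLs (represented here as a set), so that the mean value of a URL is the fraction of the cluster's transactions containing it. The function name `url_cluster` and the sample transactions are hypothetical and are not taken from the experiment.

```python
from typing import Dict, List, Set


def url_cluster(transactions: List[Set[str]]) -> Dict[str, float]:
    """Map each URL in a transaction cluster to its value in the mean transaction."""
    if not transactions:
        return {}
    counts: Dict[str, int] = {}
    for transaction in transactions:
        for url in transaction:
            counts[url] = counts.get(url, 0) + 1
    n = len(transactions)
    # With binary transaction vectors, the mean is the containing fraction.
    return {url: count / n for url, count in counts.items()}


# Made-up transactions assumed to belong to one k-means cluster.
cluster = [
    {"/research", "/faculty"},
    {"/research", "/grad-info"},
    {"/research", "/faculty", "/tech-reports"},
    {"/faculty"},
]
weights = url_cluster(cluster)
print(sorted(weights.items(), key=lambda item: (-item[1], item[0])))
```

If the transaction vectors carried non-binary weights (for example, time spent on a page), the same averaging would apply to those weights instead of to presence counts.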



Session Window      Recommendation                          Score
/research           /faculty                                0.62
                    /grad-info                              0.56
                    /grad-research                          0.53
                    /personal-pages                         0.47
                    /tech-reports                           0.44
                    /research/cnmrg                         0.40
                    /research/mmdbms                        0.40
                    /research/airvl                         0.37
                    /research/agassiz                       0.31
                    /grad-info/grad-handbook.html           0.30




Session Window      Recommendation                          Score
/research           /grad-info/grad-handbook.html           0.60
/grad-info          /faculty                                0.48
                    /grad-research                          0.41
                    /personal-pages                         0.37
                    /tech-reports                           0.34
                    /grad-info/course-guide.html            0.32
                    /research/cnmrg                         0.31
                    /research/mmdbms                        0.31
                    /registration-info/spring99.html        0.30




Session Window      Recommendation                          Score
/grad-info          /grad-info/grad-handbook.html           0.61
/registration-info  /registration-info/spring99.html        0.60
                    /grad-info/course-guide.html            0.33
                    /registration-info/schedule99-00.html   0.32
                    /personal-pages                         0.30


In comparing the results with those obtained by usage clustering, we observe that these results (as well as results from other experiments with a variety of usage data) support our intuition that the usage clustering method can capture overlapping interests of different types of users, even if the associated transaction profiles are not considered similar enough. For example, in the first reference to the /research page, the usage clustering method, in addition to the core set of recommendations, also provided recommendations for users (mainly graduate students) who may be interested in registering for courses, as well as users who may be interested in finding out about research areas of new faculty. Similar observations can be made about the other steps in the sample path.


On the other hand, the transaction clustering technique seems to provide a narrower aggregated view of usage activity, more directly centered around a core set of URLs. Which of these methods is more suitable as part of Web personalization may depend on the structure and content of a particular site, as well as the goals of the site designers and operators.


Recommendations Using Frequent Itemsets 


For the itemset method the frequent itemsets were discovered using a support threshold of 0.25%. However, only a window size of 1 was used, i.e. the algorithm used itemsets of size 2 to determine the recommendation set. This is because the data set contained very few itemsets of size larger than 2 with sufficient support. A lower support threshold setting would have yielded larger itemsets. A confidence threshold of 0.1 was used in the selection of candidate URL recommendations. The results of the experiments are given below.


Note that in general, given the session window w = <s1, s2, …, sn>, the recommendation score of each URL ui in the recommendation set is the confidence of the association rule {s1, s2, …, sn} => {ui} multiplied by the link distance factor ldf(w, ui). The results are summarized in the tables below.
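
The scoring rule just stated can be sketched in Python. The sketch assumes the standard definition of rule confidence, support(w ∪ {u}) / support(w), and applies the confidence threshold before the link distance factor. The link distance factor itself is not defined in this part of the text, so it is passed in as a caller-supplied function. The name `itemset_recommendations`, the URLs /a, /b, /c, and the support values are made up for illustration and are not taken from the experiment.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple


def itemset_recommendations(
    window: FrozenSet[str],
    support: Dict[FrozenSet[str], float],
    ldf: Callable[[FrozenSet[str], str], float],
    min_confidence: float,
) -> List[Tuple[str, float]]:
    """Score candidate URLs by rule confidence times link distance factor."""
    base = support.get(window)
    if not base:  # the window itself is not a frequent itemset
        return []
    scored = []
    for itemset, sup in support.items():
        # Candidates are frequent itemsets that extend the window by one URL.
        if len(itemset) != len(window) + 1 or not window < itemset:
            continue
        (url,) = itemset - window
        confidence = sup / base  # confidence of the rule window => {url}
        if confidence >= min_confidence:
            scored.append((url, confidence * ldf(window, url)))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Made-up supports for hypothetical URLs.
support = {
    frozenset({"/a"}): 0.04,
    frozenset({"/a", "/b"}): 0.02,   # confidence 0.5
    frozenset({"/a", "/c"}): 0.002,  # confidence 0.05, below the threshold
}
result = itemset_recommendations(
    frozenset({"/a"}), support, ldf=lambda w, u: 1.0, min_confidence=0.1
)
print(result)
```

Because candidates must be frequent itemsets one URL larger than the window, a window of size 2 can only draw on itemsets of size 3.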



Session Window      Recommendation                          Score
/research           /faculty                                0.39
Support = 2.58%     /grad-info                              0.34
                    /personal-pages                         0.27
                    /grad-research                          0.22
                    /research/airvl                         0.19
                    /research/cnmrg                         0.19
                    /research/softeng                       0.16
                    /research/agassiz                       0.15
                    /research/mmdbms                        0.15
                    /research/grouplens                     0.13
                    /tech-reports                           0.13




Session Window      Recommendation                          Score
/grad-info          /grad-info/course-guide.html            0.22
Support = 7.35%     /personal-pages                         0.17
                    /grad-info/grad-handbook.html           0.15
                    /faculty                                0.14
                    /grad-info/prospective-grads.html       0.14




Session Window      Recommendation                          Score
/registration-info  /registration-info/spring99.html        0.50
Support = 4.20%     /registration-info/schedule99-00.html   0.20
                    /undergraduate-info                     0.16


While the itemset method is quite efficient and can provide good results in many cases, the above experiment also shows the limitations of this method. For example, if in step 2 a window size of 2 had been used (containing /research and /grad-info), then the resulting recommendation set would have been reduced to the set {</faculty,0.50>,</personal-pages,0.46>}, thus missing the frequently visited pages descending from /grad-info. On the other hand, a window size of 1 does not take enough of the user session history into account. For example, in the last step, the URL /undergraduate-info is given as a recommendation while the hypothetical user is most likely a graduate student.


As the results suggest, in cases where the use of the itemset recommendation algorithm is justified, this method can provide an even narrower aggregated view of usage patterns than the transaction clustering method.