Web Usage Patterns Using Association Rules and Markov Chains

Chandrakasem Rajabhat University, Thailand
amnas.cru@gmail.com

Abstract - The objective of this research is to illustrate the probability of web page usage over a period of time using two statistical techniques. The e-commerce web site of an SME selling handicraft goods in Nonthaburi province was selected as a case study. Web pages were categorized into three portions: the SME firm section's News, Goods details, and Customer activities. The Markov chain technique was applied to present the probability of each event, and the association rule technique was used to derive the paths of web page visits. The complementary results from the two techniques should help the web administrator spot which pages of the site architecture attract interest. Association rules show the sequences of visited web pages, while Markov chains give the probability that each event will be visited in a specific period of time.

Keywords - Web Mining, Markov Chains, Association Rule

I. INTRODUCTION
A website administrator (also known as admin) is responsible for creating, maintaining and supplying the important data and information of an organization in order to support the enterprise mission. Some content attracts users while other content bores them. Content that is heavily visited at the same time may degrade system performance; to address this, the administrator must analyze the software design, the database design, the server, or even adjust network bandwidth. For less visited web pages, the administrator has to modify them, add content, or delete them.

This research gathered web usage data from a sys-log database, using an e-commerce website as the experiment. The website content was categorized into three types: the SME firm section's News (event A), Goods details (event B) and Customer activities (event C). The 12 web usage observations were gathered during 1-31 January 2018 and were used to find visiting patterns with the association rule technique.
Meanwhile, Markov chains were calculated in order to obtain the probability of each event in a defined period of time. Together, the two techniques provide information about web usage patterns and their probability of occurring, which the web site administrator can use in content management.

II. RELATED THEORY AND RESEARCH
A. Related Research
1) The association rule technique was applied in web usage mining [1]. Observations were gathered from the web usage log file of the web site of VTSNS, the Advanced School of Technology in Novi Sad, Serbia. The experiment repeatedly pruned the huge number of generated rules by setting support and confidence thresholds, and the derived rules were used to modify the web site map (architecture).
2) Two web browsers used on the server of the Department of Mathematics and Statistics, Sagar University, were compared to find which one was more popular [2]. A Markov chain model was used to illustrate the probability of each browser being in use a given number of time units after the start time. The less preferred browser, which had the smaller state probability, was inspected for problems, performance, etc. After its undesired features were modified (for example, by adjusting add-on configuration), the Markov chain was computed again, and the modified browser showed a higher state probability than before.

B. Association Rule [3]
Association rule mining is a statistical technique used to find dependencies between attributes. For example, if attribute B tends to occur whenever attribute A occurs, the rule can be written as A → B. Whether a rule is accepted is decided by two metrics, support and confidence, computed by (1) and (2):

$$\mathrm{support}(A \rightarrow B) = \frac{n(A \cup B)}{N} \qquad (1)$$

$$\mathrm{confidence}(A \rightarrow B) = \frac{n(A \cup B)}{n(A)} \qquad (2)$$

where n(A ∪ B) is the number of observations in which both A (source) and B (destination) occur, n(A) is the number of observations in which A is present, and N is the total number of observations.

Support is calculated over the whole data set, so it identifies the paths that users travel most frequently through the web site. If the support threshold is too high, only a few rules are discovered; if it is too small, many rules are produced. Confidence, in contrast, is calculated only over the observations in which the source attribute A occurs. It therefore estimates the probability that the destination attributes occur given that the source attributes occur, and a high confidence means the rule holds often among the observations that contain the source attribute.

C. Markov Chains [4]
A Markov chain is a statistical technique used to calculate the probability of transition between states from one time period to the next. The problem under study should have several states, and a given state can move to another state, or remain in itself, according to a transition probability matrix P.
At any time period after the starting point, the probabilities of all states can be calculated from a system of first-order difference equations. Let π(t) be the n × 1 vector of the probabilities of all states at time t, and π(t + 1) the corresponding vector at time t + 1. Then π(t + 1) is calculated by (3):

$$\pi(t+1) = P\,\pi(t) \qquad (3)$$

where π(0) is the initial probability vector of all states. It is used to derive the values of all parameters in the system of first-order difference equations, which gives the probabilities of all events at any period t without applying equation (3) successively for each of the periods 0, 1, 2, ..., t − 1. The first-order difference equation system is therefore more convenient to compute than the usual step-by-step processing.

III. RESEARCH METHODOLOGY
A. Data Preparation
Data Source: 12 records of observations were collected from the e-commerce website of an SME selling handicraft goods in Nonthaburi Province. The website map is composed of three menus. The first portion explains the mission of the private enterprise. The second portion is an important menu because it gives information about the enterprise's goods. The third portion provides information and activities for ordering, payment and receipt of goods. Each menu is composed of many sub-menus, so this research considered only these three groups in order to reduce computational complexity.

Web Usage Log: The case-study web application has a database that records every user action, from the start event, or state (invocation), through the menus chosen and other states visited, until the user logs off the website.

TABLE I
DATA DETAIL OF WEB USAGE LOG DATABASE
Attribute     Description                       Data type
IP address    IP address of external user       999.999.999.999
Date          Day:month:year                    99:99:99
Time          Hour:min                          99:99
Menu-chosen   Menu type: S=start, L=logoff,     S, A, B, C, L
              A=firm's news, B=goods detail,
              C=customer activity
Note: Data collection period: web usage logs were gathered during 1-30 January 2018.

B. Association Rules
From the web usage log database, each user's data (an observation) was coded and cleaned before further processing. The objective of this data preparation was to represent the presence or absence of each event, or state, during the period; five example observations are shown in Table II. If the user selected a menu (S, A, B, C, L), the chosen menu was coded as 1 (presence); otherwise it was coded as 0 (absence).

TABLE II
PARTIAL DATA ABOUT USER'S MENU SELECTION
Observation#   Start   A   B   C   Logoff

Some combinations of paths cannot happen, such as L → A or A → S, so paths of this kind must be deleted. The number of all possible paths, or rules, in association rule mining is given by (4):

$$R = 3^{d} - 2^{d+1} + 1 \qquad (4)$$

where d is the number of events, or attributes, of interest in the experiment. For example, with 3 events A, B and C there are 3^3 − 2^4 + 1 = 12 possible rules: A → B, A → C, A → {B, C}, B → A, B → C, B → {A, C}, C → A, C → B, C → {A, B}, {A, B} → C, {A, C} → B and {B, C} → A.

In this research there were five events (d = 5), so there were 3^5 − 2^6 + 1 = 180 possible association rules. Some of these rules passed the support and confidence thresholds while others did not. To retain only the rules about important content among so many candidates, the support and confidence thresholds should be set high. In this research, the minimum support was set to 0.3 and the minimum confidence to 0.5.
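The support and confidence filter in (1) and (2), together with the rule count in (4), can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the five sessions in `SESSIONS` are made up for illustration, each session is coded by presence only (so the order of visits is ignored, as in the 0/1 coding), and the helper names `count`, `support`, `confidence`, `rule_count`, `all_rules` and `mine` are hypothetical. The default thresholds are the 0.3 and 0.5 used in this research.

```python
from itertools import combinations

# Hypothetical presence-coded sessions for illustration only (not the paper's data).
# Each set lists the menus a user touched: S=start, A=news, B=goods,
# C=customer activity, L=logoff. Order of visits is ignored, as in 0/1 coding.
SESSIONS = [
    {"S", "A", "L"},
    {"S", "A", "B", "C", "L"},
    {"S", "B", "L"},
    {"S", "A", "L"},
    {"S", "B", "C", "L"},
]


def count(itemset, sessions):
    """n(X): number of sessions that contain every item of X."""
    return sum(1 for s in sessions if itemset <= s)


def support(lhs, rhs, sessions):
    """Equation (1): n(lhs ∪ rhs) / N."""
    return count(lhs | rhs, sessions) / len(sessions)


def confidence(lhs, rhs, sessions):
    """Equation (2): n(lhs ∪ rhs) / n(lhs); 0 if lhs never occurs."""
    n_lhs = count(lhs, sessions)
    return count(lhs | rhs, sessions) / n_lhs if n_lhs else 0.0


def rule_count(d):
    """Equation (4): number of possible rules over d items."""
    return 3 ** d - 2 ** (d + 1) + 1


def all_rules(items):
    """Enumerate every rule X -> Y with X, Y non-empty and disjoint."""
    items = sorted(items)
    rules = []
    for k in range(1, len(items)):
        for lhs in combinations(items, k):
            rest = [i for i in items if i not in lhs]
            for m in range(1, len(rest) + 1):
                for rhs in combinations(rest, m):
                    rules.append((frozenset(lhs), frozenset(rhs)))
    return rules


def mine(sessions, items, min_support=0.3, min_confidence=0.5):
    """Keep rules passing both thresholds, highest support first."""
    kept = []
    for lhs, rhs in all_rules(items):
        s = support(lhs, rhs, sessions)
        c = confidence(lhs, rhs, sessions)
        if s >= min_support and c >= min_confidence:
            kept.append((s, c, lhs, rhs))
    kept.sort(key=lambda r: (-r[0], -r[1]))
    return kept


if __name__ == "__main__":
    print(rule_count(3), len(all_rules("ABC")))    # 12 12
    print(rule_count(5), len(all_rules("SABCL")))  # 180 180
    for s, c, lhs, rhs in mine(SESSIONS, "SABCL")[:5]:
        print(f"{sorted(lhs)} -> {sorted(rhs)}  support={s:.2f}  confidence={c:.2f}")
```

Enumerating rules by brute force agrees with (4) for both d = 3 and d = 5. Brute force is fine at five events; for larger d, an Apriori-style search that first discards itemsets below the support threshold would avoid generating every candidate rule.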
The rules that passed both criteria are listed in Table III in descending order of support. The most frequent path is S → A → L (support = 0.33): many customers start visiting the website, read the News, and then log off. The enterprise's goods might not interest them, so they leave the web site right away.

TABLE III
USERS' MOST FREQUENTLY OCCURRING PATHS
#   Support   Confidence   Rule or Path
1   0.33      0.56         S → A → L
2   0.31      0.52         S → B → L
3   0.30      0.51         S → B → C → L

C. Markov Chain
All data on users' menu selections during their time on the website were summarized in a working table, shown in Table IV. For example, according to Table IV, after logging in to the website, 30% of users (p = 0.3) chose to visit menu A, and 10% chose to log off. Once in state A, 10% of users stayed in A, while 30% each moved to B, C and L (logoff). This table is the experimental transition probability matrix (P): each column gives the probabilities of moving from the column state to each row state, so every column sums to 1.

TABLE IV
SUMMARY PROBABILITY OF USERS' BEHAVIORS (column = current state, row = next state)
          Start   A     B     C     Logoff
Start     0.0     0.0   0.0   0.0   0.2
A         0.3     0.1   0.2   0.4   0.0
B         0.4     0.3   0.5   0.3   0.0
C         0.2     0.3   0.1   0.2   0.0
Logoff    0.1     0.3   0.2   0.1   0.8

According to the transition probability matrix, some nodes, or states, are connected to other nodes or to themselves, while other pairs have no connecting path because the transition probability between them is zero. All connecting paths are illustrated in Fig. 1.

[Fig. 1. Markov Model of Transition Probability Matrix (P)]

The transition probability matrix P from Table IV describes users' behavior when choosing the web site's menus. The Markov chain method was then applied to model the probability of every event (or state) in a specific time period. Computing these probabilities step by step is cumbersome, because the probability of each next event depends on the probabilities of the prior event; the calculation can be simplified by another technique, the system of first-order difference equations. After data preparation, the initial state π(0) has the probability column vector shown in (5), the probability vector of events at time t is shown in (6), and the experimental transition probability matrix (P) is shown in (7):

$$\pi(0) = \begin{bmatrix} 0 \\ 0.2 \\ 0.2 \\ 0.5 \\ 0.1 \end{bmatrix} \qquad (5)$$

$$\pi(t) = \begin{bmatrix} S(t) \\ A(t) \\ B(t) \\ C(t) \\ L(t) \end{bmatrix} \qquad (6)$$

$$P = \begin{bmatrix} 0 & 0 & 0 & 0 & 0.2 \\ 0.3 & 0.1 & 0.2 & 0.4 & 0 \\ 0.4 & 0.3 & 0.5 & 0.3 & 0 \\ 0.2 & 0.3 & 0.1 & 0.2 & 0 \\ 0.1 & 0.3 & 0.2 & 0.1 & 0.8 \end{bmatrix} \qquad (7)$$

After a lengthy calculation, the system of first-order difference equations was solved for every event at time period t, giving each event's probability in (8) to (12). Each is a constant long-run probability plus terms that decay with the eigenvalues of P other than 1 (approximately 0.56, −0.20, 0.02 and 0.21):

$$S(t) = 0.10 + c_{S1}(0.56)^{t} + c_{S2}(-0.20)^{t} + c_{S3}(0.02)^{t} + c_{S4}(0.21)^{t} \qquad (8)$$

$$A(t) = 0.12 + c_{A1}(0.56)^{t} + c_{A2}(-0.20)^{t} + c_{A3}(0.02)^{t} + c_{A4}(0.21)^{t} \qquad (9)$$

$$B(t) = 0.21 + c_{B1}(0.56)^{t} + c_{B2}(-0.20)^{t} + c_{B3}(0.02)^{t} + c_{B4}(0.21)^{t} \qquad (10)$$

$$C(t) = 0.09 + c_{C1}(0.56)^{t} + c_{C2}(-0.20)^{t} + c_{C3}(0.02)^{t} + c_{C4}(0.21)^{t} \qquad (11)$$

$$L(t) = 0.48 + c_{L1}(0.56)^{t} + c_{L2}(-0.20)^{t} + c_{L3}(0.02)^{t} + c_{L4}(0.21)^{t} \qquad (12)$$

where the coefficients c_{kj} are fixed by the initial vector π(0). The constant terms sum to 1 and give the long-run probability of each state.
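Equation (3) and the long-run terms of (8) to (12) can be checked numerically. The sketch below, in plain Python, assumes P and π(0) as given in (7) and (5), with states ordered S, A, B, C, L and P read column-wise (column = current state), so that π(t + 1) = Pπ(t). The names `step`, `distribution_at` and `long_run` are hypothetical helpers. It applies (3) repeatedly instead of solving the difference equations in closed form, so it reproduces the constant terms but not the coefficients c_{kj}.

```python
# States in the order of Table IV and equation (6).
STATES = ["S", "A", "B", "C", "L"]

# Transition matrix P from Table IV / equation (7). Column j holds the
# probabilities of moving FROM state j TO each row state, so columns sum to 1.
P = [
    [0.0, 0.0, 0.0, 0.0, 0.2],
    [0.3, 0.1, 0.2, 0.4, 0.0],
    [0.4, 0.3, 0.5, 0.3, 0.0],
    [0.2, 0.3, 0.1, 0.2, 0.0],
    [0.1, 0.3, 0.2, 0.1, 0.8],
]

# Initial state vector pi(0) from equation (5).
PI0 = [0.0, 0.2, 0.2, 0.5, 0.1]


def step(p, pi):
    """One application of equation (3): pi(t+1) = P pi(t)."""
    n = len(pi)
    return [sum(p[i][j] * pi[j] for j in range(n)) for i in range(n)]


def distribution_at(p, pi0, t):
    """pi(t), obtained by applying equation (3) t times to pi(0)."""
    pi = list(pi0)
    for _ in range(t):
        pi = step(p, pi)
    return pi


def long_run(p, pi0, tol=1e-12, max_iter=10_000):
    """Iterate (3) until successive vectors agree to within tol.

    The limit is the vector of constant terms in (8)-(12).
    """
    pi = list(pi0)
    for _ in range(max_iter):
        nxt = step(p, pi)
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("iteration did not converge")


if __name__ == "__main__":
    for t in (0, 1, 2, 5):
        probs = distribution_at(P, PI0, t)
        print(t, dict(zip(STATES, (round(x, 3) for x in probs))))
    print("long run", dict(zip(STATES, (round(x, 3) for x in long_run(P, PI0)))))
```

For this P the iteration settles at exactly (65, 81, 139, 64, 325)/674, about (0.096, 0.120, 0.206, 0.095, 0.482). The A, B, C and L components round to the constant terms 0.12, 0.21, 0.09 and 0.48 of (9) to (12), and the S component is 0.2 times the L component, as the first row of P requires. Convergence is governed by the largest remaining eigenvalue, about 0.56, so the transient terms shrink below 1% of their initial size within about eight periods.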

IV. RESEARCH SUMMARY AND SUGGESTION
A. Summary
The techniques proposed in this research can present the co-occurrence among events under arbitrarily defined dependency levels, namely support and confidence. The website administrator can choose the threshold values for the rules and judge whether the resulting rules are sufficient to explain customers' behaviors. According to the calculation results, Markov chains present the probability of particular events, while association rules give the sequences of paths related to the occurring events but no probability of occurrence for each individual event. Significant association rule paths inform the web site owner about user or customer behavior. Paths #2 and #3 show how often users go from the start page to menu B and then to L; some customers also visit C between B and L. In practice, if N is the total number of customers who visit the website in a period of time, such as the first day of a month, then the approximate number of customers M who follow path #3 could be predicted by (13):

$$M \approx N \times B \times C \qquad (13)$$

where B and C are the probabilities of events B and C for that period.

The main objective of an e-commerce web site is to offer interesting goods to a large number of customers so that many potential visitors decide to buy the site's goods. The success percentage of the web site could therefore be calculated from (14):

$$\text{Success percentage} = \frac{M}{N} \times 100\% \qquad (14)$$

B. Further Research
With the techniques used here, significant event dependencies may or may not meet a statistical significance criterion (α = 0.05), since these techniques give no information about type I error. As alternatives, Bayes' theorem under a joint probability density function, information theory, causal model analysis, etc. should be considered as other possible techniques for this problem.

REFERENCES
[1] Dimitrijević, M. (2011). Web Usage Association Rule Mining System. Interdisciplinary Journal of Information, Knowledge, and Management, Vol. 6.
[2] Shukla, D. (2011). Analysis of Users Web Browsing Behavior Using Markov Chain Model. Department of Mathematics and Statistics, Sagar University, Sagar, M.P., 470003, India.
[3] Tan, P.-N., Steinbach, M., and Kumar, V. (2005). Introduction to Data Mining. Addison-Wesley. ISBN: 0321321367.
[4] Fewster. Markov Chains. Auckland University, New Zealand.