One of the most important phenomena at the end of the twentieth century has been a development that revolutionized communications: the World Wide Web, or simply the Web. Its impact on society has been so great that some have compared it to the invention of the wheel or the discovery of fire. The Web has today become a mass communication network of worldwide scope.
Companies responded to the rise of the Web in several ways, most frequently through the creation of corporate websites comparable to virtual business cards. The next step in the development of corporate websites was their integration with internal company systems, such as sales. From this point on, new business models appeared in the digital market and the design and construction of website solutions became a complex task.
For many companies and institutions, it is no longer sufficient to have a website and high-quality products or services. The difference between success and failure for an e-business can lie in the website's potential to attract and retain visitors. This potential is determined by the content of the site, its design, and technical factors such as the time taken to load the pages.
To stay competitive, a company needs an up-to-date website, which should offer the information sought by visitors in a readily accessible way. This should be achieved as efficiently as possible, ideally online and automatically. In many cases, however, the reality is very different: the structure of the website does not help visitors find the desired information, even though the data is contained within it.
The intelligent website is a new generation of portal capable of improving its own structure and content on the basis of an analysis of visitor behaviour. This is a difficult undertaking due to the lack of data needed to characterize the behaviour of website visitors. Web log files are important data sources in this regard. However, depending on the website traffic, these files may contain millions of records, each holding a great deal of irrelevant information, which makes their analysis a complex task.
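To make the nature of such records concrete, the following sketch parses one line of a web server access log, assuming the widely used Common Log Format (CLF); the field names, sample line, and helper function are illustrative assumptions, not part of the original work.

```python
import re

# Assumed sketch: parse one access-log line in the Common Log Format (CLF).
# Field names and the sample line are invented for illustration.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def parse_clf_line(line):
    """Return a dict of CLF fields, or None for a malformed line."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # A "-" size means no body was returned; normalize it to 0 bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

sample = '192.168.0.1 - - [09/May/2005:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043'
parsed = parse_clf_line(sample)
```

Only a few of these fields (host, time, requested page) are relevant to visitor behaviour, which is why the cleaning and preprocessing stage discards the rest.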
In this research, the “Knowledge Discovery in Databases” (KDD) process is used to extract unknown patterns from web data. The process starts with the selection of data sources, in this case web logs and website pages. The next step is the cleaning and preprocessing stage. The third step is the transformation of data into information. From the information originating in web data, a visitor behaviour model was developed and used in the creation of a similarity measure. This measure was applied in web-mining clustering algorithms to extract significant patterns from the web data. These patterns were validated by a business expert, who provided opinions based on personal experience. This procedure enabled improvements to be made at the different stages of the process.
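The clustering step above can be sketched in miniature. The actual similarity measure in this work is built from the visitor behaviour model; as a stand-in, the sketch below uses a plain Jaccard similarity over the sets of pages visited per session, with a simple greedy grouping rule. The session data, threshold, and function names are all illustrative assumptions.

```python
# Illustrative sketch only: Jaccard similarity over visited-page sets stands
# in for the behaviour-based similarity measure described in the text.
def jaccard(a, b):
    """Jaccard similarity between two collections of visited pages."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_sessions(sessions, threshold=0.5):
    """Greedy single-pass clustering: each session joins the first cluster
    whose representative (first member) is at least `threshold` similar;
    otherwise it starts a new cluster."""
    clusters = []
    for session in sessions:
        for cluster in clusters:
            if jaccard(session, cluster[0]) >= threshold:
                cluster.append(session)
                break
        else:
            clusters.append([session])
    return clusters

# Invented example sessions (lists of pages visited by three visitors).
sessions = [
    ["/index", "/products", "/cart"],
    ["/index", "/products", "/checkout"],
    ["/about", "/contact"],
]
groups = cluster_sessions(sessions)
```

The first two sessions share two of four distinct pages (similarity 0.5) and fall into one group, while the third forms its own; a real web-mining algorithm would use a richer similarity and a more robust clustering method, but the pattern-extraction idea is the same.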
Finally, a framework for acquiring and maintaining the knowledge extracted from web data was created and used to make online (navigation) and offline (website changes) recommendations.
09 May 2005