Web text mining using fuzzy logic. +CD

number: 
2207
English
department: 
Degree: 
Imprint: 
Computer Science
Author: 
Huda Abdul Mahdi Taleb
Supervisor: 
Dr. Sawsan K. Thamer
year: 
2009

Abstract:

With the explosive growth of content on the Internet, it has become increasingly difficult for users to find and use information and for content providers to classify and catalog documents. Traditional web search engines often return hundreds or thousands of results for a search, which are time-consuming for users to browse; searching by Web page similarity is therefore used. The proposed system (Web Pages Fuzzy Similarity) consists of two phases: an Off-line phase and an On-line phase. The Off-line phase constructs the Documents Vector DB, while the On-line phase constructs a Query Document and then returns the pages similar to it. Every document passes through a set of operations that extract the information representing it. These operations are: the Lexical Text Analyzer, elimination of stop words and other unused words, the HTML Document Ranking (HDR) method, and computation of word weights using a formula that depends on word frequency and word attributes (such as font style, font size, word position, link text, title, and header). The Documents Vector DB is then constructed from the highest-weighted words of each document. The On-line phase consists of two steps. The first takes a query and constructs a document vector for it. The second computes the similarity between the query document and the documents stored in the DB. Similarity is measured using two methods: the first is the Cosine Similarity Measure, and the second is a newly suggested formula named the Fuzzy Web mining logic equation. Using fuzzy logic improves the search results by extracting the most related pages, which the normal method could also extract but with a lower degree of relationship.
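The pipeline described in the abstract can be illustrated with a minimal Python sketch, given here only as an aid to reading. It covers stop-word removal, attribute-aware word weighting, selection of the highest-weighted words, and the Cosine Similarity Measure. The stop-word list, the attribute multipliers in ATTRIBUTE_BOOST, the top_k cutoff, and all function names are assumptions made for illustration; the abstract does not state the thesis's actual weighting formula, and the Fuzzy Web mining logic equation is not reproduced because it is not given here.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not specify the one used.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with"}

# Hypothetical multipliers for word attributes (title, header, link text).
# The actual weighting formula of the thesis is not given in the abstract.
ATTRIBUTE_BOOST = {"title": 3.0, "header": 2.0, "link": 1.5, "body": 1.0}


def tokenize(text):
    """Lower-case the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def document_vector(fields, top_k=10):
    """Build a weighted term vector from a {attribute: text} mapping.

    Each occurrence of a word adds the multiplier of the attribute it
    appears in, so the weight combines frequency and attribute. Only the
    top_k highest-weighted words are kept.
    """
    weights = Counter()
    for attribute, text in fields.items():
        boost = ATTRIBUTE_BOOST.get(attribute, 1.0)
        for word in tokenize(text):
            weights[word] += boost
    return dict(weights.most_common(top_k))


def cosine_similarity(u, v):
    """Standard cosine similarity between two sparse term vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Off-line phase: build the vector for a stored page.
    page = document_vector({
        "title": "Fuzzy logic for web mining",
        "body": "Mining web text with fuzzy logic and similarity measures",
    })
    # On-line phase: build the query vector and compare it with the page.
    query = document_vector({"body": "fuzzy web mining"})
    print(cosine_similarity(query, page))
```

In this sketch a word in a title counts three times as much as the same word in body text, which stands in for the attribute-dependent weighting the abstract mentions. Any real multipliers would have to come from the thesis itself.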