The huge amount of information in the internet makes rapid need of text
summarization. Text summarization is the process of selecting important sentences
from documents with keeping the main idea of the original documents. This paper
proposes a method depends on Technique for Order of Preference by Similarity to
Ideal Solution (TOPSIS). The first step in our model is based on extracting seven
features for each sentence in the documents set. Multiple Linear Regression (MLR)
is then used to assign a weight for the selected features. Then TOPSIS method
applied to rank the sentences. The sentences with high scores will be selected to be
included in the generated summary. The proposed model is evaluated using dataset
supplied by the Text Analysis Conference (TAC-2011) for English documents. The
performance of the proposed model is evaluated using Recall-Oriented Understudy
for Gisting Evaluation (ROUGE) metric. The obtained results support the
effectiveness of the proposed model.
Multi-document summarization is an optimization problem demanding optimization of more than one objective function simultaneously. The proposed work regards balancing of the two significant objectives: content coverage and diversity when generating summaries from a collection of text documents.
Any automatic text summarization system has the challenge of producing high quality summary. Despite the existing efforts on designing and evaluating the performance of many text summarization techniques, their formulations lack the introduction of any model that can give an explicit representation of – coverage and diversity – the two contradictory semantics of any summary. In this work, the design of
... Show MoreExtractive multi-document text summarization – a summarization with the aim of removing redundant information in a document collection while preserving its salient sentences – has recently enjoyed a large interest in proposing automatic models. This paper proposes an extractive multi-document text summarization model based on genetic algorithm (GA). First, the problem is modeled as a discrete optimization problem and a specific fitness function is designed to effectively cope with the proposed model. Then, a binary-encoded representation together with a heuristic mutation and a local repair operators are proposed to characterize the adopted GA. Experiments are applied to ten topics from Document Understanding Conference DUC2002 datas
... Show MoreAutomatic document summarization technology is evolving and may offer a solution to the problem of information overload. Multi-document summarization is an optimization problem demanding optimizing more than one objective function concurrently. The proposed work considers a balance of two significant objectives: content coverage and diversity while generating a summary from a collection of text documents. Despite the large efforts introduced from several researchers for designing and evaluating performance of many text summarization techniques, their formulations lack the introduction of any model that can give an explicit representation of – coverage and diversity – the two contradictory semantics of any summary. The design of gener
... Show MoreIt is the regression analysis is the foundation stone of knowledge of statistics , which mostly depends on the ordinary least square method , but as is well known that the way the above mentioned her several conditions to operate accurately and the results can be unreliable , add to that the lack of certain conditions make it impossible to complete the work and analysis method and among those conditions are the multi-co linearity problem , and we are in the process of detected that problem between the independent variables using farrar –glauber test , in addition to the requirement linearity data and the lack of the condition last has been resorting to the
... Show MoreIt is well-known that the existence of outliers in the data will adversely affect the efficiency of estimation and results of the current study. In this paper four methods will be studied to detect outliers for the multiple linear regression model in two cases : first, in real data; and secondly, after adding the outliers to data and the attempt to detect it. The study is conducted for samples with different sizes, and uses three measures for comparing between these methods . These three measures are : the mask, dumping and standard error of the estimate.
Currently, the prominence of automatic multi document summarization task belongs to the information rapid increasing on the Internet. Automatic document summarization technology is progressing and may offer a solution to the problem of information overload.
Automatic text summarization system has the challenge of producing a high quality summary. In this study, the design of generic text summarization model based on sentence extraction has been redirected into a more semantic measure reflecting individually the two significant objectives: content coverage and diversity when generating summaries from multiple documents as an explicit optimization model. The proposed two models have been then coupled and def
... Show MoreAdvances in digital technology and the World Wide Web has led to the increase of digital documents that are used for various purposes such as publishing and digital library. This phenomenon raises awareness for the requirement of effective techniques that can help during the search and retrieval of text. One of the most needed tasks is clustering, which categorizes documents automatically into meaningful groups. Clustering is an important task in data mining and machine learning. The accuracy of clustering depends tightly on the selection of the text representation method. Traditional methods of text representation model documents as bags of words using term-frequency index document frequency (TFIDF). This method ignores the relationship an
... Show MoreWith the growth of the use mobile phones, people have become increasingly interested in using Short Message Services (SMS) as the most suitable communications service. The popularity of SMS has also given rise to SMS spam, which refers to any unwanted message sent to a mobile phone as a text. Spam may cause many problems, such as traffic bottlenecks or stealing important users' information. This paper, presents a new model that extracts seven features from each message before applying a Multiple Linear Regression (MLR) to assign a weight to each of the extracted features. The message features are fed into the Extreme Learning Machine (ELM) to determine whether they are spam or ham. To evaluate the proposed model, the UCI bench
... Show More