Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth raises the need for effective techniques to support text search and retrieval. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends heavily on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This approach ignores the relationships between words and their meanings in the document. As a result, the sparsity and semantic problems that are prevalent in textual documents remain unresolved. In this study, the sparsity and semantic problems are reduced by proposing a graph-based text representation method, namely the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used in this study. The text documents undergo pre-processing and syntactic parsing to identify sentence structure. The semantics of the words are then modeled using a dependency graph, which is used in the cluster analysis. The K-means clustering technique was used in this study. The dependency-graph-based clustering results were compared with those of popular text representation methods, i.e., TFIDF and ontology-based text representation. The results show that the dependency graph outperforms both TFIDF and ontology-based text representation. The findings show that the proposed text representation method leads to more accurate document clustering results.
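The limitation of the bag-of-words baseline described above can be illustrated with a minimal sketch. It assumes whitespace tokenization, a plain tf × log(N/df) weighting, and three made-up sentences; the function names `tfidf` and `cosine` are hypothetical and are not taken from the study. Two sentences with opposite meanings receive identical TFIDF vectors because word order and dependency relations are discarded.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (tf * log(N / df))."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative sentences (assumed, not from the 20 Newsgroups sample).
docs = ["dog chased cat", "cat chased dog", "markets fell sharply"]
vecs = tfidf(docs)
print(cosine(vecs[0], vecs[1]))  # word order is lost, so similarity is ~1.0
print(cosine(vecs[0], vecs[2]))  # no shared terms, so similarity is 0.0
```

A representation that keeps syntactic dependencies, such as the dependency graph the abstract proposes, could in principle tell the first two sentences apart; this sketch only shows why the TFIDF baseline cannot.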
Density-based spatial clustering of applications with noise (DBSCAN) is one of the most popular clustering methods in data mining, and it is used to identify useful patterns and interesting distributions in the underlying data. It is suited to clustering nonlinearly aggregated data, in particular DNA methylation and gene expression data, in which sites are differentially skewed by distance and grouped nonlinearly by cancer disease and by the resulting changes in gene expression. Under these conditions, DBSCAN is expected to have a desirable clustering feature that can be used to show the results of the changes. This research reviews DBSCAN and compares its performance with other algorithms, such as the tradit
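For readers unfamiliar with the algorithm, the following is a minimal sketch of standard DBSCAN in pure Python. It is not the implementation evaluated in the paper. The parameter names `eps` and `min_pts`, the brute-force neighbor search, and the toy 2-D points are all assumptions made for illustration.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force eps-neighborhood; includes the point itself.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # core point: keep growing the cluster
    return labels


# Toy data (assumed): two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 1, 1, 1, -1]
```

Because clusters grow through chains of dense neighborhoods instead of around centroids, the method can follow nonlinearly shaped groups and leave isolated points unassigned as noise.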
Wireless sensor networks (WSNs) represent one of the key technologies in Internet of Things (IoT) networks. Since WSNs have finite energy sources, there is ongoing research to develop new strategies for minimizing power consumption or to enhance traditional techniques. In this paper, a novel Gaussian mixture model (GMM) algorithm is proposed for energy saving in mobile wireless sensor networks (MWSNs). Performance evaluation of the clustering process with the GMM algorithm shows a remarkable energy saving in the network of up to 92%. In addition, a comparison with another clustering strategy that uses the K-means algorithm has been made, and the developed method has outperformed K-means with superior performance, saving ener
Data-centric techniques, such as data aggregation via a modified algorithm based on fuzzy clustering with a Voronoi diagram, called the modified Voronoi Fuzzy Clustering Algorithm (VFCA), are presented in this paper. In the modified algorithm, the sensed area is divided into a number of Voronoi cells by applying a Voronoi diagram, and these cells are clustered by the fuzzy C-means (FCM) method to reduce the transmission distance. An appropriate cluster head (CH) is then elected for each cluster. Three parameters are used in this election: the energy, the distance between the CH and its neighbor sensors, and the packet loss values. Furthermore, data aggregation is employed in each CH to reduce the amount of data transmission which le
Feature selection (FS) constitutes a series of processes used to decide which relevant features/attributes to include and which irrelevant features to exclude for predictive modeling. It is a crucial task that helps machine learning classifiers reduce error rates, computation time, and overfitting, and improve classification accuracy. It has demonstrated its efficacy in a myriad of domains, including text classification (TC), text mining, and image recognition. While there are many traditional FS methods, recent research efforts have been devoted to applying metaheuristic algorithms as FS techniques for the TC task. However, there are few literature reviews concerning TC. Therefore, a comprehensive overview was systematicall
This paper introduces some properties of separation axioms called α-feeble regular and α-feeble normal spaces (which are weaker than the usual axioms) by using elements of a graph, which are the essential parts of the α-topological spaces that we study. It also presents some dependent concepts and studies their properties and some relationships between them.
Assume that G ≅ HN, the Harada–Norton group. In this paper, the standard features of the graph ΓRI HN are exploited to obtain meaningful algebraic results for the graph ΓRI HN and its corresponding group HN. For instance, a modern method is presented for understanding how to construct precise small subgroups of G. Furthermore, a full investigation is performed to obtain particular parameters of ΓRI HN.
Urinary tract infections (UTIs) caused by uropathogenic bacteria (UPBs) are one of the crucial public health problems worldwide. The slime layer is known to allow bacteria to attach to smooth surfaces such as catheters and prosthetic implants, which in turn facilitates biofilm formation and may cause lethal infections. In addition, extended-spectrum beta-lactamase (ESBL) production is a growing concern among UPBs because it limits treatment options and contributes to antibiotic resistance. The principal aim of this study is to detect slime layer and ESBL production in Escherichia coli and Klebsiella pneumoniae of uropathogenic origin. Ten ready isolates (five isola
In this paper we generalize Jacobson's results by proving that any integer in is a square-free integer), belongs to . All units of are generated by the fundamental unit having the forms
Our generalization builds on using the conditions
This leads us to classify the real quadratic fields into the sets Jacobson's results show that and Sliwa confirms that and are the only real quadratic fields in .