Advances in digital technology and the World Wide Web have led to an increase in the number of digital documents used for various purposes, such as publishing and digital libraries. This growth has raised awareness of the need for effective techniques that support the search and retrieval of text. One of the most needed tasks is clustering, which automatically categorizes documents into meaningful groups. Clustering is an important task in data mining and machine learning, and its accuracy depends closely on the choice of text representation method. Traditional text representation methods model documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This approach ignores the relationships between words and their meanings in the document; as a result, the sparsity and semantic problems that are prevalent in textual documents remain unresolved. In this study, these sparsity and semantic problems are reduced by proposing a graph-based text representation method, namely the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation scheme is created by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used in this study. The text documents undergo pre-processing and syntactic parsing in order to identify sentence structure, and the semantics of words are then modeled using the dependency graph. The resulting dependency graph is used in the cluster analysis, for which the K-means clustering technique was applied. The dependency-graph-based clustering results were compared with those of two popular text representation methods, TFIDF and ontology-based representation. The results show that the dependency graph outperforms both TFIDF and ontology-based text representation, and the findings indicate that the proposed text representation method leads to more accurate document clustering.
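The TFIDF baseline that the study compares against can be sketched in a few lines. This is not the proposed dependency-graph method, which needs a syntactic parser; it only shows how a bag-of-words TFIDF representation feeds K-means. The whitespace tokenizer, the `log(N/df)` idf variant, clustering by cosine similarity on unit-length vectors (spherical K-means), the farthest-first seeding, and all function names are assumptions of this sketch.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """One sparse TF-IDF vector (dict term -> weight) per whitespace-tokenised document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # document frequency: number of documents containing each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def normalize(vec):
    """Scale a sparse vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def dot(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, iterations=20):
    """Spherical K-means on unit-length sparse vectors; returns a cluster label per vector."""
    # deterministic farthest-first seeding: start from the first document, then
    # repeatedly add the document least similar to every centroid chosen so far
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(min(vectors, key=lambda v: max(dot(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # assignment step: most similar centroid
        labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        # update step: centroid = normalised mean of its members
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if not members:
                continue  # keep the old centroid if a cluster empties
            total = {}
            for v in members:
                for t, w in v.items():
                    total[t] = total.get(t, 0.0) + w
            centroids[c] = normalize({t: w / len(members) for t, w in total.items()})
    return labels
```

For example, `kmeans([normalize(v) for v in tfidf_vectors(docs)], k=2)` on two short finance sentences and two short football sentences should separate them by topic, because documents from different topics share no terms. The same property illustrates the sparsity problem the abstract describes: two documents that express related ideas with different words get zero similarity under TFIDF.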
There is no doubt that the field of education and teaching (in any country, whatever its ruling system) is considered one of the most specific and sensitive fields, because it concerns the building of the human being. Since the human being is at once the purpose and the means, and is the strategic capital, the way he is reared and educated, the choice of educators, the working methods, and the aims are all considered serious matters.
The educational process has aims that society determines for itself through its working institutions in this field, namely the official and public institutions. And as society feels that these institutions have failed to achieve its d
Abstract
The main problem of the study lies in the lack of a clear perception, among the study sample, of the impact of digital marketing tools on the legal (banking) liquidity of the International Development Bank for Investment and Finance. To achieve the objectives of the research, the observation and survey method was used to measure the dimensions of digital marketing, while for banking liquidity the bank's reports and financial statements were used as the research sample. The statistical analysis program SPSS was used to determine the relationship between them. The study concluded, in summary, the following: Mar
Demographic research often needs modern statistical tools that are flexible enough to suit the type of data available in Iraq, given that the country has passed through periods of war, economic sanctions, and security instability. This research therefore proposes using nonparametric splines as a substitute for some components of the Lee-Carter model in estimating the age-specific fertility rate, taken as the response variable, in Iraq for the period 1977-2011, and then forecasting it for the period 2012-2031. This goal was achieved using a nonparametric singular value decomposition of the model components, and then estimating the effect of time-s
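For reference, the classical Lee-Carter model writes the log rate for age x in year t as ln m(x,t) = a_x + b_x k_t, where a_x is the age profile, b_x the age-specific sensitivity, and k_t the time index, with b_x and k_t obtained from a singular value decomposition. The sketch below fits only this classical version; the spline components that the study substitutes are not reproduced. Power iteration in place of a library SVD, the random-walk-with-drift forecast of k_t, and the function names are assumptions.

```python
import math


def lee_carter(log_rates, iterations=500):
    """Classical Lee-Carter fit: log_rates[x][t] ~ a[x] + b[x] * k[t].

    Rows are ages x, columns are years t. Returns (a, b, k) normalised so that
    sum(b) == 1 and sum(k) == 0 (the usual identification constraints).
    """
    n_ages, n_years = len(log_rates), len(log_rates[0])
    # a_x: average log rate of each age over the observed years
    a = [sum(row) / n_years for row in log_rates]
    # centred matrix Z = log m(x, t) - a_x
    z = [[log_rates[x][t] - a[x] for t in range(n_years)] for x in range(n_ages)]

    # Power iteration on Z^T Z gives the leading right singular vector v1.
    # (A vector of ones would be a bad start: every row of Z sums to zero.)
    v = [float(t + 1) for t in range(n_years)]
    for _ in range(iterations):
        u = [sum(z[x][t] * v[t] for t in range(n_years)) for x in range(n_ages)]
        v = [sum(z[x][t] * u[x] for x in range(n_ages)) for t in range(n_years)]
        norm = math.sqrt(sum(val * val for val in v))
        if norm == 0:
            raise ValueError("centred matrix has no dominant component")
        v = [val / norm for val in v]

    # w = Z v1 = s1 * u1, so Z is approximately w v1^T
    w = [sum(z[x][t] * v[t] for t in range(n_years)) for x in range(n_ages)]
    scale = sum(w)
    if scale == 0:
        raise ValueError("cannot normalise b to sum to one")
    b = [wx / scale for wx in w]  # sum(b) == 1
    k = [scale * vt for vt in v]  # b[x] * k[t] == w[x] * v[t]; sum(k) == 0
    return a, b, k


def forecast_k(k, horizon):
    """Random walk with drift, the usual Lee-Carter choice for projecting k_t."""
    drift = (k[-1] - k[0]) / (len(k) - 1)
    return [k[-1] + h * drift for h in range(1, horizon + 1)]
```

With rows of log fertility rates by age group and columns by year (1977-2011 in the study), `forecast_k(k, 20)` would give a k_t path for 2012-2031, from which projected rates follow as exp(a_x + b_x k_t).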
Abstract: This study investigates the backscattered electron coefficient of a SixGe1-x/Si heterostructure sample as a function of primary electron beam energy (0.25-20 keV) and of the Ge concentration in the alloy. The results show several characteristics. First, the intensity of the backscattered signal above the alloy is mainly related to the average atomic number of the SixGe1-x alloy. Second, the backscattered electron coefficient line scan shows a constant value above each layer at low primary electron energies, below 5 keV. However, at 5 keV and above, a peak and a dip appear on the line scan above the Si-Ge alloy and the Si, respectively, close to the interface line
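The link between the backscattered signal and the average atomic number can be illustrated with a textbook approximation: Reuter's empirical fit η(Z) = -0.0254 + 0.016Z - 1.86 × 10^-4 Z² + 8.3 × 10^-7 Z³ for pure elements, combined with a mass-fraction-weighted mixture rule for the alloy. This is not the study's model. The fit is usually quoted for conventional SEM beam energies, where η depends only weakly on energy, so it does not capture the low-keV behaviour the study measures; the function names are illustrative.

```python
A_SI, A_GE = 28.0855, 72.630  # standard atomic weights (g/mol)
Z_SI, Z_GE = 14, 32           # atomic numbers


def eta_element(z):
    """Reuter's empirical fit for the backscattered electron coefficient of a pure element."""
    return -0.0254 + 0.016 * z - 1.86e-4 * z**2 + 8.3e-7 * z**3


def mass_fractions(x):
    """Mass fractions (C_Si, C_Ge) of Si_x Ge_(1-x), where x is the Si atomic fraction."""
    m_si, m_ge = x * A_SI, (1 - x) * A_GE
    return m_si / (m_si + m_ge), m_ge / (m_si + m_ge)


def mean_z(x):
    """Mass-fraction-weighted mean atomic number of the alloy."""
    c_si, c_ge = mass_fractions(x)
    return c_si * Z_SI + c_ge * Z_GE


def eta_alloy(x):
    """Alloy coefficient from the mass-fraction mixture rule eta = sum(C_i * eta_i)."""
    c_si, c_ge = mass_fractions(x)
    return c_si * eta_element(Z_SI) + c_ge * eta_element(Z_GE)


for x in (1.0, 0.8, 0.5, 0.0):
    print(f"x = {x:.1f}: mean Z = {mean_z(x):.2f}, eta = {eta_alloy(x):.3f}")
```

Both quantities rise as the Ge content grows, which is consistent with the abstract's observation that the signal above the alloy tracks its average atomic number.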
The research aims to prepare rehabilitative exercises with accompanying tools for the rehabilitation of people with shoulder dislocation, and to determine the effect of these exercises and accompanying aids on improving the muscular strength and range of motion of people with dislocation of the shoulder joint. The two researchers used the experimental method with a single experimental group and pre- and post-tests, choosing a sample appropriate to the nature of the research problem, goals, and hypotheses: people injured by dislocation of the shoulder joint who had completed treatment, who did not practice sports, and who attended the Physiotherapy Center at Al-Was
The banking sector currently faces great challenges resulting from intense competition in the financial environment. This leads the supreme audit bodies and the Central Bank, as the highest supervisory authority over banks, to audit banks so that they achieve profit and are not exposed to loss. Doing so requires identifying the banks' strengths and the risks that constitute weaknesses affecting the future performance and survival of the bank and that call for special supervisory attention. From this standpoint, the research aims to use the CAMELS model as a control tool in banks through its six indicators: capital adequacy, asset quality, management quality, earnings, liquidity, and sensitivity to market risk, th
Chemical pollution is a very important issue that affects the health of society and of future generations. It must therefore be studied in order to find suitable models and descriptions that can predict its behavior in the coming years. Chemical pollution data in Iraq are extensive and come from many sources and kinds, which makes them Big Data that need to be studied using novel statistical methods. The research aims to use a Proposed Nonparametric Procedure (NP method) to develop the OCMT test procedure for estimating the parameters of a linear regression model with a large amount of data (Big Data), comprising many indicators associated with chemi
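OCMT (One Covariate at a time Multiple Testing) screens each candidate regressor with its own t-ratio and keeps those that exceed a critical value adjusted for the number of candidates, c_p(n, δ) = Φ⁻¹(1 - p / (2 n^δ)). The sketch below implements only the first selection stage of that standard procedure, which in practice is followed by further stages and a final OLS fit on the selected covariates; the study's proposed nonparametric (NP) variant is not reproduced. The defaults p = 0.05 and δ = 1, the function names, and the use of plain OLS t-ratios without heteroskedasticity correction are assumptions.

```python
import math
from statistics import NormalDist


def t_stat(x, y):
    """t-ratio of the slope in an OLS regression of y on an intercept and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ssr / (n - 2) / sxx)
    if se == 0:
        # perfect fit: the slope is either exactly zero or infinitely significant
        return 0.0 if slope == 0 else math.copysign(math.inf, slope)
    return slope / se


def ocmt_first_stage(covariates, y, p=0.05, delta=1.0):
    """Indices of covariates whose individual |t| exceeds the OCMT critical value."""
    n_cov = len(covariates)
    crit = NormalDist().inv_cdf(1 - p / (2 * n_cov**delta))
    return [i for i, x in enumerate(covariates) if abs(t_stat(x, y)) > crit]
```

Because the threshold grows with the number of candidate covariates, screening many pollution indicators at once raises the bar for each one, which is how the procedure limits false selections in large data sets.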
In this research, we present a multi-assignment model with a fuzzy objective function. The aim was to build an integer programming model after removing the fuzziness from the objective function data and converting them into crisp (real) data using the Pascal triangular graded mean, which applies Pascal's triangle to find the centre of the triangular fuzzy number.
The data were processed in Excel 2007 to remove the fuzziness surrounding them, while the multi-assignment model was solved with the LINDO program to reach the optimal solution, which represents the minimum possible time to accomplish a number of tasks by the number of employees on the specific amount of the Internet. The research also included a study of some of the
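The two steps named above, defuzzifying triangular fuzzy times and then solving the assignment model, can be illustrated on a tiny hypothetical example. Assumptions of this sketch: the Pascal triangular graded mean is taken as (a + 2b + c) / 4, using the Pascal's-triangle row 1, 2, 1 as weights; the model is the classical one-to-one assignment (each employee gets exactly one task) rather than the study's multi-assignment model; exhaustive search stands in for LINDO; the function names and data are hypothetical.

```python
from itertools import permutations


def pascal_graded_mean(tfn):
    """Defuzzify a triangular fuzzy number (a, b, c) with Pascal-triangle weights 1, 2, 1."""
    a, b, c = tfn
    return (a + 2 * b + c) / 4


def solve_assignment(cost):
    """Exhaustive search for the minimum-cost one-to-one assignment (small n only).

    Returns (perm, total) where employee i is assigned task perm[i].
    """
    n = len(cost)
    best_perm, best_total = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total


# hypothetical fuzzy completion times: rows = employees, columns = tasks
fuzzy_times = [
    [(2, 3, 8), (6, 8, 10), (3, 5, 7)],
    [(5, 7, 9), (1, 3, 5), (6, 8, 10)],
    [(4, 6, 8), (5, 7, 9), (7, 9, 11)],
]
crisp_times = [[pascal_graded_mean(t) for t in row] for row in fuzzy_times]
assignment, total_time = solve_assignment(crisp_times)
print(assignment, total_time)
```

Exhaustive search grows as n!, so it only suits toy cases; a realistic model would be handed to an integer programming solver, as the study does with LINDO.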