This research deals with a shrinking method concerned with the principal components similar to that one which used in the multiple regression “Least Absolute Shrinkage and Selection: LASS”. The goal here is to make an uncorrelated linear combinations from only a subset of explanatory variables that may have a multicollinearity problem instead taking the whole number say, (K) of them. This shrinkage will force some coefficients to equal zero, after making some restriction on them by some "tuning parameter" say, (t) which balances the bias and variance amount from side, and doesn't exceed the acceptable percent explained variance of these components. This had been shown by MSE criterion in the regression case and the percent explained variance in the principal component case.
Journal of Theoretical and Applied Information Technology is a peer-reviewed electronic research papers & review papers journal with aim of promoting and publishing original high quality research dealing with theoretical and scientific aspects in all disciplines of IT (Informaiton Technology
Data mining is a data analysis process using software to find certain patterns or rules in a large amount of data, which is expected to provide knowledge to support decisions. However, missing value in data mining often leads to a loss of information. The purpose of this study is to improve the performance of data classification with missing values, precisely and accurately. The test method is carried out using the Car Evaluation dataset from the UCI Machine Learning Repository. RStudio and RapidMiner tools were used for testing the algorithm. This study will result in a data analysis of the tested parameters to measure the performance of the algorithm. Using test variations: performance at C5.0, C4.5, and k-NN at 0% missi
... Show MoreThis research aims to choose the appropriate probability distribution to the reliability analysis for an item through collected data for operating and stoppage time of the case study.
Appropriate choice for .probability distribution is when the data look to be on or close the form fitting line for probability plot and test the data for goodness of fit .
Minitab’s 17 software was used for this purpose after arranging collected data and setting it in the the program.
&nb
... Show MoreIn this paper we present the theoretical foundation of forward error analysis of numerical algorithms under;• Approximations in "built-in" functions.• Rounding errors in arithmetic floating-point operations.• Perturbations of data.The error analysis is based on linearization method. The fundamental tools of the forward error analysis are system of linear absolute and relative a prior and a posteriori error equations and associated condition numbers constituting optimal of possible cumulative round – off errors. The condition numbers enable simple general, quantitative bounds definitions of numerical stability. The theoretical results have been applied a Gaussian elimination, and have proved to be very effective means of both a prior
... Show MoreTwitter popularity has increasingly grown in the last few years, influencing life’s social, political, and business aspects. People would leave their tweets on social media about an event, and simultaneously inquire to see other people's experiences and whether they had a positive/negative opinion about that event. Sentiment Analysis can be used to obtain this categorization. Product reviews, events, and other topics from all users that comprise unstructured text comments are gathered and categorized as good, harmful, or neutral using sentiment analysis. Such issues are called polarity classifications. This study aims to use Twitter data about OK cuisine reviews obtained from the Amazon website and compare the effectiveness
... Show MoreWith the freedom offered by the Deep Web, people have the opportunity to express themselves freely and discretely, and sadly, this is one of the reasons why people carry out illicit activities there. In this work, a novel dataset for Dark Web active domains known as crawler-DB is presented. To build the crawler-DB, the Onion Routing Network (Tor) was sampled, and then a web crawler capable of crawling into links was built. The link addresses that are gathered by the crawler are then classified automatically into five classes. The algorithm built in this study demonstrated good performance as it achieved an accuracy of 85%. A popular text representation method was used with the proposed crawler-DB crossed by two different supervise
... Show MoreGovernmental establishments are maintaining historical data for job applicants for future analysis of predication, improvement of benefits, profits, and development of organizations and institutions. In e-government, a decision can be made about job seekers after mining in their information that will lead to a beneficial insight. This paper proposes the development and implementation of an applicant's appropriate job prediction system to suit his or her skills using web content classification algorithms (Logit Boost, j48, PART, Hoeffding Tree, Naive Bayes). Furthermore, the results of the classification algorithms are compared based on data sets called "job classification data" sets. Experimental results indicate
... Show MoreThis review explores the Knowledge Discovery Database (KDD) approach, which supports the bioinformatics domain to progress efficiently, and illustrate their relationship with data mining. Thus, it is important to extract advantages of Data Mining (DM) strategy management such as effectively stressing its role in cost control, which is the principle of competitive intelligence, and the role of it in information management. As well as, its ability to discover hidden knowledge. However, there are many challenges such as inaccurate, hand-written data, and analyzing a large amount of variant information for extracting useful knowledge by using DM strategies. These strategies are successfully applied in several applications as data wa
... Show MoreDue to the increased of information existing on the World Wide Web (WWW), the subject of how to extract new and useful knowledge from the log file has gained big interest among researchers in data mining and knowledge discovery topics.
Web miming, which is a subset of data mining divided into three particular ways, web content mining, web structure mining, web usage mining. This paper is interested in server log file, which is belonging to the third category (web usage mining). This file will be analyzed according to the suggested algorithm to extract the behavior of the user. Knowing the behavior is coming from knowing the complete path which is taken from the specific user.
Extracting these types of knowledge required many of KDD
The estimation of the regular regression model requires several assumptions to be satisfied such as "linearity". One problem occurs by partitioning the regression curve into two (or more) parts and then joining them by threshold point(s). This situation is regarded as a linearity violation of regression. Therefore, the multiphase regression model is received increasing attention as an alternative approach which describes the changing of the behavior of the phenomenon through threshold point estimation. Maximum likelihood estimator "MLE" has been used in both model and threshold point estimations. However, MLE is not resistant against violations such as outliers' existence or in case of the heavy-tailed error distribution. The main goal of t
... Show More