Title: Advanced Predictive Modeling on Real Data by Modern Imputation Techniques
Authors: Ali P., D.
Keywords: Mathematics and Computing
Issue Date: 2016
Series/Report no.: DI-79
Abstract: In this project we analyze real data and build models to predict the target variable. Because the data are real, they may contain missing values and outliers, which we handle with advanced data cleaning and imputation techniques. We focus on finding the parameters that affect the target variable. To this end we use logistic regression to obtain statistical inference about the quantity of interest. Apart from logistic regression, we build decision tree and random forest models to classify the target variable. Our major imputation techniques are linear regression imputation, decision tree imputation and multiple imputation. Since we deal with real data, the most important and difficult problems we encounter are imputation and data cleaning. We perform exploratory data analysis, establish hypotheses about the data, check for multicollinearity, diagnose outliers and make inferences about the missing data. We then create training and validation data sets and fit our model on the training set. We validate the model using the ROC curve and the confusion matrix. We obtain the null deviance, residual deviance and AIC values of our logistic regression model to measure its performance. We also perform an ANOVA test on the logistic regression model and carry out further analysis based on the results.
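Two of the steps named in the abstract, linear regression imputation and validation by confusion matrix, can be illustrated with a minimal sketch. This is not the author's code: the function names, the single-predictor setup and the use of None to mark a missing value are all assumptions made for illustration, and only the Python standard library is used.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def regression_impute(xs, ys):
    """Fill missing ys (None) with predictions from a line fitted on complete rows.

    Hypothetical helper: assumes one fully observed predictor xs.
    """
    complete = [(x, y) for x, y in zip(xs, ys) if y is not None]
    slope, intercept = fit_line([x for x, _ in complete],
                                [y for _, y in complete])
    return [y if y is not None else slope * x + intercept
            for x, y in zip(xs, ys)]


def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for 0/1 class labels."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts
```

Deterministic regression imputation of this kind tends to understate the variability of the filled-in values, which is one common motivation for also using multiple imputation.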
Appears in Collections: 11. MC
