All images, data, and the R script can be found here.
This is a short homework assignment for the DSO_530 Applied Modern Statistical Learning Methods class taught by Professor Robertas Gabrys at USC. I completed this project with two classmates, He Liu and Kurshal Bhatia. In this assignment, we compare the predictive power of k-nearest neighbors (KNN) and logistic regression.
Prompt
A child car seat company is interested in understanding what factors contribute to sales for one of its products. They have sales data on a particular model of child car seats at different stores inside and outside the United States.
To simplify the analysis, the company considers sales at a store “Satisfactory” if they cover at least 115% of the costs at that location (i.e., a profit of roughly 15% or more) and “Unsatisfactory” if they cover less than 115% of the costs at that location (i.e., less than 15% profit).
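The labeling rule above can be sketched in a few lines of R. This is only an illustration: the function `label_store`, its arguments `revenue`, `cost`, and `threshold`, and the store figures are all hypothetical and do not appear in the data set, which already ships with a coded `Sales` column.

```r
# Hypothetical helper: label a store from its revenue and its costs.
# `revenue`, `cost`, and `threshold` are illustrative names, not columns
# in carseat.txt.
label_store <- function(revenue, cost, threshold = 1.15) {
  # "Satisfactory" when revenue covers at least 115% of costs
  ifelse(revenue >= threshold * cost, "Satisfactory", "Unsatisfactory")
}

# Made-up figures for three stores, each with costs of 100
revenue <- c(130, 110, 90)
cost    <- c(100, 100, 100)

labels <- label_store(revenue, cost)
print(labels)
```

Only the first store clears the 115% bar here; the second makes a profit but falls short of the threshold, so it is still labeled "Unsatisfactory".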
The data set consists of 11 variables and 400 observations. Each observation corresponds to one of the stores.
Load data
> carseat.data <- read.csv("carseat.txt")
> head(carseat.data)
  Sales CompPrice Income Advertising Population Price ShelveLoc Age Education Urban US
1     1       138     73          11        276   120         0  42        17     1  1
2     1       111     48          16        260    83         0  65        10     1  1
3     1       113     35          10        269    80         1  59        12     1  1
4     0       117    100           4        466    97         1  55        14     1  1
5     0       141     64           3        340   128         0  38        13     1  0
6     1       124    113          13        501    72         0  78        16     0  1
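The article does not show how `training_data` was created. One plausible way is a random 300/100 split, sketched below. The training size of 300 is an assumption inferred from the 299 null-deviance degrees of freedom in the model summary; the seed, the name `testing_data`, and the stand-in data frame `carseat_demo` are hypothetical and exist only so the sketch runs without the file.

```r
# Stand-in for carseat.data so the sketch runs without carseat.txt.
# Only the row count (400) comes from the article; the values are random.
set.seed(1)  # arbitrary seed, for reproducibility only
carseat_demo <- data.frame(
  Sales = rbinom(400, 1, 0.5),
  Price = rnorm(400, mean = 115, sd = 20)
)

# Assumption: 300 training rows, inferred from 299 null-deviance df
n_train   <- 300
train_idx <- sample(nrow(carseat_demo), n_train)

training_data <- carseat_demo[train_idx, ]   # rows used to fit the models
testing_data  <- carseat_demo[-train_idx, ]  # held-out rows for evaluation

print(dim(training_data))
print(dim(testing_data))
```

With the real file, replacing `carseat_demo` with `carseat.data` would give the same kind of split, though not necessarily the same rows the authors used.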
> logistic_model <- glm(Sales ~ ., data = training_data, family = "binomial")
> summary(logistic_model)

Call:
glm(formula = Sales ~ ., family = "binomial", data = training_data)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1090  -0.6056  -0.1899   0.4225   2.6784

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.3913119  2.0272717  -0.686 0.492525
CompPrice    0.1133453  0.0183759   6.168 6.91e-10 ***
Income       0.0153623  0.0061970   2.479 0.013175 *
Advertising  0.1481729  0.0391825   3.782 0.000156 ***
Population  -0.0002905  0.0012774  -0.227 0.820126
Price       -0.1175769  0.0150529  -7.811 5.68e-15 ***
ShelveLoc    2.6133149  0.4955569   5.273 1.34e-07 ***
Age         -0.0568264  0.0116597  -4.874 1.09e-06 ***
Education   -0.0451832  0.0641827  -0.704 0.481447
Urban       -0.5172575  0.3888039  -1.330 0.183393
US           0.3059033  0.5093854   0.601 0.548150
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 402.98  on 299  degrees of freedom
Residual deviance: 222.16  on 289  degrees of freedom
AIC: 244.16

Number of Fisher Scoring iterations: 6
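Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios, which are often easier to read. The sketch below does this for the six predictors that are significant at the 5% level, with the estimates copied by hand from the summary output above. It assumes that `Sales = 1` codes a "Satisfactory" store, which the article does not state explicitly.

```r
# Estimates copied from the summary output above (significant terms only)
estimates <- c(
  CompPrice   =  0.1133453,
  Income      =  0.0153623,
  Advertising =  0.1481729,
  Price       = -0.1175769,
  ShelveLoc   =  2.6133149,
  Age         = -0.0568264
)

# exp(coefficient) is the multiplicative change in the odds of Sales = 1
# for a one-unit increase in the predictor, holding the others fixed
odds_ratios <- exp(estimates)
print(round(odds_ratios, 3))
```

An odds ratio above 1 (CompPrice, Income, Advertising, ShelveLoc) points to higher odds of the outcome coded 1, while a ratio below 1 (Price, Age) points to lower odds. With a fitted model in hand, `exp(coef(logistic_model))` would give the same quantities for every term.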