I have a dataset in which I want to score clients as 3, 0, or -3. It contains 200000 observations (0: 192000, 3: 600, -3: 200) and 160 explanatory variables: 156 numerical variables, almost all of which are dummies, and only 4 categorical variables.
I tried a regression with random forest and XGBoost and got a very low R2 of 0.002. My interpretation is that, because class 0 makes up almost all of the data, the model predicts values close to 0. R2 measures how much better the model does than always predicting the mean of the target. When I looked at the predictions, they were all close to that mean (0.004), which would explain the very low value.
I have now switched to classification, but I would like to understand the R2 value I found and why I should not use regression in this case: is it because the dependent variable is discrete, or because almost all of my variables are dummies?
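To make my reasoning concrete, here is a minimal sketch on synthetic data (not my real dataset or models). It assumes the usual definition R2 = 1 - SS_res / SS_tot and uses only the class counts given above. The helper `r2_score` and the shrink factor `a` are names I made up for illustration. It compares a predictor that always outputs the mean with one that moves only 0.1% of the way from the mean toward the true value.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """R2 = 1 - SS_res / SS_tot: the improvement over always predicting the mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared error of the mean baseline
    return 1.0 - ss_res / ss_tot

# Synthetic target built from the class counts in the post (illustration only).
y = np.concatenate([
    np.zeros(192000),     # class 0
    np.full(600, 3.0),    # class 3
    np.full(200, -3.0),   # class -3
])

# Baseline: always predict the mean of the target. By definition R2 is 0.
mean_pred = np.full_like(y, y.mean())
r2_mean = r2_score(y, mean_pred)

# Hypothetical model whose predictions barely leave the mean:
# pred = mean + a * (y - mean), so R2 = 1 - (1 - a)^2 = 2a - a^2.
a = 0.001  # assumed shrink factor, chosen only for illustration
shrunk_pred = y.mean() + a * (y - y.mean())
r2_shrunk = r2_score(y, shrunk_pred)

print(f"R2 of the constant mean predictor: {r2_mean:.6f}")
print(f"R2 of the near-mean predictor:     {r2_shrunk:.6f}")
```

If this sketch is right, predictions that stay almost at the mean give an R2 of roughly 0.002, the same order as what I observed, whatever the type of the explanatory variables.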
I would like an explanation of this result, and to know whether my reasoning is correct.