XGBoost only predicts NaN after removing all NaNs from the training data in Python

May 20, 2018, 7:52 am

@kateb4 wrote:

Hi, I asked a question on StackOverflow, but they did not answer my question, so I decided to try it here.

Hello!
I’m trying to get my code to work. It used to run without errors, but after I changed some things in my data it no longer produces any usable output. The model seems to predict NaNs, which I find strange because none of the input values are NaN. The error is raised when I run xgb.train on a sample of 5,000 observations from the dataset (which has over 300,000 observations). When I run it on a smaller sample of the dataset, the error does not occur.

The code I ran:

import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Load the data, drop periods 3 and 4, then drop rows without a test score.
Statadata = pd.read_stata('figtemp.dta')
Statadata = Statadata.drop(Statadata[(Statadata['periodf'] == 3) | (Statadata['periodf'] == 4)].index)
Statadata = Statadata.drop(Statadata[(Statadata['periods'] == 3) | (Statadata['periods'] == 4)].index)
Statadata.drop(Statadata[Statadata['zcstscoreela'].isnull()].index, inplace=True)
Statadata.drop(Statadata[Statadata['zcstscoremath'].isnull()].index, inplace=True)

eng = Statadata[Statadata['department'] == 'english']
eng = eng.drop(eng[eng['zcstscoreelaprior'].isnull()].index)

math = Statadata[Statadata['department'] == 'math']
math = math.drop(math[math['zcstscoremathprior'].isnull()].index)

y_en_gpa = eng['gpatotal']
y_en_cst = eng['zcstscoreela']
X_en = eng.copy()
del X_en['gpatotal']
del X_en['zcstscoremath']
del X_en['zcstscoreela']
del X_en['pareduccode']
del X_en['cstscoreela']
del X_en['cstscoremath']

y_math_gpa = math['gpatotal']
y_math_cst = math['zcstscoremath']
X_math = math.copy()
del X_math['gpatotal']
del X_math['zcstscoremath']
del X_math['zcstscoreela']
del X_math['pareduccode']
del X_math['cstscoreela']
del X_math['cstscoremath']

# English:
# Deleting the columns and rows with missing values:
missing_en = X_en.isnull().sum()
missingbool_en = missing_en < 25
selected_en = X_en.columns[missingbool_en]
selected_en = X_en[selected_en]
selected_en = selected_en.dropna(axis=0)
y_en_cst = y_en_cst[selected_en.index]
y_en_gpa = y_en_gpa[selected_en.index]

# Math:
# Deleting the columns and rows with missing values:
missing_math = X_math.isnull().sum()
missingbool_math = missing_math < 25
selected_math = X_math.columns[missingbool_math]
selected_math = X_math[selected_math]
selected_math = selected_math.dropna(axis=0)
y_math_cst = y_math_cst[selected_math.index]
y_math_gpa = y_math_gpa[selected_math.index]

columns_to_overwrite = ['department', 'crsnamef', 'markf', 'crsnames', 'marks', 'cstlevelela', 'cstlevelmath', 'status', 'grade', 'gpaavg']
columns_to_overwrite2 = ['markf', 'crsnames', 'marks', 'cstlevelela', 'cstlevelmath', 'status', 'grade']

new_en = pd.get_dummies(selected_en['crsnamef'])
for i in columns_to_overwrite2:
    nieuw_en = pd.get_dummies(selected_en[i])
    new_en = new_en.merge(nieuw_en, left_index=True, right_index=True, suffixes=('_1', '_2'))

selected_en = selected_en.drop(labels=columns_to_overwrite, axis="columns")
selected_en = new_en.merge(selected_en, left_index=True, right_index=True)

# Math:
# Creating the dummy variables for the categorical string variables
new_math = pd.get_dummies(selected_math['crsnamef'])
for i in columns_to_overwrite2:
    nieuw_math = pd.get_dummies(selected_math[i])
    new_math = new_math.merge(nieuw_math, left_index=True, right_index=True, suffixes=('_1', '_2'))

selected_math = selected_math.drop(labels=columns_to_overwrite, axis="columns")
selected_math = new_math.merge(selected_math, left_index=True, right_index=True)

X_train_math_gpa, X_test_math_gpa, y_train_math_gpa, y_test_math_gpa = train_test_split(selected_math, y_math_gpa, random_state=4)
X_train_math_cst, X_test_math_cst, y_train_math_cst, y_test_math_cst = train_test_split(selected_math, y_math_cst, random_state=4)

paramstest2 = {
    'max_depth': 8,
    'min_child_weight': 3,
    'gamma': 0.4,
    'subsample': 0.7,
    'colsample_bytree': 0.7,
}
data_train = xgb.DMatrix(X_train_math_gpa, label=y_train_math_gpa)
data_test = xgb.DMatrix(X_test_math_gpa, label=y_test_math_gpa)

model = xgb.train(paramstest2, data_train, 5000, evals=[(data_test, "test")], verbose_eval=100, early_stopping_rounds=50)

It gives me the following output and error:
[13:24:16] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 26 extra nodes, 6 pruned nodes, max_depth=6
[0] test-rmse:nan
Will train until test-rmse hasn’t improved in 50 rounds.
[13:24:16] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 66 extra nodes, 2 pruned nodes, max_depth=8
[13:24:16] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 36 extra nodes, 46 pruned nodes, max_depth=8
[13:24:16] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 16 extra nodes, 44 pruned nodes, max_depth=6
[13:24:16] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 24 extra nodes, 92 pruned nodes, max_depth=7
[13:24:16] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 20 extra nodes, 80 pruned nodes, max_depth=7
[13:24:17] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 50 pruned nodes, max_depth=4
[13:24:17] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 92 pruned nodes, max_depth=5
[13:24:17] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 102 pruned nodes, max_depth=5
[13:24:17] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 112 pruned nodes, max_depth=5
[13:24:17] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6

…

[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 170 pruned nodes, max_depth=0
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 206 pruned nodes, max_depth=0
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 160 pruned nodes, max_depth=0
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 178 pruned nodes, max_depth=0
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 142 pruned nodes, max_depth=3
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 154 pruned nodes, max_depth=0
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 188 pruned nodes, max_depth=0
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 150 pruned nodes, max_depth=0
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 160 pruned nodes, max_depth=0
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 166 pruned nodes, max_depth=0
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 182 pruned nodes, max_depth=0
Traceback (most recent call last):
  File "", line 1, in
  File "/Users/catlinbruys/PycharmProjects/Bachelor_Thesis/venv/lib/python3.6/site-packages/xgboost/training.py", line 204, in train
    xgb_model=xgb_model, callbacks=callbacks)
  File "/Users/catlinbruys/PycharmProjects/Bachelor_Thesis/venv/lib/python3.6/site-packages/xgboost/training.py", line 99, in _train_internal
    evaluation_result_list=evaluation_result_list))
  File "/Users/catlinbruys/PycharmProjects/Bachelor_Thesis/venv/lib/python3.6/site-packages/xgboost/callback.py", line 247, in callback
    best_msg = state['best_msg']
KeyError: 'best_msg'
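
The KeyError looks like a side effect of the NaN metric rather than a separate problem. The sketch below is a simplified stand-in for early-stopping bookkeeping, not xgboost's actual code: the function name `run_early_stopping` and the `state` layout are hypothetical, and it assumes the best message is only recorded when a round improves the score. Because every comparison with NaN is False, no round ever counts as an improvement, so the message is still missing when the stopping branch tries to read it.

```python
# Import names directly: the post uses `math` as a DataFrame variable,
# so `import math` would clash with it in the same session.
from math import inf, nan


def run_early_stopping(scores, stopping_rounds):
    """Simplified model of early stopping on a metric that should decrease.

    Hypothetical illustration only; this is not the library's implementation.
    """
    state = {"best_score": inf, "best_iteration": 0}
    for iteration, score in enumerate(scores):
        if score < state["best_score"]:  # always False when score is NaN
            state["best_score"] = score
            state["best_iteration"] = iteration
            state["best_msg"] = f"[{iteration}]\ttest-rmse:{score}"
        elif iteration - state["best_iteration"] >= stopping_rounds:
            # Stop: raises KeyError if no round ever improved the score.
            return state["best_msg"]
    return state.get("best_msg")


# Finite scores: the best round is recorded and reported.
print(run_early_stopping([0.9, 0.8, 0.85, 0.86, 0.87], stopping_rounds=3))

# NaN scores: no round improves, so the lookup fails.
try:
    run_early_stopping([nan] * 5, stopping_rounds=3)
except KeyError as err:
    print("KeyError:", err)
```

If this reading is right, the thing to fix is the `test-rmse:nan` on the first round, and the KeyError should disappear with it.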

What can I do to solve this problem? I really need a solution as it is for a very important project. Thanks
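
One thing that may be worth checking first: the code above drops rows with missing values in the feature columns, but `gpatotal` is deleted from the features before that step, so the label itself is never checked. A NaN or infinite label in the test split would be enough to make the RMSE NaN. The helper below is a minimal sketch for counting such values; `count_non_finite` and `toy_labels` are hypothetical names used for illustration, and it assumes the values are numeric.

```python
# `isfinite` is imported by name because the post already uses `math`
# as a variable name for the math-department DataFrame.
from math import isfinite


def count_non_finite(values):
    """Count entries that are NaN or +/-inf in an iterable of numbers."""
    return sum(1 for v in values if not isfinite(v))


# Hypothetical toy labels standing in for y_train_math_gpa / y_test_math_gpa.
toy_labels = [3.2, float("nan"), 2.8, float("inf")]
print(count_non_finite(toy_labels))  # 2
```

With the variables from the post, this could be called as `count_non_finite(y_train_math_gpa)` and `count_non_finite(y_test_math_gpa)`, and column by column on the feature frames. A nonzero count for the labels would suggest dropping rows where `gpatotal` is missing before calling `train_test_split`.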

Posts: 4

Participants: 2
