Channel: Data Science, Analytics and Big Data discussions - Topics tagged data_science

Is linear regression a good fit for this data?


@shounakrockz47 wrote:

I am predicting the number of vehicles at 4 traffic junctions.

My dataset has the following columns:

  1. DateTime
  2. Junction_ID
  3. Number_of_vehicles

At first glance, this problem may look like a time series regression problem. However, the data given seems better suited to linear regression.
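Treating this as a regression problem means turning each DateTime into ordinary features. A minimal sketch with pandas, assuming the calendar parts are read with the `.dt` accessor; the three sample rows are hypothetical and only illustrate the transformation:

```python
import pandas as pd

# Hypothetical sample rows; only DateTime and Junction_ID matter here
df = pd.DataFrame({
    "DateTime": pd.to_datetime(["2015-11-01 00:00:00",
                                "2015-11-01 01:00:00",
                                "2015-11-02 13:00:00"]),
    "Junction_ID": [1, 2, 1],
})

# Split each timestamp into calendar parts a regression model can use
df["day"] = df["DateTime"].dt.day            # day of month, 1-31
df["hour"] = df["DateTime"].dt.hour          # hour of day, 0-23
df["weekday"] = df["DateTime"].dt.dayofweek  # Monday=0 ... Sunday=6

print(df[["Junction_ID", "day", "hour", "weekday"]])
```

Once these columns exist, the time ordering is no longer visible to the model; each row is just a combination of day, hour, weekday and junction.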

So I applied linear regression as follows:

  • Used get_dummies for all the columns: dummy variables for the 31 days of the month, 24 hours of the day, 7 days of the week and 4 junction IDs.

  • Then applied a linear regression model as follows:

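      import math

      from sklearn.linear_model import LinearRegression
      from sklearn.metrics import mean_squared_error
      from sklearn.model_selection import train_test_split

      # train_data: dummy-encoded features; train_vehicles: Number_of_vehicles
      x_train, x_test, y_train, y_test = train_test_split(train_data, train_vehicles)

      # clf is assumed to be a plain LinearRegression (not shown in the original post)
      clf = LinearRegression()
      clf.fit(x_train, y_train)

      pred = clf.predict(x_test)
      pred.shape  # got result as (12030,)

      # Round each prediction up to a whole number of vehicles
      result = []
      for x in pred:
          result.append(math.ceil(x))

      score = mean_squared_error(y_test, result)
      rmse = math.sqrt(score)
      print('RMSE is :', rmse)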
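The dummy encoding from the first bullet can be sketched with pandas as below. This is a minimal illustration, assuming the calendar parts come from DateTime via the `.dt` accessor; the one-month hourly grid is hypothetical and exists only so that every level of every variable appears at least once.

```python
import numpy as np
import pandas as pd

# Hypothetical complete grid: every hour of October 2015 (31 days)
# at junctions 1-4, so each category level actually occurs
hours = pd.to_datetime("2015-10-01") + pd.to_timedelta(np.arange(31 * 24), unit="h")
grid = pd.MultiIndex.from_product(
    [hours, [1, 2, 3, 4]], names=["DateTime", "Junction_ID"]
).to_frame(index=False)

# Calendar parts of the timestamp, as described in the post
features = pd.DataFrame({
    "day": grid["DateTime"].dt.day,
    "hour": grid["DateTime"].dt.hour,
    "weekday": grid["DateTime"].dt.dayofweek,
    "Junction_ID": grid["Junction_ID"],
})

# One 0/1 column per level: 31 + 24 + 7 + 4 = 66 columns
X = pd.get_dummies(features, columns=["day", "hour", "weekday", "Junction_ID"])
print(X.shape)  # (2976, 66)
```

Within each variable the dummy columns sum to 1 in every row, so together they duplicate the model's intercept. Passing `drop_first=True` to `pd.get_dummies` drops one level per variable (62 columns here) and removes that exact dependence.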
    

I am getting an RMSE of 10.636853077462394.
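RMSE is measured in the same units as Number_of_vehicles, so whether about 10.6 counts as "low" depends on how much the counts vary. A simple reference point is the RMSE of a model that ignores every feature and always predicts the training mean. The helper names below are hypothetical; `rmse` computes the same quantity as `math.sqrt(mean_squared_error(...))`.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, same as math.sqrt(mean_squared_error(...))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mean_baseline_rmse(y_train, y_test):
    """RMSE of a constant model that always predicts the training mean."""
    y_test = np.asarray(y_test, dtype=float)
    constant = np.full(y_test.shape, np.mean(y_train), dtype=float)
    return rmse(y_test, constant)


# Usage with the variables from the post:
# print("model RMSE   :", rmse(y_test, result))
# print("baseline RMSE:", mean_baseline_rmse(y_train, y_test))
```

The baseline RMSE equals sqrt(var(y_test) + (mean(y_test) - mean(y_train))^2), so it is roughly the standard deviation of the test counts. A model is only doing useful work if its RMSE is clearly below this number.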

My questions are:

  • Since the RMSE is on the lower side, can I say this model is decent?

  • Is there any other approach I can use on this dataset?

  • Do I need to check for collinearity?

  • How can I check whether multiple variables are interrelated?

  • Should I go for non-linear regression on this dataset?
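On collinearity: when get_dummies keeps every level and `LinearRegression` fits an intercept (its default, `fit_intercept=True`), the dummies of each variable sum to 1 in every row, which is an exact linear dependence on the intercept (the "dummy variable trap"). The model still fits, but individual coefficients are no longer uniquely determined. Pairwise correlations (`DataFrame.corr()`) only show relationships between two columns at a time; the variance inflation factor (VIF) also flags a column that is a combination of several others. Below is a minimal NumPy sketch of VIF; the `vif` helper is hypothetical. statsmodels provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence` as a library alternative, which expects you to add the constant column yourself.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (n_samples x n_features).

    VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing column j on all
    other columns plus an intercept. Perfect collinearity gives VIF = inf.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    result = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Design matrix: intercept + every other column
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        ss_res = np.sum((target - design @ coef) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        if ss_tot == 0:  # constant column: R2 is undefined
            result[j] = np.nan
            continue
        r2 = 1.0 - ss_res / ss_tot
        result[j] = np.inf if r2 > 1.0 - 1e-10 else 1.0 / (1.0 - r2)
    return result


# Usage with a dummy DataFrame X: vif(X.to_numpy(dtype=float))
```

On the full dummy matrix every column should come out as inf. After `drop_first=True`, any remaining inf values point to other exact dependencies, and large finite values (a commonly quoted rule of thumb is above 5 or 10) suggest strong collinearity.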

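On other approaches and non-linear models: `train_test_split` shuffles rows by default, so the test rows are scattered in time among the training rows, which can make the score look better than a true forecast of future counts. A stricter check is forward-chaining validation with scikit-learn's `TimeSeriesSplit`, where each fold trains on earlier rows and scores on later ones. Separately, the dummy-encoded linear model is additive: it assumes, for example, that the hour-of-day effect is the same at every junction. A tree ensemble such as `RandomForestRegressor` can pick up such interactions without hand-built features. The sketch below compares models under identical folds; the `compare_models` helper and the forest settings are hypothetical choices, and it assumes X and y are sorted by DateTime.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit


def compare_models(X, y, models, n_splits=5):
    """Mean test RMSE of each model over forward-chaining folds.

    Assumes the rows of X and y are already sorted by DateTime, so every
    fold trains on earlier rows and is scored on the rows that follow.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    splitter = TimeSeriesSplit(n_splits=n_splits)
    scores = {}
    for name, model in models.items():
        fold_rmse = []
        for train_idx, test_idx in splitter.split(X):
            model.fit(X[train_idx], y[train_idx])  # refit from scratch each fold
            pred = model.predict(X[test_idx])
            fold_rmse.append(np.sqrt(np.mean((y[test_idx] - pred) ** 2)))
        scores[name] = float(np.mean(fold_rmse))
    return scores


# Candidate models; the random forest settings are illustrative, not tuned
models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
# scores = compare_models(X, y, models)  # X: dummy features, y: Number_of_vehicles
```

If the forest's mean RMSE is clearly lower on the same folds, that is evidence the hour, day and junction effects interact; if not, the simpler linear model is easier to interpret.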