Data Science, Analytics and Big Data discussions

Is any one of these called supervised learning? And is this a correct way to...

July 1, 2015, 10:12 am

@vinayprakash808 wrote: Say there are two data sets. One is the training data set and the other is the test data set. Here we need to model the data using the training data set and validate the same model using...

View Article

Feature Engineering with Latitude and Longitude

July 3, 2015, 11:34 am

@mukesh wrote: Hi, I am working on the Kaggle crime category prediction problem. Here the input variables of the dataset are datetime, district, dayofweek, address and geo variables (Latitude and Longitude). I...

View Article

How to implement "pruning" while building CART models in R?

July 6, 2015, 1:46 am

@Ravi wrote: Hello, While reading about ways to avoid and reduce overfitting on our training data while building CART models, I came across the process of pruning which simply removes the nodes which...

View Article

How should we place the clusters in a K-means clustering implementation?

July 8, 2015, 1:11 am

@adityashrm21 wrote: Hello, After deciding the number of clusters we want, how should we place the clusters so that the algorithm converges closest to the global optimum solution? Should we just...

View Article

Does the K-means clustering algorithm really find the global minimum or not?

July 7, 2015, 11:22 pm

@adityashrm21 wrote: Hello, The K-means clustering algorithm uses the square of the Euclidean distance to find the global minimum solution and this problem is not trivial. Does this mean that the...

View Article

How do we decide the number of clusters to use while implementing the k-means...

July 7, 2015, 11:54 pm

@adityashrm21 wrote: Hi, While implementing the k-means clustering algorithm in a model, how should we decide the number of clusters that we want to use in the model? I have read that we need to specify...

View Article

How are decision trees not sensitive to skewed distributions?

July 8, 2015, 12:40 am

@Ravi wrote: Hello, I don't seem to understand the concept that decision trees are insensitive to skewed distributions. I read that this is because it is a non-parametric method. What do we mean by...

View Article

Should an ideal run of K-means clustering produce evenly distributed points...

July 8, 2015, 3:53 am

@adityashrm21 wrote: Hi, While using the K-means algorithm on a set of points, is it necessary that all the means have evenly distributed points in their clusters? What if the situation like the one...

View Article

Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' is a list, but does not...

July 8, 2015, 5:37 am

@adityashrm21 wrote: Hello, I was trying to implement K-means algorithm with a dataset....

View Article

Difference in performance of the Naive bayes and AODE algorithms

July 13, 2015, 5:21 am

@pravin wrote: Hi, I read that like naive Bayes, AODE does not perform model selection and does not use tuneable parameters. As a result, it has low variance. It predicts class probabilities rather...

View Article

Is it a good practice to remove observations with very low frequency from...

July 14, 2015, 3:47 am

@Aditya_Sharma wrote: Hi, Suppose while exploring some data, I see the histogram of a variable like this one. Then is it a good and helpful practice to assign to the observations with very low...

View Article

Books / Websites which provide steps to solve various data science projects?

July 15, 2015, 12:39 am

@Imran wrote: Hi, I am new to the data science domain and recently started participating in data science competitions. During the last 3 competitions, the biggest challenges I faced were lack of practical...

View Article

Ridge regression using glmnet in R

July 13, 2015, 11:42 pm

@mukesh wrote: Hello, I am a little new and learning about ridge and lasso regression. Do we need to pass data only as matrices in the glmnet() function while performing the ridge regression? So in the...

View Article

Importance of error term in linear equation

July 23, 2015, 1:46 am

@pravin wrote: Hi, Recently I watched a YouTube video on linear regression and it shows the linear equation as y = a + bx + e (error term). Please help me to understand this error term, does this...

View Article
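
For the error-term question above, a minimal sketch may help, assuming an ordinary least-squares fit. In y = a + bx + e, the term e is whatever part of each observation the fitted line does not explain; the residuals computed below are estimates of it. The helper name `fit_line` and the sample data are hypothetical and are not taken from the original post.

```python
def fit_line(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical observations that lie near, but not exactly on, a line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)

# Each residual estimates the error term e for one observation,
# so that y = a + b*x + e holds exactly for every point.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
```

With an intercept in the model, least-squares residuals sum to zero up to floating-point error, which is one quick sanity check on the fit.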

Methods to deal with zero values while performing log transformation of variable

July 23, 2015, 2:13 am

@Steve wrote: Hi, I am working on a data science project in Python and during data exploration I found a feature with a skewed distribution. I want to apply a log transformation to reduce the skewness...

View Article
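
One common option for the zero-value problem raised above is to use log(1 + x) instead of log(x), since it is defined at x = 0. The sketch below uses the standard library's `math.log1p`; the helper name `log1p_transform` and the sample feature are hypothetical, and this approach assumes every value is greater than -1.

```python
import math


def log1p_transform(values):
    """Return log(1 + x) for each value; defined at x = 0, requires x > -1."""
    return [math.log1p(v) for v in values]


# Hypothetical right-skewed feature that contains a zero.
feature = [0, 1, 9, 99, 999]
transformed = log1p_transform(feature)
```

Whether this shift is appropriate depends on the scale of the feature; for values much smaller than 1 the transform is nearly linear and reduces skewness very little.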

How can I create Confusion Matrix in Python?

July 3, 2015, 12:10 am

@mukesh wrote: Hi, I am using the naive Bayes algorithm to predict the probability of different classes for the test data set. Now, I want to check the power of the model. Should I use a confusion matrix or log-loss...

View Article
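
As a rough illustration of the confusion-matrix question above, here is a dependency-free sketch. It assumes the predicted probabilities have already been turned into one predicted label per row (for example by taking the most probable class). The function name `confusion_matrix` and the sample labels are hypothetical and written for this example.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return a matrix where rows are actual classes and columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


# Hypothetical true and predicted class labels for six test rows.
actual = ["a", "b", "a", "c", "b", "a"]
predicted = ["a", "b", "b", "c", "b", "a"]
labels = ["a", "b", "c"]

matrix = confusion_matrix(actual, predicted, labels)

# Accuracy is the sum of the diagonal divided by the number of rows.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(actual)
```

A confusion matrix only looks at the final labels, while log-loss also penalises over-confident wrong probabilities, so the two answer slightly different questions about the model.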

Best Universities for Masters in Data Science?

August 16, 2015, 7:37 am

@bhavyaghai wrote: I am passionate about data science and want to pursue a master's in data science. Can you please recommend top universities for pursuing a Masters/Ph.D. in the US or other parts of the world?...

View Article

A Very Good Data Science Course in Python by Harvard

August 23, 2015, 4:07 am

@rohanpota wrote: Lectures and Slides: Page on harvard.edu, Slides. Assignments: Intro to Python, Numpy, Matplotlib (Homework 0) (Solutions); Poll Aggregation, Web Scraping, Plotting, Model Evaluation, and...

View Article

I am a fresher but want to move into analytics - how, what, when, where to do it?

August 29, 2015, 11:29 am

@xtremcurious22 wrote: Please clear my doubt. Facts before advice: education: B.Tech, NIT Durgapur; fresher; I know statistics: self-taught; strong aptitude and mathematical skill. Please clear my doubt how...

View Article

Error while implementing randomForest in R

September 2, 2015, 11:16 am

@adityashrm21 wrote: Hello, I am facing a problem while implementing a randomForest model in R. I am getting an error saying: Error in randomForest.default(m, y, ...) : Can't have empty classes in y....

View Article