Is any of this called supervised learning? And is this a correct way to...
@vinayprakash808 wrote: Say there are two data sets. One is the training data set and the other is the test data set. Here we need to model the data using the training data set and validate the same model using...
Feature Engineering with Latitude and Longitude
@mukesh wrote: Hi, I am working on the Kaggle crime category prediction problem. Here the input variables of the dataset are datetime, district, dayofweek, address and geo variables (Latitude and Longitude). I...
How to implement "pruning" while building CART models in R?
@Ravi wrote: Hello, While reading about ways to avoid and reduce overfitting on our training data while building CART models, I came across the process of pruning which simply removes the nodes which...
How should we place the clusters in a K-means clustering implementation?
@adityashrm21 wrote: Hello, After deciding the number of clusters we want, how should we place the clusters so that the algorithm converges closest to the global optimum solution? Should we just...
Does the K-means clustering algorithm really find the global minimum or not?
@adityashrm21 wrote: Hello, The K-means clustering algorithm uses the square of the Euclidean distance to find the global minimum solution and this problem is not trivial. Does this mean that the...
How do we decide the number of clusters to use while implementing the k-means...
@adityashrm21 wrote: Hi, While implementing the k-means clustering algorithm in a model, how should we decide the number of clusters that we want to use in the model? I have read that we need to specify...
How are decision trees not sensitive to Skewed distributions?
@Ravi wrote: Hello, I don't seem to understand the concept that decision trees are insensitive to Skewed distributions. I read that this is because it is a non-parametric method. What do we mean by...
Should an ideal run of K-means clustering produce evenly distributed points...
@adityashrm21 wrote: Hi, While using the K-means algorithm on a set of points, is it necessary that all the means have evenly distributed points in their clusters? What if the situation like the one...
Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' is a list, but does not...
@adityashrm21 wrote: Hello, I was trying to implement K-means algorithm with a dataset....
Difference in performance of the Naive Bayes and AODE algorithms
@pravin wrote: Hi, I read that like naive Bayes, AODE does not perform model selection and does not use tuneable parameters. As a result, it has low variance. It predicts class probabilities rather...
Is it a good practice to remove observations with very low frequency from...
@Aditya_Sharma wrote: Hi, Suppose while exploring some data, I see the histogram of a variable like this one. Then is it a good and helpful practice to assign to the observations with very low...
Books / Websites which provide steps to solve various data science projects?
@Imran wrote: Hi, I am new to the data science domain and recently started participating in data science competitions. During the last 3 competitions, the biggest challenges I faced were lack of practical...
Ridge regression using glmnet in R
@mukesh wrote: Hello, I am a little new and learning about ridge and lasso regression. Do we need to pass data only as matrices in the glmnet() function while performing the ridge regression? So in the...
Importance of error term in linear equation
@pravin wrote: Hi, Recently I watched a YouTube video on linear regression and it shows the linear equation as y = a + bx + e (error term). Please help me to understand this error term, does this...
Methods to deal with zero values while performing log transformation of variable
@Steve wrote: Hi, I am working on a data science project in Python and during data exploration I found a feature with a skewed distribution. I want to apply log transformation to reduce the skewness...
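One common way to handle zeros in this situation is to use log(1 + x) instead of log(x), since it maps 0 to 0 rather than failing. The following is a minimal sketch using only the standard library; the function name `log1p_transform` and the sample values are made up for illustration, and it assumes the feature is non-negative.

```python
import math


def log1p_transform(values):
    """Apply log(1 + x) to each value so that zeros map to 0.0.

    Assumes non-negative inputs; negative values are rejected here
    rather than silently transformed.
    """
    if any(v < 0 for v in values):
        raise ValueError("log1p_transform expects non-negative values")
    return [math.log1p(v) for v in values]


# Hypothetical right-skewed feature that contains a zero.
skewed = [0, 1, 9, 99, 999]
transformed = log1p_transform(skewed)

for raw, new in zip(skewed, transformed):
    print(f"{raw:>4} -> {new:.4f}")
```

Whether adding 1 is appropriate depends on the scale of the feature; for values much smaller than 1 the shift can dominate the result, so this is a starting point rather than a general rule.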
How can I create a Confusion Matrix in Python?
@mukesh wrote: Hi, I am using the naive Bayes algorithm to predict the probability of different classes of the test data set. Now, I want to check the power of the model. Should I use a confusion matrix or log-loss...
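As a rough illustration of what a confusion matrix contains, here is a small standard-library sketch. The helper `confusion_matrix` below is written for this example (scikit-learn provides its own `sklearn.metrics.confusion_matrix`), and the labels and predictions are invented. It assumes the predicted probabilities have already been converted to a single predicted class per observation.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a matrix where rows are true classes and columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Hypothetical true and predicted classes for five test observations.
y_true = ["C", "A", "C", "B", "C"]
y_pred = ["C", "C", "C", "B", "A"]
labels = ["A", "B", "C"]

matrix = confusion_matrix(y_true, y_pred, labels)

# Accuracy is the sum of the diagonal divided by the number of observations.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(y_true)

for label, row in zip(labels, matrix):
    print(label, row)
print("accuracy:", accuracy)
```

The confusion matrix works on hard class predictions, while log-loss scores the predicted probabilities themselves, so the two answer slightly different questions about the model.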
Best Universities for Masters in Data Science?
@bhavyaghai wrote: I am passionate about data science and want to pursue a master's in data science. Can you please recommend top universities for pursuing a Master's/Ph.D. in the US or other parts of the world?...
A Very Good Data Science Course in Python by Harvard
@rohanpota wrote: Lectures and Slides: Page on harvard.edu, Slides. Assignments: Intro to Python, Numpy, Matplotlib (Homework 0) (Solutions); Poll Aggregation, Web Scraping, Plotting, Model Evaluation, and...
I am a fresher but want to move into analytics - how, what, when, where to do it?
@xtremcurious22 wrote: Please clear my doubt. Facts before advice: education: B.Tech, NIT Durgapur; fresher; I know statistics: self-taught; strong aptitude and mathematical skill. Please clear my doubt how...
Error while implementing randomForest in R
@adityashrm21 wrote: Hello, I am facing a problem while implementing a randomForest model in R. I am getting an error saying: Error in randomForest.default(m, y, ...) : Can't have empty classes in y....