
Hi,

My understanding was that the major difference between machine learning and statistical modeling is that the latter assumes a certain distribution for the data, and that assumption determines the model paradigm as well as the statistical results we obtain (e.g. p-values, F-statistics, t-statistics). In machine learning, by contrast, we don't concern ourselves with the distribution of the data and are more interested in prediction.

While going through the MLlib docs, I found that for linear regression we specify a distribution. But MLlib is a machine learning package. So I have the following questions:

1) Is my understanding of the difference between ML and statistical methods wrong?

2) Is Spark using statistical modeling for linear regression and GLMs?

Thanks!


**Answer** by srowen
·
Apr 12, 2019 at 06:40 PM

That's a very broad question, and I wouldn't say that ML doesn't make assumptions about distributions. For example, you assume your training and test data follow the same distribution. Using regularization means making assumptions about the prior distribution of the coefficients. Any regressor minimizing squared error implicitly assumes there's a model it can represent such that the errors around the prediction are Gaussian.

I suppose I wouldn't agree with this stats vs ML distinction then.
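The squared-error and regularization points above can be checked numerically. The following is a minimal NumPy sketch, not anything taken from MLlib: the synthetic data, the noise level `sigma`, the penalty `lam`, and the helper names are all assumptions made for illustration. It shows that the least-squares solution is a stationary point of the Gaussian negative log-likelihood, and that the ridge solution is a stationary point of the negative log-posterior under a zero-mean Gaussian prior with variance `sigma**2 / lam`.

```python
import numpy as np

# Synthetic data (an assumption for illustration): linear signal plus Gaussian noise.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
true_beta = np.array([1.5, -2.0, 0.5])
sigma = 0.7
y = X @ true_beta + rng.normal(scale=sigma, size=n)


def gaussian_nll(beta):
    """Negative log-likelihood of y under y ~ N(X @ beta, sigma^2 I)."""
    resid = y - X @ beta
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + resid @ resid / (2 * sigma**2)


def gaussian_nll_grad(beta):
    """Gradient of gaussian_nll with respect to beta."""
    return -X.T @ (y - X @ beta) / sigma**2


# Ordinary least squares: minimizes the sum of squared errors.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge regression: minimizes squared error + lam * ||beta||^2 (closed form).
lam = 2.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Equivalent Gaussian prior on the coefficients: beta ~ N(0, tau2 * I).
tau2 = sigma**2 / lam


def neg_log_posterior_grad(beta):
    """Gradient of the negative log-posterior (likelihood term + prior term)."""
    return gaussian_nll_grad(beta) + beta / tau2


print("max |NLL gradient| at OLS solution:", np.abs(gaussian_nll_grad(beta_ols)).max())
print("max |posterior gradient| at ridge solution:",
      np.abs(neg_log_posterior_grad(beta_ridge)).max())
```

Both printed gradients should be zero up to floating-point error, so "minimize squared error" and "maximize a Gaussian likelihood" pick out the same coefficients, and the L2 penalty plays the role of a Gaussian prior.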

**Answer** by Velma Dennis
·
Jul 30 at 10:51 AM

Contrary to popular belief, machine learning has been around for several decades. It was initially shunned because of its large computational requirements and the limited computing power available at the time. However, machine learning has seen a revival in recent years due to the preponderance of data stemming from the information explosion.

So, if machine learning and statistics are synonymous with one another, why are we not seeing every statistics department in every university closing down or transitioning to being a ‘machine learning’ department? Because they are not the same!

**Answer** by Antonio McMillan
·
Jul 30 at 10:59 AM

These are different models that work with different frequencies. For example, static text generators use different combination algorithms. You can choose different processors for training and try different generation algorithms. I like this site because the code logic works smoothly and students are willing to pay for this content.

**Answer** by BillyKemp
·
Jul 30 at 11:11 AM

*The major difference between machine learning and statistics is their purpose. Machine learning models are designed to make the most accurate predictions possible. Statistical models are designed for inference about the relationships between variables.*
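To make the prediction-versus-inference distinction concrete, here is a small NumPy sketch in which one least-squares fit is read both ways. Everything in it (the synthetic data, the train/test split sizes, the variable names) is an assumption for illustration. The "prediction" reading scores the fit on held-out data, while the "inference" reading computes standard errors and t-statistics for the coefficients using the classical formulas, which rely on the usual assumption of independent, homoscedastic errors.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 1 + 2x + Gaussian noise.
rng = np.random.default_rng(1)
n_train, n_test = 200, 100
x = rng.normal(size=n_train + n_test)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=n_train + n_test)

# Design matrix with an intercept column, then a simple train/test split.
X = np.column_stack([np.ones_like(x), x])
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

# One model fit, shared by both views.
beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Prediction view: how well does the fit generalize to unseen data?
test_mse = np.mean((y_test - X_test @ beta) ** 2)

# Inference view: how certain are we about each coefficient?
resid = y_train - X_train @ beta
dof = n_train - X_train.shape[1]          # residual degrees of freedom
sigma2_hat = resid @ resid / dof          # unbiased estimate of the noise variance
cov_beta = sigma2_hat * np.linalg.inv(X_train.T @ X_train)
std_err = np.sqrt(np.diag(cov_beta))
t_stats = beta / std_err

print("held-out MSE:", test_mse)
print("coefficients:", beta)
print("standard errors:", std_err)
print("t-statistics:", t_stats)
```

The fitted coefficients are identical in both readings; what differs is which quantities you look at afterwards and which assumptions you need in order to trust them.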

**Answer** by hussain
·
Aug 05 at 10:10 AM

**Linear regression** is from the field of statistics and is studied as a model for understanding the relationship between input and output numerical variables, but has been borrowed by machine learning. It is both a statistical algorithm and a machine learning algorithm.

I was going through this article on linear regression and found that it does mention this, and it may be right up to a point.

