Why does linear regression use “vertical” distance to the best-fit line, instead of actual distance?
Linear regression uses the "vertical" (in two dimensions) distance $y - \hat{y}$. But this is not the real distance between a point and the best-fit line.
I.e., in the image here:
you use the green lines instead of the purple.
Is this done because the math is simpler? Because the effect of using the real distance is negligible, or equivalent? Or because it is actually better to use a "vertical" distance?
Tags: regression, linear-model
edited 55 mins ago by Peter Mortensen
asked 12 hours ago by David Refaeli (new contributor)
There is such a thing as minimizing perpendicular distance. It is called Deming regression. Ordinary linear regression assumes the x values are known and the only error is in y. That is often a reasonable assumption.
– Michael Chernick, 12 hours ago (6 upvotes)
Sometimes the ultimate purpose of finding the regression line is to make predictions of $\hat Y_i$'s based on future $x_i$'s. (There is a 'prediction interval' formula for that.) Then it is vertical distance that matters.
– BruceET, 12 hours ago (1 upvote)
@MichaelChernick I think your one-liner explained it best; maybe you can elaborate on it a bit and post it as an answer?
– David Refaeli, 12 hours ago
I think gung's answer is what I would say in elaborating on my comment.
– Michael Chernick, 10 hours ago
Related: stats.stackexchange.com/questions/63966/…
– Sycorax, 10 hours ago
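For reference, the 'prediction interval' mentioned in BruceET's comment has the following standard form for simple linear regression, assuming independent normal errors with constant variance. The symbols $n$, $x_0$, $\bar x$, $S_{xx}$, $s$ and $\alpha$ are notation introduced here, not taken from the thread:

$$
\hat y_0 \;\pm\; t_{n-2,\,1-\alpha/2}\; s\,\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{S_{xx}}},
\qquad
s^2 = \frac{1}{n-2}\sum_{i=1}^{n} (y_i - \hat y_i)^2,
\qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar x)^2 .
$$

Note that $s^2$ is built entirely from the vertical residuals $y_i - \hat y_i$, which is consistent with the point that vertical distance is the relevant one when the goal is to predict $Y$ at a known $x_0$.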
2 Answers
Vertical distance is a "real distance". The distance from a given point to any point on the line is a "real distance". The question of how to fit the best regression line is which of the infinitely many possible distances makes the most sense for how we are thinking about our model. That is, any number of possible loss functions could be right; it depends on our situation, our data, and our goals (it may help you to read my answer to: What is the difference between linear regression on y with x and x with y?).
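As an illustration of how the choice of distance changes the loss function, consider a candidate line $\hat y = a + b x$ with residuals $r_i = y_i - a - b x_i$ (the symbols $a$, $b$, $r_i$ are notation introduced here). Three of the possible losses can then be written as:

$$
L_{\text{vertical}}(a,b) = \sum_{i=1}^{n} r_i^2,
\qquad
L_{\text{horizontal}}(a,b) = \sum_{i=1}^{n} \frac{r_i^2}{b^2},
\qquad
L_{\text{perpendicular}}(a,b) = \sum_{i=1}^{n} \frac{r_i^2}{1 + b^2}.
$$

The first is the ordinary least-squares loss for regressing $y$ on $x$, the second corresponds to regressing $x$ on $y$, and the third is the orthogonal-regression loss. They differ only by a factor that depends on the slope $b$, which is why they generally lead to different fitted lines.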
It is often the case that vertical distances make the most sense, though. This would be the case when we are thinking of $Y$ as a function of $X$, which would make sense in a true experiment where $X$ is randomly assigned and the values are independently manipulated, and $Y$ is measured as a response to that intervention. It can also make sense in a predictive setting, where we want to be able to predict values of $Y$ based on knowledge of $X$ and the predictive relationship that we establish. Then, when we want to make predictions about unknown $Y$ values in the future, we will know and be using $X$. In each of these cases, we are treating $X$ as fixed and known, and $Y$ is understood to be a function of $X$ in some sense. However, it can be the case that this mental model does not fit your situation, in which case you would need to use a different loss function. There is no absolutely 'correct' distance irrespective of the situation.
answered 12 hours ago by gung♦
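The contrast between the two distances can be sketched numerically. The following is a minimal illustration, not anything from the answer itself: the function names (`ols_fit`, `orthogonal_fit`, `vertical_sse`, `perpendicular_sse`) and the simulated data are assumptions made for the example. It fits one line by minimizing vertical distances and another by minimizing perpendicular distances, then reports both losses for each line.

```python
import numpy as np


def ols_fit(x, y):
    """Return (slope, intercept) minimizing squared vertical residuals."""
    xc, yc = x - x.mean(), y - y.mean()
    slope = (xc @ yc) / (xc @ xc)
    return slope, y.mean() - slope * x.mean()


def orthogonal_fit(x, y):
    """Return (slope, intercept) minimizing squared perpendicular distances."""
    centered = np.column_stack([x - x.mean(), y - y.mean()])
    # The first right singular vector is the direction of largest spread;
    # the orthogonal-regression line runs along it through the centroid.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    dx, dy = vt[0]
    slope = dy / dx  # assumes the fitted line is not vertical
    return slope, y.mean() - slope * x.mean()


def vertical_sse(x, y, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return float(np.sum((y - (intercept + slope * x)) ** 2))


def perpendicular_sse(x, y, slope, intercept):
    """Sum of squared perpendicular distances from the points to the line."""
    return vertical_sse(x, y, slope, intercept) / (1.0 + slope ** 2)


# Simulated data (an assumption for illustration): x is known exactly,
# and all of the noise is in y.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

ols = ols_fit(x, y)
orth = orthogonal_fit(x, y)
for name, (slope, intercept) in [("vertical (OLS)", ols), ("perpendicular", orth)]:
    print(f"{name:>15}: slope={slope:.3f}, intercept={intercept:.3f}, "
          f"vertical SSE={vertical_sse(x, y, slope, intercept):.1f}, "
          f"perpendicular SSE={perpendicular_sse(x, y, slope, intercept):.1f}")
```

By construction, the first line has the smaller vertical loss (the one that matters when predicting $Y$ from a known $X$), and the second has the smaller perpendicular loss. Neither is "wrong"; they answer different questions.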
Summing up Michael Chernick's comment and gung's answer:
Both vertical and perpendicular (point-to-line) distances are "real"; it all depends on the situation.
Ordinary linear regression assumes the $X$ values are known and the only error is in the $Y$'s. That is often a reasonable assumption.
If you assume error in the $X$'s as well, you get what is called a Deming regression; when the errors in $X$ and $Y$ have equal variance (orthogonal regression), it minimizes the perpendicular distance from each point to the line.
answered 10 hours ago by David Refaeli (new contributor)
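To see what assuming error in the $X$'s changes in practice, here is a minimal sketch of a Deming fit using the standard closed-form slope. The function names (`deming_fit`, `ols_fit`), the parameter `delta` (the assumed ratio of the error variance in $Y$ to the error variance in $X$), and the simulated data are all assumptions made for this example. With `delta = 1` it reduces to orthogonal regression.

```python
import numpy as np


def ols_fit(x, y):
    """Return (slope, intercept) minimizing squared vertical residuals."""
    xc, yc = x - x.mean(), y - y.mean()
    slope = (xc @ yc) / (xc @ xc)
    return slope, y.mean() - slope * x.mean()


def deming_fit(x, y, delta=1.0):
    """Return (slope, intercept) of a Deming regression.

    delta is the assumed ratio var(error in y) / var(error in x);
    delta = 1 gives orthogonal regression. Requires a nonzero
    sample covariance between x and y.
    """
    xc, yc = x - x.mean(), y - y.mean()
    sxx, syy, sxy = xc @ xc, yc @ yc, xc @ yc
    diff = syy - delta * sxx
    slope = (diff + np.sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)
    return slope, y.mean() - slope * x.mean()


# Simulated data (an assumption for illustration): the true relationship is
# y = 1 + 2 * x_true, but both x and y are observed with unit-variance noise.
rng = np.random.default_rng(1)
n = 2000
x_true = rng.normal(scale=2.0, size=n)
x = x_true + rng.normal(size=n)              # measurement error in x
y = 1.0 + 2.0 * x_true + rng.normal(size=n)  # measurement error in y

ols_slope, ols_intercept = ols_fit(x, y)
dem_slope, dem_intercept = deming_fit(x, y, delta=1.0)
print(f"OLS:    slope={ols_slope:.3f}, intercept={ols_intercept:.3f}")
print(f"Deming: slope={dem_slope:.3f}, intercept={dem_intercept:.3f}")
```

In this setup the ordinary least-squares slope is pulled toward zero by the noise in $X$, while the Deming slope stays close to the true value of 2, provided `delta` matches the actual error-variance ratio. If $X$ really is known without error, ordinary regression is the appropriate choice instead.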