Alternatives to minimizing loss in regression
$begingroup$
We know that loss (error) minimization in regression originated with Gauss around the turn of the 19th century.
More recently Friedman* extolled its virtues for use in predictive modeling:
“The aim of regression analysis is to use the data to construct a function $\hat{f}(x)$ that can serve as a reasonable approximation of $f(x)$ over the domain $D$ of interest. The notion of reasonableness depends on the purpose for which the approximation is to be used. In nearly all applications, however, accuracy is important... if the sole purpose of the regression analysis is to obtain a rule for predicting future values of the response $y$, given values for the covariates $(x_1, \dots, x_n)$, then accuracy is the only important virtue of the model...”
But is accuracy the only important virtue of a model?
My question concerns whether loss minimization is the only way to evaluate a regression function, whatever the fitting method (OLS, maximum likelihood, gradient descent) and whatever the loss (quadratic, least absolute deviation, or otherwise).
Clearly, it's possible to evaluate multiple metrics beyond loss a posteriori, i.e., after models have been built. But are there multivariate objective functions which evaluate multiple metrics a priori, i.e., during the optimization phase itself?
For example, suppose one wanted a regression function whose fit was optimized not just by minimizing loss but also by maximizing, say, nonlinear dependence or Shannon entropy?
Are such functions available, and/or have these issues been researched?
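(By way of illustration, here is a rough sketch of the kind of composite objective I have in mind: squared loss plus a Shannon-entropy term on the residuals, scalarized with an arbitrary weight. The data, the weight, and the crude histogram entropy estimator are all purely hypothetical.)

```python
import numpy as np
from scipy.optimize import minimize

# toy regression data (purely illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=200)

def hist_entropy(r, bins=20):
    # crude plug-in Shannon entropy of the residual distribution
    counts, _ = np.histogram(r, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

def composite_loss(beta, lam):
    # scalarized multi-objective: mean squared error plus an entropy term
    r = y - X @ beta
    return np.mean(r**2) + lam * hist_entropy(r)

# histogram entropy is piecewise-constant in beta, so use a
# derivative-free optimizer rather than a gradient method
fit = minimize(composite_loss, np.zeros(2), args=(0.1,), method="Nelder-Mead")
print(fit.x)
```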
*Friedman, J. H., "Multivariate Adaptive Regression Splines," The Annals of Statistics, Vol. 19, No. 1 (March 1991), pp. 1–67.
Tags: regression, error
$endgroup$
$begingroup$
I think your question is missing a crucial detail. Friedman's statement has the "if ... then" format. You ask about the "then" portion, but never state whether you are accepting the "if" portion as a premise. That is, are we assuming for the purpose of writing an answer as given that "the sole purpose of the regression analysis is to obtain a rule for predicting future values of the response $y$, given values for the covariates $(x_1, \dots, x_n)$"? Because a model might forego that premise in exchange for, e.g., interpretability of some phenomenon.
$endgroup$
– Sycorax
9 hours ago
$begingroup$
@Sycorax Apologies for any ambiguity. The question is more general than Friedman's "if". To your point, I'm interested in learning about "model(s) that might forego that premise in exchange for, e.g., interpretability..." Thx.
$endgroup$
– user332577
6 hours ago
asked 9 hours ago by user332577
2 Answers
$begingroup$
But is accuracy the only important virtue of a model?
The practical aspects of what a model is for are too nuanced for a theoretical discussion. Interpretability and generalizability come to mind. "Who will use this model?" should be a top-line question in all statistical analyses.
Friedman's statement is defensible in a classical statistics framework: we have no fundamental reason to object to black-box prediction. If you want a Y-hat that's going to be very close to the Y you observe in the future, then "build your model as big as a house" as John Tukey would say.
This does not excuse overfitting, unless your question is ill defined [plural "you" being the proverbial statistician]. Overfitting is too often the result of analysts who encounter data rather than approach it. By going through hundreds of models and picking "the best", you are prone to build models that lack generality. We see it all the time.
Inadvertently, you also ask a different question: "Alternatives to minimizing loss in regression". Loss-minimizing estimators, i.e., estimators that minimize a loss function, are a subset of the class of method-of-moments estimators: estimators that set an estimating function to zero. The goal of a method-of-moments estimator is to find an unbiased estimator, rather than one that minimizes a loss.
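To make the distinction concrete, here is a minimal sketch (with made-up data) of the estimating-equation approach: we find the root of the sample moment conditions with a generic root-finder, never writing down an objective to minimize. For these linear moment conditions the root happens to coincide with the OLS solution, but no loss function appears anywhere:

```python
import numpy as np
from scipy.optimize import root

# toy data (purely illustrative)
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 3.0]) + rng.normal(size=100)

def estimating_function(beta):
    # sample analogue of the moment conditions E[x (y - x'beta)] = 0;
    # we set this to zero rather than minimizing any loss
    return X.T @ (y - X @ beta) / len(y)

sol = root(estimating_function, x0=np.zeros(2))
print(sol.x)  # coincides with OLS here, though nothing was minimized
```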
$endgroup$
answered 9 hours ago by AdamO
$begingroup$
Thanks. "Alternatives" wasn't an inadvertent word, it is the point of the question. Apologies for any imprecision in specification. Based on your comment, it still sounds like MOM estimators optimize a single metric whether loss or unbiasedness, correct? I'm interested in hearing about functions that optimize multiple metrics. Thx.
$endgroup$
– user332577
6 hours ago
$begingroup$
@user332577 Well, that's probably a fool's errand: Lagrangian theory says that to optimize multiple losses, you would have to optimize a single loss formed as a linear combination of all the losses, but that's just not going to give a solution because the quadratic and log loss minimizers are unique. Lastly, that's not really an alternative to optimization because it's still optimization. MOMs don't have to optimize anything at all; that's why they're more general.
$endgroup$
– AdamO
6 hours ago
$begingroup$
That's a pretty rule-bound theoretical rationale. For instance, optimize a single loss of any linear combination...why linear combinations? What about nonlinear combinations? Are you really claiming that there is absolutely no way to break out of the straitjacket you're imposing?
$endgroup$
– user332577
4 hours ago
$begingroup$
Rational choice theory says that any rational preference can be modeled with a utility function.
Therefore any (rational) decision process can be encoded in a loss function and posed as an optimization problem.
For example, L1 and L2 regularization can be viewed as encoding a preference for smaller parameters or more parsimonious models into the loss function. Any preference can be similarly encoded, assuming it's not irrational.
For example, suppose one wanted to use a regression function which optimized fit based not just on loss minimization but which also maximized nonlinear dependence or Shannon entropy?
Then you would adjust your loss function to include a term rewarding those properties (or penalizing their absence), just as we did for L1/L2 regularization.
Now, this might make the problem computationally intractable; for example, 0/1 loss is known to result in an NP-hard problem. In general, people prefer to study tractable problems so you won't find much off-the-shelf software that does this, but nothing is stopping you from writing down such a function and applying some sufficiently generalized optimizer to it.
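For instance, one could hand a non-differentiable, non-convex composite objective to a generic global optimizer such as differential evolution. A sketch with synthetic data follows; the hard threshold term is an arbitrary, hypothetical stand-in for one of the "intractable" criteria discussed above:

```python
import numpy as np
from scipy.optimize import differential_evolution

# synthetic data (purely illustrative)
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.3, size=150)

def awkward_loss(beta):
    # non-differentiable, non-convex objective: absolute loss plus a
    # hard 0/1-style penalty on large residuals
    r = y - X @ beta
    return np.mean(np.abs(r)) + 0.5 * np.mean(np.abs(r) > 1.0)

# differential evolution needs no gradients, only function evaluations
result = differential_evolution(awkward_loss, bounds=[(-5, 5), (-5, 5)], seed=0)
print(result.x)
```

Derivative-free global optimizers like this are slow compared to gradient methods, which is part of why such composite objectives see little off-the-shelf support.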
If you retort that you have a preference in mind which cannot be modeled by a loss function, even in principle, then all I can say is that such a preference is irrational. Don't blame me, that's just modus tollens from the above theorem. You are free to have such a preference (there is good empirical evidence that most preferences that people actually hold are irrational in one way or another) but you will not find much literature on solving such problems in a formal regression context.
$endgroup$
$begingroup$
"People prefer to study tractable problems so you won't find much off-the-shelf software that does this" Such software is exactly what I'm looking for. Suggestions? Thx.
$endgroup$
– user332577
6 hours ago
$begingroup$
It all depends on the form your modified loss function takes. If your modified loss function is still differentiable, you can use a stochastic gradient descent optimizer like Adam, for which many implementations exist. If you don't have a gradient but believe the function is still smooth, you can use Bayesian optimization. And for really intractable metrics like accuracy (the 0/1 loss function), you would need to do something like branch-and-bound.
$endgroup$
– olooney
5 hours ago
$begingroup$
But is accuracy the only important virtue of a model?
The practical aspects of what a model's for is too nuanced for a theoretical discussion. Interpretation and generalizability come to mind. "Who will use this model?" should be a top line question in all statistical analyses.
Friedman's statement is defensible in a classical statistics framework: we have no fundamental reason to object to black-box prediction. If you want a Y-hat that's going to be very close to the Y you observe in the future, then "build your model as big as a house" as John Tukey would say.
This does not excuse overfitting, unless your question is ill defined [plural "you" being the proverbial statistician]. Overfitting is too often the result of analysts who encounter data rather than approach it. By going through hundreds of models and picking "the best", you are prone to build models that lack generality. We see it all the time.
Inadvertantly you also ask a different question: "Alternatives to minimizing loss in regression". Minimax estimators, estimators that minimize a loss function, are a subset of the class of method of moments estimators: estimators that give a zero to an estimating function. The goal of the MOMs is to find an unbiased estimator, rather than one that minimizes a loss.
$endgroup$
1
$begingroup$
Thanks. "Alternatives" wasn't an inadvertent word, it is the point of the question. Apologies for any imprecision in specification. Based on your comment, it still sounds like MOM estimators optimize a single metric whether loss or unbiasedness, correct? I'm interested in hearing about functions that optimize multiple metrics. Thx.
$endgroup$
– user332577
6 hours ago
$begingroup$
@user332577 well that's probably a fool's errand because Lagrange theory would state to optimize multiple losses, you would have to optimize a single loss of any linear combination of all the losses, but that's just not going to give a solution because the quadratic and log loss are unique. Lastly, that's not really an alternative to optimization because it's still optimization. MOMs don't have to optimize anything at all. That's why they're more general.
$endgroup$
– AdamO
6 hours ago
$begingroup$
That's a pretty rule-bound theoretical rationale. For instance, optimize a single loss of any linear combination...why linear combinations? What about nonlinear combinations? Are you really claiming that there is absolutely no way to break out of the straitjacket you're imposing?
$endgroup$
– user332577
4 hours ago
add a comment |
$begingroup$
But is accuracy the only important virtue of a model?
The practical aspects of what a model's for is too nuanced for a theoretical discussion. Interpretation and generalizability come to mind. "Who will use this model?" should be a top line question in all statistical analyses.
Friedman's statement is defensible in a classical statistics framework: we have no fundamental reason to object to black-box prediction. If you want a Y-hat that's going to be very close to the Y you observe in the future, then "build your model as big as a house" as John Tukey would say.
This does not excuse overfitting, unless your question is ill defined [plural "you" being the proverbial statistician]. Overfitting is too often the result of analysts who encounter data rather than approach it. By going through hundreds of models and picking "the best", you are prone to build models that lack generality. We see it all the time.
Inadvertantly you also ask a different question: "Alternatives to minimizing loss in regression". Minimax estimators, estimators that minimize a loss function, are a subset of the class of method of moments estimators: estimators that give a zero to an estimating function. The goal of the MOMs is to find an unbiased estimator, rather than one that minimizes a loss.
$endgroup$
1
$begingroup$
Thanks. "Alternatives" wasn't an inadvertent word, it is the point of the question. Apologies for any imprecision in specification. Based on your comment, it still sounds like MOM estimators optimize a single metric whether loss or unbiasedness, correct? I'm interested in hearing about functions that optimize multiple metrics. Thx.
$endgroup$
– user332577
6 hours ago
$begingroup$
@user332577 well that's probably a fool's errand because Lagrange theory would state to optimize multiple losses, you would have to optimize a single loss of any linear combination of all the losses, but that's just not going to give a solution because the quadratic and log loss are unique. Lastly, that's not really an alternative to optimization because it's still optimization. MOMs don't have to optimize anything at all. That's why they're more general.
$endgroup$
– AdamO
6 hours ago
$begingroup$
That's a pretty rule-bound theoretical rationale. For instance, optimize a single loss of any linear combination...why linear combinations? What about nonlinear combinations? Are you really claiming that there is absolutely no way to break out of the straitjacket you're imposing?
$endgroup$
– user332577
4 hours ago
add a comment |
$begingroup$
But is accuracy the only important virtue of a model?
The practical aspects of what a model's for is too nuanced for a theoretical discussion. Interpretation and generalizability come to mind. "Who will use this model?" should be a top line question in all statistical analyses.
Friedman's statement is defensible in a classical statistics framework: we have no fundamental reason to object to black-box prediction. If you want a Y-hat that's going to be very close to the Y you observe in the future, then "build your model as big as a house" as John Tukey would say.
This does not excuse overfitting, unless your question is ill defined [plural "you" being the proverbial statistician]. Overfitting is too often the result of analysts who encounter data rather than approach it. By going through hundreds of models and picking "the best", you are prone to build models that lack generality. We see it all the time.
Inadvertantly you also ask a different question: "Alternatives to minimizing loss in regression". Minimax estimators, estimators that minimize a loss function, are a subset of the class of method of moments estimators: estimators that give a zero to an estimating function. The goal of the MOMs is to find an unbiased estimator, rather than one that minimizes a loss.
$endgroup$
But is accuracy the only important virtue of a model?
The practical aspects of what a model's for is too nuanced for a theoretical discussion. Interpretation and generalizability come to mind. "Who will use this model?" should be a top line question in all statistical analyses.
Friedman's statement is defensible in a classical statistics framework: we have no fundamental reason to object to black-box prediction. If you want a Y-hat that's going to be very close to the Y you observe in the future, then "build your model as big as a house" as John Tukey would say.
This does not excuse overfitting, unless your question is ill defined [plural "you" being the proverbial statistician]. Overfitting is too often the result of analysts who encounter data rather than approach it. By going through hundreds of models and picking "the best", you are prone to build models that lack generality. We see it all the time.
Inadvertantly you also ask a different question: "Alternatives to minimizing loss in regression". Minimax estimators, estimators that minimize a loss function, are a subset of the class of method of moments estimators: estimators that give a zero to an estimating function. The goal of the MOMs is to find an unbiased estimator, rather than one that minimizes a loss.
answered 9 hours ago
AdamOAdamO
37.7k2 gold badges68 silver badges151 bronze badges
37.7k2 gold badges68 silver badges151 bronze badges
1
$begingroup$
Thanks. "Alternatives" wasn't an inadvertent word, it is the point of the question. Apologies for any imprecision in specification. Based on your comment, it still sounds like MOM estimators optimize a single metric whether loss or unbiasedness, correct? I'm interested in hearing about functions that optimize multiple metrics. Thx.
$endgroup$
– user332577
6 hours ago
$begingroup$
@user332577 well that's probably a fool's errand because Lagrange theory would state to optimize multiple losses, you would have to optimize a single loss of any linear combination of all the losses, but that's just not going to give a solution because the quadratic and log loss are unique. Lastly, that's not really an alternative to optimization because it's still optimization. MOMs don't have to optimize anything at all. That's why they're more general.
$endgroup$
– AdamO
6 hours ago
$begingroup$
That's a pretty rule-bound theoretical rationale. For instance, optimize a single loss of any linear combination...why linear combinations? What about nonlinear combinations? Are you really claiming that there is absolutely no way to break out of the straitjacket you're imposing?
$endgroup$
– user332577
4 hours ago
add a comment |
1
$begingroup$
Thanks. "Alternatives" wasn't an inadvertent word, it is the point of the question. Apologies for any imprecision in specification. Based on your comment, it still sounds like MOM estimators optimize a single metric whether loss or unbiasedness, correct? I'm interested in hearing about functions that optimize multiple metrics. Thx.
$endgroup$
– user332577
6 hours ago
$begingroup$
@user332577 well that's probably a fool's errand because Lagrange theory would state to optimize multiple losses, you would have to optimize a single loss of any linear combination of all the losses, but that's just not going to give a solution because the quadratic and log loss are unique. Lastly, that's not really an alternative to optimization because it's still optimization. MOMs don't have to optimize anything at all. That's why they're more general.
$endgroup$
– AdamO
6 hours ago
$begingroup$
That's a pretty rule-bound theoretical rationale. For instance, optimize a single loss of any linear combination...why linear combinations? What about nonlinear combinations? Are you really claiming that there is absolutely no way to break out of the straitjacket you're imposing?
$endgroup$
– user332577
4 hours ago
1
1
$begingroup$
Thanks. "Alternatives" wasn't an inadvertent word, it is the point of the question. Apologies for any imprecision in specification. Based on your comment, it still sounds like MOM estimators optimize a single metric whether loss or unbiasedness, correct? I'm interested in hearing about functions that optimize multiple metrics. Thx.
$endgroup$
– user332577
6 hours ago
$begingroup$
Rational choice theory says that any rational preference can be modeled with a utility function.
Therefore any (rational) decision process can be encoded in a loss function and posed as an optimization problem.
For example, L1 and L2 regularization can be viewed as encoding a preference for smaller parameters or more parsimonious models into the loss function. Any preference can be similarly encoded, assuming it's not irrational.
For example, suppose one wanted a regression function whose fit was judged not just by loss minimization but also by maximizing, say, some measure of nonlinear dependence or Shannon entropy.
Then you would adjust your utility function to include a term rewarding those quantities (equivalently, penalizing their absence), just as we did for L1/L2 regularization.
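A minimal sketch of this encoding (an assumed example, not from the answer; the data and names are illustrative): fit regression weights by minimizing squared error plus an extra preference term, here an L1 sparsity penalty, using a general-purpose optimizer, so any other computable preference term could be dropped into the same slot.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
true_w = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ true_w + rng.normal(scale=0.1, size=200)

def objective(w, alpha=0.1):
    fit = np.mean((y - X @ w) ** 2)         # standard squared-error loss
    preference = alpha * np.sum(np.abs(w))  # encoded preference for sparsity
    return fit + preference

# Nelder-Mead needs no gradient, so the preference term need not be smooth.
res = minimize(objective, x0=np.zeros(5), method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-8})
print(np.round(res.x, 2))
```

The recovered weights are close to the true nonzero coefficients, slightly shrunk toward zero by the penalty; swapping `preference` for a different term changes the encoded preference without touching the optimizer.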
Now, this might make the problem computationally intractable; for example, 0/1 loss is known to result in an NP-hard problem. In general, people prefer to study tractable problems so you won't find much off-the-shelf software that does this, but nothing is stopping you from writing down such a function and applying some sufficiently generalized optimizer to it.
If you retort that you have a preference in mind which cannot be modeled by a loss function, even in principle, then all I can say is that such a preference is irrational. Don't blame me, that's just modus tollens from the above theorem. You are free to have such a preference (there is good empirical evidence that most preferences that people actually hold are irrational in one way or another) but you will not find much literature on solving such problems in a formal regression context.
$endgroup$
$begingroup$
"People prefer to study tractable problems so you won't find much off-the-shelf software that does this" Such software is exactly what I'm looking for. Suggestions? Thx.
$endgroup$
– user332577
6 hours ago
$begingroup$
It all depends on the form your modified loss function takes. If your modified loss function is still differentiable, you can use a stochastic gradient descent optimizer like Adam, for which many implementations exist. If you don't have a gradient but believe the function is still smooth, you can use Bayesian optimization. And for really intractable metrics like accuracy (the 0/1 loss), you would need to do something like branch-and-bound.
$endgroup$
– olooney
5 hours ago
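A sketch of the first case mentioned in the comment above (an assumed illustration, not olooney's code): a hand-rolled Adam update in plain NumPy minimizing a differentiable modified loss, here MSE plus an L2 preference term.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(scale=0.05, size=100)

def grad(w, lam=0.01):
    # gradient of mean squared error + lam * ||w||^2
    return -2 * X.T @ (y - X @ w) / len(y) + 2 * lam * w

w = np.zeros(3)
m, v = np.zeros(3), np.zeros(3)          # Adam first/second moment estimates
beta1, beta2, lr, eps = 0.9, 0.999, 0.05, 1e-8
for t in range(1, 2001):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat, v_hat = m / (1 - beta1 ** t), v / (1 - beta2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
print(np.round(w, 2))
```

Any differentiable extra term can replace the `lam * ||w||^2` piece; only `grad` changes, the update loop stays the same.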
edited 8 hours ago
answered 9 hours ago
olooney
2,228 • 9 silver badges • 20 bronze badges
$begingroup$
I think your question is missing a crucial detail. Friedman's statement has the "if ... then" format. You ask about the "then" portion, but never state whether you are accepting the "if" portion as a premise. That is, are we assuming for the purpose of writing an answer that "the sole purpose of the regression analysis is to obtain a rule for predicting future values of the response $y$, given values for the covariates $(x_1, \dots, x_n)$"? Because a model might forego that premise in exchange for, e.g., interpretability of some phenomenon.
$endgroup$
– Sycorax
9 hours ago
$begingroup$
@Sycorax Apologies for any ambiguity. The question is more general than Friedman's "if". To your point, I'm interested in learning about "model(s) that might forego that premise in exchange for, e.g., interpretability". Thx.
$endgroup$
– user332577
6 hours ago