Alternatives to minimizing loss in regression


We know that loss (error) minimization originated with Gauss around the turn of the 19th century.



More recently Friedman* extolled its virtues for use in predictive modeling:




“The aim of regression analysis is to use the data to construct a function $\hat{f}(x)$ that can serve as a reasonable approximation of $f(x)$ over the domain $D$ of interest. The notion of reasonableness depends on the purpose for which the approximation is to be used. In nearly all applications, however, accuracy is important... if the sole purpose of the regression analysis is to obtain a rule for predicting future values of the response $y$, given values for the covariates $(x_1, \dots, x_n)$, then accuracy is the only important virtue of the model...”




But is accuracy the only important virtue of a model?



My question concerns whether loss minimization is the only way to evaluate a regression function, whatever the fitting approach: OLS, maximum likelihood, gradient descent, quadratic (squared-error) loss, least absolute deviation, or anything else.



Clearly, it's possible to evaluate multiple metrics beyond loss a posteriori, i.e., after models have been built. But are there multivariate functions which evaluate multiple metrics a priori or during the optimization phase?



For example, suppose one wanted a regression function whose fit was optimized not just by minimizing loss but also by maximizing some other criterion, such as a measure of nonlinear dependence or Shannon entropy.



Are such objective functions available, and/or have these issues been researched?



*Friedman, J. H. (1991). Multivariate Adaptive Regression Splines. The Annals of Statistics, 19(1), 1-67.










Tags: regression, error

asked 9 hours ago by user332577

  • I think your question is missing a crucial detail. Friedman's statement has the "if ... then" format. You ask about the "then" portion, but never state whether you are accepting the "if" portion as a premise. That is, for the purpose of writing an answer, are we taking as given that "the sole purpose of the regression analysis is to obtain a rule for predicting future values of the response $y$, given values for the covariates $(x_1, \dots, x_n)$"? Because a model might forego that premise in exchange for, e.g., interpretability of some phenomenon. – Sycorax, 9 hours ago











  • @Sycorax Apologies for any ambiguity. The question is more general than Friedman's "if". To your point, I'm interested in learning about "model(s) that might forego that premise in exchange for, e.g., interpretability...". Thx. – user332577, 6 hours ago

















2 Answers
Answer by AdamO (score 3, answered 9 hours ago):

But is accuracy the only important virtue of a model?




The practical aspects of what a model is for are too nuanced for a theoretical discussion. Interpretation and generalizability come to mind. "Who will use this model?" should be a top-line question in all statistical analyses.



Friedman's statement is defensible in a classical statistics framework: we have no fundamental reason to object to black-box prediction. If you want a $\hat{Y}$ that is going to be very close to the $Y$ you observe in the future, then, as John Tukey would say, "build your model as big as a house".



This does not excuse overfitting, unless your question is ill-defined [the plural "you" here being the proverbial statistician]. Overfitting is too often the result of analysts who encounter data rather than approach it. By going through hundreds of models and picking "the best", you are prone to building models that lack generality. We see it all the time.



Inadvertently, you also ask a different question: "Alternatives to minimizing loss in regression". Minimax estimators, i.e., estimators that minimize a loss function, are a subset of the class of method-of-moments estimators: estimators that set an estimating function to zero. The goal of MOM estimators is to find an unbiased estimator, rather than one that minimizes a loss.
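
A minimal sketch (Python, not part of the original answer; the data and function names are illustrative) of the two characterizations for ordinary least squares: the same estimator can be reached either by minimizing a loss or by setting an estimating function to zero, which is the method-of-moments view described above.

    import numpy as np
    from scipy.optimize import minimize, fsolve

    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(200), rng.normal(size=200)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=200)

    # (a) Loss-minimization view: choose beta to minimize squared error.
    loss = lambda beta: np.sum((y - X @ beta) ** 2)
    beta_loss = minimize(loss, x0=np.zeros(2)).x

    # (b) Estimating-equation (method-of-moments) view: choose beta so that the
    #     moment condition X'(y - X beta) = 0 holds; no loss is explicitly minimized.
    estimating_equation = lambda beta: X.T @ (y - X @ beta)
    beta_mom = fsolve(estimating_equation, x0=np.zeros(2))

    print(beta_loss, beta_mom)  # both recover roughly [1.0, 2.0] on this toy data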






  • Thanks. "Alternatives" wasn't an inadvertent word; it is the point of the question. Apologies for any imprecision in the specification. Based on your comment, it still sounds like MOM estimators optimize a single metric, whether loss or unbiasedness, correct? I'm interested in hearing about functions that optimize multiple metrics. Thx. – user332577, 6 hours ago











  • @user332577 Well, that's probably a fool's errand: Lagrange theory would say that to optimize multiple losses you would have to optimize a single loss that is a linear combination of all the losses, but that's just not going to give a solution, because the quadratic and log loss are unique. Lastly, that's not really an alternative to optimization, because it's still optimization. MOM estimators don't have to optimize anything at all; that's why they're more general. – AdamO, 6 hours ago











  • That's a pretty rule-bound theoretical rationale. For instance, "optimize a single loss of any linear combination": why linear combinations? What about nonlinear combinations? Are you really claiming that there is absolutely no way to break out of the straitjacket you're imposing? – user332577, 4 hours ago


















Answer by olooney (score 2, answered 9 hours ago, edited 8 hours ago):

Rational choice theory says that any rational preference can be modeled with a utility function.



Therefore any (rational) decision process can be encoded in a loss function and posed as an optimization problem.



For example, L1 and L2 regularization can be viewed as encoding a preference for smaller parameters or more parsimonious models into the loss function. Any preference can be similarly encoded, assuming it's not irrational.




For example, suppose one wanted to use a regression function which optimized fit based not just on loss minimization but which also maximized nonlinear dependence or Shannon entropy?




Then you would adjust your utility function to include a term penalizing those things, just as we did for L1/L2 regularization.



Now, this might make the problem computationally intractable; for example, the 0/1 loss is known to result in an NP-hard problem. In general, people prefer to study tractable problems, so you won't find much off-the-shelf software that does this, but nothing is stopping you from writing down such a function and applying a sufficiently general optimizer to it.
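
As a minimal sketch of that last point (Python, not code from the answer; the design matrix, response, and the weight lam are made up for illustration), one can write squared error plus an extra preference term as a single objective and hand it to a general-purpose optimizer such as scipy.optimize.minimize:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.5, 0.0, -2.0]) + rng.normal(scale=0.3, size=100)

    lam = 0.5  # weight given to the secondary preference

    def composite_loss(beta):
        fit = np.sum((y - X @ beta) ** 2)   # ordinary squared-error term
        preference = np.sum(beta ** 2)      # e.g. an L2 "small parameters" preference;
                                            # any other computable criterion could go here
        return fit + lam * preference

    beta_hat = minimize(composite_loss, x0=np.zeros(3)).x  # default quasi-Newton method
    print(beta_hat)

Swapping the preference term for an entropy-like or dependence-based criterion changes nothing structurally; it only affects how hard the resulting problem is to optimize.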



If you retort that you have a preference in mind which cannot be modeled by a loss function, even in principle, then all I can say is that such a preference is irrational. Don't blame me, that's just modus tollens from the above theorem. You are free to have such a preference (there is good empirical evidence that most preferences that people actually hold are irrational in one way or another) but you will not find much literature on solving such problems in a formal regression context.






  • "People prefer to study tractable problems so you won't find much off-the-shelf software that does this." Such software is exactly what I'm looking for. Suggestions? Thx. – user332577, 6 hours ago











  • It all depends on the form your modified loss function takes. If your modified loss function is still differentiable, you can use a stochastic gradient descent optimizer like Adam, for which many implementations exist. If you don't have a gradient but believe the function is still smooth, you can use Bayesian optimization. And for really intractable metrics like accuracy (0/1 loss), you would need to do something like branch-and-bound. – olooney, 5 hours ago
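
Following up on the comment above, a minimal sketch (Python; all data and names are illustrative, not from the comment) of the gradient-free case: a piecewise-linear loss with no gradient at its kinks handed to Nelder-Mead, one of the derivative-free methods available through scipy.optimize.minimize:

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(2)
    X = rng.normal(size=(100, 2))
    y = X @ np.array([1.0, -1.0]) + rng.normal(scale=0.3, size=100)

    def nonsmooth_loss(beta):
        # least absolute deviations plus an L1 preference: piecewise linear,
        # so no gradient exists at the kinks
        return np.sum(np.abs(y - X @ beta)) + 0.1 * np.sum(np.abs(beta))

    beta_hat = minimize(nonsmooth_loss, x0=np.zeros(2), method="Nelder-Mead").x
    print(beta_hat)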












