323 Comments

> And then generalize further to the entire world population over all of human history, and it stops holding again, because most people are cavemen who eat grubs and use shells for money, and having more shells doesn’t make it any easier to find grubs.

This is inaccurate. The numbers are pretty fuzzy but I find reputable-looking estimates (e.g. https://www.ined.fr/en/everything_about_population/demographic-facts-sheets/faq/how-many-people-since-the-first-humans/) that roughly 50% of humans who ever lived were born after 1 AD.


I'm pretty confused by this kind of attitude. To be quite frank I think it's in-group protectionism.

I'll start off by saying I think most psych studies are absolute garbage and aella's is no worse. But that doesn't mean aella's are _good_.

In particular, aella's studies are often related to extremely sensitive topics like sex, gender, wealth, etc. She's a self-proclaimed "slut" who posts nudes on the internet. Of course the people who answer these kinds of polls _when aella posts them_ are heavily biased relative to the population!

I think drawing conclusions about sex, gender, and other things from aella's polls is at least as fraught as drawing those conclusions from college freshmen. If you did a poll on marriage and divorce rates among college-educated people you would get wildly different results than at the population level. I don't see how this is any different from aella's polls.


If smart people eat bananas because they know they are good for their something something potassium then we should be skeptical about the causal language in your putative study title. Perhaps something more like "Study finds Higher IQ People Eat More Bananas" would be more amenable to asterisking caveats and less utterly and completely false and misleading.


I think the real difference here is that the studies are doing hypothesis testing, while the surveys are trying to get more granular information.

I mean you have a theory that bananas -> potassium -> some mechanism -> higher IQ, and you want to check if it is right, so you ask yourself how does the world look different if it is right versus if it is wrong. And you conclude that if it is correct, then in almost any population you should see a modest correlation between banana consumption and IQ, whereas the null hypothesis would be little to no correlation. So if you check basically any population for correlation and find it, it is evidence (at least in the Bayesian sense) in favor of your underlying theory.

On the other hand, if you were trying to pin down the strength of the effect (in terms of IQ points/ banana/ year or something), then measuring a correlation for just psych 101 students really might not generalize well to the human population as a whole. In fact, you'd probably want to do a controlled study rather than a correlational one.


I agree that most people rush to "selection bias" too quickly as a trump card that invalidates any findings (up there with "correlation doesn't mean causation"). However, I disagree that "polls vs. correlations" is the right lens to look at this through (after all, polls are mostly just discovering correlations as well).

The problem is not the nature of the hypotheses or even the rigor of the research so much as whether the method by which the units were selected was itself correlated with the outcome of interest (i.e., selecting on the dependent variable). In those cases, correlations will often be illusory at best, or in the wrong direction at worst.


What do you all think about the dominance of Amazon’s Mechanical Turk in finding people for studies? Has it worsened studies by only drawing from the same pool over and over?


"Selection bias is fine-ish if..."

I'm interpreting this as saying that one's prior on a correlation not holding for the general population should be fairly low. But it seems like a correlation being interesting enough to hear about should be a lot of evidence in favour of the correlation not holding, because if the correlation holds, it's more likely (idk by how much, but I think by enough) to be widely known -> a lot less interesting, so you don't hear about it.

As an example, I run a survey on my blog, Ex-Translocated, with a thousand readers, a significant portion of whom come from the rationality community. I have 9 innocuous correlations I'm measuring which give me exactly the information that common sense would expect, and one correlation between "how much time have you spent consuming self-help resources?" and "how much have self-help resources helped you at task X?" which is way higher than what common sense would naively expect. The rest of my correlations are boring and nobody hears about them except for my 1,000 readers, but my last correlation goes viral on pseudoscience Twitter, which assumes this generalises to all self-help when it doesn't and uses it to justify actually unhelpful self-help. (If you feel the desire to nitpick this example you can probably generate another.)

I agree that this doesn't mean one ought to dismiss every such correlation out of hand, but I feel like it does mean that if I hear about an interesting correlation from a survey or psych study in a context where I didn't also previously hear about the survey/study's intention to investigate said correlation (preregistration alone doesn't fix this, because of memetic selection effects), I should ignore it unless I know enough to speculate about the actual causal mechanisms behind that correlation.

This pretty much just bottoms out in "either trust domain experts or investigate every result of a survey/every study in the literature" which seems about right to me. So when someone e.g. criticises Aella for trying to run a survey at all to figure things out, that's silly, but it's also true that if one of Aella's tweets talking about an interesting result goes viral, they should ignore it, and this does seem like the actual response of most people to crazy-sounding effects; if anything, people seem to take psych studies too seriously rather than not taking random internet survey results seriously enough.
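To make the selection-for-interestingness dynamic concrete, here is a minimal Python sketch (all numbers invented): even when every true correlation is exactly zero, the one result per survey that is extreme enough to spread will systematically look like a finding.

```python
import numpy as np

# Hypothetical setup: many survey-runners each measure 10 correlations on pure
# noise; from each survey, only the single most extreme correlation "goes viral".
rng = np.random.default_rng(0)
n_surveys, n_questions, n_readers = 1000, 10, 1000

viral_rs = []
for _ in range(n_surveys):
    answers = rng.normal(size=(n_readers, n_questions + 1))  # no real relationships at all
    rs = [np.corrcoef(answers[:, 0], answers[:, k])[0, 1]
          for k in range(1, n_questions + 1)]
    viral_rs.append(max(rs, key=abs))  # only the most surprising result spreads

print(f"average |r| of the results you hear about: {np.mean(np.abs(viral_rs)):.3f}")
print("true correlation behind every one of them: 0.000")
```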


Like any kind of bias, selection bias matters when the selection process is correlated with BOTH the independent and dependent variables and as such represents a potential confounder. Study design is how you stop selection bias from making your study meaningless.


The way I think about the key difference here (which I learned during some time doing pharma research, where these kinds of issues are as bad as... well) is that when claiming that a correlation doesn't generalize, some of the *burden of proof* shifts to the person criticizing the result. Decent article reviewers were pretty good at this: giving an at least plausible-sounding mechanism by which, in a different population, some *additional* effect would cancel or reverse the correlation. It's the fact that a failure of the correlation to generalize requires this extra mechanism that goes against Occam's Razor.


It's not about correlations, it's about the supposed causal mechanism. Your Psych 101 sample is fine if you are dealing with cognitive factors that you suppose are universal. If you're dealing with social or motivational ones, then you're perhaps going to be in danger of making a false generalization. This is particularly disastrous in educational contexts because of the wide variety of places and populations involved in school learning. It really does happen all the time, and the only solution is for researchers to really know the gamut of contexts (so that they realize how universal their mechanisms are likely to be) and make the context explicit and clear instead of burying it in limitations (so that others have a chance of catching them in an over-generalization, if there is one). Another necessary shift is for people to simply stop looking for universal effects in the social sciences and instead expect heterogeneity.


“But real studies by professional scientists don’t have selection bias, because . . . sorry, I don’t know how their model would end this sentence.”

...because they control for demographics, is how they’d complete the sentence.

Generically, we know internet surveys are terrible for voting behavior. Whether they’re good for the kinds of things Aella uses them for is a good question!

I’m on the record in talks as saying “everything is a demand effect, and that’s OK.” I see surveys as eliciting not what a person thinks or feels, but what they are willing to say they think and feel in a context constructed by the survey. Aella is probably getting better answers about sexual desire (that’s her job, after all!) and better answers on basic cognition. Probably worse on consumer behavior, politics, and generic interpersonal.


>And then generalize further to the entire world population over all of human history, and it stops holding again, because most people are cavemen who eat grubs and use shells for money, and having more shells doesn’t make it any easier to find grubs.

I know this is somewhat tongue-in-cheek, but for accuracy's sake: the number of people who were born before widespread adoption of agriculture was on the order of 10 billion, vs. about 100 billion after. https://www.prb.org/articles/how-many-people-have-ever-lived-on-earth/


I am a professor of political science who does methodological research on the generalizability of online convenience samples. The gold standard of political science studies is indeed *random population samples* -- it's not the whole world, but it is the target population of American citizens. Yes this is getting harder and harder to do and yes imperfections creep in. But studies published in eg the august Public Opinion Quarterly are still qualitatively closer to "nationally representative" than are convenience samples, and Scott's flippancy here is I think a mistake.

My research is specifically about the limitations of MTurk (and other such online convenience samples) for questions related to digital media. My claim is that the mechanism of interest is "digital literacy" and that these samples are specifically biased to exclude low digital literacy people. That is, the people who can't figure out fake news on Facebook also can't figure out how to use MTurk, making MTurk samples almost uniquely bad for studying fake news.

(ungated studies: http://kmunger.github.io/pdfs/psrm.pdf

https://journals.sagepub.com/doi/full/10.1177/20531680211016968 )

This post is solid but it doesn't emphasize enough the crucial point: "If you’re right about the mechanism...". More generally, I think that there are good reasons that Scott's intuitions ('priors') about this are different from mine: medical mechanisms are less likely to be correlated with selection biases than are social scientific mechanisms.

There is a fundamental philosophy of science question at stake here. Can the study of a convenience sample *actually* test the mechanism of interest? As Scott says, there is always the possibility of eg collider bias (the relationship between family income and obesity "collides" in the sample of college students).

So how much evidence does a correlational convenience sample *actually* provide? This requires a qualitative call about "how good" the sample is for the mechanism at issue. And at that point, if we're making qualitative calls about our priors and about the "goodness" of the sample....can we really justify the quantitative rigor we're using in the study itself?

In other words: should a study of a given mechanism on a given convenience sample be "valid until proven otherwise"? Or "valid until hypothesized otherwise"? Or "Not valid until proven otherwise"? Or "Not valid until hypothesized otherwise"?


Is there a reason why you just wouldn't want to be somewhat specific with the headline of what you're publishing? So instead of "Study Finds Eating Bananas Raises IQ," you instead publish “Study Finds Eating Bananas Raises IQ in College Students," if they're all college students.


I think the important issue is whether the selection bias is plausibly highly correlated with the outcomes being measured. I think the reason people scream selection bias about internet polls is that frequently participation is selected for based on strong feelings about the issue under discussion.

So if you are looking for surprising correlations in a long poll (as you do with your yearly polls), that's less of an issue. But with the standard internet survey, the audience can either guess at the intended analysis and decide to participate based on their feelings about it, or they are drawn to the blogger/tweeter because of similar ways of understanding the world, and so are quite likely to share whatever features of the author prompted them to generate the hypothesis in the first place.

Choosing undergrads based on a desire for cash is likely to reduce the extent of these problems (unless it's a study looking at something about how much people will do for money).


Real scientists control for demographic effects when making generalizations outside the specifics of the dataset used. I'm confused why this article doesn't mention the practice - demographic adjustments are a well-understood phenomenon and Scott would have been exposed to them thousands of times in his career. And honestly, I think an argument can be made that the ubiquity of this practice in published science but its absence in amateur science mostly invalidates the thesis of this article, and I worry that Scott is putting on his metaphorical blinders due to his anger at being told off in his previous post for making this mistake.

This article does not feel like it was written in the spirit of objectivity and rationalism - it feels like an attempt at rationalization in order to avoid having to admit to something that would support Scott's outgroup.


(1) It's also worth noting that you can do a lot of sensitivity tests to see how far the results within your sample appear to be influenced by different subgroups which can help indicate where the unrepresentativeness of your sample might be a problem. IIRC the EA Survey does this a lot. This also helps with the question of whether an effect will generalise to other groups or whether, e.g. it only works in men.

Of course, this doesn't work for unobservables (ACX subscribers or Aella's Twitter readers are likely weird in ways that are not wholly captured by their observed characteristics, like their demographics).

(2) I think you are somewhat understating the potential power of "c) do a lot of statistical adjustments and pray", and with it the potential gap between an unrepresentative internet sample which you can and do statistically weight and one (like a Twitter poll) which you don't weight. Weighting very unrepresentative convenience samples can be extremely powerful in approximating the true population, while Twitter polls are almost always not going to be representative of it.
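For what it's worth, the simplest version of such weighting is easy to sketch. In the Python sketch below, the age groups, outcomes, and population shares are invented purely for illustration; real adjustment (e.g. raking over several variables at once) is more involved.

```python
import pandas as pd

# Hypothetical convenience sample: young respondents are heavily over-represented.
sample = pd.DataFrame({
    "age_group": ["18-34"] * 70 + ["35-64"] * 25 + ["65+"] * 5,
    "outcome":   [1] * 40 + [0] * 30 + [1] * 5 + [0] * 20 + [1] * 1 + [0] * 4,
})

# Known population shares (e.g. from a census), assumed here for illustration.
population_share = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

sample_share = sample["age_group"].value_counts(normalize=True)
sample["weight"] = sample["age_group"].map(lambda g: population_share[g] / sample_share[g])

raw = sample["outcome"].mean()
weighted = (sample["outcome"] * sample["weight"]).sum() / sample["weight"].sum()
print(f"raw sample estimate:      {raw:.2f}")
print(f"post-stratified estimate: {weighted:.2f}")
```

Of course, this only fixes imbalance on the variables you weight on, which is exactly the "unobservables" caveat from point (1).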


Seems like a good argument for rejecting studies done on Psych 101 undergrads, not for accepting surveys done on highly idiosyncratic groups of blog readers.


Since someone evaluating a claim can never know how many polls didn't show interesting results, the facts that real-world surveys are much more expensive to conduct and that online polls leave fewer variables under the survey giver's control (accepted practice is not to tell the undergrads what they're coming in for, and cash is the primary motivator in all of them) are a strong justification for treating online polls as less reliable.

In some sense the real selection bias is the bias in which polls you never hear about, but that is a good reason. Though it leads to an interesting epistemic situation where the survey giver may have no more reason to doubt their own poll than the academics polling undergrads do, while the people they inform about it do.


> It doesn’t look like saying “This is an Internet survey, so it has selection bias, unlike real-life studies, which are fine.”

Eh, this seems like a highly uncharitable gloss of the concern. I would summarize it more as "Selection (and other) biases are a wicked hard problem even for 'real-life' studies that try very hard to control for them; therefore, one might justly be highly suspicious of internet studies for which there were no such controls."

One good summary of the problem of bias in 'real-life' studies: https://peterattiamd.com/ns003/

The issue is always generalization. How much are you going to try to generalize beyond the sample itself? If not at all, then there is no problem. But, c'mon, the whole point of such surveys is that people do want to generalize from them.


So, this is kinda accurate, but I feel like you're underestimating the problems of selection bias in general. In particular, selection bias is a much bigger deal than I think you're realizing. The correlation coefficient between responding to polls and vote choice in 2016 was roughly 0.005 (Meng 2018, "Statistical Paradises and Paradoxes in Big Data"). That was enough to flip the outcome of the election. So for polls, even an R^2 of *.0025%* is enough to be disastrous. So yes, correlations are more resistant to selection bias, but that's not a very high bar.

Correlations are less sensitive, but selection effects can still matter a lot. As an example, consider that among students at any particular college, SAT reading and math scores will be strongly negatively correlated, despite being strongly positively correlated in the population as a whole: if a student had a higher score on both reading and math, they'd be going to a better college, after all, so we're effectively holding total SAT constant at any particular school.

So the question is, are people who follow Aella or read SSC as weird a population as a particular college's student body? I'd say yes. Of course though, it depends on the topic. For your mysticism result, I'm not worried, because IIRC you observe the same correlations in the GSS and NHIS--which get 60% response rates when sampling a random subset of the population. But I definitely wouldn't trust the magnitude, and I'd have made an attempt at poststratifying on at least a couple variables. Just weighting to the GSS+Census by race, income, religion, and education would probably catch the biggest problems.
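The SAT example above is easy to reproduce in a toy simulation (all coefficients invented): scores that are clearly positively correlated in the whole population come out strongly negative once you condition on a narrow band of total score, i.e. on attending one particular college.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
ability = rng.normal(size=n)
# Math and reading both load on general ability, plus independent noise.
math_score    = 600 + 80 * (0.8 * ability + 0.6 * rng.normal(size=n))
reading_score = 600 + 80 * (0.8 * ability + 0.6 * rng.normal(size=n))

total = math_score + reading_score
one_college = (total > 1350) & (total < 1400)  # admits only a narrow band of total SAT

print(f"corr(math, reading), whole population: {np.corrcoef(math_score, reading_score)[0, 1]:+.2f}")
print(f"corr(math, reading), at one college:   "
      f"{np.corrcoef(math_score[one_college], reading_score[one_college])[0, 1]:+.2f}")
```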


If this is about the last article, your general point is correct, but you polled a readership that's notoriously hostile to spirituality to determine whether mental health correlates with spirituality. It'd be like giving Mensa folks a banana and measuring their IQ. You selected specifically for one of the variables, and that's likely to introduce confounders.


Going to Aella's tweet that was linked:

> using it as a way to feel superior to studies, than judiciously using it as criticism when it's needed

just because people use selection bias as a way to feel superior to studies doesn't mean that the study isn't biased in the first place

and

> But real studies by professional scientists don’t have selection bias, because...

this ignores the fact that professional studies control for selection bias, or at least have a section in the paper specifying who the participants were, unlike Twitter polls


Selection bias can and absolutely does break correlations, frequently. The most obvious way is through colliders (http://www.the100.ci/2017/03/14/that-one-weird-third-variable-problem-nobody-ever-mentions-conditioning-on-a-collider/) - but there's tons of other ways in which this can happen: the mathematical conditions that have to hold for a correlation to generalize to a larger population when you are observing it in a very biased subset are pretty strict.

Further: large sample sizes do help, but they do not help very much. There is a very good paper, requiring only fairly basic math, that tackles the problem of bias in surveys: https://statistics.fas.harvard.edu/files/statistics-2/files/statistical_paradises_and_paradoxes.pdf (note: this is not specifically about correlations, but the problem is closely related). Here is the key finding:

Estimates obtained from the Cooperative Congressional Election Study (CCES) of the 2016 US presidential election suggest a ρ_{R,X} ≈ −0.005 for self-reporting to vote for Donald Trump. Because of LLP, this seemingly minuscule data defect correlation implies that the simple sample proportion of the self-reported voting preference for Trump from 1% of the US eligible voters, that is, n ≈ 2,300,000, has the same mean squared error as the corresponding sample proportion from a genuine simple random sample of size n ≈ 400, a 99.98% reduction of sample size (and hence our confidence).

And keep in mind - this is in polling, which 'tries' to obtain a somewhat representative sample (ie, this sample is significantly less biased than a random internet sample).
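Plugging the quoted numbers back into the paper's error decomposition (estimation error = data-defect correlation × sqrt((N − n)/n) × population SD) reproduces the n ≈ 400 figure; N here is assumed to be the 2016 US eligible-voter population, i.e. about 100x the 1% sample in the quote.

```python
# Setting the biased sample's MSE equal to sigma^2 / n_eff for a simple random
# sample gives n_eff ≈ n / (rho^2 * (N - n)).
n = 2_300_000   # the 1% non-random sample from the quote
N = 100 * n     # assumed: ~230 million US eligible voters in 2016
rho = 0.005     # data-defect correlation for self-reported Trump vote

n_eff = n / (rho**2 * (N - n))
print(f"effective simple-random-sample size: {n_eff:.0f}")  # ≈ 400
```

That is the ~99.98% effective-sample-size reduction mentioned in the quote.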


Looking at Aella's data & use of it, I don't have the same concerns I may have about the SSC survey used on religious issues.

So this chart, for example:

https://twitter.com/Aella_Girl/status/1607641197870186497

I am not aware of a likely rationale for these results to change because of the selection effect, specifically on the axis studied. I might raise selection-effects concerns if the specific slope were the question, but not for the mere presence of a slope.

Further, I don't get the sense that Aella is trying to turn a very messy and vague original problem statement into something to refute without providing a number of caveats.

It is valid to push back that selection effects are everywhere. It is valid to argue that SSC data has some evidentiary value, and that as good Bayesians we should use it as evidence. But the tone of the post does not hit the right note to keep that argument from being rejected.

However, to push back on the push-back: I would seriously try to assess whether you have difficulty dealing with disagreement or challenges. Not to psychologize this too much, but is this post actually trying to raise the discourse? Or is it just trying to nullify criticism? Are you steelmanning the concern, or merely rebutting it?


An interesting solution to the problem that surveys are so easy to give online (creating strong publication/heard-of bias) would be to set up a website where poll givers have to post a certain-sized donation (say, to GiveWell) in order to run the survey. That would duplicate the effect of offline polls being expensive to run, thereby reducing publication bias.


I’ve been thinking a possible online business / salve for democracy would be “a weekly election on what matters most to you.” Basically like a Twitter poll but slightly less crazy.

If people volunteer their demographic info, this would be very valuable for customers like businesses and politicians. End users get the satisfaction of someone somewhere finally listening.


I'm sympathetic to pushing back on lazy criticism, but also I think the context of how the result was produced is very important for calibrating how strongly one can take it as evidence. It's certainly true that all surveys are inherently "flawed" due to selection bias issues. There's a few ways to proceed from this:

(1) Throw up one's hands, declare the truth unknowable, and post a picture of an airplane wing with bullet holes.

(2) Acknowledge that this survey, like all surveys, is imperfect. But hey, the result sure is interesting, it makes some kind of intuitive sense, and there's no obvious reason why it really shouldn't generalize. Take the exact numbers with a grain of salt and hope that the first order effect dominates, as it often does.

(3) Do a lot of careful statistical analysis to attempt to correct for unrepresentative aspects of the sample. Compare results to literature for previous research into related questions. Submit to peer review and respond to critical feedback. Attempt to replicate.

Response (1) is the kind of lazy critique that this post argues against, and I agree that it is poor form and doesn't contribute much. Response (2) is reasonable for generating hypotheses and building intuition about the world, but it will also lead you astray a nontrivial fraction of the time. Response (3) is closer to what a professional researcher would do, but it takes a lot more time and expertise and will still be wrong sometimes.

I think the interesting conflict comes from conflating (2) and (3). Someone accustomed to (3) may look at people doing (2) as naïve and out of their depth, and also as dilutive to more rigorous work because it may look the same to undiscerning lay people. Meanwhile, someone doing (2) may look at people demanding (3) as gatekeepers with excessive demands for rigor whose preferred methods aren't exactly bulletproof either. This could easily degenerate into a toxic discourse where people just yell past each other. But provided they are given appropriate context, I think both (2) and (3) can be useful ways to build knowledge about the world. Rigor is useful, but it's not a binary where everything insufficiently rigorous must be discarded as useless and anything that meets the bar accepted as eternal truth.


"But generalize to the entire US population, and poor people will be more obese, because they can’t afford healthy food / don’t have time to exercise / possible genetic correlations."

And, to be impolite, because many of the same things that make them more likely to be poor make them more likely to be obese: lower intelligence, less ability to defer gratification, less ability to plan and follow through, etc.


Well, sure. I think the steelman argument is that selection bias is often much worse for a survey on the internet than for a Psych 101 study. No psych professor has to worry about whether their respondents are all horny, always-online boys because they recruited by posting nudes on Twitter, or whether they're all participating in the study just to fuck with someone's results.

Also, your banana study title is killing me. It shows correlation, not causation, and as we all know…


When I was diagnosed with pancreatitis, I immediately searched the internet for information. Unfortunately, the first serious-looking research paper I found declared the ailment had a 60% survival rate in five years.

I didn't like that one bit, so I kept looking. After a couple weeks I found another paper that declared the five-year survival rate was over 90%. I liked that paper a lot better.

Seven years on, my survival rate is 100%. So, is my confirmation bias confirmed?


I wonder if the mere fact that you restrict the sample on the x axis, or the y axis, causes the correlation between the x and y variables to be completely different than in the general population.

For example: suppose that psychology students never eat less than one banana per year - other than that they do not have any fancy physiology or mental properties - wouldn't that alone restrict the "elliptic" picture of the x-y correlation to a fragment in which this ellipse has a particular slope?

I've made a tool to help me visualize this:

https://codepen.io/qbolec/pen/qBybXQe

in this demo there are two variables:

X is a normal variable with mean=0 and variance=1

Y depends on X, in that it is a Gaussian with mean=X*0.3 and variance=1

So, we expect the correlation to be positive, because the higher the X, the higher the Y in general and indeed the white dots form a slanted elliptic cloud. And the correlation in general population seems to be ~0.29.

But if we restrict the picture to the green zone in the upper right corner of the ellipse, I sometimes get negative correlation for such sub-sample, and I never get close to 0.3.

(Sorry, I could not get this demo to robustly show the negative value, though)

IIRC the https://www.lesswrong.com/posts/dC7mP5nSwvpL65Qu5/why-the-tails-come-apart was about this phenomenon.
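For anyone who doesn't want to click through, here is a rough Python equivalent of that demo (the exact cutoff for the "green zone" is a guess): restricting to the upper-right corner of the cloud drags the observed correlation far below the full-population value of about 0.29.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)            # X ~ N(0, 1)
y = 0.3 * x + rng.normal(size=100_000)  # Y ~ N(0.3 * X, 1)

corner = (x > 1) & (y > 1)  # roughly the "green zone" in the upper-right corner

print(f"correlation, whole population:   {np.corrcoef(x, y)[0, 1]:.2f}")  # ~0.29
print(f"correlation, upper-right corner: {np.corrcoef(x[corner], y[corner])[0, 1]:.2f}")
```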


Scott is a clever guy, but here he is on thin ice, for reasons others have pointed out above. Testing correlations (hunting for causality) the way he did in the blog post he refers to (healthy people less often report mystical experiences) is a subtler version of what is commonly referred to as “red wine research”.

…lots of studies find a positive correlation between drinking red wine and scoring high on various health measures. Some researcher is then quoted in the news media suggesting a causal relationship: There must be something in red wine that improves health. And there may be.

However, drinking red wine is correlated to being upper middle class. And upper middle class people score higher on many/most health indicators.

You can do multivariate regressions and the like to reduce the problem, but the number of control variables will always be limited. Unobserved heterogeneity is always with us, in such correlation studies. The problem is particularly acute if you do not even have a time series (panel study).

The problem with the correlation between health and mystical experience is more subtle - it is not a straightforward 3rd variable problem. So it is not a straightforward “red wine research” problem (I do not want to insinuate that Scott is not aware of statistics 101). The subtler problem has to do, first, with possible selection in which healthy people are ACX readers and which of them filled in the survey. Perhaps they are a particularly secular bunch of healthy people, who give secular explanations to “strange” personal experiences that run-of-the-mill healthy people would label mystical experiences. Secondly, it has to do with the possibility that not-so-healthy ACX readers who filled in the survey may be a more mystically oriented bunch of people than run-of-the-mill not-so-healthy people. If so, they might be more likely than other not-so-healthy people to interpret “strange” experiences as mystical.

…this is based on a speculative hypothesis that ACX readers are composed of two groups of people: particularly secular rationalists drawn to Scott’s writing on rationalism, and particularly mystically-oriented people drawn to his writings on, well, mystical experiences of various sorts. And that there are correlations with self-declared health between these two select groups of readers (who responded to the survey).

Who knows.


Is this because of all the comments on your last post?

The issue I had wasn't that selection bias is present in your survey, that's unavoidable. The issue I had was that you were far more conclusive than your survey allowed you to be. You misused your data and stood on a soapbox at the end there.


Sample selection can be a problem for other reasons as well (e.g. Berkson's paradox).


There's a specific circumstance where selection bias is fatal for correlations: when examining correlations on characteristics related to selection. Take your obesity example:

"in a population of Psych 101 undergrads at a good college, family income is unrelated to obesity. This makes sense; they’re all probably pretty well-off, and they all probably eat at the same college cafeteria. But generalize to the entire US population, and poor people will be more obese, because they can’t afford healthy food / don’t have time to exercise / possible genetic correlations."

The big problem here isn't that everyone's reasonably well-off, it's that because college selects for well-off people, people who aren't well-off and who end up in college anyway will have a bunch of compensatory characteristics that help them get selected into college. To make it extremely simple, we could imagine that whether you go to college is entirely a function of family income and something like personal grit/self-control. In this case, we'd expect that the minimum amount of self-control necessary to get into college would be higher for lower-income people. As a result, if there was no other relationship between self-control and family income, we'd end up with a negative correlation between the two among college students that was stronger the more selective the college was (and thus the more people are on the line between being selected and not).

So now when you do your obesity study, you'll get a biased estimate of the effect of family income on obesity because family income will be negatively associated with self-control, which is itself negatively associated with obesity. This will be true despite the fact that there's no relationship between self-control and family income in the full population.

In the case of the ACX reader surveys, this might mean that people who are least like other ACX readers (for instance, non tech people, women) will be more selected for ACX-ness than are the people most likely to read ACX.

My favorite example of this is basketball players and height, btw. My guess is that if you surveyed NBA players on how much time they spent playing basketball as kids, the shorter players would have spent more time playing basketball than the taller players, because short people need fantastic basketball skills to be NBA players while tall people only need decent basketball skills. This would be the exact opposite correlation you would get with any other group of people.
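That basketball example also makes a nice toy simulation (all coefficients invented): height and childhood practice are independent in the population, but if making the league requires a weighted sum of the two to clear a high bar, they come out negatively correlated among the players who make it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
height   = rng.normal(size=n)  # standardized height
practice = rng.normal(size=n)  # standardized childhood practice, independent of height

# Selection favors height, but enough practice can compensate for being shorter.
makes_the_league = (2.0 * height + 1.0 * practice) > 4.5

print(f"corr(height, practice), everyone:       {np.corrcoef(height, practice)[0, 1]:+.2f}")
print(f"corr(height, practice), league players: "
      f"{np.corrcoef(height[makes_the_league], practice[makes_the_league])[0, 1]:+.2f}")
```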


This is a side issue but because of very high recent population growth this almost certainly isn’t true right: “And then generalize further to the entire world population over all of human history, and it stops holding again, because most people are cavemen who eat grubs and use shells for money, and having more shells doesn’t make it any easier to find grubs.” I’m not sure when the median person who ever lived was born but I bet it was sometime in the 20th century, no?


The problem is not with the surveys themselves, the problem is with how people interpret their results. Yes, you're smart enough and savvy enough to mentally append "this is true for the types of people who follow Aella's Twitter account" to every conclusion you draw from her surveys. But I doubt that all of Aella's followers are also that smart and savvy. A lot of people will probably assume that the results are representative of the general populace, simply because they haven't even considered the fact that the results might be unrepresentative.

And for what it's worth, I really like Aella's surveys, and I genuinely think there's a lot of value to be found in them! I just also think saying "take internet survey results with a grain of salt" is a useful reminder, because not everyone takes their full context into consideration by default.


"Internet studies have selection bias and academic studies don't" is a strawman. A stronger form of the argument is that it's typical for Internet studies to *select on the dependent variable* in ways that are much more concerning than the typical Mechanical-Turk or psych-undergrad samples of an academic study.

While academic studies often use WEIRD samples that are somewhat better-educated, richer, etc than the global average, Internet convenience samples—particularly those from blogs like this or Aella's that have a strong "flavor"—are biased along ideological, cultural, or interest-based affinity dimensions, in addition to selecting for literacy and Internet access in ways similar to psych-undergrad studies. Furthermore, a typical Internet study asks questions about topics specific to the interest(s) distinctive to the sampled population, which makes it much more likely that results will be unrepresentative and even correlations won't generalize.

Aella is a clear example of this: she's a former sex worker who has gathered a following by flouting normal social taboos about sex and sex talk, and she asks these followers about exactly these topics. It's certainly interesting to see what this large sample of highly-open-to-discussing-sex-in-written-English people thinks, but there are obvious reasons to think most people's thoughts about sex are more similar to those of the median Mechanical Turk user than the median Aella poll participant.


> (obviously there are many other problems with this study, like establishing causation - let’s ignore those for now)

I agree that failure to generalize out of sample can be fine-ish if you already know or don't care about the causal model, but when I see something criticized for selection bias, it's almost always to caution against making inferences related to causation.


>It doesn’t look like saying “This is an Internet survey, so it has selection bias, unlike real-life studies, which are fine.” Come on!

Do you think the people willing to criticize your blog generally trust shitty psych studies?


Is the data available for download? If so, does someone have a link? Thanks in advance.


Aella neither has to face peer review nor the scrutiny of replication by other, unaffiliated scientists.

This feels like a major shortcoming.


This is why I pay attention to anecdotal evidence, Amazon reviews, and dietary cults. Where humans are not homogeneous, people select diets/products/etc. based upon their individual circumstances. A diet/drug/product/meditation technique may be beneficial to some and harmful to others for an average effect of zero. Even attempting a scientific study to determine higher order statistical terms is double plus expensive. Anecdotal evidence is often mixed with revealed preference.

It takes a bit of human judgment to tease out the underlying truth of such data. People will cling to a diet or ideology even if it isn't working for them. But even there, the truth leaks out. For example, the fact that most Paleo and Keto advocates tout weight training and dismiss aerobic workouts is a pretty decent indicator that such diets are not optimal for running marathons. Conversely, carbo-loading advocates tend to be big on aerobics, and the tiny number of vegan bodybuilders out there are big on "superfoods."


Aella's followers seem to be mostly men. Who are mostly heterosexual, I guess, as they like to look at pics of her with not too much clothing on - if I understand correctly - with enough text in between that even Alex Tabarrok feels OK reading her tweets. I would not see this as a reason to consider her polls unrepresentative. - Now can someone please provide links to those fabulous pics - instead of tweets where she argues that "my samples are likely more reliable than most published social research" - which may very well be true. Just gimme those pics! Please! And maybe some links to her legendary polls. ;)


I once asked a bunch of Ivy league students working in a physics lab to try to draw maps of the USA from memory. The results were pretty interesting.

One thing that was apparent was that the students I thought were dullest produced the best maps. I think this is best explained by Berkson's paradox--smart students don't need as good a memory to get into Ivy League schools.

I worry about correlation in SSC surveys.


> But real studies by professional scientists don’t have selection bias, because . . . sorry

Because they do have selection bias. Which is why psychology (most of it) cannot be trusted as a science. It doesn't get a pass.


Instead of talking about Selection Bias in the abstract, as many commenters have, why not speak about it in the particular instance? Let's agree that there is some degree of selection bias at play in the correlations drawn from internet surveys of certain groups, what now? Should we totally throw out the results? That doesn't seem right if we're genuinely interested in learning something. I wholeheartedly agree with the Aella tweet that too many people on the internet use some of these biases as easy ways to dismiss research they don't like. Open up r/science on reddit and you can see countless examples of this, even examples where the researchers are accused of not controlling for things they specifically did control for. Similarly, in this comment section you can see people troubled by the selection bias in SSC surveys while simultaneously using faulty logic and personal anecdotes to make causal claims.

On a more meta-level, I think whenever statistical techniques are used to draw inference one either has to be very careful and specific about their conclusion (as Scott demonstrates in another comment) or one opens their analysis up to some methodological criticism. Unfortunately, particularly on the internet, there's little to no effort on the part of the criticizer to demonstrate that the cited bias (OVB, SB, etc.) is actually important here. Instead, they can merely claim it exists, write down some plausibly true example of it playing out and call it a day.

I'm not sure where Andrew Gelman would land on all of this, but to me this whole thing is propagated by the NHST framework that encourages a binary classification of research as either being good or bad. Lastly, I'll just say that despite years of studying statistics/econometrics I still find colliders hard to think about and that makes me mad.


Aren't most studies by (respectable) academics these days done on Mechanical Turkers? Which is surely a skewed sample, but probably less skewed than, say, Cornell undergrads or Aella's Twitter followers.


Perhaps banana eating is more popular at highly competitive high schools and geographic diversity criteria make a higher IQ necessary to be admitted from them.


Nice rejoinder. I think of this as one of the Wikipedia memes, the argument conventions that come from people steeped in Wikipedia (or similar collaborative efforts) to the point that assorted WP rules seem like the only natural way to put rules on argument. So you get this one, and the wild overuse of "correlation is not causation" and assorted other logi-slogans, plus the belief that adding a citation to anything you say necessarily increases its logical force tenfold.

Arguably it's all a reason to restore the study of rhetoric to greater prominence in general education.


There are also reasons to distrust surveys generally that have nothing to do with selection bias.

https://carcinisation.com/2020/12/11/survey-chicken/

> Comprehension is difficult enough in actual conversation, when mutual comprehension is a shared goal. Often people think they are talking about the same thing, and then find out that they meant two completely different things. A failure of comprehension can be discovered and repaired in conversation, can even be repaired as the reading of a text progresses, but it cannot be repaired in survey-taking. Data will be produced, whether they reflect the comprehension of a shared reality or not.

(And yes, I think we should somewhat distrust professionally done research too.)


Look, sometimes you want to say "consider this way your data may be biased."

That doesn't mean "your data is trash, we can learn literally nothing from your trash contaminated data. sit in the corner and feel bad." it means "consider this way your data may be biased."

If a political poll gets retweeted by e.g. Contrapoints and no other big bluechecks, and so 70% of the voters are Contrapoints followers, that is really worth mentioning while people are trying to derive meaning from the poll!


Selection bias can be fatal to polls, but like many poisons it is all a matter of degree. How much selection bias? What kind of selection bias? The kind that has a big effect on the kind of question being polled for? What measures have been taken to minimize the effect of any selection bias in the poll? Since there's always selection bias, these are the important questions. There's a whole technology for minimizing the effects of selection bias in polls, and it works, but it's not always used, because the purpose of many polls is to support a result rather than detect it.

With correlations, it's actually the same. It matters to the degree it affects the findings. The questions are: is the selection bias relevant to the conclusion, how much has the selection been biased, and what has been done to account for it? Yes, this posting does a good job of saying that there's a different relation between a poll, which attempts to detect things like what a population believes, and a correlation study, which attempts to detect relationships between characteristics and perhaps generalize from them. But this cannot mean selection bias isn't important in correlation studies. It only means it plays a different role. If a sample is very biased with regard to the matter under study, that is going to distort the result. And in a good correlation study, measures would be taken to account for the inevitable selection bias, as well as, to the extent feasible, to minimize it. But as with polling, it's all a matter of degree.

As the posting convincingly points out, selection bias is always present to some degree in both these areas, polling and studying correlations. I'm not sure what is gained by trying to say that selection bias per se is a big deal in polls but not a deal at all with correlations, except to create a false dichotomy in support of ignoring some selection bias and overvaluing other selection bias.


Information is information. Just modulate how you take it based on stuff like selection bias and sample size.


I would recommend deeper research before asserting an opinion. This is math-related, and in that field there is no room for opinion; hypotheses, yes. A reading I recommend as a starting point is the book Seeing Through Statistics.


It kinda varies. When Aella does an "imagine a random number generator" type study, there's probably some selection bias, but no worse than academic studies. When she tries finding correlations with something like "do you think abortion should be legal | do you support bestiality", her audience bias is much, much worse.


The alternative I'd compare it to is something like how opinion polling is done, where they put a major effort into getting demographically representative population samples and/or weight the final result proportionately. Obviously there's some debate about the best way to do this, but the general technique is accepted.


Stop Confounding Yourself <-> Selection Bias Is A Fact Of Life <-> On Bounded Distrust

Probably forgetting some other related posts, like anything which references the Elderly Hispanic Woman Effect. Yes, I remember that bit. Anyway, sorta feels like there's a more general principle that ought to cover all these cases, without going *too* meta (e.g. Knowing About Biases Can Hurt You, which of course is on LW).

Also, real-life studies suffer from selection bias - they disproportionately exclude Very Online people who'd never see a physical bulletin board. (Does it count if a real-life study recruits people using the internet, or vice versa? Maybe that's the secret!)


This is a horrible take. She isn't just hampered by selection biases, she's actively engaging with people so as to seek certain results. It's bunk science methodology that politicians and corporations might use to discredit narratives. If you want to do sociology, just do sociology; don't pretend you can generalize to all humans because EvErY oNe ElSe Is DoInG iT! lol


You cannot eliminate selection bias completely (unless you have literally all people in a database and can force the randomly selected ones to take your test), but depending on how you design the study, the bias can be smaller or greater.

I think the argument is that on the scale from smaller selection bias to larger selection bias, Aella is not even trying.


I read “this internet study has selection bias” as “some subset of users are likely gaming your survey to produce amusing results.” Any system that doesn’t have robust anti-trolling systems in place is open to “Boaty McBoatface” brigade attacks or script kiddies. Is this an actual problem in your results? Given the way your surveys work I’d guess not but I think Aella’s Twitter poll format is more vulnerable.


Would this post: "Will people look back and say this is where ACX jumped the shark? Let's do a poll." meet the 2 of 3 criterion?

I find it useful to frequently come back to W. E. Deming's important paper: On Probability as a Basis for Action, The American Statistician, November 1975, Vol. 29, No. 4

https://deming.org/wp-content/uploads/2020/06/On-Probability-As-a-Basis-For-Action-1975.pdf


You are missing collider bias, where the selection mechanism induces a correlation. The classic example is the correlation between good looks and acting ability among Hollywood actors. You only get to be a Hollywood actor if you are good-looking or a good actor, so we don't see any bad-looking bad actors, and there is a negative correlation between ability and looks that need not hold in the general population.


Imagine a medieval peasant hearing that people will get obese because of poverty.

A few years ago, I lost 15 kg when my bank accounts were blocked and I had only a bit of cash to buy food with for some months.

btw, how do you pronounce 'Aella'?


I've seen Aella make the claim that "Aella's audience that responds to Aella's surveys" is pretty close to equivalent to other populations, but I'm not sure I buy it - "very online > twitter > rat adjacent > Aella" is a pretty strong filter; I'd expect among other things the normal skew towards "more autistic than most groups" that you end up seeing when you survey most rat populations. Ditto "is likely to be pretty accepting of a wide variety of sex stuff" and similar.

She claims this isn't a problem, but the *way* she claims this bothers me a bit. Here's two quotes from her main article on this (https://aella.substack.com/p/you-dont-need-a-perfectly-random):

***And key to this, I can see how their responses differ. I have a pretty good grasp on the degree to which “people who follow me” is a unique demographic. And surprise - for most (though admittedly not all!) things I measure (mostly sex stuff, which is the majority of my focus), they’re very similar to other sources.***

***I also am really familiar with my twitter follower demographics, so I can anticipate when stuff might be confounded or warped due to selection bias.***

These kinds of statements are essentially her asking me to trust her, but my general impression of Aella is that she's extremely eager to prove that most out-there sex stuff is very, very healthy and good and you should definitely be weird topical sex/relationship outlier X. I don't particularly trust her to be perfect at factoring this out.

I don't think this is unique to Aella - basically everything I've said here replicates in my views on, say, most diet studies/surveys, or any survey I see about some overton-window friendly marginalized group. I'm in the rarer "surveys in general are trash" group of people.

I think that gets worse when you start to get into the *kind of stuff Aella asks about*. Most people don't care a ton about how many bananas they eat - it's sort of a factual thing. And you can test them for IQ, so their bias doesn't enter into that part of it as much (or at least doesn't have to). But Aella asks questions that often boil down to the general sphere of "Is polyamory great, and should everyone do it?" - questions that you'd expect to interact a lot with people's self-worth and tribalism.

To put that another way, I suspect that Aella's following is highly:

1. Autistic

2. Attracted to Aella specifically and trying to get her attention

3. Sexually liberal

And she asks a lot of questions that interact with that. I *do* expect that someone following Aella is more likely to want to impress her than most, and that they aren't unaware that she's incredibly pro-weird-sex-stuff. I *do* expect they are more autistic than most and tend to approach potentially disturbing/provocative questions more analytically than most. I do expect there's a greater amount of people who would be reluctant to report that their experiences with weird sex/relationship thing X have been negative because it would be letting other-tribe have an opportunity to count coup.

Again, this isn't unique to her. But her being the example at hand is a way for me to talk about how much I distrust surveys in general.


Very reasonable writeup.

I don't see a couple of real issues being addressed:

1) Structural biases due to the differences in social and/or economic class between the average online user vs. the overall population, and

2) Structural biases due to the web sites/email lists involved.

The former is an issue because the average online user is significantly wealthier and more educated than the overall population, and that has an impact. Income differentials introduce large skews in health, in political views, etc., etc.

The latter is an issue because no web site or mass emailing is likely to be random even above the inherent online vs. overall skew. Just as the average Fox online viewer is different than the average CNN online viewer - every web site has a largely self-selected population of like thinkers. Email lists also derive from something, somewhere and are just as likely to contain inherent structural biases.

This doesn't invalidate your main points but the different types of subtle structural fingers on the scales are very potentially problematic.


Maybe it's me, but I don't recall having seen many criticisms of Aella's or Scott's polls that don't suggest any mechanism by which the selection bias could be affecting the results. And I wouldn't expect tweets to flesh out every argument in the mind of whoever wrote them.

But yeah, I get that published psych papers deserve more scrutiny, that this can be frustrating, and also that no information should be fully discounted just because one can think of a possible bias.


...so is this article an argument that everyone needs to specify "relevant selection bias" instead of just "selection bias"? Would that extra word satisfy the complaint?


The thing is, the questions Aella asks really are polls, so the answers she's going to get are going to be subject to selection bias. And tbh, I'd expect Aella's followers to be a particularly unrepresentative group, because Aella herself is deeply weird.

But to be fair to Aella, she's smart, and I'm sure she realises that the results from her survey questions don't generalise to wider society. Probably the response she'd get from your average normie is "why the hell should I care about this dumb hypothetical scenario?"


> Sometimes the scientists will get really into cross-cultural research, and retest their hypothesis on various primitive tribes - in which case their population will be selected for the primitive tribes that don’t murder scientists who try to study them.

Given how (relatively) frequently scientists try this, and how few people live in primitive tribes, does that mean some primitive tribes are spending a significant part of their day-to-day life responding to scientific surveys?


Selection bias is correlated with the topic of the online survey. When you post a survey online, it gets passed around people with an interest in the topic you're surveying. If that banana-IQ survey gets passed around a forum dedicated to the banana-IQ hypothesis and populated by people who are going to give 110% on the IQ test because they care a lot about the topic, you have a problem with bias that a static group selection will never have.

This is actually great if you want to say, find out what the other beliefs of the banana-IQ believers are, but you can't test the banana-IQ hypothesis that way.


"I think these people are operating off some model where amateur surveys necessarily have selection bias, because they only capture the survey-maker’s Twitter followers, or blog readers, or some other weird highly-selected snapshot of the Internet-using public. But real studies by professional scientists don’t have selection bias, because . . . sorry, I don’t know how their model would end this sentence."

Certainly there are many people who are inconsistent on this, but "it's fine because academic psychology does it" is only valid if academic psychology is actually fine. As a wise man once argued in "The Control Group Is Out Of Control" (https://slatestarcodex.com/2014/04/28/the-control-group-is-out-of-control/), academic psychology is not fine, and its methods and epistemics don't even suffice to disprove psi powers.

Even when used well, the method of running surveys to uncover psychological truths is generally pretty weak, and it is extraordinarily hard to use well. Most attempts produce noise and nonsense. This is as true of Twitter surveys as it is of academic surveys.


Here is a study that successfully replicated studies based on convenience surveys, using representative sampling: https://www.pnas.org/doi/full/10.1073/pnas.1808083115.


I think this is the first time I’m sure Scott is completely wrong about something.

Aella polls are completely and totally useless. If you take a sample of psych undergraduates, your result will be biased. But there’s a lot of diversity within that population of psych undergraduates. The results may not generalize, but if something is true of that population there’s good reason to think it may be true of people in general (obviously more representative samples are better).

In contrast, the type of people who follow and engage with Aella are nearly a distinct population. Her content is directed at such a strange consortium of techno-optimists/crypto people/sex workers/intersectionalists/etc. that none of her poll results have any validity whatsoever. The type of person who regularly answers an Aella poll is simply built different; they are not representative of any population except themselves.

You can’t just throw your hands up in the air and say “Well, everything is biased anyway who knows what the truth is.” If I take a survey in front of the Hershey’s chocolate factory asking people what their favorite candy is, it is a shit survey. If you don’t try to control bias you might as well not bother doing a study.


Thanks, Kevin. Though I don't share your opinion. After all, how can you tell there is not any bias? My comment was pointing to the fact that correlations are hard to detect, but they will always appear.
