__Ignoring statistics (particularly probability) in heuristic inductive thinking__

__Dr. Munish Alagh, PDF-IIM(A).__

Inductive thinking involves generalising from the particular to the general.

Heuristics involve cognitive short-cuts for making a decision easily.

The heuristics people use in inductive reasoning tasks often do not respect the required statistical principles. People consequently overlook statistical variables such as sample size, correlation, and base rate when they solve inductive reasoning problems.

__Statistical Problems and Nonstatistical Heuristics__

As noted above, people often solve inductive problems by using a variety of intuitive heuristics: rapid and more or less automatic judgmental rules of thumb. These include the representativeness heuristic (Kahneman & Tversky, 1972, 1973), the availability heuristic (Tversky & Kahneman, 1973), and the anchoring heuristic (Tversky & Kahneman, 1974). In problems where these heuristics diverge from the correct statistical approach, people commit serious errors of inference. The following heuristics, the biases they lead to, and the statistical principles they ignore are discussed below:

- Representativeness.
- Adjustment and Anchoring.
- Availability.

__Representativeness:__

According to Tversky and Kahneman (1974), people are typically concerned with three types of probabilistic questions:

- What is the probability that object A belongs to class B?
- What is the probability that event A originates from process B?
- What is the probability that process B will generate event A?

People answer such questions by
relying on the representativeness heuristic according to which probabilities
are evaluated by the degree to which A is representative of B, that is, by the
degree to which A resembles B. For example, when A is highly representative
of B, the probability that A originates
from B is judged to be high. On the other hand, if A is not similar to B, the
probability that A originates from B is judged to be low.

Research on problems of this type has shown that people order occupations by probability and by similarity in exactly the same way.^{[3]} Given a description of a person, Steve, the probability that he is a librarian, for example, is assessed by the degree to which he is representative of, or similar to, the stereotype of a librarian. This is the representativeness heuristic at work.

In fact, people who are asked to assess probability are not stumped, because they do not try to judge probability as statisticians and philosophers use the word. A question about probability or likelihood activates a mental shotgun, evoking answers to easier questions. Judging probability by representativeness has important virtues: the intuitive impressions that it produces are often (indeed, usually) more accurate than chance guesses would be.^{[4]}

This approach to the judgement of probability, however, leads to serious errors, because similarity, or representativeness, is not influenced by several factors that should affect judgments of probability:

__Insensitivity to prior probability of outcomes:__

One of the factors that have no effect on representativeness but should have a major effect on probability is the prior probability, or base-rate frequency, of the outcomes. In the case of Steve, for example, the fact that there are many more farmers than librarians in the population should enter into any reasonable estimate of the probability that Steve is a librarian rather than a farmer. Considerations of base-rate frequency, however, do not affect the similarity of Steve to the stereotypes of librarians and farmers. If people evaluate probability by representativeness, therefore, prior probabilities will be neglected.

This hypothesis was tested in an experiment in which the prior probabilities were manipulated. Subjects were shown personality descriptions of several individuals, allegedly sampled at random from a group of 100 professionals drawn from two occupations. In two different conditions, subjects were given different prior probabilities for the two occupations. They were asked to assess, for each description, the probability that it belonged to one occupation rather than the other. The odds that any particular description belongs to a given occupation should be higher when the prior probability of that occupation is higher. However, subjects in the two conditions produced essentially the same probability judgments. Apparently, subjects evaluated the likelihood that a particular description belonged to one occupation rather than the other by the degree to which the description was representative of the two stereotypes, with little or no regard for the prior probabilities of the categories.

The subjects used prior probabilities correctly when they had no other information. However, prior probabilities were effectively ignored when a description was introduced, even when this description was totally uninformative. Evidently, people respond differently when given no evidence and when given worthless evidence. When no specific evidence is given, prior probabilities are properly utilized; when worthless evidence is given, prior probabilities are ignored.^{[5]}

Nisbett and Borgida (1975), quoted in the reference cited above, showed that consensus information, that is, base-rate information about the behaviour of a sample of people in a given situation, often has little effect on subjects' attributions about the causes of a particular target individual's behavior. When told that most people behaved in the same way as the target, subjects shift little or not at all in the direction of assuming that it was situational forces, rather than the target's personal dispositions or traits, that explain the target's behavior.
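To see why the prior ought to matter, the following is a minimal sketch of Bayes' rule in odds form. The base rates (30% versus 70%) and the likelihood ratio of 4 are illustrative assumptions, not figures reported in the studies above, and `posterior_probability` is a hypothetical helper name.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Posterior P(occupation A | description) via Bayes' rule in odds form.

    prior: base rate of occupation A in the group (0 < prior < 1).
    likelihood_ratio: P(description | A) / P(description | B), an assumed value.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# The same description (same likelihood ratio) under two hypothetical base rates.
low_base = posterior_probability(prior=0.30, likelihood_ratio=4.0)
high_base = posterior_probability(prior=0.70, likelihood_ratio=4.0)

# A worthless description has likelihood ratio 1, so the posterior equals the prior.
uninformative = posterior_probability(prior=0.30, likelihood_ratio=1.0)

print(round(low_base, 3), round(high_base, 3), round(uninformative, 3))  # 0.632 0.903 0.3
```

Under these assumptions, an identical description should lead to clearly different judgments in the two conditions, and a worthless description should leave the judgment at the base rate.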

__Insensitivity to sample size:__

To evaluate the probability of obtaining a particular result in a sample drawn from a specified population, people typically apply the representativeness heuristic. That is, they assess the likelihood of a sample result by the similarity of this result to the corresponding parameter. The similarity of a sample statistic to a population parameter does not depend on the size of the sample. Consequently, if probabilities are assessed by representativeness, then the judged probability of a sample statistic will be essentially independent of sample size. Indeed, when subjects assessed the distributions of the sample results for samples of various sizes, they produced identical distributions.

A similar insensitivity to sample size has been reported in judgments of posterior probability, that is, of the probability that a sample has been drawn from one population rather than from another. Here again, intuitive judgments are dominated by the sample proportion and are essentially unaffected by the size of the sample, which plays a crucial role in the determination of the actual posterior odds.^{[8]} In addition, intuitive estimates of posterior odds are far less extreme than the correct values. The underestimation of the impact of evidence has been observed repeatedly in problems of this type.^{[9]} It has been labeled "conservatism."
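How strongly the probability of a sample result depends on sample size can be checked with an exact binomial calculation. The sketch below assumes a population rate of 50% and asks how likely a sample is to show at least 60%; the sample sizes and the helper name `prob_at_least` are chosen only for illustration.

```python
from math import comb


def prob_at_least(n: int, k_min: int, p: float = 0.5) -> float:
    """Exact binomial P(X >= k_min) for n independent draws with success rate p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Probability that at least 60% of a sample shows the attribute when the
# population rate is 50%, for three hypothetical sample sizes.
for n in (10, 100, 1000):
    k_min = (6 * n) // 10  # smallest count that makes up 60% of n
    print(n, prob_at_least(n, k_min))
```

The same sample proportion is fairly likely in a sample of 10 and very unlikely in a sample of 1000, which is exactly the dependence that judgments by representativeness ignore.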

__Misconceptions of chance:__

People expect that a sequence of events generated by a random process will represent the essential characteristics of that process even when the sequence is short. Thus, people expect that the essential characteristics of the process will be represented, not only globally in the entire sequence, but also locally in each of its parts. A locally representative sequence, however, deviates systematically from chance expectation: it contains too many alternations and too few runs. Another consequence of the belief in local representativeness is the well-known gambler's fallacy. Chance is commonly viewed as a self-correcting process in which a deviation in one direction induces a deviation in the opposite direction to restore the equilibrium. In fact, deviations are not "corrected" as a chance process unfolds; they are merely diluted.

Misconceptions of chance are not limited to naive subjects. A study of the statistical intuitions of experienced research psychologists^{[10]} revealed a lingering belief in what may be called the "law of small numbers," according to which even small samples are highly representative of the populations from which they are drawn. The responses of these investigators reflected the expectation that a valid hypothesis about a population will be represented by a statistically significant result in a sample with little regard for its size. As a consequence, the researchers put too much faith in the results of small samples and grossly overestimated the replicability of such results. In the actual conduct of research, this bias leads to the selection of samples of inadequate size and to overinterpretation of findings.
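Both claims about chance can be verified by brute-force enumeration for a fair coin. The sketch below is illustrative only: the function names are hypothetical, and the run length and sequence length are arbitrary choices.

```python
from itertools import product


def tails_after_heads_run(run_length: int) -> float:
    """Exact P(tails on the next toss | the previous run_length tosses were heads)."""
    n = run_length + 1
    sequences = list(product("HT", repeat=n))  # all sequences are equally likely
    after_run = [s for s in sequences if all(t == "H" for t in s[:run_length])]
    tails_next = [s for s in after_run if s[-1] == "T"]
    return len(tails_next) / len(after_run)


def mean_alternation_rate(n: int) -> float:
    """Share of adjacent toss pairs that differ, averaged over all length-n sequences."""
    sequences = list(product("HT", repeat=n))
    total_alternations = sum(a != b for s in sequences for a, b in zip(s, s[1:]))
    return total_alternations / (len(sequences) * (n - 1))


print(tails_after_heads_run(4))  # 0.5
print(mean_alternation_rate(6))  # 0.5
```

A run of heads does not make tails more likely, and a truly random sequence alternates only half the time, so sequences that "look random" because they alternate more often are in fact less typical of chance.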

__Insensitivity to predictability:__

People are sometimes called upon to make such numerical predictions as the future value of a stock, the demand for a commodity, or the outcome of a football game. Such predictions are often made by representativeness. For example, if a description of a company is very favorable, a very high profit will appear most representative of that description. The degree to which the description is favorable is unaffected by the reliability of that description or by the degree to which it permits accurate prediction. Hence, if people predict solely in terms of the favorableness of the description, their predictions will be insensitive to the reliability of the evidence and to the expected accuracy of the prediction.

This mode of judgment violates the normative statistical theory in which the extremeness and the range of predictions are controlled by considerations of predictability. When predictability is nil, the same prediction should be made in all cases. If predictability is perfect, of course, the values predicted will match the actual values and the range of predictions will equal the range of outcomes. In general, the higher the predictability, the wider the range of predicted values.

Several studies of numerical prediction have demonstrated that intuitive predictions violate this rule, and that subjects show little or no regard for considerations of predictability.^{[11]} In one such study, the prediction of a remote criterion was identical to the evaluation of the information on which the prediction was based. The students who made these predictions were undoubtedly aware of the limited predictability; nevertheless, their predictions were as extreme as their evaluations.
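The normative rule can be sketched with the standard linear regression prediction in standard units, in which the predicted outcome is the evidence scaled by the predictability. The evidence values below are hypothetical, `regressive_prediction` is an illustrative name, and treating predictability as a correlation between 0 and 1 is an assumption of this sketch.

```python
def regressive_prediction(evidence_z: float, predictability: float) -> float:
    """Predicted outcome in standard units, given evidence in standard units.

    predictability is the correlation between evidence and outcome,
    assumed here to lie in [0, 1].
    """
    if not 0.0 <= predictability <= 1.0:
        raise ValueError("predictability must be between 0 and 1")
    return predictability * evidence_z


evidence = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical standardized impressions

for r in (0.0, 0.3, 1.0):
    predictions = [regressive_prediction(z, r) for z in evidence]
    spread = max(predictions) - min(predictions)
    print(r, predictions, spread)
```

With zero predictability every case receives the same prediction, and the range of predictions widens to match the range of the evidence only when predictability is perfect. Intuitive predictions behave as if predictability were always perfect.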

__The illusion of validity:__

As we have seen, people often predict by selecting the
outcome (for example, an occupation) that is most representative of the input
(for example, the description of a person). The confidence they have in their
prediction depends primarily on the degree of representativeness (that is, on
the quality of the match between the selected outcome and the input) with little
or no regard for the factors that limit predictive accuracy. The unwarranted
confidence which is produced by a good fit between the predicted outcome and
the input information may be called the illusion of validity. This illusion
persists even when the judge is aware of the factors that limit the accuracy of
his predictions.

The internal consistency of a pattern of inputs is a major determinant of one's confidence in predictions based on these inputs. Highly consistent patterns are most often observed when the input variables are highly redundant or correlated. Hence, people tend to have great confidence in predictions based on redundant input variables. However, an elementary result in the statistics of correlation asserts that, given input variables of stated validity, a prediction based on several such inputs can achieve higher accuracy when they are independent of each other than when they are redundant or correlated. Thus, redundancy among inputs decreases accuracy even as it increases confidence, and people are often confident in predictions that are quite likely to be off the mark.^{[12]}
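The elementary result can be illustrated for the simplest case of two standardized predictors with equal validity, where the squared multiple correlation is R² = 2v² / (1 + ρ), with v the validity of each input and ρ their intercorrelation. The numbers below are assumptions chosen for illustration, and `multiple_correlation` is a hypothetical helper.

```python
from math import sqrt


def multiple_correlation(validity: float, intercorrelation: float) -> float:
    """Multiple correlation R for two standardized predictors of equal validity.

    validity: correlation of each predictor with the outcome.
    intercorrelation: correlation between the two predictors (their redundancy).
    Uses R^2 = 2 * validity^2 / (1 + intercorrelation).
    """
    if not -1.0 < intercorrelation <= 1.0:
        raise ValueError("intercorrelation must be in (-1, 1]")
    r_squared = 2.0 * validity**2 / (1.0 + intercorrelation)
    if r_squared > 1.0:
        raise ValueError("inputs do not describe a valid correlation structure")
    return sqrt(r_squared)


independent = multiple_correlation(validity=0.4, intercorrelation=0.0)
redundant = multiple_correlation(validity=0.4, intercorrelation=0.8)
print(round(independent, 3), round(redundant, 3))  # 0.566 0.422
```

Two independent inputs predict noticeably better than two highly redundant ones of the same validity, and fully redundant inputs (ρ = 1) predict no better than a single input.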

__Regression to the mean:__

Regression to the mean means that a later value of a variable tends to lie closer to the average than an earlier, extreme value did. Regression to the mean has an explanation, but does not have a cause.^{[13]}

An important principle of skill training is that rewards for improved performance work better than punishment of mistakes. This proposition is supported by much evidence from research.

Because of regression to the mean, however, poor performance is typically followed by improvement and good performance by deterioration, without any help from either praise or punishment, so praise can seem to harm and punishment to help.

The
feedback to which life exposes us is perverse. Because we tend to be nice to
other people when they please us and nasty when they do not, we are
statistically punished for being nice and rewarded for being nasty.

Regression does not have a causal explanation. Regression effects are ubiquitous, and so are misguided causal stories to explain them. The point to remember is that the change from the first to the second occurrence does not need a causal explanation. It is a mathematically inevitable consequence of the fact that luck played a role in the outcome of the first occurrence.

Regression
inevitably occurs when the correlation between two measures is less than
perfect.

The correlation coefficient between two measures, which ranges from -1 to 1 (and from 0 to 1 for the positively related measures considered here), is a measure of the relative weight of the factors they share.

Correlation and regression are not two concepts; they are different perspectives on the same concept. The general rule is straightforward but has surprising consequences: whenever the correlation between two scores is imperfect, there will be regression to the mean.
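A small simulation makes the rule concrete. It assumes, purely for illustration, that each observed score is the sum of a stable skill component and independent luck of equal variance; the sample size, the 10% cutoff, and the seed are arbitrary.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is reproducible

N = 20_000
skill = [random.gauss(0.0, 1.0) for _ in range(N)]
# Each observed score = stable skill + independent luck (assumed equal variances).
first = [s + random.gauss(0.0, 1.0) for s in skill]
second = [s + random.gauss(0.0, 1.0) for s in skill]

# Select the top 10% on the first occasion and follow them to the second.
cutoff = sorted(first)[int(0.9 * N)]
top = [i for i in range(N) if first[i] >= cutoff]

top_first = mean(first[i] for i in top)
top_second = mean(second[i] for i in top)
overall_second = mean(second)

print(round(top_first, 2), round(top_second, 2), round(overall_second, 2))
```

The best performers on the first occasion remain above average on the second, but by a smaller margin, even though nothing in the simulation praises or punishes anyone.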

Our mind is strongly biased toward causal explanations and does not deal well with “mere statistics.” When our attention is called to an event, associative memory will look for its cause; more precisely, activation will automatically spread to any cause that is already stored in memory. Causal explanations will be evoked when regression is detected, but they will be wrong because the truth is that regression to the mean has an explanation but does not have a cause.

Regression
effects are a common source of trouble in research, and experienced scientists
develop a healthy fear of the trap of unwarranted causal inference.

__Statistics can be used, but often is not, in intuitive thinking:__

Even when judgments are
based on the representativeness heuristic, there may be an underlying stratum
of probabilistic thinking. In many of the problems studied by Kahneman and
Tversky, people probably conceive of the underlying process as random, but they
lack a means of making use of their intuitions about randomness and they fall
back on representativeness.

__Adjustment and Anchoring:__

Biases
in the evaluation of compound events are particularly significant in the
context of planning. The successful completion of an undertaking, such as the
development of a new product, typically has a conjunctive character: for the
undertaking to succeed, each of a series of events must occur. Even when each
of these events is very likely, the overall probability of success can be quite
low if the number of events is large. People anchor on the probability of a single event and adjust insufficiently, so they tend to overestimate the probability of conjunctive events. This leads to unwarranted optimism in the evaluation of the likelihood that a plan will succeed or that a project will be completed on time. Conversely, disjunctive structures are typically encountered
in the evaluation of risks. A complex system, such as a nuclear reactor or a
human body, will malfunction if any of its essential components fails. Even
when the likelihood of failure in each component is slight, the probability of
an overall failure can be high if many components are involved. Because of
anchoring, people will tend to underestimate the probabilities of failure in
complex systems.
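The arithmetic behind both structures is short enough to sketch. The example assumes that the events are independent and uses made-up probabilities; the function names are illustrative.

```python
def prob_all_succeed(p_each: float, n_events: int) -> float:
    """P(every one of n independent events occurs), each with probability p_each."""
    return p_each ** n_events


def prob_any_fails(q_each: float, n_components: int) -> float:
    """P(at least one of n independent components fails), each failing with probability q_each."""
    return 1.0 - (1.0 - q_each) ** n_components


# Hypothetical plan: 10 steps, each 95% likely to succeed.
print(round(prob_all_succeed(0.95, 10), 3))  # 0.599
# Hypothetical system: 100 components, each with a 1% chance of failure.
print(round(prob_any_fails(0.01, 100), 3))   # 0.634
```

An estimate anchored on the single-event figures (95% and 1%) would be far too optimistic in both cases.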


__Anchoring in the assessment of subjective probability distributions:__

When asked to express their beliefs about a quantity as a probability distribution, subjects state overly narrow confidence intervals, which reflect more certainty than is justified by their knowledge about the assessed quantities.

In selecting the upper value of the distribution, it is natural to begin by thinking about one's best estimate of the parameter and to adjust this value upward. If this adjustment, like most others, is insufficient, then the upper value of the distribution will not be sufficiently extreme. A similar anchoring effect will occur in the selection of the lower value of the distribution, which is presumably obtained by adjusting one's best estimate downward. Consequently, the confidence interval between the lower and upper values of the distribution will be too narrow, and the assessed probability distribution will be too tight.

__Availability:__

Availability, the ease with which instances or occurrences can be brought to mind, is affected by various factors that are not related to actual frequency. If the availability heuristic is applied, then such factors will affect the perceived frequency of classes and the subjective probability of events. Consequently, the use of the availability heuristic leads to systematic biases, and these biases distort the statistical picture we form as a result.

__“Errors” in probabilistic reasoning are in fact *not* violations of probability theory__

Most so-called “errors” in probabilistic reasoning are in fact *not* violations of probability theory. Examples of such “errors” include overconfidence bias, the conjunction fallacy, and base-rate neglect.

__Overconfidence bias__

A systematic discrepancy between confidence and relative frequency is termed “overconfidence.”

Has probability theory been violated if one’s *degree of belief (confidence) in a single event* (i.e., that a particular answer is correct) is different from the *relative frequency* of correct answers one generates in the long run? The answer is “no.” It is in fact *not* a violation according to several interpretations of probability. According to the frequentists, probability theory is about frequencies, not about single events. To compare the two means comparing apples with oranges. According to subjectivists, a discrepancy between confidence and relative frequency is also not a “bias,” albeit for different reasons. For a subjectivist, probability *is* about single events, but rationality is identified with the internal consistency of subjective probabilities.

So, in conclusion, a discrepancy between confidence in single events and relative frequencies in the long run is not an error or a violation of probability theory from many experts’ points of view. It only looks so from a narrow interpretation of probability that blurs the distinction between single events and frequencies fundamental to probability theory.
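For concreteness, the two quantities being compared can be computed as follows. The answer record is entirely hypothetical, and the sketch takes no side on whether the comparison is normatively meaningful, which is the point at issue above.

```python
from statistics import mean

# Hypothetical record: (stated confidence that the answer is correct, whether it was correct).
answers = [
    (0.9, True), (0.9, False), (0.8, True), (0.8, True), (0.7, False),
    (1.0, True), (0.6, False), (0.9, True), (0.7, True), (0.8, False),
]

# Average degree of belief across the single events.
mean_confidence = mean(conf for conf, _ in answers)
# Relative frequency of correct answers in the series.
relative_frequency = mean(1.0 if correct else 0.0 for _, correct in answers)

# A positive difference is what the literature labels "overconfidence".
discrepancy = mean_confidence - relative_frequency
print(round(mean_confidence, 2), round(relative_frequency, 2), round(discrepancy, 2))  # 0.81 0.6 0.21
```

The first number summarizes beliefs about single events and the second is a property of the whole series, which is the distinction the frequentist argument rests on.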

__Conjunction fallacy__

The original demonstration of the “conjunction fallacy” was with problems that paired a description of a lady with two alternatives: a) her profession, and b) her profession together with an activity she was involved in. Subjects were asked which of the two alternatives was more probable, and most chose b). Tversky and Kahneman, however, argued that the “correct” answer is a), because the probability of a conjunction of two events, such as b), can never be greater than that of one of its constituents. They explained this “fallacy” as induced by the representativeness heuristic. They assumed that judgments were based on the match (similarity, representativeness) between the description of the lady and the two alternatives. That is, since the description of the lady fits the activity and b) contains that activity, people believe that b) is more probable.

Is the “conjunction fallacy” a violation of probability theory, as has been claimed in the literature? Has a person who chooses b) as the more probable alternative violated probability theory? Again, the answer is “no.” Choosing b) is *not* a violation of probability theory, and for the same reason given above. For a frequentist, this problem has nothing to do with probability theory. Subjects are asked for the probability of a *single event* (that the lady has a particular profession), not for frequencies. Note that problems which are claimed to demonstrate the “conjunction fallacy” are structurally slightly different from “confidence” problems. In the former, subjective probabilities (a) or b)) are compared with one another; in the latter, they are compared with frequencies. To summarize the normative issue, what is called the “conjunction fallacy” is a violation of *some* subjective theories of probability. It is not, however, a violation of the major view of probability, the frequentist conception.
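Stated in terms of frequencies, the conjunction rule that Tversky and Kahneman appeal to is simple counting: the cases in which both things hold are a subset of the cases in which one of them holds. The population below is invented purely to illustrate this.

```python
# Hypothetical population of 100 women who fit the description.
# Each record is (has_profession, does_activity); the counts are made up.
population = (
    [(True, True)] * 5
    + [(True, False)] * 3
    + [(False, True)] * 60
    + [(False, False)] * 32
)

# a) has the profession; b) has the profession and does the activity.
count_a = sum(1 for profession, _ in population if profession)
count_b = sum(1 for profession, activity in population if profession and activity)

print(count_a, count_b)  # 8 5
```

However the counts are chosen, `count_b` can never exceed `count_a`. The normative dispute above concerns whether this rule applies to judgments about a single case.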

__The base-rate fallacy__

The following example is from Casscells, Schoenberger, and Graboys (1978, p. 999) and was presented by Tversky and Kahneman (1982, p. 154) to demonstrate the generality of the phenomenon:

If a test to
detect a disease whose prevalence is 1/1000 has a false positive rate of 5%,
what is the chance that a person found to have a positive result actually has
the disease, assuming you know nothing about the person’s symptoms or signs?

Sixty students and staff at Harvard Medical School answered this medical diagnosis problem. Almost half of them judged the probability that the person actually had the disease to be 0.95 (the modal answer), the average answer was 0.56, and only 18% of participants responded 0.02. The latter is what the authors believed to be the correct answer. Note the enormous variability in judgments.
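The answer of about 0.02 follows from Bayes' rule once two things are assumed that the problem does not state: that the test detects every true case (sensitivity of 1.0) and that the person was drawn at random from the population to which the prevalence refers. The sketch below makes both assumptions explicit; the function name is illustrative.

```python
def prob_disease_given_positive(
    prevalence: float, false_positive_rate: float, sensitivity: float = 1.0
) -> float:
    """P(disease | positive test) by Bayes' rule.

    sensitivity, P(positive | disease), is not given in the problem;
    the default of 1.0 is an assumption made here.
    """
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


answer = prob_disease_given_positive(prevalence=1 / 1000, false_positive_rate=0.05)
print(round(answer, 4))  # 0.0196
```

In frequency terms, out of 1000 randomly drawn people about 1 has the disease and tests positive, while about 50 of the 999 healthy people also test positive, giving roughly 1 in 51.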

Little has been achieved in explaining *how* people make these judgments and *why* the judgments are so strikingly variable.

But do statistics
and probability give one and only one “correct” answer to that problem?

The answer is again “no,” and for the same reason, as the reader will already have guessed. As in the case of confidence and conjunction judgments, subjects were asked for the probability of a *single event*, that is, that “a person found to have a positive result actually has the disease.” If the mind is an intuitive statistician of the frequentist school, such a question has no necessary connection to probability theory.

A more serious difficulty is that the problem does not specify whether or not the person was *randomly* drawn from the population to which the base rate refers.

__Discussion:__

Statistical principles are not learned from everyday experience because the relevant instances are not coded appropriately.

The
lack of an appropriate code also explains why people usually do not detect the
biases in their judgments of probability.

The
inherently subjective nature of probability has led many students to the belief
that coherence, or internal consistency, is the only valid criterion by which
judged probabilities should be evaluated. From the standpoint of the formal
theory of subjective probability, any set of internally consistent probability
judgments is as good as any other. This criterion is not entirely satisfactory,
because an internally consistent set of subjective probabilities can be
incompatible with other beliefs held by the individual. Consider a person whose
subjective probabilities for all possible outcomes of a coin-tossing game
reflect the gambler's fallacy. That is, his estimate of the probability of
tails on a particular toss increases with the number of consecutive heads that
preceded that toss. The judgments of such a person could be internally
consistent and therefore acceptable as adequate subjective probabilities
according to the criterion of the formal theory. These probabilities, however,
are incompatible with the generally held belief that a coin has no memory and
is therefore incapable of generating sequential dependencies. For judged
probabilities to be considered adequate, or rational, internal consistency is
not enough. The judgments must be compatible with the entire web of beliefs
held by the individual. Unfortunately, there can be no simple formal procedure
for assessing the compatibility of a set of probability judgments with the
judge's total system of beliefs.