
Section 3.5 Investigation 1.11: Ganzfeld Experiments

Statistician Jessica Utts has conducted extensive analysis of studies that have investigated psychic functioning. (Combining results across multiple studies, often to increase power, is called meta-analysis.) Utts (1995) cites research from Bem and Honorton (1994) that analyzed studies that used a technique called ganzfeld.
In a typical ganzfeld experiment, a "receiver" is placed in a room relaxing in a comfortable chair with halved ping-pong balls over the eyes, having a red light shone on them. The receiver also wears a set of headphones through which [static] noise is played. The receiver is in this state of mild sensory deprivation for half an hour. During this time, a "sender" observes a randomly chosen target and tries to mentally send this information to the receiver … The receiver is taken out of the ganzfeld state and given a set of [four] possible targets, from which they must decide which one most resembled the images they witnessed. Most commonly there are three decoys along with a copy of the target itself. [Wikipedia]
Utts has stated that she believes there is convincing evidence of ESP but that the effect is small.

Checkpoint 3.5.1. State Hypotheses for ESP Study.

Suppose you want to test whether the subjects in these studies have ESP, with \(\pi\) equal to the actual probability that receivers identify the correct image among the four possible targets. State appropriate null and alternative hypotheses by specifying correct symbols and values.
\(H_0\text{:}\)
\(H_a\text{:}\)
Hint.
If there is no ESP, what would be the probability of correctly identifying the image by chance alone among four choices?
Solution.
\(H_0: \pi = 0.25\) (The receivers are just guessing randomly among the four targets)
\(H_a: \pi > 0.25\) (The receivers have some psychic ability to identify the correct target)

Checkpoint 3.5.2. Check CLT and Describe Null Distribution.

The Bem and Honorton study reports on 329 sessions. Is this a large enough sample size to employ the Central Limit Theorem? Report the predicted mean and standard deviation of the null distribution of sample proportions.
Large enough for CLT?
Mean:
Standard deviation:
Hint.
Check whether \(n\pi \geq 10\) and \(n(1-\pi) \geq 10\text{.}\)
Solution.
The sample size is large enough because \(n\pi = 329(0.25) = 82.25 > 10\) and \(n(1-\pi) = 329(0.75) = 246.75 > 10\text{.}\)
We expect the distribution of sample proportions to be approximately normal with mean \(\mu_{\hat{p}} = 0.25\) and standard deviation \(\text{SD}(\hat{p}) = \sqrt{\frac{0.25(0.75)}{329}} \approx 0.02387\text{.}\)
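If you have R handy, these values are quick to check; the lines below are a minimal base R sketch (they are not the iscam functions introduced later in this section):
n <- 329; pi0 <- 0.25
c(n * pi0, n * (1 - pi0))                       # 82.25 and 246.75, both at least 10
c(mean = pi0, sd = sqrt(pi0 * (1 - pi0) / n))   # mean 0.25, SD about 0.0239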
In Investigation 1.3, you determined that if someone was asked to choose among 5 symbols in 10 rounds, they would need to correctly identify 5 or more of the symbols to produce a p-value below 0.05 and be at least two standard deviations away from the expected 2 correct identifications. The result "5 or more correct" is often called the rejection region (the values of the statistic that lead us to reject the null hypothesis).

Definition.

The rejection region consists of the values we would need to observe for the statistic in the study in order to be willing to reject the null hypothesis.

Checkpoint 3.5.3. Find Rejection Region.

According to the distribution in Checkpoint 3.5.2, what proportion of the 329 sessions would need to be successful ("hits"), in order to reject the null hypothesis in Checkpoint 3.5.1 in favor of the alternative hypothesis at the 5% level of significance?
Report the rejection region found by the applet and include a one-sentence interpretation of this region.
Rejection region:
Interpretation:
Solution.
Rejection region: \(\hat{p} \geq 0.289\)
Interpretation: Someone would have to correctly identify the target in at least 28.9% of the 329 sessions for the result to fall in the upper 5% of the null distribution, which would lead us to reject the null hypothesis.
Power applet showing null distribution with rejection region shaded at the 5% significance level
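The cutoff reported by the applet can also be checked directly from the normal approximation. Here is a minimal base R sketch, assuming a one-sided test at the 5% level:
n <- 329; pi0 <- 0.25; alpha <- 0.05
# rejection cutoff: the sample proportion with 5% of the null distribution above it
cutoff <- qnorm(1 - alpha, mean = pi0, sd = sqrt(pi0 * (1 - pi0) / n))
cutoff   # approximately 0.289, so reject H0 when p-hat >= 0.289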

Definition.

A Type I error in a test of significance occurs when the null hypothesis is true but we decide to reject the null hypothesis. This type of error is sometimes referred to as a false positive or a "false alarm."

Checkpoint 3.5.4. Probability of Type I Error.

According to the applet, what is the (normal approximation) probability of making a Type I error in this study? How could you have known this in advance? (Note that with the normal approximation, the rejection region cutoff need not correspond to an integer number of successes.)
Hint.
What is the relationship between the Type I error probability and the significance level?
Solution.
The probability of a Type I error is 0.05. We could know this in advance because setting the level of significance to 0.05 fixes the probability of (falsely) rejecting a true null hypothesis at 0.05 under the normal approximation (and at most 0.05 with a discrete distribution).
While a Type I error focuses on the null hypothesis being true, what if the null hypothesis is false – how likely are we to make the correct decision to reject the null hypothesis?

Definitions.

A Type II error occurs when we fail to reject the null hypothesis even though the null hypothesis is false. This type of error is sometimes referred to as a false negative or a "missed opportunity."
The power of a test when \(\pi = \pi_a\) is defined as the probability of (correctly) rejecting the null hypothesis assuming this specific alternative value for the parameter. Thus, power reveals how likely our test is to detect a specific difference (or improvement or effect) that really is there.
Note: Power = 1 – P(Type II error)

Checkpoint 3.5.5. Describe Alternative Distribution.

Suppose the probability of identifying the correct symbol is actually 0.30. Describe how the distribution of sample proportions will differ from the null distribution in Checkpoint 3.5.2.
Hint.
What will the mean and standard deviation be when \(\pi = 0.30\text{?}\)
Solution.
With \(\pi = 0.30\text{,}\) the distribution of sample proportions shifts to the right: the mean becomes 0.30 and the standard deviation becomes \(\sqrt{0.30(0.70)/329} \approx 0.0253\text{,}\) slightly larger than before.

Checkpoint 3.5.6. Calculate Power for π = 0.30.

In the applet, specify an alternative probability of success of 0.30 and press Count. Then review the applet output and provide a one-sentence interpretation of the normal approximation value, 0.6645.
Interpretation:
Is this value the probability of a Type II error or the power of the test?
Solution.
If we test \(\pi = 0.25\) but in reality \(\pi = 0.30\text{,}\) we will correctly reject the null hypothesis in favor of the one-sided alternative in about 66% of random samples of size n = 329. So the power of this test is approximately 0.66.
This value is the power of the test.
Power applet showing null and alternative distributions with power calculation for π = 0.30
Additional power applet output showing detailed statistics for π = 0.30
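To see where 0.6645 comes from without the applet, the following base R sketch (a normal-approximation calculation, not the iscam code) finds the rejection cutoff under the null value and then asks how much of the alternative distribution lies beyond that cutoff:
norm_power <- function(pi0, pia, n, alpha = 0.05) {
  cutoff <- qnorm(1 - alpha, pi0, sqrt(pi0 * (1 - pi0) / n))   # rejection cutoff under H0
  1 - pnorm(cutoff, pia, sqrt(pia * (1 - pia) / n))            # area beyond it when pi = pia
}
norm_power(0.25, 0.30, 329)   # about 0.66
# the same function can be reused with other alternative values, such as 0.35 below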

Checkpoint 3.5.7. Calculate Power for π = 0.35.

Now find the probability of rejecting the null hypothesis that \(\pi = 0.25\) when in reality \(\pi = 0.35\text{.}\) Explain your steps. How has the probability changed, and why?
Power:
Explanation:
Solution.
Power when \(\pi = 0.35\text{:}\) approximately 0.99
If the alternative value changes to 0.35, the power increases to about 0.99. This makes sense because the true parameter value is now farther from the hypothesized value of 0.25 (0.35 instead of 0.30), so the difference is easier to detect and we have a higher probability of correctly rejecting the null hypothesis.
Power applet showing null and alternative distributions with power calculation for π = 0.35

Checkpoint 3.5.8. Effect of Changing Significance Level.

Change the level of significance from 0.05 to 0.01 and press Count. How does lowering the level of significance influence each of the following?
Rejection region:
Probability of Type I Error:
Probability of Type II Error:
Power (assuming \(\pi_a = 0.35\)):
Solution.
Rejection region: becomes more extreme (larger); approximately \(\hat{p} \geq 0.3055\)
Probability of Type I Error: decreases to match 0.01
Probability of Type II Error: increases slightly to about 0.05
Power (assuming \(\pi_a = 0.35\)): decreases slightly from about 0.99 to about 0.95
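Lowering the level of significance only changes the tail probability used to find the cutoff. For instance, repeating the normal-approximation calculation with a 1% level (again just a sketch):
n <- 329; pi0 <- 0.25; pia <- 0.35
cutoff <- qnorm(0.99, pi0, sqrt(pi0 * (1 - pi0) / n))          # about 0.3055
power  <- 1 - pnorm(cutoff, pia, sqrt(pia * (1 - pia) / n))
c(cutoff = cutoff, power = power, type2 = 1 - power)           # power about 0.95, Type II about 0.05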

Checkpoint 3.5.9. Small Sample Size with Exact Binomial.

Suppose the study had only involved 35 sessions. Because the sample size is small, use of the Central Limit Theorem is questionable. Uncheck the Normal Approximation box and check the Exact Binomial box. Specify a hypothesized probability of 0.25, an alternative probability of 0.35, and a level of significance of 0.05. Report the following:
Rejection region:
Probability of Type I Error:
Power (assuming \(\pi_a = 0.35\)):
Solution.
Rejection region: \(\hat{p} \geq 0.40\) (14 or more successes out of 35)
Probability of Type I Error: approximately 0.036
Power (assuming \(\pi_a = 0.35\)): approximately 0.32
Note: The probability of a Type II error is approximately 0.68.
Power applet showing exact binomial calculation for small sample size with n=35
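With a small sample, the same quantities can be computed from the binomial distribution directly. The sketch below uses base R's qbinom and pbinom rather than the applet, so treat it as illustrative:
n <- 35; pi0 <- 0.25; pia <- 0.35; alpha <- 0.05
crit  <- qbinom(1 - alpha, n, pi0) + 1         # smallest number of hits in the rejection region
type1 <- 1 - pbinom(crit - 1, n, pi0)          # exact P(Type I error)
power <- 1 - pbinom(crit - 1, n, pia)          # power when pi = 0.35
c(crit = crit, type1 = type1, power = power)   # 14 hits (p-hat >= 0.40), about 0.036 and 0.32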

Checkpoint 3.5.10. Compare Power for Different Sample Sizes.

How does the power in Checkpoint 3.5.9 compare to what you found for power earlier? [Hint: see Checkpoint 3.5.7] Explain why this relationship makes intuitive sense.
Solution.
The power is much smaller with the smaller sample size (about 32% vs 99%). This makes sense as there will be more sample-to-sample variation by chance alone with the smaller sample size, making it more difficult to distinguish observations coming from the two different distributions. In other words, there is more overlap in the two distributions and the probability of finding a sample proportion to convince us to reject the null hypothesis is smaller.
Many software packages will calculate power for you, especially with the normal approximation to the binomial.

Technology Detour β€” Calculating Power.

Keep in mind that the results will differ a bit if you use the binomial distribution (that is, only allow discrete values) to find the rejection region.

Checkpoint 3.5.11. Calculating Power in R.

In R: The iscambinompower and iscamnormpower functions use the following inputs: the level of significance (LOS), the sample size (n), the hypothesized probability of success (prob1), the direction of the alternative (alternative), and the alternative probability of success (prob2).
For example: iscamnormpower(LOS=.05, n=20, prob1=.25, alternative="greater", prob2=0.333)
should reveal both distributions and report the rejection region to achieve the level of significance, the observed level of significance, and the power.
Solution.
Example output from R:
R output showing power calculation using iscamnormpower function
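If the iscam workspace is not loaded, the normal-approximation numbers for the example call above can be reproduced with base R alone. This is a sketch of the calculation, not the function's actual code, and the iscam output may differ slightly because it can round the rejection region to a whole number of successes:
LOS <- 0.05; n <- 20; prob1 <- 0.25; prob2 <- 0.333
cutoff <- qnorm(1 - LOS, prob1, sqrt(prob1 * (1 - prob1) / n))     # rejection cutoff, about 0.41
power  <- 1 - pnorm(cutoff, prob2, sqrt(prob2 * (1 - prob2) / n))  # roughly 0.23
c(cutoff = cutoff, power = power)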

Checkpoint 3.5.12. Calculating Power in JMP.

In JMP:
  1. Choose DOE > Sample Size Explorers > Power > Power for One Sample Proportion
  2. Specify the form of the alternative, the level of significance (Alpha), and set the Test Method to Normal Approximation.
  3. Specify the Sample Size, the null probability of success (Assumed Proportion), and alternative probability of success (Alternative Proportion).
  4. Press Enter.
Solution.
Example output from JMP:
JMP Sample Size Explorer output showing power calculations
In fact, the most common application is to specify the desired power before conducting the study and then solve for the sample size needed to achieve it.

Checkpoint 3.5.13. Sample Size for 80% Power at π = 0.30.

If your technology allows (or use trial and error), see how many sessions would be needed in the ganzfeld study to have at least an 80% chance of rejecting the null hypothesis if the actual probability of success is \(\pi = 0.30\text{.}\)
Sample size needed:
Solution.
We would need a sample size close to 500 (approximately 490) to achieve 80% power when \(\pi = 0.30\text{.}\)
R output showing sample size calculation for 80% power using iscambinompower function
JMP Sample Size Explorer showing power calculation for n=490
Power applet showing sample size needed for 80% power when π = 0.30
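One way to reproduce this kind of sample-size answer without trial and error is to invert the normal-approximation power calculation. The sketch below is plain base R (not the iscam functions), so exact-binomial answers will differ a little:
# smallest n giving the requested power for a one-sided test, normal approximation
n_for_power <- function(pi0, pia, alpha, power) {
  ceiling(((qnorm(1 - alpha) * sqrt(pi0 * (1 - pi0)) +
            qnorm(power)     * sqrt(pia * (1 - pia))) / (pia - pi0))^2)
}
n_for_power(0.25, 0.30, 0.05, 0.80)   # about 483; exact-binomial searches land near the 490 reported above
# the same function can be reused for the alternative value in the next checkpoint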

Checkpoint 3.5.14. Sample Size for 80% Power at π = 0.35.

How will your answer to Checkpoint 3.5.13 change if the actual probability of success is \(\pi = 0.35\text{?}\) Determine the number of sessions needed in this case.
Sample size needed:
Explanation:
Solution.
Approximately 125-135 sessions would be needed to achieve 80% power when \(\pi = 0.35\) (about 125 using the normal approximation, slightly more with the exact binomial).
If the actual probability of success is even farther from the hypothesized value of 0.25 (0.35 rather than 0.30), then we do not need as large a sample size to achieve the same power. The effect size is larger, making the difference easier to detect with fewer observations.

Key Ideas.

In any test of significance, we need to guard against two types of errors:
  • Type I error = falsely rejecting a true null hypothesis ("false alarm")
  • Type II error = failing to reject a false null hypothesis ("missed opportunity")
We control the probability of a Type I error by stating a level of significance before we collect any data. We chiefly control the probability of a Type II error by determining what sample size is needed in a study to achieve a desired level of power. To do so, you need to establish in advance how false you think the null hypothesis is, that is, what size of difference you want to be able to detect. This is often done through subject-matter knowledge or prior study results.

Study Conclusions.

If the researchers believe there is a genuine but small effect of ESP (say 5 percentage points above chance), then with a sample size of 329 there is about a 65% chance that, when the probability of a correct identification equals 0.30, they will obtain a sample result that convinces them to reject the null hypothesis that \(\pi = 0.25\text{.}\) With a sample size of around 500, there is over an 80% chance that the researchers will find evidence of ESP when \(\pi = 0.30\text{.}\) Keep in mind that with a large sample size, we can often find a statistically significant result that we may not consider practically significant.
Discussion: When the sample size is small, the binomial distribution can be used for power calculations, but notice that the discreteness of the binomial distribution can complicate matters. This is another example where the normal distribution is simpler, but do keep in mind that the normal approximation is not always valid.

Subsection 3.5.1 Practice Problem 1.11A

For the research study on the mortality rate at St. George’s hospital (Investigation 1.4), the goal was to compare the mortality rate of that hospital to the national benchmark of 0.15.

Checkpoint 3.5.15. Identify Type of Error (Exceeding Benchmark).

If you were to conclude that the hospital’s death rate exceeds the national benchmark when it really does not, what type of error would you be committing?
  • Type I
  • Type II
  • Both
  • Neither
Solution.
Type I error. This would be rejecting the null hypothesis (that \(\pi = 0.15\)) when it is actually true.

Checkpoint 3.5.16. Identify Type of Error (Not Exceeding Benchmark).

If you were to conclude that the hospital’s death rate does not exceed the national benchmark when it really does, which type of error would you be committing?
  • Type I
  • Type II
  • Both
  • Neither
Solution.
Type II error. This would be failing to reject the null hypothesis when it is actually false (the death rate really does exceed 0.15).

Checkpoint 3.5.17. More Critical Error.

Which error, Type I or Type II, would you consider more critical here? Explain.
Solution.
A Type II error might be considered more critical because failing to detect that the hospital has a higher death rate than the benchmark means patients may continue to receive substandard care. However, a Type I error could also be serious as it might unfairly damage the hospital’s reputation. The answer may depend on the consequences of each type of error.

Checkpoint 3.5.18. Compare Power for Different Alternatives.

Suppose you wanted to know the power of the test when the mortality rate at this hospital was 0.20 and when the mortality rate at this hospital was 0.25. For which alternative probability would you have a lower chance of committing a Type II error? Explain.
Solution.
You would have a lower chance of committing a Type II error when \(\pi = 0.25\text{.}\) The power is higher when the true parameter value is further from the null hypothesis value (0.25 is further from 0.15 than 0.20 is), making it easier to detect the difference and reject the null hypothesis. Higher power means lower probability of Type II error.

Subsection 3.5.2 Practice Problem 1.11B

Checkpoint 3.5.19. Compare Type I Error Probabilities.

If Abe were to use a significance level of 0.05 and Bianca were to use a significance level of 0.01, who would have a smaller probability of Type I error? Explain briefly.
Solution.
Bianca would have a smaller probability of Type I error. The probability of a Type I error equals the significance level, so Bianca’s probability (0.01) is smaller than Abe’s (0.05).

Checkpoint 3.5.20. Compare Type II Error Probabilities.

If Abe were to use a significance level of 0.05 and Bianca were to use a significance level of 0.01, who would have a smaller probability of Type II error? Explain briefly.
Solution.
Abe would have a smaller probability of Type II error. With a less stringent significance level (0.05 vs 0.01), it’s easier to reject the null hypothesis, which means higher power and therefore lower probability of Type II error.

Checkpoint 3.5.21. Compare Type I Error with Different Sample Sizes.

If Abe were to use a significance level of 0.05 with a sample size of \(n = 50\) and Bianca were to use a significance level of 0.01 with a sample size of \(n = 100\text{,}\) who would have a smaller probability of a Type I error?
Solution.
Bianca would have a smaller probability of Type I error. The probability of Type I error is determined solely by the significance level (not by sample size), and Bianca’s significance level (0.01) is smaller than Abe’s (0.05).

Checkpoint 3.5.22. Compare Power with Different Sample Sizes.

If Abe were to use a significance level of 0.05 with a sample size of \(n = 50\) and Bianca were to use a significance level of 0.01 with a sample size of \(n = 100\text{,}\) whose test would have more power?
Solution.
It is not possible to say without more information. Bianca's larger sample size (100 vs. 50) increases power, while Abe's larger significance level (0.05 vs. 0.01) also increases power; which test has more power depends on the specific alternative value and how these two effects trade off.

Subsection 3.5.3 Practice Problem 1.11C

For the research study on the mortality rate at St. George’s hospital (Investigation 1.4), the goal was to decide whether the mortality rate of that hospital exceeds the national benchmark of 0.15. Suppose you plan to monitor the next 20 operations, using a level of significance of 0.05. Also suppose the actual death rate at this hospital equals 0.20.

Checkpoint 3.5.23. Rejection Region and Power for Ο€ = 0.20.

Determine the rejection region and the power of this test.
Solution.
Use technology with \(n = 20\text{,}\) \(\pi_0 = 0.15\text{,}\) \(\pi_a = 0.20\text{,}\) and \(\alpha = 0.05\) (one-sided greater).
Rejection region: 7 or more deaths out of 20 operations.
Power: approximately 0.09 (the exact binomial probability of 7 or more deaths when \(\pi = 0.20\))

Checkpoint 3.5.24. Rejection Region and Power for Ο€ = 0.25.

Repeat assuming the actual death rate at this hospital equals 0.25. How does the power compare, and why does this make sense?
Solution.
Rejection region: Still 7 or more deaths (this doesn’t depend on the alternative).
Power: approximately 0.21
The power has increased because the true parameter value (0.25) is further from the null value (0.15), making it easier to detect the difference.
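These values follow from the exact binomial distribution; the base R sketch below (not the applet output) checks both alternative values at once:
n <- 20; pi0 <- 0.15; alpha <- 0.05
crit <- qbinom(1 - alpha, n, pi0) + 1          # 7 deaths
c(type1    = 1 - pbinom(crit - 1, n, pi0),     # about 0.022
  power_20 = 1 - pbinom(crit - 1, n, 0.20),    # about 0.09
  power_25 = 1 - pbinom(crit - 1, n, 0.25))    # about 0.21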

Subsection 3.5.4 Practice Problem 1.11D

Suppose you want to test a person’s ability to discriminate between two types of soda. You fill one cup with Soda A and two cups with Soda B. The subject tastes all 3 cups and is asked to identify the odd soda. You record the number of correct identifications in 10 attempts. Assume level of significance \(\alpha = 0.05\) and a one-sided alternative.

Checkpoint 3.5.25. Calculate Power.

If the subject’s actual probability of a correct identification is 0.50, what is the power of this test for a level of significance of \(\alpha = 0.05\text{?}\)
Power:
Hint.
What is the null hypothesis?
Solution.
The null hypothesis is \(\pi = 1/3\) (guessing among 3 cups). Use technology with \(n = 10\text{,}\) \(\pi_0 = 1/3\text{,}\) \(\pi_a = 0.50\text{,}\) and \(\alpha = 0.05\text{.}\)
Power: approximately 0.17 (the exact binomial rejection region is 7 or more correct out of 10)

Checkpoint 3.5.26. Interpret Power.

Write a one-sentence interpretation of the power you calculated in Checkpoint 3.5.25 in context.
Solution.
If the subject can correctly identify the odd soda 50% of the time, there is roughly a 17% chance that 10 trials will produce enough correct identifications to reject the null hypothesis that they are just guessing, at the 0.05 significance level.

Checkpoint 3.5.27. Power with 20 Attempts.

What is the power if you give the subject 20 attempts?
Power:
Solution.
With \(n = 20\text{,}\) the power increases to approximately 0.41. The larger sample size makes it easier to detect the subject's ability if they truly can identify the odd soda 50% of the time.
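These two power values can be checked with the exact binomial distribution, under the usual convention that the rejection region keeps the Type I error probability at or below 0.05 (a base R sketch, not the applet):
power_soda <- function(n, pi0 = 1/3, pia = 0.5, alpha = 0.05) {
  crit <- qbinom(1 - alpha, n, pi0) + 1     # smallest number correct that rejects H0
  1 - pbinom(crit - 1, n, pia)              # power when the subject is right half the time
}
power_soda(10)   # about 0.17 (rejection region: 7 or more correct)
power_soda(20)   # about 0.41 (rejection region: 11 or more correct)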

Subsection 3.5.5 Practice Problem 1.11E

Suppose that you want to re-conduct the kissing study in a large city with a sample of 100 kissing couples. You want to test the null hypothesis \(H_0: \pi = 0.667\) against a one-sided alternative \(H_a: \pi < 0.667\) using a significance level of \(\alpha = 0.05\text{,}\) and you are concerned about the power of your test when \(\pi = 0.5\text{.}\)
Consider the following graphs:
Two normal distributions showing null hypothesis centered at 0.667 and alternative at 0.5, with regions labeled I, II, III, and IV

Checkpoint 3.5.28. Type I Error Region.

Which region(s) represents the probability of making a Type I error?
  • I
  • II
  • III
  • IV
Solution.
Region I represents the Type I error probability - the area in the left tail of the null distribution (centered at 0.667) that falls in the rejection region.

Checkpoint 3.5.29. Type II Error Region.

Which region(s) represents the probability of making a Type II error?
  • I
  • II
  • III
  • IV
Solution.
Region II represents the Type II error probability - the area under the alternative distribution (centered at 0.5) that does not fall in the rejection region.

Checkpoint 3.5.30. Power Region.

Which region(s) represents the power of the test?
  • I
  • II
  • III
  • IV
Solution.
Region III represents the power - the area under the alternative distribution (centered at 0.5) that falls in the rejection region.

Subsection 3.5.6 Practice Problem 1.11F

Checkpoint 3.5.31. Error Types Table.

The table below shows the possible states of the world and the possible decisions we can make. Indicate where the types of errors fall in this table and where the test makes the correct decision.
Table 3.5.32. Test Decisions and Errors
                          State of the world
Test decision             \(H_0\) true          \(H_0\) false
Reject \(H_0\)
Fail to reject \(H_0\)
Solution.
Table 3.5.33. Test Decisions and Errors - Solution
                          State of the world
Test decision             \(H_0\) true          \(H_0\) false
Reject \(H_0\)            Type I error          Correct decision (Power)
Fail to reject \(H_0\)    Correct decision      Type II error