SAS - Hypothesis Testing


Hypothesis testing is the use of statistics to determine the probability that a given hypothesis is true. The usual process of hypothesis testing consists of four steps as shown below.

Formulate the null hypothesis H0 (commonly, that the observations are the result of pure chance) and the alternative hypothesis H1 (commonly, that the observations show a real effect combined with a component of chance variation).

Identify a test statistic that can be used to assess the truth of the null hypothesis.

Compute the P-value, which is the probability that a test statistic at least as significant as the one observed would be obtained assuming that the null hypothesis were true. The smaller the P-value, the stronger the evidence against the null hypothesis.

Compare the p-value to an acceptable significance level alpha (sometimes called an alpha value). If p <= alpha, the observed effect is statistically significant, the null hypothesis is rejected, and the alternative hypothesis is accepted.

The SAS programming language provides procedures to carry out many types of hypothesis testing, as illustrated in the sections below.



Institute for Digital Research and Education

Proc TTest | SAS Annotated Output

The ttest procedure performs t-tests for one sample, two samples and paired observations.  The single-sample t-test compares the mean of the sample to a given number (which you supply).  The dependent-sample t-test compares the difference in the means from the two variables to a given number (usually 0), while taking into account the fact that the scores are not independent.  The independent samples t-test compares the difference in the means from the two groups to a given value (usually 0).  In other words, it tests whether the difference in the means is 0.  In our examples, we will use the hsb2  data set.

Single sample t-test

For this example, we will compare the mean of the variable write with a pre-selected value of 50.  In practice, the value against which the mean is compared should be based on theoretical considerations and/or previous research.
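The corresponding PROC TTEST call might look like the following minimal sketch; the dataset name hsb2 and variable write come from the text, while the assumption that hsb2 is available in the WORK library is ours:

```sas
/* one-sample t-test: is the mean of write different from 50? */
proc ttest data=hsb2 h0=50;
   var write;
run;
```

The H0= option supplies the hypothesized value (50 here); without it, SAS tests against 0 by default.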

Summary statistics

a.  Variable – This is the list of variables.  Each variable that was listed on the var statement will have its own line in this part of the output.

b.  N – This is the number of valid (i.e., non-missing) observations used in calculating the t-test.

c.  Lower CL Mean and Upper CL Mean – These are the lower and upper bounds of the confidence interval for the mean. A confidence interval for the mean specifies a range of values within which the unknown population parameter, in this case the mean, may lie. It is given by

$$ \bar{x} \pm t_{1-\alpha/2,\,N-1}\frac{s}{\sqrt{N}} $$

where s is the sample standard deviation of the observations and N is the number of valid observations. The t-value in the formula can be computed or found in any statistics book, using N-1 degrees of freedom and probability 1 - alpha/2, where 1 - alpha is the confidence level (.95 by default). If we drew 200 random samples, then about 190 (200*.95) of the confidence intervals would capture the population mean.

d.  Mean – This is the mean of the variable.

e.  Lower CL Std Dev and Upper CL Std Dev – These are the lower and upper bounds of the confidence interval for the standard deviation. A confidence interval for the standard deviation specifies a range of values within which the unknown parameter, in this case the standard deviation, may lie. The computation of the confidence interval is based on a chi-square distribution and is given by the following formula

$$ \sqrt{\frac{(N-1)S^2}{\chi^2_{1-\alpha/2,\,N-1}}} \leq \sigma \leq \sqrt{\frac{(N-1)S^2}{\chi^2_{\alpha/2,\,N-1}}} $$

where S² is the estimated variance of the variable and 1 - alpha is the confidence level. If we drew 200 random samples, then about 190 (200*.95) of the confidence intervals would capture the population standard deviation.

f.  Std Dev – This is the standard deviation of the variable.

g.  Std Err – This is the estimated standard deviation of the sample mean.  If we drew repeated samples of size 200, we would expect the standard deviation of the sample means to be close to the standard error. The standard deviation of the distribution of sample mean is estimated as the standard deviation of the sample divided by the square root of sample size. This provides a measure of the variability of the sample mean.  The Central Limit Theorem tells us that the sample means are approximately normally distributed when the sample size is 30 or greater.

Test statistics

The single sample t-test tests the null hypothesis that the population mean is equal to the given number specified using the option H0= .  The default value in SAS for H0 is 0.  It calculates the t-statistic and its p-value for the null hypothesis under the assumption that the sample comes from an approximately normal distribution. If the p-value associated with the t-test is small (usually set at p < 0.05), there is evidence that the mean is different from the hypothesized value.  If the p-value associated with the t-test is not small (p > 0.05), then the null hypothesis is not rejected, and you conclude that the mean is not different from the hypothesized value.

In our example, the t-value for variable write is 4.14 with 199 degrees of freedom.  The corresponding p-value is .0001, which is less than 0.05.  We conclude that the mean of variable write is different from 50.

a.  Variable – This is the list of variables.  Each variable that was listed on the var statement will have its own line in this part of the output. If a var statement is not specified, proc ttest will conduct a t-test on all numerical variables in the dataset.

h.  DF – The degrees of freedom for the single sample t-test is simply the number of valid observations minus 1.  We lose one degree of freedom because we have estimated the mean from the sample.  We have used some of the information from the data to estimate the mean; therefore, it is not available to use for the test, and the degrees of freedom accounts for this.

i.  t Value – This is the Student t-statistic.  It is the ratio of the difference between the sample mean and the given number to the standard error of the mean.  Because the standard error of the mean measures the variability of the sample mean, the smaller the standard error of the mean, the more likely it is that our sample mean is close to the true population mean.  This is illustrated by the following three figures.

In all three cases, the difference between the population means is the same.  But with large variability of the sample means, the two populations overlap a great deal, so the difference may well have arisen by chance.  On the other hand, with small variability, the difference is clearer.  The smaller the standard error of the mean, the larger the magnitude of the t-value and, therefore, the smaller the p-value.  The t-value takes this into account.

j.  Pr > |t| – The p-value is the two-tailed probability computed using the t distribution.  It is the probability of observing a greater absolute value of t under the null hypothesis.  For a one-tailed test, halve this probability.  If the p-value is less than the pre-specified alpha level (usually .05 or .01), we conclude that the mean is statistically significantly different from the hypothesized value.  For example, the p-value for write is smaller than 0.05, so we conclude that the mean for write is significantly different from 50.

Dependent group t-test

A dependent group t-test is used when the observations are not independent of one another.  In the example below, the same students took both the writing and the reading test.  Hence, you would expect there to be a relationship between the scores provided by each student.  The dependent group t-test accounts for this.  In the example below, the t-value for the difference between the variables  write   and read is 0.87 with 199 degrees of freedom, and the corresponding p-value is .3868.  This is greater than our pre-specified alpha level, 0.05.  We conclude that the difference between the variables  write and read is not statistically significantly different from 0. In other words, the means for write and read are not statistically significantly different from one another.
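Assuming the same hsb2 dataset, the paired test described above could be requested with a PAIRED statement, as in this minimal sketch:

```sas
/* paired (dependent-group) t-test: is the mean of write - read equal to 0? */
proc ttest data=hsb2;
   paired write*read;
run;
```

The asterisk in the PAIRED statement tells SAS to test the differences between the two listed variables.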

a.  Difference – This is the list of variables.

c.  Lower CL Mean and Upper CL Mean – These are the lower and upper bounds of the confidence interval for the mean difference.  A confidence interval for the mean specifies a range of values within which the unknown population parameter, in this case the mean of the paired differences, may lie.  It is given by

$$ \bar{d} \pm t_{1-\alpha/2,\,N-1}\frac{s_d}{\sqrt{N}} $$

where \(\bar{d}\) and \(s_d\) are the mean and standard deviation of the paired differences.

h.  Difference – The t-test for dependent groups forms a single random sample of the paired differences, so essentially it is a one-sample test. The interpretation of the t-value and p-value is the same as for the one-sample case.

i.  DF – The degrees of freedom for the paired observations is simply the number of observations minus 1. This is because the test is conducted on the one sample of the paired differences.

j.  t Value – This is the t-statistic.  It is the ratio of the mean of the difference in means to the standard error of the difference (.545/.6284).

k.  Pr > |t| – The p-value is the two-tailed probability computed using the t distribution.  It is the probability of observing a greater absolute value of t under the null hypothesis.  For a one-tailed test, halve this probability.  If the p-value is less than our pre-specified alpha level, usually 0.05, we conclude that the difference is significantly different from zero.  For example, the p-value for the difference between write and read is greater than 0.05, so we conclude that the difference in means is not statistically significantly different from 0.

Independent group t-test

This t-test is designed to compare the means of the same variable between two groups. In our example, we compare the mean writing score between the group of female students and the group of male students. Ideally, these subjects are randomly selected from a larger population of subjects. Depending on whether we assume that the variances of the two populations are equal, the standard error of the mean difference between the groups and the degrees of freedom are computed differently, which yields two different t-statistics and two different p-values. When using the t-test to compare independent groups, we need to test the hypothesis of equal variances, and this test is part of the output that proc ttest produces. The interpretation of the p-value is the same as in the other types of t-tests.
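A minimal sketch of the independent-samples call, assuming the hsb2 dataset and the grouping variable female described in the text:

```sas
/* independent-samples t-test: compare mean write between the two levels of female */
proc ttest data=hsb2;
   class female;
   var write;
run;
```

The CLASS statement names the two-level grouping variable; SAS reports both the pooled and the Satterthwaite results, along with the folded F test for equality of variances.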

a.  Variable – This column lists the dependent variable(s). In our example, the dependent variable is write .

b.  female – This column gives values of the class variable, in our case female . This variable is necessary for doing the independent group t-test and is specified by class statement.

c.  N – This is the number of valid (i.e., non-missing) observations in each group defined by the variable listed on the class statement (often called the independent variable).

d.  Lower CL Mean and Upper CL Mean – These are the lower and upper confidence limits of the mean.  By default, they are 95% confidence limits.

e.  Mean – This is the mean of the dependent variable for each level of the independent variable.  On the last line the difference between the means is given.

f.  Lower CL Std Dev and Upper CL Std Dev – These are the lower and upper 95% confidence limits for the standard deviation for the dependent variable for each level of the independent variable.

g.  Std Dev – This is the standard deviation of the dependent variable for each of the levels of the independent variable.  On the last line the standard deviation for the difference is given.

h.  Std Err – This is the standard error of the mean.

a. Variable – This column lists the dependent variable(s).  In our example, the dependent variable is write .

i.  Method – This column specifies the method for computing the standard error of the difference of the means.  The method of computing this value is based on the assumption regarding the variances of the two groups. If we assume that the two populations have the same variance, then the first method, called pooled variance estimator, is used. Otherwise, when the variances are not assumed to be equal, the Satterthwaite’s method is used.

j.  Variances – The pooled estimator of variance is a weighted average of the two sample variances, with more weight given to the larger sample, and is defined to be

$$ s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} $$

where s1² and s2² are the sample variances and n1 and n2 are the sample sizes of the two groups. This is called the pooled variance. The standard error of the mean difference is the pooled variance adjusted by the sample sizes: it is the square root of the product of the pooled variance and (1/n1 + 1/n2). In our example, n1 = 109 and n2 = 91. The pooled variance = (108*8.1337² + 90*10.305²)/198 = 84.355. It follows that the standard error of the mean difference = sqrt(84.355*(1/109 + 1/91)) = 1.304. This yields a t-statistic of -4.87/1.304 = -3.734.
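As a check, this arithmetic can be reproduced in a short DATA step (a sketch using only the summary numbers quoted in the text):

```sas
/* reproduce the pooled-variance arithmetic from the text */
data _null_;
   n1 = 109; s1 = 8.1337;   /* females */
   n2 = 91;  s2 = 10.305;   /* males   */
   sp2 = ((n1-1)*s1**2 + (n2-1)*s2**2) / (n1 + n2 - 2); /* pooled variance, about 84.355 */
   se  = sqrt(sp2 * (1/n1 + 1/n2));                     /* std err of the difference, about 1.304 */
   t   = -4.87 / se;                                    /* t-statistic, about -3.734 */
   put sp2= se= t=;
run;
```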

Satterthwaite is an alternative to the pooled-variance t test and is used when the assumption that the two populations have equal variances seems unreasonable. It provides a t statistic that asymptotically (that is, as the sample sizes become large) approaches a t distribution, allowing for an approximate t test to be calculated when the population variances are not equal.

k.  DF – The degrees of freedom for the pooled (equal-variance) test is simply the total number of observations minus 2. We use one degree of freedom for estimating the mean of each group, and because there are two groups, we lose two degrees of freedom.

l.  t Value – This t-test compares the means of two groups on the same variable; in our example, we compare the mean writing score between the group of female students and the group of male students. Depending on whether we assume that the variances of the two populations are equal, the standard error of the mean difference between the groups and the degrees of freedom are computed differently, yielding two different t-statistics and two different p-values.  When using the t-test to compare independent groups, you need to look at the variances of the two groups. As long as the two variances are close (one is not more than two or three times the other), go with the equal-variances test.  The interpretation of the p-value is the same as in the other types of t-tests.

m.  Pr > |t| – The p-value is the two-tailed probability computed using the t distribution.  It is the probability of observing a t-value of equal or greater absolute value under the null hypothesis.  For a one-tailed test, halve this probability.  If the p-value is less than our pre-specified alpha level, usually 0.05, we will conclude that the difference is significantly different from zero. For example, the p-value for the difference between females and males is less than 0.05, so we conclude that the difference in means is statistically significantly different from 0.

n.  Num DF and Den DF – The F distribution is the ratio of two estimates of variances. Therefore it has two parameters, the degrees of freedom of the numerator and the degrees of freedom of the denominator. In SAS convention, the numerator corresponds to the sample with the larger variance and the denominator to the sample with the smaller variance. In our example, the male students group (female = 0) has a variance of 10.305² (the standard deviation squared) and the female students group a variance of 8.1337². Therefore, the degrees of freedom for the numerator is 91-1=90 and the degrees of freedom for the denominator is 109-1=108.

o.  F Value – SAS labels the F statistic not F, but F', for a specific reason. The test statistic of the two-sample F test is a ratio of sample variances, F = s₁²/s₂², where it is completely arbitrary which sample is labeled sample 1 and which is labeled sample 2. SAS's convention is to put the larger sample variance in the numerator and the smaller one in the denominator. This is called the folded F-statistic,

$$ F' = \frac{\max(s_1^2, s_2^2)}{\min(s_1^2, s_2^2)} $$

which is always greater than or equal to 1. Consequently, the F test rejects the null hypothesis only for large values of F'. In this case, we get 10.305²/8.1337² = 1.605165, which SAS rounds to 1.61.

p.  Pr > F – This is the two-tailed significance probability. In our example, the probability is less than 0.05. So there is evidence that the variances for the two groups, female students and male students, are different. Therefore, we may want to use the second method (Satterthwaite variance estimator) for our t-test.


© 2021 UC REGENTS

Kent State University: SAS Tutorials

Chi-Square Test of Independence


Sample Data Files

Our tutorials reference a dataset called "sample" in many examples. If you'd like to download the sample dataset to work through the examples, choose one of the files below:

  • Data definitions (*.pdf)
  • Data - Comma delimited (*.csv)
  • Data - Tab delimited (*.txt)
  • Data - Excel format (*.xlsx)
  • Data - SAS format (*.sas7bdat)
  • Data - SPSS format (*.sav)
  • SPSS Syntax (*.sps) Syntax to add variable labels, value labels, set variable types, and compute several recoded variables used in later tutorials.
  • SAS Syntax (*.sas) Syntax to read the CSV-format sample data and set variable labels and formats/value labels.

The Chi-Square Test of Independence determines whether there is an association between categorical variables (i.e., whether the variables are independent or related). It is a nonparametric test.

This test is also known as:

  • Chi-Square Test of Association.

This test utilizes a contingency table to analyze the data. A contingency table (also known as a cross-tabulation, crosstab, or two-way table) is an arrangement in which data is classified according to two categorical variables. The categories for one variable appear in the rows, and the categories for the other variable appear in the columns. Each variable must have two or more categories. Each cell reflects the total count of cases for a specific pair of categories.

There are several tests that go by the name "chi-square test" in addition to the Chi-Square Test of Independence. Look for context clues in the data and research question to make sure what form of the chi-square test is being used.

Common Uses

The Chi-Square Test of Independence is commonly used to test the following:

  • Statistical independence or association between two categorical variables.

The Chi-Square Test of Independence can only compare categorical variables. It cannot make comparisons between continuous variables or between categorical and continuous variables. Additionally, the Chi-Square Test of Independence only assesses associations between categorical variables and cannot provide any inferences about causation.

If your categorical variables represent "pre-test" and "post-test" observations, then the chi-square test of independence is not appropriate . This is because the assumption of the independence of observations is violated. In this situation, McNemar's Test is appropriate.

Data Requirements

Your data must meet the following requirements:

  • Two categorical variables.
  • Two or more categories (groups) for each variable.
  • There is no relationship between the subjects in each group.
  • The categorical variables are not "paired" in any way (e.g. pre-test/post-test observations).
  • Expected frequencies for each cell are at least 1.
  • Expected frequencies should be at least 5 for the majority (80%) of the cells.

The null hypothesis (H0) and alternative hypothesis (H1) of the Chi-Square Test of Independence can be expressed in two different but equivalent ways:

H0: "[Variable 1] is independent of [Variable 2]"
H1: "[Variable 1] is not independent of [Variable 2]"

H0: "[Variable 1] is not associated with [Variable 2]"
H1: "[Variable 1] is associated with [Variable 2]"

Data Set-Up

Your dataset should have the following structure:

  • Each case (row) represents a subject, and each subject appears once in the dataset; the variables appear in the columns. That is, each row represents an observation from a unique subject.
  • The dataset contains at least two nominal categorical variables (string or numeric). The categorical variables used in the test must have two or more categories; they should also not have too many categories.

(Screenshot: a SAS Viewtable window showing cases 1-11 and 425-435 from the sample dataset, with columns ids, Class rank, Gender, and Athlete; each row represents one subject.)

Test Statistic

The test statistic for the Chi-Square Test of Independence is denoted Χ², and is computed as:

$$ \chi^{2} = \sum_{i=1}^{R}{\sum_{j=1}^{C}{\frac{(o_{ij} - e_{ij})^{2}}{e_{ij}}}} $$

\(o_{ij}\) is the observed cell count in the ith row and jth column of the table

\(e_{ij}\) is the expected cell count in the ith row and jth column of the table, computed as

$$ e_{ij} = \frac{\textrm{row } i \textrm{ total} \times \textrm{col } j \textrm{ total}}{\textrm{grand total}} $$

The quantity \((o_{ij} - e_{ij})\) is sometimes referred to as the residual of cell \((i, j)\), denoted \(r_{ij}\).

The calculated Χ² value is then compared to the critical value from the Χ² distribution table with degrees of freedom df = (R - 1)(C - 1) at the chosen significance level. If the calculated Χ² value is greater than the critical Χ² value, we reject the null hypothesis.

Run a Chi-Square Test of Independence with PROC FREQ

The general form is
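A minimal sketch of the general form, where dataset, rowvar, and colvar are placeholders for your own names:

```sas
proc freq data=dataset;
   tables rowvar*colvar / chisq;
run;
```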

The CHISQ option is added to the TABLES statement after the slash ( / ) character.

Many of PROC FREQ's most useful options have been covered in the tutorials on Frequency Tables and Crosstabs , but there are several additional options that can be useful when conducting a chi-square test of independence:

  • EXPECTED Adds expected cell counts to the cells of the crosstab table.
  • DEVIATION Adds deviation values (i.e., observed minus expected values) to the cells of the crosstab table.

Example: Chi-square Test for 2x2 Table

Problem statement.

Let's continue the row and column percentage example from the Crosstabs tutorial, which described the relationship between the variables RankUpperUnder (upperclassman/underclassman) and LiveOnCampus (lives on campus/lives off campus). Recall that the column percentages of the crosstab appeared to indicate that upperclassmen were less likely than underclassmen to live on campus:

  • The proportion of underclassmen who live off campus is 34.8%, or 79/227.
  • The proportion of underclassmen who live on campus is 65.2%, or 148/227.
  • The proportion of upperclassmen who live off campus is 94.4%, or 152/161.
  • The proportion of upperclassmen who live on campus is 5.6%, or 9/161.

Suppose that we want to test the association between class rank and living on campus using a Chi-Square Test of Independence (using α = 0.05).
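Assuming the sample dataset and the variable names above, the test could be run as follows; the EXPECTED, DEVIATION, NOROW, NOCOL, and NOPERCENT options control what appears in the crosstab cells:

```sas
proc freq data=sample;
   tables RankUpperUnder*LiveOnCampus / chisq expected deviation norow nocol nopercent;
run;
```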

The first table in the output is the crosstabulation. If you included the EXPECTED and DEVIATION options in your syntax, you should see the following:

Crosstab produced by PROC FREQ when the EXPECTED, DEVIATION, NOROW, NOCOL, and NOPERCENT options are used.

With the Expected Count values shown, we can confirm that all cells have an expected value greater than 5.

These numbers can be plugged into the chi-square test statistic formula:

$$ \chi^{2} = \sum_{i=1}^{R}{\sum_{j=1}^{C}{\frac{(o_{ij} - e_{ij})^{2}}{e_{ij}}}} = \frac{(-56.147)^{2}}{135.15} + \frac{(56.147)^{2}}{91.853} + \frac{(56.147)^{2}}{95.853} + \frac{(-56.147)^{2}}{65.147} = 138.926 $$

We can confirm this computation with the results in the table labeled Statistics for Table of RankUpperUnder by LiveOnCampus :

Output table generated from the CHISQ option in PROC FREQ.

The row of interest here is Chi-Square .

  • The value of the test statistic is 138.926.
  • Because the crosstabulation is a 2x2 table, the degrees of freedom (df) for the test statistic is $$ df = (R - 1)*(C - 1) = (2 - 1)*(2 - 1) = 1 $$.
  • The corresponding p-value of the test statistic is so small that it is presented as p < 0.001.

Decision and Conclusions

Since the p-value is less than our chosen significance level α = 0.05, we can reject the null hypothesis, and conclude that there is an association between class rank and whether or not students live on-campus.

Based on the results, we can state the following:

  • There was a significant association between class rank and living on campus ( Χ 2 (1) = 138.9, p < .001).

  • Last Updated: Dec 18, 2023 12:59 PM
  • URL: https://libguides.library.kent.edu/SAS


Statology

Statistics Made Easy

How to Perform White’s Test in SAS

White’s test is used to determine if heteroscedasticity is present in a regression model.

Heteroscedasticity refers to the unequal scatter of residuals at different levels of a response variable in a regression model, which violates one of the key assumptions of linear regression that the residuals are equally scattered at each level of the response variable.

This tutorial explains how to perform White’s test in SAS to determine whether or not heteroscedasticity is a problem in a given regression model.

Example: White’s Test in SAS

Suppose we want to fit a multiple linear regression model that uses number of hours spent studying and number of prep exams taken to predict the final exam score of students:

Exam Score = β0 + β1(hours) + β2(prep exams)

First, we’ll use the following code to create a dataset that contains this information for 20 students:

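The original post shows its DATA step only as an image, so its exact values are not reproduced here. A sketch of the structure, with hypothetical variable names (hours, prep_exams, score) and illustrative values in place of the original 20 observations, might look like:

```sas
/* illustrative data: hours studied, prep exams taken, final exam score */
data exam_data;
   input hours prep_exams score;
   datalines;
1 1 76
2 3 78
3 3 85
4 5 88
5 2 92
6 1 95
;
run;
```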

Next, we’ll use proc reg to fit this multiple linear regression model, along with the spec option to perform White’s test for heteroscedasticity:

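Since the original code appears only as an image, here is a sketch assuming predictor variables named hours and prep_exams and a response named score (hypothetical names):

```sas
/* fit the regression; the SPEC option on the MODEL statement requests White's test */
proc reg data=exam_data;
   model score = hours prep_exams / spec;
run;
```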

The last table in the output shows the results of White’s test.

From this table we can see that the Chi-Square test statistic is 3.54 and the corresponding p-value is 0.6175.

White’s test uses the following null and alternative hypotheses:

  • Null (H 0 ) : Heteroscedasticity is not present.
  • Alternative (H A ): Heteroscedasticity is present.

Since the p-value is not less than 0.05, we fail to reject the null hypothesis.

This means we do not have sufficient evidence to say that heteroscedasticity is present in the regression model.

Thus, it’s safe to interpret the standard errors of the coefficient estimates in the regression summary table.

What To Do Next

If you fail to reject the null hypothesis of White’s test, then heteroscedasticity is not present and you can proceed to interpret the output of the original regression.

However, if you reject the null hypothesis, this means heteroscedasticity is present in the data. In this case, the standard errors that are shown in the output table of the regression may be unreliable.

There are a couple of common ways to fix this issue, including:

1. Transform the response variable.  You can try applying a variance-stabilizing transformation to the response variable.

For example, you could use the log of the response variable instead of the original response variable.

Typically, taking the log of the response variable is an effective way to make heteroscedasticity go away, provided the response is strictly positive.

Another common transformation is to use the square root of the response variable.
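As a minimal sketch (dataset and variable names are assumptions), either transformation is a one-line assignment in a DATA step, after which the model is refit on the new variable:

```sas
/* Variance-stabilizing transformations; all names are illustrative */
data exam_trans;
   set exam_data;
   log_score  = log(score);   /* requires score > 0  */
   sqrt_score = sqrt(score);  /* requires score >= 0 */
run;

/* Refit on the transformed response and re-run White's test */
proc reg data=exam_trans;
   model log_score = hours prep_exams / spec;
run;
```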

2. Use weighted regression.  This type of regression assigns a weight to each data point based on the variance of its fitted value.

This gives small weights to data points that have higher variances, which shrinks their squared residuals.

When the proper weights are used, this can eliminate the problem of heteroscedasticity.
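One common recipe, sketched here under assumed names rather than taken from the post, is to make the weights inversely proportional to the squared fitted values from an initial OLS fit and then pass them to PROC REG's WEIGHT statement:

```sas
/* Step 1: initial OLS fit, saving the fitted values */
proc reg data=exam_data;
   model score = hours prep_exams;
   output out=ols_out p=yhat;
run;

/* Step 2: assumed weighting scheme -- variance taken as
   proportional to yhat**2, so weight = 1/yhat**2 */
data wls_in;
   set ols_out;
   w = 1 / (yhat**2);
run;

/* Step 3: weighted least squares */
proc reg data=wls_in;
   weight w;
   model score = hours prep_exams;
run;
```

Other weighting schemes (e.g., weights from a regression of the absolute residuals on the fitted values) are also common; the right choice depends on how the variance actually changes across the data.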


Hey there. My name is Zach Bobbitt. I have a Masters of Science degree in Applied Statistics and I’ve worked on machine learning algorithms for professional businesses in both healthcare and retail. I’m passionate about statistics, machine learning, and data visualization and I created Statology to be a resource for both students and teachers alike.  My goal with this site is to help you learn statistics through using simple terms, plenty of real-world examples, and helpful illustrations.

