Sometimes researchers are interested in only a subset of the entire population. In this case, it is easier and more efficient to limit the observations in a SAS dataset to that subset.

This section will introduce several ways to subset a dataset using statements and options.

IF Statement, subsetting:

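The subsetting IF statement (an IF statement with no THEN clause) tells SAS to continue processing the current observation through the rest of the DATA step if the expression is true. If the expression is false, SAS stops processing the current observation, does not write it to the output dataset, and moves on to the next observation.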

Syntax:
IF expression;

Arguments:
expression is any SAS expression.
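A minimal sketch of the subsetting IF, assuming a hypothetical dataset stays with made-up admission and discharge dates. Because the IF is evaluated during the DATA step, it can test los, a variable created earlier in the same step.

```sas
/* Hypothetical stays data; names and values are illustrative only */
data stays;
  input id admit :date9. discharge :date9.;
  format admit discharge date9.;
  datalines;
1 01JAN2020 03JAN2020
2 05FEB2020 12FEB2020
3 10MAR2020 15MAR2020
;
run;

data long_stays;
  set stays;
  los = discharge - admit;   /* length of stay in days, created in this step */
  if los > 3;                /* subsetting IF: only stays over 3 days continue */
run;
```

Stay 1 (2 days) is dropped; stays 2 and 3 are written to long_stays.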

Example:

When merging two overlapping datasets, the programmer often wants to keep only the observations that appear in both and discard those found in only one of them. The subsetting IF statement is commonly used for this, and the DATA step then continues to create additional variables for the matched observations.

If Statement, subsetting
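The merge described above might look like the following sketch. The datasets enroll and claims and their variables are hypothetical. The IN= data set option creates temporary flags (in_enroll, in_claims) that equal 1 when that dataset contributed to the current observation, so the subsetting IF keeps only IDs present in both.

```sas
/* Hypothetical datasets; names and values are illustrative only */
data enroll;
  input id plan $;
  datalines;
1 A
2 B
3 A
;
run;

data claims;
  input id paid;
  datalines;
2 100
3 250
4 80
;
run;

/* Both datasets must be sorted by the BY variable before merging */
proc sort data=enroll; by id; run;
proc sort data=claims; by id; run;

data overlap;
  merge enroll (in=in_enroll) claims (in=in_claims);
  by id;
  if in_enroll and in_claims;   /* subsetting IF: keep IDs found in both */
  high_cost = (paid > 200);     /* new variable created only for the overlap */
run;
```

Only IDs 2 and 3 reach the assignment statement, so high_cost is computed only for the overlap.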

DELETE Statement with IF:

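The DELETE statement tells SAS to stop processing the current observation. The observation is not written to the output dataset, and SAS returns to the beginning of the DATA step for the next observation. DELETE is most useful when combined with an IF-THEN statement.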

Syntax:
IF expression THEN DELETE;

Arguments:
expression is any SAS expression.

Example:

In the following example, the study had identified persons with problematic data, stored in the problem_ids dataset. Not everyone in problem_ids has a pharmacy claim. When we merge the two datasets, we do not want to keep anyone from problem_ids who has no claims, but we also do not want to limit the claims dataset to the intersection of the two. Combining IF with DELETE lets the programmer delete anyone who does not have a claim while keeping every claim.

Delete Statement
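One way to write this, as a sketch: the names pharm_claims, issue, and days_supply, and all values, are assumptions for illustration. Only the claims dataset gets an IN= flag, so IDs that appear only in problem_ids are deleted, while every claim is kept whether or not the person is on the problem list.

```sas
/* Hypothetical data; pharm_claims, issue, and days_supply are illustrative names */
data problem_ids;
  input id issue $;
  datalines;
1 dup
3 dob
5 sex
;
run;

data pharm_claims;
  input id days_supply;
  datalines;
1 30
2 90
3 30
;
run;

proc sort data=problem_ids; by id; run;
proc sort data=pharm_claims; by id; run;

data claims_flagged;
  merge pharm_claims (in=has_claim) problem_ids;
  by id;
  if not has_claim then delete;   /* drop problem IDs with no claim */
run;
```

ID 5 is deleted; ID 2 is kept with a missing value for issue because it is not on the problem list.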

WHERE= Data Set Option:

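The WHERE= data set option is similar to the subsetting IF statement, but it selects observations as they are read from the input dataset, before they enter the DATA step. As a result, only the qualifying observations are read from the input dataset, which can save processing time on large datasets.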

Syntax:
WHERE=(where-expression-1 <logical-operator where-expression-n>)

Syntax Description:

  • where-expression is an arithmetic or logical expression that consists of a sequence of operators, operands, and SAS functions. An operand is a variable, SAS function, or constant. An operator is a symbol that requests a comparison, logical operation, or arithmetic calculation. The expression must be enclosed in parentheses.
  • logical-operator can be AND, AND NOT, OR, or OR NOT.
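As an illustration of joining where-expressions with logical operators, the sketch below assumes a hypothetical med_claims dataset with a setting code (IP, OP, ER) and a paid amount. It keeps inpatient claims over 1,000 and all emergency claims; the parentheses make the grouping explicit.

```sas
/* Hypothetical claims data; variable names and values are illustrative only */
data med_claims;
  input id setting $ paid;
  datalines;
1 IP 5000
2 OP 120
3 IP 800
4 ER 300
;
run;

/* Two where-expressions joined by the OR logical operator */
data big_ip_or_er;
  set med_claims (where=((setting = 'IP' and paid > 1000) or (setting = 'ER')));
run;
```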

Example:

In the first example, the WHERE= data set option is used to read only the inpatient claims from the medical claims data.

Where= Option
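A sketch of the inpatient example, assuming (hypothetically) that the medical claims are in med_claims with a character variable setting equal to 'IP' for inpatient claims:

```sas
/* Hypothetical medical claims; names and values are illustrative only */
data med_claims;
  input id setting $ paid;
  datalines;
1 IP 5000
2 OP 120
3 IP 800
4 ER 300
;
run;

/* WHERE= on the input dataset: only inpatient claims are read */
data ip_claims;
  set med_claims (where=(setting = 'IP'));
run;
```

This keeps the same observations as `set med_claims; if setting = 'IP';`, but with WHERE= the non-inpatient observations are never read into the DATA step.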

The WHERE= option is also commonly used in PROC steps to restrict the observations analyzed to specific populations, for example when checking data. The following code produces a frequency table of the NDC codes for the claims whose NDC was not found in the drug dictionary. The output revealed that all of these NDC values were 00000000000.

Where= Option
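The data check could be sketched as follows. Here claims_ndc and its flag in_dict (0 when the NDC was not found in the drug dictionary) are hypothetical, and the sample values are made up. Placing WHERE= on the DATA= dataset limits PROC FREQ to the unmatched claims; OUT= also saves the counts as a dataset.

```sas
/* Hypothetical claims with a dictionary-match flag; values are made up */
data claims_ndc;
  input id ndc :$11. in_dict;
  datalines;
1 00000000000 0
2 12345678901 1
3 00000000000 0
;
run;

/* Frequency of NDCs that were not found in the drug dictionary */
proc freq data=claims_ndc (where=(in_dict = 0));
  tables ndc / out=ndc_not_found;
run;
```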