Keeping Datasets Efficient and Small

New programmers tend to keep everything. They keep every variable, even interim or temporary ones, in the permanent dataset, for fear that they might need it later. This tendency to hoard variables means datasets can get very large.

If variables are not labelled, programmers and analysts won't know what these variables are or how to use them. Things can become muddled when the programmer reuses a variable name in a later program -- now, it is unclear what the values mean. Worse, reusing names can introduce major errors into the dataset.
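
As a minimal sketch of labelling, assuming the programs in question are written in SAS (the language of the data set options discussed in the article cited at the end), a LABEL statement attaches a description to each variable. The dataset name work.visits, the variable names, and the label text are hypothetical and chosen only for illustration.

```sas
/* Hypothetical dataset and variables, for illustration only. */
data work.visits;
    input id bmi;
    /* Labels are stored with the dataset and travel with the variables. */
    label id  = "Participant identifier"
          bmi = "Body mass index (kg/m^2)";
    datalines;
1 22.5
2 27.2
;
run;

/* PROC CONTENTS lists each variable together with its label. */
proc contents data=work.visits;
run;
```

Anyone who opens the dataset later can read the labels instead of guessing what a short variable name means.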

The best method is to keep only the variables you absolutely need in a final permanent dataset. With fewer variables to read and write, programs run faster, and the smaller output dataset takes up less disk space.
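
One way to do this, sketched here under the assumption that the code is SAS, is to put the KEEP= data set option on both the input and the output dataset. All dataset and variable names (work.raw_visits, work.final, visit_note, height_m, and so on) are hypothetical, and the WORK library stands in for whatever permanent library would be used in practice.

```sas
/* Hypothetical input data, created here so the sketch is self-contained. */
data work.raw_visits;
    input id height_cm weight_kg visit_note $;
    datalines;
1 170 65 ok
2 182 90 ok
;
run;

/* KEEP= on the SET statement limits what is read in;
   KEEP= on the DATA statement limits what is written out. */
data work.final (keep=id bmi);
    set work.raw_visits (keep=id height_cm weight_kg);
    height_m = height_cm / 100;          /* interim variable, not written out */
    bmi = weight_kg / (height_m ** 2);
run;
```

The interim variable height_m is available during the step but never reaches the output dataset. The DROP= option works the other way round, listing the variables to exclude, which can be shorter when only a few are unwanted.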

For an in-depth technical explanation, read the following article: Programming with the KEEP, RENAME, and DROP Data Set Options.