Sas dataset select statement

Lists the columns that will appear in the output.

See Also: column-definition
Featured in: Creating a Table and Inserting Data into It and Creating a Table from a Query's Result
SELECT object-item

assigns a temporary, alternate name to the column.

eliminates duplicate rows.

Tip: A row is considered a duplicate when all of its values are the same as the values of another row. The DISTINCT argument applies to all columns in the SELECT list. One row is displayed for each existing combination of values.

Note: DISTINCT works on the internal or stored value, not necessarily on the value as it is displayed. Numeric precision can cause multiple rows to be returned with values that appear to be the same.

Tip: If available, PROC SQL uses index files when processing SELECT DISTINCT statements. Featured in: Producing All the Possible Combinations of the Values in a Column

object-item

is one of the following:

represents all columns in the tables or views that are listed in the FROM clause.

derives a column from a CASE expression. See CASE expression.

names a single column. See column-name and column-modifier.

derives a column from an sql-expression. See sql-expression and column-modifier.

specifies all columns in the PROC SQL table that is specified in table-name .

specifies all columns in the PROC SQL table that has the alias that is specified in table-alias .

specifies all columns in the SAS view that is specified in view-name .

specifies all columns in the SAS view that has the alias that is specified in view-alias .

The asterisk (*) represents all columns of the table or tables listed in the FROM clause. When an asterisk is not prefixed with a table name, all the columns from all tables in the FROM clause are included; when it is prefixed (for example, table-name.* or table-alias.* ), all the columns from that table only are included.

Note: A warning will occur if you create an output table using the SELECT * syntax when columns with the same name exist in the multiple tables that are listed on the FROM clause. You can avoid the warning by using one of the following actions:

proc sql; create table all(drop=tmpid) as select * from one, two(rename=(id=tmpid)) where one.id=two.tmpid; quit;

A column alias is a temporary, alternate name for a column. Aliases are specified in the SELECT clause to name or rename columns so that the result table is clearer or easier to read. Aliases are often used to name a column that is the result of an arithmetic expression or summary function. An alias is one word only. If you need a longer column name, then use the LABEL= column-modifier, as described in column-modifier. The keyword AS is required with a column alias to distinguish the alias from other column names in the SELECT clause.

Column aliases are optional, and each column name in the SELECT clause can have an alias. After you assign an alias to a column, you can use the alias to refer to that column in other clauses.

If you use a column alias when creating a PROC SQL view, then the alias becomes the permanent name of the column for each execution of the view.

Stores the value of one or more columns for use later in another PROC SQL query or SAS statement.

Restriction: An INTO clause cannot be used in a CREATE TABLE statement.
See also: Using the PROC SQL Automatic Macro Variables in the SAS 9.2 SQL Procedure User's Guide
INTO macro-variable-specification

specifies a SAS macro variable that stores the values of the rows that are returned.

is one of the following:

stores the values that are returned into a single macro variable.

stores the values that are returned into a range of macro variables.

Tip: When you specify a range of macro variables, the SAS Macro Facility creates only the number of macro variables that are needed. For example, if you specify :var1-:var9999 and only 55 variables are needed, only :var1-:var55 is created. The SQLOBS automatic variable is useful if a subsequent part of your program needs to know how many variables were actually created. In this example, SQLOBS would have the value of 55.

protects the leading and trailing blanks from being deleted from values that are stored in a range of macro variables or multiple values that are stored in a single macro variable.

SEPARATED BY ' character '

specifies a character that separates the values of the rows.

These examples use the PROCLIB.HOUSES table:

The SAS System 1 Style SqFeet ------------------ CONDO 900 CONDO 1000 RANCH 1200 RANCH 1400 SPLIT 1600 SPLIT 1800 TWOSTORY 2100 TWOSTORY 3000 TWOSTORY 1940 TWOSTORY 1860

With the macro-variable-specification , you can do the following:

proc sql noprint; select style, sqfeet into :style, :sqfeet from proclib.houses; %put &style &sqfeet;
The results are written to the SAS log:
1 proc sql noprint; 2 select style, sqfeet 3 into :style, :sqfeet 4 from proclib.houses; 5 6 %put &style &sqfeet; CONDO 900
proc sql noprint; select distinct Style, SqFeet into :style1 - :style3, :sqfeet1 - :sqfeet4 from proclib.houses; %put &style1 &sqfeet1; %put &style2 &sqfeet2; %put &style3 &sqfeet3; %put &sqfeet4;
The %PUT statements write the results to the SAS log:
1 proc sql noprint; 2 select distinct style, sqfeet 3 into :style1 - :style3, :sqfeet1 - :sqfeet4 4 from proclib.houses; 5 6 %put &style1 &sqfeet1; CONDO 900 7 %put &style2 &sqfeet2; CONDO 1000 8 %put &style3 &sqfeet3; RANCH 1200 9 %put &sqfeet4; 1400
proc sql noprint; select distinct style into :s1 separated by ',' from proclib.houses; %put &s1; %put There were &sqlobs distinct values.;
The results are written to the SAS log:
3 proc sql noprint; 4 select distinct style 5 into :s1 separated by ',' 6 from proclib.houses; 7 8 %put &s1 CONDO,RANCH,SPLIT,TWOSTORY There were 4 distinct values.
proc sql noprint; select SqFeet into :sqfeet01 - :sqfeet10 from proclib.houses; %put &sqfeet01 &sqfeet02 &sqfeet03 &sqfeet04 &sqfeet05; %put &sqfeet06 &sqfeet07 &sqfeet08 &sqfeet09 &sqfeet10;
The results are written to the SAS log:
11 proc sql noprint; 12 select sqfeet 13 into :sqfeet01 - :sqfeet10 14 from proclib.houses; 15 %put &sqfeet01 &sqfeet02 &sqfeet03 &sqfeet04 &sqfeet05; 900 1000 1200 1400 1600 16 %put &sqfeet06 &sqfeet07 &sqfeet08 &sqfeet09 &sqfeet10; 1800 2100 3000 1940 1860
proc sql noprint; select style, sqfeet into :style1 - :style4 notrim, :sqfeet separated by ',' notrim from proclib.houses; %put *&style1* *&sqfeet*; %put *&style2* *&sqfeet*; %put *&style3* *&sqfeet*; %put *&style4* *&sqfeet*;
The results are written to the SAS log, as shown in the following output:
3 proc sql noprint; 4 select style, sqfeet 5 into :style1 - :style4 notrim, 6 :sqfeet separated by ',' notrim 7 from proclib.houses; 8 9 %put *&style1* *&sqfeet*; *CONDO * * 900, 1000, 1200, 1400, 1600, 1800, 2100, 3000, 1940, 1860* 10 %put *&style2* *&sqfeet*; *CONDO * * 900, 1000, 1200, 1400, 1600, 1800, 2100, 3000, 1940, 1860** 11 %put *&style3* *&sqfeet*; *RANCH * * 900, 1000, 1200, 1400, 1600, 1800, 2100, 3000, 1940, 1860** 12 %put *&style4* *&sqfeet*; *RANCH * * 900, 1000, 1200, 1400, 1600, 1800, 2100, 3000, 1940, 1860**
FROM Clause

Specifies source tables or views.

Featured in: Creating a Table and Inserting Data into It, Joining Two Tables, Joining Three Tables, and Querying an In-Line View
FROM from-list

specifies a temporary, alternate name for a table, view, or in-line view that is specified in the FROM clause.

names the column that appears in the output. The column names that you specify are matched by position to the columns in the output.

is one of the following:

names a single PROC SQL table. table-name can be a one-level name, a two-level libref.table name, or a physical pathname that is enclosed in single quotation marks.

names a single SAS view. view-name can be a one-level name, a two-level libref.view name, or a physical pathname that is enclosed in single quotation marks.

specifies a join. See joined-table.

specifies an in-line view. See query-expression.

specifies a DBMS table. See CONNECTION TO.

Note: With table-name and view-name , you can use data set options by placing them in parentheses immediately after table-name or view-name . See Using SAS Data Set Options with PROC SQL in the SAS 9.2 SQL Procedure User's Guide for details.

A table alias is a temporary, alternate name for a table that is specified in the FROM clause. Table aliases are prefixed to column names to distinguish between columns that are common to multiple tables. Column names in reflexive joins (joining a table with itself) must be prefixed with a table alias in order to distinguish which copy of the table the column comes from. Column names in other kinds of joins must be prefixed with table aliases or table names unless the column names are unique to those tables.

The optional keyword AS is often used to distinguish a table alias from other table names.

The FROM clause can itself contain a query-expression that takes an optional table alias. This kind of nested query-expression is called an in-line view . An in-line view is any query-expression that would be valid in a CREATE VIEW statement. PROC SQL can support many levels of nesting, but it is limited to 256 tables in any one query. The 256-table limit includes underlying tables that can contribute to views that are specified in the FROM clause.

An in-line view saves you a programming step. Rather than creating a view and referring to it in another query, you can specify the view in-line in the FROM clause.

Characteristics of in-line views include the following:

Subsets the output based on specified conditions.

Featured in: Joining Two Tables and Joining Three Tables
WHERE sql-expression

where max(measure1) > 50;
However, this WHERE clause will work.
where max(measure1,measure2) > 50;
GROUP BY Clause

Specifies how to group the data for summarizing.

Featured in: Creating a View from a Query's Result and Joining Two Tables and Calculating a New Value
GROUP BY group-by-item

is one of the following:

is a positive integer that equates to a column's position.

is the name of a column or a column alias. See column-name.

select x, sum(y) from table1 group by int(x);

Similarly, if Y is a character variable, then the output of the following is grouped by the second character of values of Y:

select sum(x), y from table1 group by substring(y from 2 for 1);

Note that an expression that contains only numeric literals (and functions of numeric literals) or only character literals (and functions of character literals) is ignored. An expression in a GROUP BY clause cannot be a summary function. For example, the following GROUP BY clause is not valid:

group by sum(x)
HAVING Clause

Subsets grouped data based on specified conditions.

Featured in: Creating a View from a Query's Result and Joining Two Tables and Calculating a New Value
HAVING sql-expression

The HAVING clause is used with at least one summary function and an optional GROUP BY clause to summarize groups of data in a table. A HAVING clause is any valid SQL expression that is evaluated as either true or false for each group in a query. Alternatively, if the query involves remerged data, then the HAVING expression is evaluated for each row that participates in each group. The query must include one or more summary functions.

Typically, the GROUP BY clause is used with the HAVING expression and defines the group or groups to be evaluated. If you omit the GROUP BY clause, then the summary function and the HAVING clause treat the table as one group.

The following PROC SQL step uses the PROCLIB.PAYROLL table (shown in Creating a Table from a Query's Result) and groups the rows by Gender to determine the oldest employee of each gender. In SAS, dates are stored as integers. The lower the birthdate as an integer, the greater the age. The expression birth=min(birth) is evaluated for each row in the table. When the minimum birthdate is found, the expression becomes true and the row is included in the output.

proc sql; title 'Oldest Employee of Each Gender'; select * from proclib.payroll group by gender having birth=min(birth);

Note: This query involves remerged data because the values returned by a summary function are compared to values of a column that is not in the GROUP BY clause. See Remerging Data for more information about summary functions and remerging data.

ORDER BY Clause

Specifies the order in which rows are displayed in a result table.

See also: query-expression
Featured in: Retrieving Values with the SOUNDS-LIKE Operator
ORDER BY order-by-item < ASC | DESC >>;

is one of the following:

equates to a column's position.

is the name of a column or a column alias. See column-name.

orders the data in ascending order. This is the default order; if neither ASC nor DESC is specified, the data is ordered in ascending order.

orders the data in descending order.

proc sql; select country from census order by pop95-pop90 desc;
NOTE: The query as specified involves ordering by an item that doesn't appear in its SELECT clause.
select x, y from table1 order by int(x);

Similarly, if Y is a character variable, then the output of the following is ordered by the second character of values of Y:

select x, y from table1 order by substring(y from 2 for 1);