Sas dataset select statement

Lists the columns that will appear in the output.

See Also:	column-definition
Featured in:	Creating a Table and Inserting Data into It and Creating a Table from a Query's Result

SELECT object-item

assigns a temporary, alternate name to the column.

eliminates duplicate rows.

Tip:	A row is considered a duplicate when all of its values are the same as the values of another row. The DISTINCT argument applies to all columns in the SELECT list. One row is displayed for each existing combination of values.

Note: DISTINCT works on the internal or stored value, not necessarily on the value as it is displayed. Numeric precision can cause multiple rows to be returned with values that appear to be the same.

Tip: If available, PROC SQL uses index files when processing SELECT DISTINCT statements. Featured in: Producing All the Possible Combinations of the Values in a Column

object-item

is one of the following:

represents all columns in the tables or views that are listed in the FROM clause.

derives a column from a CASE expression. See CASE expression.

names a single column. See column-name and column-modifier.

derives a column from an sql-expression. See sql-expression and column-modifier.

specifies all columns in the PROC SQL table that is specified in table-name .

specifies all columns in the PROC SQL table that has the alias that is specified in table-alias .

specifies all columns in the SAS view that is specified in view-name .

specifies all columns in the SAS view that has the alias that is specified in view-alias .

The asterisk (*) represents all columns of the table or tables listed in the FROM clause. When an asterisk is not prefixed with a table name, all the columns from all tables in the FROM clause are included; when it is prefixed (for example, table-name.* or table-alias.* ), all the columns from that table only are included.

Note: A warning will occur if you create an output table using the SELECT * syntax when columns with the same name exist in the multiple tables that are listed on the FROM clause. You can avoid the warning by using one of the following actions:

Individually list the desired columns in the SELECT statement at the same time as you omit the duplicate column names.
Use the RENAME= and DROP= data set options. In this example, the ID column is renamed tmpid .

proc sql; create table all(drop=tmpid) as select * from one, two(rename=(id=tmpid)) where one.id=two.tmpid; quit;

A column alias is a temporary, alternate name for a column. Aliases are specified in the SELECT clause to name or rename columns so that the result table is clearer or easier to read. Aliases are often used to name a column that is the result of an arithmetic expression or summary function. An alias is one word only. If you need a longer column name, then use the LABEL= column-modifier, as described in column-modifier. The keyword AS is required with a column alias to distinguish the alias from other column names in the SELECT clause.

Column aliases are optional, and each column name in the SELECT clause can have an alias. After you assign an alias to a column, you can use the alias to refer to that column in other clauses.

If you use a column alias when creating a PROC SQL view, then the alias becomes the permanent name of the column for each execution of the view.

Stores the value of one or more columns for use later in another PROC SQL query or SAS statement.

Restriction:	An INTO clause cannot be used in a CREATE TABLE statement.
See also:	Using the PROC SQL Automatic Macro Variables in the SAS 9.2 SQL Procedure User's Guide

INTO macro-variable-specification

specifies a SAS macro variable that stores the values of the rows that are returned.

is one of the following:

stores the values that are returned into a single macro variable.

stores the values that are returned into a range of macro variables.

Tip:

When you specify a range of macro variables, the SAS Macro Facility creates only the number of macro variables that are needed. For example, if you specify :var1-:var9999 and only 55 variables are needed, only :var1-:var55 is created. The SQLOBS automatic variable is useful if a subsequent part of your program needs to know how many variables were actually created. In this example, SQLOBS would have the value of 55.

protects the leading and trailing blanks from being deleted from values that are stored in a range of macro variables or multiple values that are stored in a single macro variable.

SEPARATED BY ' character '

specifies a character that separates the values of the rows.

Use the INTO clause only in the outer query of a SELECT statement and not in a subquery.
When storing a single value into a macro variable, PROC SQL preserves leading or trailing blanks. However, when storing values into a range of macro variables, or when using the SEPARATED BY option to store multiple values in one macro variable, PROC SQL trims leading or trailing blanks unless you use the NOTRIM option.
You can put multiple rows of the output into macro variables. You can check the PROC SQL macro variable SQLOBS to see the number of rows that are produced by a query-expression. See Using the PROC SQL Automatic Macro Variables in the SAS 9.2 SQL Procedure User's Guide for more information on SQLOBS. Note: The SQLOBS automatic macro variable is assigned a value after the SQL SELECT statement executes.
Values assigned by the INTO clause use the BEST8. format.

These examples use the PROCLIB.HOUSES table:

The SAS System 1 Style SqFeet ------------------ CONDO 900 CONDO 1000 RANCH 1200 RANCH 1400 SPLIT 1600 SPLIT 1800 TWOSTORY 2100 TWOSTORY 3000 TWOSTORY 1940 TWOSTORY 1860

With the macro-variable-specification , you can do the following:

proc sql noprint; select style, sqfeet into :style, :sqfeet from proclib.houses; %put &style &sqfeet;

The results are written to the SAS log:

1 proc sql noprint; 2 select style, sqfeet 3 into :style, :sqfeet 4 from proclib.houses; 5 6 %put &style &sqfeet; CONDO 900

proc sql noprint; select distinct Style, SqFeet into :style1 - :style3, :sqfeet1 - :sqfeet4 from proclib.houses; %put &style1 &sqfeet1; %put &style2 &sqfeet2; %put &style3 &sqfeet3; %put &sqfeet4;

The %PUT statements write the results to the SAS log:

1 proc sql noprint; 2 select distinct style, sqfeet 3 into :style1 - :style3, :sqfeet1 - :sqfeet4 4 from proclib.houses; 5 6 %put &style1 &sqfeet1; CONDO 900 7 %put &style2 &sqfeet2; CONDO 1000 8 %put &style3 &sqfeet3; RANCH 1200 9 %put &sqfeet4; 1400

proc sql noprint; select distinct style into :s1 separated by ',' from proclib.houses; %put &s1; %put There were &sqlobs distinct values.;

The results are written to the SAS log:

3 proc sql noprint; 4 select distinct style 5 into :s1 separated by ',' 6 from proclib.houses; 7 8 %put &s1 CONDO,RANCH,SPLIT,TWOSTORY There were 4 distinct values.

proc sql noprint; select SqFeet into :sqfeet01 - :sqfeet10 from proclib.houses; %put &sqfeet01 &sqfeet02 &sqfeet03 &sqfeet04 &sqfeet05; %put &sqfeet06 &sqfeet07 &sqfeet08 &sqfeet09 &sqfeet10;

The results are written to the SAS log:

11 proc sql noprint; 12 select sqfeet 13 into :sqfeet01 - :sqfeet10 14 from proclib.houses; 15 %put &sqfeet01 &sqfeet02 &sqfeet03 &sqfeet04 &sqfeet05; 900 1000 1200 1400 1600 16 %put &sqfeet06 &sqfeet07 &sqfeet08 &sqfeet09 &sqfeet10; 1800 2100 3000 1940 1860

proc sql noprint; select style, sqfeet into :style1 - :style4 notrim, :sqfeet separated by ',' notrim from proclib.houses; %put *&style1* *&sqfeet*; %put *&style2* *&sqfeet*; %put *&style3* *&sqfeet*; %put *&style4* *&sqfeet*;

The results are written to the SAS log, as shown in the following output:

3 proc sql noprint; 4 select style, sqfeet 5 into :style1 - :style4 notrim, 6 :sqfeet separated by ',' notrim 7 from proclib.houses; 8 9 %put *&style1* *&sqfeet*; *CONDO * * 900, 1000, 1200, 1400, 1600, 1800, 2100, 3000, 1940, 1860* 10 %put *&style2* *&sqfeet*; *CONDO * * 900, 1000, 1200, 1400, 1600, 1800, 2100, 3000, 1940, 1860** 11 %put *&style3* *&sqfeet*; *RANCH * * 900, 1000, 1200, 1400, 1600, 1800, 2100, 3000, 1940, 1860** 12 %put *&style4* *&sqfeet*; *RANCH * * 900, 1000, 1200, 1400, 1600, 1800, 2100, 3000, 1940, 1860**

FROM Clause

Specifies source tables or views.

Featured in:

Creating a Table and Inserting Data into It, Joining Two Tables, Joining Three Tables, and Querying an In-Line View

FROM from-list

specifies a temporary, alternate name for a table, view, or in-line view that is specified in the FROM clause.

names the column that appears in the output. The column names that you specify are matched by position to the columns in the output.

is one of the following:

names a single PROC SQL table. table-name can be a one-level name, a two-level libref.table name, or a physical pathname that is enclosed in single quotation marks.

names a single SAS view. view-name can be a one-level name, a two-level libref.view name, or a physical pathname that is enclosed in single quotation marks.

specifies a join. See joined-table.

specifies an in-line view. See query-expression.

specifies a DBMS table. See CONNECTION TO.

Note: With table-name and view-name , you can use data set options by placing them in parentheses immediately after table-name or view-name . See Using SAS Data Set Options with PROC SQL in the SAS 9.2 SQL Procedure User's Guide for details.

A table alias is a temporary, alternate name for a table that is specified in the FROM clause. Table aliases are prefixed to column names to distinguish between columns that are common to multiple tables. Column names in reflexive joins (joining a table with itself) must be prefixed with a table alias in order to distinguish which copy of the table the column comes from. Column names in other kinds of joins must be prefixed with table aliases or table names unless the column names are unique to those tables.

The optional keyword AS is often used to distinguish a table alias from other table names.

The FROM clause can itself contain a query-expression that takes an optional table alias. This kind of nested query-expression is called an in-line view . An in-line view is any query-expression that would be valid in a CREATE VIEW statement. PROC SQL can support many levels of nesting, but it is limited to 256 tables in any one query. The 256-table limit includes underlying tables that can contribute to views that are specified in the FROM clause.

An in-line view saves you a programming step. Rather than creating a view and referring to it in another query, you can specify the view in-line in the FROM clause.

Characteristics of in-line views include the following:

An in-line view is not assigned a permanent name, although it can take an alias.
An in-line view can be referred to only in the query in which it is defined. It cannot be referenced in another query.
You cannot use an ORDER BY clause in an in-line view.
The names of columns in an in-line view can be assigned in the object-item list of that view or with a list of names enclosed in parentheses following the alias. This syntax can be useful for renaming columns. See Querying an In-Line View for an example.
In order to visually separate an in-line view from the rest of the query, you can enclose the in-line view in any number of pairs of parentheses. Note that if you specify an alias for the in-line view, the alias specification must appear outside the outermost pair of parentheses for that in-line view.

Subsets the output based on specified conditions.

Featured in:

Joining Two Tables and Joining Three Tables

WHERE sql-expression

When a condition is met (that is, the condition resolves to true), those rows are displayed in the result table; otherwise, no rows are displayed.
You cannot use summary functions that specify only one column. In this example, MAX is a summary function; therefore, its context is that of a GROUP BY clause. It cannot be used to group, or summarize, data.

where max(measure1) > 50;

However, this WHERE clause will work.

where max(measure1,measure2) > 50;

GROUP BY Clause

Specifies how to group the data for summarizing.

Featured in:

Creating a View from a Query's Result and Joining Two Tables and Calculating a New Value

GROUP BY group-by-item

is one of the following:

is a positive integer that equates to a column's position.

is the name of a column or a column alias. See column-name.

You can specify more than one group-by-item to get more detailed reports. Both the grouping of multiple items and the BY statement of a PROC step are evaluated in similar ways. If more than one group-by-item is specified, then the first one determines the major grouping.
Integers can be substituted for column names (that is, SELECT object-items) in the GROUP BY clause. For example, if the group-by-item is 2, then the results are grouped by the values in the second column of the SELECT clause list. Using integers can shorten your coding and enable you to group by the value of an unnamed expression in the SELECT list. Note that if you use a floating-point value (for example, 2.3), then PROC SQL ignores the decimal portion.
The data does not have to be sorted in the order of the group-by values because PROC SQL handles sorting automatically. You can use the ORDER BY clause to specify the order in which rows are displayed in the result table.
If you specify a GROUP BY clause in a query that does not contain a summary function, then your clause is transformed into an ORDER BY clause and a message to that effect is written to the SAS log.
You can group the output by the values that are returned by an expression. For example, if X is a numeric variable, then the output of the following is grouped by the integer portion of values of X:

select x, sum(y) from table1 group by int(x);

Similarly, if Y is a character variable, then the output of the following is grouped by the second character of values of Y:

select sum(x), y from table1 group by substring(y from 2 for 1);

Note that an expression that contains only numeric literals (and functions of numeric literals) or only character literals (and functions of character literals) is ignored. An expression in a GROUP BY clause cannot be a summary function. For example, the following GROUP BY clause is not valid:

group by sum(x)

HAVING Clause

Subsets grouped data based on specified conditions.

Featured in:

Creating a View from a Query's Result and Joining Two Tables and Calculating a New Value

HAVING sql-expression

The HAVING clause is used with at least one summary function and an optional GROUP BY clause to summarize groups of data in a table. A HAVING clause is any valid SQL expression that is evaluated as either true or false for each group in a query. Alternatively, if the query involves remerged data, then the HAVING expression is evaluated for each row that participates in each group. The query must include one or more summary functions.

Typically, the GROUP BY clause is used with the HAVING expression and defines the group or groups to be evaluated. If you omit the GROUP BY clause, then the summary function and the HAVING clause treat the table as one group.

The following PROC SQL step uses the PROCLIB.PAYROLL table (shown in Creating a Table from a Query's Result) and groups the rows by Gender to determine the oldest employee of each gender. In SAS, dates are stored as integers. The lower the birthdate as an integer, the greater the age. The expression birth=min(birth) is evaluated for each row in the table. When the minimum birthdate is found, the expression becomes true and the row is included in the output.

proc sql; title 'Oldest Employee of Each Gender'; select * from proclib.payroll group by gender having birth=min(birth);

Note: This query involves remerged data because the values returned by a summary function are compared to values of a column that is not in the GROUP BY clause. See Remerging Data for more information about summary functions and remerging data.

ORDER BY Clause

Specifies the order in which rows are displayed in a result table.

See also:	query-expression
Featured in:	Retrieving Values with the SOUNDS-LIKE Operator

ORDER BY order-by-item < ASC | DESC >>;

is one of the following:

equates to a column's position.

is the name of a column or a column alias. See column-name.

orders the data in ascending order. This is the default order; if neither ASC nor DESC is specified, the data is ordered in ascending order.

orders the data in descending order.

The ORDER BY clause sorts the result of a query expression according to the order specified in that query. When this clause is used, the default ordering sequence is ascending, from the lowest value to the highest. You can use the SORTSEQ= option to change the collating sequence for your output. See PROC SQL Statement.
If an ORDER BY clause is omitted, then a particular order to the output rows, such as the order in which the rows are encountered in the queried table, cannot be guaranteed--even if an index is present. Without an ORDER BY clause, the order of the output rows is determined by the internal processing of PROC SQL, the default collating sequence of SAS, and your operating environment. Therefore, if you want your result table to appear in a particular order, then use the ORDER BY clause.
If more than one order-by-item is specified (separated by commas), then the first one determines the major sort order.
Integers can be substituted for column names (that is, SELECT object-items) in the ORDER BY clause. For example, if the order-by-item is 2 (an integer), then the results are ordered by the values of the second column. If a query-expression includes a set operator (for example, UNION), then use integers to specify the order. Doing so avoids ambiguous references to columns in the table expressions. Note that if you use a floating-point value (for example, 2.3) instead of an integer, then PROC SQL ignores the decimal portion.
In the ORDER BY clause, you can specify any column of a table or view that is specified in the FROM clause of a query-expression, regardless of whether that column has been included in the query's SELECT clause. For example, this query produces a report ordered by the descending values of the population change for each country from 1990 to 1995:

proc sql; select country from census order by pop95-pop90 desc;

NOTE: The query as specified involves ordering by an item that doesn't appear in its SELECT clause.

select x, y from table1 order by int(x);

Similarly, if Y is a character variable, then the output of the following is ordered by the second character of values of Y:

select x, y from table1 order by substring(y from 2 for 1);