Monday, January 11, 2010

Notes on SQL 2

Previous related posts:

Aggregate functions I didn't know about

CHECKSUM_AGG: checksum of all values in the dataset.

COUNT_BIG: same as count but return bigint

GROUPING: returns 0 or 1 to specify detail or aggregate row in certain uses of the GROUP BY clause

STDEV/STDEVP: standard deviation and standard deviation for population (still not very clear what this does...)

VAR/VARP: variance and variance for population (still not very clear what this does...)

Additions to the GROUP BY clause

WITH ROLLUP: adds result rows that specify the "summary" of an aggregate function like if the GROUP BY hadn't been applied. E.g., if I group products by CategoryId and request the average cost (SELECT AVG(Cost) FROM Products GROUP BY CategoryId) , I will get a column with each Category identifier and another with the corresponding average cost. If I add WITH ROLLUP at the end, an extra row will appear with the average cost for ALL products across all categories. This extends to several columns in the GROUP BY clause, and in that case each grouping level gets its own summary rows.

WITH CUBE: similar to WITH ROLLUP, but creates summaries as if the GROUP BY clause had been applied to each column individually (maybe each pair of columns too?). These two seem like a minor touch of Business Intelligence over basic SQL Server queries.

Grouping Sets (added in SQL Server 2008)

They provide clearer syntax when requesting aggregate data over several different groupings. This is useful if you have a table which you can group independently by several different columns, and want to see the summaries for each possible grouping in a single query.

SELECT CategoryID, SubCategoryID, AVG(Price) FROM Products GROUP BY GROUPING SETS ( (Products.CategoryID), (Products.SubCategoryID) )

UNION, INTERSECT, EXCEPT

They exist! I thought they only did in Oracle. Minor (expected) nuances like having to match the number and type of columns in both sides of each operator. Regarding efficiency against other operators that provide equivalent functionality (e.g., EXCEPT VS NOT IN), some sources claim that the execution plans are exactly the same, so the clearer syntax should be used.

APPLY Operator

You select some rows from some table, and then use one (or more? probably) of the columns in each row as parameters for a table-valued function, from which you extract additional information. Comes in 2 versions: CROSS APPLY, and OUTER APPLY. The second one returns all rows from the original table, with NULL in the rest of the columns if the table-valued function did not return any data. If the table-valued function returns several rows, the behavior expected from a regular JOIN still applies.

Scalar-valued functions

DATEDIFF(datepart, startdate, enddate) -> startdate gets substracted from enddate. Minor reminder.

DATALENGTH(expression) -> returns the number of bytes required to represent expression.

CHARINDEX(expression1, expression2, [start_location]) -> returns the index of the first instance of expression2 inside expression1, starting to look at start_location.

PATINDEX('pattern', expression) -> similar to CHARINDEX but allows for wildcards.

Click here for the next part in this series of posts.