Sunday, January 31, 2010

Notes on SQL 8

Previous related posts:

Query performance and tuning

Main performance metrics: Query Cost, Page Reads, Query execution time.

Theoretical execution order of a query without UNION clause:

  1. FROM, JOIN, APPLY, ON

  2. WHERE

  3. GROUP BY and aggregate functions

  4. HAVING

  5. SELECT

  6. ORDER BY

  7. TOP

  8. FOR XML


For queries with a UNION clause, the theoretical execution order is:

  1. FROM, JOIN, APPLY, ON

  2. WHERE

  3. GROUP BY and aggregate functions

  4. HAVING

  5. TOP (note that in this case it is executed BEFORE the ORDER BY clause)

  6. UNION and SELECT

  7. ORDER BY

  8. FOR XML


The reason for the difference is that TOP is not ANSI/ISO SQL standard-compliant. This can cause a query to return incorrect results if care is not taken.

General ideas for query tuning:

  • Try not to apply UDFs to columns since this prevents indexes from being used (a table scan must be performed so the UDF is applied to every row).

  • Correlated subqueries can normally perform faster if substituted with CTEs (WITH (<blabla>) as A, SELECT * FROM A).

  • Scalar-valued UDFs can perform faster if defined inline in the query (instead of being function calls), because in that case they can be optimized.

  • Inline table-valued UDFs are just like views with parameters, so they're optimized; multi-statement table-valued UDFs are like SPs that populate temporary tables for the query, and they must be executed fully before the outer query can use the results. CLR table-valued UDFs stream their results (by using yield return statements).

  • Avoid cursors whenever possible.


Indexes

A covered index is one that contains all the columns referenced in a query (in any SELECT, JOIN, WHERE, GROUP BY, HAVING, etc, clauses). When this is the case, the index can be used and no further lookups need to be done to the actual rows (in the case of a non-clustered index). Since a clustered index contains all the columns, it is by default a covered index.

When defining an index, columns can be specified as part of the index's key, or as included columns. The latter cannot be used for tasks such as filtering or sorting, they're useful only if they need to be displayed in a query which makes use of the index based on its key.

Partitioning

Requires a partition function, that defines the values where each partition ends, and a partition scheme, which defines on which file group each partition goes.

When creating partition functions (CREATE PARTITION FUNCTION), you specify if the partition values are to stay in the left group, or the right one (<= or <). When creating partition schemes (CREATE PARTITION SCHEME), you define to which filegroup does each partition goes to. You know how many partitions there are because the partition function's name is used in the statement. You can assign all partitions to the same file group in one go.

Existing tables are moved to a partition with ALTRE TABLE SourceTable SWITCH TO PartitionedTable PARTITION n. A check constraint must be added first to guarantee that the partitioned column satisfies the values accepted in this partition.

No comments: