Sunday, January 31, 2010

Notes on SQL 9


XML

FOR XML <mode>:

  • RAW: returns each row as an XML element and each column as an XML attribute (<row ClientID="13" Name="Fake client name" />). Aliases can be used for columns; using RAW('element_type') causes the rows to be "named" element_type (instead of "row" in the example); adding ROOT('root_name') wraps everything in <root_name> </root_name> tags; and adding ELEMENTS causes each column to be listed as its own element instead of an attribute. By default, NULL values are not included as attributes in the XML element (or as XML elements if the ELEMENTS directive is used), but adding the XSINIL directive to the ELEMENTS directive adds an xsi:nil="true" attribute for elements whose corresponding columns are NULL. As a side effect, a namespace is added to the root element in the XML, since that's where the nil attribute is declared.

  • AUTO: supports basic ("single-child") hierarchies. The hierarchy is created based on the order of the columns in the SELECT statement (not the order in which the tables are referenced). Each table referenced gets its own element in the hierarchy, named after the alias used for the table. The order of the rows in the result determines how the hierarchy is built, so ORDER BY clauses are important when using this mode.

  • EXPLICIT: the ELEMENTS and XSINIL directives do not exist for this mode. ORDER BY is important to construct the result, just as in AUTO mode. Any XML structure (even some not supported by the PATH mode) can be created by using specific aliases for the columns (e.g. "Parent", "Customers!1!!element", "Customer!2!Id", "Order!3!OrderDate"). Tag and Parent columns are necessary; the first uniquely identifies an XML element in the result, and the second specifies the Tag of the parent element (or NULL).

  • PATH: the preferred mode. Using certain syntax in the column aliases, you can generate XML structures:


SELECT
c.ClientID as "@Id"
,c.Name as "@ClientName"
,c.RowGuid as "comment()"
,CAST('<Test/>' as XML) as "node()"
,c.ClientType as "AdditionalInfo/@Type"
,c.ModifiedDate as "AdditionalInfo/text()"
FROM dbo.Customer AS c
FOR XML PATH('Customer'), ROOT('Customers')


Results in:

<Customers>
<Customer Id="1" ClientName="Fake client 1">
<!-- 3ACB432D-B87D-4AED-95B4-C3019DA348B1-->
<Test />
<AdditionalInfo Type="S">2007-10-10T10:00:00.123</AdditionalInfo>
</Customer>
</Customers>


When nesting FOR XML queries, the TYPE directive must be added at the end of each subquery, so the inner XML is not interpreted as a string.
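For instance (a minimal sketch; dbo.SalesOrder and its ClientID/OrderID/OrderDate columns are made-up names), note the TYPE directive on the subquery:

SELECT
c.ClientID as "@Id"
,(SELECT o.OrderID as "@Id"
  ,o.OrderDate as "@Date"
  FROM dbo.SalesOrder AS o
  WHERE o.ClientID = c.ClientID
  FOR XML PATH('Order'), TYPE)
FROM dbo.Customer AS c
FOR XML PATH('Customer'), ROOT('Customers')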

A custom namespace can be included by preceding the query with a WITH XMLNAMESPACES clause.
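E.g. (a sketch; the namespace URI and prefix are invented):

WITH XMLNAMESPACES ('http://example.com/customers' AS cust)
SELECT
c.ClientID as "@cust:Id"
FROM dbo.Customer AS c
FOR XML PATH('cust:Customer')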

XML Data Type

Typed XML attaches a schema collection to a column to validate its contents. You create a schema collection with the CREATE XML SCHEMA COLLECTION statement. When declaring XML columns for a table, just "XML" means untyped, or a schema collection can be indicated in parentheses. Also inside those parentheses, the DOCUMENT option forces the column to support only XML documents (not fragments).
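For example (a sketch with a deliberately tiny schema):

CREATE XML SCHEMA COLLECTION dbo.CustomerSchemaCollection AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Customer">
    <xsd:complexType>
      <xsd:attribute name="Id" type="xsd:int" />
    </xsd:complexType>
  </xsd:element>
</xsd:schema>';

CREATE TABLE dbo.CustomerInfo
(
  UntypedXml XML,
  TypedXml XML (DOCUMENT dbo.CustomerSchemaCollection)
);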

A disadvantage of typed XML columns is that if the schema must be modified, every column that uses it must first be changed to untyped XML (costly due to differences in internal representation), the schema dropped and recreated, the data updated so it complies with the new schema, and the columns converted back to typed XML (costly again).

The XML data type provides methods to use it (a combined sketch follows the list):

  • exist(): returns 1 if the XQuery expression matches anything in the XML, 0 otherwise.

  • value(): retrieves a single, scalar value from somewhere in the XML.

  • nodes(): shreds XML into a tabular form (one row per matching node).

  • query(): used to return an XML fragment.
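(A sketch; any XML variable or column works the same way.)

DECLARE @x XML = N'<Customers><Customer Id="1"><Order Total="10" /><Order Total="20" /></Customer></Customers>';

SELECT @x.exist('/Customers/Customer[@Id = 1]');        -- 1
SELECT @x.value('(/Customers/Customer/@Id)[1]', 'INT'); -- 1
SELECT @x.query('/Customers/Customer/Order');           -- XML fragment with both orders
SELECT o.n.value('@Total', 'INT') AS OrderTotal         -- tabular form: one row per order
FROM @x.nodes('/Customers/Customer/Order') AS o(n);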


CLR

Usage of CLR within the database requires that the SQL Server instance allows CLR code to run (EXEC sp_configure 'clr enabled', 1; RECONFIGURE;). Once the (VB.NET/C#) code is compiled into an assembly, the assembly must be loaded into SQL Server and a database object must be created and pointed to the loaded assembly using DDL.

The SqlContext and SqlPipe classes (in the Microsoft.SqlServer.Server namespace) must be used for command execution, since they are used to communicate to the SQL Server session within which the code will execute.

To load a compiled assembly (dll) into the database, the CREATE ASSEMBLY statement is used (CREATE ASSEMBLY AssemblyNameInDB FROM 'assembly_path_in_the_filesystem'). The dll file is no longer needed since the assembly is copied into the DB (the sys.assemblies catalog view contains information about existing assemblies).

An example of a SP that calls a CLR assembly is the following:

CREATE PROCEDURE ExampleSP @Parameter1 INT AS EXTERNAL NAME AssemblyName.[ClassNameWithNamespace].MethodName

ClassNameWithNamespace is enclosed in brackets (double quotes would also work) because it probably contains periods, which are not part of a valid identifier in SQL Server.

Source code can also be stored in the DB, like this:

ALTER ASSEMBLY AssemblyNameInDB ADD FILE FROM 'source_code_path_in_the_filesystem'

CLR Scalar UDFs

The SqlFunction attribute is required on the method. You can specify IsDeterministic and IsPrecise (if the function uses any floating-point arithmetic, IsPrecise should be false).
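The T-SQL side of registering such a function could look like this (a sketch; the assembly, namespace, class and method names are invented):

CREATE FUNCTION dbo.ClrDistance (@x1 FLOAT, @y1 FLOAT, @x2 FLOAT, @y2 FLOAT)
RETURNS FLOAT
AS EXTERNAL NAME AssemblyNameInDB.[MyNamespace.GeometryFunctions].Distance;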

CLR Table-valued UDFs

2 methods are needed: the one that is called from SQL Server (decorated with an attribute like [SqlFunction(FillRowMethodName = "OtherMethodName", TableDefinition = "Column1 INT, Column2 NVARCHAR(100)")]), which uses yield return to "generate" each row in the result, and the one that fills each row that gets returned ("OtherMethodName"). The second method always receives one object parameter (whatever the yield return statement yielded in the first method), plus as many out parameters as there are columns in the output table definition (these become the actual column values).
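On the T-SQL side, the table definition is repeated when registering the function (a sketch; all names invented):

CREATE FUNCTION dbo.ClrSplit (@input NVARCHAR(MAX))
RETURNS TABLE (Column1 INT, Column2 NVARCHAR(100))
AS EXTERNAL NAME AssemblyNameInDB.[MyNamespace.TableFunctions].Split;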

CLR triggers

Similar to CLR SPs, but you can use the SqlTriggerContext class (available through SqlContext.TriggerContext) to perform trigger-specific checks.

CLR User-Defined aggregates

An entire CLR type (class/struct) must be created, so the aggregate can do the following:

  • Initialize: by using the Init method of the defined CLR type.

  • Add another value to the calculation: with the Accumulate method.

  • Combine itself with another instance of the same aggregate: with the Merge method.

  • Return its result: with the Terminate method.


The decorating attribute SqlUserDefinedAggregate can specify whether serialization should be handled automatically or by the user (who writes the code), whether an empty input returns NULL, whether duplicates matter, whether NULLs matter, whether order matters, and the maximum size of the serialized structure.
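Once compiled and loaded, the type is exposed to T-SQL like this (a sketch; a Median aggregate is an invented example):

CREATE AGGREGATE dbo.Median (@value INT)
RETURNS INT
EXTERNAL NAME AssemblyNameInDB.[MyNamespace.Median];

-- Then it is used like any built-in aggregate (always schema-qualified):
SELECT CategoryId, dbo.Median(Cost) FROM Products GROUP BY CategoryId;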

CLR User-Defined Types

The attribute SqlUserDefinedType can define who handles serialization (see CLR User-Defined Aggregates), a validation method invoked when a binary value is cast to the UDT, whether the type is byte-ordered (SQL Server only supports byte sorting, so you should be VERY CAREFUL when implementing this), whether it is of fixed length, and its max size. The type must implement the INullable interface, and Parse and ToString methods are required.

The following permission sets can be used for CLR assemblies:

  • SAFE: not allowed access to any resources outside the DB to which it is deployed.

  • EXTERNAL ACCESS: allowed to access resources outside the local SQL Server instance, such as another instance, the file system, or even a network resource.

  • UNSAFE: allowed to execute non-CLR code (e.g. Win32 API or a COM component).


Filestream

Option for columns of the varbinary(max) data type. Allows SQL Server to store the data for those columns in separate files in the file system. To enable filestream usage, use EXEC sp_configure 'filestream access level', (0|1|2): 0 disables it, 1 allows only T-SQL access to the filestream data, and 2 also allows direct (Win32 streaming) file-system access; access through a network share is configured at the Windows service level. The database also needs a FILESTREAM filegroup (created with CONTAINS FILESTREAM).

Usage requires the table to have a UNIQUEIDENTIFIER column with the ROWGUIDCOL property and a UNIQUE constraint on it.
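A minimal table sketch (names invented; the database must already have a FILESTREAM filegroup):

CREATE TABLE dbo.Document
(
  DocumentID UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
  Content VARBINARY(MAX) FILESTREAM NULL
);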

Deletion of filestream data is handled by a garbage collector, not necessarily at the moment the UPDATE/DELETE statement is executed.

Notes on SQL 8


Query performance and tuning

Main performance metrics: Query Cost, Page Reads, Query execution time.

Theoretical execution order of a query without UNION clause:

  1. FROM, JOIN, APPLY, ON

  2. WHERE

  3. GROUP BY and aggregate functions

  4. HAVING

  5. SELECT

  6. ORDER BY

  7. TOP

  8. FOR XML


For queries with a UNION clause, the theoretical execution order is:

  1. FROM, JOIN, APPLY, ON

  2. WHERE

  3. GROUP BY and aggregate functions

  4. HAVING

  5. TOP (note that in this case it is executed BEFORE the ORDER BY clause)

  6. UNION and SELECT

  7. ORDER BY

  8. FOR XML


The reason for the difference is that TOP is not ANSI/ISO SQL standard-compliant. This can cause a query to return incorrect results if care is not taken.

General ideas for query tuning:

  • Try not to apply UDFs to columns since this prevents indexes from being used (a table scan must be performed so the UDF is applied to every row).

  • Correlated subqueries can normally perform faster if substituted with CTEs (WITH A AS (<subquery>) SELECT ... FROM A).

  • Scalar-valued UDFs can perform faster if their logic is written inline in the query (instead of invoked as function calls), because inline expressions can be optimized.

  • Inline table-valued UDFs are just like views with parameters, so they're optimized; multi-statement table-valued UDFs are like SPs that populate temporary tables for the query, and they must be executed fully before the outer query can use the results. CLR table-valued UDFs stream their results (by using yield return statements).

  • Avoid cursors whenever possible.


Indexes

A covering index is one that contains all the columns referenced in a query (in any of the SELECT, JOIN, WHERE, GROUP BY, HAVING, etc. clauses). When this is the case, the index can be used and no further lookups need to be done against the actual rows (in the case of a non-clustered index). Since a clustered index contains all the columns, it is by definition a covering index.

When defining an index, columns can be specified as part of the index's key, or as included columns. The latter cannot be used for tasks such as filtering or sorting; they're useful only when they need to be returned by a query that uses the index through its key columns.
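For example (a sketch against the Products table used elsewhere in these notes):

-- Seeks and sorts can use CategoryId, while Name and Cost ride along at the
-- leaf level, so SELECT Name, Cost ... WHERE CategoryId = ... is fully covered.
CREATE NONCLUSTERED INDEX IX_Products_CategoryId
ON Products (CategoryId)
INCLUDE (Name, Cost);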

Partitioning

Requires a partition function, which defines the boundary values where each partition ends, and a partition scheme, which defines the filegroup each partition is stored on.

When creating partition functions (CREATE PARTITION FUNCTION), you specify whether each boundary value belongs to the partition on its left or on its right (RANGE LEFT, i.e. <=, vs RANGE RIGHT, i.e. <). When creating partition schemes (CREATE PARTITION SCHEME), you define which filegroup each partition goes to. The number of partitions is known because the partition function's name is referenced in the statement. You can assign all partitions to the same filegroup in one go.

Existing tables are moved into a partition with ALTER TABLE SourceTable SWITCH TO PartitionedTable PARTITION n. A check constraint must be added first to guarantee that the partitioning column only contains values accepted by that partition.
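Putting the pieces together (a sketch; the table and filegroup choices are invented):

CREATE PARTITION FUNCTION pfOrderYear (DATETIME)
AS RANGE RIGHT FOR VALUES ('2008-01-01', '2009-01-01'); -- 3 partitions

CREATE PARTITION SCHEME psOrderYear
AS PARTITION pfOrderYear ALL TO ([PRIMARY]);            -- all on one filegroup

CREATE TABLE dbo.OrderArchive
(
  OrderID INT NOT NULL,
  OrderDate DATETIME NOT NULL
) ON psOrderYear (OrderDate);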

Notes on SQL 7


Triggers

DML VS DDL triggers.

The order of trigger execution (when several are defined) can be modified with the sp_settriggerorder system SP.

A trigger only executes once regardless of the number of rows affected (e.g. DELETE FROM TableName deletes all rows in the table, but only causes a FOR DELETE trigger to execute once).

The affected rows (inserted/deleted) are available in the inserted and deleted special tables inside the trigger definition.

DDL triggers can be specified at the instance (ON ALL SERVER) or database (ON DATABASE) levels. The events on which DDL triggers can be defined (e.g. ALTER_TABLE, CREATE_DATABASE) are grouped (e.g. DDL_TABLE_EVENTS), so some triggers can be written more concisely. DDL triggers have access to the EVENTDATA function, which returns an XML with information about the event type, the time it happened, the SPID, ServerName, LoginName, UserName, DatabaseName, SchemaName, ObjectName, ObjectType, and TSQLCommand that caused it to fire. Each event has a different XML schema.
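For instance, a database-level trigger that logs table alterations might look like this (a sketch; the dbo.DdlAudit table is invented):

CREATE TRIGGER trgAuditAlterTable
ON DATABASE
FOR ALTER_TABLE
AS
BEGIN
  DECLARE @e XML = EVENTDATA();
  INSERT INTO dbo.DdlAudit (LoginName, Command)
  VALUES (@e.value('(/EVENT_INSTANCE/LoginName)[1]', 'NVARCHAR(128)'),
          @e.value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]', 'NVARCHAR(MAX)'));
END;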

LOGON triggers can also be defined, but they cannot show messages to the user, since they execute before the session is established. A ROLLBACK inside a LOGON trigger causes the connection to terminate.

Views

Besides being able to use the ENCRYPTION and SCHEMABINDING options when creating a view, you can specify VIEW_METADATA, which is used for updatable views, so SQL Server returns metadata about it to client applications (instead of metadata about the underlying tables).

Data modifications through a view can only happen if:

  • It references exactly one table

  • Columns in the view reference columns in a table directly

  • The column is not derived from an aggregate

  • The column is not computed as a result of a UNION (ALL), CROSS JOIN, EXCEPT or INTERSECT.

  • The column is not affected by the DISTINCT, GROUP BY, or HAVING clause.

  • The TOP operator is not used


Also, the WITH CHECK OPTION clause (when creating a view) means that any data modification made through the view must leave the affected rows retrievable through the view (i.e., they must still satisfy the view's WHERE clause).
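For example (a sketch; the IsActive column is invented):

CREATE VIEW dbo.ActiveCustomers
AS
SELECT ClientID, Name, IsActive
FROM dbo.Customer
WHERE IsActive = 1
WITH CHECK OPTION;

-- This would fail, because the updated row would disappear from the view:
-- UPDATE dbo.ActiveCustomers SET IsActive = 0 WHERE ClientID = 1;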

An exception to the previous restrictions are partitioned views (created with UNION ALL statements), which have their own, fairly restrictive set of conditions. Broadly speaking, these ensure that the result sets of the member tables (which must all have the same structure) do not overlap.

Indexed views (aka materialized views) also have a special list of requirements, because their data is materialized and physically stored. Among these requirements are:

  • The SELECT statement cannot reference other views

  • All functions must be deterministic

  • AVG, MIN, MAX, STDEV, STDEVP, VAR, VARP are not allowed

  • The index must be both clustered and unique

  • ANSI_NULLS must have been set to ON when the view and any tables referenced by it were created

  • It must be created with the SCHEMABINDING option

  • The SELECT statement must not contain subqueries, outer joins, EXCEPT, INTERSECT, TOP, UNION, ORDER BY, DISTINCT, COMPUTE/COMPUTE BY, CROSS/OUTER APPLY, PIVOT or UNPIVOT.

Saturday, January 30, 2010

Notes on SQL 6


Stored procedures

These statements cannot be used in an SP: USE <database_name>, SET SHOWPLAN_TEXT, SET SHOWPLAN_ALL, SET PARSEONLY, SET SHOWPLAN_XML, CREATE AGGREGATE, CREATE RULE, CREATE DEFAULT, CREATE SCHEMA, CREATE/ALTER FUNCTION, CREATE/ALTER TRIGGER, CREATE/ALTER PROCEDURE, CREATE/ALTER VIEW. In other words, you cannot define what database to use, cannot ask the server for information for each executed statement, cannot ask the server to only parse the SP, and cannot create/alter objects.

The ENCRYPTION option when creating an SP is not really encryption, and is easily reversed.

WAITFOR:

  • WAITFOR DELAY: waits for the specified time span, e.g. WAITFOR DELAY '00:00:02'.

  • WAITFOR TIME: waits for a specified time to occur.

  • WAITFOR RECEIVE: used in conjunction with Service Broker.


Query (or execution) plans are created when an SP is going to be executed, and then reused when it gets re-executed. If the SP behaves differently due to conditions inside it (control flow constructs), it might be better to re-generate the execution plan each time to avoid using non-optimal plans in some cases. This can be done by creating the SP with the RECOMPILE option. Optionally, different SPs can be created for each block of code in the conditions, so the "master" execution plan is simple but always the same, and each "sub-SP" has its own execution plan created when it is needed.

The EXECUTE AS clause can be used to run a SP under a specific security context:

  • LOGIN

  • USER

  • CALLER


Variables

If you assign the result of a SELECT statement that returns several rows to a variable, the last value (row) is assigned and the rest are discarded.

The +=, -=, *= and /= operators are new to SQL Server 2008 (2005 and older require @var = @var + 1 instead of @var += 1).

Errors


Severity levels go from 0 to 25. 16 and up are logged automatically to the SQL Server error log and the Windows Application Event Log. 19 to 25 can only be specified by members of the sysadmin role. 20 to 25 are considered fatal and cause the connection to be terminated and any open transactions to be rolled back.

In the CATCH part of a TRY-CATCH block, you can use the ERROR_NUMBER(), ERROR_MESSAGE(), ERROR_SEVERITY(), ERROR_STATE(), ERROR_PROCEDURE(), ERROR_LINE() functions for error handling. Also, the XACT_STATE function returns 1 if there are open transactions that can be committed or rolled back, 0 if there are no open transactions, and -1 if transactions that can only be rolled back (due to the type of error) exist. If you use SET XACT_ABORT ON before a TRY-CATCH block, any error passes control to the CATCH block, where XACT_STATE returns -1 (the transaction becomes uncommittable).
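A typical pattern (a sketch against the dbo.Customer table from the earlier posts):

BEGIN TRY
  BEGIN TRANSACTION;
  UPDATE dbo.Customer SET Name = 'New name' WHERE ClientID = 1;
  COMMIT TRANSACTION;
END TRY
BEGIN CATCH
  SELECT ERROR_NUMBER() AS Num, ERROR_MESSAGE() AS Msg, ERROR_LINE() AS Line;
  IF XACT_STATE() <> 0
    ROLLBACK TRANSACTION; -- -1 (uncommittable) or 1 (still open)
END CATCH;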

Cursors

You need to DECLARE, OPEN, FETCH, CLOSE, and DEALLOCATE a cursor (CLOSE and DEALLOCATE are performed automatically when an SP terminates). FETCHes are normally done inside a WHILE @@FETCH_STATUS = 0 loop, as sketched below.
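(A sketch of the full lifecycle, again against dbo.Customer.)

DECLARE @name NVARCHAR(100);
DECLARE cur CURSOR FAST_FORWARD FOR
  SELECT Name FROM dbo.Customer;

OPEN cur;
FETCH NEXT FROM cur INTO @name;
WHILE @@FETCH_STATUS = 0
BEGIN
  PRINT @name;
  FETCH NEXT FROM cur INTO @name;
END;
CLOSE cur;
DEALLOCATE cur;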

Cursor options:

  • FAST_FORWARD: can only move forward one row at a time, and scrolling is not supported. You cannot modify the underlying table.

  • STATIC: the result set is stored in tempdb, so changes to the tables are not seen by the cursor. Modifications are not supported.

  • KEYSET: the keys for the rows in the result set are stored, so modifications to these rows are seen by the cursor. Inserts into the source table are not. Accessing a row that has been deleted causes @@FETCH_STATUS to return -2.

  • DYNAMIC: all changes to the underlying table (including inserts) are reflected in the cursor. You cannot use FETCH ABSOLUTE with these.


Updates/deletes to the source tables can be done by an UPDATE/DELETE statement with the WHERE CURRENT OF <cursor name> clause. If modifications are not necessary, the cursor should be declared as READ_ONLY.

Instead of FORWARD_ONLY, you can declare a cursor as SCROLL, in which case you can:

  • FETCH FIRST

  • FETCH LAST

  • FETCH NEXT: equivalent to FETCH

  • FETCH PRIOR

  • FETCH ABSOLUTE n: fetches the n'th row.

  • FETCH RELATIVE n: n rows forward.


Concurrency options for cursors are:

  • READ_ONLY: no lock acquired.

  • SCROLL_LOCKS: locks acquired as each row is read into the cursor.

  • OPTIMISTIC: uses timestamps or checksums instead of locks, and if the data has changed when a modification through the cursor is attempted, it fails.


User-defined Functions

They cannot:

  • Perform an action that changes the state of an instance or database

  • Modify data in a table

  • Call a function that has an external effect, such as the RAND function

  • Create or access temporary tables

  • Execute code dynamically


Scalar-valued functions are straightforward; table-valued functions can be inline (treated like views), or multi-statement (with a BEGIN-END block which does calculations and returns a table).

Can be created with:

  • SCHEMABINDING: if specified, prevents dropping of objects on which the function depends (e.g. tables).

  • EXECUTE AS: same as for SPs

  • ENCRYPTION: same as for SPs

  • RETURNS NULL ON NULL INPUT/CALLED ON NULL INPUT: the former causes NULL to be returned immediately if a NULL parameter is passed to the function; the latter lets the function code execute with a NULL parameter.


Random notes:

A GO inside a comment is STILL considered as a batch delimiter in T-SQL. Careful.

SPs allow GOTO statements. Discouraged.

The @@IDENTITY global variable contains the last identity inserted by any statement in the current connection. If an INSERT activates triggers that insert other rows, @@IDENTITY contains the ID of the last row inserted by the trigger(s). The SCOPE_IDENTITY() function should be used to retrieve just-inserted IDs instead.

Cool optical illusion

Another one for the collection. One of the best ones I've seen.

Notes on SQL 5


Common Table Expressions (CTEs)

Basic CTE:

WITH TableName As (<Any SELECT statement that generates a valid table>)
<SELECT statement that references TableName>

Recursive CTE:

WITH TableName As (<Any SELECT statement that produces the anchor result> UNION ALL <SELECT statement that references TableName>)
<SELECT statement that references TableName>

Use OPTION (MAXRECURSION <number>) in the outer query to determine the maximum number of recursion levels.
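A concrete sketch (assuming an Employees table with a self-referencing ManagerID column):

WITH EmployeeTree AS
(
  SELECT EmployeeID, ManagerID, 0 AS Level
  FROM dbo.Employees
  WHERE ManagerID IS NULL -- anchor: the root(s)
  UNION ALL
  SELECT e.EmployeeID, e.ManagerID, t.Level + 1
  FROM dbo.Employees AS e
  JOIN EmployeeTree AS t ON e.ManagerID = t.EmployeeID
)
SELECT * FROM EmployeeTree
OPTION (MAXRECURSION 100);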


Note: since the WITH keyword has several possible meanings in different contexts, the statement preceding a CTE's WITH keyword must be terminated with a semicolon.

Ranking data functions


ROW_NUMBER: specifies the row number for each row in the result set. Could be used to paginate in SPs (WHERE RowNumber BETWEEN @min AND @max). Syntax: ROW_NUMBER() OVER ([ <partition_by_clause> ] <order_by_clause>). The partition separates the result set into subsets to which the ROW_NUMBER() function applies independently (row numbering is restarted for each subset). The order by specifies the order in which to number rows. DOES NOT WORK WELL WITH TIES (of the order_by_clause), since it could return different row numberings when executing the same query.

RANK: same as ROW_NUMBER, but if there are ties the same value is assigned to all rows in a tie, and skips the corresponding amount of values for the next assignment. E.g. 1, 2, 3, 3, 5, 5, 5, 8, 9, etc.

DENSE_RANK: same as RANK but does not skip values. E.g. 1, 2, 3, 3, 4, 4, 4, 5, 6, etc.

NTILE(integer_expression): partitions the result set into a specified number of groups, and rows in each group are assigned a number that indicates which group they belong to. (Note: "If the number of rows in a partition is not divisible by integer_expression, this will cause groups of two sizes that differ by one member. Larger groups come before smaller groups in the order specified by the OVER clause".)
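All four side by side (a sketch against the Products table):

SELECT Name, Cost,
  ROW_NUMBER() OVER (ORDER BY Cost) AS RowNum,
  RANK() OVER (ORDER BY Cost) AS Rnk,
  DENSE_RANK() OVER (ORDER BY Cost) AS DenseRnk,
  NTILE(4) OVER (ORDER BY Cost) AS Quartile
FROM Products;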

Thursday, January 21, 2010

Notes on SQL 4


Data Types

SQL Server System data types VS User-defined types (UDTs) or SQL Common Language Runtime (SQLCLR) types

Collations for character types are provided by the table-valued fn_helpcollations function. Names of collations (e.g. Latin1_General_CI_AI) specify case sensitivity (CI VS CS) and accent sensitivity (AI VS AS). However, note that what counts as an "accent" differs in different languages (and thus collations).

When doing operations on a column, a specific collation (different from the one originally defined on the column) can be used like this: WHERE Name = 'Öqvist' COLLATE Finnish_Swedish_CI_AS. This prevents indexes on that column from being used for that particular expression, though.

int and decimal types (in all of their versions) are exact and work the same no matter the underlying processor architecture. Decimal (without parameters) defaults to decimal(18,0).

float and real are approximate (floating point) numeric types. For float(n), n specifies the number of bits that represent the mantissa of the number. But in reality, this can only be 24 or 53, and n is "rounded up" to one of these values. real is equivalent to float(24).

  • datetime: 1753-01-01 => 9999-12-31, accuracy to every third millisecond (so the last digit can only be 0, 3, 7). 23:59:59.998 gets rounded down to 23:59:59.997, but 23:59:59.999 gets rounded up to the next day.

  • smalldatetime: 1900-01-01 => 2079-06-06, accuracy to the minute

  • datetime2(fractional seconds precision): 0001-01-01 => 9999-12-31, accuracy to specified fractional seconds

  • datetimeoffset(fractional seconds precision): as datetime2, plus a +/- 14:00 offset.

  • date: 0001-01-01 => 9999-12-31

  • time(fractional seconds precision): 00:00:00 => 23:59:59


rowversion columns are automatically set to a database-scoped binary counter whenever they are inserted or updated. It is like a "sequence number" that can be used to determine the order in which things happened (up to some point, since it can be overwritten if another modification to the row happens).

varchar(max), nvarchar(max), varbinary(max) and xml prevent online index rebuilding, and cannot be used in the key of an index.

Tables

"A schema is similar to a namespace in other programming languages, but only 1 level of schemas exist." The question that I never gave myself time to answer =P.

IDENTITY allows for a seed and an increment step (which can be negative): IDENTITY(seed, increment). It can only be applied to integer data types, and to decimal with a scale of 0. If an insertion fails, the generated number is NOT given to the next insert; it is lost.

Compression can be enabled on a page or row level.

Computed columns are not physically stored, unless PERSISTED is specified. "Computed columns used as CHECK, FOREIGN KEY, or NOT NULL constraints must be marked PERSISTED. A computed column can be used as a key column in an index or as part of any PRIMARY KEY or UNIQUE constraint if the computed column value is defined by a deterministic expression and the data type of the result is allowed in index columns."

Data integrity

Declarative data integrity (constraints on table columns) VS procedural data integrity (SPs and triggers).

Notes on constraint types:

  • Primary Key/Unique: a unique index is automatically created (which cannot contain more than 16 columns or 900 bytes of data). PK defaults to creating a clustered index, Unique defaults to a non-clustered one (both can be overridden). They can be created on computed columns.

  • Foreign Key: the column being referenced MUST have a unique index (either PK or Unique). Since they are used a lot (in joins and to verify referential data integrity), indexes are highly recommended on FK columns. The default behavior (called NO ACTION) when a referenced value is deleted (raise an error and rollback the deletion) can be changed to SET NULL, SET DEFAULT, or CASCADE. Different actions can be specified for (ON) DELETE and (ON) UPDATE. For SET NULL and SET DEFAULT, the columns must be nullable and have a default respectively (in the second case, NULL is the implicit default).

  • Check: it's not that it accepts values that evaluate to true, but that it rejects values that evaluate to false (different because of NULL, which is accepted). Scalar UDFs can be used.

  • Default: nothing of interest to say.


FK and Check constraints can be turned off, but doing so flags them as "not trusted", because someone could have messed with the data during that time (this can cause significant differences in execution plans). To make them "trusted" again, they must be turned back on specifying WITH CHECK, e.g.:

ALTER TABLE tableName
WITH CHECK
CHECK CONSTRAINT FKForeignKey


Notes on SQL 3


INSERT Statement

To cause DEFAULT constraints to "execute" on a table while inserting, it is possible to do this: INSERT INTO Table VALUES ('value 1', 'value 2', DEFAULT). Additionally, if values are supplied for all the columns in the table, the column list can be omitted.

For batch insertion into an existing table: INSERT SELECT; for temporary/permanent table creation: SELECT INTO.

Advanced DML - OUTPUT Clause

Allows things like these (for INSERT, UPDATE and DELETE statements):

DELETE FROM [Table]
OUTPUT deleted.* INTO [output_table]
WHERE Condition

However, there are limitations: output_table cannot

  • Have enabled triggers defined on it.

  • Participate on either side of a foreign key constraint.

  • Have CHECK constraints or enabled rules.


Advanced DML - MERGE Statement

The syntax should be clear enough as to what this does. I couldn't find much worth mentioning without going into too much detail:
[ WITH <common_table_expression> [,...n] ]
MERGE
        [ TOP ( expression ) [ PERCENT ] ]
        [ INTO ] <target_table> [ WITH ( <merge_hint> ) ] [ [ AS ] table_alias ]
        USING <table_source>
        ON <merge_search_condition>
        [ WHEN MATCHED [ AND <clause_search_condition> ]
            THEN <merge_matched> ] [...n ]
        [ WHEN NOT MATCHED [ BY TARGET ] [ AND <clause_search_condition> ]
            THEN <merge_not_matched> ]
        [ WHEN NOT MATCHED BY SOURCE [ AND <clause_search_condition> ]
            THEN <merge_matched> ] [...n ]
        [ <output_clause> ]
        [ OPTION ( <query_hint> [ ,...n ] ) ]
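
A concrete sketch (upserting customers from an invented staging table):

MERGE INTO dbo.Customer AS target
USING dbo.CustomerStaging AS source
  ON target.ClientID = source.ClientID
WHEN MATCHED THEN
  UPDATE SET target.Name = source.Name
WHEN NOT MATCHED BY TARGET THEN
  INSERT (ClientID, Name) VALUES (source.ClientID, source.Name)
WHEN NOT MATCHED BY SOURCE THEN
  DELETE;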

Transactions

When implicit transactions are set to ON (SET IMPLICIT_TRANSACTIONS {ON|OFF}), every one of the following statements starts a transaction: ALTER TABLE, CREATE, DELETE, DENY, DROP, FETCH, GRANT, INSERT, OPEN, REVOKE, SELECT, TRUNCATE TABLE, UPDATE.

Note on Rollbacks: "Although a ROLLBACK statement returns the data to its prior state, some functionalities, such as seed values for identity columns, are not reset."

With nested transactions, a ROLLBACK goes all the way to the outermost transaction (even if COMMITS were done for any of the inner ones).

The SAVE TRANSACTION savepoint_name statement allows partial rollbacks, if a savepoint_name is specified during rollback. However, it cannot be used with distributed transactions (damn it, CDR!).
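For example (a sketch):

BEGIN TRANSACTION;
  INSERT INTO dbo.Customer (Name) VALUES ('Kept');
  SAVE TRANSACTION AfterFirstInsert;
  INSERT INTO dbo.Customer (Name) VALUES ('Discarded');
  ROLLBACK TRANSACTION AfterFirstInsert; -- undoes only the second insert
COMMIT TRANSACTION;                      -- the first insert is committed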

When transactions deadlock, SQL Server chooses a victim based on the estimated cost of rolling back each one, and a 1205 error is issued.

Locking

"Locks can be assigned to resources such as rows, pages, indexes, tables and databases".

Locking modes:

Shared (S): placed on resources for read (SELECT) operations. They are compatible among them, but not with exclusive locks. If the isolation level is REPEATABLE READ or higher, the lock is kept throughout the transaction instead of just during the read.

Update (U): "placed on resources where a shared (S) lock is required, but the need to upgrade to an exclusive (X) lock is anticipated." ONLY ONE TRANSACTION CAN OBTAIN A "U" LOCK ON A RESOURCE AT A GIVEN TIME. This prevents the deadlock that occurs when two or more transactions each hold an "S" lock on a resource that they want to upgrade to an "X" lock: each one has to wait for the others to release their "S" locks first, so none can proceed.

Exclusive (X): for data modification. Not compatible with any other kind of lock, and can only be "overridden" by a NOLOCK hint or the READ UNCOMMITTED isolation level.

Intent (IS, IX, SIX): from TechNet:

"The Database Engine uses intent locks to protect placing a shared (S) lock or exclusive (X) lock on a resource lower in the lock hierarchy. Intent locks are named intent locks because they are acquired before a lock at the lower level, and therefore signal intent to place locks at a lower level.

Intent locks serve two purposes:

  • To prevent other transactions from modifying the higher-level resource in a way that would invalidate the lock at the lower level.

  • To improve the efficiency of the Database Engine in detecting lock conflicts at the higher level of granularity."


Schema (Sch-M, Sch-S): they stand for Schema Modification and Schema Stability. Obvious meaning for the first (e.g. adding a new column to a table), while the second kind "are placed on objects while queries are being compiled or executed". Need more info on this one...

Bulk Update (BU): used for bulk insert.

Key-range: again from TechNet:

"Key-range locks protect a range of rows implicitly included in a record set being read by a Transact-SQL statement while using the serializable transaction isolation level. Key-range locking prevents phantom reads. By protecting the ranges of keys between rows, it also prevents phantom insertions or deletions into a record set accessed by a transaction."

???

Transaction Isolation Levels

Set with the SET TRANSACTION ISOLATION LEVEL statement.

READ UNCOMMITTED: allows reading what other transactions have inserted but not committed.

READ COMMITTED: prevents dirty reads (the previous case). This is the default.

REPEATABLE READ: no dirty reads, and ensures that shared locks are maintained until the transaction completes.

SNAPSHOT: (requires the ALLOW_SNAPSHOT_ISOLATION option). Literally takes a snapshot of the data that will be read, and consequently does not issue any locks. It's still not clear to me if data modification is done only "locally" on the snapshot or how is it handled...

SERIALIZABLE: total blocking. No one can read what the transaction has modified, and no one can modify what the transaction has read.

Other notes

Tables without a clustered index are stored in a heap.

"Bound sessions allow two or more sessions to share the same transaction and locks, and can work on the same data without lock conflicts." Cool!


Sunday, January 17, 2010

"On thinking for oneself"

An essay by Arthur Schopenhauer that I partially identified with (and I must do something about that), and that I liked quite a bit, even if it's maybe a little arrogant.

Here.

Bitten apple + rainbow = Alan Turing ?

I stumbled upon this comment while watching this YouTube video (part 1 here):
@Bobbiethejean
The Apple logo for Apple computers was inspired by Turing's death. Apple = Poisoned apple + Rainbow colours = his homosexuality.

A while ago I got an email saying that the glyphs we currently use for numbers got their shapes from the number of angles each one contains, but since the examples seemed somewhat far-fetched and forced, I didn't buy it. Wikipedia later confirmed my suspicions. But the Apple logo theory has nothing forced about it, and even makes some sense given Turing's role in the history of computing. But again, to clear it up I turned to Wikipedia, which says this:
Apple’s first logo, designed by Jobs and Wayne, depicts Sir Isaac Newton sitting under an apple tree. Almost immediately, though, this was replaced by Rob Janoff’s “rainbow Apple”, the now-familiar rainbow-colored silhouette of an apple with a bite taken out of it. Janoff presented Jobs with several different monochromatic themes for the "bitten" logo, and Jobs immediately took a liking to it. While Jobs liked the logo, he insisted it be in color, as a way to humanize the company.[122][123]

The original hand drawn logo features Sir Isaac Newton, and one theory states that the symbol refers to his discoveries of gravity (the apple) and the separation of light by prisms (the colors). Another explanation exists that the bitten apple pays homage to the mathematician Alan Turing, who committed suicide by eating an apple he had laced with cyanide.[124] Turing is regarded as one of the fathers of the computer. The rainbow colors of the logo were rumored to be a reference to the rainbow flag, as a homage to Turing's homosexuality.[125]

However, Rob Janoff stated in an interview that the alternate theories are all wonderful urban legends, but, unfortunately, "B.S." The Apple logo was designed with a bite for scale, so that people would recognise that it was an apple, not a cherry, and the rainbow color was not a coded reference to homosexuality or prism light, but was conceived to make the logo more accessible and represent the fact the monitor could reproduce images in color [126].

Oh well, if the designer says that's not the reason, what can you do, heh. Cool coincidence though. And the bit about the first logo design is an interesting fact; it looks quite old-school.

Monday, January 11, 2010

Notes on SQL 2


Aggregate functions I didn't know about

CHECKSUM_AGG: checksum of all values in the dataset.

COUNT_BIG: same as COUNT but returns bigint.

GROUPING: returns 0 or 1 to specify detail or aggregate row in certain uses of the GROUP BY clause

STDEV/STDEVP: standard deviation and standard deviation for population (the P version treats the input as the entire population, the plain version treats it as a sample).

VAR/VARP: variance and variance for population (same sample-VS-population distinction).

Additions to the GROUP BY clause

WITH ROLLUP: adds result rows that specify the "summary" of an aggregate function as if the GROUP BY hadn't been applied. E.g., if I group products by CategoryId and request the average cost (SELECT CategoryId, AVG(Cost) FROM Products GROUP BY CategoryId), I will get a column with each Category identifier and another with the corresponding average cost. If I add WITH ROLLUP at the end, an extra row will appear with the average cost for ALL products across all categories. This extends to several columns in the GROUP BY clause, and in that case each grouping level gets its own summary rows.

WITH CUBE: similar to WITH ROLLUP, but creates summaries for every possible combination of the columns in the GROUP BY clause. These two seem like a minor touch of Business Intelligence over basic SQL Server queries.

Grouping Sets (added in SQL Server 2008)

They provide clearer syntax when requesting aggregate data over several different groupings. This is useful if you have a table which you can group independently by several different columns, and want to see the summaries for each possible grouping in a single query.

SELECT CategoryID, SubCategoryID, AVG(Price) FROM Products GROUP BY GROUPING SETS ( (Products.CategoryID), (Products.SubCategoryID) )

UNION, INTERSECT, EXCEPT

They exist! I thought they only did in Oracle. Minor (expected) nuances like having to match the number and type of columns in both sides of each operator. Regarding efficiency against other operators that provide equivalent functionality (e.g., EXCEPT VS NOT IN), some sources claim that the execution plans are exactly the same, so the clearer syntax should be used.

APPLY Operator

You select some rows from some table, and then use one or more of the columns in each row as parameters for a table-valued function, from which you extract additional information. Comes in 2 versions: CROSS APPLY and OUTER APPLY. The second one returns all rows from the original table, with NULL in the rest of the columns if the table-valued function did not return any data. If the table-valued function returns several rows, the behavior expected from a regular JOIN still applies.
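A sketch (dbo.GetTopOrders is an invented table-valued function returning the top @n orders for a client):

SELECT c.ClientID, c.Name, o.OrderID, o.OrderDate
FROM dbo.Customer AS c
CROSS APPLY dbo.GetTopOrders(c.ClientID, 3) AS o;
-- OUTER APPLY would also keep customers for which the function returns no rows.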

Scalar-valued functions

DATEDIFF(datepart, startdate, enddate) -> startdate gets subtracted from enddate. Minor reminder.

DATALENGTH(expression) -> returns the number of bytes required to represent expression.

CHARINDEX(expression1, expression2, [start_location]) -> returns the position of the first occurrence of expression1 within expression2, starting the search at start_location.

PATINDEX('pattern', expression) -> similar to CHARINDEX but allows for wildcards.


Saturday, January 9, 2010

Interesting passages from "This is your brain on music"

Just some extracts from the book that I liked and happened to take note of:

Regarding sound production:
The introduction of energy to an instrument - the attack phase - usually creates energy at many different frequencies that are not related to one another by simply integer multiples. In other words, for the brief period after we strike, blow into, pluck, or otherwise cause an instrument to start making sound, the impact itself has a rather noisy quality that is not especially musical - more like the sound of a hammer hitting a piece of wood, say, than like a hammer hitting a bell or a piano string, or like the sound of wind rushing through a tube. Following the attack is a more stable phase in which the musical tone takes on the orderly pattern of overtone frequencies as the metal or wood (or other material) that the instrument is made of starts to resonate. This middle part of a musical tone is referred to as the steady state - in most instances the overtone profile is relatively stable while the sound emanates from the instrument during this time.
After Schaeffer edited out the attack of orchestral instrument recordings, he played back the tape and found that it was nearly impossible for most people to identify the instrument that was playing. Without the attack, pianos and bells sounded remarkably unlike pianos and bells, and remarkably similar to one another. If you splice the attack of one instrument onto the steady state, or body, from another, you get varied results. In some cases, you hear an ambiguous hybrid instrument that sounds more like the instrument that the attack came from than the one the steady state came from. Michelle Castellengo and others have discovered that you can create entirely new instruments this way; for example, splicing a violin bow sound onto a flute tone creates a sound that strongly resembles a hurdy-gurdy street organ. These experiments showed the importance of the attack.

Regarding sound identification/discernment:
One case of auditory grouping is the way that the many different sounds emanating from a single musical instrument cohere into a percept of a single instrument. We don't hear the individual harmonics of an oboe or of a trumpet, we hear an oboe or we hear a trumpet. This is all the more remarkable if you imagine an oboe and a trumpet playing at the same time. Our brains are capable of analyzing the dozens of different frequencies reaching our ears, and putting them together in just the right way. We don't have the impression of dozens of disembodied harmonics, nor do we hear just a single hybrid instrument. Rather, our brains construct for us separate mental images of the two of them playing together - the basis for our appreciation of timbral combinations in music. This is what Pierce was talking about when he marveled at the timbres of rock music - the sounds that an electric bass and an electric guitar made when they were playing together - two instruments perfectly distinguishable from one another, and yet simultaneously creating a new sonic combination that can be heard, discussed, and remembered.

Regarding the function of the auditory system:
Imagine that you stretch a pillowcase tightly across the opening of a bucket, and different people throw Ping-Pong balls at it from different distances. Each person can throw as many Ping-Pong balls as he likes, and as often as he likes. Your job is to figure out, just by looking at how the pillowcase moves up and down, how many people there are, who they are, and whether they are walking toward you, away from you, or are standing still. This is analogous to what the auditory system has to contend with in making identifications of auditory objects in the world, using only the movement of the eardrum as a guide.

Regarding emotions in music:
Composers imbue music with emotion by knowing what our expectations are and then very deliberately controlling when those expectations will be met, and when they won't. The thrills, chills, and tears we experience from music are the result of having our expectations artfully manipulated by a skilled composer and the musicians who interpret that music.