Friday, November 27, 2009

Notes on SQL 1

I'll be posting notes and insights on SQL now that I'm preparing for the Microsoft SQL Server 2008, Database Development certification exam. Both as a way of improving my retention over the things I learn, and as a reference for my future self =P. Note that these will most probably NOT be a full-depth treatment of the topics, but isolated notes that are new or interesting to me (in other words, I strongly advise against using these notes as a tutorial for learning SQL). Without further ado:

ANSI_NULLS


The ANSI_NULLS option determines if "normal" comparisons, i.e. using the = and <> operators, can be done against NULL values or not. Setting ANSI_NULLS to ON, these comparisons won't work (will return "unknown") and thus IS NULL and IS NOT NULL should be used. When ANSI_NULLS is set to OFF, queries like the following work:

SELECT * FROM Table WHERE Column <> NULL

CONTAINS clause


"The CONTAINS clause is available only when you create a full text index on the column being compared." This is a big topic, so I'll summarize some of my findings here (I'll definitely cover this to a deeper extent in a future post though).

Full-text capabilities in SQL Server 2008 allow us to go beyond Column = 'Exact_text_match' and Column LIKE '%Partial_text_match%' and do searches like the following (or so I've read haha):

  • Two words near each other

  • Any word derived from a particular root (for example run, ran, or running)

  • Multiple words with distinct weightings

  • A word or phrase close to the search word or phrase


Everything needed for this functionality to work is included in SQL Server 2008 if it's properly configured during installation (this was not true for previous versions of SQL Server, that depended on some components of the OS). A full-text catalog must be created on the database, and a full-text index on the table of interest (only 1 full-text index can exist per table, and it requires a unique key index on the table too). The index must then be populated. Full-text search can be done over columns of the following types: char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary, or varbinary(max). While creating these indexes, several options can be specified, like sensitivity to accents, the language of the text we're indexing, a stoplist word list ('a', 'an', 'the', etc.), the extension of the file contents stored in a column (in the case of a varbinary, varbinary(max) or image column, that store documents in binary format), automatic/manual/off settings for index updates. SQL Server includes "language helpers" like stemmers (processes that conjugate verbs based on the linguistic rules of the language) and initially empty thesaurus. The CONTAINS predicate (along with the FREETEXT predicate, and their [command]TABLE function counterparts, which return the results in a table), is one of the ways to actually use full-text search in a query.

Note from the MSDN:
A LIKE query against millions of rows of text data can take minutes to return; whereas a full-text query can take only seconds or less against the same data, depending on the number of rows that are returned.

Boolean operators (AND, OR, NOT)
The NOT operator typically hurts performance because the query optimizer cannot use indexes for the WHERE clause when a NOT operator is specified. For indexes to be utilized when an OR operator is specified, all columns referenced by the OR condition must be included in an index or none of the indexes are used.

LIKE clause

  • A percent sign (%) replaces any amount of characters in the string (including 0)

  • An underscore (_) replaces exactly one character in the string.

  • Square brackets ([]) replace any one character within a set or a range of characters.

  • A caret (^) as the first character inside square brackets means "match any character NOT in this set/range".


Just as the NOT operator, Wildcard characters at the beginning of a string do not allow the query optimizer to use indexes.

BETWEEN clause

It is inclusive, so "SELECT * FROM Products WHERE Price BETWEEN 4 AND 8" includes products with price = 4 and price = 8. Note that this might cause weird behavior when using alphanumeric ranges (i.e. BETWEEN 'S' and 'Z' returns 'S', 'Slice', 'Tom', and 'Z', but not 'Zebra')

Click here for the next part in this series of posts.