Background

From 1999 to 2002, I worked on Microsoft's SQLXML product. SQLXML is a collection of XML technologies for Microsoft's SQL Server database, including ways to convert relational data into XML and vice-versa. SQLXML initially shipped with SQL Server 2000 in the box, and then we shipped three additional versions on the Web.

I worked on several parts of SQLXML, but primarily I implemented a feature known as XPath over Annotated Schemas. An Annotated Schema is an ordinary XML Schema (using the XDR or XSD language) with a few extra attributes added to describe how the XML shape maps to the database. It describes an XML view over relational data. Users can then query these views using the XPath query langauge. XPath is a simple XML query language from the W3C, with a familiar syntax reminiscent of file paths (hence the name). Together, they provide an easy-to-use, high-performance way to publish relational data as XML.

An XML view, like a relational view, describes data as you wish to work with it, not necessarily how the data is actually stored. For example, data might be stored in several different databases or XML documents, and an XML view might combine them together into a single XML shape. SQLXML's XML views work only over tables.

Like relational views, XML views can be materialized or virtualized. Roughly speaking, a materialized XML view actually creates the new shape as a full XML document. A virtual XML view leaves the data in its original location, and retrieves values on demand. Like SQL views, both approaches have their uses.

SQLXML focuses on virtual XML views, in which only the final query result is converted to XML. This is quite hard to implement, but delivers a lot of functionality and performance to end-users.

How It Works

The canonical example takes two relational tables, say Customers and Orders, and maps them into an XML hierarchy of Customer elements containing Order elements. These tables are joined by primary/foreign keys in the database, and the join is represented by the XML parent/child hierarchy. Example 1 expresses this view as an annotated XSD schema.

Example 1: A simple view
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
  <xs:element name="Customer" sql:relation="Customers">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Order" sql:relation="Orders" >
          <xs:complexType>
            <xs:attribute name="OrderID" type="xs:int" />
          </xs:complexType>
          <xs:annotation>
            <xs:appinfo>
              <sql:relationship parent="Customer" child="Order"
                                   parent-key="CustomerID" child-key="CustomerID"/>
            </xs:appinfo>
          </xs:annotation>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="OrderID" type="xs:int" />
    </xs:complexType>
  </xs:element>
</xs:schema>

The root XPath query (/) retrieves the entire view, materializing it. More interestingly, a query such as /Customer[@Region='WA']/Order retrieves only the Order elements for customers in Washington. The Customer elements are never created.

This magic is performed by composing the query with the view. Example 2 illustrates view composition using a pseudo-XQuery syntax to describe the query and view, and the result of composition.

Example 2: The query /Customer[@Region='WA']/Order composed with a view
declare function sql:view() {
    for $i in sql:table("Customers")
    return <Customer Region="{$i/@Region}">{
               for $j in sql:table("Orders")
               where $i/@CustomerID = $j/@CustomerID
               return <Order ID="{$i/@OrderID}"/>
          }</Customer>
};

sql:view()/Customer[@Region='WA']/Order

==>

for $i in sql:table("Customers")
where $i/@Region = "WA"
return
  for $j in sql:table("Orders")
  where $i/@CustomerID = $j/@CustomerID
  return <Order ID="{$i/@OrderID}"/>

Instead of instantiating the entire XML view and then querying over that with XPath, SQLXML instead takes the XPath and combines it with the XML view to create an efficient SQL query, and then converts only the results of this query into XML.

Tips for Users

If you're using SQLXML, there are a few things you should do to get the most out of it:

Install the Latest Web Release

Use the latest supported release available from MSDN. A lot of customers think that because these updates are provided free on the web, they're somehow less stable than a full point release or service pack. However, this is incorrect -- these releases have been through the full testing process and should be deployed ASAP. These releases contain important functional and security fixes that you don't want to be without, in addition to new features and in most cases performance improvements.

Of course, you'll want to test the upgrade before deployment, like you would with any software upgrade.

Use sql:key-fields

As you'd expect with any innovative new software, we learned some important lessons after the first release of SQLXML. One issue we discovered during development was that for some XML shapes -- which we thought to be rare -- enough information to uniquely identify each row is required to construct a correct SQL query. Without this information, the query will sort the "universal table" incorrectly, and FOR XML EXPLICIT could fail to produce correctly nested XML (or fail with a SQL error). You can specify this extra information with the sql:key-fields annotation in the Annotated Schema. Because we thought this was a rare case, we didn't require key information for every table, so this annotation is optional.

Unfortunately, it turns out to be a fairly common case after all and the error message (or strangely incorrect XML shape) that can result gives no clue that you would need sql:key-fields to fix it. The product team members themselves are often unaware of this problem when responding to customer questions in the newsgroup, often incorrectly suggesting the customer may have found a bug when all that's required is to add sql:key-fields to the schema. If I could go back in time and change one thing about SQLXML, it'd be this -- you should use sql:key-fields everywhere that you have sql:relation (or an element with the default mapping to a table of the same name).

Improve Perf with sql:datatype

A few weeks after the release of SQL Server 2000, I got this email from Turing Award Winner Jim Gray about perf issues he's having with SQLXML. It turned out that dbobject queries (which use a syntax similar to XPath but with different effects) were performing unnecessary CONVERTs in the generated SQL queries -- causing what should be index seeks to become index scans, and ruining perf by a factor of 10. Ouch! Needless to say, this was fixed in the first SQLXML web release.

The worst part was that we had fixed this in XPath before release, by using the sql:datatype annotation to optimize out unnecessary conversions. However, most users don't realize this annotation exists, or its effect on XPath performance (even though it's documented...). You should use the sql:datatype annotation and the XSD or XDR type attribute in your schemas for any columns that are indexed.

Conclusion

In many ways, SQLXML was ahead of its time. XML query/view composition is a deep problem that is not yet well-understood. In fact, XSLT 1.0 doesn't allow it, and XQuery 1.0 allows it but has no way to represent it in its formal semantics. Big companies like IBM and BEA continue to hammer away at this area, but we're probably several years away from the kind of stability and rich feature sets available in SQL query/view processors today.

We didn't have time to implement and test the entire XPath language before release. This is of course normal for software releases, but usually they're followed up with subsequent releases that increase coverage. Unfortunately, SQLXML seems to have stagnated and some useful features like // and string functions remain unsupported even in SQLXML 3.2 (the latest release at the time of this writing).

Despite its shortcomings, the XPath subset that SQLXML supports has been used for many interesting applications, like the European Environmental Protection Agency's website, Jim Gray's PhotoServer, and MSN Music Search, to name just a few.

Further Reading

Websites and Newsgroups

Books

Presentations