Microsoft SQLXML

by Created: 09 Oct 1999 Updated: 22 Apr 2014

Introduction

In 1999, Adam Bosworth hired me into the WebData team at Microsoft to implement XSLT on top of SQL Server as part of the SQLXML project. By the time I started in October, however, the team had decided that was too ambitious, and wanted to start with just XPath. I implemented several features, starting with XPath over Annotated Schemas, and eventually became Technical Lead for the team.

The idea behind SQLXML was that most of the world’s data at the time was locked up in relational databases, and by enabling this data to be queried and retrieved as XML, we would enable applications to combine data from many sources. We were enabling “mashups” before that term had been invented. This idea later became my inspiration for the Yahoo Query Language.

SQLXML was used by millions of customers around the world, including the European Environmental Protection Agency’s website, Jim Gray’s PhotoServer, and MSN Music Search, to name just a few.1

Annotated XML Schemas

SQLXML required you to first write an XML Schema with annotations to describee how your desired XML structure maps to your database. These were W3C standard XSD schemas (and before that, XDR) with additional namespaced attributes to describe the mapping.

Example 1: An Annotated Schema describing a Customer element that maps to the Customers table containing an Order element (joined with the Orders table).
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
  <xs:element name="Customer" sql:relation="Customers">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Order" sql:relation="Orders" >
          <xs:complexType>
            <xs:attribute name="OrderID" type="xs:int" />
          </xs:complexType>
          <xs:annotation>
            <xs:appinfo>
              <sql:relationship parent="Customer" child="Order"
                                   parent-key="CustomerID" child-key="CustomerID"/>
            </xs:appinfo>
          </xs:annotation>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="Region" type="xs:string" />
    </xs:complexType>
  </xs:element>
</xs:schema>

You could materialize this entire document as XML by performing the root-level XPath /. However, it was more common to leave the document virtualized and extract only parts of it. For example, an XPath like /Customer[@Region='WA']/Order retrieves all the orders for all customers in WA state. Conceptually, this is similar to the SQL query shown in Example 2:

Example 2: A SQL query similar to the XPath/Customer[@Region='WA']
select * from Orders
where exists (
    select * from Customers
    where Customers.Region = 'WA'
      and Customers.CustomerID = Orders.CustomerID
)

Implementation

My initial C++ implementation consisted of a recursive-descent parser for XPath and then code that walked the ASTs for the XPath query and the Annotated Schema together. This worked ok, and shipped in the first few versions of SQLXML to customers.

However, I noticed that assigning a developer to every combination of query lanuage ✕ data store (in-memory XML, SQL Server databases) ✕ platform (C++,C#) was very inefficient, and a classic compiler problem. I invented the Common Query Runtime platform to solve this, which we shipped in Microsoft’s .NET Framework as System.Xml.Query2.

One of the key insights was that what we were doing was related to the nested relational algebra3 and query/view composition. The annotated schema and XPath above are conceptually similar to the XQuery shown in Example 3, with a hyptothetical sql:table() function that retrieves records from the database.

Example 3: An XQuery equivalent to the Annotated Schema and XPath above.
declare function sql:view() {
    for $i in sql:table("Customers")
    return <Customer Region="{$i/@Region}">{
               for $j in sql:table("Orders")
               where $i/@CustomerID = $j/@CustomerID
               return <Order ID="{$i/@OrderID}"/>
          }</Customer>
};
sql:view()/Customer[@Region='WA']/Order

Instead of executing one query to construct the entire document and then another query to extract only part of it, SQLXML combines (composes) the XPath with the XML View to create a more efficient query, and then converts only this query’s smaller result set into XML.

Example 4: An XQuery equivalent to the composed view and XPath.
for $i in sql:table("Customers")
where $i/@Region = "WA"
return
  for $j in sql:table("Orders")
  where $i/@CustomerID = $j/@CustomerID
  return <Order ID="{$i/@OrderID}"/>

Chris Suver, my mentor and at the time our team Architect, and I invented an XML query algebra to represent views and queries from many different XML query languages and enable performing this composition in a more rigorous way.

Additional Reading

Websites and Newsgroups

###Books

Presentations


Footnotes

  1. The links for these sites have broken over time. The EU EPA site has moved from this to this. I don’t know if they still use SQLXML. MSN Music Search is now Bing Search.

  2. System.Xml.Query was documented on MSDN but that link no longer exists. This article talks a little about it.

  3. The nested relational algebra is now used in many other systems, including the Dremel query language.