Despite its reputation, I’m still a massive fan of linked servers. Microsoft developed it for a reason and it’s ongoing presence on the platform since day one says it all. No one told you to pull back a gazillion rows on an inner join with a local table. So, when I saw it’s now possible to create a linked server to Azure’s CosmosDB, a cloud based NoSQL database, wow, this could be useful and had to try it out myself. It now means you can query NoSQL within SQL Server Management Studio through open queries like you do with normal SQL. It’s all possible through a simple ODBC Driver.
I first came across the ODBC Drivers for Cosmos DB a bit over a year ago (called DocumentDB back then) and at the time with little documentation and the platform still relatively new, only played around with it briefly ( connect Cosmos DB in Azure to various Visualisation tools like Tableau etc) and didn’t think much more about it.
I first learnt linked servers for Cosmos DB was possible about a week ago through a post on LinkedIn. The problem I had with that article was one of the more important steps was anything but clear so it didn’t work. In any event, I saw a better and easier work around after playing around with it and in case anyone came across the same issue, this approach below also works.
The first step is to download the drivers which is available here. If you’re new to Azure, an understanding of endpoints urls, access key, etc is absolutely critical. This page will show you where to obtain these information together with getting started.
Once you’ve downloaded and installed the drivers, go and launch the ODBC Datasource App which looks this. Click Add
Step above will launch this interface below. Select the DocumentDB ODBC Driver as seen below and click finish.
Fill this in (remember those keys, endpoints I mentioned-this is where that comes handy). For the first two columns you can just use anything meaningful. Then Test connection and if successful, click the Schema Editor
This gets launched. Click Create New.
The interface below gets launched. Contents will differ depending on how many database/collection you have in Cosmos-what you see just happens to be my databases for this resource group. For the purpose of linked server, selection below is not important so you can just pick any one of the checkbox and click the Sample button (it’s greyed out til you make a selection). If you intend to use Cosmos DB as a data source on other tools like Tableau through this DSN we’re creating, the choice does matter so pick the correct collection accordingly. For instance, if you pick the AW collection, you’ll be limited to that collection for this DSN.
This gets launched and the Connection String you see is exactly what we’re after for the work around with the link server. Select and copy the contents into your clipboard (or another file for use later).
The next two step/screen shot is only applicable if you plan on using the DSN for tools other than the linked server, such as Tableau. If you only want the linked server, skip straight to SSMS to create link server.
Click preview and if happy, go to file and save.
After saving the file, go back to DSN setup and click Advanced Options and add the schema file we saved from the step above.
Now in SSMS for the link server: just follow your normal steps to creating a linked server. Select New Linked Server as below.
Give your server a name and then paste the connection string from above earlier into the Provider String. That’s it, that’s all you need. If you look at the contents of what you’ve copied over, you’ll see somewhere within that string is an encrypted Authentication Key, and that is how you get authenticated. Click OK and you’re done.
Back in main screen, refresh Linked Server (right click). If you expand the nodes, you’ll see your new linked server to CosmosDB.
You can then start running open queries as you would with any linked server.
The best thing about this, you can use normal SQL to query JSON structure including those with deep nesting-everything just gets flattened for you. I pulled this JSON collection down from a REST API for Atlassian Jira and persisted it into Cosmos. Biggest warning as mentioned in the other article, while the structure is denormalized, you will still need to take care with formatting the collection. Your back end developer should be able to sort this out before storage.
The equivalent of the above JSON, flattened in SSMS. Note the column name “fields_assignee_avatarUrls_24x24”. This is concatenated from fields,assignee, avartarUrls and 24×24 from the JSON above.
If you work with Power BI, you’ll notice there’s a connection to Cosmos DB that is still in beta. While that works, if the idea of beta doesn’t sound reassuring or still buggy, you could also use the DSN connection we created too. It’ll be the same for Tableau users except there is no dedicated connection for Cosmos DB in Tableau so you’ll need to use the ODBC driver/ DSN we created earlier as seen below.
Being ODBC, you can also do the same thing in Visual Studio similar to what we did with Management Studio for linked server only it is even easier in Visual Studio as you don’t need open queries, you can just use normal SQL and it even has a query designer. The steps may not be immediately obvious but I’m sure you’ll be able to work it out if you use Visual Studio (it is dedicated to cloud/NoSQL development), just change the default drivers when adding a server within Server Explorer.
Still early days but I think analytics for NoSQL just became more accessible and a lot easier. Pretty cool hey?
Update 2019: A number of people have contacted me for clarification. To make things easier, I’ve uploaded a video of the process. Please https://samtran.me/2019/01/26/linked-servers-for-azures-nosql-cosmos-db-video-edition/?preview=true&_thumbnail_id=4893