Columnstore Index: Speeding up analytics in SQL Server 2016
Please refer to the post.
Thanks,
Sunil Agarwal
Tags: ColumnStore Index
Built-in functions for compression/decompression in SQL Server 2016
SQL Server 2016 provides built-in functions for compression and decompression:
CREATE TABLE People (
_id int primary key identity,
name nvarchar(max),
surname nvarchar(max),
info varbinary(max)
)
INSERT INTO People (name, surname, info)
SELECT FirstName, LastName, COMPRESS(AdditionalInfo) FROM Person.Person
SELECT name, surname, DECOMPRESS(info) AS original
FROM People
ALTER TABLE People
ADD info_text as CAST( DECOMPRESS(info) AS NVARCHAR(MAX))
declare @text nvarchar(max) = (select top 100 * from sys.all_objects for json path)
select DATALENGTH(@text) AS original, DATALENGTH(COMPRESS(@text)) AS compressed, 1.0 * DATALENGTH(@text) / DATALENGTH(COMPRESS(@text)) AS ratio
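COMPRESS and DECOMPRESS both return varbinary(max), so the decompressed result is typically cast back to the original string type. A minimal round-trip check (illustrative value):
declare @original nvarchar(max) = N'{"example":"some value"}' -- illustrative value
select cast(DECOMPRESS(COMPRESS(@original)) as nvarchar(max)) as roundtrip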
Tags: Data Compression SQL Server 2016
AdventureWorksCTP3 JSON Sample
New database and samples for the AdventureWorks 2016 CTP3 database are published on AdventureWorks Sample Databases and Scripts for SQL Server 2016 CTP3. In the SQLServer2016CTP3Samples.zip file (attached to this post) you can find example scripts that show how to use various features added in SQL Server 2016.
One of the folders in this archive is the JSON folder, where you can find some JSON example scripts. In this post we will see a short description of the files in this folder.
When you download the AdventureWorks2016CTP3 database you will notice that there are a lot of new tables with suffixes _json, _temporal_inmemory, etc. These are new tables that are used in the example scripts.
Two new tables are added for JSON support: Person.Person_json and Sales.SalesOrder_json.
These tables are just copies of the existing Person.Person and Sales.SalesOrderHeader tables. Note that when you install the AdventureWorks2016CTP3 database you don't have any JSON data in these tables – you will need to run the de-normalization script described below to populate them with JSON data.
In SQLServer2016CTP3Samples.zip (attached) you will find a JSON folder with the following content:
These are scripts that show how to use various JSON features in this database. In the following sections we will see short explanations of these scripts.
The first script you need to run is the de-normalization script. Even if you don't need de-normalization you will need this script, because it is a prerequisite for all the other scripts.
This script generates JSON data and stores JSON text in the new Person.Person_json and Sales.SalesOrder_json tables. In this script you will see SQL statements that perform the following actions:
This script shows how you can transform a complex relational structure into de-normalized JSON collections.
Also, this script is important because it initializes the new JSON tables with the content that is required for all the procedures, views, and queries in the sample.
Run this script even if you don't need de-normalization, because without it you will not have the JSON columns and JSON content in the Person.Person_json and Sales.SalesOrder_json tables.
These two files show how you can create stored procedures and views that query and process JSON text in the Person.Person_json and Sales.SalesOrder_json tables.
The first JSON view, Sales.vwSalesOrderItems_json, shows how you can create a relational "table view" over the array of order items stored in the OrderItems column of the Sales.SalesOrder_json table.
The second JSON view, Sales.vwSalesOrderInfo_json, returns information about the shipping address and method, salesman, and customer from the Info JSON column. This view shows how you can encapsulate values in JSON text and expose them as regular columns. If you don't want to use JSON data structures, there is also the Sales.vwSalesOrderInfoRel_json view, which shows how this query would look in the original relational schema.
The JSON procedures script shows different use cases for JSON processing, such as:
In this file you can see how to create B-tree or full-text search indexes on JSON columns. This file also contains some stored procedures with queries that use the new indexes.
Use the Show Actual Execution Plan option in SQL Server Management Studio when you run these stored procedures to verify that the queries use the indexes during execution.
In this file you will find procedures that export table data from the Person.Person_json and Sales.SalesOrder_json tables and import JSON text into these tables. This script demonstrates how easily you can transform relational data to JSON and vice versa.
In this file you will find various queries that use the views and stored procedures created in the previous scripts.
In this file you will find a script that deletes all the new columns, JSON data, stored procedures, indexes, and views created in the previous steps.
Tags: SQL Server 2016 JSON
Using non-relational models in SQL Server
In this post I will explain how you can use both relational and non-relational data models in your database schema.
A relational database schema might contain a lot of tables that require many JOINs or separate queries to retrieve all the necessary data. Also, in order to insert new records, we need to update many tables, and in many cases we need to follow the parent/child order defined by referential integrity: read the primary key of the parent row before we insert the child row, etc.
NoSQL databases mitigate some of these problems by putting all related information into a single entity. This approach reduces the complexity of the model: we don't need to join different entities, because all the information is contained in a single entry. Also, inserts are simple – a new record with all related information is simply inserted into one entity.
Using the same approach in a relational database might also simplify the schema and improve the performance of some workloads.
SQL Server enables you to combine relational and non-relational (JSON, XML) data in order to define the optimal data model for your workloads. If you have a structure where the primary object (row) has a single related entity, or collections of related entities that you don't want to represent as relational schema, you can store the related information as JSON documents. NoSQL databases use the same approach.
In this scenario we will use SalesOrderHeader table from AdventureWorks database as an example. SalesOrderHeader table has a number of related tables that contain additional information about the sales order such as order items, customer data, salesman info, shipping and billing information, etc. Design of these tables in AdventureWorks database is shown in the following figure:
Although this design follows the standard principles of relational modeling, it might not be optimal for this use case.
This schema is optimized for workloads that have frequent updates in the child tables. However, once we create a sales order with all its related items, it is unlikely that someone will constantly update the quantities and unit prices of the items, sales reasons, shipping methods, customer/salesman info, etc. Information related to a sales order is mostly read-only, so we are not getting much benefit from normalization. The only case when someone will update this information is error correction.
To demonstrate how this structure can be represented in a non-relational schema, the AdventureWorks 2016 CTP3 database adds a non-relational version of the SalesOrderHeader table called SalesOrder_json, a copy of the standard SalesOrderHeader table with the following changes:
Now we can store information in JSON columns:
ID | .... | Info | SalesReasons | OrderItems |
43697 | {"ShippingInfo":{"Address":"601 Asilomar Dr.","City":"Metchosin","PostalCode":"V9","Province":"British Columbia","TerritoryID":6,"Method":"XRQ - TRUCK GROUND","ShipBase":3.9500,"ShipRate":0.9900},"BillingInfo":{"Address":"601 Asilomar Dr.","City":"Metchosin","PostalCode":"V9"}} | ["Manufacturer","Quality"] | [{"Item":{"Qty":1,"Price":3578.2700,"Discount":0.0000,"Total":3578.270000},"Product":{"Number":"BK-R93R-62","Name":"Road-150 Red, 62"}}] |
If you need to get information about a sales order and its related items, you can do it with a single query. If your client expects JSON as the response (e.g. an Ajax client in a web browser), this might be a perfect solution for you because all information will be prepared in the format you need. Also, inserts and data load might be faster if you prepare the data in JSON format on the client side, because you will not need to update multiple tables in the same transaction. You just need to update one row in the sales order table.
Note that you are not losing query capabilities. SQL Server provides the functions JSON_VALUE, JSON_QUERY, and OPENJSON, which can access any value in the JSON columns.
If you want to access values in JSON using standard column syntax, you can add a non-persisted computed column that returns the value at some path, e.g.:
ALTER TABLE Sales.SalesOrder_json
ADD vCustomerName AS JSON_VALUE(Info, '$.Customer.Name')
Now you can reference the JSON value like any other real column, via the vCustomerName column.
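For example, a lookup by customer name (an illustrative value) can now use plain column syntax, and an index created on vCustomerName could support it:
SELECT SalesOrderID, OrderDate
FROM Sales.SalesOrder_json
WHERE vCustomerName = N'Aaron Campbell'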
If you need a relational view over the array of JSON objects stored in the OrderItems column, you can create a view that looks exactly like the relational SalesOrderDetails table:
CREATE VIEW Sales.vwSalesOrderItems_json
AS
SELECT SalesOrderID, SalesOrderNumber, OrderDate,
CarrierTrackingNumber, OrderQty, UnitPrice,
UnitPriceDiscount, LineTotal, ProductNumber, Name
FROM Sales.SalesOrder_json
CROSS APPLY
OPENJSON (OrderItems)
WITH ( CarrierTrackingNumber NVARCHAR(20),
OrderQty int '$.Item.Qty',
UnitPrice float '$.Item.Price',
UnitPriceDiscount float '$.Item.Discount',
LineTotal float '$.Item.Total',
ProductNumber NVARCHAR(20) '$.Product.Number',
Name NVARCHAR(50) '$.Product.Name'
)
Queries executed on this view will return the same results as queries on the real SalesOrderDetails table.
SQL Server enables you to make trade-offs between relational and non-relational modeling in order to find the optimal data model for your workload. In SQL Server you can create your database schema based on the characteristics of your workload, using some of the following guidelines:
Tags: SQL Server 2016 JSON
De-normalize database schema using JSON
In this post I will talk about de-normalization - transforming complex relational structures into simpler tables using JSON format.
In a traditional relational schema there can be a lot of one-to-many relationships (e.g. a person may have several phones, or several email addresses). Even for a simple string array in the C#/Java classes of the application domain model, we must create separate tables in the relational model. As an example, an array of email addresses that belongs to a person class must be stored in an EmailAddress table with a foreign key to the parent Person row, because an array of strings cannot be mapped to any relational data type.
In order to read Person data with the associated emails and phone numbers, we would need to run several queries (e.g. one to return the person info, another to return the person's phones, a third to return the email addresses), or join the tables and process one flat result set on the client side. This method requires a lot of JOINs and indexes on foreign keys. A similar problem exists with inserts. In order to insert a person with his emails and phone numbers, we need to insert the person row, take the identity value generated for the primary key, and then use this value to insert records into the email address/phone number tables. For each "logical" insert we need several actual table inserts and reads.
NoSQL databases use a different approach. In a NoSQL document database you can store the primary entity as JSON together with all related information formatted as JSON arrays. This way you can read a person with the associated emails and phone numbers in a single read operation, and insert a new person with all associated information in a single write. There is some cost for serializing/deserializing child entities as JSON in the application layer; however, the benefit is bigger, because we don't have multiple read/write operations for every logical action.
How can we use a similar approach in SQL Server?
In order to avoid a complex schema, information from related child entities can be formatted as JSON arrays and stored in the parent table. This way we have a simpler table schema that matches the design in the application logic, and we need a single table seek/scan to return the necessary information.
The new FOR JSON clause enables you to de-normalize a complex relational schema and move information from the child tables into the parent row, formatted as JSON.
This example uses the Person table structure from the AdventureWorks2016CTP3 database. In the AdventureWorks2016CTP3 database we have a lot of one-to-many relationships between tables, e.g. one person can have many email addresses, many phones, etc. If you want to store this information in a relational database, you would need to use something like the structure in the following figure:
Even for simple structures such as phone numbers or emails, which would be represented as arrays in C# or Java, we need separate tables with foreign key relationships, indexes on foreign keys, etc.
In another post I have shown how you can return person info with all related information in a single query. Here I will show a different approach – pre-aggregating the related tables as JSON arrays in the parent table. This way we don't need to format the related records at query time – we can just read the entire collection from the JSON column.
The following examples use the Person.Person_json table available in the AdventureWorks2016CTP3 database, a de-normalized version of the standard Person.Person table.
In this example we create a new text column in the Person_json table that will contain an array of phone numbers and their types:
ALTER TABLE Person.Person_json
ADD PhoneNumbers NVARCHAR(MAX)
CONSTRAINT [Phone numbers must be formatted as JSON array]
CHECK (ISJSON(PhoneNumbers)>0)
Information from PersonPhone and PhoneNumberType tables can be stored in this column using the following query:
UPDATE Person.Person_json
SET PhoneNumbers = (SELECT Person.PersonPhone.PhoneNumber, Person.PhoneNumberType.Name AS PhoneNumberType
FROM Person.PersonPhone
INNER JOIN Person.PhoneNumberType ON Person.PersonPhone.PhoneNumberTypeID = Person.PhoneNumberType.PhoneNumberTypeID
WHERE Person.Person_json.PersonID = Person.PersonPhone.BusinessEntityID
FOR JSON PATH)
Phone number information is now stored in the Person_json table, one JSON array per person row. Now we can drop the PersonPhone and PhoneNumberType tables, since we don't need them anymore.
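A sketch of that cleanup, assuming no other objects still reference these tables:
DROP TABLE IF EXISTS Person.PersonPhone
DROP TABLE IF EXISTS Person.PhoneNumberType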
In the second example we create a new text column in the Person_json table that will contain an array of email addresses:
ALTER TABLE Person.Person_json
ADD EmailAddresses NVARCHAR(MAX)
CONSTRAINT [Email addresses must be formatted as JSON array]
CHECK (ISJSON(EmailAddresses)>0)
Now we can update this column and put in all the email addresses that belong to each person, formatted as JSON:
UPDATE Person.Person_json
SET EmailAddresses =
dbo.ufnToRawJsonArray(
(SELECT Person.EmailAddress.EmailAddress
FROM Person.EmailAddress
WHERE Person.Person_json.PersonID = Person.EmailAddress.BusinessEntityID
FOR JSON PATH)
, 'EmailAddress')
As a result, the EmailAddresses column will store the emails formatted as JSON arrays:
["ken0@adventure-works.com","ken.adams@adventure-works.com","ken@aw.microsoft.com"]
This example uses the following function, which converts an array of key:value pairs into an array of plain values:
-- Utility function that removes keys from JSON
-- Used when we need to remove keys from FOR JSON output,
-- e.g. to generate [1,2,"cell"] format instead of [{"val":1},{"val":2},{"val":"cell"}]
CREATE FUNCTION
dbo.ufnToRawJsonArray(@json nvarchar(max), @key nvarchar(400)) returns nvarchar(max)
as begin
return replace(replace(@json, FORMATMESSAGE('{"%s":', @key),''), '}','')
end
This function simply removes keys from an array so we can get simple arrays of strings.
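For example, applied to a small FOR JSON PATH output (illustrative values), the function strips the repeated key:
select dbo.ufnToRawJsonArray(
N'[{"EmailAddress":"ken0@adventure-works.com"},{"EmailAddress":"ken1@adventure-works.com"}]',
'EmailAddress') as result
-- returns: ["ken0@adventure-works.com","ken1@adventure-works.com"]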
This example showed a de-normalization process that reduced a schema of four tables to a single table with two additional JSON columns. De-normalization is a common method for simplifying the data schema and improving read performance, and it is one of the reasons why NoSQL databases became popular.
However, de-normalization is not a silver bullet that will solve all schema complexity and performance problems in your systems. With de-normalization you get better read performance, but you increase the table space and your updates will be slower.
Use this approach only if the information you want to de-normalize does not change frequently (in the example above the assumption is that a person will not frequently change his phone numbers and email addresses) and when the bottleneck in the system is JOINs and read performance.
Do not de-normalize information that will be changed frequently – separate tables are still the best structures for those kinds of data access patterns.
Tags: SQL Server 2016 JSON
Query JSON data
New JSON support in SQL Server enables you to create SQL queries that use both relational and semi-structured data formatted as JSON text. In this post we will see some query examples.
In my database I have information stored both as scalars in standard relational columns and as properties in JSON columns. I need to create queries that combine information from both the JSON text and the standard columns.
The new functions for JSON text processing in SQL Server enable us to query both relational and non-structured data formatted as JSON.
In this post I will use the Sales.SalesOrder_json table published in the AdventureWorks2016CTP3 database. Sales.SalesOrder_json is a copy of the Sales.SalesOrderHeader table from the AdventureWorks2016 database, with additional JSON columns:
In the first example we will create a stored procedure that reads both columns and JSON values from the sales order table:
CREATE PROCEDURE
Sales.SalesOrderInfo_json(@SalesOrderID int)
AS BEGIN
SELECT SalesOrderNumber, OrderDate, Status, ShipDate, AccountNumber, TotalDue,
JSON_QUERY(Info, '$.ShippingInfo') ShippingInfo,
JSON_QUERY(Info, '$.BillingInfo') BillingInfo,
JSON_VALUE(Info, '$.SalesPerson.Name') SalesPerson,
JSON_VALUE(Info, '$.ShippingInfo.City') City,
JSON_VALUE(Info, '$.Customer.Name') Customer,
JSON_QUERY(OrderItems, '$') OrderItems
FROM Sales.SalesOrder_json
WHERE SalesOrderID = @SalesOrderID
END
GO
In this stored procedure you can read standard relational columns from the table, and extract objects (e.g. shipping/billing information) and values (sales person, city from the shipping address, and customer name) from the Info JSON column. If you want to use the relational model instead of JSON functions, you would need to create a complex query with several table joins:
CREATE PROCEDURE
Sales.SalesOrderInfoRel_json(@SalesOrderID int)
as begin
SELECT SalesOrderNumber, OrderDate, ShipDate, Status, Sales.SalesOrder_json.AccountNumber, TotalDue,
shipprovince.Name as [Shipping Province],
shipmethod.Name as [Shipping Method],
shipmethod.ShipRate as ShipRate,
billaddr.AddressLine1 + COALESCE ( ', ' + shipaddr.AddressLine2, '') as [Billing Address],
sp.FirstName + ' ' + sp.LastName as [Sales Person],
cust.FirstName + ' ' + cust.LastName as Customer
FROM Sales.SalesOrderHeader
JOIN Person.Address shipaddr
ON Sales.SalesOrderHeader.ShipToAddressID = shipaddr.AddressID
LEFT JOIN Person.StateProvince shipprovince
ON shipaddr.StateProvinceID = shipprovince.StateProvinceID
JOIN Purchasing.ShipMethod shipmethod
ON Sales.SalesOrderHeader.ShipMethodID = shipmethod.ShipMethodID
JOIN Person.Address billaddr
ON Sales.SalesOrderHeader.BillToAddressID = billaddr.AddressID
LEFT JOIN Sales.SalesPerson
ON Sales.SalesPerson.BusinessEntityID = Sales.SalesOrderHeader.SalesPersonID
LEFT JOIN Person.Person AS sp
ON Sales.SalesPerson.BusinessEntityID = sp.BusinessEntityID
LEFT JOIN Sales.Customer
ON Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
LEFT JOIN Person.Person AS cust
ON Sales.Customer.CustomerID = cust.BusinessEntityID
WHERE Sales.SalesOrderHeader.SalesOrderID = @SalesOrderID
end
GO
As you can see, some basic information required in the report requires several table joins. Besides the simplicity of the query, you might also get some performance improvement. The first stored procedure uses just one seek operation to read a single row; the relational query must seek into several tables and join the results.
You can use both standard relational columns and properties from JSON fields in any part of a query. The following stored procedure filters and aggregates results using values from the JSON text:
CREATE PROCEDURE
Sales.SalesOrdersPerCustomerAndStatusReport_json(@city nvarchar(50), @territoryid int)
AS BEGIN
SELECT JSON_VALUE(Info, '$.Customer.Name') AS Customer, Status, SUM(SubTotal) AS Total
FROM Sales.SalesOrder_json
WHERE TerritoryID = @territoryid
AND JSON_VALUE(Info, '$.ShippingInfo.City') = @city
AND OrderDate > '1/1/2015'
GROUP BY JSON_VALUE(Info, '$.Customer.Name'), Status
HAVING SUM(SubTotal) > 1000
END
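The report can then be run like any other stored procedure, for example (illustrative parameter values):
EXEC Sales.SalesOrdersPerCustomerAndStatusReport_json @city = N'Seattle', @territoryid = 1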
Use Case 3: Aggregate results using collections of JSON objects
In the previous examples we used just single values from JSON text. The OPENJSON function enables us to transform JSON arrays into tables and apply any aggregation function. In the following stored procedure we will take the sales reasons, stored as arrays of JSON string values in the SalesReasons column, group the sales orders by the values in this array, and return the count of sales orders for each sales reason:
CREATE PROCEDURE
Sales.SalesOrdersBySalesReasonReport_json(@city nvarchar(50))
AS BEGIN
SELECT value, COUNT(SalesOrderNumber) AS NumberOfOrders
FROM Sales.SalesOrder_json
CROSS APPLY OPENJSON (SalesReasons)
WHERE JSON_VALUE(Info, '$.ShippingInfo.City') = @city
GROUP BY value
END
GO
Functions for JSON text processing in SQL Server enable you to combine JSON data with standard columns in any SQL query. With the JSON text functions you can use the full SQL language for querying and analyzing JSON data. Also, you can combine standard relational data with JSON values in any query.
http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/11/28/query-json-data.aspx
Tags: SQL Server 2016 JSON
Log analysis with SQL Server
In any application you will need to have some kind of logs where you write events, errors, debug and other information. One of the biggest problems is how to analyze information from log files. If you have some predefined format you can use some log analysis tools or languages such as Perl Scripts, Hadoop MapReduce (if you have big log files), Graylog2 (open source), LogStash (open source), Loggly (SaaS), etc.
The problem in this case might be that you have limited query/analysis capabilities because you will need to code your own logic for filtering and aggregating log information.
If you format information in log files as JSON, you get a good trade-off between the flexibility of the information you can store and the ability to process it with some kind of query language. A common choice is a NoSQL database such as MongoDB or Azure DocumentDB, where you can store JSON messages and search them using a platform-specific query language.
In this article we will see how you can analyze log files containing JSON messages.
We have log files containing log messages formatted as JSON. How can we analyze these log files?
Load the text files into SQL Server and analyze them with the OPENJSON function.
SQL Server enables you to load the content of a file formatted as JSON and transform it into a relational format that can be analyzed with standard SQL. We will start with an example of a JSON log file, shown in the following listing:
[
{"time":"2015-11-27T02:33:05.063","ip":"214.0.57.12","request":"/", "method":"GET", "status":"200"},
{"time":"2015-11-27T02:33:06.003","ip":"224.12.07.25","request":"/", "method":"GET", "status":"200"},
{"time":"2015-11-27T02:33:06.032","ip":"192.10.81.115","request":"/contact", "method":"POST", "status":"500", "exception":"Object reference not set to an instance of object","stackTrace":"…" },
...
{"time":"2015-11-27T02:37:06.203","ip":"204.12.27.21","request":"/login", "method":"GET", "status":"200"},
{"time":"2015-11-27T02:37:12.016","ip":"214.0.57.12","request":"/login", "method":"POST","status":"404", "exception":"Potentially dangerous value was detected in request"}
]
Here we have some standard information such as the time, IP address, requested URL, HTTP method, etc. If an error occurred, we have additional data such as the exception message, stack trace, etc.
In SQL Server, we can easily read this log file and query results:
SELECT log.*
FROM OPENROWSET (BULK 'C:\logs\json-log-2015-11-27.txt', SINGLE_CLOB) as log_file
CROSS APPLY OPENJSON(BulkColumn)
WITH( time datetime, status varchar(5), method varchar(10), exception nvarchar(200)) AS log
For better manageability and performance, it might be good to split your log files by date or size (using something like the rolling file appender in Log4j). Now we can run any type of analysis on the returned data set. The following report returns the list of pages and the number of server errors detected on them, for each HTTP method:
SELECT request, method, COUNT(*)
FROM OPENROWSET (BULK N'C:\logs\log-json-2015-11-27.txt', SINGLE_CLOB) as log_file
CROSS APPLY OPENJSON(BulkColumn)
WITH( time datetime, status int, method varchar(10), request varchar(20), exception nvarchar(200)) AS log
WHERE status >= 500
GROUP BY request, method
You can notice that we have the full power of T-SQL over the JSON log files. You can also load the JSON data into a standard table and create reports on that table, as shown below.
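A one-time load into a regular table might look like the following sketch (dbo.WebSiteLog is a hypothetical table created by SELECT ... INTO):
SELECT log.time, log.status, log.method, log.request, log.exception
INTO dbo.WebSiteLog -- hypothetical target table
FROM OPENROWSET (BULK 'C:\logs\json-log-2015-11-27.txt', SINGLE_CLOB) as log_file
CROSS APPLY OPENJSON(BulkColumn)
WITH( time datetime, status int, method varchar(10), request varchar(20), exception nvarchar(200)) AS log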
One of the problems with JSON is the fact that you cannot continuously append JSON messages. If you want a valid array of JSON objects, you need to surround them with brackets, and once you add the final bracket, you cannot add new data.
LD-JSON (line-delimited JSON) is an alternative JSON format that might be a good choice for logging. LD-JSON addresses one of the main issues of the standard JSON format – the ability to continuously append valid JSON objects. LD-JSON introduces a few changes to the standard JSON format:
An example of LD-JSON content is shown in the following example:
{"time":"2015-11-27T02:33:05.063","ip":"214.0.57.12","request":"/", "method":"GET", "status":"200"}
{"time":"2015-11-27T02:33:06.003","ip":"224.12.07.25","request":"/",method":"GET", "status":"200"}
{"time":"2015-11-27T02:33:06.032","ip":"192.10.81.115","request":"/contact", "method":"POST", "status":"500", "exception":"Object reference not set to an instance of object",”stackTrace”:”…” }
……..
{"time":"2015-11-27T02:37:06.203","ip":"204.12.27.21","request":"/login",method":"GET", "status":"200"}
{"time":"2015-11-27T02:37:12.016","ip":"214.0.57.12","request":"/login", "method":"POST","status":"404", "exception":"Potentially dangerous value was detected in request"}
Now we can read this file with FORMATFILE and run the same report:
SELECT request, method, COUNT(*)
FROM OPENROWSET(BULK 'C:\logs\log-ld-json-2015-11-27.txt',
FORMATFILE= 'c:\logs\csv.xml') AS log_file
CROSS APPLY OPENJSON(json)
WITH( time datetime, status int, method varchar(10), request nvarchar(200)) AS log
WHERE status >= 500
GROUP BY request, method
You would need a format file that exposes each line of the file as a single text column named json.
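The sample's csv.xml isn't reproduced in this post, but a minimal BCP XML format file along those lines might look like this sketch:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <!-- each line of the LD-JSON file is one record, terminated by a newline -->
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="8000"/>
  </RECORD>
  <ROW>
    <!-- expose the line as a single column named "json" -->
    <COLUMN SOURCE="1" NAME="json" xsi:type="SQLNVARCHAR"/>
  </ROW>
</BCPFORMAT>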
Note that the json column referenced in the T-SQL query above is defined in the format file.
You can use the same approach to read JSON files stored on Azure File Storage. As an example, Azure File Storage supports SMB protocol, so you can map your local virtual drive to the Azure File storage share using the following procedure:
net use [drive letter] \\[storage name].file.core.windows.net\[share name] /u:[storage account name] [storage account access key]
Example that I have used is:
net use t: \\mystorage.file.core.windows.net\sharejson /u:myaccont hb5qy6eXLqIdBj0LvGMHdrTiygkjhHDvWjUZg3Gu7bubKLg==
The storage account access key (primary or secondary) can be found in the Keys section of Settings in the Azure portal.
Now, if you set up your application to log data to a log file on Azure File Storage (e.g. log-file.json), you can use the queries above to analyze data loaded from the path mapped to t:, i.e. \\mystorage.file.core.windows.net\sharejson\log-file.json
Logging to traditional text files is the fastest way to log information. Formatting log messages as JSON or LD-JSON gives you a simple, human-readable log format with the ability to query and analyze the log data.
New SQL Server with JSON support enables you to easily load log files and analyze them with standard T-SQL.
http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/11/27/log-analysis-with-sql-server.aspx
Tags: SQL Server 2016 JSON
Storing JSON in SQL Server
SQL Server offers built-in functions for processing JSON text data. In this post, we will see how you can store JSON in a SQL Server database.
JSON is a textual format, so in SQL Server it is stored in NVARCHAR columns. The simplest table that represents a collection of JSON objects is shown in the following listing:
DROP TABLE IF EXISTS Person
CREATE TABLE Person
( _id int identity constraint PK_JSON_ID primary key,
value nvarchar(max)
CONSTRAINT [Content should be formatted as JSON]
CHECK ( ISJSON(value)>0 )
)
This simple structure is similar to a standard NoSQL collection that you can create in NoSQL databases (e.g. Azure DocumentDB or MongoDB), where you just have a key that represents the ID and a value that represents the JSON.
Note that NVARCHAR is not just plain text. SQL Server has a built-in text compression mechanism that can transparently compress data stored on disk. Compression depends on the language and can go up to 50%, depending on your data (see UNICODE compression).
The key difference between SQL Server and plain NoSQL databases is that SQL Server enables you to use a hybrid data model, where you can store several JSON objects in the same “collection” and combine them with regular relational columns.
As an example, imagine that we know that every person in the collection will have a FirstName and LastName, that general information about the person can be stored as one JSON object, and that phone numbers/email addresses can be stored as separate objects. In SQL Server 2016 we can easily create this structure without any additional syntax:
DROP TABLE IF EXISTS Person
CREATE TABLE Person (
PersonID int IDENTITY PRIMARY KEY,
FirstName nvarchar(100) NOT NULL,
LastName nvarchar(100) NOT NULL,
AdditionalInfo nvarchar(max) NULL,
PhoneNumbers nvarchar(max) NULL,
EmailAddresses nvarchar(max) NULL
CONSTRAINT [Email addresses must be formatted as JSON array]
CHECK ( ISJSON(EmailAddresses)>0 )
)
Instead of a single JSON object, you can organize your data in this “collection”. If you do not want to explicitly check the structure of each JSON column, you don't need to add a JSON check constraint on every column (in this example I have added a CHECK constraint only on the EmailAddresses column).
If you compare this structure to the standard NoSQL collection, you might notice that you have faster access to strongly typed data (FirstName and LastName). Therefore, this solution is a good choice for hybrid models where you can identify some information that is repeated across all objects, while other, variable information can be stored as JSON. This way, you can combine flexibility and performance, as the following query shows.
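For illustration, a single query can combine the strongly typed columns with values from the JSON columns ($.Title is a hypothetical property in AdditionalInfo):
SELECT FirstName, LastName,
JSON_VALUE(AdditionalInfo, '$.Title') AS Title, -- hypothetical JSON property
JSON_QUERY(PhoneNumbers) AS Phones              -- returns the whole JSON array
FROM Person
WHERE LastName = N'Adams'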
If you compare this structure with the schema of the Person table in the AdventureWorks database, you might notice that we have removed many related tables.
Besides the simpler schema, your data access operations will be simpler compared to the complex relational structure. Now you can read a single table instead of joining several tables. When you need to insert a new person with related information (email addresses, phone numbers), you can insert a single record into one table, instead of inserting one record into the AdventureWorks Person table and reading the identity value to find the foreign key that will be used to store the phones, email addresses, etc. In addition, in this model you can easily delete a single person row without cascade deletes through foreign key relationships.
NoSQL databases are optimized for simple, read, insert, and delete operations – SQL Server 2016 enables you to apply the same logic in relational database.
In the previous examples, we have seen how to add a simple constraint that validates that the text stored in the column is properly formatted. Although JSON does not have a strong schema, you can also add complex constraints by combining functions that read values from JSON with standard T-SQL functions:
ALTER TABLE Person
ADD CONSTRAINT [Age should be number]
CHECK ( ISNUMERIC(JSON_VALUE(value, '$.age'))>0 )
ALTER TABLE Person
ADD CONSTRAINT [Person should have skills]
CHECK ( JSON_QUERY(value, '$.skills') IS NOT NULL)
The first constraint will take the value of the $.age property and check whether it is a numeric value. The second constraint will try to find a JSON object in the $.skills property and verify that it exists. The following INSERT statements will fail due to the violation of these constraints:
INSERT INTO Person(value)
VALUES ('{"age": "not a number", "skills":[]}')
INSERT INTO Person(value)
VALUES ('{"age": 35}')
Note that CHECK constraints might slow down your insert/update processes, so you might avoid them if you need faster write performance.
http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/11/23/storing-json-in-sql-server.aspx
Tags: SQL Server 2016 JSON
In-Memory OLTP - what's new in SQL2016 CTP3
We have been hard at work to add features to In-Memory OLTP to make it easier to build apps and to adopt In-Memory OLTP in existing apps, as well as to increase overall performance and scaling. Below is the list of In-Memory OLTP enhancements that we have released in SQL2016 thus far, up to and including CTP3. There is more to come!
Let us know what you think!
Transact-SQL Improvements:
Disjunction (OR, NOT)
UNION and UNION ALL
SELECT DISTINCT
OUTER JOIN
Subqueries in SELECT statements (EXISTS, IN, scalar subqueries)
Nested execution (EXECUTE) of natively compiled modules
Natively compiled inline table-valued functions (TVFs)
EXECUTE AS CALLER support - the EXECUTE AS clause is no longer required for native modules
Built-in security functions and increased support for built-in math functions
FOREIGN KEY constraints between memory-optimized tables
CHECK constraints
UNIQUE constraints
Triggers (AFTER) for INSERT/UPDATE/DELETE operations. Triggers on memory-optimized tables are natively compiled, and thus use WITH NATIVE_COMPILATION.
The new ALTER TABLE ... ADD/DROP/ALTER INDEX syntax is introduced for index changes on memory-optimized tables
(var)char columns can use any code page supported by SQL Server
Character columns in index keys can use any SQL Server collation
Expressions in natively compiled modules as well as constraints on memory-optimized tables can use any SQL Server collation
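As a quick illustration of several of the Transact-SQL items above, here is a sketch (assuming a database that already has a MEMORY_OPTIMIZED_DATA filegroup; table and index names are illustrative):
CREATE TABLE dbo.Category (
    CategoryID int NOT NULL PRIMARY KEY NONCLUSTERED,
    Name nvarchar(50) NOT NULL,
    CONSTRAINT UQ_Category_Name UNIQUE NONCLUSTERED (Name)  -- UNIQUE constraint support
) WITH (MEMORY_OPTIMIZED = ON)

CREATE TABLE dbo.Item (
    ItemID int NOT NULL PRIMARY KEY NONCLUSTERED,
    CategoryID int NOT NULL
        FOREIGN KEY REFERENCES dbo.Category (CategoryID),   -- FK between memory-optimized tables
    Price money NOT NULL CHECK (Price >= 0)                  -- CHECK constraint support
) WITH (MEMORY_OPTIMIZED = ON)

-- the new ALTER TABLE ... ADD INDEX syntax
ALTER TABLE dbo.Item ADD INDEX ix_Item_Category NONCLUSTERED (CategoryID)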
Performance and Scaling improvements:
Enhancements in SQL Server Management Studio:
Cross-feature support:
For more information, see In-Memory OLTP (In-Memory Optimization).
Tags: SQL Server 2016 performance in-memory oltp
FOR JSON performance - simple experiment
SQL Server 2016 CTP2 introduced the FOR JSON clause, which formats the results of a SQL query as JSON text. A very common question is: what is the performance impact of the FOR JSON clause, and how does it compare to JSON generated in the application layer?
In this post, I will compare the performance of FOR JSON with the traditional approach of generating JSON in the application layer.
This is not a general performance comparison. I will focus on one simple and common use case – returning one row from a table together with a set of related rows from a child table. In this experiment I will use the SalesOrderHeader -> SalesOrderDetails structure from the AdventureWorks database, where we have a one-to-many relationship between SalesOrderHeader and SalesOrderDetails.
I have an application that reads SalesOrderHeader and SalesOrderDetails rows and returns them as JSON, as shown in the following example:
{"SalesOrderID":55859,"RevisionNumber":8,"OrderDate":"2013-09-07T00:00:00",
"DueDate":"2013-09-19T00:00:00","ShipDate":"2013-09-14T00:00:00","Status":5,
"Sales.SalesOrderDetail":[
{"SalesOrderID":55859,OrderQty":1,"ProductID":779,"SpecialOfferID":1,"UnitPrice":2319.9900},
{"SalesOrderID":55859,"SalesOrderDetailID":57519,"OrderQty":1,"ProductID":871"},
{"SalesOrderID":55859,"SalesOrderDetailID":57520,"UnitPrice":4.9900,"UnitPriceDiscount":0.0000},
{"SalesOrderID":55859,"SalesOrderDetailID":57521,"OrderQty":1,UnitPrice":34.9900}
]
}
In the following examples, I will explain how you can generate this JSON.
If you don't want to format JSON in the database, you need to read one record from the parent table and then read all related records from the child table:
declare @id int = (ROUND(rand() * 31464, 0) + 43660)
SELECT *
FROM Sales.SalesOrderHeader
WHERE SalesOrderID = @id;
SELECT *
FROM Sales.SalesOrderDetail
WHERE SalesOrderID = @id
Then you can use some formatter to generate the JSON text in the application layer (e.g. Json.NET). The problem with this method is the fact that you need two queries to get the parent row and the related child rows.
The second option is to join these two tables in the database layer and return a flat result set, e.g.:
declare @id int = (ROUND(rand() * 31464, 0) + 43660)
SELECT *
From Sales.SalesOrderHeader
LEFT JOIN Sales.SalesOrderDetail
ON Sales.SalesOrderHeader.SalesOrderID
= Sales.SalesOrderDetail.SalesOrderID
WHERE Sales.SalesOrderHeader.SalesOrderID = @id
Here you will get repeated header values for every detail row, so you would need to remove the duplicated information in the application layer when you generate the JSON.
The third option is to return all information from the SalesOrderHeader table and format the related child rows as JSON in a column expression:
declare @id int = (ROUND(rand() * 31464, 0) + 43660)
SELECT *,
(SELECT *
From Sales.SalesOrderDetail
WHERE SalesOrderID = @id
FOR JSON PATH) AS Details
From Sales.SalesOrderHeader
WHERE SalesOrderID = @id
Here, you are getting a single row as a result, so you don't need to remove duplicates, but you would need to format the header information as JSON key:value pairs and include the related child information that is already formatted as JSON.
The last option is to join the tables in SQL Server and format them using FOR JSON AUTO. This is similar to option 2, but we are delegating the formatting to the database layer:
declare @id int = (ROUND(rand() * 31464, 0) + 43660)
SELECT *
From Sales.SalesOrderHeader
LEFT JOIN Sales.SalesOrderDetail
ON Sales.SalesOrderHeader.SalesOrderID
= Sales.SalesOrderDetail.SalesOrderID
WHERE Sales.SalesOrderHeader.SalesOrderID = @id
FOR JSON AUTO
This query will prepare the results you need in the database layer and return them as text. The problem with this approach is that the response is bigger, so make sure that you have a good connection between the application layer and the database server. In this case you can just stream the results to the client without any additional formatting.
I'm using Adam Machanic's SQLQueryStress tool to generate and run the test workloads. I'm executing 50 iterations of 20 concurrent batches for all of the options. Results are shown in the following figure:
As you can see, there is no big difference in query performance: all query results vary between 7.8 and 8.3 seconds. Note that here I'm not counting the additional time that would be required to format JSON in the application layer. Therefore, the conclusions in this case are:
Note that this might not be applicable to every configuration and use case. Results might vary depending on your connection bandwidth between the application layer and the database server, the number of concurrent threads, etc. However, in my case I'm getting similar results in all cases.
Also, this experiment does not cover other cases where you return multiple parent rows, with different numbers of fields returned from the parent and child tables, or many child table collections. Feel free to repeat this experiment, try other use cases, and report the performance results you are getting.
Tags: SQL Server 2016 JSON
Indexing JSON arrays using full-text search indexes
In the previous post I described how you can index JSON fields using standard B-tree indexes. In this post we will see how to use full-text search indexes on JSON arrays.
Full-text indexes can be used if the JSON in your column is formatted as a simple array of scalar values – ideally an array of numbers, but an array of strings will also work in most cases. In this example I will use the Sales.SalesOrder_json table that can be found in the official AdventureWorks2016CTP3 database. We can assume that the Sales.SalesOrder_json table contains the following data:
SalesOrderID | SalesOrderNumber | ......... | SalesReasons |
1 | SO44484 | ........ | ["Manufacturer","Quality"] |
2 | SO44352 | ........ | ["Price"] |
3 | SO44367 | .......... | ["Manufacturer","Quality"] |
4 | SO44464 | ......... | ["Price","Quality"] |
Instead of the related tables SalesOrderHeaderSalesReasons and SalesReasons, I have placed the sales reasons in an array of JSON strings.
If we create a full-text index on the SalesReasons column, it will split tokens in this text using the separators " and ,. This can be used to improve queries that search for sales orders by some sales reason value in this array.
First we will create full text search catalog and index:
-- Create full text catalog for JSON data
CREATE FULLTEXT CATALOG jsonFullTextCatalog;
GO
-- Create full text index on SalesReason column.
CREATE FULLTEXT INDEX ON Sales.SalesOrder_json(SalesReasons)
KEY INDEX PK_SalesOrder__json_SalesOrderID
ON jsonFullTextCatalog;
GO
With the full-text index we can easily find all rows where the SalesReasons JSON array contains some value (e.g. Price):
SELECT SalesOrderNumber, OrderDate, SalesReasons
FROM Sales.SalesOrder_json
WHERE CONTAINS(SalesReasons, 'Price')
We can even use more complex expressions with the AND, OR, and NOT operators:
SELECT SalesOrderNumber, OrderDate, SalesReasons
FROM Sales.SalesOrder_json
WHERE CONTAINS(SalesReasons, 'Price OR Quality')
As you can see, although full-text search is not specialized for JSON text, you can use it to improve the performance of queries that filter results by some value in a JSON array.
Tags: SQL Server 2016 JSON
Generate a series of numbers in SQL Server 2016 using OPENJSON
In this post I will talk about one unusual use case of OPENJSON - generating series of numbers.
Problem: I want to dynamically generate a table of numbers (e.g. from 0 to N). Unfortunately, we don't have this kind of function in SQL Server 2016, but we can use OPENJSON as a workaround.
OPENJSON can parse an array of numbers such as [1,25,3,5,32334,54,24,3] and return a table with [key,value] pairs. Values are the elements of the array, and keys are their indexes (i.e. numbers from 0 to the length of the array minus 1). In this example I don't care about the values – I just need the indexes.
I can easily generate a valid JSON array of length N dynamically, using the replicate function:
SELECT '[1' + replicate(',1',@count-1)+']'
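For example, with @count = 5 the expression produces a five-element array:
declare @count int = 5
select '[1' + replicate(',1',@count-1)+']'
-- returns: [1,1,1,1,1]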
Since I don't care about the values in the array, I'm generating an array where all elements are equal to 1. Now we can see how to generate a series of numbers using this approach and an OPENJSON table-valued function.
In the first example, I will create a table-valued function that generates @count numbers:
drop function if exists dbo.fn_gen_numbers
go
create function
dbo.fn_gen_numbers(@count int)
returns table
return (select cast([key] as int) as number from OPENJSON( '[1' + replicate(',1',@count-1)+']'))
go
The following call will return numbers from 0 to 9:
select * from dbo.fn_gen_numbers(10)
This function might be useful if you want to generate all months in the year, or all days in a month:
select DATETIMEFROMPARTS ( 2015, number+1, 1, 0, 0, 0, 0 )
from dbo.fn_gen_numbers(12)
select DATETIMEFROMPARTS ( 2015, 1, number+1, 0, 0, 0, 0 )
from dbo.fn_gen_numbers(31)
In a slightly modified example, I will generate a sequence from @start to @end inclusive:
drop function if exists dbo.fn_gen_range
go
create function dbo.fn_gen_range(@start int, @end int)
returns table
return (select cast([key] as int) + @start as number from OPENJSON( '[1' + replicate(',1',@end-@start)+']'))
The following call will return numbers in the range [10,30]:
select * from dbo.fn_gen_range(10,30)
In the final example, I will generate a sequence with a step, using the following function:
drop function if exists dbo.fn_gen_sequence
go
create function dbo.fn_gen_sequence(@start int, @end int, @step int)
returns table
return (select cast([key] as int) * @step + @start as number from OPENJSON( '[1' + replicate(',1',(@end-@start)/@step)+']'))
The following call will return even numbers from 10 to 28:
select *
from dbo.fn_gen_sequence(10,29,2)
Although we don't have a built-in function that generates a sequence of numbers, OPENJSON might be the easiest workaround for this.
Note that in these examples I'm using the new DROP IF EXISTS syntax that is available in SQL Server 2016; see http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/11/03/drop-if-exists-new-thing-in-sql-server-2016.aspx
Licence
The table-valued functions in this example can be copied and modified with one requirement – any table-valued function derived from this code that generates numbers using OPENJSON MUST be called with a fully qualified schema name, e.g.:
SELECT * FROM dbo.fn_gen_range()
and NOT just with function name:
SELECT * FROM fn_gen_range()
T-SQL enables you to call table-valued functions with or without the schema name. However, if you don't use the schema name, the function might resolve either to a user-defined function or to a built-in function.
The SQL Server team might decide to implement a similar built-in table-valued function in the future, and the new function might collide with the names of your user-defined functions. Using the schema name will prevent this kind of collision, and it is general coding guidance for T-SQL, so I'm adding it as a licence requirement :).
Vote for this function if you want to see it in the next version of SQL Server: Add a built-in table of numbers.
Tags: SQL Server 2016 JSON
DROP IF EXISTS - new thing in SQL Server 2016
Do you like writing conditional DROP statements like the following?
IF OBJECT_ID('dbo.Product', 'U') IS NOT NULL
DROP TABLE dbo.Product;
IF EXISTS (SELECT * FROM sys.triggers WHERE name = 'trProductInsert')
DROP TRIGGER trProductInsert
I don't like these, and if you also don't like them, then you might try new DROP IF EXISTS (a.k.a. DIE :) ) statements in SQL Server 2016.
From SQL Server 2016 CTP3 you can use new DIE statements instead of big IF wrappers, e.g.:
DROP TABLE IF EXISTS dbo.Product
DROP TRIGGER IF EXISTS trProductInsert
If the object does not exist, DIE will not fail and execution will continue. Currently, the following objects can DIE:
AGGREGATE, ASSEMBLY, DATABASE, DEFAULT, FUNCTION, INDEX, PROCEDURE, ROLE, RULE, SCHEMA, SECURITY POLICY, SEQUENCE, SYNONYM, TABLE, TRIGGER, TYPE, USER, VIEW
DIE is also supported for columns and constraints in the ALTER TABLE statement, as shown below.
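For example (illustrative column and constraint names):
ALTER TABLE dbo.Product DROP COLUMN IF EXISTS LegacyCode
ALTER TABLE dbo.Product DROP CONSTRAINT IF EXISTS DF_Product_Price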
Documentation is already published on MSDN:
DROP TABLE (Transact-SQL),DROP PROCEDURE (Transact-SQL), DROP TRIGGER (Transact-SQL), ALTER TABLE (Transact-SQL), etc.
Tags: SQL Server 2016
OPENJSON - one of the best ways to select rows by a list of ids
In this post I will talk about different ways to select rows from a table by a list of ids, and I will compare the performance of some common approaches (IN, LIKE, and OPENJSON).
Let's talk about a simple and common problem – we have a list of identifiers and we want to select all rows from a table that have ids in that list. As an example, a user selected some records in the UI, and we need to fetch these records by their ids. How can we do this?
You have the following options:
1. Use IN, where you can pass the list of ids:
select BusinessEntityID, FirstName, LastName
from Person.Person
where BusinessEntityID IN (1,2,3)
This is the simplest method, but it has a few problems:
2. Use LIKE, e.g.:
select BusinessEntityID, FirstName, LastName
from Person.Person
where ','+ @list + ',' LIKE '%,'+cast(BusinessEntityID as nvarchar(20)) + ',%'
In this case @list must be formatted as a CSV array, e.g. 1,2,3. This is probably the most commonly used anti-pattern, and it has the following problems:
3. Use OPENJSON as a table-valued function that splits the array of ids into a table, and join it with the source table:
select BusinessEntityID, FirstName, LastName
from Person.Person
join openjson(@list) ON value = BusinessEntityID
Note that @list must be formatted as a JSON array, e.g. [1,2,3], so you will need to surround your CSV list with brackets, as shown below.
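A complete call might look like this (illustrative ids):
declare @list nvarchar(max) = N'[1,2,3]'
select BusinessEntityID, FirstName, LastName
from Person.Person
join openjson(@list) ON value = BusinessEntityID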
This is my favorite approach because:
I'm using the SQLQueryStress tool provided by Adam Machanic to compare the results of these three approaches. In all three examples I create a random list of three ids and select the matching rows using the three methods, running 100 threads in 50 iterations:
As you can see, LIKE is the slowest way to do this and you should avoid it. The performance of OPENJSON and IN is close: 10s compared to 11s (results might vary, and in some cases I got very close numbers).
Therefore, you can see that OPENJSON is a good approach for this scenario. Feel free to repeat this experiment in your databases and compare the results.
Tags: SQL Server 2016 JSON
Samples for SQL Server 2016 CTP3
CTP3 of SQL Server 2016 was just announced, and is available for download.
To accompany CTP3, we also created an updated version of AdventureWorks, called AdventureWorks2016CTP3, and we created a number of samples to showcase some of the new functionality in SQL2016. We recommend you download and install the sample databases and play with the sample scripts. Download link:
AdventureWorks Sample Databases and Scripts for SQL Server 2016 CTP3
Let us know what you think, and how we can improve the samples.
--
SQL Server Team
Tags: sample CTP3 SQL2016
Returning child rows formatted as JSON in SQL Server queries
In this post I will talk about one annoying thing – how do you return a result set containing one-to-many relationships between tables? In that case you will have multiple primary rows, because one row is generated for each child row. In this post we will see how you can resolve this problem using JSON.
...(read more)
Tags: SQL Server 2016 JSON Azure Sql Db
Importing JSON files into SQL Server using OPENROWSET (BULK)
Currently you can find many JSON documents stored in files. Sensors generate information that is stored in files, applications log information in JSON files, etc. One important thing that you might need to do is read JSON data stored in files, load it into SQL Server, and analyze it.
In this post we will see how you can import JSON files in SQL Server.
OPENROWSET (BULK) is a table-valued function that can read data from any file on the local drive or network, if SQL Server has read access to that location. It returns a table with a single column (i.e. BulkColumn) that contains the content of the file. In the general case you can use various options with the OPENROWSET (BULK) function, such as separators, but in the simplest case you can directly load the entire content of a file as a text value (a single character large object, a.k.a. SINGLE_CLOB in OPENROWSET “terminology”) and load the content of that cell into any table or variable. An example of the OPENROWSET (BULK) function that reads the content of a JSON file and returns it to the user as a query result is shown in the following example:
SELECT BulkColumn
FROM OPENROWSET (BULK 'C:\JSON\Books\book.json', SINGLE_CLOB) as j
OPENROWSET (BULK) will read the content of the file and return it via BulkColumn. You can also load the content of the file into a local variable or a table, as shown in the following example:
declare @json nvarchar(max)
SELECT @json = BulkColumn
FROM OPENROWSET (BULK 'C:\JSON\Books\book.json', SINGLE_CLOB) as j
SELECT BulkColumn
INTO #temp
FROM OPENROWSET (BULK 'C:\JSON\Books\book.json', SINGLE_CLOB) as j
In this post I will show you how to import the content of a JSON file directly into SQL Server tables using OPENROWSET (BULK). I will use the fact that JSON is regular text, so it can be imported like any other text format, and I can leverage existing functions that work with text – in this case, importing text from a file. In this example I will use a JSON file containing books, taken from this site. Instead of reading the entire JSON text, I want to parse it and return either the books in the file or their properties.
In the simplest example we can read JSON objects from the file:
SELECT value
FROM OPENROWSET (BULK 'C:\JSON\Books\books.json', SINGLE_CLOB) as j
CROSS APPLY OPENJSON(BulkColumn)
OPENROWSET will read a single text value from the file, return it as BulkColumn, and pass it to the applied OPENJSON function. OPENJSON will iterate through the array of JSON objects in BulkColumn and return one book formatted as JSON in each row:
Value |
{"id" : "978-0641723445","cat" : ["book","hardcover"],"name" : "The Lightning Thief", … |
{"id" : "978-1423103349","cat" : ["book","paperback"],"name" : "The Sea of Monsters", … |
{"id" : "978-1857995879","cat" : ["book","paperback"],"name" : "Sophie's World : The Greek … |
{"id" : "978-1933988177","cat" : ["book","paperback"],"name" : "Lucene in Action, Second … |
With the new OPENJSON function we can parse that JSON content and transform it into a table or result set. In this example we will load the content, parse the loaded JSON, and return five fields as columns:
SELECT book.*
FROM OPENROWSET (BULK 'C:\JSON\Books\books.json', SINGLE_CLOB) as j
CROSS APPLY OPENJSON(BulkColumn)
WITH( id nvarchar(100), name nvarchar(100), price float,
pages_i int, author nvarchar(100)) AS book
OPENROWSET (BULK) will read the content of the file, and we can pass that content to the OPENJSON function with a defined schema. OPENJSON will match properties in the JSON objects with the column names (e.g. the price property will be returned as a price column and converted to the float type). The results in this case would look like:
Id | Name | price | pages_i | Author |
978-0641723445 | The Lightning Thief | 12.5 | 384 | Rick Riordan |
978-1423103349 | The Sea of Monsters | 6.49 | 304 | Rick Riordan |
978-1857995879 | Sophie's World : The Greek Philosophers | 3.07 | 64 | Jostein Gaarder |
978-1933988177 | Lucene in Action, Second Edition | 30.5 | 475 | Michael McCandless |
Now we can either return this table to the user, or load it into another table.
You can use the same approach to read JSON files from any file that can be accessed by SQL Server. As an example, Azure File Storage supports the SMB protocol, so you can map a local virtual drive to an Azure File Storage share using the following command:
net use [drive letter] \\[storage name].file.core.windows.net\[share name] /u:[storage account name] [storage account access key]
Example that I have used is:
net use t: \\mystorage.file.core.windows.net\sharejson /u:myaccont hb5qy6eXLqIdBj0LvGMHdrTiygkjhHDvWjUZg3Gu7bubKLg==
The storage account access key (primary or secondary) can be found in the Keys section of Settings in the Azure portal.
SELECT book.* FROM
OPENROWSET(BULK N't:\books\books.json', SINGLE_CLOB) AS json
CROSS APPLY OPENJSON(BulkColumn)
WITH( id nvarchar(100), name nvarchar(100), price float,
pages_i int, author nvarchar(100)) AS book
We can use the same approach to load a set of JSON files from the file system into local variables (we will assume that the files are named book1.json, book2.json, etc.):
declare @i int = 1
declare @json AS nvarchar(MAX)
declare @file nvarchar(260)
declare @sql nvarchar(MAX)
while(@i<10)
begin
SET @file = 'C:\JSON\Books\book' + cast(@i as varchar(5)) + '.json';
-- OPENROWSET(BULK) requires a literal file path, so the path is spliced in via dynamic SQL
SET @sql = N'SELECT @json = BulkColumn FROM OPENROWSET (BULK ''' + @file + N''', SINGLE_CLOB) as j';
EXEC sp_executesql @sql, N'@json nvarchar(MAX) OUTPUT', @json = @json OUTPUT;
SELECT * FROM OPENJSON(@json) as json
set @i = @i + 1;
end
As you can see, with the OPENROWSET (BULK) function you can easily import files into SQL Server.
Tags: SQL Server 2016 JSON
Querying JSON documents in SQL Server 2016 and DocumentDB
SQL Server 2016 and DocumentDB enable you to query JSON documents. DocumentDB has a nice syntax for querying JSON documents – you can find some good examples on the DocumentDB site. SQL Server provides built-in functions for accessing JSON fields (JSON_VALUE), fragments (JSON_QUERY), and opening JSON documents (OPENJSON). In this post, I will show you some queries that can be executed in DocumentDB and the equivalent SQL Server queries.
...(read more)
Tags: SQL Server 2016 JSON Azure Sql Db
Parsing GeoJSON format in SQL Server 2016
The OPENJSON function in SQL Server enables you to open complex JSON structures. In this post we will see how you can open the GeoJSON format.
...(read more)
Tags: SQL Server 2016 JSON
Indexing JSON documents in SQL Server 2016
One important thing in every database is indexing. Indexes can speed up your filter and sort operations. Without indexes, SQL Server would need to perform a full table scan every time you query the data.
SQL Server 2016 does not have a customized JSON index. However, you can use standard indexes on JSON documents. In this post we will see how you can use these indexes to optimize your queries.
In many cases you will want to filter results from a table by some JSON property, or order results using some value in a JSON document.
As an example, we can imagine that we have an “Info” column in the AdventureWorks SalesOrderHeader table that contains various information about the sales order (e.g. information about the customer, sales person, shipping/billing addresses, etc.). We want to use this column to filter sales orders for some customer. The query that we might want to optimize using an index is shown in the following code:
SELECT SalesOrderNumber, OrderDate,
JSON_VALUE(Info, '$.Customer.Name') AS CustomerName
FROM Sales.SalesOrderHeader
WHERE JSON_VALUE(Info, '$.Customer.Name') = N'Aaron Campbell'
If you want to speed up filters or ORDER BY clauses applied to some property in a JSON document, you can use the same indexes that you use on any other columns. However, you cannot directly reference properties in the JSON documents. You need to create a “virtual column” that returns the values used for filtering, and create an index on that column (in our example, a column that exposes the customer name stored at the $.Customer.Name path in the JSON documents). An example of a computed column that will be used for indexing, and the index on that column, is shown in the following code:
ALTER TABLE Sales.SalesOrderHeader
ADD vCustomerName AS JSON_VALUE(Info, '$.Customer.Name')
CREATE INDEX idx_soh_json_CustomerName
ON Sales.SalesOrderHeader(vCustomerName)
Note that the column is not persisted. It does not occupy additional space in the table; it is computed only when the index needs to be built or rebuilt.
It is important that you create the computed column with the same expression as the one used in the queries, i.e. JSON_VALUE(Info, '$.Customer.Name') in the query above.
You don't need to rewrite your queries – if you use expressions with the JSON_VALUE function, SQL Server will see that there is an equivalent computed column with the same expression and apply the index if possible. The execution plan for the query above might look like the one in the following figure:
Instead of a full table scan, SQL Server uses an index seek into the non-clustered index and finds the rows that satisfy the conditions. Then it uses a Key Lookup in the SalesOrderHeader table to fetch the other columns used in the query (i.e. SalesOrderNumber and OrderDate in our case).
Note that you can avoid this additional lookup in the table if you add the required columns to the index. You can add these columns to the JSON index as standard included columns:
CREATE INDEX idx_soh_json_CustomerName
ON Sales.SalesOrderHeader(vCustomerName)
INCLUDE (SalesOrderNumber, OrderDate)
In this case SQL Server will not read additional data from the SalesOrderHeader table, because everything is included in the non-clustered JSON index. This might be a good way to combine JSON and column data in queries and create optimal indexes for your workload.
An important feature of JSON indexes is that they are collation aware. The result of the JSON_VALUE function is text that inherits its collation from the input expression. Therefore, values in the index will be ordered using the collation rules defined on the source column.
To demonstrate this, we can create a simple collection table with primary key and json content:
CREATE TABLE JsonCollection
( id int identity constraint PK_JSON_ID primary key,
json nvarchar(max) COLLATE Serbian_Cyrillic_100_CI_AI
CONSTRAINT [Content should be formatted as JSON]
CHECK (ISJSON(json)>0)
)
In this case we have set Serbian Cyrillic collation on the json column. Now we can populate the table and create an index on the name property:
INSERT INTO JsonCollection
VALUES (N'{"name":"Иво","surname":"Андрић"}'),
(N'{"name":"Андрија","surname":"Герић"}'),
(N'{"name":"Владе","surname":"Дивац"}'),
(N'{"name":"Новак","surname":"Ђоковић"}'),
(N'{"name":"Предраг","surname":"Стојаковић"}'),
(N'{"name":"Михајло","surname":"Пупин"}'),
(N'{"name":"Борислав","surname":"Станковић"}'),
(N'{"name":"Владимир","surname":"Грбић"}'),
(N'{"name":"Жарко","surname":"Паспаљ"}'),
(N'{"name":"Дејан","surname":"Бодирога"}'),
(N'{"name":"Ђорђе","surname":"Вајферт"}'),
(N'{"name":"Горан","surname":"Бреговић"}'),
(N'{"name":"Милутин","surname":"Миланковић"}'),
(N'{"name":"Никола","surname":"Тесла"}')
GO
ALTER TABLE JsonCollection
ADD vName AS JSON_VALUE(json, '$.name')
CREATE INDEX idx_name
ON JsonCollection(vName)
We have created a standard index on the computed column vName, which represents the value of the $.name property. In the Serbian Cyrillic code page, the order of letters is ‘А’, ’Б’, ’В’, ’Г’, ’Д’, ’Ђ’, ’Е’, etc. Since the result of the JSON_VALUE function inherits its collation from the source column, the order in the index will be compliant with the Serbian Cyrillic rules. Now we can query this collection and sort the results by name:
SELECT JSON_VALUE(json, '$.name'), * FROM JsonCollection
ORDER BY JSON_VALUE(json, '$.name')
If you look at the actual execution plan, you will see that it uses sorted values from the non-clustered index:
We don't have a Sort operator although we have an ORDER BY clause. The JSON index is already ordered according to the Serbian Cyrillic rules, so SQL Server can use the non-clustered index, where the results are already sorted.
However, if we change the collation of the ORDER BY expression (e.g. put COLLATE French_100_CI_AS_SC after the JSON_VALUE function), we will get a different query execution plan:
Since the order of values in the index is not compliant with the French collation rules, SQL Server cannot use it to order the results. Therefore, it adds a separate Sort operator that sorts the results using the French collation rules.
Tags: SQL Server 2016 JSON