Columnstore Index: Speeding up analytics in SQL Server 2016
Please refer to the post.
Thanks,
Sunil Agarwal
Tags: ColumnStore Index
Built-in functions for compression/decompression in SQL Server 2016
SQL Server 2016 provides built-in functions for compression and decompression:
CREATE TABLE People (
_id int primary key identity,
name nvarchar(max),
surname nvarchar(max),
info varbinary(max)
)
INSERT INTO People (name, surname, info)
SELECT FirstName, LastName, COMPRESS(AdditionalInfo) FROM Person.Person
SELECT name, surname, DECOMPRESS(info) AS original
FROM People
ALTER TABLE People
ADD info_text as CAST( DECOMPRESS(info) AS NVARCHAR(MAX))
declare @text nvarchar(max) = (select top 100 * from sys.all_objects for json path)
select DATALENGTH(@text) AS original, DATALENGTH(COMPRESS(@text)) AS compressed, 1.0 * DATALENGTH(@text) / DATALENGTH(COMPRESS(@text)) AS ratio
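COMPRESS and DECOMPRESS both return varbinary(max), so the decompressed result is typically cast back to the original string type. A minimal round-trip check (illustrative value):
declare @original nvarchar(max) = N'{"example":"some value"}' -- illustrative value
select cast(DECOMPRESS(COMPRESS(@original)) as nvarchar(max)) as roundtrip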
Tags: Data Compression SQL Server 2016
AdventureWorksCTP3 JSON Sample
New database and samples for the AdventureWorks 2016 CTP3 database are published on AdventureWorks Sample Databases and Scripts for SQL Server 2016 CTP3. In the SQLServer2016CTP3Samples.zip file (attached to this post) you can find example scripts that show how to use various features added in SQL Server 2016.
One of the folders in this archive is the JSON folder, where you can find some JSON example scripts. In this post we will see a short description of the files in this folder.
When you download the AdventureWorks2016CTP3 database you will notice that there are a lot of new tables with suffixes _json, _temporal_inmemory, etc. These are new tables that are used in the example scripts.
Two new tables are added for JSON support: Person.Person_json and Sales.SalesOrder_json.
These tables are just copies of the existing Person.Person and Sales.SalesOrderHeader tables. Note that when you install the AdventureWorks2016CTP3 database you don't have any JSON data in these tables – you will need to run the de-normalization script described below to populate them with JSON data.
In SQLServer2016CTP3Samples.zip (attached) you will find a JSON folder with the following content:
These are scripts that show how to use various JSON features in this database. In the following sections we will see short explanations of these scripts.
The first script you need to run is the de-normalization script. Even if you don't need de-normalization you will need this script, because it is a prerequisite for all the other scripts.
This script generates JSON data and stores JSON text in the new Person.Person_json and Sales.SalesOrder_json tables. In this script you will see SQL statements that perform the following actions:
This script shows how you can transform a complex relational structure into de-normalized JSON collections.
Also, this script is important because it initializes the new JSON tables with the content that is required for all the procedures, views, and queries in the sample.
Run this script even if you don't need de-normalization, because without it you will not have the JSON columns and JSON content in the Person.Person_json and Sales.SalesOrder_json tables.
These two files show how you can create stored procedures and views that query and process JSON text in the Person.Person_json and Sales.SalesOrder_json tables.
The first JSON view, Sales.vwSalesOrderItems_json, shows how you can create a relational "table view" over the array of order items stored in the OrderItems column of the Sales.SalesOrder_json table.
The second JSON view, Sales.vwSalesOrderInfo_json, returns information about the shipping address and method, salesman, and customer from the Info JSON column. This view shows how you can encapsulate values in JSON text and expose them as regular columns. If you don't want to use JSON data structures, there is also the Sales.vwSalesOrderInfoRel_json view, which shows how this query would look in the original relational schema.
The JSON procedures script shows different use cases for JSON processing, such as:
In this file you can see how to create B-tree or full-text search indexes on JSON columns. This file also contains some stored procedures with queries that use the new indexes.
Use the Show Actual Execution Plan option in SQL Server Management Studio when you run these stored procedures to verify that the queries use the indexes during execution.
In this file you will find procedures that export table data from the Person.Person_json and Sales.SalesOrder_json tables and import JSON text into these tables. This script demonstrates how easily you can transform relational data to JSON and vice versa.
In this file you will find various queries that use the views and stored procedures created in the previous scripts.
In this file you will find a script that deletes all the new columns, JSON data, stored procedures, indexes, and views created in the previous steps.
Tags: SQL Server 2016 JSON
Using non-relational models in SQL Server
In this post I will explain how you can use both relational and non-relational data models in your database schema.
A relational database schema might contain a lot of tables that require many JOINs or separate queries to retrieve all the necessary data. Also, in order to insert new records, we need to update many tables, and in many cases we need to follow the parent/child order defined by referential integrity: read the primary key of the parent row before we insert the child row, etc.
NoSQL databases mitigate some of these problems by putting all related information into a single entity. This approach reduces the complexity of the model: we don't need to join different entities, because all the information is contained in a single entry. Also, inserts are simple – a new record with all related information is simply inserted into one entity.
Using the same approach in a relational database might also simplify the schema and improve the performance of some workloads.
SQL Server enables you to combine relational and non-relational (JSON, XML) data in order to define the optimal data model for your workloads. If you have a structure where the primary object (row) has a single related entity, or collections of related entities that you don't want to represent as relational schema, you can store the related information as JSON documents. NoSQL databases use the same approach.
In this scenario we will use SalesOrderHeader table from AdventureWorks database as an example. SalesOrderHeader table has a number of related tables that contain additional information about the sales order such as order items, customer data, salesman info, shipping and billing information, etc. Design of these tables in AdventureWorks database is shown in the following figure:
Although this design follows the standard principles of relational modeling, it might not be optimal for this use case.
This schema is optimized for workloads that have frequent updates in the child tables. However, once we create a sales order with all its related items, it is unlikely that someone will constantly update the quantities and unit prices of the items, sales reasons, shipping methods, customer/salesman info, etc. Information related to a sales order is mostly read-only, so we are not getting much benefit from normalization. The only case when someone will update this information is error correction.
To demonstrate how this structure can be represented in a non-relational schema, the AdventureWorks 2016 CTP3 database adds a non-relational version of the SalesOrderHeader table called SalesOrder_json, a copy of the standard SalesOrderHeader table with the following changes:
Now we can store information in JSON columns:
ID | .... | Info | SalesReasons | OrderItems |
43697 | {"ShippingInfo":{"Address":"601 Asilomar Dr.","City":"Metchosin","PostalCode":"V9","Province":"British Columbia","TerritoryID":6,"Method":"XRQ - TRUCK GROUND","ShipBase":3.9500,"ShipRate":0.9900},"BillingInfo":{"Address":"601 Asilomar Dr.","City":"Metchosin","PostalCode":"V9"}} | ["Manufacturer","Quality"] | [{"Item":{"Qty":1,"Price":3578.2700,"Discount":0.0000,"Total":3578.270000},"Product":{"Number":"BK-R93R-62","Name":"Road-150 Red, 62"}}] |
If you need to get information about a sales order and its related items, you can do it with a single query. If your client expects JSON as the response (e.g. an Ajax client in a web browser), this might be a perfect solution for you because all information will be prepared in the format you need. Also, inserts and data load might be faster if you prepare the data in JSON format on the client side, because you will not need to update multiple tables in the same transaction. You just need to update one row in the sales order table.
Note that you are not losing query capabilities. SQL Server provides the functions JSON_VALUE, JSON_QUERY, and OPENJSON, which can access any value in the JSON columns.
If you want to access values in JSON using standard column syntax, you can add a non-persisted computed column that returns the value at some path, e.g.:
ALTER TABLE Sales.SalesOrder_json
ADD vCustomerName AS JSON_VALUE(Info, '$.Customer.Name')
Now you can reference the JSON value like any other real column, via the vCustomerName column.
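For example, a lookup by customer name (an illustrative value) can now use plain column syntax, and an index created on vCustomerName could support it:
SELECT SalesOrderID, OrderDate
FROM Sales.SalesOrder_json
WHERE vCustomerName = N'Aaron Campbell'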
If you need a relational view over the array of JSON objects stored in the OrderItems column, you can create a view that looks exactly like the relational SalesOrderDetails table:
CREATE VIEW Sales.vwSalesOrderItems_json
AS
SELECT SalesOrderID, SalesOrderNumber, OrderDate,
CarrierTrackingNumber, OrderQty, UnitPrice,
UnitPriceDiscount, LineTotal, ProductNumber, Name
FROM Sales.SalesOrder_json
CROSS APPLY
OPENJSON (OrderItems)
WITH ( CarrierTrackingNumber NVARCHAR(20),
OrderQty int '$.Item.Qty',
UnitPrice float '$.Item.Price',
UnitPriceDiscount float '$.Item.Discount',
LineTotal float '$.Item.Total',
ProductNumber NVARCHAR(20) '$.Product.Number',
Name NVARCHAR(50) '$.Product.Name'
)
Queries executed on this view will return the same results as queries on the real SalesOrderDetails table.
SQL Server enables you to make trade-offs between relational and non-relational modeling in order to find the optimal data model for your workload. In SQL Server you can create your database schema based on the characteristics of your workload, using some of the following guidelines:
Tags: SQL Server 2016 JSON
De-normalize database schema using JSON
In this post I will talk about de-normalization - transforming complex relational structures into simpler tables using JSON format.
In a traditional relational schema there can be a lot of one-to-many relationships (e.g. a person may have several phones, or several email addresses). Even for a simple string array in the C#/Java classes of the application domain model, we must create separate tables in the relational model. As an example, an array of email addresses that belongs to a person class must be stored in an EmailAddress table with a foreign key to the parent Person row, because an array of strings cannot be mapped to any relational data type.
In order to read Person data with the associated emails and phone numbers, we would need to run several queries (e.g. one to return the person info, another to return the person's phones, a third to return the email addresses), or join the tables and process one flat result set on the client side. This method requires a lot of JOINs and indexes on foreign keys. A similar problem exists with inserts. In order to insert a person with his emails and phone numbers, we need to insert the person row, take the identity value generated for the primary key, and then use this value to insert records into the email address/phone number tables. For each "logical" insert we need several actual table inserts and reads.
NoSQL databases use a different approach. In a NoSQL document database you can store the primary entity as JSON together with all related information formatted as JSON arrays. This way you can read a person with the associated emails and phone numbers in a single read operation, and insert a new person with all associated information in a single write. There is some cost for serializing/deserializing child entities as JSON in the application layer; however, the benefit is bigger, because we don't have multiple read/write operations for every logical action.
How can we use a similar approach in SQL Server?
In order to avoid a complex schema, information from related child entities can be formatted as JSON arrays and stored in the parent table. This way we have a simpler table schema that matches the design in the application logic, and we need a single table seek/scan to return the necessary information.
The new FOR JSON clause enables you to de-normalize a complex relational schema and move information from the child tables into the parent row, formatted as JSON.
This example uses the Person table structure from the AdventureWorks2016CTP3 database. In the AdventureWorks2016CTP3 database we have a lot of one-to-many relationships between tables, e.g. one person can have many email addresses, many phones, etc. If you want to store this information in a relational database, you would need to use something like the structure in the following figure:
Even for simple structures such as phone numbers or emails, which would be represented as arrays in C# or Java, we need separate tables with foreign key relationships, indexes on foreign keys, etc.
In another post I have shown how you can return person info with all related information in a single query. Here I will show a different approach – pre-aggregating the related tables as JSON arrays in the parent table. This way we don't need to format the related records at query time – we can just read the entire collection from the JSON column.
The following examples use the Person.Person_json table available in the AdventureWorks2016CTP3 database, a de-normalized version of the standard Person.Person table.
In this example we create a new text column in the Person_json table that will contain an array of phone numbers and their types:
ALTER TABLE Person.Person_json
ADD PhoneNumbers NVARCHAR(MAX)
CONSTRAINT [Phone numbers must be formatted as JSON array]
CHECK (ISJSON(PhoneNumbers)>0)
Information from PersonPhone and PhoneNumberType tables can be stored in this column using the following query:
UPDATE Person.Person_json
SET PhoneNumbers = (SELECT Person.PersonPhone.PhoneNumber, Person.PhoneNumberType.Name AS PhoneNumberType
FROM Person.PersonPhone
INNER JOIN Person.PhoneNumberType ON Person.PersonPhone.PhoneNumberTypeID = Person.PhoneNumberType.PhoneNumberTypeID
WHERE Person.Person_json.PersonID = Person.PersonPhone.BusinessEntityID
FOR JSON PATH)
Phone number information is now stored in the Person_json table, one JSON array per person row. Now we can drop the PersonPhone and PhoneNumberType tables, since we don't need them anymore.
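A sketch of that cleanup, assuming no other objects still reference these tables:
DROP TABLE IF EXISTS Person.PersonPhone
DROP TABLE IF EXISTS Person.PhoneNumberType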
In the second example we create a new text column in the Person_json table that will contain an array of email addresses:
ALTER TABLE Person.Person_json
ADD EmailAddresses NVARCHAR(MAX)
CONSTRAINT [Email addresses must be formatted as JSON array]
CHECK (ISJSON(EmailAddresses)>0)
Now we can update this column and put in all the email addresses that belong to each person, formatted as JSON:
UPDATE Person.Person_json
SET EmailAddresses =
dbo.ufnToRawJsonArray(
(SELECT Person.EmailAddress.EmailAddress
FROM Person.EmailAddress
WHERE Person.Person_json.PersonID = Person.EmailAddress.BusinessEntityID
FOR JSON PATH)
, 'EmailAddress')
As a result, the EmailAddresses column will store the emails formatted as JSON arrays:
["ken0@adventure-works.com","ken.adams@adventure-works.com","ken@aw.microsoft.com"]
This example uses the following function, which converts an array of key:value pairs into an array of plain values:
-- Utility function that removes keys from JSON
-- Used when we need to remove keys from FOR JSON output,
-- e.g. to generate [1,2,"cell"] format instead of [{"val":1},{"val":2},{"val":"cell"}]
CREATE FUNCTION
dbo.ufnToRawJsonArray(@json nvarchar(max), @key nvarchar(400)) returns nvarchar(max)
as begin
return replace(replace(@json, FORMATMESSAGE('{"%s":', @key),''), '}','')
end
This function simply removes keys from an array so we can get simple arrays of strings.
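For example, applied to a small FOR JSON PATH output (illustrative values), the function strips the repeated key:
select dbo.ufnToRawJsonArray(
N'[{"EmailAddress":"ken0@adventure-works.com"},{"EmailAddress":"ken1@adventure-works.com"}]',
'EmailAddress') as result
-- returns: ["ken0@adventure-works.com","ken1@adventure-works.com"]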
This example showed a de-normalization process that reduced a schema of four tables to a single table with two additional JSON columns. De-normalization is a common method for simplifying the data schema and improving read performance, and it is one of the reasons why NoSQL databases became popular.
However, de-normalization is not a silver bullet that will solve all schema complexity and performance problems in your systems. With de-normalization you get better read performance, but you increase the table space and your updates will be slower.
Use this approach only if the information you want to de-normalize does not change frequently (in the example above the assumption is that a person will not frequently change his phone numbers and email addresses) and when the bottleneck in the system is JOINs and read performance.
Do not de-normalize information that will be changed frequently – separate tables are still the best structures for those kinds of data access patterns.
Tags: SQL Server 2016 JSON
Query JSON data
New JSON support in SQL Server enables you to create SQL queries that use both relational and semi-structured data formatted as JSON text. In this post we will see some query examples.
In my database I have information stored both as scalars in standard relational columns and as properties in JSON columns. I need to create queries that combine information from both the JSON text and the standard columns.
The new functions for JSON text processing in SQL Server enable us to query both relational and non-structured data formatted as JSON.
In this post I will use the Sales.SalesOrder_json table published in the AdventureWorks2016CTP3 database. Sales.SalesOrder_json is a copy of the Sales.SalesOrderHeader table from the AdventureWorks2016 database, with additional JSON columns:
In the first example we will create a stored procedure that reads both columns and JSON values from the sales order table:
CREATE PROCEDURE
Sales.SalesOrderInfo_json(@SalesOrderID int)
AS BEGIN
SELECT SalesOrderNumber, OrderDate, Status, ShipDate, AccountNumber, TotalDue,
JSON_QUERY(Info, '$.ShippingInfo') ShippingInfo,
JSON_QUERY(Info, '$.BillingInfo') BillingInfo,
JSON_VALUE(Info, '$.SalesPerson.Name') SalesPerson,
JSON_VALUE(Info, '$.ShippingInfo.City') City,
JSON_VALUE(Info, '$.Customer.Name') Customer,
JSON_QUERY(OrderItems, '$') OrderItems
FROM Sales.SalesOrder_json
WHERE SalesOrderID = @SalesOrderID
END
GO
In this stored procedure you can read standard relational columns from the table, and extract objects (e.g. shipping/billing information) and values (sales person, city from the shipping address, and customer name) from the Info JSON column. If you want to use the relational model instead of JSON functions, you would need to create a complex query with several table joins:
CREATE PROCEDURE
Sales.SalesOrderInfoRel_json(@SalesOrderID int)
as begin
SELECT SalesOrderNumber, OrderDate, ShipDate, Status, Sales.SalesOrder_json.AccountNumber, TotalDue,
shipprovince.Name as [Shipping Province],
shipmethod.Name as [Shipping Method],
shipmethod.ShipRate as ShipRate,
billaddr.AddressLine1 + COALESCE ( ', ' + shipaddr.AddressLine2, '') as [Billing Address],
sp.FirstName + ' ' + sp.LastName as [Sales Person],
cust.FirstName + ' ' + cust.LastName as Customer
FROM Sales.SalesOrderHeader
JOIN Person.Address shipaddr
ON Sales.SalesOrderHeader.ShipToAddressID = shipaddr.AddressID
LEFT JOIN Person.StateProvince shipprovince
ON shipaddr.StateProvinceID = shipprovince.StateProvinceID
JOIN Purchasing.ShipMethod shipmethod
ON Sales.SalesOrderHeader.ShipMethodID = shipmethod.ShipMethodID
JOIN Person.Address billaddr
ON Sales.SalesOrderHeader.BillToAddressID = billaddr.AddressID
LEFT JOIN Sales.SalesPerson
ON Sales.SalesPerson.BusinessEntityID = Sales.SalesOrderHeader.SalesPersonID
LEFT JOIN Person.Person AS sp
ON Sales.SalesPerson.BusinessEntityID = sp.BusinessEntityID
LEFT JOIN Sales.Customer
ON Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
LEFT JOIN Person.Person AS cust
ON Sales.Customer.CustomerID = cust.BusinessEntityID
WHERE Sales.SalesOrderHeader.SalesOrderID = @SalesOrderID
end
GO
As you can see, some basic information required in the report requires several table joins. Besides the simplicity of the query, you might also get some performance improvement. The first stored procedure uses just one seek operation to read a single row; the relational query must seek into several tables and join the results.
You can use both standard relational columns and properties from JSON fields in any part of a query. The following stored procedure filters and aggregates results using values from the JSON text:
CREATE PROCEDURE
Sales.SalesOrdersPerCustomerAndStatusReport_json(@city nvarchar(50), @territoryid int)
AS BEGIN
SELECT JSON_VALUE(Info, '$.Customer.Name') AS Customer, Status, SUM(SubTotal) AS Total
FROM Sales.SalesOrder_json
WHERE TerritoryID = @territoryid
AND JSON_VALUE(Info, '$.ShippingInfo.City') = @city
AND OrderDate > '1/1/2015'
GROUP BY JSON_VALUE(Info, '$.Customer.Name'), Status
HAVING SUM(SubTotal) > 1000
END
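The report can then be run like any other stored procedure, for example (illustrative parameter values):
EXEC Sales.SalesOrdersPerCustomerAndStatusReport_json @city = N'Seattle', @territoryid = 1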
Use Case 3: Aggregate results using collections of JSON objects
In the previous examples we used just single values from JSON text. The OPENJSON function enables us to transform JSON arrays into tables and apply any aggregation function. In the following stored procedure we will take the sales reasons, stored as arrays of JSON string values in the SalesReasons column, group the sales orders by the values in this array, and return the count of sales orders for each sales reason:
CREATE PROCEDURE
Sales.SalesOrdersBySalesReasonReport_json(@city nvarchar(50))
AS BEGIN
SELECT value, COUNT(SalesOrderNumber) AS NumberOfOrders
FROM Sales.SalesOrder_json
CROSS APPLY OPENJSON (SalesReasons)
WHERE JSON_VALUE(Info, '$.ShippingInfo.City') = @city
GROUP BY value
END
GO
Functions for JSON text processing in SQL Server enable you to combine JSON data with standard columns in any SQL query. With the JSON text functions you can use the full SQL language for querying and analyzing JSON data. Also, you can combine standard relational data with JSON values in any query.
http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/11/28/query-json-data.aspx
Tags: SQL Server 2016 JSON
Log analysis with SQL Server
In any application you will need to have some kind of logs where you write events, errors, debug and other information. One of the biggest problems is how to analyze information from log files. If you have some predefined format you can use some log analysis tools or languages such as Perl Scripts, Hadoop MapReduce (if you have big log files), Graylog2 (open source), LogStash (open source), Loggly (SaaS), etc.
The problem in this case might be that you have limited query/analysis capabilities because you will need to code your own logic for filtering and aggregating log information.
If you format information in log files as JSON, you get a good trade-off between the flexibility of the information you can store and the ability to process it with some kind of query language. A common choice is a NoSQL database such as MongoDB or Azure DocumentDB, where you can store JSON messages and search them using a platform-specific query language.
In this article we will see how you can analyze log files containing JSON messages.
We have log files containing log messages formatted as JSON. How can we analyze these log files?
Load the text files into SQL Server and analyze them with the OPENJSON function.
SQL Server enables you to load the content of a file formatted as JSON and transform it into a relational format that can be analyzed with standard SQL. We will start with an example of a JSON log file, shown in the following listing:
[
{"time":"2015-11-27T02:33:05.063","ip":"214.0.57.12","request":"/", "method":"GET", "status":"200"},
{"time":"2015-11-27T02:33:06.003","ip":"224.12.07.25","request":"/", "method":"GET", "status":"200"},
{"time":"2015-11-27T02:33:06.032","ip":"192.10.81.115","request":"/contact", "method":"POST", "status":"500", "exception":"Object reference not set to an instance of object","stackTrace":"…" },
...
{"time":"2015-11-27T02:37:06.203","ip":"204.12.27.21","request":"/login", "method":"GET", "status":"200"},
{"time":"2015-11-27T02:37:12.016","ip":"214.0.57.12","request":"/login", "method":"POST","status":"404", "exception":"Potentially dangerous value was detected in request"}
]
Here we have some standard information such as the time, IP address, requested URL, HTTP method, etc. If an error occurred, we have additional data such as the exception message, stack trace, etc.
In SQL Server, we can easily read this log file and query results:
SELECT log.*
FROM OPENROWSET (BULK 'C:\logs\json-log-2015-11-27.txt', SINGLE_CLOB) as log_file
CROSS APPLY OPENJSON(BulkColumn)
WITH( time datetime, status varchar(5), method varchar(10), exception nvarchar(200)) AS log
For better manageability and performance, it might be good to split your log files by date or size (using something like the rolling file appender in Log4j). Now we can run any type of analysis on the returned data set. The following report returns the list of pages and the number of server errors detected on them, for each HTTP method:
SELECT request, method, COUNT(*)
FROM OPENROWSET (BULK N'C:\logs\log-json-2015-11-27.txt', SINGLE_CLOB) as log_file
CROSS APPLY OPENJSON(BulkColumn)
WITH( time datetime, status int, method varchar(10), request varchar(20), exception nvarchar(200)) AS log
WHERE status >= 500
GROUP BY request, method
You can notice that we have the full power of T-SQL over the JSON log files. You can also load the JSON data into a standard table and create reports on that table, as shown below.
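A one-time load into a regular table might look like the following sketch (dbo.WebSiteLog is a hypothetical table created by SELECT ... INTO):
SELECT log.time, log.status, log.method, log.request, log.exception
INTO dbo.WebSiteLog -- hypothetical target table
FROM OPENROWSET (BULK 'C:\logs\json-log-2015-11-27.txt', SINGLE_CLOB) as log_file
CROSS APPLY OPENJSON(BulkColumn)
WITH( time datetime, status int, method varchar(10), request varchar(20), exception nvarchar(200)) AS log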
One of the problems with JSON is the fact that you cannot continuously append JSON messages. If you want a valid array of JSON objects, you need to surround them with brackets, and once you add the final bracket, you cannot add new data.
LD-JSON (line-delimited JSON) is an alternative JSON format that might be a good choice for logging. LD-JSON addresses one of the main issues of the standard JSON format – the ability to continuously append valid JSON objects. LD-JSON introduces a few changes to the standard JSON format:
An example of LD-JSON content is shown in the following example:
{"time":"2015-11-27T02:33:05.063","ip":"214.0.57.12","request":"/", "method":"GET", "status":"200"}
{"time":"2015-11-27T02:33:06.003","ip":"224.12.07.25","request":"/",method":"GET", "status":"200"}
{"time":"2015-11-27T02:33:06.032","ip":"192.10.81.115","request":"/contact", "method":"POST", "status":"500", "exception":"Object reference not set to an instance of object",”stackTrace”:”…” }
……..
{"time":"2015-11-27T02:37:06.203","ip":"204.12.27.21","request":"/login",method":"GET", "status":"200"}
{"time":"2015-11-27T02:37:12.016","ip":"214.0.57.12","request":"/login", "method":"POST","status":"404", "exception":"Potentially dangerous value was detected in request"}
Now we can read this file with FORMATFILE and run the same report:
SELECT request, method, COUNT(*)
FROM OPENROWSET(BULK 'C:\logs\log-ld-json-2015-11-27.txt',
FORMATFILE= 'c:\logs\csv.xml') AS log_file
CROSS APPLY OPENJSON(json)
WITH( time datetime, status int, method varchar(10), request nvarchar(200)) AS log
WHERE status >= 500
GROUP BY request, method
You would need a format file that exposes each line of the file as a single text column named json.
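The sample's csv.xml isn't reproduced in this post, but a minimal BCP XML format file along those lines might look like this sketch:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <RECORD>
    <!-- each line of the LD-JSON file is one record, terminated by a newline -->
    <FIELD ID="1" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="8000"/>
  </RECORD>
  <ROW>
    <!-- expose the line as a single column named "json" -->
    <COLUMN SOURCE="1" NAME="json" xsi:type="SQLNVARCHAR"/>
  </ROW>
</BCPFORMAT>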
Note that the json column referenced in the T-SQL query above is defined in the format file.
You can use the same approach to read JSON files stored on Azure File Storage. As an example, Azure File Storage supports SMB protocol, so you can map your local virtual drive to the Azure File storage share using the following procedure:
net use [drive letter] \\[storage name].file.core.windows.net\[share name] /u:[storage account name] [storage account access key]
Example that I have used is:
net use t: \\mystorage.file.core.windows.net\sharejson /u:myaccont hb5qy6eXLqIdBj0LvGMHdrTiygkjhHDvWjUZg3Gu7bubKLg==
The storage account access key (primary or secondary) can be found in the Keys section of Settings in the Azure portal.
Now, if you set up your application to log data to a log file on Azure File Storage (e.g. log-file.json), you can use the queries above to analyze data loaded from the path mapped to t:, i.e. \\mystorage.file.core.windows.net\sharejson\log-file.json
Logging to traditional text files is the fastest way to log information. Formatting log messages as JSON or LD-JSON gives you a simple, human-readable log format with the ability to query and analyze the log data.
New SQL Server with JSON support enables you to easily load log files and analyze them with standard T-SQL.
http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/11/27/log-analysis-with-sql-server.aspx
Tags: SQL Server 2016 JSON
Storing JSON in SQL Server
SQL Server offers built-in functions for processing JSON text data. In this post, we will see how you can store JSON in a SQL Server database.
JSON is a textual format, so in SQL Server it is stored in NVARCHAR columns. The simplest table that represents a collection of JSON objects is shown in the following listing:
DROP TABLE IF EXISTS Person
CREATE TABLE Person
( _id int identity constraint PK_JSON_ID primary key,
value nvarchar(max)
CONSTRAINT [Content should be formatted as JSON]
CHECK ( ISJSON(value)>0 )
)
This simple structure is similar to a standard NoSQL collection that you can create in NoSQL databases (e.g. Azure DocumentDB or MongoDB), where you just have a key that represents the ID and a value that represents the JSON.
Note that NVARCHAR is not just plain text. SQL Server has a built-in text compression mechanism that can transparently compress data stored on disk. Compression depends on the language and can go up to 50%, depending on your data (see UNICODE compression).
The key difference between SQL Server and plain NoSQL databases is that SQL Server enables you to use a hybrid data model, where you can store several JSON objects in the same “collection” and combine them with regular relational columns.
As an example, imagine that we know that every person in the collection will have a FirstName and LastName, that general information about the person can be stored as one JSON object, and that phone numbers/email addresses can be stored as separate objects. In SQL Server 2016 we can easily create this structure without any additional syntax:
DROP TABLE IF EXISTS Person
CREATE TABLE Person (
PersonID int IDENTITY PRIMARY KEY,
FirstName nvarchar(100) NOT NULL,
LastName nvarchar(100) NOT NULL,
AdditionalInfo nvarchar(max) NULL,
PhoneNumbers nvarchar(max) NULL,
EmailAddresses nvarchar(max) NULL
CONSTRAINT [Email addresses must be formatted as JSON array]
CHECK ( ISJSON(EmailAddresses)>0 )
)
Instead of a single JSON object, you can organize your data in this “collection”. If you do not want to explicitly check the structure of each JSON column, you don't need to add a JSON check constraint on every column (in this example I have added a CHECK constraint only on the EmailAddresses column).
If you compare this structure to the standard NoSQL collection, you might notice that you have faster access to strongly typed data (FirstName and LastName). Therefore, this solution is a good choice for hybrid models where you can identify some information that is repeated across all objects, while other, variable information can be stored as JSON. This way, you can combine flexibility and performance, as the following query shows.
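For illustration, a single query can combine the strongly typed columns with values from the JSON columns ($.Title is a hypothetical property in AdditionalInfo):
SELECT FirstName, LastName,
JSON_VALUE(AdditionalInfo, '$.Title') AS Title, -- hypothetical JSON property
JSON_QUERY(PhoneNumbers) AS Phones              -- returns the whole JSON array
FROM Person
WHERE LastName = N'Adams'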
If you compare this structure with the schema of the Person table in the AdventureWorks database, you might notice that we have removed many related tables.
Besides the simpler schema, your data access operations will be simpler compared to the complex relational structure. Now you can read a single table instead of joining several tables. When you need to insert a new person with related information (email addresses, phone numbers), you can insert a single record into one table, instead of inserting one record into the AdventureWorks Person table and reading the identity value to find the foreign key that will be used to store the phones, email addresses, etc. In addition, in this model you can easily delete a single person row without cascade deletes through foreign key relationships.
NoSQL databases are optimized for simple, read, insert, and delete operations – SQL Server 2016 enables you to apply the same logic in relational database.
In the previous examples, we have seen how to add a simple constraint that validates that the text stored in the column is properly formatted. Although JSON does not have a strong schema, you can also add complex constraints by combining functions that read values from JSON with standard T-SQL functions:
ALTER TABLE Person
ADD CONSTRAINT [Age should be number]
CHECK ( ISNUMERIC(JSON_VALUE(value, '$.age'))>0 )
ALTER TABLE Person
ADD CONSTRAINT [Person should have skills]
CHECK ( JSON_QUERY(value, '$.skills') IS NOT NULL)
The first constraint will take the value of the $.age property and check whether it is a numeric value. The second constraint will try to find a JSON object in the $.skills property and verify that it exists. The following INSERT statements will fail due to the violation of these constraints:
INSERT INTO Person(value)
VALUES ('{"age": "not a number", "skills":[]}')
INSERT INTO Person(value)
VALUES ('{"age": 35}')
Note that CHECK constraints might slow down your insert/update processes, so you might avoid them if you need faster write performance.
http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/11/23/storing-json-in-sql-server.aspx
Tags: SQL Server 2016 JSON
In-Memory OLTP - what's new in SQL2016 CTP3
We have been hard at work to add features to In-Memory OLTP to make it easier to build apps and to adopt In-Memory OLTP in existing apps, as well as to increase overall performance and scaling. Below is the list of In-Memory OLTP enhancements that we have released in SQL2016 thus far, up to and including CTP3. There is more to come!
Let us know what you think!
Transact-SQL Improvements:
Disjunction (OR, NOT)
UNION and UNION ALL
SELECT DISTINCT
OUTER JOIN
Subqueries in SELECT statements (EXISTS, IN, scalar subqueries)
Nested execution (EXECUTE) of natively compiled modules
Natively compiled inline table-valued functions (TVFs)
EXECUTE AS CALLER support - the EXECUTE AS clause is no longer required for native modules
Built-in security functions and increased support for built-in math functions
FOREIGN KEY constraints between memory-optimized tables
CHECK constraints
UNIQUE constraints
Triggers (AFTER) for INSERT/UPDATE/DELETE operations. Triggers on memory-optimized tables are natively compiled, and thus use WITH NATIVE_COMPILATION.
The new ALTER TABLE ... ADD/DROP/ALTER INDEX syntax is introduced for index changes on memory-optimized tables
(var)char columns can use any code page supported by SQL Server
Character columns in index keys can use any SQL Server collation
Expressions in natively compiled modules as well as constraints on memory-optimized tables can use any SQL Server collation
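As a quick illustration of several of the Transact-SQL items above, here is a sketch (assuming a database that already has a MEMORY_OPTIMIZED_DATA filegroup; table and index names are illustrative):
CREATE TABLE dbo.Category (
    CategoryID int NOT NULL PRIMARY KEY NONCLUSTERED,
    Name nvarchar(50) NOT NULL,
    CONSTRAINT UQ_Category_Name UNIQUE NONCLUSTERED (Name)  -- UNIQUE constraint support
) WITH (MEMORY_OPTIMIZED = ON)

CREATE TABLE dbo.Item (
    ItemID int NOT NULL PRIMARY KEY NONCLUSTERED,
    CategoryID int NOT NULL
        FOREIGN KEY REFERENCES dbo.Category (CategoryID),   -- FK between memory-optimized tables
    Price money NOT NULL CHECK (Price >= 0)                  -- CHECK constraint support
) WITH (MEMORY_OPTIMIZED = ON)

-- the new ALTER TABLE ... ADD INDEX syntax
ALTER TABLE dbo.Item ADD INDEX ix_Item_Category NONCLUSTERED (CategoryID)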
Performance and Scaling improvements:
Enhancements in SQL Server Management Studio:
Cross-feature support:
For more information, see In-Memory OLTP (In-Memory Optimization).
Tags: SQL Server 2016 performance in-memory oltp
FOR JSON performance - simple experiment
SQL Server 2016 CTP2 introduced the FOR JSON clause, which formats the results of a SQL query as JSON text. A very common question is: what is the performance impact of the FOR JSON clause, and how does it compare to JSON generated in the application layer?
In this post, I will compare the performance of FOR JSON with the traditional approach of generating JSON in the application layer.
This is not a general performance comparison. I will focus on one simple and common use case – returning one row from a table together with a set of related rows from a child table. In this experiment I will use the SalesOrderHeader -> SalesOrderDetails structure from the AdventureWorks database, where we have a one-to-many relationship between SalesOrderHeader and SalesOrderDetails.
I have an application that reads SalesOrderHeader and SalesOrderDetails rows and returns them as JSON, as shown in the following example:
{"SalesOrderID":55859,"RevisionNumber":8,"OrderDate":"2013-09-07T00:00:00",
"DueDate":"2013-09-19T00:00:00","ShipDate":"2013-09-14T00:00:00","Status":5,
"Sales.SalesOrderDetail":[
{"SalesOrderID":55859,OrderQty":1,"ProductID":779,"SpecialOfferID":1,"UnitPrice":2319.9900},
{"SalesOrderID":55859,"SalesOrderDetailID":57519,"OrderQty":1,"ProductID":871"},
{"SalesOrderID":55859,"SalesOrderDetailID":57520,"UnitPrice":4.9900,"UnitPriceDiscount":0.0000},
{"SalesOrderID":55859,"SalesOrderDetailID":57521,"OrderQty":1,UnitPrice":34.9900}
]
}
In the following examples, I will explain how you can generate this JSON.
If you don't want to format JSON in the database, you need to read one record from the parent table and then read all related records from the child table:
declare @id int = (ROUND(rand() * 31464, 0) + 43660)
SELECT *
FROM Sales.SalesOrderHeader
WHERE SalesOrderID = @id;
SELECT *
FROM Sales.SalesOrderDetail
WHERE SalesOrderID = @id
Then you can use some formatter to generate the JSON text in the application layer (e.g. Json.NET). The problem with this method is the fact that you need two queries to get the parent row and the related child rows.
The second option is to join these two tables in the database layer and return a flat result set, e.g.:
declare @id int = (ROUND(rand() * 31464, 0) + 43660)
SELECT *
From Sales.SalesOrderHeader
LEFT JOIN Sales.SalesOrderDetail
ON Sales.SalesOrderHeader.SalesOrderID
= Sales.SalesOrderDetail.SalesOrderID
WHERE Sales.SalesOrderHeader.SalesOrderID = @id
Here you will get repeated header values for every detail row, so you would need to remove the duplicated information in the application layer when you generate the JSON.
The third option is to return all information from the SalesOrderHeader table and format the related child rows as JSON in a column expression:
declare @id int = (ROUND(rand() * 31464, 0) + 43660)
SELECT *,
(SELECT *
From Sales.SalesOrderDetail
WHERE SalesOrderID = @id
FOR JSON PATH) AS Details
From Sales.SalesOrderHeader
WHERE SalesOrderID = @id
Here, you are getting a single row as a result, so you don't need to remove duplicates, but you would need to format the header information as JSON key:value pairs and include the related child information that is already formatted as JSON.
The last option is to join the tables in SQL Server and format them using FOR JSON AUTO. This is similar to option 2, but we are delegating the formatting to the database layer:
declare @id int = (ROUND(rand() * 31464, 0) + 43660)
SELECT *
From Sales.SalesOrderHeader
LEFT JOIN Sales.SalesOrderDetail
ON Sales.SalesOrderHeader.SalesOrderID
= Sales.SalesOrderDetail.SalesOrderID
WHERE Sales.SalesOrderHeader.SalesOrderID = @id
FOR JSON AUTO
This query will prepare the results you need in the database layer and return them as text. The problem with this approach is that the response is bigger, so make sure that you have a good connection between the application layer and the database server. In this case you can just stream the results to the client without any additional formatting.
I'm using Adam Machanic's SQLQueryStress tool to generate and run the test workloads. I'm executing 50 iterations of 20 concurrent batches for all of the options. Results are shown in the following figure:
As you can see, there is no big difference in query performance: all query results vary between 7.8 and 8.3 seconds. Note that here I'm not counting the additional time that would be required to format JSON in the application layer. Therefore, the conclusions in this case are:
Note that this might not be applicable to every configuration and use case. Results might vary depending on your connection bandwidth between the application layer and the database server, the number of concurrent threads, etc. However, in my case I'm getting similar results in all cases.
Also, this experiment does not cover other cases where you return multiple parent rows, with different numbers of fields returned from the parent and child tables, or many child table collections. Feel free to repeat this experiment, try other use cases, and report the performance results you are getting.
Tags: SQL Server 2016 JSON
Indexing JSON arrays using full-text search indexes
In the previous post I described how you can index JSON fields using standard B-tree indexes. In this post we will see how to use full-text search indexes on JSON arrays.
Full-text indexes can be used if the JSON in your column is formatted as a simple array of scalar values – ideally an array of numbers, but an array of strings will also work in most cases. In this example I will use the Sales.SalesOrder_json table that can be found in the official AdventureWorks2016CTP3 database. We can assume that the Sales.SalesOrder_json table contains the following data:
SalesOrderID | SalesOrderNumber | ......... | SalesReasons |
1 | SO44484 | ........ | ["Manufacturer","Quality"] |
2 | SO44352 | ........ | ["Price"] |
3 | SO44367 | .......... | ["Manufacturer","Quality"] |
4 | SO44464 | ......... | ["Price","Quality"] |
Instead of the related tables SalesOrderHeaderSalesReasons and SalesReasons, I have placed the sales reasons in an array of JSON strings.
If we create a full-text index on the SalesReasons column, it will split tokens in this text using the separators " and ,. This can be used to improve queries that search for sales orders by some sales reason value in this array.
First we will create full text search catalog and index:
-- Create full text catalog for JSON data
CREATE FULLTEXT CATALOG jsonFullTextCatalog;
GO
-- Create full text index on SalesReason column.
CREATE FULLTEXT INDEX ON Sales.SalesOrder_json(SalesReasons)
KEY INDEX PK_SalesOrder__json_SalesOrderID
ON jsonFullTextCatalog;
GO
With the full-text index we can easily find all rows where the SalesReasons JSON array contains some value (e.g. Price):
SELECT SalesOrderNumber, OrderDate, SalesReasons
FROM Sales.SalesOrder_json
WHERE CONTAINS(SalesReasons, 'Price')
We can even use more complex expressions with the AND, OR, and NOT operators:
SELECT SalesOrderNumber, OrderDate, SalesReasons
FROM Sales.SalesOrder_json
WHERE CONTAINS(SalesReasons, 'Price OR Quality')
As you can see, although full-text search is not specialized for JSON text, you can use it to improve the performance of queries that filter results by some value in a JSON array.
Tags: SQL Server 2016 JSON
Generate a series of numbers in SQL Server 2016 using OPENJSON
In this post I will talk about one unusual use case of OPENJSON - generating series of numbers.
Problem: I want to dynamically generate a table of numbers (e.g. from 0 to N). Unfortunately, we don't have this kind of function in SQL Server 2016, but we can use OPENJSON as a workaround.
OPENJSON can parse an array of numbers such as [1,25,3,5,32334,54,24,3] and return a table with [key,value] pairs. Values are the elements of the array, and keys are their indexes (i.e. numbers from 0 to the length of the array minus 1). In this example I don't care about the values – I just need the indexes.
I can easily generate a valid JSON array of length N dynamically, using the replicate function:
SELECT '[1' + replicate(',1',@count-1)+']'
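For example, with @count = 5 the expression produces a five-element array:
declare @count int = 5
select '[1' + replicate(',1',@count-1)+']'
-- returns: [1,1,1,1,1]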
Since I don't care about the values in the array, I'm generating an array where all elements are equal to 1. Now we can see how to generate a series of numbers using this approach and an OPENJSON table-valued function.
In the first example, I will create a table-valued function that generates @count numbers:
drop function if exists dbo.fn_gen_numbers
go
create function
dbo.fn_gen_numbers(@count int)
returns table
return (select cast([key] as int) as number from OPENJSON( '[1' + replicate(',1',@count-1)+']'))
go
The following call will return numbers from 0 to 9:
select * from dbo.fn_gen_numbers(10)
This function might be useful if you want to generate all months in the year, or all days in a month:
select DATETIMEFROMPARTS ( 2015, number+1, 1, 0, 0, 0, 0 )
from dbo.fn_gen_numbers(12)
select DATETIMEFROMPARTS ( 2015, 1, number+1, 0, 0, 0, 0 )
from dbo.fn_gen_numbers(31)
In a slightly modified example, I will generate a sequence from @start to @end inclusive:
drop function if exists dbo.fn_gen_range
go
create function dbo.fn_gen_range(@start int, @end int)
returns table
return (select cast([key] as int) + @start as number from OPENJSON( '[1' + replicate(',1',@end-@start)+']'))
The following call will return numbers in the range [10,30]:
select * from dbo.fn_gen_range(10,30)
In the final example, I will generate a sequence with a step, using the following function:
drop function if exists dbo.fn_gen_sequence
go
create function dbo.fn_gen_sequence(@start int, @end int, @step int)
returns table
return (select cast([key] as int) * @step + @start as number from OPENJSON( '[1' + replicate(',1',(@end-@start)/@step)+']'))
The following call will return even numbers from 10 to 28:
select *
from dbo.fn_gen_sequence(10,29,2)
Although we don't have a built-in function that generates a sequence of numbers, OPENJSON might be the easiest workaround for this.
Note that in these examples I'm using the new DROP IF EXISTS syntax that is available in SQL Server 2016; see http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/11/03/drop-if-exists-new-thing-in-sql-server-2016.aspx
Licence
The table-valued functions in this example can be copied and modified with one requirement – any table-valued function derived from this code that generates numbers using OPENJSON MUST be called with a fully qualified schema name, e.g.:
SELECT * FROM dbo.fn_gen_range()
and NOT just with function name:
SELECT * FROM fn_gen_range()
T-SQL enables you to call table-valued functions with or without the schema name. However, if you don't use the schema name, the function might resolve either to a user-defined function or to a built-in function.
The SQL Server team might decide to implement a similar built-in table-valued function in the future, and the new function might collide with the names of your user-defined functions. Using the schema name will prevent this kind of collision, and it is general coding guidance for T-SQL, so I'm adding it as a licence requirement :).
Vote for this function if you want to see it in the next version of SQL Server: Add a built-in table of numbers.
Tags: SQL Server 2016 JSON
DROP IF EXISTS - new thing in SQL Server 2016
Do you like writing conditional DROP statements like the following?
IF OBJECT_ID('dbo.Product', 'U') IS NOT NULL
DROP TABLE dbo.Product;
IF EXISTS (SELECT * FROM sys.triggers WHERE name = 'trProductInsert')
DROP TRIGGER trProductInsert
I don't like these, and if you also don't like them, then you might try new DROP IF EXISTS (a.k.a. DIE :) ) statements in SQL Server 2016.
From SQL Server 2016 CTP3 you can use new DIE statements instead of big IF wrappers, e.g.:
DROP TABLE IF EXISTS dbo.Product
DROP TRIGGER IF EXISTS trProductInsert
If the object does not exist, DIE will not fail and execution will continue. Currently, the following objects can DIE:
AGGREGATE, ASSEMBLY, DATABASE, DEFAULT, FUNCTION, INDEX, PROCEDURE, ROLE, RULE, SCHEMA, SECURITY POLICY, SEQUENCE, SYNONYM, TABLE, TRIGGER, TYPE, USER, VIEW
DIE is also supported for columns and constraints in the ALTER TABLE statement, as shown below.
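For example (illustrative column and constraint names):
ALTER TABLE dbo.Product DROP COLUMN IF EXISTS LegacyCode
ALTER TABLE dbo.Product DROP CONSTRAINT IF EXISTS DF_Product_Price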
Documentation is already published on MSDN:
DROP TABLE (Transact-SQL),DROP PROCEDURE (Transact-SQL), DROP TRIGGER (Transact-SQL), ALTER TABLE (Transact-SQL), etc.
Tags: SQL Server 2016
OPENJSON - one of the best ways to select rows by a list of ids
In this post I will talk about different ways to select rows from a table by a list of ids, and I will compare the performance of some common approaches (IN, LIKE, and OPENJSON).
Let's talk about a simple and common problem – we have a list of identifiers and we want to select all rows from a table that have ids in that list. As an example, a user selected some records in the UI, and we need to fetch these records by their ids. How can we do this?
You have the following options:
1. Use IN, where you can pass the list of ids:
select BusinessEntityID, FirstName, LastName
from Person.Person
where BusinessEntityID IN (1,2,3)
This is the simplest method, but it has a few problems:
2. Use LIKE, e.g.:
select BusinessEntityID, FirstName, LastName
from Person.Person
where ','+ @list + ',' LIKE '%,'+cast(BusinessEntityID as nvarchar(20)) + ',%'
In this case @list must be formatted as a CSV array, e.g. 1,2,3. This is probably the most commonly used anti-pattern, and it has the following problems:
3. Use OPENJSON as a table-valued function that splits the array of ids into a table, and join it with the source table:
select BusinessEntityID, FirstName, LastName
from Person.Person
join openjson(@list) ON value = BusinessEntityID
Note that @list must be formatted as a JSON array, e.g. [1,2,3], so you will need to surround your CSV list with brackets, as shown below.
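A complete call might look like this (illustrative ids):
declare @list nvarchar(max) = N'[1,2,3]'
select BusinessEntityID, FirstName, LastName
from Person.Person
join openjson(@list) ON value = BusinessEntityID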
This is my favorite approach because:
I'm using the SQLQueryStress tool provided by Adam Machanic to compare the results of these three approaches. In all three examples I create a random list of three ids and select the matching rows using the three methods, running 100 threads in 50 iterations:
As you can see, LIKE is the slowest way to do this and you should avoid it. The performance of OPENJSON and IN is close: 10s compared to 11s (results might vary, and in some cases I got very close numbers).
Therefore, you can see that OPENJSON is a good approach for this scenario. Feel free to repeat this experiment in your databases and compare the results.
Tags: SQL Server 2016 JSON
Samples for SQL Server 2016 CTP3
CTP3 of SQL Server 2016 was just announced, and is available for download.
To accompany CTP3, we also created an updated version of AdventureWorks, called AdventureWorks2016CTP3, and we created a number of samples to showcase some of the new functionality in SQL2016. We recommend you download and install the sample databases and play with the sample scripts. Download link:
AdventureWorks Sample Databases and Scripts for SQL Server 2016 CTP3
Let us know what you think, and how we can improve the samples.
--
SQL Server Team
Tags: sample CTP3 SQL2016
Returning child rows formatted as JSON in SQL Server queries
In this post I will talk about one annoying thing – how do you return a result set containing one-to-many relationships between tables? In that case you will have multiple primary rows, because one row is generated for each child row. In this post we will see how you can resolve this problem using JSON.
...(read more)
Tags: SQL Server 2016 JSON Azure Sql Db
Importing JSON files into SQL Server using OPENROWSET (BULK)
Currently you can find many JSON documents stored in files. Sensors generate information that is stored in files, applications log information in JSON files, etc. One important thing that you might need to do is read JSON data stored in files, load it into SQL Server, and analyze it.
In this post we will see how you can import JSON files in SQL Server.
OPENROWSET (BULK) is a table-valued function that can read data from any file on the local drive or network, if SQL Server has read access to that location. It returns a table with a single column (i.e. BulkColumn) that contains the content of the file. In the general case you can use various options with the OPENROWSET (BULK) function, such as separators, but in the simplest case you can directly load the entire content of a file as a text value (a single character large object, a.k.a. SINGLE_CLOB in OPENROWSET “terminology”) and load the content of that cell into any table or variable. An example of the OPENROWSET (BULK) function that reads the content of a JSON file and returns it to the user as a query result is shown in the following example:
SELECT BulkColumn
FROM OPENROWSET (BULK 'C:\JSON\Books\book.json', SINGLE_CLOB) as j
OPENROWSET (BULK) will read the content of the file and return it via BulkColumn. You can also load the content of the file into a local variable or a table, as shown in the following example:
declare @json nvarchar(max)
SELECT @json = BulkColumn
FROM OPENROWSET (BULK 'C:\JSON\Books\book.json', SINGLE_CLOB) as j
SELECT BulkColumn
INTO #temp
FROM OPENROWSET (BULK 'C:\JSON\Books\book.json', SINGLE_CLOB) as j
In this post I will show you how to import the content of a JSON file directly into SQL Server tables using OPENROWSET (BULK). I will use the fact that JSON is regular text, so it can be imported like any other text format, and I can leverage existing functions that work with text – in this case, importing text from a file. In this example I will use a JSON file containing books, taken from this site. Instead of reading the entire JSON text, I want to parse it and return either the books in the file or their properties.
In the simplest example we can read JSON objects from the file:
SELECT value
FROM OPENROWSET (BULK 'C:\JSON\Books\books.json', SINGLE_CLOB) as j
CROSS APPLY OPENJSON(BulkColumn)
OPENROWSET will read a single text value from the file, return it as BulkColumn, and pass it to the applied OPENJSON function. OPENJSON will iterate through the array of JSON objects in BulkColumn and return one book formatted as JSON in each row:
Value |
{"id" : "978-0641723445","cat" : ["book","hardcover"],"name" : "The Lightning Thief", … |
{"id" : "978-1423103349","cat" : ["book","paperback"],"name" : "The Sea of Monsters", … |
{"id" : "978-1857995879","cat" : ["book","paperback"],"name" : "Sophie's World : The Greek … |
{"id" : "978-1933988177","cat" : ["book","paperback"],"name" : "Lucene in Action, Second … |
With the new OPENJSON function we can parse that JSON content and transform it into a table or result set. In this example we will load the content, parse the loaded JSON, and return five fields as columns:
SELECT book.*
FROM OPENROWSET (BULK 'C:\JSON\Books\books.json', SINGLE_CLOB) as j
CROSS APPLY OPENJSON(BulkColumn)
WITH( id nvarchar(100), name nvarchar(100), price float,
pages_i int, author nvarchar(100)) AS book
OPENROWSET (BULK) will read the content of the file, and we can pass that content to the OPENJSON function with a defined schema. OPENJSON will match properties in the JSON objects with the column names (e.g. the price property will be returned as a price column and converted to the float type). The results in this case would look like:
Id | Name | price | pages_i | Author |
978-0641723445 | The Lightning Thief | 12.5 | 384 | Rick Riordan |
978-1423103349 | The Sea of Monsters | 6.49 | 304 | Rick Riordan |
978-1857995879 | Sophie's World : The Greek Philosophers | 3.07 | 64 | Jostein Gaarder |
978-1933988177 | Lucene in Action, Second Edition | 30.5 | 475 | Michael McCandless |
Now we can either return this table to the user, or load it into another table.
You can use the same approach to read JSON files from any file that can be accessed by SQL Server. As an example, Azure File Storage supports the SMB protocol, so you can map a local virtual drive to an Azure File Storage share using the following command:
net use [drive letter] \\[storage name].file.core.windows.net\[share name] /u:[storage account name] [storage account access key]
Example that I have used is:
net use t: \\mystorage.file.core.windows.net\sharejson /u:myaccont hb5qy6eXLqIdBj0LvGMHdrTiygkjhHDvWjUZg3Gu7bubKLg==
The storage account access key (primary or secondary) can be found in the Keys section of Settings in the Azure portal.
SELECT book.* FROM
OPENROWSET(BULK N't:\books\books.json', SINGLE_CLOB) AS json
CROSS APPLY OPENJSON(BulkColumn)
WITH( id nvarchar(100), name nvarchar(100), price float,
pages_i int, author nvarchar(100)) AS book
We can use the same approach to load a set of JSON files from the file system into local variables (we will assume that the files are named book1.json, book2.json, etc.):
declare @i int = 1
declare @json AS nvarchar(MAX)
declare @file nvarchar(260)
declare @sql nvarchar(MAX)
while(@i<10)
begin
SET @file = 'C:\JSON\Books\book' + cast(@i as varchar(5)) + '.json';
-- OPENROWSET(BULK) requires a literal file path, so the path is spliced in via dynamic SQL
SET @sql = N'SELECT @json = BulkColumn FROM OPENROWSET (BULK ''' + @file + N''', SINGLE_CLOB) as j';
EXEC sp_executesql @sql, N'@json nvarchar(MAX) OUTPUT', @json = @json OUTPUT;
SELECT * FROM OPENJSON(@json) as json
set @i = @i + 1;
end
As you can see, with the OPENROWSET (BULK) function you can easily import files into SQL Server.
Tags: SQL Server 2016 JSON
Querying JSON documents in SQL Server 2016 and DocumentDB
SQL Server 2016 and DocumentDB enable you to query JSON documents. DocumentDB has a nice syntax for querying JSON documents – you can find some good examples on the DocumentDB site. SQL Server provides built-in functions for accessing JSON fields (JSON_VALUE), fragments (JSON_QUERY), and opening JSON documents (OPENJSON). In this post, I will show you some queries that can be executed in DocumentDB and the equivalent SQL Server queries.
...(read more)
Tags: SQL Server 2016 JSON Azure Sql Db
Parsing GeoJSON format in SQL Server 2016
The OPENJSON function in SQL Server enables you to open complex JSON structures. In this post we will see how you can open the GeoJSON format.
...(read more)
Tags: SQL Server 2016 JSON
Indexing JSON documents in SQL Server 2016
One important thing in every database is indexing. Indexes can speed up your filter and sort operations. Without indexes, SQL Server would need to perform a full table scan every time you query the data.
SQL Server 2016 does not have a customized JSON index. However, you can use standard indexes on JSON documents. In this post we will see how you can use these indexes to optimize your queries.
In many cases you will want to filter results from a table by some JSON property, or order results using some value in a JSON document.
As an example, we can imagine that we have an “Info” column in the AdventureWorks SalesOrderHeader table that contains various information about the sales order (e.g. information about the customer, sales person, shipping/billing addresses, etc.). We want to use this column to filter sales orders for some customer. The query that we might want to optimize using an index is shown in the following code:
SELECT SalesOrderNumber, OrderDate,
JSON_VALUE(Info, '$.Customer.Name') AS CustomerName
FROM Sales.SalesOrderHeader
WHERE JSON_VALUE(Info, '$.Customer.Name') = N'Aaron Campbell'
If you want to speed up filters or ORDER BY clauses applied to some property in a JSON document, you can use the same indexes that you use on any other columns. However, you cannot directly reference properties in the JSON documents. You need to create a “virtual column” that returns the values used for filtering, and create an index on that column (in our example, a column that exposes the customer name stored at the $.Customer.Name path in the JSON documents). An example of a computed column that will be used for indexing, and the index on that column, is shown in the following code:
ALTER TABLE Sales.SalesOrderHeader
ADD vCustomerName AS JSON_VALUE(Info, '$.Customer.Name')
CREATE INDEX idx_soh_json_CustomerName
ON Sales.SalesOrderHeader(vCustomerName)
Note that the column is not persisted. It does not occupy additional space in the table; it is computed only when the index needs to be built or rebuilt.
It is important that you create the computed column with the same expression as the one used in the queries, i.e. JSON_VALUE(Info, '$.Customer.Name') in the query above.
You don't need to rewrite your queries – if you use expressions with the JSON_VALUE function, SQL Server will see that there is an equivalent computed column with the same expression and apply the index if possible. The execution plan for the query above might look like the one in the following figure:
Instead of a full table scan, SQL Server uses an index seek into the non-clustered index and finds the rows that satisfy the conditions. Then it uses a Key Lookup in the SalesOrderHeader table to fetch the other columns used in the query (i.e. SalesOrderNumber and OrderDate in our case).
Note that you can avoid this additional lookup in the table if you add the required columns to the index. You can add these columns to the JSON index as standard included columns:
CREATE INDEX idx_soh_json_CustomerName
ON Sales.SalesOrderHeader(vCustomerName)
INCLUDE (SalesOrderNumber, OrderDate)
In this case SQL Server will not read additional data from the SalesOrderHeader table, because everything is included in the non-clustered JSON index. This might be a good way to combine JSON and column data in queries and create optimal indexes for your workload.
An important feature of JSON indexes is that they are collation aware. The result of the JSON_VALUE function is text that inherits its collation from the input expression. Therefore, values in the index will be ordered using the collation rules defined on the source column.
To demonstrate this, we can create a simple collection table with primary key and json content:
CREATE TABLE JsonCollection
( id int identity constraint PK_JSON_ID primary key,
json nvarchar(max) COLLATE Serbian_Cyrillic_100_CI_AI
CONSTRAINT [Content should be formatted as JSON]
CHECK (ISJSON(json)>0)
)
In this case we have set Serbian Cyrillic collation on the json column. Now we can populate the table and create an index on the name property:
INSERT INTO JsonCollection
VALUES (N'{"name":"Иво","surname":"Андрић"}'),
(N'{"name":"Андрија","surname":"Герић"}'),
(N'{"name":"Владе","surname":"Дивац"}'),
(N'{"name":"Новак","surname":"Ђоковић"}'),
(N'{"name":"Предраг","surname":"Стојаковић"}'),
(N'{"name":"Михајло","surname":"Пупин"}'),
(N'{"name":"Борислав","surname":"Станковић"}'),
(N'{"name":"Владимир","surname":"Грбић"}'),
(N'{"name":"Жарко","surname":"Паспаљ"}'),
(N'{"name":"Дејан","surname":"Бодирога"}'),
(N'{"name":"Ђорђе","surname":"Вајферт"}'),
(N'{"name":"Горан","surname":"Бреговић"}'),
(N'{"name":"Милутин","surname":"Миланковић"}'),
(N'{"name":"Никола","surname":"Тесла"}')
GO
ALTER TABLE JsonCollection
ADD vName AS JSON_VALUE(json, '$.name')
CREATE INDEX idx_name
ON JsonCollection(vName)
We have created a standard index on the computed column vName, which represents the value of the $.name property. In the Serbian Cyrillic code page, the order of letters is ‘А’, ’Б’, ’В’, ’Г’, ’Д’, ’Ђ’, ’Е’, etc. Since the result of the JSON_VALUE function inherits its collation from the source column, the order in the index will be compliant with the Serbian Cyrillic rules. Now we can query this collection and sort the results by name:
SELECT JSON_VALUE(json, '$.name'), * FROM JsonCollection
ORDER BY JSON_VALUE(json, '$.name')
If you look at the actual execution plan, you will see that it uses sorted values from the non-clustered index:
We don't have a Sort operator although we have an ORDER BY clause. The JSON index is already ordered according to the Serbian Cyrillic rules, so SQL Server can use the non-clustered index, where the results are already sorted.
However, if we change the collation of the ORDER BY expression (e.g. put COLLATE French_100_CI_AS_SC after the JSON_VALUE function), we will get a different query execution plan:
Since the order of values in the index is not compliant with the French collation rules, SQL Server cannot use it to order the results. Therefore, it adds a separate Sort operator that sorts the results using the French collation rules.
Tags: SQL Server 2016 JSON