This is the blog of the engineering team managing and developing the Database Engine for SQL Server and Azure SQL Database


Source: http://blogs.msdn.com/sqlserverstorageengine/default.aspx (RSS feed: http://blogs.msdn.com/b/sqlserverstorageengine/rss.aspx)

OPENJSON - The easiest way to import JSON text into table

Tuesday, September 22, 2015, 12:12

Introduction

SQL Server 2016 adds a new table-valued function that can transform JSON text into rows. In this post I will show how you can directly insert/update table rows using JSON objects.

Opening JSON objects

The OPENJSON table-valued function transforms a JSON object into one or more rows. The syntax looks like this:

OPENJSON()
WITH ( <mapping definitions> )

In the WITH clause you can define the schema of the returned rows, with column names and their types. The OPENJSON function will parse the JSON object, match properties in the JSON object with column names, and convert their values to the specified types. An example query that opens the content of a JSON variable is shown in the following listing:


declare @json nvarchar(max) = '{
"id" : 2,
"firstName": "John",
"lastName": "Smith",
"isAlive": true,
"age": 25,
"dateOfBirth": "2015-03-25T12:00:00",
"spouse": null
}';

SELECT *
FROM OPENJSON(@json)
WITH (id int,
firstName nvarchar(50), lastName nvarchar(50),
isAlive bit, age int,
dateOfBirth datetime2, spouse nvarchar(50))

In this example, we have opened the JSON object and mapped its fields to columns by name. The result of execution would look like this:

id firstName lastName isAlive age dateOfBirth spouse
2 John Smith 1 25 2015-03-25T12:00:00 NULL

The OPENJSON function also works with JSON arrays. In that case, it will return a multi-row table (each table row will match one JSON object in the array) instead of a single-row table. This function can also open nested/hierarchical JSON objects (I will describe this in a separate post). In this post you will see how easily you can insert and update tables using JSON objects.
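As a quick illustration of the array case, here is a minimal sketch (same column names as above, values made up); it returns one row per array element:

declare @jsonArray nvarchar(max) = N'[
  { "id": 2, "firstName": "John", "lastName": "Smith" },
  { "id": 5, "firstName": "Jane", "lastName": "Doe" }
]';

SELECT *
FROM OPENJSON(@jsonArray)
WITH (id int, firstName nvarchar(50), lastName nvarchar(50))
-- returns two rows, one per object in the array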

Inserting JSON content in table

Now that we have opened the JSON, we can do anything with it. As an example, we can use the OPENJSON function to load a JSON object directly into a table:

declare @json nvarchar(max) = '{
"id" : 2,
"firstName": "John",
"lastName": "Smith",
"isAlive": true,
"age": 25,
"dateOfBirth": "2015-03-25T12:00:00",
"spouse": null
}';

INSERT INTO Person
SELECT *
FROM OPENJSON(@json)
WITH (id int,
firstName nvarchar(50), lastName nvarchar(50),
isAlive bit, age int,
dateOfBirth datetime2, spouse nvarchar(50))

The JSON variable will be converted to a row that can be inserted into a table like any other row. You can use this approach to directly load JSON objects received via a REST service without needing to transform the JSON into an object model, set values as parameters in a SQL command, etc. This way, you can send a single JSON text file and load it directly into a table. If your JSON is valid, there is no risk of a SQL injection attack. OPENJSON will not execute any command - it just returns a table row if the JSON text is properly formatted.

You can also insert an array of JSON objects, because the same code will work. OPENJSON will just return a set of rows instead of a single row to be inserted.
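For instance, a minimal sketch that loads a two-element JSON array into the same Person table (values made up):

declare @json nvarchar(max) = N'[
  { "id": 3, "firstName": "Jane", "lastName": "Doe", "isAlive": true, "age": 30, "dateOfBirth": "1985-01-01T00:00:00", "spouse": null },
  { "id": 4, "firstName": "Bob", "lastName": "Brown", "isAlive": true, "age": 40, "dateOfBirth": "1975-05-05T00:00:00", "spouse": "Ann" }
]';

INSERT INTO Person
SELECT *
FROM OPENJSON(@json)
WITH (id int,
      firstName nvarchar(50), lastName nvarchar(50),
      isAlive bit, age int,
      dateOfBirth datetime2, spouse nvarchar(50))
-- two rows are inserted, one per array element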

Updating tables using JSON objects

We can easily update any table based on JSON text. You just need to pass the entire JSON object, open it using the OPENJSON function, and update the table row as shown in the following example:

declare @json nvarchar(max) = N'{
"id" : 2,
"firstName": "George",
"lastName": "Smith",
"isAlive": true,
"age": 31,
"dateOfBirth": "2015-03-25T12:00:00",
"spouse": "Marry"
}';


UPDATE Person
SET firstName = json.firstName,
lastName = json.lastName,
isAlive = json.isAlive,
age = json.age,
dateOfBirth = json.dateOfBirth,
spouse = json.spouse
FROM OPENJSON(@json)
WITH (id int,
firstName nvarchar(50), lastName nvarchar(50),
isAlive bit, age int,
dateOfBirth datetime2, spouse nvarchar(50)) AS json
WHERE Person.id = json.id

In this example we have opened the JSON variable and updated columns in the Person table. We used the id field from the opened JSON to match the row that should be updated, while the other fields are used to update the Person row.

Delete data

You probably won't send a whole JSON object when you want to delete a row (more likely you will send just the id of the table row). However, even if you decide to use this approach, you can delete a row in the table as in the following example:

-- @json holds the JSON text shown in the earlier examples
DELETE Person
WHERE id = CAST(JSON_VALUE(@json, '$.id') AS int)

The JSON_VALUE function extracts the id value on the path $.id, and the matching row in the table is deleted. You can use OPENJSON instead of JSON_VALUE, but this is the simpler approach.
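If you did want to use OPENJSON for the delete instead, a minimal sketch (assuming the same Person table and @json variable) could be:

DELETE p
FROM Person AS p
INNER JOIN OPENJSON(@json) WITH (id int) AS j
    ON p.id = j.id
-- JSON_VALUE reads a single scalar; OPENJSON would also handle an array of objects to delete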

Conclusion

The OPENJSON function helps you easily parse, open, and transform a JSON object into table rows. Using OPENJSON you can easily insert or update table rows using JSON text.

You don't need to parse JSON in the application layer, you don't need to pass properties as SQL parameters, and there is no risk of SQL injection attacks if you have valid JSON.

This is probably the easiest and most secure method to import your JSON documents into SQL Server.

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/09/22/openjson-the-easiest-way-to-import-json-text-into-table.aspx



Loading non-relational data formatted as JSON text

Wednesday, September 16, 2015, 12:37

JSON can be used to improve performance and reduce complexity in the data load process if you serialize some entities as JSON collections. In this post we will see how you can use JSON columns in the data load process.

...(read more)

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/09/16/loading-non-relational-data-formatted-as-json-text.aspx



Enforcing uniqueness constraint in table with Clustered Columnstore Index

Tuesday, September 15, 2015, 04:19

SQL Server introduced ‘updateable’ clustered columnstore index (CCI) starting with SQL Server 2014 to speed up analytics by orders of magnitude up to 100x while significantly reducing storage, typically 10x. Your mileage will vary. Microsoft SQL Server team strongly recommends using CCI for your Data Warehouse for larger tables (> 1 million rows). A typical recommended deployment for DW is to use CCI for the Fact table and traditional rowstore for Dimension tables. However, you should consider using CCI for dimension tables especially now that SQL Server 2016 supports creating one or more traditional btree indexes for efficient equality and short-range searches.

One interesting thing to know about the Clustered Columnstore Index (CCI) is that it has no ‘key’ columns and the rows are NOT ordered. From this perspective, the word ‘clustered’ may be a bit confusing, but the intent is to designate CCI as the ‘master’ copy of the data. If it helps, you can think of CCI as a ‘heap’ that stores data in ‘columnar storage format’.

One of the challenges with CCI in SQL Server 2014 is that there is no direct way to enforce a uniqueness constraint. You can enforce uniqueness in a roundabout way using a materialized view, as shown in the example below:

create table t_account (
    accountkey int not null,
    accountdescription nvarchar (50),
    accounttype nvarchar(50),
    AccountCodeAlternatekey int)

-- create CCI on it
CREATE CLUSTERED COLUMNSTORE index ACCOUNT_CI on t_account

drop view dbo.myview_t_account
go

create view dbo.myview_t_account with schemabinding
as select accountkey, accounttype, accountcodealternatekey
from dbo.t_account
go

-- create a materialized view
create unique clustered index clix_myview_t_account on myview_t_account (accountkey)

insert into t_account values (1, 'hello', 'hello', 1)

-- now insert a row with duplicate key which will fail due to uniqueness violation
insert into t_account values (1, 'hello', 'hello', 2)

-- Msg 2601, Level 14, State 1, Line 36
-- Cannot insert duplicate key row in object 'dbo.myview_t_account'
-- with unique index 'clix_my_t_account'. The duplicate key value is (1).
-- The statement has been terminated.

Starting with SQL Server 2016, you can enforce uniqueness on CCI directly using a traditional btree index, as shown in the example below:

-- create the table. Unique constraint defaults to NCI
create table t_account (
    accountkey int not null,
    accountdescription nvarchar (50),
    accounttype nvarchar(50),
    unitsold int,
    CONSTRAINT uniq_account UNIQUE (AccountKey)
)

-- when you create CCI, it inherits existing NCIs. In our case, the NCI used for uniqueness
CREATE CLUSTERED COLUMNSTORE INDEX t_account_cci ON t_account

-- Test the unique key:
insert into t_account values (1,1,1,1)
insert into t_account values (1,2,2,2)

Msg 2627, Level 14, State 1, Line 22
Violation of UNIQUE KEY constraint 'uniq_account'. Cannot insert duplicate key in object 'dbo.t_account'. The duplicate key value is (1).
The statement has been terminated.

You can see this is much simpler and more intuitive. In my next blog, I will describe how to enforce PK/FK constraints on CCI.

Thanks

Sunil

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/09/15/enforcing-uniqueness-constraint-in-table-with-clustered-columnstore-index.aspx



Passing arrays to T-SQL procedures as JSON

Wednesday, September 9, 2015, 00:36

SQL Server does not support complex types such as arrays or lists. If you need to send a parameterized query or execute a stored procedure, you have to use primitive scalar types, so it is currently hard to send structured information to SQL Server modules. The new OPENJSON function can be used to parse array elements. If you need to send an array of elements to SQL Server, this is better than dynamic SQL and simpler than table-valued parameters. In this post you will see how you can use JSON arrays to send complex data to SQL Server.
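A minimal sketch of the idea (the procedure name and table here are assumptions for illustration; the full post is linked below):

CREATE PROCEDURE dbo.GetProductsByIds
    @ids nvarchar(max)   -- a JSON array of ids, e.g. N'[709, 798, 937]'
AS
BEGIN
    SELECT p.ProductID, p.Name
    FROM Production.Product AS p
    WHERE p.ProductID IN (SELECT CAST(value AS int) FROM OPENJSON(@ids));
END
GO

EXEC dbo.GetProductsByIds @ids = N'[709, 798, 937]';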

...(read more)

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/09/08/passing-arrays-to-t-sql-procedures-as-json.aspx



Combining relational and NoSQL concepts in SQL Server

Tuesday, September 1, 2015, 16:53

Intro

In this article we will see how you can combine NoSQL and relational concepts in a SQL Server database.

We will see how you can transform tables to JSON objects and vice versa using the FOR JSON and OPENJSON functions.

Relational vs. NoSQL approach

Traditional relational schema design follows a strict normalization approach – every logical entity is stored as a separate table and there are foreign key relationships between related entities. Imagine products that may have many reviews – you would create one table for products, another for reviews, propagate the foreign key from product into the reviews table, and add a foreign key relationship between them, as shown in the following figure:
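The figure from the original post is not reproduced in this feed; a minimal sketch of the two-table design it depicts (column names are simplified assumptions) would be:

CREATE TABLE Product (
    ProductID int PRIMARY KEY,
    Name nvarchar(50),
    Color nvarchar(15),
    ListPrice money
);

CREATE TABLE ProductReview (
    ProductReviewID int PRIMARY KEY,
    ProductID int FOREIGN KEY REFERENCES Product(ProductID),  -- foreign key back to the product
    ReviewerName nvarchar(50),
    ReviewDate datetime2,
    Rating int
);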

Although this is a proper database design, if you have many related entities you might end up with a lot of tables and foreign key relationships; you will need to join tables every time you need to retrieve product information, maintain indexes on foreign key columns to speed up joins, etc. A query that reads products and related reviews is shown in the following example:

SELECT Production.Product.ProductID AS ID, Production.Product.Name, 
Color, ListPrice, ReviewerName, ReviewDate, Rating
FROM Production.Product
JOIN Production.ProductReview
ON Production.ProductReview.ProductID = Production.Product.ProductID

Every time you need to retrieve related reviews you need to join the tables, and you would probably need to create an additional index on the foreign key column in the Review table. Results are shown in the following table:

ID   Name                    Color         ListPrice  ReviewerName  ReviewDate               Rating
709  Mountain Bike Socks, M  White         9.50       John Smith    2007-10-20 00:00:00.000  5
937  HL Mountain Pedal       Silver/Black  80.99      David         2007-12-15 00:00:00.000  4
937  HL Mountain Pedal       Silver/Black  80.99      Jill          2007-12-17 00:00:00.000  2
798  Road-550-W Yellow, 40   Yellow        1120.49    Laura Norman  2007-12-17 00:00:00.000  5

On the client side you probably need to process the results and merge the two 937 product rows into a single object with two related review items (e.g. as an array with two elements). Instead of a single query that joins two tables, you can use two separate queries: the first query would read the product information and the second query would read the related product reviews, as sketched below. Some ORM tools use this approach when they need to read related information. This is not so bad unless you have a lot of related entities (e.g. product images, product attachments, product categories, etc.) and you need to run several independent queries for a single product.
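The two-query alternative might look like this (a sketch; @productid is an assumed parameter):

-- query 1: the product itself
SELECT ProductID, Name, Color, ListPrice
FROM Production.Product
WHERE ProductID = @productid;

-- query 2: its related reviews
SELECT ReviewerName, ReviewDate, Rating
FROM Production.ProductReview
WHERE ProductID = @productid;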

One of the reasons why NoSQL systems became popular is the fact that you can use composite objects, where you store attributes of the primary entity (the product in our example) together with related records (product reviews) within the primary entity, as an array or collection of sub-objects. As an example, in MongoDb or DocumentDb you would create one JSON document for the Product and add related reviews as an array of JSON objects, as in the following example:

Products collection
{
"ProductID":709,
"Name":"Mountain Bike Socks, M",
"Color":"White",
"Reviews":[
{
"Reviewer":{
"Name":"John Smith",
"Email":"john@fourthcoffee.com"
},
"ReviewDate":"2007-10-20T00:00:00",
"Rating":5,
"ModifiedDate":"2007-10-20T00:00:00"
}
]
}

{
"ProductID":798,
"Name":"Road-550-W Yellow, 40",
"Color":"Yellow",
"Reviews":[
{
"Reviewer":{
"Name":"Laura Norman",
"Email":"laura@treyresearch.net"
},
"ReviewDate":"2007-12-17T00:00:00",
"Rating":5,
"ModifiedDate":"2007-12-17T00:00:00"
}
]
}

{
"ProductID":937,
"Name":"HL Mountain Pedal",
"Color":"Silver\/Black",
"Reviews":[
{
"Reviewer":{
"Name":"David",
"Email":"david@graphicdesigninstitute.com"
},
"ReviewDate":"2007-12-15T00:00:00",
"Rating":4,
"ModifiedDate":"2007-12-15T00:00:00"
},
{
"Reviewer":{
"Name":"Jill",
"Email":"jill@margiestravel.com"
},
"ReviewDate":"2007-12-17T00:00:00",
"Rating":2,
"ModifiedDate":"2007-12-17T00:00:00"
}
]
}

You have a simpler data model, no JOINs, no additional requests/queries/indexes, and all data is available in the same record. However, this model is also far from perfect. Although it is a good choice for smaller data models, in more complex models you might end up with heavy objects, or you would need to organize objects into separate collections. In some cases you might need to join objects stored in different collections, and you would find that this is not possible (e.g. in DocumentDb) or that you need to write complex map/reduce jobs for a simple join (e.g. in MongoDb).

How to transform a relational schema into NoSQL in SQL Server?

SQL Server 2016/Azure SQL Database introduce a hybrid approach where you can choose between relational and NoSQL concepts. As an example, if you have products and their reviews, you don’t need to create separate tables if you don’t want them. You can create additional columns in the primary table that will contain the collection of related entities formatted as JSON arrays:

ALTER TABLE Production.Product
ADD Reviews NVARCHAR(MAX)
CONSTRAINT [Reviews are formatted as JSON] CHECK(ISJSON(Reviews)>0)

In this example, we are adding a simple text column with a constraint that specifies that reviews should be formatted as JSON (similar to NoSQL databases). There is no new syntax for a JSON constraint - you can use a standard check constraint with the ISJSON function, which checks whether the Reviews text is formatted as valid JSON.
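As a quick sanity check (a sketch using the constraint defined above), an attempt to store text that is not valid JSON should fail the check constraint:

-- this should fail with a CHECK constraint violation, because ISJSON returns 0 for this text
UPDATE Production.Product
SET Reviews = N'this is not JSON'
WHERE ProductID = 709;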

If we want to move related product reviews from a separate table into this column we can use a simple UPDATE query:

UPDATE Production.Product
SET Reviews = (
SELECT ReviewerName AS [Reviewer.Name],
EmailAddress AS [Reviewer.Email],
ReviewDate, Rating, ModifiedDate
FROM Production.ProductReview
WHERE Production.ProductReview.ProductID = Production.Product.ProductID
FOR JSON PATH)

The inner query fetches all related reviews, formats them as JSON documents using the FOR JSON clause, and stores them as JSON text in the Reviews column. We can shape properties in the JSON document using dot syntax (e.g. the alias Reviewer.Name is emitted as a Name property inside a Reviewer object).
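To illustrate the dot syntax on its own, here is a minimal sketch (literal values used only for illustration):

SELECT 'John Smith' AS [Reviewer.Name], 5 AS Rating
FOR JSON PATH
-- produces: [{"Reviewer":{"Name":"John Smith"},"Rating":5}]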

Now, we can read products and related reviews with a single query:

SELECT ProductID, Name, Color, ListPrice, Reviews
FROM Production.Product
WHERE Reviews IS NOT NULL

Results are shown in the following table:

ProductID Name Color ListPrice Reviews
709 Mountain Bike Socks, M White 9.5 [{"Reviewer":{"Name":"John Smith","E-mail":"john@fourthcoffee.com"},"ReviewDate":"2007-10-20T00:00:00","Rating":5,"ModifiedDate":"2007-10-20T00:00:00"}]
798 Road-550-W Yellow, 40 Yellow 1120.49 [{"Reviewer":{"Name":"Laura Norman","E-mail":"laura@treyresearch.net"},"ReviewDate":"2007-12-17T00:00:00","Rating":5,"ModifiedDate":"2007-12-17T00:00:00"}]
937 HL Mountain Pedal Silver/Black 80.99

[
{"Reviewer":{"Name":"David","E-mail":"david@graphicdesigninstitute.com"},"ReviewDate":"2007-12-15T00:00:00","Rating":4,"ModifiedDate":"2007-12-15T00:00:00"},
{"Reviewer":{"Name":"Jill","E-mail":"jill@margiestravel.com"},"ReviewDate":"2007-12-17T00:00:00","Rating":2,"ModifiedDate":"2007-12-17T00:00:00"}
]

As you can see, reviews are returned as a collection of JSON objects. There is exactly one row per product, so you do not need any transformation on the client side. This is a perfect choice if your client already expects JSON format for related records (e.g. if you are using Angular, Knockout, or other template engines that inject a JSON model into an HTML view).

As an alternative, if you want to “join” products with related reviews as in the query above, you can use the following query:

SELECT ProductID, Name, Color, ListPrice, ReviewerName, ReviewDate, Rating
FROM Production.Product
CROSS APPLY OPENJSON(Reviews)
WITH ( ReviewerName nvarchar(30) '$.Reviewer.Name', ReviewDate datetime2, Rating int)
WHERE Reviews IS NOT NULL

The OPENJSON table-valued function takes the related reviews formatted as JSON and returns them as a table. In the WITH part you can specify the schema of the returned table. Column names match the names of properties in the JSON object, and if you have a nested property (e.g. Reviewer.Name) you can specify the JSON path where the value can be found. CROSS APPLY joins the parent row with the table returned by the OPENJSON function. Results are shown in the following table:

ID   Name                    Color         ListPrice  ReviewerName  ReviewDate               Rating
709  Mountain Bike Socks, M  White         9.50       John Smith    2007-10-20 00:00:00.000  5
937  HL Mountain Pedal       Silver/Black  80.99      David         2007-12-15 00:00:00.000  4
937  HL Mountain Pedal       Silver/Black  80.99      Jill          2007-12-17 00:00:00.000  2
798  Road-550-W Yellow, 40   Yellow        1120.49    Laura Norman  2007-12-17 00:00:00.000  5

In both cases you don’t need to scan two tables – all necessary results are taken with a single query.

OK, but how to update reviews?

This is an important thing you need to consider. Updating a JSON collection of objects is not as efficient as updating a collection of rows in a separate table. If you need to change objects in JSON, you need to transform it back to relational format, update the rows, and format them again as JSON. The following procedure shows how you can append a new review to the collection of product reviews:

CREATE PROCEDURE
AddProductReview(@productid int, @reviewerName nvarchar(30), @reviewerEmail nvarchar(30), @reviewDate datetime2, @rating int)
AS
BEGIN

WITH reviews AS
(
SELECT ProductReviews.*
FROM Production.Product
CROSS APPLY OPENJSON(Reviews)
WITH ([Reviewer.Name] nvarchar(30), [Reviewer.Email] nvarchar(30), ReviewDate datetime2, Rating int, ModifiedDate datetime2)
AS ProductReviews
WHERE ProductID = @productid
UNION ALL
SELECT @reviewerName AS [Reviewer.Name], @reviewerEmail AS [Reviewer.Email], @reviewDate AS ReviewDate, @rating AS Rating, getdate() AS ModifiedDate
)
UPDATE Production.Product
SET Reviews = (
SELECT * FROM reviews FOR JSON PATH
)
WHERE ProductID = @productid

END

In the first part of the WITH clause, we open the JSON array, return a table of product reviews, and append a new product review row using the parameters supplied to the stored procedure. Then we update the original product row and put the new reviews, formatted as JSON, into the Reviews column.
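For completeness, calling the procedure might look like this (parameter values are made up for illustration):

EXEC AddProductReview
    @productid = 709,
    @reviewerName = N'Jane Doe',
    @reviewerEmail = N'jane@example.com',
    @reviewDate = '2015-09-01T00:00:00',
    @rating = 4;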

This is not as efficient as a separate table, but it is another trade-off you need to consider. If the related information is read-only (e.g. items of a sales order that is closed, the name/surname of some person, names of towns, countries, or continents that are unlikely to change), you will probably not update this column. Also, if related rows are not frequently changed, opening the JSON and formatting it again might not have a big performance impact. However, if you have product reviews that are frequently changed, then you should use a separate table.

Summary

Although SQL Server is a relational database, you don't need to use strict relational theory in your database design. With the new JSON support coming in SQL Server 2016 and Azure SQL Database, you can choose when to follow strict relational schema design and when to format objects as in NoSQL systems. SQL Server gives you the flexibility to choose the best format for your applications and enables you to easily transform table data into JSON documents and JSON documents into table structure.

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/09/01/combining-relational-and-nosql-concepts-in-sql-server-with-json.aspx



FOR JSON clause in SQL Server 2016 CTP2

Monday, June 8, 2015, 10:06

SQL Server 2016 will include JSON support. The first piece of functionality (the FOR JSON clause) is available in CTP2. Here you can see an overview of the FOR JSON clause available in SQL Server 2016 CTP2.

...(read more)

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/06/08/for-json-clause-in-sql-server-2016-ctp2.aspx



Clustered Column Store Index: Bulk Loading the Data

Monday, July 28, 2014, 05:49

Clustered Column Store: Bulk Load

As described in the blog http://blogs.msdn.com/b/sqlserverstorageengine/archive/2014/07/27/clustered-column-store-index-concurrency-and-isolation-level.aspx, the clustered column store index has been optimized for the typical DW scenario, supporting nightly or trickle data load with fast query performance. Multiple inserts can load data concurrently in parallel while DW queries run under the read uncommitted transaction isolation level.

This blog describes the locking behavior when data is inserted through the Bulk Load command. Here is the table we will use in the example:

Create table t_bulkload (
    accountkey int not null,
    accountdescription nvarchar (50),
    accounttype nvarchar(50),
    AccountCodeAlternatekey int)

Bulk loading into CCI

A more common scenario is to bulk import data into CCI. The bulk import loads the data into the delta store if the batch size is < 102,400 rows; otherwise the rows are loaded directly into a compressed row group. Let us walk through an example to illustrate bulk load.

-- Let us prepare the data
-- insert 110K rows into a regular table
begin tran
declare @i int = 0
while (@i < 110000)
begin
    insert into t_bulkload values (@i, 'description', 'dummy-accounttype', @i*2)
    set @i = @i + 1
end
commit

-- bcp out the data... run the following command in command window
bcp adventureworksDW2012..t_bulkload out c:\temp\t_bulkoad.dat -c -T

As the next step, let us truncate the table t_bulkload and create a clustered columnstore index on it. At this time, there are no rowgroups as the table has no rows

--truncate the table
Truncate table t_bulkload

-- convert row clustered index into clustered columnstore index
CREATE CLUSTERED COLUMNSTORE index t_bulkload_cci on t_bulkload

Now, we will bulk import the data with a batch size > 102,400 as follows. Notice that I am running this command under a transaction. This will help us see what locks are taken.

-- now bulkload the data
begin tran
bulk insert t_bulkload
FROM 'c:\temp\t_bulkoad.dat'
WITH
(
    BATCHSIZE = 103000
)

-- show rowgroups
select * from sys.column_store_row_groups where object_id = object_id('t_bulkload')

The output below shows that two row groups were created. The first row group, with row_group_id=0, is ‘compressed’ with 103000 rows. Because the batch size was >= 102400, SQL Server directly compressed this row group. This is useful because bulk load is a common scenario for loading data into a Data Warehouse. By directly compressing the rows, SQL Server can minimize logging (I will blog about transaction logging into CCI later) as the rows do not go through a delta row group. Also, there is no need for the tuple mover to move the data. The second batch had only 7000 rows because we ran out of rows in the data file (remember, the data file had only 110000 rows), and this set of rows was inserted into delta row group ‘1’. Note that this row group is still marked ‘OPEN’, meaning that it is not closed. It will eventually get closed and compressed by the background ‘tuple mover’ when the number of rows hits the 1 million row mark.
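The screenshot of the output is not included in this feed; based on the description above, the result of the sys.column_store_row_groups query would look roughly like this:

row_group_id   state_description   total_rows
0              COMPRESSED          103000
1              OPEN                7000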

Let us now look at the locks. Here is the output. Note that we have an X lock on both the delta row group and the compressed row group. Taking locks at the row group level minimizes locking overhead.

You may wonder what will happen if we insert a row from another session. Let us just do that

begin tran

insert into t_bulkload values (-1, 'single row', 'single row', -1)

Now let us look at the row groups. You will note that the new row was actually inserted into a new delta row group (highlighted below) because the Bulk Insert transaction holds an X lock on row group 1. SQL Server allows the INSERT operation to succeed instead of blocking it because INSERT is a common operation for DW, and therefore maximum concurrency is needed. The downside is that you now have two open delta row groups. Future inserts can go into either of these row groups, so in the worst case you may have 2 million rows in the delta row groups before they get compressed. This will impact DW query performance because the part of the query that accesses rows from a delta row group is not as efficient.

I hope this blog clarifies how data is bulk imported into a clustered columnstore index. In most cases, there are no issues if you are loading a large amount of data. In the worst case, I expect the number of delta row groups will be the same as the degree of concurrency for bulk import operations.

Thanks

Sunil Agarwal

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2014/07/28/clustered-column-store-index-bulk-loading-the-data.aspx



Clustered Column Store Index: Concurrency with INSERT Operations

Sunday, July 27, 2014, 20:44

Clustered Column Store: Insert Operations

As described in the blog http://blogs.msdn.com/b/sqlserverstorageengine/archive/2014/07/27/clustered-column-store-index-concurrency-and-isolation-level.aspx, the clustered column store index has been optimized for the typical DW scenario, supporting nightly or trickle data load with fast query performance. Multiple inserts can load data concurrently in parallel while DW queries run under the read uncommitted transaction isolation level.

This blog describes locking behavior when data is inserted concurrently. For the scenarios below, we will use the following table

CREATE TABLE [dbo].[T_ACCOUNT](
    [accountkey] [int] IDENTITY(1,1) NOT NULL,
    [accountdescription] [nvarchar](50) NULL
) ON [PRIMARY]

-- create a CCI
CREATE CLUSTERED COLUMNSTORE INDEX ACCOUNT_CI ON T_ACCOUNT

Insert Operations

Let us insert one row and see the locks taken. Note that we did not commit the transaction:

begin tran

insert into T_ACCOUNT (accountdescription ) values ('row-1');

Here are the locks. Note that the new row is inserted into a delta rowgroup, which is organized as a btree in traditional row storage format. There is a new lock resource, ROWGROUP, in the context of CCI. The current transaction has taken an IX lock on the ROWGROUP.
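The lock output screenshot is not included in this feed; a minimal sketch for inspecting the locks yourself (run it in the session holding the open transaction) is:

select resource_type, request_mode, request_status
from sys.dm_tran_locks
where request_session_id = @@SPID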

Now, let us insert another row in a second session as follows and look at the locks:

begin tran

insert into T_ACCOUNT (accountdescription ) values ('row-2');

Note that the second transaction, in session-55, also inserted its row into the same rowgroup. In other words, concurrent inserts can load data into the same rowgroup without blocking each other.

In summary, an insert into CCI does not block other concurrent inserts, and concurrent inserts load data into the same delta rowgroup. In the next blog, we will look into bulk load operations.

Thanks

Sunil Agarwal

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2014/07/27/clustered-column-store-index-concurrency-with-insert-operations.aspx



Clustered Column Store Index: Concurrency and Isolation Levels

Sunday, July 27, 2014, 05:48

Clustered Column Store and Concurrency

The clustered column store index (CCI) has been designed for the Data Warehouse scenario, which primarily involves:

  • Write once and read multiple times – CCI is optimized for query performance. It gives order-of-magnitude better query performance by compressing the data in columnar format, processing sets of rows in batches, and bringing in only the columns that are required by the query.
  • Bulk data import and trickle data load – insert operations

While it supports UPDATE/DELETE operations, it is not optimized for a large number of these operations. In fact, concurrent DELETE/UPDATE can cause blocking in some cases and can lead to multiple delta row-groups. To understand the concurrency model, there is a new lock resource called ROWGROUP. Let us see how locks are taken for different scenarios. I will walk through concurrency using a series of blogs, starting with transaction isolation levels.

Transaction Isolation levels Supported

  • Read Uncommitted – This is OK for most DW queries; in fact, queries running on the PDW appliance access CCI under read uncommitted to avoid blocking with concurrent DML operations. This is how CCI is queried in Analytics Platform System, a re-branding of PDW. Please refer to http://www.microsoft.com/en-us/server-cloud/products/analytics-platform-system/default.aspx#fbid=CRIMcFvfkD2
  • Read Committed – Only the lock-based implementation of read committed isolation is supported, which can get blocked by concurrent DML transactions.

If RCSI is enabled on a database containing one or more tables with CCI, all tables other than those with CCI can be accessed with non-blocking semantics under the read committed isolation level, but CCI cannot.

Example:

select is_read_committed_snapshot_on, snapshot_isolation_state_desc, snapshot_isolation_state
from sys.databases where name = 'AdventureWorksDW2012'

CREATE TABLE [dbo].[T_ACCOUNT](
    [accountkey] [int] IDENTITY(1,1) NOT NULL,
    [accountdescription] [nvarchar](50) NULL
) ON [PRIMARY]

-- create a CCI
CREATE CLUSTERED COLUMNSTORE INDEX ACCOUNT_CCI ON T_ACCOUNT

Session-1

use AdventureWorksDW2012
go

-- Do a DML transaction on CCI but don't commit
begin tran
insert into T_ACCOUNT (accountdescription)
values ('value-1');

Session-2

-- query the table under read committed in a different session
set transaction isolation level read committed
go

select * from t_account

You will see that the CCI query is blocked by session-1, as shown by the query below:

select
    request_session_id as spid,
    resource_type as rt,
    resource_database_id as rdb,
    (case resource_type
        WHEN 'OBJECT' then object_name(resource_associated_entity_id)
        WHEN 'DATABASE' then ' '
        ELSE (select object_name(object_id)
              from sys.partitions
              where hobt_id = resource_associated_entity_id)
     END) as objname,
    resource_description as rd,
    request_mode as rm,
    request_status as rs
from sys.dm_tran_locks

Even though the database is using the default non-blocking read committed isolation level based on row versioning, the CCI is accessed using the lock-based implementation of read committed.

  • Snapshot Isolation – It can be enabled for a database containing CCI. Any disk-based table other than CCI can be accessed under Snapshot Isolation, but access to CCI is disallowed and generates the following error:

Msg 35371, Level 16, State 1, Line 26

SNAPSHOT isolation level is not supported on a table which has a clustered columnstore index.

  • Repeatable Read – Supported in CCI

set transaction isolation level repeatable read
go

begin tran
select * from t_account
go

Here are the locks. Note that it takes an S lock on all rowgroups since we are doing a full table scan.

  • Serializable – Supported in CCI

set transaction isolation level serializable
go

begin tran
select * from t_account
go

Here are the locks. Note that it takes an S lock at the table level to guarantee the serializable isolation level.

In the next blog, I will discuss locks taken when inserting rows into CCI

Thanks

Sunil Agarwal

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2014/07/27/clustered-column-store-index-concurrency-and-isolation-level.aspx


