This is the blog of the engineering team managing and developing the Database Engine for SQL Server and Azure SQL Database



Source: http://blogs.msdn.com/sqlserverstorageengine/default.aspx.
This feed is generated from the open RSS source at http://blogs.msdn.com/b/sqlserverstorageengine/rss.aspx and is updated as that source is updated. It may not match the content of the original page. The feed was created automatically at the request of readers of this RSS feed.
For any questions about this service, use the contact information page.


Monitoring performance of natively compiled stored procedures – database-scoped configuration options

Wednesday, April 4, 2018, 21:10
We just added new database-scoped configuration options that will help with monitoring performance of natively compiled stored procedures. The new options XTP_PROCEDURE_EXECUTION_STATISTICS and XTP_QUERY_EXECUTION_STATISTICS are available now in Azure SQL Database, and will be available in the next major release of SQL Server. These options will improve your monitoring and troubleshooting experience for databases leveraging In-Memory OLTP... Read more

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018/04/04/monitoring-performance-of-natively-compiled-stored-procedures-database-scoped-configuration-options/
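For reference, database-scoped configuration options of this kind are switched on per database; a minimal sketch, assuming the option names from the announcement above:

ALTER DATABASE SCOPED CONFIGURATION SET XTP_PROCEDURE_EXECUTION_STATISTICS = ON;
ALTER DATABASE SCOPED CONFIGURATION SET XTP_QUERY_EXECUTION_STATISTICS = ON;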



How to configure network for Azure SQL Managed Instance

Wednesday, March 14, 2018, 19:11
Azure SQL Managed Instance is a fully managed SQL Server instance hosted in the Azure cloud and placed in your Azure Virtual Network. In this post, I will explain how you can prepare the network environment for Managed Instance. Azure SQL Database Managed Instance is a new data service currently in public preview. It is a dedicated resource... Read more

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018/03/14/how-to-configure-network-for-azure-sql-managed-instance/



Demo – Azure SQL Database Managed Instance in action

Wednesday, March 14, 2018, 15:38
Azure SQL Database Managed Instance is a new data service in the Azure cloud that exposes a fully managed SQL Server instance hosted in Azure and placed in the customer's VNet. Here you can see some of the key new features that are available in Managed Instance. Azure SQL Database Managed Instance is a fully managed SQL Server instance... Read more

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018/03/14/demo-azure-sql-database-managed-instance-in-action/



How to configure tempdb in Azure SQL Managed Instance (preview)

Wednesday, March 14, 2018, 00:20
Azure SQL Database Managed Instance (public preview) is a fully managed SQL Server instance in the Azure cloud. In the current public preview there are some known issues and bugs that will be fixed during the public preview period. One of the potential issues is the number of files and the max_size setting on tempdb. In this post,... Read more

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018/03/13/how-to-configure-tempdb-in-azure-sql-managed-instance/
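As a quick orientation, you can inspect the current tempdb file layout of any instance with a query like this (a generic sketch, not specific to Managed Instance):

-- size and max_size are reported in 8 KB pages; max_size = -1 means unlimited growth
SELECT name, type_desc, size, max_size, growth
FROM tempdb.sys.database_files;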



What is Azure SQL Database Managed Instance?

Wednesday, March 7, 2018, 21:12
Azure SQL Database Managed Instance is a new flavor of Azure SQL Database that represents a fully managed SQL Server instance hosted in the Azure cloud. Currently, Azure SQL Database PaaS has two main offers for customers who use a SQL Server database and want to migrate to PaaS: Managed Database, an isolated and self-contained database service... Read more

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018/03/07/what-is-azure-sql-database-managed-instance-2/



February 2018 Leaderboard of Database Systems contributors on MSDN

Tuesday, March 6, 2018, 18:41
Congratulations to our February top 10 contributors! Alberto Morillo and Visakh Murukesan maintain their top positions. This Leaderboard initiative was started in October 2016 to recognize the top Database Systems contributors on MSDN forums. The following continues to be the points hierarchy (in decreasing order of points):  ... Read more

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018/03/06/february-2018-leaderboard-of-database-systems-contributors-on-msdn/



Migrating to Azure SQL Database with Zero Downtime for Read-Only Workloads

Thursday, February 15, 2018, 19:29
Special thanks to MSAsset engineering team’s Peter Liu (Senior Software Engineer),  Vijay Kannan (Software Engineer), Sathya Muhandiramalage (Software Development Engineer II), Bryan Castillo (Principal Software Engineer) and Shail Batra (Principal Software Engineering Manager) for sharing their migration story with the Azure SQL Database product team. Microsoft uses an internally written service called MSAsset to manage all... Read more

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018/02/15/migrating-to-azure-sql-database-with-zero-downtime-for-read-only-workloads/



New Windows container images of SQL Server 2017 on Windows Server 1709!

Monday, February 12, 2018, 08:03
The Windows Server Core container images of SQL Server 2017 Developer & Express based on Windows Server version 1709 are now available on Docker Hub! These can be used in both Windows Server Containers as well as Hyper-V Containers. Windows Server version 1709 brings the following important improvements that developers can take advantage of with... Read more

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018/02/11/new-windows-container-images-of-sql-server-2017-on-windows-server-1709/



January 2018 Leaderboard of Database Systems contributors on MSDN

Friday, February 9, 2018, 18:33
Congratulations to our January 2018 top 10 contributors! Alberto Morillo and Visakh Murukesan maintain their top positions. This Leaderboard initiative was started in October 2016 to recognize the top Database Systems contributors on MSDN forums. The following continues to be the points hierarchy (in decreasing order of points):... Read more

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018/02/09/january-2018-leaderboard-of-database-systems-contributors-on-msdn/



JSON parsing - performance comparison

Thursday, January 14, 2016, 14:17

One of the first questions that people asked once we announced JSON support in SQL Server 2016 was "Will it be slow?" and "How fast can you parse JSON text?". In this post, I will compare the performance of JSON parsing with the JSON_VALUE function against the XML and string functions.

Experiment

In this experiment I have compared JSON_VALUE with xml.value and a few string functions. I have repeated the same function call 100,000 times and calculated the average execution time:

DECLARE @dt1 datetime2, @dt2 datetime2
declare @spentTime float = 0
declare @i int = 0
declare @s nvarchar(100)

while(@i<100000)
begin


DECLARE @json AS NVARCHAR(200) = '{"Root":{"ProductDescription":{"ProductID":"1","ProductName": "Road Bike","Features": {"Warranty": "1 year parts and labor","Maintenance": "3 year parts and labor extended maintenance is available"}}}}'

-- Start

set @dt1 = getdate()

set @s = JSON_VALUE(@json, N'$.Root.ProductDescription.ProductName')
-- set @s = CHARINDEX('ProductName', @json)
-- set @s = PATINDEX('%ProductName%', @json)
-- set @s = SUBSTRING(@json, 10, 10)
-- hard-coded simulation of JSON_VALUE (uncomment to test it instead):
-- set @s = SUBSTRING(@json, CHARINDEX('ProductName', @json)+15, CHARINDEX('"', @json, CHARINDEX('ProductName', @json)+15) - (CHARINDEX('ProductName', @json)+15))
 
set @dt2 = getdate()

set @spentTime += DATEDIFF(nanosecond, @dt1, @dt2)

set @i = @i+1
end;


SELECT 'Execution time is ' + CAST( (@spentTime / 100000 / 1000) as nvarchar(100) ) + ' micro-seconds per call'

I have created one JSON string and applied the functions JSON_VALUE, CHARINDEX, PATINDEX and SUBSTRING in 100,000 iterations. Total time is calculated as the sum of the durations of the individual function calls. In each experiment I have uncommented the function that I want to test and commented out the other ones.

I have also added one more complex string expression:

SUBSTRING(@json, CHARINDEX('ProductName', @json)+15, CHARINDEX('"', @json, CHARINDEX('ProductName', @json)+15) - (CHARINDEX('ProductName', @json)+15))

This expression is a hard-coded simulation of the JSON_VALUE function. It finds the "ProductName" key, skips the characters that follow the key (the '": "' sequence), and takes the rest of the text up to the closing double quote. This expression returns the same result as the JSON_VALUE function, although it is not context sensitive and would not work on more complex JSON because it cannot tell whether ProductName is actually a key or just some random text. However, it is good enough for a performance comparison.

To test xml.value I will use a slightly modified input that uses XML instead of JSON:

DECLARE @dt1 datetime, @dt2 datetime
declare @spentTime float = 0
declare @i int = 0
declare @s nvarchar(100)
while(@i<100000)
begin

DECLARE @xml as xml = N'
<Root>
  <ProductDescription>
    <Features>
      <Maintenance>3 year parts and labor extended maintenance is available</Maintenance>
      <Warranty>1 year parts and labor</Warranty>
    </Features>
    <ProductID>1</ProductID>
    <ProductName>Road Bike</ProductName>
  </ProductDescription>
</Root>'

-- Start

set @dt1 = getdate()
set @s = @xml.value('(/Root/ProductDescription/ProductName)[1]', 'nvarchar(400)')
set @dt2 = getdate()

set @spentTime += DATEDIFF(nanosecond, @dt1, @dt2)

set @i = @i+1
end;


SELECT 'Execution time is ' + CAST( (@spentTime / 100000 / 1000) as nvarchar(100) ) + ' micro-seconds per call'

Here I have just converted the JSON into equivalent XML and used an XPath expression instead of a JSON path.

Results

The following table lists the test results:

function              time (microseconds)
@xml.value            30-50
JSON_VALUE            11-13
PATINDEX              7-9
CHARINDEX             7-9
SUBSTRING             1-2
SUBSTRING+CHARINDEX   18-22

The interesting thing we can notice is that parsing JSON text is 3-5x faster than parsing equivalent XML.

JSON_VALUE is about 2x slower than PATINDEX and CHARINDEX, which is expected since these functions have much simpler parsing logic. However, considering that JSON_VALUE uses more complex, context-aware parsing rules (it needs to remember the paths of parent objects) and converts escaped characters, this is not too bad.

Direct access by position with SUBSTRING is about 10x faster than JSON_VALUE and 5-6x faster than PATINDEX/CHARINDEX, which is also expected.

The interesting result is the simulation of JSON_VALUE via the SUBSTRING and CHARINDEX functions. Although these functions use simple matching, they are slower than the parser used by JSON_VALUE. The reasons might be that we need to scan the JSON text three times to get the character indexes, and that the JSON parser uses exact character matches while the string functions use collation-aware matching rules.

This test was executed on SQL Server 2016 CTP3.2. You can copy and paste the examples and run them on your own instances to check the ratios in your environment.

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2016/01/14/json-parsing-performance-comparison.aspx



Returning spatial data in GeoJSON format - Part 2

Wednesday, January 13, 2016, 13:41

In this post I will show you how to export other spatial types, such as lines or polygons, into the equivalent GeoJSON format.

...(read more)

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2016/01/13/returning-spatial-data-in-geojson-format-part-2.aspx



Clustered Columnstore Index: Data Load Optimizations - Minimal Logging

Monday, January 11, 2016, 07:20

When bulk importing a large number of rows, you want to make sure it can be done efficiently with minimal resource usage so that you can finish the data load quickly and start running your analytics. SQL Server provides the following three key optimizations during bulk import:

  • Minimal logging
  • Reduced overhead of locks
  • Concurrent Inserts

Just like for regular tables where data is physically stored as rows, these optimizations are available for tables with columnstore indexes as well, but there are some differences. This series of blog posts explains how each of these optimizations works in the context of tables with columnstore indexes.

Minimal Logging

Contrary to a common misconception, it does not mean 'no logging'. Minimal logging logs the page-allocation-related info, but no data rows are logged. You may wonder why SQL Server could not eliminate logging altogether, since you can 'reload' the data. The issue is that if the data load fails, SQL Server needs to roll back the transaction and would have no way of knowing which pages were allocated. At this point, hopefully I have convinced you that minimal logging is a good way to minimize any logging-related bottleneck. Let us now look at minimal logging for tables with clustered columnstore indexes (CCI).

When you import data into a CCI, the data is loaded either into a delta rowgroup or a compressed rowgroup based on the rule "if batchsize < 102400, the rows are loaded into a delta rowgroup, otherwise they are directly loaded into a compressed rowgroup". Minimal logging is only supported when data is loaded directly into a compressed rowgroup. This means that, to get minimal logging with a CCI, you must use a batchsize >= 102400.

Here are a couple of examples.

-- create the table
create table t_bulkload (
    accountkey int not null,
    accountdescription char (500),
    accounttype char(500),
    AccountCodeAlternatekey int)

-- Let us prepare the data
-- insert 110K rows into a regular table
begin tran
declare @i int = 0
while (@i < 110000)
begin
    insert into t_bulkload values (@i, 'description', 'dummy-accounttype', @i*2)
    set @i = @i + 1
end
commit

-- bcp out the data... run the following in command window
bcp columnstore..t_bulkload out c:\temp\t_bulkoad-1.dat -c -T

-- truncate the table
truncate table t_bulkload

-- create clustered columnstore index
CREATE CLUSTERED COLUMNSTORE index t_bulkload_cci on t_bulkload

-- clean up the transaction log (note, the database is in
-- simple recovery mode)
Checkpoint

-- now bulkload the data
begin tran

-- this loads the data in 1 batch
bulk insert t_bulkload
FROM 'c:\temp\t_bulkoad-1.dat'

-- show the log records generated
-- note, I am using 'log record length 1104' because I know the length of the
-- insert log record.
select count(*)
from sys.fn_dblog(NULL, NULL) as t_log
where allocunitname like '%t_bulkload%' and operation like '%LOP_INSERT_ROWS%' and [Log Record Length] = 1104

You will see that it returns 0, as no data rows were logged. Abort the transaction and run the checkpoint command to clean up the log records. Now let us run the same command but with a batchsize of 10000. Recall that this will load the rows into a delta rowgroup:

begin tran

-- Now use a smaller batchsize of 10000
-- I am also using TABLOCK
bulk insert t_bulkload
FROM 'c:\temp\t_bulkoad-1.dat'
WITH
(
    BATCHSIZE = 10000
    ,TABLOCK
)

-- show the log records generated
select count(*)
from sys.fn_dblog(NULL, NULL) as t_log
where allocunitname like '%t_bulkload%' and operation like '%LOP_INSERT_ROWS%' and [Log Record Length] = 1104

You will see that the count is 110000, exactly the number of rows we inserted. In other words, the data rows are fully logged (i.e. no minimal logging). So the question is: why did we not get minimal logging with the delta rowgroup? There are two reasons. First, the delta rowgroup is organized as a btree, not a heap. Second, we expect a large number of rows to be loaded into a CCI, so we have chosen to optimize the data load directly into the compressed rowgroup. One additional advantage of loading data directly into a compressed rowgroup is that it eliminates the extra step of loading data into a delta rowgroup and then migrating that data into compressed rowgroups.
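If you want to double-check where the rows ended up, one option (a sketch, assuming SQL Server 2016 or later where this DMV is available) is to query the columnstore rowgroup physical stats and look at the state of each rowgroup:

-- DELTA vs COMPRESSED tells you whether the batch hit the delta store or went straight to a compressed rowgroup
select object_name(object_id) as table_name, row_group_id, state_desc, total_rows
from sys.dm_db_column_store_row_group_physical_stats
where object_id = object_id('t_bulkload')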

There you have it; our recommendation is to choose a batchsize of >= 102400 to get the benefits of minimal logging with a clustered columnstore index. In the next blog, I will discuss parallel bulk import and locking optimizations.

Thanks

Sunil Agarwal

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2016/01/11/clustered-columnstore-index-data-load-optimizations-minimal-logging.aspx



Loading GeoJSON data into SQL Server

Tuesday, January 5, 2016, 20:01

GeoJSON is a popular format for spatial data representation. If you receive text formatted as GeoJSON from other systems, you can load it into SQL Server and convert it into spatial types.

The new OPENJSON function in SQL Server 2016 enables you to parse GeoJSON text and load it into SQL Server spatial types.

In this example, I will load GeoJSON text that contains a set of bike share locations in Washington DC. The GeoJSON sample is provided by ESRI and can be found at https://github.com/Esri/geojson-layer-js/blob/master/data/dc-bike-share.json

I will load this GeoJSON text using the following query:

drop table if exists BikeShare
create table BikeShare(
id int identity primary key,
position Geography,
ObjectId int,
Address nvarchar(200),
Bikes int,
Docks int )
declare @bikeShares nvarchar(max) = 
'{"type":"FeatureCollection",
"features":[{"type":"Feature",
"id":"56679924",
"geometry":{"type":"Point",
"coordinates":[-77.0592213018017,38.90222845310455]},
"properties":{"OBJECTID":56679924,"ID":72,
"ADDRESS":"Georgetown Harbor / 30th St NW",
"TERMINAL_NUMBER":"31215",
"LATITUDE":38.902221,"LONGITUDE":-77.059219,
"INSTALLED":"YES","LOCKED":"NO",
"INSTALL_DATE":"2010-10-05T13:43:00.000Z",
"REMOVAL_DATE":null,
"TEMPORARY_INSTALL":"NO",
"NUMBER_OF_BIKES":15,
"NUMBER_OF_EMPTY_DOCKS":4,
"X":394863.27537199,"Y":137153.4794371,
"SE_ANNO_CAD_DATA":null}
},
......'
-- NOTE: This GeoJSON is truncated.
-- Copy full example from https://github.com/Esri/geojson-layer-js/blob/master/data/dc-bike-share.json

INSERT INTO BikeShare(position, ObjectId, Address, Bikes, Docks)
SELECT geography::STGeomFromText('POINT ('+long + ' ' + lat + ')', 4326),
ObjectId, Address, Bikes, Docks
from OPENJSON(@bikeShares, '$.features')
WITH (
long varchar(100) '$.geometry.coordinates[0]',
lat varchar(100) '$.geometry.coordinates[1]',
ObjectId int '$.properties.OBJECTID',
Address nvarchar(200) '$.properties.ADDRESS',
Bikes int '$.properties.NUMBER_OF_BIKES',
Docks int '$.properties.NUMBER_OF_EMPTY_DOCKS' )

I have created a table BikeShare that will contain the spatial data, and I have defined a local text variable @bikeShares that contains the GeoJSON taken from https://github.com/Esri/geojson-layer-js/blob/master/data/dc-bike-share.json.

Then I open a rowset over the @bikeShares variable using the OPENJSON function. OPENJSON returns one row for each object in the GeoJSON array, with the schema defined in the WITH clause:

  • long and lat that represent longitude and latitude values on $.geometry.coordinates[0] and $.geometry.coordinates[1] paths in GeoJSON objects
  • objectId, address, number of bikes, and number of empty docks in $.properties object in GeoJSON.

The values in long and lat are used to initialize the spatial type using the geography::STGeomFromText method, and the other values are inserted into table columns.
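As a side note (not part of the original example), the point could also be built with the static geography::Point method, which takes latitude, longitude and SRID directly and avoids assembling the WKT string by hand:

-- Equivalent construction of the same point; note the argument order Point(Lat, Long, SRID)
SELECT geography::Point(CAST(lat AS float), CAST(long AS float), 4326) AS position
FROM OPENJSON(@bikeShares, '$.features')
WITH (
  long varchar(100) '$.geometry.coordinates[0]',
  lat varchar(100) '$.geometry.coordinates[1]' )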

When I execute this script, I can query spatial data loaded from GeoJSON text:

select position.STAsText(), ObjectId, Address, Bikes, Docks 
FROM BikeShare

Note that if you try to execute the script exactly as shown here, not all objects from the GeoJSON variable will end up in the table, because the @bikeShares value above is truncated; copy the full example from the link above.

The OPENJSON function enables you to parse any JSON text. You can use it to convert the GeoJSON format into SQL Server spatial types.

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2016/01/05/loading-geojson-data-into-sql-server.aspx



Returning spatial data in GeoJSON format - Part 1

Tuesday, January 5, 2016, 18:38

GeoJSON is a popular format used to represent spatial data. GeoJSON text that represents one point is shown in the following example:

{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [125.6, 10.1]
},
"properties": {
"name": "Dinagat Islands"
}
}

The 'geometry' object holds the type of the spatial data and its coordinates. The 'properties' object can hold various custom properties, such as address line, town, postcode and other information that describes the object. SQL Server stores spatial information as geometry or geography types, and stores additional properties in standard table columns.

Since GeoJSON is JSON, it can be produced using the new FOR JSON clause in SQL Server.

In this example, we are going to format the content of the Person.Address table, which has a spatial column SpatialLocation, as GeoJSON using the FOR JSON clause. Person.Address contains a set of points, so I will generate an array of GeoJSON points.

The query looks like this:

select TOP 20 
'Feature' AS type,
'Point' as [geometry.type],
JSON_QUERY
( FORMATMESSAGE('[%s,%s]',
FORMAT(SpatialLocation.Long, N'#.##################################################'),
FORMAT(SpatialLocation.Lat, N'#.##################################################'))
) as [geometry.coordinates],
AddressID as [property.id], AddressLine1 as [property.address],
City as [property.city], PostalCode as [property.postcode]
FROM Person.Address
FOR JSON PATH

First I have selected the 'Feature' literal as 'type' because this is the first key:value pair in the GeoJSON example above. Then I select the 'Point' literal and place it on the geometry.type path in the JSON. These are just fixed values that must exist in the GeoJSON output.

Now we have the tricky part - how to format latitude and longitude as JSON coordinates? I have used the FORMATMESSAGE and FORMAT functions to create a JSON array from the longitude/latitude information in the spatial column. If I just return this string, FOR JSON will see it as plain text and wrap it with double quotes, e.g.:

"coordinates": "[-122.164644615406, 47.7869921906598]"

However, I don't want the coordinates to be output as a string - I need a JSON array without the surrounding double quotes, i.e.:

"coordinates": [-122.164644615406, 47.7869921906598]

Therefore I pass this string to the JSON_QUERY function. JSON_QUERY returns the input string unchanged because there is no path as the second parameter. However, FOR JSON knows that the output of JSON_QUERY is valid JSON text, so it does not escape the content.
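A minimal illustration of the difference (a sketch you can run on any SQL Server 2016+ instance):

SELECT '[1,2]' AS coordinates FOR JSON PATH
-- returns [{"coordinates":"[1,2]"}]  (escaped as plain text)

SELECT JSON_QUERY('[1,2]') AS coordinates FOR JSON PATH
-- returns [{"coordinates":[1,2]}]    (kept as a JSON array)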

The last part is easy - I just return columns from the table and format them in the property object. FOR JSON PATH will format the JSON using the prefixes in the column aliases.

The final output might look like:

[{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [-122.164644615406, 47.7869921906598]
},
"property": {
"id": 1,
"address": "1970 Napa Ct.",
"city": "Bothell",
"postcode": "98011"
}
}, {
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [-122.250185528911, 47.6867097047995]
},
"property": {
"id": 2,
"address": "9833 Mt. Dias Blv.",
"city": "Bothell",
"postcode": "98011"
}
}]

As you can see, with the new FOR JSON clause we can easily output any JSON-based format, such as GeoJSON.

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2016/01/05/returning-spatial-data-in-geojson-format.aspx



Indexing data in JSON product catalogs

Monday, December 21, 2015, 19:14

In the previous post, Designing Product Catalogs in SQL Server 2016 using JSON, I described how you can design a product catalog with JSON. The product catalog contains a few columns that are common to all types of products, and all custom/specific columns are stored as JSON key:value pairs.

This is a simple, flexible and generic structure that can contain any product type. The only problem is that we will get a full table scan every time we use JSON_VALUE in a predicate.

In this post I will show how you can add indexes on JSON properties in the product catalog. In SQL Server 2016, you can use two types of indexes on JSON text:

  1. An index on a computed column that indexes a specific property in the JSON.
  2. A full-text search index that can index all key:value pairs in the JSON objects.

NONCLUSTERED indexes

If you know that you will frequently filter rows in the product catalog using some specific path, you can expose the value on that path using a computed column and create a standard NONCLUSTERED index on that column. As an example, imagine that we are filtering rows by the ProductNumber property. We can index that property using the following script:

 ALTER TABLE ProductCatalog
ADD vProductNumber AS JSON_VALUE(Data, '$.ProductNumber')

CREATE INDEX jidx_ProductNumber
ON ProductCatalog(vProductNumber)

Note that the computed column is not PERSISTED. It is calculated only when the ProductNumber value in the JSON column changes and the index needs to be updated, so it takes no additional space in the table. This index will be used if you add a filter predicate that uses the $.ProductNumber property in the JSON_VALUE function, as shown below.
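A hypothetical lookup of this shape (the product number value is just an illustration) would seek on jidx_ProductNumber instead of scanning the table:

SELECT ProductID, Name, JSON_VALUE(Data, '$.Color') AS Color
FROM ProductCatalog
WHERE JSON_VALUE(Data, '$.ProductNumber') = 'AR-5381' -- hypothetical product number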

How fast is this solution? Performance is the same as the performance of indexes on regular columns, because the same kind of index is used in both cases.

Note that you can add more than one column to the index, mix computed columns with the JSON_VALUE function and standard columns of the table, put some columns in the INCLUDE section of the index to create fully covering indexes, use filtered indexes, etc. This way you get all the benefits of standard SQL Server indexes on JSON values.

Full text indexes

A full-text index breaks JSON text into words and enables you to find them in the text. You can create a full-text index on the JSON column:

-- Assumption is that FT catalog is already created, e.g.:
-- CREATE FULLTEXT CATALOG jsonFullTextCatalog;

-- Create full text index on the Data column.
CREATE FULLTEXT INDEX ON ProductCatalog(Data)
KEY INDEX PK_ProductCatalog_ID
ON jsonFullTextCatalog;

Now you can use the CONTAINS function to check whether the JSON text contains some text value. The interesting feature is the NEAR operator, which enables you to check whether two words are near each other:

SELECT ProductID, Name, JSON_VALUE(Data, '$.ProductNumber') AS ProductNumber, ProductModelID, JSON_VALUE(Data, '$.Color') AS Color
FROM ProductCatalog
WHERE CONTAINS(Data, 'NEAR((Color,Black),1)')

This query will locate all rows where 'Color' and 'Black' are at distance 1 - which is ideal for key:value pairs, since a key and its value are naturally at distance 1.

Note that in some cases this might produce false-positive results (e.g. if some other value is 'Color' and the key placed after it is 'Black'). If you believe this might happen, you can add the additional predicate JSON_VALUE(Data, '$.Color') = 'Black' to remove the false-positive rows (see the example below).

FTS will index all key:value pairs in the document, and with the NEAR operator we can create complex and flexible queries:

SELECT ProductID, Name
FROM ProductCatalog
WHERE CONTAINS(Data, 'NEAR((Color,Silver),1) AND NEAR((MakeFlag,true),1) AND NEAR((SafetyStockLevel,100),1)')

This query will use the full-text search index and a seek into the table to get the ProductID and Name columns.

As you may notice, you can use any key:value pair in the CONTAINS expression. The FTS index automatically indexes all key:value pairs and enables you to query any of them using the NEAR operator.

The only constraint is that values should be single words, because if there is longer text in the value, FTS will break it into separate words and the distance from the key to the tokens in the value will be different for every token. However, in simple JSON text (like the one imported from the AdventureWorks database) you will be able to index all key:value pairs without any problem.

Filtering false positive

The problem with this approach is that FTS is not aware of the JSON structure (thanks to xor88 for pointing this out). FTS might return rows that do have the words Color and Silver near each other, but it cannot know whether Color is actually the JSON key we want. Therefore, we need to add additional filters to remove false-positive results:

SELECT ProductID, Name
FROM ProductCatalog
WHERE CONTAINS(Data, 'NEAR((Color,Silver),1) AND NEAR((MakeFlag,true),1) AND NEAR((SafetyStockLevel,100),1)')
AND JSON_VALUE(Data,'$.Color') = 'Silver'
AND JSON_VALUE(Data,'$.MakeFlag') = 'true'
AND JSON_VALUE(Data,'$.SafetyStockLevel') = '100'

In this query, FTS/CONTAINS is used to find candidate rows that might satisfy the condition we need. In my example it returns 16 out of 200,000 rows, but some of them might not have the words in the right JSON context. The additional filter with JSON_VALUE then checks that the returned rows actually have these key:value pairs. To summarize:

  1. CONTAINS gives us high selectivity. Without this part we would end up with a full table scan. However, we cannot be 100% sure that the NEAR operators give us correct results in the correct JSON context.
  2. JSON_VALUE performs an additional check on the smaller set of rows returned by CONTAINS/the index seek. If FTS returns a small number of candidates, we apply the JSON functions to only a small number of rows. Without this predicate we might get some results that do not satisfy the equivalent JSON predicate.
The query plan for this query shows the following:

You can notice that the full-text match returns candidate rows, and an index seek finds them in ProductCatalog. You can also notice in the plan that the result of the full-text match is pushed down as the index seek predicate. Then the Filter operator removes potential false-positive results, and we get the right results on the output.

This way we can search efficiently by any product field.

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/12/21/indexing-data-in-json-product-catalogs.aspx



Designing Product Catalogs in SQL Server 2016 using JSON

Monday, December 21, 2015, 17:36

In this post we will see how we could design flexible product catalogs in SQL Server using JSON.

...(read more)

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/12/21/designing-product-catalogs-in-sql-server-2016-using-json.aspx



WITHOUT_ARRAY_WRAPPER - New important change in FOR JSON

Monday, December 21, 2015, 16:51

SQL Server 2016 CTP3.2 adds a new option to the FOR JSON clause - WITHOUT_ARRAY_WRAPPER; see https://msdn.microsoft.com/en-us/library/mt631354.aspx

This option enables you to remove the square brackets [ and ] that surround the JSON text generated by the FOR JSON clause. I will use the following example:

SELECT 2015 as year, 12 as month, 15 as day
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER

This query will return:

{ "year":2015, "month":12, "day":15 }

However, without this option, the following text would be returned:

[{ "year":2015, "month":12, "day":15 }]

This might be useful if you know that you are returning a single row in the result set (e.g. if you return a row by primary key, or use TOP 1) and you don't want an array with one element in the output.

Also, it might be useful if you want to concatenate the results of two queries formatted as JSON. In the earlier version you would need to remove the wrappers, concatenate the two outputs and wrap them with brackets to create a valid JSON array:

set @merged = '[' + dbo_fnUnwrap( (SELECT .... FOR JSON PATH ) ) + ',' + dbo_fnUnwrap( (SELECT .... FOR JSON PATH ) ) + ']'

Now you don't need an additional function that removes the brackets:

set @merged = '[' + (SELECT .... FOR JSON PATH, WITHOUT_ARRAY_WRAPPER ) + ',' + (SELECT .... FOR JSON PATH, WITHOUT_ARRAY_WRAPPER ) + ']'

Why is this important? The FOR JSON functionality has changed compared to CTP3.1. In previous versions there were some problems with FOR JSON PATH:

  1. Before CTP3.2, a FOR JSON PATH clause applied to a simple query that didn't use a table source (like the example above with no FROM clause) generated a single JSON object without square brackets. Starting with CTP3.2, FOR JSON ALWAYS wraps results with square brackets. If you need to create a single object you can remove the square brackets using this option.
  2. There were some bugs in the JSON generated with FOR JSON PATH. If you applied FOR JSON PATH to a CTE or TVF, you might get a JSON array without square brackets. In CTP3.2 the brackets are always generated, so the output is always valid.

Therefore, if you relied on the fact that FOR JSON would sometimes return a single JSON object, beware that this behavior has changed. You need to add WITHOUT_ARRAY_WRAPPER to your script if you want a single object.

Note one important thing - with WITHOUT_ARRAY_WRAPPER, the result is not recognized as valid JSON text when nested. If you put FOR JSON ... WITHOUT_ARRAY_WRAPPER as a column expression, it will be treated as plain text, e.g.:

SELECT col1, col2, col3, (SELECT .... FOR JSON PATH, WITHOUT_ARRAY_WRAPPER ) AS NestedJson
FROM mytab
FOR JSON PATH

The outer FOR JSON PATH will treat NestedJson like any other plain text: it will escape it and surround it with double quotes.
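A common workaround (a sketch, not from the original post) is to wrap the inner query in JSON_QUERY, which marks the value as already-valid JSON so the outer FOR JSON does not escape it:

-- Self-contained illustration; the inner WITHOUT_ARRAY_WRAPPER result is a single JSON object
SELECT 1 AS col1,
       JSON_QUERY((SELECT 2015 AS year, 12 AS month FOR JSON PATH, WITHOUT_ARRAY_WRAPPER)) AS NestedJson
FOR JSON PATH
-- returns [{"col1":1,"NestedJson":{"year":2015,"month":12}}]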

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/12/21/without-array-wrapper-new-important-change-in-for-json.aspx



Columnstore Index: Real-Time Operational Analytics using In-Memory Technology

Wednesday, December 9, 2015, 21:31

Starting with SQL Server 2016, you can implement real-time operational analytics by combining an OLTP workload with a columnstore index. SQL Server offers three possible configurations: (a) disk-based tables with a nonclustered columnstore index, (b) memory-optimized tables with a columnstore index, (c) a clustered columnstore index with one or more btree indexes. I will blog about these in detail in the coming weeks, but the following blog post that I just published is a good overview. I look forward to your feedback/questions.

http://blogs.technet.com/b/dataplatforminsider/archive/2015/12/09/real-time-operational-analytics-using-in-memory-technology.aspx
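For orientation, configuration (a) boils down to adding an updatable nonclustered columnstore index on top of a regular rowstore table; a minimal sketch with hypothetical table and column names:

-- Disk-based OLTP table plus an updatable nonclustered columnstore index (SQL Server 2016+)
CREATE TABLE dbo.OrdersDemo (
    OrderID int IDENTITY PRIMARY KEY,
    CustomerID int NOT NULL,
    OrderDate datetime2 NOT NULL,
    Amount money NOT NULL
)

CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_OrdersDemo
ON dbo.OrdersDemo (CustomerID, OrderDate, Amount)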

regards

Sunil

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/12/09/columnstore-index-real-time-operational-analytics-using-in-memory-technology.aspx



CHECKSUM and BINARY_CHECKSUM might not return unique values for different inputs

Wednesday, December 9, 2015, 17:57

CHECKSUM and BINARY_CHECKSUM create checksums for text inputs. Checksum values should generally be different for different input text; however, in some cases the returned values are not unique for different inputs.

You can find patterns of inputs that return the same values. This is similar to the modulo operator, which returns the same remainder for the right pattern of numbers.

As an example, if you run the following queries you will find that these functions return 0 for all strings containing only the letter 'a' that have lengths of 32, 64, 96, ..., n * 32 characters:

 select CHECKSUM(REPLICATE('a',32)), CHECKSUM(REPLICATE('a',64)), CHECKSUM(REPLICATE('a',96))
select BINARY_CHECKSUM(REPLICATE('a',32)), BINARY_CHECKSUM(REPLICATE('a',64)), BINARY_CHECKSUM(REPLICATE('a',96))

So if your text contains only the letter 'a' and has length 25984 (i.e. 812 * 32), it will be in the same bucket as the other values in this example and you will get the value 0.

However, if you use text with 33, 65, 97, ..., n * 32 + 1 characters 'a', CHECKSUM will return 142 and BINARY_CHECKSUM will return 97:

 select CHECKSUM(REPLICATE('a',33)), CHECKSUM(REPLICATE('a',65)), CHECKSUM(REPLICATE('a',97))
select BINARY_CHECKSUM(REPLICATE('a',33)), BINARY_CHECKSUM(REPLICATE('a',65)), BINARY_CHECKSUM(REPLICATE('a',97))

Conclusion: do not assume that a checksum will always return unique values for different texts. In some rare cases you can get the same value for text values that follow the same pattern.
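If you need collision resistance rather than a fast checksum, HASHBYTES with a cryptographic hash is the usual alternative (a sketch, not part of the original post):

-- SHA2_256 produces a 32-byte hash; the two inputs below no longer collide
select HASHBYTES('SHA2_256', REPLICATE('a', 32)) as h32,
       HASHBYTES('SHA2_256', REPLICATE('a', 64)) as h64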

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/12/09/checksum-and-binary-checksum-might-not-return-unique-values-for-different-inputs.aspx



Create relational view over JSON text

Wednesday, December 9, 2015, 12:26

SQL Server 2016 enables you to use the JSON_VALUE, JSON_QUERY, and OPENJSON functions to fetch data from JSON text. However, if you want to query JSON data using a standard relational model, you can create views that encapsulate these functions.

The AdventureWorks2016CTP3 database adds a Sales.SalesOrder_json table with two columns:

  • Info that contains various information about the sales order
  • OrderItems that contains an array of sales order items formatted as JSON array

You need to run the de-normalization script (attached to the original post) to create and populate these columns.

If you want to access information in the Info column, you can create a standard view that encapsulates the JSON_VALUE calls that read values from the JSON column:

CREATE VIEW Sales.vwSalesOrderInfo_json AS
SELECT SalesOrderNumber,
OrderDate, ShipDate, Status, AccountNumber, TotalDue,
JSON_VALUE(Info, '$.ShippingInfo.Province') as [Shipping Province],
JSON_VALUE(Info, '$.ShippingInfo.Method') as [Shipping Method],
JSON_VALUE(Info, '$.ShippingInfo.ShipRate') as ShipRate,
JSON_VALUE(Info, '$.BillingInfo.Address') as [Billing Address],
JSON_VALUE(Info, '$.SalesPerson.Name') as [Sales Person],
JSON_VALUE(Info, '$.Customer.Name') as Customer
FROM Sales.SalesOrder_json

In the relational model you would need to join a lot of tables to get the same information, i.e.:

CREATE VIEW Sales.vwSalesOrderInfoRel_json AS
SELECT SalesOrderNumber, OrderDate, ShipDate, Status, Sales.SalesOrderHeader.AccountNumber, TotalDue,
shipprovince.Name as [Shipping Province],
shipmethod.Name as [Shipping Method],
shipmethod.ShipRate as ShipRate,
billaddr.AddressLine1 + COALESCE ( ', ' + shipaddr.AddressLine2, '') as [Billing Address],
sp.FirstName + ' ' + sp.LastName as [Sales Person],
cust.FirstName + ' ' + cust.LastName as Customer
FROM Sales.SalesOrderHeader
JOIN Person.Address shipaddr
ON Sales.SalesOrderHeader.ShipToAddressID = shipaddr.AddressID
LEFT JOIN Person.StateProvince shipprovince
ON shipaddr.StateProvinceID = shipprovince.StateProvinceID
JOIN Purchasing.ShipMethod shipmethod
ON Sales.SalesOrderHeader.ShipMethodID = shipmethod.ShipMethodID
JOIN Person.Address billaddr
ON Sales.SalesOrderHeader.BillToAddressID = billaddr.AddressID
LEFT JOIN Sales.SalesPerson
ON Sales.SalesPerson.BusinessEntityID = Sales.SalesOrderHeader.SalesPersonID
LEFT JOIN Person.Person AS sp
ON Sales.SalesPerson.BusinessEntityID = sp.BusinessEntityID
LEFT JOIN Sales.Customer
ON Sales.Customer.CustomerID = Sales.SalesOrderHeader.CustomerID
LEFT JOIN Person.Person AS cust
ON Sales.Customer.CustomerID = cust.BusinessEntityID

The Sales.SalesOrder_json table also has an OrderItems column that contains the content of the related SalesOrderDetails table. You can transform this array into a rowset using the following view:

CREATE VIEW Sales.vwSalesOrderItems_json
AS
SELECT SalesOrderID, SalesOrderNumber, OrderDate,
CarrierTrackingNumber, OrderQty, UnitPrice, UnitPriceDiscount, LineTotal, ProductNumber, Name
FROM Sales.SalesOrder_json
CROSS APPLY
OPENJSON (OrderItems)
WITH ( CarrierTrackingNumber NVARCHAR(20),
OrderQty int '$.Item.Qty',
UnitPrice float '$.Item.Price',
UnitPriceDiscount float '$.Item.Discount',
LineTotal float '$.Item.Total',
ProductNumber NVARCHAR(20) '$.Product.Number',
Name NVARCHAR(50) '$.Product.Name'
)

This view returns the same information as the original SalesOrderDetails table.

http://blogs.msdn.com/b/sqlserverstorageengine/archive/2015/12/09/create-relational-view-over-json-text.aspx


