SSIS package – Analysis Services Processing Tasks

To process SSAS database objects you can use a variety of tools and approaches; one of them comes with Integration Services. If processing an Analysis Services database is part of your ETL process, an SSIS package is a good solution for handling it.

There is an Analysis Services Processing Task in which you simply select the objects you would like to process. To create such a solution for processing your SSAS database, you actually have to put only two components into your SSIS package:

  1. Analysis Services Connection manager
  2. Analysis Services processing task

Download SQL Server Data Tools for Visual Studio if you don’t have it yet. With it you can build the solution, deploy it to the SSAS server and process the SSAS database.

Let’s create the package: New -> Integration Services project. Add an Analysis Services Connection Manager to connect to our SSAS database.

Picture 01 – Analysis Services Connection Manager

Add two Analysis Services Processing Tasks from the SSIS toolbox to process the dimension and fact data.

Picture 02 – Analysis Services Processing Task

Double-click (or right-click and Edit) an Analysis Services Processing Task to get the window where you set the objects you would like to work with. Go to the Processing Settings section, select the connection manager, click the Add button and select objects from the dialog shown in the picture below. You can choose a whole cube, a single partition of a cube or a dimension to process.

Picture 03 – Add Analysis Services Object

When the objects are selected you can choose the Process Options with which each object should be processed. For example, in the processing options of the fact table select Process Add for incremental processing. The other options will be explained and demonstrated in one of the next posts to complete an overview of SSAS processing options. You can find the other processing types for multidimensional SSAS here: https://bit.ly/2SPsBtg.

Picture 04 – Analysis Services Process Options

Finally, you get an SSIS package with a flow like the one in the picture below. You should process the dimensions first to avoid unknown-member processing errors when processing the OLAP facts. You can build on this simple package and extend it with processing of the DWH relational layer.

Picture 05 – SSIS package – processing Dimensions, Facts

You can download the package here: IncrementalProcessing

Stay tuned.

 

New Year coming

First of all, I would like to wish a happy New Year to all of you. I am very happy that I started writing this blog this year; you can find my motivation in my first post. Next year you can look forward to the continuation of the series I have already started. I would like to share some advanced solutions I implemented in Reporting Services, and to write a few posts focused on Extended Events. With the new year coming I would also like to start some new themes: posts describing the basics of T-SQL and SQL Server internals (indexes, the storage engine, some useful queries, optimization tips, etc.). As I have started to implement some solutions with Power BI, I would like to share some short tips which could help if you are starting with this BI tool. Finally, I would like to write about some data warehousing techniques.

Stay tuned in the new year 😉

Incorrect estimation by the cardinality estimator

Since the 2014 version of SQL Server, the cardinality estimator has implemented new approaches to help the query optimizer better understand your SQL query and estimate the data it returns: for example, correlation between predicates on the same table, correlation between predicates across tables, the assumption that tables with outdated statistics can contain newer data, etc.

More info can be found here: https://bit.ly/37ddnoe

My motivation to write this article is that there are situations in which the cardinality estimator can make things worse. In one of my projects I experienced a query that stopped finishing in the expected time after SQL Server was upgraded to a higher version (from 2008 to 2016). I prepared a simple example simulating that situation:

Let’s create a table dbo.WrongEstimationTable, which has multiple relationships to another table, dbo.SameTable. I left dbo.WrongEstimationTable empty, so that there will be no rows in the result set. We would therefore expect the execution plan to estimate that no rows will be returned.

CREATE TABLE dbo.WrongEstimationTable(id1 BIGINT, id2 BIGINT, id3 BIGINT, id4 BIGINT, id5 BIGINT)

Create dbo.SameTable and fill it with some testing data.

CREATE TABLE dbo.SameTable (id BIGINT, identityid INT IDENTITY(1,1), typeid INT)
INSERT INTO dbo.SameTable(id, typeid) SELECT object_id, 1 FROM sys.objects
INSERT INTO dbo.SameTable(id, typeid) SELECT object_id, 2 FROM sys.objects

Create indexes to help the query get the data as fast as possible.

CREATE CLUSTERED INDEX CX_identityid ON dbo.SameTable(identityid)
CREATE UNIQUE NONCLUSTERED INDEX CX_id ON dbo.SameTable(typeid, id) INCLUDE (identityid)

Run the query below, where dbo.SameTable is joined to dbo.WrongEstimationTable with an additional predicate on typeid. Show the actual execution plan on an older version of SQL Server, or switch off the new cardinality estimator with the hint shown in the query if you use a newer version.

SELECT *
FROM     dbo.WrongEstimationTable t
JOIN     dbo.SameTable s1 ON s1.id=t.id1 AND s1.typeid=1
JOIN     dbo.SameTable s2 ON s2.id=t.id2 AND s2.typeid=1
JOIN     dbo.SameTable s3 ON s3.id=t.id3 AND s3.typeid=1
JOIN     dbo.SameTable s4 ON s4.id=t.id4 AND s4.typeid=1
JOIN     dbo.SameTable s5 ON s5.id=t.id5 AND s5.typeid=1
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'))

You can see that the optimizer estimates that the empty table provides no data, so the estimated number of rows and executions on the inner side of the nested loops is very low – one, as expected.

Picture 01 – Execution plan – old optimizer

Picture 02 – Execution plan – old optimizer

Running the same query on SQL Server 2016, or on earlier builds of SQL Server 2017, leads to a very strange execution plan where the optimizer expects an increasing estimated number of rows in each branch of the nested loops, up to 1024 estimated rows, as you can see in the pictures below. In production this can prolong the execution time of your query dramatically.

Picture 03 – Execution plan – newer optimizer

Picture 04 – Execution plan – newer optimizer
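
If you want to check which estimator compiled a given plan, you can read the CardinalityEstimationModelVersion attribute from the plan XML: 70 means the legacy estimator, 120 and higher the new one. Below is a minimal sketch over the plan cache; the filter on the statement text is illustrative and assumes the demo query is still cached.

SELECT TOP (10)
       st.text AS query_text,
       qp.query_plan.value(
           'declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
            (//p:StmtSimple/@CardinalityEstimationModelVersion)[1]', 'int') AS ce_model_version -- 70 = legacy, 120+ = new
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE st.text LIKE '%WrongEstimationTable%'        -- our demo query
  AND st.text NOT LIKE '%dm_exec_cached_plans%'    -- exclude this diagnostic query itself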

As a solution you can:

  1. Rewrite the query
  2. Force the old behavior of the query optimizer with a hint – it comes with Cumulative Update 10 for SQL Server 2017 https://bit.ly/2r3YyDO
  3. Set the database compatibility level to 2008 (100)
  4. Switch off the new cardinality estimator and use the old functionality of the optimizer for cardinality estimation
  5. Whatever I forgot

1.

-- materialize only the typeid=1 rows; the optimizer gets accurate statistics on #tmp
SELECT id, identityid INTO #tmp FROM dbo.SameTable WHERE typeid=1
CREATE CLUSTERED INDEX CX_identityid ON #tmp(id)
CREATE UNIQUE NONCLUSTERED INDEX CX_id ON #tmp(id) INCLUDE (identityid)
SELECT *
  FROM dbo.WrongEstimationTable t
  JOIN #tmp s1 ON s1.id=t.id1
  JOIN #tmp s2 ON s2.id=t.id2
  JOIN #tmp s3 ON s3.id=t.id3
  JOIN #tmp s4 ON s4.id=t.id4
  JOIN #tmp s5 ON s5.id=t.id5

2.

SELECT *
FROM dbo.WrongEstimationTable t
JOIN dbo.SameTable s1 ON s1.id=t.id1 AND s1.typeid=1
JOIN dbo.SameTable s2 ON s2.id=t.id2 AND s2.typeid=1
JOIN dbo.SameTable s3 ON s3.id=t.id3 AND s3.typeid=1
JOIN dbo.SameTable s4 ON s4.id=t.id4 AND s4.typeid=1
JOIN dbo.SameTable s5 ON s5.id=t.id5 AND s5.typeid=1
OPTION (USE HINT('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_100'))

3.
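
A minimal sketch, assuming a hypothetical database named TestDb:

-- 100 = SQL Server 2008; under compatibility level below 120 the legacy estimator is used by default
-- note: the compatibility level affects more than just cardinality estimation
ALTER DATABASE TestDb SET COMPATIBILITY_LEVEL = 100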

4.

SELECT *
FROM dbo.WrongEstimationTable t
JOIN dbo.SameTable s1 ON s1.id=t.id1 AND s1.typeid=1
JOIN dbo.SameTable s2 ON s2.id=t.id2 AND s2.typeid=1
JOIN dbo.SameTable s3 ON s3.id=t.id3 AND s3.typeid=1
JOIN dbo.SameTable s4 ON s4.id=t.id4 AND s4.typeid=1
JOIN dbo.SameTable s5 ON s5.id=t.id5 AND s5.typeid=1
OPTION (USE HINT('FORCE_LEGACY_CARDINALITY_ESTIMATION'))
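
If you prefer to switch the new estimator off for the whole database instead of per query, SQL Server 2016 and later also offer a database scoped configuration; a minimal sketch:

-- run in the context of the affected database
ALTER DATABASE SCOPED CONFIGURATION SET LEGACY_CARDINALITY_ESTIMATION = ON
-- on older builds, trace flag 9481 forces the legacy estimator per query:
-- OPTION (QUERYTRACEON 9481)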

I was not able to simulate the situation described above on SQL Server 2017 with Cumulative Update 10 (https://bit.ly/35egiev), so it seems the issue was resolved.

When migrating or upgrading your current version of SQL Server to a higher version, you should check the descriptions of new features and cumulative updates on MSDN, and you should be prepared for scenarios where things get worse after the update.