Friday, March 4, 2011

Checkpoint, ErrorHandling, and Transactions in SSIS




Friday, March 5, 2011

Checkpoints,ErrorHandiling, and Transactions in SSIS

Error Handling:

To use error paths, you will need to configure the error output. There are three error-handling options for handling errors in the data flow components:
• Setting the error output to Fail Transformation will cause the data flow to fail if an error is encountered.

• Using the Ignore Failure option will allow the row to continue out the normal green data path, but the value that resulted in the error is changed to a NULL in the output.

• Setting the error output to Redirect Row will send the error row out the red error path; this is the only way to handle errors with separate components.

Fig 1

Check Points:

What does it do…?

With Checkpoints enabled on a package it will save the state of the package as it moves through each step or task and place it in a XML file upon failure of the package. If your package does fail you can correct the problem in your package and rerun from the point of the tasks that did not successfully run the first time. Once the package completes successfully the file is no longer needed and automatically discarded.
How does this benefit you?                             .                                     

Just imagine your package is loading a table with 10 million records. Your package passes the Data Flow that performs this huge load without any problem (Other than the fact that it took two hours to load). The next task in your package is a Send Mail Task and for some reason fails.

You correct the problem in the Send Mail Task, but without using Checkpoints your package would still have to run that Data Flow that loads the 10 million records again (taking another two hours) even though you’ve already done it once. 
If you had enable Checkpoints on this package you could simply correct the problem in the Send Mail Task and then run the package again starting at the Send Mail Task. Sounds great right….?
Example Overview:
  •          Use Three Execute SQL Task using the AdventureWorks2009 (It can really be any database for this example) database as a connection manager.
  •          Configure the package to handle Checkpoints
  •          Configure the individual tasks to handle Checkpoints

Step 1: Configure Execute SQL Tasks
  •          Drag three Execute SQL Tasks on your Control Flow
  •          Use any database for the Connection property on all three tasks.
  •          Configure Execute SQL Task SQL Statement property: Select 1
  •     Configure Execute SQL Task 1 SQL Statement property: Select A (Set to intentionally fail)
  •    Configure Execute SQL Task 2 SQL Statement property: Select 1

Step 2:  Configure Package to enable Checkpoints:
  •    Open the properties menu at the package level (Just open properties in the Control Flow without any task or connection manager selected)
  • ·         Change the properties CheckpointFileName: c:Checkpoint.xml (You can use .txt if you want to open it in notepad)
  • ·         Change the properties CheckpointUsage: IfExists
  • ·         Change the properties SaveCheckpoints: True


Step 3:  Configure Each Task

  • ·         Select each task individually and open the properties menu at the task level (Just click the task once then hit F4)
  • ·         Change the FailPackageOnFailure property to True

Step 4:  Run the Package
  • ·         Run the package and you will see the package fail on the second task
  • ·         This also created the file c:\Checkpoints.xml.
  • ·         You could also save this file with .txt extension and just view in regular notepad and it still works as a Checkpoint.
  • ·         If you run the package a second time it will skip the first task that was successful and start right at the second task.
Step 5: Correct the Problem and Rerun Package
  •  Open the Execute SQL Task 2 and configure the SQL Statement property: Select
  • The package has now completed and skipped the first step which already succeeded. Imagine if that first step would normally take two hours to run!
Defining Package and Task Transaction Settings.
You can set package transactions at the entire package level or at any control flow container level or task level. Transactions in SSIS use the Windows Distributed Transaction Coordinator (DTC): the DTC service needs to be started on the machine for transactions to work. Any service or program that is enabled to work with the DTC can be part of a transaction in SSIS.
To enable a transaction within a package, you need to set the TransactionOption property of the task or container to Required. Figure highlights the properties of a package at the control flow level, which means they apply to the package as a whole. The TransactionOption property is the same on any control flow object.

A task's or container's TransactionOption property must be set to Required to enable a transaction within a package
When deciding whether and how to implement a transaction, follow these guidelines:

• For transactions to be enabled in SSIS, you need to turn on the DTC service, and the tasks that you want to be part of the transaction must work with the DTC service natively.

• If a series of tasks must be completed as a single unit, in which either all the tasks are successful and committed or an error occurs and none of the tasks are committed, then place the tasks within a Sequence Container, and then set the TransactionOption property of the container to Required.
• A task can inherit the transaction setting of its parent when the TransactionOption property is set to Supported, which is the default setting when creating a task or container.

• You can prevent a task from participating in a transaction by setting its TransactionOption setting to NotSupported.
• Transactions work at the control flow level and not within a data flow. This means that you can turn on a transaction for a data flow task, but you cannot turn it on separately for selected components within the data flow; either the entire data process will be successful or it will be rolled back.
Implementing Restartability Checkpoints:

At times, especially if you are working with complicated or long-running packages, you will want the ability to restart a package if it fails and have it start at the point of failure. In other words, you might not want the tasks that were already successful to be run again if the package is restarted. This can be done by enabling checkpoints in the package.
Enabling restartability within a package requires, first, enabling a package to use checkpoints, and, second, setting the specific tasks and containers to write checkpoints. To turn on checkpoints within a package, follow these steps:
1.    Within the package, open the Properties window, and then select the Control Flow tab, which will reveal the properties of the package.
 2.     Set the SaveCheckpoints property at the package level to True. This allows checkpoints to be saved during package execution.
 3.    In the CheckpointFileName property, provide a valid path and file name to a checkpoint file. Packages use files to maintain their state information, so if a package fails and is then restarted, the package can read the checkpoint file to determine where it left off and to track the state information at the last successful task.
      4.   Set the CheckpointUsage to IfExists, which causes the package to run from the beginning if  file is not present or to run from the identified point if the file exists. 
PrPractice: Implementing Package and Task Transactions

In this practice, you will turn on transactions at the container level and observe the result when a task fails within the container.

Exercise 1: Enabling Transactions
1 Navigate to the Control Panel/Administrative Tools/Services Console, and then start the Distributed Transaction Coordinator service.
2.    Open the SSIS project. Open the respective package.
3.    From the View Menu in the menu bar, open the Properties window, and then click the pushpin in the Properties window to lock the window in the open position.
4.    In the Control Flow Designer, select the Sequence Container (by clicking on it), and then note the Sequence Container properties listed in the Properties window.
5.    Set the TransactionOption to Required using the drop-down list.
6.    Save the package by clicking the Save icon in the toolbar.

Exercise 2:  Observing a Transaction Rollback

1.    With the package from Practive 1 still opened, expand the Sequence Container, and drag and drop a new Execute SQL Task to the bottom of the Sequence Container workspace.
 2.  Connect the green precedence arrow from the Data Flow to the new Execute SQL Task by dragging the green arrow from the bottom of the Data Flow onto the Execute SQL Task.
3.    Edit the Execute SQL Task by double-clicking on the task. In the Execute SQL Task Editor, change the Connection property to the AdventureWorks connection.
4.    Change the Name property within the Execute SQL Task Editor to Force Failure.
5.    Select OK in the Execute SQL Task Editor to return to the Control Flow.
6.    With the Force Failure Execute SQL Task still selected, open the Properties window and change the ForceExecutionResult property to Failure. By setting this property to Failure, you are specifying that the task should fail intentionally, which is something you might do for testing pruposes.
7.    Open a new database query, in SSMS, against the AdventureWroks database.
8.   Run the following SQL Statement and observe the results: SELECT COUNT(*) FROM Sales_Summary.
 9.    In the Business Intelligence Development Studio (BIDS), execute the SSIS package you just modified, which will intentionally fail at the last step.
10.  Stop the package execution, and rerun the query from step 8.
11.  Observe that even though the data flow was successful, the data was rolled back because the Sequence Container was configured with a transaction, and the last task within the Sequence Container failed.

Quick Check:
1.    You add a sequence container to a package that contains several tasks, one of which calls a command on a legacy system and another of which a Data Flow Task imports data into SQL Server. Even with DTC started and transactions turned on, your sequence container fails before the tasks even run. What is the problem…?

2.    What do you need to set in order to use checkpoint properties at the package level, after you have turned the checkpoint properties on…?

Quick Check Answers:
1.    The transactions featured in SSIS use the DTC service. However, not all systems support DTC, and a transaction cannot be forced on a non-complaint system, so the container will fail. You should remove the legacy task from the sequence container that has the transaction.
2.    You need to set the FailPackageOnFailure property to True for tasks to write to the checkpoint file. However, if you want to rerun any successful tasks that occur before the failure task, you need to use a sequence container around the group of related tasks that require transactions.
Handling Package Errors with Event Handlers
In the data flow, using data viewers provides the ability to easily debug problems while processing data. The control flow, however, is different because the focus is on workflow and execution rather than on data and transformations. Leveraging the capabilities in Visual Studio, the control flow supports visual debugging and breakpoint features. Let's first look at the event handlers that SSIS provides; then we will explore the debugging capabilities in the control flow.

Event Handlers
SSIS provides the ability to listen for certain execution events and perform other operations when an event happens (depending on the execution event). For example, if an error happens, the error event handler can send an alert or potentially fix a data problem. Event handlers use the control flow paradigm for workflow processing, which includes all the same control flow tasks and containers that are found in the toolbox of the control flow.

You can define zero, one, or more than one event handler for a package. To add an event handler to a package, you need to select the Event Handler tab in the package designer. Creating a new package event handler requires that you select the executable and the event handler event, as below Figure shows.
Figure: Selecting the executable and event handler event for a package


The executable is the task or container scope that the event will fire. You can also choose the package itself (the highest-level container) as the executable for an event. The event handler event is the actual event that causes the event workflow to execute. The following table includes the package event handler types:

In addition, event handlers assigned to an executable scope will propagate down to child events when the event fires. If an event is assigned to a container, the child executables include the tasks and containers that are embedded within the parent container. This means that if you assign an OnError event to the package and an OnError event occurs at a task, the event handler would fire for both the task and the package (and for any containers in between). You would use an event handler for tracking error details, for sending failure messages in emails, and for implementing manual rollback logic.
BEST PRACTICES  Capturing error information with the OnError Event.
Each package contains a set of system variables that are updated for the various levels in the package during the package execution. With Event Handlers, you can capture these variables and values, which provide contextual information, such as the ErrorCode, ErrorDescription, and SourceName (the task) when the event fires.
Using event handlers is a great way to track package execution,m they can be used to audit the execution, capturing the errors that occur in a task. In addition, the event handler Send Mail Task can be used for notification; for example, it can notify an administrator of a certain predefined condition that requires a special response.
Exam Tip
Event handlers can be turned off for any task or container by setting the Disable-EventHandlers property of the Task or Container to True. In other words, if you have an event handler defined, but you specifically do not want it to be invoked for a specific task, then you can turn off event handlers for that task only.
Debugging the Control Flow with Breakpoints
Package debugging lets you know what is going on during the execution of a package in the designer so that you can troubleshoot or validate processing logic. Control flow debugging involves setting breakpoints in the package, which will pause the control flow execution so that you can observe the execution state. SSIS takes advantage of the breakpoint functionality that comes with Visual Studio, which means you have the capabilities to view execution information about the package when you execute a package in the designer.  
BEST PRACTICES   Breakpoints work in control flow only.
Breakpoints functions in the control flow but not in the data flow. For scripting, this means that you can set breakpoints only in a control flow script Task and not in a data flow Script Component.
To set a breakpoint, highlight the task or container, and either press F9 or navigate to the Debug/Toggle Breakpoint menu. You can set multiple breakpoints in a package, and you can embed a breakpoint within a Script Task at a line of code.  The below fir shows a package that is running but is paused at execution.





Fig: A Package that is running but paused.

In this screen, the arrow next to the breakpoint icon indicates which task the package is currently waiting to run. When you are paused in the debug environment, you can do the following things to help troubleshoot your package:

• Open the Locals window to see all the variable values and the package status. You can find this window in the Debug toolbar, next to the package execution selections. If you have several variables in a package that you actively use to control logic and precedence, you can use a breakpoint to pause the execution, allowing you to troubleshoot variable values handling before the package execute completes.

• When you have completed working during a pause and are in a Script Task, you can continue the execution of the script and package to the next breakpoint by either clicking the Continue button in the Debug toolbar or by pressing F5. Alternatively, you can stop the package from continuing by clicking the Stop button on the toolbar. Breakpoints in a Script Task are very useful because they help you validate the code logic and branching that you are performing with the script.

• When the package is paused on a task (as opposed to within the code of a Script Task as described in the prior bullet) in the control flow, you can also continue running the package to completion (or to the next breakpoint) by selecting the Continue button on the Debug toolbar or by pressing F5.  

• Stopping the package during execution, whether the package is paused at a breakpoint or executing, is handled by clicking the Stop button in the toolbar or by pressing the Shift and F5 keys at the same time.

In all, breakpoints are powerful tools for resolving errors and validating functionality. Combined with data views in the data flow, they provide comprehensive debugging support during your package development.

Practice: Identifying Data Flow Errors

In this practice, you will add an error flow path and then identify the error by observing the rows in the output through a data viewer.

Exercise: Adding Error Flow Paths

1.    In SSMS, create a new database query connected to the AdventureWorks sample database and run the following code:

USE AdventureWorks
GO
TRUNCATE TABLE dbo.Sales_Summary
GO
ALTER TABLE dbo.Sales_Summary WITH CHECK ADD CONSTRAINT CK_Summary_ProductLine
CHECK ((upper([ProductLine]) = ‘R’ OR upper([ProductLine]) = ‘M’ OR upper([ProductLine]) = ‘T’))
GO


2.    The above Code uses the Sales_Summary table.

3.    Open the package that you modified in the previous practices

4.    Navigate to the Data Flow designer and open the OLE DB Destination adapter.

5.    In the Data Access Mode drop-down list, change the selection to Table OR View, and then verify that the Sales_Summary table is still selected in the Name Of The Table Or The View drop-down list.

6.    Click OK in the OLE DB Destination Editor to return to the package designer.

7.    Right-Click in the Data Flow workspace, and then choose Execute Task from the drop-down list to run Data Flow Task. Before stopping the package, navigate to the Progress tab and observe that the OLE DB Destination failed because of a constraint violation.

8.    Stop the package to return to design mode.

9.    From the SSIS menu in the menu bar, select Variables; this will open the Variable window.

10.  Select the leftmost icon in the Variable window toolbar to create a new variable for the package and name the variable ErrorCount

11.  Open the toolbar, and then drag a Row Count transformation onto the Data Flow work space.

12.  Highlight the OLE DB Destination adapter, and then drag the red error output path and drop it on the Row Count transformation.

13.  When the Configure Error Output window appears, change the value in the Error Column drop-down list to Redirect Row, and then click OK to return to the Data Flow designer.

14.  Open the Row Count transformation, and then change the VariavleName property to

15.  User: ErrorCount.

16.  Click OK in the Row Count Editor to return to the Data Flow.

17.  Right-Click the red error path and then select Data Viewers from the drop-down list 16. Choose Add in the Data Flow Path Editor, highlight Grid, and click OK in the Configure.

18.  Data Viewer window and OK in the Data Flow Path Editor.

19.  Right-click in the Data Flow designer workspace, and select Execute Task from the drop-down list.

20.  A new Data Viewer will appear, which will reveal the OLE DB Destination Error Output.

21.  Observe that the ProductLine column for all the error rows in S, which violates the defined constraint that you created in Step 1.

22.   Select Detach in the Data Viewer window, and then stop the package execution.

23.  In SSMS, run the following database query, which adds S as valid value of the ProductLine Column in the Sales_Summary table:

USE AdventureWorks
GO
ALTER TABLE dbo.Sales_Summary
DROP CONSTRAINT CK_Summary_ProductLine
GO
ALTER TABLE dbo.Sales_Summary WITH CHECK ADD CONSTRAINT CK_Summary_ProductLine
CHECK ([ProductLine]) =’T’  OR upper([ProductLine]) = ‘S’)) GO

24.   Return to BIDS and rerun the Data Flow, observing that the OLE DB Destination is now successful, with no rows being routed to the error path output or data viewer.

Quick Check:

1.  A Data Conversion transformation is failing in the middle of the data flow execution, and you need to determine what is causing the error. How should you proceed?

2.  Your package contains a string variable that you are updating, using a Script Task, to be a file path and file name. Your package is failing at a File System Task that is configured to use the variable to move the file to a different folder on the server. How do you troubleshoot the package…?

3.You would like to log all the package errors to a custom database table you have created for auditing purposes. How can you accomplish this task………………?

Quick Check Answers

1. To determine what is causing the error, configure the Data Conversion transformation error path to Flat File so that any rows that are failing conversion are sent out to a file. Then, create a data viewer on the error path, and run the package in BIDS. This technique will capture the errors in a file and display the rows in the designer for troubleshooting.

2. Because the Script Task can contain embedded breakpoints in the code, set a breakpoint in the script so that you will be able to execute the package and step through the lines of code, observing the value of the variable to check the code and accuracy.

3. By using the OnError event handler assigned to the package level, you can also use an Execute SQL Task that calls a stored procedure, passing in the SourceName and ErrorDescription variable values. The procedure can then track these details into a metadata storage table for auditing.