Monday, June 20, 2011

SSIS FAQ


SSIS - SQL Server Integration Services



Q1: What is SSIS? How it related with SQL Server.

SQL Server Integration Services(SSIS) is a component of SQL Server which can be used to perform a wide range of Data Migration and ETL Operations. SSIS is a component in MSBI process of SQL Server.

This is a platform for Integration and Workflow applications. It is known for a fast and flexible OLTP and OLAP extensions used for data extraction, transformation, and loading (ETL). The tool may also be used to automate maintenance of SQL Server databases and multidimensional data sets.



Q2: What are the tools associated with SSIS?

We use Business Intelligence Development Studio (BIDS) and SQL Server Management Studio (SSMS) to work with Development of SSIS Projects.

We use SSMS to manage the SSIS Packages and Projects.



Q3: What are the differences between DTS and SSIS

Data Transformation Services
SQL Server Integration Services

Limited Error Handling
Complex and powerful Error Handling
Message Boxes in ActiveX Scripts
Message Boxes in .NET Scripting
No Deployment Wizard
Interactive Deployment Wizard
Limited Set of Transformation
Good number of Transformations
NO BI functionality
Complete BI Integration




Q4: What is a workflow in SSIS ?

Workflow is a set of instructions on to specify the Program Execution on how to execute tasks and containers within SSIS Packages.
.

Q4: What is the control flow?

A control flow consists of one or more tasks and containers that execute when the package runs. To control order or define the conditions for running the next task or container in the package control flow, we use precedence constraints to connect the tasks and containers in a package. A subset of tasks and containers can also be grouped and run repeatedly as a unit within the package control flow. SQL Server 2005 Integration Services (SSIS) provides three different types of control flow elements: Containers that provide structures in packages, Tasks that provide functionality, and Precedence Constraints that connect the executables, containers, and tasks into an ordered control flow.


Q5: What is a data flow?

A Data Flow consists of the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations.

The Data Flow Task is the executable within the SSIS package that creates, orders, and runs the data flow. A Separate instance of the data flow engine is opened for each Data Flow Task in a package.

Data Sources, Transformations, and Data Destinations are the 3 Important categories in the Data Flow



Q6: How does Error-Handling work in SSIS

When a DATA FLOW component applies a transformation to column data, extracts data from sources, or loads data into destinations, errors can occur. Errors frequently occur because of unexpected data values.

Type of typical Errors in SSIS:

- DATA Connection Errors, which occur in case the connection manager cannot be initialized with the connection string. This applies to both Data Sources and Data Destinations along with Control Flows that use the Connection Strings.

- DATA Transformation Errors, which occur while data is being transformed over a Data Pipeline from Sources to Destination.

- Expression Evaluation Errors, which occur if expressions that are evaluated at run time perform invalid.



Q7: What is environment variable in SSIS?


An environment variable configuration sets a package property equal to the value in an environment variable.

Environmental configurations are useful for configuring properties that are dependent on the computer that is executing the package.


Q8: What are the Transformations available in SSIS?


AGGREGATE:  It applies aggregate functions to Record sets to produce new output records from aggregated values.

AUDIT: Adds Package and Task level Metadata – such as  Machine Name, Execution Instance, Package Name, Package ID, etc.,

CHARACTER MAP - Performs SQL Server level makes string data changes such as changing data from lower case to upper case.

CONDITIONAL SPLIT – Separates available input into separate output pipelines based on Boolean Expressions configured for each output.

COPY COLUMN - Add a copy of column to the output we can later transform the copy keeping the original for auditing.

DATA CONVERSION - Converts columns data types from one to another type. It stands for Explicit Column Conversion.

DATA MINING QUERY – Used to perform data mining query against analysis services and manage Predictions Graphs and Controls.

DERIVED COLUMN - Create a new (computed) column from given expressions.

EXPORT COLUMN – Used to export a Image specific column from the database to a flat file.

FUZZY GROUPING – Used for data cleansing by finding rows that are likely duplicates.

FUZZY LOOKUP - Used for Pattern Matching and Ranking based on fuzzy logic.

IMPORT COLUMN - Reads image specific column from database onto a flat file.

LOOKUP - Performs the lookup (searching) of a given reference object set against a
data source. It is used for exact matches only.

MERGE - Merges two sorted data sets into a single data set into a single data flow.

MERGE JOIN - Merges two data sets into a single dataset using a join junction.

MULTI CAST - Sends a copy of supplied Data Source onto multiple Destinations.

ROW COUNT - Stores the resulting row count from the data flow / transformation into
a variable.

ROW SAMPLING - Captures sample data by using a row count of the total rows in
dataflow specified by rows or percentage.

UNION ALL - Merge multiple data sets into a single dataset.
PIVOT – Used for Normalization of data sources to reduce analomolies by converting
rows into columns

UNPIVOT – Used for demoralizing the data structure by converts columns into rows incase of building Data Warehouses.


Q9: How to log SSIS Executions?

SSIS includes logging features that write log entries when run-time events occur and can also write custom messages.

This is not enabled by default. Integration Services supports a diverse set of log providers and gives you the ability to create custom log providers.

The Integration Services log providers can write log entries to text files, SQL Server Profiler, SQL Server, Windows Event Log, or XML files.

Logs are associated with packages and are configured at the package level. Each task or container in a package can log information to any package log. The tasks and containers in a package can be enabled for logging even if the package itself is not.



Q10 : How do you deploy SSIS packages.

BUILDing SSIS Projects provides a Deployment Manifest File.

We need to run the manifest file and decide whether to deploy this onto File System or onto SQL Server[ msdb].

SQL Server Deployment is very faster and more secure then File System

Deployment. Alternatively, we can also import the package from SSMS from File
System or SQ Server.


Q11: What are variables and what is variable scope ?

Variables store values that a SSIS package and its containers, tasks, and event handlers can use at run time.

The scripts in the Script task and the Script component can also use variables. The precedence constraints that sequence tasks and containers into a workflow can use variables when their constraint definitions include expressions.

Integration Services supports Two types of variables: user-defined variables and system variables.

User-defined variables are defined by package developers.
System variables are defined by Integration Services.
You can create as many user-defined variables as a package requires, but you cannot create additional system variables.


Q12: Can you name five of the Perfmon counters for SSIS and the value they provide?

· SQLServer:SSIS Service
· SSIS Package Instances
· SQLServer:SSIS Pipeline
· BLOB bytes read
· BLOB bytes written
· BLOB files in use
· Buffer memory
· Buffers in use
· Buffers spooled
· Flat buffer memory
· Flat buffers in use
· Private buffer memory
· Private buffers in use
· Rows read
· Rows written

Wednesday, May 4, 2011

New Features in SQL Server 2008




New Features in SQL Server 2008

I want to start to introduce some of the new SQL Server 2008 technologies at a surface that will start to shape the industry.  The features are across just about every major component in the product suite so the benefits should be broad and perceived as beneficial not only from an IT perspective, but also from a user perspective.  This should ultimately yield a significant business benefit as SQL Server 2008 is adopted in 2008 and 2009.

Solution

Let's jump into each the of new  SQL erver 2008 product categories, features and provide references to some of these technologies as additional points of reference.
  • Security and Data Auditing
    • Transparent Data Encryption - This is encrypting the data while it is on disk and remains transparent to applications
    • External Key Management - This new functionality relates to consolidation of key management and integration with external products
    • Data Auditing - This is one core feature of SQL Server 2008 that will include a number of new features to include:
      • The introduction of first class 'AUDIT' objects
      • Auditing DDL (data definition language) commands
      • Support for multiple logging targets
  • Availability and Reliability
    • CPUs - Support for pluggable CPUs which means that a CPU can be added on the fly and recognized by SQL Server 2008 just like memory in SQL Server 2005
    • Database Mirroring - Enhanced Database Mirroring to include compression of mirror streams, enhanced performance and automatic page-level repair for the principal and mirror
  • Performance
    • Backup Stream Compression - The ability to configure compression with server level control or backup statement control of all backup types (full, differential, transaction log)
    • Performance Data Collection - When you are experiencing a performance issue the biggest problem is pinpointing the problem, so with SQL Server 2008 Microsoft is introducing a single common framework for performance related data collection, reporting, and warehousing
    • Improved Plan Guide Support - With SQL Server 2008 plans can be frozen for permanent query usage as well as pull plans directly from plan cache with SQL Server Management Studio integration
  • Management
    • Policy-Based Management Framework - The ability to manage objects via policies as opposed to traditional scripts with inherent monitoring and enforcement
    • Microsoft System Center - Integration with Microsoft System Centre which a product from Microsoft to improve operational costs
    • Extended Events - Another new feature is Extended Events which is a high performance yet light weight tracing infrastructure with insight into the core engine independent of SQL Trace
  • New Data Types
    • Date Time Data Type - The datetime data type will now be able to support the following:
      • Precision to the 100th nanosecond which is 7 digits past second
      • Time-zone datetime offset to translate the datetimes across numerous time zones
      • Rather than having to parse the datatime for just the date or just the time, now SQL Server 2008 will have date only support as well as time only support
    • HierarchyID - With the introduction of the HierarchyID data type this data type will be hierarchical-aware and will be accompanied by built-in functions, methods, etc. to support complex hierarchies in your data with .NET

  • Development Enhancements
    • Entity Data Model - With SQL Server 2008 will come the concept of a 'business entities' vs. tables, this will enable the ability to model more complex relationships as well as be able to retrieve entities as opposed to a result set of rows and columns
    • SQL Server Change Tracking - This feature provides the ability to have  Change Data Capture without a comparable value
    • Large UDT's - The 8000 byte limit is no longer applicable for on CLR-based UDTs and UDAs
  • Service Broker  
    • Interface - A new user interface and tools will be released for working with  Service  Broker in order to add, drop or edit Service Broker objects directly in  SQL  Server  Management Studio
    • Conversation Priority - The ability to set message ordering with a send and receive impact with levels one to ten
  • Data Storage
    • Data Compression  - Reference the Performance section above
    • FILESTREAM Attribute - With this feature get the best of both worlds with functionality from BLOBs in the DB vs. BLOBs on filesystem   

    • Integrated Full Text Search  - With SQL Server 2008 Full Text Search is fully integrated into the relational engine with no external storage, no external service as well as more efficient and reliable costing
    • Sparse columns - SQL Server 2008 has more efficient storage for 'wide' tables with many columns that repeat and do not contain data
    • New index types - New indexes include:
      • Spatial indexes
      • Hierarchical indexes
      • FILTERED indexes (indexes on filtered values within columns)

  • Data Warehousing/ETL
    • Partitioned Table Parallelism - This feature eliminates the one thread limit per partition
    • Data compression - Reference the Performance section above
    • Resource Governor - Reference the Performance section above
    • Persistent Lookups in SSIS - There is no longer a need for re-querying for lookup operators and cache lookups in multiple ways with the ability to persist lookups to disk
    • Improved thread scheduling in SSIS - This is accomplished by a shared thread  pool and pipeline parallelism
    • SQL Server Change Tracking - Reference the Development Enhancements section above
    • MERGE statement - The MERGE statement will add a great deal of value with  Slowly Changing Dimensions (SCD)
    • Scale-out analysis services - With  read-only storage multiple Analysis Services SQL Servers can be leveraged
    • Subspace computations
    • New tools for cube design
    • Best practice design alerting
    • Backup cubes with better scalability
    • Excel - Data-mining add-ins for Excel

  • Reporting
    • Reporting Services Deployment - IIS is no longer required to run Reporting Services
    • Rich-text support
    • Enhanced visualization graphing
    • Word - Reports can be rendered to Microsoft Word

  • Deprecation
    • Many of the 'old' features are removed to include:
Next Steps
  • With all of these new features comes the potential for vast changes in building and managing applications.  As you begin to research these new features consider how these new opportunities will drive change in your applications, users and ultimately your business.
  • The SQL Server 2008 CTP is now available - Download the SQL Server 2008 CTP.  As time permits, consider downloading and installing the latest version of SQL Server as a means to learn about the new features.