Tuesday, November 17, 2015

How to delete millions of records without performance issues?


Here I provide the solution (also demonstrated in a video). First, check the size of your database files, including the transaction log, so you can monitor the impact of the deletes:

SELECT file_id, name, type_desc, physical_name, (size*8)/1024 SizeinMB, max_size
FROM sys.database_files;

DECLARE @DeleteRowCnt INT;
SET @DeleteRowCnt = 1;

DECLARE @DeleteBatchSize INT;
SET @DeleteBatchSize = 4000;

WHILE (@DeleteRowCnt > 0)
BEGIN
    DELETE TOP (@DeleteBatchSize)
    FROM [dbo].[Customer]
    WHERE RegionCD = 'AS';

    SET @DeleteRowCnt = @@ROWCOUNT;
END
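On a busy system it can also help to pause briefly between batches so the transaction log can be cleared (by checkpoints in the SIMPLE recovery model, or by log backups in FULL) and other sessions are not blocked for long stretches. A minimal sketch of that variation, using the same hypothetical [dbo].[Customer] table:

DECLARE @DeleteRowCnt INT = 1;
DECLARE @DeleteBatchSize INT = 4000;

WHILE (@DeleteRowCnt > 0)
BEGIN
    DELETE TOP (@DeleteBatchSize)
    FROM [dbo].[Customer]
    WHERE RegionCD = 'AS';

    SET @DeleteRowCnt = @@ROWCOUNT;

    -- Short pause between batches: gives log clearing and other
    -- workloads a chance to catch up before the next batch.
    WAITFOR DELAY '00:00:01';
END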




Tuesday, June 23, 2015

What is Coming in SQL Server 2016 for Business Intelligence…?

Over the past few years we did not hear anything from Microsoft about SQL Server on-premises business intelligence enhancements. The extended cricket-chirping silence left us speculating about the future of Reporting Services, Analysis Services, and the other key technologies that power critical decision-making solutions today. Good news: the waiting and wondering is over. Microsoft recently announced significant SQL Server on-premises upgrades in numerous presentations at both the Build and Ignite conferences. Here are a few of my top take-away points from a BI professional's perspective.
 
Key Planned Investments
 
Starting with the development experience, SQL Server Data Tools (SSDT) and the Business Intelligence Development Studio (BIDS) project types will finally be truly unified in Visual Studio. The setup experience has been improved, along with the process for importing models from the designer and from Office 2016. There is also support for the Analysis Services Tabular scripting language and extended event-based monitoring in SQL Server Management Studio (SSMS).
 
Looking at the data mart or data warehouse side of things, SQL Server 2016 will include updatable nonclustered columnstore indexes that can be layered over in-memory or on-disk row stores, enabling operational analytics on live transactional data. Another incredible feature that was previously available only in the Microsoft Analytics Platform System is PolyBase. PolyBase will now ship out of the box with SQL Server 2016.
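As a rough illustration of what that operational analytics pattern looks like, here is a minimal sketch; the table and index names are hypothetical examples of mine, not from Microsoft's announcements. An ordinary row-store OLTP table gets an updatable nonclustered columnstore index, and analytical queries can then scan the columnar copy while inserts and updates continue against the row store.

-- Hypothetical OLTP table, used only to illustrate the syntax.
CREATE TABLE dbo.SalesOrder
(
    OrderID    BIGINT IDENTITY PRIMARY KEY,
    CustomerID INT            NOT NULL,
    OrderDate  DATE           NOT NULL,
    Amount     DECIMAL(10, 2) NOT NULL
);

-- Updatable nonclustered columnstore index layered over the row store.
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_SalesOrder
    ON dbo.SalesOrder (CustomerID, OrderDate, Amount);

-- An analytical aggregate that can be satisfied from the columnstore.
SELECT CustomerID, SUM(Amount) AS TotalAmount
FROM dbo.SalesOrder
GROUP BY CustomerID;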
 
PolyBase is a distributed query capability that allows users to query non-relational data in Hadoop, blobs, and files, together with existing relational data in SQL Server, without necessarily moving data. There are options for users to import Hadoop data for persistent storage in SQL Server or export aged relational data into Hadoop. PolyBase also provides the ability to access and query data that is either on-premises or in the cloud, for hybrid solutions over your data wherever it lives. This topic alone deserves a dedicated future article!
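To make that concrete, here is a hedged sketch of the T-SQL involved; every name, address, and file path below is hypothetical, PolyBase has to be installed and configured first, and the relational dbo.Customer table in the final query is assumed. An external table is declared over files sitting in Hadoop and can then be queried and joined like any other table, without moving the data.

-- Hypothetical Hadoop cluster and file layout.
CREATE EXTERNAL DATA SOURCE MyHadoop
WITH ( TYPE = HADOOP, LOCATION = 'hdfs://namenode:8020' );

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH ( FORMAT_TYPE = DELIMITEDTEXT,
       FORMAT_OPTIONS ( FIELD_TERMINATOR = ',' ) );

-- External table over web-click files stored in Hadoop.
CREATE EXTERNAL TABLE dbo.WebClicks_Ext
(
    ClickDate  DATE,
    Url        NVARCHAR(400),
    CustomerID INT
)
WITH ( LOCATION = '/data/webclicks/',
       DATA_SOURCE = MyHadoop,
       FILE_FORMAT = CsvFormat );

-- Join Hadoop data to relational data without moving it.
SELECT c.Url, COUNT(*) AS Clicks
FROM dbo.WebClicks_Ext AS c
    JOIN dbo.Customer AS cust
        ON cust.CustomerID = c.CustomerID
GROUP BY c.Url;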
 
Integration Services (SSIS) fans will be happy to know that the designer now supports targeting previous versions. Microsoft finally woke up and realized that in the real world there are many versions installed and being managed; rarely can a large enterprise upgrade all of its ETL to the latest and greatest version. Other SSIS enhancements include using Power Query as a data source, covering self-service-ETL-to-enterprise-ETL upgrade scenarios. High availability is also provided through compatibility with SQL Server AlwaysOn.
 
There is a new Azure Data Factory data flow task and new connectors for Azure Storage, Azure commandlets, OData Version 4, Hadoop File System (HDFS), JSON, and Oracle/Teradata connector V4 by Attunity. In 2016, SSIS also gets more usability improvements, incremental deployment options, custom logging levels, and package templates for ETL code reuse.
 
Most large data warehouse deployments also use master data management for customer, product and other master entities that have attributes that feed and get updated from numerous operational applications.
 
If you use SQL Server Master Data Services (MDS), you will be pleased to know that it also has been getting quite a bit of love in the 2016 builds.  You should notice improved performance when working with large models, added optional row-level compression per entity, a better administrative interface, and new configurable retention settings.
 
There are also new features for granular read, write, delete, and create security permissions. For your data stewards, the Master Data Services Add-in for Excel is now 15X faster and in 2016 will support bulk entity-based staging operations.

We are starting to see the Revolution Analytics R acquisition being baked into Microsoft's solutions. A warmly welcomed and long overdue enhancement in SQL Server 2016 is in-database R for predictive analytics, with the advanced analytical function calls made from T-SQL queries. In the past you may have used DMX queries with SQL Server Analysis Services, but those functions were limited to a handful of algorithms, added another layer of complexity and overhead (a separate Analysis Services instance), and have not seen investment for a few years. The new in-database R analytics is far better: it opens up a whole new world of analytical algorithms, and you can even import models built in Azure ML.
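For a flavor of how this looks, here is a minimal sketch; the dbo.SalesHistory table and its columns are hypothetical, and R Services must be installed with 'external scripts enabled' switched on via sp_configure. A tiny linear model is fit inside the database and its predictions come back as an ordinary result set.

EXEC sp_execute_external_script
    @language = N'R',
    @script = N'
        model <- lm(Amount ~ Qty, data = InputDataSet);
        OutputDataSet <- data.frame(predicted = predict(model, InputDataSet));',
    @input_data_1 = N'SELECT Qty, Amount FROM dbo.SalesHistory'
WITH RESULT SETS (( predicted FLOAT ));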
 
The popular, classic Reporting Services (SSRS) is also finally seeing real investment. In 2016, we will see a number of enhancements that enable you to create mobile-friendly reports with support for modern browsers on multiple platforms.
 
There will be native connectors for the latest versions of Microsoft data sources such as SQL Server and Analysis Services; third-party data sources such as Oracle Database, Oracle Essbase, SAP BW, and Teradata; and ODBC and OLEDB connectors for many more data sources.
 
New report themes and styles will make it easier to create modern-looking reports, while new chart types enable you to visualize your data in new ways. The 2016 release will also deliver greater control over parameter prompts and dynamic, parameterized report design options.
 
SQL Server 2016 makes several enhancements to Analysis Services including improvements in enterprise readiness, modeling platform, BI tools, SharePoint integration, and hybrid BI. SQL Server 2016 Analysis Services also provides functional parity with SharePoint vNext and Excel vNext.
 
In the area of enterprise readiness, enhancements include improved performance with unnatural hierarchies, relational OLAP distinct count, drill-through queries, processing and query process separation, and semi-additive measures. For Analysis Services Multidimensional mode, Database Console Commands (DBCC) will support detecting issues with multidimensional OLAP indexes. Additionally, Netezza will be available as a data source.
 
For early adopters of Analysis Services Tabular mode (SSAS Tabular), you will be relieved to know that it is getting an upgrade to become more enterprise-ready in 2016. There are query engine optimizations that enhance performance for DirectQuery, parallel partition processing, advanced modeling with bi-directional (many-to-many) cross filtering (already in Power BI Designer), new DAX functions (DATEDIFF, GEOMEAN, PERCENTILE, PRODUCT, XIRR, XNPV), and other performance improvements.

 

Monday, May 26, 2014

Do We Still Need Database Design in the Era of Big Data?


Many big data application implementations seem to begin with an existing data warehouse, one or more new high-volume data streams, and some specialized hardware and software to support data storage and business analytics. The data storage issue is often accommodated by installing a proprietary hardware appliance that can store huge amounts of data while providing extremely fast data access.

In these cases, do we really need to worry about database design?

Data Modeling in a Big Data Environment

Most database administrators agree: good database design is part of system and application design. There are many business requirements, such as data availability, purge processing, and application performance that are addressed using specific database design options.

What about big data?  Interestingly, vendors that supply hardware and software solutions for business analytics against big data claim that database design is less important. They argue that since the data is stored in their proprietary format, most database design considerations do not apply.

Confusion over this issue usually originates from misperceptions regarding how special solutions execute big data queries. Simply put, in most cases your data will be physically stored in two places: your current production database management system (DBMS) and the new special-purpose appliance. Your current production processes that extract, transform, and load data into the current DBMS continue to operate as-is, with an additional step: at every point that you load data into a table you will also ensure that the new data is loaded into the appliance as well.

Loading data into the appliance can be done immediately after your DBMS loads, or can be batched for later execution. The important thing is that data must be loaded into the appliance before any big data queries can utilize it for the advertised performance gains.

Database Design for Quality Assurance

What does a quality database design mean? Generally speaking, a database design starts with a model of the data and the business rules that define its relationships. For example, an order is always associated with a customer, and a customer may have zero, one, or many orders. Along with the data element definitions and attributes, the database design will address, cope with, or mitigate risk in the following ways:

  • Assists with defect prevention by automating valid-value checks on data elements (see the sketch after this list);
  • Permits defect detection and remediation during application construction and testing;
  • Moves data verification as close to the source as possible;
  • Provides stability, reliability, data accessibility, and system scalability.
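For instance, valid-value checks and relationship rules can live in the schema itself so that bad data is rejected at the source. A minimal sketch, with hypothetical customer and order tables:

CREATE TABLE dbo.Customer
(
    CustomerID INT     PRIMARY KEY,
    Status     CHAR(1) NOT NULL
        CONSTRAINT CK_Customer_Status CHECK ( Status IN ('A', 'I') )  -- only valid status codes allowed
);

CREATE TABLE dbo.CustomerOrder
(
    OrderID    INT  PRIMARY KEY,
    CustomerID INT  NOT NULL
        CONSTRAINT FK_Order_Customer REFERENCES dbo.Customer (CustomerID),  -- every order must belong to a customer
    OrderDate  DATE NOT NULL
);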

What Will the Database Designer do Differently?

A poor quality database design affects technical support the most. They are the ones that must deal with system problems in real-time. This drives up the cost of problem determination and problem resolution. It also manifests itself in product behaviors that may annoy ... or drive away customers. The most common problems associated with bad designs are poor application performance or data contention.

Typical fixes include database reorganizations or re-design, adding table indexes and changing table partitioning or clustering. However, in a big data environment, these options are typically not available in the appliance.  They will only exist in the base tables in the database proper. This is the crux of the matter: despite the vendor's claims that all your data can be moved into the appliance, this is seldom the best solution.

Having your data co-exist between the main database management system and the appliance is a best practice for several reasons.

Avoid single point of failure: The appliance is a single point of failure. Despite the best efforts of the vendor and your support staff, the hardware, software, network connections, or processes within the appliance may fail. If they do, how will queries be satisfied? With data co-located in the database management system, query results can be satisfied by accessing the base tables. Granted, performance may suffer; however, the alternative is that your big data application will be unavailable until someone fixes the problem.

Provide for data offloads: Queries are not the only consumers of your data. One common use is offloading production data to a test environment. In addition, some third-party vendor software tools access the data in the database natively and directly, which is not available in the appliance because it stores the data in a proprietary format.

Backup and recovery: Most common backup and recovery utilities are based on data residing in the database. Again, third-party vendor tools are commonly used for high-performance backups and recovery, including index recovery. These backups are executed against the base tables and table spaces, not against the appliance.

Certain performance situations: There are some situations in which SQL queries are not executable in the appliance. Such limitations are defined in the manuals, and vary across vendor appliances and versions. In these cases, you have no choice; you must access the base tables and accept the performance degradation. Some of these limitations involve specific SQL syntax such as scrollable cursors, dynamic SQL, use of multiple character encoding schemes, certain correlated table expressions, and the use of certain built-in functions.

Database Design Decisions for Big Data

Since you will be keeping your data in both the DBMS and in the appliance, your standard database design rules still apply.  Interestingly, some of the rules are now expanded or more complex due to the existence of the appliance.  Some of the considerations are as follows:

The need for indexes: Indexes serve multiple needs: they can enforce data element uniqueness, they can enforce referential integrity relationships, they define primary keys, and they define additional access paths. This last item is important. In a big data environment the idea is to push long-running queries into the appliance for high-speed processing. If certain indexes exist simply to provide alternative access paths, they may no longer be needed. Part of database design or re-design should involve a review of so-called performance indexes. If the index is no longer being used by queries it can be dropped, thereby saving disk space, processing time, and recovery time if the table data needs to be recovered.
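On the SQL Server side, one way to start that review is to compare each nonclustered index's read activity against the write cost of maintaining it. The usage counters reset at instance restart, so treat the output as a hint rather than proof; a hedged sketch:

-- Nonclustered indexes with no recorded seeks, scans, or lookups
-- since the last restart, but which are still maintained on every write.
SELECT OBJECT_NAME(i.object_id)  AS TableName,
       i.name                    AS IndexName,
       ISNULL(s.user_updates, 0) AS WritesMaintained
FROM sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS s
        ON  s.object_id   = i.object_id
        AND s.index_id    = i.index_id
        AND s.database_id = DB_ID()
WHERE i.type_desc = 'NONCLUSTERED'
  AND i.is_primary_key = 0
  AND i.is_unique_constraint = 0
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
  AND ISNULL(s.user_seeks + s.user_scans + s.user_lookups, 0) = 0;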

Removing appliance SQL limitations: Usually the business rules for the data determine portions of the database design. These include physical partitioning to permit faster queries and easier data purging, data element domain checking such as column constraints, and definition of primary and foreign keys to support referential integrity rules. Application developers then code SQL queries to access the data. In addition, users may have reporting tools that automatically generate SQL for queries and reports. Since the SQL query syntax and options will depend upon the database design, the designer needs to keep appliance limitations in mind.

Designing for high-speed appliance data load: Normal database load processes now contain an extra step: loading data into the appliance as well. How is this best accomplished? Depending upon your application and on your data volatility, you may wish to consider variations on the following:

  • Regular bulk load (daily, hourly) of the appliance, with the understanding that data there will not be completely current.
  • Trickle load, where row updates to the base tables are then propagated synchronously to the appliance. This keeps appliance data current, but row processing is much slower than bulk loading.

Summary

Big data and appliances have not removed the need for good database design.  Indeed, the designer has more things to consider: backup and recovery, index management, multiple methods of data access, and SQL limitations. The good news is that advances in database hardware and software can speed up data queries to a remarkable degree.


Friday, November 15, 2013

User Defined Function


User-Defined Functions in SQL Server.

Functions are basically a separate block of code that we write once and then call wherever it is needed.

SQL Server allows you to create functions, just as you create functions in a programming language. We can pass parameters to a function, and a function can return output. They also tend to be fast because their execution plans are cached and reused.

Consider the following table, prod_rep, as an example (Fig-1; a hypothetical script to recreate it is sketched below):-

 



Fig-1: sample data in the prod_rep table
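Based on the columns the functions below reference (id, dept, prod_year), a hypothetical script to recreate a small prod_rep table might look like this; the column types and sample values are assumptions, not the original data:

CREATE TABLE dbo.prod_rep
(
    id        INT PRIMARY KEY,
    dept      VARCHAR(100),
    prod_year INT
);

INSERT INTO dbo.prod_rep (id, dept, prod_year)
VALUES (1, 'Electronics', 2001),
       (5, 'Hardware',    2003),
       (7, 'Software',    2001);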

Types of User-Defined Functions in SQL Server

Based on the return type, user-defined functions fall into the following two categories.

Scalar Function

A scalar function returns a single value.

For Example:-

Suppose I have to create a function which returns the dept for a given id.

Script:

CREATE FUNCTION dbo.select_dept (@id INT)
RETURNS VARCHAR(100)
AS
BEGIN
    DECLARE @dept VARCHAR(100);
    SELECT @dept = dept FROM prod_rep WHERE id = @id;
    RETURN @dept;
END

To call the function:-

SELECT dbo.select_dept(5) AS dept;

The output of this code is as follows:-



Fig-2: result of the scalar function call

Table Valued Function

When we need to return a complete result set from a function, we create a table-valued function, which returns a table as its result.

For Example:-

In the following example I simply pass prod_year, and the function returns the complete result set for that year.

Script:

CREATE FUNCTION dbo.ret_table (@prod_year INT)
RETURNS TABLE
AS
RETURN
(
    SELECT * FROM prod_rep WHERE prod_year = @prod_year
);

To call the table-valued function:-

SELECT * FROM dbo.ret_table(2001);

The output is every row from prod_rep where prod_year = 2001.

 




Monday, October 28, 2013

Performance of the SQL MERGE vs. INSERT/UPDATE....?




MERGE is designed to apply both UPDATEs and INSERTs to a target table from a source table.  The statement can do both at once, or do only INSERTs or only UPDATEs.  One might even get the impression that INSERT and UPDATE are no longer needed.  Why not always use MERGE?
MERGE can also do DELETEs from the target table, but for today we won’t explore that case.
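For completeness, here is a quick sketch of that form (not benchmarked in this article), using the same #Source and #Target tables that are created in the next section; the extra WHEN NOT MATCHED BY SOURCE clause removes target rows that no longer exist in the source.

MERGE #Target t
    USING #Source s
    ON s.ID = t.ID
    WHEN MATCHED
        THEN UPDATE SET Value = s.Value
    WHEN NOT MATCHED BY TARGET
        THEN INSERT ( ID, Value ) VALUES ( s.ID, s.Value )
    WHEN NOT MATCHED BY SOURCE
        THEN DELETE;   -- rows present only in #Target are removed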
Sample Data and Some Basic Examples
To illustrate our case, let’s set up some very simplistic source and target tables, and populate them with some data that we can demonstrate with.
CREATE TABLE #Target
    (
      ID BIGINT PRIMARY KEY
    , Value INT
    );
CREATE TABLE #Source
    (
      ID BIGINT PRIMARY KEY
    , Value INT
    );
INSERT INTO #Target
    VALUES  ( 1, 2342 ),
            ( 2, 345 );
INSERT INTO #Source
    VALUES  ( 1, 975 ),
            ( 3, 683 );
When we MERGE into #Target, our matching criteria will be the ID field, so the normal case is to UPDATE like IDs and INSERT any new ones like this:
-- Standard MERGE of all #Source rows into #Target

MERGE #Target t
    USING #Source s
    ON s.ID = t.ID
    WHEN MATCHED
        THEN     UPDATE
            SET Value = s.Value
    WHEN NOT MATCHED
        THEN     INSERT
                ( ID, Value )
            VALUES
                ( s.ID
                , s.Value
                );
SELECT *
    FROM #Target;

This produces quite predictable results that look like this:
ID   Value
1    975
2    345
3    683

Let’s change the values in our #Source table, and then use MERGE to only do an UPDATE.

-- Change the values of our source rows

UPDATE #Source
    SET Value = CASE ID
                  WHEN 1 THEN 555
                  WHEN 3 THEN 999
                END

-- MERGE that only updates 

MERGE #Target t
    USING #Source s
    ON s.ID = t.ID
    WHEN MATCHED
        THEN     UPDATE
            SET Value = s.Value;
SELECT *
    FROM #Target;

The results now in #Target are:

ID   Value
1    555
2    345
3    999

Finally, we know we can also use MERGE to replace INSERT by omitting the MATCHED clause.  Let’s INSERT a new row into #Source and do this.

-- Insert a new row into our source 

INSERT #Source
    VALUES ( 4, 242 );

-- MERGE that only inserts 

MERGE #Target t
    USING #Source s
    ON s.ID = t.ID
    WHEN NOT MATCHED
        THEN     INSERT
                ( ID, Value )
            VALUES
                ( s.ID
                , s.Value
                );
SELECT *
    FROM #Target;

Unsurprisingly, the results now in #Target are these:

ID   Value
1    555
2    345
3    999
4    242

Sorry if you know all this stuff and I’ve bored you, but we needed to get these basics out of the way.
Exploring the Performance of MERGE

To effectively test the performance of our alternatives, we’ll need to set up a test harness with a non-trivial number of rows in our #Source and #Target tables.  You can open the Test Harness 1.sql file in the resources section of this article and follow along.  The basic set-up data is as follows.
TRUNCATE TABLE #Source;
TRUNCATE TABLE #Target;

WITH    Tally ( n )
          AS ( SELECT TOP 1000000 ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL
                                                              ) )
                FROM sys.all_columns a
                    CROSS JOIN sys.all_columns b
             )
    INSERT INTO #Target
            SELECT 2 * n
                  , 1 + ABS(CHECKSUM(NEWID())) % 1000
                FROM Tally;
WITH    Tally ( n )
          AS ( SELECT TOP 1000000 ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL
                                                              ) )
                FROM sys.all_columns a
                    CROSS JOIN sys.all_columns b
             )
    INSERT INTO #Source
            SELECT CASE WHEN n <= 500000 THEN 2 * n - 1
                        ELSE 2 * n
                   END
                  , 1 + ABS(CHECKSUM(NEWID())) % 1000
                FROM Tally;
We’ve purposely set up our source table so that the INSERTs it will do when merged with the target are interleaved with existing records for the first 500,000 rows.
Using SQL Profiler, we’ll compare two identical query scripts:
-- MERGE 
MERGE #Target t
    USING #Source s
    ON s.ID = t.ID
    WHEN MATCHED
        THEN     UPDATE
            SET Value = s.Value
    WHEN NOT MATCHED
        THEN     INSERT
                ( ID, Value )
            VALUES
                ( s.ID
                , s.Value
                );

-- TRUNCATE and re-populate Source and Target tables
-- UPDATE/INSERT

BEGIN TRANSACTION T1; 

UPDATE t
    SET Value = s.Value
    FROM #Target t
        JOIN #Source s
        ON s.ID = t.ID;
INSERT INTO #Target
        SELECT s.ID
              , s.Value
            FROM #Source s
                LEFT JOIN #Target t
                ON s.ID = t.ID
            WHERE t.ID IS NULL;
COMMIT TRANSACTION T1;

Both of these INSERT 500,000 rows and UPDATE 500,000 rows, the latter enclosed in a TRANSACTION.  We omit error handling and a possible ROLLBACK of the transaction for simplicity (a sketch of that wrapper appears after the results).  The results that we obtained running the test harness in SQL Profiler (5 runs) are as follows:

Query           CPU     Reads    Writes  Duration
MERGE           4492    4513786  2578    4864
INSERT/UPDATE   3588    3072489  5496    3847
   
MERGE           4820    4514303  2899    5253
INSERT/UPDATE   3572    3072979  4007    4035
   
MERGE           4462    4513793  2571    4770
INSERT/UPDATE   3635    3072463  5497    3829
   
MERGE           4524    4513795  2931    4800
INSERT/UPDATE   3588    3072474  5505    3665
   
MERGE           4648    4513814  2939    4955
INSERT/UPDATE   3479    3072491  5522    3716
These indicate that MERGE took about 28% more CPU and 29% more elapsed time than the equivalent INSERT/UPDATE.  Not surprising considering all the complexity that MERGE must handle, but possibly forgivable for the additional safety, convenience and simplicity it represents (no need for a TRANSACTION with error handling).
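For reference, here is one way the INSERT/UPDATE pair above could be wrapped with error handling so that a failure rolls back both statements. This wrapper was not part of the timed runs and adds a little overhead of its own.

BEGIN TRY
    BEGIN TRANSACTION T1;

    UPDATE t
        SET Value = s.Value
        FROM #Target t
            JOIN #Source s
            ON s.ID = t.ID;

    INSERT INTO #Target
            SELECT s.ID
                  , s.Value
                FROM #Source s
                    LEFT JOIN #Target t
                    ON s.ID = t.ID
                WHERE t.ID IS NULL;

    COMMIT TRANSACTION T1;
END TRY
BEGIN CATCH
    -- Undo both statements if either one fails, then rethrow the error.
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
    THROW;
END CATCH;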
Using MERGE as a Substitute for INSERT
To test the performance of MERGE as a substitute for INSERT, the same test harness that set up the #Target table can be used, but we’ll change the set up for the #Source table as follows.

WITH Tally (n) AS
(
    SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
INSERT INTO #Source
SELECT CASE WHEN n <= 500000 THEN 2*n-1 ELSE 2000000+n END
    ,1+ABS(CHECKSUM(NEWID()))%1000
FROM Tally;

A quick check of the row counts generated from Test Harness #2.sql confirms that both MERGE and INSERT insert exactly 1,000,000 rows.  The results from a Profiler trace using these queries give us this:

Query           CPU     Reads    Writes Duration
MERGE           5054    6024150  2397   5576
INSERT          4992    6248001  7563   5507
   
MERGE           5226    6024165  3868   5529
INSERT          5383    6248005  7571   6298
   
MERGE           5257    6023557  3689   5473
INSERT          4851    6247403  7431   5546
   
MERGE           5273    6023589  2167   5662
INSERT          4914    6247440  7427   5281
   
MERGE           5179    6024619  1426   5476
INSERT          5039    6248483  6211   5954

In this case, CPU and elapsed time are probably too close to call.  CPU usage was about 3% more for the MERGE and elapsed time was about 3% less for the MERGE.
Using MERGE as a Substitute for UPDATE

Once again, to test this we’ll change the test harness only for the #Source table so that it creates the rows with exactly the same IDs as the #Target table.

WITH Tally (n) AS
(
    SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
INSERT INTO #Source
SELECT 2*n
    ,1+ABS(CHECKSUM(NEWID()))%1000
FROM Tally;

After confirming that both MERGE and UPDATE update 1,000,000 rows, our Profiler trace from running Test Harness #3.sql gave us these results.

Query           CPU     Reads   Writes  Duration
MERGE           1903    7982    2568    2010
UPDATE          1763    7954    2568    1840
   
MERGE           1904    7986    2576    2303
UPDATE          1809    7955    2560    1974
   
MERGE           1903    7968    2576    1951
UPDATE          1763    7940    2568    2005
   
MERGE           1918    7957    2568    2009
UPDATE          1731    7464    2584    1809
   
MERGE           1903    8005    2560    2023
UPDATE          1732    7977    2584    2063

Our MERGE was just about 8% more costly in CPU and 6% more in elapsed time.  The advantage probably goes to the UPDATE for this case.
Conclusions
While the MERGE statement is a wonderfully useful query syntax that does offer some safety features over the traditional approach of UPDATE then INSERT, it does come with a slight performance penalty.  Certainly the performance characteristics may vary depending on your specific circumstances, so this article should only be treated as a guideline.  While I will probably continue to use MERGE for the convenience it offers in the combined INSERT/UPDATE case, it is unlikely I'd consider using it as a replacement for INSERT alone or for UPDATE alone.