SSIS and Sql Server Journey

Personal Notes of Sarabjit Singh and Bhavpreet Singh

Find the table to which a given column belongs – Sql Server

SELECT * FROM sys.columns WHERE NAME LIKE '%ColumnName%'

Result Set will contain object_ids in which the “ColumnName” exists. Use following queries to get the corresponding object names.

SELECT * FROM sys.objects WHERE object_id in (39879409,

More civilized code goes here;)

 SELECT * FROM sys.objects O
  INNER JOIN sys.columns C
  ON O.object_id = C.object_id 
  WHERE C.NAME LIKE '%ColumnName%'

Leave a comment »

Clustered and Non Clustered Indexes

Clustered Index

A clustered index is a special type of index that reorders the way records in the table are physically stored. Therefore table can have only one clustered index. The leaf nodes of a clustered index contain the data pages. A non-clustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a non-clustered index does not consist of the data pages. Instead, the leaf nodes contain index rows

  • Only 1 allowed per table
  • Physically rearranges the data in the table to conform to the index constraints
  • For use on columns that are frequently searched for ranges of data
  • For use on columns with low selectivity

Non-Clustered Index

A nonclustered index is analogous to an index in a textbook.

Non-clustered indexes are created outside of the database table and contain a sorted list of references to the table itself.
However, it’s important to keep in mind that non-clustered indexes slow down the data modification and insertion process, so indexes should be kept to a minimum

  • Up to 249 allowed per table in SQL 2000,2005 and upto 999 allowed in SQL 2008
  • Creates a separate list of key values with pointers to the location of the data in the data pages
  • For use on columns that are searched for single values
  • For use on columns with high selectivity

In following example column FileName is varchar(400), which will increase the size of the index key bigger than it is allowed. If we still want to include in our cover index to gain performance we can do it by using the Keyword INCLUDE.

USE AdventureWorks
CREATE INDEX IX_Document_Title
ON Production.Document (Title, Revision)
INCLUDE (FileName)

Non-key columns can be included only in non-clustered indexes. Columns can’t be defined in both the key column and the INCLUDE list. Column names can’t be repeated in the INCLUDE list. Non-key columns can be dropped from a table only after the non-key index is dropped first. For Included Column Index to exist there must be at least one key column defined with a maximum of 16 key columns and 1023 included columns.

Avoid adding unnecessary columns. Adding too many index columns, key or non-key as they will affect negatively on performance. Fewer index rows will fit on a page. This could create I/O increases and reduced cache efficiency. More disk space will be required to store the index. Index maintenance may increase the time that it takes to perform modifications, inserts, updates, or deletes, to the underlying table or indexed view.

Another example to test:
Create following Index on Database AdventureWorks in SQL SERVER 2005

USE AdventureWorks
ON Person.Address (PostalCode)
INCLUDE (AddressLine1, AddressLine2, City, StateProvinceID)

Test the performance of following query before and after creating Index. The performance improvement is significant.

SELECT AddressLine1, AddressLine2, City, StateProvinceID, PostalCode
FROM Person.Address
WHERE PostalCode BETWEEN '98000'
AND '99999';

One of the hardest tasks facing database administrators is the selection of appropriate columns for non-clustered indexes. You should consider creating non-clustered indexes on any columns that are frequently referenced in the WHERE clauses of SQL statements. Other good candidates are columns referenced by JOIN and GROUP BY operations.

You may wish to also consider creating non-clustered indexes that cover all of the columns used by certain frequently issued queries. These queries are referred to as “covered queries” and experience excellent performance gains.
SQL Server provides a wonderful facility known as the Index Tuning Wizard which greatly enhances the index selection process. To use this tool, first use SQL Profiler to capture a trace of the activity for which you wish to optimize performance. You may wish to run the trace for an extended period of time to capture a wide range of activity. Then, using Enterprise Manager, start the Index Tuning Wizard and instruct it to recommend indexes based upon the captured trace. It will not only suggest appropriate columns for queries but also provide you with an estimate of the performance increase you’ll experience after making those changes!

When Non Clustered Indexes should be used:

Before you create nonclustered indexes, understand how your data will be accessed. Consider using nonclustered indexes for:

  • Columns that contain a large number of distinct values, such as a combination of last name and first name (if a clustered index is used for other columns). If there are very few distinct values, such as only 1 and 0, most queries will not use the index because a table scan is usually more efficient.
  • Queries that do not return large result sets.
  • Columns frequently involved in search conditions of a query (WHERE clause) that return exact matches.
  • Decision-support-system applications for which joins and grouping are frequently required. Create multiple nonclustered indexes on columns involved in join and grouping operations, and a clustered index on any foreign key columns.
  • Covering all columns from one table in a given query. This eliminates accessing the table or clustered index altogether.


Index tuning:

Covered Index:

Leave a comment »

Interview question – Quick Links


Running value function:

SSRS Formatting

SSRS Functions:

SSRS Tips and Tricks

Dataware housing

SSIS — –V  Good

SSIS Event Handlers and Error Logging:

Fuzzy Grouping:

Fuzzy Look Up:

SSIS Performance Tuning:

SSIS Check Points:

Multicast Transformation vs Condition Split:


Sql Server Performance tuning: <–do Check the comments to the post



User Defined Datatypes:

Performance Related:

Rank, Dense_Rank, row_number:




Single tier – MS access, MS excel- Single file – direct acess —usually single user application

2 tier – Client – Server application  –multi user apps.

Multi Tier applications?

–      3 tier or N tier

–      Presentation Layer , Business Logic and Database Layer

  • Presentation Layer – UI or Front end
    • Handles User interactions – Screen validations
    • Should not contain business logic
  • Business Logic Layer – Middle ware or Back end
    • Set of rules to process the information
    • Should not contain presentation or data access code
  • Data Layer
    • Database , Datasets , RDBMS\DBMS
    • Provide access to back-end i.e. database

3 Tier Architecture Advantages:

–      Easy to maintain

–      Components are resusable

–      Faster development by division of work

  • Web designer can do presentation
  • Software engineer can do the logic part
  • DB devloper and admin can handle the backend and data modeling

Design Patterns?

Solution to common problems defined by gang of four programmers.

  • Factory  –  Creates an instance of several derived classes
  • Abstract Factory –  Creates an instance of several families of classes
  • Singlton – single instance of a class can exists
  • Builder – Separates object construction from its representation

data modeling?  

  • Process of defining and analysing data requirements to support business  processes.
  • ER diagrams are designed in VISIO to model the data requirements

SDLC Models?

Software development life cycle :

  • Waterfall model
  • Rapid application development Model
  • Incremental\Iterative Model
  • Proto-type Model
  • Spiral Model

Waterfall model – requirement garthering, designing, implementation, verification\testing, maintainace

  • Each phase has to be completed before the next starts
  • A review takes place after a particular phase completes and the next starts. – review is with the aim to check if the project is on right path.
  • No overlapping of phases
  • Easy to use and implement
  • Not good for projects where requirements changes frequently.
  • Poor model for complex projects

Incremental\Iterative Model This method doesn’t require to start with full requirement.

  • Also called Multi – waterfall model
  • Development and delivery is broken down into increments

Proto-type Model In this model , a proto-type (an early approximation of a final system or product) is build for user acceptance and the improvements are done untill the users are convinced.

Spiral Model  Combined features of proto-type and waterfall model.

  • Similar to incremental model
  • Planing, Risk analysis, engineering and evaluation

Rapid application development Model Based on component integration techniques

  • Short development cycle and linear squential model
  • Phases: business modeling , data modeling , process modeling, application modeling, testing.

White box testing(tranasparent testing) – internal – logical\data flow testing

Black box testing – input and desired output testing, not concerened with the inner code.

Grey box à combination of white and black box

SSIS – Tasks?

  • Data Flow task –extracts data from source, allow transformations, and the load data into target data destination
  • File System task – copy/move/delete files and folder over a file system
  • FTP task  – copy/move/delete files and folder over a FTP
  • Execute package
  • Execute SQL
  • Execute Script
  • Send Mail
  • Execute sql server agent job
  • Notify operator
  • Back up database
  • Bulk insert
  • For Loop:  Repeat a task a fixed number of times
  •  Foreach Loop: Repeat a task by enumerating over a group of objects

SSIS Auditing and Error Handling?

SSIS deploying packages:

  1. Database – schedule using Jobs
  2. File system – schedule using control M
  3. Can be called via command prompt or execute the .dtsx package

Sql Server

Normalisation – process of organising data in form of table and define relationship and hence form a realtional database.

RDBMS – ACID – atomicity , consistency, isolation, durability

1 NF – eliminating redundant groups

2 NF – eliminating redundant data

3NF – Eliminate Columns Not Dependent On Key

Trigers – DDL (drop table, create table, alter table or login events)

DML (insert update delete)

Instead off – fired instead off insert update delete

SSRS – variables – overview?

Delete duplicate rows

WITH CTE (COl1,Col2, DuplicateCount) AS
 (SELECT COl1,Col2,
 FROM DuplicateRcordTable)
 DELETE FROM CTE WHERE DuplicateCount >1

Master – contain information for all databases

Model – template for database —

MSDB –dts packages, jobs

Resource: read only – contain all system objects – sys.objects

Split Function: write a split function to get output of (1,2,3)  in a table in different rows

Scope Identity and @@identity


It returns the last identity value generated for any table in the current session, across all scopes.

Let me explain this… suppose we create an insert trigger on table which inserts a row in another table with generate an identity column, then @@IDENTITY returns that identity record which is created by trigger.


It returns the last identity value generated for any table in the current session and the current scope.

Let me explain this… suppose we create an insert trigger on table which inserts a row in another table with generate an identity column, then SCOPE_IDENTITY result is not affected but if a trigger or a user defined function is affected on the same table that produced the value returns that identity record then SCOPE_IDENTITY returns that identity record which is created by trigger or a user defined function.


It returns the last identity value generated for a specific table in any session and any scope.

In other words, we can say it is not affected by scope and session, it only depends on a particular table and returns that table related identity value which is generated in any session or scope.

Date , date part functions

Case and if statement: difference , syntax

SELECT   ProductNumber, Category =
CASE ProductLine
WHEN 'R' THEN 'Road'
WHEN 'M' THEN 'Mountain'
WHEN 'T' THEN 'Touring'
WHEN 'S' THEN 'Other sale items'
ELSE 'Not for sale'
FROM Production.Product
ORDER BY ProductNumber;

‘If statement’ cannot be used inside select statement <— v important

Table variable and #table difference – advantages and disadvantages:

major disadvantage of table variable

  • Table variables are Transaction neutral. They are variables and thus aren’t bound to a transaction.
  • Temp tables behave same as normal tables and are bound by transactions

Power of CTE

Can two different SPs create same #table or ##table (temp tables)

Rank dense rank row number

Pivot and unpivot

Find the top third row from a dataset

Find the day(Monday, Tuesday etc) on the first day of the current month (lets say today is 25th feb)

Steps for optimisation

OOPS Concepts? Inheritance, polimorphism (operation overloding and overridding) , encapsulation,abstraction

Boxing – converting a value type to reference type

unboxing — reverese i.e. converting a reference type to value type

Delegates – function pointers

Cloud computing

Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet). The name comes from the use of a cloud-shaped symbol as an abstraction for the complex infrastructure it contains in system diagrams. Cloud computing entrusts remote services with a user’s data, software and computation. There are many types of public cloud computing:[1]

  • Infrastructure as a service (IaaS)
  • Platform as a service (PaaS)
  • Software as a service (SaaS)
  • Network as a service (NaaS)
  • Storage as a service (STaaS)
  • Security as a service (SECaaS)

In the business model using software as a service, users are provided access to application software and databases. The cloud providers manage the infrastructure and platforms on which the applications run. SaaS is sometimes referred to as “on-demand software” and is usually priced on a pay-per-use basis. SaaS providers generally price applications using a subscription fee.


  • SaaS allows a business the potential to reduce IT operational costs by outsourcing hardware and software maintenance and support to the cloud provider.
  • This enables the business to reallocate IT operations costs away from hardware/software spending and personnel expenses, towards meeting other IT goals.
  • In addition, with applications hosted centrally, updates can be released without the need for users to install new software.


  • One drawback of SaaS is that the users’ data are stored on the cloud provider’s server. As a result, there could be unauthorized access to the data.
  • Basically from the security point of view it is not the safest as compared to others.

Software as a Service (SaaS)

Software as a service (SaaS) sometimes referred to as “on-demand software”, is a software delivery model in which software and associated data are centrally hosted on the cloud. SaaS is typically accessed by users using a thin client via a web browser.e.g. Facebook.

Leave a comment »

Microsoft Azure – An Intro

Microsoft Azure

Manageability, High Availability :
Higher availability with less management required. (less labor required to maintain network security , maintenance of servers etc ).No Physical servers to buy, install, patch, maintain or secure.


1 GB to 50 GB support and scalable with partitioning to larger DBs.

Global Scalability:

If you want to target a specific region, you can deploy your database at the closest data center

Familiar Development Model : Built on T-Sql


Doesn’t support XML datatypes and many Sql commands.

No support for backup and restore. Need to use following third party components.

  1. For Restore: SQL Azure Migration Wizard (SSMS 2008 R2 should be installed) To Migrate database from Sql Server to Windows Azure. Other way out is to generate Script from SSMS 2008 R2 for windows azure and then deploy the script on Azure server. But not a robust solution.
  2. For DB Sync: Microsoft Sync Framework Power Pack for SQL Azure to synchronize data between a data source and a SQL Azure installation. But if the source DB is more than 50 GB, the client needs to manage the partitioning manually.
  3. For Partitioning: Enzo SQL Shard Third party component for partitioning.
  4. For Backup: Found following third party components\scripts. We need to drill down to these to check the best suitable for us. Used to backup from Sql Azure to Sql Blob Storage. Please note: Microsoft charges ~$0.10 per GB for Blob Storage.
  5.  Red Gate Backup tool: Considered as best but presently available as beta version. Probably would be paid software once it comes out of Beta.
  6. Script to automate back up to sql blob storage
  7. SQL Azure Migration Wizard: The SQL Azure Migration Wizard gives you the options to analyzes, generates scripts, and migrate data (via BCP) from:
  1. SQL Server to SQL Azure
  2. SQL Azure to SQL Server
  3. SQL Azure to SQL Azure

Other options not available when working with a SQL Azure database include: (one in red would affect us)

  1. Full-text indexing
  2. CLR custom types (however, the built-in Geometry and Geography CLR types are supported)
  3. RowGUIDs (use the uniqueidentifier type with the NEWID function instead)
  4. XML column indices
  5. Filestream datatype
  6. Sparse columns
  7. SQL Azure Doesn’t Support Database Mirroring or Failover Clustering


As per the example below Pricing is quite simple:

Database + Data transfer based per day charging

Costing Examples:

Following page makes the pricing a bit confusing as it states that we would be charged on Bandwidth usage, catching etc in addition to the Database and Data transfer.

Costing Calculator:

Pricing can also be based on Computation Power, Db storage, Blob Storage, Bandwidth

  1. Compute: For front end application hosting. You can have 0 to 10 Instances of VM. à Required to consider in cast we plan to host our front end also.
  2. Database :2 plans.


    Standard pay-as-you-go pricing

    Web Edition (up to 5 GB)

    $9.99 per 1 GB of database per month

    Business Edition (up to 150 GB)

    $99.99 per 10 GB of database per month

    (Maximum charge of $499.95 per database)*

  3. Data transfers:  All inbound data transfers, i.e. data going into the Window Azure platform datacenters, are free. Price for outbound data transfers, i.e. data going out of the Windows Azure platform datacenters, is shown below.
    Pricing details for outbound data transfers
    North America and Europe regions: $0.12
    Asia Pacific Region: $0.19
  4. Bandwidth: 0 to 2000 Gb. Charged as per bandwidth used.
    Bandwidth used between SQL Azure and Windows Azure or Windows Azure AppFabric is free within the same sub-region or data center. When deploying a Windows Azure application, locate the application and the SQL Azure database in the same sub-region to avoid bandwidth costs. For more information, see Accounts and Billing in SQL Azure.
  5. Catching services, Storage Services etc are also provided and charged.

Some important tools that you can use with SQL Azure for Tuning and Monitoring include:

  1. SSMS Query Optimizer to view estimated or actual query execution plan details and client statistics
  2. Select Dynamic Management views to monitor health and status:

Some Depreciated features in Sql Azure

  1. ‘ANSI_NULLS’ is not a recognized SET option.
  2. Deprecated feature ‘SET ANSI_PADDING OFF’ is not supported in this version of SQL Server.
  3. Deprecated feature ‘More than two-part column name’ is not supported in this version of SQL Server.  This a significant change if your using Schemas.
  4. Deprecated feature ‘Data types: text ntext or image’ is not supported in this version of SQL Server.
  5. Deprecated feature ‘Table hint without WITH’ is not supported in this version of SQL Server.
  6. Global temp Tables are not supported in this version of SQL Server.
  7. A full list id found at Deprecated Database Engine Features in SQL Server 2008 – MSDN.

Important Links:

Video tutorial for scripting out DB Schema for migration to SQL Azure using SSMS R2 : Free course to learn basics of SQL Azure by Microsoft. : sql azure errors and trouble shooting : SQL Statements not supported for sql azure

1 Comment »

Very informative post

Bhavesh Patel's Blog

DBCC (Database consistency checker) are used to check the consistency of the databases. The DBCC commands are most useful for performance and trouble shooting exercises.

I have listed down and explained all the DBCC commands available in SQL Server 2005, with examples.

The DBCC Commands broadly falls into four categories:

  • Maintenance
  • Informational
  • Validation
  • Miscellaneous

Maintenance Commands

Performs maintenance tasks on a database, index, or filegroup.

1. CLEANTABLE – Reclaims space from the dropped variable-length columns in tables or index views.

DBCC CLEANTABLE (‘AdventureWorks’,’Person.Contact’,0)

2. DBREINDEX – Builds one or more indexes for the table in the specified database. (Will be removed in the future version, use ALTER INDEX instead)

USE AdventureWorks

DBCC DBREINDEX (‘Person.Contact’,’PK_Contact_ContactID’,80)

3. DROPCLEANBUFFERS – Removes all clean buffers from buffer pool.


4. FREEPROCCACHE – Removes all elements from the procedure cache


5. INDEXDEFRAG – Defragments indexes of the…

View original post 590 more words

Leave a comment »

Grant Execute to Given User – Dynamic Query

declare @username varchar(255)
set @username = ‘YourUser’
SELECT ‘grant exec on ‘ + QUOTENAME(ROUTINE_SCHEMA) + ‘.’ +

The above query with result in all the User Defined Stored Procedures with “Grant Execute Permissions.

Run the resulting commands and ‘YourUser’ has the Execute Rights 🙂

Leave a comment »

SSIS Expression To Append Time and Date with File Name Dynamically

Use Following expression to append date and time at the time of package execution with the file name.

+(DT_STR,2,1252)DATEPART("mi" ,GETDATE())+"-"

Evaluated Expression would look some thing like this:


Using Variable:

@[User::OutputFileLocation]  : Variable to hold file location

@[User::OutputFileName]  : Variable to hold file name


+ @[User::OutputFileName]
+(DT_STR,2,1252)DATEPART("mi" ,GETDATE())+"-"

Hope it Helps 🙂


BIDS 2008 does not support XML datatype in OLE DB Command

Senario Overview:
We have some SP which expect some XML Type input parameters and also return XML Type parameter.
Having a requirement to insert data in bulk to Database we planned to use SSIS.
Working on the package, I found that SSIS 2008 doesn’t supports XML datatypes in OLE DB Command.
Error Details:
Operand type clash: int is incompatible with xml
Operand type clash: int is incompatible with xml
Googled.. and found that the issue has been solved in latest releases .. 🙂
But, No luck after installing the latest patch.
Found the following work around:
1. Create a new SP and use NVARCHAR(MAX) instead of Input XML Datatypes and call this SP from SSIS ( We could have modified the original though, but preferred avoiding dependencies issues)
2. Call the main SP from the new SP after type casting to XML
Now new issue:
As we have one XML Output datatype . It still went on giving error even after casting to NVARCHAR(MAX)
Finally, NVARCHAR(4000) worked fine.
Not a robust solution but good enuf to full fill our requirements 🙂
Leave a comment »

Combining Multiple Rows into One Row – Sql Server

In this example i am going to describe how to combine multiple  rows in one column in MS SQL.

Here is the scenario
uSER_id and respective Hobbies are listed in the table.

i want to combine all the hobbies of a user

declare @User table
UserId int,
Hobbies VARCHAR(100)

INSERT INTO @User (UserId,Hobbies) values
(1, ‘gaming’) , (1, ‘cricket’),
(2, ‘movies’) , (2, ‘cricket’), (2, ‘gardening’) , (2, ‘football’),
(3, ‘gaming’) , (3, ‘football’)

combine multiple rows

Allhobbies = substring( ( SELECT ‘, ‘ + Hobbies
FROM @User T1
WHERE T1.UserId = T2.UserId FOR XML path(”), elements
FROM @User T2

Leave a comment »