SSIS TUNING

DESIGN PROCESS:

The process for designing SQL Server Integration Services (SSIS) packages is typically iterative. You start by getting the components working individually or in small sets, then concentrate on ensuring that the components will work in the correct sequence. During later iterations, you add in more components or adjust properties to perform error handling. Then, in a final pass, you might add in abstractions, taking advantage of variables and expressions to enable runtime changes for your package. But your work is not yet complete at this stage.

EXTERNAL FACTORS AFFECTING PERFORMANCE:

There are various factors that can affect the performance of SSIS packages. Several of these factors are completely external to SSIS, such as:
a) The source systems that you’re using for data extractions
b) Disk speed and configuration, NICs, and network bandwidth
c) The amount of memory available on the server executing the packages.

CONTROL FLOW LEVEL PERFORMANCE TIPS

a)    Run executables in parallel to speed up control flow processing
The only way within the package to speed up control flow processing is to have tasks or containers (collectively called executables) run in parallel.
The package property that determines how many executables can run in parallel is MaxConcurrentExecutables. The default is -1, which means that the control flow engine will use the number of logical processors on the server plus 2. For example, if you’re executing the package with the default setting on a quad-core server, you can have up to six executables running in parallel.
However, one scenario with potential for improvement is when you have a package with tasks that must wait for a response from external systems and your server is dedicated to executing packages. In that case, you might be able to boost parallelism by replacing the MaxConcurrentExecutables value with a higher number. Start by increasing the value to the number of processors plus 3, then test the package in Business Intelligence Development Studio (BIDS) to see whether the number of tasks executing in parallel increased. If so, continue to increment the property value by 1, test, and repeat until no further parallelization is possible, or you have no more tasks to run in parallel.
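For example, if you execute packages with dtexec, you can experiment with this property at run time instead of editing the package for each test. This is only a sketch and the package name is hypothetical; the /SET property path shown is the standard syntax for package-level properties:

  dtexec /F "LoadSales.dtsx" /SET \Package.Properties[MaxConcurrentExecutables];6

Re-run the test with increasing values (6, 7, 8, ...) and keep the setting that actually increases the number of tasks running in parallel.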


DATA FLOW LEVEL

MEMORY IMPACT
SSIS uses a pipeline engine with an in-memory buffer architecture to efficiently manage Data Flow Task operations. Performance of the pipeline engine largely depends on the number of records moving through the pipeline and the number of buffers required to accomplish the transformations and to move the data to the destinations. Therefore, to optimize performance, you need to understand how data flow components and the Data Flow Task properties affect pipeline throughput and buffer requirements.
Finding Bottlenecks and Establishing Baselines
Before you start making changes to the data flow, you should establish some baselines so that you can identify the slower components in the data flow and better assess the impact of any subsequent changes.
Let’s assume that your package has a Data Flow Task with a single source, one transformation, and one destination. You should execute the Data Flow Task multiple times to calculate an average for various conditions, which you record in a table. Here are the steps:
1. Create a table with columns A, B, and C for recording execution times.
2. Execute the package a total of 10 times, adding the times to column A. This determines the time required to execute the package with all three components intact.
3. Replace the destination component with a Multicast transformation, which allows the task to complete successfully without adding any noticeable overhead to the task execution time. Execute the package a total of 10 times, adding the times to column B.
4. Remove the transformation in the data flow and connect the source directly to the Multicast transformation to measure only the data extraction process. Execute the package a total of 10 times, recording the times in column C.
5. Calculate the average for each of the three base measurements.
6. Derive the processing time of the transformation component by subtracting the averaged value in column C from the averaged value in column B.
7. Derive the time required to load the destination by subtracting the averaged value in column B from the averaged value in column A.




Most packages have more than one transformation, and some packages have multiple sources or destinations. For these packages, you simply add more columns to the table as needed and work through combinations of components to isolate processing time for each component.
Based on the test results in the table, you can identify the component that consumes most of the processing time and focus your tuning efforts on it. Furthermore, as you implement changes, you’ll be able to quantify the difference in performance and determine whether the resulting difference justifies altering your package permanently.

Fixing Source Component Bottlenecks

Bringing data into the data flow pipeline is naturally constrained by the speed with which the source can actually deliver the data requested by the source component. The time required to extract the data therefore represents the fastest possible execution time for the Data Flow Task.

1.     Use a SQL statement in the source component (see the example query after this list)

  • Select only the columns that you need
  • Perform the necessary data type conversions using CAST or CONVERT
  • Perform complex logic, such as joining different tables together
  • Filter the data
  • Sort the data if needed
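As a rough illustration of these points, a source query might look like the following; the table and column names are invented for the example:

  SELECT
      o.OrderID,                                            -- only the columns the data flow needs
      CAST(c.CustomerName AS VARCHAR(100)) AS CustomerName, -- convert/shrink data types up front
      o.OrderDate,
      o.OrderTotal
  FROM dbo.Orders AS o
  INNER JOIN dbo.Customer AS c
          ON c.CustomerID = o.CustomerID                    -- join inside the database engine
  WHERE o.OrderDate >= '20240101'                           -- filter rows at the source
  ORDER BY o.OrderDate;                                     -- sort here only if the data flow needs it

Every one of these operations is pushed to the database engine instead of being handled by a transformation in the pipeline.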

2.     Get as many rows as you can into a buffer

The SSIS data flow uses memory buffers to manage the data flowing through the data flow. It’s very important to get as many rows as possible into a single buffer. Imagine a line of people passing buckets to put out a fire. The more water you can get into a bucket, the quicker you can put out the fire. To achieve this goal, you can take two actions:
  • Only select the columns you need (that’s why #1 is important). If you bring unnecessary columns into the data flow, you waste precious buffer space.
  • Keep the columns as small as possible. If the total row length is smaller, you can fit more rows into a buffer. Try converting the columns to a smaller size in the source if possible (see also #1). The data flow takes the maximum size of each column to calculate the total row length. For example, if a column has the size VARCHAR(4000), the data flow will assume a length of 4000 bytes, even if the actual rows contain much less data.
3.     Don’t use the default buffer settings
While we’re on the topic of buffers, let’s look at some data flow properties:
  • DefaultBufferSize: the default size of a buffer which is set to 10MB.
  • DefaultBufferMaxRows: the default number of rows the data flow will try to put in a buffer. This is set to 10,000 rows.
When calculating the buffer size, the data flow uses these two properties; whichever limit is reached first caps the buffer. For example, if the row size is 1MB, the data flow will put 10 rows into the buffer, since the maximum buffer size is then reached. If the row size is only 1KB, the data flow will use the maximum of 10,000 rows.
These two default settings haven’t been changed since the release of SSIS in 2005 – apparently a time when memory was still crazy expensive. The first thing you’ll want to do is set these properties to a much higher value. There’s no “golden” number that will guarantee optimal performance all the time. Sometimes a smaller buffer is better if the source is really slow; sometimes a bigger buffer is better if you don’t have many columns and a very fast source. Keep in mind that buffers can be too big as well. Think about the line of people passing buckets. If the buckets are too big, it takes too long to fill them with water and the fire still rages on. In reality, it takes some testing to find the optimal settings for your data flow. Typically I run my package first with the default settings and monitor the execution time. Then I enlarge the buffers – to 30MB and 50,000 rows for example – and see what effect it has. Then I try to determine whether the package is better off with even larger buffers, or whether they need to be a bit smaller.
Luckily, all this testing can now largely be avoided by one new property introduced in SQL Server 2016: AutoAdjustBufferSize. When this property is set to True, it will automatically adjust the size of the buffer so that the DefaultBufferMaxRows setting is met. For example, if you have set it to 40,000 rows, the data flow will automatically scale the buffer size so that the buffers will contain 40,000 rows. Unfortunately, this property is set to False by default.
In conclusion, make those buffers bigger. This can really have a drastic impact on the performance.

4.     Avoid blocking transformations

Even if you followed the previous tips to the letter, performance can still be terrible if you use a blocking transformation. These are the types of transformations in a data flow:
  • Non-blocking. The buffers are unchanged and “flow” through the transformation. The data flow performs the transformation on a memory buffer and moves on to the next. Examples are: derived column, data type conversion, conditional split …
  • Semi-blocking. The buffer can change in size; either in the number of columns or in the number of rows. However, there is no blocking behavior. Examples are the Union All, Merge and Merge Join.
  • Blocking. The bane of SSIS performance. These transformations need to read all the buffers before they can output even one single buffer. This can lead to memory pressure which causes the SSIS package to spill to disk. In other words, your package runs now for hours instead of minutes or seconds. Examples are Sort and Aggregate.
All transformations can also be divided into two categories: synchronous and asynchronous. Synchronous means the buffer doesn’t change in size at all. All non-blocking transformations are synchronous. Sometimes this isn’t obvious: the Multicast component creates multiple output paths, which seems like multiple buffers are being created. However, behind the scenes it’s still the same memory buffer. Asynchronous components do change the size of the buffer. Columns can be added or removed. Rows can be added or removed as well. All blocking and semi-blocking transformations are asynchronous.
The rule is simple: avoid asynchronous transformations. In most cases, these are blocking anyway. The Union All has the benefit of the doubt: it’s the least bad asynchronous transformation. However, if you can design your package so you can avoid it, it’s recommended that you do. The Merge and Merge Join are to be avoided as well, because they require sorted input. You can sort in the data flow, but the Sort component is a blocking transformation, so it needs to be avoided. An alternative is to sort in the source component, for example by adding an ORDER BY clause to the SQL statement. The data flow needs to know the data is sorted though. You can do this by setting the IsSorted property to true on the output in the Advanced Editor (to open the advanced editor, right-click on the source and select it from the context menu).
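For example (illustrative table and column names), the Sort transformation can be replaced by an ORDER BY in the source query; just remember to set IsSorted to True on the output and the SortKeyPosition of the sort column(s) in the Advanced Editor so downstream components such as Merge Join treat the input as sorted:

  SELECT OrderID, CustomerID, OrderDate
  FROM dbo.Orders
  ORDER BY OrderDate;   -- sorted by the database engine, not by a blocking Sort transformation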

5.     Don’t use the OLE DB Command transformation

Don’t use it. Ever. Unless you have really small data sets and you’re 100% sure they won’t grow. The OLE DB Command executes its SQL statement for every row in the buffer. This means that if 1 million rows pass through the transformation, the SSIS package will send 1 million SQL statements to the server. Most likely the transaction log will blow up, and it might even fill your disks completely.
A workaround is to write the records to a staging table and then use an Execute SQL Task to run a set-based SQL statement. In other words, use the Execute SQL Task to run one single UPDATE statement instead of several thousand UPDATE statements.
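A minimal sketch of that pattern, assuming a hypothetical staging table that the data flow has just loaded:

  -- One set-based statement run from an Execute SQL Task,
  -- instead of one UPDATE per row from an OLE DB Command
  UPDATE tgt
  SET    tgt.OrderStatus = stg.OrderStatus
  FROM   dbo.Orders AS tgt
  INNER JOIN dbo.Staging_OrderUpdates AS stg
          ON stg.OrderID = tgt.OrderID;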

Reduce the number of columns. Regardless of the type of data source you’re using, select only the columns that are needed for transformations and columns that must be loaded into the destination. That way, memory isn’t wasted on data that won’t be used downstream.
Reduce the number of rows. When working with a relational source, use a WHERE clause to filter the data to the rows you need rather than rely on a transformation in the pipeline to eliminate rows. When working with large flat files, break up the source file where possible and process the resulting files using sequential Data Flow Tasks to minimize I/O contention.
Reduce column width. The column width is affected by the data type. You might need to recast data types in a relational source query or manually edit column widths for other sources to reduce the column width to the smallest possible size that can still accommodate the data coming from the source. SSIS works more efficiently with smaller data types.
Use the SQL Command option instead of the Table or View option for relational sources. The OLE DB source lets you retrieve data from a table or view in the OLE DB data source or use a SQL command to retrieve data from the OLE DB data source. When you use the Table or View option to retrieve data from a table or view, the data flow pipeline engine issues a command using the OPENROWSET function to get the data. When you use the SQL Command option, the engine uses the faster sp_executesql procedure.
Use the fast parsing mode for Flat File sources. If the data environment doesn’t need to support locale-specific formats for integer, date, and time data types, you can boost performance by setting the FastParse property to True for relevant columns in the Advanced Editor for the Flat File source.

Fixing Transformation Bottlenecks

The SSIS runtime engine executes the package. It executes every task other than the Data Flow Task in the defined sequence. Whenever the runtime engine encounters a Data Flow Task, it hands over execution of that task to the data flow pipeline engine. The pipeline engine breaks the execution of a data flow task into one or more execution trees and may execute two or more execution trees in parallel to achieve high performance. If you are wondering what an execution tree is, here is the answer.

An execution tree, as the name implies, is structured like a tree. It starts at a source or an asynchronous transformation and ends at a destination or at the first asynchronous transformation in the hierarchy. Each execution tree has a set of allocated buffers, and the scope of those buffers is tied to that execution tree. Each execution tree is also allocated an OS thread (worker thread); unlike buffers, this thread may be shared with other execution trees, so a single OS thread might execute one or more execution trees. The process of breaking a data flow task into execution trees has been enhanced to create execution paths and sub-paths so that your package can take advantage of high-end multi-processor systems.

 

The performance of transformations in the data flow depends on the efficient use of buffers. High-performance transformations, known as synchronous transformations , operate on data in an existing buffer, thereby avoiding the overhead of copying data from one buffer to another. Synchronous transformations can either manipulate data in place (e.g., Derived Column transformation) or add a new column (e.g., Audit transformation). But even this group of transformations consists of two subgroups: streaming transformations, which are the fastest of all the transformations, and row-based transformations, which must perform an operation row by row and consequently are slower.

After a row-based transformation completes processing all rows in a buffer, that buffer becomes available for processing by the next transformation in the data flow. If the next transformation is a row-based transformation, the data stays in the same buffer. Performance of the data flow is optimal in this case. If the next transformation in the data flow is an asynchronous transformation, a new buffer is required (which takes time to create and load) and a new thread is introduced into the data flow.
Moving from one memory buffer to another is still faster than the I/O operations required to store data on disk between transformations, which is why SSIS in general performs very well. The problem arises when the data flow engine must handle large data volumes, causing it to run out of memory to allocate for buffers. When memory is no longer available, the data flow engine swaps data to disk and consequently slows the data flow performance.
There are two types of asynchronous transformations—partially blocking transformations and blocking transformations. Partially blocking transformations have multiple inputs, but a single output. After the transformation writes a row to the new buffer, the row is available to the next transformation even while the partially blocking transformation continues processing remaining rows. Blocking transformations, by contrast, must read all rows in the pipeline before writing rows to the new buffer, which prevents downstream components from starting. Consequently, blocking transformations are most likely to be the cause of performance problems in the data flow when large volumes of data are involved.
When baseline performance testing identifies transformations as the bottleneck, there might be another way to design the data flow to accomplish the same goal. Consider the following design alternatives:
Perform transformations in the source query. When you’re using a relational source, performing transformations in the source query lets you take advantage of the database engine’s capabilities while freeing up resources on the package execution server if it’s separate from your source server. Data type casting, cleansing (using ISNULL or TRIM functions, for example), aggregations, and sorts are common operations that you can perform in the source query. This approach works only with relational sources. If you sort the rows in the source query, you also need to set the IsSorted property of the source component’s output to True and configure the SortKeyPosition properties for the sorted output columns using the Advanced Editor for the source component.
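A hedged sketch of what this can look like (the table and columns are invented): the casting, cleansing, and aggregation all happen in the database engine before any rows enter the pipeline:

  SELECT
      CAST(ProductKey AS INT)            AS ProductKey,
      LTRIM(RTRIM(ISNULL(Color, 'N/A'))) AS Color,        -- cleansing in the source query
      SUM(SalesAmount)                   AS SalesAmount   -- aggregation in the source query
  FROM dbo.FactSalesStaging
  GROUP BY CAST(ProductKey AS INT),
           LTRIM(RTRIM(ISNULL(Color, 'N/A')));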
Separate aggregate operations. If you need to perform different aggregations (each of which is associated with a separate destination) but you can’t perform the aggregations in the source query, you can create a separate Aggregate transformation for each destination rather than create multiple outputs for the transformation. That way, downstream processing for the more summarized aggregations (with fewer rows) can continue without waiting for the less summarized aggregations (with more rows) to finish. In addition, you can configure the AutoExtendFactor property to tune memory separately for each Aggregate transform.
Remove unneeded columns. After asynchronous operations, there might be columns that were used for a transformation and left in the pipeline even though they won’t be loaded into the destination. By eliminating these columns, you can minimize the memory requirements for the new buffer.
Reduce the number of rows. If the data flow doesn’t have a relational source, you won’t be able to use a WHERE clause in the source query to eliminate rows. However, row reduction is still an important goal for performance tuning, so add a Conditional Split transformation immediately after the source component to filter data as early as possible in the data flow.
Optimize the cache for Lookup transformations. Using the Lookup transformation without a cache is slow, so having all rows available in memory using the Full Cache option is a better approach. However, pipeline processing is blocked until the cache is loaded, so you should follow some of the same recommendations discussed earlier for source components: reduce the number of columns and rows to manage memory usage, and use a SQL command to retrieve data more quickly.
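For example, rather than caching an entire dimension table, the Lookup's SQL command can be limited to just the join key and the value it needs to return (names are illustrative):

  -- Lookup cache query: key in, surrogate key out, nothing else
  SELECT CustomerAlternateKey, CustomerKey
  FROM   dbo.DimCustomer;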
Preload the cache for Lookup transformations. If you’re using SQL Server 2008 or later, you can use the Cache Transform transformation to preload the cache in a separate data flow. You can even use a separate package to load the cache as a special type of Raw File that persists until you delete it. This cache file loads into memory much more quickly than loading the cache directly from an OLE DB source.
Replace the Slowly Changing Dimension (SCD) transformation with Merge Join and Conditional Split transformations. The SCD transformation is notoriously slow with large dimensions because it has to perform a row-by-row lookup to check whether a dimension row in the pipeline already exists in the target dimension. You could use a Lookup transformation instead, but the memory and time required to load the cache might still create a performance bottleneck. An alternative design is to use a Merge Join transformation with a LEFT JOIN to match sorted source records on the left with sorted dimension records on the right. You then add a Conditional Split transformation to evaluate columns added from the dimension source. If those columns are null, there is no match, which means the row is a new record. If there’s a match, you use the Conditional Split transformation to separate Type 1 from Type 2 processing.
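To make the matching logic concrete, here is a rough T-SQL analogue of what the Merge Join (LEFT JOIN) plus Conditional Split accomplish inside the data flow; the tables and columns are hypothetical:

  -- Rows where d.CustomerKey IS NULL correspond to the "new record" output;
  -- matched rows are then split further into Type 1 and Type 2 changes.
  SELECT src.CustomerAlternateKey,
         src.CustomerName,
         d.CustomerKey,
         d.CustomerName AS ExistingName
  FROM   dbo.StagingCustomer AS src
  LEFT JOIN dbo.DimCustomer  AS d
         ON d.CustomerAlternateKey = src.CustomerAlternateKey;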
Use an Execute SQL task instead of an OLE DB Command transformation. Another commonly slow component is the OLE DB Command transformation, which performs operations row by row. If the purpose of the OLE DB Command transformation is to update a table with a value obtained in the pipeline, it might be faster to load the pipeline data into a staging table, then use an Execute SQL task in the control flow to perform a set-based UPDATE operation.

Fixing Destination Component Bottlenecks

Sometimes the performance bottleneck in a package is caused by the destination component. Although external factors can affect the performance of destinations, there are some recommendations for improving the data flow design that relate to destinations:
Optimize the OLE DB destination. When using a table or view as the target for the data access method, you can choose whether to use the Fast Load option. This option performs a bulk insert, which is much faster than the row-by-row insert that would otherwise occur. Also, enabling the Table Lock option will help improve performance. If you have a lot of rows to insert into a table with a clustered index, the data flow engine must first sort all rows in the pipeline before performing the insert. You can get better performance if you specify the number of rows per batch, which will reduce the volume of data to be sorted at one time. Another alternative is to drop the index before loading the data into the destination and rebuild the index after the load is complete.
Use a SQL Server destination instead of an OLE DB destination. If the target is a SQL Server database, you can get up to 25 percent faster performance from the SQL Server destination because the pipeline engine can bypass the network layer during the data load. However, the package must execute on the same server as the target database and all data types in the pipeline must match the data types in the target table.
Set data types explicitly. An OLE DB destination can recast data types using a Data Conversion transformation, but there’s a performance cost. You can avoid adding a Data Conversion transformation to the data flow by casting the data type explicitly in the source query for a relational source or by manually editing the data type for each column in the Flat File Connection Manager for a Flat File source.

SQL Server Destination Adapter             

It is recommended to use the SQL Server Destination adapter if your target is a local SQL Server database. It provides a level of data-insertion performance similar to the Bulk Insert task, plus some additional benefits: with the SQL Server Destination adapter you can transform the data before loading it into the destination, which is not possible with the Bulk Insert task. Beyond the options available with the OLE DB destination adapter, it also exposes several options that are important for performance.
For example, you can specify whether the insert triggers on the target table should fire. By default this option is set to false, which means no triggers on the destination table will fire. Enabling it causes an additional performance hit because the triggers need to fire, but the trigger logic may be needed to enforce data or business rules. Additional options include specifying the number of first/last rows in the input to load, specifying the maximum number of errors that will cause the bulk load operation to be cancelled, and specifying the insert column sort order to be used during the upload process.
Remember, if your SQL Server database is on a remote server, you cannot use the SQL Server Destination adapter; use the OLE DB destination adapter instead. In addition, if the destination is likely to change from a local to a remote instance, or from one SQL Server instance to another, it is better to use the OLE DB destination adapter to minimize future changes.




  

Evaluating Buffer Efficiency

After working through any bottlenecks to optimize performance, your next step is to evaluate how efficiently SSIS can place data in buffers. To do this, you must enable the BufferSizeTuning logging event, then execute your package and examine the log. You’ll be able to see how many rows were put into the buffer. You can compare this value to the DefaultBufferMaxRows property, which has a default value of 10,000 rows.
If the buffer is actually getting 1,000 rows instead of 10,000, you can adjust DefaultBufferSize to make the buffer larger so that it can hold more rows. Alternatively, you can eliminate columns or reduce column widths by changing data types to try to get more rows into the buffer.
Another property that you can adjust for performance is EngineThreads. This property’s default value is 10 in SQL Server 2008 and later, but only 5 in SQL Server 2005. If you have a server with multiple processors, you can increase this value. You can test your package by incrementally changing the value to determine whether the additional threads improve performance.

DefaultBufferSize and DefaultBufferMaxRows

The execution tree creates buffers for storing incoming rows and performing transformations. So how many buffers does it create? How many rows fit into a single buffer? How does it impact performance?
The number of buffers created depends on how many rows fit into a buffer, and how many rows fit into a buffer depends on a few other factors. The first consideration is the estimated row size, which is the sum of the maximum sizes of all the columns in the incoming records. The second consideration is the DefaultBufferSize property of the data flow task, which specifies the default maximum size of a buffer. The default value is 10 MB, and its upper and lower boundaries are constrained by two internal properties of SSIS, MaxBufferSize (100 MB) and MinBufferSize (64 KB). This means the size of a buffer can be as small as 64 KB and as large as 100 MB. The third factor is DefaultBufferMaxRows, again a property of the data flow task, which specifies the default number of rows in a buffer. Its default value is 10,000.
Although SSIS does a good job of tuning these properties to create an optimal number of buffers, if the estimated size exceeds DefaultBufferSize it reduces the number of rows per buffer. For better buffer performance you can do two things. First, remove unwanted columns from the source and set the data type of each column appropriately, especially if your source is a flat file; this lets you fit as many rows as possible into each buffer. Second, if your system has sufficient memory available, you can tune these properties to have a small number of large buffers, which could improve performance. Beware: if you change these properties to the point where page spooling begins, performance suffers. So before you settle on values for these properties, test thoroughly in your environment and set them appropriately.
You can enable logging of the BufferSizeTuning event to learn how many rows a buffer contains, and you can monitor the "Buffers spooled" performance counter to see whether SSIS has begun page spooling.

BufferTempStoragePath and BLOBTempStoragePath

If there is a lack of memory (i.e., Windows triggers a low-memory notification event, or there is memory overflow or memory pressure), the incoming records, except BLOBs, will be spooled to the file system by SSIS. The file system location is set by the BufferTempStoragePath property of the data flow task. By default its value is blank; in that case the location is based on the value of the TEMP/TMP environment variables.
Likewise, SSIS may choose to write BLOB data to the file system before sending it to the destination, because BLOB data is typically large and cannot be stored in an SSIS buffer. Once again, the file system location for spooling BLOB data is set by the BLOBTempStoragePath property of the data flow task. By default its value is blank, and in that case the location will be the value of the TEMP/TMP environment variables. In other words, if you don't specify values for these properties, the TEMP and TMP environment variables will be used as the spooling locations. The same information is recorded in the log if you enable logging of the PipelineInitialization event of the data flow task, as shown below.
User:PipelineInitialization,ARSHADALI-LAP,FAREAST\arali,Data Flow Task,{C80814F8-51A4-4149-8141-D840C9A81EE7},{D1496B27-9FC7-4760-821E-80285C33E74D},10/11/2009 1:38:10 AM,10/11/2009 1:38:10 AM,0,0x,No temporary BLOB data storage locations were provided. The buffer manager will consider the directories in the TEMP and TMP environment variables.
So far so good. What is important here is to change the default values of the BufferTempStoragePath/BLOBTempStoragePath properties and specify locations that the user executing the package (or the SQL Server Agent service account, if the package is executed by a SQL Server Agent job) has access to. Preferably, both locations should refer to separate fast drives (with separate spindles) to maximize I/O throughput and improve performance.

How DelayValidation property can help you

SSIS uses validation to determine whether the package could fail at runtime. SSIS uses two types of validation. The first is package validation (early validation), which validates the package and all of its components before starting execution of the package. The second is component validation (late validation), which validates the components of the package once execution has started.
Let's consider a scenario in which the first component of the package creates an object, such as a temporary table, that is referenced by the second component of the package. During package validation, the first component has not yet executed, so no object has been created, causing a validation failure when the second component is validated. SSIS will throw a validation exception and will not start the package execution. So how do you get this package running in this common scenario?
To help you in this scenario, every component has a DelayValidation property (default = FALSE). If you set it to TRUE, early validation will be skipped and the component will be validated only at the component level (late validation), which happens during package execution.
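A minimal sketch of the scenario, assuming two Execute SQL Tasks sharing one connection manager (with RetainSameConnection set to True so the temporary table survives between the tasks); the object names are made up:

  -- Execute SQL Task 1: creates the object at run time
  CREATE TABLE ##StageOrders (OrderID INT, OrderTotal MONEY);

  -- Execute SQL Task 2: references ##StageOrders, which does not exist at package
  -- validation time. Setting DelayValidation = True on this task defers the check
  -- until the task actually runs, by which time the table exists.
  INSERT INTO ##StageOrders (OrderID, OrderTotal)
  SELECT OrderID, OrderTotal
  FROM dbo.Orders
  WHERE OrderDate >= '20240101';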

TUNING STEPS FOLLOWED IN A SCALE-OUT SSIS ENVIRONMENT
On one project I was building a distributed (scaled-out) SSIS environment: multiple VMs with standalone SSIS (2012 in package deployment mode) instances installed (so no SQL Server Database Engine), all pushing massive amounts of data to a staging database.
Below are a few techniques for improving the performance of that scaled-out SSIS environment, essentially by increasing the throughput to the staging database. Some of the advice might be a bit random, and some might not be pertinent to your particular situation, but overall you should find a gem or two on optimising SSIS performance in there somewhere!

Minimize SSIS File Movement

This was a big issue in the environment I was analyzing. Because the SSIS packages process and transform raw text files, there was a lot of data movement in order to distribute the files to the SSIS VMs for processing and then back to the distribution server for archiving.

Because there was a large number of files, the SSIS process had to share I/O and network resources with the file copy process, causing sluggishness while both processes were running at the same time.
Ideally, all file movement should be minimized if I/O is a bottleneck in your SSIS environment. This can be accomplished through fast-access shared storage that appears local to each of the SSIS VMs; Hyper-V, for example, has a clustered disk solution in which each VM sees the disk as local storage while in reality it is shared across multiple VMs that can access files concurrently. It is important to note that a normal network share causes files to be copied locally (to where the SSIS process is executing) before processing can begin, which causes a huge drop in throughput.
Another thing to consider is whether all the file movement steps are really required by your execution logic. For example, in my scenario we had:
Distribution Box (Downloading data) –(move)> SSIS Box (Processing data) –(move)> Distribution Box (Archiving data)
We reworked this path to minimize network traffic by copying the file from the distribution box to the SSIS box while archiving it at the same time; once SSIS finishes processing a file, it simply deletes it. This means files are transferred over the network only once.
Additionally, SSIS, as with many data processing solutions, works better on larger files rather than on a large number of small files, because it takes time to establish a connection to each new file. To shave off these extra milliseconds, it is recommended to combine the data files into a smaller number of larger files. It makes a difference when processing a very large number of files.
Drop/Create Indexes on Staging Tables
Pushing data into an indexed table takes much more time than pushing the data into a table without indexes and rebuilding the indexes afterwards.
This is because while data is being pushed into an indexed table, the database engine has to maintain the index alongside the inserts into the table, which causes a considerable delay.
Additionally, depending on your insert pattern, inserting data into an indexed table might leave you with a fragmented index, which is slower to access than a non-fragmented one. Hence it is highly recommended to follow the index drop/recreate procedure below (a T-SQL sketch follows the list):
  • Drop all indexes on destination table
  • Push data (perform ETL operations)
  • Create all indexes on destination table
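As a T-SQL sketch of that procedure (index and table names are illustrative), run from Execute SQL Tasks before and after the data flow; a common variant disables and rebuilds a nonclustered index rather than fully dropping and re-creating it:

  -- Before the data flow: disable the nonclustered index
  ALTER INDEX IX_StageSales_CustomerID ON dbo.StageSales DISABLE;

  -- ... the SSIS data flow loads dbo.StageSales here ...

  -- After the data flow: rebuild the index once, against the full data set
  ALTER INDEX IX_StageSales_CustomerID ON dbo.StageSales REBUILD;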

 

Dedicate VM Resources

While analyzing the live environment I was trying to optimise, I noticed that VMware (our virtualization provider) seems to de-allocate and re-allocate CPU resources on the SSIS boxes depending on activity (and the activity trend over time). This allows the host to re-allocate those resources around the VM stack, giving boxes that need more resources a way to get what they need while maintaining operations on existing VMs.
This is all well and good if you have a steady workload that doesn’t change over time, but if the load does change, the box will suffer from “resource suffocation” until the VM controller decides to re-allocate the resources back to it. This re-allocation takes some time, during which your process is probably running at very low throughput.
If you have a variable load, it is highly recommended to allocate non-reclaimable resources (memory and CPU) to those boxes; this will reduce resource contention due to dynamic VM resource re-arrangement and provide consistent performance figures.

Enable Bulk Operations on the SSIS Login

By granting the BulkAdmin server role to the login used to load data into the database from SSIS (the destination connection's login), you enable SSIS to load the data in bulk where possible.
This recommendation works in tandem with using the fast load data access mode in the destination component of the SSIS data flow, which allows SSIS to utilize the BULK INSERT statement.
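A hedged example of granting the role; the login name is a placeholder for whatever account the destination connection actually uses:

  -- SQL Server 2012+ syntax
  ALTER SERVER ROLE bulkadmin ADD MEMBER [DOMAIN\ssis_etl_svc];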

Set FastParse Option on the Flat File Source

The give-away is in the name: the FastParse option allows the Flat File source output to be parsed more quickly (sometimes significantly so) at the expense of supporting locale-specific data formats (such as numeric, date, and time data). This option is definitely worth considering if you do not have locale-specific data or can control the format of the input files. It is also worth keeping in mind when building an output file that is intended to be consumed by an SSIS package.
The option can be located for each column (or node) under the following path: Flat File Source -> Show Advanced Editor -> Input and Output Properties (tab) -> Flat File Source Output (node) -> Output Columns (node).
Configure and Tweak SSIS Destination Operators
There are a few options you could tweak on the SQL Server Data Destination to better accommodate the data you are trying to load, these are:
  • MaximumInsertCommitSize: Determines how much data is pushed into the table before SSIS commits the transaction. By default this is set to 2147483647, but it should be tweaked to better suit the type of data being pushed. This value’s effect becomes more important in an environment where multiple SSIS packages are loading data into one table and each package is trying to take a table lock on the destination table. It’s important to note that this value also influences how much is rolled back if the package fails before committing the transaction.
  • Check Constraints: This flag determines whether the destination component checks the constraints imposed on the destination table when inserting data. Un-ticking this box will speed up loading into the destination table in cases where there are constraints on the table.
Configure Connection Packet Size Property
By default, an SSIS connection is set up with a packet transfer size (Packet Size) of 4,096 bytes (4KB). This is all well and good for old networks that can’t handle larger packets, but these days you will be hard pressed to find an enterprise-grade network that isn’t capable of handling packet sizes eight times that.
Packaging data into bigger packets over the network reduces the overhead associated with packing and unpacking those packets, leading to better throughput.
It is generally recommended to increase this value to 32,767 (32KB) which is the maximum allowed value. It is highly recommended to monitor your network and ensure no contention or packet loss is happening due to this newly configured value.
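As an illustration, the Packet Size property can be set on the connection manager or directly in the connection string; the server, database, and provider shown here are placeholders, so treat this as a sketch rather than a drop-in string:

  Provider=SQLNCLI11;Data Source=STAGINGSRV;Initial Catalog=StagingDB;Integrated Security=SSPI;Packet Size=32767;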

Destination Table Partitioning

The importance of partitioning cannot be stressed enough in a scaled-out SSIS environment. Whether each SSIS package inserts into a separate staging table that is then switched into one staging table (ideal), or SSIS itself inserts into specific partitions of the staging table (you’ll need to prevent lock escalation to the table level), partitioning (when done correctly) should eliminate any bottlenecks related to SQL engine table-lock waits, which can be considerable in a multi-SSIS environment.
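For the “separate staging table switched in” pattern, the final step is typically a metadata-only operation; here is a rough sketch with hypothetical table names and partition number (the staging table must match the target’s schema and indexes, with a check constraint that fits the target partition’s boundary):

  -- Each SSIS stream loads its own staging table, which is then switched
  -- into the partitioned target almost instantly.
  ALTER TABLE dbo.StageSales_Node1
      SWITCH TO dbo.FactSales PARTITION 3;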








