DESIGN PROCESS:
The process for designing SQL Server Integration
Services (SSIS) packages is typically iterative. You start by getting the components working
individually or in small sets, then concentrate on ensuring that the components
will work in the correct sequence. During later iterations, you add in more
components or adjust properties to perform error handling. Then, in a final
pass, you might add in abstractions, taking advantage of variables and
expressions to enable runtime changes for your package. But your work is not
yet complete at this stage.
There are various factors
that can affect the performance of SSIS packages. Several of these factors are
completely external to SSIS, such as:
a) the source systems that
you’re using for data extractions;
b) disk speed and configuration,
NICs, and network bandwidth;
c) the amount of memory
available on the server executing the packages.
CONTROL FLOW LEVEL PERFORMANCE TIPS
a) Running executables in parallel to speed up control flow processing
The only way within the
package to speed up control flow processing is to have tasks or containers
(collectively called executables) run in parallel.
The package property that
determines how many executables can run in parallel is
MaxConcurrentExecutables. The default is -1, which means that the control flow
engine will use the number of logical processors on the server plus 2. For
example, if you’re executing the package with the default setting on a quad-core server, you can have up to six executables
running in parallel.
However, one scenario with potential for improvement is when you have a package
with tasks that must wait for a response from external systems and your server
is dedicated to executing packages. In that case, you might be able to boost
parallelism by replacing the MaxConcurrentExecutables value with a higher
number. Start by increasing the value to the number of processors plus 3, then
test the package in Business Intelligence Development Studio (BIDS) to see
whether the number of tasks executing in parallel increased. If so, continue to
increment the property value by 1, test, and
repeat until no further parallelization is possible, or you have no more
tasks to run in parallel.
DATA FLOW LEVEL
MEMORY IMPACT
SSIS uses a pipeline engine
with an in-memory buffer architecture to efficiently manage Data Flow Task
operations. Performance of the pipeline engine largely depends on the number of
records moving through the pipeline and the number of buffers required to
accomplish the transformations and to move the data to the destinations.
Therefore, to optimize performance, you need to understand how data flow
components and the Data Flow Task properties affect pipeline throughput and
buffer requirements.
Finding
Bottlenecks and Establishing Baselines
Before
you start making changes to the data flow, you should establish some baselines
so that you can identify the slower components in the data flow and better
assess the impact of any subsequent changes.
Let’s
assume that your package has a Data Flow Task with a single source, one
transformation, and one destination. You should execute the Data Flow Task multiple times to
calculate an average for various conditions, which you record in a table. Here
are the steps:
1. Create
a table with the column and row headings.
2. Execute
the package a total of 10 times, adding the times to column A. This will
determine the time required to execute the package when all three components
are intact.
3. Replace
the destination component with a Multicast transformation, which allows the
task to complete successfully without adding any noticeable overhead to the
task execution time. Execute the package a total of 10 times, adding the times
to column B.
4. Remove
the first transformation in the data flow and connect the source directly to
the Multicast transformation to measure only the data extraction process.
Execute the package a total of 10 times, recording the times in column C.
5. Calculate
the average for each of the three base measurements.
6. Derive
the processing time of the transformation component by subtracting the averaged
value in column C from the averaged value in column B.
7. Derive
the time required to load the destination by subtracting the averaged value in
column B from the averaged value in column A.
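For example, suppose the averaged measurements come out as follows (the numbers are purely hypothetical, for illustration only):

Column A (source + transformation + destination): 180 seconds
Column B (destination replaced with Multicast):   120 seconds
Column C (source connected straight to Multicast): 70 seconds

Transformation time  = B - C = 120 - 70 = 50 seconds
Destination load time = A - B = 180 - 120 = 60 seconds

In this hypothetical case the destination is the largest single contributor, so that is where tuning should start.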
Most
packages have more than one transformation, and some packages have multiple
sources or destinations. For these packages, you simply add more columns to the
table as needed and work through combinations of components to isolate
processing time for each component.
Based
on the test results in the table, you can identify the component that consumes
most of the processing time and focus your tuning efforts on it. Furthermore,
as you implement changes, you’ll be able to quantify the difference in
performance and determine whether the resulting difference justifies altering
your package permanently.
Fixing Source Component
Bottlenecks
Bringing data into the data
flow pipeline is naturally constrained by the speed with which the source can
actually deliver the data requested by the source component. The processing
time required to extract the data therefore sets the fastest possible execution time
for the Data Flow Task.
#1 Use a SQL statement in the source component
With an explicit SQL statement (a sketch follows this list):
- You select only the columns that you need
- You can perform the necessary data type conversions using CAST or CONVERT
- You can perform complex logic, such as joining different tables together
- You can filter the data
- You can sort the data if needed
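A minimal sketch of such a source statement (the table and column names are hypothetical) might look like this:

-- Select only the columns the data flow actually needs,
-- cast them to the smallest suitable data types,
-- and join and filter on the server instead of in the pipeline.
SELECT  c.CustomerID,
        CAST(c.CustomerName AS VARCHAR(100)) AS CustomerName,
        CAST(o.OrderAmount AS DECIMAL(10, 2)) AS OrderAmount
FROM    dbo.Customers AS c
        INNER JOIN dbo.Orders AS o
            ON o.CustomerID = c.CustomerID
WHERE   o.OrderDate >= '20240101';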
#2 Get as many rows as you can into a buffer
The SSIS data flow
uses memory buffers to manage the data flowing through the data flow. It’s very
important to fit as many rows as possible into a single buffer. Imagine a line
of people passing buckets to put out a fire: the more water you can get into a
bucket, the quicker you can put out the fire. To achieve this goal, you can take
two actions:
- Only select the columns you need
(that’s why #1 is important). If you bring unnecessary columns into the
data flow, you waste precious buffer space.
- Keep the columns as small as
possible. If the total row length is smaller, you can fit more rows into a
buffer. Try converting the columns to a smaller size in the source if
possible (see also #1). The data flow takes the maximum size of each
column to calculate the total row length. For example, if a column has the
size VARCHAR(4000), the data flow will assume a length of 4000 bytes, even
if the actual rows contain much less data.
#3 Don’t use the default buffer settings
While we’re on the topic of buffers, let’s
look at some data flow properties:
- DefaultBufferSize:
the default size of a buffer which is set to 10MB.
- DefaultBufferMaxRows:
the default number of rows the data flow will try to put in a buffer. This
is set to 10,000 rows.
When calculating the buffer size, the data
flow uses these two properties: whichever limit is reached first determines the
size of the buffer. For example, if the row size is 1MB, the data
flow will put 10 rows into the buffer, since the maximum buffer size is then
reached. If the row size is only 1KB, the data flow will use the maximum of
10,000 rows.
These two default settings haven’t been
changed since the release of SSIS in 2005, apparently a time when memory was
still crazy expensive. The first thing you’ll want to do is set these
properties to a much higher value. There’s no “golden” number that will
guarantee optimal performance all the time. Sometimes a smaller buffer is
better if the source is really slow, sometimes a bigger buffer is better if you
don’t have many columns and a very fast source. Keep in mind that buffers can
be too big as well. Think about the line of people passing buckets. If the
buckets are too big, it takes too long to fill them with water and the fire
still rages on. In reality, it takes some testing to find the optimal settings
for your data flow. Typically I run my package first with the default settings
and I monitor the execution time. Then I enlarge the buffers – to 30MB and
50,000 rows for example – and see what effect it has. And then I try to
determine if the package is better off with even larger buffers, or if they
need to be a bit smaller.
Luckily, all this testing can now largely be
avoided by one new property introduced in SQL Server 2016: AutoAdjustBufferSize. When this property is set to
True, it will automatically adjust the size of the buffer so that the
DefaultBufferMaxRows setting is met. For example, if you have set it to 40,000
rows, the data flow will automatically scale the buffer size so that the
buffers will contain 40,000 rows. Unfortunately, this property is set to False
by default.
In conclusion, make those buffers bigger. This
can really have a drastic impact on the performance.
#4 Avoid blocking transformations
Even if you followed
the previous tips to the letter, performance can still be terrible if you use a
blocking transformation. These are the types of transformations in a data flow:
- Non-blocking. The buffers
are unchanged and “flow” through the transformation. The data flow
performs the transformation on a memory buffer and moves on to the next. Examples
are: derived column, data type conversion, conditional split …
- Semi-blocking. The buffer
can change in size; either in the number of columns or in the number of
rows. However, there is no blocking behavior. Examples are the Union All,
Merge and Merge Join.
- Blocking. The bane of
SSIS performance. These transformations need to read all the
buffers before they can output even one single buffer. This can lead to
memory pressure which causes the SSIS package to spill to disk. In other
words, your package runs now for hours instead of minutes or seconds.
Examples are Sort and Aggregate.
All transformations
can also be divided into two categories: synchronous and asynchronous.
Synchronous means the buffer doesn’t change in size at all. All non-blocking
transformations are synchronous. Sometimes this isn’t obvious: the Multicast
component creates multiple output paths, which seems like multiple buffers are
being created. However, behind the scenes it’s still the same memory buffer.
Asynchronous components do change the size of the buffer. Columns can be added
or removed. Rows can be added or removed as well. All blocking and
semi-blocking transformations are asynchronous.
The rule is simple:
avoid asynchronous transformations. In most cases, these are blocking anyway.
The Union All has the benefit of the doubt: it’s the least bad asynchronous
transformation. However, if you can design your package so you can avoid it,
it’s recommended that you do. The Merge and Merge Join are to be avoided as
well, because they require sorted input. You can sort in the data flow, but the
Sort component is a blocking transformation, so it needs to be avoided. An
alternative is to sort in the source component, for example by adding an ORDER
BY clause to the SQL statement. The data flow needs to know the data is sorted,
though. You can do this by setting the IsSorted property to True on the output in the
Advanced Editor (to open the Advanced Editor, right-click the source and
select Show Advanced Editor from the context menu).
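For example, a source query along these lines (hypothetical table and columns) pushes the sort down to the database instead of using a blocking Sort transformation:

-- Sort on the server so downstream Merge Join gets pre-sorted input.
SELECT  CustomerID,
        CustomerName
FROM    dbo.Customers
ORDER BY CustomerID;
-- Then, in the Advanced Editor: set IsSorted = True on the output
-- and SortKeyPosition = 1 on the CustomerID output column.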
#5 Don’t use the OLE DB Command
transformation
Don’t use it. Ever.
Unless you have really small data sets and you’re 100% sure they won’t grow.
The OLE DB Command executes its SQL statement for every row in the buffer. This
means that if 1 million rows pass through the transformation, the SSIS package will
send 1 million SQL statements to the server. Most likely the transaction log
will blow up, and it might even fill your disks completely.
A workaround is to
write the records to a staging table and then use an Execute SQL Task to run a
set-based SQL statement. In other words, use the Execute SQL Task to run one
single UPDATE statement instead of several thousand UPDATE statements.
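As an illustration (the staging and target table names are hypothetical), a single set-based statement in an Execute SQL Task replaces the row-by-row OLE DB Command:

-- One UPDATE joined to the staging table instead of
-- one UPDATE statement per row in the pipeline.
UPDATE  t
SET     t.Status       = s.Status,
        t.ModifiedDate = s.ModifiedDate
FROM    dbo.TargetTable AS t
        INNER JOIN dbo.StagingTable AS s
            ON s.BusinessKey = t.BusinessKey;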
Reduce
the number of columns. Regardless of the type of data source
you’re using, select only the columns that are needed for transformations and
columns that must be loaded into the destination. That way, memory isn’t wasted
on data that won’t be used downstream.
Reduce
the number of rows. When working with a relational source,
use a WHERE clause to filter the data to the rows you need rather than rely on
a transformation in the pipeline to eliminate rows. When working with large
flat files, break up the source file where possible and process the resulting
files using sequential Data Flow Tasks to minimize I/O contention.
Reduce
column width. The column width is affected by the data type. You might
need to recast data types in a relational source query or manually edit column
widths for other sources to reduce the column width to the smallest possible
size that can still accommodate the data coming from the source. SSIS works
more efficiently with smaller data types.
Use the
SQL Command option instead of the Table
or View option for relational sources. The
OLE DB source lets you retrieve data from a table or view in the OLE DB data
source or use a SQL command to retrieve data from the OLE DB data source. When
you use the Table or View option to retrieve data from a table
or view, the data flow pipeline engine issues a command using the OPENROWSET
function to get the data. When you use the SQL Command option, the engine uses
the faster sp_executesql procedure.
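Conceptually, the difference looks roughly like this (hypothetical table and columns; the comments paraphrase the behavior described above rather than the literal commands the engine emits):

-- Table or View access mode: effectively pulls every column
-- (the engine wraps the request in an OPENROWSET call).
SELECT * FROM dbo.SalesOrders;

-- SQL Command access mode: your own statement, run via sp_executesql,
-- so you control the column list and the filter.
SELECT  OrderID, OrderDate, TotalDue
FROM    dbo.SalesOrders
WHERE   OrderDate >= '20240101';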
Use the
fast parsing mode for Flat File sources. If the data
environment doesn’t need to support locale-specific formats for integer, date,
and time data types, you can boost performance by setting the FastParse
property to True for relevant columns in the Advanced Editor for the Flat File
source.
Fixing Transformation Bottlenecks
The SSIS runtime engine executes the
package. It executes every task other than the Data Flow Task in the defined
sequence. Whenever the SSIS runtime engine encounters a Data Flow Task, it
hands over its execution to the data flow pipeline engine.
The data flow pipeline engine breaks the execution of a Data Flow Task into one
or more execution trees and may execute two or more execution trees in parallel
to achieve high performance. If you are wondering what an execution tree
is, here is the answer.
An execution tree, as the name implies, has a
structure similar to a tree. It starts at a source or an asynchronous
transformation and ends at a destination or at the first asynchronous transformation in
the hierarchy. Each execution tree has a set of allocated buffers, and the scope of
these buffers is tied to that execution tree. Each execution tree is also
allocated an OS thread (worker thread); unlike buffers, this thread may be
shared with other execution trees, in other words, an OS thread might execute
one or more execution trees. The process of breaking a Data Flow Task into
execution trees has been enhanced to create execution paths and sub-paths so
that your package can take advantage of high-end multi-processor systems.
The performance of transformations
in the data flow depends on the efficient use of buffers. High-performance
transformations, known as synchronous transformations, operate on data in an
existing buffer, thereby avoiding the overhead of copying data from one buffer
to another. Synchronous transformations can either manipulate data in place
(e.g., Derived Column transformation) or add a new column (e.g., Audit
transformation). But even this group of transformations consists of two
subgroups: streaming transformations, which are the fastest of all the
transformations, and row-based transformations, which must perform an operation
row by row and consequently are slower.
After a
row-based transformation completes processing all rows in a buffer, that buffer
becomes available for processing by the next transformation in the data flow.
If the next transformation is a row-based transformation, the data stays in the
same buffer. Performance of the data flow is optimal in this case. If the next
transformation in the data flow is an asynchronous transformation, a new buffer
is required (which takes time to create and load) and a new thread is
introduced into the data flow.
Moving
from one memory buffer to another is still faster than the I/O operations
required to store data on disk between transformations, which is why SSIS in
general performs very well. The problem arises when the data flow engine must
handle large data volumes, causing it to run out of memory to allocate for
buffers. When memory is no longer available, the data flow engine swaps data to
disk and consequently slows the data flow performance.
There are two types of asynchronous transformations—partially blocking
transformations and blocking transformations. Partially blocking
transformations have multiple inputs, but a single output. After the
transformation writes a row to the new buffer, the row is available to the next
transformation even while the partially blocking transformation continues
processing remaining rows. Blocking transformations, by contrast, must read all
rows in the pipeline before writing rows to the new buffer, which prevents
downstream components from starting. Consequently, blocking transformations are
most likely to be the cause of performance problems in the data flow when large
volumes of data are involved.
When baseline performance
testing identifies transformations as the bottleneck, there might be another
way to design the data flow to accomplish the same goal. Consider the following
design alternatives:
Perform
transformations in the source query. When you’re using a relational
source, performing transformations in the source query lets you take advantage
of the database engine’s capabilities while freeing up resources on the package
execution server if it’s separate from your source server. Data type casting,
cleansing (using ISNULL or TRIM functions, for example), aggregations, and
sorts are common operations that you can perform in the source query. This
approach works only with relational sources. If you sort in the source query,
you also need to set the IsSorted property of the source component’s output to True
and configure the SortKeyPosition properties for the relevant output columns using the
Advanced Editor for the source component.
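A hedged sketch of pushing cleansing, aggregation, and sorting into the source query (table and column names are hypothetical):

-- Cleansing, aggregation, and sorting handled by the database engine
-- instead of by Derived Column, Aggregate, and Sort transformations.
SELECT  ISNULL(RTRIM(LTRIM(RegionCode)), 'UNK') AS RegionCode,
        SUM(SalesAmount)                        AS TotalSales
FROM    dbo.FactSales
GROUP BY ISNULL(RTRIM(LTRIM(RegionCode)), 'UNK')
ORDER BY RegionCode;   -- remember IsSorted / SortKeyPosition on the output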
Separate
aggregate operations. If you
need to perform different aggregations (each of which is associated with a
separate destination) but you can’t perform the aggregations in the source
query, you can create a separate Aggregate transformation for each destination
rather than create multiple outputs for the transformation. That way,
downstream processing for the more summarized aggregations (with fewer rows)
can continue without waiting for the less summarized aggregations (with more
rows) to finish. In addition, you can configure the AutoExtendFactor
property to tune memory separately for each Aggregate transform.
Remove
unneeded columns. After asynchronous operations, there might be
columns that were used for a transformation and left in the pipeline even
though they won’t be loaded into the destination. By eliminating these columns,
you can minimize the memory requirements for the new buffer.
Reduce the
number of rows. If the data flow doesn’t have a relational
source, you won’t be able to use a WHERE clause in the source query to
eliminate rows. However, row reduction is still an important goal for
performance tuning, so add a Conditional Split transformation immediately
after the source component to filter data as early as possible in the data
flow.
Optimize
the cache for Lookup transformations. Using the Lookup transformation without a
cache is slow, so having all rows available in memory using the Full Cache
option is a better approach. However, pipeline processing is blocked until
the cache is loaded. So, you should follow some of the same recommendations
discussed for fixing source component bottlenecks: reduce the number of columns
and rows to manage memory usage, and use the SQL command to retrieve data more
quickly.
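For example, instead of caching an entire dimension table, the lookup query (hypothetical names) might cache only the join key and the one column actually needed downstream:

-- Keep the lookup cache small: only the columns the Lookup needs,
-- and only the rows that can actually match.
SELECT  CustomerBusinessKey,
        CustomerSK
FROM    dbo.DimCustomer
WHERE   IsCurrent = 1;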
Preload
the cache for Lookup transformations. If you’re using SQL Server 2008 or later, you
can use the Cache Transform transformation to preload the cache in a separate
data flow. You can even use a separate package to load the cache as a special
type of Raw File that persists until you delete it. This cache file loads
into memory much more quickly than loading the cache directly from an OLE DB
source.
Replace
the Slowly Changing Dimension (SCD) transformation with Merge Join and
Conditional Split transformations. The SCD transformation is notoriously slow
with large dimensions because it has to perform a row-by-row lookup to check
whether a dimension row in the pipeline already exists in the target dimension. You
could use a Lookup transformation instead, but the memory and time required to
load the cache might still create a performance bottleneck. An alternative
design is to use a Merge Join transformation with a LEFT JOIN to match sorted
source records on the left with sorted dimension records on the right. You
then add a Conditional Split transformation to evaluate columns added from the
dimension source. If those columns are null, there is no match, which means the
row is a new record. If there’s a match, you use the Conditional Split
transformation to separate Type 1 from Type 2 processing.
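The set-based logic that the Merge Join plus Conditional Split reproduce is roughly the following (a conceptual sketch only, with hypothetical names; in the package the join is performed by the Merge Join transformation and the CASE logic by the Conditional Split):

-- LEFT JOIN sorted source rows to the existing dimension;
-- a NULL dimension key means the row is new.
SELECT  s.BusinessKey,
        s.CustomerName,
        CASE
            WHEN d.CustomerSK IS NULL             THEN 'New record'
            WHEN s.CustomerName <> d.CustomerName THEN 'Changed (Type 1/Type 2)'
            ELSE 'No change'
        END AS ChangeType
FROM    dbo.StagedCustomers AS s
        LEFT JOIN dbo.DimCustomer AS d
            ON d.BusinessKey = s.BusinessKey;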
Use an
Execute SQL task instead of an OLE DB Command transformation. Another commonly slow component is the OLE
DB Command transformation, which performs operations row by row. If the
purpose of the OLE DB Command transformation is to update a table with a value
obtained in the pipeline, it might be faster to load the pipeline data into a
staging table, then use an Execute SQL task in the control flow to perform a
set-based UPDATE operation.
Fixing Destination Component Bottlenecks
Sometimes the performance
bottleneck in a package is caused by the destination component. Although
external factors can affect the performance of destinations, there are some
recommendations for improving the data flow design that relate to destinations:
Optimize
the OLE DB destination. When using
a table or view as the target for the data access method, you can choose
whether to use the Fast Load option. This option performs a bulk insert,
which is much faster than the row-by-row insert that would otherwise occur.
Also, enabling the Table Lock option will help improve performance. If
you have a lot of rows to insert into a table with a clustered index, the data
flow engine must first sort all rows in the pipeline before performing the
insert. You can get better performance if you specify the number of rows per
batch, which will reduce the volume of data to be sorted at one time. Another
alternative is to drop the index before loading the data into the destination
and rebuild the index after the load is complete.
Use a SQL
Server destination instead of an OLE DB destination. If the target is a SQL Server database, you
can get up to 25 percent faster performance from the SQL Server destination
because the pipeline engine can bypass the network layer during the data load.
However, the package must execute on the same server as the target database and
all data types in the pipeline must match the data types in the target table.
Set data
types explicitly. An OLE DB
destination can recast data types using a Data Conversion transformation, but
there’s a performance cost. You can avoid adding a Data Conversion
transformation to the data flow by casting the data type explicitly in the
source query for a relational source or by manually editing the data type for
each column in the Flat File Connection Manager for a Flat File source.
SQL Server Destination Adapter
It is
recommended to use the SQL Server Destination adapter if your target is a
local SQL Server database. It provides a level of data insertion
performance similar to the Bulk Insert task while offering some additional
benefits. With the SQL Server Destination adapter you can transform the
data before loading it into the destination, which is not possible with the
Bulk Insert task. Apart from the options that are also available with the OLE DB
destination adapter, it exposes several options that are important for improving
performance.
You can specify whether the insert triggers on the target
table should fire. By default this option is set to False, which
means no triggers on the destination table will fire. Enabling it may
cause an additional performance hit because the triggers need to fire, but the
trigger logic may be needed to enforce data or business rules. Additional
options include specifying the first and last row of the input to
load, the maximum number of errors that will cause the bulk load
operation to be cancelled, and the insert column sort order
to be used during the load.
Remember that if your SQL Server database is on a remote server,
you cannot use the SQL Server Destination adapter; use the OLE DB
destination adapter instead. In addition, if it is likely that the destination will
change from a local to a remote instance, or from one SQL Server instance to
another, it is better to use the OLE DB destination adapter to minimize future
changes.
Evaluating Buffer Efficiency
After working through any
bottlenecks to optimize performance, your next step is to evaluate how
efficiently SSIS can place data in buffers. To do this, you must enable the BufferSizeTuning
logging event, then execute your package and examine the log. You’ll be
able to see how many rows were put into the buffer. You can compare this
value to the DefaultBufferMaxRows property, which has a default value of 10,000
rows.
If the
buffer is actually getting 1,000 rows instead of 10,000, you can adjust DefaultBufferSize
to make the buffer larger so that it can hold more rows. Alternatively, you
can eliminate columns or reduce column widths by changing data types to try to
get more rows into the buffer.
Another
property that you can adjust for performance is EngineThreads. This
property’s default value is 10 in SQL Server 2008 and later, but only 5 in SQL
Server 2005. If you have a server with multiple processors, you can increase
this value. You can test your package by incrementally changing the value to
determine whether the additional threads improve performance.
DefaultBufferSize and DefaultBufferMaxRows
The execution tree creates buffers for
storing incoming rows and performing transformations. So how many buffers does
it create? How many rows fit into a single buffer? How does it impact
performance?
The number of buffers created depends
on how many rows fit into a buffer, and how many rows fit into a buffer
depends on a few other factors. The first consideration is the estimated row
size, which is the sum of the maximum sizes of all the columns in the
incoming records. The second consideration is the DefaultBufferSize property
of the Data Flow Task, which specifies the default maximum size of a
buffer. Its default value is 10 MB, and its upper and lower boundaries are
constrained by two internal SSIS properties, MaxBufferSize (100 MB)
and MinBufferSize (64 KB); that is, a buffer can be as small as 64
KB and as large as 100 MB. The third factor is DefaultBufferMaxRows, again
a property of the Data Flow Task, which specifies the default number of rows
in a buffer. Its default value is 10,000.
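As a worked example (hypothetical numbers):

Estimated row size:   5,000 bytes
DefaultBufferMaxRows: 10,000 rows
Required size = 5,000 bytes x 10,000 rows = ~50 MB

Since 50 MB exceeds the default DefaultBufferSize of 10 MB, SSIS reduces the rows per buffer to roughly 10 MB / 5,000 bytes = ~2,000 rows. Only when the row size is small enough for 10,000 rows to fit within 10 MB does DefaultBufferMaxRows become the limiting factor.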
Although SSIS does a good job of tuning
these properties to create an optimal number of buffers, if the estimated size
exceeds DefaultBufferSize then it reduces the number of rows per buffer. For
better buffer performance you can do two things. First, remove unwanted
columns from the source and set the data type of each column appropriately,
especially if your source is a flat file; this lets you fit as
many rows as possible into each buffer. Second, if your system has sufficient
memory available, you can tune these properties to use a small number of large
buffers, which could improve performance. Beware, though: if you change the values of
these properties to the point where page spooling begins, performance suffers.
So before you settle on values for these properties, test thoroughly
in your environment and set them appropriately.
You can enable logging of the
BufferSizeTuning event to learn how many rows a buffer contains, and you can
monitor the "Buffers spooled" performance counter to see whether SSIS has
begun page spooling.
BufferTempStoragePath
and BLOBTempStoragePath
If there is a shortage of memory (i.e.,
Windows triggers a low-memory notification event, or there is memory overflow or memory
pressure), SSIS spools the incoming records, except BLOBs, to the file
system. The file system location is set by the BufferTempStoragePath property of
the Data Flow Task. By default its value is blank; in that case the location
is based on the value of the TEMP/TMP environment variable.
Likewise, SSIS may choose to write BLOB
data to the file system before sending it to the destination, because BLOB data
is typically large and cannot be stored in the SSIS buffer. Once again, the file
system location for spooling BLOB data is set by the BLOBTempStoragePath
property of the Data Flow Task. By default its value is blank; in that case
the location is the value of the TEMP/TMP environment variable. As I said,
if you don't specify values for these properties, the TEMP and
TMP environment variables will be used as the spooling locations. The same information
is recorded in the log if you enable logging of the PipelineInitialization
event of the Data Flow Task, as shown below.
User:PipelineInitialization,ARSHADALI-LAP,FAREAST\arali,Data
Flow Task,{C80814F8-51A4-4149-8141-D840C9A81EE7},{D1496B27-9FC7-4760-821E-80285C33E74D},10/11/2009
1:38:10 AM,10/11/2009 1:38:10 AM,0,0x,No
temporary BLOB data storage locations were provided. The buffer manager will
consider the directories in the TEMP and TMP environment variables.
So far so good. What is important here is
to change the default values of the BufferTempStoragePath/BLOBTempStoragePath
properties and specify locations that the user executing the package (or, if the
package is executed by a SQL Server Agent job, the SQL Server Agent service
account) has access to. Preferably, both locations should
refer to separate fast drives (with separate spindles) to maximize I/O
throughput and improve performance.
How
the DelayValidation property can help you
SSIS uses validation to determine whether the
package could fail at runtime. SSIS uses two types of validation. The first is
package validation (early validation), which validates the package and all its
components before starting the execution of the package. The second is
component validation (late validation), which validates the components of the
package once execution has started.
Let's consider a scenario where the first
component of the package creates an object, such as a temporary table, which is
referenced by the second component of the package. During package validation,
the first component has not yet executed, so no object has been created, causing
a validation failure on the second component. SSIS will
throw a validation exception and will not start the package execution. So how
do you get this package running in this common scenario?
To help you in this scenario, every
component has a DelayValidation property (default = False). If you set it to
True, early validation is skipped and the component is validated only
at the component level (late validation), which happens during package execution.
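A minimal sketch of the scenario (hypothetical object names): the first Execute SQL Task creates a staging object, and the downstream component that references it needs DelayValidation = True so that validation is deferred until execution.

-- Task 1 (Execute SQL Task): create the object that later components reference.
CREATE TABLE dbo.StagingOrders
(
    OrderID   INT            NOT NULL,
    OrderDate DATE           NOT NULL,
    Amount    DECIMAL(10, 2) NOT NULL
);

-- Task 2 (with DelayValidation = True): references the object created above.
-- Without DelayValidation, early validation fails because the table
-- does not exist yet when the package is validated.
SELECT  OrderID, OrderDate, Amount
FROM    dbo.StagingOrders;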
TUNING STEPS FOLLOWED IN A SCALED-OUT SSIS ENVIRONMENT
One of my projects
involved building a distributed (scaled-out) SSIS environment: multiple VMs
with standalone SSIS (2012 in package mode) instances installed (so no SQL
Server Database Engine), all pushing massive amounts of data to a staging
database.
A few techniques can
improve the performance of a scaled-out SSIS environment, essentially by
increasing the throughput to the staging database. Below I discuss some of
the general approaches I have taken to achieve that goal. Some of the advice
might be a bit random, and some might not be pertinent to your particular
situation, but overall you should find a gem or two on optimising SSIS performance
in there somewhere!
Minimize SSIS File Movement
This was a big issue in the environment I was
analyzing. Because the SSIS packages process and transform raw
text files, there was a lot of data movement: files had to be distributed
to the SSIS VMs for processing and then moved back to the distribution server for
archiving.
Because there was
a large number of files, the SSIS process had to share I/O and network
resources with the file copy process, causing sluggishness while both processes
were running at the same time.
Ideally, all file
movement should be minimized if I/O is a bottleneck in your SSIS environment.
This can be accomplished through fast-access shared storage that appears as if
it were local to each of the SSIS VMs; Hyper-V, for example, has a clustered disk solution
in which each VM sees the disk as local storage, while in reality it
is shared across multiple VMs that can edit files concurrently. It is
important to note that a normal network share causes files to be copied locally
(to where the SSIS process is being executed) before execution can begin, which
causes a huge hit to throughput.
Another thing to
consider is whether all the file movement steps are really required by your execution
logic. For example, in my scenario we had:
Distribution Box (Downloading data) –(move)> SSIS Box (Processing data) –(move)> Distribution Box (Archiving data)
We reworked this path
to minimize network traffic by copying the file from the distribution
box to the SSIS box while archiving it at the same time; once SSIS finished
processing a file, it simply deleted it. This means that files are transferred
over the network only once.
Additionally, SSIS, like
many data processing solutions, works better on larger files than
on a large number of small files, because it takes time to
establish a connection to each new file. To shave off those extra
milliseconds, it is recommended to combine the data files into a smaller number
of larger files. This makes a noticeable difference when processing a very large number of files.
Drop/Create Indexes on Staging Tables
Pushing data into an
indexed table will take much more time than pushing the data into an empty
table and then rebuilding indexes afterwards.
This is because, while
pushing data into an indexed table, the database engine has to maintain the
index alongside the inserts into the table, which causes a considerable delay.
Additionally, depending on your insert pattern, inserting data into an indexed
table can leave you with a fragmented index, which is slower to access
than a non-fragmented one. It is therefore highly
recommended to follow the index drop/recreate procedure below (a T-SQL sketch
follows the list):
- Drop all indexes
on destination table
- Push data
(perform ETL operations)
- Create all indexes
on destination table
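A hedged T-SQL sketch of the pattern (table and index names are hypothetical), typically wrapped in Execute SQL Tasks before and after the data flow:

-- Before the data flow: drop the nonclustered index on the staging table.
DROP INDEX IX_StagingOrders_OrderDate ON dbo.StagingOrders;

-- ... the data flow loads dbo.StagingOrders here ...

-- After the data flow: rebuild the index in a single pass over the loaded data.
CREATE NONCLUSTERED INDEX IX_StagingOrders_OrderDate
    ON dbo.StagingOrders (OrderDate);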
Dedicate VM Resources
I noticed, while analyzing the live
environment I was trying to optimise, that VMware (our virtualization
provider) seems to de-allocate and re-allocate CPU resources on the SSIS boxes
depending on their activity (and the activity trend over time). This allows
the host to redistribute resources around the VM stack, giving boxes that
need more resources a way to get them while maintaining
operations on the existing VMs. That is all well and good if you have a steady workload that doesn’t change over time, but if the workload varies, the box will suffer from “resource suffocation” until the VM controller decides to re-allocate the resources back to it. This re-allocation takes some time, during which your process is probably running at very low throughput.
If you have a variable load, it is highly recommended to allocate non-reclaimable resources (memory and CPU) to those boxes. This will reduce resource contention caused by dynamic VM resource re-arrangement and provide consistent performance figures.
Enable Bulk Operations on the SSIS Login
By granting the login used to load data
into the database from SSIS (the destination connection’s login) the bulkadmin server role, you
enable SSIS to load the data in bulk where possible. This recommendation works in tandem with using the Fast Load data access mode in the destination component of the SSIS data flow, which allows SSIS to use the BULK INSERT statement.
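Granting the role is a one-line statement (the login name is hypothetical); on SQL Server 2012 and later it can be done with ALTER SERVER ROLE:

-- Allow the SSIS load account to perform bulk load operations.
ALTER SERVER ROLE bulkadmin ADD MEMBER [DOMAIN\ssis_load];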
Set FastParse Option on the Flat File Source
The give-away is in the name: the FastParse
option allows the Flat File source output to be parsed more quickly
(sometimes significantly so) at the expense of supporting locale-specific data
formats (such as numeric, date and time data). It is definitely
worth considering if you do not have locale-specific data or can control the
format of the input files. It is also worth keeping in mind when
building an output file that is intended to be consumed by an SSIS package.
The option can be set for each column under the following path: Flat File Source -> Show Advanced Editor -> Input and Output Properties (tab) -> Flat File Source Output (node) -> Output Columns (node).
Configure and Tweak SSIS Destination Operators
There are a few
options you can tweak on the SQL Server data destination to better
accommodate the data you are trying to load:
- MaximumInsertCommitSize: determines how much data is pushed into the
table before SSIS commits the transaction. By default this is set to 2147483647, but
it should be tweaked to better suit the type of data being pushed. This
value’s effect becomes more important in an environment where multiple SSIS
packages are loading data into one table and each package is trying to
take a table lock on the destination table. It’s important to note that
this value also influences how much is rolled back if SSIS fails before
committing the transaction.
- Check Constraints: flags whether the destination component should
check the constraints imposed on the destination table when inserting data.
Un-ticking this box will speed up loading into the destination table in
cases where there are constraints on the table.
Configure Connection Packet Size Property
By default, an SSIS
connection is set up with a packet transfer size (Packet Size) of 4,096 bytes
(4 KB). This was fine for old networks that couldn’t handle larger
packet sizes, but these days you will be hard pressed to find an enterprise-grade
network that isn’t capable of handling packets eight times that size.
Packaging data into
bigger packets over the network reduces the overhead associated with
packing and unpacking those packets, leading to better throughput.
It is generally recommended to increase this
value to 32,767 (roughly 32 KB), which is the maximum allowed value. It is also highly
recommended to monitor your network and ensure no contention or packet loss is
occurring as a result of the newly configured value.