FAQS

Datawarehouse Fundamental Questions

IT applications and differences between OLTP and OLAP?
1) OLTP (Online Transaction Processing)
Purpose: Day-Day business storage
Process: Front-End, Back-End process
Points:
a) Data: Less data
b) Model: Noramlized model (more tables,smaller tables)
c) Joins: More joins (So retrieval it is slower)
d) Access Frequency: High [Milliseconds to seconds]
e) Volatality: Volatile [Read, Ins, update, Delete]

2) OLAP(Online Analytical Processing)
Purpose: For analysis and mining
Process: Layered architecture
Points:
a) Data: More data
b) Model: DeNoramlized model (Less tables, complex tables)
c) Joins: Less joins (So retrieval it is faster)
d) Access Frequency: Low [Day/week/month...]
e) Volatality: NonVolatile [Read]

What is Datawarehouse [Single line] and warehouses available in the market?
Hige storage area and which is suitable for decision making.
Ex: Teradata,Oracle,Sqlserver,Sybase,Netezza, GreenPlum etc...

Difference between OLAP and BI?
BI is a process which uses OLAP approach to gather, convert and present the data.

Difference between Analysis and Mining?
Analysis talks about 'current understanding of data' and 'past analysis of data', where as mining talks about 'future prediction of data'.
Note: Datamining is a knowledge analysis and discovery method.

How many types of decision making available in IT?
a) Strategic-- Past analysis [OLAP data]
b)Tactical--Current and recent [OLTP]

Difference between OLTP and ODS?
OLTP holds realtime data where as ODS holds near real time data[daily 4 times from OLTP]
ODS helps in two ways >
a) Input to DWH
b) Incase enterprise database [OLTP] fails, it acts like enterprise db and continue operations.
Note: All top companies use ODS in the ETL flow.
OLTP similar terminologies?
a)Operational Systems
b)Enterprises databases
c)Business Processing Systems
d) Transactional systems

Can you describe an End-End typical ETL Flow?
OLTP->ODS->Stage Area[Files]->DWH[Stage]-->DWH[WorkArea]->DWH[Target]

How many life cycle does a DWH project need?
Three life cycles a)ETL b) Semantic c) Reporting
Famous DWH life cycles: Agile, iterative incremental [Block approach], V model

Explain Inmon Characterstics/ Principles?
Inmon Characterstics:
a) Subject oriented: Store data based on operation but not application.
Ex: Savings coount subject area
Application2 : a)Withdraw b)deposit
Subject Area: withdraw+deposit single area
b) Integrated: Integrate from diverse applications by eliminating inconsistencies and following standards.
c) Nonvolatile: No change of data. [Don't modify the data] Helps to maintain history.
d) Timevarient: Data store base on timeframes [granularity load / grain]
We can perform the below operations easily.
a) Current understanding
b)Past analysis
c)Future predictions

What is granularity and which grain your project has?
Granularity talks about the level of data maintaining in the project. Dwh has multiple tables with multiple grains.
Always the lowest grain has lot of burden but improves analytical performance.
Ex: Day/ week/month etc.. grains Day grain is recommended for better analysis

What is datamart and how many types available?
DataMart--Data sub store for specify business/ operation / functioality May or not not be a subset of DWH.
Three types of datamarts
a)Dependent-- DWH created first and then datamart [subset of DWH]
Ex: ICICI DWH, Savings account dependent datamart
b) Independent-- Directly created from source systems [we don't use DWH here]
Ex: ICICI employee payroll System [No history, not much analysis required]
c) Logical Datamart--It is replica of another data mart

How do we create datamarts practically?
a) By using Complex Views [Materialized views] --Dependent DM
b) By using a complex table --Depedent DM
c) By using a seperate physical storage area --Independent DM

How many types of DWH approaches available, which approach your company following?
Two approaches
a) KIMBal approach--Datamarts-> DWH [Bottom up approach]
b) Inmon approach-- DWH -->Datamarts [Top down approach]
My company using KImbal approach for DWH implementation because of dynamic decisions and adhod requests.

How many types OLAP available, which is effective?
OLAP--Online Analytical Processing
It creates aggregates(sum,avg, max,min stddev,covarience etc...) for decision making.
ROLAP: Relational OLAP--Data and aggrgates in the relation are [OLTP area]
Latency time less [fresh data anlysis]
Slower analysis
Minimal set up
Ex: Smal, Medium
MLOAP: Multidimensional OLAP-- Data and aggregates in mutidimensional area[ Cube]
Adv: More and detailed analysis this is helpful
Latency time is high [ fresh data analysis is not possible]
Complex set up
Ex: Corporates
HOLAP:Data in relational area and aggreagates in multidimensional area [cube]
Ex: Latency time avg
Avg set
Desktop OLAP: Here the data and aggregates on PC based desk top applications
Ex: Excel, LOtus, FoxPro,VISICALC etc...
Explain Time hierarchy?
Hour-->Day-->Week-->Fort Night-->Month-->Quarter-->half Year(Semister)-->Year

You have two databases with same size, then dow do you identify which is OLTP or OLAP?
Normalized model is OLTP, Denormalized model is OLAP

Need of Data warehouse?

1)To Analysis of data and history maintenance.
2)Companies require Strategic information to face the competition in market.
3)The Operation system are not designed for strategic information.
4)To maintain history of data for whole Organization and to have a single place where the entire data stored.

What is data warehousing and Explain Approaches?
Many companies follow either characteristic defined by W.H.Inmon or Sean kelly.
a)Inmon definition
Subjected Oriented, Integrated, Non Volatile,Time Variant
b)Sean Kelly definition
Seperate, Available, Integrated,TimeStamped,Suject Oriented, Non Volatile, Accessible
c)Dwh Approaches
There are two Approches
1.Top Down by Inmon
2.Bottom Up by Ralph kimbal
Inmon approach:Enterprise datawarehouse structured first and next Datamart created.(TopDown).
Ralph kimbal:Datamart designed first, later Datamarts to Datawarehouse designed.(BottomUp).

What are the responsibilities of a data warehouse consultant/professional?
The basic responsibility of a data warehouse consultant is to ‘publish the right data’. Some of the other responsibilities of a data warehouse consultant are:
1. Understand the end users by their business area, job responsibilities, and computer tolerance.
2. Find out the decisions the end users want to make with the help of the data warehouse.
3. Identify the ‘best’ users who will make effective decisions using the data warehouse
4. Find the potential new users and make them aware of the data warehouse.
5. Determining the grain of the data.
6. Make the end user screens and applications much simpler and more template driven.

What are fundamental stages of Data Warehousing?

a)Offline Operational Databases - Data warehouses in this initial stage are developed by simply copying the database of an operational system to an off-line server where the processing load of reporting does not impact on the operational system's performance.
b)Offline Data Warehouse - Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly) from the operational systems and the data is stored in an integrated reporting-oriented data structure.
c)Real Time Data Warehouse - Data warehouses at this stage are updated on a transaction or event basis, every time an operational system performs a transaction (e.g an order or a delivery or a booking etc.)
d)Integrated Data Warehouse - Data warehouses at this stage are used to generate activity or transactions that are passed back into the operational systems for use in the daily activity of the organization.

What is Datamart Explain Types?

It is a specific Subject area or Functionality or Task. It is Designed to facilitate end user Analysis.
Wrong Answer-- It is a subset of warehouse
Types of Datamarts
Dependent, Independent, Logical.
Dependent: Created from Datawarehouse as a seperate physical store.
Independent :Created directly from operational systems to a seperate physical store.
Logical: Exists as a subset of existing Datawarehouse.

Data Model Fundamental Questions


Q: What is Data Model?

Ans: Modeling the data. Arrangement of data inside database.

Q: What is the purpose of data model?
Ans:
If we don't arrange properly data inside database, it may not be useful up to the mark.
So, arranging data helps to retrive faster (reporting and analysis happen faster)
and reduces storage time (load time decreases).

Q: How do you differentiate whether a field is textual / measurable?
Ans: 
Textual attributes do not support aggregate operations, where as measrable attributes support.

Q: How many types of data models available?
Ans:
There are types of models
a) E-R Model (Normalization model)
Usage: OLTP systems follow Normalization methodology
Terminology: Entity, attribute, primary key, foreign key etc...
Entity--> Is an object which is differentiated to another object.
          Entity holds textual and measuable information.
Ex: Product: prid(int) pname(char) pcode(int) pcost (decimal) psales(decimal)
    Organization: orid(int),orname(varchar),orincome (decimal),orgrowthper(int)
Textual info: prid,pname,pcode, orid,orname
Measurable info: pcost,psales,orincome,orgrowthper

Relationships: 1-1: One entity related to another
1-Many : One entity related to many other
   Many-Many: Many entities related to many other

b) Denormalization Model (Dimensional model)
Usage: OLAP, DSS and BI applications
Terminology: Dimension, Fact, [composite] primary key, surrogate key etc...
Dimension: Holds only Textual info
Fact: Holds only Measurable info

Relation Ships: Star schema,snow flake schema, Hybrid scheme and Galaxy schema.


Q: What is dimension and explain dimension table features?
Ans:
Dimension is a textual attribute. If you keep set of textual attributes, then it is a dimension table.

Features:
a) Master table
b) Contains primary key
c) Answer for textual questions such as What, Where, When, Why etc...
d) Rare change table
e) Direct load table
f) More columns [wide table]
g) Less rows

Q: What is Fact and explain Fact table features?
Ans:
Fact is a measurable attribute. 
It has two sections a) Foreign key references from dimension b) Measures  

Features:
a) Child/ Trnsaction table
b) Contains composite primary key (with set of foreign key combination)
c) Answer for Measurable questions such as How much, How many etc...
d) Frequently change table [based on the business]
e) InDirect load table [a) Dimensions verification first b) Fact load]
f) Less columns 
g) More rows and years and years [Deep table]


Q: What is surrogate key and where do we use?
Ans:
It is a unique values key which is generated by the technical people in BI projects.
So, we call it as BI Key / DWH Key.
Features:
a) Contain unqiue values
b) Values may be populated from database sequence generation method or manual algorithm 
c) Non change of values

Real time usage:
a) In fact table to reduce record identification complexity
b) In dimension table to reduce the natural primary key changes affect in the fact table.

Q: What kind of relationships we can establish in dimensonal model?
Ans:

a) Star schema : Fact table surrounded by dimensions
b) Snow flake schema : fact surrounded by dimensions and dimensions having sub dimensions.
c) Mixed schema / Star Flake schema: Combination of star and snow flake
d) Galaxy schema / fact constellation schema : Multiple fact tables share common dimensions.

Q: How many types of dimensions available ?
Ans:
a) Conformed
b) Junk
c) Degenrated
d) Role playing
e) Dirty
f) SCD--Slowly Changing Dimension


Q: How many types of facts / measures available?
Ans:
a) Fully additive
b) Semi additive
c) Non additive

Q: What is Factless fact table?

Q: What is cube and explain the advantages?.

Q: Which model you are using in the project?
Ans:
My project DWH model is Galaxy or star flake, where my application is in Star schema/ snow flake.

Q: Usually how many dimensions connected to a fact table?
Ans:
10-15 dimensions.More than this need more joins and decreases performance at the time of retrieval [ reporrting and analysis takes time]

Q: On what basis you identify the business analysis?
Ans:
Fact table basis.
If the fact table holds sales data, then it is Sales analytical system.
If it holds shipping data, then it is Shipping analytical system.

SSIS FREQUENTLY ASKED QUESTIONS (SIMPLE, MEDIUM & COMPLEX)
SSIS Interview Questions ( Simple, Medium and Complex)
Q: Which edition of MSBI is used by the companies?
Ans: Business Intelligence Edition
Q: What is SSIS?
Ans: It is Integration of three services[ETL, Windows Admin,Sqlserver Admin]
Q: What is the previous component of SSIS?
Ans: DTS[Data Tranmsformation Services]
Q: SSIS version differences.
Ans:
Q: How many engines does SSIS use and explain the operations?
Ans: Two engines SSIS use a) SSIS Runtime Engine--Starts when pkg starts
2) Data Pipeline engine--Starts when DataFlow Task Starts
Q: What is inmemory process?
Ans: DataFlow opertation is Inmemory process. It retrives source data into
SSIS internal memory (inmemory) and perform operations (options) and loads from the memory to Target.
Q: What is Solution, Project and Package?
Ans:
Solution--Is a container for one or more projects
Project--Collection of packages and related files
Package--Collection of tasks and containers.
Q: Whare the components a package has and explain in single line?
Ans:
It has five components.
a) Control flow-- controls various tasks and containers with precedence constraints
b)Data Flow--contains sources, targets and transformations
c) Event Handler--Perfoms action based on event
d) Parameters--Similar to variables but take user interaction, helpful in catalog deployment
e) Package Explorer--Single view of all components inside package
Q: Difference between control flow and data flow?
Ans: COntrol flow is the brain of apckage and controls various tasks and containers.
It has ETL,WindowsAdmin and SqlServer Admin tasks.
Data flow is a component inside control flow and it is heart of ETL process.
It transfors data from sources to destinations by doing operations.
Q: Difference between Variable and Parameter?
Ans:
Parameter:
a)Take value and support user interaction at runtime
b)Scope is across package
Variable:
a) Take value and does not support user interaction
b) Scope at Package/ Container/ Task Level
c) Variables of two types 1. System Defined 2. User Defined variables
Q: What is the difference between System defined and User defined variables?
Ans:
System defined variables crated by the system at the time of package creation
Store under System namespace.
Generally they hold Audit information
No Scope
Ex: Execution Time,Package Name, User Name etc
User defined variables crated by the user.
Store under User namespace.
Generally they hold user information
Three scopes a)Package b)Container c) Task
Ex: Numeric value 10
String Value MSBIClass etc...
Q: What are the extensions of Solution, Project,Package, Paramter and connection Strings?
Ans:
Solution: .sln
Project: .dtProj
Package: .dtsx
Parameter: .param
Connection Manager: .conmgr
Q: When do we Miscellaneous folder in SSIS?
Ans: Generally Business Docs, LLDs, and other credential docs are maintained here.
Q: Difference between COntainer and Group?
Ans:
Group is logical object which hold tasks and containers.
No properties associated to group.
Groups are ungrouped.
Container is physical object which hold tasks and containers.
Properties associated to container.
Containers support only deletion.
Q: How do yopu identify package and task functionality easily?
Ans: By using naming conventions and annotations[detailed descriptions]
Q: How many ways we can execute package ?
Ans: 8 ways [Materail]
Q: How many colors a package uses?
Ans: 5 colors
Green--Success
Red--Failure
Orange/Yellow--Running
Grey--Disable
White--Ready to execute
Q: How many types of files SSIS support
Ans: 4 types a) Flatfile b)Excel c) XML d) Raw file
Note: Refer to material for explanation
Q: What is CSV file and how do you open it?
Ans: Comma Seperated Value file, we open in two ways
1. Flatfile 2. Excel
Q: How do you use csv file in SSIS?
Ans: Flatfile source
Q: Which extension of flatfile data you get mostly?
Ans:
a) txt--text file
b)dat--data file
c) out--output file
d)csv--comma seperated value file
Q: What is delimiter and explain some dlimiters?
Ans: Data seperator is delimiter.
Ex: , | : $ & @ LF CRLF etc...
Q: What is CRLF?
Ans: Carriage return[First char of a line] line feed/ forward [New line]
New line first character--Enter key
Q: How many types of flatfiles avaulable?
Ans: Two types a) Fixed length b) Delimiter
Q: What is TextQualifier, how is it helpful?
Ans: Qulifies the text with some indicators.
Ex: 'vinay'--single quote is text qualifier
Q: How many settings we need to know before processing flatfile?
Ans: 7 settings [ refer to material]
Q: What is codepage, which code page is useful in realtime to process all types of data?
Ans: Encrypted charset is codepage,in realtime Unicode or UCS [2/4/8] suitable.
Q: Data loading how many ways in real time?
Ans:
a) Truncate and load
b) Incremental load [it is of three types 1.Direct incremental 2. SCD load 3. CDC Load]
Q: What is Delta Load?
Ans: Differentiated records load is delta load.
Q: What is SCD?
Ans: Slowly changing dimension for implementung actions for new and modified records.
Q: What is CDC?
And: Change Data Capture for taking sources changes and doing some actions. Source changes are Inserts,updates and deletes.
Q: Can you give alternate terminologies for source, target, import and export?
Ans: Source--> Data producer/provider/feed/input
Target-->Desintation/ Data consumer/ Data output
Import-->Loading/Populating
Export-->Unloading/unpopulating
Q: Can yopu explain me Dataflow task features briefly?
Ans: Moves data from source to destination.
Uses internal buffer while processing [in memory process]
Each data flow task initiates one Execution Tree internally.
Q: Why DataFlow task runs faster than all other tasks in SSSI?
Ans: Seperate process [in memory process] and seperate engine.
Q: What type of default datatypes dows a Flatfile and Excel take?
Ans: Flatfile take : DT_STR Excel ake:DT_Wstr
Q: What is FastParse option in flatfile source?
Ans: Ensures faster compilation or validation incase of date and numeric columns.
If we don't mention this option at flatfile source, but you specified to one of the column as DT_I4, then every value validated for numeric evalution.
Q: What is the diff between External and Output cols?
Ans: External cols are source cols, where the propertie should not be modified. Output cols are columns passed to the next component,where the properties could be modified.
Q: Why SQL command option is helpful to process source data?
Ans: We will get required columns and rows from single or multiple tables. Reduces network traffic, burden and load.
Q: How do you identity Buffer Size?
Ans:
Row Size * No of rows
Row Size: PID(INT), PNM(Char(30)),PLOC(char(30)): 4 + 30+ 30=64 Bytes
10000 rows: 10000 * 64 --Buffer Size
Q: Which language does SSIS support?
Ans: It has its own language, i.e Integration Services Language
Q: Difference between DT_Dtr,DT_Wstr?
Ans: Dt_Str is ASCII based so code page required to convert where as DT_Wstr is unicode based so no code page is required.
Q: What is transform and how many types available?
Ans: Intermediate operation between components is called transformation.
[Ex: Sort, Merge, Aggregate etc...]
The component which implements this is transform.
Two types of transforms a) Synchronous b) Asynchronous
Q: Differences between Synchronous and Asynchronous transforms?
Ans:
Synchronous:
a) Source and target record counts mostly match
b) Transform is non blocking [blocking of memory not happened]
c) No seperate execution thread in the execution tree
Ex: Data conversion, Derived column, Oledb command
ASynchronous:
a) Source and target record counts mostly unmatched
b) Transform is blocking [blocking of memory happened]
Blocking is of two types a)Partial blocking b) Full blocking
c) Seperate execution thread in the execution tree
Ex: Sort, Merge, Union, Union All, Aggregate etc...
Q: Give me examples for Full and Partial Blocking Transforms?
Ans:
Full blocking --Sort, Aggregate etc...
Partial Blocking -- Merge, Union, Union All etc...
Q: Could you tell me sort transform properties?
Q: How many ways we convert the data?
Ans:
a) Data conversion transform
b) Expressions (DT_I4,Dt_Str etc...)
c) Source level/ target level directly [Advanced tab/ Show advanced editor]
Q: When do we go for Transform and when do we go for expression?
Ans: If there is an expression editor, take Type cast expression. Otherwise take transform.
Q: How many operations does aggregate transfor perform?
Ans: Seven--Sum,Avg,Min,Max,Count,Count Distinct, Group
Q: How do you identify that Derived column Transform is required?
Ans: If the business logics need to be implemented on every row.
Q: How do you merge rows in SSIS?
Ans: Union all,Marge and Marge Join Transforms
Q: Difference between Union all and Merge?
Ans:Refer to notes
Q: What is the basic protocol to merge rows?
Ans: Input structures should match.[No of cols and order of data types]
Q: What do you mean by sort key position 0, 1, 2?
Ans:
0-->Unsorted
1-->First column sorting
2-->Second column sorting
Ex: PNM,PLOC,PID, need PLOC indescending and PNAME in ascending.
PID-->0
PLOC-->1
PNM-->2
Q: When do we go for Joins?
Ans: Joins help to get multiple columns from multiple tables.
Q: What is Join type?
Ans: The type of relationship you establish between multiple objects [usually between two objects]
Q: Do we have right join in SSIS?
Ans: Merge does not have right join but if we swap and perform left join, then it is right join.
Q: How many merge/ merge joins/ union all required for joining 4 inputs?
Ans:
Union All: Only one
Merge: N-1: 3
Merge Join: N-1: 3
Q: For every input record I would like to do one SQL operation (specially update), which transform required?
Ans : OLEDB command
Q: How do we present OLEDB complaint parameters?
Ans: ?
First ?: Parameter 0
Second ?: Parameter 1
Third ? : Parameter 2
etc...
Q: How do you process MVS or mainframe / cobol files data in SSIS?
Ans: Unpivot transform
Q: How look up is different from Merge join?
Ans:
Read for differences material
Lookup=Merge inner join + unmatched handling + Cache mechanism
Q: When do we go for Full cache, Partial Cache and No cache?
Ans: No cache-- Fresh data / Live data lookup
Full Cache-- If the lookup object not changing frequently
Partial Cache--If the lookup object changing sometimes
Q: What is the advantage of cache? and What it is?
Ans:
Cache prestores data so reduces the burden and traffic on the object.
It is a file, saved with .caw.
Q: Lookup two basic rules to operate?
Ans: a)Lookup object
b) Condition
c) Return columns
Q; What are the lookup objects supported in SSIS?
Ans: Table/ Cache File
Q: Incase of no match in the lookup what happens?
Ans: It allows to do either of the four operations
a) Fail component b)Ignore failure c) Redirect to No Match d)Redirect to Error Output
Q: Incase of multiple matches, which match it returns?
Ans: First Match
Q: What is the structure of unmatched data?
Ans: Similar to source
Q: What kind of difference you find when you go for No cache,Full cache and partial cache?
Ans:
No cache--> More burden, more traffic, more time
Full Cache-->Less burden, Less traffic, less time
Partial Cache--> Avg burden, Avg traffic, Avg time
Q: If the cache size increased or more, what is the impact of it and how do you reduce?
Ans:
More cache size leads to more time operation and more storage operation.
To take required cache, then write a customized query in the lookup transform or adjust the memory in the advanced tab.
use result of sql query: select did,dnm from dept where did in (10,20,30)
Q: What is the default join of Lookup?
Ans: Inner join
Q: How do we implement left join using lookup?
Ans: Using the below option:
ignore failure
Q: Difference between Fuzzy lookup and normal lookup?
Ans:
Fuzzy lookup: Similarity match on string values
Normal lookup: Exact match on numeric, string and date values
Normal lookup gives required return columns, where as fuzzy lookup additionally gives three columns
[Confidence, Column similarity, Row similarity]
Q: What is LineageID?
Ans: For every column system assigns a unique ID, i.e Lineage ID.
Q: How do we export multiple flatfiles / images into a table column for auditing?
Ans: Import Column Transform
Q: How many you have developed till this date?
Ans: I never constructued script for my applications, they are created by the script developers.
I know how to call and use methods and variables.
Q: What is the difference between script component and script task?
Ans: Script task available at control flow, script component available at Dataflow.
Script component has source, target and destination options.
Q: What is the meaning of Dts.TaskResult = (int)ScriptResults.Success?
Ans: It returns success state message to Package.It the system not reaching this statement means there is a problem in the script.
Q: What is a script hierarchy?
Ans: Namespace-->Class-->Method/Variables
Q:How do you control the tasks execution?
Ans: By using precedence constraints
Q: Explain constraints of SSIS?
Ans: a) Success b) Failure c) Completion (either success / failure)
Q: How many Precedence options available in SSIS?
Ans: Four options
a) Constraint b) Expression c) Expression and constraint d) Expression or constraint
Q: Whatis logical and in precedence constraint?
Ans:
Logical Ans -ensure all precedence conditions satisfaction.
Q: How do we pass parent package variable easily in SSIS?
Ans: By using execute package task
Q: What do you mean by Execute Outofprocess true?
Ans:The package runs seperately with its process or execution threads
Q: Can you give some examples of Execute Process task?
Ans: a) Unzip/ Zip folder b) Software or patches iinstallation c) Working with windows commands and powershell objects e) Reuable components (.dll,.com) f) batch file operations (.bat)
Q: How do you ensure file existence and continue with process?
Ans: By using WMI event watcher task
Q: How do you ensure how many applications running now?
Ans: WMI Reader Task
Q: How do we identify current memory statistics?
Ans: WMI Reader Task
Q: What is Task host container?
Ans: Every task itself is a container Ex: DataFlow: It has set of sub components and features
Q: How do you transfer sql server objects from one instance to another?
Ans: Transfer tasks
Q: How do we process SSAS objects in SSIS?
Ans: There are four tasks
a) Script task b) Analysis Services Processing Task c) Analysis Services Execute DDL task d) Execute Process Task
Q: What task helps to load dimensions and facts in SSIS?
Ans: Analysis Services Processing Task
Q; Which task helps to create/ modify objects in cube?
Ans: Analysis Services Execute DDL Task
Q: What is Hitcount in SSIS?
Ans: It is used in the BreakPoint level to control the number of hits.
Q:How do customers provide dummy files/ actual files for your Loading?
Ans: They keep in FTP Server
Q: How do you pass your pkgs, unit test docs etc... to customer?
Ans: There are different projects different ways, but in my current project we keep them in FTP server.
Q: How do you call web method in SSIS?
Ans: By using WebService Task
Q:What is logging and advantages of it, how do you implement in your project?
Ans:
Capturing log information (package components execution statistics) at specified events is called as logging.
In my OLD project(version 2008 R2) we used a custom database to log.
In latest project (post 2012) we are using existing catalog database logging.
Advantages:
a) Identify the execution process
b) Identify errors and rectify
c) Identify performance issues and eliminate
d) To identify regular intervals running and load statistics
Q: What is Logprovider?
Ans: The object which holds / captures log info is called as Log Provider.
Ex: Log Provider for File
Log Provider for table
Etc...
Q: How do you implement restartability in SSIS?
Ans: Checkpoint mechanism
Q: What is checkpoint mechanism?
Ans:
Implements restartability by keeping the stopped or failed point.
Q: What are the checkpoint options available?
Ans:
a) Checkpoint filename
b)Checkpoint usage
c)Save Checkpoints
Q: Can we take checkpoins and transactions in a single package?
Ans: NO, the reason is both are independent and contrary operations.
Checkpoint move forward where as Transaction rollbackword.
Q: What is the service does transaction use?
Ans: DTC--Distributed transaction coordinator
Q: What is Transaction?
Ans: Set of tasks executed like a single request, if all tasks executed successfully, then trnasaction is successful oterwise transaction fails and rollback the previously succesful tasks.
Q: How many transaction options available?
Ans:
Three options
a) Supported--Joins to the existing transaction
b) Required-- Joins if exists or starts new transaction
c) Not supported--Will not participate in the transaction process
Q: How do we implement transaction coding wise?
Ans:
Using Begin Transaction and End transaction statements in the programming and query languages.
Q: How do we run packages without SSDT?
Ans: Using commandline utilities a.DTEXECUI b.DTUtil c.DTEXEC d.
Q; How do we call packages in .Net Program?
Ans:
Using the command DTExec.exe with the below syntax.
DtExec.exe\Dts [Refer to 105page]
Q: What is configuration and how many configurations SSIS has?
Ans:
Configurations are important package settings [such as Sourcem, Target connection strings, Databases,Paths etc...] and user external values.
SSIS has 5 configurations
a) XML File
b) SqlSever
c)ParentPackage Variable
d) Environment Variable
e)Registry Variable
Q: What is deployment and how many models available?
Ans: Moving the packages and relevant files from one environment to other environment is called deployment.[Dev-->Test,Test-->Prod]
There are two ways
a) Legacy approach--Package Deployment [Since 2005...]
b) Modern Approach--Project Deployment [Since 2012]
Q: How do you implement Package deployment and how many modes available?
Ans:
Package by package deployed in this model.
a) Need to have Manifest file
b) Use manifest file to deploy into File System [to a Folder deployment] or Sqlserver [SSMS-->IS].
Q: What is manifest file?
Ans: The file which has deployment settings [Packages, config files and other files]
Q; How do we deploy requried packages?
Ans: Three ways
a) Open manifest file and remove the unnecessary packages
b) Remove packages from deplopyment folder
c) Exclude packages at solution expplorer and then Build->Solution
Q: Major differences between Package and Project deployment features?
Ans:
Package deployment + Versioning + Catalog Logging + Parameters and Environments
Q: How do you implement project deployment?
Ans:
a) Prepare Catalog Database
b) Deploy from SSDT
Q: What types of data loadng you have in the project?
Ans: In my DWH, there are hundreds of tables and multiple types of loading.
a) Direct Lod (Truncate and Load)
b) Incrmental Load (Direct b/ Delta Load)
c) Incrmental Load (SCD)
d) Incremental Load (CDC)
Q: How SCD is different from CDC?
Ans:
Source changes carry forwarding is called as CDC [Inserts,updates and deletions]
SCD handle new and old records by doing Insert and Update operations.
Q: Which SCD method you use most in real time?
Ans:
Depending on table situation we go for SCD type.
Some tables are Non history, Some tables are full history and some are partial history.
But I found mosty SCD date process for analytical reporting purpose.
Q: Explain about SSIS Isolation Levels?
Ans:
There are 7 options for this setting.
Unspecified: A different isolation level than the one specified is being used, but the level cannot be determined. Huh? I think that’s just another way of saying that the isolation level isn’t set for this container but that it is set for the package, or container that the task resides in.
Chaos: The same as ReadUncommitted but it checks the isolation level of pending transactions during a write operation so that transactions with more restrictive isolation levels are not overwritten.
ReadUncommitted: Does not lock so a transaction can read uncommitted rows (dirty read).
ReadCommitted: Can only read records from committed transactions and issues a lock during the read. However it does not prevent records from being added, deleted or changed which can allow for a repeatable read (phantom read).
RepeatableRead: Locks are held until the transaction completes. It also means that the transaction cannot read data that has changed but has not been committed (ReadCommitted).
Serializable: Locks the entire data set until the transaction completes. It guarantees that the result set does not change during the transaction
Snapshot: The transaction takes a snapshot of the read at the beginning of the transaction. As such, data will not change. This property cannot be set for a package. It is an available option but it throws an error.



SSRS FREQUENTLY ASKED QUESTIONS (SIMPLE, MEDIUM & COMPLEX)

What is Report?
Ans:
Data display component with visuals is called report
Visual/ Visulizations: Chart,Matrix,List,Map etc...
What is SSRS, similar tools?
Ans:
is a report generation tool.
Similar tools are Cognos,BO,Tableau,Qlikview,Crystal reports,Panorama,Jasper, Microstrategy,OBIEE, Hyperion etc...
How many types reports generated in the market?
Ans:
) OLTP report
Business Processing Report
Ex: Movie ticket
Payment Vochure
Tools: Light weight reporting tools
Ex: OracleD2k,Jasper,Crystal reports, SSRS etc...
b) OLAP Report
Decision support (Stategic and Tactic)
Ex: Last year last four months sales vs this year last four months sales comparison
Every even month sales groth comparison for the last 5 years
Tools: Heavy weight tools
Ex: SSRS,Cognos,BO,Hyperion,Microstrategy,Tableau,Qlikview etc...
What is the generic reporting life cycle steps availabe?
Ans:
Three steps
a) Authoring phase
Here we focus on design, layout, look and feel.
Even we focus on Data Source,Dataset and logics
b)Management Phase
Which servers can hold the reports and with what kind of security.
c) Access and delivery phase
Does does customers can access the reports(either Pull Model/ Push Model)
What is push model?
Delivering the report in the specified format to the specified people in the specified mode (Email/ Shared folder)
Scheduled approach/ subscription approach
What is Pull Model?
Ans:
End users go and get their required format from the report server. (Manual fetch of reports)
What is subscription?
Delivering the report in the specified format to the specified people in the specified mode (Email/ Shared folder) in a stipulated time.
Ex: Need a report in PDF format before 5AM in my
What is Report Rendering?
Ans: Report delivery format is report rendering.
Report render formats: PDF, Excel, Word, HTML,MHTM(Mime HTML) etc...
What is report item and report region?
Ans:
Report item is single cell holder (Ex: Textbox, Line, Image, Place holder etc...)
Report region is multi cell holder (Ex: Chart,Matrix, List etc...)
What are the report data sources available, what is report data source?
Ans:
Data feed/ Data input to the report is Data Source.
Ex:
Sqlserver
Teradata
Oracle
Analysis Services
Azure
SAP Netviewer
Hyperion
ODBC --complaint applications (Excel...)
XML
etc...
Data source takes connection to the source
[Ex: Servername, credentials]
What is Dataset?
Ans:
Subset of data source data is dataset.
Dataset takes a) query b)View c) Procedure to bring columns in the report.
What is drill through reporting?
Ans:
One report to another report navigation is drill through.
What is Drill Down report?
Ans:
With in the same report navigating based on expansion and collapsion.
Top down and bottom up hierachical analysis in the report.
Ex:
Coutry expand states display, states expand cities display etc...
What is BookMark reporting?
Ans:
Moving to the corresponding area in the same report based on action (click)
What is parametrised reporting?
Ans:
If user interaction provided to the report, then it is paramterized report.
Ex: Giving location specification to a report to go the correponding location.
SSRS History?
Ans:
2005-SSRS introduced
2008--Many changes to SSRS [ Architecture, IIS independent, new visualization, Internet protocol change]
2008 R2--New visualizations added [ Map, Indicator, databar, Sparkline] Shared Dataset introducued.
2012--Look and feel [Sleek], Sharepoint Mode many changes. Native Mode: Shared data source added
Sharepoint Mode: Power view, Power Pivot added. Dashboards added.Cards, Tiles etc...new visualizations added. The SSRS model is Project Cresend Model.
2014
2016...
Which mode of SSRS runs faster?
Ans: Shrepoint mode. In this mode all components installed in Sharepoint Server. Sharepoint server is one of the fatest running server service .
Hence it is recommended.
What do you mean by repory deployment / publishing, what is the use of it?
Ans:
Reports moving from development to server area for end user access. This process is called as deployment/ publishing.
What is report data?
Ans:
The data which you use in the report is called report data.
Ex:
Data source
Data Set
Parameters
etc...
SSRS important components:
Ans:
Three important components
a) Report Server:
Heart of SSRS.
Reports stored and retrived by using this component. This is for Developers and DBAs.
b) Report Manager:
This is brain of ssrs.
To manage, schedule, secure the reports this is helpful.
It is flexible to Developers, Support, DBAs and Clients [End uSers]
c) Report Server Databases.
These databases contain system defined tables, which store metedata of reports [Data about data]
Ex: For a report metadata is colors, items, fonts, data sources etc...
There are two types of databases
a) Permanent DB
Stores report definitions, data items ,credentials, history etc...
b) Temporary DB--Stores temporary calculations, caching informations, sessions info etc...
Q: How many reports you schedled as a developer in your 3 years of exp?
Ans:
As developer i never involved in scheduling.
Q: Have you created any procedure or view for your report?
Ans:
Small company: Yes, I created
Large: No, Sql Development Team used to provide that
Q: Which type of data retrieval into report is recommended and why?
Ans: Procedural retrieval is better in report performance.
a) Precompiled so during report operations it won't compile
b) Preplanned --Execution plan is generated.So at runtime it will not generate a plan
c) Changes at database level are independent to the report
[Ex: Column name change or length change or any metadata change]
Q: Tell me some of the reporting commands?
Ans: .rs, .rsconfig , .rskeymgmt, .rsactivate etc...
Q: How many ways we create reports?
Ans:
a) SSDT (Wizard and Project)
b) Report Builder --Repoprt Manager
c) PowerView--Sharepoint Mode
Q: Which method mostly used to create reports in real time?
Ans: SSDT--Report Server Project
Q: Can you tell me the Sql Query which you used in the projects?Either simple/complex?
Ans:
SELECT D.StateName, D.StateSpatialData, P.*, PA.EMAIL, PA.P_ADDRESS, PA.PHONENO
FROM PARTY_ADDRESS PA
INNER JOIN PARTY P ON PA.PARTYID = P.PARTYID
INNER JOIN DimSpatiaData D ON P.PARTYID = D.ID
Note: Here Spatial data, party data and address info displayed based on Joins.
Q: Difference between View->Properties and Format Menu?
Ans:
Format menu has limited features and non customized features,
Where as View->Properties having many features with customized options.
Q: How many ways we create dataset?
Ans:
a) Using shared dataset
b) Embedding in a report --Exclusive for the report only
Q: In which version Shared Data Source and Dataset introduced and why?
Ans: Sqlserver BI 2008 R2 in the replacement of Report Model.
Q: How come View->Properties is different from TABLIX properties?
Ans: Tablix properties talks about data and management where as View->properties talks about formations.
Q: How do you display report classfification information in multiple worksheets with their actual names?
Ans:
Group on partylo,partloc->view->properties->group->Page Name: partyloc!.Value
Q: What is ROwnumber?
Ans: Unique value for every row
Q: What is Mod 2?
Ans: Produces either 0 or 1
Q: What is Mod 3?
Ans: Produces either 0 /1/2
Q: Diffrence between IIF and Switch?
Ans: Single condition eveluation--IIF, Multi condition evaluation--Switch
Q: What do you mean by "nothing" in SSRS?
Ans: NULL
Q: How many sub reports you can create in a main report?
Ans: No restriction, but if the no of inner reports increase performance decreases.
Q: What is subreport can you give a real time example?
Ans: Report inside another report is called subrport. Displaying All managers information with their location statistics.
Q: When do you go for Gague item?
Ans: One value pointing in a range
Q: When do you gor Indicator?
Ans: Value indication through image/icon
Q: When do you go for List?
Ans: Freeform reporting [Any kind of design, look and feel]
Q: How do you create side title report?
Ans: Using List
Q: What is parameter? How many types available?
Ans: Parameter prompts user values. They are of 5 types
a) Manual entering / no value default
b) Non queried paramters --Predefined values
c) Queried Parameter--Values from a query
d) Cascaded Parameter --One parameter output is input to another parameter
e) Multivalues parameter--Allow users to select all / required parameter values
Q: How do you filter dataset data with parameter?
Ans:
a) Goto query-> Put where cluase and filter
b) Goto Filter section-> take condition
Q: IF dataset not fetching data, please display "NO MATCH FOUND"
Ans: Highlight the region->properties->No rows message: "NO MATCH FOUND"
Q: How do customers choose all locations?
Ans: Multi valued parameters
Q:How do you display default value always?
Ans:
Parameter-> Default values->Specify Value: HYD
Q: Create a process where the user has to enter multiple values (not cascaded), if all values are set then only report should display.
Ans:
a) Create multiple independent parameters and use in the main dataset
b) Create a procedure with multiple parameters and pass values to the procedure at dataset.
Q. How many days you took to create a report?
Ans: Different sizes of reports and accordingly the time. Usually days to weeks. But, for the dash board reporting we took months (3-4 months)
Q. Have you used filters in Realtime? In which situation?
Ans: From the same dataset columns different no of rows to fetch filter is helpful. To do this we need to have parameters in the filtering.
Q: What is Report Model, what it contains?
Ans: Report model is a reuable set of columns, which help in dynamic and faster reporting. Initially it was for Report Buider.Latesr versions this is replaced by Shared Dataset and Shared dataSource.
Q: What is the usable of report builder?
Ans: For dynamic/ adhoc/ customer friendly quick reports generation tool.
Q: Difference between SSDT and Report Builder?
Ans: Report builder has additional features such as a) Direct publishing in the report server b) Report Part
Q: What is the usage of report manager?
Ans: It is suitable for Developers,DBAs, Support resources and Customers with different operations.
Q: What is the purpose of upload file?
Ans: Helps to upload any files but typically reports, report parts and shred datasets direct publishing.
Q: How many types of security we have in SSRS?
Ans:
SSRS has role based security,Where the role is assigned to multiple users / groups.
a) Website level : System Administrator, System User
b) Folder level / Report Level: Five roles available.
1.Browser 2. Content Manager 3. MyReports 4. Publisher 5.Report Builder
Q: When do we go for linking of reports?
Ans: If different people need different views of data (different parameter and security)from the main report, then we go for linking of report.
Q: How do you provide link aswell as user prompt for the report?
Ans: In the Report manager->Individual report->Manage->Parameters Prompt User and then enter some text like (Would you like to see other location?)
Q: What are the types of subscriptions avaiable? and their purpose?
Ans:
A) Standard Subscription: Single user delivery [Statis values]
B) Data driven subscription : Multi user delivery [Dynamic values]
c)NULL Delivery subscription: It will invoke report process but do not deliver anything. Introduced for cache refreshment in older versions.
Q: When do we go for Cache, Snapshot and Live data mechanisms?
Ans:
Live data:If the data changing during the period
Ex: Sensex 8am-3.30PM [ So make it live at that time]
Cache: Data not changing for sometime, No history and fast retrieval required
Ex: Movie tickets 11Am Show, seats reserved till 2PM only Later expired. Cache suitable
Snpshot: Data not changing for sometime, history needed and fast retrieval required
Ex: Post sensex operation data won't change till 7.59AM. So create snapshot at 3.31PM and move snapshot into hist at 7.59AM in the next day.
Q: How do you identify report performance issues?
Ans: Using ExecutionLog views under ReportServer database
Q: Frequently used report and running longer time, how do you avoid the bottleneck?
Ans: If it is Live, then try to make it a snapshot/cache report
Q: Data source and dataset issue comes under which phase?
Ans: Retrieval phase
Q:Why do we get Bytecount and Row count sizes more?
Ans:
Dataset processes query/proc/view wrongly constructured(not optimized)
Q: Where does history stored in SSRS?
Ans:ReportServer Db->History table
Q: Where does Snapshot info stored in SSRS?
Ans:ReportServer Db->SnapshotData table
Q: Where does Cache info stored in SSRS?
Ans:ReportServer Temp Db->ExecutionCache table
Q: Where does Sessions info stored in SSRS?
Ans:ReportServer Temp Db->SessionData table
Q: Where does Deployed info stored in SSRS?
Ans:ReportServer Db->Catalog table
Q: How many users running reports/ how many connected to SSRS, how do you identify?
Ans:ReportServer Temp Db->SessionData table
Q: Where does Sunscriptions, schedules stored in SSRS?
Ans:ReportServer Db->Schedule table
Q; How do you identify which user having lock on reports?
Ans: ReportServer Temp Db-->SessionLock table
Q: Who are all participated if report performance goes down?
Ans:Depends on the issue
a) RS Developers -- Data source, Dataset,Parameters,Report processing level, Rendering level (Linkings, Snapshot, Caches)
b) Sql Developers-- Query/ View/ Procedure
c) Sql DBA--Access issues, lock issues, Memory issues at RS Server
d) Source network adiministrators, Rendering level systems network administrator
e) MS GS Team--If system behaving stragely (components malfunctioned /not working properly)
Q: How do you merge cells?
Ans: Highlight two cells->Rc->Merge Cells
Q: After merging cell, which output you get most?
Ans: Left cell output
Q: How do you Split cells?
Ans: Highlight the cell->Rc->Split Cells
Q: How many ways we deploy reports?
Ans:
Many ways
a) SSDT: Build-->Deploy
b) RS utility [CommandLine]
c) Report Manager-->Upload File
d) Report Manager: Replace
e) Programmability (.Net Program)
Q: What is encryption file extension in SSRS?
Ans: .snk
Q: How do we open other report in the same window?
Ans:
=JavaScript:Void(windows.open["reportpath"])
Q: How do we take checkboxes at Report Level?
Ans:
Checkbox has two options: Tick Mark, Cross
Represented with ASCII codes
TickMark: Chr(0254)
Cross: Chr(162) or chr(168)
Compulsary you must take Font as Wingdings
Textbox->Font->Wingdings and write the below expression
=iif(Fields!Genercode="M", chr(0254), char(162))
Q:How to get no of rows in dataset?
Ans: =count("DatasetName")
Q: Difference between Normal function and report function?
Ans: Report functions used at header and footer level
Q: Difference between IIF and Switch?
Ans: IIF is for single condition and switch is for multiple conditions
Q:How do you identify Dataset level aggregations?
Ans: Expression editor -->Left handside -->Datasets feature
Q:How do we open link in a new window
=Code.Newwindows("www.google.com","no","no")
or
="javascript.void(window.open("www.google.com")"
Q:How do we print negative numbers?
Two step process a)Format string #,##0.00
b) In the color property = iif(Fields!partyincome<0,"red","black")
Q: How do we take new line in textboxes?
Two ways a)VBCRLF b) By using Chrw(13) and Chrw(10)
Note: At the beginning and ending if you need new line VBCRLF is an issue
Q:How do we display no rows message when dataset not showing any data in the region.
Ans: Highlight the region->rc->No Rows Message->”No rows found”
Q: How do we print negative values?
Ans: Negative values are represented with minus sign and red color
I NEED TO WRITE AN SSRS EXPRESION TO CHECK AND REPLACE NULL WITH ANOTHER FIELD VALUE…HOW DO YOU DO THAT?
=IIF(ISNOTHING(F1),F2,F2)
SCENARIO: Explain about Choose function…
Choose(eid.value,”one”,”two”,”three”,”four”) –english translation
Displaying only date from datetime
=FormatDateTime(Today(), DateFormat.ShortDate)
Q: Can I use report variable/ group variable or a text box value inside dataset?
Ans; No, These come under processing phase and we can't pass them to retrieval phase
Q: Need a report which take four params ,if all selected then only report required (not cascaded)
Ans:
Create a procedure with params ans pass values through those.
or
take three individual params (not related) and use in the dataset query
CHART ADDITIONAL:
Take pie chart.
SC1:LESS THAN THE SPECIFIED PERCENTAGE VALUES INTO SINGLE SLICE
In the General section, expand the CustomAttributes node.
Set the CollectedStyle property to SingleSlice
Set the CollectedThresholdUsePercent property to True.
Set the CollectedThreshold property to 10.
SC2:LESS THAN THE SPECIFIED VALUES INTO SINGLE SLICE
In the General section, expand the CustomAttributes node.
Set the CollectedStyle property to SingleSlice
Set the CollectedThresholdUsePercent property to False.
Set the CollectedThreshold property to 5000.
SC3:TO COLLECT SMALL SLICES INTO A SECONDARY, CALLOUT PIE CHART
In the General section, expand the CustomAttributes node.
Set the CollectedStyle property to CollectedPie
Set the CollectedThresholdUsePercent property to True.
Set the CollectedThreshold property to 10.
Scenario: Display Male when true is choose, Display Female info When False choosen.
Ans:
a) Create parameter
ParBool
Boolean
b) Goto Dataset
Where p.Gender= (Case when @ParBool='True' then 'Male' else 'Female' End)
c) Preview--> True-->View report displays only Male data
Additional scenarios based on expressions:
a) Not shoing rows which are having NULLs on Partyname and Loc columns
Ans:
Highlight the rows section->Row visibility->
Show or hide based on expression->
=iif(Isnothing(partyname) and isnothing(partyloc),true, false)
Taking check boxes for values
a) Take dept_valid col and write expression
iif(partycode<=30, "VALID","INVALID")
B) Take another column
textbox properties-> Font: Windings
Expression : iif(partycode<=20,chr(254),chr(168))
Note: 254 is fixed for tick marl,168 is for uncheck
Handling NULL values
Take income column values as NULL
Expression : iif(isnothing(partyincome),99999,partyincome)
Showing only limited columns based on user selection
Parameters: Name: PARBOOL
Data Type: Boolean
columns group section->advanced columns->choose the columns to hide(by selecting static)->properties: visibility: Expression : =(ParBOOL="FALSE")
Doing a fuzzy search / similarity search in paramters
select * from party where partyname like '%'+@parname+'%'
Display the report data in to muliple excel sheets with names
Ans:
Do group on Location,(Parent Group)
Locgroup->properties-->PageName: Fields!Location.Value
Show the column headings in all pages,by going to advanced.
RepeatOn Newpage: True
KeepwithRowGroup: After
It will show table header in every page]
Displaying Fixed Number of Rows per SSRS Report Page
=CEILING(RowNumber(Nothing)/20)
Report, Page--->InteractiveSize---> Height
Change the height properties by 5.5 and you will be able to show 20 rows per page.
PageBreaks: Between each instance of a group
One option is to generate group rows in sql query, for this you can use row_number() or to evenly devide you can use NTILE() in sql query. Based on that group number, you can create a group and set a page break after each group. So, only 10 rowes will get displayed in each page.
Report, Page--->InteractiveSize---> Height
Change the height properties by 5.5 and you will be able to show 20 rows per page.
Scenario: Display Male when true is choose, Display Female info When False choosen.
Ans:
a) Create parameter
ParBool
Boolean
b) Goto Dataset
Where p.Gender= (Case when @ParBool='True' then 'Male' else 'Female' End)
c) Preview--> True-->View report displays only Male data
Q: How many reports generated in your 3 years of exp?
Ans:
First check interviewer mood and knowledge [Presence of mind]
I have created and manipulated many reports in the last three years.
It could be around 40-50.
Q: How many days you took to create a report?
Ans:
We have different types of reporting situations and took time accordingly.
Some of them in a day, in a week, in a month and dash board kind of stuff months.




















No comments:

Post a Comment

Note: Only a member of this blog may post a comment.