Sunday, August 21, 2011

Testing Data warehouse


Data warehousing plays a pivotal role in identifying the significant trends and aspects of real world business based on the historical and real time data available in the warehouse. Testing has proven to be a necessity to sustain the data accuracy and credibility of data warehouse system.
Hardly any standard methodologies have been defined to perform testing in a data warehouse environment. Testing takes a different course when it comes to test data ware house. Where as in traditional approach emphasis is on testing functionality or business logic of a system, testing data warehouse revolves on testing the real time data and information provided by these data which is much more tedious.


Data warehouse in a Nut- Shell:
Data warehouse is the repository of an organization, where its electronic data from heterogeneous source is stored in single place. This stored data then supports and implements querying and analysis; for the purpose of decision making. The data in the data warehouse is not simply the collection of data from heterogeneous sources but it is the data which conforms the business rules. For this data is Extracted (from heterogeneous systems), Transformed (To business rule) and Loaded (In data warehouse).
According to W.H Inmon, the data in the data warehouse should satisfy following properties:
  •  Subject oriented.
  •  Integrated. 
  • Time variant. 
  • Non-Volatile.

Components of data ware house
As above picture depict data warehouse is build with the different components and processes that ultimately stores all the data in the data warehouse.
Source Systems
This is the first component of a data warehouse, all the data which is there is data are collected from the source systems. They can be categorized in OLTP systems, Flat files, Legacy systems, and External data.
Extract Transform Load (ETL) 
On the testing standpoint data warehouse testing starts from this point. Nevertheless, for the any data warehouse project ETL tool is the back bone of data warehouse, it is responsible for the loading the source data into the target table. To name Informatica power center, Data Stage, AbInito are few ETL tools, there are few open source ETL tools available to name few Pentaho Kettle, Talend etc.
Operational Data Store (ODS)
An operational data store is the repository that contains the aggregated data from the source systems its main purpose is to cater the operational data for the reporting that requires real time or near real time data.
Staging Area 
Is a temporary area where data from the source system is copied and kept for some time so that the transformation in data can be carried out without putting extra load to the source systems.Again to use or not to use staging area depends of the architect if data warehouse.
Data Marts
A data mart (DM) is the access layer of the data warehouse (DW) environment that is used to get data out to the users. The DM is a subset of the DW, usually oriented to a specific business line or team.
Analysis and Reporting Tools
These tools helps end user to access the data from the data warehouse for analysis purpose.
To name few OBIEE from Oracle, Cognos from IBM, Business Object (B.O) from SAP etc are few commercial tools available in market today.

Testing data ware house:
Normal process that applies for testing generic system testing is valid for the data warehouse testing as well. They are as below, as these steps are same we will concentrate on the approach to test data warehouse. 
  • Requirement Analysis 
  • Testing planning 
  • Test requirement creation 
  • Test case creation 
  • Test execution 
  • Verification & Validation 
  • Review and walkthroughs
Approach to Test Data warehouse


·         Data Warehouse Testing All the objects that are the part of data warehouse and those that reside in different data bases, & schema like table, triggers, indexes, views etc.
Ø  Validate all the DB objects
Ø  Validate trigger logic from audit schema
Ø  Validate table level constraints
·         Extraction Transformation and Load (ETL) Layer Testing – The script behind all the procedure in the ETL layers that is responsible for loading the data from the source to ODS (or staging) DB’s and from ODS (or staging) to data warehouse DIM and FACT tables is tested
Ø  Validation of extraction logic
Ø  Validation of transformational logic
Ø  Checksum comparison
Ø  Data completeness
Ø  Data integrity
Ø  Data validation
Ø  Initial/Incremental load
Ø  Testing end to end data flow from Source – Target – Report
·         Reports Testing – Listed below are the tasks that are carried out as a part of reports testing:
Ø  UI validation like Report interface, downloading, pagination, summation, grand totals
Ø   Drill down reports, sorting, filters
Ø  Verifying report’s data
Ø  Comparison within different types of reports

QA’s test entry point on the typical data warehouse architecture.





For testing data warehouse the initial phase should start from business understanding. This is vital for a tester as to a developer. Test engineers should take part in business and technical walkthroughs of the requirements.

Testing should be performed for each and every component of data warehouse as ETL is more concentrated part in above components as data movement happens in ETL. Key data warehouse test strategy is to perform both Analysis based and Queries based test execution this happens to be on the Front end part of data warehouse like reports, Ad-Hoc reports & charts etc. here we test the reports for the data from the data warehouse and same is cross verified by executing related queries in data base. Any test strategy for testing data warehouse should essentially concentrate on the below points.

1. Functional testing
In this test, For ETL processes ensure that all the mappings that are deployed in the QA environment are working as expected, verify its Source, Target, Connections parameter, Insert, Update strategy, Error logs are generating, failure / report notifications that are sent via Informatica are proper. Ensure all mappings are working fine without any error. For Reports ensure that the data from UI and DB are same; there is no view display error and other general UI test scenarios.

2. Integration testing
Ensure working of the data warehouse application as a whole. Integration test cases should contain the scenario for sequenced / scheduled ETL jobs. Test strategies should be devised to verify reaction of data warehouse application in case of fail over’s.

3. Regression testing
As Stated above the the requirement can’t be freeze in any data warehouse application so it is certain that there will be new requirement or enhancements in existing requirements. In regression ensure that all the existing code still works new releases. Normally for the data warehouse application it is verified by running all the functional test cases of the impacted area.

4. Performance testing
Scalability and robustness should be intrinsic nature of any data warehouse application. As the requirements are not stagnant in these types of applications, scalability should not be the bottle neck of these applications new requirements should be added successfully without having any performance degradation in the existing one. Hence performance testing is the critical & this is done using the voluminous data that is in sync with the max load count of the ETL routines.

5. User Acceptance testing
Is carried by the business users since they are not aware of ETL routine that needs to run before verifying, the technology team should properly document/ explain business team the process of data loading to the data warehouse and convey the same using business rules.






11 comments:

  1. Excellent sudhakar! very informative. expecting more like this from you

    ReplyDelete
  2. Wow.. in a very nutshell language... you described more than needed. Please include
    A. MDM testing wrt MRD,
    B. Collection testing as well,
    C. Datawarehouse testing itself &
    D. Database engine testing
    as they are also an integral part of any standard datawarehouse testing.

    ReplyDelete
  3. Thanks Sudhakar for sharing such useful information.

    ReplyDelete
  4. very very thankful to u;;;
    i got big help from it;;;;

    ReplyDelete
  5. Thank your very much...

    ReplyDelete
  6. Hi ,
    Can you help me with what all information do me need to gather about the dimensions and facts before starting testing. For example kinds of facts ,grain. How to put a list in place.

    Thanks in advance.

    ReplyDelete
  7. This is a very good content I read this blog, please share more content on MSBI Online Training

    ReplyDelete