Testing – Data Analytics
Business Intelligence (BI) testing initiatives help companies gain deeper and more reliable insights so that decisions are based on hard facts and data.
Our team of BI testers can help you validate your data in your BI systems through robust testing processes.
BI Testing Strategy:
A BI testing project is still a testing project. That means the typical stages of testing apply here as well, whether you are testing performance or running functional end-to-end tests:
- Test planning
- Test strategy
- Test design (Your test cases will be query intensive rather than plain-text based. This is the one major difference between a typical test project and an ETL/Data Warehouse/BI testing project.)
- Test execution (Once again, you will need a querying interface such as TOAD to run your queries.)
- Defect reporting, closure, etc.
Some of the important testing services we offer are:
Analytical/BI Report Testing & Validation
Reports will never be correct, consistent, and fast if the preceding layers are malfunctioning. It is important to look for the following:
- The reports generated and their applicability to the business
- The ability to customize and personalize the parameters to be included in the reports: sorting, categorizing, grouping, etc.
- The appearance of the report itself. In other words, the readability.
- If the BI elements are integrated into a larger application, then the corresponding functionality of that application should be included in an end-to-end test.
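As a minimal sketch of report validation, the check below models the report layer as a SQL view over source rows and asserts that the report's grand total agrees with the underlying data. The table, view, and column names are hypothetical stand-ins for a real reporting schema.

```python
import sqlite3

# Hypothetical schema: raw sales rows plus a "report" modeled as a view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount REAL);
INSERT INTO sales VALUES ('North', 100.0), ('North', 50.0), ('South', 75.0);
-- The reporting layer, modeled here as an aggregate view over the source data.
CREATE VIEW sales_report AS
  SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

# The report total must equal the sum of the raw rows it claims to summarize.
report_total = conn.execute("SELECT SUM(total) FROM sales_report").fetchone()[0]
source_total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
assert report_total == source_total, "report disagrees with source data"
```

In a real engagement, the same pattern runs the report query and the source query against the live BI tool and its warehouse, rather than an in-memory database.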
ETL/DW Testing & Validation
The ETL process is where the raw data gets processed into business targeted information. We need to look for the following during this testing:
- The source and destination data types should match; e.g., a date should not be stored as text.
- Primary key, foreign key, null, default value constraints, etc. should be intact.
- The ACID properties of the source and destination should be validated.
- Performance: As systems become more intricate, relationships are formed between multiple entities to create several correlations. This is great news for data analytics; however, this kind of complexity often causes queries to take too long to retrieve results. Therefore, performance testing plays an important role here.
- Scalability: Data is only going to increase, not decrease. Therefore, tests have to verify whether the current implementation can handle the growing business and data volumes. This also includes testing the archival strategy. Essentially, you are testing the decision: “What happens to older data, and what if I need it?”
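The data-type and constraint checks above can be sketched as a schema comparison between source and target tables. This example uses SQLite's PRAGMA table_info to read each column's name, type, NOT NULL flag, and primary-key flag; the table names are assumptions for illustration.

```python
import sqlite3

# Hypothetical source and target tables that should have identical schemas.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (id INTEGER PRIMARY KEY, order_date TEXT NOT NULL);
CREATE TABLE tgt_orders (id INTEGER PRIMARY KEY, order_date TEXT NOT NULL);
""")

def schema_of(table):
    # PRAGMA table_info yields (cid, name, type, notnull, default, pk) per column.
    return [(r[1], r[2], r[3], r[5]) for r in
            conn.execute(f"PRAGMA table_info({table})")]

# Any column whose name, type, nullability, or key role differs is a defect.
mismatches = [pair for pair in zip(schema_of("src_orders"), schema_of("tgt_orders"))
              if pair[0] != pair[1]]
assert not mismatches, f"schema drift between source and target: {mismatches}"
```

Against a real warehouse, the same comparison would query the database's information schema instead of SQLite pragmas.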
Data Migration Testing requires a comprehensive strategy in order to reduce risk and deliver a successful migration for the end users. Database migration testing is needed when you move data from the old database(s) to a new database. The old database is called the legacy or source database, and the new database is called the target or destination database. Database migration may be done manually, but it is more common to use an automated ETL (Extract-Transform-Load) process to move the data. In addition to mapping the old data structure to the new one, the ETL tool may incorporate certain business rules to increase the quality of the data moved to the target database.
The test approach for database migration testing consists of the following activities:
I. Design the validation tests
In order to test database migration, you need SQL queries (created either by hand or with a tool, e.g. a query creator). You need to create validation queries to run against both the source and the target databases. Your validation queries should cover the scope you have defined. It is common to arrange the validation queries in a hierarchy, e.g. you test whether all the Orders records have migrated before you test the OrderDetails records. Put logging statements within your queries for effective analysis and bug reporting later.
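The hierarchy idea can be sketched as an ordered list of validation queries run against both databases, with parent tables (Orders) checked before child tables (OrderDetails). Both databases are in-memory stand-ins here, and the row-count check is only the simplest kind of validation query.

```python
import sqlite3

# Stand-ins for the legacy (source) and destination (target) databases.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.executescript("""
        CREATE TABLE Orders (id INTEGER PRIMARY KEY);
        CREATE TABLE OrderDetails (order_id INTEGER REFERENCES Orders(id));
        INSERT INTO Orders VALUES (1), (2);
        INSERT INTO OrderDetails VALUES (1), (1), (2);
    """)

# Ordered so a failure in Orders surfaces before OrderDetails is checked.
checks = ["SELECT COUNT(*) FROM Orders",
          "SELECT COUNT(*) FROM OrderDetails"]
failures = []
for query in checks:
    src = source.execute(query).fetchone()[0]
    tgt = target.execute(query).fetchone()[0]
    if src != tgt:
        failures.append((query, src, tgt))  # log for analysis and bug reporting
assert not failures
```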
II. Set up the test environment
The test environment should contain a copy of the source database, the ETL tool (if applicable) and a clean copy of the target database. You should isolate the test environment so that it does not change externally.
III. Run your validation tests
Depending on your test design, you need not wait for the database migration process to finish before you start your tests.
You should report the following data for each failed test:
a. Name of the entity that failed the test
b. The number of rows or columns that failed the test
c. If applicable, the database error details (error number and error description)
d. Validation query
e. Date and time the test was run
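The failure report fields listed above map naturally onto a small record type. This is one possible shape, with illustrative names; the field comments tie each attribute back to items a-e.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class MigrationTestFailure:
    entity: str                     # a. name of the entity that failed the test
    failed_rows: int                # b. number of rows or columns that failed
    validation_query: str           # d. the validation query that was run
    db_error: Optional[str] = None  # c. database error details, if applicable
    run_at: datetime = field(default_factory=datetime.now)  # e. run timestamp

# Example failure record for a child table with orphaned rows.
failure = MigrationTestFailure(
    entity="OrderDetails",
    failed_rows=12,
    validation_query="SELECT COUNT(*) FROM OrderDetails WHERE order_id IS NULL",
)
```

Collecting failures as structured records like this makes the later analysis and defect reporting steps straightforward to automate.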
Big Data Testing
Testing a Big Data application is more a verification of its data processing than a test of the individual features of the software product. When it comes to Big Data testing, performance and functional testing are the keys.
In Big Data testing, QA engineers verify the successful processing of terabytes of data using a commodity cluster and other supportive components. It demands a high level of testing skill because the processing is very fast. Processing may be of three types:
- Batch
- Real Time
- Interactive
Along with this, data quality is also an important factor in Big Data testing. Before testing the application, it is necessary to check the quality of the data; this should be considered part of database testing. It involves checking various characteristics such as conformity, accuracy, duplication, consistency, validity, completeness, etc.
Big Data Testing can be broadly divided into three steps:
Step 1: Data Staging Validation
The first step of Big Data testing, also referred to as the Pre-Hadoop stage, involves process validation.
- Data from various sources such as RDBMS, web logs, etc. should be validated to make sure that correct data is pulled into the system.
- Compare the source data with the data pushed into the Hadoop system to make sure they match.
- Verify that the right data is extracted and loaded into the correct HDFS location.
Step 2: “Map Reduce” Validation
The second step is validation of “Map Reduce”. In this stage, the tester verifies the business logic on every node, and then validates it after running against multiple nodes, ensuring that:
- The Map-Reduce process works correctly
- Data aggregation or segregation rules are implemented on the data
- Key-value pairs are generated
- The data is valid after the Map-Reduce process
Step 3: Output Validation Phase
The final or third stage of Big Data testing is the output validation process. The output data files are generated and ready to be moved to an EDW (Enterprise Data Warehouse) or any other system based on the requirement.
Activities in the third stage include:
- Checking that the transformation rules are correctly applied
- Checking data integrity and successful data load into the target system
- Checking that there is no data corruption, by comparing the target data with the HDFS file system data
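One way to sketch the corruption check is an order-independent fingerprint of each dataset: hash the sorted rows on both sides and compare the digests. The delimited sample rows below stand in for records read from HDFS output files and from the target system.

```python
import hashlib

# Hypothetical records: the HDFS output and the rows read back from the
# target system. Load order may differ between the two.
hdfs_rows = ["1|alice|100", "2|bob|200"]
target_rows = ["2|bob|200", "1|alice|100"]

def dataset_fingerprint(rows):
    # Sort first so the checksum is independent of row order.
    digest = hashlib.sha256()
    for row in sorted(rows):
        digest.update(row.encode("utf-8"))
    return digest.hexdigest()

# Equal fingerprints mean no row was lost, duplicated, or altered in transit.
assert dataset_fingerprint(hdfs_rows) == dataset_fingerprint(target_rows)
```

For terabyte-scale data, the same idea is usually applied per partition so a mismatch can be localized rather than re-scanning everything.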
Types of Performance Testing:
Load testing is a type of performance test where the application is tested for its performance under normal and peak usage. The performance of an application is checked with respect to its response to user requests and its ability to respond consistently, within an accepted tolerance, under different user loads.
The key considerations are:
What is the maximum load the application can handle before it starts behaving unexpectedly?
How much data can the database handle before slowness or a crash is observed?
Are there any network related issues to be addressed?
Stress testing is a test to find ways to break the system. The test also gives an idea of the maximum load the system can hold.
Generally, stress testing takes an incremental approach where the load is increased gradually. The test starts with a load for which the application has already been tested. Then more load is added slowly to stress the system, and the point at which the servers stop responding to requests is considered the breaking point.
During this test, all the functionality of the application is tested under heavy load; on the back end, these functions might be running complex queries, handling data, etc.
The following questions are to be addressed:
What is the max load a system can sustain before it breaks down?
How does the system break down?
Is the system able to recover once it has crashed?
In how many ways can the system break, and which are the weak nodes when handling the unexpected load?
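The incremental approach described above can be sketched as a loop that raises the load step by step until a (simulated) server stops responding. The capacity threshold and step size are arbitrary stand-ins for a real system's limits.

```python
# Hypothetical capacity: the requests/sec the simulated server can absorb.
SERVER_CAPACITY = 850

def server_responds(load):
    # Stand-in for firing `load` requests/sec at the system under test.
    return load <= SERVER_CAPACITY

load, step = 100, 100  # start from a load the application already passed
breaking_point = None
while breaking_point is None:
    if server_responds(load):
        load += step  # slowly add more load to stress the system
    else:
        breaking_point = load  # first load level where requests go unanswered
```

In practice a load-generation tool plays the role of server_responds, and the breaking point is read off the first load step where errors or timeouts appear.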
A volume test verifies that the performance of the application is not adversely affected by the volume of data being handled. To execute a volume test, a huge volume of data is generally entered into the database. The test can be incremental or steady; in an incremental test, the volume of data is increased gradually.
Generally, the database grows with application usage, and it is necessary to test the application against a heavy database. A good example is the website of a new school or college: it has little data to store initially, but after 5-10 years the data stored in the website's database is much larger.
The most common recommendation from this test is the tuning of the DB queries that access the database. In some cases, the response time of DB queries is high against a big database, so they need to be rewritten in a different way, or indexes, joins, etc. need to be added.
Is the application capable of meeting business volume under both normal and peak load conditions?
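The index-tuning recommendation can be illustrated with SQLite's EXPLAIN QUERY PLAN: the same lookup is a full table scan without an index and an index search with one. The table and column names are made up for the example.

```python
import sqlite3

# A hypothetical table with enough rows to make scanning worth avoiding.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(i, f"name{i}") for i in range(1000)])

def plan(query):
    # EXPLAIN QUERY PLAN rows carry the plan text in the last column.
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + query))

before = plan("SELECT name FROM students WHERE id = 500")
conn.execute("CREATE INDEX idx_students_id ON students(id)")
after = plan("SELECT name FROM students WHERE id = 500")
# `before` reports a table SCAN; `after` reports a SEARCH USING INDEX.
```

On a large production database, that plan difference is what turns a slow volume-test query into an acceptable one.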
Capacity testing is generally done with future prospects in mind. It addresses the following:
Will the application be able to support the future load?
Is the environment capable of withstanding the upcoming increased load?
What additional resources are required to make the environment capable enough?
Capacity testing is used to determine how many users and/or transactions a given web application will support while still meeting performance goals. During this testing, resources such as processor capacity, network bandwidth, memory usage, disk capacity, etc. are considered and altered to meet the goal.
Online Banking is a perfect example of where capacity testing could play a major part.
Reliability Testing or Recovery Testing verifies whether the application is able to return to its normal state after a failure or abnormal behavior, and how long it takes to do so (in other words, a time estimate).
If an online trading site experiences a failure in which users are unable to buy/sell shares at a certain point of the day (peak hours) but are able to do so after an hour or two, we can say the application is reliable, having recovered from the abnormal behavior.
In addition to the above sub-forms of performance testing, there are some more fundamental questions that are prominent:
How is the new version of the application performing compared to previous ones?
Is any performance degradation observed in any area of the new version?
Which area should developers focus on next to address performance issues in the new version of the application?