Jaanu Jeeva: 2012

Saturday, August 25, 2012

Risk Based Testing, Strategies for Prioritizing Tests against Deadlines

Often all other activities before test execution are delayed. This means testing has to be done under severe pressure. Quitting the job is out of the question, as is delaying delivery or testing badly. The real answer is a prioritization strategy in order to do the best possible job with limited resources.

Which parts of the system require the most attention? There is no unique answer, and decisions about what to test have to be risk-based. There is a relationship between the resources used in testing and the risk after testing. There are possibilities for stepwise release. The general strategy is to test some important functions and features that hopefully can be released, while delaying others.

First, one has to test what is most important in the application. This can be determined by looking at visibility of functions, at frequency of use and at the possible cost of failure. Second, one has to test where the probability of failure is high, i.e. one may find most trouble. This can be determined by identifying especially defect-prone areas in the product. Project history gives some indication, and product measures like complexity give more. Using both, one finds a list of areas to test more or less.

After test execution has started and one has found some defects, these defects may be a basis for re-focussing testing. Defects clump together in defect-prone areas. Defects are a symptom of typical trouble the developers had. Thus, a defect leads to the conclusion that there are more defects nearby, and that there are more defects of the same kind. Thus, during the latter part of test execution, one should focus on areas where defects have been found, and one should generate more tests aimed at the type of defect detected before.

Disclaimer: The ideas in this paper are not verified for use with safety critical software. Some of the ideas may be useful in that area, but due consideration is necessary. The presented ideas mean that the tester is taking risks, and the risks may or may not materialize in the form of serious failures.

Introduction

The scenario is as follows: You are the test manager. You made a plan and a budget for testing. Your plans were, as far as you know, reasonable and well founded. When the time to execute the tests approaches, the product is not ready, some of your testers are not available, or the budget is just cut. You can argue against these cuts and argue for more time or whatever, but that doesn’t always help. You have to do what you can with a smaller budget and time frame. Resigning is not an option. You have to test the product as well as possible, and you have to make sure it works reasonably well after release. How to survive?

There are several approaches, using different techniques and attacking different aspects of the testing process. All of them aim at finding as many defects as possible, and as serious defects as possible, before product release. Different chapters of this paper show the idea. At the end of the paper, some ideas are given that should help to prevent the pressured scenario mentioned before.

In this paper we are talking about the higher levels of testing: integration, system and acceptance test. We assume that developers have done some basic level of testing of every program (unit testing). We also assume the programs and their designs have been reviewed in some way. Still, most of the ideas in this paper are applicable if nothing has been done before you take over as the test manager. It is, however, easier if you know some facts from earlier quality control activities such as design and code reviews and unit testing.

1. The bad game

You are in a bad game with a high probability of losing: You will lose the game either way, by bad testing, or by requiring more time to test. After doing bad testing you will be the scapegoat for lack of quality. After reasonable testing you will be blamed for the late release. A good scenario illustrating the trouble is the Y2K project. Testing may have been done at the last minute, and the deadline was fixed. In most cases, trouble was found during design or testing and system owners were glad that problems were found. In most cases, nothing bad happened after January 1st, 2000. In many cases, managers then decided that resources had been wasted on testing. But there are options. Throughout this paper I will use Y2K examples to illustrate the major points.

How to get out of the game?

You need some creative solution, namely you have to change the game. You need to inform management about the impossible task you have, in such a way that they understand. You need to present alternatives. They need a product going out of the door, but they also need to understand the RISK.

One strategy is to find the right quality level. Not all products need to be free of defects. Not every function needs to work. Sometimes you have considerable freedom to lower product quality. This means you can cut down testing in less important areas.

Another strategy is priority: Testing should find the most important defects first. Most important often means "in the most important functions". These functions can be found by analyzing how every function supports the mission, and checking which functions are critical and which are not. You can also test more where you expect more defects. Finding the worst areas in the product soon and testing them more will help you find more defects. If you find too many serious problems, management will often be motivated to postpone the release or give you more time and resources. Most of this paper will be about a combination of most important and worst areas priority.

A third strategy is making testing cheaper in general. One major issue here is automation of test execution. But be cautious: Automation can be expensive, especially if you have never done it before or if you do it wrong! However, experienced companies are able to automate test execution with no overhead compared to manual testing.

A fourth strategy is getting someone else to pay. Typically, this someone else is the customer. You release a lousy product and the customer finds the defects for you. Many companies have applied this. For the customer this game is horrible, as they have no alternative. But it remains to be discussed whether this is a good strategy for long-term success. So this "someone else" should preferably be the developers, not the testers. You may require the product to fulfill certain entry criteria before you test. Entry criteria can include certain reviews having been done, a minimum level of test coverage in unit testing, and a certain level of reliability. The problem is: you need to have high-level support in order to be able to enforce this. Entry criteria tend to be skipped if the project gets under pressure and organizational maturity is low.

The last strategy is prevention, but that only pays off in the next project, when you, as the test manager, are involved from the project start on.

2. Understanding necessary quality levels

Software is embedded in the larger, more complex business world. Quality must be considered in that context (8).

The relentless pursuit of quality can dramatically improve the technical characteristics of a software product. In some applications - medical instruments, railway-signaling applications, air-navigation systems, industrial automation, and many defense-related systems - the need to provide a certain level of quality is beyond debate. But is quality really the only or most important framework for strategic decision making in the commercial marketplace?

Quality thinking fails to address many of the fundamental issues that most affect a company's long-term competitive and financial performance. The real issue is which quality will produce the best financial performance.

You have to be sure which qualities and functions are important. Fewer defects do not always mean more profit! You have to research how quality and financial performance interact. Examples of such approaches include the concept of Return on Quality (ROQ) used in corporations such as AT&T (9). ROQ evaluates prospective quality improvements against their ability to also improve financial performance. Also be aware of approaches like Value Based Management. Avoid fanatically pursuing quality for its own sake.

Thus, more testing is not always needed to ensure product success!

Example from the Y2K problem: It may be acceptable that a product fails to work on February 29, 2000. It may also be acceptable that it sorts records wrong if they are blended with 19xx and 20xx dates. But it may be of immense importance that the product can record and process orders after 1 Jan 2000.

3. Priority in testing most important and worst parts of the product.

Risk is the product of damage and probability for damage to occur. The way to assess risk is outlined in figure 1 below. Risk analysis assesses damage during use, usage frequency, and determines probability of failure by looking at defect introduction.

Figure 1: Risk definition and structure
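
The definition above can be written down directly. The following is only a minimal sketch: the function name `risk` and all figures are illustrative assumptions, not data from any project, and damage is assumed to be expressed as a money value and probability as a number between 0 and 1.

```python
def risk(damage: float, probability: float) -> float:
    """Risk as the product of damage and the probability of that damage occurring."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return damage * probability


# Hypothetical figures: a costly failure that is unlikely,
# compared with a cheap failure that is likely.
rare_but_costly = risk(200_000, 0.05)
frequent_but_cheap = risk(10_000, 0.5)

print(rare_but_costly, frequent_but_cheap)
```

With these assumed numbers the unlikely but costly failure still carries the higher risk, which is why neither damage nor probability should be looked at alone.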

Testing is always a sample. You can never test everything, and you can always find more to test. Thus you will always need to make decisions about what to test and what not to test, what to do more or less. The general goal is to find the worst defects first, the ones that NEED TO BE FIXED BEFORE RELEASE, and to find as many such defects as possible.

This means the defects must be important. The problem with most systematic test methods, like white box testing, or black box methods like equivalence partitioning, boundary value analysis or cause-effect graphing, is that they generate too many test cases, some of which are less important (17). A way to lessen the test load is finding the most important functional areas and product properties. Finding as many defects as possible can be improved by testing more in bad areas of the product. This means you need to know where to expect more defects.

When dealing with all the factors we look at, the result will always be a list of functions and properties with an associated importance. In order to make the final analysis as easy as possible, we express all the factors on a scale from 1 to 5. Five points are given for "most important" or "worst", or generally for something having higher risk, which we want to test more; 1 point is given to less important areas. (Other publications often use weights 1 through 3.)

The details of the computation are given later.

3.1. Determining damage: What is important?

You need to know the possible damage resulting from an area to be tested. This means analyzing the most important areas of the product. In this section, a way to prioritize this is described. The ideas presented here are not the only valid ones. In every product, there may be other factors playing a role, but the factors given here have been valuable in several projects.

Important areas can either be functions or functional groups, or properties such as performance, capacity, security etc. The result of this analysis is a list of functions and properties or combination of both that need attention. I am concentrating here on sorting functions into more or less important areas. The approach, however, is flexible and can accommodate other items.

Major factors include:

Critical areas (cost and consequences of failure)

You have to analyze the use of the software within its overall environment. Analyze the ways the software may fail. Find the possible consequences of such failure modes, or at least the worst ones. Take into account redundancy, backup facilities and possible manual check of software output by users, operators or analysts. Software that is directly coupled to a process it controls is more critical than software whose output is manually reviewed before use. If software controls a process, this process itself should be analyzed. The inertia and stability of the process itself may make certain failures less interesting.
Example: The subscriber information system for a Telecom operator may uncouple subscriber lines - for instance if 31-12-99 is used as «indefinite» value for the subscription end date. This is a critical failure. On the other hand, in a report, the year number may be displayed as blanks if it is in 2000, which is a minor nuisance.
Output that is immediately needed during working hours is more critical than output that could be sent hours or days later. On the other hand, if large volumes of data to be sent by mail are wrong, just the cost of re-mailing may be horrible. The damage may be classified into the classes mentioned down below, or quantified into money value, whatever seems better. In systems with large variation of damage it is better to use damage as absolute money value, and not classify it into groups.
A possible hierarchy for grouping damage is the following:
A failure would be catastrophic (3)
The problem would cause the computer to stop, maybe even lead to crashes in the environment (stop the whole country or business or product). Such failures may involve large financial losses or even damage to human life. An example would be the gross uncoupling of all subscribers to the telephone network on a special date.
Failures leading to losing the license, i.e. authorities closing down the business, are part of this class. Serious legal consequences may also belong here.
The last kind of catastrophic failures is endangering the life of people.
A failure would be damaging (2)
The program may not stop, but data may be lost or corrupted, or functionality may be lost until the program or computer is restarted. An example is equipment that will not work just around midnight on 31 December.
A failure would be hindering (1)
The user is forced to workarounds, to more difficult actions to reach the same results.
A failure would be annoying (0)
The problem does not affect functionality, but rather makes the product less appealing to the user or customer. However, the customer can live with the problem.
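
The damage hierarchy above maps naturally onto an ordered enumeration. The sketch below is one possible encoding, not a prescribed one: the class name `Damage` and the classification of the three failure modes are assumptions made for illustration, loosely based on the Y2K examples in this section.

```python
from enum import IntEnum


class Damage(IntEnum):
    """Damage classes from the hierarchy above; a higher value means more severe."""
    ANNOYING = 0
    HINDERING = 1
    DAMAGING = 2
    CATASTROPHIC = 3


# Hypothetical classification of failure modes (assumed for illustration).
failure_modes = {
    "year shown as blanks in a report": Damage.ANNOYING,
    "subscriber lines uncoupled": Damage.CATASTROPHIC,
    "equipment fails around midnight on 31 December": Damage.DAMAGING,
}

# List the failure modes with the most severe first.
by_severity = sorted(failure_modes, key=failure_modes.get, reverse=True)

for name in by_severity:
    print(failure_modes[name].name, "-", name)
```

Using an integer-valued enumeration keeps the classes comparable and lets them be used as points in a later priority calculation.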

Visible areas

The visible areas are areas where many users will experience a failure, if something goes wrong. Users do not only include the operators sitting at a terminal, but also final users looking at reports, invoices, or the like, or dependent on the service delivered by the product which includes the software. A factor to take into account under this heading is also the forgivingness of the users, i.e. their tolerance against any problem. It relates to the importance of different qualities, see above.

Software intended for untrained or naive users, especially software intended for use by the general public, needs careful attention to the user interface. Robustness will also be a major concern. Software which directly interacts with hardware, industrial processes, networks etc. will be vulnerable to external effects like hardware failure, noisy data, timing problems etc. This kind of software needs thorough validation, verification and retesting in case of environment changes.
An example of a visible area is the functionality in a phone switch which makes it possible to make a call. Less visible areas are all the value-added services like call transfer.
One factor in visibility is possible loss of faith by customers, i.e. longer-term damage which would mean longer-term loss of business because customers may avoid products from the company.

Usage frequency

Damage is dependent on how often a function or feature is used.
Some functions may be used every day, other functions only a few times. Some functions may be used by many, some by few users. Give priority to the functions used often and heavily. The number of transactions per day may help in setting priorities.
A possibility to leave out some areas is to cut out functionality that is going to be used seldom, i.e. will only be used once per quarter, half-year or year. Such functionality may be tested after release, before its first use. A possible strategy for Y2K testing was to test leap year functionality in January and February 2000, and then again during December 2000 and in 2004.
Sometimes this analysis is not quite obvious. In process control systems, for example, certain functionality may be invisible from the outside. In modern object oriented systems, there may be a lot of central libraries used everywhere. It may be helpful to analyze the design of the complete system.
A possible hierarchy is outlined here (from (3)):
Unavoidable (3)
An area of the product that most users will come in contact with during an average usage session (e.g. startups, printing, saving).
Frequent (2)
An area of the product that most users will come in contact with eventually, but maybe not during every usage session.
Occasional (1)
An area of the product that an average user may never visit, but that deals with functions a more serious or experienced user will need occasionally.
Rare (0)
An area of the product which most users never will visit, which is visited only if users do very uncommon steps of action. Critical failures, however, are still of interest.
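
As a sketch of how the usage hierarchy might support the idea of postponing tests for seldom-used functionality, the snippet below classifies a few functions and lists the ones that could be tested after release, before their first use. The names `Usage` and `deferral_candidates`, the example functions and the threshold are all assumptions for illustration. Since critical failures are still of interest even in rare areas, the damage class of each candidate has to be checked separately before anything is actually deferred.

```python
from enum import IntEnum


class Usage(IntEnum):
    """Usage frequency classes from the hierarchy above."""
    RARE = 0
    OCCASIONAL = 1
    FREQUENT = 2
    UNAVOIDABLE = 3


# Hypothetical functions and their assumed usage classes.
usage = {
    "save document": Usage.UNAVOIDABLE,
    "print": Usage.UNAVOIDABLE,
    "mail merge": Usage.OCCASIONAL,
    "year-end closing report": Usage.RARE,
}


def deferral_candidates(usage_by_function, threshold=Usage.RARE):
    """Return functions used so seldom that their tests might be postponed.

    This looks at usage frequency only; criticality must be assessed separately.
    """
    return sorted(name for name, level in usage_by_function.items()
                  if level <= threshold)


print(deferral_candidates(usage))
```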

An alternative method to use for picking important requirements is described in (1).

Importance can be classified by using a scale from one to five. However, in some cases this does not sufficiently map the variation of the scale in reality. Then, it is better to use real values, like the cost of damage and the actual usage frequency.

3.2. Failure probability: What is (presumably) worst

The worst areas are the ones having most defects. The task is to predict where most defects are located. This is done by analyzing probable defect generators. In this section, some of the most important defect generators and symptoms for defect prone areas are presented. There exist many more, and you have to always include local factors in addition to the ones mentioned here.

Complex areas

Complexity is maybe the most important defect generator. More than 200 different complexity measures exist, and research into the relation of complexity and defect frequency has been done for more than 20 years. However, no predictive measures have until now been generally validated. Still, most complexity measures may indicate problematic areas. Examples include long modules, many variables in use, complex logic, complex control structure, a large data flow, central placement of functions, a deep inheritance tree, and even subjective complexity as understood by the designers. This means you may do several complexity analyses, based on different aspects of complexity and find different areas of the product that might have problems.

Changed areas

Change is an important defect generator (13). One reason is that changes are subjectively understood as easy, and thus not analyzed thoroughly for their impact. Another reason is that changes are done under time pressure and analysis is not completely done. The result is side-effects. Advocates for modern system design methods, like the Cleanroom process, state that debugging during unit test is more detrimental than good to quality, because the changes introduce more defects than they repair.
In general, there should exist a protocol of changes done. This is part of the configuration management system (if something like that exists). You may sort the changes by functional area or otherwise and find the areas which have had exceptionally many changes. These may either have a bad design from before, or have a bad design after the original design has been destroyed by the many changes.
Many changes are also a symptom of badly done analysis (5). Thus, heavily changed areas may not correspond to user expectations.
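
Sorting the change protocol by functional area, as described above, takes only a few lines once the changes can be exported. The following is a minimal sketch: the change log, the function name `heavily_changed` and the cut-off of 1.5 times the average are assumptions, and a real configuration management system will need its own export step.

```python
from collections import Counter
from statistics import mean

# Hypothetical change log: one entry per change, tagged with its functional area.
change_log = (["invoicing"] * 6
              + ["order registration"] * 2
              + ["reporting"])


def heavily_changed(changes, factor=1.5):
    """Return (area, count) pairs whose change count exceeds factor * average.

    The factor is an arbitrary assumption and should be tuned locally.
    """
    counts = Counter(changes)
    average = mean(counts.values())
    return [(area, n) for area, n in counts.most_common() if n > factor * average]


print(heavily_changed(change_log))
```

Areas flagged this way are only candidates. Whether the many changes point to a bad design or to badly done analysis still has to be judged by someone who knows the project.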

Impact of new technology, solutions, methods

Programmers using new tools, methods and technology experience a learning curve. In the beginning, they may generate many more faults than later. Tools include CASE tools, which may be new in the company, or new in the market and more or less unstable. Another issue is the programming language, which may be new to the programmers, or Graphical User Interface libraries. Any new tool or technique may give trouble. A good example is the first project with a new type of user interface. The general functionality may work well, but the user interface subsystem may be full of trouble.
Another factor to consider is the maturity of methods and models. Maturity means the strength of the theoretical basis or the empirical evidence. If software uses established methods, like finite state machines, grammars, relational data models, and the problem to be solved may be expressed suitably by such models, the software can be expected to be quite reliable. On the other hand, if methods or models of a new and unproven kind, or near the state of the art are used, the software may be more unreliable.
Most software cost models include factors accommodating the experience of programmers with the methods, tools and technology. This is as important in test planning, as it is in cost estimation.

Impact of the number of people involved

The idea here is the thousand monkeys' syndrome. The more people are involved in a task, the larger is the overhead for communication and the chance that things go wrong. A small group of highly skilled staff is much more productive than a large group of average qualification. In the COCOMO (10) software cost model, this is the largest factor after software size. Much of its impact can be explained from effort going into detecting and fixing defects.

Areas where relatively many and less qualified people have been employed may be singled out for more thorough testing.
Care should be taken in that analysis: Some companies (11) employ their best people in more complex areas, and less qualified people in easy areas. Then, defect density may not reflect the number of people or their qualification.
A typical case is the program developed by lots of hired-in consultants without thorough follow-up. They may work in very different ways. During testing, it may be found that everyone has used a different date format, or a different time window.

Impact of turnover

If people quit the job, new people have to learn the design constraints before they are able to continue that job. As not everything may be documented, some constraints may be hidden for the new person, and defects result. Overlap between people may also be less than desirable. In general, areas with turnover will experience more defects than areas where the same group of people has done the whole job.

Impact of time pressure

Time pressure leads to people taking short-cuts. People concentrate on getting the job done, and they often try to skip quality control activities, thinking optimistically that everything will go fine. Only in mature organizations does this optimism seem to be controlled.
Time pressure may also lead to overtime work. It is well known, however, that people lose concentration after prolonged periods of work. This may lead to more defects. Together with short-cuts in applying reviews and inspections, this may lead to extreme levels of defect density.
Data about time pressure during development can best be found by studying time lists, project meeting minutes, or by interviewing management or programmers.

Areas which needed optimizing

The COCOMO cost model mentions shortage of machine and network capacity and memory as one of its cost drivers. The problem is that optimization needs extra design effort, or that it may be done by using less robust design methods. Extra design effort may take resources away from defect removal activities, and less robust design methods may generate more defects.

Areas with many defects before

Defect repair leads to changes which lead to new defects, and defect prone areas tend to persist. Experience exists that defect prone areas in a delivered system can be traced back to defect prone areas in reviews and unit and subsystem testing. Evidence in studies (5) and (7) shows that modules that had faults in the past are likely to have faults in the future. If defect statistics from design and code reviews, and unit and subsystem testing exist, then priorities can be chosen for later test phases.

Geographical distribution

If people working together on a project are not co-located, communication will be worse. This is true even on a local level. Here are some ideas which have proven to be valuable in assessing if geography may have a detrimental effect on a project:

People having their offices in different floors of the same building will not communicate as much as people on the same floor.

People sitting more than 25 meters apart may not communicate enough.

A common area in the workspace, such as a common printer or coffee machine, improves communication.

People sitting in different buildings do not communicate as much as people in the same building.

People sitting in different labs communicate less than people in the same lab.

People from different countries may have difficulties, both culturally and with the language.

If people reside in different time zones, communication will be more difficult. This is a problem in outsourcing software development.

In principle, geographical distribution is not dangerous. The danger arises if people with a large distance have to communicate, for example, if they work with a common part of the system. You have to look for areas where the software structure implies the need for good communication between people, but where these people have geography against them.

History of prior use

If many users have used software before, an active user group can be helpful in testing new versions. Beta testing may be possible. For a completely new system, a user group may need to be defined, and prototyping may be applied. Typically, completely new functional areas are most defect-prone because even the requirements are unknown.

Local factors

Examples include looking at who did the job, looking at who does not communicate well with someone else, who is new in the project, which department has recently been reorganized, which managers are in conflict with each other, the involvement of prestige and many more factors. Only imagination sets the limits. The message is: You have to look out for possible local factors beyond the ones discussed here.

One factor to be considered in general

This paper is about high level testing. Developers test before this. It is reasonable to have a look at how developers have tested the software before, and what kind of problems they typically overlook. Analyze the unit test quality. This may lead to a further tailoring of the test case selection methods (17).

Looking at these factors will determine the fault density of the areas to be tested. However, using only this will normally over-value some areas. Typically, larger components will be tested too much. Thus, a correction factor should be applied: Functional size of the area to be tested. I.e. the total weight of this area will be "fault proneness / functional volume". This factor can be found from function point analysis early, or from counting code lines if that is available.
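
The size correction can be illustrated with a small sketch. The function name `size_corrected_weight`, the two areas and their numbers are assumptions for illustration; functional volume is taken here to be a function point count, but a count of code lines would work the same way.

```python
def size_corrected_weight(fault_proneness: float, functional_volume: float) -> float:
    """Weight of an area as fault proneness divided by functional volume."""
    if functional_volume <= 0:
        raise ValueError("functional volume must be positive")
    return fault_proneness / functional_volume


# Hypothetical areas: (fault-proneness points, function points).
areas = {
    "billing engine": (40, 200),
    "address validation": (12, 20),
}

weights = {name: size_corrected_weight(points, volume)
           for name, (points, volume) in areas.items()}
ranked = sorted(weights, key=weights.get, reverse=True)

print(ranked)
```

In this assumed example the large billing engine collects more fault-proneness points in total, yet the small address validation component comes out as the denser and therefore more suspicious area.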

Figure 2: Failure Probability

What to do if you do not know anything about the project, if none of the defect generators can be applied?

You have to run a test. A first rough test should find defect prone areas, the next test will then concentrate on them. The first test should cover the whole system, but be very shallow. It should only cover typical business scenarios and a few important failure situations, but cover all of the system. You can then find where there was most trouble, and give priority to these areas in the next round of testing. The next round will then do deep and thorough testing of prioritized areas.

This two-phase approach can always be applied, in addition to the planning and prioritizing done before testing. Chapter 4 explains more of this.
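
The step from the shallow first round to the focused second round can be sketched as follows. The function name `focus_areas`, the area names, the failure counts and the choice of the top two areas are assumptions for illustration, and the sketch presumes that the first round covered all areas to a roughly equal depth.

```python
# Hypothetical result of the shallow first round: failures observed per area.
first_round_failures = {
    "order entry": 4,
    "invoicing": 1,
    "reporting": 0,
    "user administration": 3,
}


def focus_areas(failures_by_area, top_n=2):
    """Return the top_n areas with the most failures in the first round.

    Areas without any failure are never selected for the deep round.
    """
    troubled = {area: n for area, n in failures_by_area.items() if n > 0}
    return sorted(troubled, key=troubled.get, reverse=True)[:top_n]


print(focus_areas(first_round_failures))
```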

3.3. How to calculate priority of test areas

The general method is to assign weights, and to calculate a weighted sum for every area of the system. Test where the result is highest!

For every factor chosen, assign a relative weight. You can do this in very elaborate ways, but this will take a lot of time. Most often, three weights are good enough. Values may be 1, 3, and 10. (1 for "factor is not very important", 3 for "factor has normal influence", 10 for "factor that has very strong influence").

For every factor chosen, you assign a number of points to every product requirement (every function, functional area, or quality characteristic). The more important the requirement is, or the more alarming a defect generator seems to be for the area, the more points. A scale from 1 to 3 or 5 is normally good enough. Assigning the points is done intuitively.

The number of points for a factor is then multiplied by its weight. This gives a weighted number of points between 1 and 50. These weighted numbers are then summed up separately for damage (impact) and for probability of errors, and the two sums are finally combined into a risk value. How they are combined depends on the scale behind the points. Many intuitive mappings from reality to points seem to involve a logarithmic scale, where each additional point means roughly a tenfold increase. In that case the risk calculation should ADD the weighted sums for probability and damage. If most factors’ points inherently follow a linear scale, the risk calculation should MULTIPLY the probability and damage points. Users of this method should check which scale they are actually applying! Testing can then be planned by assigning most tests to the areas with the highest number of points.

An example (functional volume being equal for the different areas):

Area to test	Business criticality	Visibility	Complexity	Change frequency	RISK
Weight	3	10	3	3
Order registration	2	4	5	1	46*18
Invoicing	4	5	4	2	62*18
Order statistics	2	1	3	3	16*18
Management reporting	2	1	2	4	16*18
Performance of order registration	5	4	1	1	55*6
Performance of statistics	1	1	1	1	13*6
Performance of invoicing	4	1	1	1	22*6

The table suggests that the function «invoicing» is most important to test, followed by «order registration» and the performance of order registration. The factor which has been chosen as the most important is visibility.
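
For readers who prefer a script to a spreadsheet, the table above can be reproduced with the sketch below. The points and weights are taken from the table. The grouping of business criticality and visibility as damage factors, and of complexity and change frequency as probability factors, is inferred from the RISK column. All identifiers are assumptions made for this illustration.

```python
# Weights per factor, as in the "Weight" row of the table.
WEIGHTS = {"criticality": 3, "visibility": 10, "complexity": 3, "change": 3}
DAMAGE_FACTORS = ("criticality", "visibility")
PROBABILITY_FACTORS = ("complexity", "change")

# Points per area, in the same column order as WEIGHTS.
AREAS = {
    "Order registration": (2, 4, 5, 1),
    "Invoicing": (4, 5, 4, 2),
    "Order statistics": (2, 1, 3, 3),
    "Management reporting": (2, 1, 2, 4),
    "Performance of order registration": (5, 4, 1, 1),
    "Performance of statistics": (1, 1, 1, 1),
    "Performance of invoicing": (4, 1, 1, 1),
}


def weighted_sum(points, factors):
    """Sum of points times weight over the given factors."""
    return sum(WEIGHTS[f] * points[f] for f in factors)


def risk_scores(areas, combine="multiply"):
    """Risk per area; combine is 'multiply' (linear points) or 'add' (logarithmic points)."""
    if combine not in ("multiply", "add"):
        raise ValueError("combine must be 'multiply' or 'add'")
    scores = {}
    for name, values in areas.items():
        points = dict(zip(WEIGHTS, values))
        damage = weighted_sum(points, DAMAGE_FACTORS)
        probability = weighted_sum(points, PROBABILITY_FACTORS)
        if combine == "multiply":
            scores[name] = damage * probability
        else:
            scores[name] = damage + probability
    return scores


scores = risk_scores(AREAS)
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{scores[name]:5d}  {name}")
```

Run as written, the script ranks invoicing first (62*18 = 1116), then order registration (46*18 = 828), then the performance of order registration (55*6 = 330), in line with the reading of the table given above.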

Computation is easy, as it can be programmed using a spreadsheet. A more detailed case study is published in (4). A spreadsheet is on http://home.c2i.net/schaefer/testing/riskcalc.hqx (Binhex file, save to disk, decompress, open with Excel)

A word of caution: The assignment of points is intuitive and may be wrong. Thus, the number of points can only be a rough guideline. It should be good enough to distinguish the high-risk areas from the medium and low risk areas. That is its main task. This also means you don't need to be more precise than needed for just this purpose. If more precise test prioritization is necessary, a more quantified approach should be used wherever possible. Especially the possible damage should be used as is, with its absolute values and not a translation to points. An approach is described in (18).

4. Making testing more effective

More effective testing means finding more defects, and more important ones, in the same amount of time.

The strategy to achieve this is to learn from experience and adapt testing.

First, the whole test should be broken into four phases:

test preparation
pre-test
main test
after-test.

Test preparation sets up areas to test, the test cases, test programs, databases and the whole test environment. Especially setting up the test environment can give a lot of trouble and delay. It is generally easy to install the program itself and the correct operating system and database system. Problems often occur with the middleware, i.e. the connection between software running on a client, and software running on different servers. Care should be taken to thoroughly specify all aspects of the test environment, and dry runs should be held, in order to ensure that the test can be run when it is time to do it. In a Y2K project, care was taken to ensure that licenses were in place for machine dates after 1999, and that the licenses allowed resetting of the machine date. Another area of focus was ensuring that included software was Y2K compliant.

The pre-test is run after the software under test is installed in the test lab. This test contains just a few test cases running typical day to day usage scenarios. The goal is to test if the software is ready for testing at all, or totally unreliable or incompletely installed. Another goal may be to find some initial quality data, i.e. find some defect prone areas to focus the further test on.

The main test consists of all the pre-planned test cases. They are run, failures are recorded, defects found and repaired, and new installations of the software made in the test lab. Every new installation may include a new pre-test. The main test takes most of the time during a test execution project.

The after-test starts with every new release of the software. This is the phase where optimization should occur. Part of the after-test is regression testing, in order to find possible side-effects of defect repair. But the main part is a shift of focus.

The type of defects may be analyzed. A possible classification is described in (14). In principle, every defect is a symptom of a weakness of some designer, and it should be used to actively search for more defects of the same kind.

Example: In a Y2K project, it was found that sometimes programs would display blank instead of zeroes in the year field in year 2000. A scan for the corresponding wrong code through many other programs produced many more instances of the same problem.

Another approach is to concentrate more tests on the more common kinds of defects, as these might be more common in the code. The problem is, however, that such defects might already have been found because the test was designed to find more of this kind of defects. Careful analysis is needed. Generally, apply the abstractions of every defect found as a checklist to more testing or analysis.

The location of defects may also be used to focus testing. If an area of code has especially many failures, that area should be a candidate for even more testing (7, 13). During the analysis, however, care should be taken to ensure that a high number of defects in an area is not simply caused by especially high test coverage in that area.
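
One simple way to guard against this effect is to relate the number of defects to the test effort spent in each area. The following is a minimal sketch, assuming the number of test cases run is an acceptable stand-in for coverage; the area names and numbers are hypothetical.

```python
def defects_per_test(defects_found, tests_run):
    """Defects found per executed test case (0.0 if nothing was run)."""
    if tests_run == 0:
        return 0.0
    return defects_found / tests_run


def rank_by_defect_rate(areas):
    """Order area names by defects per test, highest rate first.

    `areas` maps an area name to a (defects_found, tests_run) pair.
    """
    return sorted(
        areas,
        key=lambda name: defects_per_test(*areas[name]),
        reverse=True,
    )


# Hypothetical figures: "billing" has the most defects, but also
# by far the most tests run against it.
AREAS = {"billing": (12, 120), "login": (5, 10), "search": (2, 40)}

print(rank_by_defect_rate(AREAS))
```

By raw defect count "billing" would come first; normalized by test effort, "login" looks like the more defect-prone area.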

5. Making testing cheaper

A viable strategy for cutting budgets and time usage is to do the work in a more productive and efficient way. This normally involves applying technology. In software, not only technology, but also personnel qualifications seem to be ways to improve efficiency and cut costs. This also applies in testing.

Automation

There exist many test automation tools. Tools catalogues list more tools for every new edition, and the existing tools are more and more powerful while not costing more (12). Automation can probably do most in the area of test running and regression testing. Experience has shown that more test cases can be run for much less money, often less than a third of the resources spent for manual testing. In addition, automated tests often find more defects. This is fine for software quality, but may hit the testers, as the defect repair will delay the project... Still, such tools are not very popular, because they require an investment into training, learning and building an infrastructure at start. Sometimes a lot of money is spent in fighting with the tool. For the productivity improvement, nothing general can be said, as the application of such tools is too dependent on platforms, people and organization. Anecdotal evidence prevails, and for some projects automation has had a great effect.

An area where test is nearly impossible without automation is stress, volume and performance testing. Here, the question is either to do it automatically or not to do it at all.

Test management can also be improved considerably using tools for tracking test cases, functions, defects and their repairs. Such tools are now more and more often coupled to test running automation tools.

In general, automation is interesting for cutting testing budgets. You should, however, make sure you are organized, and you should keep the cost for startup and tool evaluation outside your project. Tools help only if you have a group of people who already know how to use them effectively and efficiently. Bringing in tools at the last moment has little potential to pay off and can do more harm than good.

The people factor - Few and good people against many who don’t know

The largest obstacle to an adequate testing staff is ignorance on the part of management. Some of them believe that "development requires brilliance, but anybody can be a tester."

Testing requires skill and knowledge. Without application knowledge your testers do not know what to look for. You get shallow test cases which do not find defects. Without knowledge about common errors the testers do not know how to make good test cases, and again they do not find defects. Good test cases, i.e. test cases that have a high probability of finding errors if there are errors, are also called «destructive test cases». Without experience in applying test methods people will spend a lot of unnecessary time working out all the details in a test plan.

If testing has to be cheap, the best is to get a few highly experienced specialists to collect the test candidates, and have highly skilled testers to improvise the test instead of working it out on paper. Skilled people will be able to work from a checklist, and pick equivalence classes, boundary values, and destructive combinations by improvisation. Non-skilled people will produce a lot of paper before having an even less destructive test. A method for this is called "exploratory testing".
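
As an illustration of what a skilled tester picks by improvisation, here is a minimal sketch of boundary values for one equivalence class, a valid integer range. The function names and the month example are hypothetical.

```python
def boundary_values(low, high):
    """Candidate inputs around the edges of the valid range [low, high]."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]


def equivalence_class(value, low, high):
    """Name the class a value belongs to: below, valid or above the range."""
    if value < low:
        return "below"
    if value > high:
        return "above"
    return "valid"


# Example: a month field that accepts 1 to 12.
for value in boundary_values(1, 12):
    print(value, equivalence_class(value, 1, 12))
```

The two values just outside the range (0 and 13 in the example) are the destructive ones: they check that invalid input is rejected.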

The test people must be at least as smart as the developers, equally good designers, and have an equal understanding of the functionality of the system. One could let the Function Design Team Leader become the System Test Team Leader as soon as functional design is complete. Pre-sales, Documentation, Training, Product Marketing and/or Customer Support personnel should also be included in the test team. This provides early knowledge transfer (a win-win for both development and the other organization) and more resources than are available full-time. Test execution requires lots of bodies that don't need to be there all of the time, but need to have a critical and informed eye on the software. You probably also need full-time testers, but not as many as you would use in the peak testing period. Full-time test team members are good for test design and execution, but also for building or implementing testing tools and infrastructure during less busy times.

If an improvised test has to be repeated, there is a problem. But modern test automation tools can be run in a capture mode, and the captured test may later be edited for documentation and rerunning purposes.

The message is: get highly qualified people for your test team!

6. Cutting testing work

Another way of cutting costs is to get rid of part of the task. Get someone else to pay for it or cut it out completely!

Who pays for unit testing?

Often, unit testing is done by the programmers and never turns up in any official testing budget. The problem is that unit testing is often not really done. Test coverage tool vendors often report that without their tools, 40 - 50% of the code is never unit tested. Many defects then survive until the later test phases. This means later test phases have to test better, and they are overloaded and delayed by finding all the defects which could have been found earlier.

As a test manager, you should require higher standards for unit testing! This is in line with modern "agile" approaches to software development. Unit tests should be automated as well and rerun every time units are changed or integrated.

What about test entry criteria?

The idea is the same as in contracts with external customers: If the supplier does not meet the contract, the supplier gets no acceptance and no money. Problems occur when there is only one supplier and when there is no tradition in requiring quality. Both conditions are true in software. But entry criteria can be applied if the test group is strong enough. There are many possible criteria, from the most trivial to the advanced. Here is a small collection of what makes life in testing easier:

The system delivered to integration or system test is complete
It has been run through static analysis and defects are fixed
A code review has been done and defects have been corrected
Unit testing has been done to the accepted standards (near 100% statement coverage, for example)
Any required documentation is delivered and is of a certain quality
The units compile and can be installed without trouble
The units should have passed some functional test cases (smoke test).
Really bad units are sorted out and have been subjected to special treatment like extra reviews, reprogramming etc.

You will not be allowed to require all these criteria. You will maybe not be allowed to enforce them. But you may turn projects into a better state over time by applying entry criteria. If every unit is reviewed, statically analyzed and unit tested, you will have far fewer problems to fight with later.
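
Some of the entry criteria above can be checked mechanically before a delivery is accepted into test. The following is a minimal sketch; the `Delivery` record, its field names and the 95% coverage threshold are hypothetical and would have to be agreed with the developers.

```python
from dataclasses import dataclass


@dataclass
class Delivery:
    """Hypothetical record of what the developers hand over to test."""
    installs_cleanly: bool
    static_analysis_clean: bool
    code_reviewed: bool
    statement_coverage: float  # fraction between 0.0 and 1.0
    smoke_test_passed: bool


def unmet_entry_criteria(delivery, min_coverage=0.95):
    """Return the names of the entry criteria the delivery fails."""
    checks = [
        ("installs without trouble", delivery.installs_cleanly),
        ("static analysis defects fixed", delivery.static_analysis_clean),
        ("code review done", delivery.code_reviewed),
        ("unit test coverage", delivery.statement_coverage >= min_coverage),
        ("smoke test passed", delivery.smoke_test_passed),
    ]
    return [name for name, ok in checks if not ok]


# A delivery without a code review and with only 80% statement coverage.
delivery = Delivery(True, True, False, 0.80, True)
print(unmet_entry_criteria(delivery))
```

An empty list means the delivery can enter test; anything else is a concrete list to send back to the supplier.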

Less documentation

If a test is designed "by the book", it will take a lot of work to document. Not all this is needed. Tests may be coded in a high level language and may be self-documenting. A test log made by a test automation tool may do the service. Qualified people may be able to make a good test from checklists, and even repeat it. Check out exactly which documentation you will need, and prepare no more. Most important is a test plan with a description of what is critical to test, and a test summary report describing what has been done and the risk of installation.

Cutting installation cost - strategies for defect repair

Every defect delays testing and requires an extra cost. You have to rerun the actual test case, try to reproduce the defect, document as much as you can, probably help the designers with debugging, and at the end install a new version and retest it. This extra cost is impossible to control for a test manager, as it is completely dependent on system quality. The cost is normally not budgeted for either. Still, this cost will occur. Here is some advice about how to keep it low.

When to correct a defect, when not?

Every installation of a defect fix means disruption: installing a new version, initializing it, retesting the fix, and retesting the whole. The tasks can be minimized by installing many fixes at once. This means you have to wait for defect fixes. On the other hand, if defect fixes themselves are wrong, this strategy leads to more work in debugging the new version, because the fault is not that easy to find. There will be an optimum, dependent on system size, the probability of introducing new defects, and the cost of installation. For a good description of practical test exit criteria, see (2). Here are some rules for optimizing the defect repair work:

Rule 1: Repair only important defects!
Rule 2: Change requests and small defects should be assigned to the next release!
Rule 3: Correct defects in groups! Normally only after blocking failures are found.
Rule 4: Use an automated "smoke test" to test any corrections immediately.

7. Strategies for prevention

The starting scenario for this paper is the situation where everything is late and where no professional budgeting has been done. In most organizations, no experience data exist and no serious attempt is made to estimate the costs of development, testing, and errors in maintenance. Without experience data there is no way to argue about the costs of reducing a test.

The imperatives are:

You need a cost accounting scheme
You need to apply cost estimation based on experience and models
You need to know how test quality and maintenance trouble interact

Measure:

Size of project in lines of code, function points etc.
Percentage of work used in management, development, reviews, test preparation, test execution, and rework
Amount of rework during first three or six months after release
Fault distribution, especially causes of user detected problems.

Argue for testing resources by weighing possible reductions in rework before and after delivery against added testing cost.
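
With such measurements in hand, the argument for testing resources becomes simple arithmetic. A minimal sketch, assuming all figures are in the same unit (person-hours here); the numbers are hypothetical.

```python
def net_benefit(rework_saved_before, rework_saved_after, added_testing_cost):
    """Rework saved before and after delivery, minus the extra testing cost."""
    return rework_saved_before + rework_saved_after - added_testing_cost


# Hypothetical case: 120 extra hours of testing are expected to save
# 40 hours of rework before delivery and 200 hours after delivery.
print(net_benefit(40, 200, 120))
```

A positive result supports the extra testing; a negative one means the added cost is not recovered through reduced rework.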

Papers showing how such cost and benefit analysis can be done, using retrospective analysis, have been published in several ESSI projects run by Otto Vinter from Brüel & Kjær (6).

A different way to prevent trouble is incremental delivery. The general idea is to break up the system into many small releases. The first delivery to the customer is the least commercially acceptable system, namely, a system which does exactly what the old one did, only with new technology. From the test of this first version you can learn about costs, error contents, bad areas etc. and then you have an opportunity to plan better.

8. Summary

Testing in a situation where management cuts both budget and time is a bad game. You have to endure and survive this game and turn it into a success. The general methodology for this situation is not to test everything a little, but to concentrate on high risk areas and the worst areas.

Priority 1: Return the product to the developers as fast as possible, with a list of the most serious deficiencies possible.

Priority 2: Make sure that, whenever you stop testing, you have done the best testing in the time available!

References

(1) Joachim Karlsson & Kevin Ryan, "A Cost-Value Approach for Prioritizing Requirements", IEEE Software, Sept. 1997

(2) James Bach, "Good Enough Quality: Beyond the Buzzword", IEEE Computer, Aug. 1997, pp. 96-98

(3) Risk-Based Testing, STLabs Report, vol. 3 no. 5 (info@stlabs.com)

(4) Ståle Amland, "Risk Based Testing of a Large Financial Application", Proceedings of the 14th International Conference and Exposition on TESTING Computer Software, June 16-19, 1997, Washington, D.C., USA.

(5) Taghi M. Khoshgoftaar, Edward B. Allen, Robert Halstead, Gary P. Trio, Ronald M. Flass, "Using Process History to Predict Software Quality", IEEE Computer, April 1998

(6) Several ESSI projects, about improving testing, and improving requirements quality, have been run by Otto Vinter. Contact the author at otv@delta.dk.

(7) Ytzhak Levendel, "Improving Quality with a Manufacturing Process", IEEE Software, March 1991.

(8) "When the pursuit of quality destroys value", by John Favaro, Testing Techniques Newsletter, May-June 1996.

(9) "Quality: How to Make It Pay," Business Week, August 8, 1994

(10) Barry W. Boehm, Software Engineering Economics, Prentice Hall, 1981

(11) Magne Jørgensen, 1994, "Empirical studies of software maintenance", Thesis for the Dr. Scient. degree, Research Report 188, University of Oslo.

(12) Lots of test tool catalogues exist. The easiest accessible key is the Test Tool FAQ list, published regularly on Usenet newsgroup comp.software.testing. More links on the author’s Web site.

(13) T. M. Khoshgoftaar, E. B. Allen, R. Halstead, G. P. Trio, R. M. Flass, "Using Process History to Predict Software Quality", IEEE Computer, April 1998

(14) IEEE Standard 1044, A Standard Classification of Software Anomalies, IEEE Computer Society.

(15) James Bach, "A framework for good enough testing", IEEE Computer Magazine, October 1998

(16) James Bach, "Risk Based Testing", STQE Magazine, 6/1999, www.stqemagazine.com

(17) Nathan Petschenik, "Practical Priorities in System Testing", in "Software- State of the Art" by DeMarco and Lister (ed), Sept. 1985, pp.18 ff

(18) Heinrich Schettler, "Precision Testing: Risikomodell Funktionstest" (in German), to be published.

Wednesday, August 15, 2012

What Questions We Need to Ask Before Starting Any Testing

Before you start any testing ask yourself the following questions:

How will the application be used?
What is the purpose of the application? Information, entertainment, business, and so on.
Who is the target audience? Who is the current audience? Who is the potential audience?
What is the user scenario for using it? There may be several representing various classes of users.
Will this be used from an Internet kiosk or other machine that the user has no control over? Should the application be verified in kiosk mode?
Are alternative devices going to use this? (Web TV, PDAs, iMode phones, and so on.)
What is the estimated user base for the first 6 months? First year?
Are users paying to access your site or use your application? If so, their expectations may be very different than if it were free.
Are there partners you are dependent on? Are partners dependent on you? Is there a plan if either your service is down or a partner you rely on is down?
Is there a new user or first-time user experience that differs from subsequent visits?
What dependencies does your site/application have? It may require that WinAmp, AOL Instant Messenger, or some other piece of software be installed, or it may make use of system- or browser-provided DLLs. If it does not rely on any system DLLs, then there may be no need to test on various versions of each platform family.
Does your application install any components or controls? What if the user cannot or does not install them? What if you release a new version: is there an upgrade path for the users?
If your application installs components or controls, where are they installed? Is there an assumed or hard-coded location? Will that break on a nonstandard configuration (for example, a Windows machine where the active drive is the F:\ drive instead of the more common C:\ drive)?
If there is a component or control installed, is there an uninstall path for it?
Does the component or control that is installed set any registry keys? When it uninstalls, all of these need to be cleaned up appropriately.
What platforms is this supported on? What are the various Windows, Macintosh, and Unix versions, as well as various devices?
Which browsers are supported? Microsoft Internet Explorer, Microsoft MSN Explorer, Netscape Navigator, Microsoft Pocket IE and Generic IE, Opera, Omni Group Omni Web, Amaya?
Are there varying levels of support?
What does the test matrix look like?
What does the support matrix look like?
What happens if you access the application from an unsupported platform? Consider a notification mechanism.
What other requirements does the application have: security settings, script settings, cookie settings, resolution, line speed, and so on?
Is there an internal coding standard?
Is this feature necessary?
Is the role of the feature in the application clearly understood?
How will a user access this piece of functionality? List all ways to get into this feature.
How will the user put data into this piece of functionality?
What are those areas of user-defined input? What are the sizes of those areas (in bytes or characters)? What type of input do they accept?
Where is the data output displayed?
What boundaries exist for this data?
What error states can come of this data or action?
What states is the data available in? Map the creation, edit, save, display, and delete states.
How is the user expected to navigate? Are Back and Forward controls provided in the application interface or is the user expected to use the browser buttons? Make this clear, but know what the behavior is when using the browser buttons.
If the application makes use of frames, what should the behavior be when the user clicks the browser Back or Forward buttons? What about Refresh or Stop?
Do errors occur when the page loads?
Do all graphics appear properly?
Are all similar icons of the same size (visually), giving a clean appearance?
Is the download time perceptibly slow?
Do all graphics have width and height tags to correctly place them into a formatted page while they are downloading?
Are all links active and working? Are they accurate and/or descriptive?
Are links relative or absolute?
Should all links or active (hot) areas have the finger pointer (hand) displayed when the cursor is over them?
Do the mailto links also display the mailing address?
Can the user print various pages?

Saturday, August 4, 2012

Xenu vs. Screaming Frog

Over the past couple of years, we've seen a number of new web-based SEO crawlers (including our own Pro App here on SEOmoz). They're great tools and can provide a lot of insight, but sometimes you've just got to get down and dirty into a big site, and you need a desktop crawler application. For the past few years, that's pretty much meant Xenu's Link Sleuth. Recently, a new entrant has thrown its hat into the ring – Screaming Frog SEO Spider.

I was amazed, on social media, how many people still hadn't heard of Screaming Frog. Actually, a few people still don't know about Xenu. So, I thought it would be a good time to put both programs through their paces.

Xenu's Link Sleuth

In this corner (that doesn't really work when you can't see me pointing, does it?), we have Xenu, a PC-based desktop app that bills itself as a tool for checking broken links. Over the years, enterprising SEOs have found it's capable of a lot more. For such a popular tool, it's funny how little we know about the creator, a German [engineer?] named Tilman Hausherr. All I can tell you is that he wrote Xenu because he was bored, he doesn't want your money, and he really dislikes Scientology.

Pro: Xenu is Free

Let's start with a few pros and cons. You really can't beat that Xenu is completely free. We all owe Tilman a hearty "thank you" for putting it out there over a decade ago.

Pro: Xenu is Fast

While a large-scale site can take a while to fully crawl, Xenu is a machine, and it's an impressively multi-threaded app, considering when it was built. It can power through 1000s of pages pretty quickly.

Con: It's Not Intuitive

Xenu really takes some getting used to, and it's clearly built by an old-school software designer. The basics are easy enough, but getting into the real SEO applications of Xenu takes a bit of a learning curve. To be fair, it wasn't built for what most of us try to do with it.

Con: No Mac Version

This is the biggie for many people – Xenu only runs on Windows PCs. There's no Mac or Linux version, and the author doesn't plan to release one.

What Can Xenu Do?

At its core, Xenu really is just a link checker, churning through your site to test internal and (optionally) external links.

Since Xenu operates like a crawler, though, it can really help test crawl paths and find holes in your internal linking. Xenu also reports useful stats, like the crawl level, outbound links from a page, and inbound links to a page. It also returns TITLEs and META descriptions, which can be handy at-a-glance.

One really useful feature in Xenu is that, once you find a broken link (page or image), you can just right-click on it, select "URL Properties", and you'll get a pop-up showing you all the pages linking to that broken file.

Xenu is a workhorse, and it's still a great tool for churning through links fast. Its only real fault is that we've tried to push it beyond what it was designed for, and it wasn't really designed for SEO.

Screaming Frog SEO Spider

That's where Screaming Frog comes in, a desktop crawler specifically designed for SEOs by a UK search marketing firm. Because it's built specifically for SEOs, it has a lot of bells and whistles that Xenu doesn't. Let's start with the quick list of pros and cons.

Pro: It's Built for SEOs

We'll dive into this one below, but Screaming Frog has been built by SEOs for SEOs, and that fact is obvious as soon as you launch the application.

Pro: It's User-friendly

I found Screaming Frog very easy to dive into. The advanced uses will take some time to discover, I'm sure, but any experienced SEO should be able to get up and running on it quickly.

Pro: It Runs on Your Mac

Screaming Frog supports both PCs and Macs. If you have a Mac shop, that's a big one.

Con: It's Not Free

Ok, here's the down side. It's £99 per year (that's about $162, currently) to run the full version of Screaming Frog. The free version will only crawl 500 URIs. The free version is nearly fully functional, though, so you can get a good idea if it's worth the price for you.

What Can Screaming Frog Do?

On the surface, Screaming Frog doesn't look too much different than Xenu.

Like Xenu, Screaming Frog tracks the crawl level, internal links, outbound links, TITLE tags, and META descriptions. It also adds:

HTTP status codes
TITLE tag length
META description length
H1, H2 tags and lengths
Canonical tags
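
To make these on-page elements concrete, here is a minimal sketch that pulls them out of a single HTML page with Python's standard `html.parser` module. It is only an illustration of what is being measured, not how Screaming Frog itself works; the class name, the `audit` helper and the sample page are hypothetical, and HTTP status codes are left out because they require fetching the page.

```python
from html.parser import HTMLParser


class OnPageElements(HTMLParser):
    """Collects the TITLE, META description, H1 texts and canonical URL."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta_description = ""
        self.h1 = []
        self.canonical = ""
        self._current = None  # tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._current = "title"
        elif tag == "h1":
            self._current = "h1"
            self.h1.append("")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = attrs.get("content") or ""
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href") or ""

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1[-1] += data


def audit(html):
    """Return the element lengths and values for one page of HTML."""
    parser = OnPageElements()
    parser.feed(html)
    parser.close()
    return {
        "title_length": len(parser.title.strip()),
        "meta_description_length": len(parser.meta_description),
        "h1": [text.strip() for text in parser.h1],
        "canonical": parser.canonical,
    }


SAMPLE = """<html><head><title>Widgets</title>
<meta name="description" content="Buy widgets online">
<link rel="canonical" href="http://example.com/widgets">
</head><body><h1>All widgets</h1></body></html>"""

print(audit(SAMPLE))
```

A crawler would run a check like this on every page it fetches and flag missing, duplicated or overlong values.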

Many of these elements are also broken into their own tabs at the top, so that you can easily focus in on problem areas. You can right-click on any URL for many more options, including checking indexation in Google, Yahoo, or Bing and checking backlinks in Open Site Explorer, Yahoo Site Explorer, or Majestic SEO.

The paid version includes one incredibly useful feature – the ability to custom filter and search HTML source code across a site. Digging into all of the features is beyond the scope of this post, but Branko Rihtman has a nice Screaming Frog review over at SEO Scientist. If you want to dig deeper into Xenu, Tom Critchlow had a good Xenu recap here on SEOmoz last year.

And The Winner Is...

Sorry, I just can't do it. I've loved Xenu (the app, not the galactic overlord) for years, and even if it weren't free, I'd have gotten my value out of it many times over. I'm excited about Screaming Frog and what it can do, though, and I'm especially excited to see new products and innovation finally coming to desktop crawlers. This is a win-win for technical SEOs everywhere, and I'm looking forward to the evolution of both desktop and web-based crawlers.

Sunday, March 25, 2012

Web QA Test Tool Links

Link and HTML Test Tools

Product	Vendor	Comments
AccVerify/AccRepair	HiSoftware	Verify, Correct, Monitor and Manage your WebSite and Web Based Applications for W3C and Section 508 compliance.
AppLoader	NRG	AppLoader is load and functional testing software that simulates hundreds or thousands of end users interacting with your application. It is protocol independent and works with any application accessible from a Windows desktop, including Java, .NET, Ajax, Flex, and Flash. You can simulate the load using the browsers the end users are expected to use.
AppsWatch	NRG	Appswatch is an automated testing tool for performance, Web testing, SLA monitoring and alerting. It runs tests from the user’s Desktop. It is protocol independent and works with all Web applications.
ChangeAgent		Web site repair and maintenance application.
CSE HTML Validator	AI Internet Solutions	HTML, XHTML, CSS, link, spelling, and accessibility checker available. Windows application
Cyber Spyder Link Test	Aman Software	Web site management program to be used for verifying that the URLs on a site are not broken and for analyzing site content. Shareware.
Dead Links	Dead Links	Free online spider based link checker. You only have to input the index page and the spider will crawl your website looking for broken links. Also gives advice for search engines.
HTML Candy	Anetto Software	New-age software for final preparation of HTML pages. It is able to fix up a wide range of problems with HTML syntax, including tags, styles, attributes, attributes' values, deprecated and obsolete elements and attributes.
HTML PowerTools	Talicom	Suite of Windows tools for HTML checking, spelling, etc.
HTML Tidy	W3C	A very nice tool that fixes common errors and pretty prints HTML. Free
InFocus	SSB Technologies	Tool for making Web sites compliant with Section 508 requirements for access by the individuals with disabilities.
Link Checker Pro		Link Checker Pro is a link checking tool for websites and has been tested on sites containing more than 100,000 links. It can export results in a number of formats and provide a graphical view of the website structure.
LinkTiger.com	LinkTiger.com	Internet service that validates websites for broken links on a daily basis. When errors occur, you will receive an e-mail alert. LinkTiger.com features rich reports, e-mail notification, highlighting of broken links and no-software installation.
LinkRunner	Viable Software Alternatives	Dead link detector for Windows
LinkScan	Elsop	LinkScan is an industrial-strength link checking and website management tool. The LinkScan family of products delivers flexible test automation capabilities for the entire spectrum of web based applications. Customers include Fortune 1000 companies such as Hewlett Packard, government agencies like NASA, as well as many smaller businesses.
LinkSleuth	Xenu	Dead link detector. Free
Link Validator	REL Software	Link checker and site management tool for webmasters to check links for accuracy and availability, find broken links and links containing syntactic errors. A free "lite" version is also offered.
MOMspider	UCI	Robot link checker. Free
Ramp Ascend		Testing and remediating websites and web based applications for integrated accessibility for Section 508 and WCAG compliance.
Real Validator		HTML syntax checker for Windows
Truwex website QA tool	Erigami	Free online tool for evaluating website accessibility, privacy, performance, quality, broken links. Shows issues on a web page map and in the HTML code.
WebLight	Illumit	Finds broken links and markup problems, and identifies relationships between URL's.
WebQA	Watchfire	Report on over 40 errors including, Search engine optimization, Site inventory, Web accessibility (now includes Bobby reports, Section 508, W3C's WCAG), Corporate standards. Can also manage meta data and create automated test scripts to test site functionality.

On-line Link and HTML Test Services

Product	Vendor	Comments
Audit Blossom	Blossom Software	Low cost web site audit service.
Bobby	CAST	Help make pages accessible to people with disabilities (WAI and Section 508 compliance). It will also find HTML compatibility problems that prevent pages from displaying correctly on different browsers. Free
CSSCheck	Web Design Group	On-line cascading Style sheet checker. Free
CSS Validation Service	W3C	Free service that checks Cascading Style Sheets (CSS) in (X)HTML documents or standalone for conformance to W3C recommendations
Dr. Watson		HTML checking service. Free.
HTML Validator	WDG	Online service that validates HTML documents. Free
HTML Validation Service	W3C	Online service that checks HTML documents for conformance to W3C HTML and XHTML Recommendations and other HTML standards. Free
Link Alarm	LinkAlarm	Online service for detecting broken links
NetMechanic	Monte Sano Software	Service which searches your site to find broken links, spot bad HTML tags, and rates your server's response time. Commercial but offers free trials
Site Check	UITest.com	Enter a URL and it runs HTML and stylesheet validators, accessibility assessment, link check, load time check, and more using free online web test tools.
SiteTechnician	SiteTechnician	Hosted management solutions. Identifies broken links, analyzes accessibility, reports on search engine optimization, monitors page load times, and helps you manage changes to your site over time.
Validation Spider	Dead Links	Free dead link checker
W3C Link Checker	W3C	Free dead link checker
Weblint Gateway	San Francisco State University	Performs an HTML error scan and creates an HTML listing with error messages in context and tags, URLs, and error messages highlighted in color
Web Page Backward Compatibility Viewer	Delorie Software	On-line utility that serves a page based on how various browsers would render it.
Web Page Purifier	Delorie Software	On-line utility that maps a page to HTML 2.0, HTML 3.2, HTML 4.0, or WebTV 1.1. standards. Free.
XML Validation	Scholarly Technology Group	Validates XML documents. Free.

Functional Test Tools

Product	Vendor	Comments
actiWATE	Actimind	Freeware web application testing environment. actiWATE Framework is written in Java and Java is used for writing tests.
AppsWatch	NRG	AppsWatch is an automated testing tool for functional, regression, and GUI testing. It runs tests from the user’s desktop. It is protocol independent and works with any application accessible from a Windows desktop, including Citrix, Java, .NET, Oracle, SAP, Eclipsys, McKesson, Meditech, GE Healthcare, Epic, Cerner, client-server, Flex, Flash and even custom applications.
Astra QuickTest	Mercury Interactive	Web site functional testing
AutoTester One	AutoTester	Functional, regression, and systems integration testing of Windows, Client Server, Host/Legacy, or Web applications.
Badboy	Badboy Software	Provides an enhanced browser interface to aid in building and testing dynamic applications. Combines capture/replay ability with performance testing and diagnostic features to allow developers to closely monitor and understand the interactions between the browser and the server. Free for small users.
Canoo WebTest	Canoo	Canoo WebTest is a free open source tool for testing of web applications. It calls pages and verifies the results, giving comprehensive reports on success and failure.
eValid	Software Research	Test enabled Web Browser. Provides Browser-Based Client-Side WebSite quality checking for Functional Testing, LoadTest, Site Analysis (SiteMap), Performance check and page tuning.
HTTP::Recorder	Opus	Browser-independent recorder that records interactions with web sites and produces scripts for automated playback. Open source (Perl module)
IeUnit		Simple framework to test logical behaviors of web pages. It helps users to create, organize and execute functional unit tests. IeUnit is implemented in JavaScript for the Windows XP platform with Internet Explorer.
Imprimatur		Web application testing tool. The tests are described in a simple XML file. Along with the standard GET and POST actions, Imprimatur handles HTTP sessions and file uploads. The responses can be validated using regular expressions and response code checks. Open source.
InCisif.net	InCisif	Automation tool designed to implement client-side functional testing of web applications, Ajax-enabled or not, under Internet Explorer 6.x or 7.x, using the C#, VB.NET or IronPython language.
Internet Macros	Opus	Low cost automation testing
iRise Application Simulator	iRise	Allows the definition, testing and approval of Web-based business software before developers write the first line of code. Users validate requirements from a visually accurate, functionally rich and interactive simulation of the application that can be modified on the fly during feedback sessions.
ITP	Incanica	Web application testing harness. Lightweight, yet powerful! Test scripts written in XML. No programming required! No changes required to your application. Supports sessions/cookies, POST form data. Command line based for integration into other tools. Open source.
LISA	iTKO	No-code, enterprise-strength automated testing solution for J2EE applications, websites and web services. LISA uses "inline testing" technology to talk to every component within your infrastructure. New one-click wizards enable even non-developers to connect to, analyze and interact with live EJBs, databases, messaging layers and web services/SOAP objects.
MaxQ	Bit Mechanic	MaxQ is a free functional testing tool. It includes an HTTP proxy that records your test script, and a command line utility that can be used to playback tests. The proxy recorder automatically stores variables posted to forms, so you don't have to write that stuff by hand. The paradigm of MaxQ is similar to commercial tools like Astra QuickTest or Empirix e-Test. These products are quite expensive. MaxQ hopes to provide the essential features: HTTP test recording, scripting, and playback without the huge cost. Source code to MaxQ is provided as part of the distribution.
Netvantage Functional Tester	Netvantage Technologies	Functional and regression automated web testing tool.
PesterCat	PesterCat	Web testing tool that was designed to perform functional testing for web applications. PesterCat features an integrated proxy recorder that enables you to record scripts using your favorite web browser. PesterCat runs on Linux, Mac OSX, and Windows.
QA Wizard	Seapine	Automated functional testing tool for Web-based apps. Automated scripting allows the creation of more scripts in less time. Advanced object binding greatly reduces script changes when Web-based apps change. FREE QA Wizard Demo and Test Script!
Ranorex	Ranorex Software	Windows GUI test and automation framework for C++, Python and for the .Net languages
Rational Robot	Rational Software	Automated functional, regression, and smoke tests for e-applications
Sahi		Automation and testing tool for web applications, with the facility to record and play back scripts. Developed in Java and JavaScript, this tool uses simple JavaScript to execute events on the browser. Features include in-browser controls, text-based scripts, Ant support for playback of suites of tests, and multi-threaded playback. It supports HTTP and HTTPS.
SAMIE		Simple Automated Module For Internet Explorer. Perl module (SAM.pm) that allows a user to automate Internet Explorer. This free tool is designed for quality assurance engineers that need to run tests for their browser applications. (Windows only)
Selenium	ThoughtWorks	Selenium is a test tool for web applications. Selenium tests run directly in a browser, just as real users do. And they run in Internet Explorer, Mozilla and Firefox on Windows, Linux, and Macintosh.
SilkTest	Segue Software, Inc.	Enterprise-class web testing
SoapTest	Parasoft	Automated tool for testing Web services. SOAPtest helps you to prevent errors by performing server functional testing, load testing, and client testing with just the click of a button. SOAPtest facilitates server functional testing by automatically creating a test suite from a WSDL document that tests every operation associated with that document. The same test suite used for server functional testing can also be used as a load test suite that not only monitors the server's response rate with the specified number and mixture of simultaneous requests, but also verifies whether the test loads cause functionality problems. You can also have SOAPtest emulate a server and verify that the client sends appropriate requests to the appropriate services, and that the client handles the services' responses as expected.
soapui	eviware	Desktop application for inspecting, invoking and testing (functional and load) web services over HTTP. It is mainly aimed at developers and testers providing or consuming web services (Java, .NET, etc.). Functional and load testing can be done either interactively in soapui or within an automated build/integration process using the soapui command-line tools. soapui currently requires Java 1.5 and is licensed under the LGPL.
Solex	NEOMAlogic	Solex is an open source Web application testing tool built as a plug-in for the Eclipse IDE. It provides functions to record a client session, adjust it according to various parameters and replay it later typically in order to ensure non regression of a Web application's behaviour.
swete	Neal Lester	Provides cross platform console based tools for regression testing of web applications. The tools may be used when refactoring and during development to ensure that new functionality doesn't break previously completed features. Open source.
TestSmith	Quality Forge	Functional/Regression Test Automation Tool for Windows. Web Site and Web Application testing is done at both the HTML Document Object Model (DOM) level and the Windows object level, as appropriate.
TestWeb	Original Software	Automation solutions for IBM iSeries, Microsoft, and Oracle.
vTest	Verisium	Functional and regression testing with a host of powerful web functional testing features. It empowers you to thoroughly verify and validate your web applications in a variety of environments. You can enhance productivity by generating automated test scripts, replaying both automated and custom test scripts, generating test reports and spotting software bugs early in the development cycle. Enhances testing engineer efficiency through the ability to create automated test scripts. These test scripts can be executed automatically with minimal human intervention. They can also be scheduled based on a set plan. This normally results in a huge productivity gain and allows testing departments to accomplish significantly more work in the same amount of time.
WatiN		Open source C# framework for automating functional web testing. Inspired by Watir, WatiN drives Internet Explorer at the user level.
Watir		Open-source functional testing tool for automating browser-based tests of web applications. Drives the Internet Explorer browser the same way people do. It clicks links, fills in forms, presses buttons. Watir also checks results, such as whether expected text appears on the page. Watir is a Ruby library that works with Internet Explorer on Windows. Like other powerful programming languages, Ruby gives you the power to connect to databases, read data files, export XML and structure your code into reusable libraries.
web2test	itCampus	Automated Capture & Replay test tool for functional testing of web applications and portals. web2test runs under Windows and Linux and supports Firefox, Internet Explorer, Mozilla and Seamonkey. Test scripts are browser and platform independent and tolerant to GUI changes. A 100% component based approach allows detailed and complex checks. There is no programming knowledge needed for creation and maintenance of test scripts. Additionally web2test provides a scripting interface in Jython and Groovy. The tool can be integrated with HP TestDirector for Quality Center and supports the testing of AJAX applications, data driven and distributed testing.
WebAii	ArtOfTest	.NET automation infrastructures provided FREE of charge to the developer and quality assurance community. Use WebAii to automate unit tests, feature and scenario tests for Ajax applications or any web application while leveraging an innovative approach to enhance Product Testability and produce more agile and robust test automation suites. WebAii comes with tight Visual Studio Integration in addition to many new features.
Webcorder	Crimson Solutions	Free GUI software testing tool developed in VB to allow for simple end user web testing. Essentially the user presses record and navigates their way through a scenario, telling the program to check for text/images along the way and optionally taking screenshots. At the end of the process you stop recording and save the script. You can then play back the script either interactively or in batch mode, and the program will generate log files etc.
WebInject	WebInject	WebInject is a free tool for testing of web applications and services. It can be used to test any individual system component with an HTTP interface (JSP, ASP, CGI, PHP, Servlets, HTML Forms, etc), and can be used as a test harness to create a suite of [HTTP level] functional, acceptance, and regression tests. A test harness (also referred to as a test driver or a test framework) allows you to run many test cases and collect/report your test results.
WebKing	ParaSoft	White-box, black-box, and regression testing
WET	Qantom Software	Framework for Web automation testing. WET is released under an open source license by Qantom.org. Features include multiple-parameter object identification for more reliable recognition, support for an XML object repository, better popup handling, HTML results, precondition support, teardown support, parameterization of scripts and objects, external library support, and more.
WSUnit		WSUnit is a tool that can help you test Web Service consumers. It provides a predictable and repeatable simulation of a Web Service that is ideal for unit testing.
Yawet	InforMatrix	Java/swing tool to create, run and debug functional web tests. It can verify Html, Pdf and Xml documents. Yawet offers reports, step libraries, parametrisation and custom extensions.
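
Several entries above (Imprimatur and WebInject, for example) describe the same basic idea: a list of HTTP test cases whose responses are validated with response code checks and regular expressions. The following Python sketch shows that idea in its simplest form. It is not the format or API of any listed tool; `HttpCase`, `http_get` and `run_suite` are hypothetical names chosen for this example.

```python
import re
from dataclasses import dataclass
from urllib.error import HTTPError
from urllib.request import urlopen


@dataclass
class HttpCase:
    """One functional check: request a URL, then validate the response."""
    name: str
    url: str
    expected_status: int = 200
    body_pattern: str = ""   # regex that must match somewhere in the body


def http_get(url, timeout=10):
    """Return (status, body) for a GET request, including error responses."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")


def run_suite(cases, fetch=http_get):
    """Run every case and return (name, failures) pairs.

    An empty failures list means the case passed. fetch is injectable so the
    harness itself can be tested without a live server.
    """
    results = []
    for case in cases:
        status, body = fetch(case.url)
        failures = []
        if status != case.expected_status:
            failures.append(f"status {status}, expected {case.expected_status}")
        if case.body_pattern and not re.search(case.body_pattern, body):
            failures.append(f"body does not match /{case.body_pattern}/")
        results.append((case.name, failures))
    return results
```

A suite is then just a list such as `[HttpCase("home page", "http://www.example.com/", 200, r"Welcome")]` passed to `run_suite`. Sessions, cookies and POST data, which several of the tools above support, are deliberately left out of this sketch.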


Security Test Tools

Product	Vendor	Comments
QA Inspect	SPI Dynamics	Incorporate fully automated web application security testing into the overall test management process. Now, Mercury users can conduct and manage both functional testing and security testing from a single platform.


Performance Test Tools

Product	Vendor	Comments
ANTS	Red Gate	Load and scalability testing of .NET web services and applications
AppLoader	NRG	AppLoader is load and functional testing software that simulates hundreds or thousands of end users interacting with your application. It is protocol independent and works with any application accessible from a Windows desktop, including Citrix, Java, .NET, Oracle, SAP, Eclipsys, McKesson, Meditech, GE Healthcare, Epic, Cerner, client-server, Flex, Flash and even custom applications.
AppsWatch	NRG	AppsWatch is an automated testing tool for functional, regression, and GUI testing. It runs tests from the user’s desktop. It is protocol independent and works with any application accessible from a Windows desktop, including Citrix, Java, .NET, Oracle, SAP, Eclipsys, McKesson, Meditech, GE Healthcare, Epic, Cerner, client-server, Flex, Flash and even custom applications.
Astra LoadTest	Mercury Interactive	Load testing tool for Web-based system
Dotcom-Monitor	Dana Consulting	Dotcom-Monitor.com simulates real users by using its remote agents positioned around the globe. As an end-user you set up the conditions for the web site or web application you would like to stress test.
forecast	Facilita	Suite of tools for system load testing, performance measurement and multi-user functional testing.
http_load	Acme Software	Free Unix-based load generator for web servers.
Jblitz	Clan Productions	Load testing tool aimed at smaller sites.
LoadTracer	Trace Technologies	GUI-based tool for load, performance, stress, and scalability testing of web applications. The tool can generate a large number of virtual clients that hit the web server at a specific time. It simulates multiple instances of a web client accessing a web server, based on information recorded from one web client during a session with the server, so any web application can be tested under various load or stress conditions. The tool consists of three modules: Load Tracer-Recorder, Load Tracer-Generator, and Load Tracer-Analysis. Load Tracer is compatible with Internet Explorer and Netscape.
Microsoft Application Center Test	Microsoft	Designed to stress test Web servers and analyze performance and scalability problems with Web applications, including Active Server Pages (ASP) and the components they use. Simulates a large group of users by opening multiple connections to the server and rapidly sending HTTP requests. Supports several different authentication schemes and the SSL protocol, making it ideal for testing personalized and secure sites.
NeoLoad	Neotys	NeoLoad simulates hundreds of virtual users on your web site, getting performance statistics and revealing errors under stress. Free eval download.
OpenLoad	OpenDemand Systems	Web testing tool used to simulate hundreds or even thousands of users simultaneously accessing a web site or application
OpenSTA	OpenSTA	A distributed software testing architecture based on CORBA. OpenSTA is designed to be used by Performance Testing Consultants or other technically proficient individuals. Using OpenSTA a user can generate realistic heavy loads simulating the activity of hundreds to thousands of virtual users. This capability is fully realized through OpenSTA's distributed testing architecture. OpenSTA graphs both virtual user response times and resource utilization information from all Web Servers, Application Servers, Database Servers and Operating Platforms under test, so that precise measurements can be gathered during load tests and analysis on these measurements can be performed. OpenSTA is Open Source software licensed under the GNU General Public License.
PowerProxy	Orderly Software	PowerProxy is a load-testing HTTP proxy with features that can help you browse or buy product from busy web servers.
Proxy Sniffer	Ing. Fischer	Web load and stress testing tool for analyzing the performance characteristics and stability of a Web application under various load conditions. The measured results enable targeted tuning, which results in an optimized application.
PureLoad	NFG Global	Load testing tool that simulates hundreds of users executing requests against server based applications
QuotiumPro	Quotium	Professional load testing tool that analyses and predicts application behavior and performance. It tests an entire architecture by emulating very large numbers of realistic users, as well as sharp load variations.
Siege	Joe Dog	Open Source HTTP performance testing tool for UNIX systems.
SilkPerformer	Segue	Web server load testing
StressIT	Synametrics	StressIT is a robust stress testing application that accurately simulates the number of users your site can maintain. It will allow you to chart the breaking point at which your site's performance is no longer acceptable, and will allow you to pinpoint the bottlenecks preventing your site from reaching its maximum levels. StressIT allows you to define relevant use-case scenarios for your site; this customization allows you to test all aspects of your complex multi-tiered web-based application.
Site Tester 1.0	Pilot Software	SiteTester is a load-testing utility designed to test web servers and web applications. SiteTester simulates concurrent access of various users from different hosts to a web/application server. Each virtual user follows a predefined procedure to access the server. By increasing the number of virtual users you can test the capacity of your servers and determine the extra hardware and bandwidth needs, if any. SiteTester1.0 is a JDK1.2 compatible Java application and is available for the Windows and UNIX environments.
TestMaker	PushToTest	Open source framework to build intelligent test agents that test Web services (HTTP, HTTPS, SSL, Servlet, JSP, EJB, ActiveX, SOAP, .NET) for scalability and performance. Written in Java and runs everywhere Java runs, including Windows, Linux, Solaris, Macintosh and more.
WAPT	Novosoft	Performance, load and stress testing tool for Web sites and intranet applications. Supports all security features related to HTTPS/SSL. Testing of sites protected with basic authorization is also supported. Good abilities of run-time test data generation. The values of request parameters and URIs can be calculated in a number of ways.
Wbox	open source	Wbox aims to help you have fun while testing HTTP-related stuff.
Web Application Stress Tool	Microsoft	Simulation tool that is designed to realistically reproduce multiple browsers requesting pages.
Webload	Radview Software	Performance tool for testing Internet and Intranet applications
WebPartner TPC	WebPartner	WebPartner's Test and Performance Center measures website performance from a customer's perspective combining ease of use and performance at an incredible value to meet all of your Testing, Diagnostics and Monitoring needs. The WebPartner Test and Performance Center web-based user interface allows you to quickly FIND, FIX and PREVENT bottlenecks and failures within your applications inside or outside your firewall.
Web Performance Trainer	Web Performance	Web Performance tester. For Windows and UNIX.
Web Polygraph		Freely available benchmarking tool for caching proxies, origin server accelerators, L4/7 switches, content filters, and other Web intermediaries
Web Roller	Novosoft	Performance and load testing for web and Intranet applications
Web Server Stress Tool	Paessler	Load test application for web servers
WebSizr	Technovations	Performance Analysis and Sizing framework for HTTP based Servers and applications
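
Most of the load tools above work on the same principle: a number of concurrent "virtual users" each follow a predefined procedure against the server while response times and errors are recorded. The Python sketch below illustrates that principle with threads from the standard library. It is an illustration only, not how any listed product is implemented; `fetch`, `virtual_user` and `run_load` are names invented for this example, and real tools add ramp-up, think time and server-side monitoring.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Default request function: GET the URL and read the whole response."""
    with urlopen(url, timeout=timeout) as resp:
        resp.read()


def virtual_user(request_fn, steps):
    """Run one user's predefined procedure; return (latencies, error_count)."""
    latencies, errors = [], 0
    for step in steps:
        start = time.perf_counter()
        try:
            request_fn(step)
            latencies.append(time.perf_counter() - start)
        except Exception:
            errors += 1      # any failed request counts as one error
    return latencies, errors


def run_load(request_fn, steps, users):
    """Run `users` virtual users concurrently and summarise the outcome."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(virtual_user, request_fn, steps)
                   for _ in range(users)]
        outcomes = [f.result() for f in futures]
    latencies = [t for user_latencies, _ in outcomes for t in user_latencies]
    errors = sum(user_errors for _, user_errors in outcomes)
    return {
        "requests": len(latencies) + errors,
        "errors": errors,
        "mean_latency": mean(latencies) if latencies else None,
        "max_latency": max(latencies) if latencies else None,
    }
```

For example, `run_load(fetch, ["http://localhost:8080/", "http://localhost:8080/search"], users=50)` would simulate 50 users each requesting two pages (the URLs are placeholders). Increasing `users` step by step and watching the error count and latencies is the simplest way to look for a breaking point.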


Performance Test Services

Product	Vendor	Comments
Load Gold	ApTest	Outsourced Load Testing. Combining Open source tools with our staff's expert consulting and technology transfer to your team.
SiteStress	WebMetrics	A comprehensive service to generate HTTP-based synthetic transactions capable of scaling to tens of thousands of users.
webStress	Moniforce	Web performance and stress testing services.