Theme:Testability

From TEITAC

Introduction

In order to assess accessibility in the sense of compliance with these regulations, it is necessary to agree on criteria and measurement techniques. This shared understanding is also essential for communicating about accessibility. Without it, for example, vendors and potentially many purchasers may each have to test the same product, which is wastefully redundant.

Professional practice identifies three categories of testing techniques:

  • Inspection, in which the provision itself contains the metric and criterion, and which permits testing by individuals without much subject expertise or complex laboratory equipment. An example is Provision 3-J, "If any audio plays automatically for more than 3 seconds, either a mechanism is available to pause or stop the audio, or a mechanism is available to control audio volume which can be set independently of the system volume."
  • Formal test method, in which the provision or its Notes refer to an established test method or protocol. An example is Provision 5-D, which says in part, "Voice mail, messaging, auto-attendant, and interactive voice response telecommunications systems must provide access in the following manner ... 2. Use the ITU-T G.711 recommendation for encoding and storing audio information." The ITU-T G.711 standard refers to specific testing methods. In some cases a formal test method may need to be developed or adapted.
  • Expert evaluation, for provisions where the intended outcome may be too complex or contextual for any objective methods. An example is Provision 3-A, "Color must not be used as the only visual means of conveying information, indicating an action, prompting a response, or distinguishing a visual element." Although a testing tool may be used to create a colorless image of the product, only an expert can evaluate whether the resulting image has lost any information, because context is essential to that evaluation.
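
The tool-assisted step mentioned for Provision 3-A can be sketched as follows. This is a minimal illustration, not a prescribed method: it assumes pixels are plain (r, g, b) tuples and uses the Rec. 601 luma weights as one common way to reduce color to gray. The function names and the two indicator colors are hypothetical. The code only produces the colorless image; deciding whether information was lost remains the expert's job.

```python
# Rec. 601 luma weights: one common way to reduce RGB to gray.
# This choice is an assumption; the provision does not prescribe a formula.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def to_gray(rgb):
    """Return the gray level (0-255) of a single (r, g, b) pixel."""
    r, g, b = rgb
    wr, wg, wb = LUMA_WEIGHTS
    return round(wr * r + wg * g + wb * b)


def colorless(image):
    """Convert a 2-D grid of (r, g, b) pixels to a grid of gray levels."""
    return [[to_gray(pixel) for pixel in row] for row in image]


# Hypothetical status indicators: "error" drawn in red, "ok" in dark green.
error_light = (255, 0, 0)
ok_light = (0, 130, 0)

print(to_gray(error_light), to_gray(ok_light))
```

With these two hypothetical colors the gray levels coincide, so the indicators would be indistinguishable in the colorless image unless a label, shape, or position also carried the distinction. Whether such a cue is present and sufficient is the contextual judgment the provision leaves to an expert.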

The goal of the Committee has been to clarify testing for these regulations, and to extend objective testing methods where possible.


Test Method Types from an IEEE standard

The IEEE P1583 standard used three test methods for the accessibility requirements. This draft standard was superseded by federal work to develop voting system guidelines. (Text from final balloted draft - provided by Whitney Quesenbery)

Inspection

Using the inspection method, the design is systematically examined to determine whether it possesses a feature or functions specified in the requirements. Requirements to be inspected must be either observable or easily measurable and require no interpretation or judgment in determining whether the standards are met. They may be based on easily taught criteria, which would be known to any professional with appropriate background.

Measure (Test)

Requirements that call for the test method have well-specified tests that are used to determine whether the applicable requirements are met. For example, many of the ergonomic requirements have specific measurement tests, such as measuring decibel levels or reach and clearance.

Expert Evaluation/Review

In an expert evaluation/review, a human factors/usability/accessibility subject matter expert performs a review of the system or its functionality to determine whether the applicable standards are met. This method extends inspection by requiring expert knowledge and judgment. The parameters of the decision or the type of expert needed for the review are specified for each requirement.

Notes on attributes of test methods

These notes on types of test methods are from: "Assessment of Current Status of Voting System Standards and Other Resources Relevant to Human Factors and Privacy" (February 14, 2005, John Cugini for the TGDC/NIST - http://home.comcast.net/~jcuz/voting/eac-work/hfp/deliv/assess-gap.html)

1.3.2 Testing

Just as standards can be characterized along several independent dimensions, so too there are several ways to categorize testing. Note that we are concerned here only with conformance testing, i.e. tests whose primary purpose is to determine whether a system conforms to a given requirement. There are many other types of testing, such as formative usability testing, debugging tests, and quality comparison tests.

Again, the distinctions below are not intended to be completely precise, but only to suggest different emphases in how tests are conducted.

We assume that all tests are, at some level, performed by a human agent (the "tester") even if with the assistance of sophisticated measuring devices.

  • Mode:
    • Inspection: An inspection test is one in which the system is in a passive state, and some static property is directly examined or measured. E.g. the tester measures the distance between control buttons. Inspection tests are usually associated with design-type requirements.
    • Operation: In an operational test, the system is activated and its behavior is observed and evaluated. E.g. the tester runs a large sample of ballots through a scanner to determine its error rate. Operation tests are usually associated with functional or performance-type requirements.
  • Judgment Required:
    • Basic: The tester may have to make direct observations, make small counts, or read instruments, but beyond these basic discriminations, no expert judgment is required to determine the test result. E.g. the tester measures the distance between control buttons. Since these tests depend only slightly on human judgment, they may fairly claim to be objective.
    • Expert: The tester must employ expert judgment, usually requiring a background in some technical discipline, in order to evaluate the system. E.g. the tester goes through the voting process to decide whether the instructions provided are clear or confusing. Such tests are inherently more subjective than those that depend only on measuring devices or basic observation.
  • Technical Complexity:
    • Low: The test is conducted with the aid of at most elementary measuring tools. Test setup procedures are simple or non-existent. E.g. the tester measures the distance between control buttons using a ruler.
    • High: The test is conducted with the aid of complex measuring tools whose use requires significant technical skill. E.g. the tester measures the figure to ground ambient contrast ratio for text. Complexity encompasses not only sophisticated hardware, but also complex procedures, such as the use of human subjects in usability testing. Note also that although a good deal of expertise is needed to conduct the test, expert judgment is not implied.
  • Result Metric:
    • Binary: The test result is basically pass/fail. E.g. the system either does or does not allow the voter to cast a straight party line vote.
    • Numeric: The test result may be expressed as a reasonably well-defined numeric quantity (or quantities), e.g. the distance between control buttons. Of course, if there is a benchmark for conformance, such as a minimum separation of one inch, then the numeric quantity gets mapped into a pass/fail result.
    • Qualitative: The result metric is a qualitative judgment of some sort, e.g. the clarity of the voting instructions could be judged as "very good", "adequate", "needs improvement", etc. As with numeric, this qualitative evaluation could then be mapped into a pass/fail result.
  • Reproducibility:
    • High: Test results vary only slightly when repeated, e.g. we would expect the measured distance between control buttons for the same model system to be quite consistent among test instances.
    • Lower: Test results may vary significantly among instances. Two possible sources for the variability are the reliance on expert judgment and the use of statistical techniques. As an example of the first case, two experts might disagree in their evaluation of the clarity of instructions. As an example of the second, the measurement of error rates usually involves submitting large samples of input to the system. Even though the result for each individual test is precise (e.g. 3 errors in 100,000 trials), we would not expect exactly the same result each time.
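
Two points from the list above lend themselves to a short worked sketch: mapping a numeric result onto pass/fail against a benchmark, and quantifying how much a measured error rate could vary between test runs. The example below is illustrative only. The function names and the 1.25 inch measurement are hypothetical, and the Wilson score interval is one standard statistical choice introduced here for illustration; the source does not name a particular technique.

```python
import math


def meets_benchmark(measured, minimum):
    """Map a numeric result onto pass/fail against a minimum benchmark."""
    return measured >= minimum


def wilson_interval(errors, trials, z=1.96):
    """Approximate 95% Wilson score interval for an observed error rate.

    Returns (low, high) as proportions. z=1.96 corresponds to roughly
    95% confidence under the usual normal approximation.
    """
    if trials <= 0 or not 0 <= errors <= trials:
        raise ValueError("need trials > 0 and 0 <= errors <= trials")
    p = errors / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical measurement: buttons 1.25 inches apart, benchmark of one inch.
print(meets_benchmark(1.25, minimum=1.0))

# The example from the text: 3 errors observed in 100,000 trials.
low, high = wilson_interval(3, 100_000)
print(f"{low * 100_000:.1f} to {high * 100_000:.1f} errors per 100,000 trials")
```

For 3 errors in 100,000 trials the interval spans roughly 1 to 9 errors per 100,000, which illustrates why a repeated run of the same test can produce a different count even though each individual count is precise.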

1.3.3 Relationship between Standards and Testing

An important point to keep in mind is that even though the characteristics of a standard may constrain the type of test that is appropriate, they will not usually determine it absolutely. I.e. given a particular requirement, we still need to think about the best way to test it.

Take, for instance, the somewhat vague requirement that "the system shall provide instructions to the voter that are clear and easy to understand". One way to test this is to rely on expert judgment. But one could also construct a usability test in which subjects were directed to read the instructions and then answer some questions or perform some task based on their reading. Or, one could submit the text to a software system that generates some sort of "simplicity" metric. It is not obvious which is the "right" way to test; the costs and benefits of each approach need to be evaluated.
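
As a rough idea of what the third option might look like, the sketch below computes two crude text statistics. It is a hypothetical stand-in for a "simplicity" metric, not an established readability formula: the function name, the choice of statistics, and the sample instructions are all assumptions made for illustration.

```python
import re


def simplicity_metrics(text):
    """Return crude simplicity statistics for a block of instructions.

    Sentences are split on '.', '!' and '?'; words are runs of letters
    and apostrophes. Both rules are simplifications.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text contains no sentences or words")
    return {
        "words_per_sentence": len(words) / len(sentences),
        "characters_per_word": sum(len(w) for w in words) / len(words),
    }


# Hypothetical voter instructions.
print(simplicity_metrics("Press the red button. Then wait."))
```

Shorter sentences and shorter words give lower values, but a threshold for "clear and easy to understand" would still have to be chosen and justified, which is part of the cost and benefit evaluation described above.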
