System Safety and Mil-STD-882
Kanchan Biswas
Former Director (Aircraft), CEMILAC, DRDO
(+91 9448376835, kanchan.biswas@rediffmail.com)
Abstract
Key Words: System Safety, Mil 882, Hazard, Risk, Probability of Occurrence, Risk Mitigation, RTCA DO 178, RTCA DO 278.
1. INTRODUCTION TO SYSTEM SAFETY
A system is a composite entity of personnel, materials, tools, equipment, facilities, and software. The elements of this complex agglomeration are used together in the operational and support environment to perform a given task or achieve a desired mission. All the elements of the system must not only operate without hazard but must also be compatible with each other and must not introduce any additional hazard to other constituents of the system.
System safety is defined as the
application of engineering and management principles, criteria, and techniques
to optimize safety within the constraint of operational effectiveness, time,
and cost throughout all phases of the system life cycle. A system has well-defined boundaries interfacing hardware, software, environment, and humans as
operators. The system safety process is a risk management process developed to
the highest degree. The steps in the process are [1]:
- Identify the risks using hazard analysis techniques as early as possible
in the system life cycle
- Develop options to mitigate – Eliminate, control, or avoid the hazards
- Provide for timely resolution of the hazard
- Implement the best strategy
- Control the hazard through a closed-loop system.
System safety is not only a
function of engineering but also an integral part of management activities.
Participation of management is required to ensure timely identification and
resolution of the hazards. The management as well as the engineering tasks are
identified in the Mil standard 882 [2].
1.1 Hazard and Risk
The terms 'Safety' and 'Reliability' are closely related. Reliability is defined as the probability that a product will perform its intended function for a given 'period of time' under a given set of conditions. The 'time' in this definition can refer to such durations as 'mission time', 'warranty period', 'predefined number of cycles', or 'complete life cycle'.
Safety, on the other hand, may be defined as system operation without the presence of any untoward incident, accident, hazard, or mishap within the operational time frame. In reliability terminology a 'hazard' refers to a failure, and the hazard function (hazard rate) is defined as the limit of the failure rate over a time interval (t2 - t1) as the interval approaches zero. Many industries use the 'Mean Time Between Failures' (MTBF), the reciprocal of the hazard rate, as a measure of system reliability [1].
'Safety', therefore, may be considered as freedom from 'accident' or 'risk'. The classical definition of 'Risk', on the other hand, is the product of the frequency of a hazard (f) and its severity (s). The severity may be expressed as a loss in dollars, damage to property, loss of life, etc., and is also categorized based on the extent of damage or loss.
Risk = Σ (f × s), i.e. the product of frequency and severity, summed over all hazards identified during the defined time interval (t2 - t1).
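As a rough illustration of this summation, the sketch below totals the expected loss over an interval from a list of hazards. The hazard names, frequencies, and dollar severities are hypothetical, chosen only to show the arithmetic.

```python
# Minimal sketch: Risk = sum of (frequency x severity) over all hazards
# identified in the interval. All figures below are hypothetical.
hazards = [
    {"name": "fuel vapour near hot surface", "frequency_per_year": 1e-4, "severity_usd": 10_000_000},
    {"name": "landing gear sensor fault",    "frequency_per_year": 2e-3, "severity_usd": 500_000},
]

def total_risk(hazard_list, interval_years):
    """Sum f_i * s_i over all hazards, scaled to the given time interval."""
    return sum(h["frequency_per_year"] * interval_years * h["severity_usd"]
               for h in hazard_list)

print(f"Expected loss over 10 years: ${total_risk(hazards, 10):,.0f}")
```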
Further, we may define an
Accident as a Hazard triggered by an unsafe condition that may result in a
Mishap – an unpleasant event that may result in death, injury, occupational
illness, or damage to equipment or property.
1.2 Generic Cause of an Accident
For a mishap to happen, a hazardous situation must exist and a trigger event must occur; together they lead to an accident. To prevent accidents, the hazard and the trigger event should therefore be kept separated.
The generic cause of an accident is shown in Figure 1, and some common examples of hazards and their trigger events are shown in Table 1.
Figure 1. Generic cause of an Accident [1]
Table 1. Some Examples of Hazards and Trigger Events [1]

| Hazard | Trigger Event |
| --- | --- |
| High temperature | Temperature above the flash point |
| Nuclear gas leakage | The operator does not notice |
| Train driver under the influence of drugs | Train on the wrong track, close to an oncoming train |
| Cracks in the aircraft fuselage | Too many landings and take-offs |
1.3 Safety and Reliability
As indicated earlier safety and reliability are very closely
related terms, both reliability and safety engineers aim to make the system
design robust, so that it operates without any mishaps. However, there is a
small difference in their outlooks. A reliability engineer is interested in
increasing the MTBF of the product. He may increase the MTBF by introducing a
system redundancy. The reliability engineer may not pay great attention to a
critical failure mode, especially when redundancy is present in the system
design. However, a safety engineer may still consider such a failure critical
and dangerous. For example, if an aircraft device has a redundant circuit
board, a failure may not be critical if only one circuit board has failed. But
if the failed board is not detected and replaced immediately, another failure
could be catastrophic. The safety engineer in this case will ask for ‘failure
detectability’ or the health status through some warning/indication scheme so
that the maintenance engineer can identify the failure and call for repair/
replacement at the earliest opportunity [1].
1.4 System Safety Design
System Safety design involves the following activities:
a) Identify Potential Hazards
b) Classify the Hazards based on their severity or criticality
c) Eliminate or mitigate the hazards through design solutions or other processes
d) Maintain a Hazard Tracking system.
A system safety procedure
for Hazard Tracking is shown in Figure 2.
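The closed-loop idea behind the hazard tracking procedure of Figure 2 can be sketched as a record that cannot be closed until a mitigation is recorded and verified. This is a minimal illustration; the field names and status values are assumptions, not taken from the standard.

```python
from dataclasses import dataclass, field

@dataclass
class HazardRecord:
    """Illustrative closed-loop hazard tracking record (fields are assumed, not from Mil-STD-882)."""
    hazard_id: str
    description: str
    severity: str                          # e.g. "Catastrophic", "Critical", ...
    mitigations: list = field(default_factory=list)
    verified: bool = False
    status: str = "OPEN"

    def add_mitigation(self, measure: str) -> None:
        self.mitigations.append(measure)

    def close(self) -> None:
        # Closing the loop requires both a recorded mitigation and its verification.
        if not self.mitigations or not self.verified:
            raise ValueError("Cannot close: mitigation missing or not yet verified")
        self.status = "CLOSED"

rec = HazardRecord("HZ-001", "Fuel vapour near an ignition source", "Catastrophic")
rec.add_mitigation("Relocate ignition source; add vapour detection and warning")
rec.verified = True
rec.close()
print(rec.status)   # CLOSED
```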
1.4.1 Preliminary Hazard Analysis
Preliminary Hazard Analysis (PHA)
is done at the design concept stage. This is done early so that initial risk
assessment and mitigation make the design concept sound. A lot of time and
money may be spent if the design concepts are to be changed later. The hazards
may be associated with hardware, software, operational procedures, system
interfaces, and environmental and health risks.
No clear-cut procedure is laid down in the literature on identifying hazards. However, PHA is basically a
brainstorming technique, hence some kind of organized approach helps in
conducting this analysis.
Mil-STD-882 [2] defines the basis for the classification of hazards based on the extent of loss or damage (see later). For every class or criticality of hazard, Mil 882 defines the acceptable level of probability of occurrence. The probability of occurrence should be inversely related to the criticality level of the hazard, in the sense that a more critical hazard should have a lower acceptable probability of occurrence.
A general format for the Preliminary Hazard List (PHL) is shown in Table 2.
Table 2. Preliminary Hazard List

| Hazard Sl No. | Aircraft System | Hazard Description | Hazard Severity | Hazard Effect | Hazard Mitigation Scheme |
| --- | --- | --- | --- | --- | --- |
1.4.2 Fault Tree Analysis
A Fault Tree Analysis (FTA), along with cut-set analysis of system-level failures, can also help in identifying potential hazards. Other techniques are cause-effect diagrams, design reviews, and simulations. FTA is a deductive, top-down process that is especially useful for analyzing catastrophic and critical hazards. It is a cause-and-effect diagram that uses standard logic-gate symbols, prominent among them the 'AND' gate, the 'OR' gate, the circle (component-level basic faults), and the rectangle (the top event).
An FTA diagram shows the system components arranged according to their functional layout in the system, along with their cross-functional dependencies; it can therefore also be used for evaluating system reliability. For this, the reliability of each component at the elementary level must be known. The reliability of the next higher level can then be evaluated using the logic gates ('AND', 'OR'), and the process is continued until the top level of the system is reached.
An example of an FTA is shown in Figure 3. The system logic diagram is shown in Figure 3(a) and the corresponding fault tree is shown in Figure 3(b) [3].
Figure 3(a). System Logic Diagram (example)
Figure 3(b). Fault tree with reliability values for the system shown in Figure 3(a) [3]
In Figure 3(b), if the reliability of each of the components Fx1, Fx2, Fx3, and Fx4 is known, the system reliability can be built up as follows (see the sketch after this passage):
a) Each of the subsystems A and B combines its elements (drawn from Fx1, Fx2, Fx3, and Fx4) through an 'AND' gate; a subsystem fault requires all of its elements to fail, which corresponds to the elements being in parallel from a reliability standpoint.
b) The subsystems A and B combine through an 'OR' gate; the failure of either subsystem causes the top event, so they are in series from a reliability standpoint.
Remember that the roles of the 'AND' and 'OR' gates in reliability analysis and in FTA diagrams are opposite. The reliability values of each component, each subsystem, and the overall system are shown in Figure 3(b).
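The gate arithmetic above can be checked with a short script. This is a minimal sketch: the component reliability values, and the assumption that subsystem A combines Fx1 and Fx2 while subsystem B combines Fx3 and Fx4, are illustrative only and are not taken from Figure 3(b).

```python
# Hypothetical component reliabilities for an FTA of the Figure 3(b) kind.
# Assumed structure: subsystem A = Fx1 AND Fx2, subsystem B = Fx3 AND Fx4,
# and the top event = A OR B.
r = {"Fx1": 0.90, "Fx2": 0.90, "Fx3": 0.95, "Fx4": 0.95}

def and_gate(*reliabilities):
    """Fault-tree AND gate: all inputs must fail, i.e. redundant (parallel) elements."""
    unreliability = 1.0
    for x in reliabilities:
        unreliability *= (1.0 - x)
    return 1.0 - unreliability

def or_gate(*reliabilities):
    """Fault-tree OR gate: any input failure fails the output, i.e. series elements."""
    reliability = 1.0
    for x in reliabilities:
        reliability *= x
    return reliability

r_a = and_gate(r["Fx1"], r["Fx2"])   # 0.99
r_b = and_gate(r["Fx3"], r["Fx4"])   # 0.9975
r_system = or_gate(r_a, r_b)         # about 0.9875
print(round(r_a, 4), round(r_b, 4), round(r_system, 4))
```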
1.4.3 System Safety Analysis
A generic format for the System Safety Analysis (SSA) report is shown in Table 3, together with instructions for filling in each column.
Table 3. System Safety Analysis Report
Name of the System: ______

| Sl No | Hazard Description | Risk Before Mitigation (Severity / Likelihood / Risk Hazard Index) | Risk Elimination or Mitigation Measures | Risk After Mitigation (Severity / Likelihood / Risk Hazard Index) | Safety Assessment Report |
| --- | --- | --- | --- | --- | --- |

Instructions for the columns:
- Hazard description: the study may include the source of potential harm, the mechanism by which the harm may be caused, and the worst credible outcome assuming no measures are employed.
- Severity (before mitigation): the severity of the worst credible effect without any mitigation measures.
- Likelihood (before mitigation): the probability of occurrence of the hazard or failure mode without the mitigation measures.
- Risk Hazard Index (before mitigation): the combination of severity and probability determining the qualitative risk to the public; mark the cell red to indicate unacceptable risk.
- Risk elimination or mitigation measures: measures taken to reduce the risk to the public (reducing either the severity or the probability), typically through design changes, safety devices, warning devices, procedures, and training.
- Severity (after mitigation): the post-mitigation severity level.
- Likelihood (after mitigation): the probability of occurrence of failure after all mitigation measures are taken.
- Risk Hazard Index (after mitigation): if the cell is still red, further elimination or mitigation must be done to reduce the risk.
- Safety Assessment Report: after mitigation, all cells should be green.
2. SYSTEM SAFETY – MIL 882E
Risk is a function of the severity of a failure (event) and its probability of occurrence. The hazards are assigned priorities so that the catastrophic and critical ones are prevented. The standard system safety practice for military systems, as identified in the DoD Systems Engineering approach, is to eliminate the hazards where possible and to minimize the risks where the hazards cannot be eliminated. The detailed procedure is identified in Mil-STD-882E, 11 May 2012 [2].
The system safety process consists
of eight activity elements. The logical sequence of the eight elements is
shown in Figure 4.
2.1 Hazard Severity Category and Probability Levels (882E)
A hazard is defined as a real or
potential condition that could lead to an unpleasant event or series of events
(i.e. mishaps) resulting in death, injury, occupational illness, damage or loss
of equipment or property, or damage to the environment.
The severity categories of the
hazards are defined in Table 4.
Table 4. Severity Categories of Hazards (Mil 882)

| Description | Severity Category | Hazard (Mishap) Result Criteria |
| --- | --- | --- |
| Catastrophic | 1 | Could result in one or more of the following: death, permanent total disability, irreversible significant environmental impact, or monetary loss equal to or exceeding $10M. |
| Critical | 2 | Could result in one or more of the following: permanent partial disability, injuries, or occupational illness that may result in hospitalization of at least three personnel, reversible significant environmental impact, or monetary loss equal to or exceeding $1M but less than $10M. |
| Marginal | 3 | Could result in one or more of the following: injury or occupational illness resulting in one or more lost work day(s), reversible moderate environmental impact, or monetary loss equal to or exceeding $100K but less than $1M. |
| Negligible | 4 | Could result in one or more of the following: injury or occupational illness not resulting in a lost work day, minimal environmental impact, or monetary loss less than $100K. |
The probability level of a hazard is defined as the likelihood of occurrence of a mishap. Probability level 'F' is used to document cases where the hazard is no longer present. Quantitative probability levels are suggested in Appendix A of Mil-STD-882E; the improbable level is generally taken to be less than one in a million. The probability levels of occurrence of hazards are shown in Table 5.
Table 5. Probability Levels of Occurrence of Hazards (Mil 882)

| Description | Level | Specific Individual Item | Quantitative Value | Fleet or Inventory |
| --- | --- | --- | --- | --- |
| Frequent | A | Likely to occur often in the life of an item | Probability of occurrence ≥ 10^-1 | Continuously experienced |
| Probable | B | Will occur several times in the life of an item | Probability of occurrence < 10^-1 but > 10^-2 | Will occur frequently |
| Occasional | C | Likely to occur sometime in the life of an item | Probability of occurrence < 10^-2 but > 10^-3 | Will occur several times |
| Remote | D | Unlikely, but possible to occur in the life of an item | Probability of occurrence < 10^-3 but ≥ 10^-6 | Unlikely, but can reasonably be expected to occur |
| Improbable | E | So unlikely, it can be assumed occurrence may not be experienced in the lifetime of an item | Probability of occurrence < 10^-6 | Unlikely to occur, but possible |
| Eliminated | F | Incapable of occurrence within the life of an item. This category is used when potential hazards are identified and later eliminated. | | |
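For screening purposes, a quantitative probability estimate can be binned into the Table 5 letters with a simple threshold check. This is a sketch; the treatment of values falling exactly on a boundary, and the use of zero to represent an eliminated hazard, are assumptions.

```python
def probability_level(p: float) -> str:
    """Map a probability of occurrence to the Mil-STD-882 levels of Table 5 (sketch)."""
    if p >= 1e-1:
        return "A"   # Frequent
    if p >= 1e-2:
        return "B"   # Probable
    if p >= 1e-3:
        return "C"   # Occasional
    if p >= 1e-6:
        return "D"   # Remote
    if p > 0.0:
        return "E"   # Improbable
    return "F"       # Eliminated (hazard no longer present)

print(probability_level(5e-4))   # D
print(probability_level(1e-7))   # E
```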
2.2 Risk Assessment Code (Mil 882E)
The assessed risks are expressed
as a Risk Assessment Code (RAC) which is a combination of the Risk severity
category and the probability of its occurrence level. The Risk Assessment
Matrix is shown in Table 6. For example, in Table 6, RAC in cell 1A is High,
which has risk severity as “Catastrophic” and probability of occurrence as
“Frequent”. The risks are assigned risk levels of High, Serious, Medium, or
Low for each RAC.
Table 6. Risk Assessment Matrix

| Probability | Catastrophic (1) | Critical (2) | Marginal (3) | Negligible (4) |
| --- | --- | --- | --- | --- |
| Frequent (A) | High | High | Serious | Medium |
| Probable (B) | High | High | Serious | Medium |
| Occasional (C) | High | Serious | Medium | Low |
| Remote (D) | Serious | Medium | Medium | Low |
| Improbable (E) | Medium | Medium | Medium | Low |
| Eliminated (F) | Eliminated | Eliminated | Eliminated | Eliminated |
Note: The definitions in Tables 4, 5, and 6 (severity, probability, and RAC) shall be used unless tailored alternative definitions are formally approved by the procurement executive.
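When a hazard tracking system has to assign risk levels automatically, the Table 6 matrix can be encoded as a plain lookup. The sketch below simply mirrors Table 6; the function and key names are illustrative.

```python
# Risk Assessment Matrix of Table 6: (probability level, severity category) -> risk level.
RAC = {
    "A": {1: "High",    2: "High",    3: "Serious", 4: "Medium"},   # Frequent
    "B": {1: "High",    2: "High",    3: "Serious", 4: "Medium"},   # Probable
    "C": {1: "High",    2: "Serious", 3: "Medium",  4: "Low"},      # Occasional
    "D": {1: "Serious", 2: "Medium",  3: "Medium",  4: "Low"},      # Remote
    "E": {1: "Medium",  2: "Medium",  3: "Medium",  4: "Low"},      # Improbable
}

def risk_level(probability: str, severity: int) -> str:
    """Return the Risk Assessment Code level for a (probability, severity) pair."""
    if probability == "F":
        return "Eliminated"
    return RAC[probability][severity]

print(risk_level("A", 1))   # High (cell 1A of Table 6)
print(risk_level("D", 3))   # Medium
```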
3. Risk Mitigation Procedure (Mil 882)
The potential risk mitigation(s) shall be identified, and the expected risk reduction(s) of the alternative(s) shall be estimated and documented in the Hazard Tracking System (HTS). The goal should always be to eliminate the hazard; however, when a hazard cannot be eliminated, the risk should be reduced to the lowest acceptable level within the constraints of cost, schedule, and performance by applying the system safety design order of precedence.
The Mil 882 standard has gone through several revisions. First released in 1969, it was revised as 882A on 15 Aug 1979, 882B on 30 Mar 1984, 882C on 19 Jan 1993, 882D on 10 Feb 2000, and the latest revision, 882E, on 11 May 2012. A few notable changes made during these revisions are:
a) The Hazard Risk index and suggested mitigation criteria have been
changed.
b) In 882E, the risk mitigation goal has been redefined to eliminate the
hazard if possible. When a hazard cannot be eliminated, the associated risk
should be reduced to the lowest acceptable level within the constraints of
cost, schedule, and performance by applying the system safety design order of
precedence. However, no quantitative value has been indicated.
c) Mil 882C had considered software as a critical item. However, no
specific direction for software safety analysis was indicated. Mil 882 E has
introduced specific Software Safety analysis.
3.1 Mil 882C Acceptance Criteria
While Mil-STD-882E does not explicitly indicate risk acceptance criteria, the earlier variants did suggest acceptance criteria with respect to the risk index matrix. Mil-STD-882C (19 Jan 1993) suggested two acceptance criteria, one qualitative and one quantitative. The risk index matrices, along with the two acceptance criteria, are shown in Tables 7a and 7b.
Table 7a. Hazard Risk Index Matrix (Mil 882C, App A)

| Frequency | Catastrophic (1) | Critical (2) | Marginal (3) | Negligible (4) |
| --- | --- | --- | --- | --- |
| (A) Frequent (X > 10^-1) | 1A | 2A | 3A | 4A |
| (B) Probable (10^-1 > X > 10^-2) | 1B | 2B | 3B | 4B |
| (C) Occasional (10^-2 > X > 10^-3) | 1C | 2C | 3C | 4C |
| (D) Remote (10^-3 > X > 10^-6) | 1D | 2D | 3D | 4D |
| (E) Improbable (X < 10^-6) | 1E | 2E | 3E | 4E |

Hazard Risk Index - Suggested Acceptance Criteria:
1A, 1B, 1C, 2A, 2B, 3A - Hazard unacceptable
1D, 2C, 2D, 3B, 3C - Hazard undesirable (management decision required)
1E, 2E, 3D, 3E, 4A, 4B - Hazard acceptable with review by management
4C, 4D, 4E - Hazard acceptable without review
Table 7b. Hazard Risk Index Matrix (Mil 882C, App A)

| Frequency | Catastrophic | Critical | Marginal | Negligible |
| --- | --- | --- | --- | --- |
| Frequent | 1 | 3 | 7 | 13 |
| Probable | 2 | 5 | 9 | 16 |
| Occasional | 4 | 6 | 11 | 18 |
| Remote | 8 | 10 | 14 | 19 |
| Improbable | 12 | 15 | 17 | 20 |

Hazard Risk Index - Suggested Acceptance Criteria:
1 - 5: Hazard unacceptable
6 - 9: Hazard undesirable (management decision required)
10 - 17: Hazard acceptable with review by management
18 - 20: Hazard acceptable without review
It may be seen that the two criteria do not fully agree: for example, 3A is unacceptable as per Table 7a but corresponds to index 7, i.e. only undesirable, as per Table 7b.
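The 882C numeric criterion lends itself to the same lookup treatment. The sketch below encodes the Table 7b indices and the suggested acceptance bands; it is illustrative only and simply restates the tables above.

```python
# Mil-882C Hazard Risk Index of Table 7b and the suggested acceptance bands.
INDEX = {   # rows: frequency; columns: severity category 1 (Catastrophic) .. 4 (Negligible)
    "Frequent":   {1: 1,  2: 3,  3: 7,  4: 13},
    "Probable":   {1: 2,  2: 5,  3: 9,  4: 16},
    "Occasional": {1: 4,  2: 6,  3: 11, 4: 18},
    "Remote":     {1: 8,  2: 10, 3: 14, 4: 19},
    "Improbable": {1: 12, 2: 15, 3: 17, 4: 20},
}

def acceptance(frequency: str, severity: int) -> str:
    """Return the Table 7b index and its suggested acceptance band."""
    idx = INDEX[frequency][severity]
    if idx <= 5:
        return f"{idx}: Hazard unacceptable"
    if idx <= 9:
        return f"{idx}: Hazard undesirable (management decision required)"
    if idx <= 17:
        return f"{idx}: Acceptable with review by management"
    return f"{idx}: Acceptable without review"

print(acceptance("Frequent", 3))    # '7: Hazard undesirable ...' (the 3A case noted above)
print(acceptance("Occasional", 1))  # '4: Hazard unacceptable'
```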
3.2 Mil-STD-882D Acceptance
Mil-STD-882D (10 Feb 2000) uses the risk assessment matrix shown in Table 7b, with the mishap risk categories and acceptance levels shown in Table 7c.
Table 7c. Mishap Risk Categories and Mishap Risk Acceptance Levels (882D, App A)

| Mishap Risk Assessment Value | Mishap Risk Category | Mishap Risk Acceptance Level |
| --- | --- | --- |
| 1 - 5 | High | Component Acquisition Executive |
| 6 - 9 | Serious | Program Executive Officer |
| 10 - 17 | Medium | Program Manager |
| 18 - 20 | Low | As directed |
3.3 Other Risk Mitigation Procedures (Mil 882E)
If mitigation through an alternative design change or material does not appear feasible, other means as indicated below may be adopted:
a) Consider the design change that reduces the severity and/or the
probability of the mishap potential caused by the hazard.
b) Reduce the severity or probability of the mishap potential caused by the
hazard by using engineered features or devices (alternative operating
procedures/mechanisms etc.).
c) Provide warning devices.
d) Incorporate signage, procedure, training, and Personal Protective
Equipment.
e) Manage Life Cycle Risk – After the system is fielded, the program office
should continue the system safety process of identifying and maintaining a ‘Hazard
Tracking System’. If a new hazard is discovered or a known hazard is determined
to have a higher risk level than previously assessed, the new or revised hazard
will need to be formally accepted and dealt with appropriately.
4. Software Contribution to System Risk
The assessment of risk for software-controlled or software-intensive systems cannot rely solely on risk severity and probability of occurrence, as it does for hardware. Software is generally application-specific, and reliability parameters cannot be estimated for software in the same manner as for hardware. Therefore, another approach has to be used for assessing software's contribution to system risk, one that considers the potential risk severity and the degree of control the software exercises over the hardware.
The system safety approach of Mil
882E generates a Software Safety Criticality Matrix (SSCM) using the same
severity categories as catastrophic, critical, marginal and negligible. However,
in place of the probability of occurrences, we place the Software Control Categories
(SCC) levels. The SCC defines the degree of control the software exercises on
the system. The levels are defined from 1 to 5 for the ‘Autonomous’ to ‘No
Safety Impact (NSI)’ software (see table 8). The SSCM index obtained depending
on the SCC and severity categories indicates the Level of Rigor (LOR) task
category. The ‘LOR’ is a specification that defines the depth and breadth of
software analysis and verification activities necessary to provide a
significant level of confidence that a safety critical or a safety-related
software function will perform as required.
The software Control Categories
are shown in Table 8.
Table 8. Software Control Categories

Level 1 - Autonomous (AT): Software functionality that exercises autonomous control authority over potentially safety-significant hardware systems, subsystems, or components without the possibility of predetermined safe detection and intervention by a control entity to preclude the occurrence of a mishap or hazard. (This definition includes complex system/software functionality with multiple subsystems, interacting parallel processors, multiple interfaces, and safety-critical functions that are time-critical.)

Level 2 - Semi-Autonomous (SAT):
- Software functionality that exercises control authority over potentially safety-significant hardware systems, subsystems, or components, allowing time for predetermined safe detection and intervention by independent safety mechanisms to mitigate or control the mishap or hazard. (This definition includes the control of moderately complex system/software functionality, no parallel processing, or few interfaces, but other safety systems/mechanisms can partially mitigate. System and software fault detection and annunciation notify the control entity of the need for required safety actions.)
- A software item that displays safety-significant information requiring the operator entity to immediately execute a predetermined action to mitigate or control a mishap or hazard. A software exception, failure, fault, or delay will allow, or fail to prevent, mishap occurrence. (This definition assumes that the safety-critical display information may be time-critical, but the time available does not exceed the time required for adequate control entity response and hazard control.)

Level 3 - Redundant Fault Tolerant (RFT):
- Software functionality that issues commands over safety-significant hardware systems, subsystems, or components requiring a control entity to complete the command function. The system detection and functional reaction include redundant, independent fault-tolerant mechanisms for each defined hazardous condition. (This definition assumes that there is adequate fault detection, annunciation, tolerance, and system recovery to prevent the hazard occurrence if the software fails, malfunctions, or degrades. There are redundant sources of safety-significant information, and mitigating functionality can respond within any time-critical period.)
- Software that generates information of a safety-critical nature used to make critical decisions. The system includes several redundant, independent fault-tolerant mechanisms for each hazardous condition, detection, and display.

Level 4 - Influential: Software that generates information of a safety-related nature used to make decisions by the operator but does not require operator action to avoid a mishap.

Level 5 - No Safety Impact (NSI): Software functionality that does not possess command or control authority over safety-significant hardware systems, subsystems, or components and does not provide safety-significant information. The software does not provide safety-significant or time-sensitive data or information that requires control entity interaction, and it does not transport or resolve communication of safety-significant or time-sensitive data.
4.1 Software Safety Criticality Matrix and LOR Tasks
The Software Safety Criticality Matrix (SSCM) as per Mil-Std-882E is shown in Table 9a. The SSCM uses the Table 4 severity categories for the columns and the Table 8 software control categories for the rows, and assigns a Software Criticality Index (SwCI) number to each cross-referenced cell of the matrix.
Table 9a. Software Safety Criticality Matrix

| S/W Control Category | Catastrophic (1) | Critical (2) | Marginal (3) | Negligible (4) |
| --- | --- | --- | --- | --- |
| 1 | SwCI 1 | SwCI 1 | SwCI 3 | SwCI 4 |
| 2 | SwCI 1 | SwCI 2 | SwCI 3 | SwCI 4 |
| 3 | SwCI 2 | SwCI 3 | SwCI 3 | SwCI 4 |
| 4 | SwCI 3 | SwCI 4 | SwCI 4 | SwCI 4 |
| 5 | SwCI 5 | SwCI 5 | SwCI 5 | SwCI 5 |
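In the same spirit as the RAC lookup earlier, the SSCM can be encoded as a lookup that returns the SwCI for a given software control category and severity. The values below mirror Table 9a as printed above; the function name is illustrative.

```python
# Software Safety Criticality Matrix of Table 9a:
# (software control category 1..5, severity category 1..4) -> SwCI number.
SSCM = {
    1: {1: 1, 2: 1, 3: 3, 4: 4},
    2: {1: 1, 2: 2, 3: 3, 4: 4},
    3: {1: 2, 2: 3, 3: 3, 4: 4},
    4: {1: 3, 2: 4, 3: 4, 4: 4},
    5: {1: 5, 2: 5, 3: 5, 4: 5},
}

def swci(control_category: int, severity: int) -> str:
    """Return the Software Criticality Index for an SCC/severity pair (per Table 9a)."""
    return f"SwCI {SSCM[control_category][severity]}"

print(swci(2, 1))   # SwCI 1: semi-autonomous software, catastrophic severity
print(swci(4, 3))   # SwCI 4
```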
The Level of Rigor (LOR) tasks are associated with the specific SwCI as
defined in SSCM table 9a. Although the SSCM table is similar in appearance to the
Risk Assessment Matrix (Table 6), the SSCM is not an assessment of risk. The
LOR tasks associated with each SwCI number are the minimum set of tasks
required to assess the software contributions to the system-level risk.
The system safety and software system safety hazard analysis processes identify and mitigate the specific software contributors to hazards and mishaps. The successful execution of pre-defined LOR tasks increases the confidence that the software will perform as specified in its performance requirements, while reducing the number of contributors to hazards that may exist in the system. The LOR task matrix is shown in Table 9b.
Table 9b. Software Level of Rigor Matrix

| SwCI | Level of Rigor Task Description |
| --- | --- |
| SwCI 1 | The program shall perform an analysis of requirements, architecture, design, and code, and conduct in-depth safety-specific testing. |
| SwCI 2 | The program shall perform an analysis of requirements, architecture, and design, and conduct in-depth safety-specific testing. |
| SwCI 3 | The program shall perform an analysis of requirements and architecture, and conduct in-depth safety-specific testing. |
| SwCI 4 | The program shall conduct safety-specific testing. |
| SwCI 5 | Once assessed by safety engineering as Not Safety, no safety-specific analysis or verification is required. |
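For traceability it can be convenient to keep the Table 9b task sets in machine-readable form, for example to auto-populate a verification checklist. The wording below paraphrases Table 9b; the data structure itself is an assumption.

```python
# Level of Rigor task sets per SwCI, paraphrased from Table 9b.
LOR_TASKS = {
    1: ["requirements analysis", "architecture analysis", "design analysis",
        "code analysis", "in-depth safety-specific testing"],
    2: ["requirements analysis", "architecture analysis", "design analysis",
        "in-depth safety-specific testing"],
    3: ["requirements analysis", "architecture analysis",
        "in-depth safety-specific testing"],
    4: ["safety-specific testing"],
    5: [],   # assessed as Not Safety: no safety-specific analysis or verification
}

def checklist(swci_number: int) -> str:
    """Render the minimum LOR task set for a given SwCI as a one-line checklist."""
    tasks = LOR_TASKS[swci_number]
    return f"SwCI {swci_number}: " + ("; ".join(tasks) if tasks else "no safety-specific tasks required")

print(checklist(2))
print(checklist(5))
```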
4.2 Assessment of Software Contribution to Risk
All software contributions to system risk, including any results of
Table 10 (below) application, shall be documented in the HTS.
a) The LOR tasks shall be performed as per Table 9b. Results of the LOR tasks provide a level of confidence in safety-significant software and document causal factors and hazards that may require mitigation. The results of the LOR tasks shall be included in the risk management process.
b) If the required LOR tasks are not performed, then the system risk(s)
contributions associated with unspecified or incomplete LOR tasks shall be
documented according to Table 10. Table 10 depicts the relationship between
SwCI, risk levels, completion of LOR tasks, and risk assessment. The assignment of
risk level is shown in column 2 of Table 10.
c) Table 10 shows how not meeting the LOR task requirements affects risk.
Table 10. Software Contribution to System Risk (relationship between SwCI, risk level, LOR tasks, and risk)

- SwCI 1 (risk level: High): If SwCI 1 LOR tasks are unspecified or incomplete, the contributions to system risk will be documented as HIGH and provided to the PM for decision. The PM shall document the decision of whether to expend the resources required to implement SwCI 1 LOR tasks or to prepare a formal risk assessment for acceptance of a HIGH risk.
- SwCI 2 (risk level: Serious): If SwCI 2 LOR tasks are unspecified or incomplete, the contributions to system risk will be documented as SERIOUS and provided to the PM for decision. The PM shall document the decision of whether to expend the resources required to implement SwCI 2 LOR tasks or to prepare a formal risk assessment for acceptance of a SERIOUS risk.
- SwCI 3 (risk level: Medium): If SwCI 3 LOR tasks are unspecified or incomplete, the contributions to system risk will be documented as MEDIUM and provided to the PM for decision. The PM shall document the decision of whether to expend the resources required to implement SwCI 3 LOR tasks or to prepare a formal risk assessment for acceptance of a MEDIUM risk.
- SwCI 4 (risk level: Low): If SwCI 4 LOR tasks are unspecified or incomplete, the contributions to system risk will be documented as LOW and provided to the PM for decision. The PM shall document the decision of whether to expend the resources required to implement SwCI 4 LOR tasks or to prepare a formal risk assessment for acceptance of a LOW risk.
- SwCI 5 (Not Safety): No safety-specific analyses or testing is required.
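The Table 10 rule, that unperformed LOR tasks surface as a documented risk at the level tied to the SwCI, can be captured in a small helper. This is a sketch under the assumption that LOR completion is tracked as a single boolean per software item.

```python
# Documented risk level per SwCI when LOR tasks are unspecified or incomplete (Table 10).
RISK_BY_SWCI = {1: "HIGH", 2: "SERIOUS", 3: "MEDIUM", 4: "LOW"}

def software_risk_contribution(swci_number: int, lor_tasks_complete: bool) -> str:
    """Summarize the Table 10 outcome for a software item (illustrative helper)."""
    if swci_number == 5:
        return "Not Safety: no safety-specific analysis or testing required"
    if lor_tasks_complete:
        return "LOR tasks complete: results feed the normal risk management process"
    level = RISK_BY_SWCI[swci_number]
    return f"Document a {level} risk contribution and refer it to the PM for decision"

print(software_risk_contribution(1, lor_tasks_complete=False))
print(software_risk_contribution(3, lor_tasks_complete=True))
```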
The risks associated with system hazards that have software causes and controls may be acceptable based on evidence that the hazards, causes, and mitigations have been identified, implemented, and verified in accordance with DoD customer requirements. The evidence supports the conclusion that the hazard controls provide the required level of mitigation and that the resultant risks can be accepted by the appropriate risk acceptance authority. In this regard, software is no different from hardware and operators. If the software design does not meet safety requirements, then there is a contribution to the risk associated with inadequately verified software hazard causes and controls. Generally, risk assessment is based on quantitative and qualitative judgment and evidence.
Table 11 shows the risk levels and the criteria used to assess software hazard causal factors.
Table 11. Software Hazard Causal Factor Risk Assessment Criteria

A software implementation or software design defect that, upon occurring during normal or credible off-nominal operations or tests:
- High:
  - can lead directly to a catastrophic or critical mishap, or
  - places the system in a condition where no independent functioning interlocks preclude the potential occurrence of a catastrophic or critical mishap.
- Serious:
  - can lead directly to a marginal or negligible mishap, or
  - places the system in a condition where only one independent functioning interlock or human action remains to preclude the potential occurrence of a catastrophic or critical hazard.
- Medium:
  - influences a marginal or negligible mishap, reducing the system to a single point of failure, or
  - places the system in a condition where two independent functioning interlocks or human actions remain to preclude the potential occurrence of a catastrophic or critical hazard.
- Low:
  - influences a catastrophic or critical mishap, but where three independent functioning interlocks or human actions remain, or
  - would be a causal factor for a marginal or negligible mishap, but two independent functioning interlocks or human actions remain, or
  - is a software degradation of a safety-critical function that is not categorized as high, serious, or medium safety risk, or
  - is a requirement that, if implemented, would negatively impact safety, although the code is implemented safely.
4.3 Software Criticality Levels (DO 178B)
In RTCA DO 178B [4], a software
level is decided based on its contributions to potential failure conditions
as determined by the system safety assessment process. The software level
implies that the level of effort required to show compliance with certification
requirements varies with the failure condition category. The software-level
definitions are:
a) Level A: Software whose anomalous behavior, as shown by the system
safety assessment process, would cause or contribute to a failure of system
function resulting in a catastrophic failure condition for the aircraft.
b) Level B: Software whose anomalous behavior, as shown by the system
safety assessment process, would cause or contribute to a failure of system
function resulting in a hazardous/severe-major failure condition for the
aircraft.
c) Level C: Software whose anomalous behavior, as shown by the system
safety assessment process, would cause or contribute to a failure of system
function resulting in a major failure condition for the aircraft.
d) Level D: Software whose anomalous behavior, as shown by the system
safety assessment process, would cause or contribute to a failure of system
function resulting in a minor failure condition for the aircraft.
e) Level E: Software whose anomalous behavior, as shown by the system
safety assessment process, would cause or contribute to a failure of system
function with no effect on aircraft operational capability or pilot workload.
Once the software has been confirmed as level E by the certification authority, no
further guidelines of the DO 178B document apply.
These five levels are broadly comparable to the five Software Control Categories of Mil 882E (see Table 8).
4.4 Software Data Control Categories (DO 178B)
Software life cycle data are assigned to one of two categories: Control Category 1 (CC1) and Control Category 2 (CC2). These categories are related to the Software Configuration Management (SCM) controls placed on the data, with CC1 data subject to more stringent control during each activity of the SCM process.
The SCM process involves verification of 13 different process objectives; CC1 data control verifies all 13 objectives, whereas CC2 data control verifies only 6 of them.
4.5 Aircraft and Engine Software Certification Procedure (DO 178B)
For certification of aircraft and
engine software, the certification authority considers the software as part of
the airborne system or equipment installed on the aircraft or engine; that is,
the certification authority does not approve the software as a unique,
stand-alone product. The certification authority establishes the certification
basis for the aircraft or engine in consultation with the applicant. The
certification basis defines the regulations together with any special
conditions that may supplement the published regulations.
For modified aircraft or engines,
the certification authority considers the impact the modification has on the
certification basis originally established for the aircraft or engine. In some
cases, the certification basis for the modification may not change from the
original certification basis; however, the original means of compliance may not
be applicable for showing that the modification complies with the certification
basis and may need to be changed.
The certification authority
assesses the Plan for Software Aspects of Certification for completeness and
consistency with the means of compliance that was agreed upon to satisfy the certification
basis. The certification authority satisfies itself that the software level(s)
proposed by the applicant is consistent with the outputs of the system safety
assessment process and other system life cycle data. The certification
authority informs the applicant of issues with the proposed software plans that
need to be satisfied before the certification authority agreement.
Prior to certification, the
certification authority determines that the aircraft or engine (including the software
aspects of its systems or equipment) complies with the certification basis. For
the software, this is accomplished by reviewing the Software Accomplishment
Summary and evidence of compliance. The certification authority uses the Software
Accomplishment Summary as an overview of the software aspects of
certification.
The certification authority may, at its discretion, review the software life cycle processes and their outputs, or evaluate other data or evidence of compliance as it considers necessary.
4.6 Software Verification Plan (DO 178B)
The Software Verification Plan is
a description of the verification procedures to satisfy the software
verification process objectives. These procedures may vary by software level as
defined in Annex A-1 of DO 178B. The plan should include:
a) Organization: organizational responsibilities within the software verification process and interfaces with the other software life cycle processes.
b) Independence: a description of the methods for establishing verification independence, when required, for example an independent verification and validation procedure.
c) Verification methods: a description of the verification methods to be used for each activity, such as review, analysis, and testing.
d) Verification environment: the equipment for testing, the analysis tools, and the testing guidelines.
Annex A-1 of DO 178B [4] indicates the following categories of verification methods:
a) The objective should be satisfied with independence (IV)
b) The objective should be satisfied (V)
c) Satisfaction of the objective is at the applicant's discretion (AD).
The verification methods for
objectives for A, B, C, and D levels of Software and their outputs are
identified in the annexure Tables A-1 to A-10. The data control category (CC1
and CC2) is also indicated for each objective verification process (including
the sub-objectives) for different software levels.
These annexures provide guidelines
for the software life cycle process objectives and outputs described in this
document by software level. These tables reference the objectives and outputs
of the software life cycle processes previously described in this document.
The tables include guidelines for:
a) Level E software, for which no separate certification requirements apply;
b) the independence, by software level, of the software life cycle process activities applicable to satisfying that process's objectives; and
c) the control category, by software level, for the software life cycle data produced by the software life cycle process activities (subsection 7.3).
A summary of the Aircraft and
Engine Certification requirements as indicated in Annexures A-1 to A-10 of
DO 178 b [4] is shown below:
a) Some of the objectives for Level A and B software are subject to independent verification (IV); the rest fall under the "objective should be satisfied" (V) category.
b) Most Level C software objectives fall under the V category, and Level D software objectives are mostly in the AD (satisfaction at the applicant's discretion) category.
c) Data control for requirements is generally CC1, and data control for records is generally CC2.
The summary details are shown in
Table 12.
Table 12. Aircraft & Engine Software Certification Details (DO 178B Annex A)

- A-1: S/W Planning Process (7 objectives). The outputs are the various plans: design, development, certification plans, coding standards, SCM, SQA, etc. Verification: V for Levels A, B, and C, and AD for Level D. Data control: CC1 and CC2 depending on the data.
- A-2: S/W Development Process (7 objectives). The outputs are the S/W design requirements, design description, and source and executable code. All verifications are V category. Data control: CC1 and CC2.
- A-3: Verification of Outputs of the S/W Requirements Process (7 objectives). The outputs are S/W verification results. IV for accuracy and compliance of high-level S/W requirements and for accuracy of algorithms (Level A and B software); V for Levels C and D, with some Level D objectives AD. Data control: mostly CC2.
- A-4: Verification of Outputs of the S/W Design Process (13 objectives). The outputs are S/W verification results. IV for Level A and B software for compliance, accuracy, architecture, and partitioning integrity; Level C objectives are all V and Level D objectives are AD. Data control: CC2.
- A-5: Verification of Outputs of the S/W Coding & Integration Processes (7 objectives). The outputs are S/W verification results. For Level A, IV for source code compliance with low-level requirements, S/W architecture, and accuracy and consistency; the remaining objectives are V. For Level B, IV for source code accuracy and consistency, the rest V. Level C objectives are V, and all Level D objectives are AD. Data control: CC2.
- A-6: Testing of Outputs of the Integration Process (5 objectives). The outputs are S/W verification cases and procedures and S/W verification results. For Level A, IV for two objectives (object code compliance with low-level requirements, and robustness); all other objectives are V. For Level B, IV for object code compliance with low-level requirements, the rest V. All Level C objectives are V, and Level D objectives are V and AD. Data control: CC1 for requirements and CC2 for the rest.
- A-7: Verification of Verification Process Results (8 objectives). The outputs are S/W verification cases and procedures and S/W verification results. IV for all objectives for Level A software; IV for 3 objectives of Level B software (test coverage of software structure), V for the rest. Level C and D objectives are V and AD. Data control: CC2.
- A-8: S/W Configuration Management Process (6 objectives). The outputs are SCM records and the S/W life cycle environment configuration index. All 6 objectives for Levels A, B, C, and D are V category. Data control: baseline traceability and S/W life cycle environment control are CC1, all others CC2.
- A-9: S/W Quality Assurance Process (3 objectives). The outputs are S/W QA records. All 3 objectives for Levels A, B, C, and D are subject to IV verification. Data control: CC2.
- A-10: Certification Liaison Process (3 objectives). The output is the Plan for Software Aspects of Certification. All 3 objectives for Levels A, B, C, and D are V category. Data control: CC1.
4.7 Comparison of Software Safety: Mil 882, RTCA DO 178, and DO 278
RTCA DO 278 / EUROCAE ED 109, "Guidelines for Communication, Navigation, Surveillance, and Air Traffic Management (CNS/ATM) Systems Software Integrity Assurance" [5], is the ground-based complement to the DO-178B airborne standard. RTCA DO-278 provides guidelines for the assurance of software contained in non-airborne CNS/ATM systems. DO-178B/ED-12 defines a set of objectives recommended to establish assurance for airborne software; DO-278 applies these objectives, reviewed and in some cases modified, to non-airborne CNS/ATM systems. DO-278 is thus intended as an interpretive guide for the application of DO-178B guidance to non-airborne CNS/ATM systems, and the two standards are interrelated [6].
In the aviation industry, an Unmanned Aerial System (UAS) contains a Ground Control Centre and a datalink, both of which are essential parts of the UAS. These being ground-based components, their software elements may be designed as per the DO 278 standard.
DO-278
provides guidelines to produce software for ground-based avionics systems and
equipment that performs its intended function with a level of confidence in
safety. The guidelines are in the form of:
· Objectives of software life
cycle processes
· Description of activities
and design considerations for achieving these objectives
· Description of the evidence
that indicates that the objectives have been satisfied.
The document discusses those aspects of certification that pertain to the production of software for ground-based avionics systems and equipment used in CNS or ATM applications.
A comparison of the software safety/assurance levels amongst Mil 882, DO 178, and DO 278 is shown in Table 13.
Table 13. Comparison of Software Assurance Levels: Mil 882, DO 178, and DO 278
| Failure Category | Failure Description | DO 178 S/W Criticality Level | DO 278 Assurance Level | Mil-STD-882 SwCI |
| --- | --- | --- | --- | --- |
| Catastrophic | Prevents continued safe flight or landing; many fatal injuries | Level A | AL1 | SwCI 1 |
| Critical / Severe-major | Failure conditions would reduce the capability of the aircraft or the ability of the crew to cope with adverse operating conditions | Level B | AL2 | SwCI 2 |
| Major / Marginal | Impairs crew efficiency, discomfort, or possible injuries to occupants | Level C | AL3 | SwCI 3 |
| | | Not used | AL4 | Not used |
| Minor / Negligible | Reduced aircraft safety margins, but well within crew capabilities | Level D | AL5 | SwCI 4 |
| No Safety Impact | S/W resulting in no effect on the system | Level E | AL6 | SwCI 5 |
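Where several of these standards meet on one programme, a small cross-walk of the Table 13 levels can help keep documentation consistent. The sketch below simply restates the author's comparison; the mapping is informative, not normative.

```python
# Cross-walk of assurance levels, restating Table 13.
LEVELS = [
    # (failure category,         DO-178 level, DO-278 AL, Mil-882E SwCI)
    ("Catastrophic",             "A",   "AL1", "SwCI 1"),
    ("Critical / Severe-major",  "B",   "AL2", "SwCI 2"),
    ("Major / Marginal",         "C",   "AL3", "SwCI 3"),
    ("(no DO-178 equivalent)",   None,  "AL4", None),      # AL4 is not used by DO 178 or Mil 882
    ("Minor / Negligible",       "D",   "AL5", "SwCI 4"),
    ("No Safety Impact",         "E",   "AL6", "SwCI 5"),
]

def do278_level_for(do178_level: str):
    """Return the DO-278 assurance level paired with a DO-178 software level in Table 13."""
    for _category, d178, d278, _swci in LEVELS:
        if d178 == do178_level:
            return d278
    return None

print(do278_level_for("B"))   # AL2
```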
5. Mil-STD-882E Tasks
Mil-STD-882 defines the system safety tasks to be performed. The 100-series tasks apply to management, the 200-series to analysis, the 300-series to evaluation, and the 400-series to verification. These tasks can be selectively applied to fit a tailored system safety effort. Each desired task shall be specifically called out in a contract, because the task descriptions do not include requirements for any other tasks.
5.1 Task Structure
Each individual task is divided into three
parts: purpose, task description, and details to be specified.
a. The purpose explains the rationale for performing the task.
b. The task description describes the work a contractor shall perform if
the task is placed on contract. When preparing proposals, the contractor may recommend the inclusion of additional tasks or the deletion of specified tasks with supporting
rationale for each addition/deletion.
c. The details to be specified in each task description list specific information, additions, modifications, deletions, or options to the requirements of the task that should be considered when requiring the task.
5.2 Task Section 100 - Management
a) Task 101 is to integrate “Hazard Identification and Mitigation”
into the Department of Defense Acquisition Systems Engineering process using
the system safety methodology.
b) Task 102 is to develop a “System Safety Program Plan” (SSPP)
that documents the system safety methodology for the identification,
classification, and mitigation of safety hazards as part of the overall Systems
Engineering process.
c) Task 103 is to develop a “Hazard Management Plan” (HMP) that
documents a standard, generic system safety methodology for the identification,
classification, and mitigation of hazards as part of the overall Systems
Engineering (SE) process.
d) Task 104 is to support reviews, certifications, boards, and audits
performed by or for the Government.
e) Task 105 is to provide support to designated program office
Integrated Product Teams (IPTs) or Working Groups (WGs).
f) Task 106 is to establish and maintain a closed-loop Hazard
Tracking System (HTS).
g) Task 107 is to submit periodic progress reports summarizing the
pertinent hazard management and engineering activities that occurred during the
reporting period.
h) Task 108 is to implement
a ‘Hazardous
Materials Management Plan’ (HMMP) which shall be made available to the
Government on request.
5.3 Task Section 200 - Analysis
a) Task 201 is to compile a
list of “Potential hazards” early in development.
b) Task 202 is to perform
and document a “Preliminary Hazard Analysis” (PHA) to identify hazards, assess
the initial risks, and identify potential mitigation measures.
c) Task 203 is to perform
and document a “System Requirements Hazard Analysis” (SRHA) to determine the
design requirements to eliminate hazards or reduce the associated risks for a
system, to incorporate these requirements into the appropriate system
documentation, and to assess compliance of the system with these requirements.
d) Task 204 is to perform
and document a “Subsystem Hazard Analysis” (SSHA) to verify subsystem
compliance with requirements to eliminate hazards or reduce the associated risks;
to identify previously unidentified hazards associated with the design of
subsystems; and, to recommend actions necessary to eliminate identified hazards
or mitigate their associated risks.
e) Task 205 is to perform
and document a “System Hazard Analysis” (SHA) to verify system compliance with
requirements to eliminate hazards or reduce the associated risks.
f)
Task 206 is to perform and document an “Operating and Support Hazard Analysis”
(O&SHA) to identify and assess hazards introduced by operational and
support activities and procedures; and to evaluate the adequacy of operational
and support procedures, facilities, processes, and equipment used to mitigate
risks associated with identified hazards.
g)
Task 207 is to perform and document a “Health Hazard Analysis” (HHA) to identify
human health hazards, to evaluate proposed hazardous materials and processes
using such materials, and to propose measures to eliminate the hazards or
reduce the associated risks when the hazards cannot be eliminated.
h)
Task 208 is to perform and document a “Functional Hazard Analysis” (FHA)
of an individual system or subsystem(s).
i)
Task 209 is to perform and document an analysis of the “System-of-Systems” (SoS)
to identify unique SoS hazards.
j) Task 210 is to perform and document an ‘Environmental Hazard Analysis”
(EHA) to support design development decisions.
5.4 Task Section 300 - Evaluation
a) Task 301 is to perform and document a “Safety Assessment Report”
(SAR) to provide a comprehensive evaluation of the status of safety hazards and
their associated risks before the test or operation of a system, before the next
contract phase, or at contract completion.
b) Task 302 is to perform and document a “Hazard Management Assessment
Report” (HMAR) to provide a comprehensive evaluation of the status of
hazards and their associated risks before the test or operation of a system,
before the next contract phase, or at contract completion.
c) Task 303 is to participate in the “Test and Evaluation” (T&E)
process to evaluate the system, verify and validate risk mitigation measures,
and manage risks for test events.
d) Task 304 is to perform and document the application of the system
safety process described in Section 4 of this Standard to “Engineering Change Proposals”
(ECPs); change notices; deficiency reports; mishaps; and requests for deviations,
waivers, and related change documentation.
5.5 Task Section 400 – Verification
a) Task 401 is to define and perform tests and demonstrations or use
other verification methods on safety-significant hardware, software, and
procedures to “verify compliance with safety requirements”.
b) Task 402 is to perform tests and analyses, develop data necessary to
comply with hazard classification regulations, and prepare “Explosive
hazard classification data” associated with the development or
acquisition of new or modified explosives and packages or commodities
containing explosives (including all energetics).
c) Task 403 is to provide “Explosive Ordnance Disposal” (EOD)
source data, recommended render-safe procedures, and disposal considerations.
6. Conclusion
Defining and following a process for assessing the risk associated with hazards is critical to the success of a program, particularly as systems are combined into more complex System of Systems (SoS). These SoS often involve systems developed under disparate development and safety programs and may require interfaces with other Service (Army, Navy/Marines, and Air Force) or DoD agency systems. These other SoS stakeholders may have their own safety processes for determining the acceptability of systems to interface with theirs.
Therefore, the Mil-STD-882 procedure of identifying hazards, assessing their contribution to risk, and deciding the levels of mitigation required to reach an acceptable level can help ensure that the system design is safe for operation.
The software contribution to system risk is discussed in section 4.4 of Mil-STD-882E; RTCA DO 178B covers airborne systems and equipment, and RTCA DO 278 discusses software integrity assurance for ground-based CNS/ATM systems. All of these standards segregate software into different levels based on failure criticality and recommend appropriate procedures for development, testing, and qualification. A comparison of the criticality levels of these standards is shown in Table 13.
For the contribution of system software to safety, Mil 882 indicates that during the development process the program shall ensure the Level of Rigor for the airborne software according to its criticality, as indicated in the standard (see Table 10). This procedure is likely to ensure adequate software system safety. However, it does not indicate a separate procedure for regulatory verification or testing of software of different criticalities. The author is not competent to specify whether the software verification methodology indicated in RTCA DO 178B for airborne software and DO 278A for ground-based system software can be dispensed with when the software has been developed following the Level of Rigor described in Mil-STD-882E. Only the regulatory body or the certification authority can take these decisions.
References
1. Dev G. Raheja, 'Assurance Technologies: Principles and Practices', McGraw-Hill.
2. Mil-STD-882E, 11 May 2012, 'Department of Defense Standard Practice: System Safety'.
3. L. S. Srinath, 'Reliability Engineering', 4th Edition, Affiliated East-West Press Private Limited, New Delhi, 2013.
4. RTCA DO 178B, 1 Dec 1992, 'Software Considerations in Airborne Systems and Equipment Certification'.
5. RTCA DO 278A, 'Guidance for the Development of Software for Communication, Navigation, Surveillance, and Air Traffic Management (CNS/ATM) Systems: Software Integrity Assurance'.
6. Stephen A. Jacklin, 'Certification of Safety-Critical Software Under DO-178C and DO-278A', NASA Ames Research Center, Moffett Field, CA 94035, American Institute of Aeronautics and Astronautics.