Wednesday, August 21, 2024

System Safety and Mil-STD-882

                                      


Kanchan Biswas

Former Director (Aircraft), CEMILAC, DRDO

(+ 91 9448376835, kanchan.biswas@rediffmail.com)

 

       Abstract

A system may be defined as a composite entity consisting of hardware and software, integrated through interfaces and used by operators, following a defined operating procedure in a supporting environment, to achieve a defined task or complete a specific mission. System safety studies are primarily aimed at eliminating system malfunctions or failures to prevent loss of human life and property. While it is not possible to eliminate system failures totally, system safety engineering aims at identifying the level of risk and mitigating it to an acceptable level. Mil-STD 882 indicates the procedure for classifying the severity of a hazard as well as the acceptable probability of occurrence for each category of hazard. System safety engineers cannot wait for an accident or defect investigation report to improve safety; rather, all safety provisions are to be complied with from the conceptual design stage. System safety engineering is the application of engineering and management principles, criteria, and techniques to optimize safety within the constraints of operational effectiveness, time, and cost throughout all phases of the system life cycle.

Key Words: System Safety, Mil 882, Hazard, Risk, Probability of occurrences, Risk Mitigations, RTCA DO 178, RTCA DO 278.      

 

1. INTRODUCTION TO SYSTEM SAFETY

A system is a composite entity of personnel, materials, tools, equipment, facilities, and software. The elements of this complex agglomeration are used together in the operational and support environment to perform a given task or achieve a desired mission task. All the elements of the system must not only operate without any hazard but also must be compatible with each other and must not impose or introduce any additional hazard to other constituents of the system.

System safety is defined as the application of engineering and management principles, criteria, and techniques to optimize safety within the constraint of operational effectiveness, time, and cost throughout all phases of the system life cycle. A system has well-defined boundaries interfacing hardware, software, environment, and humans as operators. The system safety process is a risk management process developed to the highest degree. The steps in the process are [1]:

  1. Identify the risks using hazard analysis techniques as early as possible in the system life cycle.
  2. Develop options to mitigate: eliminate, control, or avoid the hazards.
  3. Provide for timely resolution of the hazard.
  4. Implement the best strategy.
  5. Control the hazard through a closed-loop system.

System safety is not only a function of engineering but also an integral part of management activities. Participation of management is required to ensure timely identification and resolution of the hazards. The management as well as the engineering tasks are identified in the Mil standard 882 [2].

 

1.1 Hazard and Risk

A system has well-defined boundaries interfacing hardware, software, environment, and humans as operators. The terms ‘Safety’ and ‘Reliability’ are closely related. Reliability is defined as the probability that a product will perform its intended function for a given ‘period of time’ under a given set of conditions. The ‘time’ in this definition can refer to such duration as ‘mission time’, ‘warranty period’, ‘predefined number of cycles’, or ‘complete life cycle’.

Safety, on the other hand, may be defined as system operation without any untoward incident, accident, hazard, or mishap within the operational time frame. In reliability terms, a ‘Hazard’ means a failure, and the hazard function is defined as the limit of the failure rate within a time interval (t2 - t1) as (t2 - t1) approaches zero. Many industries use the ‘Mean Time Between Failures’ (MTBF), the reciprocal of the hazard rate when that rate is constant, as a measure of system reliability [1].

‘Safety’, therefore, may be considered as freedom from ‘accident’ or ‘risk’. The classical definition of ‘Risk’, on the other hand, is the product of the frequency of a hazard (f) and the severity (s) of the hazard. The severity may be expressed as a loss in dollars, damage to property or life, etc. The severity is also categorized based on the extent of damage or loss.

$$\text{Risk} = \sum_{i} f_i \times s_i$$

that is, the product of f and s, summed over all the hazards i occurring during the defined time interval (t2 - t1).
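The summation above can be sketched in a few lines of Python. This is only an illustration: the class name `HazardExposure`, the function `total_risk`, and the two sample hazards with their numbers are hypothetical and do not come from Mil-STD-882 or from reference [1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HazardExposure:
    """One hazard observed over the interval (t2 - t1). Hypothetical type."""
    name: str
    frequency: float  # expected number of occurrences in the interval
    severity: float   # loss per occurrence, e.g. in dollars


def total_risk(hazards):
    """Return the sum of f x s over all hazards in the interval."""
    return sum(h.frequency * h.severity for h in hazards)


# Illustrative values only (assumed, not taken from any standard).
hazards = [
    HazardExposure("hydraulic leak", frequency=0.02, severity=50_000.0),
    HazardExposure("sensor dropout", frequency=0.5, severity=1_000.0),
]
print(total_risk(hazards))
```

Expressing severity in a single unit (here, dollars) is what makes the terms additive; when severities are only categorized, the matrix approach of Mil-STD-882 described later is used instead.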

Further, we may define an Accident as a Hazard triggered by an unsafe condition that may result in a Mishap – an unpleasant event that may result in death, injury, occupational illness, or damage to equipment or property. 

 

1.2 Generic Cause of an Accident

For a mishap to happen, a hazardous situation must exist and a trigger event must act on it to produce an accident. To prevent accidents, the hazard and the trigger event should, therefore, be kept separated.

The generic cause of an accident is shown in Figure 1, and some common examples of hazards and their trigger event are shown in Table 1.

 


Figure 1. Generic cause of an Accident [1].

 

                                  Table 1. Some Examples of Hazards and Trigger Events [1]

| Hazard | Trigger Event |
|---|---|
| High temperature | Temperature above the flash point |
| Nuclear gas leakage | The operator does not notice |
| The train driver is under the influence of drugs | Train on the wrong track close to an oncoming train |
| Cracks in aircraft fuselage body | Too many landings and take-offs |

 

1.3 Safety and Reliability

As indicated earlier, safety and reliability are very closely related terms. Both reliability and safety engineers aim to make the system design robust, so that it operates without any mishaps. However, there is a small difference in their outlooks. A reliability engineer is interested in increasing the MTBF of the product and may do so by introducing system redundancy. The reliability engineer may not pay great attention to a critical failure mode, especially when redundancy is present in the system design. However, a safety engineer may still consider such a failure critical and dangerous. For example, if an aircraft device has a redundant circuit board, a failure may not be critical if only one circuit board has failed. But if the failed board is not detected and replaced immediately, another failure could be catastrophic. The safety engineer in this case will ask for ‘failure detectability’, i.e. a health-status indication through some warning/indication scheme, so that the maintenance engineer can identify the failure and call for repair or replacement at the earliest opportunity [1].

 

1.4 System Safety Design

System Safety design involves the following activities:  

a)    Identify Potential Hazards

b)    Classify the Hazards based on their severity or criticality

c)     Eliminate or mitigate the hazards through design solutions or other process

d)    Maintain a Hazard Tracking system.

A system safety procedure for Hazard Tracking is shown in Figure 2. 



 Figure 2. A Hazard Tracking System

1.4.1 Preliminary Hazard Analysis

Preliminary Hazard Analysis (PHA) is done at the design concept stage. This is done early so that initial risk assessment and mitigation make the design concept sound. A lot of time and money may be spent if the design concepts are to be changed later. The hazards may be associated with hardware, software, operational procedures, system interfaces, and environmental and health risks.

No clear-cut procedure is laid down in the literature on identifying hazards. However, PHA is basically a brainstorming technique, hence some kind of organized approach helps in conducting this analysis.

Mil-STD–882 [2] defines the basis for the classification of hazards based on the extent of loss or damage (see Section 2.1). For every class or criticality of hazard, Mil 882 defines the acceptable level of probability of occurrence. The acceptable probability of occurrence should vary inversely with the criticality of the hazard, in the sense that a more critical hazard should have a lower probability of occurrence.

A general format for making the Preliminary Hazard List (PHL) is shown in Table 2.

Table 2. Preliminary Hazard List

| Hazard Sl No. | Aircraft System | Hazard Description | Hazard Severity | Hazard Effect | Hazard Mitigation Scheme |
|---|---|---|---|---|---|

 

 1.4.2 Fault Tree Analysis

A Fault Tree Analysis (FTA), along with cut set analysis for the system-level failures, can also help in the identification of potential hazards. Other techniques are cause-effect diagrams, design reviews, and simulations. FTA is a deductive (top-down) process, especially useful for analyzing catastrophic and critical hazards. An FTA is a cause-and-effect diagram that uses standard symbols: logic gates, prominent among them ‘And’ and ‘Or’, and event symbols such as the ‘Circle’ (component-level faults) and the ‘Rectangle’ (top event).

An FTA diagram shows the arrangement of the system components based on their functional layout in the system, along with their cross-functional dependencies. It can, therefore, also be used for evaluating the system reliability. For this, the reliability of each component at the elementary level must be known. The reliability of the next higher level can then be evaluated using the system logic gates (‘And’ and ‘Or’ gates), and this process is continued until the top level of the system is reached.

An example of an FTA is shown in Figure 3. The system layout diagram is shown in Figure 3 (a) and the FTA diagram is shown in Figure 3 (b) [3].

  

                     Figure 3 (a). System Logic Diagram (example)

 Figure 3 (b). FT diagram with Reliability values for the system shown in Fig 3(a) [3].

In Figure 3 (b), if the reliability of each of the components Fx1, Fx2, ..., Fx4 is 0.9, the reliability of the sub-systems at level 2 can be calculated as follows:

a) Subsystems A and B are each connected to their elements (Fx1, Fx2, Fx3, and Fx4) through an ‘And’ gate. Elements Fx1 and Fx2, for example, are under an ‘And’ gate, so the components are in parallel; therefore,

$$R_{subsystem} = 1 - (1 - R_{Fx1})(1 - R_{Fx2}) = 1 - (0.1)(0.1) = 0.99$$

b) Subsystems A and B are under the ‘Or’ gate, so they are in series; therefore,

$$R_{system} = R_{subsystem\,A} \times R_{subsystem\,B} = 0.99 \times 0.99 = 0.9801$$

Remember that the ‘And’ and ‘Or’ gates of an FTA diagram have the opposite sense to series and parallel logic in reliability analysis: an ‘And’ gate (the output fault occurs only if all inputs fail) corresponds to a parallel arrangement, while an ‘Or’ gate (the output fault occurs if any input fails) corresponds to a series arrangement. The reliability values of each component, sub-system, and the overall system are shown in Figure 3(b).
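The gate arithmetic of this example can be checked with a short Python sketch. It assumes statistically independent component failures, which the worked example implicitly relies on; the function names `and_gate_reliability` and `or_gate_reliability` are illustrative and not taken from reference [3].

```python
from math import prod


def and_gate_reliability(reliabilities):
    """Fault-tree 'And' gate: the output fault needs all inputs to fail,
    so the inputs behave like a parallel (redundant) arrangement."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


def or_gate_reliability(reliabilities):
    """Fault-tree 'Or' gate: any single input fault causes the output fault,
    so the inputs behave like a series arrangement."""
    return prod(reliabilities)


r_component = 0.9  # reliability of each of Fx1 ... Fx4, as in the example
r_subsystem_a = and_gate_reliability([r_component, r_component])
r_subsystem_b = and_gate_reliability([r_component, r_component])
r_system = or_gate_reliability([r_subsystem_a, r_subsystem_b])

print(round(r_subsystem_a, 4), round(r_system, 4))
```

Up to floating-point rounding, the two values match the 0.99 and 0.9801 obtained above. The same two functions can be applied level by level to a larger tree, working from the elementary components up to the top event.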

 

1.4.3 System Safety Analysis

A generic format for the System Safety Analysis (SSA) Report is shown in Table 3. Instructions for filling up the cells are shown.

 Table 3. System Safety Analysis Report


Name of the System:

| Sl No | Hazard Description | Risk Before Mitigation: Severity | Risk Before Mitigation: Likelihood | Risk Before Mitigation: Risk Hazard Index | Risk elimination or mitigation measures | Risk After Mitigation: Severity | Risk After Mitigation: Likelihood | Risk After Mitigation: Risk Hazard Index | Safety Assessment Report |
|---|---|---|---|---|---|---|---|---|---|
|  | Hazard description. Study may include: source of potential harm; mechanism by which the harm may be caused; worst credible outcome assuming no measure employed | The severity of the worst credible effect without any mitigation measures | The probability of occurrence of the hazard or failure mode without the mitigation measures | Combination of severity and probability to determine qualitative risk to the public. Mark the cell red to indicate unacceptable risk | Measures taken to reduce the risk to the public (reducing either the severity or the probability), typically through design changes, safety devices, warning devices, procedures, and training | Severity level after the mitigation measures | Probability of occurrence of failure after all mitigation measures are taken | If the cell is red, further elimination or mitigation must be done to reduce the risk | After mitigation all cells should be green |

 

 2. SYSTEM SAFETY – MIL 882E

Risk is a function of the severity of a failure (event) and its probability of occurrence. The hazards are assigned priorities so that the catastrophic and critical ones are prevented. The system safety standard practice for military systems, as identified in the DoD Systems Engineering approach, is to eliminate the hazards where possible and to minimize the risks where the hazards cannot be eliminated. The detailed procedure is identified in Mil STD 882E, 11 May 2012 [2].

The system safety process consists of eight activity elements. The logical sequence of the eight elements is shown in Figure 4.  


Figure 4. Eight Elements of the System Safety Process [2]

 

2.1 Hazard Severity Category and Probability Levels (882E)

A hazard is defined as a real or potential condition that could lead to an unplanned event or series of events (i.e. mishaps) resulting in death, injury, occupational illness, damage to or loss of equipment or property, or damage to the environment.

The severity categories of the hazards are defined in Table 4.

              Table 4. Severity Categories of Hazard (Mil 882)

| Description | Severity Category | Hazard (Mishap) Result Criteria |
|---|---|---|
| Catastrophic | 1 | Could result in one or more of the following: death, permanent total disability, irreversible significant environmental impact, or monetary loss equal to or exceeding $10M. |
| Critical | 2 | Could result in one or more of the following: permanent partial disability, injuries, or occupational illness that may result in hospitalization of at least three personnel, reversible significant environmental impact, or monetary loss equal to or exceeding $1M but less than $10M. |
| Marginal | 3 | Could result in one or more of the following: injury or occupational illness resulting in one or more lost work day(s), reversible moderate environmental impact, or monetary loss equal to or exceeding $100K but less than $1M. |
| Negligible | 4 | Could result in one or more of the following: injury or occupational illness not resulting in a lost workday, minimal environmental impact, or monetary loss less than $100K. |

The probability level of a hazard is defined as the likelihood of occurrence of a mishap. Probability level ‘F’ is used to document cases where the hazard is no longer present. Quantitative probability levels are suggested in Appendix A of Mil-STD 882E; the ‘Improbable’ level is generally taken to be less than one in a million. The probability levels of occurrence of hazards are shown in Table 5.

Table 5. Probability Level of Occurrences of Hazards (Mil 882)

| Description | Level | Specific Individual Item | Quantitative value | Fleet or Inventory |
|---|---|---|---|---|
| Frequent | A | Likely to occur often in the life of an item | Probability of occurrence ≥ 10^-1 | Continuously experienced |
| Probable | B | Will occur several times in the life of an item | Probability of occurrence < 10^-1 but ≥ 10^-2 | Will occur frequently |
| Occasional | C | Likely to occur sometimes in the life of an item | Probability of occurrence < 10^-2 but ≥ 10^-3 | Will occur several times |
| Remote | D | Unlikely, but possible to occur in the life of an item | Probability of occurrence < 10^-3 but ≥ 10^-6 | Unlikely, but can reasonably be expected to occur |
| Improbable | E | So unlikely, it can be assumed occurrence may not be experienced in the lifetime of an item | Probability of occurrence < 10^-6 | Unlikely to occur, but possible |
| Eliminated | F | Incapable of occurrence within the life of an item. This category is used when potential hazards are identified and later eliminated. |  |  |
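The quantitative bands of Table 5 can be turned into a small lookup, sketched below in Python. The function name `probability_level` is illustrative. Level F is deliberately not derived from a number here, on the assumption that ‘Eliminated’ is an analyst's determination rather than a probability band.

```python
# Inclusive lower bounds of the quantitative bands, as listed in Table 5.
_LEVEL_LOWER_BOUNDS = [
    ("A", "Frequent", 1e-1),
    ("B", "Probable", 1e-2),
    ("C", "Occasional", 1e-3),
    ("D", "Remote", 1e-6),
]


def probability_level(p):
    """Map a probability of occurrence to a (level, description) pair."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    for level, description, lower_bound in _LEVEL_LOWER_BOUNDS:
        if p >= lower_bound:
            return level, description
    # Anything below 10^-6 falls in the lowest quantitative band.
    return "E", "Improbable"


print(probability_level(5e-4))
```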

 

2.2 Risk Assessment Code (Mil 882E)

The assessed risks are expressed as a Risk Assessment Code (RAC), which is a combination of the severity category and the probability level of occurrence. The Risk Assessment Matrix is shown in Table 6. For example, in Table 6 the RAC in cell 1A, which has severity “Catastrophic” and probability of occurrence “Frequent”, is High. Each RAC is assigned a risk level of High, Serious, Medium, or Low.

         Table 6. Risk Assessment Matrix

| Probability \ Severity | Catastrophic (1) | Critical (2) | Marginal (3) | Negligible (4) |
|---|---|---|---|---|
| Frequent (A) | High | High | Serious | Medium |
| Probable (B) | High | High | Serious | Medium |
| Occasional (C) | High | Serious | Medium | Low |
| Remote (D) | Serious | Medium | Medium | Low |
| Improbable (E) | Medium | Medium | Medium | Low |
| Eliminated (F) | Eliminated | Eliminated | Eliminated | Eliminated |

 Note: The definitions in Tables 4, 5, and 6 (RAC) shall be used unless tailored alternative definitions are formally approved by procurement executives.
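A lookup over the Risk Assessment Matrix of Table 6 can be sketched in Python as follows. The names `RISK_MATRIX` and `risk_assessment_code` are illustrative; the cell values are transcribed from Table 6 as shown in this article.

```python
# Rows are probability levels A to E; tuple positions correspond to
# severity categories 1 (Catastrophic) to 4 (Negligible), as in Table 6.
RISK_MATRIX = {
    "A": ("High", "High", "Serious", "Medium"),
    "B": ("High", "High", "Serious", "Medium"),
    "C": ("High", "Serious", "Medium", "Low"),
    "D": ("Serious", "Medium", "Medium", "Low"),
    "E": ("Medium", "Medium", "Medium", "Low"),
}


def risk_assessment_code(severity, probability):
    """Return (RAC, risk level) for a severity category and probability level."""
    probability = probability.upper()
    if probability == "F":
        # Level F documents a hazard that has been eliminated.
        return "F", "Eliminated"
    if severity not in (1, 2, 3, 4):
        raise ValueError("severity category must be 1, 2, 3, or 4")
    if probability not in RISK_MATRIX:
        raise ValueError("probability level must be one of A to F")
    return f"{severity}{probability}", RISK_MATRIX[probability][severity - 1]


print(risk_assessment_code(1, "A"))
print(risk_assessment_code(2, "C"))
```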

 

3. Risk Mitigation Procedure (Mil 882)

The potential risk mitigation(s) shall be identified, and the expected risk reduction(s) of the alternative(s) shall be estimated and documented in the Hazard Tracking System (HTS). The goal should always be to eliminate the risk; however, when a hazard cannot be eliminated, the risk should be reduced to the lowest acceptable level within the constraints of cost, schedule, and performance by applying the system safety design order of precedence.

The Mil 882 standard has gone through several revisions. First released in 1969, it was revised as 882A on 28 Jun 1977, 882B on 30 Mar 1984, 882C on 19 Jan 1993, and 882D on 10 Feb 2000; the latest revision, 882E, came on 11 May 2012. A few changes were made during the revisions, notable among them are:

a)    The Hazard Risk index and suggested mitigation criteria have been changed.

b)    In 882E, the risk mitigation goal has been redefined to eliminate the hazard if possible. When a hazard cannot be eliminated, the associated risk should be reduced to the lowest acceptable level within the constraints of cost, schedule, and performance by applying the system safety design order of precedence. However, no quantitative value has been indicated.

c)     Mil 882C had considered software as a critical item. However, no specific direction for software safety analysis was indicated. Mil 882 E has introduced specific Software Safety analysis.

 

 

3.1 Mil 882C Acceptance Criteria

While the Mil-STD 882E variant does not clearly indicate the risk acceptance criteria, the earlier variants did suggest risk acceptance criteria with reference to the severity categories of Table 4. Mil STD 882C, 19 Jan 1993, suggested two acceptance criteria, one qualitative and one quantitative. The Risk Index Matrix along with the two acceptance criteria are shown in Tables 7a and 7b.

                             Table 7a. Hazard Risk Index Matrix (Mil 882C, App – A)

                       

| Frequency \ Category | Catastrophic (1) | Critical (2) | Marginal (3) | Negligible (4) |
|---|---|---|---|---|
| (A) Frequent (X > 10^-1) | 1A | 2A | 3A | 4A |
| (B) Probable (10^-1 > X > 10^-2) | 1B | 2B | 3B | 4B |
| (C) Occasional (10^-2 > X > 10^-3) | 1C | 2C | 3C | 4C |
| (D) Remote (10^-3 > X > 10^-6) | 1D | 2D | 3D | 4D |
| (E) Improbable (10^-6 > X) | 1E | 2E | 3E | 4E |

 

| Hazard Risk Index | Suggested Acceptance Criteria |
|---|---|
| 1A, 1B, 1C, 2A, 2B, 3A | Hazard Unacceptable |
| 1D, 2C, 2D, 3B, 3C | Hazard Undesirable (management decision required) |
| 1E, 2E, 3D, 3E, 4A, 4B | Hazard Acceptable with review by Management |
| 4C, 4D, 4E | Hazard Acceptable without Review |

          Table 7b. Hazard Risk Index Matrix (Mil 882C, App – A)

| Frequency \ Category | Catastrophic | Critical | Marginal | Negligible |
|---|---|---|---|---|
| Frequent | 1 | 3 | 7 | 13 |
| Probable | 2 | 5 | 9 | 16 |
| Occasional | 4 | 6 | 11 | 18 |
| Remote | 8 | 10 | 14 | 19 |
| Improbable | 12 | 15 | 17 | 20 |

| Hazard Risk Index | Suggested Acceptance Criteria |
|---|---|
| 1 – 5 | Hazard Unacceptable |
| 6 – 9 | Hazard Undesirable (management decision required) |
| 10 – 17 | Hazard Acceptable with review by Management |
| 18 – 20 | Hazard Acceptable without Review |

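It may be seen that the two criteria do not agree for every cell. For example, 3A (Marginal, Frequent) is unacceptable as per Table 7a, but its index of 7 makes it only undesirable as per Table 7b.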
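The agreement between the two sets of suggested criteria can be checked cell by cell. The Python sketch below uses the values exactly as transcribed in Tables 7a and 7b of this article; the names `CRITERIA_7A`, `INDEX_7B`, and `disagreements` are illustrative.

```python
# Suggested criteria keyed by alphanumeric Hazard Risk Index (Table 7a).
CRITERIA_7A = {
    "Unacceptable": {"1A", "1B", "1C", "2A", "2B", "3A"},
    "Undesirable": {"1D", "2C", "2D", "3B", "3C"},
    "Acceptable with review": {"1E", "2E", "3D", "3E", "4A", "4B"},
    "Acceptable without review": {"4C", "4D", "4E"},
}

# Numerical Hazard Risk Index per frequency row (Table 7b); tuple
# positions correspond to severity categories 1 to 4.
INDEX_7B = {
    "A": (1, 3, 7, 13),
    "B": (2, 5, 9, 16),
    "C": (4, 6, 11, 18),
    "D": (8, 10, 14, 19),
    "E": (12, 15, 17, 20),
}


def criterion_7a(code):
    """Criterion for a code such as '3A' according to Table 7a."""
    for criterion, codes in CRITERIA_7A.items():
        if code in codes:
            return criterion
    raise ValueError(f"unknown hazard risk index: {code!r}")


def criterion_7b(code):
    """Criterion for the same cell using the numerical ranges of Table 7b."""
    index = INDEX_7B[code[1]][int(code[0]) - 1]
    if index <= 5:
        return "Unacceptable"
    if index <= 9:
        return "Undesirable"
    if index <= 17:
        return "Acceptable with review"
    return "Acceptable without review"


def disagreements():
    """Return {code: (criterion per 7a, criterion per 7b)} where they differ."""
    result = {}
    for level in "ABCDE":
        for severity in "1234":
            code = severity + level
            a, b = criterion_7a(code), criterion_7b(code)
            if a != b:
                result[code] = (a, b)
    return result


print(disagreements())
```

With the values as tabulated here, the cells that differ are 3A, 2D, and 3C; in each case the numerical criterion of Table 7b is one step more lenient than the alphanumeric criterion of Table 7a.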

3.2 Mil-Std-882 D Acceptance

Mil-STD 882D, 10 Feb 2000, uses the risk assessment matrix shown in Table 7b with the same value ranges, and assigns the mishap risk categories and acceptance levels shown in Table 7c.

      Table 7c. Mishap risk categories and mishap risk acceptance levels (882D, App–A).

| Mishap Risk Assessment Value | Mishap Risk Category | Mishap Risk Acceptance Level |
|---|---|---|
| 1 – 5 | High | Component Acquisition Executive |
| 6 – 9 | Serious | Program Executive Officer |
| 10 – 17 | Medium | Program Manager |
| 18 – 20 | Low | As Directed |
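Table 7c amounts to a range lookup on the mishap risk assessment value, which may be sketched in Python as below. The function name `mishap_risk_acceptance` is illustrative, and the ranges are those listed in Table 7c.

```python
def mishap_risk_acceptance(value):
    """Return (mishap risk category, acceptance level) for a value of 1 to 20."""
    if not 1 <= value <= 20:
        raise ValueError("mishap risk assessment value must be between 1 and 20")
    if value <= 5:
        return "High", "Component Acquisition Executive"
    if value <= 9:
        return "Serious", "Program Executive Officer"
    if value <= 17:
        return "Medium", "Program Manager"
    return "Low", "As Directed"


print(mishap_risk_acceptance(7))
```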

3.3 Other Risk Mitigation Procedure (Mil 882E)

If mitigation through an alternative design or material does not appear feasible, other means as indicated below may be adopted:

a)    Consider the design change that reduces the severity and/or the probability of the mishap potential caused by the hazard.

b)    Reduce the severity or probability of the mishap potential caused by the hazard by using engineered features or devices (alternative operating procedures/mechanisms etc.).

c)     Provide warning devices.

d)    Incorporate signage, procedure, training, and Personal Protective Equipment.

e)    Manage Life Cycle Risk – After the system is fielded, the program office should continue the system safety process of identifying and maintaining a ‘Hazard Tracking System’. If a new hazard is discovered or a known hazard is determined to have a higher risk level than previously assessed, the new or revised hazard will need to be formally accepted and dealt with appropriately. 

 

4. Software Contribution to System Risk

The assessment of risk for software-controlled or software-intensive systems cannot rely solely on the risk severity and probability of occurrence, as in the case of hardware. Software (s/w) is generally application-specific, and the reliability parameters associated with software cannot be estimated in the same manner as for hardware. Therefore, another approach has to be used for the assessment of the software’s contribution to system risk, one that considers the potential risk severity and the degree of control the software exercises over the hardware.

The system safety approach of Mil 882E generates a Software Safety Criticality Matrix (SSCM) using the same severity categories (catastrophic, critical, marginal, and negligible). However, in place of the probability of occurrence, it uses the Software Control Category (SCC) levels. The SCC defines the degree of control the software exercises over the system. The levels are defined from 1 to 5, from ‘Autonomous’ to ‘No Safety Impact (NSI)’ software (see Table 8). The index obtained from the SSCM for a given SCC and severity category determines the Level of Rigor (LOR) task category. The ‘LOR’ is a specification that defines the depth and breadth of software analysis and verification activities necessary to provide a sufficient level of confidence that a safety-critical or safety-related software function will perform as required.

The Software Control Categories are shown in Table 8.

Table 8. Software Control Categories

Software Control Categories (Level, Name, Description)

Level 1: Autonomous (AT)

·   Software functionality that exercises autonomous control authority over potentially safety-significant hardware systems, subsystems, or components without the possibility of predetermined safe detection and intervention by a control entity to preclude the occurrence of a mishap or hazard.

     (This definition includes complex system/software functionality with multiple subsystems, interacting parallel processors, multiple interfaces, and safety-critical functions that are time-critical.)

Level 2: Semi-Autonomous (SAT)

·   Software functionality that exercises control authority over potentially safety-significant hardware systems, subsystems, or components, allowing time for predetermined safe detection and intervention by independent safety mechanisms to mitigate or control the mishap or hazard.

     (This definition includes the control of moderately complex system/software functionality, no parallel processing, or few interfaces, but other safety systems/mechanisms can partially mitigate. System and software fault detection and annunciation notifies the control entity of the need for required safety actions.)

·   Software item that displays safety-significant information requiring immediate operator entity to execute a predetermined action for mitigation or control over a mishap or hazard. Software exception, failure, fault, or delay will allow, or fail to prevent, mishap occurrence.

     (This definition assumes that the safety-critical display information may be time-critical, but the time available does not exceed the time required for adequate control entity response and hazard control.)

Level 3: Redundant Fault Tolerant (RFT)

·   Software functionality that issues commands over safety-significant hardware systems, subsystems, or components requiring a control entity to complete the command function. The system detection and functional reaction includes redundant, independent fault-tolerant mechanisms for each defined hazardous condition.

     (This definition assumes that there is adequate fault detection, annunciation, tolerance, and system recovery to prevent the hazard occurrence if the software fails, malfunctions, or degrades. There are redundant sources of safety-significant information, and mitigating functionality can respond within any time-critical period.)

·   Software that generates information of a safety-critical nature used to make critical decisions. The system includes several redundant, independent fault-tolerant mechanisms for each hazardous condition, detection and display.

Level 4: Influential

·   Software generates information of a safety-related nature used to make decisions by the operator but does not require operator action to avoid a mishap.

Level 5: No Safety Impact (NSI)

·   Software functionality that does not possess command or control authority over safety-significant hardware systems, subsystems, or components and does not provide safety-significant information. Software does not provide safety-significant or time-sensitive data or information that requires control entity interaction. Software does not transport or resolve communication of safety-significant or time-sensitive data.

 

4.1 Software Safety Criticality Matrix and LOR Tasks

The Software Safety Criticality Matrix as per Mil-Std-882E is shown in Table 9. The SSCM (Table 9a) uses the Table 4 severity categories for the columns and Table 8 software control categories for the rows. Table 9a assigns Software Criticality Index (SwCI) numbers to each cross-referenced block of the matrix.

 Table 9a. Software Safety Criticality Matrix 

| S/W Control Category \ Severity Category | Catastrophic (1) | Critical (2) | Marginal (3) | Negligible (4) |
|---|---|---|---|---|
| 1 | SwCI 1 | SwCI 1 | SwCI 3 | SwCI 4 |
| 2 | SwCI 1 | SwCI 2 | SwCI 3 | SwCI 4 |
| 3 | SwCI 2 | SwCI 3 | SwCI 3 | SwCI 4 |
| 4 | SwCI 3 | SwCI 4 | SwCI 4 | SwCI 4 |
| 5 | SwCI 5 | SwCI 5 | SwCI 5 | SwCI 5 |

The Level of Rigor (LOR) tasks are associated with the specific SwCI as defined in SSCM table 9a. Although the SSCM table is similar in appearance to the Risk Assessment Matrix (Table 6), the SSCM is not an assessment of risk. The LOR tasks associated with each SwCI number are the minimum set of tasks required to assess the software contributions to the system-level risk.

The system safety and software system safety hazard analysis processes identify and mitigate the exact software contributors to hazards and mishaps. The successful execution of pre-defined LOR tasks increases the confidence that the software will perform as specified to software performance requirements while reducing the number of contributors to hazards that may exist in the system. The LOR task matrix is shown in Table 9b.

Table 9b. Software Level of Rigor Matrix

| SwCI | Level of Rigor Task Description |
|---|---|
| SwCI 1 | The program shall perform an analysis of requirements, architecture, design, and code; and conduct in-depth safety-specific testing. |
| SwCI 2 | The program shall perform an analysis of requirements, architecture, and design; and conduct in-depth safety-specific testing. |
| SwCI 3 | The program shall perform an analysis of requirements and architecture; and conduct in-depth safety-specific testing. |
| SwCI 4 | The program shall conduct safety-specific testing. |
| SwCI 5 | Once assessed by safety engineering as Not Safety, then no safety-specific analysis or verification is required. |
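The path from a Software Control Category and a severity category to a SwCI, and from there to the LOR tasks, can be sketched in Python as below. The matrix values are copied from Table 9a as it appears in this article and should be checked against the standard before any real use; the names `SSCM`, `LOR_TASKS`, `software_criticality_index`, and `level_of_rigor` are illustrative.

```python
# SwCI per Software Control Category (rows 1 to 5); tuple positions
# correspond to severity categories 1 (Catastrophic) to 4 (Negligible).
# Values as transcribed in Table 9a of this article.
SSCM = {
    1: (1, 1, 3, 4),
    2: (1, 2, 3, 4),
    3: (2, 3, 3, 4),
    4: (3, 4, 4, 4),
    5: (5, 5, 5, 5),
}

# Level of Rigor task descriptions, condensed from Table 9b.
LOR_TASKS = {
    1: "Analysis of requirements, architecture, design, and code; "
       "in-depth safety-specific testing",
    2: "Analysis of requirements, architecture, and design; "
       "in-depth safety-specific testing",
    3: "Analysis of requirements and architecture; "
       "in-depth safety-specific testing",
    4: "Safety-specific testing",
    5: "No safety-specific analysis or verification required",
}


def software_criticality_index(scc, severity):
    """Return the SwCI for a control category (1-5) and severity (1-4)."""
    if scc not in SSCM:
        raise ValueError("software control category must be 1 to 5")
    if severity not in (1, 2, 3, 4):
        raise ValueError("severity category must be 1 to 4")
    return SSCM[scc][severity - 1]


def level_of_rigor(scc, severity):
    """Return (SwCI, LOR task description) for the given matrix cell."""
    swci = software_criticality_index(scc, severity)
    return swci, LOR_TASKS[swci]


print(level_of_rigor(2, 1))
```

Keeping the matrix and the task list as separate tables mirrors the standard's point that the SSCM assigns a criticality index, not a risk level.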

 

4.2 Assessment of Software Contribution to Risk  

All software contributions to system risk, including any results of applying Table 10 (below), shall be documented in the HTS.

a)    The LOR tasks shall be performed as per Table 9b. Results of the LOR tasks provide a level of confidence in safety-significant software and document causal factors and hazards that may require mitigation. The results of the LOR tasks shall be included in the risk management process.

b)    If the required LOR tasks are not performed, then the system risk(s) contributions associated with unspecified or incomplete LOR tasks shall be documented according to Table 10. Table 10 depicts the relationship between SwCI, risk levels, completion of LOR tasks, and risk assessment. The assignment of risk level is shown in column 2 of Table 10.

c)     Table 10 shows how not meeting the LOR task requirements affects risk.

TABLE 10. Software Contribution to System Risk

Relationship between SwCI, risk level, LOR tasks, and risk

| S/w Criticality Index (SwCI) | Risk Level | Software LOR Tasks and Risk Assessment/Acceptance |
|---|---|---|
| SwCI 1 | High | If SwCI 1 LOR tasks are unspecified or incomplete, the contributions to system risk will be documented as HIGH and provided to the PM for decision. The PM shall document the decision of whether to expend the resources required to implement SwCI 1 LOR tasks or prepare a formal risk assessment for acceptance of a HIGH risk. |
| SwCI 2 | Serious | If SwCI 2 LOR tasks are unspecified or incomplete, the contributions to system risk will be documented as SERIOUS and provided to the PM for decision. The PM shall document the decision of whether to expend the resources required to implement SwCI 2 LOR tasks or prepare a formal risk assessment for acceptance of a SERIOUS risk. |
| SwCI 3 | Medium | If SwCI 3 LOR tasks are unspecified or incomplete, the contributions to system risk will be documented as MEDIUM and provided to the PM for decision. The PM shall document the decision of whether to expend the resources required to implement SwCI 3 LOR tasks or prepare a formal risk assessment for acceptance of a MEDIUM risk. |
| SwCI 4 | Low | If SwCI 4 LOR tasks are unspecified or incomplete, the contributions to system risk will be documented as LOW and provided to the PM for decision. The PM shall document the decision of whether to expend the resources required to implement SwCI 4 LOR tasks or prepare a formal risk assessment for acceptance of a LOW risk. |
| SwCI 5 | Not Safety | No safety-specific analyses or testing is required. |
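The rule expressed by Table 10 can be sketched as a small Python function: when the LOR tasks for a SwCI are unspecified or incomplete, the software contribution to system risk is documented at the corresponding level. The function name `software_risk_contribution` and the use of `None` for "nothing to document under Table 10" are assumptions made for this illustration.

```python
# Risk level to document when LOR tasks are unspecified or incomplete (Table 10).
RISK_IF_LOR_INCOMPLETE = {1: "High", 2: "Serious", 3: "Medium", 4: "Low"}


def software_risk_contribution(swci, lor_tasks_complete):
    """Return the risk level to document in the HTS under Table 10, or None.

    None means Table 10 does not call for a risk entry: either the software
    is SwCI 5 (Not Safety) or the required LOR tasks were completed.
    """
    if swci not in (1, 2, 3, 4, 5):
        raise ValueError("SwCI must be 1 to 5")
    if swci == 5 or lor_tasks_complete:
        return None
    return RISK_IF_LOR_INCOMPLETE[swci]


print(software_risk_contribution(2, lor_tasks_complete=False))
```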


The risks associated with system hazards that have software causes and controls may be acceptable based on evidence that hazards, causes, and mitigations have been identified, implemented, and verified in accordance with DoD customer requirements. The evidence supports the conclusion that hazard controls provide the required level of mitigation and that the resultant risks can be accepted by the appropriate risk acceptance authority. In this regard, software is no different from hardware and operators. If the software design does not meet safety requirements, then there is a contribution to the risk associated with inadequately verified software hazard causes and controls. Generally, risk assessment is based on quantitative and qualitative judgment and evidence.

Table 11 shows risk levels and how the risk can affect the system's functioning.

Table 11. Software Hazard Causal Factor Risk Assessment Criteria

Risk Levels

Description of Risk Criteria

A software implementation or software design defect that upon occurring during normal or credible off-nominal operations or tests:

High

·      Can lead directly to a catastrophic or critical mishap, or

·      Places the system in a condition where no independent functioning interlocks preclude the potential occurrence of a catastrophic or critical mishap.

Serious

·      Can lead directly to a marginal or negligible mishap, or

·      Places the system in a condition where only one independent functioning interlock or human action remains to preclude the potential occurrence of a catastrophic or critical hazard

Medium

·      Influences a marginal or negligible mishap, reducing the system to a single point of failure, or

·      Places the system in a condition where two independent functioning interlocks or human actions remain to preclude the potential occurrence of a catastrophic or critical hazard

Low

·      Influences a catastrophic or critical mishap, but where three independent functioning interlocks or human actions remain, or

·      Would be a causal factor for a marginal or negligible mishap, but two independent functioning interlocks or human actions remain.

·      A software degradation of a safety critical function that is not categorized as high, serious, or medium safety risk.

·      A requirement that, if implemented, would negatively impact safety; however, the code is implemented safely.
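The criteria of Table 11 may be read as a small decision procedure. The sketch below is one possible interpretation and not a definitive implementation: the function name, its parameters, and the treatment of a marginal or negligible mishap with fewer than two remaining barriers as "Medium" (the single-point-of-failure case) are assumptions made for illustration. The two residual "Low" criteria (degradation not otherwise categorized, and unsafe requirements implemented safely) are not modelled.

```python
HIGH_SEVERITY = {"catastrophic", "critical"}
LOW_SEVERITY = {"marginal", "negligible"}


def causal_factor_risk(severity: str, leads_directly: bool,
                       barriers_remaining: int) -> str:
    """Classify a software causal factor per the Table 11 criteria.

    barriers_remaining counts the independent functioning interlocks or
    human actions still available to preclude the mishap (assumed input).
    """
    severity = severity.lower()
    if barriers_remaining < 0:
        raise ValueError("barriers_remaining cannot be negative")
    if severity in HIGH_SEVERITY:
        if leads_directly or barriers_remaining == 0:
            return "High"
        if barriers_remaining == 1:
            return "Serious"
        if barriers_remaining == 2:
            return "Medium"
        return "Low"  # three or more barriers remain
    if severity in LOW_SEVERITY:
        if leads_directly:
            return "Serious"
        if barriers_remaining >= 2:
            return "Low"
        # Assumption: fewer than two barriers is treated as the
        # "single point of failure" case of the Medium criteria.
        return "Medium"
    raise ValueError(f"unknown severity: {severity}")


if __name__ == "__main__":
    print(causal_factor_risk("critical", False, 1))
```

For example, a defect that could influence a critical mishap while one independent interlock remains is classified as Serious.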

 

4.3 Software Criticality Levels (DO 178B)

In RTCA DO 178B [4], a software level is decided based on its contributions to potential failure conditions as determined by the system safety assessment process. The software level implies that the level of effort required to show compliance with certification requirements varies with the failure condition category. The software-level definitions are:

a)    Level A: Software whose anomalous behavior, as shown by the system safety assessment process, would cause or contribute to a failure of system function resulting in a catastrophic failure condition for the aircraft.

b)    Level B: Software whose anomalous behavior, as shown by the system safety assessment process, would cause or contribute to a failure of system function resulting in a hazardous/severe-major failure condition for the aircraft.

c)     Level C: Software whose anomalous behavior, as shown by the system safety assessment process, would cause or contribute to a failure of system function resulting in a major failure condition for the aircraft.

d)    Level D: Software whose anomalous behavior, as shown by the system safety assessment process, would cause or contribute to a failure of system function resulting in a minor failure condition for the aircraft.

e)    Level E: Software whose anomalous behavior, as shown by the system safety assessment process, would cause or contribute to a failure of system function with no effect on aircraft operational capability or pilot workload. Once the software has been confirmed as level E by the certification authority, no further guidelines of the DO 178B document apply.

These five software levels may be compared with the five Software Control Categories of Mil-STD-882E (see Table 8).

 

4.4 Software Data Control Categories (DO 178B)

Software life cycle data are assigned to one of two categories: Control Category 1 (CC1) and Control Category 2 (CC2). These categories are related to the Software Configuration Management (SCM) controls placed on the data. SCM applies stricter controls to CC1 data during each activity of the SCM process.

The SCM process defines 13 process objectives. CC1 data must satisfy all 13 objectives, whereas CC2 data must satisfy only 6 of them.

4.5 Aircraft and Engine Software Certification Procedure (DO 178B)

For certification of aircraft and engine software, the certification authority considers the software as part of the airborne system or equipment installed on the aircraft or engine; that is, the certification authority does not approve the software as a unique, stand-alone product. The certification authority establishes the certification basis for the aircraft or engine in consultation with the applicant. The certification basis defines the regulations together with any special conditions that may supplement the published regulations.

For modified aircraft or engines, the certification authority considers the impact the modification has on the certification basis originally established for the aircraft or engine. In some cases, the certification basis for the modification may not change from the original certification basis; however, the original means of compliance may not be applicable for showing that the modification complies with the certification basis and may need to be changed.

The certification authority assesses the Plan for Software Aspects of Certification for completeness and consistency with the means of compliance that was agreed upon to satisfy the certification basis. The certification authority satisfies itself that the software level(s) proposed by the applicant is consistent with the outputs of the system safety assessment process and other system life cycle data. The certification authority informs the applicant of issues with the proposed software plans that need to be satisfied before the certification authority agreement.

Prior to certification, the certification authority determines that the aircraft or engine (including the software aspects of its systems or equipment) complies with the certification basis. For the software, this is accomplished by reviewing the Software Accomplishment Summary and evidence of compliance. The certification authority uses the Software Accomplishment Summary as an overview of the software aspects of certification.

The certification authority may, at its discretion, review the software life cycle processes and their outputs, or evaluate other data or evidence of compliance as it considers necessary.

4.6 Software Verification Plan (DO 178B)

The Software Verification Plan is a description of the verification procedures to satisfy the software verification process objectives. These procedures may vary by software level as defined in Annex A-1 of DO 178B. The plan should include:

a) Organization: Organizational responsibilities within the software verification process and interfaces with the other software life cycle processes.

b) Independence: A description of the methods for establishing verification independence, when required, for example, the independent verification and validation procedure.

c) Verification methods: A description of the verification methods to be used for each activity, such as review, analysis, and testing.

d) Verification environment: Equipment for testing, analysis tools, and testing guidelines.

Annex A-1, DO 178B [4] indicates the following categories of verification methods:

a)    The objectives should be satisfied with independence – (IV)

b)    The objectives should be satisfied – (V)

c)     Satisfaction of objectives is at the applicant's discretion – (AD).

The verification methods for objectives for A, B, C, and D levels of Software and their outputs are identified in the annexure Tables A-1 to A-10. The data control category (CC1 and CC2) is also indicated for each objective verification process (including the sub-objectives) for different software levels.   

These annex tables provide guidelines, by software level, for the software life cycle process objectives and outputs described in DO 178B.

The tables include guidelines for:

a. The process objectives applicable to each software level. For level E software, there are no separate certification requirements.

b. The independence by software level of the software life cycle process activities applicable to satisfy that process's objectives.

c. The control category by software level for the software life cycle data produced by the software life cycle process activities (subsection 7.3 of DO 178B).

A summary of the Aircraft and Engine Certification requirements as indicated in Annexures A-1 to A-10 of DO 178B [4] is given below:

a)    Some of the objectives for level A and B software are subject to independent verification (IV); the rest fall in the "objectives should be satisfied" (V) category.

b)    Most objectives for level C software fall in the V category, and those for level D software are mostly in the AD category (satisfaction of objectives at the applicant's discretion).

c)     Data controls for requirements are generally CC1 and data control for records are of the CC2 category.

The summary details are shown in Table 12.

Table 12. Aircraft & Engine Software Certification Details

DO 178 Table No.

Applicable Verification Method, Output,   and Data Control Category.

A-1: S/w Planning Process (7 objectives)

The table has 7 objectives, and the outputs are various plans such as the design, development, and certification plans, coding, SCM, SQA, etc. Verification is V for A, B, and C levels and AD for D level. Data control is CC1 or CC2 depending on the data.

A-2: S/w Development Process (7 objectives)

There are 7 objectives, and the outputs are the S/w design requirements, design description, source code, and executable code. All verifications are V category, and data control is CC1 and CC2.

A-3: Verification of OP of S/w Requirement Process (7 objectives)

There are 7 objectives, and the outputs are S/w verification results. IV applies to A and B level S/w for accuracy and compliance of the high-level S/w requirements and accuracy of algorithms. C and D levels are V category, with some D level objectives being AD. Data control is mostly CC2.

A-4: Verification of OP of S/W Design Process (13 objectives)

There are 13 objectives, and the outputs are S/w verification results. IV applies to A and B level S/w for compliance, accuracy, architecture, and partitioning integrity. C level objectives are all V category, and D level objectives are of the AD category. Data control is CC2.

A-5: Verification of OP of S/w Coding & Integration Processes (7 objectives)

There are 7 objectives, and the outputs are S/w verification results. For A level S/w, IV applies to source code compliance with low-level requirements, compliance with the S/w architecture, and accuracy and consistency; the rest of the objectives are V category. For B level S/w, IV applies to source code accuracy and consistency; the rest are V category. C level objectives are V category, and all D level objectives are AD category. Data control is CC2.

A-6: Testing of Outputs of Integration Process (5 objectives)

There are 5 objectives, and the outputs are S/w verification cases and procedures and S/w verification results. For A level S/w, IV applies to two objectives: object code compliance with low-level requirements, and robustness. All other objectives are V category. For B level S/w, IV applies to object code compliance with low-level requirements, and the rest of the objectives are V category. All C level objectives are V category, and D level objectives are V and AD categories. Data control is CC1 for requirements and CC2 for the rest.

A-7: Verification of Verification Process Results (8 objectives)

There are 8 objectives, and the outputs are S/w verification cases and Procedures and S/w verification results. IV for all objectives for A level S/w. IV for 3 objectives of B level s/w – Test coverage for s/w structures, and V for rest. C and D level S/w are V and AD category. Data control is CC2.

A-8: S/W Configuration Management Process (6 objectives)

There are 6 objectives, and the outputs are SCM Records and S/w Life cycle environment Configuration Index. All 6 objectives for A, B, C, and D level s/w are subjected to V category verification. Data control baseline traceability and s/w life cycle environment control is CC1 and all others are CC2.  

A-9: S/W QA Process (3 objectives)

There are 3 objectives, and the outputs are S/w QA Records. All 3 objectives for A, B, C, and D level S/w are subjected to IV verifications. Data control is CC2. 

A-10: Certification Liaison Process (3 objectives)

There are 3 objectives, and the output is the Plan for Software Aspects of Certification. All 3 objectives for A, B, C, and D level S/w are subjected to V verification. All data control is CC1.

 

4.7 Comparison of Software Safety – Mil 882, RTCA DO 178, and DO 278

RTCA DO 278/EUROCAE ED 109, “Guidelines for Communication, Navigation, Surveillance, and Air Traffic Management (CNS/ATM) Systems Software Integrity Assurance” [5], is the ground-based complement to the DO-178B airborne standard. RTCA DO-278 provides guidelines for the assurance of software contained in non-airborne CNS/ATM systems. DO-178B/ED-12B defines a set of objectives that are recommended to establish assurance that airborne software has the integrity needed for a safety-related application. In DO-278, these objectives have been reviewed and, in some cases, modified for application to non-airborne CNS/ATM systems. DO-278 is intended as an interpretive guide for the application of DO-178B guidance to non-airborne CNS/ATM systems. The two standards are thus interrelated [6].

In the aviation industry, an Unmanned Aerial System (UAS) includes the Ground Control Centre and the Datalink as essential parts. Since these are ground-based components, their software elements may be designed as per the DO 278 standard.

DO-278 provides guidelines to produce software for ground-based avionics systems and equipment that performs its intended function with a level of confidence in safety. The guidelines are in the form of:

· Objectives of software life cycle processes

· Description of activities and design considerations for achieving these objectives

· Description of the evidence that indicates that the objectives have been satisfied.

The document discusses those aspects of certification that pertain to the production of software for ground-based systems and equipment used in CNS or ATM applications.

A comparison of the software safety/assurance levels among Mil 882, DO 178, and DO-278 is shown in Table 13.

Table 13. Comparison of Software Assurance Levels: Mil 882, DO 178, and DO-278

 

| Failure Category | Failure Description | DO 178 S/w Criticality Level | DO 278 Assurance Level | Mil-STD 882 SwCI |
|---|---|---|---|---|
| Catastrophic | Prevents continued safe flight or landing, and many fatal injuries | Level A | AL1 | SwCI 1 |
| Critical/Severe major | Failure conditions would reduce the capability of the aircraft or the ability of the crew to cope with adverse operating conditions. | Level B | AL2 | SwCI 2 |
| Major/Marginal | Impairs crew efficiency, discomfort, or possible injuries to occupants | Level C | AL3 | SwCI 3 |
| - | - | Not used | AL4 | Not used |
| Minor/Negligible | Reduced aircraft safety margins, but well within crew capabilities | Level D | AL5 | SwCI 4 |
| No Safety Impact | S/w resulting in no effect on the system | Level E | AL6 | SwCI 5 |
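The correspondence in the comparison table above can be captured as a small lookup structure, for instance when a project has to state the equivalent level under another standard. This is a minimal sketch of the author's comparison only, not an official equivalence defined by any of the three standards; the names LevelRow, COMPARISON, row_for_do178, and row_for_do278 are hypothetical.

```python
from typing import NamedTuple, Optional


class LevelRow(NamedTuple):
    failure_category: Optional[str]
    do178_level: Optional[str]  # None where DO 178 has no counterpart
    do278_level: str
    swci: Optional[int]         # None where Mil-STD-882 has no counterpart


# Rows transcribed from the comparison table above.
COMPARISON = [
    LevelRow("Catastrophic", "A", "AL1", 1),
    LevelRow("Critical/Severe major", "B", "AL2", 2),
    LevelRow("Major/Marginal", "C", "AL3", 3),
    LevelRow(None, None, "AL4", None),  # AL4 is not used by the others
    LevelRow("Minor/Negligible", "D", "AL5", 4),
    LevelRow("No Safety Impact", "E", "AL6", 5),
]


def row_for_do178(level: str) -> LevelRow:
    """Find the table row for a DO 178 software level such as 'B'."""
    for row in COMPARISON:
        if row.do178_level == level.upper():
            return row
    raise KeyError(f"unknown DO 178 level: {level}")


def row_for_do278(level: str) -> LevelRow:
    """Find the table row for a DO 278 assurance level such as 'AL4'."""
    for row in COMPARISON:
        if row.do278_level == level.upper():
            return row
    raise KeyError(f"unknown DO 278 level: {level}")


if __name__ == "__main__":
    row = row_for_do178("B")
    print(row.do278_level, row.swci)
```

Note that AL4 of DO 278 has no counterpart in either DO 178 or the Mil-STD-882 SwCI, which is why the two optional fields are left empty for that row.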

 

5. Mil-STD-882E Tasks

Mil-STD-882E defines the system safety tasks to be performed. The 100-series tasks apply to management, the 200-series to analysis, the 300-series to evaluation, and the 400-series to verification. These tasks can be selectively applied to fit a tailored system safety effort. Each desired task shall be specifically called out in the contract, because the task descriptions do not include requirements for any other tasks.

 

5.1 Task Structure

Each individual task is divided into three parts: purpose, task description, and details to be specified.

a. The purpose explains the rationale for performing the task.

b. The task description describes the work a contractor shall perform if the task is placed on contract. When preparing proposals, the contractor may recommend the inclusion of additional tasks or the deletion of specified tasks with supporting rationale for each addition/deletion.

c. The details to be specified in each task description list the specific information, additions, modifications, deletions, or options to the requirements of the task that should be considered when requiring a task.

5.2 Task Section 100 - Management

a) Task 101 is to integrate “Hazard Identification and Mitigation” into the Department of Defense Acquisition Systems Engineering process using the system safety methodology.

b) Task 102 is to develop a “System Safety Program Plan” (SSPP) that documents the system safety methodology for the identification, classification, and mitigation of safety hazards as part of the overall Systems Engineering process.

c) Task 103 is to develop a “Hazard Management Plan” (HMP) that documents a standard, generic system safety methodology for the identification, classification, and mitigation of hazards as part of the overall Systems Engineering (SE) process.

d) Task 104 is to support reviews, certifications, boards, and audits performed by or for the Government.

e) Task 105 is to provide support to designated program office Integrated Product Teams (IPTs) or Working Groups (WGs).

f) Task 106 is to establish and maintain a closed-loop Hazard Tracking System (HTS).

g) Task 107 is to submit periodic progress reports summarizing the pertinent hazard management and engineering activities that occurred during the reporting period.

h) Task 108 is to implement a “Hazardous Materials Management Plan” (HMMP) which shall be made available to the Government on request.

 

5.3 Task Section 200 - Analysis

a) Task 201 is to compile a “Preliminary Hazard List” (PHL) of potential hazards early in development.

b) Task 202 is to perform and document a “Preliminary Hazard Analysis” (PHA) to identify hazards, assess the initial risks, and identify potential mitigation measures.

c) Task 203 is to perform and document a “System Requirements Hazard Analysis” (SRHA) to determine the design requirements to eliminate hazards or reduce the associated risks for a system, to incorporate these requirements into the appropriate system documentation, and to assess compliance of the system with these requirements.

d) Task 204 is to perform and document a “Subsystem Hazard Analysis” (SSHA) to verify subsystem compliance with requirements to eliminate hazards or reduce the associated risks; to identify previously unidentified hazards associated with the design of subsystems; and, to recommend actions necessary to eliminate identified hazards or mitigate their associated risks.

e) Task 205 is to perform and document a “System Hazard Analysis” (SHA) to verify system compliance with requirements to eliminate hazards or reduce the associated risks.

f) Task 206 is to perform and document an “Operating and Support Hazard Analysis” (O&SHA) to identify and assess hazards introduced by operational and support activities and procedures; and to evaluate the adequacy of operational and support procedures, facilities, processes, and equipment used to mitigate risks associated with identified hazards.

g) Task 207 is to perform and document a “Health Hazard Analysis” (HHA) to identify human health hazards, to evaluate proposed hazardous materials and processes using such materials, and to propose measures to eliminate the hazards or reduce the associated risks when the hazards cannot be eliminated.

h) Task 208 is to perform and document a “Functional Hazard Analysis” (FHA) of an individual system or subsystem(s).

i) Task 209 is to perform and document an analysis of the “System-of-Systems” (SoS) to identify unique SoS hazards.

j) Task 210 is to perform and document an “Environmental Hazard Analysis” (EHA) to support design development decisions.

 

5.4 Task Section 300 – Evaluation

a) Task 301 is to perform and document a “Safety Assessment Report” (SAR) to provide a comprehensive evaluation of the status of safety hazards and their associated risks before the test or operation of a system, before the next contract phase, or at contract completion.

b) Task 302 is to perform and document a “Hazard Management Assessment Report” (HMAR) to provide a comprehensive evaluation of the status of hazards and their associated risks before the test or operation of a system, before the next contract phase, or at contract completion.

c) Task 303 is to participate in the “Test and Evaluation” (T&E) process to evaluate the system, verify and validate risk mitigation measures, and manage risks for test events.

d) Task 304 is to perform and document the application of the system safety process described in Section 4 of Mil-STD-882E to “Engineering Change Proposals” (ECPs); change notices; deficiency reports; mishaps; and requests for deviations, waivers, and related change documentation.

 

5.5 Task Section 400 – Verification

a) Task 401 is to define and perform tests and demonstrations or use other verification methods on safety-significant hardware, software, and procedures to “verify compliance with safety requirements”.

b) Task 402 is to perform tests and analyses, develop data necessary to comply with hazard classification regulations, and prepare “Explosive hazard classification data” associated with the development or acquisition of new or modified explosives and packages or commodities containing explosives (including all energetics).

c) Task 403 is to provide “Explosive Ordnance Disposal” (EOD) source data, recommended render-safe procedures, and disposal considerations.
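Because each task must be called out individually in the contract, a tailored task list can be checked and grouped by series before it is written into the contract. The sketch below assumes only the task numbers listed in Sections 5.2 to 5.5 above; the names TASK_SERIES, task_series, and group_tailored_tasks are hypothetical and not part of the standard.

```python
# Series name and valid task numbers, as listed in Sections 5.2 to 5.5.
TASK_SERIES = {
    100: ("Management", range(101, 109)),    # Tasks 101 to 108
    200: ("Analysis", range(201, 211)),      # Tasks 201 to 210
    300: ("Evaluation", range(301, 305)),    # Tasks 301 to 304
    400: ("Verification", range(401, 404)),  # Tasks 401 to 403
}


def task_series(task_number: int) -> str:
    """Return the series name for a task number such as 202."""
    series = (task_number // 100) * 100
    name, valid_numbers = TASK_SERIES.get(series, ("", range(0)))
    if task_number not in valid_numbers:
        raise ValueError(f"Task {task_number} is not listed in the standard")
    return name


def group_tailored_tasks(task_numbers):
    """Group a tailored selection of tasks by series, rejecting unknown ones."""
    grouped = {}
    for number in sorted(set(task_numbers)):
        grouped.setdefault(task_series(number), []).append(number)
    return grouped


if __name__ == "__main__":
    print(group_tailored_tasks([202, 101, 401, 205, 106]))
```

A selection such as Tasks 101, 106, 202, 205, and 401 would thus be reported as two management tasks, two analysis tasks, and one verification task.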

 

6. Conclusion

Defining and following a process for assessing the risk associated with hazards is critical to the success of a program, particularly as systems are combined into more complex System of Systems (SoS). These SoS often involve systems developed under disparate development and safety programs and may require interfaces with other Service (Army, Navy/Marines, and Air Force) or DoD agency systems. These other SoS stakeholders may have their own safety processes for determining the acceptability of systems to interface with theirs.

Therefore, the Mil-STD-882 procedure of identifying hazards, assessing their contribution to risk, and deciding the level of mitigation required to reach an acceptable risk level can help ensure that the system design is safe for operation.

Software contributions to system risk are discussed in Section 4.4 of Mil-STD-882E. RTCA DO 178B addresses software for airborne systems and equipment, and RTCA DO 278 addresses software integrity assurance for ground-based CNS/ATM systems. All these standards segregate software into levels based on failure criticality and recommend appropriate procedures for development, testing, and qualification. A comparison of the criticality levels of these standards is shown in the comparison table in Section 4.7.

For the contribution of system software to safety, Mil-STD-882E indicates that, during the development process, the program shall ensure the Level of Rigor for the airborne software according to its criticality as indicated in the standard (see Table 10). This procedure is likely to ensure adequate software system safety. However, it does not indicate a separate procedure for regulatory verification or testing of software belonging to different criticality levels. The author is not in a position to say whether the software verification methodology indicated in RTCA DO 178B for airborne software and in DO 278A for ground-based system software can be dispensed with when the software has been developed following the Level of Rigor described in Mil-STD-882E. Only the regulatory body or the certification authority can take these decisions.

 

References

1.  Dev G Raheja, ‘Assurance Technologies – Principles and Practices’, McGraw Hill Publication.

2.  Mil STD 882 E, 11 May 2012, ‘Department of Defense Standard Practices, System Safety’.

3.  L S Srinath, ‘Reliability Engineering’, 4th Edition, Affiliated East West Private Limited, New Delhi, 2013.

4.  RTCA DO 178B, 01 Dec 1992: Software Considerations in Airborne Systems and Equipment Certification.

5.  RTCA DO 278A: Guidance for the development of software for Communication Navigation, Surveillance, and Air Traffic Management (CNS/ATM) system software integrity Assurance.

6.  Stephen A. Jacklin, ‘Certification of Safety-Critical Software Under DO-178C and DO-278A’, NASA Ames Research Center, Moffett Field, CA 94035, American Institute of Aeronautics and Astronautics.

 

 
