SESSION ID: GRC-T11
Break the Risk Paradigms - Overhauling Your Risk Program
Evan Wheeler
Director, Information Risk Management
MUFG Union Bank
Your boss asks you to identify the top risks for your organization. Where do you start?
Goals of Risk Management
  Minimizing uncertainties for the business
  Aligning and controlling organizational components to produce the maximum output
  Providing governance and oversight
  Cost effective
Status Quo
  [Figure: image equation]
Challenges with Current Approaches
  1. Analysis - Confusion
     No single definition for terms
     Unclear scoping
     Undocumented assumptions
  2. Measurement - Inconsistent model
     Vaguely defined rating scales
     Focus on possibility vs. probability
     No adjustments for bias or confidence
     Rarely data driven
  3. Different mental risk models
Breaking the Mold: Implementing FAIR
What is FAIR?
  A risk methodology should at least include:
    Single definition of risk
    Risk factors or ontology
    Methodology to measure risk
    Alignment with control maturity and threat intelligence standards
    Integration into Enterprise Risk Frameworks
  [Figure labels: ISO 31000, COSO ERM, FAIR?, OCTAVE, NIST RMF, ISACA Risk IT, COBIT; category labels: Control Checklist, Analytic Measurement Framework]
Benefits of Using FAIR
  Ontology and method for understanding, analyzing and measuring information risk
  Logical and rational risk analysis framework
  Expresses risk in the context of a loss scenario
  Improves ability to defend conclusions and recommendations
  Additional standards have been built on it, such as a Controls Ontology
  The Open Group is a global consortium that enables the achievement of business objectives through IT standards
  Standard
    Open industry standard
    Mappings to ISO, NIST, STIX, etc.
  Relevant
    Evolved from a global insurance firm
    Used by top companies across sectors
  Tailored
    Designed for operational risk
    Risk factors for data, technology, and cyber scenarios
  Extensible
    Critical thinking framework
    Layers of abstraction
    Qualitative or quantitative
Alignment with NIST CSF
  NIST evaluates the control environment using a relative maturity rating
  FAIR measures risk exposure based on how often loss is likely to occur and how bad it's likely to be
  "FAIR is the future of information security, as that's how we will bridge the gap and talk about risk in a common language." - CISO, Federal Reserve Bank of NY
Where do you start assessing?
  1. Asset Profiling
  2. Threat Modeling
  3. Incident / Vulnerability Analysis
  4. Controls Self Assessment
  [Figure labels: Incidents, Controls]
Iterative Adoption Approach
  Typical Qualitative => Simple Estimation => Advanced Estimation
  Typical Qualitative
    Inherent Risk
    Control Environment
    Residual Risk (ordinal scale)
  Simple Estimation
    5 Categories of Primary Loss
    Primary Loss Event Frequency
    Predefined Ranges (min, max)
    Annualized Timeframe
    Best, Most Likely or Worst Case
    Confidence (qualitative)
    Residual Risk (ordinal scale)
  Advanced Estimation
    5 Categories of Primary Loss
    5 Categories of Secondary Loss
    Threat Event Frequency
    Susceptibility
    Secondary Loss Event Frequency
    Flexible Ranges (min, m/l, max)
    Annualized Timeframe
    Simulations
    Confidence (interval)
    Residual Risk (distribution)
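To illustrate what "Flexible Ranges (min, m/l, max)", "Simulations" and "Residual Risk (distribution)" can look like in practice, here is a minimal sketch. It assumes triangular distributions for both frequency and magnitude and treats annual loss as frequency times magnitude per trial; both are simplifications chosen for illustration, and the function names and input ranges are hypothetical rather than taken from the slides.

```python
import random

def simulate_annual_loss(freq, magnitude, trials=10_000, seed=1):
    """Simulate annual loss from (min, most_likely, max) ranges.

    Assumption: each trial draws a frequency (events/year) and a magnitude
    ($/event) from triangular distributions and multiplies them.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    losses = []
    for _ in range(trials):
        f = rng.triangular(freq[0], freq[2], freq[1])  # (low, high, mode)
        m = rng.triangular(magnitude[0], magnitude[2], magnitude[1])
        losses.append(f * m)
    losses.sort()
    return losses

def percentile(sorted_values, pct):
    """Return the value at the given percentile of an ascending list."""
    index = min(len(sorted_values) - 1, int(pct / 100 * len(sorted_values)))
    return sorted_values[index]

# Hypothetical inputs: 0 to 1 events/year, $10k to $250k per event.
losses = simulate_annual_loss((0.0, 0.1, 1.0), (10_000, 50_000, 250_000))
p10, p50, p90 = (percentile(losses, p) for p in (10, 50, 90))
print(f"10th: ${p10:,.0f}  median: ${p50:,.0f}  90th: ${p90:,.0f}")
```

Reporting a few percentiles instead of a single rating is one way to express residual risk as a distribution with an interval of confidence.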
Prep & Scoping: Simple Estimation
FAIR Ontology
  Risk
    Loss Event Frequency
      Threat Event Frequency
        Contact Frequency
        Probability of Action
      Susceptibility
        Threat Capability
        Resistance Strength
    Loss Magnitude
      Primary Loss
      Secondary Loss
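The frequency branch of the ontology is commonly applied multiplicatively: threat event frequency is contact frequency scaled by probability of action, and loss event frequency is threat event frequency scaled by susceptibility. The slide only shows the hierarchy, so the sketch below is one reading of it; the function names and example values are hypothetical.

```python
def _check_probability(name, value):
    """Raise if a probability-style factor is outside 0..1."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")

def threat_event_frequency(contact_frequency, probability_of_action):
    """Threat events per year from contacts per year and chance of acting."""
    _check_probability("probability_of_action", probability_of_action)
    return contact_frequency * probability_of_action

def loss_event_frequency(tef, susceptibility):
    """Loss events per year from threat events per year and susceptibility."""
    _check_probability("susceptibility", susceptibility)
    return tef * susceptibility

# Hypothetical example: 20 contacts/year, half lead to an attempt,
# and 20% of attempts overcome the controls.
tef = threat_event_frequency(20, 0.5)
lef = loss_event_frequency(tef, 0.2)
print(f"TEF = {tef:.1f}/year, LEF = {lef:.1f}/year")
```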
Risk Model: Basic Factor Analysis
  How much risk is associated with ...?
  Risk
    Def: the probable frequency and magnitude of future loss
  Probable Loss Event Frequency (#)
    Def: the probable frequency, within a given timeframe, that a threat agent will inflict harm upon an asset
  Probable Loss Magnitude ($)
    Def: the probable magnitude of loss resulting from a loss event
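As a rough illustration of how the two factors combine, the sketch below multiplies a loss event frequency by a loss magnitude range to get an annualized range. Treating risk as a simple product is an assumption made for illustration; the function name and numbers are hypothetical.

```python
def annualized_loss_exposure(lef_per_year, magnitude_min, magnitude_max):
    """Return (min, max) annualized loss in dollars.

    lef_per_year: probable loss events per year (#)
    magnitude_min / magnitude_max: probable loss per event ($)
    """
    if lef_per_year < 0:
        raise ValueError("frequency cannot be negative")
    if magnitude_min > magnitude_max:
        raise ValueError("magnitude_min must not exceed magnitude_max")
    return (lef_per_year * magnitude_min, lef_per_year * magnitude_max)

# Hypothetical: one event every two years, costing $10k to $40k each time.
low, high = annualized_loss_exposure(0.5, 10_000, 40_000)
print(f"Annualized exposure: ${low:,.0f} to ${high:,.0f}")
```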
Scenario Scoping
  What is the risk of data loss?
  How much risk is associated with an employee intentionally deleting client health data from the production systems if the backups are unreliable, worst case over the next year?
  Focus on outcomes, not control weaknesses
  Break the problem down into smaller measurable questions
How much risk is associated with a failed backup when data needs to be restored due to an insider maliciously deleting production data?
  Asset at Risk
    Business Line X, Application Y
    Client Health Records
  Threat Community
    Amateur Hacker
    Cyber Criminal
    Nation State
    Privileged Insider
  Motivation
    Accidental
    Malicious
  Loss Area
    Confidentiality
    Integrity
    Availability
  Assumptions
    Approximately 1,000 client records in application
    Employee data isn't impacted
    Health records fall under HIPAA regulations
    Susceptibility to privileged insider abuse is ~100%
    Not all impacted clients will notice an impact directly
    Client turnover (loss of future business) would be minimal
    Insurance will cover some response costs
    Records could be recreated from paper and manually re-entered
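A scoping template like the one above can be captured as a small data structure so every analysis records the same fields. This is a sketch only: the class and field names are hypothetical, and choosing Availability as the loss area for this scenario is an assumption (the slide lists all three without marking one).

```python
from dataclasses import dataclass, field
from enum import Enum

class ThreatCommunity(Enum):
    AMATEUR_HACKER = "Amateur Hacker"
    CYBER_CRIMINAL = "Cyber Criminal"
    NATION_STATE = "Nation State"
    PRIVILEGED_INSIDER = "Privileged Insider"

class Motivation(Enum):
    ACCIDENTAL = "Accidental"
    MALICIOUS = "Malicious"

class LossArea(Enum):
    CONFIDENTIALITY = "Confidentiality"
    INTEGRITY = "Integrity"
    AVAILABILITY = "Availability"

@dataclass
class Scenario:
    asset: str
    threat_community: ThreatCommunity
    motivation: Motivation
    loss_area: LossArea
    assumptions: list[str] = field(default_factory=list)

    def question(self) -> str:
        """Phrase the scope as a single measurable risk question."""
        return (
            f"How much risk is associated with a {self.motivation.value.lower()} "
            f"{self.threat_community.value.lower()} affecting the "
            f"{self.loss_area.value.lower()} of {self.asset}?"
        )

scenario = Scenario(
    asset="client health records in Application Y",
    threat_community=ThreatCommunity.PRIVILEGED_INSIDER,
    motivation=Motivation.MALICIOUS,
    loss_area=LossArea.AVAILABILITY,  # assumption, see note above
    assumptions=["Approximately 1,000 client records in application"],
)
print(scenario.question())
```

Using enumerations for the fixed lists keeps analysts from inventing new threat communities or loss areas mid-analysis, which supports the "single definition for terms" goal.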
Measurement & Analysis: Simple Estimation
Qualitative Drawbacks
  How much risk reduction is enough?
    X = high risk
  Where are the opportunities to reduce our exposure?
    Frequency isn't used explicitly
  What is the time horizon for our outlook and estimates?
    Next 3 months, next 10 years?
  How many Lows equal a High rating?
Quantitative Assumptions
  Objections to quantitative measurement models
    Won't our SMEs just be guessing?
    We don't have enough data
    How can we estimate when it has never happened before?
    But we are a unique snowflake!
  Responses
    1. Your problem is not as unique as you think.
    2. You have more data than you think.
    3. You need less data than you think.
    4. There is a useful measurement that is much simpler than you think.
Measuring Risk
  How often loss is likely to occur and how bad it's likely to be.
  When you evaluate a risk, you are estimating the future potential for some event(s). It will have ranges of probable impact and likelihood of occurrence (or frequency of re-occurrence).
Simple FAIR Estimation Elements
  Predefined Ranges (min, max)
  Annualized Timeframe
  Best, Most Likely or Worst Case
  5 Categories of Primary Loss
  Primary Loss Event Frequency
  Confidence (qualitative)
  Residual Risk (ordinal scale)
Key Concepts
  Accuracy vs. Precision
  Time Horizon
  Minimum: X - Maximum: Y
  Annualized Loss Expectancy
Order Matters
  Always estimate impact first
    Worst-case? Most common outcome?
  Rate likelihood second
  Forces you to clarify the event you're evaluating, which helps to avoid misalignment
  [Figure labels: Best Case, Most Likely, Worst Case; sample report entry dated Dec 2015: EXCEEDING TOLERANCE, most likely annualized risk M, one-time maximum loss H]
1 Probable Loss Magnitude 2 Loss Event Frequency 3 Residual Risk Exposure
Forms of Loss (1: Probable Loss Magnitude)
  Productivity
    Operational inability to deliver products or services resulting in unrealized revenue (i.e. $ / time)
  Response
    Costs of managing an event (e.g. communication, regulatory demands, etc.)
  Replacement
    Replacement of capital assets (e.g. applications, personnel, etc.)
  Fines & Judgments
    Fines or judgments levied against the organization through civil, criminal or contractual actions
  Reputation / Competitive Adv.
    External stakeholder perspective on organization's value decreased or liability increased, or intellectual property or key competitive differentiators damaged
Sample Pre-Defined Impact Tables
  Severe (Min: $25m, Max: above)
    Productivity [1]: Full service exceeds 1 business day, or degradation exceeds 1 week
    Response [2]: 1,000 hours or more
    Replacement: Funding approval from Board required
  High (Min: $1m, Max: <$25m)
    Productivity [1]: Full service exceeds RTO, or partial exceeds RTOx2
    Response [2]: 500 up to 1,000 hours
    Replacement: Requires out of budget funding
  Moderate (Min: $500k, Max: <$1m)
    Productivity [1]: Partial service up to RTOx2, or full service up to RTO
    Response [2]: 100 up to 500 hours
    Replacement: In function's budget but postpones planned investment
  Low (Min: $5k, Max: <$500k)
    Productivity [1]: Partial service up to RTO
    Response [2]: 5 up to 100 hours
    Replacement: Replacement cost in function's discretionary budget
  Immaterial (Min: $0, Max: <$5k)
    Productivity [1]: No SLA breach
    Response [2]: up to 5 hours
    Replacement: No cost or covered by insurance
  [1] Assumes revenue isn't collected during downtime and won't be recuperated afterwards
  [2] Avg. loaded person hourly rate @ $75 - $150
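The dollar bands in the impact table above can be expressed as a simple lookup. This sketch uses the Min/Max thresholds from the sample table; the names IMPACT_BANDS and rate_impact are hypothetical, and a real program would substitute its own calibrated ranges.

```python
# (lower bound in USD, rating), ordered from highest band to lowest.
IMPACT_BANDS = [
    (25_000_000, "Severe"),
    (1_000_000, "High"),
    (500_000, "Moderate"),
    (5_000, "Low"),
    (0, "Immaterial"),
]

def rate_impact(loss_usd):
    """Map an estimated dollar loss to the sample magnitude rating."""
    if loss_usd < 0:
        raise ValueError("loss cannot be negative")
    for lower_bound, rating in IMPACT_BANDS:
        if loss_usd >= lower_bound:
            return rating

for estimate in (4_000, 250_000, 750_000, 3_000_000, 40_000_000):
    print(f"${estimate:>12,} -> {rate_impact(estimate)}")
```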
Probability & Frequency (2: Loss Event Frequency)
  Probability - how likely something bad is to happen
  Frequency - how many times something bad is likely to happen
  Past performance is not always an indicator of the future; variables change!
  Threat characteristics example:
    The frequency with which threat agents come into contact with our organizations or assets
    The probability that threat agents will act against our organizations or assets
    The probability of threat agent actions being successful in overcoming protective controls
    The probable nature (type and severity) of impact to our assets
  FREQUENCY SCALE
    < 0.1 times per year (less than once every 10 years)
    between 0.1 and 1 times per year
    between 1 and 5 times per year
    between 5 and 50 times per year
    > 50 times per year
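The frequency scale above can be applied mechanically once an annual frequency has been estimated. The slide does not name the bands or say which band a boundary value belongs to, so this sketch numbers the bands 1 to 5 and assumes that 0.1, 1 and 5 fall into the higher band while exactly 50 stays in the fourth; both choices are assumptions.

```python
def frequency_band(events_per_year):
    """Return the band number (1 = rarest, 5 = most frequent)."""
    if events_per_year < 0:
        raise ValueError("frequency cannot be negative")
    if events_per_year < 0.1:
        return 1  # less than once every 10 years
    if events_per_year < 1:
        return 2  # between 0.1 and 1 times per year
    if events_per_year < 5:
        return 3  # between 1 and 5 times per year
    if events_per_year <= 50:
        return 4  # between 5 and 50 times per year
    return 5      # more than 50 times per year

for estimate in (0.05, 0.1, 2, 50, 120):
    print(f"{estimate} per year -> band {frequency_band(estimate)}")
```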
Evaluating Adversarial Threats
  Sophistication of skills required
  Availability of exploit tools
  Size of user community (threat universe)
  Motivation of attacker
  Opportunity
Confidence
  Initial / Intuitive - An immature or developing assessment approach exists; a formal assessment model may not be established or is in its early stages. Predictions are largely based on the experience of the assessors.
  Repeatable - An assessment model is established and is producing consistent assessments using standard criteria. Risks are being regularly assessed. Assessment may be based on consensus opinion, or assessors are at least engaging risk-practiced SMEs, reviewing incident statistics, or referencing trend data to inform assessments.
  Measurable - Assessment model is well defined and has been refined/calibrated over time, and trend data and incident statistics have been analyzed to model future predictions. Assessors are trained, practiced, and experienced in analyzing risks in this area. The assessments themselves may have been revised and updated over time.
1 Probable Loss Magnitude 2 Loss Event Frequency 3 Residual Risk Exposure
Program Development
Two Approaches
  Ground Up
    Choose a standard set
    Housekeeping and clean up
    Engage line managers
    Establish risk mitigation expectations
    Review existing assessment data
    Prioritize & execute action plans
    Gather activity based metrics
    Demonstrate value to process owners
  Top Down
    Implement a risk mgt. policy & model
    Identify inherent risk
    Establish governance & assign roles
    Prioritize areas for assessment
    Solicit risk information from business
    Prioritize & execute initiatives
    Gather performance based metrics
    Demonstrate value to risk committee
Program Maturity
  Where are you on this maturity scale?
  1. Non-existent: this does not occur
  2. Initial / Ad Hoc: reactive and rarely has any accountability, tactical level only, never gets management visibility
  3. Repeatable but Intuitive: immature and developing approach exists and is implemented for major initiatives or risks
  4. Defined Process: process defined but not widely adopted, awareness/training made available, based on a standard methodology
  5. Essentials Implemented: process defined with significant adoption across the organization, regular reporting of highest risks to management, risk reassessed, formalized tracking in place
  6. Managed & Measurable: standard part of procedures, regular reporting of risks and performance metrics to management, informed decision making based on risk assessments, risks regularly reassessed, some automation in place
  7. Optimized: structured, organization wide program is enforced and well managed. Consistent across the organization, ground up and top down, integrated into all the business processes. Continual reassessment of risks and inefficiencies in the program.
Mature Program Elements
  Formal risk responsibilities and escalation process documented
  Embedded in key processes throughout organization
  Performance indicators for the risk program itself
  Ensure that the scope and focus of the program is reviewed regularly
  Risk training program and outreach
  Recognize employees for identifying risks
Apply: Implementing a Better Model
  Formalize terminology
  Create scoping and analysis templates
  Determine initial impact ranges
  Train analysts
  Analyze scenarios in parallel with existing model
  Evangelize benefits of new methodology
  Recalibrate and refine impact ranges
Recommended Reading
  Security Risk Management: Building an Information Security Risk Management Program from the Ground Up
    ISBN: 9781597496155
    Amazon Link: http://amzn.to/hyrmvc
  Measuring and Managing Information Risk: A FAIR Approach
    ISBN: 978-0124202313
    Amazon Link: http://amzn.com/0124202314
The content of this presentation does not reflect the views or opinions of MUFG Union Bank.
Appendix - Example Analysis Using FAIR: Hurricane Call Center
Hurricane Scenario
Define the Scenario
  Issue Statement: The company's only two call centers aren't regionally dispersed.
  Scope: How much risk is associated with a storm impacting both of the company's call centers at the same time, making them inaccessible to employees?
Seeking Risk Acceptance
  Why?
    Mitigation is cost prohibitive?
    Mitigation strategy has long duration or is unknown?
    Likelihood of occurrence is insignificant?
    Risk exposure is temporary?
Analysis Steps
  Prep Meeting Sections
    0. Prerequisite
       Conduct calibration exercise to ensure your stakeholders are comfortable with estimates
    1. Identify scenario scope
       Identify the asset at risk
       Identify the threat community under consideration
  Workshop Sections
    2. Evaluate Loss Magnitude
       Estimate the Forms of Loss impact
       Results will drive Detective and Response Controls
    3. Evaluate Loss Event Frequency
       Estimate the Probable Frequency
       Results will drive Preventative Controls
  Post Workshop Section
    4. Derive & articulate Risk
       Determine the risk and capture results in standard format
    Post-Scenario Steps
Scenario Scope
  Asset at Risk
    Call Center Outsourcing Service
    Call Center Facilities
  Threat Community
    Privileged Insider
    Amateur Hacker
    Cyber Criminal
    Nation State
    Act of Nature
  Motivation
    Malicious
    Accidental
  Impact Area
    Availability
    Confidentiality
    Integrity
  Probable Loss Magnitude
    Best Case
    Most Likely Case
    Worst Case
  Forms of Loss
    Productivity / Loss or Disruption of Services
    Response
    Replacement
    Legal and Regulatory
    Competitive Advantage / Reputation
  Loss Event Frequency
    To be determined during scenario exercise
  Top Risk Alignment
    Major operations disruption will prevent company from meeting client SLAs.
  Assumptions
    Company provides call center outsourcing as a service provider to other corporations
    Both call center sites are located on different coasts of Florida (i.e. Tampa and Jacksonville)
    If both call centers are unavailable, the support function cannot shift to another location; however, employees can work from home if the call center still has power
    Employees are not able to perform their duties remotely for some subset of clients who have strict rules requiring staff to be at the physical location to access their client information
    Contracts with premier clients require 99.98% service availability, and a recovery time objective of 2 hours
    All client contracts stipulate unlimited liability for disruptions that are caused by gross negligence
    Our company is not directly regulated; however, several financial services and healthcare clients are, so those requirements are indirectly inherited
    Revenue is only lost when both call centers are unavailable
    Call centers of backup power generators
    Company owns the call center buildings
    Insurance policy deductible is $100k, and policy doesn't cover flood damage
Impact
  Assuming worst case major hurricane (Cat 3 or above) and path hits both coasts of Florida
  Electricity and water may be unavailable to residents for several days to weeks after the storm passes
  When a major hurricane hits, the transportation and power infrastructure can be unavailable to commercial areas for 1 day to 5 days on average
  Major hurricane may result in loss of power to the call center and staff denial of access
  Employee homes and call centers will be unavailable simultaneously for at least one day
Forms of Loss
  Loss Type: Productivity / Loss or Disruption of Services (inability to deliver products or services)
    Impact Description:
      Call centers are unavailable from 1 day to 5 days
      Revenue per day is $50k
      30% of client revenue cannot be supported using work from home capabilities
      Expected loss of $50k - $250k
  Loss Type: Response (costs of managing an event, e.g. client communication, regulatory, etc.)
    Impact Description:
      Staff time of IT staff to restore systems from power outage
        Min: 2 staff x 4 hrs x $75 rate = $600
        M/L: 4 staff x 6 hrs x $75 rate = $1,800
        Max: 6 staff x 22 hrs x $75 rate = $9,900
      Staff time of Facilities staff to restore working conditions from weather damage
        Min: 4 staff x 8 hrs x $20 rate = $640
        M/L: 4 staff x 24 hrs x $20 rate = $1,920
        Max: 4 staff x 60 hrs x $20 rate = $4,800
  Loss Type: Replacement (replace capital assets, e.g. database app)
    Impact Description:
      Repairs to the building due to debris or flood: $1k - $10k - $100k
  Loss Type: Legal and Regulatory (fines or judgments levied against organization through civil, criminal or contractual actions)
    Impact Description:
      None
  Loss Type: Competitive Advantage / Reputation (external stakeholder perspective on organization's value decreased or liability increased, or intellectual property or key competitive differentiators damaged)
    Impact Description:
      Based on scenario assumptions, reputational impact will be significant with threat of losing premier clients to competitors
      Morale and retention issues if employees are forced to work rather than looking after their own homes and families
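The staff-time arithmetic above follows one pattern: headcount times hours times hourly rate, estimated for the minimum, most likely and maximum case. The sketch below reproduces those figures and totals them per case. Grouping both staff-time items under Response and adding minimum to minimum (and so on) are assumptions made for illustration; the variable names are hypothetical.

```python
def staff_cost(staff, hours, hourly_rate):
    """Cost of staff time in dollars."""
    return staff * hours * hourly_rate

# (staff, hours, hourly rate) per case, taken from the slide.
it_restore = {"min": (2, 4, 75), "most_likely": (4, 6, 75), "max": (6, 22, 75)}
facilities = {"min": (4, 8, 20), "most_likely": (4, 24, 20), "max": (4, 60, 20)}

response_cost = {
    case: staff_cost(*it_restore[case]) + staff_cost(*facilities[case])
    for case in it_restore
}

for case, cost in response_cost.items():
    print(f"{case}: ${cost:,}")
```

With these inputs the combined staff-time estimate runs from $1,240 (minimum) through $3,720 (most likely) to $14,700 (maximum).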
Frequency Data
  1 major hurricane hits Florida every other year on average
  No more than 4 hit in any one year
  1 in 5 hurricanes that impact Florida will affect both sides of the state
  Min: 0, Most Likely: 0.1, Max: 1
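The most likely value of 0.1 follows from the two data points above: one major hurricane every other year, of which 1 in 5 affects both coasts. A short sketch of that arithmetic, with hypothetical variable names:

```python
major_hurricanes_per_year = 1 / 2   # one major hurricane every other year
share_hitting_both_coasts = 1 / 5   # 1 in 5 affect both sides of the state

# Loss events per year: storms that take out both call centers at once.
most_likely_lef = major_hurricanes_per_year * share_hitting_both_coasts
years_between_events = 1 / most_likely_lef

print(f"Most likely loss event frequency: {most_likely_lef:.2f} per year")
print(f"Roughly one event every {years_between_events:.0f} years")
```

The minimum of 0 and maximum of 1 on the slide are stated estimates rather than values derived from these two inputs.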
Simple FAIR Analysis
  1 Probable Loss Magnitude
  2 Loss Event Frequency
     1 in 5 hurricanes that impact Florida will affect both sides of the state
  3 Residual Risk Exposure
Risk Treatment
  Would additional work from home capabilities help?
  Move a call center?
  Establish remote staff in another state?
  Lower insurance deductible?
  Accept as is?