12th International Conference on Information Fusion
Seattle, WA, USA, July 6-9, 2009

Likelihood-based Optimization of Threat Operation Timeline Estimation

Gregory A. Godfrey
Advanced Mathematics Applications Division, Metron, Inc., Reston, VA, U.S.A.
godfrey@metsci.com

Thomas L. Mifflin
Advanced Mathematics Applications Division, Metron, Inc., Reston, VA, U.S.A.
mifflin@metsci.com

Abstract - TerrAlert is a system that Metron, Inc. has developed to track the progress of suspected terrorist operations and optimize courses of action to delay or disrupt these operations. The underlying algorithms use Monte Carlo sampling and Bayesian, nonlinear filtering to estimate the state (schedule) of a terrorist operation defined by a project management model (such as a Program Evaluation and Review Technique (PERT) or Gantt chart) with uncertain task durations. However, in order to generate schedules via sampling, it is not sufficient to specify only the model and the estimated task duration distributions. The analyst must also provide a distribution of start dates for the operation, which we have observed is relatively difficult for analysts to do accurately. In this paper, we describe a likelihood-based approach for estimating the most likely start date given the available evidence, and perform a series of experiments to validate this approach.

Keywords: Bayesian tracking, particle filtering, project management.

1 Introduction

In prior work, the authors developed a set of algorithms around a rigorous methodology for estimating the progress of a terrorist operation, modeled in a project management format such as a PERT or Gantt chart [1]. This model contains a set of tasks, some of which can be performed in parallel and some of which must be performed in a particular sequence (the precedence relations). Each task is assumed to have a fixed duration that is unknown to the analyst.
This methodology has been implemented in a software product called TerrAlert, which has undergone recent testing and evaluation with intelligence analysts. TerrAlert assumes that the analyst will be able to provide four primary types of data:

1. the set of tasks and the precedence relations between the tasks;
2. an estimated probability distribution for the duration of each task;
3. an estimated probability distribution for the start date of the operation; and
4. the set of available evidence regarding the state of different tasks on different dates, along with an estimate of the credibility of each report and the reliability of its source.

The first type of data (the operational model) may be difficult to know in practice, and is the topic of current and future research in which TerrAlert automatically considers alternative model configurations (tasks in different orders or satisfying different precedence relations). The second type of data (task durations) has been made more reasonable to elicit by training analysts in the use of triangular distributions, which define a distribution given estimates of the minimum, most likely and maximum task durations. The third type of data (the operational start date) is addressed directly in this paper. The fourth type of data (likelihood functions) can be derived from independent estimates of report credibility and source reliability on a six-point scale. TerrAlert converts these credibility and reliability assessments into likelihood values that describe the probability of observing a particular reported operational state given that the operation is in a particular state (we elaborate on the details of what this means in the next section). We observed in the TerrAlert training and evaluation that it was difficult for analysts to estimate the start date of a terrorist operation. For example, even now, how many analysts would know when the initial planning tasks started for the 9/11 attacks?
If the analysts hedge their bets by specifying a wide distribution of possible start dates, then this wide range of uncertainty propagates directly into a wide range of uncertainty on the attack task at the end of the operation. After the TerrAlert training and evaluation, we concluded that asking analysts to provide task duration estimates is reasonable, but requesting operational start date estimates is not. The technical basis by which TerrAlert would estimate the start of the operation is to find the start date that best fits the available evidence to the operational model. In this paper, we describe an approach for doing so, where the fit is judged using a maximum likelihood calculation.

In section 2, we define the notation and formulate the problem to be solved. In section 3, we derive the start date likelihood equation to be optimized. In section 4, we apply a Golden Section, one-dimensional search algorithm to find the maximum of the start date likelihood equation. In section 5, we design and perform a set of experiments to test the effectiveness of the optimization algorithm. Finally, in section 6, we summarize the conclusions and define directions for future research.

978-0-9824438-0-4 2009 ISIF

2 Notation and Problem Formulation

In this section, we summarize the TerrAlert methodology (additional details are available in [1]) and define the notation used in this paper. Consider a hypothetical plan (in Figure 1, a nerve agent attack) that consists of a set of J tasks z_1, z_2, ..., z_J with precedence relations describing the order in which tasks must be performed, with some tasks in serial and some in parallel. What makes this model stochastic are the task durations, denoted by the random variable τ_j for task j, which follows a specified probability distribution.

[Figure 1: Representative model with average task duration μ listed above each task: Assemble Team (μ_1 = 6 months), Develop Delivery Method (μ_2 = 6 months), Produce Nerve Agent (μ_3 = 12 months), Prepare Equipment (μ_4 = 3 months), Select Target (μ_5 = 2 months), Attack Target (μ_6 = 1 month).]

In order to approximate the induced distribution of start and end dates for each of the tasks, we use a nonlinear particle filtering approach. To form the approximation, we generate N schedules (generally in the thousands) for the model via Monte Carlo simulation of the task durations. Initially, all sample schedules are assumed to be equally likely in terms of representing the actual schedule for the operation (weight p_i = 1/N for all sample schedules i = 1, ..., N). Each schedule contains the entire past, present and future for one particular sample path for executing the activity.

2.1 Activity State Space

To track the progress of the operation, we define a compact state space for the operation.
Given a particular date, we can assess the state of each task in a sample schedule as being either Not Started (NS), Ongoing (OG) or Finished (F). The state of a sample schedule, then, is the state of each of its tasks on a particular date. The aggregate state of the entire set of schedules can be summarized by taking the weighted sum of each task state across all schedules. For example, the probability that Task 2 is OG at a given time is the sum of the probabilities on those schedules that have Task 2 in the OG state at that time. If we assign each of the task states a color, then the aggregate task probabilities correspond to the color slices for each task in Figure 2. We call the resulting picture a rainbow chart.

[Figure 2: Rainbow chart used to track the progress of a hypothetical operation. Color indicates the task state at time t (Not Started, Ongoing, Finished).]

2.2 Measurement State Space

We use a Bayesian, nonlinear tracking approach to update each sample schedule weight based on evidence that supports or refutes the schedule. For example, evidence that a particular task is ongoing as of a given date means increasing the probability weight on schedules that have that task as ongoing on that date and decreasing it on schedules that do not. The Bayesian approach specifies the amount of change on each schedule weight, taking into account the uncertainty of the evidence. Let Y_j(t) = y be an evidence report regarding the state of task j at time t, and X_j(t) be a random variable representing the (unknown) ground truth state of task j at time t. The likelihood function L(y | ·) converts an observation y into a function of the task state value x according to

L(y | x) = Pr{ Y_j(t) = y | X_j(t) = x },   (1)

where x ∈ {NS, OG, F}. That is, the likelihood function provides the probability of observing y given that the task is in state NS, OG or F. The observation (evidence) y is known, but the ground truth task state x is not.
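The schedule-state bookkeeping of section 2.1 can be sketched in a few lines of code. This is an illustrative sketch, not TerrAlert's implementation; the representation of a schedule as a dict mapping each task to a (start, end) window is an assumption made here for clarity.

```python
# Illustrative sketch: classify each task's state on a given date and
# aggregate across weighted sample schedules (data structures assumed).

def task_state(start, end, t):
    """State of a task with execution window [start, end) at time t."""
    if t < start:
        return "NS"   # Not Started
    if t < end:
        return "OG"   # Ongoing
    return "F"        # Finished

def aggregate_state(schedules, weights, task, t):
    """Probability of each state for one task, summed over weighted schedules.

    schedules: list of dicts mapping task name -> (start, end)
    weights:   schedule probabilities p_i, summing to one
    """
    probs = {"NS": 0.0, "OG": 0.0, "F": 0.0}
    for sched, p in zip(schedules, weights):
        start, end = sched[task]
        probs[task_state(start, end, t)] += p
    return probs
```

Evaluating `aggregate_state` for every task at every date yields exactly the color slices of the rainbow chart in Figure 2.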
Often, likelihood functions can be defined from a confusion matrix, such as the one illustrated in Figure 3. In this example, the confusion matrix takes a ground truth state and applies noise to produce the reported state. If the ground truth state is OG, then the reported state is either NS, OG or F with probabilities 0.05, 0.80 and 0.15, respectively. The likelihood function associated with a particular reported state is the relevant column in the confusion matrix. If the reported state is OG, then L(y | ·) = {0.2, 0.8, 0.2}.

                    Reported State
    Ground Truth    NS     OG     F
    NS              0.70   0.20   0.10
    OG              0.05   0.80   0.15
    F               0.05   0.20   0.75

Figure 3 Example of a confusion matrix used to define the evidence likelihood function, L(y | ·)
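Reading the likelihood function off the confusion matrix can be sketched directly; the dictionary below encodes the values from Figure 3, and the column lookup is the only real logic.

```python
# Sketch: L(y | x) is the column of the confusion matrix selected by the
# reported state y (values taken from Figure 3).

STATES = ["NS", "OG", "F"]

# Rows: ground truth state x; columns: reported state y.
CONFUSION = {
    "NS": {"NS": 0.70, "OG": 0.20, "F": 0.10},
    "OG": {"NS": 0.05, "OG": 0.80, "F": 0.15},
    "F":  {"NS": 0.05, "OG": 0.20, "F": 0.75},
}

def likelihood(reported):
    """Return L(y | x) for each ground-truth state x, given reported state y."""
    return {x: CONFUSION[x][reported] for x in STATES}
```

For a reported state of OG this returns {NS: 0.2, OG: 0.8, F: 0.2}, matching the column quoted in the text.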
Given an observed report y_j(t) for task j at time t, we update the probability weight p_i(t) on schedule i based on the (deterministic) state x_ji(t) of task j under schedule i at time t:

p_i(t) = p_i(t) · L(y_j(t) | x_ji(t)).   (2)

In other words, multiply the schedule weight by the likelihood of observing the evidence given the state of that particular schedule. Using the example from Figure 3, if the reported state is OG, then multiply all schedules with task state OG on that date by 0.8 and all other schedules by 0.2. After completing this Bayesian update, renormalize the probabilities to sum to one across all schedules.

In addition to tracking the progress of an activity, this same approach can be used to forecast progress into the future. Since each sample schedule contains the past, present and future for that Monte Carlo sample path of the activity execution, making future predictions about the progress of the activity in the absence of additional evidence is as easy as advancing the clock.

3 Derivation of Start Date Likelihood Calculation

Let us assume that a model can be specified completely by knowing the set of tasks, precedence relations, task duration distributions and operational start distribution for that model. We summarize the model by the start date random variable τ_0 and the task duration random variables τ_j for tasks j = 1, ..., J. We assume there is a set of K evidence reports available, with the k-th report, y_jk(t_k), describing an observation of the state of task j_k at time t_k. To simplify the notation, we drop the explicit reference to the time of the k-th report, t_k, and use the presence of the k subscript or sub-subscript to clarify the time association. We want to use these reports to infer the maximum likelihood start date, τ_0. To do so, we would compute the cumulative likelihood of observing the set of available evidence under this model, which we write as

L(y_j1, ..., y_jK | τ_0, τ_1, ..., τ_J).   (3)

Since we need to find the optimal value of τ_0, we integrate over the task durations τ_1 to τ_J to get an expected likelihood conditioned only on the start date:

L(y_j1, ..., y_jK | τ_0) = ∫ L(y_j1, ..., y_jK | τ_0, τ_1, ..., τ_J) p(τ_1, ..., τ_J) dτ_1 ... dτ_J.   (4)

We can use the independence assumption of the evidence reports to break the likelihood calculation (3) into the product of individual evidence likelihoods across all reports:

L(y_j1, ..., y_jK | τ_0) = ∫ [ Π_{k=1..K} L(y_jk | τ_0, τ_1, ..., τ_J) ] p(τ_1, ..., τ_J) dτ_1 ... dτ_J.   (5)

Although the expression in equation (5) is explicit, in general the integration is intractable due to the need to compute the likelihood of the report for each start date and set of task durations. Instead, we use the Monte Carlo sampling approach to generate individual schedules conditioned on a particular start date τ_0 and a set of sampled durations. We approximate the integral over all possible task durations as a sum over the set of sampled task durations. Since all schedules generated via the Monte Carlo sampling are equally likely, we can replace the probability in the integral with 1/N in the sum. Under schedule i, task j_k (associated with evidence report k) has a particular state x_ijk at time t_k. Given this information, we can compute the likelihood of the report conditioned on schedule i as

L(y_jk | τ_0, τ_i1, ..., τ_iJ) = L(y_jk | x_ijk).   (6)

That is, the likelihood of the report depends only on the state of the corresponding task under schedule i. Substituting equation (6) into equation (5) and changing the continuous integral to a discrete sum, we get

L(y_j1, ..., y_jK | τ_0) ≈ (1/N) Σ_{i=1..N} Π_{k=1..K} L(y_jk | x_ijk).   (7)

Choosing the number of schedules N to use in the cumulative likelihood calculation involves trading off fidelity in the calculation against the computational expense.
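The weight update of equation (2) and the Monte Carlo sum of equation (7) are both short computations; the following sketch shows one possible shape for them. The data structures (likelihoods as state-keyed dicts, schedule states as lists) are illustrative assumptions, not TerrAlert's internal representation.

```python
# Sketch of equation (2) (Bayesian weight update) and equation (7)
# (Monte Carlo cumulative likelihood); data structures are assumed.

def bayes_update(weights, states, L_y):
    """Equation (2): multiply each schedule weight by L(y | x_i), renormalize.

    weights: p_i for each schedule
    states:  the observed task's state x_i under each schedule
    L_y:     dict mapping state -> likelihood of the report
    """
    updated = [p * L_y[x] for p, x in zip(weights, states)]
    total = sum(updated)
    return [p / total for p in updated]

def cumulative_likelihood(schedule_states, reports):
    """Equation (7): (1/N) * sum_i prod_k L(y_k | x_{i,j_k}).

    schedule_states: for each schedule i, the states x_{i,j_k} of the
                     reported task at each report time t_k
    reports:         for each report k, a dict mapping state -> likelihood
    """
    N = len(schedule_states)
    total = 0.0
    for states in schedule_states:
        prod = 1.0
        for x, L_k in zip(states, reports):
            prod *= L_k[x]
        total += prod
    return total / N
```

With the Figure 3 likelihood column {NS: 0.2, OG: 0.8, F: 0.2}, two equally weighted schedules in states OG and NS update to weights 0.8 and 0.2, as described in the text.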
For example, the purpose of defining the cumulative likelihood is to assess the quality of fit between the evidence and different operational start dates. Figure 4 illustrates the cumulative likelihood over a wide range of operational start dates using either 10,000 Monte Carlo schedules or the single average schedule derived from using the average duration for each task.

[Figure 4: Cumulative likelihood comparison given average and Monte Carlo schedules. Model likelihood given start date (log10 scale) is plotted against operational start date, for 10,000 Monte Carlo schedules and for a single schedule using average durations.]
The likelihood based on the average schedule is piecewise-constant as a function of the operational start date. If the start date is set early enough, then all evidence reports that observe a task in the Finished state will agree with the average schedule, and all other reports will disagree with the schedule. If the start date is set late enough, there will be agreement with all reports that observe a task in the Not Started state, and disagreement with all other reports. The model likelihood jumps in value when there is a change in the agreement between the evidence and the average schedule given that start date. For 10,000 Monte Carlo schedules, the function is also piecewise-constant, but with smaller steps, because the likelihood changes when there is a change in agreement between the evidence and any one of the schedules. Note also that the Monte Carlo and average curves have the same asymptotic likelihoods in either time direction. The Monte Carlo curve also tends to have a wider peak because the variance in task start and end dates is higher over multiple schedules. In the next section, we describe an algorithm for finding the peak of either of these functions efficiently.

4 Golden Section Search Algorithm

Regardless of whether the model likelihood is computed using the average schedule or a set of Monte Carlo schedules, we need an efficient algorithm for determining the maximum likelihood start date of the operation (the date corresponding to the peak of the model likelihood function). In this section, we describe a one-dimensional Golden Section search algorithm that draws heavily from sections 10.1 and 10.2 in [2]. There are two main parts: (1) determine the initial bracket for the one-dimensional search, and (2) select the next start date to consider as a possible peak. Figure 5 serves as a visual guide to the sequence of points generated by the algorithm.
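The interval-cutting loop at the heart of this procedure can be summarized compactly in code. The sketch below is a generic one-dimensional Golden Section maximizer consistent with the update rules derived in sections 4.1 and 4.2; it is illustrative only, and assumes a unimodal objective f (here, the cumulative likelihood as a function of start date) and a valid initial triplet a < b < c with f(b) greater than both endpoint values.

```python
import math

# Golden Section fraction, approximately 0.382
W = (3 - math.sqrt(5)) / 2

def golden_section_max(f, a, b, c, iterations=17):
    """Maximize f on [a, c] given an interior point b with f(b) > f(a), f(c).

    Each iteration probes the larger of [a, b] and [b, c] and removes a
    fraction of the bracket from the left or the right, so the interval
    shrinks geometrically.
    """
    for _ in range(iterations):
        if b - a > c - b:                 # [a, b] is larger: points (a, q, b, c)
            q = b - W * (b - a)
            if f(q) > f(b):
                a, b, c = a, q, b         # cut [b, c]
            else:
                a, b, c = q, b, c         # cut [a, q]
        else:                             # [b, c] is larger: points (a, b, q, c)
            q = b + W * (c - b)
            if f(q) > f(b):
                a, b, c = b, q, c         # cut [a, b]
            else:
                a, b, c = a, b, q         # cut [q, c]
    return b                              # interior point of the final bracket
```

The invariant f(b) > max(f(a), f(c)) is preserved by every update, which is what guarantees the peak stays inside the shrinking bracket.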
[Figure 5: Illustration of the sequence of points for the Golden Section search. Model likelihood (log10) versus start date; for example, after adding point 4, the interval from point 1 to point 3 is removed.]

4.1 Determine the initial bracket

The first phase consists of finding three start dates (or points), (a, b, c), with cumulative model likelihoods (or values), (f(a), f(b), f(c)), that satisfy the following conditions: a < b < c and f(b) > f(a) and f(b) > f(c).

Let S be the length of the critical path of the average schedule (that is, the schedule that uses the expected values for the task durations). If t_1 ≤ t_2 ≤ ... ≤ t_K are the dates for the K evidence reports, then an extremely conservative initial bracket of start dates would be

a = t_1 − S and c = t_K.   (8)

We need to select an interior point b in this interval that has a strictly greater likelihood than either endpoint. The initial guess of this interior point is based on the Golden Section value W:

W = (3 − √5)/2 ≈ 0.382,
b = a + W(c − a) = Wc + (1 − W)a.

If f(b) > f(c) and f(b) > f(a), then continue to the next part of the algorithm. Otherwise, we sample other points until the interior point b has a greater likelihood than either endpoint. In Figure 5, this initial triplet of points is indicated by the points labeled 1, 2 and 3.

4.2 Select the next point (Golden Section search)

The process for finding the peak involves cutting off one end of the interval [a, c] at each iteration, given the triplet (a, b, c). To do so, we determine the next point to be sampled, q, which will be in the interior of the larger of the two intervals [a, b] and [b, c]. We consider these two cases separately.

If b − a > c − b, then interval [a, b] is larger, and we have the set of points (a, q, b, c), where

q = b − W(b − a) = Wa + (1 − W)b.

As part of this update, the portion to be cut will be either [a, q] or [b, c]. Evaluate the likelihood f(q) and update the triplet (a, b, c) as follows. If f(q) > f(b), then let (a, b, c) ← (a, q, b). Otherwise, let (a, b, c) ← (q, b, c).

For the other case, in which b − a ≤ c − b, the interval [b, c] is larger, and we have the set of points (a, b, q, c), where

q = b + W(c − b) = Wc + (1 − W)b.

The portion to be cut will be either [a, b] or [q, c]. Evaluate the likelihood f(q) and update the triplet (a, b, c) as follows.
If f(q) > f(b), then let (a, b, c) ← (b, q, c). Otherwise, let (a, b, c) ← (a, b, q).

In either case, for the new triplet (a, b, c), f(b) is guaranteed to be larger than either f(a) or f(c). Repeat this step and compute a new point q as before. Figure 5 shows a sequence of new points, and for each new point, we show which interval is removed in that step. For example, after adding point 6, we remove the interval between points 4 and 5. At each step, a fraction W (38%) of the interval will be removed, regardless of whether it is chopped from the left or the right, so the bracket will decrease in size very quickly. In fact, after 14 iterations, the interval will decrease by a factor of about 1,000 from the original length.

5 Experimental Results

We performed a series of experiments to evaluate the performance of the start date optimization algorithm. There are 100 different models, each constructed to have exactly twelve tasks. Each task has a duration modeled as a triangular distribution. For a particular task, let Z_1, Z_2 and Z_3 be independent random samples drawn from an Exponential distribution with parameter λ. Then the triangular distribution for that task has minimum duration Z_1, most likely duration Z_1 + Z_2, and maximum duration Z_1 + Z_2 + Z_3. The precedence relations are chosen at random by partitioning the tasks into subgroups. Each subgroup is performed in sequence, and the tasks within a subgroup are performed in parallel, as illustrated in Figure 6. For each model, we attempt to find the maximum likelihood operational start date using either the average schedule, Monte Carlo with 100 generated schedules, or Monte Carlo with 10,000 generated schedules. In each case, the initial bounding interval for a particular model and set of evidence will be the same, and is defined by equation (8). The range of this initial interval is approximately five years, or roughly 1800 days.
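The task-duration construction above, three Exponential(λ) samples stacked into a triangular distribution, can be sketched as follows; the choice of Python's random module and the function names are illustrative assumptions.

```python
import random

def random_triangular_task(lam, rng=random):
    """Build one task's triangular duration parameters from three
    independent Exponential(lambda) samples, as in the experiments.

    Returns (minimum, most likely, maximum) = (Z1, Z1+Z2, Z1+Z2+Z3).
    """
    z1 = rng.expovariate(lam)
    z2 = rng.expovariate(lam)
    z3 = rng.expovariate(lam)
    return z1, z1 + z2, z1 + z2 + z3

def sample_duration(low, mode, high, rng=random):
    """Draw one task duration from the triangular distribution."""
    return rng.triangular(low, high, mode)
```

Because the three exponential samples are nonnegative, the construction automatically yields minimum ≤ most likely ≤ maximum.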
Given an initial triplet of points (a, b, c), we apply the Golden Section algorithm for 17 iterations. This cuts the size of the bounding interval to about a half-day, and we choose as the peak the interior point in this final interval. Figure 7 shows the results of the experiments. For each model, we compute the cumulative likelihood assuming that the ground truth start date is known, where the likelihood is computed using either the average schedule (AVG), Monte Carlo with 100 schedules (MC100), or Monte Carlo with 10,000 schedules (MC10K). These ground truth likelihood values are plotted on the x-axis in the chart. As a comparison with the ground truth values, we use the Golden Section algorithm to find the start date with the maximum likelihood for each model using either AVG, MC100 or MC10K. This maximum likelihood is plotted on the y-axis in the chart.

[Figure 6: Example of precedence relations from the random task partition approach.]

For each model, we construct a ground truth schedule using a fixed operational start date and the most likely value for each task duration. We generate 31 evidence reports for each model, spaced equally in time across the operation. For each evidence date, either all tasks are Not Started, all tasks are Finished, or at least one task is Ongoing under the ground truth schedule. If at least one task is Ongoing, then we choose one of those tasks uniformly at random. Otherwise, we choose one of the Not Started or Finished tasks uniformly at random. We apply a symmetric confusion matrix to convert the ground truth state to the reported state. The confusion matrix, which we assume is known, has values of 0.9 on the diagonal and 0.05 on the off-diagonal entries.
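The evidence-noising step of the experiments can be sketched directly from the symmetric confusion matrix; the function name and use of Python's random module are illustrative assumptions.

```python
import random

# Sketch of the experiments' evidence generation: a symmetric confusion
# matrix (0.9 on the diagonal, 0.05 off-diagonal) perturbs the ground
# truth task state to produce the reported state.

STATES = ["NS", "OG", "F"]

def noisy_report(true_state, rng=random):
    """Sample a reported state given the ground truth state."""
    probs = [0.9 if s == true_state else 0.05 for s in STATES]
    return rng.choices(STATES, weights=probs, k=1)[0]
```

Over many reports, roughly 90% of the reported states match the ground truth, with the remaining 10% split evenly between the two other states.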
Figure 7 Comparison of cumulative likelihoods given the ground truth start date versus the Golden Section-optimized start date

In addition to the individual points in the chart, we include a diagonal line that shows where the two likelihood values would be the same. If our Golden Section search algorithm found the ground truth start date every time, then all points would fall on this line. Points above this line suggest that the search algorithm finds a start date that is a better fit to the evidence than the ground truth start date, and points below this line suggest that the optimized start date is a worse fit. For the AVG results (pink boxes), there is a large spread of likelihood values, but nearly all of the optimized likelihood values are greater than ground truth. There are two primary reasons for this behavior. First, the average schedule provides a fast, crude approximation to the true likelihood curve, as
is shown in Figure 4, so the start date peak using the average schedule may not agree with the ground truth start date. Second, the noise in the evidence reports provides an opportunity to find start dates that agree with the evidence better than the ground truth start date. The Monte Carlo results are similar, with less spread in the likelihood values as the number of schedules increases, because the likelihood curve approximation becomes more refined as more schedules are added. Although there are a few models for which the optimized likelihood is less than the ground truth likelihood, the gap is relatively small (generally less than an order of magnitude).

6 Conclusions

In this paper, we have described an approach for optimizing the start date of an operation by maximizing a start date likelihood calculation. In experimental testing, the algorithm performs well, and the number of schedules used in the calculation can be tuned to the available computational budget. We have implemented these algorithms in the operational version of TerrAlert, where we believe they will improve analyses and reduce the amount and precision of information that an analyst must specify manually, especially when available evidence is relatively plentiful. In addition, this start date optimization opens up research opportunities that we believe will lead to significant new capabilities for real-world analysts. First, there are other extensions to the start date optimization that we plan to consider. For example, in this paper, we find the optimal start date by setting all Monte Carlo schedules to start on that date and computing the cumulative likelihood for that date. One alternative to this approach, which is virtually guaranteed to increase the maximum cumulative likelihood, is to optimize the start date for each Monte Carlo schedule independently. The maximum cumulative likelihood is then the weighted average of the individually optimized schedule likelihoods.
This will increase the computational effort, but we suspect by at most a factor of two or so. Second, we would like to extend the start date optimization to incorporate resampling, which is a feature of particle filters described in detail in [1]. Given a set of generated schedules, TerrAlert updates the probabilities on each schedule based on available evidence. Resampling is an approach for pruning low probability schedules and splitting high probability schedules in two, such that both resulting schedules share the same past but have different Monte Carlo futures. Since this is an important feature within TerrAlert, we would like the start date optimization to incorporate this approach as well. Finally, the start date optimization is the foundation for a new capability by which TerrAlert could automatically generate alternative configurations of a model (different task orders and precedence relations) to find the maximum likelihood configuration. We will be investigating approaches that generate alternative configurations, optimize the start date for each configuration and compute the cumulative likelihood of the evidence given that configuration. We believe this capability will provide a great leap in the ability of analysts to automatically consider multiple hypotheses regarding a terrorist operation.

REFERENCES

[1] G. Godfrey, J. Cunningham and T. Tran, "A Bayesian, Nonlinear Particle Filtering Approach for Tracking the State of Terrorist Operations," IEEE Intelligence and Security Informatics, 23-24 May 2007, pp. 350-355.

[2] W. Press, B. Flannery, S. Teukolsky and W. Vetterling, Numerical Recipes: The Art of Scientific Computing, 9th printing, Cambridge University Press, Cambridge, 1989.

[3] L. Stone, C. Barlow and T. Corwin, Bayesian Multiple Target Tracking, Artech House, Boston, 1999.