Success Probability of Multiple/Multidimensional Linear Cryptanalysis Under General Key Randomisation Hypotheses


Subhabrata Samajder and Palash Sarkar
Applied Statistics Unit
Indian Statistical Institute
203, B.T. Road, Kolkata, India
subhabrata.samajder@gmail.com, palash@isical.ac.in

Financial support from the R. C. Bose Center for Cryptology and Security, Indian Statistical Institute, Kolkata, India.

Abstract

This work considers statistical analysis of attacks on block ciphers using several linear approximations. A general and unified approach is adopted. To this end, the general key randomisation hypotheses for multidimensional and multiple linear cryptanalysis are introduced. Expressions for the success probability in terms of the data complexity and the advantage are obtained using the general key randomisation hypotheses for both multidimensional and multiple linear cryptanalysis and under the settings where the plaintexts are sampled with or without replacement. Particularising to standard/adjusted key randomisation hypotheses gives rise to success probabilities in 16 different cases, out of which expressions for success probabilities have been previously reported in only five. Even in these five cases, the expressions for success probabilities that we obtain are more general than what was previously obtained. A crucial step in the analysis is the derivation of the distributions of the underlying test statistics. While we carry out the analysis formally to the extent possible, there are certain inherently heuristic assumptions that need to be made. In contrast to previous works, which have implicitly made such assumptions, we carefully highlight these and discuss why they are unavoidable. Finally, we provide a complete characterisation of the dependence of the success probability on the data complexity.

Keywords: multidimensional linear cryptanalysis, multiple linear cryptanalysis, chi-squared distribution, success probability, data complexity, advantage.

1 Introduction

Linear cryptanalysis for block ciphers was introduced by Matsui in []. Matsui's work spurred a great deal of research, which has considered several aspects of linear cryptanalysis. At a broad level, the attacks are of two types. The goal of one type of attack is to recover a subset of the bits of the secret key; such attacks are called key recovery attacks. A different and weaker type of attack seeks only to distinguish the output of a block cipher from uniform random bits; such attacks are called distinguishing attacks. In this work, we are concerned only with key recovery attacks.

At a broad level, linear cryptanalysis proceeds in the following manner. A careful study of the block cipher results in one or more linear approximations. During the data collection phase, $N$ plaintexts $P_1,\ldots,P_N$ are chosen and the corresponding ciphertexts under a secret but fixed key are obtained. The key recovery algorithm is applied to the obtained plaintext-ciphertext pairs and the output is a list of possible values of the partial key. An attack is said to be successful if the correct value of the key is in the output list. For an attack, the success probability is denoted by $P_S$; the data complexity is $N$; and the attack has an advantage $a$ if the size of the output list is $2^{-a}$ times the total number of partial keys. The goal of a statistical analysis of such a key recovery attack is to obtain a relation between $P_S$, $N$ and $a$.

A formal statistical treatment of linear cryptanalysis has the following aspects.

Multiple versus multidimensional linear cryptanalysis: One issue is whether a single linear approximation or several linear approximations are available. In the latter case, the analysis is of two types depending on whether the linear approximations can be considered independent or not. If the analysis is under the independence assumption, then the attack is usually called multiple linear cryptanalysis, whereas if the independence assumption is not made, then the attack is usually called multidimensional linear cryptanalysis.

Sampling with or without replacement: For the attack, plaintexts $P_1,\ldots,P_N$ are randomly sampled and the corresponding ciphertexts are obtained. One issue is whether the plaintexts are considered to be sampled uniformly at random with replacement or uniformly at random without replacement.

Key randomisation hypothesis: The linear approximations hold with certain probabilities. The basis for the attack is that the probability corresponding to the right key is different from the probability corresponding to a wrong key. In the standard key randomisation hypothesis, the probabilities corresponding to both the right and the wrong key are assumed to be fixed. The adjusted (or revised, as termed in [7]) key randomisation hypothesis assumes that the probabilities themselves are random variables.

Our Contributions

In this work, we consider the scenario when several linear approximations are available. Our goal is to express $P_S$ in terms of $N$ and $a$ in each of the above mentioned settings.
Table 1 lists all the 16 possible cases that can arise and, for each case, mentions whether the case has been previously considered in the literature or whether it is new. If a case has been considered earlier, then the corresponding reference is provided; the last column gives the section of this work where an expression for $P_S$ can be found. We observe that out of the 16 possible cases, only 6 have been considered earlier, and in 5 of these, expressions for success probabilities have been reported. We provide a general and unified treatment to the extent possible, and the 16 different cases are obtained as special cases of the general treatment. The route that we take is similar to the route taken in [7] for single linear cryptanalysis.

Linear cryptanalysis identifies a target sub-key and attempts to obtain the correct value of the target sub-key in time less than an exhaustive search over all possible values of the whole secret key. At a broad level, linear cryptanalysis applies a statistical test to each possible value of the target sub-key. Section 2 provides an overview of linear cryptanalysis and identifies the test statistic that is to be used. The test statistic is parameterised by the choice of the target sub-key, and the distribution of the test statistic depends on whether the choice is right or wrong. For a statistical analysis, it is required to obtain the distributions of the test statistic under both the right and the wrong choices of the target sub-key.

The literature provides two approaches for analysing success probability, namely the order statistics based approach and the hypothesis testing based approach. Assuming certain forms of the distributions of the test statistic for the right and the wrong key choices, Section 3 obtains expressions for $P_S$ following both the order statistics and the hypothesis testing based approaches.
Certain problems with the order statistics based approach, which were earlier pointed out in [5, 7], are briefly summarised. It is shown that if some approximations are applied to the expression for $P_S$ obtained using the hypothesis testing based approach, then one obtains the expression for $P_S$ obtained using the order statistics based approach. Since such approximations do not seem to be necessary, the rest of the paper follows the expression for $P_S$ obtained using the hypothesis testing based approach.

type  samp.  RKRH  WKRH  new  previous  previous P_S  new P_S
md    wr     std   std   no   [6]       [6]           Section 7.1.1
md    wr     std   adj   yes  -         -             Section 7.1.2
md    wr     adj   std   no   [7]       -             Section 7.1.3
md    wr     adj   adj   no   [7]       [7]           Section 7.1.4
md    wor    std   std   yes  -         -             Section 7.1.1
md    wor    std   adj   yes  -         -             Section 7.1.2
md    wor    adj   std   yes  -         -             Section 7.1.3
md    wor    adj   adj   no   [7]       [7]           Section 7.1.4
m     wr     std   std   yes  -         -             Section 7.2.1
m     wr     std   adj   yes  -         -             Section 7.2.2
m     wr     adj   std   yes  -         -             Section 7.2.3
m     wr     adj   adj   no   [7]       [7]           Section 7.2.4
m     wor    std   std   yes  -         -             Section 7.2.1
m     wor    std   adj   yes  -         -             Section 7.2.2
m     wor    adj   std   yes  -         -             Section 7.2.3
m     wor    adj   adj   no   [7]       [7]           Section 7.2.4

Table 1: Here md (resp. m) denotes multidimensional (resp. multiple) linear cryptanalysis; wr (resp. wor) denotes sampling with (resp. without) replacement. RKRH (resp. WKRH) is an abbreviation for right (resp. wrong) key randomisation hypothesis; std (resp. adj) denotes that the standard (resp. adjusted) key randomisation hypothesis is considered. The column "new" indicates whether the case is new; "previous" and "previous P_S" give the earlier works in which the case and an expression for P_S, respectively, appear; "new P_S" gives the section of this work containing the expression for P_S.

The literature has separately considered the standard and the adjusted key randomisation hypotheses. In Section 4, we discuss the existing hypotheses and point out some heuristic assumptions in their formulation that have been implicitly made in the literature. We propose a general right key randomisation hypothesis and a general wrong key randomisation hypothesis, and show that the existing key randomisation hypotheses can be obtained as special cases of these two general hypotheses. Section 5 takes up the crucial task of obtaining the distributions of the test statistic. These distributions are obtained under the general right and wrong key randomisation hypotheses. The cases of multidimensional and multiple linear cryptanalysis and those of sampling with and without replacement are treated separately. For obtaining the distributions, we proceed formally to the extent possible. The derivation of the distributions, however, requires several heuristic assumptions.
We carefully identify these heuristics and discuss why they cannot be replaced by formal analysis. Distributions of the test statistic under the right and wrong keys have been obtained earlier for particular cases. We remark that heuristic assumptions similar to those that we identify have also been implicitly made in previous works.

Section 6 obtains expressions for $P_S$ under the general key randomisation hypotheses for the cases of multidimensional and multiple linear cryptanalysis. It turns out that a compact expression for $P_S$ can be provided covering both sampling with and without replacement. The expressions for $P_S$ are obtained by combining the distributions of the test statistics obtained in Section 5 with the expression for $P_S$ obtained in Section 3 following the hypothesis testing framework. Expressions for $P_S$ for the 16 possible cases mentioned in Table 1 are obtained in Section 7. These expressions are obtained by specialising the general key randomisation hypotheses to either the standard or the adjusted key randomisation hypothesis for both the right and the wrong key choices. As mentioned above, expressions for $P_S$ are obtained for the first time in 11 out of the 16 possible cases. In the remaining five cases, making several approximations to the expressions for $P_S$ obtained in this work yields the expressions for $P_S$ obtained in earlier works. Since such approximations do not seem to be necessary, even in these five cases the expressions for $P_S$ are more general than what was previously known.

Intuitively, one may expect that, for a fixed value of the advantage $a$, the success probability is a monotonically increasing function of the data complexity $N$. On the other hand, the expressions for $P_S$ show a complicated dependence on $N$. Section 8 closely analyses the dependence of the success probability on $N$. To do this, the general and compact expressions for $P_S$ obtained in Section 6 are used. A complete characterisation of the nature of monotonicity of $P_S$ in $N$ is obtained. This characterisation is then specialised to the particular cases of the standard/adjusted key randomisation hypotheses and sampling with/without replacement. To the best of our knowledge, no previous work in the literature has carried out such an extensive analysis of the monotonic behaviour of $P_S$ with respect to $N$.

Previous and Related Works

Linear cryptanalysis was introduced by Matsui []. An earlier work [30] had considered linear approximations in the context of an attack on the S-boxes of FEAL. The initial work of Matsui [] considered using a single linear approximation. A subsequent work [] by Matsui himself showed how to improve linear cryptanalysis if two linear approximations are available. Independently, Kaliski and Robshaw [0] also showed that the availability of several linear approximations with certain restrictions leads to an improved attack. Both the attacks [, 0] considered the linear approximations to be independent. Further analysis under the independence assumption of the linear approximations was later done in [4].
Murphy [3] observed that the independence assumption may not be valid. A series of papers [, 3, 9] carried out a systematic investigation of multiple linear cryptanalysis where the linear approximations are not necessarily independent. The motivation of these works was to analyse and obtain optimal distinguishers between two distributions. This was done using the framework of hypothesis testing. Several important techniques, including the log-likelihood ratio test, were successfully developed to build optimal distinguishers.

Matsui's original work [] employed a ranking approach to key recovery attacks. A subsequent work by Selçuk [8] proposed a formal statistical treatment of this approach using the methodology of order statistics. The work by Selçuk proved to be quite influential, and the order statistics based approach was adopted in a number of later papers [6, 5]. Selçuk's work required an asymptotic result on the normal approximation of order statistics. A concrete error bound on the normal approximation was obtained in [5], and several problematic issues with the order statistics approach were pointed out. The alternative hypothesis testing based approach to analysing key recovery attacks was suggested in [5] and has been subsequently used in [7].

Treatment of key recovery attacks for multidimensional linear cryptanalysis without requiring any independence assumption on the linear approximations was carried out by Hermelin, Cho and Nyberg [6]. This work followed the order statistics based approach of Selçuk [8]; analysis of the same setting using the hypothesis testing based approach was done in [5].

The standard wrong key randomisation hypothesis was formally introduced by Harpes et al. in [5]. The first work to consider the adjusted key randomisation hypothesis was by Bogdanov and Tischhauser []. This was in the setting of single linear cryptanalysis.
The formulation of the adjusted key randomisation hypothesis was based on an earlier work on statistical properties of uniform random permutations by Daemen and Rijmen [4]. A later work on adjusted key randomisation hypotheses for a single linear approximation is by Ashur et al. []. A general and unified treatment of success probability under general key randomisation hypotheses for single linear cryptanalysis has been done in [7]. Extension of the adjusted right key randomisation hypothesis from single to multidimensional linear cryptanalysis was considered by Huang et al. [7]; this work did not provide an expression for the success probability.

Out of the 16 possible cases listed in Table 1, four cases were considered by Blondeau and Nyberg in [7], and expressions for $P_S$ were obtained in these cases. As mentioned earlier, these expressions are less general than the ones that we obtain in the present work.

A related line of work [0, , 9, 8] considers zero correlation attacks. The notion of sampling without replacement was first considered in the setting of the multidimensional zero correlation attack [9]. In this paper, we do not consider zero correlation attacks.

Much of the analysis in the context of linear cryptanalysis is based on approximations where the errors in the approximations are not known. A more rigorous approach has been advocated in [4], where such approximations are avoided and rigorous upper bounds on the data complexity are obtained instead. A test statistic whose analysis avoids approximations, and which also avoids some of the problems associated with the generally used test statistics, has been proposed in [6].

2 Linear Cryptanalysis

Let the function $E: \{0,1\}^k \times \{0,1\}^n \rightarrow \{0,1\}^n$ denote a block cipher such that for each $K \in \{0,1\}^k$, $E_K(\cdot) = E(K,\cdot)$ is a bijection from $\{0,1\}^n$ to itself. Here $K$ is called the secret key. The $n$-bit input to the block cipher is called the plaintext and the $n$-bit output of the block cipher is called the ciphertext.

In general, block cipher constructions involve a simple round function, parameterised by a round key, iterated over several rounds. The round functions are bijections of $\{0,1\}^n$. Round keys are produced by applying an expansion function, called the key scheduling algorithm, to the secret key $K$. Denote the round keys by $k^{(0)}, k^{(1)}, \ldots$ and the round functions by $R^{(0)}, R^{(1)}, \ldots$. Also, let $K^{(i)}$ denote the concatenation of the round keys $k^{(0)},\ldots,k^{(i)}$, i.e., $K^{(i)} = k^{(0)} \,||\, \cdots \,||\, k^{(i)}$, and let $E^{(i)}$ denote the composition of the round functions $R^{(0)},\ldots,R^{(i)}$, i.e.,
$$E^{(0)}_{K^{(0)}} = R^{(0)}_{k^{(0)}}; \qquad E^{(i)}_{K^{(i)}} = R^{(i)}_{k^{(i)}} \circ \cdots \circ R^{(0)}_{k^{(0)}} = R^{(i)}_{k^{(i)}} \circ E^{(i-1)}_{K^{(i-1)}}, \quad i \geq 1.$$
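The recursion $E^{(i)} = R^{(i)} \circ E^{(i-1)}$ is simply a left fold of the round functions over the round keys. The sketch below illustrates only this structure, on a hypothetical toy cipher with 4-bit blocks and round function $R_k(x) = S(x \oplus k)$; the S-box (taken to be the PRESENT S-box purely as a convenient bijection), the key values and the function names are illustrative assumptions, not part of the text.

```python
from functools import reduce
from typing import Sequence

# Hypothetical toy cipher (not from the text): 4-bit blocks and round function
# R_k(x) = S(x XOR k), with S the PRESENT S-box used purely as an example bijection.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def round_fn(round_key: int, x: int) -> int:
    """R^{(i)}_{k^{(i)}}(x) = S(x XOR k^{(i)}); a bijection of {0,1}^4 for every round key."""
    return SBOX[x ^ round_key]


def encrypt_rounds(round_keys: Sequence[int], plaintext: int) -> int:
    """E^{(i)}_{K^{(i)}}(P), where round_keys lists k^{(0)}, ..., k^{(i)}.

    The fold applies R^{(0)} first and R^{(i)} last, i.e. E^{(i)} = R^{(i)} o E^{(i-1)}.
    """
    return reduce(lambda state, k: round_fn(k, state), round_keys, plaintext)


# With r + 1 = 3 hypothetical round keys: C = E^{(2)}(P), B = E^{(1)}(P) and C = R^{(2)}(B).
keys = [0x3, 0xA, 0x7]
P = 0x5
B = encrypt_rounds(keys[:-1], P)
C = encrypt_rounds(keys, P)
assert C == round_fn(keys[-1], B)
```

Encrypting all 16 plaintexts under a fixed list of round keys yields a permutation of {0, ..., 15}, which reflects the requirement that each $E_K$ is a bijection.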
A reduced round cryptanalysis of a block cipher targets $r+1$ rounds of the total number of rounds proposed by the block cipher design. For a plaintext $P$, we denote by $C$ the output after $r+1$ rounds, i.e., $C = E^{(r)}_{K^{(r)}}(P)$, and by $B$ the output after $r$ rounds, i.e., $B = E^{(r-1)}_{K^{(r-1)}}(P)$, so that $C = R^{(r)}_{k^{(r)}}(B)$. Throughout this paper, we will be assuming an attack on the first $r+1$ rounds of an iterated block cipher with $r+1$ rounds.

Linear approximations: Block cipher cryptanalysis starts off with a detailed analysis of the block cipher. This results in one or possibly more relations between the plaintext $P$, the input $B$ to the last round and possibly the expanded key $K^{(r-1)}$. In the case of linear cryptanalysis, these relations are linear and are of the following form:
$$\langle \Gamma^i_P, P \rangle \oplus \langle \Gamma^i_B, B \rangle = \langle \Gamma^i_K, K^{(r-1)} \rangle; \qquad i = 1, 2, \ldots, l; \tag{1}$$
where $\Gamma^i_P, \Gamma^i_B \in \{0,1\}^n$ and $\Gamma^i_K \in \{0,1\}^{nr}$ denote the plaintext mask, the mask to the input of the last round and the key mask, respectively. A linear relation of the form (1) is called a linear approximation of the block cipher. These linear approximations usually hold with some probability, which is taken over random choices of the plaintext $P$. In case $l > 1$, it is required to work with the corresponding joint distribution. Obtaining such relations and their joint distribution is a non-trivial task and requires a lot of ingenuity and experience. They form the basis on which the statistical analysis of block ciphers is built. In this work we will only consider $l > 1$. There are two cases.

Multiple linear cryptanalysis: The linear approximations are assumed to be independent.
Multidimensional linear cryptanalysis: The linear approximations are not assumed to be independent.
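To make "a linear approximation holds with some probability taken over random choices of $P$" concrete, the following sketch evaluates one relation of the form (1) exactly by enumerating all plaintexts of a hypothetical toy cipher: 4-bit blocks, $r = 1$, so that $B = S(P \oplus k^{(0)})$ and the key mask acts on $k^{(0)}$ only. The S-box (the PRESENT S-box, used only as an example), the masks, the key and the function names are illustrative assumptions. In this one-round toy, choosing $\Gamma_K = \Gamma_P$ makes the probability independent of the key, because $x = P \oplus k^{(0)}$ is a bijective change of variable.

```python
from fractions import Fraction

# Hypothetical toy setting (not from the text): 4-bit blocks, r = 1, so that
# B = E^{(0)}_{K^{(0)}}(P) = S(P XOR k0), with the PRESENT S-box as S.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def dot(mask: int, x: int) -> int:
    """<mask, x> over GF(2): the parity of the bitwise AND of mask and x."""
    return bin(mask & x).count("1") & 1


def approximation_probability(gamma_p: int, gamma_b: int, gamma_k: int, k0: int) -> Fraction:
    """Exact Pr_P[<gamma_p, P> xor <gamma_b, B> = <gamma_k, k0>] over all 16 plaintexts P."""
    rhs = dot(gamma_k, k0)
    hits = sum(
        1
        for p in range(16)
        if dot(gamma_p, p) ^ dot(gamma_b, SBOX[p ^ k0]) == rhs
    )
    return Fraction(hits, 16)


# With gamma_k = gamma_p, the probability is the same for every key k0.
probs = {approximation_probability(0x9, 0x5, 0x9, k0) for k0 in range(16)}
print(probs)
```

Exact enumeration is only feasible for toy block sizes; for a real cipher the probability of (1) has to be derived analytically or estimated, which is part of the non-trivial task mentioned above.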

Let
$$L_i = \langle \Gamma^i_P, P \rangle \oplus \langle \Gamma^i_B, B \rangle, \qquad i = 1, 2, \ldots, l.$$

Inner key bits: Let $z_i = \langle \Gamma^i_K, K^{(r-1)} \rangle$, $i = 1,\ldots,l$. Note that for a fixed but unknown key $K^{(r-1)}$, $z_i$ represents a single unknown bit. Denote by $z = (z_1,\ldots,z_l)$ the collection of the bits arising in this manner. Since all the $l$ key masks $\Gamma^1_K,\ldots,\Gamma^l_K$ are known, the tuple $z$ is determined only by the unknown but fixed $K^{(r-1)}$. Hence, there is no randomness in either $K^{(r-1)}$ or $z$. We call $z$ the inner key bits.

Target sub-key bits: Any linear relation between $P$ and $B$ of the form (1) usually involves only a subset of the bits of $B$. When $l > 1$, several relations between $P$ and $B$ are known. In such cases, it is required to consider the subset of the bits of $B$ which covers all the relations. In order to obtain these bits from the ciphertext $C$, it is required to partially decrypt $C$ by one round. This involves a subset of the bits of the last round key $k^{(r)}$. We call this subset of bits of $k^{(r)}$ the target sub-key. Recall that the ciphertext $C$ is obtained by encrypting $P$ using the secret key $K$. Let $\kappa^*$ denote the value of the target sub-key corresponding to the secret key $K$. The goal of linear cryptanalysis is then to find the correct value $\kappa^*$ of the target sub-key using the $l$ linear approximations and their joint or marginal distributions. Denote the size of the target sub-key by $m$. In other words, these $m$ key bits are sufficient to partially decrypt $C$ by one round and obtain the bits of $B$ involved in any of the $l$ linear approximations. There are $2^m$ possible choices of the target sub-key, out of which only one is correct. The purpose of the attack is to identify the correct key.

Joint distribution parameterised by inner key bits: Let the plaintext $P$ be chosen uniformly at random from $\{0,1\}^n$; let $C$ be the ciphertext obtained by encrypting $P$ with the secret key $K$; and let $B$ be the result of partially decrypting $C$ with a choice $\kappa$ of the target sub-key.
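The random variable $B$ depends on the choice $\kappa$ used to invert $C$ partially by one round, whereas the ciphertext $C$ depends on the correct choice $\kappa^*$ of the target sub-key, and hence so does $B$. So the random variable $L_i$ depends on both $\kappa$ and $\kappa^*$. To emphasise this dependence, we write $L_{\kappa^*,\kappa,i}$ for $\kappa \neq \kappa^*$ and simply write $L_{\kappa^*,i}$ for $\kappa = \kappa^*$. Define the random variables $X_{\kappa^*,\kappa}$ and $X_{\kappa^*}$ as follows:
$$X_{\kappa^*,\kappa} = (L_{\kappa^*,\kappa,1},\ldots,L_{\kappa^*,\kappa,l}) \quad\text{and}\quad X_{\kappa^*} = (L_{\kappa^*,1},\ldots,L_{\kappa^*,l}).$$
Also, define the joint distributions of the random variables $X_{\kappa^*,\kappa} \oplus z$ and $X_{\kappa^*} \oplus z$ to be
$$q_{\kappa^*,\kappa,z}(\eta) = \Pr[L_{\kappa^*,\kappa,1} = \eta_1 \oplus z_1, \ldots, L_{\kappa^*,\kappa,l} = \eta_l \oplus z_l] = 2^{-l} + \epsilon_{\kappa^*,\kappa,\eta}(z); \tag{2}$$
$$p_{\kappa^*,z}(\eta) = \Pr[L_{\kappa^*,1} = \eta_1 \oplus z_1, \ldots, L_{\kappa^*,l} = \eta_l \oplus z_l] = 2^{-l} + \epsilon_{\kappa^*,\eta}(z), \tag{3}$$
respectively, where $-2^{-l} \leq \epsilon_{\kappa^*,\kappa,\eta}(z), \epsilon_{\kappa^*,\eta}(z) \leq 1 - 2^{-l}$. Denote by
$$q_{\kappa^*,\kappa,z} = (q_{\kappa^*,\kappa,z}(0), q_{\kappa^*,\kappa,z}(1), \ldots, q_{\kappa^*,\kappa,z}(2^l-1)) \quad\text{and}\quad p_{\kappa^*,z} = (p_{\kappa^*,z}(0), p_{\kappa^*,z}(1), \ldots, p_{\kappa^*,z}(2^l-1))$$
the corresponding probability distributions, where the integers $\{0,1,\ldots,2^l-1\}$ are identified with the set $\{0,1\}^l$. For each choice of $z$, we obtain a different but related distribution. Let $z' = z \oplus \beta$ for some $\beta \in \{0,1\}^l$. It is easy to verify that $\epsilon_{\kappa^*,\kappa,\eta}(z') = \epsilon_{\kappa^*,\kappa,\eta\oplus\beta}(z)$ and $\epsilon_{\kappa^*,\eta}(z') = \epsilon_{\kappa^*,\eta\oplus\beta}(z)$, which implies that
$$q_{\kappa^*,\kappa,z\oplus\beta}(\eta) = q_{\kappa^*,\kappa,z}(\eta\oplus\beta) \quad\text{and}\quad p_{\kappa^*,z\oplus\beta}(\eta) = p_{\kappa^*,z}(\eta\oplus\beta). \tag{4}$$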
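Identity (4) says that changing the inner key bits from $z$ to $z \oplus \beta$ only permutes the entries of the distribution. A minimal sketch, assuming the distribution $p_{\kappa^*,0^l}$ is stored as a list indexed by the integer encoding of $\eta \in \{0,1\}^l$ (as in the text); the numbers for $l = 3$ are made up for illustration and the function names are hypothetical. The same check applies verbatim to $q_{\kappa^*,\kappa,z}$.

```python
from fractions import Fraction
from typing import List

# Made-up distribution p_{kappa*,0^l} for l = 3 (illustrative values only);
# index eta in 0..2^l - 1 encodes the bit vector (eta_1, ..., eta_l).
P0 = [Fraction(n, 64) for n in (10, 6, 8, 9, 7, 8, 8, 8)]


def dist_for_inner_bits(p0: List[Fraction], z: int) -> List[Fraction]:
    """p_{kappa*,z}(eta) = Pr[X_{kappa*} = eta XOR z] = p_{kappa*,0^l}(eta XOR z)."""
    return [p0[eta ^ z] for eta in range(len(p0))]


def epsilon(p: List[Fraction]) -> List[Fraction]:
    """epsilon_eta = p(eta) - 2^{-l}, where 2^l = len(p)."""
    return [pi - Fraction(1, len(p)) for pi in p]


def check_identity_4(p0: List[Fraction]) -> bool:
    """Check p_{z xor beta}(eta) == p_z(eta xor beta), and the same for epsilon, for all z, beta, eta."""
    size = len(p0)
    for z in range(size):
        for beta in range(size):
            lhs = dist_for_inner_bits(p0, z ^ beta)
            rhs = dist_for_inner_bits(p0, z)
            if any(lhs[eta] != rhs[eta ^ beta] for eta in range(size)):
                return False
            e_lhs, e_rhs = epsilon(lhs), epsilon(rhs)
            if any(e_lhs[eta] != e_rhs[eta ^ beta] for eta in range(size)):
                return False
    return True


print(check_identity_4(P0))
```

Exact rational arithmetic (`Fraction`) is used so that the comparisons are equalities rather than floating-point approximations.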

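Let $p_{\kappa^*}$ and $q_{\kappa^*,\kappa}$ denote the probability distributions $p_{\kappa^*,0^l}$ and $q_{\kappa^*,\kappa,0^l}$, respectively. We write
$$q_{\kappa^*,\kappa} = (q_{\kappa^*,\kappa}(0), \ldots, q_{\kappa^*,\kappa}(2^l-1)) \quad\text{and}\quad p_{\kappa^*} = (p_{\kappa^*}(0), \ldots, p_{\kappa^*}(2^l-1)). \tag{5}$$
For $i = 1,\ldots,l$, define
$$q_{\kappa^*,\kappa,i} = \Pr[L_{\kappa^*,\kappa,i} = 1] \quad\text{and}\quad p_{\kappa^*,i} = \Pr[L_{\kappa^*,i} = 1]. \tag{6}$$

Statistical model of the attack: Let $P_1,\ldots,P_N$, with $N \leq 2^n$, be $N$ plaintexts chosen randomly from the set $\{0,1\}^n$ of all possible plaintexts, and assume that these $N$ plaintexts follow some distribution over the set $\{0,1\}^n$. Also assume that the adversary possesses $N$ plaintext-ciphertext pairs $(P_j, C_j)$, $j = 1, 2, \ldots, N$, such that $C_j = E_K(P_j)$ for some fixed key $K$. Given these $N$ plaintext-ciphertext pairs, the goal of the adversary is to find $\kappa^*$ in time faster than a brute force search over all possible keys of the block cipher. For each choice $\kappa$ of the target sub-key, it is possible for the attacker to partially decrypt each $C_j$ by one round to obtain $B_{\kappa,j}$, $j = 1, 2, \ldots, N$. Note that $B_{\kappa,j}$ depends on $\kappa$ even though $C_j$ may not: for $\kappa = \kappa^*$, $C_j$ clearly depends on $\kappa$, whereas for $\kappa \neq \kappa^*$, $C_j$ has no relationship with $\kappa$. Define
$$L_{\kappa,i,j} = \langle \Gamma^i_P, P_j \rangle \oplus \langle \Gamma^i_B, B_{\kappa,j} \rangle, \tag{7}$$
$$X_{\kappa,z,j} = (L_{\kappa,1,j} \oplus z_1, \ldots, L_{\kappa,l,j} \oplus z_l), \tag{8}$$
$$Q_{\kappa,z,\eta} = \#\{ j \in \{1,2,\ldots,N\} : X_{\kappa,z,j} = \eta \}, \tag{9}$$
where $\kappa \in \{0,1,\ldots,2^m-1\}$; $z_1,\ldots,z_l \in \{0,1\}$; $j = 1,2,\ldots,N$; $i = 1,2,\ldots,l$. Note that
$$\sum_{\eta\in\{0,1\}^l} Q_{\kappa,z,\eta} = N. \tag{10}$$
With $\beta = (\beta_1,\ldots,\beta_l)$, the condition $X_{\kappa,z\oplus\beta,j} = \eta$ is written as
$$(L_{\kappa,1,j} \oplus z_1 \oplus \beta_1, \ldots, L_{\kappa,l,j} \oplus z_l \oplus \beta_l) = \eta \iff (L_{\kappa,1,j} \oplus z_1, \ldots, L_{\kappa,l,j} \oplus z_l) = \eta \oplus \beta \iff X_{\kappa,z,j} = \eta \oplus \beta.$$
Therefore,
$$Q_{\kappa,z\oplus\beta,\eta} = Q_{\kappa,z,\eta\oplus\beta}. \tag{11}$$
The variable $X_{\kappa,z,j}$ is determined by the pair $(P_j, C_j)$, the choice $\kappa$ of the target sub-key and the choice $z$ of the inner key bits. Recall that $C_j$ depends upon $K$ and hence upon $\kappa^*$, which implies that $X_{\kappa,z,j}$ also depends upon $\kappa^*$ through $C_j$. The randomness of $X_{\kappa,z,j}$ arises from the randomness in $P_j$ and also possibly from the randomness of the previous plaintexts $P_1,\ldots,P_{j-1}$. In fact, it depends on how $P_1,\ldots,P_N$ are sampled from $\{0,1\}^n$.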
Therefore $\Pr[X_{\kappa,z,j} = \eta]$ potentially depends upon the following quantities:
$z$: the choice of the inner key bits;
$p_{\kappa^*,z}(\eta)$ or $q_{\kappa^*,\kappa,z}(\eta)$: the probabilities of the linear approximations as given in (3) and (2);
$j$: the index determining the pair $(P_j, C_j)$. This models a general scenario which captures a possible dependence on the index $j$.
The dependence on $j$ will be determined by the joint distribution of the plaintexts $P_1,\ldots,P_N$. In the case that $P_1,\ldots,P_N$ are independent and uniformly distributed, $\Pr[X_{\kappa,z,j} = \eta]$ does not depend on $j$. On the other hand, suppose that $P_1,\ldots,P_N$ are sampled without replacement. In such a scenario, $\Pr[X_{\kappa,z,j} = \eta]$ does depend on $j$.
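The counts in (9) and the identities (10) and (11) can be checked mechanically. The sketch below assumes a hypothetical encoding of $\eta \in \{0,1\}^l$ as the integer $\sum_i \eta_i 2^{i-1}$, and uses randomly generated bit vectors as stand-ins for $X_{\kappa,0^l,j}$; it is meant to illustrate the bookkeeping only, not any particular cipher or sampling scheme.

```python
import random
from collections import Counter
from typing import List, Sequence


def pack_bits(bits: Sequence[int]) -> int:
    """Encode (b_1, ..., b_l) as sum_i b_i * 2^(i-1); an assumed encoding of {0,1}^l."""
    return sum(b << i for i, b in enumerate(bits))


def counts(x_zero: Sequence[int], z: int, l: int) -> List[int]:
    """Q_{kappa,z,eta} for every eta, following (8) and (9).

    x_zero[j] is the packed vector X_{kappa,0^l,j} = (L_{kappa,1,j}, ..., L_{kappa,l,j}),
    so that X_{kappa,z,j} = x_zero[j] XOR z.
    """
    tally = Counter(x ^ z for x in x_zero)
    return [tally[eta] for eta in range(1 << l)]


# Illustrative data only: N = 1000 random l-bit vectors standing in for X_{kappa,0^l,j}.
rng = random.Random(2024)
l, N = 3, 1000
x_zero = [pack_bits([rng.randrange(2) for _ in range(l)]) for _ in range(N)]

# (10): the counts always sum to N.  (11): Q_{z xor beta, eta} = Q_{z, eta xor beta}.
for z in range(1 << l):
    q_z = counts(x_zero, z, l)
    assert sum(q_z) == N
    for beta in range(1 << l):
        q_zb = counts(x_zero, z ^ beta, l)
        assert all(q_zb[eta] == q_z[eta ^ beta] for eta in range(1 << l))
```

Identity (11) holds for every data set, whatever the sampling scheme, since it is a purely combinatorial consequence of (8) and (9).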

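Test statistic for multidimensional linear cryptanalysis: For each choice $\kappa$ of the target sub-key and the inner key bits $z$, let $T_{\kappa,z} \equiv T(X_{\kappa,z,1},\ldots,X_{\kappa,z,N})$ denote the test statistic
$$T_{\kappa,z} = \sum_{\eta\in\{0,1\}^l} \frac{(Q_{\kappa,z,\eta} - N2^{-l})^2}{N2^{-l}}.$$
Then
$$\begin{aligned}
T_{\kappa,z\oplus\beta} &= \sum_{\eta\in\{0,1\}^l} \frac{(Q_{\kappa,z\oplus\beta,\eta} - N2^{-l})^2}{N2^{-l}} = \sum_{\eta\in\{0,1\}^l} \frac{(Q_{\kappa,z,\eta\oplus\beta} - N2^{-l})^2}{N2^{-l}} && \text{[by (11)]} \\
&= \sum_{\eta\oplus\beta\in\{0,1\}^l} \frac{(Q_{\kappa,z,\eta\oplus\beta} - N2^{-l})^2}{N2^{-l}} = \sum_{\eta\in\{0,1\}^l} \frac{(Q_{\kappa,z,\eta} - N2^{-l})^2}{N2^{-l}} = T_{\kappa,z}.
\end{aligned}$$
So $T_{\kappa,z}$ is independent of $z$. Therefore, it is sufficient to consider $z = 0^l$. To simplify notation, we will write $T_\kappa$ instead of $T_{\kappa,z}$ and $Q_{\kappa,\eta}$ instead of $Q_{\kappa,0^l,\eta}$. Therefore,
$$T_\kappa = \sum_{\eta\in\{0,1\}^l} \frac{(Q_{\kappa,\eta} - N2^{-l})^2}{N2^{-l}}. \tag{12}$$
There are $2^m$ choices of $\kappa$, which give rise to $2^m$ random variables $T_\kappa$. The distribution of $T_\kappa$ depends on whether $\kappa$ is correct or incorrect. For the statistical analysis of an attack, it is required to obtain the distribution of $T_\kappa$ under both correct and incorrect choices of the target sub-key. Later, we will consider this issue in more detail.

Remark: Recall that, since there is no randomness in $K^{(r-1)}$, the bits $z_i$ also have no randomness even though they are unknown. Therefore, the distribution of $L_{\kappa,i,j} \oplus z_i$ is completely determined by the distribution of $L_{\kappa,i,j}$.

Test statistic for multiple linear cryptanalysis: In this case, the linear approximations are assumed to be independent. As a result, it is possible to define a simpler test statistic. For each choice $\kappa$ of the target sub-key and inner key bits $z = (z_1,\ldots,z_l)$, let
$$Y_{\kappa,z,i,j} = L_{\kappa,i,j} \oplus z_i \quad\text{and}\quad Y_{\kappa,z,i} = \sum_{j=1}^{N} Y_{\kappa,z,i,j},$$
where $i = 1,\ldots,l$ and $j = 1,\ldots,N$. For $z = 0^l$, we simply write $Y_{\kappa,i,j}$ and $Y_{\kappa,i}$ instead of $Y_{\kappa,z,i,j}$ and $Y_{\kappa,z,i}$, respectively. Let $\beta = (\beta_1,\ldots,\beta_l)$. If $\beta_i = 0$, then $Y_{\kappa,z\oplus\beta,i} = Y_{\kappa,z,i}$; if $\beta_i = 1$, then $Y_{\kappa,z\oplus\beta,i,j} = 1 \oplus L_{\kappa,i,j} \oplus z_i$ and $Y_{\kappa,z\oplus\beta,i} = N - Y_{\kappa,z,i}$. Consequently, for any $\beta$,
$$(Y_{\kappa,z,i} - N/2)^2 = (Y_{\kappa,z\oplus\beta,i} - N/2)^2.$$
Let $T_{\kappa,z} \equiv T(X_{\kappa,z,1},\ldots,X_{\kappa,z,N})$ denote the test statistic
$$T_{\kappa,z} = \sum_{i=1}^{l} \frac{(Y_{\kappa,z,i} - N/2)^2}{N/4}.$$
For $\beta = (\beta_1,\ldots,\beta_l)$,
$$T_{\kappa,z\oplus\beta} = \sum_{i=1}^{l} \frac{(Y_{\kappa,z\oplus\beta,i} - N/2)^2}{N/4} = \sum_{i=1}^{l} \frac{(Y_{\kappa,z,i} - N/2)^2}{N/4} = T_{\kappa,z}.$$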

So, $T_{\kappa,z}$ is independent of $\beta$ and, as in the multidimensional case, it is sufficient to consider $z = 0^l$. We will write $T_\kappa$ instead of $T_{\kappa,0^l}$; this is defined as follows:
$$T_\kappa = \sum_{i=1}^{l} \frac{(Y_{\kappa,i} - N/2)^2}{N/4}. \tag{13}$$

Success probability: An attack will produce a set, or a list, of candidate values of the target sub-key. The attack is considered successful if the correct value $\kappa^*$ of the target sub-key is in the output set. The probability of this event is called the success probability of the attack.

Advantage: An attack is said to have advantage $a$ if the size of the set of candidate values of the target sub-key is equal to $2^{m-a}$. In other words, a fraction $2^{-a}$ of the $2^m$ possible values of the target sub-key is produced by the attack.

Data complexity: The number $N$ of plaintext-ciphertext pairs required for an attack is called the data complexity of the attack. Clearly, $N$ depends on the success probability $P_S$ and the advantage $a$. One of the goals of a statistical analysis is to obtain a closed form relation between $N$, $P_S$ and $a$.

Additional Notation

Capacity: Let $p = (p_0,\ldots,p_{2^l-1})$ be a probability distribution over $\{0,1\}^l$. The multidimensional capacity $C_{md}(p)$ is defined as
$$C_{md}(p) = 2^l \sum_{i=0}^{2^l-1} (p_i - 2^{-l})^2 = 2^l \sum_{i=0}^{2^l-1} \epsilon_i^2, \tag{14}$$
where $\epsilon_i = p_i - 2^{-l}$. When $p$ is clear from the context, we will simply write $C_{md}$ instead of $C_{md}(p)$. There is a corresponding notion [7] which is useful in the case of multiple linear cryptanalysis. Let $p = (p_1,\ldots,p_l)$ be such that $0 \leq p_i \leq 1$, $i = 1,\ldots,l$; then $C_m(p)$ is defined to be
$$C_m(p) = \sum_{i=1}^{l} 4(p_i - 1/2)^2 = \sum_{i=1}^{l} 4\epsilon_i^2, \tag{15}$$
where $\epsilon_i = p_i - 1/2$. When $p$ is clear from the context, we will simply write $C_m$ instead of $C_m(p)$.

Normal distribution: By $\mathcal{N}(\mu, \sigma^2)$ we denote the normal distribution with mean $\mu$ and variance $\sigma^2$. The density function of $\mathcal{N}(\mu,\sigma^2)$ will be denoted by $f(x;\mu,\sigma^2)$. The density function of the standard normal distribution will be denoted by $\phi(x)$, while its distribution function will be denoted by $\Phi(x)$.
Chi-squared distribution: The probability density function of a central chi-squared distribution with $\nu$ degrees of freedom will be denoted by $\chi^2_\nu(x)$ and its cumulative distribution function will be denoted by $\Psi_\nu(x)$. The density function of a non-central chi-squared distribution with $\nu$ degrees of freedom and non-centrality parameter $\delta$ will be denoted by $\chi^2_{\nu,\delta}(x)$ and its cumulative distribution function will be denoted by $\Psi_{\nu,\delta}(x)$.
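The test statistics (12) and (13) and the capacities (14) and (15) are simple functions of counts and probabilities. The following minimal sketch computes them, assuming the counts $Q_{\kappa,\eta}$ are supplied as a list indexed by the integer encoding of $\eta$, and $Y_{\kappa,1},\ldots,Y_{\kappa,l}$ as a list of $l$ integers; the function names and the example counts are hypothetical. It also checks, on made-up counts, the invariance $T_{\kappa,z\oplus\beta} = T_{\kappa,z}$ derived in this section.

```python
import math
from typing import Sequence


def t_multidimensional(q: Sequence[int]) -> float:
    """T_kappa of (12): sum_eta (Q_eta - N 2^{-l})^2 / (N 2^{-l}), with len(q) = 2^l, N = sum(q)."""
    expected = sum(q) / len(q)          # N * 2^{-l}
    return sum((qe - expected) ** 2 for qe in q) / expected


def t_multiple(y: Sequence[int], n: int) -> float:
    """T_kappa of (13): sum_i (Y_i - N/2)^2 / (N/4), where Y_i counts the j with L_{kappa,i,j} = 1."""
    return sum((yi - n / 2) ** 2 for yi in y) / (n / 4)


def capacity_md(p: Sequence[float]) -> float:
    """C_md(p) of (14): 2^l * sum_i (p_i - 2^{-l})^2 for a distribution p on {0,1}^l."""
    size = len(p)                       # 2^l
    return size * sum((pi - 1 / size) ** 2 for pi in p)


def capacity_m(p: Sequence[float]) -> float:
    """C_m(p) of (15): sum_i 4 (p_i - 1/2)^2 for marginal probabilities p_1, ..., p_l."""
    return sum(4 * (pi - 0.5) ** 2 for pi in p)


# Illustrative counts Q_{kappa,eta} for l = 2 and N = 500 (made-up numbers).
q = [140, 110, 130, 120]
# By (11), flipping one inner key bit (z = 0b01 in this encoding) only permutes the counts.
q_flipped = [q[eta ^ 0b01] for eta in range(4)]
assert math.isclose(t_multidimensional(q), t_multidimensional(q_flipped))
```

As a quick sanity check on (14), the uniform distribution has capacity 0, while a point mass on $\{0,1\}^l$ has $C_{md} = 2^l - 1$, the largest possible value, since $C_{md}(p) = 2^l \sum_i p_i^2 - 1$.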

3 Two Approaches for Deriving Success Probability

The test statistic for the multidimensional case is given in (12) and for the multiple case in (13). To obtain the success probability of an attack, it is required to obtain the corresponding distributions of $T_\kappa$ for the two scenarios $\kappa = \kappa^*$ and $\kappa \neq \kappa^*$. Suppose that the following holds:
$$T_{\kappa^*} \sim \mathcal{N}(\mu_0, \sigma_0^2); \qquad T_\kappa/\omega \sim \chi^2_\nu, \quad \kappa \neq \kappa^*, \tag{16}$$
where $\omega > 0$ is a constant. In this section, we consider the derivation of the success probability in terms of $\mu_0$, $\sigma_0^2$, $\nu$ and $\omega$. Later, we will see how to obtain $\mu_0$, $\sigma_0^2$, $\nu$ and $\omega$. In particular, we will see that $\mu_0 = \nu + \delta$, where $\delta$ depends on $N$, whereas $\nu$ depends on the number $l$ of linear approximations. Given (16), there are two approaches to deriving the success probability, which we discuss below.

3.1 Order Statistics Based Analysis

This approach is based on a ranking methodology used originally by Matsui [] and later formalised by Selçuk [8]. The idea is the following. There are $2^m$ random variables $T_\kappa$ corresponding to the $2^m$ possible values of the target sub-key. Suppose the variables are denoted as $T_0,\ldots,T_{2^m-1}$ and assume that $T_0$ corresponds to the correct choice $\kappa^*$ of the target sub-key. Let $T_{(1)},\ldots,T_{(2^m-1)}$ be the order statistics of $T_1,\ldots,T_{2^m-1}$, i.e., $T_{(1)},\ldots,T_{(2^m-1)}$ is the ascending order sort of $T_1,\ldots,T_{2^m-1}$. So, the event corresponding to a successful attack with $a$-bit advantage is $T_0 > T_{(2^m q)}$, where $q = 1 - 2^{-a}$. Using a well known result on order statistics, the distribution of $T_{(2^m q)}$ can be assumed to approximately follow $\mathcal{N}(\mu_q, \sigma_q^2)$, where
$$\mu_q = \Psi_\nu^{-1}(1 - 2^{-a}) \quad\text{and}\quad \sigma_q = \frac{2^{-(m+a)/2}\sqrt{1 - 2^{-a}}}{\chi^2_\nu(\mu_q)}.$$
For the asymptotic version of the result, refer to [3], and for a concrete error bound, refer to [5]. Further assuming that $T_0$ and $T_{(2^m q)}$ are independent, the success probability $P_S$ can be approximated in the following manner:
$$P_S = \Pr[T_0 > T_{(2^m q)}] = \Pr[T_0 - T_{(2^m q)} > 0] \approx \Phi\left(\frac{\mu_0 - \mu_q}{\sqrt{\sigma_0^2 + \sigma_q^2}}\right) = \Phi\left(\frac{\mu_0 - \Psi_\nu^{-1}(1 - 2^{-a})}{\sqrt{\sigma_0^2 + \sigma_q^2}}\right), \tag{17}$$
where $\mu_0 = E[T_0] = E[T_{\kappa^*}] = \nu + \delta$ and $\sigma_0^2 = E[(T_0 - \mu_0)^2] = E[(T_{\kappa^*} - \mu_0)^2] = 2(\nu + 2\delta)$.
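Evaluating (17) numerically needs $\Psi_\nu$, $\Psi_\nu^{-1}$ and the density $\chi^2_\nu(\cdot)$. The sketch below implements them with the Python standard library only: $\Psi_\nu(x)$ is computed as the regularised incomplete gamma function $P(\nu/2, x/2)$ (power series or continued fraction), and the inverse by bisection. This numerical method is our own choice for illustration, not something prescribed by the text; in practice a library routine such as SciPy's `scipy.stats.chi2` would normally be used. The function names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist


def gamma_p(s: float, y: float) -> float:
    """Regularised lower incomplete gamma P(s, y): series if y < s + 1, else continued fraction."""
    if y <= 0.0:
        return 0.0
    log_prefactor = s * math.log(y) - y - math.lgamma(s)
    if y < s + 1.0:
        term = total = 1.0 / s
        k = 0
        while term > 1e-16 * total:
            k += 1
            term *= y / (s + k)
            total += term
        return total * math.exp(log_prefactor)
    tiny, b = 1e-300, y + 1.0 - s          # modified Lentz method for Q(s, y) = 1 - P(s, y)
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        h *= d * c
        if abs(d * c - 1.0) < 1e-15:
            break
    return 1.0 - h * math.exp(log_prefactor)


def chi2_cdf(x: float, nu: float) -> float:
    """Psi_nu(x) = P(nu/2, x/2)."""
    return gamma_p(nu / 2.0, x / 2.0)


def chi2_pdf(x: float, nu: float) -> float:
    """The central chi-squared density chi^2_nu(x)."""
    if x <= 0.0:
        return 0.0
    half = nu / 2.0
    return math.exp((half - 1.0) * math.log(x) - x / 2.0 - half * math.log(2.0) - math.lgamma(half))


def chi2_ppf(prob: float, nu: float) -> float:
    """Psi_nu^{-1}(prob) by bracketing and bisection, for 0 < prob < 1."""
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must lie strictly between 0 and 1")
    lo, hi = 0.0, max(1.0, 2.0 * nu)
    while chi2_cdf(hi, nu) < prob:
        lo, hi = hi, 2.0 * hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if chi2_cdf(mid, nu) < prob else (lo, mid)
    return 0.5 * (lo + hi)


def success_prob_order_stats(mu0: float, sigma0_sq: float, nu: float, m: int, a: float) -> float:
    """P_S from (17): Phi((mu0 - mu_q) / sqrt(sigma0^2 + sigma_q^2))."""
    mu_q = chi2_ppf(1.0 - 2.0 ** (-a), nu)
    sigma_q = 2.0 ** (-(m + a) / 2.0) * math.sqrt(1.0 - 2.0 ** (-a)) / chi2_pdf(mu_q, nu)
    return NormalDist().cdf((mu0 - mu_q) / math.sqrt(sigma0_sq + sigma_q ** 2))


# Illustrative values only, with mu0 = nu + delta and sigma0^2 = 2(nu + 2 delta) as stated after (17).
nu, delta = 255, 60.0
print(success_prob_order_stats(nu + delta, 2 * (nu + 2 * delta), nu, m=32, a=10))
```

For $\nu = 2$ everything is available in closed form ($\Psi_2(x) = 1 - e^{-x/2}$), which gives a convenient way to cross-check such a sketch.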
Some criticisms: The order statistics based approach is crucially dependent on the normal approximation of the distribution of the order statistics. A key observation is that the order statistics result is applied to $2^m$ random variables and, for the result to be applicable even in an asymptotic context, it is necessary that $2^m$ is sufficiently large. In [5], a close analysis of the hypothesis of the theorem and of the error bound in the concrete setting showed that both $m$ and $m - a$ must be large. In particular, to ensure that the approximation error is at most around $10^{-3}$, it is required that $m - a$ should be at least around 20 bits. Since $a$ is the advantage of the attack, the applicability of the order statistics based analysis to attacks with high advantage is not clear. For the analysis to be meaningful, one needs to make two further independence assumptions which were implicitly used by Selçuk in [8]. This issue has been pointed out in [7].

11 3 TWO APPROACHE FOR DERIVING UCCE PROBABILITY. The hypothesis of the result on the normal approximation of order statistics requires the random variables T, T,..., T m to be independent and identically distributed. The randomness of all of these random variables arise from the randomness of P,..., P N and so these random variables are certainly not independent. As a result, the independence of these random variables is a heuristic assumption.. It is assumed that T 0 T m q follows a normal distribution. A sufficient condition for T 0 T m q to follow a normal distribution is that T 0 and T m q are independent normal variates. ince the randomness of both T 0 and T m q arise from the randomness in P,..., P N, they are clearly not independent. As a result, the assumption that T 0 T m q follows a normal distribution is also a heuristic assumption. The net effect of the above two assumptions is that the test statistics corresponding to different choices of the sub-key are independent. 3. Hypothesis Testing Based Analysis tatistical hypothesis testing for analysing block cipher cryptanalysis was carried out in [] in the context of distinguishing attacks. For analysing linear cryptanalysis based key recovery attacks, the hypothesis testing based approach was used in [5] as a method for overcoming some of the theoretical limitations of the order statistics based analysis. ubsequently, hypothesis testing based approach for analysing key recovery attacks in the context of key dependent assumptions was performed in [7]. The idea of the hypothesis testing based approach is simple and intuitive. For each choice κ of the target sub-key, let H 0 be the null hypothesis that κ is correct and H be the alternative hypothesis that κ is incorrect. The test statistic T κ is used to test H 0 against H where the distributions of T κ are as in 6 for both κ = κ and κ κ. From 6, we get E[T κ ] = µ 0 and E[T κ ] = ν. Later on we will see that µ 0 = ν + δ, where δ > 0 is a constant. 
Since $E[T_{\kappa^*}] = \mu_0 > \nu = E[T_\kappa]$, the following hypothesis test is considered.
$$\left.\begin{array}{l} H_0: \kappa \text{ is correct}; \quad \text{versus} \quad H_1: \kappa \text{ is incorrect}.\\ \text{Decision rule: Reject } H_0 \text{ if } T_\kappa \le t. \end{array}\right\} \tag{18}$$
Here $t$ is a threshold whose exact value is determined by the desired success probability and advantage. The idea of the test is the following. The mean $\mu_0$ under $H_0$ is greater than the mean $\nu$ under $H_1$, so if the value of the test statistic is at most a certain threshold, it is guessed that $H_0$ does not hold. Such a hypothesis test gives rise to two kinds of errors: $H_0$ is rejected when it holds, which is called the Type-1 error; and $H_0$ is accepted when it does not hold, which is called the Type-2 error. If a Type-1 error occurs, then $\kappa = \kappa^*$ is the correct value of the target sub-key but the test rejects it, and so the attack fails to recover the correct value. The attack is successful if and only if a Type-1 error does not occur. So, the success probability is $P_S = 1 - \Pr[\text{Type-1 error}]$. On the other hand, for every Type-2 error, an incorrect value of $\kappa$ gets labelled as a candidate key. As a result, the number of Type-2 errors that occur is the size of the list of candidate keys.

Theorem. Let $\kappa^* \in \{0,1\}^m$. For $\kappa \in \{0,1\}^m$, let $T_\kappa$ be $2^m$ random variables, where $T_{\kappa^*} \sim \mathcal{N}(\mu_0, \sigma_0^2)$ and, for $\kappa \neq \kappa^*$, $T_\kappa/\omega \sim \chi^2_\nu$ for some constant $\omega > 0$. Suppose the hypothesis test given in (18) is applied to $T_\kappa$ for all $\kappa \in \{0,1\}^m$. Let $P_S = 1 - \Pr[\text{Type-1 error}]$ and let the expected number of Type-2 errors be $2^{m-a}$. Then
$$P_S = \Phi\left(\frac{\mu_0 - \omega\gamma}{\sigma_0}\right), \tag{19}$$
where $\gamma = \Psi_\nu^{-1}\left(1 - \frac{2^{m-a}}{2^m - 1}\right)$.

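Proof. Let $\alpha = \Pr[\text{Type-1 error}]$ and $\beta = \Pr[\text{Type-2 error}]$, so that $P_S = 1 - \alpha$. For each $\kappa \neq \kappa^*$, let $Z_\kappa$ be a binary valued random variable which takes the value 1 if and only if a Type-2 error occurs for $\kappa$. So, $\Pr[Z_\kappa = 1] = \beta$. The size of the list of candidate keys returned by the test is $\sum_{\kappa \neq \kappa^*} Z_\kappa$, and so the expected size of the list of candidate keys is
$$E\Big[\sum_{\kappa \neq \kappa^*} Z_\kappa\Big] = \sum_{\kappa \neq \kappa^*} E[Z_\kappa] = \sum_{\kappa \neq \kappa^*} \Pr[Z_\kappa = 1] = (2^m - 1)\beta. \tag{20}$$
The expected number of Type-2 errors is $2^{m-a}$. So,
$$\beta = \frac{2^{m-a}}{2^m - 1}. \tag{21}$$
The Type-1 and Type-2 error probabilities are calculated as follows.
$$\alpha = \Pr[\text{Type-1 error}] = \Pr[T_{\kappa^*} \le t \mid H_0 \text{ holds}] = \Pr[T_{\kappa^*} \le t] = \Phi\left(\frac{t - \mu_0}{\sigma_0}\right); \tag{22}$$
$$\beta = \Pr[\text{Type-2 error}] = \Pr[T_\kappa > t \mid H_1 \text{ holds}] = \Pr[T_\kappa/\omega > t/\omega \mid H_1 \text{ holds}] = 1 - \Psi_\nu(t/\omega). \tag{23}$$
Using $\beta = 2^{m-a}/(2^m - 1)$ in (23), we obtain
$$t = \omega \Psi_\nu^{-1}\left(1 - \frac{2^{m-a}}{2^m - 1}\right) = \omega\gamma. \tag{24}$$
Substituting $t$ in (22) and noting that $P_S = 1 - \alpha$, we obtain
$$P_S = 1 - \Phi\left(\frac{\omega\gamma - \mu_0}{\sigma_0}\right) = \Phi\left(\frac{\mu_0 - \omega\gamma}{\sigma_0}\right),$$
which is (19).

Remarks:
1. Note that $\gamma = \Psi_\nu^{-1}\left(1 - 2^{m-a}/(2^m - 1)\right) \ge 0$.
2. The computation in (20) does not require the $Z_\kappa$'s or the $T_\kappa$'s to be independent.
3. The theoretical limitations of the order statistics based analysis, namely, that $m$ and $m-a$ must be large and the heuristic assumption that the $T_\kappa$'s are independent, are not present in the hypothesis testing based analysis.
4. Comparing (19) to (17), we find that the two expressions are equal under the following three assumptions:
(a) $2^m/(2^m - 1) \approx 1$: this holds for moderately large values of $m$, but is not valid for small values of $m$.
(b) $\sigma_0 \gg \sigma_q$: this assumption was used in [8].
(c) $\omega \approx 1$.

In the rest of the work, we will use (19) as the expression for the success probability.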
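To make the threshold computation in the proof concrete, the following is a minimal sketch of how $P_S = \Phi((\mu_0 - \omega\gamma)/\sigma_0)$ with $\gamma = \Psi_\nu^{-1}(1 - 2^{m-a}/(2^m-1))$ might be evaluated numerically. It assumes only the Python standard library: $\Psi_\nu$ (the $\chi^2_\nu$ distribution function) is evaluated through the power series of the regularised incomplete gamma function, and its inverse is found by bisection. The helper names `chi2_cdf`, `chi2_ppf` and `success_probability` are hypothetical, and the parameter values in the usage line are purely illustrative, not taken from any attack.

```python
import math
from statistics import NormalDist


def chi2_cdf(x: float, nu: float) -> float:
    """Psi_nu(x): CDF of the central chi-squared distribution with nu degrees of freedom.

    Evaluates the regularised lower incomplete gamma function P(nu/2, x/2) via its
    power series; adequate for the moderate arguments used in this sketch.
    """
    if x <= 0.0:
        return 0.0
    s, y = nu / 2.0, x / 2.0
    term = total = 1.0
    k = 0
    # Sum y^k / ((s+1)(s+2)...(s+k)) until the terms become negligible.
    while term > 1e-17 * total and k < 100_000:
        k += 1
        term *= y / (s + k)
        total += term
    log_prefactor = s * math.log(y) - y - math.lgamma(s + 1.0)
    return min(1.0, math.exp(log_prefactor) * total)


def chi2_ppf(p: float, nu: float) -> float:
    """Psi_nu^{-1}(p), found by bisection on chi2_cdf (requires 0 < p < 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    lo, hi = 0.0, nu + 10.0 * math.sqrt(2.0 * nu) + 10.0
    while chi2_cdf(hi, nu) < p:  # widen the bracket if needed
        lo, hi = hi, 2.0 * hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def success_probability(mu0: float, sigma0: float, omega: float,
                        nu: float, m: int, a: float) -> float:
    """P_S = Phi((mu0 - omega*gamma)/sigma0) with gamma = Psi_nu^{-1}(1 - beta)."""
    beta = 2.0 ** (m - a) / (2.0 ** m - 1.0)  # Type-2 error probability
    gamma = chi2_ppf(1.0 - beta, nu)          # threshold t = omega * gamma
    return NormalDist().cdf((mu0 - omega * gamma) / sigma0)


if __name__ == "__main__":
    # Illustrative numbers only: nu = 255, omega = 1, m = 32, advantage a = 10.
    print(success_probability(mu0=300.0, sigma0=25.0, omega=1.0, nu=255, m=32, a=10))
```

Bisection is used instead of a library quantile function only to keep the sketch dependency-free; with SciPy available, `scipy.stats.chi2.ppf` would serve the same purpose. Increasing the advantage $a$ shrinks $\beta$, pushes the threshold $\omega\gamma$ up and therefore lowers $P_S$, as (19) predicts.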

4 General Key Randomisation Hypotheses

At this point it is important to make the distinction between multiple and multidimensional linear cryptanalysis as it appears in the literature. Multiple linear cryptanalysis [4] refers to linear attacks using $\ell$ linear approximations, where the linear approximations are assumed to be statistically independent. In multidimensional linear cryptanalysis [6], on the other hand, the attacker exploits all linear approximations with linear masks $(\Gamma_P, \Gamma_B) \neq (0, 0)$ in a linear space. In other words, in multidimensional linear cryptanalysis the linear approximations are not assumed to be statistically independent. Therefore, in the case of multidimensional linear cryptanalysis the attacker works with the joint distribution of the $\ell$ linear approximations, whereas in the case of multiple linear cryptanalysis the attacker works with the marginal distributions.

Recall the definitions of $q_{\kappa^*,\kappa}(\eta)$ and $p_{\kappa^*}(\eta)$ from (5). The corresponding biases are $\epsilon_{\kappa^*,\kappa}(\eta)$ and $\epsilon_{\kappa^*}(\eta)$. For obtaining the distributions of $T_{\kappa^*}$ and $T_\kappa$, $\kappa \neq \kappa^*$, it is required to hypothesise the behaviour of $p_{\kappa^*}(\eta)$ and $q_{\kappa^*,\kappa}(\eta)$, respectively.

4.1 General Multidimensional Key Randomisation Hypotheses

The two standard multidimensional key randomisation hypotheses are the following.

Standard multidimensional right key randomisation hypothesis: For every choice of $\kappa^*$, $p_{\kappa^*}(\eta) = p(\eta)$, such that $0 < p(\eta) < 1$ and $\sum_{\eta \in \{0,1\}^\ell} p(\eta) = 1$.

Standard multidimensional wrong key randomisation hypothesis: For every choice of $\kappa^*$ and $\kappa \neq \kappa^*$, $q_{\kappa^*,\kappa}(\eta) = 2^{-\ell}$ for all $\eta \in \{0,1\}^\ell$.

The standard wrong key randomisation hypothesis for $\ell = 1$ was formally considered in [5] and later generalised to $\ell > 1$ in [6]. Based on the work in [4], the standard wrong key randomisation hypothesis was modified for $\ell = 1$ in [] and for $\ell > 1$ in [7]. An earlier version [6] of [7] uses the following formulation.

Adjusted multidimensional wrong key randomisation hypothesis: For each $\kappa \neq \kappa^*$ and $\eta \in \{0,1\}^\ell$, $q_{\kappa^*,\kappa}(\eta) \sim \mathcal{N}\left(2^{-\ell}, 2^{-(n+\ell)}\right)$, and $q_{\kappa^*,\kappa}(0), \ldots, q_{\kappa^*,\kappa}(2^\ell - 1)$ are independent.

Remarks:
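1. In this hypothesis, there is no explicit dependence of the bias on either $\kappa^*$ or $\kappa$.
2. As $q_{\kappa^*,\kappa}(\eta)$ is a probability, $0 \le q_{\kappa^*,\kappa}(\eta) \le 1$. On the other hand, a random variable following a normal distribution can take any real value. So, the above hypothesis may lead to $q_{\kappa^*,\kappa}(\eta)$ taking a value outside the range $[0, 1]$, which is not meaningful. As a result, the adjusted wrong key randomisation hypothesis must necessarily be considered to be a heuristic assumption.
3. The probability that $q_{\kappa^*,\kappa}(\eta)$ takes values outside of $[0, 1]$ can be bounded as follows. Writing $q = q_{\kappa^*,\kappa}(\eta)$ for brevity,
$$\begin{aligned}
\Pr[q < 0 \text{ or } q > 1] &= \Pr[q < 0] + \Pr[q > 1] \\
&= \Pr[q - 2^{-\ell} < -2^{-\ell}] + \Pr[q - 2^{-\ell} > 1 - 2^{-\ell}] \\
&\le \Pr[|q - 2^{-\ell}| > 2^{-\ell}] + \Pr[|q - 2^{-\ell}| > 1 - 2^{-\ell}] \\
&\le \frac{2^{-(n+\ell)}}{2^{-2\ell}} + \frac{2^{-(n+\ell)}}{(1 - 2^{-\ell})^2} \qquad \text{[by Chebyshev's inequality]} \\
&= \frac{1}{2^{n-\ell}}\left(1 + \frac{1}{(2^\ell - 1)^2}\right).
\end{aligned}$$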

Hence, $q_{\kappa^*,\kappa}(\eta)$ takes values outside $[0, 1]$ with exponentially low probability, provided that $n - \ell$ is large; if $n - \ell$ is not too large, then the probability is not negligible.

Modification of the right key randomisation hypothesis was first considered in [7] in the context of multidimensional linear cryptanalysis. In [7], a theorem of [3] was taken as the right key hypothesis, i.e., it was assumed that even for the right choice of the target sub-key, the probability of a linear approximation follows a normal distribution. This assumption was later used in [7] and can be stated as follows.

Adjusted multidimensional right key randomisation hypothesis: For all $\eta \in \{0,1\}^\ell$, $p_{\kappa^*}(\eta) \sim \mathcal{N}(p(\eta), \sigma^2)$, where $0 < p(\eta) < 1$ is a constant such that $\sum_{\eta \in \{0,1\}^\ell} p(\eta) = 1$, and each subset of $2^\ell - 1$ random variables out of the $2^\ell$ possible random variables $p_{\kappa^*}(\eta)$ is independent and determines the remaining random variable uniquely.

Remarks:
1. The first two remarks for the adjusted multidimensional wrong key randomisation hypothesis also hold for the adjusted multidimensional right key randomisation hypothesis.
2. Since the form of $\sigma^2$ is not given, nothing can be said about the probability that $p_{\kappa^*}(\eta)$ lies outside $[0, 1]$.
3. The random variables $p_{\kappa^*}(0), \ldots, p_{\kappa^*}(2^\ell - 1)$ are not assumed to be independent. Moreover, while the marginals are assumed to follow normal distributions, no assumption is made on the joint distribution. Normality of the marginals does not imply that the joint distribution is also normal.
4. The assumption that each possible subset of $2^\ell - 1$ random variables out of the $2^\ell$ possible random variables $p_{\kappa^*}(\eta)$ is independent is a heuristic assumption. The rationale for this assumption is perhaps to justify that the distribution of the test statistic under the right key follows a non-central chi-squared distribution. This assumption, however, is not sufficient for this purpose, as we discuss later.
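Returning to Remark 3 on the adjusted multidimensional wrong key randomisation hypothesis, the sketch below compares the exact probability that $q \sim \mathcal{N}(2^{-\ell}, 2^{-(n+\ell)})$ falls outside $[0,1]$ with the Chebyshev bound $2^{-(n-\ell)}\left(1 + 1/(2^\ell-1)^2\right)$ derived there. It is a minimal illustration using only Python's `statistics.NormalDist`; the function names are hypothetical and the $(n, \ell)$ pairs are arbitrary examples (the bound needs $\ell \ge 1$).

```python
from statistics import NormalDist


def prob_outside_unit_interval(n: int, ell: int) -> float:
    """Exact Pr[q < 0 or q > 1] for q ~ N(2^-ell, 2^-(n+ell))."""
    q = NormalDist(mu=2.0 ** -ell, sigma=2.0 ** (-(n + ell) / 2.0))
    return q.cdf(0.0) + (1.0 - q.cdf(1.0))


def chebyshev_bound(n: int, ell: int) -> float:
    """Chebyshev bound (1/2^(n-ell)) * (1 + 1/(2^ell - 1)^2); requires ell >= 1."""
    return 2.0 ** (ell - n) * (1.0 + 1.0 / (2.0 ** ell - 1.0) ** 2)


if __name__ == "__main__":
    # Small n - ell gives a non-negligible probability; large n - ell does not.
    for n, ell in [(12, 10), (16, 8), (32, 8), (64, 8)]:
        print(n, ell, prob_outside_unit_interval(n, ell), chebyshev_bound(n, ell))
```

For $n - \ell = 2$ the exact probability is already above 2%, while for large $n - \ell$ both quantities are tiny; Chebyshev's inequality is loose, but it captures the $2^{-(n-\ell)}$ scaling.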
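Let $C$ be the expected value of $2^\ell \sum_{\eta \in \{0,1\}^\ell} \left(p_{\kappa^*}(\eta) - 2^{-\ell}\right)^2$, i.e.,
$$C = 2^\ell \sum_{\eta \in \{0,1\}^\ell} E\left[\left(p_{\kappa^*}(\eta) - 2^{-\ell}\right)^2\right]. \tag{25}$$
In [7], the value of $\sigma^2$ in the adjusted right key randomisation hypothesis is expressed in terms of $C$ and the capacity $C_{md} = 2^\ell \sum_{\eta} \left(p(\eta) - 2^{-\ell}\right)^2$ in the following manner. Since $E[p_{\kappa^*}(\eta) - p(\eta)] = 0$, the cross terms vanish and
$$\begin{aligned}
C &= 2^\ell \sum_{\eta \in \{0,1\}^\ell} E\left[\left(p_{\kappa^*}(\eta) - p(\eta)\right)^2 + \left(p(\eta) - 2^{-\ell}\right)^2 + 2\left(p(\eta) - 2^{-\ell}\right)\left(p_{\kappa^*}(\eta) - p(\eta)\right)\right] \\
&= 2^{2\ell}\sigma^2 + C_{md},
\end{aligned}$$
so that
$$\sigma^2 = \frac{C - C_{md}}{2^{2\ell}}. \tag{26}$$
Motivated by the description of the standard and adjusted right and wrong key randomisation hypotheses in [7], we formulate the following general multidimensional key randomisation hypotheses for both the right and the wrong key.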
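As an aside, the decomposition $C = 2^{2\ell}\sigma^2 + C_{md}$ and its inversion $\sigma^2 = (C - C_{md})/2^{2\ell}$ can be checked numerically. The sketch below is a hypothetical illustration, not code from [7]: it computes $C_{md}$ for a distribution $p$ on $\{0,1\}^\ell$ (given as a list of length $2^\ell$), evaluates the closed form, and compares it with a seeded Monte Carlo estimate. Because $C$ is a sum of per-$\eta$ expectations, only the marginals matter, so the sketch draws each $p_{\kappa^*}(\eta)$ independently purely for convenience; it does not enforce $\sum_\eta p_{\kappa^*}(\eta) = 1$.

```python
import math
import random


def capacity(p: list[float]) -> float:
    """C_md = 2^ell * sum_eta (p(eta) - 2^-ell)^2, where len(p) = 2^ell."""
    size = len(p)
    return size * sum((pe - 1.0 / size) ** 2 for pe in p)


def expected_C(p: list[float], sigma2: float) -> float:
    """Closed form C = 2^(2 ell) * sigma^2 + C_md."""
    size = len(p)
    return size * size * sigma2 + capacity(p)


def sigma2_from_C(C: float, C_md: float, ell: int) -> float:
    """Inverse relation sigma^2 = (C - C_md) / 2^(2 ell)."""
    return (C - C_md) / 2.0 ** (2 * ell)


def monte_carlo_C(p: list[float], sigma2: float,
                  trials: int = 100_000, seed: int = 1) -> float:
    """Estimate C by drawing each p_{kappa*}(eta) ~ N(p(eta), sigma2) independently."""
    rng = random.Random(seed)
    size = len(p)
    sigma = math.sqrt(sigma2)
    acc = 0.0
    for _ in range(trials):
        acc += size * sum((rng.gauss(pe, sigma) - 1.0 / size) ** 2 for pe in p)
    return acc / trials


if __name__ == "__main__":
    p = [0.3, 0.25, 0.25, 0.2]  # illustrative distribution with ell = 2
    print(capacity(p), expected_C(p, 1e-4), monte_carlo_C(p, 1e-4))
```

With these illustrative values, $C_{md} = 0.02$ and $C = 16 \cdot 10^{-4} + 0.02 = 0.0216$; the Monte Carlo estimate should land close to that value.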

General multidimensional right key randomisation hypothesis: For all $\eta \in \{0,1\}^\ell$, $p_{\kappa^*}(\eta) \sim \mathcal{N}(p(\eta), s_0^2)$, where $0 < p(\eta) < 1$ is a constant such that $\sum_{\eta \in \{0,1\}^\ell} p(\eta) = 1$, and each subset of $2^\ell - 1$ random variables out of the $2^\ell$ possible random variables $p_{\kappa^*}(\eta)$ is independent and determines the remaining random variable uniquely. Further, $s_0^2 \le 1/2^n$.

General multidimensional wrong key randomisation hypothesis: For each $\kappa \neq \kappa^*$ and $\eta \in \{0,1\}^\ell$, $q_{\kappa^*,\kappa}(\eta) \sim \mathcal{N}(2^{-\ell}, s_1^2)$, where $s_1^2 \le \frac{1}{2^\ell\, 2^n}$; and $q_{\kappa^*,\kappa}(0), \ldots, q_{\kappa^*,\kappa}(2^\ell - 1)$ are independent.

The heuristic nature of the adjusted right and wrong key hypotheses discussed earlier also holds for the general hypotheses.
1. As $s_1^2 \to 0$, the random variable $q_{\kappa^*,\kappa}(\eta)$ becomes degenerate and takes the value $2^{-\ell}$. In this case, the general multidimensional wrong key randomisation hypothesis becomes the standard multidimensional wrong key randomisation hypothesis.
2. For $s_1^2 = 2^{-(n+\ell)}$, the general multidimensional wrong key randomisation hypothesis becomes the adjusted multidimensional wrong key randomisation hypothesis.
3. As $s_0^2 \to 0$, the general multidimensional right key randomisation hypothesis reduces to the standard multidimensional right key randomisation hypothesis.
4. For $s_0^2 = \sigma^2$, the general multidimensional right key randomisation hypothesis becomes the adjusted multidimensional right key randomisation hypothesis.

4.2 General Multiple Key Randomisation Hypotheses

For a single linear approximation, the standard/adjusted/general wrong and right key randomisation hypotheses have been proposed in the literature [5,, 7]. The extension to multiple linear cryptanalysis essentially amounts to extending these hypotheses to several independent linear approximations. This requires making assumptions on $p_{\kappa^*,i}$ and $q_{\kappa^*,\kappa,i}$ given by (6). The standard multiple right and wrong key randomisation hypotheses were first considered in [4] and can be stated as follows.

Standard multiple right key randomisation hypothesis: For each choice of $\kappa^*$ and for $i = 1, \ldots, \ell$, $p_{\kappa^*,i} = p_i$ with $0 < p_i < 1$.
Standard multiple wrong key randomisation hypothesis: For each choice of $\kappa^*$ and $\kappa \neq \kappa^*$, and for $i = 1, \ldots, \ell$, $q_{\kappa^*,\kappa,i} = 1/2$.

Based on [4], the multiple wrong key randomisation hypothesis was modified in [6], which is an earlier version of [7], in the following manner.

Adjusted multiple wrong key randomisation hypothesis: For each $\kappa \neq \kappa^*$ and for $i = 1, \ldots, \ell$, $q_{\kappa^*,\kappa,i} \overset{\text{i.i.d.}}{\sim} \mathcal{N}\left(1/2, 1/2^{n+2}\right)$.

Remarks: The remarks given below are essentially extensions of similar comments given in [7] in the context of a single linear approximation.
1. There is no explicit dependence of the bias on either $\kappa^*$ or $\kappa$.
2. As $q_{\kappa^*,\kappa,i}$ is a probability, it takes values in $[0, 1]$. On the other hand, a random variable following a normal distribution can take any real value. So, as in the multidimensional case, the above hypothesis may lead to $q_{\kappa^*,\kappa,i}$ taking a value outside the range $[0, 1]$, which is not meaningful. Hence, the adjusted wrong key randomisation hypothesis must necessarily be considered to be a heuristic assumption.

3. The variance $1/2^{n+2}$ is an exponentially decreasing function of $n$, and by Chebyshev's inequality
$$\Pr\left[\,\left|q_{\kappa^*,\kappa,i} - 1/2\right| > 1/2\,\right] \le \frac{4}{2^{n+2}} = \frac{1}{2^n}.$$
In other words, $q_{\kappa^*,\kappa,i}$ takes values outside $[0, 1]$ with exponentially low probability.

Modification of the standard right key randomisation hypothesis in the context of multiple linear approximations was considered in [7]. The formulation given below follows [6].

Adjusted multiple right key randomisation hypothesis: For all $\kappa^*$ and for $i = 1, \ldots, \ell$, $p_{\kappa^*,i} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(p_i, \sigma^2)$.

Remarks: The first two remarks for the adjusted multiple wrong key randomisation hypothesis also hold in this case. As the mathematical form of $\sigma^2$ is not given, nothing can be said about the probability that a particular $p_{\kappa^*,i}$ lies outside $[0, 1]$.

Motivated by the description of the standard and adjusted right and wrong key randomisation hypotheses in [7], we formulate the following general multiple key randomisation hypotheses for both the right and the wrong key.

General multiple right key randomisation hypothesis: For all $\kappa^*$ and for $i = 1, \ldots, \ell$, $p_{\kappa^*,i} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(p_i, s_0^2)$, where $p_i \in [0, 1]$ and $s_0^2 \le 1/2^n$.

General multiple wrong key randomisation hypothesis: For all $\kappa^*$ and $\kappa \neq \kappa^*$, and for $i = 1, \ldots, \ell$, $q_{\kappa^*,\kappa,i} \overset{\text{i.i.d.}}{\sim} \mathcal{N}(1/2, s_1^2)$, where $s_1^2 \le 1/2^n$.

The heuristic nature of the adjusted right and wrong key hypotheses discussed earlier also holds for the general hypotheses. We note the following.
1. As $s_0^2 \to 0$, the random variable $p_{\kappa^*,i}$ becomes degenerate and takes the value of the constant $p_i$. In this case, the general multiple right key randomisation hypothesis becomes the standard multiple right key randomisation hypothesis.
2. For $s_0^2 = \sigma^2$, the general multiple right key randomisation hypothesis becomes the adjusted multiple right key randomisation hypothesis.
3. As $s_1^2 \to 0$, the random variable $q_{\kappa^*,\kappa,i}$ becomes degenerate and takes the value $1/2$. In this case, the general multiple wrong key randomisation hypothesis becomes the standard multiple wrong key randomisation hypothesis.
4. For $s_1^2 = 1/2^{n+2}$, the general multiple wrong key randomisation hypothesis becomes the adjusted multiple wrong key randomisation hypothesis.

4.3 Differences with the Formulation of the Various Hypotheses in [7]

We have postulated the various hypotheses as conditions on $p_{\kappa^*}$ and $q_{\kappa^*,\kappa}$ given by (5) in the case of multidimensional linear cryptanalysis, and as conditions on $p_{\kappa^*,i}$ and $q_{\kappa^*,\kappa,i}$ given by (6) in the case of multiple linear cryptanalysis. This follows the approach taken in an earlier version [6] of [7]. The hypotheses in the published version [7] are of the following types.
1. For the multidimensional case, the adjusted right key randomisation hypothesis is formulated as an assumption on $p_{\kappa^*}$, as in the earlier version [6], while the adjusted wrong key randomisation hypothesis is formulated as a normality assumption on $Q_{\kappa,\eta}$ with mean $N/2^\ell$.
2. For the multiple case, the adjusted right key randomisation hypothesis is formulated as a normality assumption on $Y_{\kappa^*,i}$, while the adjusted wrong key randomisation hypothesis is formulated as a normality assumption on $Y_{\kappa,i}$.

So, in [7], out of four cases, in one case the assumption is on the underlying probability while in the other three cases the assumptions are on derived random variables. In our opinion, if one follows the work in [4], then the assumptions should be on the underlying probabilities rather than on derived random variables. That is why we have chosen to state the hypotheses as formulated in [6]. We emphasise that the general formulation that we present here and the detailed consideration of the heuristic nature of these hypotheses appear neither in [6] nor in [7].

5 Heuristic Distributions of the Test Statistics

The form of the test statistic $T_\kappa$ for multidimensional linear cryptanalysis has been given earlier, and the form for multiple linear cryptanalysis is given by (13). As outlined in Section 3, to obtain the success probability it is required to obtain the distributions of $T_\kappa$ for both the right and wrong choices of $\kappa$. In the case of multidimensional linear cryptanalysis, $T_\kappa$ is defined from the $Q_{\kappa,\eta}$'s, and so to obtain the distribution of $T_\kappa$ it is required to obtain the distribution of $Q_\kappa = (Q_{\kappa,0}, \ldots, Q_{\kappa,2^\ell-1})$. Similarly, in the case of multiple linear cryptanalysis, $T_\kappa$ is defined from the $Y_{\kappa,i}$'s, and to obtain the distribution of $T_\kappa$ it is required to obtain the distribution of $(Y_{\kappa,1}, \ldots, Y_{\kappa,\ell})$. The derivations of the distributions of $T_\kappa$ under the various settings are heuristic and provide only a rough approximation, where it is hard to estimate the error in approximation. We explain this issue in the context of multidimensional linear cryptanalysis where sampling with replacement is used, but similar considerations hold in the other settings. In the setting of multidimensional linear cryptanalysis, $T_\kappa$ is defined from the random vector $Q_\kappa = (Q_{\kappa,0}, \ldots, Q_{\kappa,2^\ell-1})$, where the $Q_{\kappa,\eta}$'s are defined as in (9) and satisfy the condition given in (10).
For sampling with replacement, $Q_{\kappa^*}$ follows a multinomial distribution and $Q_{\kappa^*,\eta}$ follows $\mathrm{Bin}(N, p_{\kappa^*}(\eta))$, where $p_{\kappa^*}(\eta)$ is heuristically assumed to follow a normal distribution. The $p_{\kappa^*}(\eta)$'s are not assumed to be independent. The mean vector of the random vector $Q_{\kappa^*}$ is $\left(Np_{\kappa^*}(0), \ldots, Np_{\kappa^*}(2^\ell - 1)\right)$. The distribution of a random variable whose parameters are themselves random variables is called a compound distribution. If the $p_{\kappa^*}(\eta)$'s took values in $[0, 1]$, then it would have been possible to formally consider the distribution of $Q_{\kappa^*}$. Since the $p_{\kappa^*}(\eta)$'s are assumed to follow normal distributions, they can take values outside of $[0, 1]$, and so we see no way of formally deriving the distribution of $Q_{\kappa^*}$. The heuristic assumption of normality on $p_{\kappa^*}(\eta)$ implies that the distributions of $Q_{\kappa^*}$, and hence of $T_{\kappa^*}$, are both fundamentally heuristic assumptions. It is not possible to derive these distributions formally; one can only try to provide some justification for the heuristic assumptions.

The key randomisation hypotheses postulate that the marginals $p_{\kappa^*}(\eta)$ are approximately normal. They do not postulate anything about the joint distribution of the $p_{\kappa^*}(\eta)$'s. If the marginals are normal, it does not necessarily follow (in fact, it mostly does not) that the joint distribution is also normal. From the normality assumption on the marginals $p_{\kappa^*}(\eta)$, we can only argue heuristically (as in Section 5.1 below) that each of the marginals $Q_{\kappa^*,\eta}$ follows an approximately normal distribution. Nothing can be proved about the joint distribution of the $Q_{\kappa^*,\eta}$'s. Instead, it is required to make a heuristic assumption that $Q_{\kappa^*}$ follows a multivariate normal distribution. Further, this heuristic assumption does not clarify the nature of the variance-covariance matrix of the multivariate normal distribution of $Q_{\kappa^*}$. The form of $T_\kappa$ suggests that the distribution of $T_\kappa$ should be given by a suitable chi-squared distribution.
This would follow if it could be shown that $Q_\kappa$ approximately follows a multivariate normal distribution whose variance-covariance matrix satisfies the conditions of Theorem A. of Appendix 6. Since this cannot be proved formally, it is heuristically assumed that $Q_\kappa$ follows an appropriate multivariate normal distribution, so that the distribution of $T_\kappa$ can be approximated by a chi-squared distribution. Note that for the actual computation of the parameters (degrees of freedom and non-centrality parameter) of the chi-squared distribution, it is sufficient to have the mean vector of $Q_\kappa$. Since it is possible to heuristically justify that the marginals of $Q_\kappa$ follow an approximately normal distribution, an approximation of the mean vector of $Q_\kappa$ can be obtained. So, it is possible to obtain approximate values of the parameters of the chi-squared
