Cryptographic techniques used to provide integrity of digital content in long-term storage


RB/3/2011 Cryptographic techniques used to provide integrity of digital content in long-term storage

REPORT ON THE PROBLEM

Problem presented by
Martin Šimka, Paweł Wojciechowski
Polish Security Printing Works (PWPW)

Report authors
Małgorzata Bladoszewska (University of Warsaw)
Tomasz Brożek (Warsaw School of Information Technology)
Michał Zając (University of Warsaw)

Contributors
Lucyna Cieślik (Polish Academy of Sciences)
Maria Donten-Bury (University of Warsaw)
Kamil Kulesza (Polish Academy of Sciences)
John Ockendon (University of Oxford)
Łukasz Sener (Polish Academy of Sciences)
Piotr Wojdyłło (Polish Academy of Sciences)
Wladimir Zubkow (University of Oxford)

The study group was jointly organised by
System Research Institute of the Polish Academy of Sciences
Institute of Mathematics of the Polish Academy of Sciences
Oxford Centre for Collaborative Applied Mathematics
and was supported by
Sygnity S.A.
Industrial Development Agency Joint Stock Company
under the honorary patronage of The British Embassy in Poland

Executive Summary

The main objective of the project was to obtain advanced mathematical methods that guarantee verification that a required level of data integrity is maintained in long-term storage. The secondary objective was to provide methods for the evaluation of data loss and recovery. Additionally, we have provided the following initial constraints for the problem: a limitation of additional storage space, a minimal threshold for the desired level of data integrity and a defined probability of a single-bit corruption.

With regard to the main objective, the study group focused on the exploration of methods based on hash values. It has been indicated that in the case of the tight constraints suggested by PWPW it is not possible to provide any method based only on the hash values. This observation stems from the fact that the high probability of bit corruption leads to an unacceptably large number of broken hashes, which in turn stands in contradiction with the limitation on additional storage space.

However, having loosened the initial constraints to some extent, the study group has proposed two methods that use only the hash values. The first method, based on a simple scheme of data subdivision into disjoint subsets, has been provided as a benchmark for other methods discussed in this report. The second method (the hypercube method), introduced as a type of the wider class of clever-subdivision methods, is built on the concept of rewriting a data-stream into an n-dimensional hypercube and calculating hash values for some particular (overlapping) sections of the cube.

We have obtained interesting results by combining hash value methods with error-correction techniques. The proposed framework, based on the BCH codes, appears to have promising properties, hence further research in this field is strongly recommended.

As a part of the report we have also presented features of secret sharing methods for the benefit of novel distributed data-storage scenarios. We have provided an overview of some interesting aspects of secret sharing techniques and several examples of possible applications.

Table of contents

1 INTRODUCTION
1.1 PROBLEM DESCRIPTION
1.2 PROBLEM BREAKDOWN
2 HASH FUNCTIONS
2.1 BASICS
2.2 RESTRICTIONS ON USING HASH FUNCTIONS
2.3 ALTERNATIVE DIVISION METHOD
2.4 HASH CODES WITH ERROR CORRECTION
3 SECURE SECRET SHARING METHOD
3.1 BASIC CAPABILITIES
3.2 EXTENDED CAPABILITIES
3.3 COMBINING PROPERTIES
3.4 ADDITIONAL CONSIDERATIONS
3.5 OPEN QUESTIONS
4 CONCLUSION AND PROPOSALS FOR FURTHER RESEARCH
4.1 HASH FUNCTIONS
4.2 SECRET SHARING METHOD
4.3 OTHER POSSIBILITIES
BIBLIOGRAPHY
5 APPENDIX
5.1 HYPERCUBE MODEL

1 Introduction

1.1 Problem description

(1.1.1) The increase in the amount of data, both created and stored electronically, entails the necessity to construct various data storage systems. In view of the different requested storage periods, we divide systems into:
- short-term: storage period no longer than 3 years,
- medium-term: storage period between 3 and 10 years,
- long-term: storage period longer than 10 years, but with a specified end-date,
- unlimited: storage period longer than 10 years with no specified end-date.

(1.1.2) The unlimited storage is sometimes called eternal. In this case we have to pay special attention to the integrity of the stored digital content. For this reason, various digital marking techniques are used, so that even after a long time one should be able to verify the integrity of stored data.

(1.1.3) The main objective is to use advanced mathematical methods, especially cryptographic techniques applied in the process of digital marking of the content. These techniques ought to guarantee verification and integrity of the long-term-stored digital content.

(1.1.4) Proposed methods should take into account mainly:
- different kinds (classes) of stored content, e.g. cultural heritage, court documentation, accounting documentation etc.,
- limitations of database size,
- anticipated frequency of access to stored resources.

(1.1.5) Another very important aspect of the problem consists of finding the limits on applications of advanced mathematical methods, especially those based on cryptographic techniques, and checking their applicability in the evaluation of data losses (e.g. due to the "corrosion" of media) as well as in a potential data recovery. Original data is marked as data in time $t_0$, while data that might be corrupted (because of "corrosion") is marked as data in time $t_1$.

(1.1.6) Special attention should be paid to:
- systems and schemes of coding, which allow for a detection and correction of write errors,
- cryptographic techniques, such as: public-key and asymmetric encryption, secret sharing methods, secure multiparty computations.
1.2 Problem breakdown

(1.2.1) A few assumptions and constraints have been proposed by the PWPW Representative when discussing the problem:
- $T$: amount of stored data,
- $R$: amount of additional disk space we can use in order to provide proof of data correctness; we assume that $R \le 0.1T$,
- $r$: bit error rate (BER); we assume that $r$ is about 0.01, i.e. at the time of testing the integrity of data, 1% of all bits are corrupted,
- $g$: accuracy of the given proof of correctness; we assume that $g$ is about 0.01, that is our proof should show that at least $(T - Tr - Tg)/T$ of data is correct,
- $t_0$: time of storing the original data,
- $t_1$: time of testing the integrity of data.

(1.2.2) The problem lies in finding a method that can be used to determine at time $t_1$, with given accuracy, the ratio of the correct data, stored at time $t_0$, to all the data available. Furthermore, it is expected that the method allows assessing whether data is false.

(1.2.3) It would be useful if the method proved that data is not corrupted above a certain threshold value of BER.

(1.2.4) During the talk with the PWPW Representative we made the following assumptions and remarks:
- Errors cannot be avoided. A carrier which stores our data is imperfect, so we can be sure that there will be errors in data over a long time horizon.
- The bit error rate, amounting up to 1% of data, is very high. For example, let us assume that we have a book in which every single letter is coded with 8 bits. Due to the error ratio, we anticipate about 1 error in every sequence of 100 bits, so in every sequence of about 12 letters we shall expect a wrong letter. Therefore, in this paper we would like to present some solutions in which our initial assumptions were less constrained and this ratio is assumed to be smaller.
- Stored data is organized in files and we know the type of every file, like document, video, audio, archive files. Nevertheless, we can treat data as a sequence of bits (raw data approach).

(1.2.5) The PWPW Representative presented an idea of using hash functions so as to provide a proof of correctness of particular parts of data. Our work shows that the use of hash functions only is not sufficient to complete our task, so hash functions with additional methods of correcting errors have been considered. We have also taken into account another way of dealing with hashes. It is based on the idea of computing a number of hashes from different divisions of data into blocks (a.k.a. the hypercube method).

(1.2.6) We do not have any information about physical properties of data carriers, so we added an assumption of a uniform distribution of errors. If further details about the distribution are available, our methods can be calibrated to deal with it without any loss of usability.
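The book example in (1.2.4) can be checked with a few lines of arithmetic. The sketch below assumes, as the report does, that bits are corrupted independently with probability $r$; the function name is chosen for illustration only.

```python
def wrong_letter_probability(bit_error_rate: float, bits_per_letter: int = 8) -> float:
    """Probability that at least one bit of a letter is corrupted,
    assuming independent bit errors (the report's working assumption)."""
    return 1.0 - (1.0 - bit_error_rate) ** bits_per_letter


p_letter = wrong_letter_probability(0.01)
letters_per_error = 1.0 / p_letter  # expected spacing between wrong letters
print(f"P(wrong letter) = {p_letter:.4f}, about one in {letters_per_error:.1f} letters")
```

With $r = 0.01$ this gives roughly one wrong letter in every 13, in line with the estimate of about 12 letters quoted in (1.2.4).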
(1.2.7) Having dealt with hash functions, we focused on a solution based on the secret sharing method. Due to time limitations, however, it could not be completed during the workshop. Therefore, we have presented a helicopter view of the functionalities provided by secret sharing schemes.

2 Hash functions

The starting point for our research was a method based on hash functions. In this chapter we will show the restrictions of using hash functions and describe the main ideas for extending such an approach.

2.1 Basics

The terms and notation used in this section come, unless stated otherwise, from the Handbook of Applied Cryptography ([1]).

(2.1.1) The hash function is a well known cryptographic tool, widely applied in providing data integrity checks. The idea behind its use in the discussed problem is very simple: we compute the value of the hash function for given data twice, at the beginning and at the end of the storage process. If data is changed (loss of integrity), then these two hash values would most likely differ; otherwise both values remain the same.

(2.1.2) In a more formal way, we can say that a hash function $h$ is a function from $\{0,1\}^k$ to $\{0,1\}^l$, which has the following properties¹:
- A minor change of the input string alters, on average, about $l/2$ bits of the output.
- The probability of finding a bit-stream with the same hash value as another given bit-stream is negligible².
- The probability of finding a bit-stream with a given hash value is negligible.

¹ Where $l < k$. Secure hash functions should have an output length of $l \ge 256$ bits.
² That is, the expected time needed to obtain two bit-streams with the same hash value is exponential in the length of the output of the considered function.

(2.1.3) In order to introduce the notation used in further sections, we shall describe the integrity check process in a more formal way:
- We describe data by $s$ and its hash value by $v$, where $v = h(s)$ at the beginning of the storing period (time $t_0$).
- After storage (time $t_1$) data may differ a little (e.g. due to the corrosion), so we describe it by $s'$ and the corresponding hash value by $v' = h(s')$.
- If $v = v'$ then the data is correct with probability almost 1; otherwise we conclude that data is corrupted. Unfortunately, we do not know the percentage of corrupted bits: even if only one bit changes, the whole block is corrupted.

2.2 Restrictions on using hash functions

(2.2.1) The use of hash functions in the way presented above has some limitations. We shall present them with the following algorithm: let us assume that we have divided data into $k$ blocks of the same length. For every block we compute the hash value at the beginning and at the end of the storage period. Let us say that we have detected $l$ corrupted blocks (the appropriate hash values differ). Then the ratio $l/k$ describes the upper bound of the bit-corruption ratio.

(2.2.2) More formally, we can describe the procedure above in this way:
- We divide a sequence of bits $s$ into blocks $a_1, a_2, \ldots, a_k$ of the same length.
- For $i = 1, 2, \ldots, k$ we calculate and save the value $v_i = h(a_i)$.
- Next, we compare it with the value $v'_i = h(a'_i)$, that is with the hash function value computed on the block after the storage period. If $v_i = v'_i$, we know with probability almost 1 that there were no corrupted bits in the part $a_i$. In other cases we have to assume that every bit might be corrupted.
- We compute $l = |\{ i \in \{1, 2, \ldots, k\} : v_i \ne v'_i \}|$ and denote the value $l/k$ as $b$.

(2.2.3) In order to determine the usefulness of hash functions we calculate the expected value of the ratio $l/k$ under the constraints given in (1.2.1). Since bits are corrupted independently (as we have assumed above), the probability that the $i$-th block is not corrupted is

$$P(v_i = v'_i) = (1 - r)^{|a_i|}, \tag{1}$$

where $|a_i|$ denotes the length of block $a_i$ in bits. The expected value of $l/k$ reads:

$$E(b) = \frac{1}{k} \sum_{i=1}^{k} \bigl(1 - P(v_i = v'_i)\bigr) = 1 - (1 - r)^{|a_i|}. \tag{2}$$

As mentioned in (2.1.2), the length of the hash function output is about 256 bits. Since the additional space for hash codes is $R = 0.1T$, each block $a_i$, $i = 1, 2, \ldots, k$, should be at least 2560 bits long. Therefore, in this case the expected value of the bit error rate bound is:

$$E(b) = 1 - (0.99)^{2560} = 1 - 6.7 \cdot 10^{-12} \approx 1, \tag{3}$$

which is unacceptably high.

(2.2.4) In conclusion, dividing data into disjoint blocks and computing a hash value for each of them to check the integrity of data is not particularly useful under the constraints given in (1.2.1). Such constraints require blocks to be quite big, which makes the probability of block corruption almost equal to 1. However, removing some of the constraints and using shorter hash outputs (e.g. 100 bits long) results in lower performance of the hash function.

2.3 Alternative division method

Introduction

(2.3.1) In paragraph 2.2, we have discussed the scheme of dividing data into disjoint blocks of equal length. Naturally, this is not the only possible approach to the given problem: a single bit need not belong to only one block and blocks may have different lengths. It transpires that dividing data into blocks in a clever way leads to a better estimation of the corruption ratio, so that the upper bound for the ratio is closer to its real value.

(2.3.2) Our first step to construct such a clever division was to arrange bits in a square (as in Figure 1). In this case the blocks for which we calculate hash codes are the rows and columns of the square. Therefore, each bit is included in 2 blocks (1 row and 1 column, cf. Figure 1).
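The disjoint-block procedure of (2.2.2) and the estimate in equation (3) can be sketched as follows. SHA-256 from Python's standard `hashlib` stands in for the hash function $h$; the block contents and helper names are assumptions made for illustration and are not part of the report.

```python
import hashlib


def block_hashes(blocks):
    """Hash every block with SHA-256 (a stand-in for the hash function h)."""
    return [hashlib.sha256(block).digest() for block in blocks]


def corruption_upper_bound(stored_hashes, blocks_after):
    """Return b = l / k, the fraction of blocks whose hash value has changed."""
    new_hashes = block_hashes(blocks_after)
    changed = sum(old != new for old, new in zip(stored_hashes, new_hashes))
    return changed / len(stored_hashes)


def expected_bound(bit_error_rate, block_bits):
    """E(b) = 1 - (1 - r)^a for independent bit errors and block length a."""
    return 1.0 - (1.0 - bit_error_rate) ** block_bits


# Time t_0: four hypothetical 4-byte blocks and their saved hash values.
original = [b"abcd", b"efgh", b"ijkl", b"mnop"]
saved = block_hashes(original)

# Time t_1: a single bit is flipped in the second block.
after = list(original)
after[1] = bytes([original[1][0] ^ 0x01]) + original[1][1:]

b = corruption_upper_bound(saved, after)  # one of four blocks is flagged
clean_probability = 0.99 ** 2560          # chance that a 2560-bit block is intact
print(b, clean_probability, expected_bound(0.01, 2560))
```

A single flipped bit flags a whole block, so one corrupted bit out of 128 already yields $b = 0.25$. This illustrates why the bound becomes uninformative for long blocks.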

Figure 1. Example of arranging data in a square; 12 hash values are calculated and saved: 6 for rows ($v_{1,1}, v_{1,2}, \ldots, v_{1,6}$) and 6 for columns ($v_{2,1}, v_{2,2}, \ldots, v_{2,6}$).

(2.3.3) At first glance, it seems to make no sense to divide data into overlapping blocks: blocks would need to be longer so as to satisfy the requirement $R = 0.1T$, which increases the probability of corrupting a hash value. Nevertheless, such an approach may provide a very good upper limit of errors in the data if certain assumptions are fulfilled. Consider the situation presented in Figure 2.

Figure 2. Visualization of the given data after storage. Blocks for which the hash value has changed are highlighted.

As every corrupted bit changes the hash value both for its row and for its column, all potentially corrupted bits are located at the intersections of the highlighted rows and columns. Let $p_1, p_2$ be the percentages of corrupted hash values in rows and columns respectively. The percentage of corrupted hash values for all blocks equals

$$p_h = \frac{p_1 + p_2}{2}. \tag{4}$$

However, the upper bound of corrupted bits is generally smaller and reads:

$$b = p_1 p_2. \tag{5}$$

In the case presented in Figure 2, $p_h = 0.417$, $b = 0.167$, $r \approx 0.083$.

Generalization: the hypercube method

(2.3.4) A generalization of the method presented in (2.3.2) can be obtained by arranging data in a $d$-dimensional hypercube (cf. Figure 3 for the 3-dimensional case).

Figure 3. 3-dimensional hypercube consisting of 64 cells (grey). Every cell represents 1 bit of data. A block is a subset of bits forming a section (red, green). In this case there are $3 \cdot 16 = 48$ blocks.

In this general case, if the percentages of corrupted hash values in dimensions $1, 2, \ldots, d$ equal $p_1, p_2, \ldots, p_d$ respectively, then we have the following upper bound of the bit error rate:

$$b = \sqrt[d-1]{p_1 p_2 \cdots p_d}. \tag{6}$$

Calculations and optimization for the hypercube method

(2.3.5) It was our aim to find such parameters of the hypercube that the obtained upper bound of corrupted bits is the best (the lowest). At the same time, we optimized the number of dimensions $d$ and the size (number of cells) $S$ of the hypercube (detailed calculations are presented in the Appendix). We decided to divide the data into parts, each consisting of $S$ bits, and to make a hypercube for each of them separately.

(2.3.6) For the purpose of calculating the expected value of $b$, we assumed that the distribution of errors in the data is uniform and that bits are corrupted independently.

(2.3.7) We calculated that this method does not work for the initial constraints: $r = 0.01$, $R = 0.1T$ and 256-bit-long hash codes. The reason is analogous to the one described in Section 2.2: for large blocks the probability of corrupting the hash value is very high. Therefore, we decided to change some of our assumptions. Firstly, we chose $r = 0.0001$. Secondly, we decided to use 100-bit-long hash codes.

Results

(2.3.8) For the assumptions made in 2.3.7, the optimal dimension of a hypercube is equal to 2 and each part consists of $4 \cdot 10^6$ bits (see Appendix). In this method the expected value of the upper bound of corrupted bits is smaller than 0.058. We would like to emphasize that dividing data into disjoint blocks of bits (the method described in 2.2.1) under the same assumptions would give an expected upper bound of corrupted bits equal to 0.095. The hypercube method lowers the upper bound of errors more than 1.6 times.
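The bound of equation (6) and the sizes quoted in (2.3.8) can be checked with a short sketch. It assumes that every block is a line of cells along one axis (as in Figure 3), so a hypercube with side $n$ has $d \cdot n^{d-1}$ blocks of $n$ bits each. The function names and this storage model are assumptions for illustration and are not the Appendix calculation; the Figure 2 counts are inferred from the values quoted for it.

```python
import math


def hypercube_bound(p):
    """Upper bound of equation (6): (p_1 * ... * p_d) ** (1 / (d - 1)), d >= 2."""
    d = len(p)
    if d < 2:
        raise ValueError("need at least two dimensions")
    return math.prod(p) ** (1.0 / (d - 1))


def min_side(d, hash_bits, data_bits_per_extra_bit):
    """Smallest side n for which d * n**(d-1) hash values of hash_bits bits
    fit into the extra space n**d / data_bits_per_extra_bit."""
    return d * hash_bits * data_bits_per_extra_bit


# Figure 2, as inferred from its quoted values: 2 of 6 rows, 3 of 6 columns flagged.
p_rows, p_cols = 2 / 6, 3 / 6
p_h = (p_rows + p_cols) / 2                   # equation (4)
b_square = hypercube_bound([p_rows, p_cols])  # equation (5)

# Relaxed constraints of (2.3.7): 100-bit hashes, R = 0.1 T (T / R = 10), d = 2.
side = min_side(d=2, hash_bits=100, data_bits_per_extra_bit=10)
cells = side ** 2

# Disjoint-block benchmark: 1000-bit blocks at r = 0.0001.
baseline = 1.0 - (1.0 - 0.0001) ** 1000
print(round(p_h, 3), round(b_square, 3), side, cells, round(baseline, 3))
```

Under these assumptions the side length is 2000 bits, which gives the $4 \cdot 10^6$ cells per part quoted above, and the disjoint-block benchmark evaluates to about 0.095.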

Remarks & further research

(2.3.9) As indicated in (2.3.6), we assumed a uniform distribution of errors in data. We suspect that this method may work very badly for a specific distribution of errors. However, we believe that this problem is manageable.

(2.3.10) The method is based on arranging data in a hypercube and calculating hash values for sections. Generally, there might be other acceptable ways of dividing data into blocks, giving a lower expected value of $b$ under the same assumptions. Firstly, we can arrange data in a hypercube and calculate hash values for other subsets of bits, for example hyper-planes. Secondly, we can abandon the idea of the hypercube and invent a completely different division.

(2.3.11) This method does not give a satisfying upper bound of corrupted bits for $r = 0.01$, which makes it useless in some real-world applications. On the other hand, if we had a method of measuring the level of the integrity of the data based on dividing data into blocks, we might improve the expected value of $b$ by dividing data e.g. in the way presented in 2.3.4. We recommend it as a supporting tool.

2.4 Hash codes with error correction

Introduction

(2.4.1) The major limitation of hash functions in solving the problem is a very high probability of hash value corruption for long blocks. As mentioned before, the analysis of the hash values enables us only to say whether there are any corrupted bits in a block. To calculate the upper bound, we have to assume that all bits from a block marked as corrupted may have changed. If we recognized which blocks are corrupted only slightly and which are more corrupted, then we would be able to measure the level of the integrity of data more precisely.

(2.4.2) For the purpose of such recognition, we decided to use error correcting codes. Generally, error correcting codes are bits added to original data (or part of data), with the aim of correcting a predetermined number of errors. In our problem, we use them in the following way (example in Figure 4):
1. We divide data into blocks and calculate hash values for each of them.
2. We add codes correcting up to $d$ errors to each block.
3. After storage we correct errors using the error correcting codes.
4. After correction we compare the saved and new hash values.

If there are more than $d$ errors in a particular block, then the stored and calculated hash values differ. Otherwise, they will remain the same. Based on the information of how many hash values are changed, we can calculate the upper limit of corrupted bits.
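The four steps of (2.4.2) can be illustrated on a toy scale. The sketch below is an assumption-laden example, not the report's construction: a Hamming(7,4) code, which corrects one error per 7-bit word, is used as a small stand-in for the BCH codes considered later, and SHA-256 stands in for the hash function. All function names are hypothetical.

```python
import hashlib


def encode(data):
    """Hamming(7,4): 4 data bits -> codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(word):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # 1-based index of the suspected bit, 0 if none
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def digest(bits):
    """Hash value of a block of bits (SHA-256 as a stand-in for h)."""
    return hashlib.sha256(bytes(bits)).hexdigest()


def flip(word, *positions):
    """Return a copy of word with the bits at the given 0-based positions flipped."""
    out = list(word)
    for i in positions:
        out[i] ^= 1
    return out


block = [1, 0, 1, 1]
saved_hash = digest(block)  # step 1: hash value saved at time t_0
stored = encode(block)      # step 2: add the error correcting bits

one_error = flip(stored, 5)      # within the correcting capability (d = 1)
two_errors = flip(stored, 0, 5)  # beyond it

# Steps 3 and 4: correct, then compare the hash values.
hash_ok_one = digest(decode(one_error)) == saved_hash
hash_ok_two = digest(decode(two_errors)) == saved_hash
print(hash_ok_one, hash_ok_two)
```

With one error the decoder restores the block and the hash values agree; with two errors the decoder settles on a different codeword, so the hash values differ and the block is flagged.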

Figure 4. Exemplary visualisation of the proposed method. There are 4 blocks of 5 bits each. We add codes correcting 1 error (grey). Some errors occur in data after storage (3.) (red). We use codes to correct them. Finally, we calculate hash values again. If the number of errors in a block was bigger than 1 (like in the second block), then not all errors were corrected and the hash value is changed.

(2.4.3) In order to grasp the significance of this method, consider the situation presented in Figure 4. After the storage there are 4 errors: 2 in the original data and 2 in the added bits. If we used the method presented in 2.2, two blocks would be corrupted. Since we use error correcting codes, we can recognize that in the fourth block only 1 bit has changed. Moreover, we can correct this error, which is an added value of the method. Furthermore, the errors which occur in the added bits are also corrected. It means that we do not need to deal with them additionally.

Theory

(2.4.4) During our research we focused only on the BCH error correcting codes. We would like to quote the theorem which enabled us to make some calculations. We will use the following terms:
- word: a sequence of bits;
- coded word: a word which we would like to correct;
- control symbols: additional symbols (bits) used to correct errors in a coded word;
- coding word: a coded word with control symbols.

(2.4.5) Below we present the theorem on the BCH codes (proof in [2]):

For each $d, m \in \mathbb{Z}_+$ with $d < 2^m / 2$ there exists a BCH code such that all the following statements are true:
- Coding words are $2^m - 1$ bits long.
- The code corrects $d$ errors in a coding word.
- The number of control symbols is $d \cdot m$.

This means that the length of a coded word is $2^m - dm - 1$.
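The sizes promised by the theorem in (2.4.5) are easy to tabulate. The sketch below computes them for one example pair, $m = 16$ and $d = 357$, which also appears in Table 1. The overhead formula (control symbols plus one 100-bit hash value, relative to the coded word) is an assumed accounting model for illustration and not necessarily the one used by the authors.

```python
def bch_parameters(m, d):
    """Return (coding word length, control symbols, coded word length)
    for a BCH code correcting d errors, following the sizes in (2.4.5)."""
    coding_word = 2 ** m - 1
    control = d * m
    coded_word = coding_word - control
    if coded_word <= 0:
        raise ValueError("d is too large for this m: no room left for data")
    return coding_word, control, coded_word


coding_word, control, coded_word = bch_parameters(m=16, d=357)

HASH_BITS = 100  # hash length used in the report's relaxed setting
overhead = (control + HASH_BITS) / coded_word  # assumed accounting of R / T
print(coding_word, control, coded_word, round(overhead, 4))
```

Under this model the extra space stays just below 10% of the protected data, which is consistent with the budget $R = 0.1T$.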

(2.4.6) One of the most important conclusions of this theorem is that we do not need a lot of additional space for control symbols. If the size of a block is $A$, we need approximately $d \log A$ of additional space to correct $d$ errors.

Results

(2.4.7) We were interested in whether we had overcome the major limitation of using hash functions, so we calculated the probability of corrupting a hash value. It transpired that using error correcting codes combined with hash functions would give a satisfying upper bound of corrupted bits for $r$ of about 0.5%. We made calculations for different values of $r$ and $R$ and tried to choose the best parameters $m$, $d$ for them. We assumed that we know the BCH code correcting $d$ errors in the $2^m - 1$ bits long coding word. We decided to use 100-bit-long hash codes. Results are presented in Table 1.

Table 1. Probability of corrupting a hash code depending on the values of $m$, $d$, $r$, $R$.

m    d     r       R        Probability of corrupting hash code
16   600   1%      0.185T   < 0.1%
16   357   0.5%    0.1T     < 0.01%
17   678   0.53%   0.1T     < 0.7%
16   357   0.53%   0.1T     < 0.4%
15   187   0.53%   0.1T     < 0.5%
14   96    0.53%   0.1T     < 1.6%

Remarks & further research

(2.4.8) For the purpose of calculations we assumed that the distribution of errors in the data is uniform (like in paragraph 2.3). Once more information on the error distribution is available (e.g. the specific storage hardware is selected), the obtained results can be adapted accordingly, possibly with improved performance.

(2.4.9) The main advantage of this method is that it not only measures the level of the integrity of stored data, but also improves it. It can also be combined with error codes that are already used by PWPW with the aim of enhancing performance.

(2.4.10) We would like to emphasize that the theorem (2.4.5) guarantees only the existence of a BCH code satisfying some requirements. We do not know whether effective algorithms for constructing such a code or for coding and decoding words exist. Moreover, it cannot be ruled out that there are some other error correcting codes which might be more useful in a real world application. This area is open for further study.

3 Secure Secret Sharing Method

Having investigated hash functions, we shall now check a different approach. Apart from ordinary verification of the integrity of long-term-stored digital content, it might provide some additional features, namely:

- extended capabilities in: verification of the integrity, recovery of corrupted bits, design of the access structure to the stored digital content;
- an opportunity to optimize the PWPW's requirements concerning storage.

All the features listed above and many others can be provided by secret sharing protocols. In cryptography, a Secure Secret Sharing (SSS) scheme [4] is understood as a method of distributing a secret among a group of participants, all of them having their own share in the secret. The secret can be reconstructed only when authorized participants combine their shares.

3.1 Basic capabilities

(3.1.1) By using Secret Sharing Schemes one can store data distributed in some insecure locations in a secure³ way ([3]).

³ In secret sharing, there are at least two notions of security: information-theoretical and computational security. There are significant differences between the two types, yet this is rather beyond the scope of this paper. In order to simplify further discussion without losing generality, we will simply discuss secure or perfectly secure secret sharing schemes.

(3.1.2) Threshold secret sharing. A threshold is a minimal number of participants which have to co-operate to reconstruct the secret. A scheme where at least $t$ out of $n$ players are necessary to reveal the secret is described as a $(t, n)$ threshold scheme. It allows placing $t - 1$ shares securely outside secure locations (e.g. own trusted systems), say, literally distributing $t - 1$ shares over the Internet.

(3.1.3) Schemes for which we can provide verification of the integrity of secrets are called Verifiable Secret Sharing (VSS).

(3.1.4) A proper design of the access structure improves the functionality of secret sharing. One of the simplest access structures was presented above: every set of at least $t$ out of $n$ participants is allowed to reconstruct the secret. More advanced structures can be implemented as follows:
- $P = \{P_1, \ldots, P_n\}$ is a set of participants taking part in sharing.
- Every family $R$ of subsets of $P$ can be an access structure.
- We can provide different levels of access for different participants. For example, the main participant (PWPW) has more rights than a trusted outsider (e.g. governmental institutions), which in turn has more rights than a not trusted participant (e.g. ones using shares from the Internet).

Example 1 (generalised access structure)
Our task is to guarantee verification of the integrity of long-term-stored digital content. For example, let us consider recordings of speeches of famous politicians. One can distribute a secret among some governmental institutions and set the condition under which the secret can be revealed, e.g. at least 5 institutions from 5 different ministries have to collaborate in order to reconstruct the secret, and so on. By using additional participants with different levels of privileges we can minimise the probability of leaking or losing the data.
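A $(t, n)$ threshold scheme as described in (3.1.2) can be sketched in a few lines. The example below uses Shamir's polynomial construction as one well-known threshold scheme; the report does not say which construction it has in mind, and the field size, function names and sample secret are assumptions for illustration. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import secrets

PRIME = 2 ** 127 - 1  # a Mersenne prime; the field size is an illustrative choice


def split_secret(secret, threshold, n_shares):
    """Return n_shares points of a random polynomial of degree threshold - 1
    whose value at x = 0 is the secret."""
    if not (0 <= secret < PRIME) or not (1 <= threshold <= n_shares < PRIME):
        raise ValueError("invalid parameters")
    coefficients = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        result = 0
        for coefficient in reversed(coefficients):  # Horner's rule
            result = (result * x + coefficient) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, n_shares + 1)]


def reconstruct(shares):
    """Recover the secret by Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (x_i, y_i) in enumerate(shares):
        numerator, denominator = 1, 1
        for j, (x_j, _) in enumerate(shares):
            if i != j:
                numerator = (numerator * -x_j) % PRIME
                denominator = (denominator * (x_i - x_j)) % PRIME
        secret = (secret + y_i * numerator * pow(denominator, -1, PRIME)) % PRIME
    return secret


shares = split_secret(secret=123456789, threshold=3, n_shares=5)
recovered = reconstruct(shares[:3])
recovered_other = reconstruct([shares[0], shares[2], shares[4]])
print(recovered, recovered_other)
```

Any three of the five shares recover the secret, while in this construction fewer than three shares carry no information about it, which is what allows $t - 1$ shares to be placed in untrusted locations.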

3.2 Exended capables (3.2.1) Now exended capables of secre sharng schemes shall be presened. (3.2.2) Pre-posoned secre sharng. A pre-posoned secre sharng s an example of an access srucure where all daa requesed o reconsruc he secre s known excep for a sngle crucal share whch has o be gven laer. For example, he PWPW can dsrbue he whole daa over he Inerne by a pre-posoned secre sharng scheme wh a shor, crucal share kep locally. Le us explore a dfference beween secre sharng and smple encrypon of daa n hs model. The advanages wll be clear once more exended capables are oulned. Example 2 (scheme wh an acvang share) Le us assume ha we have a suaon descrbed n he Example 1 daa s sored locally on he servers of PWPW and n a few places all over he world n he Uned Saes, Chna, Russa ec. By means of a pre-posoned scheme foregn nsuons can parake n gven shares (a share made ou of a share s called a subshare) beyond unauhorsed parcpans who canno reconsruc nsuonal shares unl foregn and rused pares cooperae, because her shares are crucal. (3.2.3) Proacve Secre Sharng (PSS) has he followng feaures: One can change (perodcally renew) parcpans' shares n a secre whou revealng or changng. One can recover corruped shares (hese shares correspond o dshones parcpans). In our case we can perodcally check he conssency of shares and recover corruped ones. Anoher reason o use pro-acve secre sharng s he fac ha f we fnd a corruped share durng he verfcaon process (by e.g. VSS scheme), we can easly replace a broken share wh a correc one. So here s a smple way o manan negry of shares perodcally, whch mples negry of daa. (3.2.4) Mul-secre shares have he followng feaures: A scheme where any subse of se of parcpans shares anoher secre s avalable. I seems ha a sngle share can be used n a few secres, opmzng sorage space. Example 3 (mul-secre scheme) In he presened case we can use a mul-secre scheme. We do no need o creae a separae shares and secres for all fles. 
We can make just a single sharing scheme with the property that different subsets of foreign institutions can reconstruct speeches of different politicians, and all speeches reconstructed in this way make up a collection.

3.3 Combining properties

(3.3.1) One of the most desired properties of secret sharing schemes is their flexibility in combining the functionalities described above. Further research is required to describe which properties can be combined with each other.

(3.3.2) In our case we might conduct research aimed at developing, e.g.:
- a scheme which is perfectly secure, proactive, integrity-providing and activated by a share from the PWPW;
- a multi-secret scheme where any subset of participants has its own secret that cannot be revealed without the share from the PWPW.
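The periodic share renewal described in (3.2.3) can be illustrated with a minimal sketch built on Shamir's threshold scheme [4]. This is not the scheme proposed for the PWPW. The field size `PRIME`, the function names and the use of a single trusted dealer for the renewal are assumptions made for illustration; real proactive schemes renew shares jointly, without a dealer who sees all of them.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the field size is an illustrative choice


def _eval_poly(coeffs, x):
    """Evaluate a polynomial (lowest degree first) at x modulo PRIME (Horner)."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % PRIME
    return result


def split(secret, threshold, n):
    """Split `secret` into n shares; any `threshold` of them recover it."""
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    return {x: _eval_poly(coeffs, x) for x in range(1, n + 1)}


def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the given {x: y} shares."""
    secret = 0
    for xi, yi in shares.items():
        num, den = 1, 1
        for xj in shares:
            if xj != xi:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


def refresh(shares, threshold):
    """Renew all shares by adding a fresh sharing of zero (simplified, dealer-based).

    Assumes `shares` holds the full set with x = 1..n, as produced by split().
    """
    zero_shares = split(0, threshold, len(shares))
    return {x: (y + zero_shares[x]) % PRIME for x, y in shares.items()}
```

After `refresh`, any `threshold` of the new shares still reconstruct the same secret, while the old shares are no longer consistent with the new ones, which is the property that makes periodic renewal useful.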

(3.3.3) A scheme with combined properties is not necessarily textbook material available right away, but has to be carefully engineered instead. Hence, some additional work might be required before implementation.

3.4 Additional Considerations

(3.4.1) Reconstructing the original data from shares might occasionally need some computational effort, and data may not always be available in real time. Still, the task has polynomial time complexity and is usually feasible in practice.

(3.4.2) It seems that once a perfectly secure secret sharing scheme is applied, its users should be protected against future developments in cryptanalysis that would affect cryptography based on computational complexity (e.g. most of the employed public-key cryptosystems, like RSA).

3.5 Open questions

(3.5.1) As described above, secret sharing schemes provide many tools to deal with the PWPW problem. Still, there are some open questions that definitely need further investigation:
- Which verification techniques are optimal in solving the problem of corrupted shares and data in the case of digital content of interest to the PWPW? One should remember that by using a secret sharing scheme we can provide some verification based on secure multi-party computations.
- Further effort should be expended to design an optimal access structure for particular types of stored files. Various types of files have different data that is crucial to their consistency. It seems that we do not necessarily need to protect the whole data, but only some crucial parts. It is worth considering which fragments are really important for each type of file. If we made this classification, we would be able to protect only the crucial parts by means of secret sharing.

4 Conclusion and proposals for further research

4.1 Hash functions

(4.1.1) Let us recall the method proposed in section 2.2. We divided data into disjoint blocks of the same length and compared two hash values for each of them: the first value was computed before the storage, the second after. This method was not considered a good way of dealing with the given problem. The probability that any given block of data for which we compute the hash value remains unchanged after some time is negligible.
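The block-wise comparison recalled in (4.1.1) can be sketched as follows. SHA-256 from the Python standard library stands in for the 100-bit hash values used in the paper, and the function names and the byte-based block size are illustrative assumptions.

```python
import hashlib


def block_hashes(data: bytes, block_size: int) -> list[bytes]:
    """SHA-256 digest of each consecutive block of `block_size` bytes."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(before: list[bytes], after: list[bytes]) -> list[int]:
    """Indices of blocks whose digest differs between the two snapshots."""
    return [i for i, (h0, h1) in enumerate(zip(before, after)) if h0 != h1]
```

A changed digest only says that at least one bit of the block was altered, not how many. This is why the method alone gives a poor estimate of the ratio of corrupted bits.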
(4.1.2) With some additional assumptions, like limited data size and a smaller bit error rate, we have shown that dividing data into blocks in a clever way may improve the estimation of the ratio of corrupted bits. However, due to its limitations, this method should be applied only as a supporting tool.

(4.1.3) Hash functions combined with error correction methods may provide very good error estimation. Under the given constraints concerning the bit error rate and the maximal amount of additional data, there is a probability of 0.0001 that not every error in a single code word will be corrected.

(4.1.4) One of the most convenient cases, for which we can prove that the upper bound on the number of corrupted bits is small, is when errors are uniformly distributed.

Nevertheless, we believe that errors occur rather in blocks, in particular parts of the data carrier etc., and not uniformly. The question is: can we reorganize the bits to make the distribution of errors uniform?

4.2 Secret sharing method

(4.2.1) The secret sharing method is an alternative way of thinking about data storage. It provides a number of new functionalities which allow data to be stored in a secure way, divided among some local (trusted) participants and some untrusted parties (like public FTP servers or, in general, the Internet).

(4.2.2) Different participants taking part in data sharing can enjoy different levels of privileges in data access and recovery. It is important to determine how many levels of privileges should be designed and how many participants should be on each level. It seems that almost any access structure can be implemented by using secret sharing.

(4.2.3) In many secret sharing methods we assume that shares stored locally (by trusted participants) are at least of the size of the secret. It is worth investigating whether it is possible to deliver a method which would be both secure and space-saving (i.e. local shares are smaller than the secret). We believe that such schemes can be obtained.

(4.2.4) An important property of secret sharing schemes is verifiability of shares. It is especially crucial in our problem, in which we deal with corrupted data, as verification protocols can play the role of correcting codes. It is worth exploring which of them would be optimal in our problem.

(4.2.5) Different secret sharing schemes have various properties. We have described properties of schemes which are: proactive (we can periodically change participants' shares), pre-positioned (there is a crucial share without which a secret cannot be revealed), multi-secret (a few different secrets are shared) or verifiable (we can determine which shares were corrupted and reveal a secret without them). The question is: which of the mentioned properties can be combined?

4.3 Other possibilities

(4.3.1) In this section we outline an additional approach, which was discussed only after the 77th ESGI but is nevertheless worth further research.
There are check-digit schemes that allow determining whether a bit-stream was corrupted over a certain threshold, say, 1% of bits were changed. Should this be the case, the check-digit scheme provides information that corruption has occurred. Usually the threshold can be set individually for a particular application. Furthermore, since the main task of the scheme is error detection, not error correction, usually less additional information is stored (a shorter checksum) than in error correction codes. In general, the length of the checksum can even be decreased further if statistical reasoning is introduced, for instance if it is allowed that in a small number of cases the scheme's sensitivity differs from the set threshold (not necessarily lower). In such a situation, it is even possible to decrease the ratio of the checksum's size to the size of the information stored as the volume of information increases. A good example of such a construction is the check-digit scheme based on graph coloring described in [5]. It is recommended to research applications of check-digit schemes with the characteristics outlined above for the purpose of the problem presented by PWPW, to re-evaluate the results already obtained for hash functions, and to investigate a joint use of check-digit schemes with secret sharing methods.

Bibliography

[1] Alfred J. Menezes, Paul van Oorschot, Scott A. Vanstone, Handbook of Applied Cryptography, http://www.cacr.math.uwaterloo.ca/hac/ (link active: 2010/10/17).
[2] Witold Lipski, Wiktor Marek, Analiza kombinatoryczna, Biblioteka Matematyczna PWN, Warszawa 1986.
[3] Ronald Cramer, Ivan Damgård, Jesper Buus Nielsen, Multiparty Computation, an Introduction, Contemporary Cryptology (Catalano/Cramer/Damgård/Di Crescenzo/Pointcheval/Takagi), Advanced Courses in Mathematics CRM Barcelona, Birkhäuser, 2005.
[4] Adi Shamir, How to share a secret, Communications of the ACM 22 (11), pages 612-613, 1979.
[5] Kamil Kulesza, Zbigniew Kotulski, On a Check-Digit Method Based On Graph Coloring, Proceedings of IEEE International Conference on Computer as a Tool, EUROCON 2007, Warsaw, September 9-12, pp. 214-217, IEEE Xplore.

5 Appendix

5.1 Hypercube model

Introduction

(5.1.1) In this paragraph we present detailed calculations for the specific clever division based on a hypercube (the respective idea and visualization are presented in paragraph 2.3).

Specification

(5.1.2) We will use the following notation: the dimension of the hypercube $H$ is $d$. Every bit (cell) has $d$ coordinates $(x_1, x_2, x_3, \ldots, x_d)$. The size of data is $S = 10^t$, so the side length of the hypercube is $10^{t/d}$.

(5.1.3) A section in $H$ is a set of $10^{t/d}$ cells such that $(d-1)$ of their coordinates are the same. Sections are parallel to axes. The section parallel to the $i$-th axis and meeting the point $(x'_1, x'_2, \ldots, x'_d)$ is defined below:

$$S_i(x'_1, x'_2, \ldots, x'_d) = \{(x_1, x_2, \ldots, x_d) \in H : (x_1 = x'_1) \wedge (x_2 = x'_2) \wedge \ldots \wedge (x_{i-1} = x'_{i-1}) \wedge (x_{i+1} = x'_{i+1}) \wedge \ldots \wedge (x_d = x'_d)\} \tag{7}$$

Dependences

(5.1.4) Before stating anything about the hypercube method, we shall discuss the main dependences between the different values describing the method, such as: the additional space $R$ needed to remember hash values, the size of data $S = 10^t$, the dimension of the hypercube $d$ and so on.
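The notion of a section from (5.1.3) can be made concrete with a small sketch that hashes every axis-parallel section of a hypercube of bits and then lists the cells that may be corrupted. The dictionary representation of the hypercube, the function names and the use of SHA-256 instead of 100-bit hash values are assumptions made for illustration only.

```python
import hashlib
from itertools import product


def section_hashes(bits, side, d):
    """Hash every axis-parallel section of a d-dimensional hypercube of bits.

    `bits` maps a coordinate tuple to 0/1. A section parallel to axis i is
    identified by (i, coordinates with the i-th entry removed).
    """
    hashes = {}
    for i in range(d):
        for fixed in product(range(side), repeat=d - 1):
            cells = bytes(
                bits[fixed[:i] + (x,) + fixed[i:]] for x in range(side)
            )
            hashes[(i, fixed)] = hashlib.sha256(cells).digest()
    return hashes


def suspect_cells(before, after, side, d):
    """Cells all of whose d sections changed; only these can be corrupted."""
    bad = {key for key in before if before[key] != after[key]}
    return [
        cell
        for cell in product(range(side), repeat=d)
        if all((i, cell[:i] + cell[i + 1:]) in bad for i in range(d))
    ]
```

For $d = 2$ the sections are the rows and columns of a square, and the suspect cells are the intersections of changed rows with changed columns. Two flipped bits in different rows and columns therefore yield four suspect cells.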

(5.1.5) Firstly, we calculate how much additional space is necessary to remember the hash values. Every cell is in $d$ sections and there are $10^t$ cells. Every section consists of $10^{t/d}$ cells. As a result, the total number of sections in $H$ is:

$$\frac{d \cdot 10^t}{10^{t/d}} = d \cdot 10^{t\frac{d-1}{d}}. \tag{8}$$

Since we use 100-bit hash values, we need additional space equal to:

$$R = d \cdot 10^{t\frac{d-1}{d} + 2}.$$

(5.1.6) It is also necessary to know the maximal number of errors in the data if there is a given number of wrong hash codes. Suppose there are $k$ errors in hash codes. Let $(k_1, k_2, \ldots, k_d)$ be the numbers of wrong hash codes in the first, second, ..., and $d$-th direction respectively. The following upper bound on the ratio $b$ of corrupted bits in the data holds:

$$b \le \frac{\sqrt[d-1]{k_1 k_2 \cdots k_d}}{10^t}. \tag{9}$$

Moreover, it is easy to show that:

$$k_1 k_2 \cdots k_d \le \left(\frac{k}{d}\right)^d. \tag{10}$$

This means that if there are $k$ wrong hash codes:

$$b \le \frac{\left(\frac{k}{d}\right)^{\frac{d}{d-1}}}{10^t}. \tag{11}$$

The last thing we need is the maximal number of wrong hash codes if there are $l$ corrupted bits. Every bit may corrupt $d$ hash codes, so the maximal number of corrupted hash codes is

$$l \cdot d. \tag{12}$$

Optimization

(5.1.7) Suppose that after storage there are $10^{t-4}$ errors in the data (in other words: $r = 0.0001$). Due to (12), we know that these bits corrupt at most $d \cdot 10^{t-4}$ hash codes. With the information that at most $d \cdot 10^{t-4}$ hash codes are wrong, we may calculate (from equation (11)) that:

$$b \le \frac{\left(\frac{d \cdot 10^{t-4}}{d}\right)^{\frac{d}{d-1}}}{10^t}. \tag{13}$$

We would like to know that not all of the bits are corrupted ($b < 1$), so $t$ and $d$ must satisfy the inequality:

$$\left(\frac{d \cdot 10^{t-4}}{d}\right)^{\frac{d}{d-1}} < 10^t \tag{14}$$

$$t < 4d. \tag{15}$$

Obviously, the lower $t$, the more precise we might be.

(5.1.8) As $R = 0.1T = 10^{t-1}$, we can create an additional inequality for $t$ and $d$:

$$d \cdot 10^{t\frac{d-1}{d} + 2} \le 10^{t-1} \tag{16}$$

$$t \ge d(\log_{10} d + 3). \tag{17}$$

So $t$ and $d$ must satisfy:

$$d(\log_{10} d + 3) \le t < 4d. \tag{18}$$

Such a pair exists only if $\log_{10} d < 1$, i.e. $d < 10$, and consequently $t < 40$. As indicated before, the lower $t$, the more precise we are. We decided to choose

$$t = d(\log_{10} d + 3). \tag{19}$$

(5.1.9) We would like to find the value of $d$ which makes our prediction most precise. The first step was to find $d$ such that the interval $(d(\log_{10} d + 3), 4d)$ is as big as possible. We defined the function

$$g(d) = d(1 - \log_{10} d) \tag{20}$$

and found its maximum, which is attained at approx. $d = 3.7$. Then we checked the values $d = 2, 3, 4, 5$ and calculated that the prediction is the most precise when $d = 2$.

(5.1.10) We decided to prove that the prediction is the most precise when $d = 2$.

Case $d = 2$:

$$t = 2(\log_{10} 2 + 3) \approx 6.6 \tag{21}$$

$\left(10^{6.6-4}\right)^{\frac{2}{2-1}} = 10^{5.2}$ is the possible number of errors.

$\frac{10^{5.2}}{10^{6.6}} \approx 0.0398 < 4\%$ is the maximal ratio of corrupted data.

Other cases ($d \neq 2$). We calculated that the possible percentage of corrupted data is equal to

$$10^{\frac{t - 4d}{d-1}}. \tag{22}$$

We would like it to be as small as possible. We calculated that for $d > 1$ this function increases, so the optimum is $d = 2$.

(5.1.11) We would like to emphasize that $b < 0.04$ only if $r = 0.0001$. We calculated (using a computer) that if $E(r) = 0.0001$ and bits are corrupted independently with probability 0.0001, then $E(b) < 0.058$.
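The arithmetic of (5.1.10) can be checked numerically with a short script that evaluates the choice of $t$ from (19) and the bound from (22) for several dimensions. The function names are illustrative, and the script only re-evaluates the formulas above without modelling any data.

```python
import math


def optimal_t(d: int) -> float:
    """Smallest admissible exponent t = d(log10 d + 3), equation (19)."""
    return d * (math.log10(d) + 3)


def corrupted_ratio_bound(d: int, t: float) -> float:
    """Upper bound 10^((t - 4d)/(d - 1)) on the ratio b, equation (22)."""
    return 10 ** ((t - 4 * d) / (d - 1))


for d in range(2, 6):
    t = optimal_t(d)
    print(f"d={d}  t={t:.2f}  bound on b={corrupted_ratio_bound(d, t):.4f}")
```

With the rounded value $t = 6.6$ used in (21) the bound for $d = 2$ evaluates to about 0.0398; with the unrounded $t = 2(\log_{10} 2 + 3)$ it equals 0.04. In both cases the bound grows with $d$, in line with the choice $d = 2$.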

(5.1.12) Note that dividing data into exclusive blocks of bits would give a worse expected upper bound on the ratio of corrupted bits, equal to 0.095.
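One way to arrive at a value close to 0.095 is sketched below. It assumes blocks of 1000 bits, which is what 100-bit hash values and a 10% space overhead would imply. The paper does not state the block length here, so this is an assumed reading rather than the authors' calculation. Every block containing at least one flipped bit has to be counted as entirely corrupted.

```python
def expected_flagged_ratio(bit_error_rate: float, block_bits: int) -> float:
    """Expected fraction of blocks containing at least one flipped bit,
    assuming bits are corrupted independently."""
    return 1.0 - (1.0 - bit_error_rate) ** block_bits


# Assumed parameters: r = 0.0001 and blocks of 1000 bits.
print(round(expected_flagged_ratio(0.0001, 1000), 4))
```

Under these assumptions the expected upper bound is roughly 0.095, compared with the 0.058 reported for the hypercube division in (5.1.11).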