Tree-based and GA tools for optimal sampling design


Tree-based and GA tools for optimal sampling design
The R User Conference 2008, August 12-14, Technische Universität Dortmund, Germany
Marco Ballin, Giulio Barcaroli
Istituto Nazionale di Statistica (ISTAT)

Definition of the problem (1)
In a survey, the optimality of a stratified sample can be defined in terms of both of the following elements:
- total cost (the unit cost per interview multiplied by the sample size);
- planned accuracy (the expected sampling variance of the target estimates).
A sample design is acceptable if the expected sampling errors are below pre-defined limits and the costs are sustainable.

Definition of the problem (2)
Bethel (1985) proposed an algorithm that determines the total sample size and the allocation of units to strata so as to minimise costs under constraints on the precision of the estimates, in the multivariate case (more than one estimate).
Under this approach the population stratification, i.e. the partition of the sampling frame obtained by cross-classifying units by means of stratification variables, is given.
But stratification has a great impact on sampling variance and, in general, it should not be taken as given, but determined on the basis of the survey requirements.

Definition of the problem (3)
Our proposal is: given a population frame with p auxiliary variables X, and a sample survey with specific constraints on the accuracy of g target variables Y, jointly determine:
1. the best stratification (partition by means of the auxiliary variables) of this frame, and
2. the minimum sample size and allocation of units to strata required to satisfy the constraints on the accuracy of the estimates.
This can be done by using search techniques (tree or genetic algorithm) to explore the possible solutions, i.e. the different possible stratifications, each of which is evaluated by means of the Bethel algorithm.

Bethel algorithm
The optimal multivariate allocation problem can be defined as the search for the minimum (with respect to $n_h$) of the linear function $C$ under the convex constraints

$$V(\hat{Y}_g) \le U_g, \qquad g = 1, \dots, G.$$

Bethel suggested introducing the variable

$$x_h = \begin{cases} 1/n_h & \text{if } n_h \ge 1 \\ \infty & \text{otherwise,} \end{cases}$$

so that the problem is equivalent to searching for the minimum of the convex function $C(x_1, \dots, x_H)$ under the set of linear constraints

$$\sum_{h=1}^{H} N_h^2 S_{h,g}^2 \, x_h - \sum_{h=1}^{H} N_h S_{h,g}^2 \le U_g, \qquad g = 1, \dots, G.$$

An algorithm that is proved to converge to the solution (if it exists) is provided by Bethel (and Chromy), by applying the method of Lagrange multipliers to this problem.
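To make the constraint concrete, here is a minimal Python sketch of the special case with a single target variable (G = 1), where the Lagrange conditions give a closed form. It is not the multivariate Bethel-Chromy iteration, and it ignores the bounds n_h <= N_h and integer rounding. The function names and the three-stratum figures (N, S2, cost, U) are hypothetical.

```python
import math

def variance_of_total(N, S2, n):
    """Variance of the estimated total in the linear form used above:
    sum_h N_h^2 * S2_h * x_h - sum_h N_h * S2_h, with x_h = 1 / n_h."""
    return sum(Nh ** 2 * s2 / nh - Nh * s2 for Nh, s2, nh in zip(N, S2, n))

def univariate_allocation(N, S2, cost, U):
    """Real-valued n_h minimising sum_h cost_h * n_h for ONE target variable,
    with the variance bound U met with equality (Lagrange multiplier solution)."""
    S = [math.sqrt(s2) for s2 in S2]
    numer = sum(Nh * Sh * math.sqrt(ch) for Nh, Sh, ch in zip(N, S, cost))
    denom = U + sum(Nh * s2 for Nh, s2 in zip(N, S2))
    return [Nh * Sh / math.sqrt(ch) * numer / denom
            for Nh, Sh, ch in zip(N, S, cost)]

# Hypothetical frame: stratum sizes, stratum variances, unit costs, variance bound
N = [1000, 500, 100]
S2 = [4.0, 25.0, 100.0]
cost = [1.0, 1.0, 1.0]
U = 50000.0

n = univariate_allocation(N, S2, cost, U)
print([round(nh, 1) for nh in n])           # allocation per stratum
print(round(variance_of_total(N, S2, n)))   # should reproduce U
```

With several target variables no closed form is available, which is where the iterative Bethel-Chromy procedure comes in.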

Optimal stratification: the tree-based approach (1)
The tree-based approach was devised by Benedetti, Espa and Lafratta: "A tree-based approach to form strata in multi-purpose business surveys", Discussion Paper n. 5/2005, Università degli Studi di Trento.
The proposed procedure searches for the best stratification by growing a tree with a splitting rule such that, at any given level, the generating node is chosen so that the decrease in the overall sample size from one level to the next is maximised.

Optimal stratification: the tree-based approach (2)
Given p auxiliary variables $X_1, \dots, X_p$ in the frame, with domain sets $D_i = \{1, \dots, m_i\}$ $(i = 1, \dots, p)$, we can represent a solution by means of a vector

$$v = [v_1, \dots, v_M]$$

of cardinality $M = \sum_{k=1}^{p} m_k$, whose elements $v_j$ can assume the values 1 or 0. If we set $j = \sum_{k=1}^{i-1} m_k + q$, then

$$v_j = \begin{cases} 1 & \text{if the } q\text{-th value of the } i\text{-th variable is activated} \\ 0 & \text{otherwise.} \end{cases}$$
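The indexing of the solution vector can be illustrated with a short Python sketch. The helper names `position` and `activate` and the domain sizes in `m` are hypothetical; positions follow the slide's 1-based convention, while the list itself is stored 0-based.

```python
def position(m, i, q):
    """1-based position j of the q-th value of the i-th variable in v,
    i.e. j = m_1 + ... + m_(i-1) + q."""
    if not (1 <= i <= len(m) and 1 <= q <= m[i - 1]):
        raise ValueError("i or q out of range")
    return sum(m[: i - 1]) + q

def activate(v, m, i, q):
    """Return a copy of v with the entry for (variable i, value q) set to 1."""
    w = list(v)
    w[position(m, i, q) - 1] = 1   # shift to Python's 0-based indexing
    return w

m = [3, 2, 4]               # hypothetical domain sizes m_1, m_2, m_3
v = [0] * sum(m)            # M = 9 entries, nothing activated yet
v = activate(v, m, 2, 1)    # first value of the second variable
print(position(m, 2, 1), v)
```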

Optimal stratification: the tree-based approach (3)
The tree-based algorithm is a sequence of four different steps.
Step 0 (initialisation): the node associated with the stratification characterised by a single stratum, coinciding with the whole population, is the root of the tree (level k = 0), and is set as the generating node.
Step 1: from the generating node at level k, child nodes of level (k+1) are generated by activating, in turn, a single value of the vector $v = [v_1, \dots, v_M]$ among those not yet activated.

Optimal stratification: the tree-based approach (4)
Step 2: at level (k+1), the overall sample size n is calculated with the Bethel-Chromy algorithm for each node in the level. The node with the minimum n is set as the generating node.
Step 3 (stopping rule): steps 1 and 2 are repeated until
(a) the maximum acceptable number of strata has been reached (activating new values in the domains of the X variables increases the number of resulting strata), or
(b) the gain in terms of reduction of the overall sample size becomes negligible.
The best solution is then the one associated with the generating node of the previous level.
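Steps 0 to 3 can be sketched as a greedy search in Python. This is an illustration under assumptions, not the authors' implementation: `sample_size` stands in for the Bethel-Chromy evaluation, `count_strata` for the number of strata implied by a vector, and the relative threshold `min_gain` is one possible reading of "negligible". The toy functions at the bottom are invented so that the sketch runs.

```python
def tree_search(M, sample_size, count_strata, max_strata, min_gain=0.01):
    """Greedy level-by-level search over 0/1 vectors of length M.

    sample_size(v)  -> overall n for stratification v (stand-in for Bethel-Chromy)
    count_strata(v) -> number of strata implied by v
    """
    current = [0] * M                       # Step 0: root, a single stratum
    current_n = sample_size(current)
    while True:
        # Step 1: each child activates one value not yet activated
        children = []
        for j in range(M):
            if current[j] == 0:
                child = current[:]
                child[j] = 1
                if count_strata(child) <= max_strata:
                    children.append(child)
        if not children:                    # Step 3 (a): strata limit reached
            return current, current_n
        # Step 2: the child with minimum n is the candidate generating node
        best = min(children, key=sample_size)
        best_n = sample_size(best)
        # Step 3 (b): negligible gain, keep the previous generating node
        if current_n - best_n < min_gain * current_n:
            return current, current_n
        current, current_n = best, best_n

# Toy stand-ins: each activated value removes a fixed amount from n,
# and every activation doubles the number of strata.
gains = [40, 25, 1, 10]
toy_n = lambda v: 100 - sum(g * x for g, x in zip(gains, v))
toy_strata = lambda v: 2 ** sum(v)

print(tree_search(4, toy_n, toy_strata, max_strata=8))
```

Because only one child per level is expanded, the number of evaluations grows roughly with M times the depth of the tree rather than with the 2^M possible vectors, at the price of possibly missing the global minimum.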

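Optimal stratification: the tree-based approach (5)
[Figure: the search tree. The root at level 0 is the vector with no activated values; at levels 1, 2, ..., q each child node activates one further value of v, and at every level the node with minimum n becomes the generating node for the next level.]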

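Optimal stratification: the tree-based approach (6)
[Figure: workflow of the tree-based procedure. Inputs: basic strata, precision constraints on estimates, parameters of execution. The Tree step calls Bethel to evaluate candidate stratifications. Output: solution (output strata).]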

Optimal stratification: the evolutionary approach (1)
The tree-based algorithm introduced above yields a (relatively) fast solution.
This approach, however, may be subject to local minima.
It is therefore convenient to verify (and possibly improve) the resulting solution by subsequently applying a different algorithm of the evolutionary type, i.e. based on the genetic algorithm.

Optimal stratification: the evolutionary approach (2)
To be applied, a genetic algorithm requires two basic elements to be defined:
- a genetic representation of the solution domain;
- a fitness function to evaluate each solution.
In our problem, each solution can be represented by the vector $v = [v_1, \dots, v_M]$ already introduced in the tree-based approach, which identifies a particular stratification (partition) of the population frame.
The fitness of any given solution is evaluated by means of the Bethel algorithm, and is given by the minimum sample size required to satisfy the precision constraints on the sampling estimates.
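A fitness function of this kind might look as follows in Python. Two simplifications are assumed and are not taken from the slides: activated values keep their own category while all non-activated values of a variable are merged into one residual category, and the sample size comes from the closed-form single-variable allocation with unit costs rather than from the multivariate Bethel algorithm. The names `stratum_key` and `fitness` and the toy frame are hypothetical.

```python
import math
from collections import defaultdict
from statistics import variance

def stratum_key(x, v, m):
    """Stratum label of a unit with categorical codes x = (x_1, ..., x_p), 1-based.
    Assumption: an activated value keeps its code, the others collapse to 0."""
    key, offset = [], 0
    for xi, mi in zip(x, m):
        key.append(xi if v[offset + xi - 1] == 1 else 0)
        offset += mi
    return tuple(key)

def fitness(frame, v, m, U):
    """Minimum total sample size for stratification v, for ONE target variable
    with unit costs: n = (sum N_h S_h)^2 / (U + sum N_h S_h^2), rounded up."""
    groups = defaultdict(list)
    for x, y in frame:
        groups[stratum_key(x, v, m)].append(y)
    sum_NS, sum_NS2 = 0.0, 0.0
    for ys in groups.values():
        s2 = variance(ys) if len(ys) > 1 else 0.0
        sum_NS += len(ys) * math.sqrt(s2)
        sum_NS2 += len(ys) * s2
    return math.ceil(sum_NS ** 2 / (U + sum_NS2))

# Toy frame: one X variable with two values that separate low and high y
frame = [((1,), float(10 + k % 10)) for k in range(50)] + \
        [((2,), float(100 + k % 10)) for k in range(50)]
m = [2]
print(fitness(frame, [0, 0], m, U=1000.0))   # single stratum
print(fitness(frame, [1, 0], m, U=1000.0))   # value 1 split off
```

Splitting off a value that separates homogeneous groups lowers the within-stratum variances, so the required sample size drops; this is the quantity both search procedures try to minimise.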

Optimal stratification: the evolutionary approach (3)
The implemented genetic algorithm makes use of the genalg package (Willighagen 2005), and is based on the following steps.
Step 0 (initialisation): an initial set of t individuals (possible solutions) is randomly generated, possibly containing (as a "suggestion") the solution found by the tree-based approach; the fitness of each individual is evaluated.
Step 1: the next generation of individuals is generated by selecting the fittest ones of the current generation and by applying the genetic operators crossover and mutation.
Step 2 (stopping rule): step 1 is iterated k times, then the best solution (the fittest, i.e. the one with the minimum sample size) is output.

Optimal stratification: the evolutionary approach (4)
- crossover: given two parents, a subset of chromosomes is exchanged between them;
- mutation: given the probability that an arbitrary chromosome may change from its original state to another (the mutation chance), for each chromosome in an individual a random value is drawn in order to decide whether to change it or not.
The mutation chance strongly affects the speed of convergence: if convergence is too rapid, there is a risk of ending in a local minimum.
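The two operators and the surrounding loop can be sketched in plain Python. This is not the genalg package's API (which is an R package); it is a self-contained illustration under assumptions: single-point crossover as one way of exchanging a subset of positions, selection weights of 1/n so that smaller sample sizes are fitter, and an elitism step that carries the best individual forward. All function and parameter names are hypothetical.

```python
import random

def crossover(a, b, rng):
    """Single-point crossover: the parents swap tails after a random cut."""
    cut = rng.randrange(1, len(a))          # needs vectors of length >= 2
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(v, chance, rng):
    """Flip each 0/1 entry independently with probability `chance`."""
    return [1 - g if rng.random() < chance else g for g in v]

def evolve(fitness, M, pop_size=20, generations=50, chance=0.05,
           seed=0, suggestion=None):
    """Minimise fitness(v) > 0 over 0/1 vectors of length M."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(M)] for _ in range(pop_size)]
    if suggestion is not None:              # e.g. the tree-based solution
        pop[0] = list(suggestion)
    best = min(pop, key=fitness)
    for _ in range(generations):
        weights = [1.0 / fitness(v) for v in pop]   # smaller n, larger weight
        nxt = [best[:]]                             # elitism (an added assumption)
        while len(nxt) < pop_size:
            p1, p2 = rng.choices(pop, weights=weights, k=2)
            c1, c2 = crossover(p1, p2, rng)
            nxt.extend([mutate(c1, chance, rng), mutate(c2, chance, rng)])
        pop = nxt[:pop_size]
        best = min(pop + [best], key=fitness)
    return best, fitness(best)

# Toy fitness: 1 + number of positions that differ from a hidden target
target = [1, 0, 1, 1, 0, 0, 1, 0]
toy = lambda v: 1 + sum(x != y for x, y in zip(v, target))
print(evolve(toy, len(target), suggestion=[0] * len(target)))
```

Seeding the population with the tree-based solution, together with elitism, means the result returned here can never be worse than that starting point.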

Optimal stratification: the evolutionary approach (5)
[Figure: one iteration of the genetic algorithm. Generation j consists of t individuals, each a 0/1 vector with its associated sample size; individuals are selected with probability proportional to fitness, and mutation plus crossover then produce the t individuals of generation j+1.]

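Optimal stratification: the evolutionary approach (6)
[Figure: workflow of the evolutionary procedure. Inputs: tree-based solution, basic strata information, precision constraints on estimates, parameters of execution. The Genalg step (genalg package) calls Bethel to evaluate candidate stratifications. Output: solution (output strata information).]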

An application: the Italian Farm Structure Survey
The sampling frame used for the selection of the FSS sample contains 2,53,70 farms, each one characterised by the following X variables:
- province (103 different values);
- legal status (2 values);
- sector of economic activity (9 values);
- dimension in terms of production (3 values);
- dimension in terms of agricultural surface (3 values);
- dimension in terms of owned cattle (3 values);
- altimetry class (5 values).
4 different Y variables have been considered as the main targets of the FSS, for which the required precision (in terms of maximum coefficient of variation) has been fixed at regional level (domains of interest).

Region         (1) Current sample size   (2) Tree-based solution   % diff.
Italia                          52,713                    29,726    -43.61
Piemonte                         3,560                     1,546    -56.57
Valle d'A.                         409                       384     -6.11
Lombardia                        5,125                     2,237    -56.35
Bolzano                            687                       540    -19.94
Trento                             667                       638     -4.35
Veneto                           3,873                     2,299    -40.64
Friuli V.G.                      1,262                       619    -50.95
Liguria                          1,327                       777    -41.45
Emilia R.                        3,117                     1,966    -36.93
Toscana                          2,833                     1,341    -52.67
Umbria                           1,363                       858    -37.05
Marche                           1,188                       508    -57.24
Lazio                            3,710                     2,620    -29.38
Abruzzo                          1,222                       950    -22.26
Molise                           1,183                       867    -26.71
Campania                         3,163                     2,154    -31.90
Puglia                           6,595                     2,326    -64.73
Basilicata                         965                       684    -29.12
Calabria                         2,846                     2,080    -26.91
Sicilia                          5,011                     3,182    -36.50
Sardegna                         2,607                     1,140    -36.50

Region         (2) Tree-based solution   (3) Evolutionary solution   % diff.
Italia                          29,726                      28,955     -2.59
Piemonte                         1,546                       1,546      0.00
Valle d'A.                         384                         376     -2.08
Lombardia                        2,237                       2,237      0.00
Bolzano                            540                         540      0.00
Trento                             638                         638      0.00
Veneto                           2,299                       2,138     -7.00
Friuli V.G.                        619                         619      0.00
Liguria                            777                         657    -15.44
Emilia R.                        1,966                       1,933     -1.68
Toscana                          1,341                       1,310     -2.31
Umbria                             858                         858      0.00
Marche                             508                         498     -1.97
Lazio                            2,620                       2,620      0.00
Abruzzo                            950                         876     -7.79
Molise                             867                         719    -17.07
Campania                         2,154                       2,040     -5.29
Puglia                           2,326                       2,272     -2.32
Basilicata                         684                         684      0.00
Calabria                         2,080                       2,072     -0.38
Sicilia                          3,182                       3,182      0.00
Sardegna                         1,140                       1,140      0.00
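The "% diff." columns appear to be the relative change of the new sample size with respect to the previous one. A small Python check on two rows whose figures are fully legible in the source (Trento in the first table, Puglia in the second) illustrates the arithmetic; the function name `pct_diff` is hypothetical.

```python
def pct_diff(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before

# Trento: current sample size 667, tree-based solution 638
print(round(pct_diff(667, 638), 2))

# Puglia: tree-based solution 2,326, evolutionary solution 2,272
print(round(pct_diff(2326, 2272), 2))
```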

Conclusions
In a sample survey design, the joint adoption of a consolidated algorithm for determining the best sample size and allocation of units, together with search techniques (tree-based and genetic algorithms) to explore different possible stratifications, can be very convenient in situations where many different stratifications of a sampling frame are possible.
A limitation of this approach is the constraint on the nature of the auxiliary variables X, which must be categorical.
An open problem is the treatment of continuous X variables.