MODELLING TIME OF UNEMPLOYMENT VIA COX PROPORTIONAL MODEL

Jan POPELKA - Department of Informatics and Geoinformatics, University of J.E.Purkyne, Horeni 13, 400 96 Usti nad Labem, Czech Republic, jan.popelka@ujep.cz

Abstract

Factors influencing the time of unemployment in the Pribram region, Czech Republic, are analyzed. The analysis is based on data acquired from the Labor Office in Pribram. The data set includes all unemployed persons registered in the year 2002. The follow-up period is 30 months. Because right-censored times of unemployment occur in the sample, a semiparametric regression model, namely the Cox proportional model, is used. The time of unemployment is modeled in dependence on age, sex, education, place of living, season of registration, state of health and marital status. The character of the dependence on age, the only continuous explanatory variable, is examined. Two models with a different influence of age are fitted. Based on the estimated hazard ratios, the differences between various groups of unemployed are described. The survivorship function is estimated and graphs are evaluated for a better representation of the discovered differences.

Key words: unemployment, censored data, hazard function, hazard ratio, Cox proportional model, likelihood function, partial likelihood function, survival function.

Introduction

The modeling approach to the analysis of survival data answers the question of how the survival experience of a group of persons depends on the values of one or more explanatory variables, whose values have been recorded for each person at the time origin. Survival data analysis was first used in medical research. The similarity between clinical and unemployment studies is the reason for applying these methods to modeling the time of unemployment. There are two main areas of similarity: the duration of unemployment, like survival time, is positively skewed and censored. Censoring occurs because some subjects are not employed during the follow-up period or are lost to follow-up.
This means that the registration at the Labor Office is cancelled at the subject's own request or as a sanction, or the subject moves, returns to full-time study, enters retirement or raises children. If the end of the spell is unknown, the time of unemployment is right-censored. The other data are non-censored.

This paper investigates whether there is evidence of duration dependence in unemployment, and the role of personal characteristics in explaining unemployment duration. One objective of this paper is to estimate the parameters of the probability of leaving unemployment through finding a job. The second goal is to estimate the survival function in order to compare the length of unemployment for subjects with different personal characteristics. The analyses are based on data acquired from the Labor Office in Pribram.

Semiparametric regression model

The reason for using a semiparametric model for analyzing censored survival time data is to avoid having to specify the hazard function completely. The hazard function is the instantaneous rate at which the event occurs at time t, conditional on its not having occurred before that time:
$$h(t) = \lim_{\delta t \to 0} \frac{P(t \le T < t + \delta t \mid T \ge t)}{\delta t} \tag{1}$$

The utility of this model stems from the fact that a reduced set of assumptions is needed to provide hazard ratios, formed from the coefficients, that are easily interpreted. The concrete form of the hazard function was suggested by Cox (1972):

$$h(t, x, \beta) = h_0(t)\exp(x'\beta) \tag{2}$$

where $h_0(t)$ is the baseline hazard function.

To fit the regression model to survival time data, the maximum likelihood approach is used. From the computational point of view, it is more convenient to maximize the logarithm of the likelihood function. For the case in which the error component of the model is not fully specified (the distribution of survival time is not known), Cox proposed a likelihood function that depends only on the parameters of interest. This so-called partial likelihood function is given by the following expression (assuming that there are no tied data):

$$l(\beta) = \prod_{i=1}^{n} \left[ \frac{e^{x_i'\beta}}{\sum_{j \in R_i} e^{x_j'\beta}} \right]^{c_i} \tag{3}$$

where the summation in the denominator is over all subjects in the risk set at time $t_i$, denoted by $R_i$, and $c_i$ equals 1 for an uncensored observation and 0 for a censored one.

In order to accommodate tied observations, the partial likelihood function (3) has to be modified in some way. The appropriate likelihood function in the presence of tied observations has been given by Kalbfleisch and Prentice (1980):

$$L(\beta) = \prod_{i=1}^{n} \int_0^{\infty} \prod_{j \in D_i} \left[ 1 - \exp\left( -\frac{e^{x_j'\beta}}{\sum_{l \in R_i} e^{x_l'\beta}}\, t \right) \right] \exp(-t)\, dt \tag{4}$$

where $D_i$ represents the subjects with survival times equal to $t_i$.

There are a number of approximations to this likelihood function which have computational advantages over the exact method. The simplest approximation is that due to Breslow (1974):

$$L(\beta) = \prod_{i=1}^{n} \frac{\exp\left(\sum_{j \in D_i} x_j'\beta\right)}{\left[\sum_{j \in R_i} e^{x_j'\beta}\right]^{d_i}} \tag{5}$$

where $d_i$ denotes the number of subjects with survival time $t_i$. This likelihood is quite straightforward to compute, and it is an adequate approximation when the number of tied observations at any one time of re-employment is not too large. Efron (1977) proposed:

$$L(\beta) = \prod_{i=1}^{n} \frac{\exp\left(\sum_{j \in D_i} x_j'\beta\right)}{\prod_{l=1}^{d_i}\left[\sum_{j \in R_i} e^{x_j'\beta} - \frac{l-1}{d_i}\sum_{j \in D_i} e^{x_j'\beta}\right]} \tag{6}$$

This is a closer approximation to the appropriate likelihood function than that due to Breslow (1974), although in practice both approximations often give similar results.

Unemployment in region Pribram

The analysis described is a part of the project "Analysis of factors influencing time to reemployment in the Czech Republic", supported by IGA (grant no. IG 443). The analysis is based on data acquired
from the Labor Office in Pribram. This data set includes more observations and characteristics than the first one analyzed, on which the models published in (Esser and Popelka, 2003; Jarosova 2003a, 2003b; Jarosova, Mala, Popelka, 2004) were based.

The data set contains information about all subjects registered by the Labor Office in 2002. The follow-up period is 30 months: it begins on 1 January 2002 and ends on 18 June 2004. The sample involves 4275 unemployed, of whom 2172 are females (51%) and 2103 are males (49%). 1309 observations are right-censored; these subjects remained unemployed at the end of the follow-up period or were lost to follow-up during the period. 2966 subjects exited to a job during the follow-up period.

The distribution of the time of unemployment is positively skewed (see Figure 1). The shortest length of unemployment is … days, the longest 894 days. The mean time is 145 days, the median time is 93 days. 66 subjects were unemployed for 3 days, which is the most frequent length in the sample. The mean age of the unemployed is 33 years; the median age is about 30 years. The most frequent age is 19 years (243 subjects). The youngest subject was 15, the oldest 60. The distribution of age is positively skewed (see Figure 2).

Figure 1 Distribution of time of unemployment (uncensored data) [histogram: frequency against time of unemployment (days), in 100-day classes from 0 to 900]

Figure 2 Distribution of age of unemployed [histogram: frequency against age (years), in classes 15-20, 21-25, 26-30, 31-35, 36-40, 41-45, 46-50, 51-55, 56-60]

There are 2048 (48%) unemployed with secondary education without GCE, 1350 (32%) with secondary education with GCE and 180 (4%) with tertiary education. The remaining 697 subjects (16%) have basic education. 2476 (58%) subjects live in towns (Breznice, Dobris, Novy Knin, Pribram, Rozmital pod Tremsinem and Sedlcany; source: Czech Statistical Office), 1808 (42%) in villages. As shown in Figure 3 and Table 1, the highest inflow of unemployed was in autumn (35% of all registrations), the lowest in winter (17%). A detailed view shows that most subjects were registered by the Labor Office in July,
January and September 2002 (between 10.5% and 11.4% of all registrations each). There was a small inflow in February, March and April (about 6% each).

Figure 3 Distribution of unemployed by season of registration by the Labor Office in Pribram [chart: spring 862 unemployed (20%), summer 1182 unemployed (28%), autumn 1487 unemployed (35%), winter 744 unemployed (17%)]

Table 1 Frequency table for months of registration by the Labor Office in Pribram

Month        Frequency   Relative frequency
January      475         11.11%
February     269          6.29%
March        244          5.71%
April        298          6.97%
May          320          7.49%
June         350          8.19%
July         486         11.37%
August       346          8.09%
September    449         10.50%
October      331          7.74%
November     352          8.23%
December     355          8.30%

With respect to marital status, there are 1855 (44%) married subjects or subjects in a common-law marriage in the sample. 2420 (56%) unemployed are single, divorced or widowed. The last characteristic acquired is the state of health. Three levels are distinguished: perfect health, 3821 (89%); disabled, 173 (4%); and subjects with a full or partial disability pension, 281 (7%).

Age (denoted AGE) is the only continuous variable. As shown in previous papers, the relationship between the time of unemployment and age is not linear (Jarosova 2003a, 2003b; Jarosova, Mala, Popelka, 2004). One way to create a more suitable model is to use a quadratic function of age (new variable AGE^2). This allows for the hypothesis that the chances of getting new work are lower for very young and very old people than for subjects in mid-age. This model was published in the previous study (Jarosova, Mala, Popelka, 2004). Another way is to classify age into intervals, as published e.g. in (Foley, 1997). The new variable, denoted AGEM, is no longer continuous. The length of an interval is 5 years and the number of intervals is 9, as shown in Figure 2.

The parameters of both proposed Cox proportional hazard models are estimated using S-Plus 4.5 software. Because of tied data and the large number of observations, the Efron approximation of the likelihood function was used to reduce the computational time for estimation of the model parameters.
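The difference between the Breslow approximation (5) and the Efron approximation (6) can be illustrated with a minimal Python sketch. It is not the S-Plus routine used for the analysis: the function name, the restriction to a single covariate and the toy data are assumptions made purely for illustration.

```python
import math
from collections import defaultdict


def log_partial_likelihood(times, events, x, beta, ties="efron"):
    """Log partial likelihood of a one-covariate Cox model.

    ties="breslow" follows approximation (5), ties="efron" approximation (6).
    """
    if ties not in ("breslow", "efron"):
        raise ValueError("ties must be 'breslow' or 'efron'")

    # D_i: indices of the subjects whose event (re-employment) occurs at time t_i.
    tied = defaultdict(list)
    for i, (t, e) in enumerate(zip(times, events)):
        if e:
            tied[t].append(i)

    loglik = 0.0
    for t, d_idx in tied.items():
        d = len(d_idx)
        # Sum over the risk set R_i: everybody still unemployed just before t_i.
        risk_sum = sum(
            math.exp(beta * x[j]) for j in range(len(times)) if times[j] >= t
        )
        tied_sum = sum(math.exp(beta * x[j]) for j in d_idx)
        loglik += sum(beta * x[j] for j in d_idx)
        for l in range(d):
            # Efron subtracts the fraction l/d (l = 0, ..., d-1) of the tied
            # subjects' contribution; Breslow keeps the full risk-set sum.
            frac = l / d if ties == "efron" else 0.0
            loglik -= math.log(risk_sum - frac * tied_sum)
    return loglik


# Hypothetical toy data: durations in days, event = 1 (job found) or
# 0 (censored), and a single binary covariate.
times = [5, 5, 8, 12, 12, 20]
events = [1, 1, 0, 1, 1, 0]
x = [1, 0, 1, 0, 1, 0]

for method in ("breslow", "efron"):
    print(method, log_partial_likelihood(times, events, x, beta=0.3, ties=method))
```

When no two events share the same time, both options return the same value, which is the logarithm of the partial likelihood (3).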
To select the most suitable relationship between age and the time of unemployment, the Akaike information criterion, $\mathrm{AIC} = -2\log\hat{L} + \alpha q$, is used, where $\alpha$ is between 2 and 6 and $q$ is the number of model parameters. Tables 2 and 3 (likelihood ratio test) show that the model with the AGEM variable is preferable to the model with the quadratic relationship. Modified outputs from S-PLUS are shown in Tables 4 and 5.
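The model comparison can be retraced with a short Python sketch. The inputs are the -2 log L values and parameter counts of Table 2 as far as they can be read, and alpha = 2 is an assumption (the conventional choice within the stated range of 2 to 6). The p-value uses the closed-form chi-square tail for an even number of degrees of freedom, so no statistical library is required. Because the inputs are rounded, the statistic obtained here need not coincide exactly with the G reported in Table 3.

```python
import math


def aic(minus_two_log_lik, q, alpha=2.0):
    """AIC = -2 log L + alpha * q."""
    return minus_two_log_lik + alpha * q


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# -2 log L and number of parameters, as read from Table 2 (assumed inputs).
quadratic = {"m2ll": 44890.9, "q": 13}
intervals = {"m2ll": 44873.2, "q": 19}

aic_quadratic = aic(quadratic["m2ll"], quadratic["q"])
aic_intervals = aic(intervals["m2ll"], intervals["q"])

# Likelihood ratio statistic and its degrees of freedom.
g = quadratic["m2ll"] - intervals["m2ll"]
df = intervals["q"] - quadratic["q"]
p_value = chi2_sf_even_df(g, df)

print(f"AIC quadratic: {aic_quadratic:.1f}, AIC intervals: {aic_intervals:.1f}")
print(f"G = {g:.2f}, df = {df}, p = {p_value:.4f}")
```

The model with the smaller AIC is preferred; under these inputs that is the model with the interval-classified age.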
Table 2 Comparison of alternative models

Variable AGE in model                  Number of variables   -2 log L     AIC
AGE+AGE^2 (quadratic relationship)     13                    44890.9      44916.9
AGEM (intervals)                       19                    44873.2      44911.2

Table 3 Comparison of alternative models - likelihood ratio test

Compared models   G       Df   p-value
1 vs 2            17.88   6    0.007

Table 4 Proportional hazard model estimation

Variable        Parameter     Hazard    95% Hazard Ratio
                Estimation    Ratio     Confidence Limits
SEX.M            0.20277**    1.225     1.135    1.322
AGEM (21-25)     0.32874**    1.389     1.231    1.568
AGEM (26-30)     0.150**      1.162     1.011    1.335
AGEM (31-35)     0.27294**    1.314     1.127    1.532
AGEM (36-40)     0.24667**    1.280     1.092    1.499
AGEM (41-45)     0.1247       1.133     0.96     1.337
AGEM (46-50)     0.08754      1.091     0.926    1.286
AGEM (51-55)    -…            0.718     0.597    0.863
AGEM (56-60)    -…            0.311     0.219    0.443
EDU2             0.617**      1.854     1.643    2.09
EDU3             0.65555**    1.926     1.698    2.185
EDU4             0.71576**    2.046     1.679    2.492
SEASON2         -0.11204**    0.894     0.802    0.996
SEASON3         -0.09577*     0.909     0.822    1.004
SEASON4         -0.12567**    0.882     0.797    0.976
FAMILY           0.00774      1.008     0.921    1.103
HEALTH1         -…            0.507     0.41     0.626
HEALTH2         -…            0.355     0.29     0.434
TOWN            -0.089**      0.915     0.85     0.984

Note: parameter t-test (* P<0.1, ** P<0.05, *** P<0.01)

Table 5 Proportional hazard model tests

Testing Global Null Hypothesis: BETA=0

Test                    Chi-Square   DF   Pr > ChiSq
Likelihood Ratio        623          19   <.0001
Wald test               486          19   <.0001
Efficient score test    52           19   <.0001

All tests presented in Table 5 confirm the statistical significance of the selected model, i.e. at least one of the estimated parameters is statistically significant.

The estimated hazard ratio for SEX.M is statistically significant. Men are found to have a significantly higher chance of re-employment (1.22 times greater). This contrasts with the
conclusions made in the previous study. There was no statistically significant parameter for sex in the foregoing model (Kalbfleisch and Prentice, 1980).

A monotonic relationship between education and re-employment probability exists: the higher the level of education, the higher the hazard ratio. The variable education has four levels: EDU1, subject with no or basic education (base level); EDU2, subject with secondary education without GCE; EDU3, subject with secondary education with GCE; EDU4, subject with tertiary education. A subject with tertiary education has a 2 times greater chance of exiting to a job than one with basic education. For a graphical comparison, see the estimated survival functions, which represent the probability of continuing unemployment (see Figure 6).

There is also a difference by season of registration at the Labor Office. The best chances of re-employment belong to subjects who entered unemployment in winter (SEASON1 - December, January and February). The worst situation arises in autumn (SEASON4 - September, October and November) and in spring (SEASON2 - March, April, May), where the hazard ratios are 0.882 and 0.894. As the hazard ratio for SEASON3 (June, July and August) is 0.909, the chance in summer is 1.1 times smaller than in winter.

There is also a highly significant difference in the model between the unemployed with a perfect state of health (HEALTH1) and the disabled (HEALTH2) or subjects with a full or partial disability pension (HEALTH3) (see Figure 7). People from villages experience a 1.1 times higher hazard of exiting to a job in comparison with people who live in towns (variable TOWN). The estimated model indicates that there is no relationship between marital status and the probability of exiting to a job.

The estimates for the AGEM variable lead to very interesting conclusions. The original model (quadratic function of age) respects the hypothesis that the chances of getting new work are lower for very young and very old people than for subjects in mid-age. The highest chance belongs to a 33-year-old subject (see Figure 4). However, the model presented in this paper shows different results.
There is no difference between people under 20 and those between 40 and 50 years of age. The oldest people (above 50) are in the worst position of all. There is a strong similarity within the interval from 20 to 40 years of age, where the chance of re-employment is 1.3 times greater than in the group under 20 (see Figure 5). The different conclusions that result from the two models show how difficult it is to find an appropriate functional form for the influence of age.

Survival function estimation

Although there is no information about the form of the distribution of the time of unemployment, it is possible to estimate the survival function S(t, x, β) as the probability that the event (exiting to a job) has not happened up to time t. Starting from the baseline survival function estimated by S-PLUS (see Figures 4 and 5), the survival function for a subject with any personal characteristics can be estimated using:

$$\hat{S}(t, x, \hat{\beta}) = \left[\hat{S}_0(t)\right]^{\exp(x'\hat{\beta})} \tag{7}$$

To compare the probability of continuing unemployment for different groups, some graphical representations follow (Figures 4 to 7). The lower the survival line lies in the graph, the lower the probability of staying unemployed, i.e. the shorter the duration of unemployment. The conclusions drawn from these representations agree with those made in the previous chapter.
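Equation (7) can be applied as in the following Python sketch. The coefficients are taken from Table 4 as far as they can be read and should be treated as assumptions; the baseline survival values are purely hypothetical, since the paper presents the baseline function only graphically. The function and variable names are likewise illustrative.

```python
import math

# Coefficients as read from Table 4 (interval-classified age model); the exact
# digits are assumptions. Reference categories have coefficient 0.
BETA = {
    "SEX.M": 0.20277,
    "EDU3": 0.65555,
    "EDU4": 0.71576,
    "SEASON4": -0.12567,
}


def hazard_ratio(covariates, beta=BETA):
    """exp(x'beta) relative to the baseline subject (all covariates equal to 0)."""
    return math.exp(sum(beta[name] * value for name, value in covariates.items()))


def survival(baseline_survival, covariates, beta=BETA):
    """Equation (7): S(t, x) = S0(t) ** exp(x'beta)."""
    return baseline_survival ** hazard_ratio(covariates, beta)


# Hypothetical baseline survival values S0(t) at t = 100, 300 and 500 days.
baseline = {100: 0.80, 300: 0.55, 500: 0.40}

# A subject who differs from the baseline only by tertiary education.
tertiary = {"EDU4": 1}
for t, s0 in baseline.items():
    print(t, round(s0, 3), round(survival(s0, tertiary), 3))
```

A hazard ratio above 1 raises the baseline survival probability to a power greater than 1 and therefore lowers the curve, which corresponds to a shorter expected spell of unemployment.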
Figure 4 Continuous age model. Estimated survival function for female, basic education, registered in winter, perfect health condition, village, single. Distinction by age. [plot: estimated survival function against time of unemployment (days); one curve per age and the baseline survival function]

Figure 5 Interval classified age model. Estimated survival function for female, basic education, registered in winter, perfect health condition, village, single. Distinction by age. [plot: estimated survival function against time of unemployment (days); curves for age 21-25, age 31-35, age 51-55 and the baseline survival function (age 15-20 years)]
Figure 6 Interval classified age model. Estimated survival function for female, 33 years old, registered in winter, perfect health condition, village. Distinction by level of education. [plot: estimated survival function against time of unemployment (days); curves for basic education, secondary education without GCE, secondary education with GCE and tertiary education]

Figure 7 Interval classified age model. Estimated survival function for male, 33 years old, secondary education with GCE, registered in winter, single, village. Distinction by state of health. [plot: estimated survival function against time of unemployment (days); curves for perfect health, disabled, and full or partial disability pension]

Conclusion

The paper attempts to complete and expand the foregoing model, which was based on a smaller data set. The longer follow-up period and the larger amount of data enable an improvement in parameter estimation. Expanding the set of examined variables gives a better explanation of the role of personal characteristics. The model tries to show another way to describe the relationship between the length of unemployment and age. The different conclusions that result from the different models show the need to find an appropriate functional form for the influence of age. This part remains open for further research.

Further research should be oriented on the Czech Republic as a whole. Finding new factors, or even reviewing the current ones, will however be very difficult. It is not easy to obtain such
extensive data as those presented here. The analysis of the relationship between the probability of re-employment and some personal characteristics might not be possible within the scope of this paper. There are some factors that should be taken into account in further research. Because of the wide differences in the unemployment rate among the regions of the Czech Republic, the influence of regional diversification should also be examined.

References

[1] BRESLOW, N. (1974) Covariance Analysis of Survival Data under the Proportional Hazards Model, International Statistical Review 1974, No. 43
[2] COX, D.R. (1972) Regression Models and Life Tables, Journal of the Royal Statistical Society, Series B 1972, No. 34
[3] EFRON, B. (1977) The Efficiency of Cox's Likelihood Function for Censored Data, Journal of the American Statistical Association 1977, No. 72
[4] ESSER, M., POPELKA, J. (2003) Analysis of Factors Influencing Time of Unemployment Using Survival Time Analysis, Zborník 12. medzinárodného seminára Výpočtová štatistika, SŠDS, Bratislava 2003
[5] FOLEY, M.C. (1997) Determinants of Unemployment Duration in Russia, Center Discussion Paper, Yale University 1997, No. 779
[6] HOSMER, D.W., LEMESHOW, S. (1999) Applied Survival Analysis, J.Wiley & Sons, N.Y. 1999
[7] JAROŠOVÁ, E. (2003) Analysis of Interval Censored Data, Univerzita Mateja Bela, Banská Bystrica 2003
[8] JAROŠOVÁ, E. (2003) Exploring the Functional Form of Covariates in Cox Model, Zborník 12. medzinárodného seminára Výpočtová štatistika, SŠDS, Bratislava 2003
[9] JAROŠOVÁ, E., MALÁ, I., POPELKA, J. (2004) Modelling time of unemployment via log-location-scale model, COMPSTAT 2004 [CD-ROM], Praha 2004
[10] KALBFLEISCH, J.D., PRENTICE, R.L. (1980) The Statistical Analysis of Failure Time Data, Wiley, N.Y. 1980
[11] POPELKA, J. (2004) Analýza faktorů ovlivňujících délku doby nezaměstnanosti využitím metod analýzy přežití, Sborník prací účastníků vědeckého semináře doktorského studia Fakulty informatiky a statistiky VŠE v Praze, Praha 2004