Factors affecting life expectancy in Kazakhstan

panzabekova.aksana@ieconom.kz ABSTRACT Relevance. Life expectancy is a comprehensive indicator reflecting the quality of life in a country or region, which is why it is important to estimate the impact of various socio-economic factors on this indicator as accurately as possible. Our study makes a novel contribution to the existing research by conducting a correlation and regression analysis of factors affecting life expectancy in regions of Kazakhstan based on panel data. Research objective. This paper aims to present a modified methodology for estimation of factors affecting life expectancy in regions of Kazakhstan. Data and methods. Our research relies on panel data on regions and cities of Kazakhstan. The data are provided by the Ministry of National Economy and the Ministry of Health Care of the Republic of Kazakhstan. Methodologically, the research is based on regression and correlation analysis. Two main criteria were applied for data selection: availability of statistical data for a sufficiently long period and the potential impact of factors on life expectancy. We built a two-factor power regression model calculated with the help of the Microsoft Excel software package. Results. In our research, regression models were used to formulate conclusions concerning the impact of certain socio-economic factors on life expectancy in regions of Kazakhstan. We also brought to light the factors whose relationship to life expectancy requires further investigation. Conclusions. It was found that the most significant factors affecting life expectancy in regions of Kazakhstan are economic ones. The proposed methodology can be used for short- and medium-term predictions of life


Introduction
Life expectancy is one of the key indicators of the quality of life, encompassing a multitude of different factors. Analysis of these factors is thus necessary to bring to light those with the strongest impact and devise state policies to improve the quality of life in the country. The extent of this or that factor's influence may vary across regions, which means that such studies may need to analyze regional data.
Our research is aimed at analyzing the relationships between life expectancy and a set of social, economic and medical factors in regions of Kazakhstan. We intended to explore the connections between specific factors and life expectancy, factors peculiar to specific regions and possibilities of forecasting life expectancy by building models that would take into consideration the most important factors. To this end, we modified the methodology proposed by Molchanova and Kruchek (2013) to make it suitable for evaluation of factors affecting life expectancy in Kazakhstan.
Our research objectives are as follows:
- to select indicators that may affect life expectancy;
- to reveal the relationships between these factors and life expectancy in specific regions of Kazakhstan;
- to identify the factors with the strongest influence for each region;
- to formulate recommendations for extending life expectancy.
To address these goals, we selected indicators that may influence life expectancy in regions of Kazakhstan. Furthermore, for each region and for each indicator we calculated coefficients of their correlation with life expectancy, thus excluding the indicators with coefficients below the threshold value. To identify the factors with the strongest influence on life expectancy we built two-factor power regression models based on the Cobb-Douglas production function. The results of our regression and correlation analysis have led us to formulate recommendations concerning measures to increase life expectancy.

Literature review
Factors affecting life expectancy have gained much attention on the part of social scholars and economists, who sought to elaborate guidelines for improvement of relevant state policies.
A variety of methods is used to select the factors for analysis and to estimate their impact on life expectancy. There are numerous studies using different models to analyze the impact of environmental and medical factors on life expectancy. For example, Leliveld et al. (2020)  Shabunova, Rybakova and Tikhomirova from the Institute of Socio-Economic Studies of Population of the Russian Academy of Sciences (ISESP RAS) concentrated on the case of Vologda region to evaluate the influence of a set of factors on life expectancy by applying the correlation analysis method (2009). These estimates were compared with those of 'subjective health' in relation to different socio-economic criteria, leading to conclusions about the public health and dynamics of the quality of life in Vologda region.
Timashev, Voronina and Makarova considered the notion of average life expectancy and evaluated the impact of infrastructure factors on its dynamics (2013). They also proposed their own approach based on correlation and regression analysis to evaluating quantitative correlations between these indicators. Novoselova (2016) studied the main factors of life expectancy in big cities by looking at regional differences in the dynamics of socio-economic indicators, in particular health care. As a result, the key factors that have a negative influence on the growth of life expectancy in Moscow were identified.
In general, our review of the research literature shows that the vast majority of the above-mentioned and similar studies (Kabanov, 2015; Andrianov, 2019; Kulak, 2016; Zvezdina & Ivanova, 2015) use mathematical methods to estimate the impact of various factors on life expectancy. Such choice of methods can be explained by the fact that they allow researchers to analyze large amounts of data and reveal hidden patterns. These studies put the main emphasis on medical factors such as morbidity rates of diseases, habits and lifestyle (for example, alcohol consumption habits), and cause-specific mortality. There are also studies analyzing the impact of environmental and economic factors. All of the above influenced the structure and logic of this study.
Our review of the contemporary research on life expectancy in Kazakhstan has shown that such studies use a limited range of methods. Thus, so far no evaluations of the factors affecting life expectancy in Kazakhstan have been made that would be based on correlation and regression analysis of data by region. Our study aims to address this research gap.

Methodology
Our analysis of the factors affecting life expectancy in regions of Kazakhstan relies on the methodology developed by Russian scholars Molchanova and Kruchek (2013). Several adjustments were made to adapt certain statistical indicators for evaluation of the factors shaping life expectancy. In particular, we decided not to apply the 'correlation pleiad' method since we did not intend to look for the relationships between all the factors but instead wanted to focus on the influence of specific factors on just one indicator: life expectancy. Furthermore, the data we had for the given time period led us to choose indicators that differed from those included in the original methodology. The regression model was calculated by applying the least squares method for linear regression to the logarithms of the indicator values; the fitted linear equation was then transformed back into a power function.
Our research methodology relies on the following:
1) a set of socio-economic, environmental and medical indicators that formed the primary set of factors based on panel (longitudinal) data;
2) calculation of correlation coefficients to select the main factors affecting life expectancy;
3) the use of regression models for evaluation of factors affecting life expectancy.
R-ECONOMY, 2020, 6(4), 261-270 doi: 10.15826/recon.2020.6.4.023 Online ISSN 2412-0731
The main sources of data were the websites of the Committee on Statistics of the Ministry of National Economy of the Republic of Kazakhstan, the information service of the Committee on the Legal Statistics and Special Accounts of the State Office of Public Prosecutor of the Republic of Kazakhstan, and the Republican Center for Health Development. The key criterion for data selection was their availability for the given time period, from 2001 to 2018. For the sake of data homogeneity, we considered the statistics for Turkestan region and the city of Shymkent as one indicator since these two regions used to be a part of South Kazakhstan region.

Results
In Kazakhstan, life expectancy increased by 7.35 years between 2001 and 2018, from 65.80 years in 2001 to 73.15 years in 2018. This indicator varies across regions, with the maximum value in the city of Nur-Sultan (76.21 years in 2018) and the minimum in North Kazakhstan region (71.14 years).
As Table 1 illustrates, life expectancy in Nur-Sultan and Almaty shows the highest positive deviation from the mean while life expectancy in North Kazakhstan, Akmola and Karaganda regions shows the highest negative deviation.
Our primary set of factors comprises 13 factors with correlation coefficient values above the threshold of 0.7, that is, these are the factors showing a strong correlation with life expectancy. It should be noted that correlation coefficients for the same factor could differ between regions of Kazakhstan: in some cases a factor had to be excluded from the set of factors affecting life expectancy in one region while remaining important in others. Moreover, the factors were evaluated for multicollinearity. In some cases, factors with direct or functional relationships were removed; in others, we kept some of the factors (e.g. poverty and unemployment) to ensure the quality of our regression models. However, no regression model includes collinear factors together, so the quality of the models did not suffer. For easier data processing in software, each factor was coded as follows: NMI, nominal monetary income; SM, subsistence minimum; NISM, ratio of nominal income to subsistence minimum; HW, the number of health workers; DpM, the number of divorces per 1,000 marriages; P, poverty level; U, the rate of unemployment; CMR, the rate of cancer morbidity; BD, the rate of blood diseases; MD, the rate of substance-induced mental disorders; CSD, morbidity rate of circulatory system diseases; RD, respiratory disease morbidity rate; and CR, crime rate.
https://journals.urfu.ru/index.php/r-economy
For each of these factors we calculated correlation coefficients for each specific region and then ranked the factors in descending order of the modulus of their correlation coefficients (that is, regardless of the coefficient's positive or negative sign). At this stage we excluded the factors whose correlation coefficients in modulus did not exceed 0.7. Thus we were able to identify the factors that have the most impact on life expectancy in each region.
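The screening step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `pearson` and `screen_factors` are hypothetical, the Pearson coefficient is assumed as the correlation measure, and the numbers are made up for one imaginary region rather than taken from the study's data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def screen_factors(life_expectancy, factors, threshold=0.7):
    """Keep factors with |r| > threshold and rank them by |r|, largest first."""
    scored = [(code, pearson(series, life_expectancy))
              for code, series in factors.items()]
    kept = [(code, r) for code, r in scored if abs(r) > threshold]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)

# Made-up illustrative series for one hypothetical region (not study data).
life_exp = [65.8, 66.1, 66.9, 67.5, 68.4, 69.0]
factors = {
    "NMI": [10.0, 12.0, 15.0, 19.0, 24.0, 30.0],  # rises with life expectancy
    "U": [10.4, 9.8, 8.9, 8.1, 7.0, 6.6],         # falls as life expectancy rises
    "CR": [5.0, 7.0, 4.0, 6.0, 5.0, 6.0],         # no clear relationship
}

ranked = screen_factors(life_exp, factors)
for code, r in ranked:
    print(f"{code}: r = {r:+.3f}")
```

Ranking by the absolute value keeps strongly negative factors such as unemployment in the list, which matches the "in modulus" rule in the text; the sign is retained so that implausible directions can still be inspected afterwards.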
The factors were grouped by macro-region to identify those that occur most frequently and have a high correlation. The results are shown in Table 2.
As Table 2 shows, such socio-economic indicators as nominal per capita income, subsistence minimum, poverty level, and unemployment correlate with life expectancy in all the regions of Kazakhstan.
In many regions, there are correlations between such indicators as the ratio of divorces to marriages, the ratio of nominal income to subsistence minimum, and the rate of substance-induced mental disorders. Such regions as the North, East and Centre of Kazakhstan have a high incidence of cancer and blood disorders. We compiled ranking tables (Tables 3-6) to show which factors were the most important for which region. The factors for each region are arranged in descending order of the regression coefficients in modulus.
Note: 'correlation coefficient' shows the correlation between the factor and the region's life expectancy. The same codes were used to denote the factors as in Table 2.
Our analysis shows that different regions may have a different relationship between certain factors and life expectancy: sometimes this or that factor may have a strong correlation with life expectancy while in other regions it would not even reach the threshold value of 0.7.
The picture in some regions is quite unusual. For example, in West Kazakhstan, the level of crime has a positive correlation with life expectancy while in Jambyl and Pavlodar regions, life expectancy is positively correlated with cancer morbidity. In East Kazakhstan, life expectancy has a positive correlation with the morbidity rate of circulatory system diseases. In such cases we may suppose that there is a factor that was left unaccounted for in the analysis. In other words, there might be a factor that has a positive influence on life expectancy and at the same time on the above-mentioned factors. For instance, income growth (which, as we see, often has a positive impact on life expectancy) can result in people consuming more unhealthy food and thus lead to an increase in the incidence of a disease. Such connections make no socio-economic or medical sense, which is why they were excluded from our regression analysis. However, they can become a subject of further research.
In different regions the same indicator may exert a directly opposite influence. For example, quite expectedly, the level of unemployment in Kyzylorda region has an inverse relationship with life expectancy: a drop in unemployment causes a rise in life expectancy and vice versa. At the same time, in Atyrau region these two indicators have a direct relationship, which does not make much sense. In such situations the intervening variable should be excluded from the regression analysis since this variable itself can be influenced by other factors that are not considered in this study due to the absence of data or for other reasons.
For each region we built a regression model based on the modified Cobb-Douglas production function [i]:

$$Y = b_0 \cdot X_1^{b_1} \cdot X_2^{b_2}$$

where $Y$ is the value of life expectancy in the given region; $b_0$, $b_1$, $b_2$ are the regression coefficients calculated with the help of the least squares method for the logarithms of factors; and $X_1$ and $X_2$ are independent variables.
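Taking logarithms is what makes least squares applicable here. A short sketch of that step follows, assuming the two-factor power form $Y = b_0 X_1^{b_1} X_2^{b_2}$ with strictly positive variables; the error term $\varepsilon$ and the hat notation are added for illustration and do not appear in the original.

```latex
\begin{align}
\ln Y &= \ln b_0 + b_1 \ln X_1 + b_2 \ln X_2 + \varepsilon, \\
\hat{b}_0 &= \exp\!\bigl(\widehat{\ln b_0}\bigr), \qquad
\hat{Y} = \hat{b}_0 \, X_1^{\hat{b}_1} X_2^{\hat{b}_2}.
\end{align}
```

The first line is linear in $\ln b_0$, $b_1$ and $b_2$, so ordinary linear regression on the logged series yields the estimates, and exponentiating the intercept restores the power form. In this specification each exponent acts as an elasticity: a 1% change in $X_1$ is associated with a change of roughly $b_1$% in $Y$, other things equal.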
For each region, independent variables were chosen by the forward selection method. At the first stage, we selected the variable that had the strongest correlation with life expectancy. After that, we calculated Fisher's F-statistic for the resulting model to estimate its significance: if the model is significant, one more variable is added and the F-statistic is recalculated. For each variable, Student's t-statistic was computed to assess its significance for the model (Kabanov, 2015). Afterwards, out of all the possible combinations we chose the one that generated the best model.
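The selection procedure can be illustrated with a small, dependency-free Python sketch. It is a simplified greedy variant, not the authors' Excel workflow: the helper names (`solve_linear`, `ols`, `forward_select`) are hypothetical, candidates are ranked by the F-statistic of the logged model rather than by raw correlation, the thresholds `f_crit=4.5` and `t_crit=2.13` are assumed values near the 5% level for 18 observations, and the data are synthetic.

```python
from math import exp, log, sqrt

def solve_linear(a, b):
    """Solve a small system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(columns, target):
    """Least squares with intercept: returns (coefficients, F, t-statistics)."""
    n, k = len(target), len(columns)
    p = k + 1
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_t = sum(target) / n
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_reg = sum((f - mean_t) ** 2 for f in fitted)
    s2 = ss_res / (n - p)                     # residual variance
    f_stat = (ss_reg / k) / s2
    t_stats = []
    for j in range(p):                        # diagonal of (X'X)^-1, one column at a time
        unit = [1.0 if i == j else 0.0 for i in range(p)]
        t_stats.append(beta[j] / sqrt(s2 * solve_linear(xtx, unit)[j]))
    return beta, f_stat, t_stats

def forward_select(y, candidates, max_vars=2, f_crit=4.5, t_crit=2.13):
    """Greedy forward selection on logged data; returns (names, b0, exponents)."""
    target = [log(v) for v in y]
    logged = {name: [log(v) for v in xs] for name, xs in candidates.items()}
    chosen, best_beta = [], None
    while len(chosen) < max_vars:
        trials = []
        for name in logged:
            if name in chosen:
                continue
            beta, f_stat, t_stats = ols([logged[c] for c in chosen + [name]], target)
            # Keep only models significant overall and in every slope.
            if f_stat > f_crit and all(abs(t) > t_crit for t in t_stats[1:]):
                trials.append((f_stat, name, beta))
        if not trials:
            break
        _, name, best_beta = max(trials, key=lambda item: item[0])
        chosen.append(name)
    if best_beta is None:
        return [], None, []
    return chosen, exp(best_beta[0]), best_beta[1:]

# Synthetic 18-point series (not study data): life expectancy driven by NMI and DpM.
years = range(18)
data = {
    "NMI": [10 * 1.1 ** t for t in years],
    "DpM": [300 + 40 * ((t * 7) % 5 - 2) for t in years],
    "CR": [50 + 10 * ((t * 3) % 7 - 3) for t in years],
}
noise = [0.002 * ((t * 5) % 3 - 1) for t in years]
life_exp = [75 * a ** 0.08 * d ** -0.05 * exp(e)
            for a, d, e in zip(data["NMI"], data["DpM"], noise)]

names, b0, exps = forward_select(life_exp, data)
print(names, round(b0, 2), [round(b, 3) for b in exps])
```

Capping `max_vars` at 2 mirrors the two-factor limit used in the paper. The normal-equation solver is adequate for three coefficients but is less numerically stable than a QR-based routine, so a statistics library would be preferable for larger models.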
For each region, the number of observations was 18, while 7-10 observations are needed to support each independent variable. Our regression models therefore could not include more than two independent variables (18 / 7 ≈ 2.6), and for each region we built a two-factor regression model (see Table 7). The only exception is the city of Nur-Sultan, for which we did not find a second factor that would improve the model while keeping its quality estimates acceptable. The factors that occur most frequently in regression models are as follows: the number of divorces per 1,000 marriages, 9 times; substance-induced mental disorders, 7 times; nominal per capita income, 5 times; subsistence minimum, 5 times; unemployment, twice; number of health workers, once; morbidity rates of blood diseases, once; and morbidity rates of circulatory system diseases, once.
Thus, we can conclude that the most significant factors in terms of scope are economic ones such as income and subsistence minimum. Among other factors that influence life expectancy are the demographic ones, such as the ratio of divorces to marriages, and medical ones, especially those related to mental health such as the rate of substance-induced mental disorders. To forecast life expectancy in relation to these factors we used regression power equations shown in Table 8.

Conclusions
To evaluate the factors affecting life expectancy in regions of Kazakhstan, we used a modified methodology. To investigate the relationship between life expectancy and socio-economic factors in Kazakhstan, we selected a set of indicators and calculated correlation coefficients for each region and each indicator. Indicators with coefficients below the threshold were excluded. Factors with the strongest influence on life expectancy were selected by applying two-factor power regression models based on the Cobb-Douglas production function. As a result, we have found the most significant factors affecting life expectancy in Kazakhstan and built models for short- and medium-term forecasting of life expectancy.
Our calculations have led us to the conclusion that economic factors have the strongest influence on life expectancy. These factors determine financial well-being of people in Kazakhstan and, therefore, correlate with the overall quality of life, which includes housing conditions, food, opportunities for recreation, access to medical and educational services.
Our regression models of life expectancy often include such indicators as the number of divorces per 1,000 marriages and the rate of substance-induced mental disorders. In some regions health-related indicators come to the fore. It can be supposed that after a certain level of socio-economic development is reached, other factors related to the quality of life and life expectancy start to gain prominence. Forecasting such developments can prove useful for strategists and policymakers aiming to extend life expectancy in the long term.