Survival Analysis in Patients with Dengue Hemorrhagic Fever (DHF) Using Cox Proportional Hazard Regression

Indonesia is a tropical country with two seasons: the rainy season and the dry season. During the rainy season, frequent flooding and standing water create mosquito breeding sites and promote the spread of various diseases, one of which is dengue fever. Dengue Hemorrhagic Fever (DHF) is a public health problem that spreads very rapidly and can lead to death within a short time, which makes it an important subject for further investigation. This study discusses survival analysis and the factors that affect the healing rate of DHF patients using Cox proportional hazard regression, based on medical records of hospitalized DHF patients at the Jember Klinik Hospital. The results showed that age, gender, hemoglobin, thrombocyte (platelet) count, and hematocrit affect the healing rate of DHF patients.


I. INTRODUCTION
Indonesia is a tropical country with two seasons, the rainy season and the dry season. In the rainy season, floods and puddles can become mosquito breeding sites and spread various diseases, one of which is dengue fever. Dengue Hemorrhagic Fever (DHF) is an acute febrile illness caused by the dengue virus, which enters the bloodstream through the bite of the Aedes aegypti mosquito (Wikipedia, 2012); it spreads very quickly and can result in death within a short time. DHF almost always causes public health problems, and the number of cases persists and even tends to increase. Since DHF first appeared in Indonesia, in Surabaya in 1968, it spread quickly to other areas, so that by 1980 all provinces in Indonesia had been affected (Darmowandowo, 2006). Indonesia was the country with the highest incidence of DHF in Southeast Asia from 1968 to 2009 (WHO, 2009). This makes DHF an interesting subject for further study, for example to determine how long it takes DHF patients to recover.
The statistical method suited to this problem is survival analysis, which is used specifically to analyze data on the length of time until a particular event occurs. Survival analysis is commonly used in the health field (Kleinbaum and Klein, 2012). The observational data for survival analysis are survival data, i.e. observations of the time from the beginning of observation until the occurrence of an event, which could be death, healing or other symptoms (Lee, 1992). According to Collett (1994), survival data do not meet the requirements of the standard statistical procedures used in data analysis, since survival data are usually not symmetrically distributed.
The histogram of survival times in a group of individuals tends to be skewed to the right, so survival data may follow distributions other than the normal. Several other distributions are commonly used in survival analysis, i.e. the exponential, Weibull and lognormal distributions. The distribution used in a survival analysis can be determined by estimating the distribution of the survival data with some statistical test methods, such as

A. Survival Analysis
Survival analysis is the analysis of time-related data, measured from a starting point until the occurrence of a specific event (Collett, 2003). The event may be a failure, death, relapse or recovery from an illness, a response to an experiment, or another event chosen according to the researcher's interest. Survival data have two special characteristics: the distribution is skewed to the right, because lifetimes are always positive, and the data are censored (Lee, 1992).

1. Survival Time
Survival time can be defined as the time from the beginning of observation to the occurrence of the event, measured in days, months or years. The event may be the development of a disease, the response to treatment, the recurrence of an illness, death, or another event chosen in accordance with the interests of the researcher. Survival time can therefore be the time to recovery from a disease, the time from the start of treatment to the occurrence of a response, or the time to death (Lee and Wang, 2003).

2. Censored Data
What distinguishes survival analysis from other statistical analyses is the presence of censored data. Data are censored when there is some information about an individual's survival time, but the exact survival time is not known (Kleinbaum & Klein, 2012). Censoring is one way of handling the incompleteness of observational data. Data are said to be censored if they cannot be observed completely, because the research subject is lost or withdraws, or because the subject has not experienced the event by the end of the research; data that can be observed completely until the end of the research are called uncensored data (Lee & Wang, 2003). The causes of censored data are:
a. Loss to follow-up: occurs when the subject moves, dies or refuses to participate.
b. Drop out: occurs when the treatment is stopped for some reason.
c. Termination of study: occurs when the study period ends while the observed subject has not yet reached the failure event.
The three types of censoring used in survival analysis (Collett, 1994) are as follows:
a. Right censoring: occurs when the failure event has not occurred by the end of the study.
b. Left censoring: occurs when the failure event occurs before the research begins.
c. Interval censoring: occurs when data collection is interrupted and the failure event occurs within that time interval.

3. Survival Function and Hazard Function
The survival function and the hazard function are the fundamental functions in survival analysis. In theory, the survival function can be drawn as a smooth curve with the following characteristics (Kleinbaum & Klein, 2012):
1. It is non-increasing: the curve decreases as t increases.
2. For t = 0, S(t) = S(0) = 1: at the beginning of the study no subject has yet experienced the event, so the probability of surviving past time 0 is 1.
3. For t = ∞, S(t) = S(∞) = 0: in theory, if the study period increased without limit, eventually no one would survive, so the survival curve approaches zero.
The survival function is essential in survival analysis, because the survival probabilities for various values of t are important summary information from survival data. The survival function represents the probability that an individual survives from the initial time to some later time. The survival function, S(t), is defined as the probability that an individual survives longer than time t (Le, 1997), thus:
$$S(t) = P(T > t)$$

Fig. 2: Survival Function Curve (Kleinbaum & Klein, 2005)
In contrast to the survival function, which focuses on the non-occurrence of events, the hazard function focuses on the occurrence of events. The hazard function can therefore be viewed as giving information opposite to that of the survival function. Like the survival curve, the hazard function curve has characteristic properties (Kleinbaum and Klein, 2012):
1. It is always non-negative, i.e. equal to or greater than zero.
2. It has no upper limit.
In addition, the hazard function is used for the following reasons:
1. It gives an overview of the failure rate.
2. It identifies the specific model shape.
3. Mathematical models for survival analysis are usually written in terms of the hazard function.
The hazard function h(t) is the instantaneous rate of failure per unit time at time t, given survival up to time t; in this sense it is the opposite of the survival function S(t):
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$$
The hazard formula can be interpreted as the probability, per unit time, that the event occurs in the interval between t and t + Δt, given that the survival time T is greater than or equal to t.
In other words, the hazard function h(t) estimates the proportion of individuals who die or experience the event at time t (Kleinbaum and Klein, 2012). When the hazard function is constant, the result is the constant-risk (exponential) model. The functional relationship between the cumulative hazard function, H(t), and the survival function, S(t), is (Le, 1997):
$$H(t) = -\ln S(t), \qquad S(t) = \exp\{-H(t)\}$$
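As a small illustration of the standard relationship S(t) = exp{-H(t)}, the sketch below evaluates the constant-hazard (exponential) case, where H(t) is simply the rate multiplied by t. The rate of 0.5 per day and the function names are hypothetical choices made for this example, not values from the study.

```python
import math


def survival_from_constant_hazard(rate, t):
    """S(t) = exp(-H(t)), with H(t) = rate * t for a constant hazard."""
    cumulative_hazard = rate * t
    return math.exp(-cumulative_hazard)


def cumulative_hazard_from_survival(s):
    """Invert the relationship: H(t) = -ln S(t)."""
    return -math.log(s)


rate = 0.5  # hypothetical constant hazard (events per day)
for t in (1, 2, 4):
    s = survival_from_constant_hazard(rate, t)
    h_cum = cumulative_hazard_from_survival(s)
    print(f"t={t}  S(t)={s:.4f}  H(t)={h_cum:.4f}")
```

Under a constant hazard the survival curve decays exponentially, and taking the negative logarithm of S(t) recovers the cumulative hazard.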

B. Kaplan Meier
The purpose of survival analysis is to estimate and interpret the survival function. In this research, the Kaplan-Meier method is used. Kaplan-Meier is a frequently used survival analysis technique, also referred to as the product-limit method. It does not group time into fixed intervals; instead, the effect is calculated exactly when each event happens. The subjects' observation times are ordered from shortest to longest, and censored observations are included in the calculation, which is considered appropriate for data measured on a numerical scale. Because this research is nonparametric and involves censored data, the Kaplan-Meier method is well suited to it.
The life-table method is essentially the same as Kaplan-Meier, but in the life table subjects are grouped into intervals, and the probability of the event is assumed to be constant within each interval, so the resulting estimates are more general. The Kaplan-Meier method, by contrast, analyzes each subject at its original observed time. This yields exact survival proportions, because it uses the precise survival times, and therefore more accurate estimates. In addition, Kaplan-Meier is a method used when no model is feasible for the survival data (Sari, 2011).
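The product-limit calculation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used in the study: the function name and the seven durations below are hypothetical, and subjects censored at a given time are treated as still at risk at that time, which is the usual convention.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event was observed.

    times  : observed durations (e.g. days in hospital)
    events : 1 if the event was observed, 0 if the observation is censored
    """
    event_counts = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # events plus censored subjects at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = event_counts.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Everyone observed at t (event or censored) leaves the risk set.
        at_risk -= leaving[t]
    return curve


# Hypothetical durations in days; 0 marks a censored observation.
times = [2, 3, 3, 4, 5, 5, 6]
events = [1, 1, 0, 1, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(f"t={t}  S(t)={s:.4f}")
```

The curve only drops at times where an event was observed; censored subjects reduce the number at risk without producing a step.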

Log Rank Test
The log-rank test is a significance test for comparing the survival functions of two groups. It is a nonparametric test and is suitable when the data are not symmetrical, i.e. skewed to the right. The log-rank test is also widely used in clinical trials to assess the efficacy of a new treatment compared with an old treatment when the outcome measured is the time until an event occurs. The log-rank statistic is calculated in several stages:
a. Count the number of subjects at risk in each group at each failure time.
b. Count the number of subjects who experienced the event in each group at each failure time.
c. Calculate the expected number of events for each group at each failure time.
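The stages above can be sketched as follows for two groups. This is a minimal illustration assuming the standard two-group log-rank statistic with a hypergeometric variance and a chi-square reference distribution with 1 degree of freedom; the function name, variable names and example data are hypothetical and are not taken from the study.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 df."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    failure_times = sorted({t for t, e, _ in data if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in failure_times:
        # Stage a: subjects still at risk in each group at this failure time.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Stage b: observed events in each group at this failure time.
        m_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        m_b = sum(1 for u, e, g in data if u == t and e == 1 and g == 1)
        n, m = n_a + n_b, m_a + m_b
        # Stage c: expected events in group A if the two curves were equal.
        observed_a += m_a
        expected_a += n_a * m / n
        if n > 1:
            variance += n_a * n_b * m * (n - m) / (n * n * (n - 1))

    if variance == 0.0:
        raise ValueError("variance is zero; the test is undefined for these data")
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical recovery times in days; every event is observed here.
chi_square, p_value = logrank_test([1, 2, 3, 4, 5], [1] * 5,
                                   [6, 7, 8, 9, 10], [1] * 5)
print(f"chi-square={chi_square:.3f}  p-value={p_value:.4f}")
```

A large p-value indicates that the observed and expected event counts are close, so there is no evidence that the two survival curves differ.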

C. Cox Regression (Cox Proportional Hazard Regression)
The survival function and hazard function are used to see the difference between two or more groups. However, when there are covariates to be controlled, or when several explanatory variables are used to explain survival time, Cox regression is used. Cox regression builds a model that describes the relationship between survival time, as the dependent variable, and a set of independent variables, which may be continuous or categorical. The Cox proportional hazard model is a semiparametric model used in survival analysis when the observed outcome is the length of time until an event. Initially, this modeling was used in statistics, especially biostatistics, to analyze death or life expectancy. Over time it has come to be widely used in various fields, including academic, medical, social, science, engineering, agriculture and so on (Sari, 2011).
When investigating a medical case, for example patients with a particular disease, the relationship between the patient's survival time and the clinical characteristics in the patient's medical data is of interest. Denoting the baseline hazard function by h0(t), the hazard h(t) of a given patient is:
$$h(t) = \psi(X)\,h_0(t) \qquad (2.9)$$
where ψ(X) denotes a function of the patient's covariates. The Cox model formula is the product of two quantities, the baseline hazard function and the exponential of the linear sum of βiXi, i.e. the sum over the independent variables X (Kleinbaum and Klein, 2012). The baseline hazard function is the hazard rate when all X are zero; it is an unspecified function because the distribution of the survival time T is unknown. This function depends only on time and does not contain X. The exponential quantity depends only on X, which is called a time-independent covariate because X does not depend on time.
However, when X is time dependent, a different method is needed to model the hazard.
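To make the multiplicative structure concrete, the sketch below evaluates a Cox-type hazard, the baseline hazard times the exponential of the linear sum of βiXi, for two patients and shows that their hazard ratio does not depend on the baseline hazard. The coefficients, covariate values and baseline hazards are hypothetical numbers chosen for illustration; they are not estimates from this study.

```python
import math


def cox_hazard(baseline_hazard, betas, covariates):
    """h(t | X) = h0(t) * exp(sum of beta_i * X_i)."""
    linear_predictor = sum(b * x for b, x in zip(betas, covariates))
    return baseline_hazard * math.exp(linear_predictor)


def hazard_ratio(betas, covariates_a, covariates_b):
    """h(t | X_a) / h(t | X_b); the baseline hazard cancels out."""
    diff = sum(b * (xa - xb)
               for b, xa, xb in zip(betas, covariates_a, covariates_b))
    return math.exp(diff)


# Hypothetical coefficients for (age in years, sex with 1 = male).
betas = [-0.02, 0.10]
patient_a = [30, 1]
patient_b = [50, 1]

# Hypothetical baseline hazards at three different time points.
for h0 in (0.1, 0.4, 0.9):
    ratio = cox_hazard(h0, betas, patient_a) / cox_hazard(h0, betas, patient_b)
    print(f"h0(t)={h0}  ratio={ratio:.4f}")

print(f"hazard ratio = {hazard_ratio(betas, patient_a, patient_b):.4f}")
```

The ratio is the same at every time point, which is exactly what the proportional hazard assumption states for time-independent covariates.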
In the general regression model, the hazard function h depends on t and on possibly time-dependent covariates X1(t), X2(t), ..., Xm(t). In the simple Cox proportional hazard model, where the covariates X1, X2, ..., Xm are independent of t, the hazard function is:
$$h(t, X_1, \dots, X_m, \beta_1, \beta_2, \dots, \beta_m) = h_0(t)\exp\{\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_m X_m\} \qquad (2.10)$$
$$h(t) = h_0(t)\exp\{\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_m X_m\} = h_0(t)\exp\{\beta' X\} \qquad (2.11)$$
The key feature of these formulas is the proportional hazard assumption: the baseline hazard is a function of t but does not involve the variables X, whereas the exponential part involves X but does not involve t. X is said to be time independent. The assumption of the Cox proportional hazard model is that the hazard ratio comparing two categories of an independent variable is constant over time. If this assumption is not met, or if X depends on time, the model used is the extended Cox model. Another important characteristic of the Cox model is that the baseline hazard, h0(t), is an unspecified function. This is what makes the Cox proportional hazard model a semiparametric model. The Cox proportional hazard model is a well-known model for survival analysis, and according to Kleinbaum and Klein (2012) there are several reasons why it is widely used.
The proportional hazard assumption can be checked in three ways:
1. Using a graphical approach. If the curves for each stratum of the tested variable are parallel, the assumption is satisfied. If they are not parallel, the proportional assumption is not met.

2. Using a time-dependent variable in the extended Cox model. An interaction between the independent variable and survival time is created, and its significance is examined. The proportional assumption is met when the interaction is not significant.
3. Using a goodness-of-fit test. The p-value of the chi-square test is examined. If it is not significant, the proportional assumption is met.
Each of these three ways has advantages and disadvantages, so a researcher should use at least two of them to test the proportional assumption. Candidate variables for the interaction test are the independent variables that influence survival time (p < 0.25). Interactions between independent variables are then tested with the likelihood ratio test, and a significant interaction is included in the model. The Cox proportional hazard regression model is:
$$h(t) = h_0(t)\exp\{\beta_1 X_1 + \beta_2 X_2 + \dots + \beta_m X_m\} \qquad (2.12)$$
If the assumption is not met, the recommended model is Cox regression with time-dependent covariates (the extended Cox model); the stratified Cox model can also be used.

D. Dengue Hemorrhagic Fever (DHF)
Dengue Hemorrhagic Fever (DHF) is a disease caused by the dengue virus, which is transmitted through the bite of Aedes aegypti and Aedes albopictus mosquitoes and disrupts the capillary blood vessels and the blood clotting system, resulting in bleeding. The disease is found in many tropical regions

Hematocrit variable: the percentage of hematocrit of DHF patients during hospitalization at Jember Klinik Hospital.

C. Research Steps
The steps taken in this research are as follows:
1. Conduct a descriptive statistical analysis to determine the characteristics of DHF patients.
2. Construct Kaplan-Meier curves and perform a log-rank test on each categorical independent (predictor) variable.
3. Analyze the factors that affect the healing rate of DHF patients.

IV. RESULTS AND DISCUSSION
Descriptive Statistics Analysis
The following are the results of the descriptive statistical analysis of the characteristics of the 100 DHF patients hospitalized at Jember Klinik Hospital.
a. Survival Time, Age, Hemoglobin, Platelets and Hematocrit
Descriptive statistics of DHF patients at RSP Jember Klinik for survival time, age, hemoglobin, platelets and hematocrit are presented in Table 4.1. Hemoglobin, platelets (thrombocytes) and hematocrit are each reported twice, namely at the time of admission and at the last medical record of each DHF patient.

International Journal of Advanced Engineering Research and Science (IJAERS)
Figure 4.1 shows that the longer a DHF patient is hospitalized (t), the closer the probability of not having recovered (no clinical improvement) by time t gets to zero; that is, the longer the patient is hospitalized and receives medical treatment, the greater the probability of recovery (clinical improvement). The probabilities of recovery of DHF patients can be seen in Table 4.2.
Figure 4.2 shows that the dotted (black) line lies mostly above the solid (red) line, indicating that the probability of not recovering is greater for female patients than for male patients, meaning that male patients have better survival times than female patients. The next step is a log-rank test to determine whether the survival probability curves in Figure 4.2 differ. The log-rank test gives a p-value of 0.604, with a chi-square statistic of 0.3 on 1 degree of freedom, which shows that the survival probabilities of male and female DHF patients do not differ significantly.
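As a rough arithmetic check of the reported log-rank result, the upper-tail probability of a chi-square statistic with 1 degree of freedom can be computed with the complementary error function. The function name below is a hypothetical helper. A statistic of exactly 0.3 gives a p-value of about 0.58, close to the reported 0.604; the small gap is presumably because the published statistic is rounded to one decimal place.

```python
import math


def chi_square_p_value_1df(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2.0))


p_value = chi_square_p_value_1df(0.3)
print(f"p-value for chi-square = 0.3 with 1 df: {p_value:.3f}")
```

Either value is far above the usual 0.05 threshold, so the conclusion of no significant difference between the two curves is unaffected.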

Factors Affecting the Healing Rate of DHF Patients
To determine which factors influence the healing rate of DHF patients, the response variable (survival time) is modeled on the predictor variables. The model with all predictor variables is shown in Table 4.3.

V. CONCLUSION
Descriptive statistical analysis showed that, of the 100 DHF patients admitted to RSP Jember Klinik, clinical condition improved within 3 days on average; 53 patients belonged to the productive age group (15-59 years) and 58 were male. At admission, 83 patients had a platelet count below normal (less than 150,000/mm3), while at the last medical record the number of patients with below-normal platelets had decreased to 64. The mean hemoglobin and hematocrit levels of DHF patients at admission were 13.55 g/dL and 41.02% respectively, while at the last medical record they were 12.94 g/dL and 39.28%. The factors that significantly influence the rate of improvement of the clinical condition of DHF patients are age and platelets.