Prepare data for further usage
prepare_data.Rd
The function filters large claims and adds columns that are needed for generating pools and for sampling.
It summarises seven subfunctions, see details for detailed description of parameters and subfunctions.
Usage
prepare_data(
claims_data,
indices,
threshold,
first_orig_year,
last_orig_year,
expected_year_of_growing_large = 3,
reserve_classes,
pool_of_annuities = NULL
)
Arguments
- claims_data
dataframe, see details.
- indices
dataframe, see details.
- threshold
numeric value, see details.
- first_orig_year
integer or numeric value, see details.
- last_orig_year
integer or numeric value, see details.
- expected_year_of_growing_large
integer or numeric value, see details.
- reserve_classes
numeric vector, see details.
- pool_of_annuities
dataframe, see details.
Details
Data preparation before applying sicr
is done in seven steps. The subfunctions for each step can be used separately.
This allows users to skip steps in order to use own implementations instead.
Description of needed input data
first_orig_year
Integer value for the first origin year for which a full claim history is available.
Information from older claims for which history is only available fromfirst_orig_year
to today may though be considered for the pools, seeclaims_data
.last_orig_year
Integer value for the last origin year which will also be treated as the last calendar year, usually the calendar year just ended.claims_data
For each claim that could exceed the threshold after indexation this dataframe must contain one row for every calendar year in which at least one ofCl_reserve
,An_reserve
,Cl_payment_cal
orAn_payment_cal
is not equal to 0.
To consider older claims for which history is only available fromfirst_orig_year
until now, add one row for this claim for calendar yearfirst_orig_year - 1
and fillCl_payment_cal
andAn_payment_cal
with the cumulated payments up to that year andCl_reserve
andAn_reserve
with the reserve at the end offirst_orig_year - 1
. This allows for checking if a claim belongs to large or small claims, but due to the missing first part of the history the Dev_year_of_growing_large (see step 5) can't be determined for these claims. Dev_year_of_growing_large has to be estimated instead (see parameter expected_year_of_growing_large).
claims_data must consist of the following columns:Claim_id
type: character. Each claim must be assigned a unique claim id. These must correspond to the claim id's in the dataframepool_of_annuities
.Origin_year
type: integer or numeric. Origin year of the claim.Calendar_year
type: integer or numeric. Calendar year of payments and reserves of the claim.Cl_payment_cal
type: numeric. Claim payments made in this calendar year without annuity payments.Cl_reserve
type: numeric. Claim reserve at the end of this calendar year without annuity reserves.An_payment_cal
type: numeric. Annuity payments made in this calendar year.An_reserve
type: numeric. Annuity reserve at the end of this calendar year.
pool_of_annuities
This dataframe contains one row for each annuity that has ever been agreed on regardless of whether the annuity is still active or not.pool_of_annuities
must consist of the following columns:Claim_id
type: character. Each claim must be assigned a unique claim id. These must correspond to the claim id's in the dataframeclaims_data
.Annuity_id
type: character. Annuity ID for the case of multiple annuities in one claim.Origin_year
type: integer or numeric. Origin year of the claim.Calendar_year
type: integer or numeric. Calendar year of information.Entering_year
type: integer or numeric. Year in which insurer and recipient have agreed on the annuity.Annuity_start
type: integer or numeric. Year in which payment starts, may be a past or a future year.Annuity_end
type: integer or numeric. Year in which payment ends, may be a past or a future year.Birth_year
type: integer or numeric. Birth year of recipient for assigning survival probability from mortality tables.Gender
type: character "m" or "f". Gender of recipient for assigning mortality table.Annual_payment
type: numeric. Agreed annual payment.Dynamic
type: numeric. If a dynamic increase of the annual payment is part of the agreement, it can be specified here, e.g. 0.02 for 2% annual payment increasement.
indices
This dataframe contains one row for each historic calendar year and 250 future calendar years.
Although the indices for the future calendar years (as well as the columnIndex_re
) are not needed here but in the subsequent simulation, it is required to create all rows and all columns of this dataframe just once with the helper functions.expand_historic_indices()
andadd_transition_factor()
.indices
must consist of the following columns:Calendar_year
type: integer or numeric.Index_gross
type: numeric. Claim payment development from one year to another, e.g. 0.02 for 2% increaseIndex_re
type: numeric. Contractually fixed claim payment development that is to be used in special index clauses that are a common part of longtail xl resinsurance programs.Transition_factor
type: numeric. The transition_factor for year j indexes a payment of year j to the niveau of payments in the fixed index_year.
threshold
Numeric value to separate small from large claims.Threshold
has to be stated at the niveau of index_year.reserve_classes
numeric vector to specify reserve classes at the niveau of index_year, for examplec(1, 1001)
for the three reserve classes(-Inf, 1)
,[1, 1001)
and[1001, Inf)
.expected_year_of_growing_large
Integer value for the estimated development year of becoming large for claims without full history (see description above). Default: 3.
Step 2 - Add missing rows
claims_data
is requested to only contain rows in which one of Cl_reserve
, An_reserve
, Cl_payment_cal
or An_payment_cal
is not equal to 0.
This step adds rows so that claims_data contains one row per calendar year between the origin year and the last origin year for each claim.
Step 3 - Add columns
This step adds derived columns to claims_data
that are needed for further operations:
Development_year
Sum of claim and annuity payments and reserve
Payment_cal
as the sum ofCl_payment_cal
andAn_payment_cal
Reserve
as the sum ofCl_reserve
andAn_reserve
cumulated columns
Cl_payment_cum
as the cumulated claim paymentsAn_payment_cum
as the cumulated annuity paymentsPayment_cum
as the sum of the two latter
incurred columns
Cl_incurred
as the sum ofCl_payment_cum
andCl_reserve
An_incurred
as the sum ofAn_payment_cum
andAn_reserve
Incurred
as the sum of the two latter
Entry reserve columns
Entry_cl_reserve
as theCl_reserve
of the beginning of the yearEntry_an_reserve
as theAn_reserve
of the beginning of the yearEntry_reserve
as the sum of the two latter
Step 4 - Add indexed columns
This step uses the dataframe indices to index every payment and every reserve to the indexation year.
The following columns are calculated:
indexed payments
Ind_cl_payment_cal
Ind_an_payment_cal
Ind_payment_cal
as the sum of the two latter
indexed reserves
Ind_cl_reserve
Ind_an_reserve
Ind_reserve
as the sum of the two latter
indexed entry reserves
Ind_entry_cl_reserve
Ind_entry_an_reserve
Ind_entry_reserve
as the sum of the two latter
cumulated columns
Ind_cl_payment_cum
Ind_an_payment_cum
Ind_payment_cum
as the sum of the two latterInd_cl_incurred
Ind_an_incurred
Ind_incurred
as the sum of the two latter
Step 5 - Filtering large claims
The indexed columns allow for the exact calculation of which claim has become large and when.
Claims that have not become large yet are eliminated.
The following new columns are derived from this information:
Large_since
as the calendar year in which the claim exceeded the threshold for the first timeDev_year_of_growing_large
as the development year in which the claim exceeded the threshold for the first timeDev_year_since_large
as the "new" development year where counting starts in yearLarge_since
instead of the origin year
Step 6 - Add reserve classes
The columns Entry_reserve_class
and Exit_reserve_class
can now be derived from Ind_entry_cl_reserve
and Ind_cl_reserve
.
Step 7 - Attach future annuities
The columns New_annuity_1
to New_annuity_5
are added to the dataframe. If new annuities have been agreed on
in the calendar year and the claim of the row, these columns contain the row number of these annuities in the corresponding
dataframe pool_of_annuities
. These columns are needed for the pool as the sampling shall consider claim payments and
new reserve classes as well as new annuities.
The function always adds five columns for new annuities which is sufficient for the rare case that five new annuities are agreed on for one single claim in one calendar year. Each further new annuity in one year will be ignored, which could lead to a slight underestimation of the best estimate. To avoid this, data may be manually adjusted, for example by moving further annuities to another calendar year or by aggregating annuities.
Examples
# this example uses data provided with this package
new_data <- prepare_data(claims_data = minimal_claims_data_xmpl,
indices = indices_xmpl,
threshold = 400000,
first_orig_year = 1989,
last_orig_year = 2023,
expected_year_of_growing_large = 3,
reserve_classes = c(1, 200001, 400001, 700001, 1400001),
pool_of_annuities = minimal_pool_of_annuities_xmpl)
head(new_data)
#> Claim_id Origin_year Calendar_year Cl_payment_cal Cl_reserve An_payment_cal
#> 1 Claim#1 1976 1976 0 0 0
#> 2 Claim#1 1976 1977 0 0 0
#> 3 Claim#1 1976 1978 0 0 0
#> 4 Claim#1 1976 1979 0 0 0
#> 5 Claim#1 1976 1980 0 0 0
#> 6 Claim#1 1976 1981 0 0 0
#> An_reserve Development_year Payment_cal Reserve Cl_payment_cum An_payment_cum
#> 1 0 1 0 0 0 0
#> 2 0 2 0 0 0 0
#> 3 0 3 0 0 0 0
#> 4 0 4 0 0 0 0
#> 5 0 5 0 0 0 0
#> 6 0 6 0 0 0 0
#> Payment_cum Cl_incurred An_incurred Incurred Entry_cl_reserve
#> 1 0 0 0 0 0
#> 2 0 0 0 0 0
#> 3 0 0 0 0 0
#> 4 0 0 0 0 0
#> 5 0 0 0 0 0
#> 6 0 0 0 0 0
#> Entry_an_reserve Entry_reserve Ind_cl_payment_cal Ind_an_payment_cal
#> 1 0 0 0 0
#> 2 0 0 0 0
#> 3 0 0 0 0
#> 4 0 0 0 0
#> 5 0 0 0 0
#> 6 0 0 0 0
#> Ind_payment_cal Ind_cl_reserve Ind_an_reserve Ind_entry_cl_reserve
#> 1 0 0 0 0
#> 2 0 0 0 0
#> 3 0 0 0 0
#> 4 0 0 0 0
#> 5 0 0 0 0
#> 6 0 0 0 0
#> Ind_entry_an_reserve Ind_entry_reserve Ind_reserve Ind_cl_payment_cum
#> 1 0 0 0 0
#> 2 0 0 0 0
#> 3 0 0 0 0
#> 4 0 0 0 0
#> 5 0 0 0 0
#> 6 0 0 0 0
#> Ind_an_payment_cum Ind_payment_cum Ind_cl_incurred Ind_an_incurred
#> 1 0 0 0 0
#> 2 0 0 0 0
#> 3 0 0 0 0
#> 4 0 0 0 0
#> 5 0 0 0 0
#> 6 0 0 0 0
#> Ind_incurred Large_since Dev_year_of_growing_large Dev_year_since_large
#> 1 0 1978 3 -1
#> 2 0 1978 3 0
#> 3 0 1978 3 1
#> 4 0 1978 3 2
#> 5 0 1978 3 3
#> 6 0 1978 3 4
#> Entry_reserve_class Exit_reserve_class New_annuity_1 New_annuity_2
#> 1 0 0 0 0
#> 2 0 0 0 0
#> 3 0 0 0 0
#> 4 0 0 0 0
#> 5 0 0 0 0
#> 6 0 0 0 0
#> New_annuity_3 New_annuity_4 New_annuity_5
#> 1 0 0 0
#> 2 0 0 0
#> 3 0 0 0
#> 4 0 0 0
#> 5 0 0 0
#> 6 0 0 0