Column
|
Description
|
Changes
|
Missing or Invalid Values
|
Notes
|
Loan_ID
|
Record Identifier
|
None
|
N/A
|
Primary Key
|
Account_ID
|
Account Identifier
|
None
|
N/A
|
Foreign Key
|
Date
|
Date the loan was granted
|
Changed format from YYMMDD to MM/DD/YYYY
|
Ignore
|
Correlation with Status attribute:
R2 = 98%
r = 0.49
|
Amount
|
Amount of loan
|
Removed
Discretized values stored in Amt attribute
|
N/A
|
Correlation with Status attribute:
R2 = 68%
r = 0.34
|
Duration
|
Duration of the loan
Possible Values:
12 months
24 months
36 months
48 months
60 months
|
Removed
Discretized values stored in Dur attribute
|
N/A
|
Correlation with Status attribute:
R2 = 100%
r = 0.51
Distribution of values:
12 = 19%
24 = 20%
36 = 19%
48 = 20%
60 = 21%
|
Payments
|
Monthly loan payment
|
Removed
Discretized values stored in Pmt attribute
|
N/A
|
|
Status
|
Status of loan pay-off
- 'A' = Contract finished, no problems
- 'B' = Contract finished, loan not paid
- 'C' = Contract running, OK thus far
- 'D' = Contract running, client in debt
|
None
|
N/A
|
Correlation with Status attribute:
R2 = 98%
r = 0.49
Distribution of values:
'A' = 30%
'B' = 4.5%
'C' = 59%
'D' = 6.5%
|
Dur
|
Discretized Duration Attribute:
(*,30)
(31,42)
(43,54)
(55,*)
|
Added
Discretized in Rosetta using Entropy Algorithm
|
N/A
|
Distribution of values:
(*,30) = 39%
(31,42) = 19%
(43,54) = 20%
(55,*) = 21%
|
Pmt
|
Discretized Payment Attribute:
(*,8041) = 1
(8041,*) = 2
|
Added
Discretized in Rosetta using Entropy Algorithm
Rosetta discretized into 50+ values. We merged values using a <1% threshold.
|
N/A
|
Distribution of values:
(*,8041) = 94%
(8041,*) = 6%
|
Amt
|
Discretized Amount Attribute:
(*,30708) = 1
(30709,49380) = 2
(49381,76926) = 3
(76927,230310) = 4
(230311,*) = 5
|
Added
Discretized in Rosetta using Entropy Algorithm
Rosetta discretized into 100+ values. We merged values using a <1% threshold.
|
N/A
|
Distribution of values:
(*,30708) = 9%
(30709,49380) = 10%
(49381,76926) = 13%
(76927,230310) = 47%
(230311,*) = 21%
|
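
The date reformatting and the mapping of raw values onto the discretized attributes described in the table can be sketched in Python as follows. This is only an illustration, not the Rosetta procedure itself: the cut points are copied from the table, while the function names (`discretize`, `reformat_date`), the integer codes for Dur, the placement of a payment of exactly 8041 in the first interval, and the two-digit-year handling are assumptions.

```python
from bisect import bisect_left
from datetime import datetime

# Upper bounds of every interval except the last, open-ended one,
# copied from the Amt, Dur and Pmt rows of the table.
AMT_UPPER = [30708, 49380, 76926, 230310]   # codes 1..5
DUR_UPPER = [30, 42, 54]                    # codes 1..4 (codes are an assumption)
PMT_UPPER = [8041]                          # codes 1..2 (8041 itself -> 1, assumed)


def discretize(value, upper_bounds):
    """Return the 1-based index of the interval that contains `value`.

    A value equal to an upper bound stays in the lower interval,
    e.g. 30708 -> 1 and 30709 -> 2 for Amt.
    """
    return bisect_left(upper_bounds, value) + 1


def reformat_date(yymmdd):
    """Convert a YYMMDD string to MM/DD/YYYY.

    Assumption: two-digit years follow Python's %y rule
    (69-99 -> 1969-1999, 00-68 -> 2000-2068).
    """
    return datetime.strptime(yymmdd, "%y%m%d").strftime("%m/%d/%Y")


if __name__ == "__main__":
    # Hypothetical loan record, used only to exercise the helpers.
    record = {"Date": "930705", "Amount": 96396, "Duration": 12, "Payments": 8033}
    print(reformat_date(record["Date"]))
    print(discretize(record["Amount"], AMT_UPPER))
    print(discretize(record["Duration"], DUR_UPPER))
    print(discretize(record["Payments"], PMT_UPPER))
```

Keeping the cut points in plain lists makes it easy to substitute a different set of boundaries if the discretization is rerun.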