The following activities were completed to prepare the dataset for
use in the data mining exercise:
1. Converted the ASCII files to:
   a. MS Excel and/or MS Word files for cleaning the data.
   b. An MS Access database for use in data mining, de-normalizing or ‘flattening’ files, and basic querying to learn more about the data.
   c. File types recognized by Weka, for all modified files. These files are comma-delimited, with a ‘heading’ of attribute definition information.
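The description of the Weka files (comma-delimited data under a heading of attribute definitions) matches Weka's ARFF format. The following is a minimal sketch of writing such a file, not the conversion actually used in this exercise. The function name `to_arff` and the attribute names and values are hypothetical and for illustration only.

```python
import io

def to_arff(relation, attributes, rows):
    """Render rows as ARFF text: attribute definitions first, then comma-delimited data.

    attributes is a list of (name, type) pairs, where type is "numeric",
    "string", or a list of allowed nominal values.
    """
    out = io.StringIO()
    out.write(f"@relation {relation}\n\n")
    for name, kind in attributes:
        if isinstance(kind, (list, tuple)):
            # Nominal attributes list their allowed values in braces
            kind = "{" + ",".join(kind) + "}"
        out.write(f"@attribute {name} {kind}\n")
    out.write("\n@data\n")
    for row in rows:
        # "?" is the ARFF marker for a missing value
        out.write(",".join("?" if v is None else str(v) for v in row) + "\n")
    return out.getvalue()

# Hypothetical attributes and rows, not taken from the actual dataset
attributes = [
    ("account_id", "numeric"),
    ("card_type", ["classic", "gold", "junior", "none"]),
    ("loan_status", ["A", "B", "C", "D", "N"]),
]
rows = [(1, "classic", "N"), (2, "none", "A"), (3, "gold", None)]
arff_text = to_arff("customers", attributes, rows)
print(arff_text)
```

This sketch does not quote values; names or values containing spaces or commas would need quoting before Weka could read them.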
2. Verified all table relationships:
   a. Every account has an Owner via Disp and Account tables
   b. Order and Loan records are duplicated in transaction records. That is, the transactions include Order records and Loan payments.
      i. Loan records in Trans are identified by k_symbol="LP"
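Relationship checks like those in step 2 can be expressed as queries. The sketch below uses an in-memory SQLite database in place of MS Access. The column names (`account_id`, `disp_id`, `type`, `trans_id`), the 'OWNER' value, and the sample rows are assumptions made for illustration, not the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (account_id INTEGER PRIMARY KEY);
CREATE TABLE disp (disp_id INTEGER PRIMARY KEY, client_id INTEGER,
                   account_id INTEGER, type TEXT);
CREATE TABLE trans (trans_id INTEGER PRIMARY KEY, account_id INTEGER,
                    amount REAL, k_symbol TEXT);
-- Hypothetical sample rows
INSERT INTO account VALUES (1), (2);
INSERT INTO disp VALUES (10, 100, 1, 'OWNER'), (11, 101, 1, 'USER'),
                        (12, 102, 2, 'OWNER');
INSERT INTO trans VALUES (1000, 1, 250.0, 'LP'), (1001, 1, 40.0, NULL),
                         (1002, 2, 90.0, 'LP');
""")

# Accounts with no OWNER row in Disp; an empty result means the check passes
accounts_without_owner = conn.execute("""
    SELECT a.account_id
    FROM account AS a
    LEFT JOIN disp AS d
           ON d.account_id = a.account_id AND d.type = 'OWNER'
    WHERE d.disp_id IS NULL
""").fetchall()

# Loan payments as they appear inside the transaction table
loan_payments = conn.execute(
    "SELECT trans_id, account_id FROM trans "
    "WHERE k_symbol = 'LP' ORDER BY trans_id"
).fetchall()

print(accounts_without_owner, loan_payments)
```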
3. Changed attributes as necessary. The tables linked below describe the changes made:
   a. Added, changed, removed, and discretized attributes.
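As an illustration of what discretizing an attribute involves, the sketch below maps a numeric value to a labelled interval. The cut points, labels, and sample values are hypothetical; the changes actually made are the ones described in the linked tables.

```python
from bisect import bisect_right

def discretize(value, cut_points, labels):
    """Return the label of the interval that value falls in.

    cut_points must be sorted and len(labels) must equal len(cut_points) + 1.
    A value equal to a cut point falls in the upper interval.
    """
    return labels[bisect_right(cut_points, value)]

# Hypothetical cut points and labels for a numeric attribute
cut_points = [10000, 50000]
labels = ["low", "medium", "high"]

balances = [2500, 10000, 49999.99, 75000]
binned = [discretize(b, cut_points, labels) for b in balances]
print(binned)
```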
4. De-normalized, or ‘flattened’, files for mining. Our database is relational. In order to mine or cluster attributes, those attributes must be in a single table. We created a de-normalized table based on our goals.
   a. Goal: Analyze credit card information to determine the type of customer who makes a good candidate for a credit card.
      i. Combined tables: Account-Client-Disp-Card-District-Loan-Transaction
         o Clustered customer information, using what we discovered about accounts from previous clustering
         o Used card type as the clustering attribute
         o Added "N" (None) as a possible value of the Loan Status attribute
         o In order to better understand customers, we looked at this table in two ways:
            o To identify, from all customers, which were credit card holders and which were not.
            o To examine the differences that exist among credit card holding customers.
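The flattening in step 4 can be sketched as a chain of joins. This example again uses SQLite in place of MS Access and, for brevity, joins only Account, Disp, Card, and Loan. All column names, the 'OWNER' value, the `has_card` flag, and the sample rows are assumptions for illustration. It also assumes at most one card per owner and one loan per account, since more would duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (account_id INTEGER PRIMARY KEY, district_id INTEGER);
CREATE TABLE disp (disp_id INTEGER PRIMARY KEY, client_id INTEGER,
                   account_id INTEGER, type TEXT);
CREATE TABLE card (card_id INTEGER PRIMARY KEY, disp_id INTEGER, type TEXT);
CREATE TABLE loan (loan_id INTEGER PRIMARY KEY, account_id INTEGER, status TEXT);
-- Hypothetical sample rows
INSERT INTO account VALUES (1, 5), (2, 7), (3, 5);
INSERT INTO disp VALUES (10, 100, 1, 'OWNER'), (20, 200, 2, 'OWNER'),
                        (30, 300, 3, 'OWNER');
INSERT INTO card VALUES (500, 10, 'gold');
INSERT INTO loan VALUES (900, 2, 'A');
""")

# LEFT JOINs keep customers who have no card or no loan.
# COALESCE supplies "N" (None) as the Loan Status of accounts without a loan.
flat = conn.execute("""
    SELECT a.account_id,
           d.client_id,
           c.type AS card_type,
           CASE WHEN c.card_id IS NULL THEN 'no' ELSE 'yes' END AS has_card,
           COALESCE(l.status, 'N') AS loan_status
    FROM account AS a
    JOIN disp AS d ON d.account_id = a.account_id AND d.type = 'OWNER'
    LEFT JOIN card AS c ON c.disp_id = d.disp_id
    LEFT JOIN loan AS l ON l.account_id = a.account_id
    ORDER BY a.account_id
""").fetchall()

# View 1: all customers, labelled as card holder or not (the has_card column)
all_customers = flat
# View 2: card holders only, for comparing them with one another
card_holders = [row for row in flat if row[3] == "yes"]

print(all_customers)
print(card_holders)
```

The two lists at the end correspond to the two ways of looking at the table: the first supports separating card holders from non-holders, the second restricts the data to card holders so that card type can serve as the clustering attribute.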