Privacy-Preserving Matching supports a range of PII fields and matching combinations by default. This article outlines the default schema and matching combinations, how to work with multi-valued fields, and general guidelines for preparing your data for upload to the Contributor Node.
General Notes
These notes apply to all fields.
Field type or scenario | Considerations for formatting |
Missing / No data | When a field has no data, leave it empty. Do not use a space, hyphen, NULL character, or any other placeholder value. |
Numeric fields | Make sure numeric fields do not default to "0" when the value should be empty (e.g. because the data is missing). |
Large integers | Do not write large integers in scientific notation, as this often loses precision. This is particularly problematic for phone numbers, so take care to export them as strings. |
Custom fields | Custom fields may be added; contact Data Republic (DR). |
Tokens (as strings) | Tokens are random integers written in hexadecimal notation. Process them as "opaque strings" to avoid conversion to scientific notation, stripping of leading zeros, or loss of precision caused by conversion to floating-point types. |
When updating a CSV | Privacy-Preserving Matching treats each whole record as a single "upsert" (insert or update) operation, rather than updating individual fields. |
Data upload format | |
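The formatting rules above can be checked before export. The following is a minimal, hypothetical sketch using only the Python standard library: the helper names (`clean_cell`, `write_rows`), the placeholder set, and the choice of `phone` and `dpid` as the numeric columns to check are assumptions for illustration, not part of the Contributor Node. It requires every value to already be a string, so leading zeros and long digit strings (including hexadecimal tokens) pass through untouched. It also turns placeholders into empty cells and rejects numeric values that have already been mangled into scientific notation. It cannot tell a genuine "0" from a defaulted one, so that check still depends on your source system.

```python
import csv
import io
import re

# Assumed placeholder values that should be written as an empty cell instead.
PLACEHOLDERS = {"-", "NULL", "null", "\x00"}

# Columns typed Numeric in the default schema; only these are checked for scientific notation
# (a hex token such as "1e10" is valid and must not be rejected).
NUMERIC_FIELDS = {"phone", "dpid"}

# Matches values such as "6.1412345678E+10".
SCIENTIFIC = re.compile(r"^[+-]?\d+(\.\d+)?[eE][+-]?\d+$")


def clean_cell(name, value):
    """Return "" for missing data; otherwise return the string value unchanged."""
    if value is None:
        return ""
    if not isinstance(value, str):
        # Exporting as strings avoids float conversion and lost precision.
        raise TypeError(f"{name}: expected a string, got {type(value).__name__}")
    stripped = value.strip()
    if stripped == "" or stripped in PLACEHOLDERS:
        return ""
    if name in NUMERIC_FIELDS and SCIENTIFIC.match(stripped):
        raise ValueError(f"{name}: value looks like scientific notation: {value!r}")
    return value


def write_rows(rows, fieldnames):
    """Write rows as CSV text, treating every value as an opaque string."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({name: clean_cell(name, row.get(name)) for name in fieldnames})
    return out.getvalue()
```

For example, `write_rows([{"personid": "p-001", "phone": "0412345678", "dpid": "NULL"}], ["personid", "phone", "dpid"])` keeps the leading zero in `phone` and writes `dpid` as an empty cell.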
Default Schema
NOTE: Contributor Node (CN) template headers are case-sensitive and must all be lower case.
Field Name | Type | Description | Formatting & Normalization Rules |
personid | String (varchar 100) | | |
email | String (varchar) | | |
phone | Numeric | | |
dpid | Numeric | | |
nationalid | String (varchar) | | |
frequent_flyer_number | String (varchar) | | |
custom_name | String (varchar) | | |
birthdate | Date | | |
family_name | String (varchar) | | |
given_name | String (varchar) | | |
postcode | String (varchar) | | |
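Because template headers are case-sensitive, a quick header check before upload can catch mistakes early. This is a hedged sketch, assuming the field list in the table above plus the `:<index>` suffix used for multi-valued fields; `header_problems` and its `extra_fields` parameter (for custom fields already agreed with Data Republic) are hypothetical names, not a Contributor Node API.

```python
import re

# Field names from the default schema table.
DEFAULT_FIELDS = {
    "personid", "email", "phone", "dpid", "nationalid",
    "frequent_flyer_number", "custom_name", "birthdate",
    "family_name", "given_name", "postcode",
}

# A base field name with an optional ":<index>" suffix, e.g. "email:1".
HEADER = re.compile(r"^(?P<base>[^:]+)(:(?P<index>\d+))?$")


def header_problems(headers, extra_fields=()):
    """Return a list of human-readable problems found in the CSV header row."""
    known = DEFAULT_FIELDS | set(extra_fields)  # extra_fields: hypothetical custom fields
    problems = []
    for header in headers:
        if header != header.lower():
            problems.append(f"{header!r}: headers are case-sensitive and must be lower case")
            continue
        match = HEADER.match(header)
        if match is None or match.group("base") not in known:
            problems.append(f"{header!r}: not a known schema field")
    return problems
```

An empty result means every header is lower case and maps to a known field.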
Default Matching Combinations
The following fields are always matched on their own, because each is generally "unique enough" within a data set to be a useful match key:
email
phone
dpid
nationalid
frequent_flyer_number
custom_name
The fields given_name, family_name, postcode and birthdate, however, are used as "qualifiers": they are not matched on their own, only in combination with another field, as listed below. On their own they are not useful match keys; for example, matching every record that shares a first name would produce many false positives.
email + given_name
custom_name + postcode
phone + given_name
custom_name + birthdate
email + family_name
given_name + family_name (available in US & SG only)
phone + family_name
dpid + family_name
dpid + given_name
Other combinations can be configured on request; contact Data Republic.
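To see which of the default match keys a given record can contribute to, the combinations above can be expressed as data. This is an illustrative sketch only: the matching itself is performed by Privacy-Preserving Matching, and `usable_keys`, the region codes and the one-value-per-field record layout are assumptions. A key counts only when every field in it is non-empty, and given_name + family_name is offered only for US and SG.

```python
# Default match keys as listed in this article; each tuple is one key.
SINGLE_KEYS = [("email",), ("phone",), ("dpid",), ("nationalid",),
               ("frequent_flyer_number",), ("custom_name",)]
QUALIFIED_KEYS = [
    ("email", "given_name"), ("custom_name", "postcode"), ("phone", "given_name"),
    ("custom_name", "birthdate"), ("email", "family_name"), ("given_name", "family_name"),
    ("phone", "family_name"), ("dpid", "family_name"), ("dpid", "given_name"),
]
# Assumed region codes for which given_name + family_name is available.
NAME_ONLY_REGIONS = {"US", "SG"}


def usable_keys(record, region):
    """Return the default match keys whose fields are all non-empty in `record`."""
    keys = []
    for key in SINGLE_KEYS + QUALIFIED_KEYS:
        if key == ("given_name", "family_name") and region not in NAME_ONLY_REGIONS:
            continue
        if all(record.get(field, "") != "" for field in key):
            keys.append(key)
    return keys
```

For a record with only email, given_name, family_name and postcode, the single key `email` and the qualified keys `email + given_name` and `email + family_name` apply; postcode contributes nothing without custom_name.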
Working with Multi-Value Fields
The following fields can accept multiple values for a field (a list of values):
Field name | Max values |
email | 2 |
phone | 3 |
dpid | 2 |
nationalid | 1 |
frequent_flyer_number | 1 |
custom_name | 2 |
When using multi-valued fields in a CSV file, follow these rules:
In your CSV file, use the field's header name followed by a colon and an index (starting at 0), with one column for each value you want to provide.
For example, to provide up to 2 values for "email", use the field names "email:0" and "email:1".
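The column-naming rule can be applied automatically when records hold lists of values. The following Python sketch (standard library only; `expand_multi_values` and `to_csv` are hypothetical helpers) expands each list into `field:0`, `field:1` and so on, enforces the maximums from the table above, and leaves unused columns empty, in line with the general notes.

```python
import csv
import io

# Maximum values per multi-valued field, from the table above.
MAX_VALUES = {"email": 2, "phone": 3, "dpid": 2, "nationalid": 1,
              "frequent_flyer_number": 1, "custom_name": 2}


def expand_multi_values(record):
    """Turn {"email": ["a@x", "b@x"]} into {"email:0": "a@x", "email:1": "b@x"}."""
    flat = {}
    for field, value in record.items():
        if isinstance(value, list):
            limit = MAX_VALUES.get(field)
            if limit is None:
                raise ValueError(f"{field!r} does not accept multiple values")
            if len(value) > limit:
                raise ValueError(f"{field!r} accepts at most {limit} values, got {len(value)}")
            for i, item in enumerate(value):
                flat[f"{field}:{i}"] = item
        else:
            flat[field] = value
    return flat


def to_csv(records):
    """Write records as CSV text with indexed columns for multi-valued fields."""
    flat = [expand_multi_values(r) for r in records]
    # Union of all column names in first-seen order, so every row shares one header.
    fieldnames = list(dict.fromkeys(name for row in flat for name in row))
    out = io.StringIO()
    # restval="" leaves unused value columns empty rather than filling a placeholder.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(flat)
    return out.getvalue()
```

A record with two emails and one with a single email share the `email:0` and `email:1` columns; the second record's `email:1` cell is simply left empty.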