Privacy-Preserving Matching supports a variety of PII fields and matching combinations by default. This article outlines the default schema and matching combinations, explains how to work with multi-value fields, and gives general guidelines for preparing your data for upload to the Contributor Node.
- General Notes
- Default Schema
- Default Matching Combinations
- Working with multi-value fields
- Related Articles
These notes apply to all fields.
Field type or scenario
Considerations for formatting
Missing / No data
When a field has no data, leave it empty. Do not use a space, a hyphen, a NULL character, or any other placeholder value.
Take particular care that numeric fields do not default to "0" when the value should be empty (e.g. because the data is missing).
Large integers should not be written in scientific notation, as this often loses precision. This is particularly problematic for phone numbers, so take care to export them as strings.
Custom fields may be added; contact Data Republic.
Tokens (as strings)
Tokens are random integers written in hexadecimal notation. Process them as opaque strings to avoid conversion to scientific notation, stripping of leading zeros, or loss of precision caused by conversion to floating-point types.
When updating a CSV
Privacy-Preserving Matching treats the whole record as an "upsert" (insert-or-update) operation.
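The notes above on missing data and string handling can be sketched in Python as follows. This is a minimal illustration, not part of the product: the `PLACEHOLDERS` set, the `clean_value` and `write_rows` helpers, and the sample rows are all assumptions made for the example. It assumes phone numbers and tokens are already held as strings in your source data, and it does not treat "0" as a placeholder, because a genuine zero can be a valid value.

```python
import csv
import io

# Illustrative placeholder strings to blank out (an assumption, not an official list).
PLACEHOLDERS = {"", "-", "null", "none", "n/a", "nan"}


def clean_value(value):
    """Return an empty string for missing data, otherwise the text unchanged."""
    if value is None:
        return ""
    text = str(value).strip()
    if text.lower() in PLACEHOLDERS:
        return ""
    return text


def write_rows(rows, fieldnames):
    """Write dict rows to CSV text, cleaning every value on the way out."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({name: clean_value(row.get(name)) for name in fieldnames})
    return buffer.getvalue()


# Hypothetical sample data: the phone number is kept as a string so the
# leading zero survives and no scientific notation can appear.
rows = [
    {"email": "a@example.com", "phone": "0412345678"},
    {"email": "NULL", "phone": None},
]
csv_text = write_rows(rows, ["email", "phone"])
print(csv_text)
```

Keeping values as strings from the point of export is the important part; once a phone number or token has passed through a numeric type, cleaning it afterwards cannot restore lost digits.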
Data upload format
NOTE: CN template headers are case sensitive and should all be in lower case.
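A quick pre-upload check for the lower-case header rule might look like the sketch below. The function names and the sample header list are hypothetical and only illustrate the idea.

```python
def non_lowercase_headers(headers):
    """Return the headers that contain at least one upper-case character."""
    return [h for h in headers if h != h.lower()]


def lowercase_headers(headers):
    """Return a copy of the headers converted to lower case."""
    return [h.lower() for h in headers]


# Hypothetical header row read from a CSV file.
headers = ["email", "Given_Name", "PHONE"]
problems = non_lowercase_headers(headers)
fixed = lowercase_headers(headers)
print(problems)
print(fixed)
```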
Formatting & Normalization Rules
String (varchar 100)
Default Matching Combinations
The following fields are always matched because they are generally "unique enough" in a data set to be useful to match on.
The next fields, however, are used as "qualifiers": they are not matched on their own but only in combination with another field. Alone, they are not useful to match on (e.g. matching all first names together would produce many false positives).
- email + given_name
- custom_name + postcode
- phone + given_name
- custom_name + birthdate
- email + family_name
- given_name + family_name (available in US & SG only)
- phone + family_name
- dpid + family_name
- dpid + given_name
Other combinations can be configured for you. Talk to Data Republic.
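To see which of the default combinations a given record can contribute to, a local check along these lines may help. It is only a sketch for inspecting your own data before upload and does not reproduce how the matching itself works; the `COMBINATIONS` constant and the `usable_combinations` helper are names invented for this example, and the always-matched fields are not modelled.

```python
# Qualifier combinations copied from the list above.
COMBINATIONS = [
    ("email", "given_name"),
    ("custom_name", "postcode"),
    ("phone", "given_name"),
    ("custom_name", "birthdate"),
    ("email", "family_name"),
    ("given_name", "family_name"),  # available in US & SG only
    ("phone", "family_name"),
    ("dpid", "family_name"),
    ("dpid", "given_name"),
]


def usable_combinations(record):
    """Return combinations where the record has a non-empty value for every field."""
    return [combo for combo in COMBINATIONS if all(record.get(f) for f in combo)]


# Hypothetical record with an empty family_name.
record = {
    "email": "a@example.com",
    "given_name": "Ana",
    "family_name": "",
    "postcode": "2000",
}
print(usable_combinations(record))
```

A record with few usable combinations is a hint that qualifier fields are missing in the source data.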
Working with Multi-Value Fields
The following fields can accept multiple values (a list of values):
When using multi-valued fields in a CSV file, please follow these rules / examples:
- In your CSV file, use the same header name for the field and append a colon and a zero-based index, giving one column for each value you want to provide.
- For example, to provide up to 2 values for "email", use the field names "email:0" and "email:1".
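The column naming rule above can be sketched as a small helper that turns a list of values into indexed columns. The function name `expand_multi_value` and its `max_values` parameter are assumptions for illustration. Unused columns are left empty, in line with the general note on missing data.

```python
def expand_multi_value(field, values, max_values):
    """Map a list of values to columns named field:0, field:1, and so on."""
    if len(values) > max_values:
        raise ValueError(f"{field}: got {len(values)} values, at most {max_values} allowed")
    # Pad with empty strings so every row has the same set of columns.
    padded = list(values) + [""] * (max_values - len(values))
    return {f"{field}:{index}": value for index, value in enumerate(padded)}


# Hypothetical rows: one contact with two emails, one with a single email.
two = expand_multi_value("email", ["a@example.com", "b@example.com"], 2)
one = expand_multi_value("email", ["c@example.com"], 2)
print(two)
print(one)
```

The resulting dictionaries can be merged into each row before writing the CSV, so that the header contains "email:0" and "email:1".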