
Matching Algorithms

Kora Compliance uses a composite matching algorithm to compare subject names against watchlist entries. The composite score combines four different string similarity methods, each catching different types of variations.

Composite Score

The final match score is a weighted combination of four algorithms:

Score = (Primary × 0.60) + (Token × 0.15) + (N-gram × 0.15) + (Phonetic × 0.10)

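where Primary = max(Token, Fuzzy, N-gram) and Fuzzy is the Jaro-Winkler score. The weights sum to 1.0, so the composite score shares the 0 to 1 scale of its components.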
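A minimal Python sketch of this formula, assuming each component score has already been computed on a 0 to 1 scale. The function `composite_score` and its parameter names are hypothetical illustrations, not part of the Kora Compliance API.

```python
# Weights from the composite formula; the function below is a hypothetical sketch.
PRIMARY_WEIGHT = 0.60
TOKEN_WEIGHT = 0.15
NGRAM_WEIGHT = 0.15
PHONETIC_WEIGHT = 0.10


def composite_score(token: float, fuzzy: float, ngram: float, phonetic: float) -> float:
    """Combine four 0-1 similarity scores; `fuzzy` is the Jaro-Winkler score."""
    # The primary term reuses the best of the three string-level scores.
    primary = max(token, fuzzy, ngram)
    return (
        primary * PRIMARY_WEIGHT
        + token * TOKEN_WEIGHT
        + ngram * NGRAM_WEIGHT
        + phonetic * PHONETIC_WEIGHT
    )


# Component scores taken from the matching example on this page.
print(round(composite_score(token=0.93, fuzzy=0.91, ngram=0.85, phonetic=0.90), 3))  # 0.915
```

Because Primary takes the maximum, the strongest string-level signal receives the largest weight regardless of which algorithm produced it.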

Match Strength Classification

| Strength | Composite Score | Description |
|---|---|---|
| EXACT | 1.0 | Normalized names are identical |
| STRONG | ≥ 0.92 | Very high confidence match |
| POSSIBLE | ≥ 0.75 | Potential match requiring review |

Scores below 0.75 are not returned as matches.
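The thresholds above can be applied with a small function like the one below. It is a sketch under two assumptions: the caller has already checked whether the normalized names are identical (the table ties EXACT to that condition rather than to a computed score), and `None` stands for "not returned as a match". The name `classify_match` is hypothetical.

```python
from typing import Optional

STRONG_THRESHOLD = 0.92
POSSIBLE_THRESHOLD = 0.75


def classify_match(score: float, names_identical: bool) -> Optional[str]:
    """Map a composite score to a match strength, or None if it is not a match."""
    if names_identical:
        return "EXACT"
    if score >= STRONG_THRESHOLD:
        return "STRONG"
    if score >= POSSIBLE_THRESHOLD:
        return "POSSIBLE"
    return None  # below 0.75: not returned as a match


print(classify_match(0.915, names_identical=False))  # POSSIBLE
```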

Algorithm Details

1. Jaro-Winkler Distance (Primary)

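Weight: 60%, applied through the Primary term, so Jaro-Winkler contributes only when its score is the highest of Token, Jaro-Winkler, and N-gram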

Measures character-level similarity between two strings, with a bonus for matching prefixes. Effective for catching typos and character transpositions.

| Input A | Input B | Score |
|---|---|---|
| "John Smith" | "Jon Smith" | 0.96 |
| "Mohammed Ali" | "Mohamed Ali" | 0.97 |
| "Smith, John" | "John Smith" | 0.82 |

Parameters:

  • Scaling factor: 0.1 (prefix bonus)
  • Maximum prefix length: 4 characters
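For reference, here is a textbook Jaro-Winkler implementation in Python using the two parameters above (scaling factor 0.1, prefix capped at 4 characters). It is a generic sketch for illustration, not Kora Compliance's implementation, so its scores may differ slightly from the table.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only if equal and no farther apart than this.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both sets of matched characters in order; each mismatched pair is half a transposition.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, scaling: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (capped at max_prefix)."""
    similarity = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return similarity + prefix * scaling * (1 - similarity)
```

As a usage example, `jaro_winkler("martha", "marhta")` returns roughly 0.961: all six characters match, the swapped "th"/"ht" counts as one transposition, and the shared prefix "mar" earns the bonus.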

2. Token-Based Matching

Weight: 15%

Splits names into tokens (words) and compares all combinations. Handles name reordering — "John Doe" matches "Doe John" equally well.

| Input A | Input B | Score |
|---|---|---|
| "John Michael Doe" | "Doe John Michael" | 1.00 |
| "John Doe" | "John Michael Doe" | 0.88 |
| "Al-Rashid Mohammed" | "Mohammed Al Rashid" | 0.95 |

Parameters:

  • Minimum token match threshold: 85–95%
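One way to implement "compares all combinations" is to score every token of one name against every token of the other and keep the best match for each token. The sketch below assumes that approach, using Python's `difflib.SequenceMatcher` as the per-token similarity and averaging in both directions. The actual per-token metric and how the 85–95% threshold is applied are not specified here, so treat both as assumptions; the function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def token_similarity(a: str, b: str) -> float:
    """Character-level similarity of two tokens in [0, 1] (assumed metric)."""
    return SequenceMatcher(None, a, b).ratio()


def best_match_average(source: List[str], target: List[str]) -> float:
    """Average, over source tokens, of each token's best score against any target token."""
    return sum(max(token_similarity(s, t) for t in target) for s in source) / len(source)


def token_score(name_a: str, name_b: str) -> float:
    """Order-insensitive token score: symmetric average of best-match scores."""
    tokens_a = name_a.lower().split()
    tokens_b = name_b.lower().split()
    if not tokens_a or not tokens_b:
        return 0.0
    # Comparing every token pair makes the result independent of word order.
    return (best_match_average(tokens_a, tokens_b) + best_match_average(tokens_b, tokens_a)) / 2


print(token_score("John Michael Doe", "Doe John Michael"))  # 1.0
```

Extra tokens lower the score in this sketch: "John Doe" against "John Michael Doe" lands below 1.0 because "michael" has no close counterpart in the shorter name.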

3. N-gram Similarity (Bigrams)

Weight: 15%

Compares 2-character sequences (bigrams) between strings. Catches character-level variations and partial name matches.

| Input A | Input B | Score |
|---|---|---|
| "Alexander" | "Aleksander" | 0.89 |
| "Mueller" | "Muller" | 0.86 |
| "Tchaikovsky" | "Chaikovsky" | 0.88 |

Parameters:

  • N-gram size: 2 (bigrams)
  • Minimum threshold: 0.85
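This section does not say which formula turns shared bigrams into a score. The sketch below assumes the Sørensen-Dice coefficient (twice the number of shared bigrams divided by the total bigram count), a common choice. With it, "Alexander" vs "Aleksander" scores 12/17 ≈ 0.71 rather than the 0.89 in the table, so treat it as an illustration of bigram overlap, not a reproduction of the table's scores. The function names are hypothetical.

```python
from collections import Counter

NGRAM_THRESHOLD = 0.85  # minimum threshold listed above


def bigrams(text: str) -> Counter:
    """Multiset of overlapping 2-character sequences."""
    text = text.lower()
    return Counter(text[i : i + 2] for i in range(len(text) - 1))


def bigram_dice(a: str, b: str) -> float:
    """Sørensen-Dice coefficient over bigram multisets (assumed formula)."""
    grams_a, grams_b = bigrams(a), bigrams(b)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:  # both strings shorter than two characters
        return 1.0 if a.lower() == b.lower() else 0.0
    shared = sum((grams_a & grams_b).values())  # & keeps the minimum count per bigram
    return 2 * shared / total


score = bigram_dice("Alexander", "Aleksander")
print(round(score, 3), score >= NGRAM_THRESHOLD)  # 0.706 False
```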

4. Soundex (Phonetic)

Weight: 10%

Compares phonetic encodings of names. Catches names that sound alike but are spelled differently.

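| Input A | Input B | Soundex A | Soundex B | Match |
|---|---|---|---|---|
| "Smith" | "Smyth" | S530 | S530 | Yes |
| "Schmidt" | "Smith" | S530 | S530 | Yes |
| "Catherine" | "Katherine" | C365 | K365 | Partial |

Soundex keeps the first letter of a name as-is and encodes the consonants that follow as digits, so "Catherine" and "Katherine" share the digits 365 but not the leading letter, which is why that match is only partial.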

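The codes in the table follow standard American Soundex, sketched below in Python. The sketch only produces the 4-character code; how Kora Compliance converts two codes into the phonetic score used in the composite is not described on this page, so that step is left out. The function name `soundex` is a hypothetical illustration.

```python
# Standard American Soundex digit groups.
CODES = {
    letter: digit
    for digit, letters in (
        ("1", "BFPV"),
        ("2", "CGJKQSXZ"),
        ("3", "DT"),
        ("4", "L"),
        ("5", "MN"),
        ("6", "R"),
    )
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, e.g. 'S530'."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    result = first
    last_code = CODES.get(first, "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            # Adjacent letters with the same digit are coded once.
            if code != last_code:
                result += code
            last_code = code
        elif ch not in "HW":
            # Vowels (and Y) separate letters, so a repeated digit after them is coded again.
            last_code = ""
        # H and W are skipped without resetting last_code.
        if len(result) == 4:
            break
    return result.ljust(4, "0")


for word in ("Smith", "Smyth", "Schmidt", "Catherine", "Katherine"):
    print(word, soundex(word))
```

Running the loop prints the same codes as the table: S530 for the first three names, C365 and K365 for the last two.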
Name Normalization

Before matching, all names go through normalization:

  1. Lowercase conversion — "JOHN DOE" → "john doe"
  2. Diacritic removal — "José García" → "jose garcia"
  3. Title/honorific removal — Strips: Mr., Mrs., Ms., Dr., Prof., Sir, Lord, Dame, Hon.
  4. Punctuation removal — Replaces punctuation with spaces while preserving letters, digits, and existing spaces ("al-rashid" → "al rashid")
  5. Whitespace normalization — Collapses multiple spaces into one
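The five steps can be sketched in Python with the standard library. Assumptions: diacritics are stripped by Unicode NFKD decomposition followed by removal of combining marks, titles are matched as whole tokens with an optional trailing period, and punctuation is replaced by spaces, as the hyphenated names in the normalization examples imply. The function name `normalize_name` is hypothetical.

```python
import unicodedata

TITLES = {"mr", "mrs", "ms", "dr", "prof", "sir", "lord", "dame", "hon"}


def normalize_name(name: str) -> str:
    # 1. Lowercase conversion.
    text = name.lower()
    # 2. Diacritic removal: decompose, drop combining marks, then recompose
    #    so scripts such as Hangul return to their composed form.
    decomposed = unicodedata.normalize("NFKD", text)
    text = unicodedata.normalize(
        "NFC", "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    )
    # 3. Title/honorific removal: drop whole tokens such as "dr." or "sir".
    text = " ".join(tok for tok in text.split() if tok.rstrip(".") not in TITLES)
    # 4. Punctuation removal: keep letters, digits, and whitespace; replace the rest with a space.
    text = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    # 5. Whitespace normalization: collapse runs of whitespace and trim the ends.
    return " ".join(text.split())


for original in ("Dr. José María García-López", "김정은 (Kim Jong-un)"):
    print(normalize_name(original))
```

Recomposing with NFC after stripping combining marks matters for non-Latin scripts: NFKD splits Hangul syllables into conjoining jamo, which are not combining marks, so NFC puts them back together.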

Normalization Examples

| Original | Normalized |
|---|---|
| "Dr. José María García-López" | "jose maria garcia lopez" |
| "Mr. MOHAMMED AL-RASHID" | "mohammed al rashid" |
| "Prof. Sir John Smith III" | "john smith iii" |
| "김정은 (Kim Jong-un)" | "김정은 kim jong un" |

Matching Example

Subject: "Mohamed Al Rasheed"

Watchlist entry: "Mohammed Al-Rashid"

| Step | Result |
|---|---|
| Normalize subject | "mohamed al rasheed" |
| Normalize entry | "mohammed al rashid" |
| Jaro-Winkler | 0.91 |
| Token match | 0.93 |
| N-gram | 0.85 |
| Soundex | 0.90 |
| Primary | max(0.93, 0.91, 0.85) = 0.93 |
| Composite | (0.93 × 0.60) + (0.93 × 0.15) + (0.85 × 0.15) + (0.90 × 0.10) = 0.915 |
| Strength | POSSIBLE (0.915 is ≥ 0.75 but below the 0.92 STRONG threshold) |

Tuning Thresholds

Match thresholds can be adjusted per screening check type via the configuration API. Lowering a threshold increases recall (more matches) but may also increase false positives; raising it reduces noise but may miss genuine fuzzy matches.

Default thresholds work well for most compliance use cases.