Matching Algorithms

Kora Compliance compares subject names against watchlist entries using a composite matching algorithm. The composite score combines four string similarity methods, each of which catches a different kind of name variation.

Composite Score

The final match score is a weighted combination of four algorithms:

Score = (Primary × 0.60) + (Token × 0.15) + (N-gram × 0.15) + (Phonetic × 0.10)

where Primary = max(Token, Fuzzy, N-gram) and Fuzzy is the Jaro-Winkler score
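The weighting can be sketched in a few lines of Python. This is a minimal illustration rather than the product's implementation: the function name `composite_score` and its parameter names are hypothetical, and `fuzzy` is assumed to be the Jaro-Winkler score, as the Matching Example on this page suggests.

```python
def composite_score(token: float, fuzzy: float, ngram: float, phonetic: float) -> float:
    """Combine four similarity scores (each in [0, 1]) using the documented weights."""
    # Primary is the strongest of the three string-level scores.
    primary = max(token, fuzzy, ngram)
    return primary * 0.60 + token * 0.15 + ngram * 0.15 + phonetic * 0.10


# Values taken from the Matching Example on this page.
print(round(composite_score(token=0.93, fuzzy=0.91, ngram=0.85, phonetic=0.90), 3))  # 0.915
```

Because Primary takes the maximum, a single strong signal (for example, a reordered name that scores highly on token matching) can carry most of the composite score even when the other methods score lower.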

Match Strength Classification

Strength	Composite Score	Description
`EXACT`	1.0	Normalized names are identical
`STRONG`	≥ 0.92	Very high confidence match
`POSSIBLE`	≥ 0.75	Potential match requiring review

Scores below 0.75 are not returned as matches.
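The classification table can be expressed as a small function. This is a sketch under stated assumptions: the name `classify` and its signature are hypothetical, and EXACT is decided by comparing the normalized names directly, following the table's description.

```python
from typing import Optional

STRONG_THRESHOLD = 0.92
POSSIBLE_THRESHOLD = 0.75


def classify(score: float, normalized_a: str, normalized_b: str) -> Optional[str]:
    """Map a composite score to a match strength, or None if it is not a match."""
    if normalized_a == normalized_b:
        return "EXACT"
    if score >= STRONG_THRESHOLD:
        return "STRONG"
    if score >= POSSIBLE_THRESHOLD:
        return "POSSIBLE"
    return None  # below 0.75: not returned as a match


print(classify(0.915, "mohamed al rasheed", "mohammed al rashid"))  # POSSIBLE
```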

Algorithm Details

1. Jaro-Winkler Distance (Primary)

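Weight: up to 60%, through the Primary component (Jaro-Winkler supplies Primary only when it is the highest of the Token, Jaro-Winkler, and N-gram scores)

Measures character-level similarity between two strings, with a bonus for matching prefixes. Scores range from 0 to 1, and higher values indicate closer matches. Effective for catching typos and character transpositions.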

Input A	Input B	Score
"John Smith"	"Jon Smith"	0.96
"Mohammed Ali"	"Mohamed Ali"	0.97
"Smith, John"	"John Smith"	0.82

Parameters:

Scaling factor: 0.1 (prefix bonus)
Maximum prefix length: 4 characters
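For readers who want to see how the two parameters enter the calculation, here is a textbook Jaro-Winkler sketch in Python. It is not the product's code: the function names are illustrative, and scores it produces may differ slightly from the table above depending on normalization and implementation details.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order, counted in pairs.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, scaling: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity plus a bonus for a shared prefix of up to max_prefix characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

With a scaling factor of 0.1 and a prefix cap of 4, the prefix bonus can close at most 40% of the gap between the Jaro score and 1.0.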

2. Token-Based Matching

Weight: 15%

Splits names into tokens (words) and compares every token in one name against every token in the other. Handles name reordering: "John Doe" matches "Doe John" equally well.

Input A	Input B	Score
"John Michael Doe"	"Doe John Michael"	1.00
"John Doe"	"John Michael Doe"	0.88
"Al-Rashid Mohammed"	"Mohammed Al Rashid"	0.95

Parameters:

Minimum token match threshold: 85–95%
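The page does not spell out how token scores are computed, so the following Python sketch only illustrates the general idea of order-independent token comparison. Everything in it is an assumption: the function name `token_score`, the use of `difflib.SequenceMatcher` for per-token similarity, and the symmetric averaging. It will not reproduce the scores in the table above.

```python
from difflib import SequenceMatcher


def token_score(a: str, b: str) -> float:
    """Order-independent similarity: each token is paired with its best counterpart."""
    tokens_a, tokens_b = a.split(), b.split()
    if not tokens_a or not tokens_b:
        return 0.0

    def best(token: str, others: list) -> float:
        # Compare the token against all tokens of the other name; keep the best.
        return max(SequenceMatcher(None, token, other).ratio() for other in others)

    forward = sum(best(t, tokens_b) for t in tokens_a) / len(tokens_a)
    backward = sum(best(t, tokens_a) for t in tokens_b) / len(tokens_b)
    # Averaging both directions penalizes tokens that have no counterpart.
    return (forward + backward) / 2


print(token_score("john michael doe", "doe john michael"))  # 1.0
```

Because every token is compared with every token on the other side, word order has no effect on the result, while an extra or missing token (such as a middle name) lowers the score.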

3. N-gram Similarity (Bigrams)

Weight: 15%

Compares 2-character sequences (bigrams) between strings. Catches character-level variations and partial name matches.

Input A	Input B	Score
"Alexander"	"Aleksander"	0.89
"Mueller"	"Muller"	0.86
"Tchaikovsky"	"Chaikovsky"	0.88

Parameters:

N-gram size: 2 (bigrams)
Minimum threshold: 0.85
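As an illustration of bigram comparison, the sketch below uses the Sørensen-Dice coefficient over character bigrams. The page does not say which overlap formula the product uses, so Dice is an assumption, as are the function names; scores from this sketch will not necessarily match the table above.

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Multiset of overlapping 2-character sequences."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def bigram_similarity(a: str, b: str) -> float:
    """Dice coefficient: 2 * shared bigrams / total bigrams (assumed formula)."""
    grams_a, grams_b = bigrams(a), bigrams(b)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        # Strings shorter than 2 characters have no bigrams.
        return 1.0 if a == b else 0.0
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    return 2 * shared / total


print(bigram_similarity("night", "nacht"))  # 0.25
```

In this example only the bigram "ht" is shared, out of eight bigrams in total, which gives 2 × 1 / 8 = 0.25.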

4. Soundex (Phonetic)

Weight: 10%

Compares phonetic encodings of names. Catches names that sound alike but are spelled differently.

Input A	Input B	Soundex A	Soundex B	Match
"Smith"	"Smyth"	S530	S530	Yes
"Schmidt"	"Smith"	S530	S530	Yes
"Catherine"	"Katherine"	C365	K365	Partial
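The codes in the table can be reproduced with standard American Soundex, sketched below in Python. The `phonetic_match` helper is hypothetical: it assumes "Partial" means the digits agree while the leading letter differs, which fits the Catherine/Katherine row but is not confirmed by the page.

```python
# Consonant groups used by standard American Soundex.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code, or '' if the name has no letters."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != previous:
            encoded += code
        if ch not in "hw":
            # h and w do not separate letters that share a code; vowels do.
            previous = code
    return (encoded + "000")[:4]


def phonetic_match(a: str, b: str) -> str:
    """Assumed rule: full code equal -> Yes; digits equal only -> Partial."""
    code_a, code_b = soundex(a), soundex(b)
    if code_a == code_b:
        return "Yes"
    if code_a[1:] == code_b[1:]:
        return "Partial"
    return "No"


print(soundex("Schmidt"), soundex("Smith"))     # S530 S530
print(phonetic_match("Catherine", "Katherine"))  # Partial
```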

Name Normalization

Before matching, all names go through normalization:

Lowercase conversion — "JOHN DOE" → "john doe"
Diacritic removal — "José García" → "jose garcia"
Title/honorific removal — Strips: Mr., Mrs., Ms., Dr., Prof., Sir, Lord, Dame, Hon.
Punctuation removal — Removes punctuation while preserving spaces and digits
Whitespace normalization — Collapses multiple spaces into one

Normalization Examples

Original	Normalized
"Dr. José María García-López"	"jose maria garcia lopez"
"Mr. MOHAMMED AL-RASHID"	"mohammed al rashid"
"Prof. Sir John Smith III"	"john smith iii"
"김정은 (Kim Jong-un)"	"김정은 kim jong un"
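The five normalization steps can be sketched in Python using only the standard library. This is an illustration, not the product's code: the function name `normalize_name` is hypothetical, and the title list is the one given above. The steps run in the documented order.

```python
import re
import unicodedata

# Titles listed in the documentation, matched as whole words with an optional period.
_TITLE_PATTERN = re.compile(r"\b(?:mr|mrs|ms|dr|prof|sir|lord|dame|hon)\b\.?")


def normalize_name(name: str) -> str:
    # 1. Lowercase conversion.
    text = name.lower()
    # 2. Diacritic removal: decompose, drop combining marks, then recompose
    #    so that scripts such as Hangul return to their usual form.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    text = unicodedata.normalize("NFC", stripped)
    # 3. Title/honorific removal.
    text = _TITLE_PATTERN.sub(" ", text)
    # 4. Punctuation removal, keeping letters, digits, and spaces.
    text = re.sub(r"[^\w\s]|_", " ", text)
    # 5. Whitespace normalization.
    return " ".join(text.split())


print(normalize_name("Dr. José María García-López"))  # jose maria garcia lopez
```

Punctuation is replaced with a space rather than deleted, which is why "García-López" becomes two tokens, as in the examples above.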

Matching Example

Subject: "Mohamed Al Rasheed"
Watchlist entry: "Mohammed Al-Rashid"

Step	Result
Normalize subject	"mohamed al rasheed"
Normalize entry	"mohammed al rashid"
Jaro-Winkler	0.91
Token match	0.93
N-gram	0.85
Soundex	0.90
Primary	max(0.93, 0.91, 0.85) = 0.93
Composite	(0.93 × 0.60) + (0.93 × 0.15) + (0.85 × 0.15) + (0.90 × 0.10) = 0.915
Strength	POSSIBLE (0.915 is ≥ 0.75 but below the 0.92 STRONG threshold)

Tuning Thresholds

Match thresholds can be adjusted per screening check type via the configuration API. Lowering thresholds increases recall (more matches) but may increase false positives. Raising thresholds reduces noise but may miss fuzzy matches.

Default thresholds work well for most compliance use cases.
