Reliability of paraphrase-mpnet-base-v2-fuzzy-matcher without char-level spacing?
#3, opened by Ramibelg
Hi 👋,
Thanks for releasing shahrukhx01/paraphrase-mpnet-base-v2-fuzzy-matcher — it’s been very helpful!
The README recommends splitting each input into space-separated characters, e.g.:
word = " ".join(char for char in word) # char-level fuzzy match
If I remove that line and feed the raw string instead:
- Similarity still reaches 1.00 for identical strings.
- Scores for near-miss spellings (e.g. “Think different” vs “Think differe”) stay high (~0.87).
- Scores fall less sharply for truncated inputs (e.g. “Think” → ~0.70).
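
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the two input styles can be scored side by side. It assumes the checkpoint loads through the sentence-transformers `SentenceTransformer` class and that cosine similarity is the score being compared. The helper names `char_split`, `cosine`, and `compare` are hypothetical and not part of the model's README.

```python
import math
from typing import Callable, Sequence, Tuple


def char_split(text: str) -> str:
    """Space-separate every character, equivalent to the README's join line."""
    return " ".join(text)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compare(
    encode: Callable[[str], Sequence[float]], query: str, candidate: str
) -> Tuple[float, float]:
    """Return (raw_score, char_level_score) for one query/candidate pair."""
    raw = cosine(encode(query), encode(candidate))
    spaced = cosine(encode(char_split(query)), encode(char_split(candidate)))
    return raw, spaced


if __name__ == "__main__":
    # Assumption: the checkpoint is loadable with sentence-transformers.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("shahrukhx01/paraphrase-mpnet-base-v2-fuzzy-matcher")
    query = "Think different"
    for candidate in ["Think different", "Think differe", "Think"]:
        raw, spaced = compare(model.encode, query, candidate)
        print(f"{candidate!r}: raw={raw:.2f} char-level={spaced:.2f}")
```

Passing `model.encode` as a plain callable keeps the comparison logic independent of the model, so the same function can be pointed at a different encoder to check whether the gap between the two input styles is specific to this checkpoint.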