Reliability of paraphrase-mpnet-base-v2-fuzzy-matcher without char-level spacing?

#3
by Ramibelg - opened

Hi 👋,

Thanks for releasing shahrukhx01/paraphrase-mpnet-base-v2-fuzzy-matcher — it’s been very helpful!

The README recommends splitting each input into space-separated characters, e.g.:

word = " ".join(char for char in word) # char-level fuzzy match

If I remove that line and feed the raw string instead:

- Similarity still reaches 1.00 for identical strings.
- Scores for near-miss spellings (e.g. “Think different” vs “Think differe”) stay high (~0.87).
- Scores fall less sharply for truncated inputs (e.g. “Think” → ~0.70).
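
For anyone who wants to reproduce the comparison, here is a minimal sketch of how the raw and char-level inputs can be scored side by side. The helper names (`to_char_level`, `cosine`, `compare_preprocessing`, `load_encoder`, `main`) are my own and not part of the model or its README, and I am assuming the checkpoint loads through the sentence-transformers `SentenceTransformer` class. The scores it prints depend on the model and library versions, so I have not hard-coded any expected values.

```python
import math
from typing import Callable, List, Sequence, Tuple


def to_char_level(text: str) -> str:
    """Space-separate every character, equivalent to the README's join line."""
    return " ".join(text)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    return dot / (norm_u * norm_v)


def compare_preprocessing(
    encode: Callable[[List[str]], Sequence[Sequence[float]]],
    pairs: Sequence[Tuple[str, str]],
) -> List[Tuple[str, str, float, float]]:
    """Score each pair twice: once on raw strings, once char-level spaced.

    `encode` is any callable that maps a list of strings to a list of vectors.
    Returns rows of (a, b, raw_score, char_level_score).
    """
    rows = []
    for a, b in pairs:
        raw = encode([a, b])
        spaced = encode([to_char_level(a), to_char_level(b)])
        rows.append((a, b, cosine(raw[0], raw[1]), cosine(spaced[0], spaced[1])))
    return rows


def load_encoder(
    model_name: str = "shahrukhx01/paraphrase-mpnet-base-v2-fuzzy-matcher",
) -> Callable[[List[str]], Sequence[Sequence[float]]]:
    """Assumption: the checkpoint is loadable with sentence-transformers."""
    from sentence_transformers import SentenceTransformer  # imported lazily

    return SentenceTransformer(model_name).encode


def main() -> None:
    """Print raw vs. char-level scores for the examples from this post."""
    pairs = [
        ("Think different", "Think different"),
        ("Think different", "Think differe"),
        ("Think different", "Think"),
    ]
    for a, b, raw, spaced in compare_preprocessing(load_encoder(), pairs):
        print(f"{a!r} vs {b!r}: raw={raw:.2f} char-level={spaced:.2f}")
```

Calling `main()` downloads the model and prints both scores per pair, which should make it easier to see where the two preprocessing choices actually diverge.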
