metadata
language:
  - en
  - kha
license: cc0-1.0
tags:
  - sentence-transformers
  - khasi
  - semantic-search
  - northeast-india
  - cross-lingual
library_name: sentence-transformers
pipeline_tag: sentence-similarity

Khasi-English Semantic Search Model

First production-ready semantic search model for Khasi-English language pairs.

Overview

This model enables semantic search between English and Khasi, supporting Northeast India's linguistic diversity. It was trained on 66,794 English-Khasi translation pairs.

Use Cases

  • Cross-lingual semantic search (English ↔ Khasi)
  • Document similarity in bilingual contexts
  • Cultural content discovery for Northeast India
  • Educational language learning tools

Performance

  • English-Khasi similarity: 0.69-0.74
  • Model size: ~90 MB (suitable for lightweight deployment)
  • 384-dimensional embeddings

Quick Start

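from sentence_transformers import SentenceTransformer

# Download (on first use) and load the model from the Hugging Face Hub
model = SentenceTransformer('MWirelabs/khasi-english-semantic-search')
sentences = ['Hello', 'hangne', 'Good morning']
# Encode each sentence into a 384-dimensional embedding vector
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 384)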
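
Once sentences are encoded, semantic search comes down to ranking candidate embeddings by their similarity to a query embedding. The following is a minimal sketch of that ranking step, assuming cosine similarity as the measure (this card does not state which measure was used for the reported scores). The helper names `cosine_similarity` and `rank_corpus` are hypothetical, and small toy vectors stand in for real 384-dimensional embeddings so the example runs without downloading the model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_corpus(query_vec, corpus_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional vectors standing in for real model embeddings (assumption).
query = [1.0, 0.0, 0.0]
corpus = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]

ranking = rank_corpus(query, corpus)
print(ranking[0][0])  # index of the best match → 1
```

With the real model, the query vector would come from `model.encode(query)` and the corpus vectors from `model.encode(documents)`. The `sentence_transformers.util.cos_sim` function offers a vectorized way to compute the same scores.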

Developed by MWirelabs for Northeast India AI innovation.