Shkumbin Sherifi — AI Systems Engineer

I build local-first AI systems that are measurable, reliable, and controllable.

My work spans speech recognition, retrieval systems, agent workflows, simulations, and knowledge graphs, with a focus on evaluation, observability, and production constraints.

Local inference · Speech recognition · Retrieval systems · Evaluation pipelines · Agent workflows · Knowledge graphs

Coding. Building. AI Buildings.

High-performance simulation engine built to evaluate vector synchronization, relational aggregation, and orchestration performance across a multi-service architecture.

Pipeline Stages

Stage 1: Draft Generation

  • 7 rounds × 32 picks
  • Automated attribute scoring matrices
  • Prospect generation and ranking logic
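
The draft stage can be sketched roughly as follows. This is a minimal illustration, assuming a simple weighted-sum attribute score and a best-available pick order; the `Prospect` class, attribute names, and weights are hypothetical and not taken from the project.

```python
import random
from dataclasses import dataclass

ROUNDS = 7
PICKS_PER_ROUND = 32

@dataclass
class Prospect:
    name: str
    attributes: dict  # hypothetical attribute ratings on a 0-100 scale

def score(prospect, weights):
    """Weighted sum of attribute ratings (assumed scoring rule)."""
    return sum(w * prospect.attributes[key] for key, w in weights.items())

def run_draft(prospects, weights):
    """Rank prospects by score and assign them to picks in order."""
    board = sorted(prospects, key=lambda p: score(p, weights), reverse=True)
    picks = []
    for overall, prospect in enumerate(board[:ROUNDS * PICKS_PER_ROUND], start=1):
        rnd = (overall - 1) // PICKS_PER_ROUND + 1
        slot = (overall - 1) % PICKS_PER_ROUND + 1
        picks.append((rnd, slot, prospect.name))
    return picks

rng = random.Random(0)  # seeded so the generated class is reproducible
prospects = [
    Prospect(f"P{i}", {"speed": rng.uniform(40, 99), "strength": rng.uniform(40, 99)})
    for i in range(300)
]
picks = run_draft(prospects, {"speed": 0.6, "strength": 0.4})
print(len(picks))  # 224 picks: 7 rounds x 32
```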

Stage 2: Embedding System

  • 4,940 player vectors
  • PyTorch VAE clustering into 12 archetypes
  • Low-latency similarity retrieval via ChromaDB
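
Similarity retrieval over player vectors can be illustrated with a plain cosine-similarity search. This is a pure-Python stand-in for what a vector store does, not ChromaDB's API; the 3-dimensional toy vectors and player ids are hypothetical (the project uses 45-dimensional embeddings).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query, vectors, k=3):
    """Return the k most similar (id, similarity) pairs, best first."""
    scored = sorted(
        ((cosine(query, vec), pid) for pid, vec in vectors.items()),
        reverse=True,
    )
    return [(pid, sim) for sim, pid in scored[:k]]

# Hypothetical toy embeddings
players = {
    "qb_pocket": [0.9, 0.1, 0.2],
    "qb_mobile": [0.7, 0.6, 0.1],
    "rb_power": [0.1, 0.2, 0.9],
}
nearest = top_k([0.8, 0.3, 0.2], players, k=2)
print([pid for pid, _ in nearest])
```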

Stage 3: Season Simulation

  • 17-week simulation engine
  • Play-by-play matchup execution
  • Seeded variance and home-field weighting
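
Seeded variance and home-field weighting can be sketched at the game level. This is a simplification, assuming a rating-difference model with Gaussian noise rather than play-by-play execution; `home_edge`, `noise_sd`, the team names, and the schedule are hypothetical.

```python
import random

def simulate_game(home_rating, away_rating, rng, home_edge=2.5, noise_sd=10.0):
    """Return True if the home team wins (assumed margin model)."""
    margin = (home_rating + home_edge - away_rating) + rng.gauss(0.0, noise_sd)
    return margin > 0

def simulate_season(schedule, ratings, seed):
    """Play every (home, away) game; the same seed gives the same season."""
    rng = random.Random(seed)
    wins = {team: 0 for team in ratings}
    for home, away in schedule:
        home_won = simulate_game(ratings[home], ratings[away], rng)
        wins[home if home_won else away] += 1
    return wins

ratings = {"A": 85.0, "B": 80.0, "C": 75.0}
schedule = [("A", "B"), ("B", "C"), ("C", "A")] * 6  # hypothetical 18-game slate
first = simulate_season(schedule, ratings, seed=42)
second = simulate_season(schedule, ratings, seed=42)
print(first == second)  # True: identical seed, identical results
```

Passing one `random.Random` instance through the whole season keeps runs reproducible without touching the global random state.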

Stage 4: Progression Engine

  • Physical aging curves
  • Development trait progression
  • Rookie growth and veteran regression
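
A progression step of this kind might look like the following. It is only a sketch, assuming growth before age 26, a plateau through 29, and accelerating decline afterwards; the age thresholds, trait tiers, and deltas are hypothetical numbers, not the engine's actual curves.

```python
# Hypothetical development-trait tiers and their growth multipliers
DEV_MULTIPLIER = {"slow": 0.5, "normal": 1.0, "star": 1.5}

def progress_one_season(rating, age, dev_trait="normal"):
    """Advance a player's overall rating by one season."""
    if age < 26:
        delta = 2.0 * DEV_MULTIPLIER[dev_trait]   # rookie and young-player growth
    elif age < 30:
        delta = 0.0                               # prime years: plateau
    else:
        delta = -1.0 - 0.5 * (age - 30)           # veteran regression, steeper with age
    return max(0.0, min(99.0, rating + delta))    # clamp to the rating scale

print(progress_one_season(70, 22, "star"))  # 73.0
print(progress_one_season(80, 33))          # 77.5
```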

Stage 5: Salary Cap & Front Office

  • Rule-of-51 enforcement
  • Dead-money calculations
  • Asset valuation and trade logic
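
The two cap rules can be sketched in a few lines. This is a simplified reading: the Rule of 51 counts only the 51 largest cap hits, and dead money here is the unamortized signing bonus, prorated evenly over at most five years. Guaranteed salary and June 1 designations are ignored, and the function names are hypothetical.

```python
def top_51_cap_total(cap_hits):
    """Sum of the 51 largest cap hits on the roster."""
    return sum(sorted(cap_hits, reverse=True)[:51])

def dead_money(signing_bonus, contract_years, years_completed):
    """Remaining signing-bonus proration that accelerates on release."""
    proration_years = min(contract_years, 5)
    per_year = signing_bonus / proration_years
    remaining_years = max(proration_years - years_completed, 0)
    return per_year * remaining_years

# 60-player offseason roster at $1M each: only 51 count
print(top_51_cap_total([1_000_000] * 60))  # 51000000

# $10M bonus on a 4-year deal, released after year 1
print(dead_money(10_000_000, 4, 1))        # 7500000.0
```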

Supporting Systems

Coaching Layer

  • Play-calling logic, scheme-fit metrics, and in-game adjustments.

Scouting Engine

  • Regional grading pipelines and combine evaluation models.

Free Agency Marketplace

  • Multi-agent contract bidding and team-fit valuation scoring.

Draft Intelligence

  • Need-weighted board ranking and trade-up/down evaluation.
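
Need-weighted ranking can be illustrated by scaling each prospect's grade by how badly the team needs that position. This is a sketch under assumptions: the multiplicative formula, the `need_weight` parameter, and the sample data are hypothetical.

```python
def rank_board(prospects, team_needs, need_weight=0.3):
    """Sort (name, position, grade) tuples by need-adjusted grade.

    team_needs maps position -> need in [0, 1]; missing positions count as 0.
    """
    def adjusted(prospect):
        _, position, grade = prospect
        return grade * (1.0 + need_weight * team_needs.get(position, 0.0))
    return sorted(prospects, key=adjusted, reverse=True)

board = rank_board(
    [("A", "QB", 90), ("B", "OT", 88), ("C", "WR", 85)],
    team_needs={"OT": 1.0, "WR": 0.2},
)
print([name for name, _, _ in board])  # ['B', 'C', 'A']
```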

Infrastructure

DuckDB for structured OLAP analytics across ~1,700 entities

ChromaDB vector storage for 45-dimensional embeddings

Full 10-season franchise lifecycle simulation computed in ~16 seconds locally

Python · PyTorch · DuckDB · ChromaDB · MLX · Docker · Next.js

Albanian Speech-to-Text

Local-first transcription pipeline for Albanian (Kosovo dialect) with human-correction feedback loops and iterative fine-tuning workflows.

Pipeline

901 curated YouTube audio sources processed through Faster-Whisper for segment-level transcription.

Configurable inference models from tiny to large-v3 with language auto-detection and re-ranking.

Human correction feedback loop for continual dataset refinement.

Observability

SQLite-backed tracking for inference latency, correction rates, WER, and CER.
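
For reference, WER and CER are both edit distances normalized by reference length, computed over words and characters respectively. A minimal pure-Python sketch follows; the function names are illustrative, and the sample sentence is written without diacritics for simplicity.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (rolling-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edits / reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)

def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

reference = "mire se vini ne Prishtine"   # human-corrected transcript
hypothesis = "mire se vini ne Prishtina"  # model output with one wrong word
print(wer(reference, hypothesis))  # 0.2  (1 of 5 words)
print(cer(reference, hypothesis))  # 0.04 (1 of 25 characters)
```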

Real-time monitoring dashboard across deployment variants and model sizes.

Structured evaluation workflows for reproducible ASR benchmarking.

Whisper · Faster-Whisper · Python · ASR · SQLite · Audio Processing · Web UI

Local Agent Runtime

A local runtime layer that routes tasks across local and fallback models, pauses dispatch under memory pressure, and tracks execution through supervised queues.
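
The routing and gating behavior described above can be sketched as a single decision function. This is an assumption-laden illustration: the `Policy` fields, thresholds, and return labels are hypothetical, and free memory is passed in as a parameter rather than measured.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    allow_fallback: bool = True       # whether fallback providers may be used
    max_local_tokens: int = 4096      # hypothetical size limit for local runs
    min_free_memory_gb: float = 4.0   # hypothetical memory-pressure threshold

def route(task_tokens, free_memory_gb, policy):
    """Return 'paused', 'local', 'fallback', or 'rejected' for a task."""
    if free_memory_gb < policy.min_free_memory_gb:
        return "paused"               # memory pressure: hold dispatch
    if task_tokens <= policy.max_local_tokens:
        return "local"                # local model first
    return "fallback" if policy.allow_fallback else "rejected"

policy = Policy()
print(route(1_000, free_memory_gb=8.0, policy=policy))   # local
print(route(1_000, free_memory_gb=2.0, policy=policy))   # paused
print(route(10_000, free_memory_gb=8.0, policy=policy))  # fallback
```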

Architecture

Routing: local MLX first, with policy-gated fallback providers

Queues: SQLite-backed orchestration with state tracking, retry, and recovery

Supervision: containerized service lifecycle management

Resource gating: dispatch pauses under memory pressure

Observability: execution traces and queue state monitoring
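
A SQLite-backed queue with state tracking and retry can be sketched with the standard-library `sqlite3` module. The table schema, state names, and `max_attempts` default are hypothetical, and this version assumes a single worker (no concurrent-claim protection).

```python
import sqlite3

def open_queue(path=":memory:"):
    """Open (or create) the task table."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS tasks (
               id INTEGER PRIMARY KEY,
               payload TEXT NOT NULL,
               state TEXT NOT NULL DEFAULT 'queued',
               attempts INTEGER NOT NULL DEFAULT 0)"""
    )
    return db

def enqueue(db, payload):
    cur = db.execute("INSERT INTO tasks (payload) VALUES (?)", (payload,))
    db.commit()
    return cur.lastrowid

def claim(db):
    """Mark the oldest queued task as running and return (id, payload)."""
    row = db.execute(
        "SELECT id, payload FROM tasks WHERE state = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    db.execute(
        "UPDATE tasks SET state = 'running', attempts = attempts + 1 WHERE id = ?",
        (row[0],),
    )
    db.commit()
    return row

def finish(db, task_id, ok, max_attempts=3):
    """Record success, or requeue for retry until attempts run out."""
    if ok:
        state = "done"
    else:
        attempts = db.execute(
            "SELECT attempts FROM tasks WHERE id = ?", (task_id,)
        ).fetchone()[0]
        state = "queued" if attempts < max_attempts else "failed"
    db.execute("UPDATE tasks SET state = ? WHERE id = ?", (state, task_id))
    db.commit()

db = open_queue()
task_id = enqueue(db, "summarize notes")
while (task := claim(db)) is not None:
    finish(db, task[0], ok=False)  # simulate a task that always fails
print(db.execute("SELECT state, attempts FROM tasks").fetchone())  # ('failed', 3)
```

Because state lives in the database rather than in process memory, a restarted worker can query for tasks still marked as running and requeue them.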

In active development: governed task execution and external runtime integrations.

Python · MLX · SQLite · Docker

Keep-Graph: 10 years of personal notes, visualized

10 years of Google Keep notes turned into a knowledge graph — 1,032 wiki pages, 12,801 nodes, 44,534 edges. Mapped.

Pipeline

Load all Google Keep Takeout notes (JSON)

Filter relevant notes with a Qwen 0.6B classifier

Extract concepts per note with Qwen 27B

Deduplicate and canonicalize concepts with Qwen 27B

Synthesize wiki pages per cluster with Qwen 27B

Build graph.json and render D3 force layout
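
The final graph-building step can be sketched as follows, assuming concepts have already been extracted and canonicalized per note. Linking concepts that co-occur in the same note is an assumption made for illustration, as are the field names `count` and `weight`; the `nodes`/`links` shape with `source` and `target` keys is the layout D3's force simulation commonly consumes.

```python
import json
from collections import Counter
from itertools import combinations

def build_graph(note_concepts):
    """note_concepts maps note id -> list of canonical concept names."""
    node_counts = Counter()
    edge_counts = Counter()
    for concepts in note_concepts.values():
        unique = sorted(set(concepts))            # dedupe within a note
        node_counts.update(unique)
        edge_counts.update(combinations(unique, 2))  # co-occurrence pairs
    nodes = [{"id": name, "count": n} for name, n in sorted(node_counts.items())]
    links = [
        {"source": a, "target": b, "weight": w}
        for (a, b), w in sorted(edge_counts.items())
    ]
    return {"nodes": nodes, "links": links}

# Hypothetical extracted concepts for three notes
notes = {
    "n1": ["mlx", "python", "sqlite"],
    "n2": ["python", "sqlite"],
    "n3": ["d3"],
}
graph = build_graph(notes)
print(json.dumps(graph, indent=2))
```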

Infrastructure

Extracted entirely on-device using local LLMs (Qwen 0.6B + 27B via Apple MLX)

Multilingual: English, Albanian, Arabic with cross-language concept deduplication

Per-concept wiki synthesis across 1,032 generated pages

Interactive D3 timeline spanning 2015 to 2026

Python · MLX · Qwen 0.6B / 27B · D3.js · Local-First

Production Systems

Në Dritën Islame

E-commerce and operational administration platform built with Next.js, Supabase, and automated fulfillment workflows. Handles order processing and backend admin automation.

Gloweb

Client-facing web systems and backend integrations across React, TypeScript, and Node.js. Focused on production deployment workflows and API integration layers.

Arbnori Engineering

Multilingual business platform with deployment automation and localization systems for multi-region operations.

Contact

Open to AI systems, ML infrastructure, and applied AI engineering roles.