Shkumbin Sherifi — AI Systems Engineer

Designing and operating local-first AI systems for inference, retrieval, orchestration, and evaluation on constrained hardware.

Local inference · Retrieval systems · Multi-model routing · Evaluation systems

High-performance simulation engine built to evaluate vector synchronization, relational aggregation, and orchestration performance on constrained local hardware.

Pipeline Stages

Stage 1 — Draft Generation

  • 7 rounds × 32 picks
  • Automated attribute scoring matrices
  • Prospect generation and ranking logic

Stage 2 — Embedding System

  • 4,940 player vectors
  • PyTorch VAE clustering into 12 archetypes
  • Low-latency similarity retrieval via ChromaDB
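The similarity retrieval step can be illustrated without a vector database. This is a minimal sketch in plain Python, assuming cosine similarity as the metric (the source does not state which metric is used). The player IDs and the toy 3-dimensional vectors are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k_similar(query, vectors, k=3):
    """Return the k (player_id, score) pairs most similar to the query."""
    scored = [(pid, cosine_similarity(query, vec)) for pid, vec in vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Hypothetical player vectors; real embeddings would have more dimensions.
players = {
    "qb_001": [0.9, 0.1, 0.3],
    "qb_002": [0.8, 0.2, 0.4],
    "rb_001": [0.1, 0.9, 0.2],
}
matches = top_k_similar([0.9, 0.1, 0.3], players, k=2)
```

A brute-force scan like this is linear in the number of vectors. A dedicated store such as ChromaDB adds persistence and indexing on top of the same idea.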

Stage 3 — Season Simulation

  • 17-week simulation engine
  • Play-by-play matchup execution
  • Seeded variance and home-field weighting
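Seeded variance and home-field weighting can be sketched as follows. This is an illustration only: the logistic win-probability formula, `HOME_FIELD_BONUS`, and `RATING_SCALE` are assumptions, not values taken from the engine.

```python
import random

HOME_FIELD_BONUS = 2.5  # hypothetical rating points added to the home team
RATING_SCALE = 25.0     # hypothetical: how sharply rating gaps move the odds

def home_win_probability(home_rating, away_rating):
    """Logistic win probability with a fixed home-field bump."""
    diff = (home_rating + HOME_FIELD_BONUS) - away_rating
    return 1.0 / (1.0 + 10 ** (-diff / RATING_SCALE))

def simulate_season(matchups, seed):
    """Simulate each (home_rating, away_rating) matchup with a seeded RNG."""
    rng = random.Random(seed)  # same seed, same sequence of outcomes
    results = []
    for home_rating, away_rating in matchups:
        p_home = home_win_probability(home_rating, away_rating)
        results.append("home" if rng.random() < p_home else "away")
    return results

schedule = [(80.0, 78.0)] * 17  # 17 weeks of a hypothetical matchup
run_a = simulate_season(schedule, seed=42)
run_b = simulate_season(schedule, seed=42)
```

Giving each season its own `random.Random` instance keeps runs reproducible without touching the global random state.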

Stage 4 — Progression Engine

  • Physical aging curves
  • Development trait progression
  • Rookie growth and veteran regression

Stage 5 — Salary Cap & Front Office

  • Rule-of-51 enforcement
  • Dead-money calculations
  • Asset valuation and trade logic
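Rule-of-51 enforcement can be sketched as below, assuming the engine follows the NFL offseason convention in which only the 51 largest cap hits, plus dead money, count against the cap. The function names and dollar figures are hypothetical, and how the engine computes dead money is not specified in the source.

```python
def offseason_cap_charge(cap_hits, dead_money=0, top_n=51):
    """Sum the top_n largest cap hits and add dead money."""
    largest = sorted(cap_hits, reverse=True)[:top_n]
    return sum(largest) + dead_money

def is_cap_compliant(cap_hits, salary_cap, dead_money=0):
    """True when the offseason charge fits under the salary cap."""
    return offseason_cap_charge(cap_hits, dead_money) <= salary_cap

# Hypothetical 60-man offseason roster, amounts in dollars.
roster = [2_000_000] * 51 + [1_000_000] * 9
charge = offseason_cap_charge(roster, dead_money=5_000_000)
compliant = is_cap_compliant(roster, salary_cap=110_000_000, dead_money=5_000_000)
```

In this example the nine smallest contracts are excluded, so the charge is 51 contracts at 2,000,000 plus 5,000,000 in dead money.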

Supporting Systems

Coaching Layer

  • Play-calling logic, scheme-fit metrics, and in-game adjustments.

Scouting Engine

  • Regional grading pipelines and combine evaluation models.

Free Agency Marketplace

  • Multi-agent contract bidding and team-fit valuation scoring.

Draft Intelligence

  • Need-weighted board ranking and trade-up/down evaluation.

Infrastructure

DuckDB for structured OLAP analytics across ~1,700 entities

ChromaDB vector storage for 45-dimensional embeddings

Full 10-season franchise lifecycle simulation computed in ~16 seconds locally

Python · PyTorch · DuckDB · ChromaDB · MLX · Docker · Next.js

Albanian Speech-to-Text

Local-first transcription pipeline for Kosovo Albanian with human-correction feedback loops and iterative fine-tuning workflows.

Pipeline

901 curated YouTube audio sources processed through Faster-Whisper for segment-level transcription.

Configurable inference models from tiny to large-v3 with language auto-detection and re-ranking.

Human correction feedback loop for continual dataset refinement.

Observability

SQLite-backed tracking for inference latency, correction rates, WER, and CER.

Real-time monitoring dashboard across deployment variants and model sizes.

Structured evaluation workflows for reproducible ASR benchmarking.
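WER and CER are both edit distances normalized by reference length, computed over words and characters respectively. Below is a minimal sketch of the metrics and of logging them to SQLite. The table name `eval_runs`, its columns, the sample sentences, and the latency value are hypothetical placeholders, and this CER variant counts spaces as characters.

```python
import sqlite3

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)

ref = "jam nga prishtina sot"
hyp = "jam nga prishtine sot"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eval_runs (model TEXT, latency_ms REAL, wer REAL, cer REAL)")
conn.execute("INSERT INTO eval_runs VALUES (?, ?, ?, ?)",
             ("tiny", 120.0, wer(ref, hyp), cer(ref, hyp)))  # placeholder latency
row = conn.execute("SELECT model, wer FROM eval_runs").fetchone()
```

One wrong word out of four gives a WER of 0.25, while the same error costs only one character out of 21 in CER, which is why tracking both is useful for a morphologically rich language.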

Whisper · Faster-Whisper · Python · ASR · SQLite · Audio Processing · Web UI

Hermes Workflow Environment

Local-first workflow environment integrating MLX inference, provider routing, automation pipelines, and retrieval systems.

Inference Layer

oMLX model server (:8001) as the primary inference backend

Automatic routing across local and private cloud providers

Fallback chain: local MLX → OpenRouter → Cerebras → cloud GPU infrastructure
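A fallback chain of this kind can be sketched as an ordered list of provider callables tried in turn. The names `ProviderError` and `route_with_fallback` and the stub providers are hypothetical. Real providers would make network calls, which are replaced here by stand-ins so the example runs offline.

```python
class ProviderError(Exception):
    """Raised when a provider cannot serve a request."""

def route_with_fallback(prompt, providers):
    """Try each (name, call) pair in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # remember why this hop failed
    raise ProviderError("all providers failed: " + "; ".join(errors))

def local_mlx(prompt):
    # Stub simulating an unavailable local server.
    raise ProviderError("local server unavailable")

def openrouter(prompt):
    # Stub standing in for a remote provider call.
    return f"echo: {prompt}"

chain = [("local-mlx", local_mlx), ("openrouter", openrouter)]
used, output = route_with_fallback("hello", chain)
```

Keeping the chain as data means the order can be changed in configuration without touching the routing logic.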

Orchestration

s6-overlay supervision for containerized service lifecycle management

SQLite-backed kanban orchestration system with queue state tracking

Cron-driven monitoring with automatic retry and failure recovery

Memory-pressure gating that pauses dispatch when local resources are constrained
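Memory-pressure gating can be sketched as a dispatcher that checks a memory reading before releasing each task. The `min_free` threshold, the task names, and the `free_memory_fraction` callable are hypothetical. The source does not say how memory pressure is measured, so the reading is injected as a function here.

```python
from collections import deque

def dispatch_ready_tasks(queue, free_memory_fraction, min_free=0.2):
    """Pop tasks while the free-memory reading stays at or above min_free."""
    dispatched = []
    while queue and free_memory_fraction() >= min_free:
        dispatched.append(queue.popleft())
    return dispatched  # anything left in the queue waits for the next pass

# Simulated readings: memory drops below the threshold on the third check.
readings = iter([0.5, 0.4, 0.1])
queue = deque(["t1", "t2", "t3", "t4"])
dispatched = dispatch_ready_tasks(queue, lambda: next(readings))
```

Tasks that are not dispatched stay queued, so a later monitoring pass can resume them once memory recovers.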

Workflow Automation

Integrated agent workflows inspired by procedural task execution patterns

Multi-step task chaining with parameterized execution flows

Retrieval-assisted workflows using SQLite observability and vector search pipelines

Experimentation with local-first agent coordination and automation tooling

Python · MLX · SQLite · Next.js · Automation · Agents

Production Systems

Në Dritën Islame

E-commerce and operational administration platform built with Next.js, Supabase, and automated fulfillment workflows. Handles order processing and backend admin automation.

Gloweb

Client-facing web systems and backend integrations across React, TypeScript, and Node.js. Focused on production deployment workflows and API integration layers.

Arbnori Engineering

Multilingual business platform with deployment automation and localization systems for multi-region operations.

Contact

Open to AI systems, ML infrastructure, and applied AI engineering roles.