Overview
Production Readiness
0.78
Novelty Score
0.48
Cost Impact Score
0.8
Citation Count
0
Why It Matters For Business
Small businesses can run secure, low-cost RAG chatbots on commodity hardware while keeping strong tenant isolation and practical defenses against prompt injection.
Summary TLDR
This paper presents an open-source, multi-tenant platform for small businesses to deploy retrieval-augmented chatbots on low-cost, distributed k3s clusters. Security is handled at the platform level with container isolation, PII screening, guard prompts, and a pre-generation detector (GenTel-Shield). In an e-commerce case study, guard prompts alone give near-100% recall; GenTel-Shield achieves high precision (99.51%) and moderate recall (81.6%); combined defenses reach ~100% recall and ~99.8% F1. The k3s private cloud matched or reduced latency versus bare-metal for evaluated LLMs.
Problem Statement
Small businesses lack budget and engineering staff to run cloud GPU fleets and to harden RAG chatbots against prompt injection and data leakage; we need a low-cost, deployable platform that enforces tenant isolation and practical prompt-injection defenses without retraining models.
Main Contribution
An open-source, multi-tenant platform built on lightweight k3s clusters and an encrypted overlay network for small-business LLM deployments.
A layered, platform-level prompt-injection mitigation combining system-level guard prompts and the GenTel-Shield detector that avoids model retraining.
A real-world e-commerce case study showing security effectiveness and inference latency comparisons between bare-metal and k3s private-cloud deployments.
Key Findings
Guard prompts block prompt-injection attacks almost perfectly in the case study.
GenTel-Shield provides model-agnostic detection with high precision but misses some attacks.
Combining guard prompts and GenTel-Shield gives the most robust defense.
k3s-based private cloud lowered end-to-end inference latency versus bare-metal in the case study.
Base LLM safety alone is insufficient to stop prompt-injection attacks.
Results
Guard Prompts recall
GenTel-Shield precision / recall / F1
Combined defenses (Guard + GenTel-Shield) F1
End-to-end latency (GPT-4.1-mini)
End-to-end latency (GPT-4.1)
End-to-end latency (Ministral-3B)
Who Should Care
What To Try In 7 Days
Run a k3s demo cluster on spare machines to test private-cloud deployment.
Add simple system-level guard prompts to existing LLM prompts and test with known injection samples.
Integrate a model-agnostic pre-generation detector (e.g., GenTel-Shield) and measure false positives and missed attacks.
Optimization Features
Infra Optimization
- k3s lightweight Kubernetes
- Pooling commodity hardware to reduce cost
System Optimization
- Encrypted overlay networking
- Multi-tenant container isolation
Inference Optimization
- GPU-aware scheduling
- Distributed inference across heterogeneous nodes
- Containerised workload placement
Reproducibility
Code Urls
Code Available
Open Source Status
- partial
Risks & Boundaries
Limitations
- Guard prompts require manual, scenario-specific tuning and may not generalise to new domains.
- GenTel-Shield misses some attacks (recall ~81.6%) when used alone.
- Case study is a single-tenant e-commerce deployment; results may vary for other industries and scales.
- Operational setup assumes some on-site hardware and network reliability; misconfiguration can widen blast radius.
When Not To Use
- High-stakes autonomous decision systems where full formal verification is required.
- Unrestricted creative generation tasks where prompt constraints would block desired outputs.
- Contexts with strict legal requirements that prohibit local data pooling or shared multi-tenant infrastructure.
Failure Modes
- Detector false negatives allow obfuscated injection to reach the model.
- Guard prompts can be bypassed by novel or domain-specific obfuscation without prompt updates.
- Cross-tenant leakage if container isolation or access controls are misconfigured.
- Network instability can cause latency spikes that hurt user experience.
Core Entities
Models
- GPT-4.1
- GPT-4.1-mini
- Ministral-3B
- GenTel-Shield (detector)
Metrics
- Precision
- Recall
- F1
- End-to-end inference latency (s)
Datasets
- Customer Support queries (ATS)
- GenTel-Safe prompt-injection attack dataset (adapted)
Benchmarks
- GenTel-Safe

