Open-source, low-cost platform that secures RAG chatbots for small businesses using k3s clusters and layered prompt-defences

January 21, 20267 min

Overview

Decision SnapshotNeeds Validation

The system is practically oriented and tested in a real e-commerce deployment, but results come from one case study and adapted attack datasets, so broader validation is still needed.

Citations0

Evidence Strength0.80

Confidence0.85

Risk Signals11

Trust Signals

Findings with numeric evidence: 5/5

Findings with evidence refs: 5/5

Results with explicit delta: 6/6

Reproducibility

Status: Partial assets available

Open source: Partial

At A Glance

Cost impact: 80%

Production readiness: 78%

Novelty: 48%

Authors

Jiazhu Xie, Bowen Li, Heyu Fu, Chong Gao, Ziqi Xu, Fengling Han

Links

Abstract / PDF / Code

Why It Matters For Business

Small businesses can run secure, low-cost RAG chatbots on commodity hardware while keeping strong tenant isolation and practical defenses against prompt injection.

Who Should Care

Summary TLDR

This paper presents an open-source, multi-tenant platform for small businesses to deploy retrieval-augmented chatbots on low-cost, distributed k3s clusters. Security is handled at the platform level with container isolation, PII screening, guard prompts, and a pre-generation detector (GenTel-Shield). In an e-commerce case study, guard prompts alone give near-100% recall; GenTel-Shield achieves high precision (99.51%) and moderate recall (81.6%); combined defenses reach ~100% recall and ~99.8% F1. The k3s private cloud matched or reduced latency versus bare-metal for evaluated LLMs.

Problem Statement

Small businesses lack budget and engineering staff to run cloud GPU fleets and to harden RAG chatbots against prompt injection and data leakage; we need a low-cost, deployable platform that enforces tenant isolation and practical prompt-injection defenses without retraining models.

Main Contribution

An open-source, multi-tenant platform built on lightweight k3s clusters and an encrypted overlay network for small-business LLM deployments.

A layered, platform-level prompt-injection mitigation combining system-level guard prompts and the GenTel-Shield detector that avoids model retraining.

Key Findings

Guard prompts block prompt-injection attacks almost perfectly in the case study.

NumbersRecall 99.6100%, F1 ~100% (Table 1)

Practical UseIf you carefully craft and test guard prompts, you can achieve near-complete attack blocking without model changes, but expect manual tuning and maintenance.

Evidence RefTable 1

GenTel-Shield provides model-agnostic detection with high precision but misses some attacks.

NumbersPrecision 99.51%, Recall 81.6%, F1 ~89.7% (Table 1)

Practical UseDeploy GenTel-Shield to reduce false positives and simplify operations; pair it with other controls to catch missed attacks.

Evidence RefTable 1

Results

MetricValueBaselineDeltaSplit / DatasetEvidenceEvidence Ref
Guard Prompts recall99.6100%Pure LLMLarge increase vs. baselineBalanced benign/adversarial set (250/250)Table 1 shows near-100% recall and F1 for Guard PromptsTable 1
GenTel-Shield precision / recall / F199.51% / 81.6% / ~89.7%Pure LLMHigh precision, moderate recallBalanced benign/adversarial set (250/250)Table 1 GenTel-Shield rowTable 1

What To Try In 7 Days

Run a k3s demo cluster on spare machines to test private-cloud deployment.

Add simple system-level guard prompts to existing LLM prompts and test with known injection samples.

Integrate a model-agnostic pre-generation detector (e.g., GenTel-Shield) and measure false positives and missed attacks.

Optimization Features

Infra Optimization
k3s lightweight KubernetesPooling commodity hardware to reduce cost
System Optimization
Encrypted overlay networkingMulti-tenant container isolation
Inference Optimization
GPU-aware schedulingDistributed inference across heterogeneous nodesContainerised workload placement

Reproducibility

Code AvailableYes
Data AvailableNo
Open Source StatusPartial
LicenseUnknown

Risks & Boundaries

Limitations

Guard prompts require manual, scenario-specific tuning and may not generalise to new domains.

GenTel-Shield misses some attacks (recall ~81.6%) when used alone.

When Not To Use

High-stakes autonomous decision systems where full formal verification is required.

Unrestricted creative generation tasks where prompt constraints would block desired outputs.

Failure Modes

Detector false negatives allow obfuscated injection to reach the model.

Guard prompts can be bypassed by novel or domain-specific obfuscation without prompt updates.

Core Entities

Models

GPT-4.1GPT-4.1-miniMinistral-3BGenTel-Shield (detector)

Metrics

PrecisionRecallF1End-to-end inference latency (s)

Datasets

Customer Support queries (ATS)GenTel-Safe prompt-injection attack dataset (adapted)

Benchmarks

GenTel-Safe