Technical Interview Simulation — Multi-Agent Banking Support (English)
AI Engineer — DEUS.ai
Everything you need for the 1-hour technical interview. Q&A about the Multi-Agent Banking Support implementation, prepared after code analysis.
PART I — Interview Preparation
⏱️ Timeline — 60 minutes
| Min | Phase | What happens |
|---|---|---|
| 0–5 | Introduction | Pitch (2–3 min), presentation, context |
| 5–25 | Technical challenge | Walkthrough of Banking Support architecture, decisions, flow |
| 25–40 | Deep dive | RAG, guardrails, agents, follow-up questions |
| 40–55 | System design & production | Scaling, costs, monitoring, trade-offs |
| 55–60 | Final questions | You ask the interviewer |
🗣️ Pitch (2–3 min)
"Hi, I'm [Name], an AI Engineer with experience building production LLM and agentic systems. I focus on data pipelines, vector search, multi-agent orchestration and CI/CD. I have hands-on experience with LangChain, LangGraph, containerised deployments and observability. I care about model traceability, reproducibility and safeguards against hallucinations.
In my recent work I led end-to-end projects: scoping with stakeholders, selecting architectures that balance latency and cost, and shipping systems with monitoring and rollback plans. I'm excited about DEUS because you combine engineering rigour with human-centred design — exactly how I like to work."
📋 Cheat Sheet — Key concepts
RAG (30s)
Query → embedding → vector search → top-k chunks → context in prompt → LLM. Reduces hallucinations, keeps knowledge up to date.
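The pipeline above can be sketched end to end with a toy embedding. This is a minimal illustration only: `embed` is a stand-in for a real embedding model, and the chunks are made up.

```python
import math

def embed(text: str) -> list[float]:
    # Toy bag-of-characters embedding; a real system would call a model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Vector search: score every chunk against the query, keep the best k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

context = top_k("card insurance coverage", [
    "Travel insurance covers lost luggage.",
    "Card insurance covers fraudulent transactions.",
    "Loans require proof of income.",
])
# The retrieved chunks become grounding context in the prompt.
prompt = "Answer using only this context:\n" + "\n".join(context)
```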
Fine-tuning vs RAG
- RAG: external knowledge that changes frequently
- Fine-tuning: change model behaviour/style
Hallucinations — mitigation
RAG (grounding), temperature 0–0.3, conservative prompts, post-generation verification, human-in-the-loop.
Reducing costs
- Caching: exact match + semantic (similar queries)
- Model routing per agent: cheaper models for extraction, classification and guardrails; larger model only for specialist. E.g. Greeter and guardrails with gpt-4o-mini; Insurance Specialist with gpt-4o.
- RAG: reduces tokens in prompt
- Batching: batch embedding requests
- Optimised prompts: fewer tokens, same clarity
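The exact-match tier of the caching idea above can be sketched in a few lines. Everything here is illustrative (a semantic cache would instead compare query embeddings against a similarity threshold):

```python
import hashlib

_cache: dict[str, str] = {}

def _key(query: str) -> str:
    # Normalise whitespace and case so trivially different queries collide.
    normalised = " ".join(query.lower().split())
    return hashlib.sha256(normalised.encode()).hexdigest()

def cached_answer(query: str, generate) -> str:
    k = _key(query)
    if k not in _cache:
        _cache[k] = generate(query)  # only pay for the LLM on a cache miss
    return _cache[k]

calls = []
def fake_llm(q: str) -> str:
    # Stand-in for the real LLM call, counting how often it runs.
    calls.append(q)
    return f"answer to: {q}"

cached_answer("What is my card limit?", fake_llm)
cached_answer("what is  my card limit?", fake_llm)  # normalises to the same key
```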
Fine-tuning for guardrails (PII, classification)
Instead of generic LLM for guardrails, fine-tune a small model for: SAFE/DANGEROUS classification, PII detection (IBAN, card, name, address). Faster, cheaper, deterministic, works in CI without API key.
LangGraph vs linear chain
Conditional flow, shared typed state, reusable nodes, visualisable graph, easy to add branches.
Input vs Output guardrails
Input: blocks dangerous requests before they enter. Output: protects what the system says (redaction, validation). Defence in depth.
💻 Coding — 3 exercises they may ask
1. Chunking
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64):
    """Split text into word-based chunks with overlap between neighbours."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunks.append(" ".join(words[start:end]))
        # Step forward, keeping `overlap` words of context; stop at the end.
        start = end - overlap if end < len(words) else len(words)
    return chunks
2. Retry with backoff
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=0.5):
    """Call fn, retrying on failure with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: propagate the last error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
3. Parse JSON from LLM
import json

def parse_json_from_llm(text: str):
    """Extract the first {...} span from an LLM reply and parse it."""
    start = text.find('{')
    end = text.rfind('}')
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
✅ Checklist — Before you enter
Technical
- Can explain the Banking Support flow in 1 minute
- Know the 8 classic questions and their short answers
- RAG in 30 seconds memorised
- Fine-tuning vs RAG — clear on the difference
- Input vs output guardrails — why both
- 3 projects ready (problem → architecture → result)
- Chunking, retry, JSON parsing — can implement from scratch
Logistics
- Camera and microphone OK
- Neutral background, good lighting
- 3–4 questions written for the end
- Pitch practised in English
Questions to ask (pick 3–4)
- What AI architectures are you building currently?
- RAG or fine-tuning — which is the focus?
- Biggest challenges putting LLMs in production?
- How do you manage knowledge ingestion?
- How do you evaluate AI system performance?
🚫 Red flags to avoid
- Talking only about models (they want systems, pipelines)
- Not talking about production (deploy, monitoring)
- Ignoring costs and security
- Confusing RAG with fine-tuning
- Responding with a 5-minute monologue
- Not asking questions at the end
🎯 Impressive phrases
- "The greatest complexity is in pipelines and orchestration, not the model."
- "I think of AI in layers: ingestion, retrieval, generation."
- "LLMs are more powerful with tools and knowledge sources."
- "If you can't measure it, you can't iterate it."
PART II — Technical Content
Quick Summary
| Topic | Short answer |
|---|---|
| Flow | load → input guardrails → greeter → bouncer → router → specialist → output guardrails → save |
| 2/3 | Normalisation + match on 2 of 3 fields before secret question |
| RAG | Hybrid (FAISS + BM25) → RRF (k=60) → Rerank (cross-encoder), optional HyDE |
| Input guardrails | Rules (INPUT_BLOCKLIST) + LLM, blocks dangerous requests |
| Output guardrails | Redaction (IBAN/card) → Rules → LLM, safe rewrite or fallback |
| Router | Deterministic, INTENT_TO_ROUTE, fallback general |
| Sessions | session_id, load/save at start and end of each turn, in memory |
| API status | guardrail_flagged → rejected; needs_more_info → needs_more_info; final_response → completed |
Index
- 1. Architecture and Flow
- 2. 2/3 Verification and Secret Question
- 3. Agents
- 4. Guardrails
- 5. API and Sessions
- 6. Voice
- 7. Testing and CI
- 8. Trade-offs
- 9. Specific Code
- 10. State and Sessions
- 11. Prompts
- 12. RAG in Detail
- 13. Specialists
- 14. API and Status
- 15. Docker and CI
- 16. Security and Edge Cases
- 17. Future and Evolution
1. Architecture and Flow
Q: Describe the overall system flow
A: The flow is:
- load_session — Loads the session history and data (if any).
- input_guardrails — Validates the input (rules + LLM). If blocked → save_session with a rejection.
- greeter_agent — Extracts intent and data (name, phone, IBAN), merges with the session, checks for a 2/3 match. If data is missing, asks for more. If there is a 2/3 match, asks the secret question. If correct → is_identified.
- bouncer — Classifies the customer (regular, premium, not_customer). If not a customer, rejects.
- specialist_router — Maps intent → specialist (deterministic).
- specialist — Generates the response (insurance uses RAG).
- output_guardrails — Redacts, validates, rewrites if unsafe.
- save_session — Saves state and finishes.
Q: How many nodes and conditional branches does the graph have?
A:
- Nodes: load_session, input_guardrails, greeter_agent, bouncer, specialist_router, output_guardrails, save_session + 6 specialists.
- Branches: (1) input_guardrails → save if blocked; (2) greeter → output if needs_more_info; (3) bouncer → output if not_customer; (4) router → one of the 6 specialists.
Q: Why LangGraph instead of a linear chain?
A: Conditional flow driven by state; shared typed state (ConversationState); nodes are reusable pure functions; an explicit, visualisable graph; easy to add nodes/branches.
2. 2/3 Verification and Secret Question
Q: How did you implement the "2 of 3" rule?
A: In verification_service.py: count how many of 3 fields are filled. If < 2 → (False, None). For each customer, compare with normalisation (name: lowercase, trim; phone: digits only; IBAN: uppercase, no spaces). If ≥ 2 matches → return record (with secret and answer). Greeter only asks secret question when there's 2/3 match.
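A minimal sketch of that normalise-and-count logic. The helper names and the sample record are illustrative, not the actual verification_service.py code:

```python
import re

def normalise(record: dict) -> dict:
    """Normalise the three identification fields before comparing."""
    return {
        "name": (record.get("name") or "").strip().lower(),
        "phone": re.sub(r"\D", "", record.get("phone") or ""),  # digits only
        "iban": (record.get("iban") or "").upper().replace(" ", ""),
    }

def match_two_of_three(provided: dict, customer: dict) -> bool:
    """True when at least 2 of the 3 normalised fields match."""
    p, c = normalise(provided), normalise(customer)
    matches = sum(
        1 for field in ("name", "phone", "iban")
        if p[field] and p[field] == c[field]
    )
    return matches >= 2

customer = {"name": "Ana Silva", "phone": "+351 912 345 678",
            "iban": "PT50 0002 0123 1234 5678 9015 4"}
# Name and phone match after normalisation, so 2/3 passes without the IBAN.
matched = match_two_of_three(
    {"name": "ana silva", "phone": "+351912345678", "iban": None}, customer)
```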
Q: Why ask the secret question only after the 2/3 match?
A: First identify with 2/3, then confirm with secret question. Asking before would mean asking sensitive data from someone who hasn't proven they're the customer.
Q: How do you validate the answer to the secret question?
A: _verify_secret_answer: normalise (lowercase, remove non-alphanumeric, trim) and compare. If fail → identification_failed → Bouncer treats as not_customer.
3. Agents
Q: Does the Specialist Router use an LLM or rules?
A: Rules only. INTENT_TO_ROUTE maps intent → specialist. Deterministic, fast, stable in CI.
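A sketch of that deterministic lookup. The intent and specialist names here are illustrative, not the real INTENT_TO_ROUTE contents:

```python
# Intent → specialist mapping; a plain dict, no LLM involved.
INTENT_TO_ROUTE = {
    "card_issue": "card_specialist",
    "loan_request": "loan_specialist",
    "insurance_question": "insurance_specialist",
}

def route(intent: str) -> str:
    # Unknown intents fall back to the general specialist.
    return INTENT_TO_ROUTE.get(intent, "general_specialist")
```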
Q: Describe the Insurance Specialist's RAG pipeline
A: (1) Hybrid search — FAISS (dense) + BM25 (sparse). Optional: HyDE. (2) RRF — Combines the rankings with k=60. (3) Rerank — Cross-encoder (ms-marco-MiniLM). (4) LRU cache for repeated queries.
Q: Why RRF instead of a single retriever?
A: Dense captures semantics; BM25 captures exact terms ("yacht", "marine"). RRF combines without manual weights.
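RRF itself is only a few lines; a minimal sketch with k=60 and made-up document ids:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score each doc by sum of 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]   # e.g. FAISS order
sparse = ["doc_b", "doc_d", "doc_a"]  # e.g. BM25 order
fused = rrf([dense, sparse])          # doc_b wins: high in both lists
```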
4. Guardrails
Q: How do the input guardrails work?
A: (1) Rules — INPUT_BLOCKLIST with regex. Match → block. (2) LLM — If rules pass, classifies SAFE/DANGEROUS. DANGEROUS → block. Rules first (deterministic); LLM for ambiguous cases.
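A minimal sketch of the rules tier. These regex patterns are illustrative, not the real INPUT_BLOCKLIST:

```python
import re

# Illustrative patterns only; the production blocklist differs.
INPUT_BLOCKLIST = [
    r"ignore (all|previous) instructions",
    r"approve my .* loan",
    r"reveal .* (password|secret)",
]

def check_input_rules(text: str) -> bool:
    """Return True if the message matches any blocked pattern."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in INPUT_BLOCKLIST)
```

Only if this deterministic check passes does the LLM classifier run on ambiguous cases.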
Q: And the output guardrails?
A: (1) Redaction — Mask IBAN (4+4) and card (last 4). (2) Validation — Rules first, then LLM. (3) Safe rewrite — If unsafe, rewrite or fallback. Rules run on original text (before redaction).
Q: Why rules in addition to the LLM in the output guardrails?
A: In CI the LLM saw already-redacted text and sometimes classified as SAFE. Rules ensure obvious phrases are always flagged.
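A sketch of the redaction step. The regexes are simplified for illustration; real IBAN and card detection is stricter:

```python
import re

def redact(text: str) -> str:
    """Mask IBANs (keep first 4 and last 4 chars) and cards (keep last 4)."""
    def mask_iban(m: re.Match) -> str:
        iban = m.group(0).replace(" ", "")
        return f"{iban[:4]}****{iban[-4:]}"
    # IBAN-like: country code + check digits + a long digit run.
    text = re.sub(r"\b[A-Z]{2}\d{2}(?:\s?\d){10,30}\b", mask_iban, text)
    # Card-like: four groups of four digits.
    text = re.sub(r"\b(?:\d{4}[\s-]?){3}(\d{4})\b", r"**** **** **** \1", text)
    return text
```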
5. API and Sessions
Q: How do you keep context between messages?
A: session_id + load_session / save_session. Greeter merges extracted data with session data.
Q: Why is /chat synchronous?
A: graph.invoke() is synchronous. For parallel, ideal would be workers or run_in_executor. For the challenge, synchronous is sufficient.
6. Voice Interface
Q: How does /chat/voice work?
A: Audio → Deepgram STT → text → same graph as /chat → response → Deepgram TTS → base64 audio.
7. Testing and CI
Q: What are your strategies for testing LLM-driven flows?
A: Mock extract_intent and extract_identification; deterministic rules in output guardrails; conditional skip if no API key; Postman for E2E.
Q: How do the guardrail tests pass in CI without a real key?
A: check_output_rules() uses regex. Rules cover UNSAFE_OUTPUTS; LLM is never called.
8. Trade-offs and Decisions
Q: What trade-offs did you make between simplicity and robustness?
A: Deterministic router (simpler, less flexible); RAG only in Insurance (where it adds value); in-memory sessions (simple, doesn't scale); Rules + LLM in guardrails (balance cost/coverage).
Q: What would you change for production?
A: Redis for sessions, rate limiting, auth, metrics, async workers, secrets in env, load tests.
9. Specific Code
Q: Why TypedDict with total=False?
A: Optional fields. Each node returns only what it changes; LangGraph does automatic merge.
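A minimal sketch of the pattern. The field names are illustrative, and the final `update` only mimics in spirit what LangGraph's state merge does:

```python
from typing import TypedDict

class ConversationState(TypedDict, total=False):
    # total=False makes every key optional, so a node can return
    # only the slice of state it actually changed.
    session_id: str
    intent: str
    is_identified: bool
    final_response: str

def greeter_node(state: ConversationState) -> ConversationState:
    # Returns a partial update; the graph merges it into the full state.
    return {"intent": "insurance_question", "is_identified": True}

state: ConversationState = {"session_id": "abc"}
state.update(greeter_node(state))  # roughly what the graph's merge does
```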
Q: How do you avoid invalid JSON in extraction?
A: _parse_json_from_llm(): searches for json blocks, does json.loads(), on error returns defaults.
Q: How is the department_contacts tool integrated?
A: General Specialist can invoke it to get contacts. Returns static info. Agent decides when to call.
10. State and Sessions
Q: Which fields are persisted?
A: conversation_history, collected_data, customer_type, specialist_route, customer_record, secret_question, is_identified.
Q: Where is conversation_history updated?
A: load_session adds user message; save_session adds assistant response.
Q: What are the problems with in-memory sessions?
A: Loss on restart, doesn't scale horizontally, no TTL. Solution: Redis or DB.
11. Prompts and Loader
Q: How does the prompt registry work?
A: prompts.py defines constants; loader.py exposes get_prompt(key, **kwargs). Centralised, facilitates changes.
Q: What does get_prompt do if the key doesn't exist?
A: Raises a KeyError listing the available keys.
12. RAG in Detail
Q: What is HyDE?
A: Generates a hypothetical document that would answer the query and uses that text for the dense search. Improves recall.
Q: How do you do chunking?
A: 500-character chunks with 50-character overlap; the cut is adjusted back to the last space so words aren't split. Sources are the .md files in data/insurance.
Q: What happens when RAG finds nothing?
A: INSURANCE_NO_KNOWLEDGE_CONTEXT: instructs LLM to be honest and not invent.
Q: Is the FAISS index pre-built or built at runtime?
A: Docker entrypoint runs prebuild_faiss.py before starting. In dev, builds on first retrieval.
Q: Why k=60 in RRF?
A: Common value in literature. Good compromise between top rank weight and uniformity.
13. Specialists
Q: How does the Card Specialist handle premium customers?
A: Adds SPECIALIST_CARD_PREMIUM_SUFFIX to prompt: "offer priority support and dedicated assistance".
Q: Why generate_response_focused in the Insurance Specialist?
A: Receives only last message + RAG context + state summary. Reduces tokens; RAG context is more relevant than long history.
Q: What does the test_output_guardrail_inject trigger do?
A: Injects known unsafe response to test that guardrails flag and rewrite.
Q: How many rounds of tool calls does the General Specialist allow?
A: Maximum 3. Avoids infinite loops.
14. API and Status
Q: How do you map state → status?
A: guardrail_flagged → rejected; needs_more_info → needs_more_info; not_customer/identification_failed → rejected; final_response → completed; otherwise → error.
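That mapping as a sketch, assuming the state is a plain dict (the real state is a TypedDict):

```python
def map_status(state: dict) -> str:
    """Translate the final graph state into an API status string."""
    if state.get("guardrail_flagged"):
        return "rejected"
    if state.get("needs_more_info"):
        return "needs_more_info"
    if state.get("customer_type") == "not_customer" or state.get("identification_failed"):
        return "rejected"
    if state.get("final_response"):
        return "completed"
    return "error"
```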
Q: What is returned when the input is blocked?
A: status: "rejected", with the guardrails fallback as the response.
Q: Does /chat/voice always return audio?
A: If TTS fails, audio_base64 is None but the text response is always returned.
15. Docker and CI
Q: Describe the multi-stage Dockerfile
A: Stage 1: Node (build frontend). Stage 2: Python 3.11-slim, copies app + frontend, entrypoint runs load_insurance_qa.py, prebuild_faiss.py, uvicorn. Non-root user.
Q: Which branches does CI run on?
A: Push and PR on main and dev. Jobs: test (3.11/3.12) + build (Docker).
16. Security and Edge Cases
Q: What about a customer without a secret question (DirectUser)?
A: If there's no secret/answer, marks is_identified after 2/3 without asking.
Q: How do you normalise the phone number?
A: _normalize_phone: removes everything except digits.
Q: Why isn't "transfer" in the blocklist?
A: "I want to transfer 50 euros — how?" is legitimate. Blocklist focuses on clearly dangerous patterns.
Q: What about invalid JSON in extraction?
A: Defaults: intent ("general_support", 0.5); identification {name: None, phone: None, iban: None}.
Q: What about LLM timeouts or failures?
A: try/except with fallbacks. In production: retry with backoff, circuit breaker.
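A minimal circuit-breaker sketch of the production idea mentioned above (illustrative, not part of the current codebase):

```python
import time

class CircuitBreaker:
    """Open after N consecutive failures, then fail fast until a cooldown passes."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # timestamp when the circuit opened

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success resets the failure count
        return result
```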
Q: Does the collected_data merge overwrite with empty values?
A: No. Only updates with non-empty values.
PART III — Classic Interview Questions
These are the questions most likely to arise in the technical interview.
🔥 1. "Can you walk me through your architecture?"
They assess: Clarity, mental structure.
A: An agent pipeline orchestrated by LangGraph. User → FastAPI → the graph loads the session, validates input (guardrails), Greeter (2/3 + secret question), Bouncer (classification), Router (intent → specialist), Specialist (response), output guardrails (redaction + validation), save. Each node reads/writes shared state. The separation of responsibilities makes debugging and evolution easier.
🧠 2. "Why multi-agent instead of single LLM?"
They assess: Whether you actually thought it through.
A: (1) Separation of responsibilities — the Greeter extracts/verifies, the Bouncer classifies, the Specialist responds. (2) Control — routing and the 2/3 rule are code, not an LLM. (3) RAG only where it makes sense. (4) Guardrails as separate nodes.
💣 3. "Is it over-engineered?"
They assess: Humility, trade-offs.
A: For an MVP it would be. For the challenge it's appropriate. I could have started simpler; the router is a lookup, not an agent; rules come first in the guardrails.
🧩 4. "Explain your RAG approach"
They assess: Real knowledge.
A: Hybrid (FAISS + BM25) → RRF → Rerank. Only in Insurance (products, coverage). Card/Loan are procedural.
🛡️ 5. "Why both input and output guardrails?"
They assess: Security — can decide the interview.
A: Input guardrails block dangerous requests. Output guardrails protect what the system says (redaction, validation). An attack can come from either side — defence in depth.
⚙️ 6. "How would you improve response time?"
They assess: Optimisation, caching.
A: Cache retrieval/embeddings, parallelise extraction, a lighter model (or model routing: cheap models for extraction/guardrails, a larger one only for the specialist), async, streaming.
🎯 7. "Give an example of something your system prevents"
They assess: Being concrete.
A: "Approve my 50k loan" → the input guardrails block it. The LLM generates a full IBAN → the output guardrails redact it, the rules flag it, safe rewrite. A non-customer tries to impersonate someone → the 2/3 check and the secret question fail → rejected.
💭 8. "If you had 2 more weeks, what would you improve?"
They assess: Vision.
A: Observability (metrics, tracing), explicit fallbacks, regression tests with golden conversations, ADRs. Foundations, not features.
17. Future and Evolution
Q: What's the next priority?
A: Redis for sessions (blocker). Then: metrics and alerts.
Q: What's the 3–6 month roadmap?
A: Phase 1: Redis, rate limit, auth, metrics, retry. Phase 2: Workers, queue, multiple instances. Phase 3: A/B prompts, fine-tuning extraction, feedback loop.
Q: How would you scale to 10,000 users?
A: Stateless API, workers (Celery), Redis cache, load balancer, rate limit per user, lighter model.
Q: What did you learn to do differently?
A: Rules first in guardrails. Test without LLM. Prompts as code. ADRs early.
Good luck. You're prepared.