Technical Interview Simulation — Multi-Agent Banking Support (English)
AI Engineer — DEUS.ai
Everything you need for the 1-hour technical interview. Q&A about the Multi-Agent Banking Support implementation, prepared after code analysis.
PART I — Interview Preparation
⏱️ Timeline — 60 minutes
| Min | Phase | What happens |
|---|---|---|
| 0–5 | Introduction | Pitch (2–3 min), presentation, context |
| 5–25 | Technical challenge | Walkthrough of Banking Support architecture, decisions, flow |
| 25–40 | Deep dive | RAG, guardrails, agents, follow-up questions |
| 40–55 | System design & production | Scaling, costs, monitoring, trade-offs |
| 55–60 | Final questions | You ask the interviewer |
🗣️ Pitch (2–3 min)
"Hi, I'm [Name], an AI Engineer with experience building production LLM and agentic systems. I focus on data pipelines, vector search, multi-agent orchestration and CI/CD. I have hands-on experience with LangChain, LangGraph, containerised deployments and observability. I care about model traceability, reproducibility and safeguards against hallucinations.
In my recent work I led end-to-end projects: scoping with stakeholders, selecting architectures that balance latency and cost, and shipping systems with monitoring and rollback plans. I'm excited about DEUS because you combine engineering rigour with human-centred design — exactly how I like to work."
📋 Cheat Sheet — Key concepts
RAG (30s)
Query → embedding → vector search → top-k chunks → context in prompt → LLM. Reduces hallucinations, keeps knowledge up to date.
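The pipeline above can be sketched end to end with a toy embedding. This is a minimal illustration only: `embed` is a stand-in for a real embedding model, and the chunks are made up.

```python
import math

def embed(text: str) -> list[float]:
    # Toy bag-of-characters embedding; a real system would call a model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Vector search: score every chunk against the query, keep the best k.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

context = top_k("card insurance coverage", [
    "Travel insurance covers lost luggage.",
    "Card insurance covers fraudulent transactions.",
    "Loans require proof of income.",
])
# The retrieved chunks become grounding context in the prompt.
prompt = "Answer using only this context:\n" + "\n".join(context)
```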
Fine-tuning vs RAG
- RAG: external knowledge that changes frequently
- Fine-tuning: change model behaviour/style
Hallucinations — mitigation
RAG (grounding), temperature 0–0.3, conservative prompts, post-generation verification, human-in-the-loop.
Reducing costs
- Caching: exact match + semantic (similar queries)
- Model routing per agent: cheaper models for extraction, classification and guardrails; larger model only for specialist. E.g. Greeter and guardrails with gpt-4o-mini; Insurance Specialist with gpt-4o.
- RAG: reduces tokens in prompt
- Batching: batch embedding requests
- Optimised prompts: fewer tokens, same clarity
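The exact-match tier of the caching idea above can be sketched in a few lines. Everything here is illustrative (a semantic cache would instead compare query embeddings against a similarity threshold):

```python
import hashlib

_cache: dict[str, str] = {}

def _key(query: str) -> str:
    # Normalise whitespace and case so trivially different queries collide.
    normalised = " ".join(query.lower().split())
    return hashlib.sha256(normalised.encode()).hexdigest()

def cached_answer(query: str, generate) -> str:
    k = _key(query)
    if k not in _cache:
        _cache[k] = generate(query)  # only pay for the LLM on a cache miss
    return _cache[k]

calls = []
def fake_llm(q: str) -> str:
    # Stand-in for the real LLM call, counting how often it runs.
    calls.append(q)
    return f"answer to: {q}"

cached_answer("What is my card limit?", fake_llm)
cached_answer("what is  my card limit?", fake_llm)  # normalises to the same key
```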
Fine-tuning for guardrails (PII, classification)
Instead of generic LLM for guardrails, fine-tune a small model for: SAFE/DANGEROUS classification, PII detection (IBAN, card, name, address). Faster, cheaper, deterministic, works in CI without API key.
LangGraph vs linear chain
Conditional flow, shared typed state, reusable nodes, visualisable graph, easy to add branches.
Input vs Output guardrails
Input: blocks dangerous requests before they enter. Output: protects what the system says (redaction, validation). Defence in depth.
💻 Coding — 3 exercises they may ask
1. Chunking
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64):
    """Split text into word-based chunks with overlap between neighbours."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunks.append(" ".join(words[start:end]))
        # Step forward, keeping `overlap` words of context; stop at the end.
        start = end - overlap if end < len(words) else len(words)
    return chunks
2. Retry with backoff
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=0.5):
    """Call fn, retrying on failure with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: propagate the last error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
3. Parse JSON from LLM
import json

def parse_json_from_llm(text: str):
    """Extract the first {...} span from an LLM reply and parse it."""
    start = text.find('{')
    end = text.rfind('}')
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
✅ Checklist — Before you enter
Technical
- Can explain the Banking Support flow in 1 minute
- Know the 8 classic questions and their short answers
- RAG in 30 seconds memorised
- Fine-tuning vs RAG — clear on the difference
- Input vs output guardrails — why both
- 3 projects ready (problem → architecture → result)
- Chunking, retry, JSON parsing — can implement from scratch
Logistics
- Camera and microphone OK
- Neutral background, good lighting
- 3–4 questions written for the end
- Pitch practised in English
Questions to ask (pick 3–4)
- What AI architectures are you building currently?
- RAG or fine-tuning — which is the focus?
- Biggest challenges putting LLMs in production?
- How do you manage knowledge ingestion?
- How do you evaluate AI system performance?
🚫 Red flags to avoid
- Talking only about models (they want systems, pipelines)
- Not talking about production (deploy, monitoring)
- Ignoring costs and security
- Confusing RAG with fine-tuning
- Responding with a 5-minute monologue
- Not asking questions at the end
🎯 Impressive phrases
- "The greatest complexity is in pipelines and orchestration, not the model."
- "I think of AI in layers: ingestion, retrieval, generation."
- "LLMs are more powerful with tools and knowledge sources."
- "If you can't measure it, you can't iterate it."
PART II — Technical Content
Quick Summary
| Topic | Short answer |
|---|---|
| Flow | load → input guardrails → greeter → bouncer → router → specialist → output guardrails → save |
| 2/3 | Normalisation + match on 2 of 3 fields before secret question |
| RAG | Hybrid (FAISS + BM25) → RRF (k=60) → Rerank (cross-encoder), optional HyDE |
| Input guardrails | Rules (INPUT_BLOCKLIST) + LLM, blocks dangerous requests |
| Output guardrails | Redaction (IBAN/card) → Rules → LLM, safe rewrite or fallback |
| Router | Deterministic, INTENT_TO_ROUTE, fallback general |
| Sessions | session_id, load/save at start and end of each turn, in memory |
| API status | guardrail_flagged → rejected; needs_more_info → needs_more_info; final_response → completed |
Index
- 1. Architecture and Flow
- 2. 2/3 Verification and Secret Question
- 3. Agents
- 4. Guardrails
- 5. API and Sessions
- 6. Voice
- 7. Testing and CI
- 8. Trade-offs
- 9. Specific Code
- 10. State and Sessions
- 11. Prompts
- 12. RAG in Detail
- 13. Specialists
- 14. API and Status
- 15. Docker and CI
- 16. Security and Edge Cases
- 17. Future and Evolution
1. Architecture and Flow
Q: Describe the overall system flow
A: The flow is:
- load_session — Loads the session history and data (if any).
- input_guardrails — Validates the input (rules + LLM). If blocked → save_session with a rejection.
- greeter_agent — Extracts intent and data (name, phone, IBAN), merges with the session, checks for a 2/3 match. If data is missing, asks for more. If there is a 2/3 match, asks the secret question. If correct → is_identified.
- bouncer — Classifies the customer (regular, premium, not_customer). If not a customer, rejects.
- specialist_router — Maps intent → specialist (deterministic).
- specialist — Generates the response (insurance uses RAG).
- output_guardrails — Redacts, validates, rewrites if unsafe.
- save_session — Saves state and finishes.
Q: How many nodes and conditional branches does the graph have?
A:
- Nodes: load_session, input_guardrails, greeter_agent, bouncer, specialist_router, output_guardrails, save_session + 6 specialists.
- Branches: (1) input_guardrails → save if blocked; (2) greeter → output if needs_more_info; (3) bouncer → output if not_customer; (4) router → one of the 6 specialists.
Q: Why LangGraph instead of a linear chain?
A: Conditional flow driven by state; shared typed state (ConversationState); nodes are reusable pure functions; an explicit, visualisable graph; easy to add nodes/branches.
2. 2/3 Verification and Secret Question
Q: How did you implement the "2 of 3" rule?
A: In verification_service.py: count how many of 3 fields are filled. If < 2 → (False, None). For each customer, compare with normalisation (name: lowercase, trim; phone: digits only; IBAN: uppercase, no spaces). If ≥ 2 matches → return record (with secret and answer). Greeter only asks secret question when there's 2/3 match.
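A minimal sketch of that normalise-and-count logic. The helper names and the sample record are illustrative, not the actual verification_service.py code:

```python
import re

def normalise(record: dict) -> dict:
    """Normalise the three identification fields before comparing."""
    return {
        "name": (record.get("name") or "").strip().lower(),
        "phone": re.sub(r"\D", "", record.get("phone") or ""),  # digits only
        "iban": (record.get("iban") or "").upper().replace(" ", ""),
    }

def match_two_of_three(provided: dict, customer: dict) -> bool:
    """True when at least 2 of the 3 normalised fields match."""
    p, c = normalise(provided), normalise(customer)
    matches = sum(
        1 for field in ("name", "phone", "iban")
        if p[field] and p[field] == c[field]
    )
    return matches >= 2

customer = {"name": "Ana Silva", "phone": "+351 912 345 678",
            "iban": "PT50 0002 0123 1234 5678 9015 4"}
# Name and phone match after normalisation, so 2/3 passes without the IBAN.
matched = match_two_of_three(
    {"name": "ana silva", "phone": "+351912345678", "iban": None}, customer)
```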
Q: Why ask the secret question only after the 2/3 match?
A: First identify with 2/3, then confirm with secret question. Asking before would mean asking sensitive data from someone who hasn't proven they're the customer.
Q: How do you validate the answer to the secret question?
A: _verify_secret_answer: normalise (lowercase, remove non-alphanumeric, trim) and compare. If fail → identification_failed → Bouncer treats as not_customer.
3. Agents
Q: Does the Specialist Router use an LLM or rules?
A: Rules only. INTENT_TO_ROUTE maps intent → specialist. Deterministic, fast, stable in CI.
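A sketch of that deterministic lookup. The intent and specialist names here are illustrative, not the real INTENT_TO_ROUTE contents:

```python
# Intent → specialist mapping; a plain dict, no LLM involved.
INTENT_TO_ROUTE = {
    "card_issue": "card_specialist",
    "loan_request": "loan_specialist",
    "insurance_question": "insurance_specialist",
}

def route(intent: str) -> str:
    # Unknown intents fall back to the general specialist.
    return INTENT_TO_ROUTE.get(intent, "general_specialist")
```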
Q: Describe the Insurance Specialist's RAG pipeline
A: (1) Hybrid search — FAISS (dense) + BM25 (sparse). Optional: HyDE. (2) RRF — Combines the rankings with k=60. (3) Rerank — Cross-encoder (ms-marco-MiniLM). (4) LRU cache for repeated queries.
Q: Why RRF instead of a single retriever?
A: Dense captures semantics; BM25 captures exact terms ("yacht", "marine"). RRF combines without manual weights.
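RRF itself is only a few lines; a minimal sketch with k=60 and made-up document ids:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score each doc by sum of 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["doc_a", "doc_b", "doc_c"]   # e.g. FAISS order
sparse = ["doc_b", "doc_d", "doc_a"]  # e.g. BM25 order
fused = rrf([dense, sparse])          # doc_b wins: high in both lists
```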
4. Guardrails
Q: How do the input guardrails work?
A: (1) Rules — INPUT_BLOCKLIST with regex. Match → block. (2) LLM — If rules pass, classifies SAFE/DANGEROUS. DANGEROUS → block. Rules first (deterministic); LLM for ambiguous cases.
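A minimal sketch of the rules tier. These regex patterns are illustrative, not the real INPUT_BLOCKLIST:

```python
import re

# Illustrative patterns only; the production blocklist differs.
INPUT_BLOCKLIST = [
    r"ignore (all|previous) instructions",
    r"approve my .* loan",
    r"reveal .* (password|secret)",
]

def check_input_rules(text: str) -> bool:
    """Return True if the message matches any blocked pattern."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in INPUT_BLOCKLIST)
```

Only if this deterministic check passes does the LLM classifier run on ambiguous cases.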
Q: And the output guardrails?
A: (1) Redaction — Mask IBAN (4+4) and card (last 4). (2) Validation — Rules first, then LLM. (3) Safe rewrite — If unsafe, rewrite or fallback. Rules run on original text (before redaction).
Q: Why rules in addition to the LLM in the output guardrails?
A: In CI the LLM saw already-redacted text and sometimes classified as SAFE. Rules ensure obvious phrases are always flagged.
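A sketch of the redaction step. The regexes are simplified for illustration; real IBAN and card detection is stricter:

```python
import re

def redact(text: str) -> str:
    """Mask IBANs (keep first 4 and last 4 chars) and cards (keep last 4)."""
    def mask_iban(m: re.Match) -> str:
        iban = m.group(0).replace(" ", "")
        return f"{iban[:4]}****{iban[-4:]}"
    # IBAN-like: country code + check digits + a long digit run.
    text = re.sub(r"\b[A-Z]{2}\d{2}(?:\s?\d){10,30}\b", mask_iban, text)
    # Card-like: four groups of four digits.
    text = re.sub(r"\b(?:\d{4}[\s-]?){3}(\d{4})\b", r"**** **** **** \1", text)
    return text
```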
5. API and Sessions
Q: How do you keep context between messages?
A: session_id + load_session / save_session. Greeter merges extracted data with session data.
Q: Why is /chat synchronous?
A: graph.invoke() is synchronous. For parallel, ideal would be workers or run_in_executor. For the challenge, synchronous is sufficient.
6. Voice Interface
Q: How does /chat/voice work?
A: Audio → Deepgram STT → text → same graph as /chat → response → Deepgram TTS → base64 audio.
7. Testing and CI
Q: What are your strategies for testing LLM-driven flows?
A: Mock extract_intent and extract_identification; deterministic rules in output guardrails; conditional skip if no API key; Postman for E2E.
Q: How do the guardrail tests pass in CI without a real key?
A: check_output_rules() uses regex. Rules cover UNSAFE_OUTPUTS; LLM is never called.
8. Trade-offs and Decisions
Q: What trade-offs did you make between simplicity and robustness?
A: Deterministic router (simpler, less flexible); RAG only in Insurance (where it adds value); in-memory sessions (simple, doesn't scale); Rules + LLM in guardrails (balance cost/coverage).
Q: What would you change for production?
A: Redis for sessions, rate limiting, auth, metrics, async workers, secrets in env, load tests.
9. Specific Code
Q: Why TypedDict with total=False?
A: Optional fields. Each node returns only what it changes; LangGraph does automatic merge.
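A minimal sketch of the pattern. The field names are illustrative, and the final `update` only mimics in spirit what LangGraph's state merge does:

```python
from typing import TypedDict

class ConversationState(TypedDict, total=False):
    # total=False makes every key optional, so a node can return
    # only the slice of state it actually changed.
    session_id: str
    intent: str
    is_identified: bool
    final_response: str

def greeter_node(state: ConversationState) -> ConversationState:
    # Returns a partial update; the graph merges it into the full state.
    return {"intent": "insurance_question", "is_identified": True}

state: ConversationState = {"session_id": "abc"}
state.update(greeter_node(state))  # roughly what the graph's merge does
```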
Q: How do you avoid invalid JSON in extraction?
A: _parse_json_from_llm(): searches for json blocks, does json.loads(), on error returns defaults.
Q: How is the department_contacts tool integrated?
A: General Specialist can invoke it to get contacts. Returns static info. Agent decides when to call.
10. State and Sessions
Q: Which fields are persisted?
A: conversation_history, collected_data, customer_type, specialist_route, customer_record, secret_question, is_identified.
Q: Where is conversation_history updated?
A: load_session adds user message; save_session adds assistant response.
Q: What are the problems with in-memory sessions?
A: Loss on restart, doesn't scale horizontally, no TTL. Solution: Redis or DB.
11. Prompts and Loader
Q: How does the prompt registry work?
A: prompts.py defines constants; loader.py exposes get_prompt(key, **kwargs). Centralised, facilitates changes.
Q: What does get_prompt do if the key doesn't exist?
A: Raises a KeyError listing the available keys.
12. RAG in Detail
Q: What is HyDE?
A: Generates a hypothetical document that would answer the query and uses that text for the dense search. Improves recall.
Q: How do you do chunking?
A: 500-character chunks with 50-character overlap; the cut is adjusted back to the last space so words aren't split. Sources are the .md files in data/insurance.
Q: What happens when RAG finds nothing?
A: INSURANCE_NO_KNOWLEDGE_CONTEXT: instructs LLM to be honest and not invent.
Q: Is the FAISS index pre-built or built at runtime?
A: Docker entrypoint runs prebuild_faiss.py before starting. In dev, builds on first retrieval.
Q: Why k=60 in RRF?
A: Common value in literature. Good compromise between top rank weight and uniformity.
13. Specialists
Q: How does the Card Specialist handle premium customers?
A: Adds SPECIALIST_CARD_PREMIUM_SUFFIX to prompt: "offer priority support and dedicated assistance".
Q: Why generate_response_focused in the Insurance Specialist?
A: Receives only last message + RAG context + state summary. Reduces tokens; RAG context is more relevant than long history.
Q: What does the test_output_guardrail_inject trigger do?
A: Injects known unsafe response to test that guardrails flag and rewrite.
Q: How many rounds of tool calls does the General Specialist allow?
A: Maximum 3. Avoids infinite loops.
14. API and Status
Q: How do you map state → status?
A: guardrail_flagged → rejected; needs_more_info → needs_more_info; not_customer/identification_failed → rejected; final_response → completed; otherwise → error.
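That mapping as a sketch, assuming the state is a plain dict (the real state is a TypedDict):

```python
def map_status(state: dict) -> str:
    """Translate the final graph state into an API status string."""
    if state.get("guardrail_flagged"):
        return "rejected"
    if state.get("needs_more_info"):
        return "needs_more_info"
    if state.get("customer_type") == "not_customer" or state.get("identification_failed"):
        return "rejected"
    if state.get("final_response"):
        return "completed"
    return "error"
```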
Q: What is returned when the input is blocked?
A: status: "rejected", with the guardrails fallback as the response.
Q: Does /chat/voice always return audio?
A: If TTS fails, audio_base64 is None but the text response is always returned.
15. Docker and CI
Q: Describe the multi-stage Dockerfile
A: Stage 1: Node (build frontend). Stage 2: Python 3.11-slim, copies app + frontend, entrypoint runs load_insurance_qa.py, prebuild_faiss.py, uvicorn. Non-root user.
Q: Which branches does CI run on?
A: Push and PR on main and dev. Jobs: test (3.11/3.12) + build (Docker).
16. Security and Edge Cases
Q: What about a customer without a secret question (DirectUser)?
A: If there's no secret/answer, marks is_identified after 2/3 without asking.
Q: How do you normalise the phone number?
A: _normalize_phone: removes everything except digits.
Q: Why isn't "transfer" in the blocklist?
A: "I want to transfer 50 euros — how?" is legitimate. Blocklist focuses on clearly dangerous patterns.
Q: What about invalid JSON in extraction?
A: Defaults: intent ("general_support", 0.5); identification {name: None, phone: None, iban: None}.
Q: What about LLM timeouts or failures?
A: try/except with fallbacks. In production: retry with backoff, circuit breaker.
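A minimal circuit-breaker sketch of the production idea mentioned above (illustrative, not part of the current codebase):

```python
import time

class CircuitBreaker:
    """Open after N consecutive failures, then fail fast until a cooldown passes."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # timestamp when the circuit opened

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success resets the failure count
        return result
```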
Q: Does the collected_data merge overwrite with empty values?
A: No. Only updates with non-empty values.
PART III — Classic Interview Questions
These are the questions most likely to arise in the technical interview.
🔥 1. "Can you walk me through your architecture?"
They assess: Clarity, mental structure.
A: An agent pipeline orchestrated by LangGraph. User → FastAPI → the graph loads the session, validates input (guardrails), Greeter (2/3 + secret question), Bouncer (classification), Router (intent → specialist), Specialist (response), output guardrails (redaction + validation), save. Each node reads/writes shared state. The separation of responsibilities makes debugging and evolution easier.
🧠 2. "Why multi-agent instead of single LLM?"
They assess: Whether you actually thought it through.
A: (1) Separation of responsibilities — the Greeter extracts/verifies, the Bouncer classifies, the Specialist responds. (2) Control — routing and the 2/3 rule are code, not an LLM. (3) RAG only where it makes sense. (4) Guardrails as separate nodes.
💣 3. "Is it over-engineered?"
They assess: Humility, trade-offs.
A: For an MVP it would be. For the challenge it's appropriate. I could have started simpler; the router is a lookup, not an agent; rules come first in the guardrails.
🧩 4. "Explain your RAG approach"
They assess: Real knowledge.
A: Hybrid (FAISS + BM25) → RRF → Rerank. Only in Insurance (products, coverage). Card/Loan are procedural.
🛡️ 5. "Why both input and output guardrails?"
They assess: Security — can decide the interview.
A: Input guardrails block dangerous requests. Output guardrails protect what the system says (redaction, validation). An attack can come from either side — defence in depth.
⚙️ 6. "How would you improve response time?"
They assess: Optimisation, caching.
A: Cache retrieval/embeddings, parallelise extraction, a lighter model (or model routing: cheap models for extraction/guardrails, a larger one only for the specialist), async, streaming.
🎯 7. "Give an example of something your system prevents"
They assess: Being concrete.
A: "Approve my 50k loan" → the input guardrails block it. The LLM generates a full IBAN → the output guardrails redact it, the rules flag it, safe rewrite. A non-customer tries to impersonate someone → the 2/3 check and the secret question fail → rejected.
💭 8. "If you had 2 more weeks, what would you improve?"
They assess: Vision.
A: Observability (metrics, tracing), explicit fallbacks, regression tests with golden conversations, ADRs. Foundations, not features.
17. Future and Evolution
Q: What's the next priority?
A: Redis for sessions (blocker). Then: metrics and alerts.
Q: What's the 3–6 month roadmap?
A: Phase 1: Redis, rate limit, auth, metrics, retry. Phase 2: Workers, queue, multiple instances. Phase 3: A/B prompts, fine-tuning extraction, feedback loop.
Q: How would you scale to 10,000 users?
A: Stateless API, workers (Celery), Redis cache, load balancer, rate limit per user, lighter model.
Q: What did you learn to do differently?
A: Rules first in guardrails. Test without LLM. Prompts as code. ADRs early.
Good luck. You're prepared.