Apresentação
Modelo de Dados — studyAI
Schema da base de dados PostgreSQL e relações entre entidades.
Índice
- Diagrama ER
- Entidades
- Vector DB (chunks)
- Entidades extendidas (Learning Loop, User Intelligence)
- Índices e constraints
- Migrations
Diagrama ER
A carregar diagrama…
Entidades
users
| Coluna | Tipo | Constraints |
|---|---|---|
| id | UUID | PK, default gen_random_uuid() |
| VARCHAR(255) | NOT NULL, UNIQUE | |
| password_hash | VARCHAR(255) | NOT NULL |
| name | VARCHAR(255) | |
| created_at | TIMESTAMPTZ | NOT NULL, default now() |
| updated_at | TIMESTAMPTZ | NOT NULL, default now() |
projects
| Coluna | Tipo | Constraints |
|---|---|---|
| id | UUID | PK, default gen_random_uuid() |
| user_id | UUID | FK users(id) ON DELETE CASCADE |
| title | VARCHAR(500) | NOT NULL |
| created_at | TIMESTAMPTZ | NOT NULL, default now() |
| updated_at | TIMESTAMPTZ | NOT NULL, default now() |
files
| Coluna | Tipo | Constraints |
|---|---|---|
| id | UUID | PK, default gen_random_uuid() |
| project_id | UUID | FK projects(id) ON DELETE CASCADE |
| filename | VARCHAR(500) | NOT NULL |
| mime_type | VARCHAR(100) | NOT NULL |
| storage_path | VARCHAR(1000) | NOT NULL |
| size | INTEGER | NOT NULL |
| status | VARCHAR(20) | NOT NULL, default 'pending' |
| retry_count | INTEGER | default 0 |
| last_error | TEXT | Última mensagem de erro |
| created_at | TIMESTAMPTZ | NOT NULL, default now() |
| updated_at | TIMESTAMPTZ | NOT NULL, default now() |
status: pending | processing | processed | error
Retry: Se status=error, job de ingestão retry com backoff (1min, 5min, 15min). Max retry_count=5.
documents
| Coluna | Tipo | Constraints |
|---|---|---|
| id | UUID | PK, default gen_random_uuid() |
| project_id | UUID | FK projects(id) ON DELETE CASCADE |
| title | VARCHAR(500) | NOT NULL |
| content | TEXT | |
| structure | JSONB | |
| status | VARCHAR(20) | NOT NULL, default 'draft' |
| created_at | TIMESTAMPTZ | NOT NULL, default now() |
| updated_at | TIMESTAMPTZ | NOT NULL, default now() |
status: draft | generating | ready | error
structure (exemplo):
{
"sections": [
{ "title": "Introdução", "subsections": [] },
{ "title": "Técnicas", "subsections": ["Polipectomia", "Mucosectomia"] }
]
}
questions
| Coluna | Tipo | Constraints |
|---|---|---|
| id | UUID | PK, default gen_random_uuid() |
| document_id | UUID | FK documents(id) ON DELETE CASCADE |
| question | TEXT | NOT NULL |
| solution | TEXT | NOT NULL |
| spoken | TEXT | |
| type | VARCHAR(50) | default 'recall' |
| difficulty | VARCHAR(20) | default 'medium' |
| order | INTEGER | default 0 |
| created_at | TIMESTAMPTZ | NOT NULL, default now() |
type: recall | application | comparison | other
difficulty: easy | medium | hard — para adaptação ao user_level
user_progress
| Coluna | Tipo | Constraints |
|---|---|---|
| id | UUID | PK, default gen_random_uuid() |
| user_id | UUID | FK users(id) ON DELETE CASCADE |
| question_id | UUID | FK questions(id) ON DELETE CASCADE |
| user_answer | TEXT | NOT NULL |
| score | INTEGER | |
| feedback | TEXT | |
| corrections | JSONB | |
| confidence_score | FLOAT | 0–1, agregado de scores |
| last_reviewed_at | TIMESTAMPTZ | Última vez que respondeu |
| next_review_at | TIMESTAMPTZ | Quando rever (spaced repetition) |
| repetition_count | INTEGER | default 0 |
| ease_factor | FLOAT | SM-2: fator de facilidade |
| interval_days | INTEGER | SM-2: intervalo em dias até próxima revisão |
| status | VARCHAR(20) | default 'new' |
| created_at | TIMESTAMPTZ | NOT NULL, default now() |
UNIQUE(user_id, question_id): Uma entrada por user por pergunta (última submissão).
status (Learning Loop): new | learning | review | mastered
refresh_tokens (auth)
| Coluna | Tipo | Constraints |
|---|---|---|
| id | UUID | PK |
| user_id | UUID | FK users(id) ON DELETE CASCADE |
| token_hash | VARCHAR(255) | NOT NULL |
| expires_at | TIMESTAMPTZ | NOT NULL |
| created_at | TIMESTAMPTZ | NOT NULL |
Vector DB (chunks)
Com pgvector, os chunks podem ficar na mesma base:
chunks
| Coluna | Tipo | Constraints |
|---|---|---|
| id | UUID | PK |
| project_id | UUID | FK projects(id) ON DELETE CASCADE |
| source_type | VARCHAR(20) | NOT NULL |
| source_id | UUID | NOT NULL |
| content | TEXT | NOT NULL |
| embedding | VECTOR(1536) | NOT NULL |
| content_tsv | TSVECTOR | Para BM25 / full-text search |
| metadata | JSONB | |
| created_at | TIMESTAMPTZ | NOT NULL |
source_type: file | document
content_tsv: to_tsvector('portuguese', content) — índice lexical para hybrid search (BM25).
metadata (exemplo):
{
"section": "Técnicas",
"subsection": "Polipectomia",
"page": 3,
"char_start": 0,
"char_end": 500
}
metadata: Adicionar embedding_model_version para tracking de versões.
Índices:
- HNSW ou IVFFlat em
embedding(similarity search) - GIN em
content_tsv(full-text / BM25)
Entidades extendidas
document_versions (versioning)
| Coluna | Tipo | Constraints |
|---|---|---|
| id | UUID | PK |
| document_id | UUID | FK documents(id) ON DELETE CASCADE |
| content | TEXT | NOT NULL |
| structure | JSONB | |
| version | INTEGER | NOT NULL |
| created_at | TIMESTAMPTZ | NOT NULL |
Uso: Histórico de alterações, comparação de evolução, rollback.
user_profile (User Intelligence)
| Coluna | Tipo | Constraints |
|---|---|---|
| id | UUID | PK |
| user_id | UUID | FK users(id) ON DELETE CASCADE, UNIQUE |
| level | VARCHAR(20) | default 'beginner' |
| weak_topics | JSONB | ["topic1", "topic2"] |
| recent_errors | JSONB | [{question_id, score, topic}] |
| preferred_style | VARCHAR(20) | default 'detailed' |
| created_at | TIMESTAMPTZ | NOT NULL |
| updated_at | TIMESTAMPTZ | NOT NULL |
level: beginner | intermediate | advanced
preferred_style: concise | detailed | technical
Ver USER_INTELLIGENCE.md.
conversation_memory (memória de sessão)
| Coluna | Tipo | Constraints |
|---|---|---|
| id | UUID | PK |
| user_id | UUID | FK users(id) ON DELETE CASCADE |
| project_id | UUID | FK projects(id) ON DELETE CASCADE |
| summary | TEXT | Resumo da conversa/sessão |
| embedding | VECTOR(1536) | Embedding do summary (para retrieval) |
| message_count | INTEGER | Nº de mensagens na sessão |
| last_activity_at | TIMESTAMPTZ | NOT NULL |
| created_at | TIMESTAMPTZ | NOT NULL |
| updated_at | TIMESTAMPTZ | NOT NULL |
Uso: "Continua o que estavas a dizer", personalização, contexto entre sessões.
UNIQUE(user_id, project_id): Uma memória por user por projeto (atualizada a cada N mensagens).
Ver USER_INTELLIGENCE.md.
chat_sessions
| Coluna | Tipo | Constraints |
|---|---|---|
| id | UUID | PK |
| project_id | UUID | FK projects(id) ON DELETE CASCADE |
| user_id | UUID | FK users(id) ON DELETE CASCADE |
| title | VARCHAR(255) | Título (ex: primeiras N chars da 1ª msg) |
| created_at | TIMESTAMPTZ | NOT NULL |
| updated_at | TIMESTAMPTZ | NOT NULL |
Uso: Persistir conversas para histórico, reabertura, analytics.
chat_messages
| Coluna | Tipo | Constraints |
|---|---|---|
| id | UUID | PK |
| session_id | UUID | FK chat_sessions(id) ON DELETE CASCADE |
| role | VARCHAR(20) | user | assistant |
| content | TEXT | NOT NULL |
| intent | VARCHAR(50) | Intent do Router (se assistant) |
| trace_id | VARCHAR(255) | LangSmith run ID |
| created_at | TIMESTAMPTZ | NOT NULL |
message_citations
| Coluna | Tipo | Constraints |
|---|---|---|
| id | UUID | PK |
| message_id | UUID | FK chat_messages(id) ON DELETE CASCADE |
| source | VARCHAR(500) | filename ou doc title |
| chunk_id | UUID | FK chunks (opcional) |
| excerpt | TEXT | Trecho citado |
| created_at | TIMESTAMPTZ | NOT NULL |
Uso: Citações por resposta, groundedness na UI.
jobs (tracking de jobs assíncronos)
| Coluna | Tipo | Constraints |
|---|---|---|
| id | UUID | PK |
| project_id | UUID | FK projects(id) |
| type | VARCHAR(50) | NOT NULL |
| status | VARCHAR(20) | NOT NULL |
| progress | INTEGER | 0–100 |
| message | TEXT | Mensagem humana ("A escrever secção 2...") |
| payload | JSONB | Input (file_id, topic, etc.) |
| result | JSONB | Output (document_id, etc.) |
| error | TEXT | Se failed |
| retry_count | INTEGER | default 0 |
| started_at | TIMESTAMPTZ | Quando começou |
| completed_at | TIMESTAMPTZ | Quando terminou |
| created_at | TIMESTAMPTZ | NOT NULL |
| updated_at | TIMESTAMPTZ | NOT NULL |
type: ingest_file | reindex_project | generate_content
status: pending | running | completed | failed
Ver JOBS_ASYNC.md.
response_feedback (feedback loop do user)
| Coluna | Tipo | Constraints |
|---|---|---|
| id | UUID | PK |
| user_id | UUID | FK users(id) |
| source_type | VARCHAR(20) | chat |
| source_id | UUID | chat_message_id ou user_progress_id |
| helpful | BOOLEAN | thumbs up/down |
| comment | TEXT | opcional |
| created_at | TIMESTAMPTZ | NOT NULL |
Uso: Melhorar prompts, detetar respostas problemáticas. source_id referencia chat_messages.id ou user_progress.id.
rag_evaluations (avaliação do RAG)
| Coluna | Tipo | Constraints |
|---|---|---|
| id | UUID | PK |
| project_id | UUID | FK projects(id) |
| query | TEXT | NOT NULL |
| response | TEXT | NOT NULL |
| grounded | BOOLEAN | Resposta fundamentada nos chunks? |
| groundedness_score | FLOAT | 0–1, score contínuo |
| answer_relevance | FLOAT | 0–1, relevância da resposta à pergunta |
| retrieval_recall | FLOAT | 0–1, chunks recuperados cobrem a pergunta? |
| hallucination_risk | FLOAT | 0–1 |
| source | VARCHAR(20) | chat |
| created_at | TIMESTAMPTZ | NOT NULL |
Uso: Avaliação automática do RAG, integração com LangSmith evals.
Índices e constraints
-- users
CREATE UNIQUE INDEX idx_users_email ON users(email);
-- projects
CREATE INDEX idx_projects_user_id ON projects(user_id);
-- files
CREATE INDEX idx_files_project_id ON files(project_id);
CREATE INDEX idx_files_status ON files(status);
-- documents
CREATE INDEX idx_documents_project_id ON documents(project_id);
CREATE INDEX idx_documents_status ON documents(status);
-- questions
CREATE INDEX idx_questions_document_id ON questions(document_id);
-- user_progress
CREATE INDEX idx_user_progress_user_id ON user_progress(user_id);
CREATE INDEX idx_user_progress_question_id ON user_progress(question_id);
CREATE UNIQUE INDEX idx_user_progress_user_question ON user_progress(user_id, question_id);
-- conversation_memory
CREATE UNIQUE INDEX idx_conversation_memory_user_project ON conversation_memory(user_id, project_id);
-- chat_sessions, chat_messages
CREATE INDEX idx_chat_sessions_project_user ON chat_sessions(project_id, user_id);
CREATE INDEX idx_chat_messages_session ON chat_messages(session_id);
-- jobs
CREATE INDEX idx_jobs_project_id ON jobs(project_id);
CREATE INDEX idx_jobs_status ON jobs(status);
-- chunks (pgvector + full-text)
CREATE INDEX idx_chunks_project_id ON chunks(project_id);
CREATE INDEX idx_chunks_embedding ON chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX idx_chunks_content_tsv ON chunks USING gin (content_tsv);
Migrations
Usar Alembic para migrations:
api/
├── alembic/
│ ├── versions/
│ │ ├── 001_initial.py
│ │ ├── 002_add_chunks.py
│ │ └── ...
│ └── env.py
└── alembic.ini
Ordem sugerida:
- users, refresh_tokens
- projects, files, documents, document_versions, questions
- user_progress, user_profile, conversation_memory
- chat_sessions, chat_messages, message_citations
- jobs, response_feedback, rag_evaluations
- chunks (com extensão pgvector)
Zona de prática
Sem perguntas. Clica em Editar para adicionar.