A full-stack Customer Identity and Access Management (CIAM) / SSO platform built from first principles to understand every layer of identity infrastructure, from cryptographic token issuance to multi-tenant org management.
Type: Solo · Full-stack
Stack: Next.js 15 · Node.js
Auth standard: OAuth 2.0 · OIDC
Live demo: vaultly.vercel.app
Source: GitHub
Most engineers integrate authentication via a third-party provider and never look inside. I wanted to understand what actually happens: how tokens are issued, how rotation works, how you prevent token theft, how multi-tenant isolation holds at every layer. The only way to learn this is to build it.
Vaultly is that build. It implements the full auth server from scratch: cryptographic key management, JWT issuance, OAuth 2.0 with PKCE, TOTP MFA, refresh token families with reuse detection, and multi-tenant organisations with RBAC. Every design decision is deliberate and documented throughout the system.
The system is split into two services communicating server-to-server. The browser never directly touches the auth server. All calls are proxied through the Next.js BFF, keeping the auth server URL and credentials out of the client entirely.
System Overview
Why a separate auth server? Decoupling auth from the Next.js app means the auth server can serve multiple clients (web, mobile, other services) without duplication. The JWKS endpoint lets any service verify tokens independently.
The auth server owns an RSA key pair. Private key signs all JWTs; public key is served at /jwks.json. Any service can verify tokens without ever holding a shared secret. Just fetch the public key and verify the signature locally.
// JWT payload structure
{
  "sub": "user-uuid",
  "email": "user@example.com",
  "role": "admin",           // RBAC role
  "org_id": "org-uuid",      // tenant scope
  "org_role": "member",      // org-level role
  "token_version": 3,        // for invalidation
  "iat": 1718000000,
  "exp": 1718000900          // 15-min access token
}
// JWKS endpoint response
GET /jwks.json
{
  "keys": [{
    "kty": "RSA",
    "use": "sig",
    "alg": "RS256",
    "kid": "2025-06",
    "n": "...",              // public modulus
    "e": "AQAB"
  }]
}
Why RS256 over HS256? With HS256, every verifying service needs the shared secret, which can leak. RS256 lets you publish the public key openly. Compromise of any downstream service cannot be used to forge tokens.
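To make the sign-with-private, verify-with-public split concrete, here is a minimal sketch using only Node's built-in crypto module. It is not Vaultly's actual code: the helper names signJwt and verifyJwt are hypothetical, the key pair is generated in-process rather than loaded from key storage, and the kid value is simply copied from the JWKS example above.

```typescript
import { createPublicKey, createSign, createVerify, generateKeyPairSync } from "node:crypto";

// Assumption: a throwaway key pair generated in-process; a real auth server would load a managed key.
const { privateKey, publicKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });

// The public half, in the shape a /jwks.json endpoint would publish.
const jwk = { ...publicKey.export({ format: "jwk" }), use: "sig", alg: "RS256", kid: "2025-06" };

const b64url = (value: object): string =>
  Buffer.from(JSON.stringify(value)).toString("base64url");

// Auth server side: sign "header.payload" with the private key (hypothetical helper).
function signJwt(payload: Record<string, unknown>): string {
  const signingInput = `${b64url({ alg: "RS256", typ: "JWT", kid: jwk.kid })}.${b64url(payload)}`;
  const signature = createSign("RSA-SHA256").update(signingInput).sign(privateKey, "base64url");
  return `${signingInput}.${signature}`;
}

// Downstream service side: verify with nothing but the published JWK (hypothetical helper).
function verifyJwt(
  token: string,
  key: typeof jwk,
  nowSeconds: number,
): Record<string, unknown> | null {
  const [header, payload, signature] = token.split(".");
  if (!header || !payload || !signature) return null;

  // Pin the algorithm instead of trusting whatever the token header claims.
  const { alg } = JSON.parse(Buffer.from(header, "base64url").toString("utf8"));
  if (alg !== "RS256") return null;

  const valid = createVerify("RSA-SHA256")
    .update(`${header}.${payload}`)
    .verify(createPublicKey({ key, format: "jwk" }), signature, "base64url");
  if (!valid) return null;

  // Signature is good; still reject tokens past their exp claim.
  const claims = JSON.parse(Buffer.from(payload, "base64url").toString("utf8"));
  return typeof claims.exp === "number" && claims.exp > nowSeconds ? claims : null;
}
```

A verifying service only ever needs the jwk object, so a leak there reveals nothing that could be used to mint tokens.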
Social login (Google, GitHub) uses the Authorization Code flow with PKCE (Proof Key for Code Exchange). PKCE prevents authorization code interception attacks, which is critical in public clients where a client secret can't be kept confidential.
Stored in server-side session, never exposed to browser
Browser receives an opaque session, never the raw JWT
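The PKCE half of that flow comes down to two values: a random code verifier kept server-side, and its SHA-256 challenge sent with the authorization request. A minimal sketch with Node's crypto module follows; the function names are hypothetical and this is not necessarily how Vaultly structures it.

```typescript
import { createHash, randomBytes } from "node:crypto";

// 32 random bytes encode to a 43-character base64url string,
// the minimum verifier length RFC 7636 allows (43 to 128 characters).
function createCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256 method: challenge = BASE64URL(SHA-256(ASCII(verifier))).
function createCodeChallenge(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}

// The comparison the provider's token endpoint performs when the code is exchanged.
function verifierMatchesChallenge(verifier: string, challenge: string): boolean {
  return createCodeChallenge(verifier) === challenge;
}
```

An intercepted authorization code is useless on its own: the token exchange also requires the verifier, which never leaves the server-side session.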
Social users get identical JWTs. Whether you signed in with email/password or GitHub, you receive the same RS256 JWT. Downstream services don't need to know how you authenticated.
Access tokens are short-lived (15 min). Refresh tokens are longer-lived (7 days) but rotate on every use. The critical security property: if a stolen refresh token is replayed, the system detects it and revokes the entire token family, forcing re-authentication.
Token Family: Normal Flow
Reuse Attack: RT-1 Replayed After RT-2 Issued
DETECTED: RT-1 already used
Why this works
Each refresh token has a family_id. The DB stores which token is the current valid one in each family. Any use of a non-current token in a family is definitionally a replay.
-- Token family schema
CREATE TABLE refresh_tokens (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  family_id   UUID NOT NULL,          -- links the rotation chain
  user_id     UUID NOT NULL,
  token_hash  TEXT NOT NULL,          -- bcrypt hash, never raw
  is_current  BOOLEAN DEFAULT true,   -- only one true per family
  used_at     TIMESTAMPTZ,
  expires_at  TIMESTAMPTZ NOT NULL,
  created_at  TIMESTAMPTZ DEFAULT now()
);
-- On refresh: set old token is_current=false, insert new token
-- On replay detection: DELETE WHERE family_id = $1
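The rotation and replay rules can be sketched in memory to show the control flow. This is an illustration under stated assumptions, not the production logic: a Map stands in for the refresh_tokens table, SHA-256 replaces the bcrypt hash so the example needs only Node built-ins, and startFamily and rotate are hypothetical names.

```typescript
import { createHash, randomBytes, randomUUID } from "node:crypto";

interface RefreshTokenRow {
  familyId: string;
  userId: string;
  isCurrent: boolean;
  expiresAt: number; // epoch milliseconds
}

type RotateResult =
  | { ok: true; refreshToken: string }
  | { ok: false; reason: "unknown" | "expired" | "reuse_detected" };

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// Stand-in for the refresh_tokens table, keyed by token hash (assumption: SHA-256, not bcrypt).
const rows = new Map<string, RefreshTokenRow>();

const hashToken = (token: string): string =>
  createHash("sha256").update(token).digest("hex");

function issue(userId: string, familyId: string, now: number): string {
  const token = randomBytes(32).toString("base64url");
  rows.set(hashToken(token), { familyId, userId, isCurrent: true, expiresAt: now + SEVEN_DAYS_MS });
  return token;
}

// Called at login: starts a new rotation chain.
function startFamily(userId: string, now: number = Date.now()): string {
  return issue(userId, randomUUID(), now);
}

// Called on every refresh request.
function rotate(presented: string, now: number = Date.now()): RotateResult {
  const row = rows.get(hashToken(presented));
  if (!row) return { ok: false, reason: "unknown" };

  if (!row.isCurrent) {
    // A non-current token from a known family is a replay: revoke the whole family.
    rows.forEach((candidate, hash) => {
      if (candidate.familyId === row.familyId) rows.delete(hash);
    });
    return { ok: false, reason: "reuse_detected" };
  }

  if (row.expiresAt <= now) return { ok: false, reason: "expired" };

  row.isCurrent = false; // retire the presented token
  return { ok: true, refreshToken: issue(row.userId, row.familyId, now) };
}
```

After a replay, the legitimate holder's newest token is also gone, which is what forces re-authentication. In a real database the retire-and-insert step would need to run in a single transaction so two concurrent refreshes cannot both succeed.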
MFA is implemented via Time-based One-Time Passwords using otplib. The flow follows the standard provisioning pattern used by Google Authenticator, Authy, and 1Password.
otplib.authenticator.generateSecret()
Accepts the current 30s time step plus one step on either side (±1 window) to tolerate clock skew
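To show what the ±1 window means in practice, here is a small RFC 4226 / RFC 6238 sketch built on Node's crypto module. It is purely illustrative: Vaultly delegates this to otplib, and the function names hotp and verifyTotp are hypothetical. The sketch takes the secret as raw bytes, whereas authenticator apps exchange it Base32-encoded.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const STEP_SECONDS = 30;

// RFC 4226 HOTP: HMAC-SHA1 over an 8-byte big-endian counter, then dynamic truncation.
function hotp(secret: Buffer, counter: number, digits: number = 6): string {
  const counterBytes = Buffer.alloc(8);
  counterBytes.writeUInt32BE(Math.floor(counter / 2 ** 32), 0); // high 32 bits
  counterBytes.writeUInt32BE(counter >>> 0, 4);                 // low 32 bits

  const digest = createHmac("sha1", secret).update(counterBytes).digest();
  const offset = digest.readUInt8(digest.length - 1) & 0x0f;
  const binary = digest.readUInt32BE(offset) & 0x7fffffff;
  return (binary % 10 ** digits).toString().padStart(digits, "0");
}

// RFC 6238 TOTP check: accept the current time step and `window` steps on either side.
function verifyTotp(secret: Buffer, code: string, unixSeconds: number, window: number = 1): boolean {
  const current = Math.floor(unixSeconds / STEP_SECONDS);
  const supplied = Buffer.from(code);
  let matched = false;

  for (let drift = -window; drift <= window; drift++) {
    if (current + drift < 0) continue;
    const expected = Buffer.from(hotp(secret, current + drift));
    // Constant-time comparison; lengths must match before timingSafeEqual is called.
    if (expected.length === supplied.length && timingSafeEqual(expected, supplied)) {
      matched = true;
    }
  }
  return matched;
}
```

With the default window, a code stays valid for its own 30-second step and the one after it, and a code from a phone whose clock runs slightly ahead is also accepted.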
Request Flow
The auth server URL is an environment variable. It never appears in any client bundle. If the BFF is compromised, an attacker gets the session cookie, but the auth server remains unexposed to the public internet.
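Sessions are stored with iron-session in encrypted HTTP-only cookies. Session data is encrypted and integrity-protected with keys derived from a server-side password (iron-session's default seal uses AES-256-CBC plus an HMAC-SHA-256 integrity check). The browser cannot read it, and cannot modify or forge it without the server secret.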
Next.js Server Components run on the server but cannot write cookies; only Route Handlers and Server Actions can. This creates a problem: when a Server Component detects a stale access token, how does it trigger a refresh?
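// The solution: the Server Component detects the stale token and
// redirects through a Route Handler that performs the refresh.
// sessionOptions, isTokenExpired and refreshAccessToken are the project's
// own helpers, defined elsewhere.

// app/layout.tsx (Server Component)
import { cookies } from 'next/headers';
import { redirect } from 'next/navigation';
import { getIronSession } from 'iron-session';

// cookies() is async in Next.js 15, so it must be awaited
const session = await getIronSession(await cookies(), sessionOptions);
if (isTokenExpired(session.accessToken)) {
  redirect('/api/auth/silent-refresh'); // → Route Handler
}

// app/api/auth/silent-refresh/route.ts (Route Handler)
import { cookies } from 'next/headers';
import { redirect } from 'next/navigation';
import { getIronSession } from 'iron-session';

export async function GET(req: Request) {
  const session = await getIronSession(await cookies(), sessionOptions);
  const newTokens = await refreshAccessToken(session.refreshToken);
  session.accessToken = newTokens.accessToken;
  session.refreshToken = newTokens.refreshToken;
  await session.save(); // ← can write cookies here
  // Send the user back to where they came from, or home if unknown
  return redirect(req.headers.get('referer') || '/');
}
For active sessions, a client-side component schedules a refresh 60 seconds before expiry without interrupting the user: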
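// components/TokenRefresher.tsx
'use client';
import { useEffect } from 'react';
import { useRouter } from 'next/navigation';

// expiresAt is the access token's exp claim, in seconds since the epoch
export function TokenRefresher({ expiresAt }: { expiresAt: number }) {
  const router = useRouter();

  useEffect(() => {
    const msUntilRefresh = expiresAt * 1000 - Date.now() - 60_000;
    if (msUntilRefresh <= 0) return; // already inside the final minute
    const timer = setTimeout(async () => {
      await fetch('/api/auth/silent-refresh');
      router.refresh(); // revalidate all Server Components
    }, msUntilRefresh);
    return () => clearTimeout(timer); // cancel on unmount or when expiresAt changes
  }, [expiresAt, router]);

  return null;
}
-- Core tables (simplified)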
CREATE TABLE users (
  id             UUID PRIMARY KEY,
  email          TEXT UNIQUE NOT NULL,
  password_hash  TEXT,                  -- null for OAuth users
  mfa_secret     TEXT,                  -- encrypted
  mfa_enabled    BOOLEAN DEFAULT false,
  token_version  INT DEFAULT 0,         -- invalidation counter
  created_at     TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE organisations (
  id          UUID PRIMARY KEY,
  name        TEXT NOT NULL,
  slug        TEXT UNIQUE NOT NULL,
  owner_id    UUID REFERENCES users(id),
  created_at  TIMESTAMPTZ DEFAULT now()
);
CREATE TABLE org_members (
  org_id     UUID REFERENCES organisations(id),
  user_id    UUID REFERENCES users(id),
  role       TEXT CHECK (role IN ('admin','member','viewer')),
  joined_at  TIMESTAMPTZ DEFAULT now(),
  PRIMARY KEY (org_id, user_id)
);
CREATE TABLE org_invitations (
  id           UUID PRIMARY KEY,
  org_id       UUID REFERENCES organisations(id),
  email        TEXT NOT NULL,
  role         TEXT NOT NULL,
  token        TEXT UNIQUE NOT NULL,    -- signed invite token
  expires_at   TIMESTAMPTZ NOT NULL,    -- 48hr window
  accepted_at  TIMESTAMPTZ,
  created_at   TIMESTAMPTZ DEFAULT now()
);
Role is included in the JWT payload (org_role) and enforced via Express middleware on every protected route; no additional DB lookup is needed for permission checks on hot paths.
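A role check driven purely by JWT claims might look like the sketch below. It is written against minimal structural types rather than Express's own so it stays self-contained. The name requireOrgRole, the req.claims property, and the viewer < member < admin ordering are assumptions for illustration; the sketch also assumes an earlier middleware has already verified the token and attached its claims.

```typescript
type OrgRole = "admin" | "member" | "viewer";

// Assumed ordering: each role includes the permissions of the ones below it.
const ROLE_RANK: Record<OrgRole, number> = { viewer: 0, member: 1, admin: 2 };

// Minimal stand-ins for the parts of Express's Request/Response used here.
interface AuthedRequest {
  claims?: { sub: string; org_id: string; org_role: OrgRole };
}
interface MinimalResponse {
  status(code: number): MinimalResponse;
  json(body: unknown): unknown;
}
type Next = () => void;

// Returns a middleware that lets the request through only if the token's
// org_role is at least `minimum`. No database lookup is involved.
function requireOrgRole(minimum: OrgRole) {
  return (req: AuthedRequest, res: MinimalResponse, next: Next): void => {
    const claims = req.claims;
    if (!claims) {
      res.status(401).json({ error: "unauthenticated" });
      return;
    }
    if (ROLE_RANK[claims.org_role] < ROLE_RANK[minimum]) {
      res.status(403).json({ error: "insufficient_role" });
      return;
    }
    next();
  };
}
```

The trade-off of skipping the lookup is that a role change only takes effect once the user's access token is reissued, which the 15-minute lifetime and the token_version counter keep bounded.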
When a user switches organisations, the auth server issues a new JWT scoped to the target org. The BFF updates the session and calls router.refresh() to revalidate all Server Components with the new org context. Stale tokens are detected server-side and silently reissued before the component renders.
Every meaningful event is recorded to PostgreSQL with enough context to reconstruct what happened, who did it, from where, and when.
CREATE TABLE audit_log (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id     UUID REFERENCES users(id),
  org_id      UUID REFERENCES organisations(id),
  event       TEXT NOT NULL,            -- 'login.success', 'mfa.enabled', etc.
  metadata    JSONB,                    -- event-specific details
  ip_address  INET,
  user_agent  TEXT,
  created_at  TIMESTAMPTZ DEFAULT now()
);
-- Indexed for fast per-user and per-org queries
CREATE INDEX audit_log_user_id_idx ON audit_log(user_id, created_at DESC);
CREATE INDEX audit_log_org_id_idx ON audit_log(org_id, created_at DESC);
-- Events captured:
-- login.success / login.failure / login.mfa_required
-- mfa.enabled / mfa.disabled / mfa.challenge_failed
-- token.refreshed / token.family_revoked (reuse detected)
-- org.created / org.member_invited / org.member_removed
-- org.role_changed / org.invitation_accepted
Drizzle is closer to raw SQL: schema definitions are TypeScript, queries compose like SQL. For a system where understanding exactly what hits the database matters, this beats Prisma's magic. Query performance is fully predictable.
NextAuth abstracts away exactly the things I needed to understand: how the session is stored, what's in the cookie, how expiry works. iron-session gives encrypted HTTP-only cookies with zero magic. Every byte in the cookie is mine to control.
Auth server and Next.js BFF share TypeScript types (request/response shapes, JWT payload). A monorepo with a shared `packages/types` package means a change to the JWT payload is a compile error in both services immediately.
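As an illustration of what such a shared package might export, the sketch below mirrors the JWT payload shown earlier and pairs it with a runtime guard. The names AccessTokenPayload and isAccessTokenPayload are hypothetical, and the guard is an addition for the example, since compile-time types alone cannot validate a decoded token.

```typescript
export type OrgRole = "admin" | "member" | "viewer";

// Field names follow the JWT payload example; adding or renaming one here
// surfaces as a type error wherever either service reads the payload.
export interface AccessTokenPayload {
  sub: string;
  email: string;
  role: string;
  org_id: string;
  org_role: OrgRole;
  token_version: number;
  iat: number;
  exp: number;
}

const ORG_ROLES: readonly string[] = ["admin", "member", "viewer"];

// Runtime counterpart to the interface, for narrowing decoded JSON.
export function isAccessTokenPayload(value: unknown): value is AccessTokenPayload {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.sub === "string" &&
    typeof v.email === "string" &&
    typeof v.role === "string" &&
    typeof v.org_id === "string" &&
    typeof v.org_role === "string" &&
    ORG_ROLES.indexOf(v.org_role) !== -1 &&
    typeof v.token_version === "number" &&
    typeof v.iat === "number" &&
    typeof v.exp === "number"
  );
}
```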
otplib is RFC 6238 compliant, actively maintained, and its verify function handles the ±1 window (clock skew) correctly. Building TOTP from scratch would be a security anti-pattern.
Email deliverability is an operational problem, not an engineering one. Resend provides a clean API, good developer experience, and handles SPF/DKIM. Invitation tokens are signed server-side; Resend only delivers the link.
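One way to produce a signed invitation token with the 48-hour expiry is an HMAC over the invite payload. The sketch below assumes HMAC-SHA256 and a "payload.signature" layout; the source only says the tokens are signed server-side, so the scheme, the function names, and the payload fields are all illustrative.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const INVITE_TTL_MS = 48 * 60 * 60 * 1000; // 48-hour window

interface InvitePayload {
  orgId: string;
  email: string;
  role: string;
  expiresAt: number; // epoch milliseconds
}

const sign = (body: string, secret: string): Buffer =>
  createHmac("sha256", secret).update(body).digest();

// Token layout (assumption): base64url(JSON payload) + "." + base64url(HMAC of that string).
function createInviteToken(
  invite: Omit<InvitePayload, "expiresAt">,
  secret: string,
  now: number = Date.now(),
): string {
  const payload: InvitePayload = { ...invite, expiresAt: now + INVITE_TTL_MS };
  const body = Buffer.from(JSON.stringify(payload)).toString("base64url");
  return `${body}.${sign(body, secret).toString("base64url")}`;
}

// Returns the payload if the signature is valid and the invite has not expired.
function verifyInviteToken(
  token: string,
  secret: string,
  now: number = Date.now(),
): InvitePayload | null {
  const [body, signature] = token.split(".");
  if (!body || !signature) return null;

  const expected = sign(body, secret);
  const supplied = Buffer.from(signature, "base64url");
  // Constant-time comparison; lengths must match before timingSafeEqual is called.
  if (supplied.length !== expected.length || !timingSafeEqual(supplied, expected)) return null;

  const payload = JSON.parse(Buffer.from(body, "base64url").toString("utf8")) as InvitePayload;
  return payload.expiresAt > now ? payload : null;
}
```

Because the email provider only ever sees the finished link, it cannot mint or alter invitations; the secret stays on the auth server.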