About arc42

arc42, the template for documentation of software and system architecture.

Template Version 8.2 EN. (based upon AsciiDoc version), January 2023

Created, maintained and © by Dr. Peter Hruschka, Dr. Gernot Starke and contributors. See https://arc42.org.



1. Introduction and Goals

YOVI is a web-based platform for playing Game Y — an abstract strategy board game where two players compete to connect all three sides of a triangular board with their pieces. The project is developed as part of the Software Architecture (ASW) course at the University of Oviedo, for the company Micrati.

The platform allows users to play against AI opponents with six difficulty levels, compete against other human players online via real-time multiplayer, interact with social features (friends, notifications), and track their performance through match history and statistics. An admin role provides privileged users with a management panel to oversee the platform. External bots can integrate with the system via a dedicated public interoperability API.

The system follows a microservices architecture composed of eight independent services: a React frontend (webapp), an API gateway (gateway), a JWT authentication service (authentication), a user management service (users), a Rust game engine and bot service (gamey), a real-time multiplayer service (multiplayer), a bot interoperability API (botapi), and an Nginx reverse proxy. Observability is provided by Prometheus and Grafana.

Throughout development, we have kept clear documentation of our progress in the project's GitHub wiki. The sections of this arc42 document link to the relevant wiki pages.

Full wiki: GitHub Wiki YOVI

Team meeting notes: Meetings

🎮 Live deployment: https://yovi.13.63.89.84.sslip.io

1.1. Requirements Overview

The YOVI system is a web-based gaming platform based on Game Y, developed for the company Micrati. The main goal is to allow users to play matches against the machine or against other players online, with support for registration, social features, administration, and statistics.

Key functional requirements include:

  • Public deployment and web accessibility via HTTPS.

  • Web application in React + TypeScript supporting the classic version of Game Y.

  • Rust game engine to validate game state and compute bot moves.

  • User registration and JWT-based authentication.

  • Role-based access control: regular users and administrators.

  • Match history and game statistics with pagination.

  • Six AI difficulty levels selectable by the user (see the Bot Implementations wiki page).

  • Real-time online multiplayer (Player vs Player) via WebSockets (Socket.IO).

  • Social features: friend requests, friend list management, and user search.

  • In-app notification system (friend requests, welcome messages).

  • User profile with editable fields (name, bio, location, preferred language).

  • Admin panel: privileged users can view all registered accounts, grant or revoke admin permissions, delete a user’s match history, and permanently delete accounts.

  • Public interoperability API for external bots using YEN notation.

  • Interoperability client mode: our bots can play against other teams' APIs.

  • Internationalization (i18n) — English and Spanish supported.

  • Monitoring with Prometheus metrics and Grafana dashboards.

  • Load testing suite with k6.

1.2. Quality Goals

The main quality attributes for the system architecture are:

| Priority | Quality Goal | Motivation |
| --- | --- | --- |
| 1 | Functionality | The system must correctly implement Game Y rules and provide all six working AI strategies. The multiplayer mode must accurately synchronize game state between two players in real time. The admin panel must correctly enforce role-based access so that only admin users can perform privileged operations. |
| 2 | Usability | The React-based interface must be intuitive for new players. The component-based architecture supports internationalization (EN/ES), theme switching (dark/light), and responsive layout. |
| 3 | Reliability & Availability | Using Docker and Docker Compose ensures consistent deployment. The gateway and Nginx are designed to be stateless and restartable; the multiplayer service is also restartable but keeps room state in memory, so active rooms are lost on restart. Docker restart policies limit downtime in case of failure. |
| 4 | Modularity & Maintainability | The eight-service architecture ensures components can be developed, tested, and extended independently. The Strategy Pattern in gamey allows new AI strategies to be added without modifying existing code. |
| 5 | Security | JWT-based authentication protects user-specific operations. bcrypt hashing is used for passwords. Role claims in the JWT payload allow the gateway and frontend to enforce admin-only routes without additional service calls. The reverse proxy limits direct exposure of internal services. The gateway sanitizes and validates all path parameters. |
| 6 | Testability | The separation of concerns allows comprehensive testing: unit tests in Rust (cargo test) and Node.js (Vitest, Jest), integration tests via supertest and tower::ServiceExt, property-based tests with proptest, and end-to-end tests via Playwright. Load tests use k6. |
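The Security goal above states that the gateway sanitizes and validates all path parameters. A minimal sketch of such a check follows, assuming an allow-list approach. The `SAFE_PARAM` pattern, the 64-character limit, and the `checkPathParam` helper are hypothetical and only illustrate the idea; the actual gateway rules may differ.

```typescript
// Assumed rule: identifiers are 1-64 characters of letters, digits, "_" or "-".
const SAFE_PARAM = /^[A-Za-z0-9_-]{1,64}$/;

type ParamCheck =
  | { ok: true; value: string }
  | { ok: false; error: string };

// Decode a raw path segment and accept it only if it matches the allow-list.
function checkPathParam(rawSegment: string): ParamCheck {
  let decoded: string;
  try {
    decoded = decodeURIComponent(rawSegment);
  } catch {
    // decodeURIComponent throws URIError on broken percent-encoding.
    return { ok: false, error: "malformed percent-encoding" };
  }
  if (!SAFE_PARAM.test(decoded)) {
    return { ok: false, error: "parameter contains disallowed characters" };
  }
  return { ok: true, value: decoded };
}
```

Decoding before matching means an encoded traversal attempt such as `..%2Fetc` is rejected in the same way as its plain form.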

1.3. Stakeholders

| Role / Name | Contact | Expectations |
| --- | --- | --- |
| Development Team | Ana Pérez Bango (UO294100), Adriana García Suárez (UO300042) | Implement a scalable, maintainable, and well-documented solution that fulfills all course requirements. |
| End Users (Human Players) | N/A | Usable, stable interface with complete gameplay, multiplayer, social features, and statistics. |
| Platform Administrators | Privileged users via the admin panel | Ability to manage user accounts, grant or revoke admin permissions, delete match histories, and remove accounts from the platform. |
| Micrati (Client) | N/A | Fulfillment of game, API, public deployment, and interoperability requirements. |
| Project Evaluators | Arquisoft course staff | Compliance with documentation (Arc42 + ADRs), testing, deployment, code quality (SonarCloud), and feature completeness criteria. |
| External Bot Developers | API users via /interop/* | Well-documented, stable, and versioned API for automated bot integration using YEN notation. |
| Rival Teams (Interop) | Other ASW course teams | Stable bot interoperability API that allows bot-vs-bot matches between teams following the shared YEN contract. |

2. Architecture Constraints

This section lists the main constraints that shape the architecture of the YOVI system. Unlike requirements, which describe what the system must do, constraints define how it must be built and which technologies must be used. They are the boundaries within which we must operate; each table states how negotiable a constraint is.

2.1. Technical Constraints

These constraints are imposed by the client (Micrati). They dictate the technology stack and communication protocols.

| Constraint | Explanation | Rationale | Negotiable? |
| --- | --- | --- | --- |
| Frontend Technology | The web application MUST be implemented in TypeScript. React is the chosen framework. | The client requires TypeScript for type safety and maintainability. | No (TypeScript mandatory) |
| Game Engine Language | The game logic module MUST be implemented in Rust. | Performance and memory safety requirements for game state validation and AI strategies. | No |
| Communication Protocol | All communication between the webapp and game engine MUST use JSON messages following YEN notation for game states. | Standardized format specified by the client for interoperability. | No |
| Public Interoperability API | The system MUST expose a documented public interoperability API (botapi) that allows external bots to interact using YEN notation. | Core requirement from Micrati for third-party bot integration and cross-team competitions. | No |
| Real-time Multiplayer | The player-vs-player mode MUST use WebSockets (via Socket.IO) for bidirectional real-time communication. | HTTP polling is insufficient for the latency requirements of a turn-based game with live state sync. | No (WebSockets mandatory; Socket.IO library negotiable) |
| Deployment | The complete system MUST be publicly accessible via the Web over HTTPS. | The client needs to demonstrate the working product to evaluators and end users. | No |
| Containerization | All services MUST be containerized using Docker with a root-level docker-compose.yml. | Ensures consistent deployment across environments. | Partial (format flexible) |
| Database | User data MUST be persisted. MongoDB is the current implementation. | Data persistence is mandatory; the specific database technology is negotiable. | Yes (technology choice) |
| Authentication | User operations MUST be secured using JWT-based authentication with bcrypt password hashing. | Ensures secure access to user-specific data and actions. Passwords must never be stored in plain text. | No |
| Role-based Access Control | The system MUST support at least two roles: regular user and administrator. Role information MUST be encoded in the JWT payload so the gateway and frontend can enforce access without additional service calls. | Required for the admin panel feature. Encoding roles in the token avoids an extra round-trip on every protected request. | Partial (additional roles could be added) |
| Load Testing | The system MUST include a load testing suite using k6 covering at minimum registration, login, and game start scenarios. | Required to validate that the system meets the concurrent user quality goals under realistic conditions. | Partial (additional scenarios encouraged) |
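The Role-based Access Control constraint above can be illustrated with a small sketch of how a gateway might authorize a request from already-verified token claims, without calling the users service. The `TokenClaims` shape, the `authorize` helper, and the `/admin` path prefix are assumptions made for illustration; signature verification is assumed to have happened before this step.

```typescript
// Hypothetical shape of the verified JWT payload (field names are assumptions).
type Role = "user" | "admin";

interface TokenClaims {
  sub: string; // user identifier
  role: Role;  // role claim embedded at login time
  exp: number; // expiry as seconds since the Unix epoch
}

type AccessDecision =
  | { allowed: true }
  | { allowed: false; httpStatus: 401 | 403; reason: string };

// Decide access from the claims alone: no extra call to the users service.
function authorize(
  claims: TokenClaims | null,
  path: string,
  nowSeconds: number,
): AccessDecision {
  if (claims === null) {
    return { allowed: false, httpStatus: 401, reason: "missing token" };
  }
  if (claims.exp <= nowSeconds) {
    return { allowed: false, httpStatus: 401, reason: "token expired" };
  }
  // Assumed convention: admin-only endpoints live under /admin.
  const adminOnly = path === "/admin" || path.startsWith("/admin/");
  if (adminOnly && claims.role !== "admin") {
    return { allowed: false, httpStatus: 403, reason: "admin role required" };
  }
  return { allowed: true };
}
```

One consequence of this design is that a role change only takes effect once the user obtains a new token, because the decision relies on the claim stored in the token.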

2.2. Domain Constraints

These constraints come from the game domain itself and the specific requirements of Game Y.

| Constraint | Explanation | Rationale | Negotiable? |
| --- | --- | --- | --- |
| Game Rules | The system MUST correctly implement the rules of Game Y (classic version minimum). | Core business requirement; incorrect game rules make the product worthless. | No |
| Game Modes | Both player-versus-machine (PvM) and player-versus-player (PvP) modes MUST be available. The PvP mode operates in real time via WebSockets. | Primary use cases for the platform. | No |
| Board Size | The game MUST support variable board sizes configurable by the user (minimum size 3, no enforced maximum). | Required for different difficulty levels and game variants. | No |
| AI Strategies | The computer MUST implement more than one strategy, selectable by the user. Six strategies are currently implemented. | Demonstrates AI sophistication and provides varied gameplay experience. | No (minimum one strategy mandatory; more encouraged) |
| User Data | Users MUST be able to register and access their match history and statistics. | Basic user management requirement. | No |
| Admin Capabilities | Administrator users MUST be able to: view all registered accounts, grant or revoke admin permissions, delete a user's match history, and permanently delete user accounts. | Required for platform moderation and management. | No |

2.3. Organizational Constraints

These constraints govern our development process and team practices.

| Constraint | Explanation | Rationale | Negotiable? |
| --- | --- | --- | --- |
| Documentation Standard | Architecture MUST be documented following the Arc42 template (sections 1-15, including new sections for Load Testing, API Reference, and Monitoring). | Course requirement for the ASW lab. | No |
| Decision Recording | Architectural decisions MUST be recorded as ADRs (Architecture Decision Records) in the GitHub wiki. | Ensures traceability and rationale documentation. | No |
| Version Control | Development MUST use Git with the repository hosted on GitHub (Arquisoft/yovi_en2c). | Course requirement for collaboration and evaluation. | No |
| Branching Strategy | A branch-based workflow MUST be followed (feature branches, pull requests, code reviews). | Ensures code quality and team coordination. | Partial |
| Testing Requirements | The system MUST include unit tests, integration tests, end-to-end tests, and load tests. | Quality assurance requirement from course evaluation. | No |
| Test Coverage | Code coverage MUST meet thresholds defined in sonar-project.properties (minimum 80%). | Quality gate for acceptance. | No |
| CI/CD | Automated build, test, and deployment MUST be implemented via GitHub Actions (see the CI/CD Pipeline wiki page). | Ensures reproducible builds and deployment automation. | No |
| Issue Tracking | All tasks MUST be tracked using GitHub Issues. | Project management and traceability requirement. | No |
| Code Quality | Code quality MUST be monitored via SonarCloud. Critical issues block merges. | Required by course evaluation criteria. | No |

2.4. External Constraints

These constraints come from external stakeholders and the operating environment.

| Constraint | Explanation | Rationale | Negotiable? |
| --- | --- | --- | --- |
| Bot Integration | External bots can ONLY interact through the public interoperability API (botapi, exposed at /interop/*); no direct access to internal services. | Security and encapsulation requirement. | No |
| Interoperability Contract | The botapi MUST follow the shared cross-team interoperability contract: POST /games, GET /games/{id}, POST /games/{id}/play, and GET /play (stateless), all using YEN notation. | Enables bot-vs-bot matches between teams from different universities. | No |
| API Documentation | The public interoperability API MUST be documented via OpenAPI (YAML) at src/openapi/openapi.yaml. | Enables external developers to build compatible bots without reading source code. | Partial (format negotiable) |
| YEN Notation Compliance | Any system claiming to support Game Y MUST use YEN notation for game state representation. | Industry standard for this game family. | No |
| Public Accessibility | The deployed system MUST be accessible without special network configuration over HTTPS. | End users and evaluators need to access the platform easily. | No |
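The four endpoints of the interoperability contract listed above can be expressed as a small route table. The sketch below matches a method and path against that table and extracts the `{id}` parameter. The operation names (`createGame`, `getGame`, `playMove`, `statelessMove`) and the `matchInteropRoute` helper are hypothetical, and the sketch assumes any public prefix such as /interop has already been stripped from the path.

```typescript
type HttpMethod = "GET" | "POST";

interface InteropRoute {
  method: HttpMethod;
  template: string;  // path template from the shared contract
  operation: string; // hypothetical operation name, not part of the contract
}

const INTEROP_ROUTES: InteropRoute[] = [
  { method: "POST", template: "/games", operation: "createGame" },
  { method: "GET", template: "/games/{id}", operation: "getGame" },
  { method: "POST", template: "/games/{id}/play", operation: "playMove" },
  { method: "GET", template: "/play", operation: "statelessMove" },
];

interface RouteMatch {
  operation: string;
  params: Record<string, string>;
}

// Match a method and path against the contract; returns null when nothing fits.
function matchInteropRoute(method: string, path: string): RouteMatch | null {
  const segments = path.split("/").filter((s) => s.length > 0);
  for (const route of INTEROP_ROUTES) {
    if (route.method !== method) continue;
    const expected = route.template.split("/").filter((s) => s.length > 0);
    if (expected.length !== segments.length) continue;
    const params: Record<string, string> = {};
    let matched = true;
    for (let i = 0; i < expected.length; i++) {
      const part = expected[i];
      if (part.startsWith("{") && part.endsWith("}")) {
        // Template placeholder: capture the concrete segment under its name.
        params[part.slice(1, -1)] = segments[i];
      } else if (part !== segments[i]) {
        matched = false;
        break;
      }
    }
    if (matched) return { operation: route.operation, params };
  }
  return null;
}
```

For example, `matchInteropRoute("POST", "/games/abc/play")` resolves to the `playMove` operation with `id` set to `abc`, while `GET /games` matches nothing because the contract only defines POST for that path.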

2.5. Quality Constraints

These constraints define the minimum acceptable quality levels the architecture must guarantee.

| Constraint | Target | Measurement | Negotiable? |
| --- | --- | --- | --- |
| Response Time (bot move) | Game engine must respond within 2 seconds for standard board sizes (≤ 11x11). | k6 load test p(95) < 2000ms on /game/new. | Partial |
| Response Time (login) | Login endpoint must respond within 1.5 seconds at 50 concurrent users. | k6 load test p(95) < 1500ms on /login. | Partial |
| Response Time (registration) | Registration endpoint must respond within 2 seconds at 50 concurrent users. | k6 load test p(95) < 2000ms on /register. | Partial |
| Concurrent Users | Support at least 50 concurrent users on auth endpoints without degradation. | k6 ramping VU scenarios (register and login scripts). | Yes (scale up with resources) |
| Error Rate | HTTP error rate MUST remain below 5% under load test conditions. | k6 threshold http_req_failed rate < 0.05. | Partial |
| Availability | System must achieve 99% uptime during the evaluation period. | Monitoring via Prometheus and Grafana dashboards. | Partial |
| Test Coverage | Code coverage MUST remain above 80% as enforced by SonarCloud. | Automated coverage reports in CI pipeline. | No |
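To make the p(95) and error-rate targets above concrete, the following sketch evaluates a set of request durations against them. It is not k6 code: the `percentile` and `meetsLimits` helpers and the `LoadResult` shape are hypothetical, and the nearest-rank percentile used here is a simplification that may differ slightly from how k6 computes its percentiles.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of values at or below it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

interface LoadResult {
  durationsMs: number[];  // one entry per request
  failedRequests: number; // requests counted as HTTP failures
}

interface Limits {
  p95Ms: number;
  maxErrorRate: number;
}

// Targets copied from the quality constraints in this section.
const LIMITS: Record<string, Limits> = {
  "/login": { p95Ms: 1500, maxErrorRate: 0.05 },
  "/register": { p95Ms: 2000, maxErrorRate: 0.05 },
  "/game/new": { p95Ms: 2000, maxErrorRate: 0.05 },
};

// True when both the latency and the error-rate target are met.
function meetsLimits(endpoint: string, result: LoadResult): boolean {
  const limits = LIMITS[endpoint];
  if (limits === undefined) throw new Error(`no limits for ${endpoint}`);
  const p95 = percentile(result.durationsMs, 95);
  const errorRate = result.failedRequests / result.durationsMs.length;
  return p95 < limits.p95Ms && errorRate < limits.maxErrorRate;
}
```

Both comparisons are strict, mirroring the `<` form of the thresholds: a run with exactly 5% failed requests does not pass.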

2.6. Implications of These Constraints

These constraints collectively shape our architecture in specific ways:

  • Eight-service structure: The technology constraints (TypeScript + Rust + WebSockets) force a separation between frontend (webapp), game engine (gamey), real-time multiplayer (multiplayer), and the remaining Node.js services. The admin role requirement adds access control logic at both the gateway and users service levels.

  • JWT with role claims: The role-based access control constraint means the JWT payload must include a role field (e.g., "admin" or "user") so that the gateway can enforce admin-only endpoints without querying the users service on every request.

  • Reverse proxy and routing: Public access is centralized through an Nginx reverse proxy, which handles HTTPS termination and routes requests to webapp, gateway, botapi, and multiplayer via Socket.IO.

  • Socket.IO for multiplayer: The real-time constraint forces a stateful WebSocket server (multiplayer service) alongside the stateless REST services. This service maintains room state in memory and must be considered separately in scaling and failure scenarios.

  • Docker standardization: The containerization constraint ensures consistent development and deployment environments across all eight services plus the monitoring stack (Prometheus + Grafana).

  • Load testing as a first-class concern: The k6 constraint means performance is validated automatically, not just observed. Thresholds are defined in the test scripts and enforced in CI.

  • Documentation overhead: Arc42, ADRs, OpenAPI, and load test documentation require dedicated time alongside development, but provide traceability and evaluability for the course.
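The point about the multiplayer service keeping room state in memory can be sketched as follows. This is a simplified illustration without Socket.IO: the `InMemoryRooms` class, the `PvpRoom` shape, and the listener callback are hypothetical, and only a subset of the event names used by the service (`room_updated`, `game_started`, `opponent_left`) is modelled.

```typescript
// Event names are taken from this document; payload shapes are assumptions.
type RoomEventName = "room_updated" | "game_started" | "opponent_left";

interface PvpRoom {
  code: string;
  players: string[]; // at most two player identifiers
}

type RoomListener = (eventName: RoomEventName, room: PvpRoom) => void;

class InMemoryRooms {
  private rooms = new Map<string, PvpRoom>();
  private notify: RoomListener;

  constructor(notify: RoomListener) {
    this.notify = notify;
  }

  create(code: string, hostId: string): PvpRoom {
    if (this.rooms.has(code)) throw new Error(`room ${code} already exists`);
    const room: PvpRoom = { code, players: [hostId] };
    this.rooms.set(code, room);
    this.notify("room_updated", room);
    return room;
  }

  join(code: string, playerId: string): PvpRoom {
    const room = this.rooms.get(code);
    if (room === undefined) throw new Error(`room ${code} not found`);
    if (room.players.length >= 2) throw new Error(`room ${code} is full`);
    room.players.push(playerId);
    this.notify("room_updated", room);
    if (room.players.length === 2) this.notify("game_started", room);
    return room;
  }

  leave(code: string, playerId: string): void {
    const room = this.rooms.get(code);
    if (room === undefined) return;
    room.players = room.players.filter((id) => id !== playerId);
    if (room.players.length === 0) {
      this.rooms.delete(code); // last player gone: drop the room
    } else {
      this.notify("opponent_left", room);
    }
  }

  // Everything lives in this Map, so a process restart loses all rooms.
  count(): number {
    return this.rooms.size;
  }
}
```

Because the only copy of the state is the `Map` inside the process, running several instances of such a service would require either sticky routing or an external store, which is why it is treated separately from the stateless REST services.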

3. Context and Scope

This section delimits the YOVI system from its environment. Following arc42 guidelines, we separate business context (what the system does, from a domain perspective) from technical context (how it communicates, from an infrastructure perspective).

3.1. Business Context

The YOVI system is a web-based gaming platform for Game Y. From a business perspective, it provides three distinct value propositions:

  • For human players: An intuitive web interface to play Game Y against AI opponents or other human players in real time, register an account, manage a profile, interact socially with other users, and track performance.

  • For platform administrators: A privileged management panel to oversee registered accounts, grant or revoke admin permissions, delete match histories, and remove user accounts from the platform.

  • For bot developers: A public interoperability API (botapi) that allows automated bots to interact with the game engine, request moves, create and manage game sessions, and participate in cross-team bot-vs-bot competitions.

The system depends on no external business systems — it is self-contained in terms of domain functionality. All game logic, user management, authentication, multiplayer coordination, and AI strategies are implemented within the YOVI system boundaries.

3.1.1. Business Context Diagram

[Figure: Business Context Diagram, YOVI System]

3.1.2. Business Interfaces

| Interface | Description | Input | Output | Communication Partner |
| --- | --- | --- | --- | --- |
| Game Play UI (PvM) | Web UI for human players to play against AI | Mouse clicks on board cells | Visual board, updated YEN state | Human Player |
| Game Play UI (PvP) | Real-time web UI for human vs human matches | Mouse clicks; room code to join | Live board updates via WebSocket events | Human Player |
| User Management | Registration and login interface | Username, password, optional email | Session JWT token, user profile | Human Player |
| Profile & Statistics | Access to user profile and match history | User identity (JWT) | Profile fields, win/loss records, game history | Human Player |
| Social Features | Friend requests, friend list, user search | Username queries; friend request actions | Friend list, notifications, search results | Human Player |
| Notification System | In-app notifications for events | System-generated triggers (friend request, welcome) | Unread badge, notification list | Human Player |
| Admin Panel | Management UI for privileged users | Admin JWT; user selection; action selection | Updated user list; confirmation of action taken | Administrator |
| Bot API - Game Session | Programmatic interface for bots to create and play games | Board state (YEN), bot ID, move coordinates | Updated YEN state, game status | External Bot |
| Bot API - Stateless Move | Request a single bot move without session state | YEN position, optional bot ID | Move coordinates | External Bot |
| Bot API - Remote Interop | Client-mode sessions against other teams' APIs | Remote API base URL, game ID, local bot ID | Session state, move result, action taken | External Bot / Rival Team API |

3.2. Technical Context

From a technical perspective, the YOVI system is implemented as multiple independent services communicating via HTTP/REST and WebSockets. Public access is handled through an Nginx reverse proxy, which routes external requests to the appropriate services. This section describes the technical channels and protocols that enable the business interfaces defined above.

3.2.1. Technical Context Diagram

[Figure: Technical Context Diagram, YOVI System]

3.2.2. Technical Interfaces and Channels

| Business Interface | Technical Channel | Protocol | Data Format |
| --- | --- | --- | --- |
| Game Play UI (PvM) | Browser → Nginx → Webapp; API calls via Gateway → Gamey | HTTPS / HTTP | HTML/CSS/JS + JSON (YEN) |
| Game Play UI (PvP) | Browser → Nginx → Multiplayer (WebSocket upgrade); Multiplayer → Gamey | HTTPS → WSS (Socket.IO) | JSON events (room state, YEN, move result) |
| User Registration | Browser → Gateway → Auth → Users | HTTPS → HTTP → HTTP | JSON |
| User Login | Browser → Gateway → Auth → Users | HTTPS → HTTP → HTTP | JSON + JWT |
| Token Verification | Browser → Gateway → Auth | HTTPS → HTTP | JSON (JWT) |
| Profile & Statistics | Browser → Gateway → Users | HTTPS → HTTP | JSON |
| Social Features | Browser → Gateway → Users | HTTPS → HTTP | JSON |
| Notification System | Browser → Gateway → Users | HTTPS → HTTP | JSON |
| Admin Panel | Browser → Gateway → Users (admin-only endpoints) | HTTPS → HTTP | JSON (role enforced via JWT claim) |
| Bot API - Game Session | Bot → Nginx → BotAPI → Gamey | HTTPS → HTTP → HTTP | JSON (YEN) |
| Bot API - Stateless Move | Bot → Nginx → BotAPI → Gamey | HTTPS → HTTP → HTTP | JSON (YEN) |
| Bot API - Remote Interop | BotAPI → Rival Team API (outbound HTTP client) | HTTP | JSON (YEN, shared cross-team contract) |
| Game Logic Execution | Gateway → Gamey; Multiplayer → Gamey | HTTP (internal Docker network) | JSON (YEN) |
| Database Access | Users Service → MongoDB | MongoDB Wire Protocol | BSON |
| Metrics Collection | Prometheus → Gateway, Users, Gamey | HTTP (scrape /metrics) | Prometheus text format |
| Metrics Visualization | Grafana → Prometheus | HTTP (PromQL queries) | JSON (time series) |

3.2.3. Technology Stack per Component

| Component | Technology | Justification |
| --- | --- | --- |
| Web Frontend (webapp) | React + TypeScript + Vite | Client requirement (TypeScript); component reusability; custom i18n; theme switching (dark/light) |
| API Gateway (gateway) | Node.js + Express 5 + express-prom-bundle | Single entry point for webapp traffic; centralized CORS, input sanitization, and Prometheus metrics |
| Authentication Service (authentication) | Node.js + Express 5 + jsonwebtoken + bcryptjs | Isolated JWT logic with single responsibility; bcrypt for secure password hashing |
| User Service (users) | Node.js + Express 5 + Mongoose + express-prom-bundle | User data persistence, friend graph, notifications, game results, admin operations; Prometheus metrics |
| Game Engine (gamey) | Rust + Axum + Tokio + axum-prometheus | Mandated by client; memory safety and performance for game logic and six AI strategies |
| Multiplayer Service (multiplayer) | Node.js + Express + Socket.IO + axios | Real-time bidirectional WebSocket communication for PvP rooms; delegates game rules to Gamey |
| Bot API (botapi) | Node.js + Express + TypeScript | Public interoperability API for external bots; client-mode sessions against rival APIs |
| Reverse Proxy (nginx) | Nginx stable-alpine | HTTPS termination, HTTP→HTTPS redirect, path-based routing to all internal services |
| Database | MongoDB (via Mongoose ODM) | Flexible schema for user data; aggregation pipelines for ranking and statistics |
| Monitoring | Prometheus + Grafana | Metrics scraping from three services; pre-built dashboard ("Yovi Services Overview") |
| Containerization | Docker + Docker Compose | Consistent dev/test/prod environments across all eight services |
| CI/CD | GitHub Actions + GitHub Container Registry | Automated build, test, publish, and deploy pipeline on every release tag |

3.3. Scope

3.3.1. In Scope (Implemented)

  • Classic Game Y implementation with correct rules (move validation, win condition: connect all 3 sides)

  • Player-versus-machine (PvM) game mode with variable board size and six AI difficulty levels

  • Player-versus-player (PvP) real-time multiplayer via Socket.IO rooms with unique room codes

  • User registration and JWT-based authentication with bcrypt password hashing

  • Role-based access control: regular users and administrators

  • Admin panel: view all users, grant/revoke admin role, delete match history, delete accounts

  • User profiles with editable fields (real name, bio, city, country, preferred language)

  • Match history and game statistics stored in MongoDB

  • Top-10 ranking by wins via MongoDB aggregation

  • Social features: friend requests, friend list management, bidirectional unfriend

  • In-app notification system (friend requests, welcome messages) with mark-as-read

  • User search by username or real name

  • Hint system: AI suggests a move using alfa_beta_bot via POST /hint

  • Public interoperability API (botapi) for external bots using YEN notation

  • Remote interop client mode: our bots play against other teams' APIs

  • Internationalization (EN/ES) via custom i18n module with localStorage persistence

  • Dark/light theme switching with localStorage persistence

  • Prometheus metrics on gateway, users, and gamey services

  • Grafana dashboard ("Yovi Services Overview") with request rate, p95 latency, and error rate panels

  • Load testing suite with k6 (register, login, start_game scenarios)

  • Swagger UI for users service OpenAPI spec at /api-docs

  • Docker Compose for local development and production deployment

  • GitHub Actions CI/CD pipeline with automated test, build, publish, and deploy stages

  • SonarCloud code quality integration
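The Top-10 ranking item in the list above is computed with a MongoDB aggregation. As a rough illustration of what such a pipeline produces (grouping by player, counting wins, sorting in descending order, and limiting the result), here is an in-memory equivalent. The `FinishedGame` and `RankingEntry` shapes and the `topByWins` helper are hypothetical; the real GameResult schema and pipeline may differ.

```typescript
// Hypothetical record shape; the real GameResult schema may differ.
interface FinishedGame {
  player: string;
  won: boolean;
}

interface RankingEntry {
  player: string;
  wins: number;
}

// Group by player, count wins, sort descending, keep the first `limit` entries.
function topByWins(games: FinishedGame[], limit = 10): RankingEntry[] {
  const wins = new Map<string, number>();
  for (const game of games) {
    if (!game.won) continue; // only wins contribute to the ranking
    wins.set(game.player, (wins.get(game.player) ?? 0) + 1);
  }
  return Array.from(wins.entries())
    .map(([player, count]) => ({ player, wins: count }))
    // Ties are broken alphabetically so the order is deterministic.
    .sort((a, b) => b.wins - a.wins || a.player.localeCompare(b.player))
    .slice(0, limit);
}
```

In this sketch, players without any win do not appear in the ranking at all, which is one possible design; a real ranking could also list them with zero wins.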

3.3.2. Out of Scope

  • Game variants (Poly-Y, Hex, Tabu Y) — not implemented

  • Mobile native applications (iOS / Android)

  • OAuth / social login (Google, GitHub)

  • Persistent multiplayer room state (rooms are in-memory; lost on service restart)

  • Advanced analytics or machine learning for AI

  • Real-time admin notifications (admin panel requires manual refresh)

4. Solution Strategy

This section summarizes the fundamental decisions that shape the YOVI system architecture. Each decision is motivated by specific constraints (from section 2) and quality goals (from section 1), and forms the foundation for detailed design decisions in later sections.

4.1. Technology Decisions

| Decision | Rationale | Constraints / Goals Addressed | Alternatives Considered |
| --- | --- | --- | --- |
| Frontend: React + TypeScript + Vite | React's component model enables UI reuse, i18n support, and theme switching. TypeScript is mandatory per client. Vite provides fast dev server and optimized production builds. | Technical Constraint: TypeScript; Quality: Usability, Maintainability | Vue, Angular (rejected: team familiarity) |
| Authentication Service: Node.js/Express + JWT + bcryptjs | Isolated JWT logic with single responsibility improves testability. bcrypt ensures passwords are never stored in plain text. JWT role claims allow stateless admin authorization. | Quality: Security, Testability; Constraint: RBAC, bcrypt | Auth inside users service (rejected: violates SRP) |
| User Service: Node.js/Express + Mongoose | Lightweight, integrates well with MongoDB. Handles user data, friends graph, notifications, game results, ranking, search, and admin operations in a single cohesive service. | Quality: Development speed, Testability | Python/Flask, Java/Spring (rejected: heavier) |
| Game Engine: Rust + Axum + Tokio | Mandated by client. Memory safety and performance for game logic and six AI strategies. Axum provides ergonomic async HTTP with type-safe extractors and shared state via Arc<YBotRegistry>. | Technical Constraint: Rust; Quality: Performance, Reliability | Actix-web (rejected: more complex); C++, Go (not allowed) |
| Multiplayer Service: Node.js + Socket.IO | Socket.IO provides battle-tested WebSocket abstraction with room management, reconnection, and fallback transport. Delegates game rules to Gamey via HTTP, keeping itself stateless regarding game logic. | Technical Constraint: WebSockets; Quality: Reliability, Functionality | Raw WebSocket (rejected: no room management); SSE (rejected: unidirectional) |
| Bot API: Node.js + Express + TypeScript | Dedicated public API in TypeScript for type safety. Supports both server mode (external bots play our bots) and client mode (our bots play rival APIs). In-memory session store is sufficient given stateless game rules. | Technical Constraint: Public Interop API; Quality: Interoperability, Testability | Rust-based (rejected: slower iteration); Python (rejected: team expertise) |
| Reverse Proxy: Nginx | Centralizes public access, HTTPS termination, HTTP→HTTPS redirect, and path-based routing to all four public-facing services (webapp, gateway, multiplayer, botapi). | Quality: Security, Deployability, Availability | Traefik (rejected: more complex config); no proxy (rejected: insecure) |
| Database: MongoDB via Mongoose | Schema flexibility for evolving user data. Aggregation pipelines support ranking and statistics. Mongoose ODM provides schema validation, virtuals, and middleware hooks. | Quality: Development speed, Deployability | PostgreSQL (rejected: schema changes slower) |
| Monitoring: Prometheus + Grafana | Prometheus scrapes metrics from gateway, users, and gamey. Grafana provides a pre-built dashboard with request rate, p95 latency, and error rate panels for all three services. | Quality: Observability, Availability | Datadog (rejected: paid); custom logging only (rejected: no time-series) |
| Load Testing: k6 | k6 provides JavaScript-based load test scripts with custom metrics (Trend, Rate), threshold enforcement, and JSON result export. Three scenarios cover registration, login, and game start. | Technical Constraint: Load Testing; Quality: Performance | Locust (rejected: Python only); JMeter (rejected: XML config, heavy) |
| Containerization: Docker + Docker Compose | Ensures consistent environments across dev/test/prod. Required for deployment. GitHub Container Registry stores published images per service. | Technical Constraint: Containerization; Quality: Deployability | Kubernetes (overkill); manual deployment (rejected: inconsistent) |
| CI/CD: GitHub Actions | Integrated with repository. Automates test, build, publish to GHCR, and deploy on every release tag. | Organizational Constraint: CI/CD; Quality: Testability | Jenkins (rejected: separate infrastructure) |

4.2. Top-Level Decomposition

Decision: Eight-service microservices architecture with dedicated components for each responsibility, exposed through an Nginx reverse proxy.

Reasons:

  • The technology constraints (TypeScript + Rust + WebSockets) force separation — they cannot run in the same process

  • Real-time multiplayer requires a stateful WebSocket server that is architecturally distinct from the stateless REST services

  • Separating authentication from user data management improves testability and single responsibility

  • The public bot interoperability API must be independently deployable and versioned

  • Nginx provides the single public entry point, handling HTTPS termination and routing

How it maps to quality goals:

  • Maintainability: Services can be modified independently

  • Testability: Each service is tested in isolation with its own test suite

  • Deployability: Components are published as individual Docker images to GHCR

[Figure: Service Decomposition]

4.3. Design Patterns

| Pattern | Application | Rationale | Location |
|---|---|---|---|
| Strategy Pattern | Bot AI implementation in gamey | Six strategies implement the YBot trait. New strategies require only a new struct, with no modification to existing code. | gamey/src/bot/bot_implementations/ |
| Registry Pattern | YBotRegistry manages bot instances | Allows dynamic bot selection by name at runtime. Used by both the game server handlers and the CLI. | gamey/src/bot/ybot_registry.rs |
| Gateway / Router Pattern | Nginx + gateway + botapi routing | Public traffic routes through Nginx; internal request flows are separated for web and bot clients. | nginx/, gateway/, botapi/ |
| Middleware Pattern | JWT verification in auth service; Prometheus in gateway, users, gamey | authenticateToken middleware intercepts protected requests. express-prom-bundle and axum-prometheus intercept all requests for metrics. | authentication/auth-service.js, all services |
| Observer / Event Pattern | Socket.IO events in multiplayer service | Room state changes emit room_updated, game_started, game_updated, game_over, opponent_left events to all connected clients. | multiplayer/src/multiplayer-service.js |
| Repository Pattern | Mongoose models in users service | User, GameResult, Notification models encapsulate all data access. Tests mock at model level. | users/models/ |
| React Context Pattern | ThemeProvider, I18nProvider in webapp | Global state (theme, language) shared across all components without prop drilling. | webapp/src/ThemeProvider.tsx, webapp/src/i18n/I18nProvider.tsx |
| Union-Find (Disjoint Set) | Win condition detection in game core | Tracks connected components of each player’s pieces. Checks if any component touches all three sides. Used in PlayerSet with parent, touches_side_a/b/c flags. | gamey/src/core/player_set.rs |
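The Strategy and Registry rows can be illustrated with a short sketch. The real implementation is written in Rust (the YBot trait and YBotRegistry); the TypeScript below only mirrors the idea, and every name in it (Bot, BotRegistry, firstFreeBot, lastFreeBot) is hypothetical.

```typescript
// Hypothetical TypeScript mirror of the Strategy + Registry idea. The real
// code is the Rust YBot trait and YBotRegistry, whose details may differ.
type Cell = number; // index of a board cell (flat indexing is an assumption)

interface Bot {
  name(): string;
  // Returns the chosen cell, or undefined when no cell is free.
  chooseMove(freeCells: readonly Cell[]): Cell | undefined;
}

// Two toy strategies: adding another one needs only a new object.
const firstFreeBot: Bot = {
  name: () => "first_free_bot",
  chooseMove: (free) => free[0],
};

const lastFreeBot: Bot = {
  name: () => "last_free_bot",
  chooseMove: (free) => free[free.length - 1],
};

class BotRegistry {
  private readonly bots = new Map<string, Bot>();

  // Builder-style registration, loosely modelled on with_bot().
  withBot(bot: Bot): this {
    this.bots.set(bot.name(), bot);
    return this;
  }

  // Looks a strategy up by name at runtime.
  find(name: string): Bot | undefined {
    return this.bots.get(name);
  }

  names(): string[] {
    return Array.from(this.bots.keys()).sort();
  }
}

const registry = new BotRegistry().withBot(firstFreeBot).withBot(lastFreeBot);
const chosen = registry.find("last_free_bot")?.chooseMove([2, 5, 9]);
```

Callers depend only on the Bot interface and a name, which is what lets a handler select a strategy from a request parameter without knowing the concrete type.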

4.4. 4. Quality Goal Realization

| Quality Goal | How We Achieve It | Key Decisions | Verification |
|---|---|---|---|
| Functionality | Rust engine with Union-Find win detection; six strategies in gamey/src/bot/; Socket.IO rooms for real-time PvP | Rust for game logic; Strategy Pattern; Socket.IO for multiplayer | Unit tests in gamey/; integration tests for bot moves; Socket.IO event tests |
| Usability | React SPA with responsive UI; custom i18n (EN/ES); dark/light theme; hint system | React frontend; ThemeProvider; I18nProvider; /hint endpoint backed by alfa_beta_bot | E2E tests (Playwright); language toggle tested; theme persistence tested |
| Modularity & Maintainability | Eight services with single responsibilities; Strategy Pattern; OpenAPI spec for botapi | Eight-service architecture; Strategy Pattern; TypeScript in botapi | Independent deployment possible; new AI strategy = new struct + registration |
| Reliability & Availability | Docker restart policies; stateless gateway and auth; Prometheus alerts | Containerization; stateless design; monitoring stack | docker-compose up with restart; Prometheus scraping; Grafana dashboard |
| Security | bcrypt hashing; JWT with role claims; input sanitization in gateway; Nginx as public perimeter | bcrypt in authentication service; RBAC in JWT; gateway path validators | Auth tests; admin route protection tests; SonarCloud security hotspot review |
| Testability | Separation of concerns; Vitest/Jest in Node; cargo test in Rust; Supertest for HTTP; k6 for load | Service separation; Middleware Pattern; in-memory test DBs (mongodb-memory-server) | Unit: cargo test, npm test; Integration: Supertest; E2E: Playwright; Load: k6 |
| Interoperability | REST API with YEN notation; OpenAPI doc; client mode for rival APIs; versioned endpoints (/v1/) | Public botapi; versioned gamey endpoints; remoteInteropClient with allowlist | API tests verify YEN round-trips; cross-team integration tested during course sessions |

4.5. 5. Organizational and Process Decisions

| Decision | Rationale | Impact | Constraint Addressed |
|---|---|---|---|
| Iterative Development (sprints) | Early validation of gameplay; adapt to feedback | Regular demos; continuous integration | Course timeline; quality goals |
| Kanban Board (GitHub Projects) | Visualize work; identify bottlenecks | Issues tracked; clear priorities | Organizational: issue tracking |
| Code Reviews via Pull Requests | Catch issues early; share knowledge | All PRs require review; standards enforced | Quality: maintainability, reliability |
| Definition of Done | Feature = code + tests + docs + reviewed | Ensures completeness before merge | Quality: testability, maintainability |
| Architecture Decision Records (ADRs) | Document why decisions were made | GitHub wiki with numbered ADRs | Organizational: documentation standard |
| SonarCloud Quality Gate | Automated code quality enforcement | PRs blocked if coverage drops below 80% or critical issues found | Organizational: code quality |

4.6. 6. Key Architectural Decisions Summary

| ADR | Summary | Status |
|---|---|---|
| ADR-001 | Extensible game mode architecture starting with PvM; PvP added as real-time multiplayer | Implemented ✅ |
| ADR-002 | MongoDB for user data: schema flexibility, aggregation for ranking | Implemented ✅ |
| ADR-003 | Eight-service microservices architecture (updated from original three-service) | Implemented ✅ (updated) |
| ADR-004 | Nginx as public reverse proxy with WebSocket upgrade support for Socket.IO | Implemented ✅ (updated) |
| ADR-005 | Strategy Pattern for AI: six bot strategies as pluggable YBot implementations | Implemented ✅ (updated) |
| ADR-006 | Dedicated authentication microservice with bcrypt and JWT role claims | Implemented ✅ |
| ADR-007 | Axum as HTTP framework for Rust game engine | Implemented ✅ |
| ADR-008 | Custom i18n in frontend: zero dependencies, EN/ES, localStorage persistence | Implemented ✅ |
| ADR-009 | Socket.IO for real-time multiplayer: dedicated service, in-memory rooms, delegates rules to Gamey | Implemented ✅ |
| ADR-010 | 400ms debounce on social search (vs. throttle and search-on-submit) | Implemented ✅ |
| ADR-011 | React Context for theme and i18n (vs. Redux/Zustand) | Implemented ✅ |
| ADR-012 | Unambiguous alphabet for room codes: excludes O/0/I/1 to minimise transcription errors | Implemented ✅ |
| ADR-013 | express-prom-bundle with normalizePath: true, which prevents metric cardinality explosion | Implemented ✅ |
| ADR-014 | Test DB isolation via _test suffix using regex: workaround for Atlas multi-host URI format | Implemented ✅ |
| ADR-015 | Express 5 in REST services, Express 4 in multiplayer: Socket.IO compatibility constraint | Implemented ✅ |
| ADR-016 | In-memory state for BotAPI and multiplayer: conscious decision vs. Redis/MongoDB | Implemented ✅ |

4.7. 7. Traceability to Constraints

| Constraint (Section 2) | How Solution Strategy Addresses It |
|---|---|
| TypeScript frontend | React + TypeScript in webapp/; TypeScript also used in botapi/ |
| Rust game engine | gamey/ service with core logic, six bot strategies, and Axum HTTP server |
| JSON + YEN communication | REST APIs with YEN validation in gamey/; YEN passed transparently through gateway and multiplayer |
| Real-time multiplayer (WebSockets) | multiplayer/ service with Socket.IO; Nginx proxies /socket.io/* with WebSocket upgrade |
| Public API for bots | botapi/ exposes interoperability endpoints; versioned OpenAPI spec |
| Docker containerization | Each service has a Dockerfile; root docker-compose.yml; images in GitHub Container Registry |
| User data persistence | MongoDB in users/ service via Mongoose ODM |
| JWT authentication + bcrypt | Dedicated authentication/ service (ADR-006); bcrypt in /createuser, comparison in /login |
| Role-based access control | role field in JWT payload; gateway enforces admin-only routes; admin panel in webapp |
| Admin capabilities | Admin endpoints in users service: list all users, update role, delete history, delete account |
| Multiple AI strategies | Strategy Pattern in gamey/src/bot/: six strategies implement the YBot trait |
| Load testing | k6 scripts in tests/load/: register.js, login.js, start_game.js |
| Monitoring | Prometheus scrapes gateway, users, gamey; Grafana dashboard provisioned automatically |
| Testing requirements | Unit, integration, E2E, property-based, and load tests across all services |
| CI/CD | GitHub Actions: test → build → publish to GHCR → deploy on release |
| Documentation (Arc42 + ADRs) | This document (sections 1-15) + ADRs 001-009 in GitHub wiki |
| Internationalization | Custom i18n module in webapp/src/i18n/ (EN + ES); localStorage persistence |

5. Building Block View

This section describes the static decomposition of the YOVI system into building blocks. Following arc42 guidelines, we present a hierarchical view:

  • Level 1: White box description of the overall system with black box descriptions of each top-level building block.

  • Level 2: White box descriptions of selected building blocks showing their internal structure.

5.1. Level 1: Overall System White Box

The YOVI system is decomposed into eight top-level building blocks plus a database and a monitoring stack, all orchestrated via Docker Compose and exposed through Nginx.

Level 1 System White Box

5.1.1. Black Box Descriptions — Level 1

Nginx Reverse Proxy
  • Purpose: Single public entry point. Handles HTTPS termination, HTTP→HTTPS redirect, and path-based routing to internal services. TLS certificates are mounted from nginx/certs/.

  • Provided interfaces: HTTP on port 80 (redirects to HTTPS) and HTTPS on port 443

  • Routing rules:

    • / → webapp:80

    • /api/* → gateway:8080

    • /socket.io/* → multiplayer:7000 (with WebSocket upgrade)

    • /interop/* → botapi:4001

  • Location: nginx/conf.d/default.conf
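The routing rules above amount to a prefix lookup with the webapp as the fallback. The sketch below restates them in TypeScript for illustration only; it is not the Nginx configuration, resolveUpstream is a hypothetical helper, and real Nginx location matching has more rules than a simple first-match prefix test.

```typescript
// Mirrors the documented prefix-to-upstream mapping. resolveUpstream is a
// hypothetical helper, not part of the YOVI code base.
const routes: ReadonlyArray<{ prefix: string; upstream: string }> = [
  { prefix: "/api/", upstream: "gateway:8080" },
  { prefix: "/socket.io/", upstream: "multiplayer:7000" },
  { prefix: "/interop/", upstream: "botapi:4001" },
];

function resolveUpstream(path: string): string {
  for (const route of routes) {
    // indexOf(...) === 0 is a plain "starts with" check.
    if (path.indexOf(route.prefix) === 0) return route.upstream;
  }
  // Everything else is served by the static React bundle.
  return "webapp:80";
}
```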

Webapp
  • Purpose: Single-page React application served as static files. Provides the complete user interface for human players and administrators. Communicates with the backend exclusively through /api/ (REST) and /socket.io/ (WebSocket) via Nginx.

  • Provided interfaces: HTML/CSS/JS bundle served to browsers

  • Required interfaces: Gateway REST API; Multiplayer Socket.IO

  • Key routes: / (login), /register, /home, /game, /select-difficulty, /game/finished, /statistics, /profile/:username, /social, /multiplayer, /multiplayer/game, /admin (admin only)

  • Location: webapp/

Gateway
  • Purpose: Backend entry point for all webapp requests. Routes to the appropriate internal service, handles CORS, sanitizes path parameters, validates input, and exposes Prometheus metrics. Stateless — no session state.

  • Provided interfaces: REST API at port 8080 (internal), exposed via Nginx at /api/*

  • Required interfaces: Authentication Service, Users Service, Gamey Service, Multiplayer Service

  • Prometheus metrics: request count, method, path, status code via express-prom-bundle

  • Location: gateway/gateway-service.js

Authentication Service
  • Purpose: Handles all identity operations: payload validation, bcrypt password hashing and comparison, JWT token generation and verification with role claims. Delegates user data persistence to the Users Service.

  • Provided interfaces:

    • POST /register — validates payload, hashes password, calls Users Service, returns JWT

    • POST /login — fetches user from Users Service, bcrypt compares, returns JWT

    • GET /verify — validates JWT, returns decoded claims

    • GET /health

  • Required interfaces: Users Service (internal HTTP)

  • Location: authentication/auth-service.js

Users Service
  • Purpose: Manages all user data: profiles, credentials, game results, match history, statistics, friend graph, notifications, ranking, user search, and admin operations. Pure data service — no JWT logic.

  • Provided interfaces (selected):

    • POST /createuser, GET /users/:username, GET /users/:username/profile, PUT /users/:username/profile

    • DELETE /users/:username

    • POST /gameresults, GET /gameresults/:username, GET /ranking

    • GET /users/:username/friends, POST /users/:username/friends/request, POST /users/:username/friends/accept, POST /users/:username/friends/reject, DELETE /users/:username/friends/:friend

    • GET /users/:username/notifications, PATCH /users/:username/notifications/:id/read

    • GET /search?q=…​

    • GET /health, GET /metrics (Prometheus)

    • Admin endpoints: GET /admin/users, PATCH /admin/users/:username/role, DELETE /admin/users/:username, DELETE /admin/users/:username/history

  • Required interfaces: MongoDB

  • Swagger UI: available at /api-docs

  • Location: users/users-service.js

Gamey (Game Engine)
  • Purpose: Encapsulates all Game Y logic: move validation, win condition detection (connecting all three sides of the triangular board using Union-Find), and AI move calculation with six strategies. Stateless — no persistent data.

  • Provided interfaces:

    • GET /status — health check

    • GET /metrics — Prometheus metrics (axum-prometheus)

    • POST /game/new — creates initial game state, returns YEN

    • POST /game/check — checks win condition from YEN

    • POST /v1/game/pvb/:bot_id — applies player move, computes bot response, returns updated YEN

    • POST /v1/game/pvp/move — applies a player move in PvP context, returns updated YEN + win status

    • POST /v1/ybot/choose/:bot_id — returns bot’s chosen coordinates without applying move

  • Required interfaces: None (fully stateless)

  • Bot strategies: random_bot, heuristic_bot, minimax_bot, alfa_beta_bot, monte_carlo_hard, monte_carlo_extreme

  • Location: gamey/

Multiplayer Service
  • Purpose: Manages real-time player-vs-player game rooms via Socket.IO. Creates and joins rooms with unique 6-character codes. Maintains room state (players, YEN, status) in memory. Delegates game rule enforcement to Gamey via HTTP. Broadcasts game events to all room participants.

  • Provided interfaces:

    • REST: POST /rooms/create, POST /rooms/join, POST /rooms/state, POST /rooms/move, POST /rooms/leave, GET /rooms/:code, GET /health

    • Socket.IO events (server → client): connected, room_updated, game_started, game_updated, game_over, opponent_left

    • Socket.IO events (client → server): create_room, join_room, get_room_state, make_move, leave_room

  • Required interfaces: Gamey (HTTP)

  • State: In-memory RoomManager (lost on restart)

  • Location: multiplayer/src/multiplayer-service.js

BotAPI (Interoperability Service)
  • Purpose: Public interoperability API for external bots. Operates in two modes:

    • Server mode: external bots play against our bots via POST /games, GET /games/:id, POST /games/:id/play, GET /play

    • Client mode: our bots play against rival teams' APIs via POST /remote-games/create, POST /remote-games/connect, POST /remote-games/:id/play-turn

  • Provided interfaces: REST API at port 4001, exposed via Nginx at /interop/*

  • Required interfaces: Gamey (HTTP); Rival Team APIs (HTTP outbound, allowlisted hosts)

  • State: In-memory ActiveGamesStore and RemoteGameSessionsStore (lost on restart)

  • OpenAPI spec: botapi/src/openapi/openapi.yaml

  • Location: botapi/

MongoDB
  • Purpose: Persists all user data via the Users Service. Never accessed directly by other services.

  • Collections: users, gameresults, notifications

  • Location: Docker container managed externally; URI injected via MONGODB_URI env var

Prometheus + Grafana
  • Purpose: Metrics collection and visualization. Prometheus scrapes /metrics from gateway, users, and gamey every 15 seconds. Grafana provides the pre-built "Yovi Services Overview" dashboard with request rate, p95 latency, and error rate panels.

  • Location: users/monitoring/prometheus/, users/monitoring/grafana/

5.2. Level 2: Internal Structure

5.2.1. White Box: Webapp (webapp/)

Webapp Internal Structure

Key webapp building blocks:

| Block | Responsibility | Location |
|---|---|---|
| App.tsx + React Router | Route definitions; redirect to / for unknown paths | webapp/src/App.tsx |
| ThemeProvider | Dark/light theme context; data-theme attribute on <html>; localStorage persistence | webapp/src/ThemeProvider.tsx |
| I18nProvider | EN/ES translations via React context; {variable} interpolation; localStorage persistence | webapp/src/i18n/ |
| Navbar | Navigation links; notification bell with unread badge; profile link; theme/language toggles; logout | webapp/src/Navbar.tsx |
| Game (PvM) | Game board rendering; PvM move loop via POST /api/game/pvb/move; hint via POST /api/hint; local PvP mode | webapp/src/Game.tsx |
| MultiplayerLobby / MultiplayerGame | Room creation/join via REST; Socket.IO connection; real-time board sync; PvP move via Socket.IO make_move | webapp/src/MultiplayerLobby.tsx, webapp/src/MultiplayerGame.tsx |
| Social | User search with 400ms debounce; friend request buttons; profile navigation | webapp/src/Social.tsx |
| UserProfile | Editable profile fields; match history display | webapp/src/UserProfile.tsx |
| Statistics | Win/loss stats, game history with pagination | webapp/src/Statistics.tsx |
| AdminPanel | Admin-only page; user list; role toggle; delete history; delete account | webapp/src/AdminPanel.tsx |
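The 400ms debounce on the Social search can be sketched as a trailing-edge debounce: each keystroke restarts the timer, and only the last value is sent once typing pauses. The helper below is a minimal illustration under that assumption; it is not the code in Social.tsx, and the names debounce, search, and queries are made up for the example.

```typescript
// Minimal trailing-edge debounce. Illustrative only; the actual Social.tsx
// implementation is not shown in this document.
function debounce<A extends unknown[]>(fn: (...args: A) => void, delayMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const debounced = (...args: A): void => {
    if (timer !== undefined) clearTimeout(timer); // restart the wait on every call
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, delayMs);
  };

  // Lets a component drop a pending call, for example on unmount.
  debounced.cancel = (): void => {
    if (timer !== undefined) clearTimeout(timer);
    timer = undefined;
  };

  return debounced;
}

// Three quick keystrokes schedule a single search for the final value.
const queries: string[] = [];
const search = debounce((q: string) => { queries.push(q); }, 400);
search("a");
search("al");
search("ali");
```

Compared with throttling, this sends nothing while the user is still typing, which keeps the number of search requests low at the cost of a short delay after the last keystroke.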

5.2.2. White Box: Gamey (gamey/)

Gamey Internal Structure

Key gamey building blocks:

| Block | Responsibility | Location |
|---|---|---|
| HTTP Server (Axum) | Async HTTP server with Tokio runtime; Prometheus middleware; shared AppState via Arc<YBotRegistry> | gamey/src/game_server/mod.rs |
| GameY (core) | Board state; move application; available cell tracking; player position queries | gamey/src/core/game.rs |
| Coordinates | Barycentric (x,y,z) representation; from_index / to_index conversion; side-touch detection | gamey/src/core/coord.rs |
| PlayerSet (Union-Find) | Tracks connected components per player; touches_side_a/b/c flags; is_winning_configuration() | gamey/src/core/player_set.rs |
| YBot trait | name() → &str; choose_move(&GameY) → Option<Coordinates>; Send + Sync for async safety | gamey/src/bot/ybot.rs |
| YBotRegistry | HashMap<String, Arc<dyn YBot>>; find(name), names(), with_bot() builder; default includes all six | gamey/src/bot/ybot_registry.rs |
| YEN notation | TryFrom<YEN> for GameY; From<&GameY> for YEN; layout string parsing and serialization | gamey/src/notation/ |
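The Union-Find approach behind PlayerSet can be sketched as follows. This is a TypeScript illustration of the technique, not a port of player_set.rs: it assumes (row, col) addressing where row r holds r + 1 cells, whereas the Rust code uses barycentric coordinates, and the side naming (A = left edge, B = right edge, C = bottom row) is an assumption.

```typescript
// Side flags stored per component root (bitmask).
const SIDE_A = 1; // left edge: col === 0
const SIDE_B = 2; // right edge: col === row
const SIDE_C = 4; // bottom row: row === size - 1

class PlayerSet {
  private readonly parent: number[];
  private readonly sides: number[];
  private readonly placed: boolean[];

  constructor(private readonly size: number) {
    const cells = (size * (size + 1)) / 2;
    this.parent = Array.from({ length: cells }, (_, i) => i);
    this.sides = new Array<number>(cells).fill(0);
    this.placed = new Array<boolean>(cells).fill(false);
  }

  private indexOf(row: number, col: number): number {
    return (row * (row + 1)) / 2 + col;
  }

  private find(i: number): number {
    while (this.parent[i] !== i) {
      this.parent[i] = this.parent[this.parent[i]]; // path halving
      i = this.parent[i];
    }
    return i;
  }

  private union(a: number, b: number): number {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra === rb) return ra;
    this.parent[rb] = ra;
    this.sides[ra] |= this.sides[rb]; // merged component touches both sets of sides
    return ra;
  }

  // Places a piece for this player and reports whether its connected
  // component now touches all three sides.
  place(row: number, col: number): boolean {
    const i = this.indexOf(row, col);
    this.placed[i] = true;
    this.sides[i] =
      (col === 0 ? SIDE_A : 0) | (col === row ? SIDE_B : 0) | (row === this.size - 1 ? SIDE_C : 0);
    const neighbours: Array<[number, number]> = [
      [row, col - 1], [row, col + 1],
      [row - 1, col - 1], [row - 1, col],
      [row + 1, col], [row + 1, col + 1],
    ];
    let root = i;
    for (const [r, c] of neighbours) {
      if (r < 0 || r >= this.size || c < 0 || c > r) continue; // off the board
      const j = this.indexOf(r, c);
      if (this.placed[j]) root = this.union(root, j);
    }
    return this.sides[this.find(root)] === (SIDE_A | SIDE_B | SIDE_C);
  }
}
```

Because the side flags are merged whenever two components are joined, the win check after each move is a single lookup at the component root instead of a board-wide search.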

5.2.3. White Box: Users Service (users/)

Users Service Internal Structure

Key users service building blocks:

| Block | Responsibility | Location |
|---|---|---|
| User model | Schema with username (unique), email (sparse), password (bcrypt hash), friends[], friendRequests[], realName, bio, location, preferredLanguage, role (user/admin), createdAt; unique index on username | users/models/User.js |
| GameResult model | Stores match outcomes: username, opponent, result (win/loss), winner, score, boardSize, gameMode (pvb/pvp), date; index on username | users/models/GameResult.js |
| Notification model | recipient, type (friend_request/welcome), from (null for system notifications), read (bool); compound index on (recipient, createdAt DESC) for efficient queries | users/models/Notification.js |
| Friend routes | Request (creates notification), accept (bidirectional add), reject (remove from friendRequests), unfriend (bidirectional $pull) | users/users-service.js (friends section) |
| Admin routes | GET /admin/users (paginated), PATCH /admin/users/:username/role, DELETE /admin/users/:username, DELETE /admin/users/:username/history; cascades deletes to GameResult and Notification | users/users-service.js (admin section) |
| Ranking | MongoDB aggregation: $match {result: win} → $group {_id: username, wins: $sum 1} → $sort → $limit 10 → $project | users/users-service.js (GET /ranking) |
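To make the ranking stages concrete, here is an in-memory TypeScript equivalent of the match, group, sort, limit, and project steps. It is a sketch for illustration, not the Mongoose query: the descending sort by wins and the alphabetical tie-break are assumptions, since this document only names the stages.

```typescript
interface GameResult {
  username: string;
  result: "win" | "loss";
}

interface RankingEntry {
  username: string;
  wins: number;
}

// In-memory equivalent of the aggregation stages (hypothetical helper).
function ranking(results: readonly GameResult[], limit = 10): RankingEntry[] {
  const wins = new Map<string, number>();
  for (const r of results) {
    if (r.result !== "win") continue; // $match {result: win}
    wins.set(r.username, (wins.get(r.username) ?? 0) + 1); // $group with $sum 1
  }
  return Array.from(wins, ([username, count]) => ({ username, wins: count })) // $project
    .sort((a, b) => b.wins - a.wins || a.username.localeCompare(b.username)) // $sort (assumed descending)
    .slice(0, limit); // $limit 10
}
```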

5.2.4. White Box: Multiplayer Service (multiplayer/)

Multiplayer Service Internal Structure

Key multiplayer building blocks:

| Block | Responsibility | Location |
|---|---|---|
| RoomManager | Creates rooms with unique codes (6-char alphanumeric); tracks player sockets; manages turn order via getCurrentTurnColor(); updates YEN after moves; handles disconnect cleanup | multiplayer/src/rooms.js |
| GameyClient | HTTP client wrapping POST /game/new and POST /v1/game/pvp/move calls to Gamey; 5-second timeout | multiplayer/src/gamey-client.js |
| RoomCodeGenerator | Generates unique 6-character codes from a custom alphabet (no O/0, I/1 ambiguity); retries up to 1000 times to guarantee uniqueness against existing codes | multiplayer/src/codes.js |
| Socket.IO events (server-emitted) | room_updated (player joined/left), game_started (both players connected), game_updated (after move), game_over (game finished), opponent_left (disconnect) | multiplayer/src/multiplayer-service.js |
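The room code generation described for RoomCodeGenerator can be sketched like this. The exact alphabet used in codes.js is not given in this document, so the one below (A-Z without I and O, digits 2-9) is an assumption, as are the function names; the random source is injectable only to keep the sketch testable.

```typescript
// Assumed alphabet: no O/0 and no I/1, per the description above.
const ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";
const CODE_LENGTH = 6;
const MAX_ATTEMPTS = 1000;

function randomCode(random: () => number = Math.random): string {
  let code = "";
  for (let i = 0; i < CODE_LENGTH; i++) {
    code += ALPHABET.charAt(Math.floor(random() * ALPHABET.length));
  }
  return code;
}

// Retries until the code is not already in use, giving up after MAX_ATTEMPTS.
function generateUniqueCode(
  existing: ReadonlySet<string>,
  random: () => number = Math.random,
): string {
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    const code = randomCode(random);
    if (!existing.has(code)) return code;
  }
  throw new Error(`no free room code after ${MAX_ATTEMPTS} attempts`);
}
```

With 32 symbols and 6 positions there are 32^6 (about 1.07 billion) possible codes, so collisions are rare and the retry loop normally exits on the first attempt.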

5.3. Interface Overview

| Interface | Provider | Consumer | Protocol | Purpose |
|---|---|---|---|---|
| Public HTTPS | Nginx | Browser / Bot | HTTPS | Single external entry point with TLS termination |
| WebSocket Upgrade | Nginx → Multiplayer | Browser | WSS (Socket.IO) | Real-time PvP game events |
| Auth API | Authentication | Gateway | HTTP (internal) | Register, login, verify JWT |
| User Data API | Users | Auth, Gateway | HTTP (internal) | User CRUD, friends, notifications, admin ops |
| Game API | Gamey | Gateway, Multiplayer, BotAPI | HTTP (internal) | Game logic, bot moves, win detection |
| Interop API | BotAPI | External Bots | HTTPS → HTTP | Public bot integration with YEN |
| Remote Interop | BotAPI (client) | Rival Team APIs | HTTP (outbound) | Cross-team bot-vs-bot via YEN contract |
| Database | MongoDB | Users | MongoDB Wire | Data persistence |
| Metrics | Gateway, Users, Gamey (/metrics) | Prometheus | HTTP (scrape) | Observability data collection |

6. Runtime View

This section describes the dynamic behavior of the YOVI system through a representative selection of architecturally significant scenarios. These scenarios demonstrate how the building blocks defined in Section 5 collaborate at runtime to deliver the required functionality.

6.1. Runtime Scenario 1: User Registration

A new user creates an account. The authentication service validates credentials, hashes the password with bcrypt, delegates user creation to the users service, and returns a JWT token with role claims.

User Registration Sequence

6.2. Runtime Scenario 2: User Login and Session Verification

A returning user logs in. The auth service fetches the stored bcrypt hash from the users service, compares it, and issues a JWT. The home page verifies the token on every load.

Login and Verification Sequence

6.3. Runtime Scenario 3: Player vs Machine Game (PvM)

A complete game session between a human player and the AI bot. Demonstrates the game loop, YEN notation exchange, and result persistence.

PvM Game Sequence

6.4. Runtime Scenario 4: Real-Time Multiplayer (PvP via Socket.IO)

Two human players compete in real time. Demonstrates room creation, Socket.IO event flow, and move synchronization between clients.

PvP Multiplayer Sequence

6.5. Runtime Scenario 5: Friend Request and Notification Flow

A user searches for another user, sends a friend request, and the recipient receives and accepts it.

Friend Request Sequence

6.6. Runtime Scenario 6: Admin Panel — Delete User Account

An administrator uses the admin panel to delete another user’s account. Demonstrates role enforcement and cascading data deletion.

Admin Delete User Sequence

6.7. Runtime Scenario 7: External Bot Using the Interoperability API

An external bot creates a game session against one of our bots, plays moves, and receives the updated state.

BotAPI External Game Sequence

6.8. Runtime Scenario 8: Error and Failure Handling

6.8.1. Scenario 8.1: Game Engine Unavailable

Game Engine Down

6.8.2. Scenario 8.2: Opponent Disconnects During Multiplayer

Opponent Disconnect

6.8.3. Scenario 8.3: Invalid or Expired JWT

Expired Token

6.8.4. Scenario 8.4: Non-Admin Accessing Admin Panel

Unauthorized Admin Access

6.9. Summary of Runtime Scenarios

| Scenario | Description | Building Blocks | Quality Goals Demonstrated |
|---|---|---|---|
| 1: Registration | New user creates account with bcrypt hashing and JWT generation | Webapp, Gateway, Auth, Users, MongoDB | Security, Functionality |
| 2: Login + Verify | Returning user authenticates; session verified on every page load | Webapp, Gateway, Auth, Users | Security, Usability |
| 3: PvM Game | Complete game loop against AI bot with result persistence | Webapp, Gateway, Gamey, Users, MongoDB | Functionality, Performance |
| 4: PvP Multiplayer | Real-time room creation, join, and move synchronization via Socket.IO | Webapp, Gateway, Multiplayer, Gamey | Functionality, Reliability |
| 5: Friend Request | User search, friend request, notification, and acceptance | Webapp, Gateway, Users, MongoDB | Functionality, Usability |
| 6: Admin Delete | Admin removes account with cascading data deletion and role enforcement | Webapp, Gateway, Users, MongoDB | Security, Functionality |
| 7: External Bot (BotAPI) | Bot creates session, plays moves, queries state via interop API | BotAPI, Gamey | Interoperability, Reliability |
| 8: Error Handling | Service down, opponent disconnect, token expiry, unauthorized admin access | All services | Reliability, Security, Availability |

7. Deployment View

This section describes the technical infrastructure that executes the YOVI system and the mapping of building blocks to infrastructure elements. The application is built on Docker containerization running on a cloud virtual machine, combining the portability of containers with the security and isolation of the VM.

Two Docker Compose configurations are maintained:

  • docker-compose.yml — development/CI build: builds images from local source code and tags them for publication to GitHub Container Registry (GHCR).

  • docker-compose.deploy.yml — production deployment: pulls pre-built images from GHCR; no build step.

7.1. Level 1: Production Deployment Overview

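All services run as Docker containers on a single cloud VM, orchestrated with docker-compose.deploy.yml. All containers share the monitor-net bridge network. Nginx is the only application container that publishes ports (80 and 443); Prometheus (9090) and Grafana (9091) also publish monitoring ports, which can be restricted by firewall in production.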

Production Deployment Level 1

7.2. Level 2: Container Detail

This table lists every container in the production deployment, its image, internal port, dependencies, and key environment variables.

| Container | Image | Exposes | Depends On | Key Env Vars |
|---|---|---|---|---|
| nginx | nginx:stable-alpine | 80, 443 (public) | webapp, gateway, botapi, multiplayer | |
| webapp | ghcr.io/arquisoft/yovi_en2c-webapp:latest | 80 (internal) | gateway | VITE_API_URL=/api (build arg) |
| gateway | ghcr.io/arquisoft/yovi_en2c-gateway:latest | 8080 (internal) | authentication, gamey, users, multiplayer | AUTH_BASE_URL, GAMEY_BASE_URL, MULTIPLAYER_BASE_URL, USERS_BASE_URL, PORT=8080 |
| authentication | ghcr.io/arquisoft/yovi_en2c-authentication:latest | 5000 (internal) | users | JWT_SECRET, JWT_EXPIRES, USERS_SERVICE_URL=http://users:3000, PORT=5000 |
| users | ghcr.io/arquisoft/yovi_en2c-users:latest | 3000 (internal) | MongoDB | MONGODB_URI, PORT=3000 |
| gamey | ghcr.io/arquisoft/yovi_en2c-gamey:latest | 4000 (internal) | | |
| multiplayer | ghcr.io/arquisoft/yovi_en2c-multiplayer:latest | 7000 (internal) | gamey | GAMEY_BASE_URL=http://gamey:4000, PORT=7000 |
| botapi | ghcr.io/arquisoft/yovi_en2c-botapi:latest | 4001 (internal) | gamey | GAMEY_BASE_URL=http://gamey:4000, GAMEY_API_VERSION=v1, PORT=4001 |
| prometheus | prom/prometheus | 9090:9090 | | Config: users/monitoring/prometheus/prometheus.yml |
| grafana | grafana/grafana | 9091:3000 | prometheus | Provisioning: users/monitoring/grafana/provisioning/ |

7.3. Level 3: Nginx Routing Configuration

Nginx is the only container with publicly exposed ports. It routes all inbound traffic to the appropriate internal container based on the request path.

Nginx Routing Detail

Key Nginx configuration decisions:

  • HTTPS only — port 80 redirects permanently (301) to port 443.

  • TLS certificates — mounted read-only from nginx/certs/ (Let’s Encrypt format).

  • WebSocket upgrade — the /socket.io/ location sets Upgrade and Connection headers to allow Socket.IO to negotiate the WebSocket protocol through the proxy.

  • SPA fallback — the / location serves index.html for any unmatched path, enabling client-side routing in the React application.

  • Security header: x-powered-by is disabled on all Node.js services.

7.4. Level 3: Prometheus Scrape Configuration

Prometheus scrapes metrics from three services on the Docker internal network every 15 seconds.

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'users-service'
    static_configs:
      - targets: ['users:3000']

  - job_name: 'gateway-service'
    static_configs:
      - targets: ['gateway:8080']

  - job_name: 'gamey-service'
    static_configs:
      - targets: ['gamey:4000']

The Grafana dashboard ("Yovi Services Overview", UID yovi-overview) is provisioned automatically at startup via the users/monitoring/grafana/provisioning/ directory. It includes three panels:

  • Request Rate (req/s): rate(http_requests_total[1m]) per service, method, path, and status code

  • P95 Request Duration (s): histogram_quantile(0.95, …) per service

  • Error Rate (4xx+5xx): rate(http_requests_total{status_code=~"4..|5.."}[1m]) per service

7.5. Development vs Production Differences

| Aspect | Development (docker-compose.yml) | Production (docker-compose.deploy.yml) |
|---|---|---|
| Image source | Built from local source (build: ./service) | Pulled from GHCR (image: ghcr.io/arquisoft/…) |
| Build args | VITE_API_URL=/api passed to webapp build | Not needed (image already built) |
| Volumes | Source code not mounted (clean build) | Config files mounted read-only (nginx/conf.d, nginx/certs) |
| Secrets | .env file at project root | Environment variables injected by CI/CD on the VM |
| Monitoring ports | Prometheus 9090, Grafana 9091 accessible locally | Same ports; restrict via firewall in production if needed |
| MongoDB | Local container or Atlas URI in .env | Atlas URI in production environment variable |

7.6. CI/CD and Deployment Pipeline

The GitHub Actions workflow (release-deploy.yml) automates the full pipeline on every release tag:

CD Pipeline

7.7. Mapping of Building Blocks to Infrastructure

| Building Block | Container | Network | Notes |
|---|---|---|---|
| Nginx Reverse Proxy | nginx | monitor-net + public | Only container with external ports (80, 443) |
| Webapp (React SPA) | webapp | monitor-net (internal) | Served as static files; built with Vite |
| API Gateway | gateway | monitor-net (internal) | Prometheus metrics at /metrics |
| Authentication Service | authentication | monitor-net (internal) | Stateless; no persistent storage |
| Users Service | users | monitor-net (internal) | Prometheus metrics; Swagger UI at /api-docs |
| Game Engine | gamey | monitor-net (internal) | Stateless Rust binary; Prometheus via axum-prometheus |
| Multiplayer Service | multiplayer | monitor-net (internal) | In-memory room state; lost on container restart |
| Bot API | botapi | monitor-net (internal) | In-memory session state; lost on container restart |
| MongoDB | External (Atlas URI) | n/a (external) | URI injected via MONGODB_URI env var |
| Prometheus | prometheus | monitor-net | Scrapes gateway, users, gamey every 15s |
| Grafana | grafana | monitor-net | Dashboard auto-provisioned; port 9091 external |

7.8. Quality and Operational Considerations

| Concern | Approach |
|---|---|
| Isolation | All containers share a single monitor-net bridge network. Only Nginx has external port bindings. Internal services are not reachable from outside the VM. |
| Portability | All services are containerized with multi-stage Dockerfiles (where applicable). Images are published to GHCR and pulled on the target VM, so no build tools are required in production. |
| Stateless services | Gateway, authentication, and gamey are fully stateless. Multiplayer and botapi hold in-memory state (rooms and sessions respectively), which is lost on container restart; this is acceptable for the current scope. |
| Data durability | MongoDB Atlas provides managed persistence with automatic backups. The MONGODB_URI env var is the only configuration needed; no data volume is managed by Docker Compose. |
| Secret management | JWT_SECRET and MONGODB_URI are stored as GitHub Secrets and injected as environment variables during the deploy step. They are never committed to the repository. |
| Restart policy | Docker Compose restart: unless-stopped (implicit in deploy config) ensures containers are automatically restarted after crashes or VM reboots. |
| Monitoring | Prometheus + Grafana are always-on in production. The "Yovi Services Overview" dashboard shows request rate, p95 latency, and error rate for gateway, users, and gamey in real time with a 30-second refresh. |
| SSL certificates | TLS certificates are mounted from nginx/certs/ (Let’s Encrypt format: fullchain.pem + privkey.pem). Certificate renewal is managed externally and requires an Nginx reload. |

8. Cross-cutting Concepts

This section describes concepts, patterns, and approaches that span multiple building blocks and apply system-wide. These are the architectural decisions that cut across service boundaries and define how YOVI behaves as a whole rather than within any single component.

8.1. 1. Game State Representation — YEN Notation

All game state across the system is represented using YEN (Y Exchange Notation), a JSON format inspired by chess FEN notation. This is mandated by the client and forms the backbone of all service-to-service and external communication involving game state.

{
  "size": 5,
  "turn": 0,
  "players": ["B", "R"],
  "layout": "B/BR/.R./..../....."
}

Fields:

  • size — board edge length (e.g., 7 means 28 total cells)

  • turn — index into players array indicating whose turn it is (0 or 1)

  • players — array of token characters; "B" (Blue) moves first, "R" (Red) second

  • layout — rows separated by /; each row is one character per cell (. = empty, B or R = occupied); row 0 has 1 cell, row 1 has 2, …​, row N-1 has N cells
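The field rules above translate directly into a small validator. The sketch below is a hypothetical TypeScript helper (parseYenLayout is not Gamey's parser, which is written in Rust); it checks the row count, the length of each row, and the cell characters, and returns the board as rows of single-character cells.

```typescript
interface Yen {
  size: number;
  turn: number;
  players: string[];
  layout: string;
}

// Hypothetical helper: validates a YEN layout and splits it into cells.
function parseYenLayout(yen: Yen): string[][] {
  const rows = yen.layout.split("/");
  if (rows.length !== yen.size) {
    throw new Error(`expected ${yen.size} rows, got ${rows.length}`);
  }
  const allowed = new Set([".", ...yen.players]);
  return rows.map((row, r) => {
    // Row r (0-based) must hold exactly r + 1 cells.
    if (row.length !== r + 1) throw new Error(`row ${r} must have ${r + 1} cells`);
    const cells = row.split("");
    for (const cell of cells) {
      if (!allowed.has(cell)) throw new Error(`invalid cell "${cell}" in row ${r}`);
    }
    return cells;
  });
}

// The example document from this section: a size-5 board has 15 cells.
const board = parseYenLayout({
  size: 5,
  turn: 0,
  players: ["B", "R"],
  layout: "B/BR/.R./..../.....",
});
const cellCount = board.reduce((n, row) => n + row.length, 0);
```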

Usage across the system:

Component | How YEN is used
Gamey | Parses YEN via GameY::try_from(yen) to reconstruct board state; serializes via From<&GameY> for YEN; validates row count, row lengths, and cell characters
Gateway | Passes YEN through transparently without parsing; forwards the raw JSON body between webapp and Gamey
Multiplayer | Stores current YEN per room in RoomManager; sends updated YEN in all Socket.IO game events
BotAPI | Validates YEN via assertValidYen(); detects moves via detectSingleAddedMove(); applies moves via applyMoveToYen(); forwards to Gamey for rule enforcement
Webapp | Receives YEN from gateway; renders board from layout string; sends YEN in move requests
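To illustrate the kind of layout comparison a helper like detectSingleAddedMove() has to perform, here is a hedged TypeScript sketch. It is an assumption about the behaviour, not the BotAPI source: the name detectSingleAddedMoveSketch, the AddedMove shape, and the choice to return null on any ambiguity are all hypothetical.

```typescript
interface AddedMove {
  row: number;
  col: number;
  token: string;
}

// Compares two layout strings and returns the single cell that changed
// from "." to a token. Returns null when the shapes differ, when an
// occupied cell was altered, or when zero or several cells changed.
function detectSingleAddedMoveSketch(before: string, after: string): AddedMove | null {
  const a = before.split("/");
  const b = after.split("/");
  if (a.length !== b.length) return null;
  let found: AddedMove | null = null;
  for (let row = 0; row < a.length; row++) {
    const ra = a[row] ?? "";
    const rb = b[row] ?? "";
    if (ra.length !== rb.length) return null;
    for (let col = 0; col < ra.length; col++) {
      const prev = ra.charAt(col);
      const next = rb.charAt(col);
      if (prev === next) continue;
      // Only an empty cell may change, and only once.
      if (prev !== "." || found !== null) return null;
      found = { row, col, token: next };
    }
  }
  return found;
}

const detected = detectSingleAddedMoveSketch("B/BR/.R./..../.....", "B/BR/.R./.B../.....");
```

Returning null instead of throwing keeps the sketch small; a real service would more likely map each failure case to a distinct 400 response.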

8.2. Authentication and Authorization

8.2.1. JWT-based Authentication

All user-facing endpoints that require identity are protected by JWT tokens issued by the authentication service. The token lifecycle is:

  1. User logs in → Auth Service verifies bcrypt hash → issues signed JWT

  2. Webapp stores token in localStorage

  3. Every protected request includes Authorization: Bearer <token>

  4. Gateway forwards token to Auth Service for verification (GET /verify)

  5. Auth Service returns decoded claims; gateway proxies the response

JWT payload structure:

{
  "id": "64f1a2b3c4d5e6f7a8b9c0d1",
  "username": "alice",
  "email": "alice@example.com",
  "role": "user",
  "iat": 1712345678,
  "exp": 1712432078
}

The role field is set to "user" by default and "admin" for privileged accounts. It is embedded directly in the token to avoid an additional database round-trip on every admin route check.

8.2.2. Role-Based Access Control (RBAC)

Two roles are defined:

Role | Capabilities
user (default) | Play games (PvM and PvP), manage own profile, send/accept/reject friend requests, view own statistics and match history, receive and mark notifications, search for other users
admin | All user capabilities, plus: view all registered accounts (paginated), grant or revoke admin role on any account, delete any user’s match history, permanently delete any user account

RBAC is enforced at two layers:

  • Client-side guard (webapp): Admin pages check the JWT role claim before rendering; non-admin users are redirected to /home immediately.

  • Server-side enforcement (gateway): Admin API routes verify the JWT role claim on every request. A valid token with role !== "admin" receives 403 Forbidden. This prevents bypassing the client guard via direct API calls.
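The server-side rule can be summarised as a small decision function. This is a sketch under assumptions: the Claims shape follows the JWT payload shown earlier, the claims are taken to be already verified (null stands for a missing or invalid token), and adminRouteStatus is a hypothetical name rather than the gateway's middleware.

```typescript
type Role = "user" | "admin";

interface Claims {
  id: string;
  username: string;
  role: Role;
  exp: number; // expiry as seconds since the Unix epoch
}

// Maps the outcome of an admin-route check to an HTTP status code.
// null claims model a missing or unverifiable token.
function adminRouteStatus(claims: Claims | null, nowSeconds: number): 200 | 401 | 403 {
  if (claims === null || claims.exp <= nowSeconds) return 401;
  if (claims.role !== "admin") return 403;
  return 200;
}

const aliceClaims: Claims = { id: "64f1a2b3c4d5e6f7a8b9c0d1", username: "alice", role: "user", exp: 1712432078 };
const aliceStatus = adminRouteStatus(aliceClaims, 1712345678);
```

Here aliceStatus is 403: the token is valid and unexpired, but the role claim is "user".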

8.2.3. Password Security

Passwords are hashed with bcrypt (cost factor 10) in the users service at account creation (POST /createuser). The authentication service fetches the stored hash and uses bcrypt.compare() during login. Plain-text passwords are never stored or logged.

8.3. Real-Time Communication: Socket.IO

The multiplayer service uses Socket.IO for bidirectional real-time communication between clients and the server. Socket.IO is layered over WebSockets with automatic fallback to HTTP long-polling.

8.3.1. Transport Negotiation

Nginx proxies /socket.io/* with WebSocket upgrade headers:

location /socket.io/ {
    proxy_pass http://multiplayer:7000;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}

8.3.2. Room Lifecycle

[Figure: Socket.IO room lifecycle diagram]

8.3.3. Event Catalogue

Direction | Event | Payload | When
Client → Server | create_room | {username, size} | Host creates a new room
Client → Server | join_room | {code, username} | Guest joins by room code
Client → Server | make_move | {code, row, col} | Player places a piece
Client → Server | leave_room | {code} | Player explicitly leaves
Client → Server | get_room_state | {code} | Reconnecting client requests current state
Server → Client | connected | {socketId, message} | Immediately after connection established
Server → Client | room_updated | {room} | Any room state change (player joined/left)
Server → Client | game_started | {room, message} | Both players are connected
Server → Client | game_updated | {room, finished, winner, winningEdges} | After every valid move
Server → Client | game_over | {room, finished, winner, winningEdges} | When a player connects all three sides
Server → Client | opponent_left | {room, removedColor} | When a player disconnects or leaves

8.3.4. Room State (In-Memory)

Room state is held exclusively in the RoomManager instance within the multiplayer service. It is not persisted to any database. If the multiplayer container restarts, all active rooms are lost. This is an accepted trade-off for the current scope — active multiplayer sessions are short-lived and can be restarted by the players.
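A Map-backed store makes this trade-off concrete. The sketch below is illustrative only: RoomManagerSketch, RoomSketch, and their methods are hypothetical names, room codes are passed in rather than generated, and only turn order is checked (move validation is out of scope here).

```typescript
interface RoomSketch {
  code: string;
  size: number;
  players: string[]; // usernames; index 0 is the host
  turn: number;      // index into players of whoever moves next
}

class RoomManagerSketch {
  // All state lives in this Map, so it disappears with the process.
  private rooms = new Map<string, RoomSketch>();

  createRoom(code: string, username: string, size: number): RoomSketch {
    if (this.rooms.has(code)) throw new Error("room code already in use");
    const room: RoomSketch = { code, size, players: [username], turn: 0 };
    this.rooms.set(code, room);
    return room;
  }

  joinRoom(code: string, username: string): RoomSketch {
    const room = this.getRoom(code);
    if (room.players.length >= 2) throw new Error("room is full");
    room.players.push(username);
    return room;
  }

  // Checks turn order only, then hands the turn to the other player.
  recordMove(code: string, username: string): RoomSketch {
    const room = this.getRoom(code);
    if (room.players.length < 2) throw new Error("game has not started");
    if (room.players[room.turn] !== username) throw new Error("not your turn");
    room.turn = 1 - room.turn;
    return room;
  }

  leaveRoom(code: string): boolean {
    return this.rooms.delete(code);
  }

  hasRoom(code: string): boolean {
    return this.rooms.has(code);
  }

  private getRoom(code: string): RoomSketch {
    const room = this.rooms.get(code);
    if (room === undefined) throw new Error("room not found");
    return room;
  }
}
```

Creating a second RoomManagerSketch instance models a container restart: it starts with no rooms, so clients must create a new room.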

8.4. Internationalization (i18n)

The webapp supports English (EN) and Spanish (ES) via a custom React Context implementation with zero external dependencies.

Implementation:

  • All UI strings are externalized to a translations.ts dictionary keyed by dot-notation paths (e.g., "home.welcome", "login.error.password")

  • I18nProvider wraps the app and exposes t(key, vars?) via the useI18n() hook

  • Variable interpolation: t("home.welcome", { username }) → "Hello alice" (replaces {username})

  • Language preference is persisted in localStorage under the key "lang"

  • Fallback: if a key is missing in the selected language, the other language is tried; if still missing, the key itself is returned
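The lookup, fallback, and interpolation rules in this list fit in a few lines. The following TypeScript sketch assumes a tiny hypothetical dictionary and a standalone translate() function; the real I18nProvider exposes t() through a React hook, which is omitted here.

```typescript
type Lang = "en" | "es";
type Dict = { [key: string]: string | undefined };

// Hypothetical dictionary contents, for illustration only.
const translations: Record<Lang, Dict> = {
  en: { "home.welcome": "Hello {username}", "login.title": "Sign in" },
  es: { "home.welcome": "Hola {username}" },
};

// Looks up a key in the selected language, then in the other language,
// and finally falls back to the key itself. {name} placeholders are
// replaced from vars; unknown placeholders are left untouched.
function translate(lang: Lang, key: string, vars: { [name: string]: string | undefined } = {}): string {
  const other: Lang = lang === "en" ? "es" : "en";
  const template = translations[lang][key] ?? translations[other][key] ?? key;
  return template.replace(/\{(\w+)\}/g, (match: string, name: string) => {
    const value = vars[name];
    return value === undefined ? match : value;
  });
}

const greeting = translate("en", "home.welcome", { username: "alice" });
```

With this dictionary, translate("es", "login.title") falls back to the English string because the Spanish entry is missing.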

Language toggle is available on every page (login, register, and in the navbar for authenticated users).

8.5. Theming (Dark / Light Mode)

The webapp supports dark (default) and light themes via a custom ThemeProvider React context.

Implementation:

  • ThemeProvider sets the data-theme attribute on document.documentElement

  • CSS variables are defined per theme in App.css under [data-theme="dark"] and [data-theme="light"]

  • Theme preference is persisted in localStorage under the key "theme"

  • useTheme() hook exposes theme and toggleTheme() to any component

8.6. Observability: Metrics and Monitoring

Three services expose Prometheus-compatible metrics endpoints:

Service | Metrics Library | Endpoint
Gateway | express-prom-bundle (Node.js) | GET /metrics
Users | express-prom-bundle (Node.js) | GET /metrics
Gamey | axum-prometheus (Rust) | GET /metrics

Standard metrics collected per service:

  • http_requests_total — labelled by method, path (normalized), and status code

  • http_request_duration_seconds — histogram for p50/p95/p99 latency

  • For Gamey: axum_http_requests_total and axum_http_requests_duration_seconds

The Grafana dashboard "Yovi Services Overview" (UID yovi-overview) auto-provisions with three panels covering all three services simultaneously, refreshing every 30 seconds.

8.7. Error Handling Strategy

A consistent error response format is used across the Node.js services, with the BotAPI as the one exception:

{
  "ok": false,
  "error": "Human-readable error message",
  "details": {}
}

For the BotAPI (TypeScript), the format follows the OpenAPI spec:

{
  "code": "NOT_FOUND",
  "message": "game abc123 not found"
}

HTTP status codes are used consistently:

Status | When | Example
400 Bad Request | Invalid input format or business rule violation | Passwords do not match; malformed YEN; occupied cell
401 Unauthorized | Missing or invalid JWT token | No Authorization header; expired token
403 Forbidden | Valid token but insufficient role | Regular user accessing admin endpoint
404 Not Found | Resource does not exist | Username not found; room not found; game not found
409 Conflict | Duplicate resource | Username already exists; friend request already sent
500 Internal Server Error | Unexpected server-side error | Unhandled exception in service logic
502 Bad Gateway | Internal service unreachable | Gamey container down; auth service unreachable
503 Service Unavailable | Service temporarily down | Container restarting

The gateway uses a centralized forwardAxiosError() function that propagates the status code and error message from internal services to the client, or returns 502 if the service is completely unreachable.
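The propagation rule can be sketched as a pure function. This is not the gateway's forwardAxiosError(): toClientError, UpstreamError, and the fallback messages are hypothetical, and the error type is only a structural subset of what an Axios-style error carries (a response object when the upstream answered, none when it could not be reached).

```typescript
// Structural subset of an Axios-style error; only the fields read here.
interface UpstreamError {
  message: string;
  response?: { status: number; data?: { error?: string } };
}

interface ClientError {
  status: number;
  body: { ok: false; error: string };
}

// Propagates the upstream status and message when the service answered,
// and falls back to 502 when there was no response at all.
function toClientError(err: UpstreamError): ClientError {
  if (err.response !== undefined) {
    const message = err.response.data?.error ?? "Upstream service error";
    return { status: err.response.status, body: { ok: false, error: message } };
  }
  return { status: 502, body: { ok: false, error: "Service unreachable" } };
}

const notFound = toClientError({
  message: "Request failed with status code 404",
  response: { status: 404, data: { error: "Username not found" } },
});
const unreachable = toClientError({ message: "connect ECONNREFUSED" });
```

Keeping the mapping free of framework types makes it easy to unit-test without starting the gateway or mocking an HTTP client.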

8.8. Testing Strategy

Test Type | Tool | Location | Scope
Unit tests | Vitest (gateway, auth, botapi), Jest (users, multiplayer), cargo test (gamey) | __tests__/, *.test.ts, *_test.rs | Individual functions, modules, and Mongoose models
Integration tests | Supertest (Node.js), tower::ServiceExt::oneshot (Rust) | __tests__/, gamey/src/game_server/ | HTTP endpoints with in-memory or test databases
Property-based tests | proptest (Rust) | gamey/src/core/coord.rs | Coordinate roundtrip invariants, barycentric sum invariant
End-to-end tests | Playwright | webapp/e2e/ | Complete user journeys in a real browser
Load tests | k6 | tests/load/ | Registration (50 VUs), login (50 VUs), game start (20 VUs)

All tests run in CI via GitHub Actions on every pull request. SonarCloud enforces a minimum coverage of 80% across all services as a quality gate.

Special testing considerations:

  • Users service: Uses mongodb-memory-server for isolated in-memory MongoDB during tests; the vitest.globalSetup.js drops the _test database after all suites complete.

  • Gamey: Integration tests use tower::ServiceExt::oneshot to call handlers without binding to a network port, keeping tests fast and isolated.

  • BotAPI: HTTP clients (gameyClient, remoteInteropClient) are mocked with vi.mock() so no real network calls are made during unit tests.

  • Multiplayer: Socket.IO behavior is tested via socket.io-client connected to a real in-process server instance.

8.9. Security Concepts

Measure | Implementation
Transport encryption | HTTPS with TLS via Nginx for all external traffic. Internal container-to-container communication uses plain HTTP on the isolated Docker bridge network.
Password hashing | bcrypt with cost factor 10 in the users service. Hashed passwords are returned only to the auth service for comparison and are never included in any public API response.
JWT signing | HS256 algorithm with a strong secret (JWT_SECRET from environment). Tokens expire after JWT_EXPIRES (default 24h). The secret is stored in GitHub Secrets and never committed.
Role enforcement | Admin role is encoded in the JWT payload. Both client-side (webapp route guard) and server-side (gateway middleware) checks are applied, so a compromised client cannot bypass authorization.
Input sanitization | The gateway validates and sanitizes all path parameters (username: alphanumeric + underscore + hyphen, max 60 chars; notification ID: MongoDB ObjectId regex; room code: uppercase alphanumeric). Invalid inputs return 400 before reaching internal services.
Authorization header sanitization | The gateway extracts Bearer tokens using a strict regex (/^Bearer\s+([A-Za-z0-9\-._~+/]+=*)$/) and rejects malformed headers.
Bot API host allowlist | The remoteInteropClient in botapi validates the base_url against a hardcoded set of allowed hosts before making outbound HTTP requests, preventing SSRF attacks.
x-powered-by disabled | All Node.js services call app.disable("x-powered-by") to avoid leaking the framework version in response headers.
Secrets management | JWT_SECRET and MONGODB_URI are stored as GitHub Secrets and injected as environment variables at deploy time. .env files are gitignored. SonarCloud suppression comments (// NOSONAR) are used only for translation key strings containing the word "password", not for actual credentials.
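The input and header checks described in this section can be expressed as a handful of patterns. The sketch below is an approximation built from the prose, not the gateway's code: the constant names and extractBearerToken are hypothetical, the room code length is left open because the text does not state it, and the Bearer pattern uses the token character set from RFC 6750.

```typescript
// Username: letters, digits, underscore, hyphen; at most 60 characters.
const USERNAME_RE = /^[A-Za-z0-9_-]{1,60}$/;
// MongoDB ObjectId: 24 hexadecimal characters.
const OBJECT_ID_RE = /^[a-f0-9]{24}$/i;
// Room code: uppercase letters and digits (length is an assumption-free "+").
const ROOM_CODE_RE = /^[A-Z0-9]+$/;
// Bearer token: one or more RFC 6750 token characters, optional "=" padding.
const BEARER_RE = /^Bearer\s+([A-Za-z0-9\-._~+\/]+=*)$/;

// Returns the token from an Authorization header, or null if the header
// is missing or malformed.
function extractBearerToken(header: string | undefined): string | null {
  if (header === undefined) return null;
  const match = BEARER_RE.exec(header);
  if (match === null) return null;
  const token = match[1];
  return token === undefined ? null : token;
}

const extracted = extractBearerToken("Bearer abc.def.ghi");
```

Anchoring every pattern with ^ and $ is what makes the checks strict: a value with any trailing or embedded extra characters is rejected rather than partially matched.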

8.10. Code Organization Conventions

All services follow a consistent structure within their respective technology stack:

Node.js services (CommonJS — users, multiplayer):

service-name/
  src/               # Source files (or flat structure for simpler services)
  models/            # Mongoose models (users service)
  monitoring/        # Prometheus + Grafana config (users service)
  __tests__/         # Test files (Jest / Vitest)
  Dockerfile
  package.json
  .env.example

Node.js services (ESM — gateway, authentication):

service-name/
  service-name.js    # Main entry point
  vitest.config.js
  Dockerfile
  package.json

TypeScript service (botapi):

botapi/
  src/
    app.ts           # Express app factory
    server.ts        # Entry point
    routes/          # Route definitions
    controllers/     # Request handlers
    services/        # Business logic
    clients/         # HTTP clients (gamey, remote interop)
    store/           # In-memory state stores
    models/          # TypeScript interfaces
    dtos/            # Request/response types
    utils/           # YEN helpers, ID generation
    openapi/         # openapi.yaml spec
    __tests__/       # Vitest tests
  tsconfig.json
  Dockerfile

Rust service (gamey):

gamey/src/
  main.rs            # Binary entry point (CLI + server mode)
  lib.rs             # Library exports
  core/              # GameY, Coordinates, Movement, Player, PlayerSet
  bot/               # YBot trait, YBotRegistry, bot_implementations/
  notation/          # YEN parse/serialize
  game_server/       # Axum HTTP server (mod.rs, bot/, game/, error.rs, state.rs, version.rs)
  gamey_error.rs     # Error types
  cli.rs             # Interactive terminal mode
Cargo.toml

9. Architecture Decisions

This section documents the most significant architectural decisions made during the design and development of the YOVI platform. Each decision follows the ADR (Architecture Decision Record) format, giving its context, rationale, and consequences.

For the full ADR history see the GitHub Wiki — ADR.

9.1. ADR-001: Game Modes Available to the User

  • Status: Implemented ✅

  • Date: 2026-01-15

  • Deciders: Development Team

Context: The platform needs to offer users different ways to play Game Y. The initial requirements specify player-versus-machine mode, but the architecture should be extensible to support additional modes. Real-time player-versus-player was subsequently added as a core requirement.

Decision: Design the game mode system with extensibility as a core principle. Implement PvM as the primary mode; add real-time PvP via Socket.IO as a dedicated multiplayer service that delegates all rule enforcement to gamey. Additionally expose a local PvP mode (same device, two players) as a variant of the PvM game component.

Consequences:

  • ✅ New game modes can be added without modifying core game logic

  • ✅ PvM and PvP share the same game engine (gamey) via different HTTP endpoints

  • ✅ Multiplayer state (rooms) is isolated — failure does not affect PvM

  • ⚠ Two separate game flows (REST for PvM, Socket.IO for PvP) increase client complexity

  • Mitigation: Clear separation in webapp components (Game.tsx for PvM, MultiplayerGame.tsx for PvP)


9.2. ADR-002: MongoDB for User Data Persistence

  • Status: Implemented ✅

  • Date: 2026-01-20

  • Deciders: Development Team

Context: The users service needs to persist user profiles, credentials, friend graphs, notifications, and game results. The schema evolves as new features are added (friends, notifications, admin role, profile fields).

Decision: Use MongoDB via Mongoose ODM. Three collections: users, gameresults, notifications.

Consequences:

  • ✅ Schema flexibility — adding friends[], role, bio, location required no migration scripts

  • ✅ Aggregation pipelines support ranking ($group, $sort, $limit 10)

  • ✅ Compound index on (recipient, createdAt DESC) for efficient notification queries

  • ⚠ No built-in referential integrity — orphan cleanup handled in application code on user delete

  • Mitigation: Explicit cascade deletes in DELETE /admin/users/:username handler


9.3. ADR-003: Eight-Service Microservices Architecture

  • Status: Implemented ✅ — Updated 2026-03-20 (originally three-service)

  • Date: 2026-01-15 | Last Update: 2026-03-20

  • Deciders: Development Team

Context: Technology constraints (TypeScript + Rust) force at least a frontend/backend split. As the project grew, authentication, API gateway, real-time multiplayer, and bot interoperability were identified as distinct responsibilities benefiting from dedicated services.

Evolution:

Version | Services
v1 (2026-01-15) | webapp, users, gamey
v2 (2026-02-24) | + authentication, gateway
v3 (2026-03-20) | + multiplayer, botapi, nginx

Decision: Eight independent services, each with a single responsibility, orchestrated via Docker Compose and exposed through Nginx.

Consequences:

  • ✅ Each service can be developed, tested, and deployed independently

  • ✅ Technology-specific optimizations per service

  • ✅ Failure in one service does not affect others

  • ⚠ Eight containers, more inter-service HTTP calls

  • Mitigation: Docker Compose simplifies local development; Prometheus + Grafana provide observability


9.4. ADR-004: Nginx as Public Reverse Proxy

  • Status: Implemented ✅ — Updated 2026-03-20

  • Date: 2026-01-20 | Last Update: 2026-03-20

Context: With eight services, external clients need a single entry point. Socket.IO requires WebSocket upgrade support. BotAPI needs its own path prefix (/interop/).

Decision: Nginx handles HTTPS termination, HTTP→HTTPS redirect, and routes to four services:

  • / → webapp:80

  • /api/* → gateway:8080

  • /socket.io/* → multiplayer:7000 (WebSocket upgrade)

  • /interop/* → botapi:4001

Consequences:

  • ✅ Single public URL; internal topology hidden from clients

  • ✅ TLS managed centrally — internal services use plain HTTP

  • ✅ WebSocket upgrade transparent to the client

  • ⚠ Single point of failure — mitigated by Docker restart policy

  • ⚠ Certificate renewal requires Nginx reload


9.5. ADR-005: Strategy Pattern for AI Bot Behaviours

  • Status: Implemented ✅ — Updated 2026-03-15

  • Date: 2026-01-25 | Last Update: 2026-03-15

Context: The system must support multiple AI strategies selectable at runtime. New strategies may be added without modifying existing code. The YBot trait must be Send + Sync for Axum’s async handlers.

Decision: Implement the Strategy Pattern via the YBot trait in Rust. All six strategies are registered in YBotRegistry and selected by name at runtime.

Bot ID | Algorithm | Difficulty
random_bot | Random valid move | (not specified)
heuristic_bot | Side connection heuristic | Easy
minimax_bot | Minimax (depth 3) | Medium
alfa_beta_bot | Minimax + alpha-beta pruning | Hard
monte_carlo_hard | Monte Carlo Tree Search | Expert
monte_carlo_extreme | MCTS (more iterations) | Extreme

Consequences:

  • ✅ New strategy = new struct + one with_bot() call — no existing code changed

  • ✅ Each strategy tested independently

  • ⚠ MCTS Extreme can exceed 2s on boards > 11×11

  • Mitigation: performance budget enforced in k6; depth configurable per constructor
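The actual implementation is a Rust trait (YBot) with a registry (YBotRegistry). To show the shape of the pattern in the same language as the other examples in this section, here is a TypeScript analogue. Everything in it is hypothetical: BotStrategy, BotRegistry, and FirstEmptyBot are illustration names, and FirstEmptyBot is a deliberately trivial stand-in for a real strategy.

```typescript
interface Move {
  row: number;
  col: number;
}

// Strategy interface: every bot turns a layout string into a move,
// or null if the board has no empty cell.
interface BotStrategy {
  readonly id: string;
  chooseMove(layout: string): Move | null;
}

// Picks the first empty cell in reading order.
class FirstEmptyBot implements BotStrategy {
  readonly id = "first_empty_bot";

  chooseMove(layout: string): Move | null {
    const rows = layout.split("/");
    for (let row = 0; row < rows.length; row++) {
      const col = (rows[row] ?? "").indexOf(".");
      if (col !== -1) return { row, col };
    }
    return null;
  }
}

// Registry keyed by bot id; adding a strategy is one withBot() call.
class BotRegistry {
  private bots = new Map<string, BotStrategy>();

  withBot(bot: BotStrategy): BotRegistry {
    this.bots.set(bot.id, bot);
    return this;
  }

  find(id: string): BotStrategy | undefined {
    return this.bots.get(id);
  }
}

const registry = new BotRegistry().withBot(new FirstEmptyBot());
const chosen = registry.find("first_empty_bot")?.chooseMove("B/BR/.R.");
```

Callers depend only on the BotStrategy interface and the id string, which is what lets a new strategy be registered without touching existing ones.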


9.6. ADR-006: Dedicated Authentication Microservice

  • Status: Implemented ✅

  • Date: 2026-02-24

Context: Authentication (JWT generation, bcrypt comparison) and user data management are distinct responsibilities. Mixing them in the users service violates SRP and reduces testability.

Decision: Extract a dedicated authentication/ service responsible for: payload validation, bcrypt comparison, JWT generation with role claims, and exposing /register, /login, /verify, /health.

Consequences:

  • ✅ JWT logic is independently testable

  • ✅ Future auth strategies (OAuth) can be added without touching users service

  • ⚠ Login requires two HTTP calls: auth → users (sub-millisecond on Docker network)


9.7. ADR-007: Axum as HTTP Framework for the Rust Game Engine

  • Status: Implemented ✅

  • Date: 2026-02-01

Context: The gamey Rust service needs to expose a REST API. The choice affects ergonomics, async support, and testability.

Decision: Use Axum with Tokio. Key reasons: extractor-based API, State<AppState> for shared Arc<YBotRegistry>, tower::ServiceExt::oneshot for port-free integration tests, and axum-prometheus for native metrics.

Alternatives Considered:

  • Actix-web: Rejected — more complex, historically used unsafe code

  • Rocket: Rejected — async support was immature at time of decision

  • Raw hyper: Rejected — excessive boilerplate

Consequences:

  • ✅ Type-safe shared state; compile-time handler signature guarantees

  • ✅ Integration tests run without network binding

  • ⚠ Rust async adds compile-time complexity


9.8. ADR-008: Custom i18n Implementation in the Frontend

  • Status: Implemented ✅

  • Date: 2026-03-01

Context: The webapp needs EN/ES support. Scope is < 200 keys, two languages only.

Decision: Custom I18nProvider using React Context API with a static translations.ts dictionary. Features: {variable} interpolation, localStorage persistence, Spanish fallback for missing keys.

Alternatives Considered:

  • react-i18next: Rejected — dependency overhead disproportionate to the number of strings

  • FormatJS / react-intl: Rejected — same concern; better for large-scale applications

Consequences:

  • ✅ Zero additional dependencies

  • ✅ TypeScript Dict type catches missing keys at compile time

  • ⚠ No pluralization support (not currently needed)


9.9. ADR-009: Socket.IO for Real-Time Multiplayer

  • Status: Implemented ✅

  • Date: 2026-03-10

Context: PvP game mode requires real-time bidirectional communication between two clients coordinated by a server.

Decision: Use Socket.IO in a dedicated multiplayer/ service. The service manages room state in memory, enforces turn order, and delegates game rules to gamey via HTTP. Socket.IO provides room management, reconnection, and WebSocket/long-polling fallback out of the box.

Alternatives Considered:

  • Raw WebSocket: Rejected — no room management; reconnection would need custom implementation

  • SSE: Rejected — unidirectional; clients cannot send moves without a separate REST endpoint

  • HTTP polling: Rejected — too much latency and server load for turn-based sync

  • Multiplayer inside gateway: Rejected — mixes stateful logic with stateless REST routing

Consequences:

  • ✅ Room management and event broadcasting handled by Socket.IO

  • ✅ Game rule enforcement remains in gamey — no duplication

  • ⚠ Room state is in-memory — lost on container restart

  • ⚠ Cannot scale horizontally without Redis Socket.IO adapter

  • Mitigation: Acceptable for current scope; Redis adapter is documented upgrade path


9.10. ADR-010: YEN as Universal Game State Format

  • Status: Implemented ✅

  • Date: 2026-01-20

Context: Multiple services (gateway, multiplayer, botapi, webapp) need to exchange game state. A shared format must be human-readable, JSON-serializable, and parseable by both Rust and JavaScript without lossy conversion.

Decision: Adopt YEN (Y Exchange Notation) as the sole game state format across all services. YEN is a JSON object with four fields: size (integer), turn (player index), players (token array), and layout (row-separated string of cell characters).

{ "size": 5, "turn": 0, "players": ["B","R"], "layout": "B/BR/.R./..../....." }

Alternatives Considered:

  • Flat cell array: Rejected — loses the triangular structure; requires extra size metadata

  • Coordinate list (only occupied cells): Rejected — harder to detect empty cells; requires full board size for validation; not human-readable

  • Binary encoding: Rejected — not human-readable; harder to debug across languages

Consequences:

  • ✅ Single format understood by Rust (gamey), Node.js (gateway, multiplayer, botapi), and TypeScript (webapp)

  • ✅ Human-readable layout string simplifies debugging

  • ✅ Enables cross-team interoperability without format negotiation

  • ✅ assertValidYen() and GameY::try_from(yen) provide validation at every entry point

  • ⚠ Layout string grows quadratically with board size (size 11 = 66 cells, or 76 characters including the 10 row separators)

  • ⚠ Strict row-length validation (row i must have i+1 chars) rejects any state that a client serializes incorrectly, even when the underlying position is legal


9.11. ADR-011: Role-Based Access Control via JWT Claims

  • Status: Implemented ✅

  • Date: 2026-03-15

Context: The admin panel requires differentiating regular users from administrators on every request. Two options exist: (1) embed the role in the JWT payload, or (2) look up the role from the database on every request.

Decision: Embed the role field ("user" or "admin") directly in the JWT payload at token generation time. The gateway reads the role from the decoded token without querying the users service.

{ "id": "...", "username": "alice", "email": "...", "role": "admin", "iat": "...", "exp": "..." }

Enforcement is applied at two layers:

  • Client-side: Webapp checks role claim before rendering admin routes; redirects non-admins to /home

  • Server-side: Gateway middleware decodes JWT and rejects requests to /admin/* if role !== "admin"

Alternatives Considered:

  • Database lookup on every admin request: Rejected — adds latency and database load; the users service becomes a synchronous dependency on every protected request

  • Separate admin token: Rejected — doubles the authentication flow and complicates token storage

  • Scope-based OAuth claims: Considered — more standard but overkill for two roles; no OAuth provider in use

Consequences:

  • ✅ Zero additional latency for role checks — role is in the token

  • ✅ Admin routes are protected even if the users service is temporarily unavailable

  • ⚠ Role changes (grant/revoke admin) only take effect on the next login — existing tokens retain the old role

  • ⚠ Token cannot be invalidated without a blocklist (not implemented)

  • Mitigation: Acceptable for current scale; role revocation edge case documented as known limitation


9.12. ADR-012: Union-Find for Win Condition Detection

  • Status: Implemented ✅

  • Date: 2026-02-10

Context: Game Y is won by connecting all three sides of the triangular board. After every move, the system must determine whether any player has formed a connected path touching sides A, B, and C simultaneously. The check must be fast (< 2s total including bot move).

Decision: Use a Union-Find (Disjoint Set Union) data structure in the game core. Each placed piece is represented as a PlayerSet node with three boolean flags: touches_side_a, touches_side_b, touches_side_c. When a piece is placed, it is merged with all adjacent same-color pieces. The win condition is detected when any root node has all three flags set to true.
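The idea can be sketched in TypeScript as follows. This is not the Rust PlayerSet code: SideUnionFind and its method names are hypothetical, cells are plain integer indices, and union by rank is omitted for brevity, so the sketch relies on path compression alone. Side flags are OR-merged into the surviving root on every union, which is why the root always describes the whole group.

```typescript
interface SideFlags {
  a: boolean;
  b: boolean;
  c: boolean;
}

class SideUnionFind {
  private parent: number[] = [];
  private flags: SideFlags[] = [];

  constructor(cells: number) {
    for (let i = 0; i < cells; i++) {
      this.parent.push(i);
      this.flags.push({ a: false, b: false, c: false });
    }
  }

  // Finds the root of a cell's group and compresses the path to it.
  find(cell: number): number {
    let root = cell;
    while (this.parent[root] !== root) root = this.parent[root] as number;
    let cur = cell;
    while (cur !== root) {
      const next = this.parent[cur] as number;
      this.parent[cur] = root;
      cur = next;
    }
    return root;
  }

  // Records which sides a cell touches by merging into its root's flags.
  mark(cell: number, sides: SideFlags): void {
    const r = this.find(cell);
    const f = this.flags[r] as SideFlags;
    this.flags[r] = { a: f.a || sides.a, b: f.b || sides.b, c: f.c || sides.c };
  }

  // Merges two groups; the surviving root inherits both groups' flags.
  union(x: number, y: number): void {
    const rx = this.find(x);
    const ry = this.find(y);
    if (rx === ry) return;
    const fx = this.flags[rx] as SideFlags;
    const fy = this.flags[ry] as SideFlags;
    this.parent[ry] = rx;
    this.flags[rx] = { a: fx.a || fy.a, b: fx.b || fy.b, c: fx.c || fy.c };
  }

  // True when the group containing the cell touches all three sides.
  touchesAllSides(cell: number): boolean {
    const f = this.flags[this.find(cell)] as SideFlags;
    return f.a && f.b && f.c;
  }
}
```

In a game loop, placing a piece would call mark() for the new cell, union() with each adjacent same-color cell, and then touchesAllSides() once to test for a win.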

Alternatives Considered:

  • BFS/DFS on every move: Considered; correct, but O(N) per move where N is the number of cells on the board, whereas Union-Find achieves near-O(1) amortized cost per union/find operation

  • Pre-compute all winning paths: Rejected — exponential number of paths on a triangular board

  • Side-connectivity matrix: Rejected — more complex to maintain incrementally

Consequences:

  • ✅ Win condition check is near-O(1) amortized after each move

  • ✅ Naturally integrates with move application — no separate pass over the board

  • ✅ Side-touch flags are propagated correctly through path compression

  • ⚠ Union-Find requires careful implementation of path compression with flag merging

  • ⚠ Barycentric coordinate system (x + y + z = size - 1) requires non-trivial index conversion


9.13. ADR-013: Barycentric Coordinate System for the Game Board

  • Status: Implemented ✅

  • Date: 2026-02-10

Context: A triangular Game Y board needs a coordinate system for identifying cells, computing adjacency, and detecting side contact. Standard (row, col) coordinates work but make side-touch detection verbose.

Decision: Use barycentric coordinates (x, y, z) where x + y + z = size - 1. Each coordinate represents the distance from one of the three sides:

  • x = 0 → cell touches side A

  • y = 0 → cell touches side B

  • z = 0 → cell touches side C

Conversion functions from_index(idx, size) and to_index(size) map between the linear array representation (for board storage) and barycentric coordinates (for rule logic).
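A possible shape for these conversions is sketched below in TypeScript. The axis assignment is an assumption made for this example only (x is the distance from the last row, y the column, z the distance from the right edge), and the Rust functions may orient the axes differently. The linear index is taken to run row by row, with row r starting at index r(r+1)/2.

```typescript
interface Bary {
  x: number;
  y: number;
  z: number;
}

// Linear index to barycentric coordinates (assumed axis orientation).
function fromIndex(idx: number, size: number): Bary {
  // Row r starts at r(r+1)/2, so r is the largest integer with r(r+1)/2 <= idx.
  let row = Math.floor((Math.sqrt(8 * idx + 1) - 1) / 2);
  // Guard against floating-point rounding at row boundaries.
  while ((row * (row + 1)) / 2 > idx) row--;
  while (((row + 1) * (row + 2)) / 2 <= idx) row++;
  const col = idx - (row * (row + 1)) / 2;
  return { x: size - 1 - row, y: col, z: row - col };
}

// Barycentric coordinates back to the linear index.
function toIndex(c: Bary, size: number): number {
  const row = size - 1 - c.x;
  return (row * (row + 1)) / 2 + c.y;
}

const apex = fromIndex(0, 5);
const lastCell = fromIndex(14, 5);
```

With these assumptions, apex is (4, 0, 0) and lastCell is (0, 4, 0) on a size-5 board, and every coordinate triple sums to size - 1. The two while loops make the result independent of sqrt rounding.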

Alternatives Considered:

  • (row, col) only: Rejected; side detection requires separate row == size - 1, col == 0, and col == row checks, and adjacency computation is less elegant

  • Axial coordinates (q, r): Considered — common for hex grids but not natural for triangular boards

Consequences:

  • ✅ Side-touch detection is x == 0, y == 0, or z == 0 — one comparison each

  • ✅ Property-based tests verify x + y + z == size - 1 invariant holds for all valid indices

  • ✅ Human-readable in API responses (YEN coordinates use {x, y, z})

  • ⚠ Three-coordinate system is unfamiliar to developers new to the codebase

  • ⚠ from_index uses floating-point arithmetic (sqrt); validated by round-trip property tests


9.14. ADR-014: Express 5 for Node.js Services

  • Status: Implemented ✅

  • Date: 2026-02-15

Context: Three Node.js services (gateway, authentication, users) need an HTTP framework. Express is the industry-standard choice for Node.js REST APIs. Express 4 vs Express 5 was evaluated.

Decision: Use Express 5 (^5.0.0) for gateway and authentication services. Express 5 provides native async/await error propagation — unhandled promise rejections in route handlers are automatically passed to the error middleware without requiring explicit try/catch wrappers in every handler.

Note
The users service uses Express 5 as well (^5.2.1). The multiplayer service uses Express 4 (^4.21.1) since it was developed slightly earlier.

Alternatives Considered:

  • Fastify: Rejected — team familiarity with Express; Fastify’s schema validation adds setup overhead

  • Hono: Considered — fast and TypeScript-native but less ecosystem maturity for course timeframe

  • Express 4: Still used in multiplayer — acceptable since multiplayer handlers use explicit try/catch throughout

Consequences:

  • ✅ Native async error propagation reduces boilerplate in route handlers

  • ✅ Familiarity across the team — no learning curve

  • ⚠ Express 5 has breaking API differences from Express 4 (for example, in route path matching), so code and middleware written for Express 4 may need adjustment

  • ⚠ Some middleware (e.g., older versions of express-prom-bundle) required version pinning


9.15. ADR-015: In-Memory State for BotAPI and Multiplayer Sessions

  • Status: Implemented ✅

  • Date: 2026-03-10

Context: Both the multiplayer service (rooms) and the BotAPI (game sessions, remote interop sessions) need to maintain short-lived state across multiple HTTP requests or Socket.IO events. Options: in-memory, Redis, MongoDB.

Decision: Use in-memory Map-based stores (RoomManager, ActiveGamesStore, RemoteGameSessionsStore) in both services. State is lost on container restart.

Rationale:

  • Bot games and multiplayer rooms are short-lived (typically < 30 minutes)

  • Adding Redis introduces a new infrastructure dependency and operational complexity

  • MongoDB writes on every move would add latency and storage overhead for ephemeral data

  • Course evaluation scale (< 50 concurrent sessions) does not require persistence

Consequences:

  • ✅ Zero additional infrastructure — no Redis or extra MongoDB collections needed

  • ✅ Sub-millisecond state access — no network round-trip

  • ✅ Simple implementation — Map<string, Room> with helper methods

  • ⚠ State lost on container restart — players must create new rooms after deployments

  • ⚠ Cannot scale horizontally without a shared store

  • Mitigation: Documented as TD1 (multiplayer) and acceptable limitation for BotAPI; Redis adapter is the documented upgrade path


9.16. ADR-016: Dedicated db.js with Test Database Isolation in Users Service

  • Status: Implemented ✅

  • Date: 2026-02-20

Context: The users service needs to connect to MongoDB. Tests must not read from or write to the production database. The MongoDB URI format (host1:port,host2:port/db) is not a valid URL, ruling out the standard URL constructor for parsing.

Decision: Centralise MongoDB connection in db.js with a buildConnectionUri() function that appends _test to the database name when NODE_ENV === "test" using regex-based URI manipulation instead of URL parsing.

// Append _test suffix without URL parsing (handles multi-host URIs)
if (/\/[^/?]+\?/.test(uri)) return uri.replace(/\/([^/?]+)\?/, '/$1_test?')
if (/\/[^/?]+$/.test(uri)) return uri.replace(/\/([^/?]+)$/, '/$1_test')

The vitest.globalSetup.js drops the _test database after all test suites complete.
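For reference, the two regex rules can be wrapped into a testable function. The TypeScript sketch below is an approximation of buildConnectionUri(), not the db.js source: the name buildConnectionUriSketch is hypothetical, and the environment is passed in as a parameter instead of being read from process.env so that the function stays pure.

```typescript
// Appends "_test" to the database name of a MongoDB URI when running tests.
// Uses string patterns rather than URL parsing so multi-host URIs work.
function buildConnectionUriSketch(uri: string, nodeEnv: string | undefined): string {
  if (nodeEnv !== "test") return uri;
  // Database name followed by options: ".../db?opts" becomes ".../db_test?opts".
  if (/\/[^\/?]+\?/.test(uri)) return uri.replace(/\/([^\/?]+)\?/, "/$1_test?");
  // Database name at the end of the URI: ".../db" becomes ".../db_test".
  if (/\/[^\/?]+$/.test(uri)) return uri.replace(/\/([^\/?]+)$/, "/$1_test");
  return uri;
}

const multiHost = buildConnectionUriSketch(
  "mongodb://h1:27017,h2:27017/yovi?retryWrites=true",
  "test"
);
```

Here multiHost is "mongodb://h1:27017,h2:27017/yovi_test?retryWrites=true". One edge case is worth knowing: a URI with no database path at all, such as "mongodb://localhost:27017", matches the second rule on its host segment and becomes "mongodb://localhost:27017_test", so the function assumes the URI always names a database.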

Alternatives Considered:

  • mongodb-memory-server always: Considered — fully isolated but slower startup; does not test against real MongoDB behavior

  • Separate .env.test file: Considered — requires manual configuration; regex approach is automatic

  • URL constructor: Rejected — multi-host MongoDB Atlas URIs (host1,host2/db) are not valid URLs

Consequences:

  • ✅ Tests run against a real MongoDB instance (_test DB) without affecting production data

  • ✅ Works with any MongoDB URI format including Atlas cluster URIs

  • ✅ vitest.globalSetup.js ensures clean state between CI runs

  • ⚠ Requires a running MongoDB instance for tests (CI provides one)

  • ⚠ Regex-based URI manipulation could fail on unusual URI formats


9.17. ADR-017: Vitest as Test Runner for Node.js Services

  • Status: Implemented ✅

  • Date: 2026-02-15

Context: Node.js services need a test runner. Options considered: Jest (established, CommonJS-friendly), Vitest (fast, native ESM support), Mocha (flexible but requires more setup).

Decision: Use Vitest (^4.0.x) for gateway, authentication, and botapi services (all ESM). Use Jest (^29.x) for users and multiplayer services (CommonJS). Coverage for SonarCloud is produced by @vitest/coverage-v8 in the Vitest services and by jest --coverage in the Jest services.

Rationale: Gateway and authentication use "type": "module" (ESM). Jest has historically poor ESM support requiring Babel transforms. Vitest natively supports ESM with no configuration. Users and multiplayer use CommonJS (require()), where Jest is the natural choice.

Alternatives Considered:

  • Vitest for all: Considered — CommonJS support requires "type": "module" migration of users and multiplayer, which would have been a large refactor mid-project

  • Jest for all: Rejected — ESM support in Jest requires --experimental-vm-modules flag and Babel configuration overhead

  • Mocha: Rejected — no built-in coverage; requires additional plugins for assertions and mocking

Consequences:

  • ✅ Each service uses the test runner best suited to its module system

  • ✅ Both produce LCOV coverage reports compatible with SonarCloud

  • ⚠ Two different test runners in the same repository increase onboarding friction

  • Mitigation: Consistent npm test and npm run test:coverage scripts across all services hide the underlying runner differences


9.18. ADR-018: ESM vs CommonJS Module System Split

  • Status: Implemented ✅

  • Date: 2026-01-20

Context: Node.js supports two module systems: CommonJS (require()) and ES Modules (import/export). The project started with CommonJS (users service) and progressively adopted ESM (gateway, authentication) as newer services were added. This created a split.

Decision: Accept the mixed module system rather than migrating all services to ESM at once:

  • CommonJS ("type": "commonjs" or no type field): users/, multiplayer/

  • ESM ("type": "module"): gateway/, authentication/

  • TypeScript compiled to CommonJS: botapi/ (TypeScript source, compiled to dist/ as CJS via tsconfig.json with "module": "Node16")

The users-service.js uses a hybrid approach: the main file uses ESM (import), but loads CommonJS models via createRequire(import.meta.url) to bridge the module system gap.

Alternatives Considered:

  • Migrate all to ESM: Considered — Mongoose and some Express middleware had ESM compatibility issues at the time; migration risk was high mid-project

  • Migrate all to CommonJS: Rejected — gateway and auth were already written in ESM; rewriting would lose async error propagation benefits of Express 5 + ESM

Consequences:

  • ✅ Each service works correctly with its chosen module system

  • ✅ No breaking changes required to existing services

  • ⚠ Inconsistency increases onboarding friction and makes copy-paste between services error-prone

  • ⚠ The createRequire bridge in users-service.js is unusual and requires a comment explaining why

  • Mitigation: Each service has its own package.json clearly indicating module system; "type" field is explicit in all cases


9.19. ADR-019: express-prom-bundle for Prometheus Metrics in Node.js Services

  • Status: Implemented ✅

  • Date: 2026-03-01

Context: Gateway and users services need to expose Prometheus metrics. Options: manual metric registration with prom-client, or a middleware wrapper that auto-instruments all HTTP requests.

Decision: Use express-prom-bundle as Express middleware in gateway and users services. Configuration:

promBundle({
  includeMethod: true,
  includePath: true,
  includeStatusCode: true,
  normalizePath: true,
})

normalizePath: true collapses parametric paths (e.g., /users/alice and /users/bob → /users/#val) to prevent high-cardinality metric labels.

Alternatives Considered:

  • Manual prom-client counters: Rejected — requires instrumenting every route handler individually; high boilerplate; easy to miss routes

  • OpenTelemetry: Considered — more powerful but significantly more complex to configure for a course project; overkill for three services

Consequences:

  • ✅ All HTTP requests automatically tracked with method, path, and status code labels

  • ✅ normalizePath prevents cardinality explosion from user-specific paths

  • ✅ Consistent metric names across gateway and users (http_requests_total, http_request_duration_seconds)

  • ⚠ normalizePath may collapse paths that should be tracked separately

  • ⚠ Metrics endpoint (/metrics) is publicly accessible — acceptable since it contains no sensitive data


9.20. ADR-020: React Context API for Global State (Theme and i18n)

  • Status: Implemented ✅

  • Date: 2026-03-01

Context: The webapp needs two pieces of global state shared across all components: the current language (EN/ES) and the current theme (dark/light). Options: prop drilling, React Context, or a state management library (Redux, Zustand, Jotai).

Decision: Use the React Context API with custom providers (ThemeProvider, I18nProvider) wrapping the entire application at the main.tsx level. Each provider exposes a custom hook (useTheme(), useI18n()) that components call directly.

Alternatives Considered:

  • Prop drilling: Rejected — theme and language are needed in virtually every component; prop drilling would pollute every component signature

  • Redux Toolkit: Rejected — significant boilerplate for two simple global values; no async state needed

  • Zustand: Considered — lightweight, but adds a dependency; React Context is sufficient for synchronous, rarely-changing global state

Consequences:

  • ✅ Zero additional dependencies

  • ✅ Clean API: const { t } = useI18n() and const { theme, toggleTheme } = useTheme()

  • ✅ Providers are independently testable

  • ⚠ Context re-renders all consumers on every change — acceptable since theme/language changes are rare

  • ⚠ useTheme() and useI18n() return fallback values if called outside their provider, which prevents runtime errors but can hide configuration mistakes


9.21. ADR-021: Room Code Generation Without Ambiguous Characters

  • Status: Implemented ✅

  • Date: 2026-03-10

Context: Multiplayer rooms need unique, human-typeable codes that players share out of band (chat, voice). Codes must be short, unambiguous, and easy to read aloud.

Decision: Generate 6-character codes from a custom alphabet "ABCDEFGHJKLMNPQRSTUVWXYZ23456789" that excludes visually ambiguous characters: O (confused with 0), I (confused with 1), 0 (zero), and 1 (one). The generator retries up to 1000 times to guarantee uniqueness against existing room codes.
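A minimal sketch of the generator described above is shown here. The alphabet, code length, retry limit, and the name generateUniqueRoomCode() come from this ADR; the use of Node's crypto.randomInt, the existingCodes parameter, and the helper names are assumptions for illustration.

```javascript
const { randomInt } = require("node:crypto");

// 24 letters (A-Z without I and O) plus 8 digits (2-9) = 32 symbols.
const ROOM_CODE_ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";
const ROOM_CODE_LENGTH = 6;
const MAX_ATTEMPTS = 1000;

// Build one random code by picking each character uniformly from the alphabet.
function generateRoomCode() {
  let code = "";
  for (let i = 0; i < ROOM_CODE_LENGTH; i += 1) {
    code += ROOM_CODE_ALPHABET[randomInt(ROOM_CODE_ALPHABET.length)];
  }
  return code;
}

// existingCodes is assumed to be a Set (or Map) keyed by active room codes.
function generateUniqueRoomCode(existingCodes) {
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt += 1) {
    const code = generateRoomCode();
    if (!existingCodes.has(code)) return code;
  }
  throw new Error("Could not generate a unique room code");
}

// Hypothetical helper: normalise what a player typed before looking it up.
function normaliseRoomCode(input) {
  return input.trim().toUpperCase();
}

console.log(ROOM_CODE_ALPHABET.length ** ROOM_CODE_LENGTH); // 1073741824
```

With 32 symbols and 6 positions there are 32^6 = 1,073,741,824 possible codes, so a retry is only needed when a freshly drawn code happens to match an active room.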

Alternatives Considered:

  • UUID: Rejected — 36 characters; impossible to type from memory or dictate verbally

  • Sequential numeric ID: Rejected — predictable; allows enumeration of all active rooms

  • Full alphabet including ambiguous chars: Rejected — O/0 and I/1 confusion causes UX friction when sharing codes verbally

Consequences:

  • ✅ Codes are short (6 chars), unique, and unambiguous when read aloud

  • ✅ 32^6 ≈ 1 billion possible codes — no practical collision risk at current scale

  • ✅ generateUniqueRoomCode() retries up to 1000 times — uniqueness is effectively guaranteed for any reasonable number of active rooms

  • ⚠ Codes are uppercase only — input is normalised with toUpperCase(), so lowercase entry is still accepted


9.22. ADR-022: Debounced Search in Social Features

  • Status: Implemented ✅

  • Date: 2026-03-18

Context: The Social page provides a live user search. Without rate limiting, every keystroke would trigger an API call to the users service, causing unnecessary load and poor UX for slow typists.

Decision: Implement a 400ms debounce on the search input using a useRef<NodeJS.Timeout> timer in the Social.tsx component. The search request is only sent after the user stops typing for 400ms. Minimum query length is 1 character.
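The debounce logic can be sketched outside React as a small helper. This is an illustration under assumptions, not the code in Social.tsx: createDebouncer and its injectable timers argument are hypothetical, and the component itself is described as holding the timer in a useRef instead.

```javascript
// Hypothetical helper illustrating the debounce described above.
// `timers` is injectable so the behaviour can be exercised without waiting.
function createDebouncer(delayMs, timers = { setTimeout, clearTimeout }) {
  const { setTimeout: startTimer, clearTimeout: stopTimer } = timers;
  let timerId = null;

  return {
    // Restart the countdown; only the most recent callback will run.
    schedule(callback) {
      if (timerId !== null) stopTimer(timerId);
      timerId = startTimer(() => {
        timerId = null;
        callback();
      }, delayMs);
    },
    // Drop any pending callback, e.g. when the component unmounts.
    cancel() {
      if (timerId !== null) {
        stopTimer(timerId);
        timerId = null;
      }
    },
  };
}

// Usage sketch: each keystroke reschedules the search.
const searchDebouncer = createDebouncer(400);
function onQueryChange(query, runSearch) {
  if (query.length < 1) {
    searchDebouncer.cancel();
    return;
  }
  searchDebouncer.schedule(() => runSearch(query));
}
```

In a React component, cancel() would typically be called from an effect cleanup so that no search fires after unmount.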

Alternatives Considered:

  • No debounce (fire on every keystroke): Rejected — excessive API calls; users service and MongoDB would receive a request per character typed

  • Search-on-submit (button click): Rejected — poor UX; most modern search UIs are live

  • Throttle instead of debounce: Rejected — debounce is more appropriate for search: we want to wait until the user finishes typing, not limit to one request per N ms

Consequences:

  • ✅ Significant reduction in API calls during normal typing (e.g., typing "alice" = 1 request vs 5)

  • ✅ Zero additional dependencies — implemented with native setTimeout/clearTimeout

  • ⚠ 400ms delay is noticeable on fast connections — acceptable trade-off for reduced server load

  • ⚠ Debounce timer must be cleared on component unmount to avoid memory leaks — handled via useRef pattern


9.23. Summary of All Architectural Decisions

ID | Decision | Date | Status
ADR-001 | Extensible game modes — PvM, PvP, local PvP | 2026-01-15 |
ADR-002 | MongoDB + Mongoose for user data | 2026-01-20 |
ADR-003 | Eight-service microservices (evolved from three) | 2026-01-15 | ✅ updated
ADR-004 | Nginx as public reverse proxy with WS upgrade | 2026-01-20 | ✅ updated
ADR-005 | Strategy Pattern for AI — six YBot implementations | 2026-01-25 | ✅ updated
ADR-006 | Dedicated authentication microservice with bcrypt | 2026-02-24 |
ADR-007 | Axum as HTTP framework for Rust game engine | 2026-02-01 |
ADR-008 | Custom i18n — zero dependencies, EN/ES | 2026-03-01 |
ADR-009 | Socket.IO for real-time multiplayer in dedicated service | 2026-03-10 |
ADR-010 | YEN as universal game state format | 2026-01-20 |
ADR-011 | Role-based access control via JWT claims | 2026-03-15 |
ADR-012 | Union-Find for win condition detection | 2026-02-10 |
ADR-013 | Barycentric coordinate system for the game board | 2026-02-10 |
ADR-014 | Express 5 for Node.js services | 2026-02-15 |
ADR-015 | In-memory state for BotAPI and multiplayer sessions | 2026-03-10 |
ADR-016 | Test database isolation via URI suffix _test | 2026-02-20 |
ADR-017 | Vitest (ESM) + Jest (CJS) as test runners | 2026-02-15 |
ADR-018 | ESM vs CommonJS module system split | 2026-01-20 |
ADR-019 | express-prom-bundle for auto-instrumented metrics | 2026-03-01 |
ADR-020 | React Context API for theme and i18n global state | 2026-03-01 |
ADR-021 | Room code generation without ambiguous characters | 2026-03-10 |
ADR-022 | 400ms debounced search in social features | 2026-03-18 |


10. Quality Requirements

This section makes the quality goals from Section 1 concrete, measurable, and testable through a quality tree and specific scenarios following the ATAM approach.

10.1. Quality Tree

Quality Tree

10.2. Quality Scenarios

10.2.1. QS-01: Bot Move Calculation Performance

Aspect | Description
ID | QS-01
Quality Goal | Performance / Response Time
Stimulus | User makes a move in a PvM game against minimax_bot or alfa_beta_bot
Environment | Normal operation, board size ≤ 11×11
Artifact | Gamey service (/v1/game/pvb/:bot_id)
Response | System validates player move, computes bot response, and returns updated YEN
Response Measure | 95% of responses within 2 seconds (start_game k6 threshold: p(95)<3000ms)
Verification | k6 load test tests/load/start_game.js with 20 concurrent VUs

10.2.2. QS-02: Login Response Time Under Load

Aspect | Description
ID | QS-02
Quality Goal | Performance / Response Time
Stimulus | 50 concurrent users log in simultaneously
Environment | Production deployment under load test conditions
Artifact | Gateway → Auth Service → Users Service
Response | All users receive a valid JWT token
Response Measure | p(95) < 1500ms; success rate > 95%; HTTP error rate < 5%
Verification | k6 load test tests/load/login.js — ramping to 50 VUs over 10s, hold 30s

10.2.3. QS-03: Concurrent User Throughput

Aspect | Description
ID | QS-03
Quality Goal | Performance / Throughput
Stimulus | 50 concurrent users registering simultaneously
Environment | Production deployment
Artifact | Gateway → Auth → Users → MongoDB
Response | All users are registered or receive a meaningful error (duplicate username)
Response Measure | p(95) < 2000ms; success rate > 95%; HTTP error rate < 5%
Verification | k6 load test tests/load/register.js — ramping to 50 VUs

10.2.4. QS-04: New User Learnability

Aspect | Description
ID | QS-04
Quality Goal | Usability / Learnability
Stimulus | A first-time user wants to start a game against the AI
Environment | User has no prior knowledge of Game Y or YOVI
Artifact | Webapp — Home, SelectDifficulty, Game pages
Response | User navigates to a running game without external instructions
Response Measure | 80% of new users complete first move within 2 minutes in usability testing
Verification | Usability test with 5 new participants; task completion time recorded

10.2.5. QS-05: Internationalization Completeness

Aspect | Description
ID | QS-05
Quality Goal | Usability / Internationalization
Stimulus | User switches language to Spanish (or English)
Environment | Any page of the webapp
Artifact | I18nProvider, translations.ts
Response | All user-facing strings are displayed in the selected language
Response Measure | 100% of translation keys present in both EN and ES; zero hardcoded UI strings
Verification | Automated check for missing keys; TypeScript Dict type enforces completeness at compile time

10.2.6. QS-06: Admin Panel Usability

Aspect | Description
ID | QS-06
Quality Goal | Usability / Admin UX
Stimulus | An administrator needs to remove a user account from the platform
Environment | Admin user logged in with valid admin JWT
Artifact | Webapp /admin page → Gateway admin endpoints → Users Service
Response | Admin locates the user, confirms deletion, and the account is permanently removed with all associated data
Response Measure | Task completed in < 3 clicks from the admin panel; confirmation dialog prevents accidental deletion
Verification | Manual usability walkthrough; E2E test covering admin delete flow

10.2.7. QS-07: AI Strategy Extensibility

Aspect | Description
ID | QS-07
Quality Goal | Maintainability / Extensibility
Stimulus | Developer needs to add a new AI bot strategy (e.g., RAVE MCTS)
Environment | Development environment, existing codebase
Artifact | gamey/src/bot/
Response | New strategy implemented and accessible via HTTP API
Response Measure | Implementation requires only: (1) new struct implementing YBot, (2) one with_bot() call in create_default_state() — no modification to existing files
Verification | Code review confirms no existing file was modified; new bot appears in GET /status

10.2.8. QS-08: Test Coverage

Aspect | Description
ID | QS-08
Quality Goal | Maintainability / Testability
Stimulus | Developer merges a pull request
Environment | CI pipeline (GitHub Actions)
Artifact | All services
Response | SonarCloud analyses coverage; PR is blocked if below threshold
Response Measure | Code coverage > 80% across all services as enforced by SonarCloud quality gate
Verification | Automated coverage reports in CI; SonarCloud dashboard badge in README

10.2.9. QS-09: System Availability

Aspect | Description
ID | QS-09
Quality Goal | Reliability / Availability
Stimulus | Evaluators and users access the platform during the evaluation period
Environment | Production deployment on cloud VM
Artifact | All services via Nginx
Response | System responds to requests normally
Response Measure | 99% uptime during evaluation period; monitored via Grafana "Yovi Services Overview" dashboard
Verification | Uptime tracking via Prometheus; Grafana alert on sustained error rate spike

10.2.10. QS-10: Crash Recovery

Aspect | Description
ID | QS-10
Quality Goal | Reliability / Fault Tolerance
Stimulus | A service container crashes (e.g., gamey OOM, multiplayer panic)
Environment | Production, during active usage
Artifact | Any application container
Response | Docker detects the failure and restarts the container automatically
Response Measure | Service restored within 30 seconds; other services unaffected during restart
Verification | Chaos test: docker kill gamey during a test session; measure time to recovery

10.2.11. QS-11: Data Durability

Aspect | Description
ID | QS-11
Quality Goal | Reliability / Data Durability
Stimulus | The users service container or VM restarts unexpectedly
Environment | Post-restart
Artifact | MongoDB (Atlas), Users Service
Response | All user profiles, match history, friends, and notifications are intact
Response Measure | Zero data loss — 100% of committed records recoverable
Verification | Restart container, verify data via GET /users/:username and GET /gameresults/:username

10.2.12. QS-12: Token Expiry Enforcement

Aspect | Description
ID | QS-12
Quality Goal | Security / Authentication
Stimulus | A user with an expired JWT attempts to access a protected endpoint
Environment | Token issued more than JWT_EXPIRES (default 24h) ago
Artifact | Auth Service (GET /verify), Gateway
Response | Request is rejected and user is redirected to the login page
Response Measure | 100% of expired token requests return 401; webapp clears localStorage and redirects
Verification | Integration test with manually expired token; E2E test simulating session expiry

10.2.13. QS-13: Admin Role Enforcement

Aspect | Description
ID | QS-13
Quality Goal | Security / Authorization (RBAC)
Stimulus | A regular user attempts to access an admin endpoint directly (bypassing the UI)
Environment | Production; user has a valid non-admin JWT
Artifact | Gateway admin route middleware
Response | Request is rejected with 403 Forbidden
Response Measure | 100% of non-admin requests to admin endpoints return 403; no data is returned
Verification | Integration test sending valid user token to GET /api/admin/users; verify 403 response
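The check exercised by QS-13 can be sketched as an Express-style middleware function. This is a hypothetical illustration: it assumes an earlier middleware has already verified the JWT and stored its claims on req.user, and that the admin role is carried in a claim named role. Neither detail is confirmed by this document.

```javascript
// Hypothetical admin guard. Assumes req.user holds verified JWT claims
// and that the role is exposed as req.user.role.
function requireAdmin(req, res, next) {
  if (!req.user) {
    // No verified identity at all: the caller must authenticate first.
    return res.status(401).json({ error: "Authentication required" });
  }
  if (req.user.role !== "admin") {
    // Valid token, but not an administrator: reject without returning data.
    return res.status(403).json({ error: "Admin role required" });
  }
  return next();
}
```

The function uses only the req, res, and next arguments, so it can be unit-tested with plain stub objects before being mounted on the admin routes.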

10.2.14. QS-14: Bot API Interoperability

Aspect | Description
ID | QS-14
Quality Goal | Interoperability / Bot API
Stimulus | An external bot creates a game, plays three moves, and queries the state
Environment | Production endpoint https://yovi.13.63.89.84.sslip.io/interop
Artifact | BotAPI service
Response | All three operations succeed; YEN state is consistent across requests
Response Measure | 100% of valid YEN requests handled correctly; detectSingleAddedMove rejects invalid moves
Verification | API tests in botapi/src/tests/; cross-team validation during course interop session

10.2.15. QS-15: Cross-Team Bot-vs-Bot Match

Aspect | Description
ID | QS-15
Quality Goal | Interoperability / Cross-team
Stimulus | Our bot plays a complete game against a rival team’s API using the remote interop client
Environment | Both YOVI and the rival API deployed and accessible
Artifact | BotAPI remote interop service; remoteInteropClient
Response | Session created, turns played alternately until game finishes; final status returned
Response Measure | Game completes without error; action field cycles correctly between WAITING_OPPONENT, MOVE_SUBMITTED, and GAME_FINISHED
Verification | Manual test during course interop session; POST /remote-games/create followed by repeated POST /remote-games/:id/play-turn

10.3. Traceability to Quality Goals

Quality Goal (Section 1) | Related Scenarios | Key Metrics
Functionality | QS-01, QS-06, QS-07, QS-14, QS-15 | Correct game rules; admin ops; extensible AI; bot API compliance
Usability | QS-04, QS-05, QS-06 | New user < 2min; 100% i18n coverage; admin task < 3 clicks
Reliability & Availability | QS-09, QS-10, QS-11 | 99% uptime; crash recovery < 30s; zero data loss
Modularity & Maintainability | QS-07, QS-08 | New AI strategy = 2 changes; coverage > 80%
Security | QS-12, QS-13 | 100% expired tokens rejected; 100% non-admin admin routes blocked
Testability | QS-08 | Coverage > 80% enforced by SonarCloud
Interoperability | QS-14, QS-15 | Valid YEN always accepted; cross-team game completes
Performance | QS-01, QS-02, QS-03 | p95 move < 2s; p95 login < 1.5s; 50 concurrent users

11. Risks and Technical Debts

Each item is assessed using a risk matrix:

  • Probability: Low (1), Medium (2), High (3)

  • Impact: Low (1), Medium (2), High (3)

  • Risk Score = Probability × Impact (1–9)

Items with score ≥ 6 require active mitigation. Items with score ≥ 8 are critical.
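The scoring rule above can be written as a short calculation. The function names and the "monitor" label for scores below 6 are assumptions added for the example; the scale, the formula, and the two thresholds come from this section.

```javascript
// Probability and impact are each rated 1 (Low), 2 (Medium) or 3 (High).
function riskScore(probability, impact) {
  for (const value of [probability, impact]) {
    if (![1, 2, 3].includes(value)) {
      throw new RangeError("probability and impact must be 1, 2 or 3");
    }
  }
  return probability * impact;
}

// Thresholds from this section; "monitor" is an illustrative label for
// everything below the active-mitigation threshold.
function riskLevel(score) {
  if (score >= 8) return "critical";
  if (score >= 6) return "active mitigation";
  return "monitor";
}

console.log(riskScore(2, 3), riskLevel(riskScore(2, 3))); // 6 active mitigation
console.log(riskScore(1, 3), riskLevel(riskScore(1, 3))); // 3 monitor
```

On a 1 to 3 scale the only reachable scores are 1, 2, 3, 4, 6 and 9, so in practice a risk is critical only when both probability and impact are High.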

11.1. Technical Risks

ID | Risk Description | Prob | Impact | Score | Mitigation Strategy
R1 | TypeScript–Rust integration — Serialization bugs in the YEN format between Node.js and Rust services (e.g., field naming, null vs. undefined, integer overflow) | 2 (M) | 3 (H) | 6 | Versioned JSON interface (YEN); contract tests in BotAPI (yen.test.ts); integration tests covering full round-trips; assertValidYen() validates on every BotAPI entry point
R2 | Game Y rules misunderstanding — Incorrect Union-Find win detection (e.g., off-by-one in barycentric coordinates, wrong side-touch detection) | 2 (M) | 3 (H) | 6 | Property-based tests with proptest verify coordinate invariants; unit tests cover all edge cases including corner cells, full board, and known winning configurations
R3 | Multiplayer room state loss — In-memory RoomManager loses all active rooms on container restart | 3 (H) | 2 (M) | 6 | Accepted trade-off for current scope; rooms are short-lived; players can create a new room; documented in Section 8; Redis adapter is the documented upgrade path for horizontal scaling
R4 | Socket.IO scalability — Multiplayer service cannot be scaled horizontally without a shared session store; a single instance becomes a bottleneck under high PvP load | 2 (M) | 2 (M) | 4 | Current deployment is single-instance; Redis Socket.IO adapter documented as future upgrade; acceptable for course evaluation scale (< 50 concurrent users)
R5 | Bot API remote interop host allowlist — The hardcoded allowlist in remoteInteropClient may block valid rival team APIs or become outdated between course iterations | 2 (M) | 2 (M) | 4 | Allowlist is configurable in source; update process is documented; fallback: disable check for trusted environments via environment variable
R6 | AI performance on large boards — monte_carlo_extreme can exceed the 2-second response budget on boards > 11×11 due to exponential search space growth | 2 (M) | 2 (M) | 4 | Performance budget enforced in k6 threshold; board size is user-configurable with no enforced maximum; MonteCarloBot iteration count is constructor-configurable; documented as a known limitation
R7 | MongoDB Atlas dependency — External managed database; outage would take down the entire platform since users, auth, and game results all depend on it | 1 (L) | 3 (H) | 3 | Atlas provides 99.95% SLA with automatic failover; MONGODB_URI can be switched to a local instance if needed; connection errors surface immediately via service health endpoints
R8 | JWT secret exposure — If JWT_SECRET is leaked, all tokens can be forged, bypassing authentication and admin authorization | 1 (L) | 3 (H) | 3 | Secret stored in GitHub Secrets only; never committed; short token lifetime (JWT_EXPIRES=24h); SonarCloud scans for accidental credential commits
R9 | Nginx single point of failure — All external traffic routes through a single Nginx container; a crash takes down the entire public-facing system | 1 (L) | 3 (H) | 3 | Docker restart: unless-stopped; stateless config (no in-memory state); health check on port 80; Prometheus monitors request rate drop as indirect availability signal
R10 | Scope creep — optional features delay evaluation deliverables — Social features, admin panel, and cross-team interop were added incrementally; each adds surface area for bugs and testing debt | 2 (M) | 2 (M) | 4 | MoSCoW prioritisation; Definition of Done enforced per feature; SonarCloud coverage gate prevents shipping undertested features

11.2. Technical Debts

ID | Debt Description | Impact if Not Fixed | Status / Planned Resolution
TD1 | Multiplayer room state is in-memory only — Active PvP rooms are lost if the multiplayer container restarts. Players must create a new room after any service interruption. | Medium — poor UX during deployments or crashes; unacceptable for production at scale | Accepted for current scope. Future: Redis Socket.IO adapter for persistent room state across restarts and horizontal scaling.
TD2 | BotAPI remote host allowlist is hardcoded — remoteInteropClient has a hardcoded Set of allowed hosts (localhost:4001, equipo-rival:4001, yovi.13.63.89.84.sslip.io). Adding a new rival team requires a code change and redeploy. | Low — only affects cross-team interop sessions; can be unblocked quickly | Planned: move allowlist to an environment variable (REMOTE_INTEROP_ALLOWED_HOSTS) parsed at startup.
TD3 | Users service openapi.yaml is outdated — The OpenAPI spec served at /api-docs only documents /createuser. The many new endpoints (friends, notifications, ranking, search, admin, profile) are not documented in the YAML, reducing discoverability for internal consumers. | Low — internal service not consumed by external developers; Swagger UI is misleading | Planned: update openapi.yaml to cover all current endpoints before final evaluation.
TD4 | No pagination on most list endpoints — GET /gameresults/:username, GET /users/:username/friends, GET /users/:username/notifications return all records without pagination. Users with large histories may experience slow responses. | Low-Medium — acceptable for current data volumes; degrades as user base grows | Planned: add ?page=&limit= query parameters to list endpoints in the users service.
TD5 | Plain-text password comparison — The authentication service compared passwords in plain text | High — critical security vulnerability | ✅ Resolved (2026-03-05) — bcrypt hashing implemented in POST /createuser (cost factor 10) and bcrypt.compare() in POST /login. Passwords are never stored or transmitted in plain text.
TD6 | Multiplayer game results not persisted — PvP game outcomes are not saved to MongoDB. The users service GameResult model supports gameMode: "pvp" but the multiplayer service does not call the users service on game completion. PvP wins do not appear in statistics or rankings. | Medium — users expect their multiplayer wins to count; ranking is PvM-only | Planned: multiplayer service calls POST /gameresults for both players on game_over event.
TD7 | Hint system uses alfa_beta_bot regardless of selected difficulty — POST /hint always delegates to alfa_beta_bot independent of the difficulty level the user has chosen. A user playing on "Easy" (heuristic_bot) receives a Hard-level hint. | Low — hints still work and are useful; inconsistency may confuse advanced users | Planned: pass the selected bot ID as a parameter to POST /hint and use it in the hint calculation. See the Player Hints wiki page.
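The planned resolution for TD2 could look roughly like the sketch below. It is not the current implementation: only the variable name REMOTE_INTEROP_ALLOWED_HOSTS and the three default hosts come from the table above, while the function names, the comma-separated format, and the fallback behaviour are assumptions.

```javascript
// Default hosts as listed under TD2; used when the variable is unset or empty.
const DEFAULT_ALLOWED_HOSTS = [
  "localhost:4001",
  "equipo-rival:4001",
  "yovi.13.63.89.84.sslip.io",
];

// Parse a comma-separated host list (assumed format) into a Set.
function parseAllowedHosts(raw, fallback = DEFAULT_ALLOWED_HOSTS) {
  if (!raw || raw.trim() === "") return new Set(fallback);
  return new Set(
    raw
      .split(",")
      .map((host) => host.trim().toLowerCase())
      .filter((host) => host.length > 0)
  );
}

// Compare the host (including any non-default port) of a target URL
// against the allowlist; malformed URLs are rejected.
function isHostAllowed(url, allowedHosts) {
  let host;
  try {
    host = new URL(url).host.toLowerCase();
  } catch (err) {
    return false;
  }
  return allowedHosts.has(host);
}

// Parsed once at startup in this sketch.
const allowedHosts = parseAllowedHosts(process.env.REMOTE_INTEROP_ALLOWED_HOSTS);
```

With this shape, adding a rival team would be a configuration change followed by a restart rather than a code change.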

12. Load Testing

This section documents the load testing strategy, tooling, test scenarios, thresholds, and how to execute the test suite. Load tests are a first-class concern in YOVI (see Section 2, Technical Constraints) and are maintained alongside the production code in tests/load/.

12.1. Tooling

YOVI uses k6 (https://k6.io) for load testing. k6 provides:

  • JavaScript-based test scripts with a familiar API

  • Built-in virtual user (VU) management with ramping stages

  • Custom metrics (Trend, Rate) for fine-grained measurement

  • Threshold definitions that fail the test if SLAs are breached

  • JSON result export for archiving and trend analysis

  • setup() / teardown() lifecycle hooks for prerequisite data

12.2. Test Scenarios

Three scenarios are defined, each targeting a different critical user flow:

12.2.1. Scenario 1: User Registration (register.js)

Simulates 50 concurrent users registering new accounts against POST /register.

Parameter | Value
Target endpoint | POST /register
VU ramp | 0 → 50 VUs over 10s; hold 50 VUs for 30s; ramp down over 10s
Custom metrics | register_duration (Trend, ms); register_success (Rate)
Thresholds | p(95) < 2000ms; success rate > 95%; http_req_failed < 5%
Request body | Unique username per VU+timestamp: loaduser_{VU}_{timestamp}@test.com
Expected response | HTTP 200 or 201; no error field in JSON body

export const options = {
  scenarios: {
    registration_load: {
      executor: "ramping-vus",
      startVUs: 0,
      stages: [
        { duration: "10s", target: 50 },
        { duration: "30s", target: 50 },
        { duration: "10s", target: 0  },
      ],
    },
  },
  thresholds: {
    register_duration: ["p(95)<2000"],
    register_success:  ["rate>0.95"],
    http_req_failed:   ["rate<0.05"],
  },
};

12.2.2. Scenario 2: User Login (login.js)

Simulates 50 concurrent users logging in with pre-seeded credentials against POST /login.

Parameter | Value
Target endpoint | POST /login
VU ramp | 0 → 50 VUs over 10s; hold 50 VUs for 30s; ramp down over 10s
Custom metrics | login_duration (Trend, ms); login_success (Rate)
Thresholds | p(95) < 1500ms; success rate > 95%; http_req_failed < 5%
setup() hook | Creates 10 seed users (seed_login_user_0 … seed_login_user_9) via POST /register before VUs start. 409/400 responses (users already exist) are ignored.
VU behaviour | Each VU picks a seed user in round-robin: data.users[__VU % data.users.length]
Expected response | HTTP 200; token field present in JSON body

export const options = {
  thresholds: {
    login_duration: ["p(95)<1500"],
    login_success:  ["rate>0.95"],
    http_req_failed: ["rate<0.05"],
  },
};
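The round-robin selection described under "VU behaviour" can be reproduced outside k6 to see how 50 VUs map onto 10 seed users. The helper name pickSeedUser and the plain username array are assumptions for this sketch; inside a k6 script the VU number would come from the built-in __VU variable, which starts at 1 for running VUs.

```javascript
// Seed usernames as created by setup(): seed_login_user_0 ... seed_login_user_9.
const seedUsers = Array.from({ length: 10 }, (_, i) => `seed_login_user_${i}`);

// Same expression as in the scenario table, with the VU number passed in.
function pickSeedUser(users, vu) {
  return users[vu % users.length];
}

// Count how many of the 50 VUs end up sharing each seed user.
const usage = new Map();
for (let vu = 1; vu <= 50; vu += 1) {
  const user = pickSeedUser(seedUsers, vu);
  usage.set(user, (usage.get(user) || 0) + 1);
}

console.log(pickSeedUser(seedUsers, 1));  // seed_login_user_1
console.log(pickSeedUser(seedUsers, 10)); // seed_login_user_0
console.log(usage.get("seed_login_user_0")); // 5
```

Each seed account is therefore used by five VUs at peak load, which means the test measures concurrent logins to the same account as well as to different accounts.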

12.2.3. Scenario 3: Start Bot Game (start_game.js)

Simulates 20 concurrent users starting a new bot game against POST /game/new.

Parameter | Value
Target endpoint | POST /game/new (via gateway → gamey)
VU ramp | 0 → 20 VUs over 10s; hold 20 VUs for 30s; ramp down over 10s
Custom metrics | start_game_duration (Trend, ms); start_game_success (Rate)
Thresholds | p(95) < 3000ms; success rate > 90%; http_req_failed < 10%
setup() hook | Registers game_load_user and logs in to obtain a JWT token. VUs include Authorization: Bearer <token> in the request.
Request body | {"size": 7} — creates a standard 7×7 board
Expected response | HTTP 200 or 201; yen field present in JSON body (initial YEN state)

Note: The game start threshold (p(95) < 3000ms) is more lenient than auth endpoints because the Rust game engine initialises board state synchronously on each request.

12.3. Running the Tests

12.3.1. Prerequisites

12.3.2. Run a single scenario

# Against local deployment (default)
k6 run tests/load/register.js

# Against production
k6 run -e BASE_URL=https://yovi.13.63.89.84.sslip.io/api tests/load/login.js

12.3.3. Run all scenarios sequentially

Two runner scripts are provided:

Linux / macOS:

chmod +x tests/load/run_load_tests.sh
./tests/load/run_load_tests.sh

# Against production:
./tests/load/run_load_tests.sh https://yovi.13.63.89.84.sslip.io/api

Windows (PowerShell):

.\tests\load\run_load_tests.ps1

# Against production:
.\tests\load\run_load_tests.ps1 -BaseUrl https://yovi.13.63.89.84.sslip.io/api

The runner scripts execute all three scenarios sequentially and save results as JSON and plain text logs to tests/load/results/ with a timestamp suffix (e.g., register_20260415_143022.json).

12.4. Results Interpretation

A k6 run produces a summary table on stdout. Key metrics to check:

Metric | Threshold | What to look for
register_duration p(95) | < 2000ms | High p95 indicates database pressure or auth service saturation
login_duration p(95) | < 1500ms | Login is faster than register (no bcrypt hash write); high p95 indicates auth or users service bottleneck
start_game_duration p(95) | < 3000ms | High p95 may indicate gamey is CPU-bound on bot initialisation for size 7
http_req_failed rate | < 5% (< 10% for game) | Any failures above threshold indicate service errors under load; check service logs
register_success rate | > 95% | Failures may indicate duplicate username collisions (expected ~5%) or service errors
login_success rate | > 95% | Failures indicate seed users were not created or credential mismatch

A ✓ next to each threshold in the k6 summary means the SLA was met. A ✗ means the threshold was breached and the test suite should be considered failed.

12.5. Architecture of the Load Test Suite

[Figure: Load Test Architecture]

12.6. Custom Metrics Detail

Each scenario defines two custom k6 metrics:

| Scenario | Metric | Type | What it measures |
|---|---|---|---|
| Register | register_duration | Trend | End-to-end latency of POST /register including gateway → auth → users → MongoDB |
| Register | register_success | Rate | Proportion of requests that returned HTTP 2xx without an error field |
| Login | login_duration | Trend | End-to-end latency of POST /login including gateway → auth → users (bcrypt compare) |
| Login | login_success | Rate | Proportion of requests that returned HTTP 200 with a token field |
| Start Game | start_game_duration | Trend | End-to-end latency of POST /game/new including gateway → gamey (board initialisation) |
| Start Game | start_game_success | Rate | Proportion of requests that returned HTTP 2xx with a yen field |
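The three success criteria above can be read as simple predicates over the HTTP status and the parsed JSON body. The sketch below is one possible formulation; the function names are hypothetical and the real k6 scripts may structure these checks differently.

```javascript
// Each predicate takes an HTTP status code and the parsed JSON body (or null).
const is2xx = (status) => status >= 200 && status < 300;
const hasField = (body, field) =>
  body !== null && typeof body === "object" && field in body;

// Register: any 2xx response whose body carries no "error" field.
const registerSucceeded = (status, body) =>
  is2xx(status) && !hasField(body, "error");

// Login: exactly HTTP 200 with a "token" field.
const loginSucceeded = (status, body) =>
  status === 200 && hasField(body, "token");

// Start Game: any 2xx response with a "yen" field (initial YEN state).
const startGameSucceeded = (status, body) =>
  is2xx(status) && hasField(body, "yen");
```

In a k6 script, the boolean result of each predicate would typically be passed to the add() method of the corresponding Rate metric.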

12.7. Known Limitations

  • No multiplayer load test: Socket.IO load testing requires a dedicated k6 extension such as xk6-websockets or a separate tool (e.g., Artillery). Neither is included in the current suite.

  • No admin endpoint load test: Admin operations are low-frequency by design and not included.

  • Seed user collisions: The login test creates 10 seed users before the VUs start. If the test is run repeatedly without cleanup, 409 Conflict responses are expected and ignored in setup().

  • Production impact: Running the full suite against the production URL (https://yovi.13.63.89.84.sslip.io/api) may degrade the service for real users during the test window. Prefer a local or staging deployment for development purposes.

13. API Reference

This section provides a centralised reference of all HTTP endpoints exposed by every YOVI service. Endpoints are grouped by service. All internal services communicate over the Docker bridge network (monitor-net); only the routes listed under Gateway and BotAPI are reachable by external clients through Nginx.

For the full interactive OpenAPI specification of the Bot API, see botapi/src/openapi/openapi.yaml.

13.1. Gateway (/api/*)

The gateway is the single entry point for all webapp traffic. All routes listed here are accessible at https://yovi.13.63.89.84.sslip.io/api/<route> in production. JWT is required on all protected routes unless stated otherwise.

13.1.1. Authentication Routes

| Method | Path | Auth | Body / Params | Description |
|---|---|---|---|---|
| POST | /register | None | {username, password, repeatPassword, email?} | Validates payload, hashes password with bcrypt, creates user, returns JWT. Forwards to Auth Service. |
| POST | /login | None | {username, password} | Verifies bcrypt hash, returns JWT with role claim. Forwards to Auth Service. |
| GET | /verify | Bearer JWT | | Validates JWT signature and expiry. Returns decoded claims {id, username, role}. |

13.1.2. User Routes

| Method | Path | Auth | Body / Params | Description |
|---|---|---|---|---|
| GET | /users/:username/profile | Bearer JWT | | Returns public profile (no password): username, email, realName, bio, location, preferredLanguage, friends count. |
| PUT | /users/:username/profile | Bearer JWT | {email?, realName?, bio?, location?, preferredLanguage?} | Updates editable profile fields. Only allowed fields are applied ($set). |
| DELETE | /users/:username | Bearer JWT | | Deletes own account and all associated game results and notifications. |

13.1.3. Game Routes

| Method | Path | Auth | Body / Params | Description |
|---|---|---|---|---|
| POST | /game/new | Bearer JWT | {size: number} | Creates a new empty game board. Returns initial YEN state. Forwards to Gamey. |
| POST | /game/pvb/move | Bearer JWT | {yen, bot, row, col} | Applies player move and returns bot response. bot is the bot ID string (e.g., "minimax_bot"). Forwards to POST /v1/game/pvb/:bot_id on Gamey. |
| POST | /hint | Bearer JWT | {yen} | Returns a suggested move for the current position using alfa_beta_bot. Forwards to POST /v1/ybot/choose/alfa_beta_bot on Gamey. See the Player Hints wiki page. |

13.1.4. Game Result Routes

| Method | Path | Auth | Body / Params | Description |
|---|---|---|---|---|
| POST | /gameresult | Bearer JWT | {username, opponent, result, winner?, score?, boardSize?, gameMode?} | Saves a game result. result must be "win" or "loss". Forwards to Users Service. |
| GET | /gameresults/:username | Bearer JWT | | Returns all game results for a user, newest first. |
| GET | /ranking | Bearer JWT | | Returns top-10 users by wins. MongoDB aggregation pipeline. |
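The ranking is described as a MongoDB aggregation over game results. A minimal sketch of what such a pipeline might look like follows, together with an in-memory equivalent for illustration. The field names username and result come from the POST /gameresult body; the pipeline itself, the tie-break on username, and the helper topTenByWins are assumptions, not the project's actual code.

```javascript
// Hypothetical aggregation pipeline for the top-10 ranking.
const rankingPipeline = [
  { $match: { result: "win" } },                       // keep wins only
  { $group: { _id: "$username", wins: { $sum: 1 } } }, // count wins per user
  { $sort: { wins: -1, _id: 1 } },                     // most wins first
  { $limit: 10 },
];

// In-memory equivalent of the pipeline, useful for reasoning about its output.
function topTenByWins(results) {
  const wins = new Map();
  for (const r of results) {
    if (r.result === "win") {
      wins.set(r.username, (wins.get(r.username) || 0) + 1);
    }
  }
  return [...wins.entries()]
    .map(([username, count]) => ({ username, wins: count }))
    .sort((a, b) =>
      b.wins - a.wins ||
      (a.username < b.username ? -1 : a.username > b.username ? 1 : 0))
    .slice(0, 10);
}
```

Grouping before sorting keeps the sort input to one document per user rather than one per game.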

13.1.5. Social Routes

| Method | Path | Auth | Body / Params | Description |
|---|---|---|---|---|
| GET | /search | Bearer JWT | ?q=<query> | Searches users by username or realName. Returns matching user profiles (no passwords). |
| GET | /friends/:username | Bearer JWT | | Returns the accepted friends list of a user. |
| POST | /friends/request/:username | Bearer JWT | | Sends a friend request to :username. Sender is extracted from JWT. Creates a friend_request notification for the recipient. |
| POST | /friends/accept/:username | Bearer JWT | | Accepts a friend request from :username. Adds both users to each other’s friends[] array. |
| POST | /friends/reject/:username | Bearer JWT | | Rejects a friend request from :username. Removes the request from friendRequests[]. |
| DELETE | /friends/:username | Bearer JWT | | Removes :username from the authenticated user’s friends list (bidirectional $pull). |

13.1.6. Notification Routes

| Method | Path | Auth | Body / Params | Description |
|---|---|---|---|---|
| GET | /notifications | Bearer JWT | | Returns all notifications for the authenticated user, newest first. Username extracted from JWT. |
| PATCH | /notifications/:id/read | Bearer JWT | | Marks notification :id as read ({read: true}). |

13.1.7. Admin Routes

Admin routes require a valid JWT with role: "admin". Regular user tokens receive 403 Forbidden.

| Method | Path | Auth | Body / Params | Description |
|---|---|---|---|---|
| GET | /admin/users | Admin JWT | ?page=<n>&limit=<n> | Returns paginated list of all registered users with profile and role information. |
| PATCH | /admin/users/:username/role | Admin JWT | {role}, where role is "admin" or "user" | Grants or revokes admin privileges on the target user account. |
| DELETE | /admin/users/:username/history | Admin JWT | | Deletes all game results for :username. User account is preserved. |
| DELETE | /admin/users/:username | Admin JWT | | Permanently deletes the user account and all associated data (game results, notifications). Also removes the user from all other users' friend lists via $pull. |
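The role check described for the admin routes could be sketched as an Express-style middleware like the one below. It assumes an earlier middleware has already verified the JWT and stored the decoded claims on req.user; that property name, the 401 branch, and the error messages are assumptions made for illustration, while the 403 for non-admin tokens and the error body shape follow this document.

```javascript
// Express-style guard for admin routes. Assumes req.user holds the decoded
// JWT claims (hypothetical property set by an earlier middleware).
function requireAdmin(req, res, next) {
  if (!req.user) {
    // No verified token at all (assumed behaviour, not stated in the source).
    return res.status(401).json({ ok: false, error: "Authentication required" });
  }
  if (req.user.role !== "admin") {
    // Valid token but a regular user: 403 Forbidden.
    return res.status(403).json({ ok: false, error: "Admin role required" });
  }
  return next();
}
```

Such a guard would be mounted once on the /admin prefix so that individual handlers do not repeat the check.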

13.1.8. Multiplayer REST Routes (via Gateway)

| Method | Path | Auth | Body / Params | Description |
|---|---|---|---|---|
| POST | /multiplayer/room/create | Bearer JWT | {username, size} | Creates a new room. Calls Gamey for initial YEN. Returns room code and player color ("B"). |
| POST | /multiplayer/room/join | Bearer JWT | {code, username} | Joins an existing room by code. Returns room state and player color ("R"). Emits Socket.IO room_updated and game_started events to connected clients. |
| POST | /multiplayer/room/state | Bearer JWT | {code, username?} | Returns current room state and the requesting user’s color. |
| POST | /multiplayer/room/move | Bearer JWT | {code, username, row, col} | Submits a move. Validates turn order, forwards to Gamey, emits game_updated event. |
| POST | /multiplayer/room/leave | Bearer JWT | {code, username} | Removes player from room. Emits opponent_left event to remaining player. |

13.2. Gamey — Game Engine (gamey:4000)

Internal service — not directly accessible from outside the Docker network. Called by Gateway, Multiplayer, and BotAPI.

| Method | Path | Body | Description |
|---|---|---|---|
| GET | /status | | Health check. Returns "OK" plain text. |
| GET | /metrics | | Prometheus metrics endpoint. Scraped by Prometheus every 15 seconds. |
| POST | /game/new | {size: number} | Creates a new empty game. Returns initial YEN with all . cells. |
| POST | /game/check | YEN body | Checks current win condition without applying a move. Returns {finished, winner, winning_edges}. |
| POST | /v1/game/pvb/:bot_id | {yen, row, col} | Applies player move at (row, col), then computes bot response using :bot_id. Returns {yen, finished, winner, winning_edges}. |
| POST | /v1/game/pvp/move | {yen, row, col} | Applies a PvP move at (row, col). Returns {yen, finished, winner, winning_edges}. Used by the Multiplayer service. |
| POST | /v1/ybot/choose/:bot_id | YEN body | Returns the bot’s chosen move coordinates without applying it to the board. Returns {api_version, bot_id, coords: {x, y, z}}. |

Available bot_id values:

| bot_id | Algorithm | Difficulty |
|---|---|---|
| random_bot | Random valid move | |
| heuristic_bot | Side connection heuristic | Easy |
| minimax_bot | Minimax (depth 3) | Medium |
| alfa_beta_bot | Minimax + alpha-beta pruning | Hard |
| monte_carlo_hard | Monte Carlo Tree Search | Expert |
| monte_carlo_extreme | MCTS (more iterations) | Extreme |

13.3. Authentication Service (authentication:5000)

Internal service — called exclusively by the Gateway.

| Method | Path | Body | Description |
|---|---|---|---|
| POST | /register | {username, password, repeatPassword, email?} | Validates payload, hashes password, calls Users Service to create user, returns JWT. |
| POST | /login | {username, password} | Fetches user from Users Service, bcrypt compares password, returns JWT with role claim. |
| GET | /verify | Authorization header | Verifies JWT signature and expiry. Returns decoded claims. |
| GET | /health | | Returns {status: "OK", service: "auth-service", timestamp}. |

13.4. Users Service (users:3000)

Internal service — called by Gateway and Authentication Service. Swagger UI available at http://users:3000/api-docs (note: currently documents only a subset of endpoints — tracked as TD3).

| Method | Path | Body / Params | Description |
|---|---|---|---|
| POST | /createuser | {username, email?, password} | Creates a new user with bcrypt-hashed password and welcome notification. |
| GET | /users/:username | | Returns user including hashed password (used by Auth Service for bcrypt comparison). |
| GET | /users/:username/profile | | Returns user profile without password field. |
| PUT | /users/:username/profile | {email?, realName?, bio?, location?, preferredLanguage?} | Updates allowed profile fields using $set. |
| DELETE | /users/:username | | Deletes user and all associated game results and notifications. |
| GET | /users/:username/friends | | Returns the friends[] array for the user. |
| POST | /users/:username/friends/request | {from} | Adds the "from" user to friendRequests[]. Creates friend_request notification. |
| POST | /users/:username/friends/accept | {from} | Removes the "from" user from friendRequests[], adds to friends[] on both users. |
| POST | /users/:username/friends/reject | {from} | Removes the "from" user from friendRequests[]. |
| DELETE | /users/:username/friends/:friend | | Bidirectional $pull: removes the friendship on both sides. |
| GET | /users/:username/notifications | | Returns all notifications for the user, sorted newest first. |
| PATCH | /users/:username/notifications/:id/read | | Sets {read: true} on the notification. |
| POST | /gameresults | {username, opponent, result, winner?, score?, boardSize?, gameMode?} | Saves a game result document. |
| GET | /gameresults/:username | | Returns all game results for the user, newest first. |
| GET | /ranking | | Top-10 users by wins via MongoDB aggregation. |
| GET | /search | ?q=<query> | Text search on username and realName fields. |
| GET | /admin/users | ?page=&limit= | (Admin) Paginated list of all users. |
| PATCH | /admin/users/:username/role | {role} | (Admin) Grants or revokes admin role. |
| DELETE | /admin/users/:username/history | | (Admin) Deletes all game results for user. |
| DELETE | /admin/users/:username | | (Admin) Deletes user account and all associated data. |
| GET | /health | | Returns {status: "OK", service: "users-service"}. |
| GET | /metrics | | Prometheus metrics endpoint. |

13.5. Multiplayer Service (multiplayer:7000)

Internal service — REST endpoints called by Gateway. Socket.IO server accessible via Nginx at /socket.io/*. See Section 8 (Cross-cutting Concepts) for the full Socket.IO event catalogue.

| Method | Path | Body | Description |
|---|---|---|---|
| GET | /health | | Returns {status: "ok", service: "multiplayer"}. |
| GET | /rooms/:code | | Returns serialized room state (no socket IDs). |
| POST | /rooms/create | {username, size} | Creates room, calls Gamey for initial YEN, returns {ok, room, yourColor: "B"}. |
| POST | /rooms/join | {code, username} | Joins room, sets status to "active", emits room_updated and game_started. |
| POST | /rooms/state | {code, username?} | Returns current room state and player color. |
| POST | /rooms/move | {code, username, row, col} | Validates turn, calls Gamey PvP move, emits game_updated. Returns updated room. |
| POST | /rooms/leave | {code, username} | Removes player, emits opponent_left to remaining player. |
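Room creation depends on generating a room code. The glossary describes it as a 6-character identifier drawn from an alphabet without O, 0, I and 1. The following is a minimal sketch of one way to do that; the function names, the exact alphabet, and the uniqueness retry are assumptions rather than the multiplayer service's actual code.

```javascript
// Alphabet without the look-alike characters O, 0, I and 1 (assumed set).
const ROOM_CODE_ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789";

// Builds a code of the given length; `random` is injectable for testing.
function generateRoomCode(random = Math.random, length = 6) {
  let code = "";
  for (let i = 0; i < length; i++) {
    const index = Math.floor(random() * ROOM_CODE_ALPHABET.length);
    code += ROOM_CODE_ALPHABET[index];
  }
  return code;
}

// Retries until the code is not already used by an existing room.
function generateUniqueRoomCode(existingCodes, random = Math.random) {
  let code;
  do {
    code = generateRoomCode(random);
  } while (existingCodes.has(code));
  return code;
}
```

Math.random is adequate for short-lived private rooms, but a cryptographically secure source would be preferable if codes must be hard to guess.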

13.6. BotAPI — Interoperability Service (/interop/*)

Public service — accessible at https://yovi.13.63.89.84.sslip.io/interop via Nginx. Full OpenAPI 3.1 spec: openapi.yaml.

13.6.1. Local Game Endpoints (Server Mode)

External bots use these endpoints to play against YOVI’s bots.

| Method | Path | Body / Params | Description |
|---|---|---|---|
| GET | /health | | Returns {status: "ok"}. |
| POST | /games | {size, bot_id} | Creates a new active game session. Returns {game_id, bot_id, position: YEN, status}. |
| GET | /games/:gameId | | Returns current state of an active game. |
| POST | /games/:gameId/play | {position: YEN} | Submits opponent move (detects single added move vs. stored YEN), gets bot response. Returns updated state. |
| GET | /play | ?position=<YEN_JSON>&bot_id=<id> | Stateless move: given a YEN position, returns bot move coordinates without storing state. |
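The play endpoint is described as detecting the single move added relative to the stored YEN. One way to sketch that comparison is shown below, assuming the YEN layout is a flat string in which "." marks an empty cell and "B" or "R" marks a piece. The function name findAddedMove and the returned shape are hypothetical, and the real BotAPI may handle the layout differently.

```javascript
// Compares the stored layout with the submitted one and returns the single
// added piece, or null if the change is not exactly one legal placement.
function findAddedMove(storedLayout, submittedLayout) {
  if (storedLayout.length !== submittedLayout.length) return null;
  let added = null;
  for (let i = 0; i < storedLayout.length; i++) {
    const before = storedLayout[i];
    const after = submittedLayout[i];
    if (before === after) continue;
    const isPlacement = before === "." && (after === "B" || after === "R");
    // Reject overwritten pieces, removed pieces, and more than one new piece.
    if (!isPlacement || added !== null) return null;
    added = { index: i, token: after };
  }
  return added; // null when nothing changed
}
```

Rejecting anything other than one empty-to-token change keeps an external bot from rewriting the board history in a single request.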

13.6.2. Remote Session Endpoints (Client Mode)

YOVI’s bots use these endpoints to play against rival teams' APIs.

| Method | Path | Body | Description |
|---|---|---|---|
| POST | /remote-games/connect | {base_url, game_id, local_bot_id, our_player_index} | Connects to an existing game on a rival API. Creates a local session tracking the remote game. |
| POST | /remote-games/create | {base_url, size, remote_bot_id, local_bot_id, our_player_index?} | Creates a new game on a rival API and stores a local session. |
| GET | /remote-games/:sessionId | | Returns stored remote session information including last known state. |
| POST | /remote-games/:sessionId/play-turn | | Syncs remote state. If it is our turn: chooses move via Gamey, applies to remote API. Returns {action, session, move?}, where action is "WAITING_OPPONENT", "MOVE_SUBMITTED", or "GAME_FINISHED". |
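Because the remote-session endpoints accept a caller-supplied base_url, the glossary notes that SSRF is mitigated through a hardcoded allowlist of permitted hosts. A minimal sketch of such a check follows. The host name in ALLOWED_HOSTS and the function name isAllowedBaseUrl are placeholders, since the real allowlist is not given in this document.

```javascript
// Hypothetical allowlist; the real host names are not listed in this document.
const ALLOWED_HOSTS = new Set(["rival-team.example.com"]);

// Returns true only for http(s) URLs whose exact hostname is allowlisted.
function isAllowedBaseUrl(baseUrl, allowedHosts = ALLOWED_HOSTS) {
  let url;
  try {
    url = new URL(baseUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") return false;
  return allowedHosts.has(url.hostname);
}
```

Comparing the parsed hostname exactly, rather than searching the raw string, avoids bypasses such as an allowlisted name embedded in a longer attacker-controlled domain.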

13.7. Error Response Format

All Node.js services return errors in the following format:

{
  "ok": false,
  "error": "Human-readable error message"
}

The BotAPI (TypeScript) uses a slightly different format matching the OpenAPI spec:

{
  "code": "NOT_FOUND",
  "message": "game abc123 not found"
}

Standard HTTP status codes apply across all services. See Section 8 (Cross-cutting Concepts) for the complete status code reference.
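Since the two error formats differ, a client that talks to both the gateway and the BotAPI may want to normalise them. The helper below is a sketch of that idea; normalizeError and the shape it returns are hypothetical and not part of any YOVI service.

```javascript
// Maps either documented error body onto one shape: {status, code, message}.
function normalizeError(status, body) {
  if (body && typeof body.error === "string") {
    // Node.js services: {ok: false, error: "..."}
    return { status, code: null, message: body.error };
  }
  if (body && typeof body.message === "string") {
    // BotAPI: {code: "...", message: "..."}
    return { status, code: body.code ?? null, message: body.message };
  }
  // Unknown or empty body: fall back to the status code alone.
  return { status, code: null, message: `HTTP ${status}` };
}
```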

14. Monitoring and Observability

This section documents the observability stack, the metrics collected, the Grafana dashboard, and how to access monitoring tools in local and production environments.

14.1. Overview

YOVI uses a Prometheus + Grafana stack for metrics-based observability. Three services expose Prometheus-compatible /metrics endpoints: the gateway, the users service, and the gamey service. Prometheus scrapes all three every 15 seconds. Grafana visualises the collected time series in a pre-built dashboard that auto-provisions at container startup.

[Figure: Monitoring Architecture]

14.2. Instrumented Services

| Service | Library | Metrics endpoint | Metric names |
|---|---|---|---|
| Gateway | express-prom-bundle (Node.js) | GET http://gateway:8080/metrics | http_requests_total, http_request_duration_seconds |
| Users Service | express-prom-bundle (Node.js) | GET http://users:3000/metrics | http_requests_total, http_request_duration_seconds |
| Gamey | axum-prometheus (Rust) | GET http://gamey:4000/metrics | axum_http_requests_total, axum_http_requests_duration_seconds |

All express-prom-bundle instances are configured with:

promBundle({
  includeMethod: true,
  includePath: true,
  includeStatusCode: true,
  normalizePath: true,   // prevents cardinality explosion from dynamic path params
})

See ADR-013 for the rationale behind normalizePath: true.
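To illustrate why path normalisation bounds label cardinality, the sketch below collapses dynamic segments into the #val placeholder before a path would be used as a metric label. It is not express-prom-bundle's internal implementation; the rules and the function name normalizePathLabel are assumptions chosen to match routes listed in this document.

```javascript
// Illustrative rewrite rules only: each pair is [pattern, replacement].
const NORMALIZE_RULES = [
  [/^\/users\/[^/]+/, "/users/#val"],
  [/^\/notifications\/[^/]+/, "/notifications/#val"],
];

// Returns the path with its dynamic segment replaced, or unchanged.
function normalizePathLabel(path) {
  for (const [pattern, replacement] of NORMALIZE_RULES) {
    if (pattern.test(path)) return path.replace(pattern, replacement);
  }
  return path;
}

// Two users and one static route collapse to two distinct label values.
const labelValues = new Set(
  ["/users/alice/profile", "/users/bob/profile", "/ranking"].map(normalizePathLabel)
);
console.log(labelValues.size);
```

Without normalisation, every new username would create a new time series for each method and status code combination.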

14.3. Prometheus Configuration

Prometheus is configured via users/monitoring/prometheus/prometheus.yml:

global:
  scrape_interval: 15s    # standard recommended value for production

scrape_configs:
  - job_name: 'users-service'
    static_configs:
      - targets: ['users:3000']

  - job_name: 'gateway-service'
    static_configs:
      - targets: ['gateway:8080']

  - job_name: 'gamey-service'
    static_configs:
      - targets: ['gamey:4000']

Key decisions:

  • 15-second scrape interval: a commonly used value (Prometheus itself defaults to 1m when none is set); balances resolution with storage overhead

  • Static targets — service discovery is not needed since all containers are on a fixed internal Docker network with predictable hostnames

  • No authentication on /metrics — endpoints are only reachable on the internal Docker network; Nginx does not expose /metrics externally

14.4. Grafana Dashboard

The "Yovi Services Overview" dashboard (UID yovi-overview) auto-provisions at Grafana startup via the provisioning directory at users/monitoring/grafana/provisioning/.

Provisioning files:

  • datasources/datasource.yml — registers Prometheus at http://prometheus:9090 as the default data source

  • dashboards/dashboard.yml — configures the dashboard provider (file-based, 30s update interval)

  • dashboards/dashboard.json — the dashboard definition with all panels

14.4.1. Dashboard Panels

The dashboard contains three panels, each covering all three instrumented services simultaneously:

Panel 1 — Request Rate (req/s)

Displays the per-second HTTP request rate for each service, broken down by method, path, and status code.

# Users and Gateway (express-prom-bundle)
rate(http_requests_total{job="users-service"}[1m])
rate(http_requests_total{job="gateway-service"}[1m])

# Gamey (axum-prometheus)
rate(axum_http_requests_total{job="gamey-service"}[1m])

Legend format: {service} — {method} {path} {status_code}

Panel 2 — P95 Request Duration (seconds)

Displays the 95th percentile request duration for each service over a 5-minute window.

# Users and Gateway
histogram_quantile(0.95,
  rate(http_request_duration_seconds_bucket{job="users-service"}[5m])
)
histogram_quantile(0.95,
  rate(http_request_duration_seconds_bucket{job="gateway-service"}[5m])
)

# Gamey
histogram_quantile(0.95,
  rate(axum_http_requests_duration_seconds_bucket{job="gamey-service"}[5m])
)

This panel directly corresponds to the performance quality scenarios QS-01 and QS-02.

Panel 3 — Error Rate (4xx + 5xx)

Displays the per-second rate of HTTP error responses (client and server errors combined) per service.

# Users and Gateway
rate(http_requests_total{
  job="users-service",
  status_code=~"4..|5.."
}[1m])

rate(http_requests_total{
  job="gateway-service",
  status_code=~"4..|5.."
}[1m])

# Gamey
rate(axum_http_requests_total{
  job="gamey-service",
  status_code=~"4..|5.."
}[1m])

A sustained spike in this panel indicates a service degradation or upstream dependency failure (e.g., MongoDB unavailable, Gamey container restarting).

14.5. Accessing Monitoring Tools

14.5.1. Local Development

After running docker-compose up:

| Tool | URL | Notes |
|---|---|---|
| Prometheus | http://localhost:9090 | Query interface; check targets at http://localhost:9090/targets |
| Grafana | http://localhost:9091 | Default credentials: admin / admin; dashboard auto-loads on first visit |
| Users metrics | http://localhost:3000/metrics | Raw Prometheus text format; useful for debugging metric labels |
| Gateway metrics | http://localhost:8080/metrics | Raw Prometheus text format |
| Gamey metrics | http://localhost:4000/metrics | Raw Prometheus text format (axum-prometheus format) |

14.5.2. Production

In production, Prometheus (9090) and Grafana (9091) are exposed on the VM’s public IP but are not routed through Nginx. They should be protected by a firewall rule limiting access to trusted IPs.

To verify the monitoring stack is healthy in production:

# Check all scrape targets are UP
curl http://<VM_IP>:9090/api/v1/targets | jq '.data.activeTargets[].health'

# Check Grafana is running
curl -u admin:admin http://<VM_IP>:9091/api/health
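The targets check above can also be scripted. The sketch below takes the parsed JSON returned by GET /api/v1/targets and lists every scrape target that is not up. The function name unhealthyJobs is hypothetical, and fetching the JSON is left to the caller.

```javascript
// Given the parsed /api/v1/targets response, returns the targets whose
// health is anything other than "up" (for example "down" or "unknown").
function unhealthyJobs(targetsResponse) {
  const targets = targetsResponse?.data?.activeTargets ?? [];
  return targets
    .filter((target) => target.health !== "up")
    .map((target) => ({
      job: target.labels?.job ?? "unknown",
      health: target.health,
      lastError: target.lastError || "",
    }));
}
```

An empty result means all three scrape jobs (users-service, gateway-service, gamey-service) were reachable at their last scrape.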

14.6. Observability Gaps and Planned Improvements

| Gap | Impact | Planned improvement |
|---|---|---|
| Multiplayer service has no Prometheus metrics | No visibility into WebSocket connection count, room creation rate, or move throughput | Add express-prom-bundle to multiplayer service; add Socket.IO connection gauge |
| BotAPI has no Prometheus metrics | No visibility into external bot session count or interop request rate | Add express-prom-bundle to botapi service |
| Authentication service has no Prometheus metrics | No visibility into login/register rates or bcrypt latency | Add express-prom-bundle to authentication service |
| No alerting rules defined | Grafana shows degradation visually but no automatic alerts are triggered | Define Grafana alert rules for: error rate > 10% sustained for 2 minutes; p95 latency > 3s on gateway |
| No structured logging | Service logs are unstructured console.log / eprintln! output; no log aggregation | Add structured JSON logging (e.g., pino for Node.js, tracing-subscriber JSON layer for Rust); integrate with a log aggregator (e.g., Loki + Grafana) |
| No uptime / synthetic monitoring | No external check verifies the system is reachable from outside the VM | Add an external uptime check (e.g., UptimeRobot, Grafana Cloud Synthetic Monitoring) hitting https://yovi.13.63.89.84.sslip.io/api/health |

15. Glossary

This glossary defines the most important domain and technical terms used throughout the YOVI project. It ensures that all stakeholders — developers, client, evaluators, and bot developers — have an identical understanding of these terms.

15.1. Domain Terms

Term Definition

YOVI

The name of the system developed in this project. A web-based platform for playing Game Y, supporting human players, administrators, and external bots.

Game Y

An abstract strategy board game played on a triangular board where two players compete to connect all three sides of the board with a connected group of their pieces.

Classic Game Y

The standard version of Game Y with the original rules and a triangular board. Mandatory implementation for this project.

YEN (Y Exchange Notation)

A JSON-based format for representing Game Y state, inspired by chess FEN notation. Consists of size, turn, players, and layout fields. Mandated by the client for all service-to-service and external communication involving game state.

Board Size

The edge length of the triangular board. A board of size N has N×(N+1)/2 total cells. The default size in YOVI is 7 (28 cells). Users can configure this when starting a game.

Barycentric Coordinates

The coordinate system used in YOVI to address cells on the triangular board. A cell is identified by three non-negative integers (x, y, z) where x + y + z = N − 1, and N is the board size. Each coordinate represents the distance from one of the three sides.

Side A / Side B / Side C

The three edges of the triangular board. A player wins by connecting all three sides with a single connected group of pieces. In barycentric coordinates: Side A = cells where x = 0; Side B = cells where y = 0; Side C = cells where z = 0.

Player (Blue / Red)

The two participants in a Game Y match. Blue ("B") moves first; Red ("R") moves second. Tokens are single characters in the YEN layout string.

Move / Placement

A single action where a player places one piece on an empty cell of the board. Represented in YEN by changing a . character to the player’s token (B or R).

Win Condition

A player wins when they have a connected group of pieces that simultaneously touches all three sides of the board. Detected via Union-Find (Disjoint Set Union) in the game engine.

PvM (Player vs Machine)

A game mode where a human player competes against an AI bot. The bot is selected by the player from six available difficulty levels.

PvP (Player vs Player)

A game mode where two human players compete in real time via WebSockets. Each player joins the same room using a shared room code.

Room

A private multiplayer session identified by a unique 6-character code. Contains two player slots (Blue and Red), the current YEN state, and the room status (waiting, active, finished). Room state is held in memory by the multiplayer service.

Room Code

A 6-character identifier used to join a multiplayer room. Generated from an unambiguous alphabet (no O/0/I/1) to minimise transcription errors. Example: ABCD23.

User

A registered human account with a username, bcrypt-hashed password, optional email, profile fields, friend list, notification list, and match history.

Administrator (Admin)

A user with the role: "admin" claim in their JWT. Can access the admin panel to view all users, grant or revoke admin privileges, delete match histories, and delete accounts.

Match History

A record of all games played by a user, stored in the gameresults MongoDB collection. Includes opponent, result (win/loss), score, board size, game mode (pvb/pvp), and date.

Ranking

A top-10 leaderboard of users ordered by number of wins. Computed via MongoDB aggregation pipeline on the gameresults collection.

Friend Request

An invitation sent by one user to another. Stored in the recipient’s friendRequests[] array and creates a friend_request notification. Can be accepted (bidirectional add) or rejected (remove only).

Notification

An in-app message for a user. Two types: friend_request (from another user) and welcome (system, on registration). Stored in the notifications MongoDB collection with a read boolean flag.

Hint

A suggested move computed by alfa_beta_bot for the current board position, returned by POST /hint. Helps human players when they are unsure of their next move.

Bot

An automated program that interacts with the YOVI system through the public interoperability API (botapi) to play Game Y. Bots communicate exclusively via YEN notation over HTTP.

Interoperability (Interop)

The capability of the YOVI system to interact with external bots and rival teams' APIs using a shared contract based on YEN notation. Supported in both server mode (external bots play our bots) and client mode (our bots play rival APIs).
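The Board Size, Barycentric Coordinates, and Side entries above can be checked with a short sketch that enumerates every cell of a board of size N. The function names are hypothetical, and this is a JavaScript illustration rather than the Rust code used in gamey.

```javascript
// Enumerates all cells (x, y, z) with x + y + z = size - 1 and x, y, z >= 0.
function boardCells(size) {
  const cells = [];
  for (let x = 0; x < size; x++) {
    for (let y = 0; x + y < size; y++) {
      cells.push({ x, y, z: size - 1 - x - y });
    }
  }
  return cells;
}

// Reports which of the three sides a cell lies on (Side A: x = 0, etc.).
function touchedSides({ x, y, z }) {
  return { a: x === 0, b: y === 0, c: z === 0 };
}

console.log(boardCells(7).length);
```

For the default size of 7 the enumeration yields the 28 cells given by N×(N+1)/2, and each corner cell lies on two sides at once.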

15.2. Technical Terms

Term Definition

Microservice

An independently deployable service with a single well-defined responsibility. YOVI comprises eight microservices: nginx, webapp, gateway, authentication, users, gamey, multiplayer, and botapi.

Docker / Docker Compose

Containerization platform (Docker) and multi-container orchestration tool (Docker Compose) used to package, run, and manage all YOVI services consistently across environments.

GitHub Container Registry (GHCR)

The container image registry at ghcr.io/arquisoft/ where published Docker images for each YOVI service are stored and pulled from during production deployment.

Nginx

The reverse proxy used as the single public entry point for YOVI. Handles HTTPS termination, HTTP→HTTPS redirect, and path-based routing to webapp, gateway, multiplayer, and botapi.

JWT (JSON Web Token)

A compact, URL-safe token used for stateless authentication. Contains user identity claims (id, username, email, role) signed with JWT_SECRET using HS256. Expires after JWT_EXPIRES (default 24h).

bcrypt

A password hashing algorithm used to store user passwords securely. YOVI uses a cost factor of 10 via the bcryptjs library. Hashing is performed in the users service at account creation; comparison is performed in the authentication service at login.

RBAC (Role-Based Access Control)

An authorization model where access to resources is determined by the user’s role. YOVI has two roles: user (default) and admin. The role is embedded in the JWT payload and enforced at both the client (webapp route guard) and server (gateway middleware) levels.

Socket.IO

A JavaScript library providing real-time bidirectional event-based communication over WebSockets, with automatic HTTP long-polling fallback. Used by the multiplayer service for PvP game events.

WebSocket

A communication protocol providing full-duplex channels over a single TCP connection. Used by Socket.IO for real-time PvP communication. Nginx proxies WebSocket connections to the multiplayer service via the Upgrade header.

REST API

Representational State Transfer. The architectural style used for all HTTP-based service-to-service and external communication in YOVI. Resources are accessed via standard HTTP methods (GET, POST, PUT, PATCH, DELETE).

Express

A web framework for Node.js used in gateway (v5), authentication (v5), users (v5), and multiplayer (v4) services. Provides routing, middleware, and HTTP request/response handling.

Axum

An async HTTP framework for Rust built on Tokio. Used by the gamey service. Provides type-safe request extractors, shared state via Arc<AppState>, and native Tower middleware compatibility.

Tokio

An asynchronous runtime for Rust. Used by the gamey service to handle concurrent HTTP requests efficiently without blocking threads.

Mongoose

An Object Document Mapper (ODM) for MongoDB in Node.js. Used by the users service to define schemas (User, GameResult, Notification), perform validation, and build queries.

MongoDB

A NoSQL document database used for user data persistence. Collections: users, gameresults, notifications. Accessed exclusively by the users service.

MongoDB Atlas

MongoDB’s managed cloud database service. Used in production; connection URI injected via MONGODB_URI environment variable.

Union-Find (Disjoint Set Union)

A data structure used in the gamey game engine to efficiently track connected components of each player’s pieces. The PlayerSet struct tracks touches_side_a, touches_side_b, touches_side_c flags per component to detect the win condition in O(α(N)) amortised time.

YBot

The Rust trait that all AI bot strategies implement. Defines name() → &str and choose_move(&GameY) → Option<Coordinates>. The Send + Sync bounds allow safe use across Axum’s async handlers.

YBotRegistry

A HashMap<String, Arc<dyn YBot>> that stores and retrieves bot implementations by name. Populated at server startup and shared across all request handlers via Arc<AppState>.

Vite

A modern frontend build tool used by the webapp. Provides a fast development server with hot module replacement and an optimised production build pipeline.

React

A JavaScript library for building user interfaces via components. Used by the webapp for all UI pages, the notification panel, the admin panel, and the game board.

React Context

A React mechanism for sharing global state without prop drilling. Used in YOVI for ThemeProvider (dark/light) and I18nProvider (EN/ES).

Vitest

A Vite-native unit testing framework used by gateway, authentication, and botapi services. Compatible with the Jest API; supports coverage via @vitest/coverage-v8.

Jest

A JavaScript testing framework used by the users and multiplayer services.

Supertest

A Node.js HTTP assertion library used alongside Vitest/Jest to test Express HTTP endpoints without starting a real server.

proptest

A property-based testing library for Rust. Used in gamey/src/core/coord.rs to verify coordinate invariants (index roundtrip, barycentric sum) across a large input space.

k6

An open-source load testing tool with a JavaScript API. Used for three load test scenarios: registration (50 VUs), login (50 VUs), and game start (20 VUs). Threshold definitions enforce SLAs on p(95) latency and error rate.

SonarCloud

A cloud-based code quality and security analysis platform. Integrated into the CI pipeline to enforce a minimum 80% code coverage and detect code smells, bugs, and security hotspots.

Prometheus

An open-source time-series metrics collection system. Scrapes /metrics endpoints on gateway, users, and gamey every 15 seconds.

Grafana

A metrics visualization platform. Provides the "Yovi Services Overview" dashboard with request rate, p95 latency, and error rate panels. Auto-provisioned at container startup.

express-prom-bundle

A Node.js middleware that automatically instruments Express apps with Prometheus metrics (http_requests_total, http_request_duration_seconds). Used with normalizePath: true to prevent high-cardinality label explosion from dynamic route parameters.

axum-prometheus

A Rust crate that adds Prometheus metrics to Axum services via Tower middleware. Exposes axum_http_requests_total and axum_http_requests_duration_seconds on the /metrics endpoint.

GitHub Actions

A CI/CD platform integrated into the GitHub repository. Runs parallel test jobs, builds Docker images, publishes to GHCR, and deploys to the cloud VM on every release tag.

OpenAPI

A specification format for describing REST APIs. Used by the botapi service (src/openapi/openapi.yaml) to document all public interoperability endpoints. The users service also has an openapi.yaml at /api-docs (currently outdated — tracked as TD3).

Debounce

A technique that delays execution of a function until a specified idle period has elapsed since the last call. Used in Social.tsx so that the user search API is called only after a 400 ms pause in typing. Contrast with throttle, which fires at a fixed interval regardless of pauses.
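As an illustration of the technique, the following is a minimal sketch of a debounce helper. It is not taken from Social.tsx; the names debounce, searchUsers, and requestedQueries are hypothetical, and only the 400 ms delay comes from the entry above.

```typescript
// Minimal debounce sketch (hypothetical helper, not the code in Social.tsx).
// Each call cancels the pending timer, so fn runs only once the caller
// has been idle for waitMs milliseconds.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number,
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A): void => {
    if (timer !== undefined) {
      clearTimeout(timer); // a new call resets the idle period
    }
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args); // runs with the arguments of the most recent call
    }, waitMs);
  };
}

// Usage: three keystrokes in quick succession lead to a single search
// for the last query, issued 400 ms after the final keystroke.
const requestedQueries: string[] = [];
const searchUsers = debounce((query: string): void => {
  requestedQueries.push(query); // stand-in for the real API call
}, 400);

searchUsers("a");
searchUsers("al");
searchUsers("ali");
```

A throttle would instead let the first call through and then ignore further calls until the interval has elapsed, which is less suitable for search-as-you-type because it can send requests for incomplete queries.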

Normalization (Prometheus path)

The process of replacing dynamic path segments (e.g., usernames, ObjectIds) with a generic placeholder (e.g., #val) in Prometheus metric labels. Prevents unbounded cardinality growth as the user base grows.

Cardinality (Prometheus)

The number of unique time series in a Prometheus database. High cardinality (e.g., one series per username in metric labels) degrades query performance and can cause OOM errors. Controlled via normalizePath: true in express-prom-bundle.
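The effect of path normalization on cardinality can be sketched as follows. This is a simplified illustration, not the implementation used by express-prom-bundle: the patterns, the function names, and the sample paths are assumptions, and only the #val placeholder comes from the entries above.

```typescript
// Simplified path normalization sketch (assumed rules, not the library's code).
const PLACEHOLDER = "#val";
const OBJECT_ID = /^[0-9a-f]{24}$/i; // MongoDB ObjectId: 24 hex characters
const NUMERIC_ID = /^\d+$/;

// Replace dynamic-looking path segments with a generic placeholder.
function normalizePath(rawPath: string): string {
  const pathname = rawPath.split("?", 1)[0] ?? ""; // drop the query string
  return pathname
    .split("/")
    .map((segment) =>
      OBJECT_ID.test(segment) || NUMERIC_ID.test(segment) ? PLACEHOLDER : segment,
    )
    .join("/");
}

// Count how many distinct path label values a set of requests would produce.
function countDistinctLabels(paths: string[], normalize: boolean): number {
  return new Set(paths.map((p) => (normalize ? normalizePath(p) : p))).size;
}

// Hypothetical request paths; the route shape is illustrative only.
const samplePaths = [
  "/games/507f1f77bcf86cd799439011/moves",
  "/games/507f191e810c19729de860ea/moves",
  "/games/5f1d7f3a9c2b4e0012345678/moves",
];

const rawSeries = countDistinctLabels(samplePaths, false); // one per game id
const normalizedSeries = countDistinctLabels(samplePaths, true); // one per route
```

Without normalization the number of label values grows with the number of games; with it, the three paths collapse into /games/#val/moves. Free-form values such as usernames do not match the patterns in this sketch and would need a route-specific rule.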

SSRF (Server-Side Request Forgery)

A security vulnerability where an attacker tricks a server into making HTTP requests to unintended targets. Mitigated in the botapi remoteInteropClient via a hardcoded allowlist of permitted hosts.
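A host allowlist check of this kind can be sketched as below. This is an assumed illustration, not the actual remoteInteropClient code: the function name isAllowedTarget and the host bots.example.com are hypothetical.

```typescript
// Allowlist-based SSRF guard sketch (hypothetical, not the botapi code).
// Only exact host matches are accepted; everything else is rejected.
const ALLOWED_HOSTS: ReadonlySet<string> = new Set(["bots.example.com"]);

function isAllowedTarget(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl); // rejects malformed or relative input
  } catch {
    return false;
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return false; // block file:, ftp:, and other schemes
  }
  // Compare the parsed hostname, never the raw string, so that tricks such as
  // "https://bots.example.com@evil.test/" are evaluated as host "evil.test".
  return ALLOWED_HOSTS.has(url.hostname);
}
```

Matching on the parsed hostname rather than on a substring of the URL is what keeps look-alike hosts (for example bots.example.com.evil.test) and internal addresses out of reach.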

ADR (Architecture Decision Record)

A document that captures an important architectural decision, including its context, the decision made, alternatives considered, and its consequences. YOVI maintains 16 ADRs in this document and the GitHub wiki.

15.3. Acronyms

Acronym     Definition
ADR         Architecture Decision Record
API         Application Programming Interface
CI/CD       Continuous Integration / Continuous Deployment
GHCR        GitHub Container Registry
HTTP        Hypertext Transfer Protocol
HTTPS       Hypertext Transfer Protocol Secure
i18n        Internationalization
JSON        JavaScript Object Notation
JWT         JSON Web Token
MCTS        Monte Carlo Tree Search
ODM         Object Document Mapper
OOM         Out Of Memory
PvM         Player versus Machine
PvP         Player versus Player
RBAC        Role-Based Access Control
REST        Representational State Transfer
SLA         Service Level Agreement
SPA         Single-Page Application
SSRF        Server-Side Request Forgery
TLS         Transport Layer Security
UI          User Interface
VU          Virtual User (k6 load testing)
WS / WSS    WebSocket / WebSocket Secure
YEN         Y Exchange Notation