Database Schema Review + Security Audit

This is a read-only audit. No files were modified.

1. DATABASE SCHEMA ISSUES

1.1 Missing Indexes

TableColumn(s)Impact
ordersstatusAll status-based filtering queries do full table scans (list_orders, stats)
itemsorder_idEvery order detail page triggers a scan
itemshydee_wareidsearch_hydee LIKE queries, batch-match, stats counting
items(order_id, row_no)Used by OCR re-run matching updates
ordersemployee_idEmployee-specific listing
orderssupplier_idJOINs with suppliers table

Only 2 indexes exist: idx_aliases_alias and idx_aliases_wareid — both on the small metadata aliases table. The transactional tables (orders, items) have ZERO indexes beyond PRIMARY KEY.

1.2 Foreign Keys Without ON DELETE Actions

FKProblem
orders.supplier_id → suppliers(id)Deleting a supplier would fail or leave dangling refs
orders.employee_id → users(id)Deleting a user breaks the reference
orders.operator_id → users(id)Same
ocr_log.order_id → orders(id)Deleting an order leaves orphan log entries

items.order_id has ON DELETE CASCADE — good, but this is the only one properly configured.

1.3 CHECK Constraint vs Code Inconsistency

  • init_db() defines: CHECK(review_status IN (‘pending’,‘corrected’,‘confirmed’,‘flag’)) — includes ‘flag’
  • Migration rebuilt items table with: CHECK(review_status IN (‘pending’,‘matched’,‘corrected’,‘confirmed’,‘rejected’)) — no ‘flag’, adds ‘matched’ and ‘rejected’
  • status_service.py defines ITEM_REVIEW_STATUS as {“pending”, “corrected”, “confirmed”, “rejected”} — no ‘matched’ or ‘flag’

The init schema and migration schema differ. The code in status_service.py does not match either SQL definition. After migration runs, the init schema is effectively dead for the items table.

1.4 SHA-256 Password Hash

Passwords are hashed with hashlib.sha256() (in users.py, auth.py). SHA-256 is a fast general-purpose hash, NOT suitable for password storage. Should use pbkdf2_hmac with salt or bcrypt.

1.5 image_paths Stored as Comma-Separated TEXT

  • Denormalized: violates 1NF. All image path operations require split(”,”) + string joining
  • No referential integrity for image files
  • Path resolution in _resolve_image_paths() mixes absolute paths (legacy) with relative paths
  • Should use a separate order_images table with (order_id, path, sort_order)

1.6 No Schema Version Tracking

run_migrations() detects needed changes by inspecting PRAGMA table_info() and sqlite_master SQL strings. This is fragile. Schema rebuilds re-create the entire table — risky with production data.

1.7 Duplicate Schema Definitions in init + Migrations

Both init_db() and run_migrations() duplicate the full CREATE TABLE statement for rebuilds.

2. SECURITY AUDIT

2.1 Auth Bypass — Public Endpoint Exposure

PUBLIC_PREFIXES in main.py includes “/api/benchmark” — fully public. “/docs” and “/openapi” expose the entire API schema.

2.2 CORS Wildcard in Production

allow_origins=[”*”] allows any origin to make API calls. Combined with Bearer token auth (disablable when API_TOKEN is empty), this is a significant risk.

2.3 Uploads Directory Publicly Accessible

The /uploads mount serves files directly with no auth check — anyone with the URL can view any uploaded image.

2.4 Path Traversal

SAFE: Filenames are regenerated with UUIDs, extension is whitelisted. No path traversal via filename. RISK in delete endpoint: Legacy absolute paths from DB are used directly for os.remove().

2.5 F-String SQL Patterns — SQL Injection

VERDICT: NO exploitable SQL injection found. All user input goes through parameterized queries or safe whitelist-based field construction.

2.6 Secrets: API_TOKEN Default Empty

When empty, the auth middleware is effectively bypassed. If .env lacks API_TOKEN, all API endpoints are completely open.

2.7 CORS + No CSRF Protection

Any origin can make requests and read responses.

2.8 Missing Production Guardrails

  • No rate limiting
  • No request size limits on non-upload endpoints
  • SPA fallback sends index.html for all non-API paths including /admin, /.env, etc.

3. CONFIGURATION ISSUES

3.1 Missing .env.example and requirements.txt

  • .env.example: Does NOT exist. No documentation of required env vars.
  • requirements.txt: Not found. Dependencies on fastapi, pydantic, uvicorn, python-dotenv, pymssql etc. but no requirements file.

3.2 Hardcoded Configuration

ParameterDefault in Source
MSSQL_SERVER”183.220.33.12”
MSSQL_PORT”17433”
MSSQL_DATABASE”hydee”
MSSQL_USER”hydee_reader”
UPLOAD_DIRComputed in both upload.py and orders.py (duplicate, possibly different resolution)

3.3 Default SQLite Database Path

DB_PATH = os.path.join(os.path.dirname(file), “data”, “invoices.db”) The SQLite file is stored inside the app directory, making it harder to backup, rotate, or scale in production.

4. SUMMARY OF FINDINGS

Critical

  1. API_TOKEN default empty — Authentication is completely disabled if .env is missing this value
  2. CORS wildcard — Any website can make API calls
  3. /uploads publicly accessible — No auth on uploaded image access
  4. Missing .env.example and requirements.txt — No standardized configuration documentation

High

  1. Missing indexes on 6+ columns — Performance will degrade with data volume
  2. SHA-256 password hashing — Weak, no salt, no iterations
  3. image_paths as comma-separated TEXT — Denormalized, fragile path resolution
  4. Foreign keys without ON DELETE — 4 of 5 FKs lack CASCADE/SET NULL
  5. CHECK constraint inconsistency — init_db(), migration, and code all define different values for review_status

Medium

  1. No schema versioning — Migration detection is fragile/reliant on string matching
  2. Hardcoded MSSQL credentials in source — Default user/password visible in code
  3. Swagger /docs exposed publicly — API surface visible without auth
  4. /api/benchmark public — Undocumented public endpoint

Low

  1. Duplicate UPLOAD_DIR path computation — Inconsistent between upload.py and orders.py
  2. SQLite db inside app directory — Not ideal for production deployment
  3. SPA fallback serves /.env etc. — Informational disclosure risk