Database Schema Review + Security Audit
This is a read-only audit. No files were modified.
1. DATABASE SCHEMA ISSUES
1.1 Missing Indexes
| Table | Column(s) | Impact |
|---|---|---|
orders | status | All status-based filtering queries do full table scans (list_orders, stats) |
items | order_id | Every order detail page triggers a scan |
items | hydee_wareid | search_hydee LIKE queries, batch-match, stats counting |
items | (order_id, row_no) | Used by OCR re-run matching updates |
orders | employee_id | Employee-specific listing |
orders | supplier_id | JOINs with suppliers table |
Only 2 indexes exist: idx_aliases_alias and idx_aliases_wareid — both on the small metadata aliases table. The transactional tables (orders, items) have ZERO indexes beyond PRIMARY KEY.
1.2 Foreign Keys Without ON DELETE Actions
| FK | Problem |
|---|---|
orders.supplier_id → suppliers(id) | Deleting a supplier would fail or leave dangling refs |
orders.employee_id → users(id) | Deleting a user breaks the reference |
orders.operator_id → users(id) | Same |
ocr_log.order_id → orders(id) | Deleting an order leaves orphan log entries |
items.order_id has ON DELETE CASCADE — good, but this is the only one properly configured.
1.3 CHECK Constraint vs Code Inconsistency
- init_db() defines: CHECK(review_status IN (‘pending’,‘corrected’,‘confirmed’,‘flag’)) — includes ‘flag’
- Migration rebuilt items table with: CHECK(review_status IN (‘pending’,‘matched’,‘corrected’,‘confirmed’,‘rejected’)) — no ‘flag’, adds ‘matched’ and ‘rejected’
- status_service.py defines ITEM_REVIEW_STATUS as {“pending”, “corrected”, “confirmed”, “rejected”} — no ‘matched’ or ‘flag’
The init schema and migration schema differ. The code in status_service.py does not match either SQL definition. After migration runs, the init schema is effectively dead for the items table.
1.4 SHA-256 Password Hash
Passwords are hashed with hashlib.sha256() (in users.py, auth.py). SHA-256 is a fast general-purpose hash, NOT suitable for password storage. Should use pbkdf2_hmac with salt or bcrypt.
1.5 image_paths Stored as Comma-Separated TEXT
- Denormalized: violates 1NF. All image path operations require split(”,”) + string joining
- No referential integrity for image files
- Path resolution in _resolve_image_paths() mixes absolute paths (legacy) with relative paths
- Should use a separate order_images table with (order_id, path, sort_order)
1.6 No Schema Version Tracking
run_migrations() detects needed changes by inspecting PRAGMA table_info() and sqlite_master SQL strings. This is fragile. Schema rebuilds re-create the entire table — risky with production data.
1.7 Duplicate Schema Definitions in init + Migrations
Both init_db() and run_migrations() duplicate the full CREATE TABLE statement for rebuilds.
2. SECURITY AUDIT
2.1 Auth Bypass — Public Endpoint Exposure
PUBLIC_PREFIXES in main.py includes “/api/benchmark” — fully public. “/docs” and “/openapi” expose the entire API schema.
2.2 CORS Wildcard in Production
allow_origins=[”*”] allows any origin to make API calls. Combined with Bearer token auth (disablable when API_TOKEN is empty), this is a significant risk.
2.3 Uploads Directory Publicly Accessible
The /uploads mount serves files directly with no auth check — anyone with the URL can view any uploaded image.
2.4 Path Traversal
SAFE: Filenames are regenerated with UUIDs, extension is whitelisted. No path traversal via filename. RISK in delete endpoint: Legacy absolute paths from DB are used directly for os.remove().
2.5 F-String SQL Patterns — SQL Injection
VERDICT: NO exploitable SQL injection found. All user input goes through parameterized queries or safe whitelist-based field construction.
2.6 Secrets: API_TOKEN Default Empty
When empty, the auth middleware is effectively bypassed. If .env lacks API_TOKEN, all API endpoints are completely open.
2.7 CORS + No CSRF Protection
Any origin can make requests and read responses.
2.8 Missing Production Guardrails
- No rate limiting
- No request size limits on non-upload endpoints
- SPA fallback sends index.html for all non-API paths including /admin, /.env, etc.
3. CONFIGURATION ISSUES
3.1 Missing .env.example and requirements.txt
- .env.example: Does NOT exist. No documentation of required env vars.
- requirements.txt: Not found. Dependencies on fastapi, pydantic, uvicorn, python-dotenv, pymssql etc. but no requirements file.
3.2 Hardcoded Configuration
| Parameter | Default in Source |
|---|---|
| MSSQL_SERVER | ”183.220.33.12” |
| MSSQL_PORT | ”17433” |
| MSSQL_DATABASE | ”hydee” |
| MSSQL_USER | ”hydee_reader” |
| UPLOAD_DIR | Computed in both upload.py and orders.py (duplicate, possibly different resolution) |
3.3 Default SQLite Database Path
DB_PATH = os.path.join(os.path.dirname(file), “data”, “invoices.db”) The SQLite file is stored inside the app directory, making it harder to backup, rotate, or scale in production.
4. SUMMARY OF FINDINGS
Critical
- API_TOKEN default empty — Authentication is completely disabled if .env is missing this value
- CORS wildcard — Any website can make API calls
- /uploads publicly accessible — No auth on uploaded image access
- Missing .env.example and requirements.txt — No standardized configuration documentation
High
- Missing indexes on 6+ columns — Performance will degrade with data volume
- SHA-256 password hashing — Weak, no salt, no iterations
- image_paths as comma-separated TEXT — Denormalized, fragile path resolution
- Foreign keys without ON DELETE — 4 of 5 FKs lack CASCADE/SET NULL
- CHECK constraint inconsistency — init_db(), migration, and code all define different values for review_status
Medium
- No schema versioning — Migration detection is fragile/reliant on string matching
- Hardcoded MSSQL credentials in source — Default user/password visible in code
- Swagger /docs exposed publicly — API surface visible without auth
- /api/benchmark public — Undocumented public endpoint
Low
- Duplicate UPLOAD_DIR path computation — Inconsistent between upload.py and orders.py
- SQLite db inside app directory — Not ideal for production deployment
- SPA fallback serves /.env etc. — Informational disclosure risk