656 lines
23 KiB
Markdown

# Codebase Concerns
**Analysis Date:** 2026-05-07
## Tech Debt
### Multiple Lock Files Conflict Risk
**Issue:** Three conflicting lock files coexist: `package-lock.json`, `pnpm-lock.yaml`, and `yarn.lock`.
**Files:**
- `package-lock.json` (177KB)
- `pnpm-lock.yaml` (2.3KB)
- `yarn.lock` (91KB)
**Impact:**
- Risk of installing different dependency versions across environments
- CI/CD, Docker builds, and local development may pull incompatible versions
- Lock file merge conflicts during multi-developer collaboration
- Dockerfile uses yarn exclusively, but npm/pnpm may be used locally, creating version drift
**Current State:**
- `Dockerfile` (line 12, 34) explicitly uses `yarn.lock*`
- `.gitignore` likely checks in all three files
- CLAUDE.md explicitly warns "不要混用" but provides no automation to enforce it
**Fix Approach:**
1. Choose single package manager (recommend yarn, per Dockerfile)
2. Delete `package-lock.json` and `pnpm-lock.yaml` from git history
3. Add enforcement to `.gitignore` and CI/CD (fail if multiple lock files modified)
4. Document in CLAUDE.md that ONLY yarn should be used
**Priority:** High - impacts build reproducibility and team workflow
---
### Production Build Ignores TypeScript & ESLint Errors
**Issue:** Build succeeds despite type/lint errors; production may contain undetected bugs.
**Files:**
- `next.config.mjs:16-21` — Both `ignoreDuringBuilds: true`
**Current Configuration:**
```javascript
eslint: {
ignoreDuringBuilds: true,
},
typescript: {
ignoreBuildErrors: true,
},
```
**Impact:**
- TypeScript errors (type mismatches, missing nullchecks) not caught at build time
- ESLint warnings (unused imports, unreachable code) ignored
- QA burden shifts to runtime discovery
- Breaking changes in dependencies not detected early
**Evidence:** README.md (line 121) acknowledges this: "务必在本地执行 `next lint``tsc --noEmit` 保证质量" — manual step, not enforced
**Fix Approach:**
1. Set both to `false` in production config
2. Fix all current type/lint errors (likely small count given ~27K LOC)
3. Add pre-commit hooks to `husky` to block commits with errors
4. Make lint/type checks mandatory in CI/CD
**Priority:** High - gates correctness
---
### Console Logging in Client Code (Debug Artifact)
**Issue:** Production code contains console.log/warn/error calls, exposing debug info and token prefixes.
**Files & Line Numbers:**
- `lib/api/client.ts:23``console.log('🔍 Token检查:', ...)`
- `lib/api/client.ts:29``console.warn('⚠️ 未找到auth_token...')`
- `lib/api/client.ts:35``console.log('📡 发送请求:', ...)`
- `lib/api/client.ts:36``console.log('📋 请求头:', ...)`
- `lib/api/client.ts:47``console.log('✅ 响应成功:', ...)`
- `lib/api/client.ts:51``console.error('❌ 响应错误:', ...)`
- `lib/api/client.ts:55``console.warn('🚫 认证失败...')`
- `lib/api/client.ts:56``console.log('响应详情:', ...)`
- `lib/api/upload.ts:33``console.log('📤 开始上传文件:', ...)`
- `lib/api/upload.ts:43``console.log(...上传进度...)`
- `lib/api/upload.ts:59``console.log('✅ 文件上传成功:', ...)`
- `lib/api/upload.ts:62``console.error("文件上传失败:", error)`
- Similar issues in upload multifile (lines 83, 93)
- `components/sidebar.tsx:97``console.error("退出登录失败:", error)`
- `app/outfits/page.tsx:72, 117, 135, 154` — console.error calls
- Many other pages: `app/*/page.tsx` contain error logging
**Impact:**
- Token prefixes exposed in browser console → XSS/developer tools leak risk
- Error details (e.g., API paths, status codes) visible to end users
- Production debugging should use structured error handling, not console
- Could be scraped by malicious actors with browser console access
**Evidence:** `lib/api/token-debug.ts` is entire debug utility with `console.log` throughout (lines 7-54), showing token prefix extraction logic
**Fix Approach:**
1. Remove all console.log/warn/error in client code, replace with structured error handling
2. Create error boundary/toast notification pattern for user-facing errors
3. Optional: Create client-side error tracking service (Sentry) for production
4. Add ESLint rule: `no-console` (already available in Next.js config, just not enabled)
**Priority:** High - security and code cleanliness
---
## Security Concerns
### Client-Side Permission Enforcement Only (Trust Boundary Violation)
**Issue:** Permissions are enforced ONLY on client; backend may not re-validate on API calls.
**Files:**
- `lib/permissions.ts:65-87``getUserRole()`, `hasPermission()` read from `localStorage.user_role`
- `components/sidebar.tsx:102-104` — Filter menu items client-side only
- `middleware.ts:17-44` — Token validation in middleware, but NO permission check against module being accessed
**Risk Model:**
```
User (attacker) → Modify localStorage.user_role="超级管理员" →
Sidebar shows all menus (useless) BUT
Direct API call to DELETE /api/v1/admin/users/123 → Backend accepts if no re-check
```
**Current Evidence:**
- `lib/api/users.ts:24-43``getUsers()` makes raw API call; no permission guard in client code
- No evidence of backend role check in Django API (not inspectable from frontend code, but CRITICAL to verify)
**Impact:**
- **CRITICAL:** If backend doesn't re-validate role on each endpoint, attackers can:
- Modify localStorage role → make admin API calls
- Bypass middleware token check (SSR can't access localStorage) → direct API calls with stolen token
- Perform unauthorized operations (delete users, change settings, etc.)
**Trust Boundary Rule:** Backend MUST re-check permission for EVERY admin API call, regardless of client claims.
**Fix Approach:**
1. **URGENT:** Audit Django backend API (`qy_lty/lib/api/v1/admin/`) to confirm all endpoints validate user role/permissions
2. Document in CLAUDE.md: "Client-side permission filters are UI courtesy only; backend MUST validate on all admin/ endpoints"
3. Add explicit checks in frontend error handling for 403 Forbidden responses
4. Consider adding request logging/monitoring to catch unauthorized attempts
**Priority:** CRITICAL - security vulnerability if backend doesn't re-check
---
### Token Storage in localStorage (XSS Vulnerability)
**Issue:** Auth token stored in `localStorage` instead of secure HttpOnly cookie; vulnerable to XSS.
**Files:**
- `lib/api/auth.ts:48-59` — Stores token in localStorage AND sets js-cookie
- `lib/api/client.ts:21-22` — Reads token from localStorage in request interceptor
- `middleware.ts:32` — Token read from Cookie (correct), but localStorage is fallback
**Current Implementation:**
```typescript
// Stored in BOTH localStorage and cookie
localStorage.setItem("auth_token", token); // XSS risk
Cookies.set("auth_token", token, { expires: 7, path: '/' }); // No HttpOnly flag
```
**Impact:**
- XSS attack can steal token from localStorage
- js-cookie (js-cookie, no HttpOnly flag) also vulnerable to XSS
- 7-day expiry without rotation = long XSS window
- `localStorage.user_role` also stored unencrypted (attackable)
**Proper Secure Pattern:**
- HttpOnly, Secure, SameSite cookies (set by backend during login)
- Token never exposed to JavaScript
- Client reads from Bearer token response, never stores in accessible storage
**Fix Approach:**
1. Backend should set HttpOnly cookie during login response
2. Remove localStorage token storage entirely
3. Keep only sealed HTTP-only cookie for auth
4. If SSR requires token access, use secure session storage (not localStorage)
5. Add CSP (Content-Security-Policy) headers to mitigate XSS
**Priority:** High - significant security risk
---
### No Token Rotation / Refresh Strategy
**Issue:** Token expires in 7 days, no refresh token or rotation mechanism visible.
**Files:**
- `lib/api/auth.ts:63``Cookies.set("auth_token", token, { expires: 7, path: '/' })`
- `lib/api/client.ts:54-60` — 401 handler just redirects to /login, doesn't attempt refresh
**Impact:**
- User activity > 7 days → auto-logout (UX degradation)
- No short-lived access token + long-lived refresh token pattern
- Stolen token valid for full 7 days
- No token revocation (logout doesn't invalidate server-side if stateless JWT)
**Fix Approach:**
1. Implement short-lived access token (15-30 min) + refresh token (7-14 days)
2. Refresh interceptor: if 401, attempt token refresh before redirecting to /login
3. Backend session should blacklist tokens on logout
4. Document token lifecycle in CLAUDE.md
**Priority:** Medium - acceptable for internal admin tool, but should improve for production
---
### No Input Validation on File Upload
**Issue:** File uploads accept any mimetype/size without validation.
**Files:**
- `lib/api/upload.ts:25-65``uploadFile()` accepts File object, no size/type checks in client
- `components/ui/file-upload.tsx` — Likely has upload UI, needs validation
**Impact:**
- Malicious files (executables, oversized) may reach backend
- Backend must validate, but client-side check is first defense
- No maximum file size enforced client-side
**Fix Approach:**
1. Add file size limits (e.g., max 50MB)
2. Whitelist MIME types (images: image/jpeg, image/png, image/webp only)
3. Show user error before upload attempt
4. Verify backend enforces same limits
**Priority:** Medium - backend should catch it, but client should be first defense
---
## Fragile Areas
### Large Pages with Mixed Concerns (1000+ line files)
**Issue:** Several pages handle data fetching, UI rendering, and business logic in single component.
**Large Files:**
- `app/affinity/page.tsx` (1005 lines) — Affinity rules, levels, interactions all in one page
- `components/ui/sidebar.tsx` (763 lines) — Navigation logic, permission filtering, logout in one component
- `app/songs/[id]/page.tsx` (651 lines) — Song detail page, editing, deletion mixed
- `app/songs/page.tsx` (644 lines) — Song list, search, pagination, CRUD operations
**Fragility Risks:**
- Hard to test individual business logic
- Changes to one feature (e.g., affinity rules) risk breaking another (levels)
- Difficult to reuse UI logic across pages
- High cognitive load for maintenance
**Safe Modification Pattern:**
1. Extract data-fetching logic to `lib/api/` hooks or services
2. Create smaller sub-components (e.g., `<AffinityRuleSection>`, `<AffinityLevelSection>`)
3. Use custom hooks for state management per feature
4. Keep pages as composition only
**Example Refactor:**
```typescript
// Current: 1005-line page
// After:
export default function AffinityPage() {
return (
<DashboardShell>
<DashboardHeader />
<AffinityRulesSection />
<AffinityLevelsSection />
<InteractionLogsSection />
</DashboardShell>
)
}
```
**Priority:** Medium - refactor as part of feature changes, not urgent
---
### No Error Boundaries (Next.js error.tsx Files Missing)
**Issue:** No error boundary components for graceful error handling in App Router.
**Files:**
- `app/` has NO `error.tsx` files
- Only some leaf routes have `loading.tsx` (e.g., `app/outfits/loading.tsx`)
**Current Error Handling:**
- Pages catch errors in `try/catch`, show toast notifications
- Uncaught errors propagate, may show raw error UI
- No fallback UI for API failures
**Impact:**
- User sees blank page or browser error on network failure
- No recovery mechanism (retry button, fallback content)
- Log data lost if error page crashes
**Next.js Error Boundary Pattern:**
```typescript
// app/error.tsx (catches errors in all children)
'use client'
export default function Error({ error, reset }) {
return (
<div>
<h1>Something went wrong</h1>
<button onClick={() => reset()}>Try again</button>
</div>
)
}
```
**Fix Approach:**
1. Add `app/error.tsx` for app-level errors
2. Add segment-level `error.tsx` for feature-specific handling (e.g., `app/outfits/error.tsx`)
3. Keep try/catch for API calls, render fallback UI
4. Add `app/not-found.tsx` for 404 routes
**Priority:** Low - pages have basic error handling via toasts, but UX could improve
---
### Hardcoded API Base URL Fallback
**Issue:** API client falls back to hardcoded URL if env var not set.
**Files:**
- `lib/api/client.ts:9``API_BASE_URL = process.env.NEXT_PUBLIC_API_BASE_URL || "http://localhost:8000/api"`
**Risk:**
- If `NEXT_PUBLIC_API_BASE_URL` not set in production, silently uses localhost
- Requests fail silently in production, confusing to debug
- No warning log when fallback is used
**Fix Approach:**
1. Remove fallback; throw error if env var not set at build time
2. Use `if (!process.env.NEXT_PUBLIC_API_BASE_URL) throw new Error("...")`
3. Document required env vars in deployment guide
**Priority:** Low - environmental issue, unlikely in production if properly configured
---
## Performance Bottlenecks
### No Memoization on Permission Checks
**Issue:** `hasPermission()`, `getUserRole()` called on every render without memoization.
**Files:**
- `components/sidebar.tsx:102-104` — Calls `hasPermission(item.module)` in `.filter()` on every render
- `lib/permissions.ts:65-87` — No caching; reads localStorage every time
**Current Pattern:**
```typescript
const visibleAiItems = mounted ? aiMenuItems.filter((item) => hasPermission(item.module)) : []
```
**Impact:**
- Sidebar re-renders, re-filters items, re-reads localStorage unnecessarily
- Minimal impact on small arrays (3-15 items), but scales poorly
**Fix Approach:**
1. Memoize `getUserRole()` result in sidebar:
```typescript
const userRole = useMemo(() => getUserRole(), [mounted])
const visibleItems = useMemo(() =>
aiMenuItems.filter(item => hasPermission(item.module)),
[userRole]
)
```
2. Or move to custom hook: `const { role, allowedModules } = usePermissions()`
**Priority:** Low - not a visible performance issue, but good practice
---
## Cross-Repo Coupling Risks
### Strong Runtime Dependency on qy_lty Backend
**Issue:** Front-end depends on specific API contract from `qy_lty/` Django backend; contract drift risk.
**Files:**
- `lib/api/users.ts:6-21` — Maps backend `ParadiseUser` to `User` type (field mapping)
- `lib/api/outfits.ts` (not read, but expected) — Maps backend outfit schema
- ALL API modules expect specific response structure from `/api/v1/admin/*`
**Coupling Points:**
- Backend field names: `username`, `email`, `phone_number`, `date_joined`, `last_login`, `is_superuser`, `is_staff`
- Response structure: `{ results: [...], count: number }` (Django REST Framework default)
- Endpoint paths: `/v1/admin/login/`, `/user/`, `/common/upload/`
- Token storage key in Redis: `admin_token:{token}` (per CLAUDE.md)
**Risk Scenario:**
1. Backend dev changes `/v1/admin/users/` response format
2. Frontend still sends old field names or expects old response shape
3. Silent failures or type mismatches
4. Detected late in testing or production
**Mitigation in Place:**
- CLAUDE.md (line 34) requires both repos update `docs/修改记录.md` when contract changes
- But no automated contract testing (OpenAPI, TypeScript codegen)
**Fix Approach:**
1. **Shared types/API contract** (OpenAPI/GraphQL schema) versioned in git
2. Generate TypeScript types from backend OpenAPI spec (e.g., with Swagger-Codegen)
3. Add pre-commit hook to detect schema changes
4. Document in CLAUDE.md: "Any API response structure change requires updating both repos' docs"
**Priority:** Medium - mitigated by manual process, but automatable
---
## Dependencies at Risk
### No Enforcement of Dependency Updates
**Issue:** Many dependencies pinned to `latest`, creating unpredictable updates.
**Files:**
- `package.json:12, 56, 63``"latest"` version specifiers
**Current Versions Using "latest":**
```json
"@hookform/resolvers": "latest",
"react-hook-form": "latest",
"recharts": "latest",
"zod": "latest"
```
**Impact:**
- `npm install` at different times installs different major versions
- Breaking changes in `react-hook-form` or `recharts` not caught by lock file
- Tests pass locally, fail in CI if new major version published
**Fix Approach:**
1. Replace all `"latest"` with specific versions (e.g., `"^7.0.0"` for caret range)
2. Use Dependabot or renovate to automate safe updates
3. Pin devDependencies to exact versions
**Example:**
```json
"@hookform/resolvers": "^3.3.4",
"react-hook-form": "^7.51.0",
"recharts": "^2.12.0",
"zod": "^3.22.0"
```
**Priority:** Medium - not urgent, but good practice
---
### Unused Dependencies or Overlapping Packages
**Issue:** Unclear if all installed packages are actively used.
**Potential Issues:**
- Multiple notification libraries: `sonner` + `@radix-ui/react-toast` (which is used?)
- Figure out which is preferred and remove duplication
**Files:**
- `package.json:36, 59` — Both toast packages listed
**Fix Approach:**
1. Audit each dependency in `package.json`
2. Remove unused packages
3. Consolidate duplicate functionality
**Priority:** Low - cleanup task
---
## Test Coverage Gaps
### No Tests Found
**Issue:** No Jest/Vitest configuration or test files detected in codebase.
**Expected Test Files:**
- `*.test.ts`, `*.spec.ts`, `*.test.tsx`, `*.spec.tsx` not found
- No `jest.config.*`, `vitest.config.*`
**Impact:**
- Complex logic (permission matrix, API adapters) not tested
- Refactoring risky (no regression protection)
- Onboarding new developers without safety net
- No confidence in critical paths (auth, user management)
**Critical Areas Needing Tests:**
- `lib/permissions.ts` — Permission matrix logic (many branches)
- `lib/api/client.ts` — Request/response interceptors (error handling paths)
- `lib/api/users.ts` — User mapping logic (backend schema mismatches)
- Custom hooks in pages (data fetching, state management)
**Fix Approach:**
1. Set up Jest or Vitest (Vitest recommended for ESM + modern tooling)
2. Create basic test for `lib/permissions.ts`:
```typescript
// lib/permissions.test.ts
describe('Permission checks', () => {
it('should allow super admin all modules', () => {
// Mock localStorage with 超级管理员
expect(hasPermission('users')).toBe(true)
expect(hasPermission('ai-model')).toBe(true)
})
it('should restrict content admin to content modules', () => {
// Mock localStorage with 内容管理员
expect(hasPermission('users')).toBe(false)
expect(hasPermission('outfits')).toBe(true)
})
})
```
3. Aim for 80%+ coverage of critical API functions
**Priority:** Medium - important for long-term maintainability
---
## Missing Critical Features
### No Logout Token Revocation
**Issue:** `logout()` clears client-side storage but doesn't invalidate server-side token.
**Files:**
- `lib/api/auth.ts:108-131``logout()` calls API endpoint, then clears localStorage
**Problem:**
- If using stateless JWT, backend has no record of logout
- Token remains valid until expiry (7 days)
- Attacker with stolen token can still use it after target user "logs out"
**Expected Behavior:**
- Server should blacklist token on logout (Redis blacklist or DB flag)
- Verify endpoint `/v1/admin/logout/` actually invalidates token
**Fix Approach:**
1. Audit backend logout endpoint to confirm token is invalidated
2. Add token blacklist to backend (Redis: token → blacklisted, TTL=7 days)
3. Check token against blacklist on every request
4. Document in CLAUDE.md
**Priority:** High - security issue if not handled on backend
---
### No Rate Limiting on Client (Brute Force Risk)
**Issue:** Login page may be vulnerable to brute force attacks.
**Files:**
- `app/login/page.tsx` — Likely has login form
- `lib/api/auth.ts:25-40` — No rate limiting before API call
**Impact:**
- Attacker can spam login attempts to guess password
- Backend must implement rate limiting, but client can't spam if it has throttling
**Fix Approach:**
1. Client-side: Disable login button for 30 seconds after failed attempt
2. Backend: Implement account lockout (3 failed attempts = 10 min lockout)
3. Monitor login failures in logs
**Priority:** Medium - backend responsibility, but client should assist
---
## Build Configuration Issues
### Next.js Standalone Output May Have Deployment Pitfalls
**Issue:** Standalone mode used, but edge case behaviors not fully documented.
**Files:**
- `next.config.mjs:30``output: 'standalone'`
**What Standalone Does:**
- Bundles all dependencies into `.next/standalone` folder
- Reduces Docker image size (~100MB vs ~500MB)
- Requires explicit `COPY .next .next` in Dockerfile (correctly done)
**Potential Issues:**
- Public assets not automatically copied; must COPY separately (done in Dockerfile:44-45)
- `next.config.mjs` must be manually copied (done in Dockerfile:46)
- Environment variables must be passed at runtime, not build time
- Debugging bundle issues harder (bundled deps not in node_modules)
**Current Dockerfile (lines 44-46) handles this correctly:**
```dockerfile
COPY --from=builder /app/.next ./.next
COPY --from=builder /app/public ./public
COPY --from=builder /app/next.config.mjs ./
```
**Risk:**
- If dependencies add new output files in future versions, they won't be copied
- Node version mismatch between builder (22.10.0) and runner (22.10.0) is correct but fragile
**Fix Approach:**
1. Document standalone output caveats in deployment guide
2. Test Docker image monthly to ensure no drift in builder/runner
3. Consider base image pinning: `FROM node:22.10.0-alpine@sha256:...`
**Priority:** Low - current setup is correct, just needs documentation
---
### No Pre-commit Hooks or Husky Setup
**Issue:** No automated checks before commits; developers can commit broken code.
**Files:**
- No `.husky/` directory
- No `pre-commit` config
- CLAUDE.md doesn't mention hook setup
**Risk:**
- Developer commits console.log, broken types, or lint errors
- CI catches them later, but feedback loop is slow
- No enforcement of modification record updates (CLAUDE.md requirement)
**Fix Approach:**
1. Install husky: `npm install husky --save-dev`
2. Setup hook: `npx husky install`
3. Create `.husky/pre-commit`:
```bash
#!/bin/sh
npm run lint && npm run type-check
```
4. Create `.husky/prepare-commit-msg` to remind user about docs/修改记录.md
**Priority:** Low - improvement for team workflow
---
## Summary Table
| Category | Issue | Severity | Fix Effort |
|----------|-------|----------|-----------|
| Security | Client-only permissions (backend trust) | CRITICAL | High |
| Security | Token in localStorage (XSS) | High | Medium |
| Security | No token revocation on logout | High | High |
| Tech Debt | Lock file conflicts | High | Low |
| Tech Debt | Build ignores TS/ESLint errors | High | Low |
| Tech Debt | Console logging in production | High | Low |
| Fragility | Large 1000+ line pages | Medium | Medium |
| Performance | No memoization on permissions | Low | Low |
| Testing | Zero test coverage | Medium | High |
| Build | Standalone output not fully documented | Low | Low |
---
*Concerns audit: 2026-05-07*