lty/qy-lty-admin/.planning/codebase/CONCERNS.md

# Codebase Concerns

**Analysis Date:** 2026-05-07

## Tech Debt

### Multiple Lock Files Conflict Risk

**Issue:** Three conflicting lock files coexist: `package-lock.json`, `pnpm-lock.yaml`, and `yarn.lock`.

**Files:**
- `package-lock.json` (177KB)
- `pnpm-lock.yaml` (2.3KB)
- `yarn.lock` (91KB)

**Impact:**
- Risk of installing different dependency versions across environments
- CI/CD, Docker builds, and local development may pull incompatible versions
- Lock file merge conflicts during multi-developer collaboration
- Dockerfile uses yarn exclusively, but npm/pnpm may be used locally, creating version drift

**Current State:**
- `Dockerfile` (line 12, 34) explicitly uses `yarn.lock*`
- `.gitignore` likely checks in all three files
- CLAUDE.md explicitly warns "不要混用" but provides no automation to enforce it

**Fix Approach:**
1. Choose single package manager (recommend yarn, per Dockerfile)
2. Delete `package-lock.json` and `pnpm-lock.yaml` from git history
3. Add enforcement to `.gitignore` and CI/CD (fail if multiple lock files modified)
4. Document in CLAUDE.md that ONLY yarn should be used

**Priority:** High - impacts build reproducibility and team workflow

---

### Production Build Ignores TypeScript & ESLint Errors

**Issue:** Build succeeds despite type/lint errors; production may contain undetected bugs.

**Files:**
- `next.config.mjs:16-21` — Both `ignoreDuringBuilds: true`

**Current Configuration:**
```javascript
eslint: {
  ignoreDuringBuilds: true,
},
typescript: {
  ignoreBuildErrors: true,
},
```

**Impact:**
- TypeScript errors (type mismatches, missing nullchecks) not caught at build time
- ESLint warnings (unused imports, unreachable code) ignored
- QA burden shifts to runtime discovery
- Breaking changes in dependencies not detected early

**Evidence:** README.md (line 121) acknowledges this: "务必在本地执行 `next lint` 与 `tsc --noEmit` 保证质量" — manual step, not enforced

**Fix Approach:**
1. Set both to `false` in production config
2. Fix all current type/lint errors (likely small count given ~27K LOC)
3. Add pre-commit hooks to `husky` to block commits with errors
4. Make lint/type checks mandatory in CI/CD

**Priority:** High - gates correctness

---

### Console Logging in Client Code (Debug Artifact)

**Issue:** Production code contains console.log/warn/error calls, exposing debug info and token prefixes.

**Files & Line Numbers:**
- `lib/api/client.ts:23` — `console.log('🔍 Token检查:', ...)`
- `lib/api/client.ts:29` — `console.warn('⚠️  未找到auth_token...')`
- `lib/api/client.ts:35` — `console.log('📡 发送请求:', ...)`
- `lib/api/client.ts:36` — `console.log('📋 请求头:', ...)`
- `lib/api/client.ts:47` — `console.log('✅ 响应成功:', ...)`
- `lib/api/client.ts:51` — `console.error('❌ 响应错误:', ...)`
- `lib/api/client.ts:55` — `console.warn('🚫 认证失败...')`
- `lib/api/client.ts:56` — `console.log('响应详情:', ...)`
- `lib/api/upload.ts:33` — `console.log('📤 开始上传文件:', ...)`
- `lib/api/upload.ts:43` — `console.log(...上传进度...)`
- `lib/api/upload.ts:59` — `console.log('✅ 文件上传成功:', ...)`
- `lib/api/upload.ts:62` — `console.error("文件上传失败:", error)`
- Similar issues in upload multifile (lines 83, 93)
- `components/sidebar.tsx:97` — `console.error("退出登录失败:", error)`
- `app/outfits/page.tsx:72, 117, 135, 154` — console.error calls
- Many other pages: `app/*/page.tsx` contain error logging

**Impact:**
- Token prefixes exposed in browser console → XSS/developer tools leak risk
- Error details (e.g., API paths, status codes) visible to end users
- Production debugging should use structured error handling, not console
- Could be scraped by malicious actors with browser console access

**Evidence:** `lib/api/token-debug.ts` is entire debug utility with `console.log` throughout (lines 7-54), showing token prefix extraction logic

**Fix Approach:**
1. Remove all console.log/warn/error in client code, replace with structured error handling
2. Create error boundary/toast notification pattern for user-facing errors
3. Optional: Create client-side error tracking service (Sentry) for production
4. Add ESLint rule: `no-console` (already available in Next.js config, just not enabled)

**Priority:** High - security and code cleanliness

---

## Security Concerns

### Client-Side Permission Enforcement Only (Trust Boundary Violation)

**Issue:** Permissions are enforced ONLY on client; backend may not re-validate on API calls.

**Files:**
- `lib/permissions.ts:65-87` — `getUserRole()`, `hasPermission()` read from `localStorage.user_role`
- `components/sidebar.tsx:102-104` — Filter menu items client-side only
- `middleware.ts:17-44` — Token validation in middleware, but NO permission check against module being accessed

**Risk Model:**
```
User (attacker) → Modify localStorage.user_role="超级管理员" →
  Sidebar shows all menus (useless) BUT
  Direct API call to DELETE /api/v1/admin/users/123 → Backend accepts if no re-check
```

**Current Evidence:**
- `lib/api/users.ts:24-43` — `getUsers()` makes raw API call; no permission guard in client code
- No evidence of backend role check in Django API (not inspectable from frontend code, but CRITICAL to verify)

**Impact:**
- **CRITICAL:** If backend doesn't re-validate role on each endpoint, attackers can:
  - Modify localStorage role → make admin API calls
  - Bypass middleware token check (SSR can't access localStorage) → direct API calls with stolen token
  - Perform unauthorized operations (delete users, change settings, etc.)

**Trust Boundary Rule:** Backend MUST re-check permission for EVERY admin API call, regardless of client claims.

**Fix Approach:**
1. **URGENT:** Audit Django backend API (`qy_lty/lib/api/v1/admin/`) to confirm all endpoints validate user role/permissions
2. Document in CLAUDE.md: "Client-side permission filters are UI courtesy only; backend MUST validate on all admin/ endpoints"
3. Add explicit checks in frontend error handling for 403 Forbidden responses
4. Consider adding request logging/monitoring to catch unauthorized attempts

**Priority:** CRITICAL - security vulnerability if backend doesn't re-check

---

### Token Storage in localStorage (XSS Vulnerability)

**Issue:** Auth token stored in `localStorage` instead of secure HttpOnly cookie; vulnerable to XSS.

**Files:**
- `lib/api/auth.ts:48-59` — Stores token in localStorage AND sets js-cookie
- `lib/api/client.ts:21-22` — Reads token from localStorage in request interceptor
- `middleware.ts:32` — Token read from Cookie (correct), but localStorage is fallback

**Current Implementation:**
```typescript
// Stored in BOTH localStorage and cookie
localStorage.setItem("auth_token", token);  // XSS risk
Cookies.set("auth_token", token, { expires: 7, path: '/' });  // No HttpOnly flag
```

**Impact:**
- XSS attack can steal token from localStorage
- js-cookie (js-cookie, no HttpOnly flag) also vulnerable to XSS
- 7-day expiry without rotation = long XSS window
- `localStorage.user_role` also stored unencrypted (attackable)

**Proper Secure Pattern:**
- HttpOnly, Secure, SameSite cookies (set by backend during login)
- Token never exposed to JavaScript
- Client reads from Bearer token response, never stores in accessible storage

**Fix Approach:**
1. Backend should set HttpOnly cookie during login response
2. Remove localStorage token storage entirely
3. Keep only sealed HTTP-only cookie for auth
4. If SSR requires token access, use secure session storage (not localStorage)
5. Add CSP (Content-Security-Policy) headers to mitigate XSS

**Priority:** High - significant security risk

---

### No Token Rotation / Refresh Strategy

**Issue:** Token expires in 7 days, no refresh token or rotation mechanism visible.

**Files:**
- `lib/api/auth.ts:63` — `Cookies.set("auth_token", token, { expires: 7, path: '/' })`
- `lib/api/client.ts:54-60` — 401 handler just redirects to /login, doesn't attempt refresh

**Impact:**
- User activity > 7 days → auto-logout (UX degradation)
- No short-lived access token + long-lived refresh token pattern
- Stolen token valid for full 7 days
- No token revocation (logout doesn't invalidate server-side if stateless JWT)

**Fix Approach:**
1. Implement short-lived access token (15-30 min) + refresh token (7-14 days)
2. Refresh interceptor: if 401, attempt token refresh before redirecting to /login
3. Backend session should blacklist tokens on logout
4. Document token lifecycle in CLAUDE.md

**Priority:** Medium - acceptable for internal admin tool, but should improve for production

---

### No Input Validation on File Upload

**Issue:** File uploads accept any mimetype/size without validation.

**Files:**
- `lib/api/upload.ts:25-65` — `uploadFile()` accepts File object, no size/type checks in client
- `components/ui/file-upload.tsx` — Likely has upload UI, needs validation

**Impact:**
- Malicious files (executables, oversized) may reach backend
- Backend must validate, but client-side check is first defense
- No maximum file size enforced client-side

**Fix Approach:**
1. Add file size limits (e.g., max 50MB)
2. Whitelist MIME types (images: image/jpeg, image/png, image/webp only)
3. Show user error before upload attempt
4. Verify backend enforces same limits

**Priority:** Medium - backend should catch it, but client should be first defense

---

## Fragile Areas

### Large Pages with Mixed Concerns (1000+ line files)

**Issue:** Several pages handle data fetching, UI rendering, and business logic in single component.

**Large Files:**
- `app/affinity/page.tsx` (1005 lines) — Affinity rules, levels, interactions all in one page
- `components/ui/sidebar.tsx` (763 lines) — Navigation logic, permission filtering, logout in one component
- `app/songs/[id]/page.tsx` (651 lines) — Song detail page, editing, deletion mixed
- `app/songs/page.tsx` (644 lines) — Song list, search, pagination, CRUD operations

**Fragility Risks:**
- Hard to test individual business logic
- Changes to one feature (e.g., affinity rules) risk breaking another (levels)
- Difficult to reuse UI logic across pages
- High cognitive load for maintenance

**Safe Modification Pattern:**
1. Extract data-fetching logic to `lib/api/` hooks or services
2. Create smaller sub-components (e.g., `<AffinityRuleSection>`, `<AffinityLevelSection>`)
3. Use custom hooks for state management per feature
4. Keep pages as composition only

**Example Refactor:**
```typescript
// Current: 1005-line page
// After:
export default function AffinityPage() {
  return (
    <DashboardShell>
      <DashboardHeader />
      <AffinityRulesSection />
      <AffinityLevelsSection />
      <InteractionLogsSection />
    </DashboardShell>
  )
}
```

**Priority:** Medium - refactor as part of feature changes, not urgent

---

### No Error Boundaries (Next.js error.tsx Files Missing)

**Issue:** No error boundary components for graceful error handling in App Router.

**Files:**
- `app/` has NO `error.tsx` files
- Only some leaf routes have `loading.tsx` (e.g., `app/outfits/loading.tsx`)

**Current Error Handling:**
- Pages catch errors in `try/catch`, show toast notifications
- Uncaught errors propagate, may show raw error UI
- No fallback UI for API failures

**Impact:**
- User sees blank page or browser error on network failure
- No recovery mechanism (retry button, fallback content)
- Log data lost if error page crashes

**Next.js Error Boundary Pattern:**
```typescript
// app/error.tsx (catches errors in all children)
'use client'
export default function Error({ error, reset }) {
  return (
    <div>
      <h1>Something went wrong</h1>
      <button onClick={() => reset()}>Try again</button>
    </div>
  )
}
```

**Fix Approach:**
1. Add `app/error.tsx` for app-level errors
2. Add segment-level `error.tsx` for feature-specific handling (e.g., `app/outfits/error.tsx`)
3. Keep try/catch for API calls, render fallback UI
4. Add `app/not-found.tsx` for 404 routes

**Priority:** Low - pages have basic error handling via toasts, but UX could improve

---

### Hardcoded API Base URL Fallback

**Issue:** API client falls back to hardcoded URL if env var not set.

**Files:**
- `lib/api/client.ts:9` — `API_BASE_URL = process.env.NEXT_PUBLIC_API_BASE_URL || "http://localhost:8000/api"`

**Risk:**
- If `NEXT_PUBLIC_API_BASE_URL` not set in production, silently uses localhost
- Requests fail silently in production, confusing to debug
- No warning log when fallback is used

**Fix Approach:**
1. Remove fallback; throw error if env var not set at build time
2. Use `if (!process.env.NEXT_PUBLIC_API_BASE_URL) throw new Error("...")`
3. Document required env vars in deployment guide

**Priority:** Low - environmental issue, unlikely in production if properly configured

---

## Performance Bottlenecks

### No Memoization on Permission Checks

**Issue:** `hasPermission()`, `getUserRole()` called on every render without memoization.

**Files:**
- `components/sidebar.tsx:102-104` — Calls `hasPermission(item.module)` in `.filter()` on every render
- `lib/permissions.ts:65-87` — No caching; reads localStorage every time

**Current Pattern:**
```typescript
const visibleAiItems = mounted ? aiMenuItems.filter((item) => hasPermission(item.module)) : []
```

**Impact:**
- Sidebar re-renders, re-filters items, re-reads localStorage unnecessarily
- Minimal impact on small arrays (3-15 items), but scales poorly

**Fix Approach:**
1. Memoize `getUserRole()` result in sidebar:
```typescript
const userRole = useMemo(() => getUserRole(), [mounted])
const visibleItems = useMemo(() =>
  aiMenuItems.filter(item => hasPermission(item.module)),
  [userRole]
)
```
2. Or move to custom hook: `const { role, allowedModules } = usePermissions()`

**Priority:** Low - not a visible performance issue, but good practice

---

## Cross-Repo Coupling Risks

### Strong Runtime Dependency on qy_lty Backend

**Issue:** Front-end depends on specific API contract from `qy_lty/` Django backend; contract drift risk.

**Files:**
- `lib/api/users.ts:6-21` — Maps backend `ParadiseUser` to `User` type (field mapping)
- `lib/api/outfits.ts` (not read, but expected) — Maps backend outfit schema
- ALL API modules expect specific response structure from `/api/v1/admin/*`

**Coupling Points:**
- Backend field names: `username`, `email`, `phone_number`, `date_joined`, `last_login`, `is_superuser`, `is_staff`
- Response structure: `{ results: [...], count: number }` (Django REST Framework default)
- Endpoint paths: `/v1/admin/login/`, `/user/`, `/common/upload/`
- Token storage key in Redis: `admin_token:{token}` (per CLAUDE.md)

**Risk Scenario:**
1. Backend dev changes `/v1/admin/users/` response format
2. Frontend still sends old field names or expects old response shape
3. Silent failures or type mismatches
4. Detected late in testing or production

**Mitigation in Place:**
- CLAUDE.md (line 34) requires both repos update `docs/修改记录.md` when contract changes
- But no automated contract testing (OpenAPI, TypeScript codegen)

**Fix Approach:**
1. **Shared types/API contract** (OpenAPI/GraphQL schema) versioned in git
2. Generate TypeScript types from backend OpenAPI spec (e.g., with Swagger-Codegen)
3. Add pre-commit hook to detect schema changes
4. Document in CLAUDE.md: "Any API response structure change requires updating both repos' docs"

**Priority:** Medium - mitigated by manual process, but automatable

---

## Dependencies at Risk

### No Enforcement of Dependency Updates

**Issue:** Many dependencies pinned to `latest`, creating unpredictable updates.

**Files:**
- `package.json:12, 56, 63` — `"latest"` version specifiers

**Current Versions Using "latest":**
```json
"@hookform/resolvers": "latest",
"react-hook-form": "latest",
"recharts": "latest",
"zod": "latest"
```

**Impact:**
- `npm install` at different times installs different major versions
- Breaking changes in `react-hook-form` or `recharts` not caught by lock file
- Tests pass locally, fail in CI if new major version published

**Fix Approach:**
1. Replace all `"latest"` with specific versions (e.g., `"^7.0.0"` for caret range)
2. Use Dependabot or renovate to automate safe updates
3. Pin devDependencies to exact versions

**Example:**
```json
"@hookform/resolvers": "^3.3.4",
"react-hook-form": "^7.51.0",
"recharts": "^2.12.0",
"zod": "^3.22.0"
```

**Priority:** Medium - not urgent, but good practice

---

### Unused Dependencies or Overlapping Packages

**Issue:** Unclear if all installed packages are actively used.

**Potential Issues:**
- Multiple notification libraries: `sonner` + `@radix-ui/react-toast` (which is used?)
- Figure out which is preferred and remove duplication

**Files:**
- `package.json:36, 59` — Both toast packages listed

**Fix Approach:**
1. Audit each dependency in `package.json`
2. Remove unused packages
3. Consolidate duplicate functionality

**Priority:** Low - cleanup task

---

## Test Coverage Gaps

### No Tests Found

**Issue:** No Jest/Vitest configuration or test files detected in codebase.

**Expected Test Files:**
- `*.test.ts`, `*.spec.ts`, `*.test.tsx`, `*.spec.tsx` not found
- No `jest.config.*`, `vitest.config.*`

**Impact:**
- Complex logic (permission matrix, API adapters) not tested
- Refactoring risky (no regression protection)
- Onboarding new developers without safety net
- No confidence in critical paths (auth, user management)

**Critical Areas Needing Tests:**
- `lib/permissions.ts` — Permission matrix logic (many branches)
- `lib/api/client.ts` — Request/response interceptors (error handling paths)
- `lib/api/users.ts` — User mapping logic (backend schema mismatches)
- Custom hooks in pages (data fetching, state management)

**Fix Approach:**
1. Set up Jest or Vitest (Vitest recommended for ESM + modern tooling)
2. Create basic test for `lib/permissions.ts`:
```typescript
// lib/permissions.test.ts
describe('Permission checks', () => {
  it('should allow super admin all modules', () => {
    // Mock localStorage with 超级管理员
    expect(hasPermission('users')).toBe(true)
    expect(hasPermission('ai-model')).toBe(true)
  })

  it('should restrict content admin to content modules', () => {
    // Mock localStorage with 内容管理员
    expect(hasPermission('users')).toBe(false)
    expect(hasPermission('outfits')).toBe(true)
  })
})
```
3. Aim for 80%+ coverage of critical API functions

**Priority:** Medium - important for long-term maintainability

---

## Missing Critical Features

### No Logout Token Revocation

**Issue:** `logout()` clears client-side storage but doesn't invalidate server-side token.

**Files:**
- `lib/api/auth.ts:108-131` — `logout()` calls API endpoint, then clears localStorage

**Problem:**
- If using stateless JWT, backend has no record of logout
- Token remains valid until expiry (7 days)
- Attacker with stolen token can still use it after target user "logs out"

**Expected Behavior:**
- Server should blacklist token on logout (Redis blacklist or DB flag)
- Verify endpoint `/v1/admin/logout/` actually invalidates token

**Fix Approach:**
1. Audit backend logout endpoint to confirm token is invalidated
2. Add token blacklist to backend (Redis: token → blacklisted, TTL=7 days)
3. Check token against blacklist on every request
4. Document in CLAUDE.md

**Priority:** High - security issue if not handled on backend

---

### No Rate Limiting on Client (Brute Force Risk)

**Issue:** Login page may be vulnerable to brute force attacks.

**Files:**
- `app/login/page.tsx` — Likely has login form
- `lib/api/auth.ts:25-40` — No rate limiting before API call

**Impact:**
- Attacker can spam login attempts to guess password
- Backend must implement rate limiting, but client can't spam if it has throttling

**Fix Approach:**
1. Client-side: Disable login button for 30 seconds after failed attempt
2. Backend: Implement account lockout (3 failed attempts = 10 min lockout)
3. Monitor login failures in logs

**Priority:** Medium - backend responsibility, but client should assist

---

## Build Configuration Issues

### Next.js Standalone Output May Have Deployment Pitfalls

**Issue:** Standalone mode used, but edge case behaviors not fully documented.

**Files:**
- `next.config.mjs:30` — `output: 'standalone'`

**What Standalone Does:**
- Bundles all dependencies into `.next/standalone` folder
- Reduces Docker image size (~100MB vs ~500MB)
- Requires explicit `COPY .next .next` in Dockerfile (correctly done)

**Potential Issues:**
- Public assets not automatically copied; must COPY separately (done in Dockerfile:44-45)
- `next.config.mjs` must be manually copied (done in Dockerfile:46)
- Environment variables must be passed at runtime, not build time
- Debugging bundle issues harder (bundled deps not in node_modules)

**Current Dockerfile (lines 44-46) handles this correctly:**
```dockerfile
COPY --from=builder /app/.next ./.next
COPY --from=builder /app/public ./public
COPY --from=builder /app/next.config.mjs ./
```

**Risk:**
- If dependencies add new output files in future versions, they won't be copied
- Node version mismatch between builder (22.10.0) and runner (22.10.0) is correct but fragile

**Fix Approach:**
1. Document standalone output caveats in deployment guide
2. Test Docker image monthly to ensure no drift in builder/runner
3. Consider base image pinning: `FROM node:22.10.0-alpine@sha256:...`

**Priority:** Low - current setup is correct, just needs documentation

---

### No Pre-commit Hooks or Husky Setup

**Issue:** No automated checks before commits; developers can commit broken code.

**Files:**
- No `.husky/` directory
- No `pre-commit` config
- CLAUDE.md doesn't mention hook setup

**Risk:**
- Developer commits console.log, broken types, or lint errors
- CI catches them later, but feedback loop is slow
- No enforcement of modification record updates (CLAUDE.md requirement)

**Fix Approach:**
1. Install husky: `npm install husky --save-dev`
2. Setup hook: `npx husky install`
3. Create `.husky/pre-commit`:
```bash
#!/bin/sh
npm run lint && npm run type-check
```
4. Create `.husky/prepare-commit-msg` to remind user about docs/修改记录.md

**Priority:** Low - improvement for team workflow

---

## Summary Table

| Category | Issue | Severity | Fix Effort |
|----------|-------|----------|-----------|
| Security | Client-only permissions (backend trust) | CRITICAL | High |
| Security | Token in localStorage (XSS) | High | Medium |
| Security | No token revocation on logout | High | High |
| Tech Debt | Lock file conflicts | High | Low |
| Tech Debt | Build ignores TS/ESLint errors | High | Low |
| Tech Debt | Console logging in production | High | Low |
| Fragility | Large 1000+ line pages | Medium | Medium |
| Performance | No memoization on permissions | Low | Low |
| Testing | Zero test coverage | Medium | High |
| Build | Standalone output not fully documented | Low | Low |

---

*Concerns audit: 2026-05-07*