
Interview Radar
Turn a resume plus a vague target role into a personalized interview prep pack sourced from recent NowCoder, Xiaohongshu, and GitHub interview posts.
Overview
interview-radar is an agent skill for the Validate phase. It builds a personalized interview prep pack from a resume and a vague role direction, using recently scraped interview write-ups (面经).
Install
npx skills add https://github.com/kunchen1110/interviewradar --skill interview-radar
What is this skill?
- Accepts resume as PDF, image, or text plus a fuzzy role direction (not a formal JD)
- Aggregates real interview content: NowCoder and Xiaohongshu primary; GitHub and generic pages as supplements
- Keeps roughly the last two years, dedupes, and ranks by frequency and recency
- Python pipeline under `scripts/` with `corpus_cache/` JSON handoff between agent reasoning and deterministic scrapers
- Xiaohongshu via a JSON export or the MediaCrawler driver, with fast (text-only) and deep (image OCR) modes
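The recency window and the frequency/recency ranking listed above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: `Post`, `normalize`, and `rank_questions` are hypothetical names, the choice to drop undated posts is an assumption, and `filter_recent` only mirrors the signature documented in SKILL.md (`window_days=730`, `today=None`).

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Post:
    """Hypothetical stand-in for the skill's RawPost model."""
    question: str
    posted_at: Optional[date]  # None when the page shows no date


def filter_recent(posts, window_days=730, today=None):
    """Keep posts dated within the last `window_days` days.

    Assumption: undated posts are dropped; the real skill may treat them differently.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=window_days)
    return [p for p in posts if p.posted_at is not None and p.posted_at >= cutoff]


def normalize(question):
    """Crude dedupe key: lowercase and collapse whitespace."""
    return " ".join(question.lower().split())


def rank_questions(posts):
    """Merge duplicates, then sort by frequency and newest sighting, both descending."""
    counts = Counter()
    latest = {}
    for p in posts:
        key = normalize(p.question)
        counts[key] += 1
        seen = p.posted_at or date.min  # undated posts sort last
        latest[key] = max(latest.get(key, date.min), seen)
    return sorted(counts, key=lambda k: (counts[k], latest[k]), reverse=True)


posts = [
    Post("Explain RAG retrieval", date(2024, 11, 3)),
    Post("explain  RAG retrieval", date(2024, 6, 1)),   # duplicate after normalization
    Post("Design a rate limiter", date(2024, 12, 20)),
    Post("What is a B+ tree?", date(2021, 5, 9)),       # outside the two-year window
    Post("Undated question", None),
]
recent = filter_recent(posts, today=date(2025, 1, 1))
print(rank_questions(recent))
# ['explain rag retrieval', 'design a rate limiter']
```

Frequency is the primary sort key here and recency only breaks ties; a weighted score would be an equally plausible design.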
Adoption & trust: 1 install on skills.sh; 114 GitHub stars; 0/3 security scanners passed (skills.sh audits).
What problem does it solve?
You know your target area but not which questions and project drills appear most often in real interviews for your background.
Who is it for?
Job seekers with a resume file and a broad role label who want a data-backed synthesis of real interview write-ups instead of static cheat sheets.
Skip if: you have only a full JD and no resume, you cannot run Python venv scripts or MediaCrawler, or you need employer-specific official interview rubrics.
When should I use this skill?
User uploads a resume (PDF/image) and gives a fuzzy target role direction for interview preparation.
What do I get? / Deliverables
You get a deduplicated prep package, ranked by frequency and recency, with resume-aligned project follow-up questions grounded in the corpus JSON under `corpus_cache/`.
- Personalized interview prep pack with project-specific follow-up questions
- JSON artifacts in `corpus_cache/` from connector searches
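A minimal sketch of what a JSON handoff through `corpus_cache/` could look like. The helper names `write_handoff` and `read_handoff` and the file name `raw_posts.json` are hypothetical; the skill's own `save_raw_posts` / `load_raw_posts` in `scripts/corpus/store.py` may work differently. The record fields `source`, `raw_text`, and `posted_at` are borrowed from the `RawPost` constructor shown in SKILL.md.

```python
import json
import tempfile
from pathlib import Path


def write_handoff(records, cache_dir, name):
    """Write a list of dict records to <cache_dir>/<name> as UTF-8 JSON."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / name
    # ensure_ascii=False keeps Chinese post text human-readable in the file.
    path.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def read_handoff(path):
    """Load the records back, e.g. on the agent side of the handoff."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Round trip in a throwaway directory standing in for corpus_cache/.
with tempfile.TemporaryDirectory() as tmp:
    posts = [
        {"source": "nowcoder", "raw_text": "Explain how RAG retrieval works", "posted_at": "2024-11-03"},
        {"source": "webfetch:example.com", "raw_text": "面试题:限流器设计", "posted_at": None},
    ]
    path = write_handoff(posts, Path(tmp) / "corpus_cache", "raw_posts.json")
    assert read_handoff(path) == posts
```

Plain JSON files make the boundary inspectable: a scraper run can be reviewed or re-used without re-running the connectors.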
Journey fit
Before committing to a role or interview loop, builders validate fit and preparation depth using real community interview write-ups rather than generic question lists. The Scope subphase covers narrowing what to study and which project stories to stress-test when the job title is directional but not a full JD.
How it compares
Use this corpus-driven prep workflow instead of asking the model for generic LeetCode lists without sourcing real community interview threads.
Common Questions / FAQ
Who is interview-radar for?
Candidates preparing for technical or internship interviews who can supply a resume and a fuzzy role direction, and who want agent-plus-script automation to harvest recent interview write-ups.
When should I use interview-radar?
In Validate, when scoping what to study before interview rounds: after you have a resume path and role keywords but before deep mock sessions. Also when refreshing prep for roles like AI app development using NowCoder- and Xiaohongshu-heavy search.
Is interview-radar safe to install?
It runs local Python scrapers and may shell out to MediaCrawler with your cookies; review the Security Audits panel on this page and treat exported social data and credentials as sensitive.
SKILL.md
SKILL.md - Interview Radar
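# InterviewRadar · 面试雷达 Skill

Turn a resume plus a fuzzy target role into a personalized prep pack **grounded in real interview write-ups**. You (the agent) handle reasoning and judgment; the Python scripts under `scripts/` handle the deterministic grunt work. The two sides exchange data through JSON files in `corpus_cache/`.

## Inputs

- Resume: path to a PDF, image, or text file.
- Fuzzy role: a direction such as "AI application development" (**not** a specific JD).

## Tools (run with the bundled venv: `.venv/bin/python`)

- `scripts/resume_extract.py` → `extract_resume(path) -> ResumeExtraction{text, needs_vision, asset_path}`
- `scripts/connectors/github.py` → `GithubConnector(repo_raw_urls).search(queries) -> SearchResult`
- `scripts/connectors/nowcoder.py` → `NowCoderConnector(post_urls).search(queries) -> SearchResult`
- `scripts/connectors/xiaohongshu.py` → `XiaohongshuConnector(export_path=..., driver=..., enable_image_ocr=...).search(queries) -> SearchResult` (choose one of two inputs: `export_path` reads a pre-generated JSON export; `driver` runs MediaCrawler automatically. `enable_image_ocr=True` is deep mode, where images are downloaded and OCR'd and the result becomes the main body text; `False` is fast mode, which reads only the title, caption text, tags, and timestamp)
- `scripts/scrape/mediacrawler_driver.py` → `MediaCrawlerDriver(home=None).scrape_xhs(keywords, login_type="qrcode"|"cookie") -> Path`. **Driver mode**: shells out to a locally installed MediaCrawler (looked up by default in `$MEDIACRAWLER_HOME` or `~/.mediacrawler/`). Cookie login is recommended: in MediaCrawler's `config/base_config.py`, set `LOGIN_TYPE = "cookie"` and `COOKIES = "web_session=<value>"`.
- `scripts/scrape/normalize_xhs.py` → `normalize(notes) -> list[dict]` (CLI: `python -m scripts.scrape.normalize_xhs <in.json> -o <out.json>`). Normalizes MediaCrawler's native output into the input format `XiaohongshuConnector` expects. Only needed in manual mode; in driver mode the connector calls it internally.
- `scripts/ocr/extract.py` → `extract_text_from_image(path, engine=None, min_confidence=0.6) -> OcrResult{text, confidence, needs_vision}`
- `scripts/ocr/xhs_images.py` → downloads a Xiaohongshu note's `image_list`, probes for RapidOCR by default, and merges the per-page OCR text into `RawPost.content_text/raw_text`; sets `needs_vision_fallback=True` when OCR quality is low
- `scripts/corpus/store.py` → `save_raw_posts / load_raw_posts / save_questions / load_questions`
- `scripts/corpus/recency.py` → `filter_recent(posts, window_days=730, today=None) -> list[RawPost]`
- `scripts/corpus/dedupe_rank.py` → `dedupe_and_rank(questions) -> list[Question]`
- Data models live in `scripts/models.py`; the schema is described in `assets/schema.md`.

## Workflow

0. **Setup (only when the Xiaohongshu source is enabled).** Choose one of two modes; see `docs/setup/mediacrawler.md`:
   - **Driver mode (recommended)**: the user installs MediaCrawler and logs in **once**. Prefer cookie mode: copy `web_session` from a regular browser session, write it to MediaCrawler's `config.COOKIES`, then run collection automatically with `XiaohongshuConnector(driver=MediaCrawlerDriver(), login_type="cookie")`. QR-code mode still works but is more likely to trigger anti-abuse risk controls.
   - **Manual mode**: the user runs MediaCrawler + `normalize_xhs.py` themselves each time and feeds `corpus_cache/xhs_export.json` to `XiaohongshuConnector(export_path=...)`.
   - **Read depth must be stated**: many Xiaohongshu interview posts hit the text-area limit, so the full questions are in the images. `fast` mode reads only the title, body, and tags, which suits recall validation; a final prep pack should prefer `deep` mode (`enable_image_ocr=True`) to read image OCR. If `fast` is used because of speed or dependency issues, the output must state explicitly: "Image OCR was not read; full questions contained in images may be missing."

   Text, NowCoder, and GitHub sources do not need this step.

1. **Resume understanding.** Call `extract_resume`. If `needs_vision` is true, read the image/PDF directly with your own vision capability. Produce a structured summary: skills, projects (with the technologies used in each), and key terms.

2. **Seed query generation.** Use your **own domain knowledge** to derive seed queries from "role direction + resume". This is domain-agnostic: whatever the field (marketing, quant, backend, design, ...), you already know the relevant role aliases and underlying skills/topics, so generate them on the spot. **Do not rely on any preset vocabulary list.** Seeds come from two places: (a) related role aliases implied by the role direction; (b) concrete skills, projects, and keywords extracted from the resume. Prefer underlying skill/topic terms over role titles: they are more stable and give better recall.

3. **Iterative retrieval.** Source priority: **NowCoder + Xiaohongshu (primary, timestamped) > GitHub (supplementary, often outdated)**.

   **3a. URL discovery (do this first in every round).** Run the current seed queries through your search capability (WebSearch or an equivalent tool) and collect candidate URLs. Bucket them by domain:
   - `nowcoder.com/discuss/<post_id>` → pass to `NowCoderConnector(post_urls=...)`
   - `xiaohongshu.com/explore/<note_id>` → **cannot be fetched directly**; if the Xiaohongshu source is enabled, hand the current keywords to `XiaohongshuConnector(driver=MediaCrawlerDriver()).search([keywords])`, which shells out to MediaCrawler automatically. If it is not enabled, record the note ID and have the user configure it per `docs/setup/mediacrawler.md`
   - `github.com/<owner>/<repo>/blob/<branch>/<path>` or `raw.githubusercontent.com/...` → convert to a raw URL → pass to `GithubConnector(repo_raw_urls=...)`
   - **Other public article pages** (Zhihu articles, CSDN posts, personal blogs, woshipm/uisdc, etc.): fetch the body with WebFetch, construct a `RawPost(source="webfetch:<domain>", post_type="text", raw_text=<body text>, posted_at=<date visible on the page, or None>)` by hand, and `save_raw_posts` it together with the connector results. Explicitly excluded: aggregator/listing pages (accept only article pages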