Naver Blog Research

Name: Naver Blog Research
Author: nomadamas

nomadamas/k-skill

2.9k installs
6.5k repo stars
Updated July 27, 2026

naver-blog-research searches and reads Naver blog posts and downloads images using bundled python3 scripts without an API key.

About

Naver Blog Research is a Korean content research skill. It searches Naver blogs, reads post bodies, and downloads images using only Python standard-library helpers in scripts/. naver_search.py returns structured JSON for a query, with --count (up to 30 results) and --sort options. naver_read.py fetches the full text through mobile blog URLs to avoid the PC site's iframe layout. naver_download_images.py saves images from the blogfiles.naver.net and postfiles.pstatic.net CDNs, and can optionally read URLs piped from naver_read.py output. The recommended workflow is to search, read the three to five most relevant posts, download images when needed, and cross-check findings against Google via WebSearch for credibility. The response policy requires summarizing results, citing each source's URL and author, avoiding dozens of requests per session, and warning that heavy automated use can get an IP blocked. Prerequisites are internet access and Python 3.8+. Use it when users ask for Korean blog research (for example, "search Naver blogs" or "read this Naver post") or need Korea-focused content that Google alone misses.

No API key; python3 stdlib scripts search, read, and download Naver blog content.
Uses m.blog.naver.com mobile URLs to extract post bodies without PC iframe issues.
naver_search.py outputs structured JSON with title, url, mobile_url, snippet, and author.
Recommended flow: search, read top posts, optional image download, cross-verify with WebSearch.
Session limits: avoid dozens of requests; cite sources; warn on bulk crawling risk.
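The recommended flow above can be sketched as a small helper that only assembles the command lines. This sketch assumes the scripts are run from the skill directory, so `scripts/` is a relative path. It also assumes the post URLs have already been picked from the search JSON. `build_workflow` is a hypothetical name and is not part of the skill. Executing the commands (for example with `subprocess.run`) is deliberately left out.

```python
import shlex
from typing import List


def build_workflow(query: str, post_urls: List[str], count: int = 10,
                   sort: str = "sim", top_n: int = 3) -> List[List[str]]:
    """Return argv lists: one search, then one read per selected post."""
    commands = [["python3", "scripts/naver_search.py", query,
                 "--count", str(count), "--sort", sort]]
    # Read only the first few posts to keep the session's request count small.
    for url in post_urls[:top_n]:
        commands.append(["python3", "scripts/naver_read.py", url])
    return commands


for argv in build_workflow("결혼식 체크리스트",
                           ["https://blog.naver.com/user123/224212849946"]):
    print(shlex.join(argv))
```

Building argv lists rather than shell strings avoids quoting problems with Korean queries that contain spaces. `shlex.join` is used only for display.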

Naver Blog Research by the numbers

2,903 all-time installs (skills.sh)
+126 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #201 of 1,881 Marketing & SEO skills by installs in the Skillselion catalog
Security screen: MEDIUM risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

naver-blog-research capabilities & compatibility

Capabilities: Naver blog search with structured JSON output · full post text extraction via mobile URLs · image download from Naver blog CDNs · cross-verification guidance with WebSearch · source citation and per-session request discipline
Use cases: research · web search · web scraping
Platforms: macOS · Linux · Windows

From the docs

What naver-blog-research says it does

Cross-verify against WebSearch (Google) results to increase the reliability of the information

SKILL.md

npx skills add https://github.com/nomadamas/k-skill --skill naver-blog-research


Installs	2.9k
repo stars	★ 6.5k
Security audit	2 / 3 scanners passed
Last updated	July 27, 2026
Repository	nomadamas/k-skill

How do I research Korean topics on Naver blogs and extract full post text without a paid API or browser scraping setup?

Search Naver blogs, read full Korean posts, and download images with python3 stdlib scripts and no API key.

Who is it for?

Agents doing small-scale Korean content research where Naver blogs are primary sources.

Skip if: you need Naver News, cafes, or other non-blog Naver services, or you plan bulk commercial crawling.

When should I use this skill?

User asks to search Naver blogs, read a Naver post, do Korean blog research, or download images from a blog article.

What you get

JSON search results, extracted post text with sources, and optional locally saved images from cited blogs.


By the numbers

Uses Python urllib.request with dedicated secure and insecure SSL context helpers
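The two SSL contexts mentioned above follow standard `ssl` module behavior, as `_get_ssl_context` in `_naver_http.py` does. `ssl.create_default_context()` verifies both certificates and hostnames. `check_hostname` has to be turned off before `verify_mode` can be set to `ssl.CERT_NONE`. The following is a minimal standalone sketch; `make_contexts` is an illustrative name and not the skill's API.

```python
import ssl
from typing import Tuple


def make_contexts() -> Tuple[ssl.SSLContext, ssl.SSLContext]:
    """Return (verifying, non-verifying) contexts; names are illustrative."""
    secure = ssl.create_default_context()  # CERT_REQUIRED + hostname checking

    insecure = ssl.create_default_context()
    # Order matters: CERT_NONE is rejected while check_hostname is still True.
    insecure.check_hostname = False
    insecure.verify_mode = ssl.CERT_NONE
    return secure, insecure


secure_ctx, insecure_ctx = make_contexts()
print(secure_ctx.verify_mode == ssl.CERT_REQUIRED, secure_ctx.check_hostname)
print(insecure_ctx.verify_mode == ssl.CERT_NONE, insecure_ctx.check_hostname)
```

The skill caches each context in a module-level variable so it is built only once. It also refuses to use the insecure one for non-Naver hosts.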

Files

SKILL.md (Markdown, on GitHub)

Naver Blog Research

What this skill does

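Searches Naver blogs, reads the original text of individual posts, and downloads images locally.

Works with only the python3 standard library; no API key is required.
Outputs search results as structured JSON.
Uses the mobile site (m.blog.naver.com) to extract the post body directly, without iframes.
Downloads images from the blog image CDNs (blogfiles.naver.net, postfiles.pstatic.net).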

When to use

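"Search Naver blogs for a wedding checklist"
"Do some Naver blog research for me"
"Look up related information on Korean blogs"
"Read this Naver blog post"
"Download the images from this Naver blog post"
Korean content research that needs Naver blog sources in addition to Google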

When not to use

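Searching Naver services other than blogs, such as Naver News, Cafe, or Knowledge iN (지식iN)
Bulk crawling or scraping (dozens of requests or more in one session)
Commercial data collection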

Prerequisites

An internet connection
Python 3.8+ (python3)
The helper scripts bundled in this skill directory's scripts/

Workflow

1. Search Naver blogs

python3 scripts/naver_search.py "검색어" --count 10 --sort sim

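Argument	Required	Description	Default
query	Yes	Search query	-
--count	No	Number of results (max 30)	10
--sort	No	sim (relevance) or date (newest)	sim
--timeout	No	Request timeout (seconds)	15
--insecure	No	Skip SSL certificate verification (only if certificate errors occur)	false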

Example output:

{
  "query": "결혼식 체크리스트",
  "total_results": 7,
  "results": [
    {
      "title": "결혼식 체크리스트 총정리",
      "url": "https://blog.naver.com/user123/224212849946",
      "mobile_url": "https://m.blog.naver.com/user123/224212849946",
      "snippet": "결혼식 1주일 전에 반드시 확인해야 할...",
      "author": "user123"
    }
  ]
}
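To move from step 1 to step 2, the search JSON only needs to be parsed and the first few entries picked. This sketch assumes the output shape shown above (`query`, `total_results`, `results`). It prefers `mobile_url` because naver_read.py converts to the mobile form anyway. `top_posts` is a hypothetical helper.

```python
import json
from typing import List, Tuple


def top_posts(search_json: str, n: int = 3) -> List[Tuple[str, str, str]]:
    """Return (title, url, author) for the first n search results."""
    data = json.loads(search_json)
    picked = []
    for entry in data.get("results", [])[:n]:
        # Fall back to the PC URL if mobile_url is missing.
        url = entry.get("mobile_url") or entry.get("url", "")
        picked.append((entry.get("title", ""), url, entry.get("author", "")))
    return picked


sample = json.dumps({
    "query": "결혼식 체크리스트",
    "total_results": 1,
    "results": [{
        "title": "결혼식 체크리스트 총정리",
        "url": "https://blog.naver.com/user123/224212849946",
        "mobile_url": "https://m.blog.naver.com/user123/224212849946",
        "snippet": "결혼식 1주일 전에 반드시 확인해야 할...",
        "author": "user123",
    }],
}, ensure_ascii=False)
print(top_posts(sample))
```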

2. Read the full post

Pick the URL of a post of interest from the search results and read its original text.

python3 scripts/naver_read.py "https://blog.naver.com/user123/224212849946"

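Argument	Required	Description	Default
url	Yes	Blog post URL (PC or mobile)	-
--no-images	No	Exclude image URLs	false
--max-length	No	Maximum body length in characters (0 = unlimited)	0
--timeout	No	Request timeout (seconds)	20
--insecure	No	Skip SSL certificate verification (only if certificate errors occur)	false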

A PC URL is automatically converted to the mobile URL before the request is made.
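The PC-to-mobile conversion can be expressed with `urllib.parse` instead of the regexes used in `to_mobile_url`. This sketch handles only the path-style URL used throughout this document (`blog.naver.com/<blog id>/<post number>`) and returns any other shape unchanged. `to_mobile` is a hypothetical name.

```python
from urllib.parse import urlparse


def to_mobile(url: str) -> str:
    """Rewrite http(s)://[m.]blog.naver.com/<id>/<number> to the https mobile form."""
    parsed = urlparse(url.strip())
    if parsed.hostname not in ("blog.naver.com", "m.blog.naver.com"):
        return url
    parts = [p for p in parsed.path.split("/") if p]
    # Expect exactly "<blog id>/<numeric post id>"; leave other paths alone.
    if len(parts) == 2 and parts[1].isdigit():
        return f"https://m.blog.naver.com/{parts[0]}/{parts[1]}"
    return url


print(to_mobile("https://blog.naver.com/user123/224212849946"))
```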

3. Download images (if needed)

python3 scripts/naver_download_images.py --urls "url1,url2,url3" --output ./images/

Or pipe the naver_read.py output into it:

python3 scripts/naver_read.py "https://..." | python3 scripts/naver_download_images.py --output ./images/

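Argument	Required	Description	Default
--urls	No	Comma-separated image URLs (otherwise read from stdin)	-
--output	No	Output directory	./naver-images/
--max	No	Maximum number of downloads	10
--timeout	No	Request timeout (seconds)	15
--insecure	No	Skip SSL certificate verification (only if certificate errors occur)	false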
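When nothing is passed with --urls, the downloader reads JSON from stdin. It accepts either a naver_read.py result (a dict with an `images` list) or a list of URL strings and `{"url": ...}` objects. The following is a simplified standalone sketch of that normalization; `urls_from_payload` is a hypothetical name, and the script's own logic lives in `read_urls_from_stdin`.

```python
import json
from typing import Any, List


def urls_from_payload(payload: Any) -> List[str]:
    """Normalize the accepted stdin JSON shapes into a flat list of URLs."""
    if isinstance(payload, dict):
        items = payload.get("images", [])
    elif isinstance(payload, list):
        items = payload
    else:
        return []
    urls = []
    for item in items:
        if isinstance(item, str):
            url = item
        elif isinstance(item, dict):
            url = item.get("url", "")
        else:
            url = ""  # ignore numbers, nulls, and other stray values
        if url:
            urls.append(url)
    return urls


read_output = json.loads(
    '{"title": "T", "images": [{"url": "https://postfiles.pstatic.net/a.jpg", "alt": ""}]}'
)
print(urls_from_payload(read_output))
```

A dict without an `images` key yields no URLs. This is the case the script warns about when naver_read.py was run with --no-images.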

Response policy

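Summarize the search results and post bodies for the user.
Always include the blog source (URL and author).
Avoid excessive requests in one session (dozens or more).
When downloading images, tell the user where they were saved.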
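The citation rule can be made mechanical by formatting one source line per post from a search result entry. The line format below is an assumption chosen for illustration and is not prescribed by the skill. `cite` is a hypothetical helper.

```python
from typing import Dict


def cite(entry: Dict[str, str]) -> str:
    """Format one source line (title, author, URL) from a search result entry."""
    title = entry.get("title") or "(untitled)"
    author = entry.get("author") or "unknown author"
    # Prefer the mobile URL when present, since that is the page actually read.
    url = entry.get("mobile_url") or entry.get("url", "")
    return f"{title} (by {author}): {url}"


print(cite({"title": "결혼식 체크리스트 총정리", "author": "user123",
            "url": "https://blog.naver.com/user123/224212849946"}))
```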

Done when

Search results are printed as valid JSON.
The blog post's text is extracted.
The needed images are saved locally.
Sources are cited.

Notes

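Because the scripts query Naver's search engine directly, bulk or automated use may get your IP blocked.
This skill is designed for small-scale, non-commercial content research.
Naver's HTML structure can change. If parsing fails, check the error message; the scripts may need updating.
The PC site (blog.naver.com) is built on iframes, so the mobile site (m.blog.naver.com) is used instead.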
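One way to respect the session limits is to route every script invocation through a small request budget. This sketch is hypothetical. The cap of 20 requests is an assumed value, not a limit published by Naver. The 0.5 s spacing mirrors the pause naver_search.py already inserts between result pages.

```python
import time


class RequestBudget:
    """Cap requests per session and space them out (hypothetical helper)."""

    def __init__(self, max_requests: int = 20, min_interval: float = 0.5) -> None:
        self.max_requests = max_requests  # assumed cap, tune to taste
        self.min_interval = min_interval  # seconds between consecutive requests
        self.used = 0
        self._last = None  # time.monotonic() value of the previous request

    def acquire(self) -> None:
        """Wait until the next request may be sent; raise once the budget is spent."""
        if self.used >= self.max_requests:
            raise RuntimeError("request budget exhausted for this session")
        if self._last is not None:
            wait = self.min_interval - (time.monotonic() - self._last)
            if wait > 0:
                time.sleep(wait)
        self._last = time.monotonic()
        self.used += 1


budget = RequestBudget(max_requests=3)
for _ in range(3):
    budget.acquire()  # call before each search/read/download request
print(budget.used)
```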

# scripts/_naver_http.py
"""Shared HTTP utilities for Naver blog scripts (SSL handling, URL validation, urlopen wrapper)."""

from __future__ import annotations

import re
import ssl
import sys
import urllib.error
import urllib.parse
import urllib.request


TAG_RE = re.compile(r"<[^>]+>")

_ssl_ctx_secure: ssl.SSLContext | None = None
_ssl_ctx_insecure: ssl.SSLContext | None = None


def _get_ssl_context(*, insecure: bool = False) -> ssl.SSLContext:
    global _ssl_ctx_secure, _ssl_ctx_insecure
    if insecure:
        if _ssl_ctx_insecure is None:
            ctx = ssl.create_default_context()
            ctx.check_hostname = False
            ctx.verify_mode = ssl.CERT_NONE
            _ssl_ctx_insecure = ctx
        return _ssl_ctx_insecure
    if _ssl_ctx_secure is None:
        _ssl_ctx_secure = ssl.create_default_context()
    return _ssl_ctx_secure


_NAVER_DOMAINS = (".naver.com", ".naver.net", ".pstatic.net")


def is_naver_url(url: str) -> bool:
    host = urllib.parse.urlparse(url).hostname or ""
    return any(host == d.lstrip(".") or host.endswith(d) for d in _NAVER_DOMAINS)


def urlopen(request: urllib.request.Request, timeout: int, *, insecure: bool = False):
    """urlopen with explicit SSL insecure mode for Naver domains.

    When *insecure* is True and the target is a Naver domain, SSL certificate
    verification is skipped.  A warning is printed to stderr on every call so
    the caller is always aware.
    """
    if insecure:
        if not is_naver_url(request.full_url):
            raise ValueError("insecure mode can only be used with Naver domains.")
        print(
            "[warn] SSL certificate verification is disabled. The connection may not be secure.",
            file=sys.stderr,
        )
        return urllib.request.urlopen(
            request, timeout=timeout, context=_get_ssl_context(insecure=True),
        )
    return urllib.request.urlopen(request, timeout=timeout, context=_get_ssl_context())
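`is_naver_url` accepts a host only if it equals a listed domain or ends with `.<domain>`. As a result, lookalike hosts such as `evilnaver.com` are rejected, which keeps both the insecure mode and the downloader restricted to Naver. Below is a standalone copy of the same rule with a few sample hosts; the non-Naver hosts are made up for the example.

```python
from urllib.parse import urlparse

NAVER_DOMAINS = (".naver.com", ".naver.net", ".pstatic.net")


def is_naver_host(url: str) -> bool:
    """Same rule as is_naver_url: exact domain match or a true subdomain."""
    host = urlparse(url).hostname or ""
    return any(host == d.lstrip(".") or host.endswith(d) for d in NAVER_DOMAINS)


for u in ("https://m.blog.naver.com/x/1", "https://postfiles.pstatic.net/a.jpg",
          "https://evilnaver.com/", "https://naver.com.example.org/"):
    print(u, is_naver_host(u))
```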

# scripts/naver_download_images.py
from __future__ import annotations

import argparse
import json
import os
import sys
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed

sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from _naver_http import is_naver_url, urlopen

DEFAULT_OUTPUT_DIR = "./naver-images"
DEFAULT_MAX = 10
DEFAULT_TIMEOUT = 15

DEFAULT_HEADERS = {
    "Accept": "image/webp,image/apng,image/*,*/*;q=0.8",
    "Accept-Language": "ko,en-US;q=0.9,en;q=0.8",
    "Referer": "https://m.blog.naver.com/",
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36"
    ),
}

CONTENT_TYPE_TO_EXT = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
    "image/bmp": ".bmp",
    "image/svg+xml": ".svg",
}


_MAGIC_BYTES = (
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"RIFF", ".webp"),  # WebP: RIFF....WEBP (check first 4 bytes)
    (b"BM", ".bmp"),
)


def guess_extension(url: str, content_type: str | None = None, data: bytes | None = None) -> str:
    if content_type:
        ct = content_type.split(";")[0].strip().lower()
        if ct in CONTENT_TYPE_TO_EXT:
            return CONTENT_TYPE_TO_EXT[ct]

    lower_url = url.lower().split("?")[0]
    for ext in (".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".svg"):
        if lower_url.endswith(ext):
            return ".jpg" if ext == ".jpeg" else ext

    if data:
        for magic, ext in _MAGIC_BYTES:
            if data[:len(magic)] == magic:
                if ext == ".webp" and data[8:12] != b"WEBP":
                    continue
                return ext
        if data[:2] in (b"\xff\xd8",):
            return ".jpg"

    return ".jpg"


def download_image(url: str, output_path: str, output_dir: str, timeout: int = DEFAULT_TIMEOUT, *, insecure: bool = False) -> dict:
    """Download a single image from a Naver CDN URL.

    *output_dir* is used solely for path-traversal protection: the resolved
    *output_path* must reside inside *output_dir*.
    """
    if not is_naver_url(url):
        return {"url": url, "error": "Not a Naver CDN URL. Skipped."}

    real_dir = os.path.realpath(output_dir)
    if not os.path.realpath(output_path).startswith(real_dir + os.sep):
        return {"url": url, "error": "Output path escapes target directory. Skipped."}

    request = urllib.request.Request(url, headers=DEFAULT_HEADERS)

    try:
        with urlopen(request, timeout, insecure=insecure) as response:
            data = response.read()
            content_type = response.headers.get("Content-Type", "")
    except (urllib.error.HTTPError, urllib.error.URLError, OSError) as error:
        return {"url": url, "error": str(error)}

    ext = guess_extension(url, content_type, data)
    if not os.path.splitext(output_path)[1]:
        output_path += ext

    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)

    with open(output_path, "wb") as f:
        f.write(data)

    size_kb = round(len(data) / 1024, 1)
    return {"url": url, "path": output_path, "size_kb": size_kb}


def download_images(
    urls: list[str],
    output_dir: str = DEFAULT_OUTPUT_DIR,
    max_count: int = DEFAULT_MAX,
    timeout: int = DEFAULT_TIMEOUT,
    *,
    insecure: bool = False,
) -> dict:
    os.makedirs(output_dir, exist_ok=True)

    max_count = max(1, max_count)
    targets = urls[:max_count]
    downloaded: list[dict] = []
    failed: list[dict] = []

    # Map index → result in a dict so the original order can be restored
    results_by_index: dict[int, dict] = {}

    with ThreadPoolExecutor(max_workers=min(4, max(1, len(targets)))) as executor:
        future_to_index = {}
        for i, url in enumerate(targets, start=1):
            filename = f"{i:03d}"
            output_path = os.path.join(output_dir, filename)
            future = executor.submit(download_image, url, output_path, output_dir, timeout, insecure=insecure)
            future_to_index[future] = i

        for future in as_completed(future_to_index):
            idx = future_to_index[future]
            try:
                results_by_index[idx] = future.result()
            except Exception as exc:
                results_by_index[idx] = {"url": targets[idx - 1], "error": str(exc)}

    # Sort back into the original order
    for idx in sorted(results_by_index):
        result = results_by_index[idx]
        if "error" in result:
            failed.append(result)
        else:
            downloaded.append(result)

    return {
        "downloaded": len(downloaded),
        "files": downloaded,
        "failed": failed,
    }


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Download images from Naver blog CDN URLs."
    )
    parser.add_argument(
        "--urls", type=str, default="",
        help="Comma-separated image URLs.",
    )
    parser.add_argument(
        "--output", type=str, default=DEFAULT_OUTPUT_DIR,
        help=f"Output directory. Default: {DEFAULT_OUTPUT_DIR}",
    )
    parser.add_argument(
        "--max", type=int, default=DEFAULT_MAX,
        help=f"Maximum number of images to download. Default: {DEFAULT_MAX}",
    )
    parser.add_argument(
        "--timeout", type=int, default=DEFAULT_TIMEOUT,
        help=f"HTTP request timeout in seconds. Default: {DEFAULT_TIMEOUT}",
    )
    parser.add_argument(
        "--insecure", action="store_true",
        help="Skip SSL certificate verification (use only when certificate errors occur).",
    )
    return parser.parse_args(argv)


def read_urls_from_stdin() -> list[str]:
    try:
        data = json.load(sys.stdin)
        if isinstance(data, dict) and "images" in data:
            return [img["url"] for img in data["images"] if isinstance(img, dict) and img.get("url")]
        if isinstance(data, list):
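            return [
                u for item in data
                # Accept plain URL strings or {"url": ...} objects; ignore anything else.
                if (u := (item if isinstance(item, str)
                          else item.get("url", "") if isinstance(item, dict) else ""))
            ]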
        if isinstance(data, dict):
            print(
                "[warn] stdin JSON has no 'images' key. "
                "Check that naver_read.py was not run with the --no-images flag.",
                file=sys.stderr,
            )
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        print(f"[warn] Failed to parse stdin JSON: {exc}", file=sys.stderr)
        return []
    return []


def main(argv: list[str] | None = None) -> int:
    args = parse_args(argv or sys.argv[1:])

    urls: list[str] = []

    if args.urls:
        urls = [u.strip() for u in args.urls.split(",") if u.strip()]

    if not urls and not sys.stdin.isatty():
        urls = read_urls_from_stdin()

    if not urls:
        print(
            json.dumps({"error": "No image URLs provided. Use --urls or pipe naver_read.py output via stdin."}, ensure_ascii=False),
            file=sys.stderr,
        )
        return 1

    result = download_images(
        urls,
        output_dir=args.output,
        max_count=args.max,
        timeout=args.timeout,
        insecure=args.insecure,
    )

    print(json.dumps(result, ensure_ascii=False, indent=2))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
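`download_image` blocks path traversal by requiring the resolved output path to start with the resolved directory plus a separator. An equivalent check can be written with `os.path.commonpath`, which compares whole path components rather than string prefixes. This is a swapped-in alternative for illustration and is not what the script uses. `is_inside` is a hypothetical name.

```python
import os


def is_inside(directory: str, path: str) -> bool:
    """True if *path* resolves to a location strictly inside *directory*."""
    real_dir = os.path.realpath(directory)
    real_path = os.path.realpath(path)
    try:
        return real_path != real_dir and os.path.commonpath([real_dir, real_path]) == real_dir
    except ValueError:
        # commonpath raises ValueError for paths on different drives (Windows).
        return False


print(is_inside("naver-images", os.path.join("naver-images", "001.jpg")))
print(is_inside("naver-images", os.path.join("naver-images", "..", "escape.jpg")))
```

A sibling directory such as `naver-images-old` shares a string prefix with `naver-images` but not a path component. Both checks reject it: the script's check because of the trailing separator, and this one because of `commonpath`.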

# scripts/naver_read.py
from __future__ import annotations

import argparse
import json
import os
import re
import sys
import urllib.error
import urllib.request
from html import unescape

sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from _naver_http import TAG_RE, is_naver_url, urlopen

MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1"
)

DEFAULT_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "ko,en-US;q=0.9,en;q=0.8",
    "User-Agent": MOBILE_UA,
}

BR_RE = re.compile(r"<br\s*/?>", re.IGNORECASE)
BLOCK_END_RE = re.compile(r"</(p|div|li)>", re.IGNORECASE)
WHITESPACE_RE = re.compile(r"[ \t]+")
BLANK_LINES_RE = re.compile(r"\n{3,}")

_IMG_CDN_HOSTS = r"(?:blogfiles\.naver\.net|postfiles\.pstatic\.net|mblogthumb-phinf\.pstatic\.net)"

IMAGE_LAZY_PATTERN = re.compile(
    rf'data-lazy-src="(https?://{_IMG_CDN_HOSTS}[^"]+)"'
)
IMAGE_SRC_PATTERN = re.compile(
    rf'src="(https?://{_IMG_CDN_HOSTS}[^"]+)"'
)
IMAGE_ALT_PATTERN = re.compile(
    r'alt="([^"]*)"'
)

TITLE_PATTERN = re.compile(
    r'<title[^>]*>(.*?)</title>', re.DOTALL | re.IGNORECASE
)

SCRIPT_STYLE_RE = re.compile(r"<(script|style|noscript)[^>]*>.*?</\1>", re.DOTALL | re.IGNORECASE)

PC_BLOG_RE = re.compile(r"^https?://blog\.naver\.com/")
BLOG_ID_RE = re.compile(r"blog\.naver\.com/([a-zA-Z0-9_]+)/(\d+)")


def to_mobile_url(url: str) -> str:
    url = url.strip()
    url = PC_BLOG_RE.sub("https://m.blog.naver.com/", url)
    if not url.startswith("https://m.blog.naver.com/"):
        match = BLOG_ID_RE.search(url)
        if match:
            url = f"https://m.blog.naver.com/{match.group(1)}/{match.group(2)}"
    return url


def fetch_blog_page(url: str, timeout: int = 20, *, insecure: bool = False) -> str:
    mobile_url = to_mobile_url(url)
    if not is_naver_url(mobile_url):
        raise ValueError(f"Not a Naver blog URL: {url}")
    request = urllib.request.Request(mobile_url, headers=DEFAULT_HEADERS)

    try:
        with urlopen(request, timeout, insecure=insecure) as response:
            return response.read().decode("utf-8", "ignore")
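    except urllib.error.HTTPError as error:
        raise RuntimeError(
            f"Naver blog returned HTTP {error.code} for {mobile_url}. "
            "The post may not exist or access may be restricted."
        ) from error
    except OSError as error:
        # URLError (DNS, refused connection, SSL) and read timeouts are OSError subclasses.
        raise RuntimeError(f"Could not fetch {mobile_url}: {error}") from error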


def extract_title(html: str) -> str:
    match = TITLE_PATTERN.search(html)
    if not match:
        return ""
    title = unescape(TAG_RE.sub("", match.group(1))).strip()
    title = re.sub(r"\s*[-:|]?\s*네이버\s*블로그$", "", title).strip()
    return title


def _extract_div_block(html: str, start_pos: int) -> str:
    tag_start = html.rfind("<div", 0, start_pos)
    if tag_start < 0:
        tag_start = start_pos

    depth = 0
    pos = tag_start
    started = False
    length = len(html)
    while pos < length:
        # Skip HTML comments
        if html[pos : pos + 4] == "<!--":
            end = html.find("-->", pos + 4)
            pos = end + 3 if end >= 0 else length
            continue
        if html[pos : pos + 4] == "<div" and (pos + 4 >= length or html[pos + 4] in (" ", ">", "\t", "\n", "/")):
            depth += 1
            started = True
        elif html[pos : pos + 6] == "</div>":
            depth -= 1
            if started and depth == 0:
                return html[tag_start : pos + 6]
        pos += 1

    return html[tag_start:]


def extract_content_area(html: str) -> str:
    cleaned = SCRIPT_STYLE_RE.sub("", html)

    match = re.search(r'class="[^"]*\bse-main-container\b[^"]*"', cleaned)
    if match:
        return _extract_div_block(cleaned, match.start())

    for class_name in ("post_ct", "postViewArea", "post-view"):
        match = re.search(rf'class="[^"]*\b{re.escape(class_name)}\b[^"]*"', cleaned)
        if match:
            return _extract_div_block(cleaned, match.start())

    marker = cleaned.find('id="viewTypeSelector"')
    if marker >= 0:
        return _extract_div_block(cleaned, marker)

    return ""


def extract_text(html_fragment: str) -> str:
    text = BR_RE.sub("\n", html_fragment)
    text = BLOCK_END_RE.sub("\n", text)
    text = TAG_RE.sub("", text)
    text = unescape(text)

    lines = []
    for line in text.split("\n"):
        stripped = WHITESPACE_RE.sub(" ", line).strip()
        if stripped:
            lines.append(stripped)

    result = "\n".join(lines)
    result = BLANK_LINES_RE.sub("\n\n", result)
    return result.strip()


def extract_images(html_fragment: str) -> list[dict]:
    images: list[dict] = []
    seen_base: set[str] = set()

    img_tags = re.finditer(r"<img\s[^>]+>", html_fragment, re.IGNORECASE)
    for img_match in img_tags:
        img_tag = img_match.group(0)

        lazy_match = IMAGE_LAZY_PATTERN.search(img_tag)
        src_match = IMAGE_SRC_PATTERN.search(img_tag)
        url_match = lazy_match or src_match
        if not url_match:
            continue

        url = url_match.group(1)

        base_url = re.sub(r"\?type=.*$", "", url)
        if base_url in seen_base:
            continue
        seen_base.add(base_url)

        if "?type=" not in url:
            url = base_url
        elif "_blur" in url:
            url = re.sub(r"\?type=w\d+_blur", "?type=w800", url)

        alt_match = IMAGE_ALT_PATTERN.search(img_tag)
        alt = unescape(alt_match.group(1)).strip() if alt_match else ""

        images.append({"url": url, "alt": alt})

    return images


def read_blog(url: str, include_images: bool = True, max_length: int = 0, timeout: int = 20, *, insecure: bool = False) -> dict:
    html = fetch_blog_page(url, timeout=timeout, insecure=insecure)
    mobile_url = to_mobile_url(url)

    title = extract_title(html)
    content_area = extract_content_area(html)
    content = extract_text(content_area)

    if max_length > 0 and len(content) > max_length:
        content = content[:max_length] + "..."

    result: dict = {
        "url": mobile_url,
        "title": title,
        "content": content,
        "char_count": len(content),
    }

    if not content:
        result["warning"] = "Could not find the post body. Naver's HTML structure may have changed."

    if include_images:
        result["images"] = extract_images(content_area)

    return result


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Read a Naver blog post and extract text content and images."
    )
    parser.add_argument("url", help="Naver blog post URL (PC or mobile).")
    parser.add_argument(
        "--no-images", action="store_true",
        help="Exclude image URLs from output.",
    )
    parser.add_argument(
        "--max-length", type=int, default=0,
        help="Maximum content length in characters (0 = unlimited). Default: 0.",
    )
    parser.add_argument(
        "--timeout", type=int, default=20,
        help="HTTP request timeout in seconds. Default: 20.",
    )
    parser.add_argument(
        "--insecure", action="store_true",
        help="Skip SSL certificate verification (use only when certificate errors occur).",
    )
    return parser.parse_args(argv)


def main(argv: list[str] | None = None) -> int:
    args = parse_args(argv or sys.argv[1:])

    try:
        result = read_blog(
            args.url,
            include_images=not args.no_images,
            max_length=args.max_length,
            timeout=args.timeout,
            insecure=args.insecure,
        )
    except (RuntimeError, ValueError) as error:
        print(json.dumps({"error": str(error)}, ensure_ascii=False), file=sys.stderr)
        return 1

    print(json.dumps(result, ensure_ascii=False, indent=2))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
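`extract_text` converts HTML to text with regular expressions. It turns `<br>` and the closing `</p>`, `</div>`, and `</li>` tags into newlines, strips tags, unescapes entities, and drops empty lines. A roughly equivalent pass can be built on the standard `html.parser.HTMLParser`. This is a swapped-in technique shown for comparison; naver_read.py does not use it. `BodyText` is a hypothetical class. It collapses all whitespace, including non-breaking spaces, which is slightly broader than the script's space/tab rule.

```python
from html.parser import HTMLParser
from typing import List


class BodyText(HTMLParser):
    """Collect visible text, skipping script/style/noscript, with line breaks at block ends."""

    SKIP = {"script", "style", "noscript"}
    BREAK_ENDS = {"p", "div", "li"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # &amp; etc. arrive already decoded
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BREAK_ENDS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

    def get_text(self) -> str:
        # Collapse runs of whitespace within each line and drop empty lines.
        lines = (" ".join(line.split()) for line in "".join(self.parts).split("\n"))
        return "\n".join(line for line in lines if line)


parser = BodyText()
parser.feed('<div><p>Hello &amp; <b>world</b></p><script>var x = 1;</script><p>Line<br>two</p></div>')
parser.close()
print(parser.get_text())
```

The regex approach in the script is simpler and tolerant of broken markup. The parser-based version avoids hand-written tag patterns but still depends on the same content-area selection done by `extract_content_area`.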

# scripts/naver_search.py
from __future__ import annotations

import argparse
import json
import os
import re
import sys
import time
import urllib.error
import urllib.parse
import urllib.request
from html import unescape

sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
from _naver_http import TAG_RE, urlopen

SEARCH_URL = "https://search.naver.com/search.naver"
DEFAULT_COUNT = 10
MAX_COUNT = 30
FIRST_PAGE_START = 1
RESULTS_PER_PAGE = 15

DEFAULT_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "ko,en-US;q=0.9,en;q=0.8",
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/136.0.0.0 Safari/537.36"
    ),
}

BLOG_ANCHOR_PATTERN = re.compile(
    r'<a[^>]*href="(https?://blog\.naver\.com/([a-zA-Z0-9_]+)/(\d+))"[^>]*>(.*?)</a>',
    re.DOTALL,
)


def strip_html(text: str) -> str:
    return unescape(TAG_RE.sub("", text)).strip()


def build_search_params(query: str, start: int = FIRST_PAGE_START, sort: str = "sim") -> dict[str, str]:
    return {
        "query": query,
        "ssc": "tab.blog.all",
        "sm": "tab_jum" if start <= FIRST_PAGE_START else "tab_pge",
        "start": str(start),
        "nso": {"sim": "so:r,p:all,a:all", "date": "so:dd,p:all,a:all"}.get(sort, "so:r,p:all,a:all"),
    }


def fetch_search_page(query: str, start: int = 1, sort: str = "sim", timeout: int = 15, *, insecure: bool = False) -> str:
    params = build_search_params(query, start=start, sort=sort)
    url = f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"
    request = urllib.request.Request(url, headers=DEFAULT_HEADERS)

    try:
        with urlopen(request, timeout, insecure=insecure) as response:
            return response.read().decode("utf-8", "ignore")
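    except urllib.error.HTTPError as error:
        raise RuntimeError(
            f"Naver search returned HTTP {error.code}. "
            "The request may have been blocked. Retry later or reduce request volume."
        ) from error
    except OSError as error:
        # URLError (DNS, refused connection, SSL) and read timeouts are OSError subclasses.
        raise RuntimeError(f"Could not reach Naver search: {error}") from error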


def parse_search_results(html: str) -> list[dict]:
    results: list[dict] = []
    anchors = BLOG_ANCHOR_PATTERN.findall(html)

    pending: dict[str, dict] = {}

    for full_url, user_id, post_id, inner_html in anchors:
        if full_url not in pending:
            pending[full_url] = {
                "url": full_url,
                "mobile_url": f"https://m.blog.naver.com/{user_id}/{post_id}",
                "author": user_id,
                "title": "",
                "snippet": "",
            }

        text = strip_html(inner_html)
        if not text:
            continue

        entry = pending[full_url]

        if "headline1" in inner_html or "text-type-headline" in inner_html:
            if not entry["title"]:
                entry["title"] = text
        elif "body1" in inner_html or "text-type-body" in inner_html:
            if not entry["snippet"]:
                entry["snippet"] = text
        else:
            if not entry["title"]:
                entry["title"] = text

    for entry in pending.values():
        results.append(entry)

    return results


def search(query: str, count: int = DEFAULT_COUNT, sort: str = "sim", timeout: int = 15, *, insecure: bool = False) -> dict:
    count = max(1, min(count, MAX_COUNT))
    all_results: list[dict] = []
    seen_urls: set[str] = set()
    start = FIRST_PAGE_START
    # Naver may not return exactly RESULTS_PER_PAGE results per page, so allow a few extra pages
    max_pages = (count // RESULTS_PER_PAGE) + 3

    for page_num in range(max_pages):
        if len(all_results) >= count:
            break

        if page_num > 0:
            time.sleep(0.5)

        html = fetch_search_page(query, start=start, sort=sort, timeout=timeout, insecure=insecure)
        page_results = parse_search_results(html)[:RESULTS_PER_PAGE]

        if not page_results:
            if start == FIRST_PAGE_START:
                print("[warn] Failed to parse search results. Naver's HTML structure may have changed.", file=sys.stderr)
            break

        new_count = 0
        for result in page_results:
            if result["url"] not in seen_urls:
                seen_urls.add(result["url"])
                all_results.append(result)
                new_count += 1
                if len(all_results) >= count:
                    break

        if new_count == 0:
            break

        start += RESULTS_PER_PAGE

    return {
        "query": query,
        "total_results": len(all_results),
        "results": all_results,
    }


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Search Naver blogs and return structured JSON results."
    )
    parser.add_argument("query", help="Search query string.")
    parser.add_argument(
        "--count", type=int, default=DEFAULT_COUNT,
        help=f"Number of results to return (max {MAX_COUNT}, default {DEFAULT_COUNT}).",
    )
    parser.add_argument(
        "--sort", choices=["sim", "date"], default="sim",
        help="Sort order: sim (relevance) or date (newest first). Default: sim.",
    )
    parser.add_argument(
        "--timeout", type=int, default=15,
        help="HTTP request timeout in seconds. Default: 15.",
    )
    parser.add_argument(
        "--insecure", action="store_true",
        help="Skip SSL certificate verification (use only when certificate errors occur).",
    )
    return parser.parse_args(argv)


def main(argv: list[str] | None = None) -> int:
    args = parse_args(argv or sys.argv[1:])

    try:
        result = search(
            args.query,
            count=args.count,
            sort=args.sort,
            timeout=args.timeout,
            insecure=args.insecure,
        )
    except RuntimeError as error:
        print(json.dumps({"error": str(error)}, ensure_ascii=False), file=sys.stderr)
        return 1

    print(json.dumps(result, ensure_ascii=False, indent=2))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
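`search()` pages through results by advancing `start` in steps of `RESULTS_PER_PAGE` (15), with the loop capped at `count // RESULTS_PER_PAGE + 3` pages. The sketch below lists the `start` offsets that loop would request under an idealized assumption: every page returns a full 15 new results. The real loop stops early on an empty page or a page with no new URLs. `planned_starts` is a hypothetical helper; the constants are copied from naver_search.py.

```python
import math
from typing import List

RESULTS_PER_PAGE = 15  # same values as in naver_search.py
FIRST_PAGE_START = 1
MAX_COUNT = 30


def planned_starts(count: int) -> List[int]:
    """`start` offsets requested for *count* results, assuming full pages of new results."""
    count = max(1, min(count, MAX_COUNT))        # same clamp as search()
    pages = math.ceil(count / RESULTS_PER_PAGE)
    max_pages = count // RESULTS_PER_PAGE + 3     # same safety cap as search()
    return [FIRST_PAGE_START + i * RESULTS_PER_PAGE for i in range(min(pages, max_pages))]


for n in (10, 16, 30):
    print(n, planned_starts(n))
```

Each extra page is one more request to search.naver.com, plus the 0.5 s pause between pages. Under this assumption, `--count 30` therefore costs two search requests, while the default of 10 costs one.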

Related skills

Seo Audit: Run structured SEO audits on a SaaS site or content hub and receive a prioritized action plan. (167k installs · 41.1k stars)

Copywriting: Generate, rewrite, or strengthen persuasive website and landing-page copy that converts visitors into users. (158k installs · 41.1k stars)

Viral Short Form: Quickly generate high-retention hooks, scripts, and outlines for TikTok, Reels, YouTube Shorts, and carousels. (132k installs · 64 stars)

Viral Hooks: Write and critique viral hooks for short-form video opening sequences. (123k installs · 64 stars)

Viral Captions And Ctas: Optimize social media captions and CTAs for viral short-form video reach and saves. (123k installs · 64 stars)

Viral Youtube Shorts: Write and diagnose YouTube Shorts for the Shorts Feed and long-form funnel. (123k installs · 64 stars)

How it compares

Choose naver-blog-research over generic SEO skills when the research target is Korean Naver blogs rather than Google Search Console or English-language SERPs.

FAQ

Does this skill need a Naver API key?

No. It uses python3 stdlib scripts that request Naver search and mobile blog pages directly.

Why use mobile blog URLs?

PC blog.naver.com uses iframes; m.blog.naver.com allows direct body extraction per the skill notes.

Are there usage limits?

Yes. Avoid dozens of requests per session; the skill warns about IP blocks on heavy automated use.

Is Naver Blog Research safe to install?

skills.sh reports 2 of 3 security scanners passed. Review the Security Audits panel on this page before installing in production.

Marketing & SEO · content · seo