
Korean Character Count
Measure Korean copy against grapheme, line, UTF-8, and NEIS byte limits before submitting forms, essays, or regulated text fields.
Overview
korean-character-count is an agent skill for the Grow phase that counts Korean graphemes, lines, UTF-8 bytes, and NEIS bytes from text you provide.
Install
npx skills add https://github.com/nomadamas/k-skill --skill korean-character-count
What is this skill?
- Node.js script that uses Intl.Segmenter with the Korean locale and grapheme granularity
- Counts UTF-8 bytes, lines, and NEIS-style byte totals for Korean text
- Hangul- and mark-aware grapheme segmentation for accurate Korean counts
- Line-break-aware NEIS counting (each line break adds 2 bytes), aligned with common Korean submission systems
- CLI-oriented utility suitable for piping files or stdin in agent workflows
- Requires Node.js 18+ for Intl.Segmenter
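A quick way to see why grapheme counting matters is to measure the same syllable in precomposed form and in decomposed (NFD) form, where it is stored as three conjoining jamo. This is an illustrative sketch, not part of the skill; `describeLengths` is a hypothetical helper, and it assumes Node.js 18+ for `Intl.Segmenter` and the global `Buffer`.

```javascript
// Hypothetical helper: compare common length measures for one string.
// Assumes Node.js 18+ (Intl.Segmenter and the global Buffer).
const segmenter = new Intl.Segmenter("ko", { granularity: "grapheme" });

function describeLengths(text) {
  return {
    // User-perceived characters (extended grapheme clusters).
    graphemes: Array.from(segmenter.segment(text)).length,
    // Unicode code points.
    codePoints: Array.from(text).length,
    // What String.prototype.length reports.
    utf16Units: text.length,
    // Encoded size in UTF-8.
    utf8Bytes: Buffer.byteLength(text, "utf8"),
  };
}

const composed = "한"; // U+D55C, one precomposed syllable
const decomposed = composed.normalize("NFD"); // U+1112 U+1161 U+11AB

console.log(describeLengths(composed));
// { graphemes: 1, codePoints: 1, utf16Units: 1, utf8Bytes: 3 }
console.log(describeLengths(decomposed));
// { graphemes: 1, codePoints: 3, utf16Units: 3, utf8Bytes: 9 }
```

Only the grapheme count stays at 1 for both spellings, which is why the script reports `characters` from `Intl.Segmenter` rather than from `String.length`.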
Adoption & trust: 1.8k installs on skills.sh; 5.4k GitHub stars; 3/3 security scanners passed (skills.sh audits).
What problem does it solve?
You are writing Korean content under strict byte or character caps, and standard tools miscount Hangul graphemes or ignore NEIS byte rules.
Who is it for?
Indie builders shipping Korean-language sites, docs, or form-heavy products who need NEIS-accurate counts in a scriptable workflow.
Skip if: Teams needing general English word counts, automated CMS publishing, or SEO keyword analysis without Korean length rules.
When should I use this skill?
You need NEIS-accurate or grapheme-accurate counts for Korean text files or pasted content.
What do I get? / Deliverables
You get accurate grapheme, line, UTF-8, and NEIS byte counts so you can trim or expand copy before submission.
- Grapheme count report
- UTF-8 and NEIS byte totals
- Line count for the supplied text
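The line count follows the contract stated in the script's report: an empty string is 0 lines; otherwise CRLF, LF, CR, U+2028, and U+2029 each count as one break, plus 1. The sketch below copies `countLines` and `LINE_BREAK_PATTERN` from the script source on this page so it runs standalone; `lineBreakCosts` is a hypothetical helper, not part of the skill, that shows what the breaks cost under the default (UTF-8) and `neis` profiles.

```javascript
// Copied from the script so the example runs standalone (Node.js 18+).
const LINE_BREAK_PATTERN = /\r\n|[\n\r\u2028\u2029]/gu;

function countLines(text) {
  if (text.length === 0) {
    return 0; // empty string => 0 lines
  }
  return Array.from(text.matchAll(LINE_BREAK_PATTERN)).length + 1;
}

// Hypothetical helper: total bytes spent on line breaks under each profile.
function lineBreakCosts(text) {
  const breaks = Array.from(text.matchAll(LINE_BREAK_PATTERN), (m) => m[0]);
  return {
    utf8: breaks.reduce((sum, br) => sum + Buffer.byteLength(br, "utf8"), 0),
    neis: breaks.length * 2, // neis profile: every break costs 2 bytes
  };
}

const memo = "가\r\n나\n다";
console.log(countLines(""));       // 0
console.log(countLines(memo));     // 3 (CRLF counts as one break)
console.log(lineBreakCosts(memo)); // { utf8: 3, neis: 4 }
```

Under this contract a trailing newline starts a new line, so `countLines("가\n")` returns 2.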
Journey fit
Length compliance for Korean marketing and editorial copy matters most when you are publishing and iterating on content in Grow. The script counts characters and NEIS bytes in written Korean text, which is a content-preparation job rather than app runtime monitoring.
How it compares
Use this local counting script instead of guessing lengths in the editor status bar when NEIS byte rules apply.
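To see how a NEIS total can drift from what an editor shows, here is a minimal re-statement of the NEIS rule as the script's report contract words it: Hangul grapheme = 3 bytes, ASCII grapheme = 1 byte, each line break = 2 bytes, and everything else falls back to UTF-8 bytes. `neisBytes` is a hypothetical name and this is a sketch, not the skill's code (the skill's own version is `countNeisBytes` in the script source on this page); it assumes Node.js 18+.

```javascript
// Sketch of the NEIS byte rule as documented in the script's report contract.
// Assumes Node.js 18+ for Intl.Segmenter and the global Buffer.
const segmenter = new Intl.Segmenter("ko", { granularity: "grapheme" });
const LINE_BREAK = /\r\n|[\n\r\u2028\u2029]/u; // CRLF is tried first
const HANGUL_OR_MARK = /^[\p{Script=Hangul}\p{Mark}]+$/u;
const HAS_HANGUL = /\p{Script=Hangul}/u;
const ASCII_ONLY = /^[\x00-\x7F]+$/;

// Cost of one grapheme under the NEIS rule.
function graphemeNeisBytes(grapheme) {
  if (ASCII_ONLY.test(grapheme)) return 1;
  if (HANGUL_OR_MARK.test(grapheme) && HAS_HANGUL.test(grapheme)) return 3;
  return Buffer.byteLength(grapheme, "utf8"); // fallback: real UTF-8 size
}

function neisBytes(text) {
  // split() drops the separators, so n pieces means n - 1 line breaks.
  const lines = text.split(LINE_BREAK);
  let total = 2 * (lines.length - 1);
  for (const line of lines) {
    for (const { segment } of segmenter.segment(line)) {
      total += graphemeNeisBytes(segment);
    }
  }
  return total;
}

const sample = "안녕하세요\nNEIS 2024";
console.log(neisBytes(sample));                 // 26 (15 + 2 + 9)
console.log(Buffer.byteLength(sample, "utf8")); // 25 (15 + 1 + 9)
```

The one-byte gap comes entirely from the line break: a lone LF is 1 byte in UTF-8 but 2 bytes under the NEIS rule, while each precomposed Hangul syllable costs 3 bytes in both.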
Common Questions / FAQ
Who is korean-character-count for?
Solo builders and content authors preparing Korean text where portals enforce grapheme or NEIS byte limits.
When should I use korean-character-count?
During Grow content drafting and before submitting Korean copy to education, government, or legacy systems with byte caps.
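Once a count shows the copy is over a byte cap, it is safer to cut at a grapheme boundary than at a raw byte offset, which can split a Hangul syllable. A minimal sketch for the UTF-8 (default) profile; `trimToUtf8Bytes` is a hypothetical helper, not something the skill provides, and it assumes Node.js 18+.

```javascript
// Hypothetical helper (not part of the skill): keep whole graphemes from the
// start of `text` while the UTF-8 byte total stays within `maxBytes`.
// Assumes Node.js 18+ for Intl.Segmenter and the global Buffer.
const segmenter = new Intl.Segmenter("ko", { granularity: "grapheme" });

function trimToUtf8Bytes(text, maxBytes) {
  let used = 0;
  let out = "";
  for (const { segment } of segmenter.segment(text)) {
    const size = Buffer.byteLength(segment, "utf8");
    if (used + size > maxBytes) break; // next grapheme would exceed the cap
    used += size;
    out += segment;
  }
  return out;
}

const draft = "안녕하세요 반갑습니다";
console.log(trimToUtf8Bytes(draft, 10)); // 안녕하 (9 bytes)
// A raw 10-byte cut keeps only the first of the three bytes of 세,
// so it decodes to 안녕하 plus a U+FFFD replacement character.
console.log(Buffer.from(draft, "utf8").subarray(0, 10).toString("utf8"));
```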
Is korean-character-count safe to install?
It is a local Node script that reads the text you point it at; review the Security Audits panel on this page and inspect the script before running it on sensitive drafts.
SKILL.md
```javascript
#!/usr/bin/env node
"use strict";

const fs = require("node:fs");

// CRLF is tried first so it counts as a single line break.
const LINE_BREAK_PATTERN = /\r\n|[\n\r\u2028\u2029]/gu;
const HANGUL_OR_MARK_PATTERN = /^[\p{Script=Hangul}\p{Mark}]+$/u;
const HAS_HANGUL_PATTERN = /\p{Script=Hangul}/u;
const WHITESPACE_ONLY_PATTERN = /^\s+$/u;
const ASCII_ONLY_PATTERN = /^[\x00-\x7F]+$/;

function ensureSegmenter() {
  if (!globalThis.Intl?.Segmenter) {
    throw new Error("Intl.Segmenter is required. Use Node.js 18 or newer.");
  }
  return new Intl.Segmenter("ko", { granularity: "grapheme" });
}

// Split text into user-perceived characters (extended grapheme clusters).
function segmentGraphemes(text) {
  return Array.from(ensureSegmenter().segment(text), ({ segment }) => segment);
}

function countUtf8Bytes(text) {
  return Buffer.byteLength(text, "utf8");
}

// Empty string => 0 lines; otherwise number of line breaks + 1.
function countLines(text) {
  if (text.length === 0) {
    return 0;
  }
  return Array.from(text.matchAll(LINE_BREAK_PATTERN)).length + 1;
}

// NEIS total: per-grapheme cost of each line's content plus 2 bytes per line break.
function countNeisBytes(text) {
  let total = 0;
  let lastIndex = 0;
  for (const match of text.matchAll(LINE_BREAK_PATTERN)) {
    const breakIndex = match.index ?? 0;
    total += countNeisChunkBytes(text.slice(lastIndex, breakIndex));
    total += 2;
    lastIndex = breakIndex + match[0].length;
  }
  total += countNeisChunkBytes(text.slice(lastIndex));
  return total;
}

function countNeisChunkBytes(chunk) {
  return segmentGraphemes(chunk).reduce(
    (sum, grapheme) => sum + countNeisGraphemeBytes(grapheme),
    0,
  );
}

// ASCII grapheme = 1 B; grapheme made only of Hangul (plus marks) = 3 B;
// anything else falls back to its UTF-8 byte length.
function countNeisGraphemeBytes(grapheme) {
  if (!grapheme) {
    return 0;
  }
  if (ASCII_ONLY_PATTERN.test(grapheme)) {
    return 1;
  }
  if (HANGUL_OR_MARK_PATTERN.test(grapheme) && HAS_HANGUL_PATTERN.test(grapheme)) {
    return 3;
  }
  return countUtf8Bytes(grapheme);
}

function createReport(text, profile = "default") {
  const graphemes = segmentGraphemes(text);
  const bytesUtf8 = countUtf8Bytes(text);
  const bytesNeis = countNeisBytes(text);
  const selectedBytes = profile === "neis" ? bytesNeis : bytesUtf8;
  return {
    profile,
    contract: {
      characters: "Unicode extended grapheme clusters via Intl.Segmenter",
      bytes:
        profile === "neis"
          ? "NEIS-compatible bytes: Hangul grapheme=3B, ASCII grapheme=1B, each line break=2B, everything else falls back to UTF-8 bytes"
          : "Actual UTF-8 encoded byte length",
      lines:
        "Empty string => 0 lines; otherwise count CRLF, LF, CR, U+2028, U+2029 as one line break each and add 1",
    },
    counts: {
      characters: graphemes.length,
      characters_without_whitespace: graphemes.filter(
        (grapheme) => !WHITESPACE_ONLY_PATTERN.test(grapheme),
      ).length,
      code_points: Array.from(text).length,
      utf16_code_units: text.length,
      lines: countLines(text),
      bytes: selectedBytes,
      bytes_utf8: bytesUtf8,
      bytes_neis: bytesNeis,
    },
  };
}

function parseArgs(argv, stdinIsTTY = process.stdin.isTTY) {
  const options = {
    format: "json",
    inputMode: null,
    profile: "default",
    text: null,
  };
  for (let index = 0; index < argv.length; index += 1) {
    const arg = argv[index];
    if (arg === "--help" || arg === "-h") {
      options.help = true;
      continue;
    }
    if (arg === "--text") {
      setInputMode(options, "text");
      options.text = readNextValue(argv, ++index, arg);
      continue;
    }
    if (arg === "--file") {
      setInputMode(options, "file");
      options.file = readNextValue(argv, ++index, arg);
      continue;
    }
    if (arg === "--stdin") {
      setInputMode(options, "stdin");
      continue;
    }
    if (arg === "--profile") {
      const value = readNextValue(argv, ++index, arg);
      if (!["default", "neis"].includes(value)) {
        throw new Error(`Unknown profile: ${value}`);
      }
      options.profile = value;
      continue;
    }
    if (arg === "--format") {
      const value = readNextValue(argv, ++index, arg);
      if (!["json", "text"].includes(value)) {
        throw new Error(`Unknown format: ${value}`);
      }
      options.format = value;
```