Korean Character Count

Name: Korean Character Count
Author: nomadamas

nomadamas/k-skill

2.8k installs
6.5k repo stars
Updated July 27, 2026
nomadamas/k-skill

korean-character-count deterministically counts Korean graphemes, lines, and bytes using default or neis profiles via a Node helper script.

About

Korean Character Count deterministically counts Korean text for self-introduction essays, applications, and free-form fields where a single character matters, without LLM estimation. The default profile counts characters as ko grapheme clusters via Intl.Segmenter and bytes as the UTF-8 length from Buffer.byteLength. Line counting treats each CRLF, LF, CR, U+2028, or U+2029 sequence as one line break, and an empty string has zero lines. The neis profile shares the grapheme and line rules but counts each Hangul grapheme as 3 bytes, each ASCII grapheme as 1 byte, and each line break as 2 bytes, with everything else falling back to its UTF-8 length, for NEIS school-record compatibility. The helper script korean_character_count.js (Node.js 18+) reads input from text, a file, or stdin and prints json or text output. The skill never trims or normalizes input and always reports which profile produced the result. Workflows that submit to NEIS or school records should select neis only when that contract is required.

Deterministic Intl.Segmenter grapheme counting without LLM guesses.
default and neis byte profile contracts documented.
Line counting rules for CRLF and Unicode line separators.
CLI helper for text, file, and stdin with json output.
No arbitrary trim or normalize on submitted text input.

Korean Character Count by the numbers

2,824 all-time installs (skills.sh)
+126 installs in the week ending Jul 28, 2026 (Skillselion tracking)
Ranked #185 of 3,301 Productivity & Planning skills by installs in the Skillselion catalog
Security screen: LOW risk (skills.sh audit)
Data as of Jul 28, 2026 (Skillselion catalog sync)

At a glance

korean-character-count capabilities & compatibility

Capabilities: Intl.Segmenter ko grapheme character counting · UTF-8 and NEIS byte profile calculations · newline-aware line counting · CLI text, file, and stdin input · json and text output formats · profile contract reporting in responses
Use cases: translation · copywriting
Pricing: Free

From the docs

What korean-character-count says it does

When an LLM eyeballs a character count, the result is not reproducible.

SKILL.md

Does not arbitrarily trim or normalize the input

SKILL.md

npx skills add https://github.com/nomadamas/k-skill --skill korean-character-count


Security audit	3 / 3 scanners passed
Repository	nomadamas/k-skill

How do I count Korean essay characters and bytes exactly for form limits without LLM estimation?

Count Korean text deterministically with grapheme, line, and byte contracts for essays and form character limits.

Who is it for?

Users validating Korean self-introduction or application text against strict character or byte limits.

Skip if: you only need to count non-Korean text, or you want semantic editing of essay content.

When should I use this skill?

User asks to count Korean characters, bytes, or lines for essays, NEIS, or form limits.
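For the form-limit case, the check comes down to comparing the grapheme count with the limit. The sketch below uses a hypothetical `checkCharacterLimit` helper that is not part of the skill's script; it assumes Node.js 18+ and reports counts with and without whitespace, because submission forms may differ on whether spaces count.

```javascript
// Hypothetical helper (not part of the skill's script): checks text against a
// character limit using the same grapheme contract as the default profile.
function checkCharacterLimit(text, limit) {
  const segmenter = new Intl.Segmenter("ko", { granularity: "grapheme" });
  const graphemes = Array.from(segmenter.segment(text), ({ segment }) => segment);
  // Forms may or may not count spaces, so report both numbers.
  const withoutWhitespace = graphemes.filter((g) => !/^\s+$/u.test(g)).length;
  return {
    limit,
    characters: graphemes.length,
    charactersWithoutWhitespace: withoutWhitespace,
    overBy: Math.max(0, graphemes.length - limit),
    withinLimit: graphemes.length <= limit,
  };
}

// characters 11, charactersWithoutWhitespace 10, overBy 1, withinLimit false
console.log(checkCharacterLimit("안녕하세요 반갑습니다", 10));
```

The limit is compared with the full grapheme count; which of the two numbers a given form enforces is an assumption to confirm against that form's rules.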

What you get

Grapheme, line, and byte counts with stated profile contract from korean_character_count.js output.

grapheme counts, with and without whitespace
line counts
UTF-8 and NEIS byte totals
code point and UTF-16 code unit counts

By the numbers

Requires Node.js 18 or newer for Intl.Segmenter

Files

SKILL.md (Markdown, GitHub)

Counting Korean Characters

What this skill does

Deterministically counts Korean text where character limits matter, such as self-introduction essays, applications, and free-form fields, without LLM estimation.

Default character count: Unicode extended grapheme clusters via Intl.Segmenter
Line count: CRLF, LF, CR, U+2028, and U+2029 each count as one line break
Default byte count: actual UTF-8 encoded length
Compatibility profile: neis byte rules
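To see why the contract names grapheme clusters, here is a minimal sketch (assuming Node.js 18+, where Intl.Segmenter and Buffer are globals) that measures one mixed string four ways. The sample string and the `counts` object are illustrative; the helper reports the same quantities as characters, code_points, utf16_code_units, and bytes_utf8.

```javascript
// Measure one mixed Hangul/ASCII/emoji string four different ways.
const sample = "한글 ABC 🙂👍🏽";

const segmenter = new Intl.Segmenter("ko", { granularity: "grapheme" });
const graphemes = Array.from(segmenter.segment(sample), ({ segment }) => segment);

const counts = {
  graphemes: graphemes.length,           // user-perceived characters
  codePoints: Array.from(sample).length, // Unicode code points
  utf16Units: sample.length,             // JavaScript string length
  utf8Bytes: Buffer.byteLength(sample, "utf8"),
};

// graphemes 9, codePoints 10, utf16Units 13, utf8Bytes 23
console.log(counts);
```

The skin-tone emoji 👍🏽 is one grapheme but two code points and four UTF-16 units, which is exactly the kind of gap an estimate tends to miss.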

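When to use

"Tell me exactly whether this self-introduction goes over 1,000 characters"
"Count this text in UTF-8 bytes"
"Give me the line count and byte count too"
"Count this sentence mixing Hangul, English, and emoji with code, not by guessing"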

Why this skill exists

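Character limits are sensitive to even a one-character difference.
When an LLM eyeballs a character count, the result is not reproducible.
This skill never arbitrarily trims or normalizes the input and counts only according to documented contracts.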
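A minimal sketch of why silent normalization would be harmful: the same visible syllable 한 can be stored precomposed (NFC) or as three conjoining jamo (NFD). The grapheme count is the same either way, but the code point and UTF-8 byte counts differ, so normalizing before counting would change the byte result a form checks. The `describe` function is illustrative only.

```javascript
// The same visible syllable in two Unicode normalization forms.
const segmenter = new Intl.Segmenter("ko", { granularity: "grapheme" });

function describe(text) {
  return {
    graphemes: Array.from(segmenter.segment(text)).length,
    codePoints: Array.from(text).length,
    utf8Bytes: Buffer.byteLength(text, "utf8"),
  };
}

const composed = "한".normalize("NFC");   // one precomposed syllable, U+D55C
const decomposed = "한".normalize("NFD"); // conjoining jamo U+1112 U+1161 U+11AB

console.log(describe(composed));   // graphemes 1, codePoints 1, utf8Bytes 3
console.log(describe(decomposed)); // graphemes 1, codePoints 3, utf8Bytes 9
```

As far as the helper's HANGUL_OR_MARK_PATTERN check goes, both forms would also count as one 3-byte Hangul grapheme under the neis profile; only the UTF-8 figure changes.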

Contracts

`default` profile

characters: Intl.Segmenter("ko", { granularity: "grapheme" })
bytes: Buffer.byteLength(text, "utf8")
lines:
empty string => 0
non-empty => number of line-break sequences + 1
CRLF counts as one line break, not two.
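The default line contract can be checked against a few edge cases with a short sketch. The regular expression mirrors LINE_BREAK_PATTERN in the helper script; the test cases, including the trailing-newline case, are illustrative choices of inputs worth covering.

```javascript
// Line contract: empty => 0, otherwise (number of breaks) + 1, where
// CRLF, LF, CR, U+2028 and U+2029 each count as one break.
const LINE_BREAK_PATTERN = /\r\n|[\n\r\u2028\u2029]/gu;

function countLines(text) {
  if (text.length === 0) return 0;
  return Array.from(text.matchAll(LINE_BREAK_PATTERN)).length + 1;
}

const cases = [
  ["", 0],
  ["한 줄", 1],
  ["첫 줄\r\n둘째 줄", 2], // CRLF is one break, not two
  ["a\rb\nc", 3],
  ["a\u2028b\u2029c", 3],
  ["끝에 개행\n", 2],      // a trailing newline still adds an (empty) line
];

for (const [text, expected] of cases) {
  console.log(JSON.stringify(text), countLines(text), countLines(text) === expected);
}
```

Listing `\r\n` before the single-character class is what keeps CRLF from being matched as two separate breaks.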

`neis` profile

characters: same as default
lines: same as default
bytes:
Hangul grapheme => 3B
ASCII grapheme => 1B
Enter/line-break sequence => 2B
Any other character falls back to its UTF-8 byte length
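As a worked example, take the CLI sample text '첫 줄\n둘째 줄🙂'. Under the neis rules, '첫 줄' costs 3 + 1 + 3 = 7 bytes, the line break 2 bytes, and '둘째 줄🙂' costs 3 + 3 + 1 + 3 + 4 = 14 bytes (the emoji falls back to its 4-byte UTF-8 length), for 23 bytes in total, while the plain UTF-8 length is 22. The sketch below restates the rules as standalone code; `neisBytes` and `neisGraphemeBytes` are hypothetical names, not exports of the helper.

```javascript
// neis rules: Hangul grapheme = 3B, ASCII grapheme = 1B, each line break = 2B,
// anything else falls back to its UTF-8 length.
const LINE_BREAK = /\r\n|[\n\r\u2028\u2029]/u;
const segmenter = new Intl.Segmenter("ko", { granularity: "grapheme" });

function neisGraphemeBytes(g) {
  if (/^[\x00-\x7F]+$/.test(g)) return 1;
  if (/^[\p{Script=Hangul}\p{Mark}]+$/u.test(g) && /\p{Script=Hangul}/u.test(g)) return 3;
  return Buffer.byteLength(g, "utf8");
}

function neisBytes(text) {
  // Split on line breaks first so each break costs exactly 2 bytes.
  const lines = text.split(LINE_BREAK);
  let total = (lines.length - 1) * 2;
  for (const line of lines) {
    for (const { segment } of segmenter.segment(line)) {
      total += neisGraphemeBytes(segment);
    }
  }
  return total;
}

const text = "첫 줄\n둘째 줄🙂";
console.log(neisBytes(text));                 // 23
console.log(Buffer.byteLength(text, "utf8")); // 22
```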

Prerequisites

Node.js 18+
The installed skill payload includes the scripts/korean_character_count.js helper
No API key required

Workflow

1. Take the text directly, or read it from a file or STDIN.
2. Run node scripts/korean_character_count.js for a deterministic count.
3. Choose the profile (default/neis) and output format (json/text).
4. Return the result as-is, and state which contract it was counted under.

CLI examples

node scripts/korean_character_count.js --text "가나다"
node scripts/korean_character_count.js --text $'첫 줄\r\n둘째 줄🙂'
node scripts/korean_character_count.js --text $'첫 줄\n둘째 줄🙂' --profile neis --format text
node scripts/korean_character_count.js --file ./essay.txt --profile default
cat essay.txt | node scripts/korean_character_count.js --stdin --profile neis

Response policy

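Do not estimate; use the helper's output as-is.
State which profile was used for the count.
When no particular contract is required, use the default profile.
Use neis only when the submission target, such as NEIS or a school life record (학교생활기록부), requires that separate contract.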
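A minimal sketch of the response policy: a hypothetical `formatReply` function that copies the helper's numbers without recomputing them and names the profile and byte contract. The `report` object follows the JSON shape that createReport produces; its values are what the default profile yields for --text "가나다".

```javascript
// Hypothetical reply builder (not part of the skill): quotes the helper's
// numbers verbatim and names the profile and byte contract they came from.
function formatReply(report) {
  const c = report.counts;
  return [
    `Profile: ${report.profile}`,
    `Characters: ${c.characters} (excluding whitespace: ${c.characters_without_whitespace})`,
    `Lines: ${c.lines}`,
    `Bytes: ${c.bytes}`,
    `Byte contract: ${report.contract.bytes}`,
  ].join("\n");
}

// Same shape as the helper's JSON output for --text "가나다" (default profile).
const report = {
  profile: "default",
  contract: {
    characters: "Unicode extended grapheme clusters via Intl.Segmenter",
    bytes: "Actual UTF-8 encoded byte length",
    lines: "Empty string => 0 lines; otherwise count CRLF, LF, CR, U+2028, U+2029 as one line break each and add 1",
  },
  counts: {
    characters: 3,
    characters_without_whitespace: 3,
    code_points: 3,
    utf16_code_units: 3,
    lines: 1,
    bytes: 9,
    bytes_utf8: 9,
    bytes_neis: 9,
  },
};

console.log(formatReply(report));
```

Keeping the reply a pure projection of the helper's JSON means the numbers a user sees are always the ones the script computed.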

Done when

Character, line, and byte counts are returned together.
The differences between the default and neis contracts are documented.
node scripts/korean_character_count.js --help works.
Tests exist for mixed Korean/English/whitespace/newline/emoji input.

Notes

Unicode grapheme clusters: https://www.unicode.org/reports/tr29/
WHATWG Encoding Standard: https://encoding.spec.whatwg.org/
Node Buffer.byteLength: https://nodejs.org/api/buffer.html

#!/usr/bin/env node
"use strict";

const fs = require("node:fs");

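// CRLF is tried first so it counts as a single line break.
const LINE_BREAK_PATTERN = /\r\n|[\n\r\u2028\u2029]/gu;
// A grapheme made only of Hangul and combining marks (paired with HAS_HANGUL_PATTERN).
const HANGUL_OR_MARK_PATTERN = /^[\p{Script=Hangul}\p{Mark}]+$/u;
const HAS_HANGUL_PATTERN = /\p{Script=Hangul}/u;
const WHITESPACE_ONLY_PATTERN = /^\s+$/u;
const ASCII_ONLY_PATTERN = /^[\x00-\x7F]+$/;

let cachedSegmenter = null;

// Intl.Segmenter ships with Node.js 18+; build it once and reuse it.
function ensureSegmenter() {
  if (!globalThis.Intl?.Segmenter) {
    throw new Error("Intl.Segmenter is required. Use Node.js 18 or newer.");
  }

  cachedSegmenter ??= new Intl.Segmenter("ko", { granularity: "grapheme" });
  return cachedSegmenter;
}

// Split text into Unicode extended grapheme clusters (user-perceived characters).
function segmentGraphemes(text) {
  return Array.from(ensureSegmenter().segment(text), ({ segment }) => segment);
}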

function countUtf8Bytes(text) {
  return Buffer.byteLength(text, "utf8");
}

// Empty string => 0 lines; otherwise every line-break sequence adds one line.
function countLines(text) {
  if (text.length === 0) {
    return 0;
  }

  return Array.from(text.matchAll(LINE_BREAK_PATTERN)).length + 1;
}

// neis bytes: each line break costs 2 bytes; the text between breaks is scored per grapheme.
function countNeisBytes(text) {
  let total = 0;
  let lastIndex = 0;

  for (const match of text.matchAll(LINE_BREAK_PATTERN)) {
    const breakIndex = match.index ?? 0;
    total += countNeisChunkBytes(text.slice(lastIndex, breakIndex));
    total += 2;
    lastIndex = breakIndex + match[0].length;
  }

  total += countNeisChunkBytes(text.slice(lastIndex));
  return total;
}

// Sum the neis byte cost of every grapheme in a chunk that contains no line breaks.
function countNeisChunkBytes(chunk) {
  return segmentGraphemes(chunk).reduce((sum, grapheme) => sum + countNeisGraphemeBytes(grapheme), 0);
}

// ASCII grapheme => 1B, Hangul grapheme (optionally with marks) => 3B,
// anything else falls back to its actual UTF-8 length.
function countNeisGraphemeBytes(grapheme) {
  if (!grapheme) {
    return 0;
  }

  if (ASCII_ONLY_PATTERN.test(grapheme)) {
    return 1;
  }

  if (HANGUL_OR_MARK_PATTERN.test(grapheme) && HAS_HANGUL_PATTERN.test(grapheme)) {
    return 3;
  }

  return countUtf8Bytes(grapheme);
}

function createReport(text, profile = "default") {
  const graphemes = segmentGraphemes(text);
  const bytesUtf8 = countUtf8Bytes(text);
  const bytesNeis = countNeisBytes(text);
  const selectedBytes = profile === "neis" ? bytesNeis : bytesUtf8;

  return {
    profile,
    contract: {
      characters: "Unicode extended grapheme clusters via Intl.Segmenter",
      bytes:
        profile === "neis"
          ? "NEIS-compatible bytes: Hangul grapheme=3B, ASCII grapheme=1B, each line break=2B, everything else falls back to UTF-8 bytes"
          : "Actual UTF-8 encoded byte length",
      lines:
        "Empty string => 0 lines; otherwise count CRLF, LF, CR, U+2028, U+2029 as one line break each and add 1",
    },
    counts: {
      characters: graphemes.length,
      characters_without_whitespace: graphemes.filter((grapheme) => !WHITESPACE_ONLY_PATTERN.test(grapheme)).length,
      code_points: Array.from(text).length,
      utf16_code_units: text.length,
      lines: countLines(text),
      bytes: selectedBytes,
      bytes_utf8: bytesUtf8,
      bytes_neis: bytesNeis,
    },
  };
}

function parseArgs(argv, stdinIsTTY = process.stdin.isTTY) {
  const options = {
    format: "json",
    inputMode: null,
    profile: "default",
    text: null,
  };

  for (let index = 0; index < argv.length; index += 1) {
    const arg = argv[index];

    if (arg === "--help" || arg === "-h") {
      options.help = true;
      continue;
    }

    if (arg === "--text") {
      setInputMode(options, "text");
      options.text = readNextValue(argv, ++index, arg);
      continue;
    }

    if (arg === "--file") {
      setInputMode(options, "file");
      options.file = readNextValue(argv, ++index, arg);
      continue;
    }

    if (arg === "--stdin") {
      setInputMode(options, "stdin");
      continue;
    }

    if (arg === "--profile") {
      const value = readNextValue(argv, ++index, arg);

      if (!["default", "neis"].includes(value)) {
        throw new Error(`Unknown profile: ${value}`);
      }

      options.profile = value;
      continue;
    }

    if (arg === "--format") {
      const value = readNextValue(argv, ++index, arg);

      if (!["json", "text"].includes(value)) {
        throw new Error(`Unknown format: ${value}`);
      }

      options.format = value;
      continue;
    }

    throw new Error(`Unknown option: ${arg}`);
  }

  if (options.help) {
    return options;
  }

  if (!options.inputMode) {
    if (stdinIsTTY) {
      throw new Error("Provide exactly one input source with --text, --file, or --stdin.");
    }

    options.inputMode = "stdin";
  }

  return options;
}

function setInputMode(options, nextMode) {
  if (options.inputMode) {
    throw new Error("Provide exactly one input source with --text, --file, or --stdin.");
  }

  options.inputMode = nextMode;
}

function readNextValue(argv, index, flagName) {
  const value = argv[index];

  if (value == null) {
    throw new Error(`Missing value after ${flagName}`);
  }

  return value;
}

function readInput(options) {
  if (options.inputMode === "text") {
    return options.text ?? "";
  }

  if (options.inputMode === "file") {
    return fs.readFileSync(options.file, "utf8");
  }

  return fs.readFileSync(0, "utf8");
}

function formatTextReport(report) {
  return [
    `profile: ${report.profile}`,
    `characters: ${report.counts.characters}`,
    `characters_without_whitespace: ${report.counts.characters_without_whitespace}`,
    `code_points: ${report.counts.code_points}`,
    `utf16_code_units: ${report.counts.utf16_code_units}`,
    `lines: ${report.counts.lines}`,
    `bytes: ${report.counts.bytes}`,
    `bytes_utf8: ${report.counts.bytes_utf8}`,
    `bytes_neis: ${report.counts.bytes_neis}`,
    `character_contract: ${report.contract.characters}`,
    `byte_contract: ${report.contract.bytes}`,
    `line_contract: ${report.contract.lines}`,
  ].join("\n");
}

function printHelp() {
  console.log(`Usage: node scripts/korean_character_count.js [--text <text> | --file <path> | --stdin] [--profile default|neis] [--format json|text]

Deterministically count Korean text characters, lines, and bytes.

Options:
  --text <text>      Count the provided text
  --file <path>      Read UTF-8 text from a file
  --stdin            Read UTF-8 text from stdin
  --profile <name>   default (grapheme + UTF-8) or neis
  --format <name>    json (default) or text
  --help, -h         Show this help text
`);
}

function main(argv = process.argv.slice(2)) {
  const options = parseArgs(argv);

  if (options.help) {
    printHelp();
    return;
  }

  const report = createReport(readInput(options), options.profile);
  const output = options.format === "text" ? formatTextReport(report) : JSON.stringify(report, null, 2);

  console.log(output);
}

if (require.main === module) {
  try {
    main();
  } catch (error) {
    console.error(error.message);
    process.exitCode = 1;
  }
}

module.exports = {
  countLines,
  countNeisBytes,
  countUtf8Bytes,
  createReport,
  formatTextReport,
  main,
  parseArgs,
  segmentGraphemes,
};