How Fix ZIP Filenames is built

By Geppetto

These are the engineering notes for Fix ZIP Filenames: the technologies it is built on, what each one is, and how it is used in the tool.

Tech used

The bug: ZIP filenames and the UTF-8 flag

On Japanese Windows, many archivers store filenames as Shift_JIS / CP932 bytes and leave the ZIP UTF-8 language-encoding flag (bit 11, from the Create ZIP notes) off. Software that later reads those bytes as UTF-8, or as the legacy CP437 code page, produces mojibake: read as CP437, メモ帳.txt comes out as âüâéÆá.txt. The repair is to read the original bytes, decode them as Shift_JIS, and rewrite the archive with the names marked UTF-8.
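To make "bit 11" concrete, here is a minimal sketch that pulls the flag and the raw name bytes out of a ZIP local file header by hand. The field offsets (flags at byte 6, name length at byte 26, name at byte 30) follow the ZIP local file header layout. The names `readLocalHeaderName` and `makeHeader` are hypothetical and exist only for this illustration; the tool itself leaves this parsing to zip.js, and real readers usually consult the central directory, which carries the same flag.

```typescript
// Bit 11 (0x0800) of the general-purpose bit flag is the
// language-encoding (UTF-8) flag.
const UTF8_FLAG = 0x0800;
const LOCAL_HEADER_SIGNATURE = 0x04034b50;

interface LocalHeaderName {
  utf8Flag: boolean;
  rawName: Uint8Array;
}

// Hypothetical helper: reads the flag and the filename bytes from the
// local file header that starts at `offset`.
function readLocalHeaderName(zip: Uint8Array, offset = 0): LocalHeaderName {
  const view = new DataView(zip.buffer, zip.byteOffset, zip.byteLength);
  if (view.getUint32(offset, true) !== LOCAL_HEADER_SIGNATURE) {
    throw new Error("not a local file header");
  }
  const flags = view.getUint16(offset + 6, true);
  const nameLength = view.getUint16(offset + 26, true);
  const rawName = zip.subarray(offset + 30, offset + 30 + nameLength);
  return { utf8Flag: (flags & UTF8_FLAG) !== 0, rawName };
}

// Hypothetical demo helper: a 30-byte header followed by the name,
// with every other field left at zero.
function makeHeader(flags: number, name: Uint8Array): Uint8Array {
  const out = new Uint8Array(30 + name.length);
  const view = new DataView(out.buffer);
  view.setUint32(0, LOCAL_HEADER_SIGNATURE, true);
  view.setUint16(6, flags, true);
  view.setUint16(26, name.length, true);
  out.set(name, 30);
  return out;
}

// Shift_JIS bytes for "メモ帳.txt", stored with the flag off.
const sjisName = Uint8Array.from([
  0x83, 0x81, 0x83, 0x82, 0x92, 0xa0, 0x2e, 0x74, 0x78, 0x74,
]);
const header = readLocalHeaderName(makeHeader(0x0000, sjisName));
console.log(header.utf8Flag); // false
```

An archive like this gives a reader no hint about the name's encoding, which is why the bytes end up decoded with whatever default the reader uses.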

Reading raw filename bytes with @zip.js/zip.js

@zip.js/zip.js exposes, per entry, both the decoded filename and the rawFilename — the original bytes as a Uint8Array — alongside the filenameUTF8 flag. Working from rawFilename is what makes a clean re-decode possible: you decode the source bytes, not a string that has already been mangled by being read in the wrong encoding.
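The following sketch shows why the raw bytes matter, using only the built-in TextDecoder and TextEncoder. The byte array stands in for what rawFilename would hold for a Shift_JIS name; it is written out by hand here rather than read from a real archive.

```typescript
// Shift_JIS bytes for "メモ帳.txt", as rawFilename would expose them.
const rawFilename = Uint8Array.from([
  0x83, 0x81, 0x83, 0x82, 0x92, 0xa0, // メ モ 帳
  0x2e, 0x74, 0x78, 0x74,             // .txt
]);

// Reading the bytes as UTF-8 is lossy: invalid bytes become U+FFFD.
const wronglyDecoded = new TextDecoder("utf-8").decode(rawFilename);

// Re-encoding the damaged string does not give back the source bytes...
const roundTripped = new TextEncoder().encode(wronglyDecoded);
const sameBytes =
  roundTripped.length === rawFilename.length &&
  roundTripped.every((b, i) => b === rawFilename[i]);

// ...whereas decoding the untouched bytes recovers the name.
const recovered = new TextDecoder("shift-jis").decode(rawFilename);

console.log(sameBytes); // false
console.log(recovered); // メモ帳.txt
```

Once the wrong decoder has replaced bytes with U+FFFD, the information needed for the repair is gone, so the string form of the name is not a safe starting point.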

TextDecoder and legacy code pages

The browser’s built-in TextDecoder decodes more than UTF-8: new TextDecoder('shift-jis') — a WHATWG Encoding Standard label that resolves to the Windows-31J / CP932 index — turns the raw bytes back into correct Japanese, with no library. The tool only re-decodes when the UTF-8 flag is off and the decoded result actually differs from the original name, which avoids re-mangling names that were already correct.
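A minimal sketch of that guard might look like the following. `NamedEntry` and `repairedName` are hypothetical names; the three fields mirror the zip.js entry properties described above, so an entry from zip.js should fit the interface structurally, though that is an assumption rather than something these notes confirm.

```typescript
// The subset of a zip.js entry this sketch relies on.
interface NamedEntry {
  filename: string;        // name as decoded by the reader
  rawFilename: Uint8Array; // original bytes from the archive
  filenameUTF8: boolean;   // true when bit 11 is set
}

const shiftJis = new TextDecoder("shift-jis");

// Returns the repaired name, or null when the entry should be left alone.
function repairedName(entry: NamedEntry): string | null {
  if (entry.filenameUTF8) return null; // already marked UTF-8
  const decoded = shiftJis.decode(entry.rawFilename);
  // An unchanged result means there is nothing to fix.
  return decoded === entry.filename ? null : decoded;
}

const garbled: NamedEntry = {
  filename: "âüâéÆá.txt", // the Shift_JIS bytes below, read as CP437
  rawFilename: Uint8Array.from([
    0x83, 0x81, 0x83, 0x82, 0x92, 0xa0, 0x2e, 0x74, 0x78, 0x74,
  ]),
  filenameUTF8: false,
};
console.log(repairedName(garbled)); // メモ帳.txt
```

Plain ASCII names decode to the same string under Shift_JIS, so they fall through the second check and are reported as unchanged.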

Re-writing with the UTF-8 flag set

A ZipWriter opened with useUnicodeFileNames: true writes the corrected names with bit 11 set, so the repaired archive opens cleanly everywhere afterwards. Each entry’s data is copied through byte-for-byte — read out with getData into a Uint8ArrayWriter, re-added from a Uint8ArrayReader — so only the name changes. Encrypted archives are refused, since their entry data can’t be copied through.
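The copy-through loop can be sketched as below. To keep the block self-contained it does not import zip.js: `SourceEntry` and `TargetWriter` are hypothetical structural stand-ins for a zip.js entry and a ZipWriter, and the two factory parameters stand in for `new Uint8ArrayWriter()` and `new Uint8ArrayReader(data)`. The directory handling is an assumption; these notes do not say how the tool treats directory entries.

```typescript
// Hypothetical stand-in for the parts of a zip.js entry used here.
interface SourceEntry<W> {
  filename: string;
  rawFilename: Uint8Array;
  filenameUTF8: boolean;
  encrypted: boolean;
  directory: boolean;
  getData?: (writer: W) => Promise<Uint8Array>;
}

// Hypothetical stand-in for a ZipWriter opened with useUnicodeFileNames: true.
interface TargetWriter<R> {
  add(name: string, reader?: R, options?: { directory?: boolean }): Promise<unknown>;
}

// Copies every entry to `target`, changing only the names.
// Returns how many names were changed.
async function copyWithFixedNames<W, R>(
  entries: SourceEntry<W>[],
  target: TargetWriter<R>,
  makeWriter: () => W,
  makeReader: (data: Uint8Array) => R,
): Promise<number> {
  // Refuse up front, before anything is written.
  if (entries.some((e) => e.encrypted)) {
    throw new Error("encrypted archives are not supported");
  }
  const shiftJis = new TextDecoder("shift-jis");
  let renamed = 0;
  for (const entry of entries) {
    const name = entry.filenameUTF8
      ? entry.filename
      : shiftJis.decode(entry.rawFilename);
    if (name !== entry.filename) renamed++;
    if (entry.directory || !entry.getData) {
      // Assumption: directories are re-added by name only.
      await target.add(name, undefined, { directory: true });
      continue;
    }
    // Read the entry's data out and hand the same bytes to the writer.
    const data = await entry.getData(makeWriter());
    await target.add(name, makeReader(data));
  }
  return renamed;
}
```

With zip.js, the intended call would pass the result of `getEntries()`, a `ZipWriter`, `() => new Uint8ArrayWriter()`, and `(data) => new Uint8ArrayReader(data)`. Whether the library's own typings line up with these stand-in interfaces without adjustment has not been checked here.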

Shell

Same static Astro + Preact island and Service-Worker PWA shell as the other tools (see the HEIC notes); the work runs on the main thread.

Implementation & operational notes

One encoding, on purpose. Only Shift_JIS / CP932 is tried — the overwhelmingly common source of mojibake’d zips, from Japanese Windows. Names garbled from a different legacy code page (GBK, EUC-KR) aren’t repaired.

No double-decode. The “flag is off and the decode actually changes the name” guard means an already-correct UTF-8 archive comes back untouched (“names already look fine”) instead of being re-mangled into new garbage.
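To see what the guard prevents, this short sketch decodes the UTF-8 bytes of an already-correct name as Shift_JIS, which is what would happen if the flag were ignored. It uses only built-in APIs and a hand-picked example name.

```typescript
// UTF-8 bytes of a name that is already correct.
const original = "メモ帳.txt";
const utf8Bytes = new TextEncoder().encode(original);

// Decoding those bytes as Shift_JIS does not fail; it quietly yields a
// different string.
const remangled = new TextDecoder("shift-jis").decode(utf8Bytes);

console.log(remangled === original); // false
console.log(new TextDecoder("utf-8").decode(utf8Bytes) === original); // true
```

Because the Shift_JIS decode returns a string rather than an error, the decoder alone cannot tell a good name from a bad one, so the flag check has to come first.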

Contents are untouched. Only filenames are rewritten; the file data is copied verbatim, and nothing is uploaded.
