I have a project that produces "cite/käyttötapakuvaus.html" as an artifact. With compress-artifacts-plugin 1.10 installed, the resulting archive.zip has the following file header in its central directory:
- central file header signature: 50 4B 01 02
- version made by: 3F 00, i.e. spec v6.3
- version needed to extract: 14 00, i.e. spec v2.0
- general purpose bit flag: 08 00, i.e. bit 11 is clear, so the file name is not claimed to be UTF-8
- compression method: 08 00
- last mod file time: 0D A7
- last mod file date: 88 4B
- crc-32: D2 3A 07 C1
- compressed size: 05 07 00 00
- uncompressed size: DF 17 00 00
- file name length: 1A 00
- extra field length: 00 00, i.e. no alternative file name is stored in the extra field
- file comment length: 00 00
- disk number start: 00 00
- internal file attributes: 00 00
- external file attributes: 00 00 00 00
- relative offset of local header: 2E 90 03 00
- file name: 63 69 74 65 2F 6B E4 79 74 74 F6 74 61 70 61 6B 75 76 61 75 73 2E 68 74 6D 6C, i.e. "ä" was encoded as 0xE4 and "ö" as 0xF6. This matches Latin-1 and Windows-1252, but neither CP437 (where "ä" is 0x84 and "ö" is 0x94) nor UTF-8 (where they are 0xC3 0xA4 and 0xC3 0xB6).
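For anyone who wants to double-check the header reading above, here is a minimal Java sketch. It only uses the bytes quoted in the list; the class and method names (CentralHeaderCheck, littleEndian16, claimsUtf8) are made up for illustration and are not part of the plugin or of TrueZIP.

```java
import java.nio.charset.StandardCharsets;

public class CentralHeaderCheck {
    // Bit 11 of the general purpose bit flag marks the file name as UTF-8.
    static final int UTF8_FLAG = 1 << 11;

    // File name bytes copied from the central directory header above.
    static final byte[] NAME_BYTES = {
        0x63, 0x69, 0x74, 0x65, 0x2F, 0x6B, (byte) 0xE4, 0x79, 0x74, 0x74,
        (byte) 0xF6, 0x74, 0x61, 0x70, 0x61, 0x6B, 0x75, 0x76, 0x61, 0x75,
        0x73, 0x2E, 0x68, 0x74, 0x6D, 0x6C
    };

    // ZIP header fields are little-endian: the low byte is stored first.
    static int littleEndian16(int low, int high) {
        return (low & 0xFF) | ((high & 0xFF) << 8);
    }

    static boolean claimsUtf8(int generalPurposeFlag) {
        return (generalPurposeFlag & UTF8_FLAG) != 0;
    }

    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        int flag = littleEndian16(0x08, 0x00);       // "08 00" from the header
        int nameLength = littleEndian16(0x1A, 0x00); // "1A 00" from the header
        System.out.println("UTF-8 flag set: " + claimsUtf8(flag));
        System.out.println("name length matches: " + (nameLength == NAME_BYTES.length));
        System.out.println("name as Latin-1: " + decodeLatin1(NAME_BYTES));
    }
}
```

Decoding with ISO-8859-1 gives back the original artifact path, which is consistent with the archiver having written the name in a Latin-1-compatible charset.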
However, when I view the artifacts listing in Jenkins, it includes a link <a href="k%EF%BF%BDytt%EF%BF%BDtapakuvaus.html">k�ytt�tapakuvaus.html</a>, i.e. the non-ASCII characters have been replaced with U+FFFD REPLACEMENT CHARACTER. This link actually works, but it looks very ugly. Other HTML artifacts link to the file as <a href="k%C3%A4ytt%C3%B6tapakuvaus.html">käyttötapakuvaus</a>, and those links do not work.
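The two link forms can be reproduced with a small Java sketch, under the assumption (not verified against the plugin code) that the Latin-1 name bytes are decoded as UTF-8 when the archive is read back. ReplacementCharDemo and its methods are hypothetical names; URLEncoder does form encoding rather than path encoding, but for these characters the result is the same. The URLEncoder.encode(String, Charset) overload needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {
    // Encode the name as Latin-1, then (wrongly) decode those bytes as UTF-8.
    // 0xE4 and 0xF6 are not valid UTF-8 here, so each becomes U+FFFD.
    static String misdecode(String name) {
        byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    // Percent-encode the UTF-8 bytes of the given string.
    static String percentEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "k\u00E4ytt\u00F6tapakuvaus.html";
        // Link as it appears in the artifacts listing:
        System.out.println(percentEncode(misdecode(name)));
        // Link as written by the other HTML artifacts:
        System.out.println(percentEncode(name));
    }
}
```

If that assumption holds, it would explain why the listing link works (it matches the mangled name) while the correctly encoded links from other artifacts do not.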
If I understand correctly, the file names in archive.zip should not be Latin-1 at all. APPNOTE.TXT - .ZIP File Format Specification v6.3.4 says they should be CP437 by default, or UTF-8 if bit 11 of the general purpose bit flag is set. However, TrueZipArchiver.java does zip = new ZipOutputStream(out, Charset.defaultCharset()), and I suppose the default charset is Windows-1252 here.
I'm not sure which charset ZipFile expects when ZipStorage.java constructs it as new ZipFile(archive); the javadocs used to be at java.net, which has been shut down. RawZipFile.DEFAULT_CHARSET suggests it may be expecting UTF-8.
Because the archive.zip files are intended to be read back by the compress-artifacts-plugin itself rather than published as is, I think it would be best to hardcode UTF-8 in TrueZipArchiver.java.
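As a rough illustration of the proposed fix, here is a sketch that writes and reads an archive with an explicit UTF-8 charset instead of the platform default. Note that it uses java.util.zip from the JDK as a stand-in, not the TrueZIP classes the plugin actually uses, so it only demonstrates the idea; Utf8ZipRoundTrip and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class Utf8ZipRoundTrip {
    // Write a one-entry archive, encoding the entry name with the given charset.
    static byte[] writeZip(String entryName, Charset charset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer, charset)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write("<html></html>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Read the name of the first entry, decoding it with the given charset.
    static String firstEntryName(byte[] archive, Charset charset) {
        try (ZipInputStream zip =
                 new ZipInputStream(new ByteArrayInputStream(archive), charset)) {
            ZipEntry entry = zip.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "cite/k\u00E4ytt\u00F6tapakuvaus.html";
        // Hardcode UTF-8 on both sides rather than Charset.defaultCharset().
        byte[] archive = writeZip(name, StandardCharsets.UTF_8);
        String readBack = firstEntryName(archive, StandardCharsets.UTF_8);
        System.out.println("round trip ok: " + name.equals(readBack));
    }
}
```

The point of fixing the charset on the writing side is that the result no longer depends on the locale of the machine that created the archive.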