If your file names contain non-ASCII characters, Git will sometimes behave strangely. A few commands are worth knowing about.
Tag Archives: character encoding
Adding Unicode Characters to LaTeX documents
While typesetting a travel diary, I wanted to include place names in their local language, using their local script. The book is written entirely in English, and I have only a standard text editor, without any exotic script support. I can look up the local name of a city online, but cutting and pasting it into the document can be a challenge. Here is how to do it.
Storing Date Values in Files
XML and JSON files store and transport data. What is the best way to store a date-time value in them? Always, always, always use the integer epoch format: the number of milliseconds since midnight UTC on January 1, 1970. This post tells you why.
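As a minimal sketch of the epoch format described above, the following converts a date-time to an integer number of milliseconds and back using `java.time.Instant`. The class name `EpochDates` and its helper methods are hypothetical names chosen for this illustration; they do not come from the post.

```java
import java.time.Instant;

public class EpochDates {
    // Hypothetical helper: the integer value that would be written to the file.
    public static long toEpochMillis(Instant instant) {
        return instant.toEpochMilli();
    }

    // Hypothetical helper: rebuild the instant from the stored integer.
    public static Instant fromEpochMillis(long millis) {
        return Instant.ofEpochMilli(millis);
    }

    public static void main(String[] args) {
        Instant original = Instant.parse("2020-01-01T00:00:00Z");
        long stored = toEpochMillis(original);
        Instant restored = fromEpochMillis(stored);
        System.out.println(stored);                    // prints 1577836800000
        System.out.println(restored.equals(original)); // prints true
    }
}
```

The stored value carries no time zone or locale, so the reader and the writer of the file do not need to agree on a display format.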
Proper Stream Patterns
Java has streams for bytes and streams for chars. Learn to use them correctly. It can be daunting at first, but once you learn a few basic patterns, it all works well.
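One such pattern, sketched here under the assumption that the data is UTF-8 text, is to cross from bytes to chars in exactly one place and to name the character set explicitly when doing so. The class `StreamPatterns` and its methods are hypothetical names for this illustration and may differ from the patterns the full post presents.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class StreamPatterns {
    // Bytes -> chars: wrap the byte stream in a Reader with an explicit charset.
    public static String readUtf8(InputStream in) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }

    // Chars -> bytes: wrap the byte stream in a Writer with the same charset.
    // The caller owns the OutputStream, so this method flushes but does not close it.
    public static void writeUtf8(String text, OutputStream out) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(text);
        writer.flush();
    }
}
```

Naming the charset avoids depending on the platform default, which can differ between machines.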
3 reasons that XML should be Streamed and never “Stringed”
XML is a text format, so it is tempting to handle it with the normal String handling capabilities of Java, but there are several reasons that you must never do this. XML should either be on the disk as a sequence of bytes, or it should be parsed into a DOM tree of decoded string values, but it should never be held in a String in its encoded form.
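The sketch below illustrates the idea with the JDK's built-in DOM parser: the XML is handed to the parser as bytes, and the parser decodes it using the encoding named in the XML declaration. The class name `XmlFromBytes` and the sample document are assumptions made for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlFromBytes {
    // Parse directly from a byte stream; the encoded XML never passes through a String.
    public static Document parse(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder().parse(in);
    }

    public static void main(String[] args) throws Exception {
        // Sample bytes standing in for a file on disk, encoded as ISO-8859-1.
        byte[] onDisk = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><city>Montr\u00e9al</city>"
                .getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(new ByteArrayInputStream(onDisk));
        // The DOM holds decoded text, independent of the encoding used on disk.
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Had the same bytes been read into a String with the wrong character set first, the accented character would have been damaged before the parser ever saw the encoding declaration.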
#23 UTF-8 Encoding
Wondering what encoding to use for your web pages? Wonder no longer. Always use UTF-8 encoding. It is the single best encoding: it covers every Unicode character, and therefore the most languages, and every browser supports it. That is all you need to know: always use UTF-8.