I wrote an essay comparing read performance in Python 3.5 between bytes and Unicode text, including the effect of the text-mode options 'newline' and 'encoding'. I couldn't get the Unicode string performance to within a factor of 2 of the binary byte performance, so chemfp will work with bytes, not strings.
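For readers who want to try this kind of comparison themselves, here is a minimal sketch. It is not the benchmark from the essay: the function names, the test data, and the set of option combinations are all assumptions made for illustration. It only uses the standard `open()` parameters `encoding` and `newline`, and I deliberately show no timing numbers because they depend on the machine and the Python version.

```python
import os
import tempfile
import time


def count_lines(path, mode, **open_kwargs):
    """Iterate over the file once; return (line_count, elapsed_seconds)."""
    start = time.perf_counter()
    with open(path, mode, **open_kwargs) as f:
        n = sum(1 for _ in f)
    return n, time.perf_counter() - start


def compare_read_modes(num_lines=200000):
    """Time line iteration in binary mode and in a few text-mode variants.

    The data is a hypothetical ASCII-only file, not the essay's data set.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"CHEMBL1234\tc1ccccc1O\n" * num_lines)
        return {
            "bytes": count_lines(path, "rb"),
            "text utf-8": count_lines(path, "r", encoding="utf-8"),
            "text utf-8 newline=LF": count_lines(
                path, "r", encoding="utf-8", newline="\n"),
            "text latin-1 newline=LF": count_lines(
                path, "r", encoding="latin-1", newline="\n"),
        }
    finally:
        os.remove(path)


if __name__ == "__main__":
    for label, (n, seconds) in compare_read_modes().items():
        print("%-26s %8d lines %.3f s" % (label, n, seconds))
```

Every variant should report the same line count; only the elapsed time differs. A single pass like this is noisy, so repeat the runs before drawing conclusions.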
I also checked how the RDKit handles invalid Unicode, to see what another toolkit did for the same problem. I concluded that it uses bytes internally and exposes strings, which causes problems if those bytes cannot be converted to strings.
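The underlying problem can be shown in plain Python, without the RDKit. This is only a sketch of the general bytes-to-string issue, with a made-up byte string and a hypothetical helper name; it says nothing about how the RDKit itself is implemented.

```python
# Hypothetical record title stored as Latin-1; these bytes are not valid UTF-8.
raw = b"caf\xe9 name"


def decode_or_none(data, encoding="utf-8"):
    """Return the decoded string, or None if the bytes are not valid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


strict = decode_or_none(raw)                    # strict UTF-8 decoding fails
replaced = raw.decode("utf-8", errors="replace")  # lossy: bad byte becomes U+FFFD
latin1 = raw.decode("latin-1")                  # every byte maps to a code point
```

Keeping the data as bytes avoids having to pick one of these policies at read time.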
This is the place to leave comments about that post.