Changing strings in files


On Wed, Nov 11, 2020 at 5:36 AM Eli the Bearded <*@eli.users.panix.com> wrote:
> Read first N lines of a file. If all parse as valid UTF-8, consider it text.
> That's probably the rough method file(1) and Perl's -T use. (In
> particular allow no nulls. Maybe allow ISO-8859-1.)
>

ISO-8859-1 is basically "allow any byte values", so all you'd be doing
is checking for a lack of NUL bytes. I'd definitely recommend
mandating UTF-8, as that's a very good way of recognizing valid text,
but if you can't do that then the simple NUL check is all you really
need.
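
For what it's worth, here's a rough sketch of that check in Python. The
names (looks_like_text, file_looks_like_text, max_lines, require_utf8)
are made up for illustration, and this is not claimed to be what file(1)
or Perl's -T actually do. It reads whole lines in binary mode so that a
multi-byte UTF-8 sequence never gets cut in half at the sample boundary.

```python
from itertools import islice


def looks_like_text(chunk: bytes, require_utf8: bool = True) -> bool:
    """Guess whether a chunk of bytes is text (heuristic, not a guarantee)."""
    # NUL decodes fine as UTF-8, so it has to be rejected explicitly.
    if b"\x00" in chunk:
        return False
    if require_utf8:
        try:
            chunk.decode("utf-8")
        except UnicodeDecodeError:
            return False
    # Without require_utf8, any NUL-free byte string is accepted, which is
    # effectively the "maybe allow ISO-8859-1" variant.
    return True


def file_looks_like_text(path, max_lines: int = 100, require_utf8: bool = True) -> bool:
    """Apply looks_like_text() to the first max_lines lines of a file."""
    with open(path, "rb") as f:
        # Iterating a binary file yields lines split on b"\n".
        chunk = b"".join(islice(f, max_lines))
    return looks_like_text(chunk, require_utf8)
```

One caveat with this sketch: a file with no newlines at all gets read in
full as a single "line", so you may want a byte cap on top of the line
count.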

And let's be honest here, there aren't THAT many binary files that
manage to contain a total of zero NULs, so you won't get many false
hits :)

ChrisA