Changing strings in files


10.11.20 22:40, Dennis Lee Bieber wrote:
> 	Testing for extension in a list of exclusions would be much faster than
> scanning the contents of a file, and the few that do get through would have
> to be scanned anyway.

Then the simplest method should work: read the first 512 bytes and check
if they contain b'\0'. The chance that a random sequence of 512 bytes
does not contain NUL is (1-1/256)**512 = 0.13, so this will filter out
87% of binary files. Likely more, because binary files usually have some
structure and reserve a fixed size for integers. Most integers are much
smaller than the maximal value, so their higher bits and bytes are
zeroes. You can also decrease the probability of false results by
increasing the size of the tested data or by testing a few other byte
values (b'\1', b'\2', etc). Anything more sophisticated is just a waste
of your time.
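
A minimal sketch of the check described above, in case it helps. The
function name looks_binary, the sample_size parameter and the two
temporary demo files are my own illustrative choices, not anything from
the thread. It also recomputes the 0.13 estimate, which assumes
uniformly random bytes.

```python
import os
import tempfile


def looks_binary(path, sample_size=512):
    """Guess whether a file is binary: True if a NUL byte occurs
    within the first sample_size bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        return b"\0" in f.read(sample_size)


# Chance that 512 uniformly random bytes contain no NUL byte at all.
p_miss = (1 - 1 / 256) ** 512

# Small demonstration on two temporary files (made-up contents).
with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "sample.txt")
    bin_path = os.path.join(tmp, "sample.bin")
    with open(text_path, "wb") as f:
        f.write(b"plain text, no NUL bytes here\n")
    with open(bin_path, "wb") as f:
        # A 32-bit little-endian integer 1: the high bytes are zero.
        f.write((1).to_bytes(4, "little"))
    text_result = looks_binary(text_path)
    bin_result = looks_binary(bin_path)

print(text_result, bin_result, round(p_miss, 2))
```

Raising sample_size lowers the miss rate under the same random-bytes
assumption; for example, 1024 bytes gives (1-1/256)**1024, which is
about 0.02.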