Bytes are bytes. There is no way to declare that something isn't file data. It'd be fairly easy to construct a valid file in many formats consisting only of printable ASCII.
Especially when dealing with Unicode, you're in very murky territory. If possible, I'd suggest modifying the method so that it takes two parameters: use one for passing text and the other for binary data. One thing you might do is look at the length of the string. Most image formats are at least 500-600 bytes even for a tiny image, and while this is by no means an accurate test, if you get passed, say, a 20k string, it's probably an image.
If it were text, it would be quite a lot of it (20,000 characters is several thousand words of English, roughly a chapter of a typical novel).
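To illustrate the two-parameter suggestion, here is a minimal sketch. The method name `store` and the keyword names `text:` and `blob:` are made up for the example; the point is only that the caller states which kind of data it is passing, so nothing has to be guessed from the bytes.

```ruby
# Hypothetical method: the caller says which kind of data it is passing,
# so the method never has to guess from the bytes themselves.
def store(text: nil, blob: nil)
  # Exactly one of the two keyword arguments must be given.
  raise ArgumentError, "pass either text: or blob:, not both" if text && blob
  raise ArgumentError, "pass text: or blob:" if text.nil? && blob.nil?

  if blob
    [:binary, blob.bytesize]  # binary handling would go here
  else
    [:text, text.length]      # text handling would go here
  end
end
```

A caller would then write `store(text: "hello")` or `store(blob: image_data)`, and the ambiguity never reaches the method.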
I agree with you, so I went with the first option; I think it's the simpler one. Thank you very much. – ywenbo Dec 11 '10 at 7:24
Files like images or sound files have defined blocks that can be "sniffed". Wotsit.org has a lot of info about the key bytes and ways to determine what the files are. By looking at those byte offsets in your data you could figure it out.
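As a rough sketch of that kind of sniffing, the snippet below compares the start of the string against the well-known leading signatures of PNG, JPEG, and GIF. The names `SIGNATURES` and `sniff_image_type` are invented for the example, and only these three formats are covered; a real check would need more formats.

```ruby
# Leading byte signatures for a few common image formats.
SIGNATURES = {
  png:  "\x89PNG\r\n\x1A\n".b,
  jpeg: "\xFF\xD8\xFF".b,
  gif:  "GIF8".b
}.freeze

# Returns :png, :jpeg, :gif, or nil when no known signature matches.
def sniff_image_type(data)
  bytes = data.b  # binary copy, so the comparison ignores the string's encoding
  SIGNATURES.each do |type, magic|
    return type if bytes.start_with?(magic)
  end
  nil
end
```

Note that a GIF header is printable ASCII, so a plain text string that happens to begin with "GIF8" would also match. A matching signature is a strong hint, not proof.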
Another way is to use some "magic", which is code that sniffs key bytes or byte types in a file to try to figure out what its type is. *nix systems have it built in via the file command. Do a man file or man magic for more info, or check Wikipedia's article on magic numbers in files. Ruby Filemagic uses the same technique and is based on libmagic, the library behind the file command.
It seems the links are not suitable for what I need. My context is a Ruby method that accepts one string parameter; the string may be a literal string or an image blob, so I need to do different operations depending on its content. As a matter of fact, if Ruby had a Blob type I think that would resolve my problem, but I cannot find one.
– ywenbo Dec 11 '10 at 4:27
If you know you're going to get ASCII text or a blob then you can just spin through the first n bytes and see if anything has the eighth bit set; that would tell you that you have binary. OTOH, not finding anything wouldn't guarantee that you had text. If you're going to get UTF-8 Unicode then you'd do the same thing but look for invalid UTF-8 sequences.
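Both checks might look something like this in Ruby. The helper names and the default of 512 bytes are assumptions made for the example. The UTF-8 check looks at the whole string instead of the first n bytes, because cutting at an arbitrary byte could split a multibyte character and produce a false "invalid" result.

```ruby
# True if any of the first n bytes has its high (eighth) bit set,
# which plain 7-bit ASCII text never does.
def high_bit_set?(data, n = 512)
  data.byteslice(0, n).each_byte.any? { |b| b >= 0x80 }
end

# True if the whole string is well-formed UTF-8.
# Works on a copy, so the caller's string keeps its encoding.
def valid_utf8?(data)
  data.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end
```

As the answer says, these only work in one direction: a high bit or an invalid sequence points to binary, but passing both checks does not prove the string is text.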
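Of course, the same caveats apply. You could also scan the first n bytes for control characters, meaning anything below 0x20 other than tab (0x09), line feed (0x0A), and carriage return (0x0D), which are common in text. If you find any such bytes then you probably have a binary blob of some sort.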
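A sketch of the low-byte scan follows. One assumption is added here: tab, line feed, and carriage return are exempted, since they sit below 0x20 but turn up in ordinary text all the time. The names `TEXT_CONTROL_BYTES` and `control_bytes?` and the 512-byte default are made up for the example.

```ruby
# Control bytes that routinely occur in text: tab, line feed, carriage return.
TEXT_CONTROL_BYTES = [0x09, 0x0A, 0x0D].freeze

# True if any of the first n bytes is a control character below 0x20
# other than tab, LF, or CR (assumption: those three are treated as text).
def control_bytes?(data, n = 512)
  data.byteslice(0, n).each_byte.any? do |b|
    b < 0x20 && !TEXT_CONTROL_BYTES.include?(b)
  end
end
```

This is still only a heuristic: a file made entirely of printable bytes passes the scan even though it is not text in any useful sense.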
But in the end, bytes are bytes. You're starting with a bunch of bytes and trying to find an interpretation of them that makes sense. Your best bet is to make the caller supply the expected interpretation or take Greg's advice and use a magic number library.
Thank you very much. In the end I added one more parameter to distinguish the two cases. – ywenbo Dec 11 '10 at 7:25