Yes, it is possible to encode an extra bit of information while maintaining the previous encoding for 3 character values. But since your original encoding doesn't leave nice clean swaths of free numbers in the output set, mapping of the additional set of Strings introduced by adding that extra character cannot help but be a little discontinuous Accordingly, I think it would be hard to come up with mapping functions that handle these discontinuities without being both awkward and slow. I conclude that a table-based mapping is the only sane solution I was too lazy to re-engineer your mapping code, so I incorporated it into the table initialization code of mine; this also eliminates many opportunities for translation errors :) Your encode() method is what I call OldEncoder.encode() I've run a small test program to verify that NewEncoder.encode() comes up with the same values as OldEncoder.encode() and is in addition able to encode Strings with a leading 4th character NewEncoder.encode() doesn't care what the character is, it goes by String length; for decode() the character used can be defined using PREFIX_CHAR I've also eyeball checked that the byte array values for prefixed Strings don't duplicate any of those for non-prefixed Strings; and finally, that encoded prefixed Strings can indeed be converted back to the same prefixed Strings package tequilaguy; public class NewConverter { private static final String b2s = new String0x10000; private static final int s2b = new int0x10000; static { createb2s(); creates2b(); } /** * Create the "byte to string" conversion table.
*/ private static void createb2s() { // Fill 17576 elements of the array with be -> s equivalents. // index is the combined byte value of the old encode fn; // value is the String (3 chars). For (char a='A'; a @s equivalents.
// index is the next free (= not null) array index; // value = the String (@ + 3 chars) int freep = 0; for (char a='A'; a> 8), (byte) (bval & 0xFF) }; } public static String decode(byte b) { int bval = ((b0 & 0xFF) I hope you'll find this answer and code in time to make good use of it. In support of which effort I'll throw in my little test program. It doesn't directly check stuff but prints out the results of conversions in all significant ways and allows you to check them by eye and hand.
I fiddled with my code (small tweaks once I got the basic idea down) until everything looked OK there. You may want to test more mechanically and exhaustively package tequilaguy; public class ConverterHarness { // private static void runOldEncoder() { // for (char a='A'; aFormat("%s : %02X%02X%n", str, enc0, enc1); // } // } // } // } private static void testNewConverter() { for (char a='A'; aEncode("@" + str); System.out. Format("%s : %02X%02X %02X%02X %02X%02X %s %s %n", str, oldEnc0, oldEnc1, newEnc0, newEnc1, newEnc20, newEnc21, NewConverter.
Decode(newEnc), NewConverter. Decode(newEnc2)); } } } } public static void main(String args) { testNewConverter(); } }.
Yes, it is possible to encode an extra bit of information while maintaining the previous encoding for 3 character values. But since your original encoding doesn't leave nice clean swaths of free numbers in the output set, mapping of the additional set of Strings introduced by adding that extra character cannot help but be a little discontinuous. Accordingly, I think it would be hard to come up with mapping functions that handle these discontinuities without being both awkward and slow.
I conclude that a table-based mapping is the only sane solution. I was too lazy to re-engineer your mapping code, so I incorporated it into the table initialization code of mine; this also eliminates many opportunities for translation errors :) Your encode() method is what I call OldEncoder.encode(). I've run a small test program to verify that NewEncoder.encode() comes up with the same values as OldEncoder.encode(), and is in addition able to encode Strings with a leading 4th character.NewEncoder.encode() doesn't care what the character is, it goes by String length; for decode(), the character used can be defined using PREFIX_CHAR .
I've also eyeball checked that the byte array values for prefixed Strings don't duplicate any of those for non-prefixed Strings; and finally, that encoded prefixed Strings can indeed be converted back to the same prefixed Strings. Package tequilaguy; public class NewConverter { private static final String b2s = new String0x10000; private static final int s2b = new int0x10000; static { createb2s(); creates2b(); } /** * Create the "byte to string" conversion table. */ private static void createb2s() { // Fill 17576 elements of the array with be -> s equivalents.
// index is the combined byte value of the old encode fn; // value is the String (3 chars). For (char a='A'; a @s equivalents. // index is the next free (= not null) array index; // value = the String (@ + 3 chars) int freep = 0; for (char a='A'; a> 8), (byte) (bval & 0xFF) }; } public static String decode(byte b) { int bval = ((b0 & 0xFF) The code looks horribly mysterious otherwise.
You can leave those as they are without losing performance, as the compiler folds them up like Kleenexes. Update: As the horror of X-mas approaches, I'll be on the road for a while. I hope you'll find this answer and code in time to make good use of it.
In support of which effort I'll throw in my little test program. It doesn't directly check stuff but prints out the results of conversions in all significant ways and allows you to check them by eye and hand. I fiddled with my code (small tweaks once I got the basic idea down) until everything looked OK there.
You may want to test more mechanically and exhaustively. Package tequilaguy; public class ConverterHarness { // private static void runOldEncoder() { // for (char a='A'; aEncode(str); // System.out. Format("%s : %02X%02X%n", str, enc0, enc1); // } // } // } // } private static void testNewConverter() { for (char a='A'; aEncode(str); byte newEnc2 = NewConverter.
Encode("@" + str); System.out. Format("%s : %02X%02X %02X%02X %02X%02X %s %s %n", str, oldEnc0, oldEnc1, newEnc0, newEnc1, newEnc20, newEnc21, NewConverter. Decode(newEnc), NewConverter.
Decode(newEnc2)); } } } } public static void main(String args) { testNewConverter(); } }.
Thanks :) I created a similar method in the morning and was printing results when Arrays. Equals(oldEnc, newEnc) returned false. The new encoding works perfectly :) ....Actions Items for me: 1) Learn bit manipulations and encoding algorithms (did a bit in college but not after that) 2) Spend more time on Stackoverflow!
– Tequila Guy Dec 23 '09 at 14:35 Yer welcome, bro. Glad I was able to help after all. If you have more interesting puzzles like this, figure out my email address from my profile and give me a shout.
– Carl Smotricz Dec 23 '09 at 16:42.
Not directly an answer, but here's how I would do the encoding: public static byte encode(String s) { int code = s. CharAt(0) - 'A' + (32 * (s. CharAt(1) - 'A' + 32 * (s.
CharAt(2) - 'A'))); byte encoded = { (byte) ((code >>> 8) & 255), (byte) (code & 255) }; return encoded; } The first line uses Horner's Schema to arithmetically assemble 5 bits of each character into an integer. It will fail horribly if any of your input chars fall outside the range A-`. The second line assembles a 2 byte array from the leading and trailing byte of the integer.
Decoding could be done in a similar manner, with the steps reversed. UPDATE with the code (putting my foot where my mouth is, or something like that): public class TequilaGuy { public static final char SPECIAL_CHAR = '@'; public static byte encode(String s) { int special = (s.length() == 4)? 1 : 0; int code = s.
CharAt(2 + special) - 'A' + (32 * (s. CharAt(1 + special) - 'A' + 32 * (s. CharAt(0 + special) - 'A' + 32 * special))); byte encoded = { (byte) ((code >>> 8) & 255), (byte) (code & 255) }; return encoded; } public static String decode(byte b) { int code = 256 * ((b0 = 0x8000)?1 : 0; char chrs = { SPECIAL_CHAR, '\0', '\0', '\0' }; for (int ptr=3; ptr>0; ptr--) { chrsptr = (char) ('A' + (code & 31)); code >>>= 5; } return (special == 1)?String.
ValueOf(chrs) : String. ValueOf(chrs, 1, 3); } public static void testEncode() { for (int spcl=0; spclValueOf(SPECIAL_CHAR)) + c1 + c2 + c3; byte cod = encode(s); String dec = decode(cod); System.out. Format("%4s : %02X%02X : %s\n", s, cod0, cod1, dec); } } } } } public static void main(String args) { testEncode(); } }.
Thanks. Can this be extended to 4 characters too? Maybe I am not understanding your point about not requiring the special character to be encoded.
I need it because when I receive an encoded 2 byte array, and decode it, my processing logic will change if the decoded string has the special character. – Tequila Guy Dec 9 '09 at 16:04 Sure, can do. I didn't realize that the information needed to be transported around, but it's easy as that whole 4th character can be represented by a single bit and you have a 1.9 bits to spare :) Answer coming up... – Carl Smotricz Dec 9 '09 at 16:10 Thanks for your help Carl :) – Tequila Guy Dec 10 '09 at 9:14 1 Thing is, the compiler is able to recognize the equivalency of * 32 and This may even vary from computer to computer.
So it's highly likely to be changing one into the other behind your back. However, there IS still some potential for optimization. I can look again and give you some hints.
HOWEVER, if you're hard up for speed your best move might be to create a pair of hashmaps for all values, which is less than 60000. Then you can convert simply by looking up. – Carl Smotricz Dec 11 '09 at 10:56 1 This is more than just a hunch: Both functions create a new object (String/byte) every time, and object creation is time-expensive.
Hash lookup, on the other hand, is well optimized. Try that! – Carl Smotricz Dec 11 '09 at 10:59.
In your alphabet, you use only 15 of the 16 available bits of the output. So you could just set the MSB (most significant bit) if the string is of length 4 since the special char is fixed. The other option is to use a translation table.
Just create a String with all valid characters: String valid = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ"; The index of a character in this string is the encoding in the output. Now create two arrays: byte encode = new byte256; char decode = new charvalid. Length (); for (int i=0; iCharAt(i); encodec = i; decodei = c; } Now you can lookup the values for each direction in the arrays and add any character you like in any order.
You didn't understand my approach: After finding the encoded value, you must compress the bits as before. My approach just allows you to squeeze more different characters into the output. – Aaron Digulla Dec 9 '09 at 15:54 oh ok :) Thanks.
Actually that compression of bits is the part I am struggling with. Am I doing it correctly? Are there better ways of doing it.
Thanks for the simpler approach. – Tequila Guy Dec 9 '09 at 16:08 You could use a loop: output = output – Aaron Digulla Dec 10 '09 at 8:13 How do I set the free bit to signify presence/absence of special character in the code I put in the question? I am not able to figure which bit is free :( – Tequila Guy Dec 22 '09 at 13:27 @Tequila Guy: Since you fill the value from the low bits, the highest bit is free.
If you set it and print the value, it will suddenly be negative but that's OK.As long as you properly mask the values with &0x1f when reading, it will work. – Aaron Digulla Dec 22 '09 at 16:22.
You would find this a lot easier if you just used the java.nio.charset. CharsetEncoder class to convert your characters to bytes. It would even work for characters other than ASCII.
Even String. GetBytes would be a lot less code to the same basic effect.
I'm doubtful about this. CharsetEncoder is generally used to pack a single character into one or more bytes, hence the "averageBytesPerChar" field. Can it really be subverted to pack one-and-a-half characters into a single byte?
– Carl Smotricz Dec 9 '09 at 13:51 You have to implement your own encoding class, but, yes, you can. – bmargulies Dec 9 '09 at 15:21.
If the "special char" is fixed and you're always aware that a 4 character String begins with this special char, then the char itself provides no useful information. If the String is 3 characters in length, then do what you did before; if it's 4 characters, run the old algorithm on the String's substring starting with the 2nd character. Am I thinking too simply or are you thinking too hard?
Thanks Carl. The special character is used to specify different processing path for the decoded string. If(decodedString.
StartsWith('@')) { // do this } else { // do that } Hence the additional character. – Tequila Guy Dec 9 '09 at 14:16 I think I've given you the solution. There's no need to decode your special character.
– Carl Smotricz Dec 9 '09 at 14:22 but, he may need to keep the second character, in which case, use one of your leftover bits in your encoded bytes to represent the existence of the extra char. – james Dec 9 '09 at 14:46 Ah, I see Aaron Digulla already mentioned this solution below... – james Dec 9 '09 at 14:47 OK, I finally got the point and provided a coded solution (in my other answer). Consider this answer "for documentary purposes only" :) – Carl Smotricz Dec 9 '09 at 16:50.
I cant really gove you an answer,but what I can give you is a way to a solution, that is you have to find the anglde that you relate to or peaks your interest. A good paper is one that people get drawn into because it reaches them ln some way.As for me WW11 to me, I think of the holocaust and the effect it had on the survivors, their families and those who stood by and did nothing until it was too late.