Why does this Python regular expression return the wrong string?

Not your problem, but these 3 lines are amazing:

    final = hex(CalcCRC( str_to_crc ))[:2]
    value = '%08X' % CalcCRC( str_to_crc )
    final = final + value.upper()

Assuming CalcCRC returns a non-negative integer (e.g. 1234567890):

Line 1 sets final to "0x" irrespective of the input!

    >>> hex(1234567890)
    '0x499602d2'
    >>> hex(1234567890)[:2]
    '0x'

Line 2 repeats the call to CalcCRC!

    >>> value = '%08X' % 1234567890
    >>> value
    '499602D2'

Note that value is already uppercase! And after line 3, final becomes '0x499602D2'.

As value is not used again, the whole thing can be replaced by

    final = '0x%08X' % CalcCRC(str_to_crc)

More from Circumlocution City

These lines:

    quote_to_crc = re.search(r'"\w+"', crc)
    str_to_crc = re.search(r'\w+', quote_to_crc.group()).group()

can be replaced by one of:

    str_to_crc = re.search(r'"\w+"', crc).group()[1:-1]
    str_to_crc = re.search(r'"(\w+)"', crc).group(1)
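To make the two simplifications above concrete, here is a minimal sketch in Python 3. The helper names format_crc, format_crc_roundabout and quoted_word are hypothetical and only serve the illustration; the input values are the ones used in this answer.

```python
import re

def format_crc(value):
    # Single step: "0x" prefix plus 8 zero-padded uppercase hex digits.
    return '0x%08X' % value

def format_crc_roundabout(value):
    # The three original lines, kept only for comparison.
    final = hex(value)[:2]          # always '0x' for a non-negative integer
    text = '%08X' % value           # already uppercase
    return final + text.upper()

def quoted_word(text):
    # A capture group returns the word without the surrounding quotes.
    match = re.search(r'"(\w+)"', text)
    return match.group(1) if match else None

print(format_crc(1234567890))                      # 0x499602D2
print(quoted_word('_CalcCRC("THIS_IS_A_CRC")'))    # THIS_IS_A_CRC
```

Both formatting routes give the same string for non-negative input, which is why the shorter one is a safe replacement.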

Acute eyes. I used this simplification. +1 – eyquem Apr 22 at 12:00.

A quick peek at the real answer: you need (inter alia) to use re.escape() ....

    term = re.compile(re.escape(crc_list[i]))

and the indentation on your last if looks stuffed.... more after dinner :-)

Post-prandial update

You make 3 passes over the whole file, when only one will do the trick.
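Why re.escape() matters here can be shown with a small Python 3 sketch. The contents of crc_list are not shown in this thread, so the entry below is an assumed example of the kind of string it might hold; the point is only that parentheses in a search string are regex metacharacters.

```python
import re

# Assumed example entry; the real crc_list contents are not shown in the thread.
entry = '_CalcCRC("THIS_IS_A_CRC")'
line = 'printf( "0x%08X\\n", _CalcCRC("THIS_IS_A_CRC") )'

# Unescaped, the parentheses form a group, so the pattern looks for
# _CalcCRC"THIS_IS_A_CRC" (no literal parentheses) and finds nothing.
raw_match = re.search(entry, line)

# Escaped, every character of the entry is matched literally.
escaped_match = re.search(re.escape(entry), line)

print(raw_match)               # None
print(escaped_match.group())   # _CalcCRC("THIS_IS_A_CRC")
```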

Apart from cutting out an enormous lot of clutter, the main innovation is to use the re.sub functionality that allows the replacement to be a function instead of a string.

    import re
    import zlib

    def CalcCRC(s):
        # This is an example. It doesn't produce the same CRC as your examples do.
        return zlib.crc32(s) & 0xffffffff

    def repl_func(mobj):
        str_to_crc = mobj.group(2)
        print "str_to_crc:", repr(str_to_crc)
        crc = CalcCRC(str_to_crc)
        # If my guess about Insert(s1, s2, n) was wrong,
        # adjust the following statement.
        return '%s"%s", 0x%08X%s' % (mobj.group(1), mobj.group(2), crc, mobj.group(3))

    def ReplaceCRC(file_handle):
        regex = re.compile(r'(_CalcCRC[(]\s*)"(\w+)"(\s*[)])')
        for line in file_handle:
            print "line:", repr(line)
            line2 = regex.sub(repl_func, line)
            print "line2:", repr(line2)
        return

    if __name__ == "__main__":
        import sys, cStringIO
        args = sys.argv[1:]
        if args:
            f = open(args[0], 'r')
        else:
            f = cStringIO.StringIO(r"""
    printf( "0x%08X\n", _CalcCRC("THIS_IS_A_CRC") )
    other_stuff()
    printf( "0x%08X\n", _CalcCRC("PATIENT_ZERO") )
    """)
        ReplaceCRC(f)

Result of running script with no args:

    line: '\n'
    line2: '\n'
    line: 'printf( "0x%08X\\n", _CalcCRC("THIS_IS_A_CRC") )\n'
    str_to_crc: 'THIS_IS_A_CRC'
    line2: 'printf( "0x%08X\\n", _CalcCRC("THIS_IS_A_CRC", 0x98ABAC4B) )\n'
    line: 'other_stuff()\n'
    line2: 'other_stuff()\n'
    line: 'printf( "0x%08X\\n", _CalcCRC("PATIENT_ZERO") )\n'
    str_to_crc: 'PATIENT_ZERO'
    line2: 'printf( "0x%08X\\n", _CalcCRC("PATIENT_ZERO", 0x76BCDA4E) )\n'
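For readers on Python 3, the same single-pass idea can be sketched as follows. This is an adaptation under assumptions, not the code from the answer: the names calc_crc, add_crc and replace_crc are hypothetical, zlib's CRC-32 is again only a stand-in for the questioner's real routine, and the string is encoded because zlib.crc32 takes bytes in Python 3.

```python
import re
import zlib

def calc_crc(text):
    # Stand-in CRC (zlib's CRC-32), masked to an unsigned 32-bit value.
    return zlib.crc32(text.encode('utf-8')) & 0xFFFFFFFF

PATTERN = re.compile(r'(_CalcCRC\(\s*)"(\w+)"(\s*\))')

def add_crc(match):
    # Rebuild the call with the CRC inserted after the quoted string.
    opening, word, closing = match.groups()
    return '%s"%s", 0x%08X%s' % (opening, word, calc_crc(word), closing)

def replace_crc(lines):
    # One pass: every line goes through sub() once; lines without a
    # _CalcCRC("...") call come back unchanged.
    return [PATTERN.sub(add_crc, line) for line in lines]

source = [
    'printf( "0x%08X\\n", _CalcCRC("THIS_IS_A_CRC") )\n',
    'other_stuff()\n',
]
for new_line in replace_crc(source):
    print(new_line, end='')
```

Because the replacement is a function, its return value is inserted literally, so the backslash in the printf format string needs no extra escaping.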

    import re

    def ripl(mat):
        return '%s, 0x%08X' % (mat.group(1), CalcCRC(mat.group(2)))

    regx = re.compile(r'(_CalcCRC[(]\s*"(\w+)"\s*[)])')

    def ReplaceCRC(file_path, regx=regx, ripl=ripl):
        with open(file_path, 'r+') as f:
            file_str = f.read()
            print file_str, '\n'
            if file_str:
                file_str = regx.sub(ripl, file_str)
                print file_str
                f.seek(0, 0)
                f.write(file_str)
                f.truncate()

EDIT

I had forgotten the instruction f.truncate(), which is very important: otherwise a tail remains if the rewritten content is shorter than the initial content.

EDIT 2

John Machin, there is no mistake; my solution above is right. It gives

    printf( "0x%08X\n", _CalcCRC("THIS_IS_A_CRC"), 0x97DFEAC9 );
    printf( "0x%08X\n", _CalcCRC("PATIENT_ZERO"), 0x0D691C21 );

I haven't changed it since your comment. I think that I first posted a solution that was incorrect (because I was running various tests to verify some behaviors and, you know, I sometimes mix up my files and code), then you copied this incorrect code to try it, then I realized that there was a mistake and corrected the code, and then you posted your comment without noticing that I had corrected it.

I can imagine no other cause of such confusion.

By the way, to obtain the same result, there isn't even any need for two groups in the pattern defining regx; one alone is sufficient. The following regx and ripl() work as well:

    regx = re.compile(r'_CalcCRC\(\s*"(\w+)"\s*\)')
    # I prefer '\(' to '[(]', and same for '\)' instead of '[)]'

    def ripl(mat):
        return '%s, 0x%08X' % (mat.group(), CalcCRC(mat.group(1)))

But an uncertainty remains. Each of our results is reasonable, given the imprecise wording of Joe's question. So, what precise result does he want? Must the value 0x97DFEAC9 be inserted inside _CalcCRC("THIS_IS_A_CRC"), as in your result, or after _CalcCRC("THIS_IS_A_CRC"), as in mine?

To be complete: like you, I needed code that could be run, so I defined a CalcCRC() function of my own consisting simply of if x=="THIS_IS_A_CRC": return 0x97DFEAC9 and if x=="PATIENT_ZERO": return 0x0D691C21. I took these associations from the desired results that Joe shows in his question.
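The two candidate placements can be compared side by side with a short Python 3 sketch. CalcCRC here is the stub described above, holding only the two values quoted from the question, so it is not a real CRC routine; the names call, crc_after and crc_inside are hypothetical.

```python
import re

# Stub holding the two values quoted from the question; not a real CRC.
KNOWN = {'THIS_IS_A_CRC': 0x97DFEAC9, 'PATIENT_ZERO': 0x0D691C21}

def CalcCRC(word):
    return KNOWN[word]

# Single-group pattern: group() is the whole call, group(1) the bare word.
call = re.compile(r'_CalcCRC\(\s*"(\w+)"\s*\)')

def crc_after(mat):
    # Value appended after the whole call: _CalcCRC("X"), 0x...
    return '%s, 0x%08X' % (mat.group(), CalcCRC(mat.group(1)))

def crc_inside(mat):
    # Value added as a second argument: _CalcCRC("X", 0x...)
    # (any whitespace inside the original parentheses is dropped).
    return '_CalcCRC("%s", 0x%08X)' % (mat.group(1), CalcCRC(mat.group(1)))

line = 'printf( "0x%08X\\n", _CalcCRC("PATIENT_ZERO") );'
print(call.sub(crc_after, line))
print(call.sub(crc_inside, line))
```

Only the replacement function differs between the two results, so switching to whichever placement the questioner actually wants is a one-line change.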

Now, concerning your nasty assertion that my "point about redefinition of functions is utter nonsense", I think I didn't explain well enough what I meant. Passing the regex regx and the function ripl() as default arguments of ReplaceCRC() has a consequence: the objects regx and ripl are created only once, at the moment the definition of ReplaceCRC() is executed. So, if ReplaceCRC() is called several times during an execution, these objects are not re-created. I don't know whether ReplaceCRC() is really called several times in Joe's program, but I think it's good practice to build this into the code in case it proves useful.
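The behaviour of default arguments described above can be checked with a minimal Python 3 sketch; make_regex, find_words and the created list are hypothetical names used only to count how often the default expression runs.

```python
import re

created = []

def make_regex():
    # Records every time a compiled pattern is actually built.
    created.append('regex')
    return re.compile(r'_CalcCRC\(\s*"(\w+)"\s*\)')

def find_words(text, regx=make_regex()):
    # The default expression ran once, when this def statement executed;
    # later calls reuse the same compiled object through a local name.
    return regx.findall(text)

find_words('_CalcCRC("A")')
find_words('_CalcCRC("B")')
print(len(created))   # 1
```

A pattern compiled once at module level is likewise built only once, so what the default argument changes is how the name is looked up on each call (local instead of global), not how many objects are created.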

Maybe I should have stressed this point in my answer instead of in a comment, to justify my code relative to yours. But I try to limit my tendency to write answers that are too long. Are the points clarified and your annoyance soothed by these explanations?

– John Machin Apr 22 at 12:08

@John Machin Well, after having found my solution, I saw yours and I thought "oh he has already done it". However I posted my solution, because it is better. In your solution the pattern defines 3 groups instead of 2 in mine, and the string returned by your repl_func() function needs a 4-element tuple for formatting instead of 2 elements in my ripl() function. Moreover I pass the replacement function and the regex as default arguments to the ReplaceCRC() function: that avoids redefining them each time the ReplaceCRC() function is called. – eyquem Apr 22 at 13:02

@John Machin Additionally, there is no need for the keyword return at the end of ReplaceCRC(), and you spoil the readability with the StringIO() option. Also, my solution doesn't have several prints; if the reader wants to track the execution, he will add them at his convenience. – eyquem Apr 22 at 13:03

@John Machin Concerning the no-seat-belt replacement, I wrote this in-situ replacement because I noticed that the questioner opens the file in 'r+' mode and I suppose that he is aware of what he does. I have no intention of covering extra problems outside the initial question if that gives too much work, which may also be useless. In the past, I warned questioners about all potential problems; that was too heavy a task. I consider that a coder must experience the issues himself to acquire a real knowledge of Python or any other programming language. – eyquem Apr 22 at 13:11

@eyquem: I define 3 groups to avoid your mistake of omitting some of the OP's desired output. Mine: 'printf( "0x%08X\\n", _CalcCRC("THIS_IS_A_CRC", 0x98ABAC4B) )\n'; yours: 'printf( "0x%08X\\n", _CalcCRC(, 0x98ABAC4B )\n'. Your point about redefinition of functions is utter nonsense. The functions are defined (in global scope) only once. ReplaceCRC is called only once in the script. The difference is a global reference instead of a local reference each time regex.sub is called. The StringIO and the prints are there to demonstrate that my solution actually works; yours? – John Machin Apr 22 at 19:04
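The in-place rewrite discussed in this thread (open in 'r+' mode, seek back to the start, write, then truncate) can be sketched in Python 3 to show what the truncate() call prevents. The helper names rewrite_in_place and demo, and the temporary file, are assumptions made for the illustration.

```python
import os
import tempfile

def rewrite_in_place(path, transform, truncate=True):
    # Read everything, rewind, write the new text, then optionally cut off
    # whatever is left of the old content beyond the write position.
    with open(path, 'r+') as f:
        text = f.read()
        f.seek(0)
        f.write(transform(text))
        if truncate:
            f.truncate()

def demo(truncate):
    # Start from a 10-character file and replace it with 3 characters.
    fd, path = tempfile.mkstemp(text=True)
    try:
        with os.fdopen(fd, 'w') as f:
            f.write('0123456789')
        rewrite_in_place(path, lambda text: 'abc', truncate=truncate)
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)

print(demo(truncate=True))    # abc
print(demo(truncate=False))   # abc3456789
```

Without the truncate step the tail of the old content survives whenever the new text is shorter. The sketch does not address the separate "no-seat-belt" concern raised in the comments: a failure partway through the write still leaves the file damaged.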
