<p><font face="Tahoma" size="-1">Pass the words, the memory block handle, the
size of the data read from wordfile.txt, the address of the color dword that
will be used to hilight the words and the address of <font color="#0000CC"><b>ASMSyntaxArray</b></font>.</font></p>
<p><font face="Tahoma" size="-1">Now, let's examine what <font color="#0000CC"><b>ParseBuffer</b></font>
does. In essence, this function accepts the buffer containing the words to be
hilighted, parses it into individual words and stores each of them in a <font color="#006666"><b>WORDINFO</b></font>
structure that can be reached quickly through <font color="#006666"><b>ASMSyntaxArray</b></font>.</font><font face="Tahoma"><b>
<br>
</b></font></p>
<pre><font face="Tahoma"><b>ParseBuffer proc uses edi esi hHeap:DWORD,pBuffer:DWORD, nSize:DWORD, ArrayOffset:DWORD,pArray:DWORD
LOCAL buffer[128]:BYTE
LOCAL InProgress:DWORD
mov InProgress,FALSE</b></font></pre>
<p><font face="Tahoma"><font size="-1">InProgress is the flag I use to indicate
whether the scanning process has begun. If the value is FALSE, we haven't encountered
a non-white space character yet.</font><b><br>
</b></font></p>
<pre><font face="Tahoma"><b> lea esi,buffer
mov edi,pBuffer
invoke CharLower,edi</b></font></pre>
<p><font face="Tahoma"><font size="-1">esi points to our local buffer that will
contain the word we have parsed from the word list. edi points to the word list
string. To simplify the search later, we convert all characters to lowercase.</font><b><br>
</b></font></p>
<pre><font face="Tahoma"><b> mov ecx,nSize
SearchLoop:
or ecx,ecx
jz Finished
cmp byte ptr [edi]," "
je EndOfWord
cmp byte ptr [edi],9 ; tab
je EndOfWord</b></font></pre>
<p><font face="Tahoma"><font size="-1">Scan the whole word list in the buffer,
looking for the white spaces. If a white space is found, we have to determine
whether it marks the end or the beginning of a word.</font><b> </b></font></p>
<pre><font face="Tahoma"><b> mov InProgress,TRUE
mov al,byte ptr [edi]
mov byte ptr [esi],al
inc esi</b></font>
<font face="Tahoma"><b>SkipIt:
inc edi
dec ecx
jmp SearchLoop</b></font></pre>
<p><font face="Tahoma"><font face="Tahoma"><font size="-1">If the byte under scrutiny
is not a white space, we copy it to the buffer to construct a word and then
continue the scan.</font></font><b> </b></font></p>
<pre><font face="Tahoma"><b>EndOfWord:
cmp InProgress,TRUE
je WordFound
jmp SkipIt</b></font></pre>
<p><font face="Tahoma"><font size="-1">If a white space is found, we check the
value in <font color="#006666"><b>InProgress</b></font>. If the value is TRUE,
we can assume that the white space marks the end of a word and we may proceed
to put the word currently in the local buffer (pointed to by esi) into a <font color="#006666"><b>WORDINFO</b></font>
structure. If the value is FALSE, we continue the scan until a non-white space
character is found. </font><b> </b></font></p>
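<p><font face="Tahoma" size="-1">The scan loop and the <font color="#006666"><b>InProgress</b></font>
test above form a small state machine. As a rough illustration only, here is the same idea
sketched in portable C rather than MASM. The names <b>split_words</b>, <b>word_fn</b> and <b>collect</b>
are hypothetical and do not appear in the tutorial's source. The sketch also makes two assumptions
that the assembly loop does not: it emits a word that is still in progress when the input ends,
and it guards the 128-byte word buffer against overflow.</font></p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORD 128  /* same size as the local buffer in ParseBuffer */

/* Hypothetical callback type: receives each NUL-terminated word. */
typedef void (*word_fn)(const char *word, void *ctx);

/* Split text[0..size) on spaces and tabs, calling on_word for every word.
   Returns the number of words found. */
static size_t split_words(const char *text, size_t size,
                          word_fn on_word, void *ctx)
{
    char word[MAX_WORD];
    size_t len = 0;        /* plays the role of esi */
    int in_progress = 0;   /* plays the role of InProgress */
    size_t count = 0;

    assert(text != NULL && on_word != NULL);

    for (size_t i = 0; i < size; i++) {
        char c = text[i];
        if (c == ' ' || c == '\t') {
            if (in_progress) {            /* whitespace ends a word */
                word[len] = '\0';
                on_word(word, ctx);
                count++;
                len = 0;
                in_progress = 0;
            }
            /* otherwise: leading or repeated whitespace, skip it */
        } else if (len < MAX_WORD - 1) {  /* assumption: drop overflow bytes */
            word[len++] = c;
            in_progress = 1;
        }
    }
    if (in_progress) {                    /* assumption: flush a trailing word */
        word[len] = '\0';
        on_word(word, ctx);
        count++;
    }
    return count;
}

/* Example callback: appends each word plus '|' to the string in ctx. */
static void collect(const char *word, void *ctx)
{
    char *out = ctx;
    strcat(out, word);
    strcat(out, "|");
}
```

<p><font face="Tahoma" size="-1">Calling <b>split_words</b> with <b>collect</b> on a line such as
two spaces, "mov", a tab, "eax", two spaces and "ret" reports three words. In the real routine,
the callback's job is done by the code at <b>WordFound</b>.</font></p>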
<pre><font face="Tahoma"><b>WordFound:
mov byte ptr [esi],0
push ecx
invoke HeapAlloc,hHeap,HEAP_ZERO_MEMORY,sizeof WORDINFO</b></font></pre>
<p><font face="Tahoma"><font size="-1">When the end of a word is found, we append
0 to the buffer to make the word an ASCIIZ string. We then allocate a block
of memory from the heap the size of <font color="#006666"><b>WORDINFO</b></font>
for this word.</font><b> </b></font></p>
<pre><font face="Tahoma"><b> push esi
mov esi,eax
assume esi:ptr WORDINFO
invoke lstrlen,addr buffer
mov [esi].WordLen,eax</b></font></pre>
<p><font face="Tahoma"><font size="-1">We obtain the length of the word in the
local buffer and store it in the <font color="#006666"><b>WordLen</b></font>
member of the <font color="#006666"><b>WORDINFO</b></font> structure, to be
used later for a quick comparison.</font><b> </b></font></p>
<pre><font face="Tahoma"><b> push ArrayOffset
pop [esi].pColor</b></font></pre>
<p><font face="Tahoma"><font size="-1">Store the address of the dword that contains
the color to be used to hilight the word in the <font color="#006666"><b>pColor</b></font>
member.</font><b> </b></font></p>
<pre><font face="Tahoma"><b> inc eax
invoke HeapAlloc,hHeap,HEAP_ZERO_MEMORY,eax
mov [esi].pszWord,eax
mov edx,eax
invoke lstrcpy,edx,addr buffer</b></font></pre>
<p><font face="Tahoma"><font size="-1">Allocate memory from the heap to store
the word itself and copy the word into it. At this point, the <font color="#006666"><b>WORDINFO</b></font>
structure is ready to be inserted into the appropriate linked list.</font><b>
</b></font></p>
<pre><font face="Tahoma"><b> mov eax,pArray
movzx edx,byte ptr [buffer]
shl edx,2 ; multiply by 4
add eax,edx</b></font></pre>
<p><font face="Tahoma"><font size="-1">pArray contains the address of <font color="#006666"><b>ASMSyntaxArray</b></font>.
We want to move to the dword that has the same index as the value of the first
character of the word. So we put the first character of the word in edx then
multiply edx by 4 (because each element in <font color="#006666"><b>ASMSyntaxArray</b></font>
is 4 bytes in size) and then add the offset to the address of <font color="#006666"><b>ASMSyntaxArray</b></font>.
We have the address of the corresponding dword in eax.</font><b> </b></font></p>
<pre><font face="Tahoma"><b> .if dword ptr [eax]==0
mov dword ptr [eax],esi
.else
push dword ptr [eax]
pop [esi].NextLink
mov dword ptr [eax],esi
.endif</b></font></pre>
<p><font face="Tahoma"><font size="-1">Check the value of the dword. If it's 0,
it means there is currently no word that begins with this character in the list.
We thus put the address of the current <font color="#006666"><b>WORDINFO</b></font>
structure in that dword.</font></font></p>
<p><font face="Tahoma"><font size="-1">If the value in the dword is not 0, it
means there is at least one word that begins with this character in the list.
We thus insert this <font color="#006666"><b>WORDINFO</b></font> structure at
the head of the linked list: its NextLink member is set to point to the previous head,
and the dword is updated to point to the new <font color="#006666"> <b>WORDINFO</b></font> structure.</font></font></p>
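<p><font face="Tahoma" size="-1">The table of list heads and the insertion at the head can be
sketched in portable C as follows. This is an illustration under assumptions, not the tutorial's
code: the <b>WordInfo</b> layout is a guess at what <b>WORDINFO</b> looks like (only its four member
names are known from the text), and <b>SyntaxTable</b> and <b>table_add</b> are hypothetical names.
The standard C allocator stands in for <b>HeapAlloc</b>.</font></p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed C equivalent of WORDINFO; the field order is a guess. */
typedef struct WordInfo {
    size_t           word_len;  /* WordLen  */
    char            *word;      /* pszWord  */
    const unsigned  *color;     /* pColor: points at a shared color value */
    struct WordInfo *next;      /* NextLink */
} WordInfo;

/* One list head per possible first byte, like ASMSyntaxArray. */
typedef struct {
    WordInfo *heads[256];
} SyntaxTable;

/* Create a node for `word` and push it on the list selected by its first
   character. Returns the new node, or NULL if allocation fails. */
static WordInfo *table_add(SyntaxTable *t, const char *word,
                           const unsigned *color)
{
    assert(t != NULL && word != NULL && word[0] != '\0');

    WordInfo *node = calloc(1, sizeof *node);  /* zeroed, like HEAP_ZERO_MEMORY */
    if (node == NULL)
        return NULL;

    node->word_len = strlen(word);
    node->word = malloc(node->word_len + 1);
    if (node->word == NULL) {
        free(node);
        return NULL;
    }
    memcpy(node->word, word, node->word_len + 1);
    node->color = color;

    unsigned char slot = (unsigned char)word[0];
    node->next = t->heads[slot];  /* NULL when the list was empty */
    t->heads[slot] = node;        /* the new node becomes the head */
    return node;
}
```

<p><font face="Tahoma" size="-1">Note that the sketch needs no test for an empty list: copying a
null head into <b>next</b> gives the same result as leaving the zero-initialized member alone,
so both branches of the <b>.if</b> collapse into one.</font></p>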
<pre><font face="Tahoma"><b> pop esi
pop ecx
lea esi,buffer
mov InProgress,FALSE
jmp SkipIt</b></font></pre>
<p><font face="Tahoma"><font size="-1">After the operation is complete, we begin
the next scan cycle and continue until the end of the buffer is reached. </font></font></p>
<pre><font face="Tahoma"><b> invoke SendMessage,hwndRichEdit,EM_SETTYPOGRAPHYOPTIONS,TO_SIMPLELINEBREAK,TO_SIMPLELINEBREAK
invoke SendMessage,hwndRichEdit,EM_GETTYPOGRAPHYOPTIONS,1,1
.if eax==0 ; means this message is not processed
mov RichEditVersion,2
.else
mov RichEditVersion,3
invoke SendMessage,hwndRichEdit,EM_SETEDITSTYLE,SES_EMULATESYSEDIT,SES_EMULATESYSEDIT
.endif</b></font></pre>
<p><font face="Tahoma" size="-1">After the richedit control is created, we need
to determine its version. This step is necessary since <font color="#006666"><b>EM_POSFROMCHAR</b></font>
behaves differently for RichEdit 2.0 and 3.0 and <font color="#006666"><b>EM_POSFROMCHAR</b></font>
is crucial to our syntax hilighting routine. I have never seen a documented
way of checking the version of the richedit control, so I have to use a workaround.
In this case, I set an option that is specific to version 3.0 and immediately
retrieve its value. If I can retrieve the value, I assume that the control version
is 3.0.</font></p>
<p><font face="Tahoma" size="-1">If you use RichEdit control version 3.0, you
will notice that updating the font color for a large file takes quite a long
time. This problem seems to be specific to version 3.0. I found a workaround:
making the control emulate the behavior of the system edit control by sending
the <font color="#006666"> <b>EM_SETEDITSTYLE</b></font> message.</font></p>
<p><font face="Tahoma" size="-1">Once we have the version information,
we proceed to subclass the richedit control. We will now examine the new window
procedure for the richedit control.</font></p>
<pre><font face="Tahoma"><b>NewRichEditProc proc hWnd:DWORD, uMsg:DWORD, wParam:DWORD, lParam:DWORD
........
.......
.if uMsg==WM_PAINT
push edi
push esi
invoke HideCaret,hWnd
invoke CallWindowProc,OldWndProc,hWnd,uMsg,wParam,lParam
push eax</b></font></pre>
<p><font face="Tahoma"><font size="-1">We handle the <font color="#006666"><b>WM_PAINT</b></font>
message. First, we hide the caret to avoid leaving ugly drawing artifacts after the hilighting.
After that, we pass the message to the original richedit procedure to let it
update the window. When <font color="#006666"><b>CallWindowProc</b></font> returns,
the text has been drawn with its usual color and background. Now is our opportunity
to do syntax hilighting.</font><b> </b></font></p>
<pre><font face="Tahoma"><b> mov edi,offset ASMSyntaxArray
invoke GetDC,hWnd
mov hdc,eax
invoke SetBkMode,hdc,TRANSPARENT</b></font></pre>
<p><font face="Tahoma"><font size="-1">Store the address of ASMSyntaxArray in
edi. Then we obtain the handle to the device context and set the text background
mode to transparent so the text that we will write will use the default background
color. </font><b> <br>
</b></font></p>
<pre><font face="Tahoma"><b> invoke SendMessage,hWnd,EM_GETRECT,0,addr rect
invoke SendMessage,hWnd,EM_CHARFROMPOS,0,addr rect
invoke SendMessage,hWnd,EM_LINEFROMCHAR,eax,0
invoke SendMessage,hWnd,EM_LINEINDEX,eax,0</b></font></pre>
<p><font face="Tahoma"><font size="-1">We want to obtain the visible text, so we
first have to obtain the formatting rectangle by sending the <font color="#006666"><b>EM_GETRECT</b></font>
message to the richedit control. Now that we have the bounding rectangle, we
obtain the index of the character nearest to the upper left corner of the rectangle
with <font color="#006666"><b>EM_CHARFROMPOS</b></font>. Once we have that character
index (the first visible character in the control), we could start syntax
hilighting from that position. But the result is better when we start from the
first character of the line that contains it. That's why I obtain the number of
the line that the first visible character is in by sending the <font color="#006666"><b>EM_LINEFROMCHAR</b></font> message.
To obtain the index of the first character of that line, I send the <font color="#006666"><b>EM_LINEINDEX</b></font>
message.</font><b> </b></font></p>
<pre><font face="Tahoma"><b> mov txtrange.chrg.cpMin,eax
mov FirstChar,eax
invoke SendMessage,hWnd,EM_CHARFROMPOS,0,addr rect.right
mov txtrange.chrg.cpMax,eax</b></font></pre>
<p><font face="Tahoma"><font size="-1">Once we have the first character index,
we store it in the FirstChar variable for future reference. Next we obtain the last
visible character index by sending <font color="#006666"><b>EM_CHARFROMPOS</b></font>,
passing the lower-right corner of the formatting rectangle in <font color="#0000CC"><b>lParam</b></font>.</font><b>
</b></font></p>
<pre><font face="Tahoma"><b> push rect.left
pop RealRect.left
push rect.top
pop RealRect.top
push rect.right
pop RealRect.right
push rect.bottom
pop RealRect.bottom
invoke CreateRectRgn,RealRect.left,RealRect.top,RealRect.right,RealRect.bottom
mov hRgn,eax
invoke SelectObject,hdc,hRgn
mov hOldRgn,eax</b></font></pre>
<p><font face="Tahoma"><font size="-1">While doing syntax hilighting, I noticed
an unsightly side-effect of this method: if the richedit control has a margin
(you can specify margin by sending <font color="#006666"><b>EM_SETMARGINS</b></font>
message to the richedit control), <font color="#006666"><b>DrawText</b></font>
writes over the margin. Thus I need to create a clipping region, the size of
the formatting rectangle, by calling <font color="#006666"><b>CreateRectRgn</b></font>.
The output of GDI functions will be clipped to the "writable" area.</font></font></p>
<p><font face="Tahoma" size="-1">Next, we need to hilight the comments first and
get them out of our way. My method is to search for ";" and hilight
the text with the comment color until a carriage return is found. I will not
analyze the routine here: it's fairly long and complicated. Suffice it to
say that, once all the comments are hilighted, we replace them with 0s in the
buffer so that the words in the comments will not be processed/hilighted later.</font></p>
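<p><font face="Tahoma" size="-1">Since the comment routine itself is not shown, the following
portable C sketch covers only the blanking step, not the drawing. It is a guess at the idea
rather than the tutorial's routine: <b>blank_comments</b> is a hypothetical name, the sketch
assumes the carriage return itself is left in place, and it does not try to recognize a
";" that appears inside a quoted string.</font></p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Overwrite every comment (from ';' up to, but not including, the next
   carriage return) with NUL bytes. Returns the number of comments found. */
static size_t blank_comments(char *buf, size_t size)
{
    size_t comments = 0;
    size_t i = 0;

    assert(buf != NULL);

    while (i < size) {
        if (buf[i] == ';') {
            comments++;
            /* wipe until end of line or end of buffer */
            while (i < size && buf[i] != '\r')
                buf[i++] = '\0';
        } else {
            i++;
        }
    }
    return comments;
}
```

<p><font face="Tahoma" size="-1">After this pass, a line such as "mov eax,1 ; load" keeps its
instruction part intact while the comment bytes are all 0, so the later word scan simply
skips over them.</font></p>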
<pre><font face="Tahoma"><b> mov ecx,BufferSize
lea esi,buffer
.while ecx>0
mov al,byte ptr [esi]
.if al==" " || al==0Dh || al=="/" || al=="," || al=="|" || al=="+" || al=="-" || al=="*" || al=="&" || al=="<" || al==">" || al=="=" || al=="(" || al==")" || al=="{" || al=="}" || al=="[" || al=="]" || al=="^" || al==":" || al==9
mov byte ptr [esi],0
.endif
dec ecx
inc esi
.endw</b></font></pre>
<p><font face="Tahoma"><font size="-1">Once the comments are out of our way, we
separate the words in the buffer by replacing the "separator" characters
with 0s. With this method, we no longer need to worry about the separator characters
while processing the words in the buffer: there is only one separator
character, NULL.</font><b> </b></font></p>
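<p><font face="Tahoma" size="-1">For comparison, here is the separator pass as a portable C
sketch. The separator set is copied from the <b>.if</b> condition above; the names
<b>SEPARATORS</b> and <b>split_on_separators</b> are hypothetical, and the returned count is an
addition for illustration.</font></p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The characters tested by the .if above: space, CR, the punctuation
   marks, and tab. */
static const char SEPARATORS[] = " \r/,|+-*&<>=(){}[]^:\t";

/* Replace every separator in buf[0..size) with NUL so that each word
   becomes its own NUL-terminated string. Returns the number replaced. */
static size_t split_on_separators(char *buf, size_t size)
{
    size_t replaced = 0;

    assert(buf != NULL);

    for (size_t i = 0; i < size; i++) {
        /* strchr also matches the terminating NUL, so skip NULs first */
        if (buf[i] != '\0' && strchr(SEPARATORS, buf[i]) != NULL) {
            buf[i] = '\0';
            replaced++;
        }
    }
    return replaced;
}
```

<p><font face="Tahoma" size="-1">Applied to "mov eax,[ebx+4]", the pass leaves four
NUL-terminated tokens in place: "mov", "eax", "ebx" and "4".</font></p>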
<pre><font face="Tahoma"><b> lea esi,buffer
mov ecx,BufferSize
.while ecx>0
mov al,byte ptr [esi]
.if al!=0</b></font></pre>
<p><font face="Tahoma"><font size="-1">Search the buffer for the first character
that is not null, i.e., the first character of a word.</font><b> </b></font></p>
<pre><font face="Tahoma"><b> push ecx
invoke lstrlen,esi
push eax
mov edx,eax</b></font></pre>
<p><font face="Tahoma"><font size="-1">Obtain the length of the word and put it
in edx.</font><b> </b></font></p>
<pre><font face="Tahoma"><b> movzx eax,byte ptr [esi]
.if al>="A" && al<="Z"
sub al,"A"
add al,"a"
.endif</b></font></pre>
<p><font face="Tahoma"><font size="-1">Convert the character to lowercase (if
it's an uppercase character)</font><b> </b></font></p>
<pre><font face="Tahoma"><b> shl eax,2
add eax,edi ; edi contains the pointer to the WORDINFO pointer array
.if dword ptr [eax]!=0</b></font></pre>
<p><font face="Tahoma"><font size="-1">After that, we skip to the corresponding
dword in ASMSyntaxArray and check whether the value in that dword is 0. If it
is, we can skip to the next word.</font><b> </b></font></p>
<pre><font face="Tahoma"><b> mov eax,dword ptr [eax]
assume eax:ptr WORDINFO
.while eax!=0
.if edx==[eax].WordLen</b></font></pre>
<p><font face="Tahoma"><font size="-1">If the value in the dword is non-zero,
it points to the linked list of <font color="#006666"><b>WORDINFO</b></font>
structures. We proceed to walk the linked list, comparing the length of the
word in our local buffer with the length of the word in the <font color="#006666"><b>WORDINFO</b></font>
structure. This is a quick test before we compare the words themselves; it should save some
clock cycles.</font><b> </b></font></p>
<pre><font face="Tahoma"><b> pushad
invoke lstrcmpi,[eax].pszWord,esi
.if eax==0</b></font></pre>
<p><font face="Tahoma"><font size="-1">If the lengths of both words are equal,
we proceed to compare them with <font color="#006666"><b>lstrcmpi</b></font>.</font><b>
</b></font></p>
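<p><font face="Tahoma" size="-1">The lookup described above (pick the list by the lowercased first
character, compare lengths, then compare the text without regard to case) can be sketched in
portable C. This is only an illustration: the reduced <b>WordInfo</b> layout is assumed,
<b>find_word</b> and <b>equal_ignore_case</b> are hypothetical names, and the hand-written
comparison stands in for <b>lstrcmpi</b>, which is a Win32 call.</font></p>

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumed C equivalent of WORDINFO, reduced to what the lookup needs. */
typedef struct WordInfo {
    size_t           word_len;  /* WordLen  */
    const char      *word;      /* pszWord  */
    struct WordInfo *next;      /* NextLink */
} WordInfo;

/* Portable stand-in for lstrcmpi, used once the lengths are known equal. */
static int equal_ignore_case(const char *a, const char *b, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Look up a NUL-terminated token in a table of 256 list heads.
   Returns the matching node, or NULL if the token is not a keyword. */
static const WordInfo *find_word(WordInfo *const *heads, const char *token)
{
    assert(heads != NULL && token != NULL);

    size_t len = strlen(token);
    if (len == 0)
        return NULL;

    /* Keywords were stored in lowercase, so fold the first byte as well. */
    unsigned char slot = (unsigned char)tolower((unsigned char)token[0]);

    for (const WordInfo *w = heads[slot]; w != NULL; w = w->next) {
        if (w->word_len == len &&                    /* cheap test first */
            equal_ignore_case(w->word, token, len))  /* full test second */
            return w;
    }
    return NULL;
}
```

<p><font face="Tahoma" size="-1">With a list that holds "movzx" followed by "mov", looking up
"MOV" rejects the first node on the length test alone and matches the second.</font></p>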
<pre><font face="Tahoma"><b> popad
mov ecx,esi
lea edx,buffer
sub ecx,edx
add ecx,FirstChar</b></font></pre>
<p><font face="Tahoma"><font size="-1">We construct the character index from the
address of the first character of the matching word in the buffer. We first
obtain its relative offset from the starting address of the buffer then add
the character index of the first visible character to it.</font><b> </b></font></p>
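<p><font face="Tahoma" size="-1">The index calculation is plain pointer arithmetic. A minimal C
sketch, assuming that every byte in the local buffer corresponds to exactly one character
position in the control (<b>char_index</b> is a hypothetical name):</font></p>

```c
#include <assert.h>
#include <stddef.h>

/* Map a pointer into the local text buffer back to a character index in
   the control. first_char is the index of the character stored at
   buffer[0], i.e. the value kept in FirstChar. */
static long char_index(const char *buffer, const char *token, long first_char)
{
    assert(buffer != NULL && token != NULL && token >= buffer);

    ptrdiff_t offset = token - buffer;  /* sub ecx,edx       */
    return first_char + (long)offset;   /* add ecx,FirstChar */
}
```

<p><font face="Tahoma" size="-1">For example, a word that starts 4 bytes into the buffer when the
first visible character has index 120 is at character index 124.</font></p>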
<pre><font face="Tahoma"><b> pushad
.if RichEditVersion==3
invoke SendMessage,hWnd,EM_POSFROMCHAR,addr rect,ecx
.else
invoke SendMessage,hWnd,EM_POSFROMCHAR,ecx,0
mov ecx,eax
and ecx,0FFFFh
mov rect.left,ecx
shr eax,16
mov rect.top,eax
.endif
popad</b></font></pre>
<p><font face="Tahoma"><font size="-1">Once we know the character index of the
first character of the word to be hilighted, we proceed to obtain the coordinate
of it by sending <font color="#006666"><b>EM_POSFROMCHAR</b></font> message.
However, this message is interpreted differently by richedit 2.0 and 3.0. For
richedit 2.0, <font color="#0000CC"><b>wParam</b></font> contains the character
index and <font color="#0000CC"><b>lParam</b></font> is not used. It returns
the coordinate in eax. For richedit 3.0, <font color="#0000CC"><b>wParam</b></font>
is the pointer to a <font color="#006666"><b>POINT</b></font> structure that
will be filled with the coordinate and <font color="#0000CC"><b>lParam</b></font>
contains the character index. </font></font></p>
<p><font face="Tahoma"><font size="-1">As you can see, passing the wrong arguments
to <font color="#006666"><b>EM_POSFROMCHAR</b></font> can wreak havoc on your
system. That's why I have to differentiate between RichEdit control versions.</font><b>
</b></font></p>
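<p><font face="Tahoma" size="-1">In the RichEdit 2.0 branch, the coordinate comes back packed into
a single dword: x in the low word, y in the high word. The unpacking can be sketched in portable
C as below. <b>Point</b>, <b>sign_extend16</b> and <b>unpack_pos</b> are hypothetical names. Unlike
the assembly above, which treats both words as unsigned, this sketch sign-extends each word on the
assumption that a character scrolled out of view may yield a negative coordinate.</font></p>

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the Win32 POINT structure. */
typedef struct {
    int x;
    int y;
} Point;

/* Interpret a 16-bit value as a signed quantity without relying on
   implementation-defined narrowing conversions. */
static int sign_extend16(uint32_t word)
{
    assert(word <= 0xFFFFu);
    return (word & 0x8000u) ? (int)word - 0x10000 : (int)word;
}

/* Split a packed 32-bit result: x is in the low word, y in the high word. */
static Point unpack_pos(uint32_t packed)
{
    Point p;
    p.x = sign_extend16(packed & 0xFFFFu);  /* and ecx,0FFFFh */
    p.y = sign_extend16(packed >> 16);      /* shr eax,16     */
    return p;
}
```

<p><font face="Tahoma" size="-1">For coordinates that are never negative, this gives exactly the
same values as the <b>and</b>/<b>shr</b> pair in the listing.</font></p>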
<pre><font face="Tahoma"><b> mov edx,[eax].pColor
invoke SetTextColor,hdc,dword ptr [edx]
invoke DrawText,hdc,esi,-1,addr rect,0</b></font></pre>
<p><font face="Tahoma" size="-1">Once we have the starting coordinate, we set the
text color to the one specified in the <font color="#006666"><b>WORDINFO</b></font>
structure and then overwrite the word with the new color.</font></p>
<p><font face="Tahoma" size="-1">As a final word, this method can be improved
in several ways. For example, I obtain all the text from the first
to the last visible line. If the lines are very long, performance may suffer
because words that are not visible get processed too. You can optimize this by obtaining
only the text that is really visible, line by line. The searching algorithm can also be improved
by using a more efficient method. Don't get me wrong: the syntax hilighting
method used in this example is FAST but it can be FASTER. :)</font></p>
<hr>
<p align="center"><font face="Tahoma" size="-1"><b>[<a href="http://win32asm.cjb.net">Iczelion's
Win32 Assembly Homepage</a>]</b></font></p>
</body>
</html>