<HTML>  <HEAD>    <!--SCRIPT LANGUAGE="JavaScript" SRC="http://a1835.g.akamai.net/f/1835/276/3h/www.netlibrary.com/include/js/dictionary_library.js"></SCRIPT>    <SCRIPT LANGUAGE="JavaScript">      if (!opener){document.onkeyup=parent.turnBookPage;}    </SCRIPT!-->    <META HTTP-EQUIV="Cache-Control" CONTENT="no-cache">    <META HTTP-EQUIV="Pragma" CONTENT="no-cache">    <META HTTP-EQUIV="Expires" CONTENT="-1"><META http-equiv="Content-Type" content="text/html; charset=windows-1252"><SCRIPT>var PrevPage="Page_131";var NextPage="Page_133";var CurPage="Page_132";var PageOrder="142";</SCRIPT>  <TITLE>Document</TITLE>  </HEAD>  <BODY BGCOLOR="#FFFFFF"><CENTER><TABLE BORDER=0 WIDTH=100% CELLPADDING=0><TR><TD ALIGN=CENTER>  <TABLE BORDER=0 CELLPADDING=2 CELLSPACING=0 WIDTH=100%>  <TR>  <TD ALIGN=LEFT><A HREF='Page_131.html'>Previous</A></TD>  <TD ALIGN=RIGHT><A HREF='Page_133.html'>Next</A></TD>  </TR>  </TABLE></TD></TR><TR><TD ALIGN=LEFT><P><A NAME='JUMPDEST_Page_132'/><A NAME='{467}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0 WIDTH='100%'><TR><TD ALIGN=RIGHT><FONT FACE='Times New Roman, Times, Serif' SIZE=2 COLOR=#FF0000>Page 132</FONT></TD></TR></TABLE><A NAME='{468}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0><TR>  <TD ROWSPAN=5></TD>  <TD COLSPAN=3 HEIGHT=17></TD>  <TD ROWSPAN=5></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR><TD></TD>  <TD><FONT FACE='Times New Roman, Times, Serif' SIZE=3><B>Prepare Your Data</B></FONT></TD><TD></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR>  <TD COLSPAN=3 HEIGHT=1></TD></TR></TABLE><A NAME='{469}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0><TR>  <TD ROWSPAN=5></TD>  <TD COLSPAN=3 HEIGHT=12></TD>  <TD ROWSPAN=5></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR><TD></TD>  <TD><FONT FACE='Times New Roman, Times, Serif' SIZE=3>Once the data has been assembled and visually inspected, some decisions must be made regarding which attributes to exclude and which attributes need to be converted into usable formats.</FONT></TD><TD></TD></TR><TR>  
<TD COLSPAN=3></TD></TR><TR>  <TD COLSPAN=3 HEIGHT=1></TD></TR></TABLE><A NAME='{46A}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0><TR>  <TD ROWSPAN=5></TD>  <TD COLSPAN=3 HEIGHT=12></TD>  <TD ROWSPAN=5></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR><TD></TD>  <TD><FONT FACE='Symbol' SIZE=3>&middot;</FONT><FONT FACE='Times New Roman, Times, Serif' SIZE=3> What condition is the data in, and what steps are needed to prepare it for analysis?</FONT></TD><TD></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR>  <TD COLSPAN=3 HEIGHT=1></TD></TR></TABLE><A NAME='{46B}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0><TR>  <TD ROWSPAN=5></TD>  <TD COLSPAN=3 HEIGHT=12></TD>  <TD ROWSPAN=5></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR><TD></TD>  <TD><FONT FACE='Symbol' SIZE=3>&middot;</FONT><FONT FACE='Times New Roman, Times, Serif' SIZE=3> What conversions and mappings of the data are required before the analysis?</FONT></TD><TD></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR>  <TD COLSPAN=3 HEIGHT=1></TD></TR></TABLE><A NAME='{46C}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0><TR>  <TD ROWSPAN=5></TD>  <TD COLSPAN=3 HEIGHT=12></TD>  <TD ROWSPAN=5></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR><TD></TD>  <TD><FONT FACE='Symbol' SIZE=3>&middot;</FONT><FONT FACE='Times New Roman, Times, Serif' SIZE=3> Are these processes acceptable to the users and compatible with the deliverable solution?</FONT></TD><TD></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR>  <TD COLSPAN=3 HEIGHT=1></TD></TR></TABLE><A NAME='{46D}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0><TR>  <TD ROWSPAN=5></TD>  <TD COLSPAN=3 HEIGHT=12></TD>  <TD ROWSPAN=5></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR><TD></TD>  <TD><FONT FACE='Symbol' SIZE=3>&middot;</FONT><FONT FACE='Times New Roman, Times, Serif' SIZE=3> What strategies should be used for handling missing data, noise, and outliers?</FONT></TD><TD></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR>  <TD COLSPAN=3 HEIGHT=1></TD></TR></TABLE><A NAME='{46E}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0><TR>  
<TD ROWSPAN=5></TD>  <TD COLSPAN=3 HEIGHT=12></TD>  <TD ROWSPAN=5></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR><TD></TD>  <TD><FONT FACE='Symbol' SIZE=3>&middot;</FONT><FONT FACE='Times New Roman, Times, Serif' SIZE=3> How skewed is the data? Are logarithmic or square transformations needed?</FONT></TD><TD></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR>  <TD COLSPAN=3 HEIGHT=1></TD></TR></TABLE><A NAME='{46F}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0><TR>  <TD ROWSPAN=5></TD>  <TD COLSPAN=3 HEIGHT=12></TD>  <TD ROWSPAN=5></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR><TD></TD>  <TD><FONT FACE='Symbol' SIZE=3>&middot;</FONT><FONT FACE='Times New Roman, Times, Serif' SIZE=3> Do you need to perform 1-of-N conversion on categorical fields?</FONT></TD><TD></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR>  <TD COLSPAN=3 HEIGHT=1></TD></TR></TABLE><A NAME='{470}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0><TR>  <TD ROWSPAN=5></TD>  <TD COLSPAN=3 HEIGHT=12></TD>  <TD ROWSPAN=5></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR><TD></TD>  <TD><FONT FACE='Symbol' SIZE=3>&middot;</FONT><FONT FACE='Times New Roman, Times, Serif' SIZE=3> Do you need to normalize dollar fields by dividing them by 1000?</FONT></TD><TD></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR>  <TD COLSPAN=3 HEIGHT=1></TD></TR></TABLE><A NAME='{471}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0><TR>  <TD ROWSPAN=5></TD>  <TD COLSPAN=3 HEIGHT=12></TD>  <TD ROWSPAN=5></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR><TD></TD>  <TD><FONT FACE='Symbol' SIZE=3>&middot;</FONT><FONT FACE='Times New Roman, Times, Serif' SIZE=3> Do you need to convert purchase dates to continuous values?</FONT></TD><TD></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR>  <TD COLSPAN=3 HEIGHT=1></TD></TR></TABLE><A NAME='{472}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0><TR>  <TD ROWSPAN=5></TD>  <TD COLSPAN=3 HEIGHT=12></TD>  <TD ROWSPAN=5></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR><TD></TD>  <TD><FONT FACE='Symbol' SIZE=3>&middot;</FONT><FONT FACE='Times New Roman, 
Times, Serif' SIZE=3> Do you need to convert addresses to sectors?</FONT></TD><TD></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR>  <TD COLSPAN=3 HEIGHT=1></TD></TR></TABLE><A NAME='{473}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0><TR>  <TD ROWSPAN=5></TD>  <TD COLSPAN=3 HEIGHT=12></TD>  <TD ROWSPAN=5></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR><TD></TD>  <TD><FONT FACE='Symbol' SIZE=3>&middot;</FONT><FONT FACE='Times New Roman, Times, Serif' SIZE=3> Do you need to convert Yes/No fields to 1/0?</FONT></TD><TD></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR>  <TD COLSPAN=3 HEIGHT=1></TD></TR></TABLE><A NAME='{474}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0><TR>  <TD ROWSPAN=5></TD>  <TD COLSPAN=3 HEIGHT=12></TD>  <TD ROWSPAN=5></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR><TD></TD>  <TD><FONT FACE='Times New Roman, Times, Serif' SIZE=3>A graphical tool or a good file editor can help you inspect the physical state of the data. A visual inspection should give you an overview of the number and percentage of blank fields in the data set. A statistical tool can also help you identify important relationships between variables in the data. These tools, however, may be of limited help with very large data sets. Operational data is organized to be compact for speed and efficiency, not for analysis. Carefully review its format and be prepared to convert it into a format that will yield insight when analyzed. 
For example, it may be efficient to store an account's establishment date in MM/DD/YY format, but for analysis it may be necessary to transform this field into one that equates to &quot;Total Number of Days Since Account Establishment: NNNN.&quot;</FONT></TD><TD></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR>  <TD COLSPAN=3 HEIGHT=1></TD></TR></TABLE><A NAME='{475}'/><TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0><TR>  <TD ROWSPAN=5></TD>  <TD COLSPAN=3 HEIGHT=12></TD>  <TD ROWSPAN=5></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR><TD></TD>  <TD><FONT FACE='Times New Roman, Times, Serif' SIZE=3>If you are dealing with a very large database, don't perform wholesale conversions on the entire data set. It is safer to first take a random extract, perform the conversions on it, mine the data, and evaluate the results. If you are considering a neural network tool, additional conversion of the data will be required: categorical values must be converted to 1-of-N values, and all continuous values must be scaled to the range 0 to 1 or transformed with log or square functions. One common method of</FONT><FONT FACE='Times New Roman, Times, Serif' SIZE=3 COLOR=#FFFF00><!-- continue --></FONT></TD><TD></TD></TR><TR>  <TD COLSPAN=3></TD></TR><TR>  <TD COLSPAN=3 HEIGHT=1></TD></TR></TABLE><A NAME='{476}'/></FORM></P></TD></TR></TABLE><P><FONT SIZE=0 COLOR=WHITE></CENTER><A NAME="bottom">&nbsp;</A><!-- netLibrary.com Copyright Notice -->  </BODY></HTML>
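A visual inspection is meant to reveal the number and percentage of blank fields in the data set. When the file is too large to eyeball, a few lines of Python can produce the same overview. This is a minimal sketch, assuming the records are already loaded as a list of dictionaries (for example from `csv.DictReader`) and that a value counts as blank when it is missing, `None`, or whitespace only; the function name `blank_field_report` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def _is_blank(value: Optional[str]) -> bool:
    # Assumption: missing keys, None, and whitespace-only strings are all "blank".
    return value is None or str(value).strip() == ""


def blank_field_report(
    rows: List[Dict[str, str]], fields: List[str]
) -> Dict[str, Tuple[int, float]]:
    """Return {field: (blank_count, blank_percentage)} for each requested field."""
    total = len(rows)
    report: Dict[str, Tuple[int, float]] = {}
    for field in fields:
        blanks = sum(1 for row in rows if _is_blank(row.get(field)))
        pct = 100.0 * blanks / total if total else 0.0
        report[field] = (blanks, pct)
    return report


if __name__ == "__main__":
    sample = [
        {"age": "34", "zip": ""},
        {"age": "", "zip": "94107"},
        {"age": "51", "zip": " "},
        {"age": "29", "zip": "10001"},
    ]
    print(blank_field_report(sample, ["age", "zip"]))
    # {'age': (1, 25.0), 'zip': (2, 50.0)}
```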
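The checklist asks how skewed the data is and whether a logarithmic transformation is needed. One way to answer is to measure skewness before and after transforming. This sketch uses the moment-based (population) skewness coefficient and `math.log1p`, i.e. log(1 + x), which tolerates zero values; both choices and the helper names are illustrative assumptions, not the book's prescribed method. It covers only the logarithmic case, not the square transformation.

```python
import math
from typing import List, Sequence


def skewness(xs: Sequence[float]) -> float:
    """Moment-based skewness: mean((x - mean)^3) / (mean((x - mean)^2))^1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for a constant column")
    return m3 / m2 ** 1.5


def log_transform(xs: Sequence[float]) -> List[float]:
    """Apply log(1 + x) to compress a long right tail; requires every x > -1."""
    return [math.log1p(x) for x in xs]


if __name__ == "__main__":
    # Hypothetical dollar amounts with a long right tail.
    amounts = [10, 20, 30, 40, 50, 1000, 5000]
    print(round(skewness(amounts), 2), round(skewness(log_transform(amounts)), 2))
```

A positive coefficient that drops toward zero after the transform suggests the log version is the better input for tools that assume roughly symmetric distributions.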
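1-of-N conversion turns one categorical field with N distinct values into N indicator fields, exactly one of which is 1 for each record. A minimal sketch in plain Python, assuming the column order is simply the sorted list of distinct values found in the column; the name `one_of_n` is hypothetical.

```python
from typing import List, Sequence, Tuple


def one_of_n(values: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Encode a categorical column as 1-of-N indicator vectors.

    Returns the sorted distinct categories (which fix the column order) and,
    for each input value, a vector with a single 1 in that category's slot.
    """
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    encoded: List[List[int]] = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        encoded.append(vec)
    return categories, encoded


if __name__ == "__main__":
    cats, rows = one_of_n(["red", "green", "red", "blue"])
    print(cats)  # ['blue', 'green', 'red']
    print(rows)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

In practice the category list should be fixed on the training extract and reused, so that later records map to the same columns; values never seen during training need a policy of their own, such as an extra "other" column.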
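The account-date example can be made concrete: parse the MM/DD/YY field and subtract it from a reference date to get a day count. The same approach covers the checklist item about converting purchase dates to continuous values. This sketch assumes the reference ("as of") date is supplied by the caller, for example the date the extract was taken; `days_since_established` is a hypothetical name.

```python
from datetime import date, datetime


def days_since_established(mmddyy: str, as_of: date) -> int:
    """Convert an MM/DD/YY establishment date into a number of days as of `as_of`.

    strptime's %y maps two-digit years 69-99 to 1969-1999 and 00-68 to
    2000-2068, so check that this pivot suits the data before relying on it.
    """
    established = datetime.strptime(mmddyy, "%m/%d/%y").date()
    return (as_of - established).days


if __name__ == "__main__":
    print(days_since_established("01/01/99", date(2000, 1, 1)))  # 365
```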
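Converting Yes/No fields to 1/0 is simple, but deciding what happens to blanks and unexpected spellings is part of the missing-data strategy. This sketch assumes the accepted spellings are yes/y and no/n in any letter case, and returns `None` for anything else so those records can be handled explicitly rather than silently coded as 0; `yes_no_to_binary` is a hypothetical name.

```python
from typing import Optional

# Assumed spellings; extend these sets to match the source system.
_YES = {"yes", "y"}
_NO = {"no", "n"}


def yes_no_to_binary(value: Optional[str]) -> Optional[int]:
    """Map Yes/No answers to 1/0; return None for blank or unrecognized values."""
    if value is None:
        return None
    v = value.strip().lower()
    if v in _YES:
        return 1
    if v in _NO:
        return 0
    return None


if __name__ == "__main__":
    print([yes_no_to_binary(x) for x in ["Yes", " no ", "Y", "", "maybe"]])
    # [1, 0, 1, None, None]
```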
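The advice to work on a random extract rather than converting an entire very large database can be followed without loading every record into memory. The sketch below swaps in reservoir sampling (Algorithm R), a standard technique not named in the text, which draws a uniform random sample of fixed size k in a single pass over a stream of records. The `seed` parameter is an added assumption so the same extract can be reproduced when conversions are re-run.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return a uniform random sample of up to k records in one pass (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, rec in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(rec)
        else:
            # Keep record i with probability k / (i + 1), replacing a random slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = rec
    return reservoir


if __name__ == "__main__":
    extract = reservoir_sample(range(1_000_000), 1000, seed=42)
    print(len(extract))  # 1000
```

The iterable can be a database cursor or a file reader, so only k records are ever held at once.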
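For neural network tools, continuous values are scaled to the range 0 to 1. A minimal min-max scaling sketch, assuming the whole column fits in memory and that a constant column should map to 0.0 (an arbitrary but common choice); `min_max_scale` is a hypothetical name.

```python
from typing import List, Sequence


def min_max_scale(xs: Sequence[float]) -> List[float]:
    """Linearly rescale values so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(xs), max(xs)
    span = hi - lo
    if span == 0:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in xs]
    return [(x - lo) / span for x in xs]


if __name__ == "__main__":
    print(min_max_scale([10, 20, 30]))  # [0.0, 0.5, 1.0]
```

The minimum and maximum should be computed on the extract used for training and then reused for later data, which may therefore fall slightly outside 0 to 1.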
