  • Check the "bedrooms" and "sqFeet" boxes. Click "Next" to advance to the next column.

Predictions of numeric column values in VisMiner are accomplished by building a linear regression model using the checked columns as input variables and the column containing missing values as the output variable. Only those observations containing values for all indicated columns are used to build the model. Once built, the model is then used to generate estimated values for the observations with missing values in the output (predicted) column. To predict missing nominal (text) values, a decision tree is constructed and applied in a similar manner.

  • Select the "bedrooms" column in the list on the left. For simplicity's sake, select "Remove rows w/ missing value" as the handling option.
  • Select the "cul-de-sac" column in the list on the left, then select "Assign default value" as the handling option. In the "Default Value" box enter "N".
  • Specify handling options for the remaining columns according to Table 3.1. Click "OK".

Once you have specified the handling option for each of the columns with missing values and pressed "OK", a new dataset named CmpltHomes.csv is automatically created and saved in the same folder or database from which the original was loaded.

  • Right-click on the original dataset, Homes.csv, then select "Close dataset".
  • Right-click in the open space in the "Datasets and Models" pane, then select "Reorganize dataset layout".
  • View the summary statistics for the newly created dataset. Verify that all columns with missing values have been handled.
  • Close the summary statistics window.

56 Visual Data Mining

Table 3.1 Missing Values Options

Column           Handling Option   Other
bathrooms        Predict           use bedrooms and sqFeet
bedrooms         Remove rows
cul-de-sac       Assign default    N
den              Assign default    N
diningRoom       Assign default    N
elementary       Assign default    leave blank
exterior         Remove column
garage           Remove column
jrHigh           Assign default    leave blank
laundry          Assign default    N
lot              Remove rows
mls              Remove column
neighborhood     Remove column
schoolDistrict   Predict           use zip
stories          Remove column
style            Remove column

Exploration using the location plot

Previously we used the boundary plot to evaluate measures tied to political boundaries when we compared populations by state. In datasets visualized using the boundary plot, there was one row or observation per boundary entity, even though each observation may have contained multiple measure columns.

In this section, we review a related plot: the location plot. Each observation represented in a location plot is a point on a map. To be viewable, the dataset must contain latitude and longitude coordinates. The location plot is similar to a scatter plot layered over a map background. The longitude and latitude values are plotted along the X and Y axes respectively. VisMiner uses a server on the Internet to generate the background map layer. In order to use the location plot, an Internet connection is needed.

  • If you have not already done so, execute VisSlave on any computers to be used for visualizations.
  • Drag the newly created CmpltHomes.csv dataset up to a display, release, then select "Location Plot".

A small dialog box opens, asking which column in the dataset contains the observation's latitude and which contains the longitude. In this dataset the latitude column is appropriately named "latitude" and the longitude column "longitude".

  • Select these columns, then click "OK".

Advanced Topics in Initial Exploration 57

The initial location plot view is a map where all observations are plotted as red dots. Map navigation is:

  • pan: right-button drag
  • zoom: mouse scroll; as you scroll, the map is automatically centered at the current mouse pointer position.

Notice in the original display, that most of the homes are located around or near Utah Lake. There is one home located to the north and east in the South Snyderville Basin near Park City, Utah. This observation is an outlier. (Outliers are discussed later in the chapter.)
It most likely got its location due to a data entry error.

To begin exploration:

  • Move the mouse pointer to a position near Provo Bay on Utah Lake; use your mouse wheel to zoom in a single click. At this zoom level, all observations are visible except for the previously mentioned outlier. You may, however, need to pan slightly to include the upper or lower concentration of points.

As you explore this visualization, you may be tempted to think of the application as a home-finding tool for a potential home buyer. With a few modifications, it could be used for that purpose. Keep in mind that the objective here is data mining. We are looking for patterns in the dataset, not great homes at a bargain price.

In the pane to the right of the map are controls to facilitate the pattern search. (See Figure 3.3.) On top are the category and color encoding options that implement color highlighting of potential relations. Below are the numeric and category filters that allow selective viewing of subsets of data using both the numeric and nominal (category) column values as filters. The numeric filters are double-ended sliders that delineate the range of filtered values. The category filters allow you to include in the display observations having selected values only.

  • In the "Category" drop-down, select "schoolDistrict".
  • To make it easier to read and visually locate, drag the small category key from the upper right corner of the map down over Utah Lake.

Can you readily locate school district boundaries?
Although you probably would not use a data mining tool to look up political boundaries, the visualization does provide a quick assessment of the data quality. For example, notice the inconsistencies in school district for homes located up highway 189 in the northeast area of the map. If you are going to use school district as an input variable to a data mining algorithm, you may want to remove these observations from the dataset before performing the analysis. They may bias the results.

Figure 3.3 Location Plot Controls

  • In the "Category" drop-down, select "propertyType". Where are the concentrations of "Condo/Townhome" properties?
  • To see just the "Condo/Townhome" properties, click on the propertyType label within the "Category Filters" box located at the bottom of the pane.

A small dialog opens, allowing you to select which property types you would like to see.

  • Check only "Condo/Townhome"
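
The numeric and category filters described above can be pictured outside VisMiner with a small sketch. This is not VisMiner code: the `Home` record, the `apply_filters` function, the "Single Family" category value, and the sample rows are all assumptions made for illustration. Only the column names (latitude, longitude, propertyType, sqFeet) and the "Condo/Townhome" value come from the text. A numeric filter is modeled as an inclusive (low, high) range, like the two ends of a slider, and a category filter as a set of checked values.

```python
from dataclasses import dataclass


@dataclass
class Home:
    """One observation; fields mirror columns named in the text."""
    latitude: float
    longitude: float
    property_type: str
    sq_feet: float


def apply_filters(homes, numeric_ranges, allowed_categories):
    """Keep observations inside every numeric range (inclusive) whose
    category values are all among the checked values."""
    kept = []
    for home in homes:
        in_range = all(
            low <= getattr(home, col) <= high
            for col, (low, high) in numeric_ranges.items()
        )
        in_category = all(
            getattr(home, col) in allowed
            for col, allowed in allowed_categories.items()
        )
        if in_range and in_category:
            kept.append(home)
    return kept


# Made-up sample rows, for illustration only.
homes = [
    Home(40.23, -111.66, "Condo/Townhome", 1200.0),
    Home(40.30, -111.70, "Single Family", 2400.0),  # hypothetical category value
    Home(40.25, -111.65, "Condo/Townhome", 3100.0),
]

# Slider set to 1000..2500 sqFeet, with only "Condo/Townhome" checked.
condos = apply_filters(
    homes,
    numeric_ranges={"sq_feet": (1000.0, 2500.0)},
    allowed_categories={"property_type": {"Condo/Townhome"}},
)
print(len(condos), "of", len(homes), "observations remain visible")
```

Both filter kinds are combined with a logical AND, so an observation stays visible only if it passes every active filter. That matches the way the text describes the filters as narrowing the display to a subset.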

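
The regression-based handling of missing numeric values described earlier in this section (for example, predicting bathrooms from bedrooms and sqFeet, as listed in Table 3.1) can also be sketched in a few lines. The text does not show VisMiner's internals, so this is an independent illustration that assumes ordinary least squares solved through the normal equations. The function names and the sample rows are hypothetical. The decision-tree approach for nominal columns is not covered here.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear_regression(rows, inputs, output):
    """Least squares via the normal equations (X^T X) w = X^T y."""
    xs = [[1.0] + [row[c] for c in inputs] for row in rows]  # 1.0 = intercept term
    ys = [row[output] for row in rows]
    k = len(inputs) + 1
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def impute_numeric(rows, inputs, output):
    """Fill missing (None) output values with predictions from a model
    built only on rows that have values for all indicated columns."""
    complete = [r for r in rows if all(r[c] is not None for c in inputs + [output])]
    weights = fit_linear_regression(complete, inputs, output)
    filled = []
    for original in rows:
        row = dict(original)  # copy, so the input rows are not modified
        if row[output] is None and all(row[c] is not None for c in inputs):
            row[output] = weights[0] + sum(
                w * row[c] for w, c in zip(weights[1:], inputs)
            )
        filled.append(row)
    return filled


# Made-up rows, for illustration only; the last one lacks "bathrooms".
rows = [
    {"bedrooms": 2, "sqFeet": 1000, "bathrooms": 2.0},
    {"bedrooms": 3, "sqFeet": 1800, "bathrooms": 2.9},
    {"bedrooms": 4, "sqFeet": 2000, "bathrooms": 3.5},
    {"bedrooms": 3, "sqFeet": 1200, "bathrooms": 2.6},
    {"bedrooms": 3, "sqFeet": 1600, "bathrooms": None},
]
filled = impute_numeric(rows, ["bedrooms", "sqFeet"], "bathrooms")
print(round(filled[4]["bathrooms"], 2))
```

As in the text, rows with a missing value in any indicated column are left out when the model is built, and the fitted model is then applied only to rows whose output value is missing.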