Check the bedrooms and sqFeet boxes.
Click Next to advance to the next column.

Predictions of numeric column values in VisMiner are accomplished by building a linear regression model using the checked columns as input variables and the column containing missing values as the output variable. Only those observations containing values for all indicated columns are used to build the model. Once built, the model is then used to generate estimated values for the observations with missing values in the output (predicted) column. To predict missing nominal (text) values, a decision tree is constructed and applied in a similar manner.

Select the bedrooms column in the list on the left. For simplicity's sake, select Remove rows w/ missing value as the handling option.
Select the cul-de-sac column in the list on the left, then select Assign default value as the handling option. In the Default Value box enter N.
Specify handling options for the remaining columns according to Table 3.1.
Click OK.

Once you have specified the handling option for each of the columns with missing values and pressed OK, a new dataset named CmpltHomes.csv is automatically created and saved in the same folder or database from which the original was loaded.

Right-click on the original dataset, Homes.csv, then select Close dataset.
Right-click in the open space in the Datasets and Models pane, then select Reorganize dataset layout.
View the summary statistics for the newly created dataset. Verify that all columns with missing values have been handled.
Close the summary statistics window.

Table 3.1 Missing Values Options

Column          Handling Option   Other
bathrooms       Predict           use bedrooms and sqFeet
bedrooms        Remove rows
cul-de-sac      Assign default    N
den             Assign default    N
diningRoom      Assign default    N
elementary      Assign default    leave blank
exterior        Remove column
garage          Remove column
jrHigh          Assign default    leave blank
laundry         Assign default    N
lot             Remove rows
mls             Remove column
neighborhood    Remove column
schoolDistrict  Predict           use zip
stories         Remove column
style           Remove column

Exploration using the location plot

Previously we used the boundary plot to evaluate measures tied to political boundaries when we compared populations by state. In datasets visualized using the boundary plot, there was one row or observation per boundary entity, even though each observation may have contained multiple measure columns.

In this section, we review a related plot: the location plot. Each observation represented in a location plot is a point on a map. To be viewable, the dataset must contain latitude and longitude coordinates. The location plot is similar to a scatter plot layered over a map background. The longitude and latitude values are plotted along the X and Y axes respectively. VisMiner uses a server on the Internet to generate the background map layer. In order to use the location plot, an Internet connection is needed.

If you have not already done so, execute VisSlave on any computers to be used for visualizations.
Drag the newly created CmpltHomes.csv dataset up to a display, release, then select Location Plot.

A small dialog box opens, asking which column in the dataset contains the observation's latitude and which contains the longitude. In this dataset the latitude column is appropriately named latitude and the longitude column longitude. Select these columns, then click OK.

The initial location plot view is a map where all observations are plotted as red dots. Map navigation is:
pan: right-button drag
zoom: mouse scroll; as you scroll, the map is automatically centered at the current mouse pointer position.

Notice in the original display that most of the homes are located around or near Utah Lake. There is one home located to the north and east in the South Snyderville Basin near Park City, Utah. This observation is an outlier. (Outliers are discussed later in the chapter.)
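
An outlier such as this one can also be flagged without a map. The following is a minimal sketch, not VisMiner's method: it treats longitude as X and latitude as Y, as the location plot does, and reports any point lying farther than a chosen offset from the median position. The function name flag_location_outliers, the 0.3-degree threshold, and the sample coordinates are all assumptions made for illustration; they are not taken from the Homes.csv dataset.

```python
from statistics import median


def flag_location_outliers(points, max_offset_deg=0.3):
    """Return the (longitude, latitude) pairs that lie farther than
    max_offset_deg (in degrees) from the median position."""
    mid_lon = median(lon for lon, _ in points)
    mid_lat = median(lat for _, lat in points)
    outliers = []
    for lon, lat in points:
        # Straight-line offset in degrees; adequate for a small region.
        offset = ((lon - mid_lon) ** 2 + (lat - mid_lat) ** 2) ** 0.5
        if offset > max_offset_deg:
            outliers.append((lon, lat))
    return outliers


# Hypothetical (longitude, latitude) pairs: four points near Utah Lake
# and one well to the north and east.
homes = [
    (-111.66, 40.23),
    (-111.70, 40.30),
    (-111.74, 40.17),
    (-111.85, 40.39),
    (-111.54, 40.70),
]

print(flag_location_outliers(homes))
```

With these made-up points, only the last pair is reported. A degree-based offset is crude; a real analysis would more likely use distances in miles or kilometers.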
The outlying observation most likely got its location due to a data entry error.

To begin exploration:
Move the mouse pointer to a position near Provo Bay on Utah Lake; use your mouse wheel to zoom in a single click. At this zoom level, all observations are visible except for the previously mentioned outlier. You may, however, need to pan slightly to include the upper or lower concentration of points.

As you explore this visualization, you may be tempted to think of the application as a home-finding tool for a potential home buyer. With a few modifications, it could be used for that purpose. Keep in mind that the objective here is data mining. We are looking for patterns in the dataset, not great homes at a bargain price.

In the pane to the right of the map are controls to facilitate the pattern search. (See Figure 3.3.) On top are the category and color encoding options that implement color highlighting of potential relations. Below are the numeric and category filters that allow selective viewing of subsets of data using both the numeric and nominal (category) column values as filters. The numeric filters are double-ended sliders that delineate the range of filtered values. The category filters allow you to include in the display only those observations having selected values.

In the Category drop-down, select schoolDistrict.
To make it easier to read and visually locate, drag the small category key from the upper right corner of the map down over Utah Lake.

Can you readily locate school district boundaries?
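
Outside VisMiner, the effect of the numeric and category filters described above can be approximated in a few lines. This is only a sketch under stated assumptions: the function name apply_filters, the inclusive treatment of range endpoints, the sample rows, and the "Single Family" value are hypothetical; only the column names and the Condo/Townhome. value come from the text.

```python
def apply_filters(rows, numeric_ranges, category_values):
    """Keep rows that fall inside every numeric range (endpoints included)
    and whose category columns hold one of the selected values."""
    kept = []
    for row in rows:
        in_range = all(
            low <= row[col] <= high
            for col, (low, high) in numeric_ranges.items()
        )
        in_category = all(
            row[col] in allowed
            for col, allowed in category_values.items()
        )
        if in_range and in_category:
            kept.append(row)
    return kept


# Hypothetical observations; values are invented for the example.
rows = [
    {"bedrooms": 2, "sqFeet": 1100, "propertyType": "Condo/Townhome."},
    {"bedrooms": 4, "sqFeet": 2600, "propertyType": "Single Family"},
    {"bedrooms": 3, "sqFeet": 1500, "propertyType": "Condo/Townhome."},
]

# A double-ended slider corresponds to a (low, high) pair;
# a category filter corresponds to a set of checked values.
visible = apply_filters(
    rows,
    numeric_ranges={"sqFeet": (1000, 2000)},
    category_values={"propertyType": {"Condo/Townhome."}},
)
print([row["bedrooms"] for row in visible])
```

Each filter narrows the displayed subset, so an observation is shown only when it passes all of them.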
Although you probably would not use a data mining tool to look up political boundaries, the visualization does provide a quick assessment of the data quality. For example, notice the inconsistencies in school district for homes located up highway 189 in the northeast area of the map. If you are going to use school district as an input variable to a data mining algorithm, you may want to remove these observations from the dataset before performing the analysis. They may bias the results.

Figure 3.3 Location Plot Controls

In the Category drop-down, select propertyType. Where are the concentrations of Condo/Townhome. properties?
To see just the Condo/Townhome. properties, click on the propertyType label within the Category Filters box located at the bottom of the pane.

A small dialog opens, allowing you to select which property types you would like to see.
Check only Condo/Townhome