📄 ann_test.cpp
字号:
//----------------------------------------------------------------------// File: ann_test.cpp// Programmer: Sunil Arya and David Mount// Description: test program for ANN (approximate nearest neighbors)// Last modified: 08/04/06 (Version 1.1.1)//----------------------------------------------------------------------// Copyright (c) 1997-2005 University of Maryland and Sunil Arya and// David Mount. All Rights Reserved.// // This software and related documentation is part of the Approximate// Nearest Neighbor Library (ANN). This software is provided under// the provisions of the Lesser GNU Public License (LGPL). See the// file ../ReadMe.txt for further information.// // The University of Maryland (U.M.) and the authors make no// representations about the suitability or fitness of this software for// any purpose. It is provided "as is" without express or implied// warranty.//----------------------------------------------------------------------// History:// Revision 0.1 03/04/98// Initial release// Revision 0.2 06/26/98// Added CLOCKS_PER_SEC definition if needed// Revision 1.0 04/01/05// Added comments (from "#" to eol)// Added clus_orth_flats and clus_ellipsoids distributions// Fixed order of fair and midpt in split_table// Added dump/load operations// Cleaned up C++ for modern compilers// Revision 1.1 05/03/05// Added fixed radius kNN search// Revision 1.1.1 08/04/06// Added planted distribution//----------------------------------------------------------------------#include <ctime> // clock#include <cmath> // math routines#include <string> // C string ops#include <fstream> // file I/O#include <ANN/ANN.h> // ANN declarations#include <ANN/ANNx.h> // more ANN declarations#include <ANN/ANNperf.h> // performance evaluation#include "rand.h" // random point generation#ifndef CLOCKS_PER_SEC // define clocks-per-second if needed #define CLOCKS_PER_SEC 1000000#endifusing namespace std; // make std:: available//----------------------------------------------------------------------// ann_test//// This program is a driver for testing and evaluating the ANN library// for computing approximate nearest neighbors. It allows the user to// generate data and query sets of various sizes, dimensions, and// distributions, to build kd- and bbd-trees of various types, and then// run queries and outputting various performance statistics.//// Overview:// ---------// The test program is run as follows:// // ann_test < test_input > test_output//// where the test_input file contains a list of directives as described// below. Directives consist of a directive name, followed by list of// arguments (depending on the directive). Arguments and directives are// separated by white space (blank, tab, and newline). String arguments// are not quoted, and consist of a string of nonwhite chacters. A// character "#" denotes a comment. The following characters up to// the end of line are ignored. Comments may only be inserted between// directives (not within the argument list of a directive).//// Basic operations:// -----------------// The test program can perform the following operations. How these// operations are performed depends on the options which are described// later.//// Data Generation:// ----------------// read_data_pts <file> Create a set of data points whose// coordinates are input from file <file>.// gen_data_pts Create a set of data points whose// coordinates are generated from the// current point distribution.//// Building the tree:// ------------------// build_ann Generate an approximate nearest neighbor// structure for the current data set, using// the selected splitting rules. Any existing// tree will be destroyed.//// Query Generation/Searching:// ---------------------------// read_query_pts <file> Create a set of query points whose// coordinates are input from file <file>.// gen_query_pts Create a set of query points whose// coordinates are generated from the// current point distribution.// run_queries <string> Apply nearest neighbor searching to the// query points using the approximate nearest// neighbor structure and the given search// strategy. Possible strategies are:// standard = standard kd-tree search// priority = priority search//// Miscellaneous:// --------------// output_label Output a label to the output file.// dump <file> Dump the current structure to given file.// (The dump format is explained further in// the source file kd_tree.cc.)// load <file> Load a tree from a data file which was// created by the dump operation. Any// existing tree will be destroyed.//// Options:// --------// How these operations are performed depends on a set of options.// If an option is not specified, a default value is used. An option// retains its value until it is set again. String inputs are not// enclosed in quotes, and must contain no embedded white space (sorry,// this is C++'s convention).//// Options affecting search tree structure:// ----------------------------------------// split_rule <type> Type of splitting rule to use in building// the search tree. Choices are:// kd = optimized kd-tree// midpt = midpoint split// fair = fair split// sl_midpt = sliding midpt split// sl_fair = sliding fair split// suggest = authors' choice for best// The default is "suggest". See the file// kd_split.cc for more detailed information.//// shrink_rule <type> Type of shrinking rule to use in building// a bd-tree data structure. If "none" is// given, then no shrinking is performed and// the result is a kd-tree. Choices are:// none = perform no shrinking// simple = simple shrinking// centroid = centroid shrinking// suggest = authors' choice for best// The default is "none". See the file// bd_tree.cc for more information.// bucket_size <int> Bucket size, that is, the maximum number of// points stored in each leaf node.//// Options affecting data and query point generation:// --------------------------------------------------// dim <int> Dimension of space.// seed <int> Seed for random number generation.// data_size <int> Number of data points. When reading data// points from a file, this indicates the// maximum number of points for storage// allocation. Default = 100.// query_size <int> Same as data_size for query points.// std_dev <float> Standard deviation (used in gauss,// planted, and clustered distributions).// This is the "small" distribution for// clus_ellipsoids. Default = 1.// std_dev_lo <float> Low and high standard deviations (used in// std_dev_hi <float> clus_ellipsoids). Default = 1.// corr_coef <float> Correlation coefficient (used in co-gauss// and co_lapace distributions). Default = 0.05.// colors <int> Number of color classes (clusters) (used// in the clustered distributions). Default = 5.// new_clust Once generated, cluster centers are not// normally regenerated. This is so that both// query points and data points can be generated// using the same set of clusters. This option// forces new cluster centers to be generated// with the next generation of either data or// query points.// max_clus_dim <int> Maximum dimension of clusters (used in// clus_orth_flats and clus_ellipsoids).// Default = 1.// distribution <string> Type of input distribution// uniform = uniform over cube [-1,1]^d.// gauss = Gaussian with mean 0// laplace = Laplacian, mean 0 and var 1// co_gauss = correlated Gaussian// co_laplace = correlated Laplacian// clus_gauss = clustered Gaussian// clus_orth_flats = clusters of orth flats// clus_ellipsoids = clusters of ellipsoids// planted = planted distribution// See the file rand.cpp for further information.//// Options affecting nearest neighbor search:// ------------------------------------------// epsilon <float> Error bound for approx. near neigh. search.// near_neigh <int> Number of nearest neighbors to compute.// max_pts_visit <int> Maximum number of points to visit before// terminating. (Used in applications where// real-time performance is important.)// (Default = 0, which means no limit.)// radius_bound <float> Sets an upper bound on the nearest// neighbor search radius. If the bound is// positive, then fixed-radius nearest// neighbor searching is performed, and the// count of the number of points in the// range is returned. If the bound is// zero, then standard search is used.// This can only be used with standard, not// priority, search. (Default = 0, which// means standard search.)//// Options affection general program behavior:// -------------------------------------------// stats <string> Level of statistics output// silent = no output,// exec_time += execution time only// prep_stats += preprocessing statistics// query_stats += query performance stats// query_res += results of queries// show_pts += show the data points// show_struct += print search structure// validate <string> Validate experiment and compute average// error. Since validation causes exact// nearest neighbors to be computed by the// brute force method, this can take a long// time. Valid arguments are:// on = turn validation on// off = turn validation off// true_near_neigh <int> Number of true nearest neighbors to compute.// When validating, we compute the difference// in rank between each reported nearest neighbor// and the true nearest neighbor of the same// rank. Thus it is necessary to compute a// few more true nearest neighbors. By default// we compute 10 more than near_neigh. With// this option the exact number can be set.// (Used only when validating.)//// Example:// --------// output_label test_run_0 # output label for this run// validate off # do not perform validation// dim 16 # points in dimension 16// stats query_stats # output performance statistics for queries// seed 121212 # random number seed// data_size 1000// distribution uniform// gen_data_pts # 1000 uniform data points in dim 16// query_size 100// std_dev 0.05// distribution clus_gauss// gen_query_pts # 100 points in 10 clusters with std_dev 0.05// bucket_size 2// split_rule kd// shrink_rule none// build_ann # kd-tree, bucket size 2// epsilon 0.1// near_neigh 5// max_pts_visit 100 # stop search if more than 100 points seen// run_queries standard # run queries; 5 nearest neighbors, 10% error// data_size 500// read_data_pts data.in # read up to 500 points from file data.in// split_rule sl_midpt// shrink_rule simple// build_ann # bd-tree; simple shrink, sliding midpoint split// epsilon 0// run_queries priority # run same queries; 0 allowable error////------------------------------------------------------------------------//------------------------------------------------------------------------// Constants//------------------------------------------------------------------------const int STRING_LEN = 500; // max string lengthconst double ERR = 0.00001; // epsilon (for float compares)//------------------------------------------------------------------------// Enumerated values and conversions//------------------------------------------------------------------------typedef enum {DATA, QUERY} PtType; // point types//------------------------------------------------------------------------// Statistics output levels//------------------------------------------------------------------------typedef enum { // stat levels SILENT, // no output EXEC_TIME, // just execution time PREP_STATS, // preprocessing info QUERY_STATS, // query performance QUERY_RES, // query results SHOW_PTS, // show data points SHOW_STRUCT, // show tree structure N_STAT_LEVELS} // number of levels StatLev;const char stat_table[N_STAT_LEVELS][STRING_LEN] = { "silent", // SILENT "exec_time", // EXEC_TIME "prep_stats", // PREP_STATS "query_stats", // QUERY_STATS "query_res", // QUERY_RES "show_pts", // SHOW_PTS "show_struct"}; // SHOW_STRUCT//------------------------------------------------------------------------// Distributions//------------------------------------------------------------------------typedef enum { // distributions UNIFORM, // uniform over cube [-1,1]^d. GAUSS, // Gaussian with mean 0 LAPLACE, // Laplacian, mean 0 and var 1 CO_GAUSS, // correlated Gaussian CO_LAPLACE, // correlated Laplacian CLUS_GAUSS, // clustered Gaussian CLUS_ORTH_FLATS, // clustered on orthog flats CLUS_ELLIPSOIDS, // clustered on ellipsoids PLANTED, // planted distribution N_DISTRIBS} Distrib;
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -