📄 km_ann.h

📁 高效的k-means算法实现
💻 H
📖 第 1 页 / 共 2 页
字号:
12 下一页
//----------------------------------------------------------------------//      File:           KM_ANN.h//      Programmer:     David Mount//      Last modified:  05/14/04//      Description:    Stripped down ANN utilities//----------------------------------------------------------------------// Copyright (C) 2004-2005 David M. Mount and University of Maryland// All Rights Reserved.// // This program is free software; you can redistribute it and/or modify// it under the terms of the GNU General Public License as published by// the Free Software Foundation; either version 2 of the License, or (at// your option) any later version.  See the file Copyright.txt in the// main directory.// // The University of Maryland and the authors make no representations// about the suitability or fitness of this software for any purpose.// It is provided "as is" without express or implied warranty.//----------------------------------------------------------------------//	This file contains basic definitions for points, rectangles and//	distance computations, which have been taken from ANN.  These//	are heavily used by the kc-tree, which was adapted from ANN's//	kd-tree routines.//----------------------------------------------------------------------#ifndef KM_ANN_H#define KM_ANN_H//----------------------------------------------------------------------//  basic includes//----------------------------------------------------------------------#include <cstdlib>                      // standard libs and definitions#include <fstream>                      // file I/O streams#include <iomanip>                      // I/O manipulators (for setw)#include <climits>                      // for INT_MAX#include <cfloat>                       // for DBL_MAX#include <ctime>                        // for system clock#include <cassert>                      // assertion checking#include <string>                       // STL strings#include <climits>			// numeric limits//----------------------------------------------------------------------//  Basic Types:  KMcoord, KMdist, KMidx//	KMcoord and KMdist are the types used for representing//	point coordinates and distances.  They can be modified by the//	user, with some care.  It is assumed that they are both numeric//	types, and that KMdist is generally of an equal or higher type//	from KMcoord.  A variable of type KMdist should be large//	enough to store the sum of squared components of a variable//	of type KMcoord for the number of dimensions needed in the//	application.  For example, the following combinations are//	legal:////		KMcoord	KMdist//		---------	-------------------------------//		short		short, int, long, float, double//		int		int, long, float, double//		long		long, float, double//		float		float, double//		double		double////	It is the user's responsibility to make sure that overflow does//	not occur in distance calculation.////	The code assumes that there is an infinite distance, KM_DIST_INF//	(as large as any legal distance).  Possible values are given below:////	    Examples://	    KMdist:		double, float, long, int, short//	    KM_DIST_INF:	DBL_MAX, FLT_MAX, LONG_MAX, INT_MAX, SHRT_MAX////	The routine that dumps a tree needs to know roughly how many//	significant digits there are in a KMcoord, so it can output//	points to full precision.  This is defined in KMcoordPrec.//	If you have ANSI C++, you should be able to use the following//	values from values.h to help in computing this:////		KMcoord			KMcoordBits//		--------------------	-------------------------------//		short, int, long,...	BITS(short), BITS(int), ...//		float			FSIGNIF//		double			DSIGNIF////	Then KMcoordPrec = KMcoordBits/(log_2 10) where log_2 10//	is the base 2 logarithm of 10.////	KMidx is a point index.  When the data structure is built,//	the points are given as an array.  Nearest neighbor results are//	returned as an index into this array.  To make it clearer when//	this is happening, we define the integer type KMidx.//		//----------------------------------------------------------------------typedef	double	KMcoord;		// coordinate data typetypedef	double	KMdist;			// distance data typetypedef int	KMidx;			// point index					// largest possible distanceconst KMdist	KM_DIST_INF	=  DBL_MAX;#ifdef DSIGNIF				// number of sig. digits in KMcoord    const int	 KMcoordPrec	= DBL_DIG;#else    const int	 KMcoordPrec	= 15;	// default precision#endif//----------------------------------------------------------------------//  Norms and metrics://	KM supports any Minkowski norm for defining distance.  In//	particular, for any p >= 1, the L_p Minkowski norm defines the//	length of a d-vector (v0, v1, ..., v(d-1)) to be////		(|v0|^p + |v1|^p + ... + |v(d-1)|^p)^(1/p),////	(where ^ denotes exponentiation, and |.| denotes absolute//	value).  The distance between two points is defined to be//	the norm of the vector joining them.  Some common distance//	metrics include////		Euclidean metric	p = 2//		Manhattan metric	p = 1//		Max metric		p = infinity////	In the case of the max metric, the norm is computed by//	taking the maxima of the absolute values of the components.//	KM is highly "coordinate-based" and does not support general//	distances functions (e.g. those obeying just the triangle//	inequality).  It also does not support distance functions//	based on inner-products.////	For the purpose of computing nearest neighbors, it is not//	necessary to compute the final power (1/p).  Thus the only//	component that is used by the program is |v(i)|^p.////	KM parameterizes the distance computation through the following//	macros.  (Macros are used rather than procedures for efficiency.)//	Recall that the distance between two points is given by the length//	of the vector joining them, and the length or norm of a vector v//	is given by formula:////		|v| = ROOT(POW(v0) # POW(v1) # ... # POW(v(d-1)))////	where ROOT, POW are unary functions and # is an associative and//	commutative binary operator satisfying:////	    **	POW:	coord		--> dist//	    **	#:	dist x dist	--> dist//	    **	ROOT:	dist (>0)	--> double////	For early termination in distance calculation (partial distance//	calculation) we assume that POW and # together are monotonically//	increasing on sequences of arguments, meaning that for all//	v0..vk and y:////	POW(v0) #...# POW(vk) <= (POW(v0) #...# POW(vk)) # POW(y).////	Due to the use of incremental distance calculations in the code//	for searching k-d trees, we assume that there is an incremental//	update function DIFF(x,y) for #, such that if:////		    s = x0 # ... # xi # ... # xk ////	then if s' is s with xi replaced by y, that is, //	//		    s' = x0 # ... # y # ... # xk////	can be computed by:////		    s' = s # DIFF(xi,y).////	Thus, if # is + then DIFF(xi,y) is (yi-x).  For the L_infinity//	norm we make use of the fact that in the program this function//	is only invoked when y > xi, and hence DIFF(xi,y)=y.////	Finally, for approximate nearest neighbor queries we assume//	that POW and ROOT are related such that////		    v*ROOT(x) = ROOT(POW(v)*x)////	Here are the values for the various Minkowski norms:////	L_p:	p even:				p odd://		-------------------------	------------------------//		POW(v)		= v^p		POW(v)		= |v|^p//		ROOT(x)		= x^(1/p)	ROOT(x)		= x^(1/p)//		#		= +		#		= +//		DIFF(x,y)	= y - x		DIFF(x,y)	= y - x	////	L_inf://		POW(v)		= |v|//		ROOT(x)		= x//		#		= max//		DIFF(x,y)  	= y////	By default the Euclidean norm is assumed.  To change the norm,//	uncomment the appropriate set of macros below.////----------------------------------------------------------------------//----------------------------------------------------------------------//	Use the following for the Euclidean norm//----------------------------------------------------------------------#define KM_POW(v)		((v)*(v))#define KM_ROOT(x)		sqrt(x)#define KM_SUM(x,y)		((x) + (y))#define KM_DIFF(x,y)		((y) - (x))//----------------------------------------------------------------------//  Array types
12 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -