📄 performancelog

📁 一个内存数据库的源代码这是服务器端还有客户端
💻
📖 第 1 页 / 共 3 页
字号:
12 3 下一页
From mk Sat Jun 30 13:06:33 2001Received: from mast.cwi.nl (mast.cwi.nl [192.16.196.89]) by hera.cwi.nl with ESMTP	id NAA15285 for <martin.kersten@mail.cwi.nl>; Sat, 30 Jun 2001 13:06:32 +0200 (MEST)Received: from echolood.cwi.nl (IDENT:root@echolood.cwi.nl [192.16.196.65])	by mast.cwi.nl (8.11.2/8.9.3/FLW-3.11M) with ESMTP id f5UB6Wq13328	for <Martin.Kersten@cwi.nl>; Sat, 30 Jun 2001 13:06:32 +0200From: Martin Kersten <Martin.Kersten@cwi.nl>Received: (from mk@localhost)	by echolood.cwi.nl (8.11.0/8.9.3/FLW-3.2C) id f5UB6WI24560	for Martin.Kersten@cwi.nl; Sat, 30 Jun 2001 13:06:32 +0200Message-Id: <200106301106.f5UB6WI24560@echolood.cwi.nl>Subject: Re: performanceTo: Martin.Kersten@cwi.nl (Martin Kersten)Date: Sat, 30 Jun 2001 13:06:31 +0200 (CEST)In-Reply-To: <200106292337.f5TNbhl29637@echolood.cwi.nl> from "Martin Kersten" at Jun 30, 2001 01:37:43 AMX-Mailer: ELM [version 2.5 PL3]MIME-Version: 1.0Content-Type: text/plain; charset=us-asciiContent-Transfer-Encoding: 7bitStatus: ROForwarded message:> > here the results of the little experiment of this afternoon,> after trying to answer the question "why does it take 0.8 sec> to start Mserver on Stefan's machine compared to 2.7 sec on> Martin's"> > On Stefan's machine (dual 550Mhz)> 	V4-3 -O6	V4.3 -O2	V5.0 -O2> base	0.8/0.2/0.6	0.8/0.2/0.5	0.8/0.2/0.5> tst400a	7.7/6.5/0.6	7.0/6.4/0.5	1.8/1.2/0.5> tst400b	3.3/2/5/0.8	3.0/2.4/0.6	1.6/0.7/0.8> tst400c	14.2/13.3/0.9	14.5/13/9/0.5	1.9/0.7/0.9> tst400d	9.2/8.6/0.5	10.1/9.3/0.7	0.8/0.1/0.6> > On Martin's machine (450 Mhz)> 			V4.3 -O2	V5.0 -O2> base			2.7/0.3/2.4	2.6/0.25/2.3> tst400a		8.7/6.3/2.4	3.5/1.2/2.3> tst400b 		5.0/2.5/2.5	3.2/0.7/2.5> tst400c		5.6/3.0/2.6	3.2/0.6/2.6> tst400d		3.0/0.5/2.5	2.6/0.2/2.4> > Observations:> - original question not answered> - tst c+d involve b.insert(1,1), which uses a simplified>   reference counting scheme for bats (no BBPfix() in module interface)> - synchronization seems really expensive on the dual> - handling empty  1M loop in M4  doesn't align with tst400d loop> - new parser is quite effective> The situation on June 30, 2001, calling from homeOn Martin's machine (450 Mhz)		V5.0 -O2base		2.6/0.2/2.4		tst400b 	2.9/0.5/2.4		10k counttst400c		3.0/0.5/2.5		10k inserttst400d		2.6/0.2/2.4		100K insert tupletst400e		7.0/3.6/2.4		1M mal call increment functionTimes at 25/8/01		echolood(450Mz load 1.5)	Version 4.3tst400a		 3.3/0.9/2.4	1M loop	 	8.7/ 6.2/2.4tst400bHuge 	13  /7.3/5.8	100k count (io)	24.8/11.9/4.1tst400cHuge	12.3/6.5/5.6	100k insert(io)	22.0/16.0/3.6tst400d		 4.9/2.3/2.6	1M insert tuple 31.8/26.5/4.8tst400e		 5.2/2.8/2.4	1M incr fcn    	16.2/12.2/2.4		gaffel (2x400Mhz Celeron load 0.0)	Version 4.3tst400a		 1.5/0.9/0.6	1M loop	 	7.1/ 6.6/0.5tst400bHuge 	13  /8.2/4.6	100k count (io)	14.6/13.0/1.5tst400cHuge	12  /8.2/4.8	100k insert(io)	17.6/15.5/2.1tst400d		 3.4/2.5/0.9	1M insert tuple 33.0/29.4/3.6tst400e		 3.7/3.0/0.6	1M incr fcn    	13.1/12.6/0.5		bezaan ( Athlon load 1.5)	Version 4.3base		11.8/0.4/5.6			0m13/0.9/5.7tst400a		22.5/5.1/6.0	1M loop	 	1m21/38/5.6tst400bHuge 	1m28/32/15	100k count (io)	2m02/56/8.1tst400cHuge	1m30/31/15	100k insert(io)	2m12/1m01/9.4tst400d		0m38/13/7	1M insert tuple 4m02/1m53/13.9tst400e		0m36/16/6	1M incr fcn    	2m20/1m06/5.7		thor (XEON load 0.0)		Version 4.3base		0.5/0.2/0.3			0.9/0.3/0.4tst400a		1.1/0.7/0.4	1M loop	 	5.1/4.7/0.4tst400bHuge 	6.4/4.3/1.9	100k count (io)	10.5/9.6/0.9tst400cHuge	6.4/4.1/2.0	100k insert(io)	12.8/11.4/1.4tst400d		2.4/1.7/0.6	1M insert tuple 25.0/22.4/2.5tst400e		2.7/2.2/0.4	1M incr fcn    	9.6/9.2/0.4Times as of 17-09-01 		echolood(450Mz load 0)	Version 4.3base		 2.5/0.2/2.3tst400a		 3.0/0.7/2.3	1M loop	 	8.7/ 6.2/2.4tst400a2	 3.4/1.0/2.3	1M loop inctst400bHuge 	 4.8/2.4/2.4	100k count (io)	24.8/11.9/4.1tst400cHuge	 5.4/2.9/2.5	100k insert(io)	22.0/16.0/3.6tst400d		 4.5/1.9/2.5	1M insert tuple 31.8/26.5/4.8tst400e		 4.9/2.6/2.3	1M incr fcn    	16.2/12.2/2.4Multiplex operationsWe assume a bat[lng,lng] with 1M elements in increasing order.tst901a contains a fast implementation assuming aligned batsin both V5 and V4.3tst901b uses the MAL/MIL generic loop to achieve the same resultFor V5 a C-version was also constructed, which assumed acode generation technique based on translation of each individualMAL statement to the relevant C-statements. No inter-statementanalysis was performed. It represents what would be achievedby a 'naive' expansion to a C-function.The running time for this function was 3.6 seconds.Compiled version as part of batcalc runs in 3.6sectst901a		9.4/5.6/3.6	[+] call	35.7/30.4/5.0	make	5.7				32.4	multiplex 0.31				 0.35tst901b	20.6/16.3/4.3		explicit loop	82.8/68.5/7.3	make	   5.5				41.5	multiplex 14.6				59.6Multiplex library calls,Assume bats with 1M unique elements, all results in milliseconds              Monet V5.0                      Monet V4.3tst901d         [:+=]   [dbl]   [log]   [==]    [:+=]   [dbl]   [log]   [==][void,lng]      110     240     630             144     960     720     220                110     258     650     234     130     967     727     224                108     253     657     240     130     965     726     225[lng,lng]       140     280     710             143     1020    880     345                142     300     696     381     164     1136    974     402The operation [+](lng,lng) costs 180 and 346 micro seconds, respectively.The BATcopy underlying this operation takes: 141 and 269 ms.Although [+] == batcopy followed by [:+=], this is not cost effective(180 ms versus 141+140)C- implementation of a trivial iterator, compile time 0.2 sec[      1, 639855.00]		profiler.setAll();[      1, 99.00]		b := mal.new(lng,lng);[      1, 26.00]		t0 := system.usec();[      1,  3.00]		i := -1;[      1,  3.00]		t0 := system.usec();[1000001,  2.56]	barrier	v := mal.nextElement(i,0,1000000);[1000000,  6.97]		bat.insert(b,i,i);[1000000,  2.13]		redo v ;[      1,  5.00]		t1 := system.usec();[      1, 118475.00]		c := algebra.copy(b);[      1,  7.00]		t2 := system.usec();[      1, 111.00]		d := mal.new(lng,lng);[      1,  5.00]		h := 0;[      1,  3.00]		t := 0;[1000001,  4.55]	barrier	mloop := mal.bunStream(b,h,t);[1000000,  5.55]		B2 := algebra.find(b,h);[1000000,  4.83]		B3 := algebra.find(c,h);[1000000,  2.38]		cr := calc.+(B2,B3);[1000000,  7.85]		bat.insert(d,h,cr);[1000000,  2.24]	catch	GDKerror;[1000000,  2.30]		redo mloop ;[      1,  5.00]		t3 := system.usec();[      1, 3714520.00]		batcalc.CMD000(b,b);[      1,  8.00]		t4 := system.usec();[      1,  5.00]		d1 := calc.-(t1,t0);[      1,  3.00]		d2 := calc.-(t3,t2);[      1,  3.00]		d3 := calc.-(t4,t3);[      1, 39.00]		cnt := aggr.count(d);[      1, 123.00]		system.printf("cnt %d ",cnt);[      1, 19.00]		system.printf("make %d ",d1);[      1, 79.00]		system.printf("multiplex %d\n",d2);[      1, 87.00]		system.printf("compiled multiplex %d\n",d3);[      1,  2.00]	end function;--------------- MARCH 2002 ---------------------------Just before we are moving to the Athlon desktopsCalling convention with -monetrc .../monet.confBoth servers use a 'hot' measurements on quite machineTimes as of 22-3-03		echolood(450Mz load 0)	Version 4.3base		 0.7/0.3/0.4			0.7/ 0.3/0.4tst400a		 1.3/0.9/0.4	1M loop	 	8.2/ 7.9/0.4tst400a2	 1.3/1.3/0.4	1M loop inctst400bHuge 	 3.7/3.0/0.5	100k count (io)	17.6/15.8/1.3tst400cHuge	 3.8/3.2/0.4	100k insert(io)	22.0/18.0/1.6tst400d		 6.5/5.1/1.4	1M insert tuple 38.2/34.7/2.8tst400e		 3.8/3.4/0.4	1M incr fcn    	15.6/15.2/0.4tst901a		8.3/6.7/1.5	[+] call	39.4/36.3/2.9tst901b		27.4/23.4/3.6	[+] call	119.9/112.4/6.3mk@gaffel::~/monet_5-0/src/mal/Tests> time Mserver -monetrc ../../../monet.conf  tst400a </dev/null >/dev/null		gaffel(2x400Mhz Celeron load 0.5)	Version 4.3base		 0.97/0.4/0.53			1.0/ 0.4/0.57tst400a		 1.6/1.0/0.6	1M loop	 	9.7/ 9.2/0.5tst400a2	 2.1/1.6/0.5	1M loop inctst400bHuge 	 4.3/3.5/0.7	100k count (io)	18.7/16.9/1.7tst400cHuge	 4.6/3.7/0.8	100k insert(io)	22.6/20.5/2.1tst400d		 7.7/5.9/1.7	1M insert tuple 39.2/36.2/3.5tst400e		 4.5/3.8/0.5	1M incr fcn    	17.7/17.2/0.5tst901a		9.2/6.8/2.4	[+] call	40.4/36.6/3.7tst901b		27.0/23.4/3.5	[+] call	118.8/110.5/7.4Overall picture is a loss of performance of about 25% onboth sides compared to Sept 2001. Since V5 and V4.3 use the sameGDK stuff, I suspect this to be a result of Linux kernel changes.Outlier are cHuge and tst400d. Both test insertion in a BAT.The overall loss is in the order of 10%. Furthermore, tst400don V5 illustrates unexpected behavior. tst400d: changing the BBPfix in CMDinsert into a direct update of the reference        count saved 0.7 usr sec. In combination with BBPunfix part of the        garbagecollector being called after before each instruction this        means that 25% of the time the test is involved in locking        activity. It seems that locking has become significantly more        expensive. (this might also explain the slow start of gdb on        Mserver)The chains in the symbol table have become quite large(mdb->scope)Forexample, reaching +() requires passing 15 modules and 35 signature teston average.If you identify the module calc.+() you skip one module for inspectionand save around 300 ms/100K calls in tst400bHuge.If you could 'jump' to the correct signature within the module thenyou would save another 300ms. This means that type resolution in tst400bHugetakes about 20% of the resources.insert() requires 16 module passes and 63 (!) signature comparisons on avg.This leads to a difference in tst400cHuge of 970 ms/100calls about 25% of testMinor update to source (before/after/final- hooks)and better stack preparation in mal_interpetertst400a		 1.2/0.8/0.4	1M loop	 	tst400a2	 1.6/1.2/0.4	1M loop inctst400bHuge 	 3.3/2.8/0.5	100k count (io)	tst400cHuge	 3.4/2.8/0.6	100k insert(io)	tst400d		 6.6/5.4/1.2	1M insert tuple tst400e		 3.0/2.6/0.4	1M incr fcn    	tst901a		 7.6/5.9/1.7	[+] call	tst901b		24.2/21.1/3.0	[+] call	The first hour of the dual Athlon	and V4.3 compiled with -O2!!!		orion(2x1.4Mh load 0)	base		 0.3/0.1/0.2			0.29/0.11/0.18tst400a		 0.45/0.3/0.15	1M loop	 	2.0/1.89/0.14tst400a2	 0.6/0.4/0.2	1M loop inc	tst400bHuge 	 1.26/1.0/0.22	100k count (io)	4.1/3.90/0.27tst400cHuge	 1.3/1.0/0.28	100k insert(io)	4.5/4.11/0.47tst400d		 1.4/1.0/0.4	1M insert tuple 7.0/6.1/0.88tst400e		 1.1/0.9/0.2	1M incr fcn     3.8/3.63/0.21tst901a		 1.9/1.5/0.4	[+] call	7.1/6.19/0.91tst901b		 5.9/5.4/0.5	[+] call	19/17.8/1.64Alignment of the compilation flags illustrates that V5 is about 4 times faster on the tests(previous tests indicate 3.3-7 (17/9) and 3.9-5 (22/3)======================== 15 March 2003 ===================The compilation mode for both V5 and V4.3.7 is CFLAGS=-O2 command: time Mserver -c perf.conf TST </dev/null >/dev/null		orion(2x1.4Mh load 0)	base		 0.33/0.13/0.20			0.28/0.8/0.20tst400a		 0.49/0.29/0.19	1M loop	 	2.0/1.84/0.18tst400a2	 0.62/0.44/0.18	1M loop inc	tst400bHuge 	 1.47/1.2/0.24	100k count (io)	5.3/4.90/0.44tst400cHuge	 1.55/1.3/0.24	100k insert(io)	6.0/5.60/0.42tst400d		 1.4/1.1/0.3	1M insert tuple 7.3/6.6/0.68tst400e		 1.2/1.0/0.2	1M incr fcn     3.7/3.6/0.19tst901a		 1.9/1.5/0.4	[+] call	7.4/6.7/0.7tst901b		 5.5/5.0/0.5	[+] call	20/18.5/1.64======================== 13 April 2003 ===================The compilation mode for both V5 and V4.3.7 is CFLAGS=-O2 command: time Mserver -c perf.conf TST </dev/null >/dev/null		orion(2x1.4Mh load 0)	base		 0.33/0.13/0.20			0.28/0.8/0.20tst400a		 0.52/0.31/0.20	1M loop	 	2.0/1.84/0.18tst400a2	 0.65/0.44/0.19	1M loop inc	tst400bHuge 	 1.10/0.85/0.24	100k count (io)	5.3/4.90/0.44tst400cHuge	 1.32/1.08/0.24	100k insert(io)	6.0/5.60/0.42tst400d		 1.36/1.07/0.29	1M insert tuple 7.3/6.6/0.68tst400e		 1.32/1.12/0.2	1M incr fcn     3.7/3.6/0.19tst901a		 1.9/1.3/0.6	[+] call	7.4/6.7/0.7tst901b		 4.3/4.0/0.26	[+] call	20/18.5/1.64
12 3 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -