📄 psymodel.c

📁 音频编码
💻 C
📖 第 1 页 / 共 4 页
字号:
12 3 4 下一页
/* *	psymodel.c * *	Copyright (c) 1999-2000 Mark Taylor *	Copyright (c) 2001-2002 Naoki Shibata *	Copyright (c) 2000-2003 Takehiro Tominaga *	Copyright (c) 2000-2005 Robert Hegemann *	Copyright (c) 2000-2005 Gabriel Bouvigne *	Copyright (c) 2000-2005 Alexander Leidinger * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Library General Public * License as published by the Free Software Foundation; either * version 2 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the GNU * Library General Public License for more details. * * You should have received a copy of the GNU Library General Public * License along with this library; if not, write to the * Free Software Foundation, Inc., 59 Temple Place - Suite 330, * Boston, MA 02111-1307, USA. *//* $Id: psymodel.c,v 1.142.2.1 2005/11/20 14:08:25 bouvigne Exp $ *//*PSYCHO ACOUSTICSThis routine computes the psycho acoustics, delayed by one granule.  Input: buffer of PCM data (1024 samples).  This window should be centered over the 576 sample granule window.The routine will compute the psycho acoustics forthis granule, but return the psycho acoustics computedfor the *previous* granule.  This is because the blocktype of the previous granule can only be determinedafter we have computed the psycho acoustics for the followinggranule.  Output:  maskings and energies for each scalefactor band.block type, PE, and some correlation measures.  The PE is used by CBR modes to determine if extra bitsfrom the bit reservoir should be used.  The correlationmeasures are used to determine mid/side or regular stereo.*//*Notation:barks:  a non-linear frequency scale.  Mapping from frequency to        barks is given by freq2bark()scalefactor bands: The spectrum (frequencies) are broken into                    SBMAX "scalefactor bands".  Thes bands                   are determined by the MPEG ISO spec.  In                   the noise shaping/quantization code, we allocate                   bits among the partition bands to achieve the                   best possible qualitypartition bands:   The spectrum is also broken into about                   64 "partition bands".  Each partition                    band is about .34 barks wide.  There are about 2-5                   partition bands for each scalefactor band.LAME computes all psycho acoustic information for each partitionband.  Then at the end of the computations, this informationis mapped to scalefactor bands.  The energy in each scalefactorband is taken as the sum of the energy in all partition bandswhich overlap the scalefactor band.  The maskings can be computedin the same way (and thus represent the average masking in that band)or by taking the minmum value multiplied by the number ofpartition bands used (which represents a minimum masking in that band).*//*The general outline is as follows:1. compute the energy in each partition band2. compute the tonality in each partition band3. compute the strength of each partion band "masker"4. compute the masking (via the spreading function applied to each masker)5. Modifications for mid/side masking.  Each partition band is considiered a "masker".  The strengthof the i'th masker in band j is given by:    s3(bark(i)-bark(j))*strength(i)The strength of the masker is a function of the energy and tonality.The more tonal, the less masking.  LAME uses a simple linear formula(controlled by NMT and TMN) which says the strength is given by theenergy divided by a linear function of the tonality.*//*s3() is the "spreading function".  It is given by a formuladetermined via listening tests.  The total masking in the j'th partition band is the sum overall maskings i.  It is thus given by the convolution ofthe strength with s3(), the "spreading function."masking(j) = sum_over_i  s3(i-j)*strength(i)  = s3 o strengthwhere "o" = convolution operator.  s3 is given by a formula determinedvia listening tests.  It is normalized so that s3 o 1 = 1.Note: instead of a simple convolution, LAME also has theoption of using "additive masking"The most critical part is step 2, computing the tonality of eachpartition band.  LAME has two tonality estimators.  The firstis based on the ISO spec, and measures how predictiable thesignal is over time.  The more predictable, the more tonal.The second measure is based on looking at the spectrum ofa single granule.  The more peaky the spectrum, the moretonal.  By most indications, the latter approach is better.Finally, in step 5, the maskings for the mid and sidechannel are possibly increased.  Under certain circumstances,noise in the mid & side channels is assumed to alsobe masked by strong maskers in the L or R channels.Other data computed by the psy-model:ms_ratio        side-channel / mid-channel masking ratio (for previous granule)ms_ratio_next   side-channel / mid-channel masking ratio for this granulepercep_entropy[2]     L and R values (prev granule) of PE - A measure of how                       much pre-echo is in the previous granulepercep_entropy_MS[2]  mid and side channel values (prev granule) of percep_entropyenergy[4]             L,R,M,S energy in each channel, prev granuleblocktype_d[2]        block type to use for previous granule*/#ifdef HAVE_CONFIG_H# include <config.h>#endif#include "util.h"#include "encoder.h"#include "psymodel.h"#include "l3side.h"#include <assert.h>#include "tables.h"#include "fft.h"#include "machine.h"#ifdef WITH_DMALLOC#include <dmalloc.h>#endif#define NSFIRLEN 21#ifdef M_LN10#define		LN_TO_LOG10		(M_LN10/10)#else#define         LN_TO_LOG10             0.2302585093#endif#ifdef NON_LINEAR_PSYstatic const float non_linear_psy_constant = .3;#define NON_LINEAR_SCALE_ITEM(x)   pow((x), non_linear_psy_constant)#define NON_LINEAR_SCALE_SUM(x)    pow((x), 1/non_linear_psy_constant)#if 0#define NON_LINEAR_SCALE_ENERGY(x) pow(10, (x)/10)#else#define NON_LINEAR_SCALE_ENERGY(x) (x)#endif#else#define NON_LINEAR_SCALE_ITEM(x)   (x)#define NON_LINEAR_SCALE_SUM(x)    (x)#define NON_LINEAR_SCALE_ENERGY(x) (x)#endif/*   L3psycho_anal.  Compute psycho acoustics.   Data returned to the calling program must be delayed by one    granule.    This is done in two places.     If we do not need to know the blocktype, the copying   can be done here at the top of the program: we copy the data for   the last granule (computed during the last call) before it is   overwritten with the new data.  It looks like this:     0. static psymodel_data    1. calling_program_data = psymodel_data   2. compute psymodel_data       For data which needs to know the blocktype, the copying must be   done at the end of this loop, and the old values must be saved:      0. static psymodel_data_old    1. compute psymodel_data   2. compute possible block type of this granule   3. compute final block type of previous granule based on #2.   4. calling_program_data = psymodel_data_old   5. psymodel_data_old = psymodel_data*//* psycho_loudness_approx   jd - 2001 mar 12in:  energy   - BLKSIZE/2 elements of frequency magnitudes ^ 2     gfp      - uses out_samplerate, ATHtype (also needed for ATHformula)returns: loudness^2 approximation, a positive value roughly tuned for a value         of 1.0 for signals near clipping.notes:   When calibrated, feeding this function binary white noise at sample         values +32767 or -32768 should return values that approach 3.         ATHformula is used to approximate an equal loudness curve.future:  Data indicates that the shape of the equal loudness curve varies         with intensity.  This function might be improved by using an equal         loudness curve shaped for typical playback levels (instead of the         ATH, that is shaped for the threshold).  A flexible realization might         simply bend the existing ATH curve to achieve the desired shape.         However, the potential gain may not be enough to justify an effort.*/static FLOATpsycho_loudness_approx( FLOAT *energy, lame_internal_flags *gfc ){    int i;    FLOAT loudness_power;    loudness_power = 0.0;    /* apply weights to power in freq. bands*/    for( i = 0; i < BLKSIZE/2; ++i )	loudness_power += energy[i] * gfc->ATH->eql_w[i];    loudness_power *= VO_SCALE;    return loudness_power;}static voidcompute_ffts(    lame_global_flags *gfp,    FLOAT fftenergy[HBLKSIZE],    FLOAT (*fftenergy_s)[HBLKSIZE_s],    FLOAT (*wsamp_l)[BLKSIZE],    FLOAT (*wsamp_s)[3][BLKSIZE_s],    int gr_out,    int chn,    const sample_t *buffer[2]    ){    int b, j;    lame_internal_flags *gfc=gfp->internal_flags;    if (chn<2) {	fft_long ( gfc, *wsamp_l, chn, buffer);	fft_short( gfc, *wsamp_s, chn, buffer);    }    /* FFT data for mid and side channel is derived from L & R */    else if (chn == 2) {	for (j = BLKSIZE-1; j >=0 ; --j) {	    FLOAT l = wsamp_l[0][j];	    FLOAT r = wsamp_l[1][j];	    wsamp_l[0][j] = (l+r)*(FLOAT)(SQRT2*0.5);	    wsamp_l[1][j] = (l-r)*(FLOAT)(SQRT2*0.5);	}	for (b = 2; b >= 0; --b) {	    for (j = BLKSIZE_s-1; j >= 0 ; --j) {		FLOAT l = wsamp_s[0][b][j];		FLOAT r = wsamp_s[1][b][j];		wsamp_s[0][b][j] = (l+r)*(FLOAT)(SQRT2*0.5);		wsamp_s[1][b][j] = (l-r)*(FLOAT)(SQRT2*0.5);	    }	}    }	    /*********************************************************************     *  compute energies     *********************************************************************/    fftenergy[0]  = NON_LINEAR_SCALE_ENERGY(wsamp_l[0][0]);    fftenergy[0] *= fftenergy[0];    for (j=BLKSIZE/2-1; j >= 0; --j) {	FLOAT re = (*wsamp_l)[BLKSIZE/2-j];	FLOAT im = (*wsamp_l)[BLKSIZE/2+j];	fftenergy[BLKSIZE/2-j] = NON_LINEAR_SCALE_ENERGY((re * re + im * im) * 0.5f);    }    for (b = 2; b >= 0; --b) {	fftenergy_s[b][0]  = (*wsamp_s)[b][0];	fftenergy_s[b][0] *=  fftenergy_s [b][0];	for (j=BLKSIZE_s/2-1; j >= 0; --j) {	    FLOAT re = (*wsamp_s)[b][BLKSIZE_s/2-j];	    FLOAT im = (*wsamp_s)[b][BLKSIZE_s/2+j];	    fftenergy_s[b][BLKSIZE_s/2-j] = NON_LINEAR_SCALE_ENERGY((re * re + im * im) * 0.5f);	}    }    /* total energy */    {FLOAT totalenergy=0.0;    for (j=11;j < HBLKSIZE; j++)	totalenergy += fftenergy[j];    gfc->tot_ener[chn] = totalenergy;    }#if defined(HAVE_GTK)    if (gfp->analysis) {	for (j=0; j<HBLKSIZE ; j++) {	    gfc->pinfo->energy[gr_out][chn][j]=gfc->energy_save[chn][j];	    gfc->energy_save[chn][j]=fftenergy[j];	} 	gfc->pinfo->pe[gr_out][chn]=gfc->pe[chn];    }#endif    /*********************************************************************     * compute loudness approximation (used for ATH auto-level adjustment)      *********************************************************************/    if (gfp->athaa_loudapprox == 2 && chn < 2) {/*no loudness for mid/side ch*/	gfc->loudness_sq[gr_out][chn] = gfc->loudness_sq_save[chn];	gfc->loudness_sq_save[chn]	    = psycho_loudness_approx(fftenergy, gfc);    }}/***************************************************************  * compute interchannel masking effects ***************************************************************/static voidcalc_interchannel_masking(    lame_global_flags * gfp,    FLOAT ratio    ){    lame_internal_flags *gfc=gfp->internal_flags;    int sb, sblock;    FLOAT l, r;    if (gfc->channels_out > 1){        for ( sb = 0; sb < SBMAX_l; sb++ ) {	        l = gfc->thm[0].l[sb];	        r = gfc->thm[1].l[sb];	        gfc->thm[0].l[sb] += r*ratio;	        gfc->thm[1].l[sb] += l*ratio;        }        for ( sb = 0; sb < SBMAX_s; sb++ ) {	        for ( sblock = 0; sblock < 3; sblock++ ) {	            l = gfc->thm[0].s[sb][sblock];	            r = gfc->thm[1].s[sb][sblock];	            gfc->thm[0].s[sb][sblock] += r*ratio;	            gfc->thm[1].s[sb][sblock] += l*ratio;	        }        }    }}/***************************************************************  * compute M/S thresholds from Johnston & Ferreira 1992 ICASSP paper ***************************************************************/static voidmsfix1(    lame_internal_flags *gfc    ){    int sb, sblock;    FLOAT rside,rmid,mld;    for ( sb = 0; sb < SBMAX_l; sb++ ) {	/* use this fix if L & R masking differs by 2db or less */	/* if db = 10*log10(x2/x1) < 2 */	/* if (x2 < 1.58*x1) { */	if (gfc->thm[0].l[sb] > 1.58*gfc->thm[1].l[sb]	 || gfc->thm[1].l[sb] > 1.58*gfc->thm[0].l[sb])	    continue;	mld = gfc->mld_l[sb]*gfc->en[3].l[sb];	rmid = Max(gfc->thm[2].l[sb], Min(gfc->thm[3].l[sb],mld));	mld = gfc->mld_l[sb]*gfc->en[2].l[sb];	rside = Max(gfc->thm[3].l[sb], Min(gfc->thm[2].l[sb],mld));	gfc->thm[2].l[sb]=rmid;	gfc->thm[3].l[sb]=rside;    }    for ( sb = 0; sb < SBMAX_s; sb++ ) {	for ( sblock = 0; sblock < 3; sblock++ ) {	    if (gfc->thm[0].s[sb][sblock] > 1.58*gfc->thm[1].s[sb][sblock]	     || gfc->thm[1].s[sb][sblock] > 1.58*gfc->thm[0].s[sb][sblock])		continue;	    mld = gfc->mld_s[sb]*gfc->en[3].s[sb][sblock];	    rmid = Max(gfc->thm[2].s[sb][sblock],		       Min(gfc->thm[3].s[sb][sblock],mld));	    mld = gfc->mld_s[sb]*gfc->en[2].s[sb][sblock];	    rside = Max(gfc->thm[3].s[sb][sblock],			Min(gfc->thm[2].s[sb][sblock],mld));	    gfc->thm[2].s[sb][sblock]=rmid;	    gfc->thm[3].s[sb][sblock]=rside;	}    }}/***************************************************************  * Adjust M/S maskings if user set "msfix" ***************************************************************//* Naoki Shibata 2000 */static voidns_msfix(    lame_internal_flags *gfc,    FLOAT msfix,    FLOAT athadjust    ){    int sb, sblock;    FLOAT msfix2 = msfix;    FLOAT athlower = pow(10, athadjust);    msfix *= 2.0;    msfix2 *= 2.0;    for ( sb = 0; sb < SBMAX_l; sb++ ) {	FLOAT thmLR,thmM,thmS,ath;	ath  = (gfc->ATH->cb[gfc->bm_l[sb]])*athlower;	thmLR = Min(Max(gfc->thm[0].l[sb],ath), Max(gfc->thm[1].l[sb],ath));	thmM = Max(gfc->thm[2].l[sb],ath);	thmS = Max(gfc->thm[3].l[sb],ath);	if (thmLR*msfix < thmM+thmS) {	    FLOAT f = thmLR * msfix2 / (thmM+thmS);	    thmM *= f;	    thmS *= f;	}	gfc->thm[2].l[sb] = Min(thmM,gfc->thm[2].l[sb]);	gfc->thm[3].l[sb] = Min(thmS,gfc->thm[3].l[sb]);    }    athlower *= ((FLOAT)BLKSIZE_s / BLKSIZE);    for ( sb = 0; sb < SBMAX_s; sb++ ) {	for ( sblock = 0; sblock < 3; sblock++ ) {	    FLOAT thmLR,thmM,thmS,ath;	    ath  = (gfc->ATH->cb[gfc->bm_s[sb]])*athlower;	    thmLR = Min(Max(gfc->thm[0].s[sb][sblock],ath),			Max(gfc->thm[1].s[sb][sblock],ath));	    thmM = Max(gfc->thm[2].s[sb][sblock],ath);	    thmS = Max(gfc->thm[3].s[sb][sblock],ath);	    if (thmLR*msfix < thmM+thmS) {		FLOAT f = thmLR*msfix / (thmM+thmS);		thmM *= f;		thmS *= f;	    }	    gfc->thm[2].s[sb][sblock] = Min(gfc->thm[2].s[sb][sblock],thmM);	    gfc->thm[3].s[sb][sblock] = Min(gfc->thm[3].s[sb][sblock],thmS);	}    }}/* longblock threshold calculation (part 2) */static void convert_partition2scalefac_l(    lame_internal_flags *gfc,    FLOAT *eb,    FLOAT *thr,    int chn    ){    FLOAT enn, thmm;    int sb, b;    enn = thmm = 0.0;    sb = b = 0;    for (;;) {	while (b < gfc->bo_l[sb]) {	    enn  += eb[b];	    thmm += thr[b];	    b++;	}	if (sb == SBMAX_l - 1)	    break;    assert( enn >= 0 );    assert( thmm >= 0 );	gfc->en [chn].l[sb] = enn  + 0.5 * eb [b];	gfc->thm[chn].l[sb] = thmm + 0.5 * thr[b];	enn  = 0.5 *  eb[b];	thmm = 0.5 * thr[b];    assert( enn >= 0 );    assert( thmm >= 0 );	b++;	sb++;    }    gfc->en [chn].l[SBMAX_l-1] = enn;    gfc->thm[chn].l[SBMAX_l-1] = thmm;}static voidcompute_masking_s(    lame_internal_flags *gfc,    FLOAT (*fftenergy_s)[HBLKSIZE_s],    FLOAT *eb,    FLOAT *thr,    int chn,    int sblock,    FLOAT athlower    ){    int j, b;    athlower *= ((FLOAT)BLKSIZE_s / BLKSIZE);    for (j = b = 0; b < gfc->npart_s; b++) {	FLOAT ecb = fftenergy_s[sblock][j++];	int kk = gfc->numlines_s[b];
12 3 4 下一页
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -