normsinv in SQL SERVER 2012

by Slava on 05/24/13
in SQL Server

This is a slightly modified function for SQL SERVER 2012. It's an imporant function, for example, for calculating safety deposit.

Code:

-- Returns the inverse of the standard normal cumulative distribution.
-- The distribution has a mean of zero and a standard deviation of one.
CREATE FUNCTION normsinv(@p FLOAT)
RETURNS FLOAT
AS
BEGIN
DECLARE @a1 FLOAT = -39.6968302866538
DECLARE @a2 FLOAT = 220.946098424521
DECLARE @a3 FLOAT = -275.928510446969
DECLARE @a4 FLOAT = 138.357751867269
DECLARE @a5 FLOAT = -30.6647980661472
DECLARE @a6 FLOAT = 2.50662827745924
DECLARE @b1 FLOAT = -54.4760987982241
DECLARE @b2 FLOAT = 161.585836858041
DECLARE @b3 FLOAT = -155.698979859887
DECLARE @b4 FLOAT = 66.8013118877197
DECLARE @b5 FLOAT = -13.2806815528857
DECLARE @c1 FLOAT = -0.00778489400243029
DECLARE @c2 FLOAT = -0.322396458041136
DECLARE @c3 FLOAT = -2.40075827716184
DECLARE @c4 FLOAT = -2.54973253934373
DECLARE @c5 FLOAT = 4.37466414146497
DECLARE @c6 FLOAT = 2.93816398269878
DECLARE @d1 FLOAT = 0.00778469570904146
DECLARE @d2 FLOAT = 0.32246712907004
DECLARE @d3 FLOAT = 2.445134137143
DECLARE @d4 FLOAT = 3.75440866190742
DECLARE @plow FLOAT = 0.02425
DECLARE @phigh FLOAT = 1-@plow
DECLARE @q FLOAT
DECLARE @r FLOAT
DECLARE @result FLOAT
IF (@p<@plow)
BEGIN
SET @q = Sqrt(-2 * LOG(@p))
SET @result=(((((@c1 * @q + @c2) * @q + @c3) * @q + @c4) * @q + @c5) * @q + @c6) / ((((@d1 * @q + @d2) * @q + @d3) * @q + @d4) * @q + 1)
END
ELSE
BEGIN
IF (@p<@phigh)
BEGIN
SET @q =@p - 0.5
SET @r = @q * @q
SET @result= (((((@a1 * @r + @a2) * @r + @a3) * @r + @a4) * @r + @a5) * @r + @a6) * @q / (((((@b1 * @r + @b2) * @r + @b3) * @r + @b4) * @r + @b5) * @r + 1)
END
ELSE
BEGIN
SET @q = SQRT(-2 * LOG(1 - @p))
SET @result= -(((((@c1 * @q + @c2) * @q + @c3) * @q + @c4) * @q + @c5) * @q + @c6) / ((((@d1 * @q + @d2) * @q + @d3) * @q + @d4) * @q + 1)
END
END
RETURN @result
END
GO
normsinv.sql

Passed working days coefficient

by Slava on 05/08/13
in Excel

I have been told a problem about calculating a coefficient for passed working days which must be automatically calculated by Excel. It equals to the number of passed working days divided on the number of all working days in the month. In Excel it can be done using only embedded formulas.

Firstly we are to make a list of all the "Holidays" in the year. Like that:

Code:

01.01.2013
02.01.2013
03.01.2013
04.01.2013
07.01.2013
08.01.2013
08.03.2013
01.05.2013
02.05.2013
03.05.2013
09.05.2013
10.05.2013
12.06.2013
04.11.2013

Then we name the range as holidays for simplicity.

The denominator will be

Code:

=NETWORKDAYS(DATE(YEAR(TODAY()),MONTH(TODAY()),1),EOMONTH(TODAY(),0),Holidays)
The numerator will be

Code:

=NETWORKDAYS(DATE(YEAR(TODAY()),MONTH(TODAY()),1),TODAY(),Holidays)

Then we can obtain the coefficient just by plain dividing operation.

This coefficient is used in forecast monitoring and KPI reporting.

Spatial distance calculation problem

In marketing we often use distance to the buyer as a parameter for data mining. There are plenty of ways and sources how to calculate it, including Google Maps, Yandex Maps, spatial databases etc. Here I present some solution to calculate it having only longitude and latitude of the place.

My data are:

These are the coordinates of New York and Washington. I want to calculate the distance between these cities as easy as possible and as accurate as possible.

At once let's install the earthdistance PostreSQL contrib module.

Code:

CREATE EXTENSION cube;
CREATE EXTENSION earthdistance;

Secondly, let's check the custom function of Stou S. from here.

Code:

-- Heaver Sine
CREATE OR REPLACE FUNCTION heaver_sin(FLOAT)
RETURNS FLOAT
AS
$$
SELECT power(sin($1/2.0), 2);
$$
LANGUAGE SQL;
 
-- Heaver Arcsine
CREATE OR REPLACE FUNCTION arc_heaver_sin(FLOAT)
RETURNS FLOAT
AS
$$
SELECT 2.0*asin(sqrt($1));
$$
LANGUAGE SQL;
 
-- Provides the Earth radius
CREATE OR REPLACE FUNCTION earth_radius(lat FLOAT)
RETURNS float
AS
$$
DECLARE rad_equatorial FLOAT := 6378.137;
rad_polar FLOAT := 6356.752;
BEGIN
RETURN sqrt(power(rad_equatorial, 2) * power(cos(lat), 2) + power(rad_polar, 2)* power(sin(lat), 2)/rad_equatorial*power(cos(lat), 2) + rad_polar*power(sin(lat), 2));
END;
$$
LANGUAGE plpgsql;
 
-- Give two sets of coordinates in degree form
CREATE OR REPLACE FUNCTION distance_in_km(lat1 float, lon1 float, lat2 float, lon2 float)
RETURNS float
AS
$$
DECLARE lat1_r FLOAT := radians(lat1);
lon1_r FLOAT := radians(lon1);
lat2_r FLOAT := radians(lat2);
lon2_r FLOAT := radians(lon2);
BEGIN RETURN earth_radius((lat1_r + lat2_r) / 2.0) * arc_heaver_sin(heaver_sin(abs(lat1_r-lat2_r)) + cos(lat1_r)*cos(lat2_r)*heaver_sin(abs(lon1_r - lon2_r)));
END;
$$
LANGUAGE plpgsql;

Thirdly, we can create our own function based of the very well-known expression.

Code:

CREATE OR REPLACE FUNCTION CalculateDistance(lat1 float8, long1 float8, lat2 float8, long2 float8)
RETURNS FLOAT
AS
$$
SELECT atan2((sqrt(power(cos(lat2 * pi() / 180) * sin(long2 * pi() / 180 - long1 * pi() / 180), 2) + power(cos(lat1 * pi() / 180) * sin(lat2 * pi() / 180) - sin(lat1 * pi() / 180) * cos(lat2 * pi() / 180) * cos(long2 * pi() / 180 - long1 * pi() / 180), 2))), (sin(lat1 * pi() / 180) * sin(lat2 * pi() / 180) + cos(lat1 * pi() / 180) * cos(lat2* pi() / 180) * cos(long2 * pi() / 180 - long1 * pi() / 180))) * 6372.795477598;
$$
LANGUAGE SQL;

Sorry if not comprehensive but I tried not to use PL/pgSQL. First you must notice is the different values of Earth radius which dramatically influences on the final result. But let's calculate the distance at last.

Code:

SELECT ROUND( ('(-74.00583, 40.71417)'::point'(-77.03611, 38.895)')::NUMERIC * 1.609344 ) AS "km (earthdistance extension)"
, distance_in_km(40.71417, -74.00583, 38.895, -77.03611) AS "km (distance_in_km custom function)"
, CalculateDistance(40.71417, -74.00583, 38.895, -77.03611) AS "km (CalculateDistance custom function)";

The results are: 328 (earthdistance module), 252.65 (distance_in_km function) and 328.57(CalculateDistance function). The second function looks like unreliable but you always can edit it.

Now I am moving to SQL SERVER and I want to use the embedded STDistance function for this task. The MSDN says it "returns the shortest distance between a point in a geometry instance and a point in another geometry instance".

Code:

DECLARE @g geography,@h geography
SET @g=geography::Point(-74.00583,40.71417,4326)
SET @h=geography::Point(-77.03611,38.895,4326)
SELECT @h.STDistance(@g)/1000 AS sqlserver_stdistance
GO

The result is 342, which is slightly higher than the other functions' yield.

Google Maps uses roads to calculate the distance and shows 368.3.

That's at no means the full list of possible function but I hope it helps.

Deleting files from a SQL SERVER database: 'the file cannot be removed because it is not empty'

by Slava on 04/25/13
in SQL Server

Sometimes you happen not to need a file in your database: additional file in tempdb or another one, it doesn't matter. And when you try to delete it SQL Server shows the message about the impossibility of such an operation. So you have to empty it first just by querying:

Code:

DBCC SHRINKFILE('name', EMPTYFILE)

After that you are free to delete it forever.

The local name of the file and other information about all the database's files running:

Code:

USE DBName
GO
EXEC sp_helpfile
GO

Change caller id on a2billing for each client

by Slava on 03/31/13
in A2billing

For each client in a2billing you change the caller id to anything by using the CID NUMBER option in VoIP settings.

This option overrides the existing caller id and you will make a call with anticipated new one.

Confidence Intervals simple calculation

by Slava on 03/28/13
in SPSS

Suppose, we have a task: calculate the confidence intervals (limits) for the calculated mean of number of consumed bottles of milk for 15 or 45 respondents. The first mean is 4.5, the first standard deviation is 1.5. The second mean is 5.6, the second standard deviation is 1.9.

The first case has a few observations so we can't use normal distribution values here. So we're using Student's t distribution instead.

The formulae are

Code:

Lower/Upper Boundary of 95% Confidence Intervals = mean +/- (t-value*std.error) = mean +/- (t-value*(std.dev./sqrt(n))) =
mean +/- (IDF.T(0.975, n-1)*(std.dev./sqrt(n)))
In SPSS we compute:

Code:

COMPUTE lower1=4.5-(1.5/sqrt(15))*IDF.T(0.975,15-1).
EXECUTE.
 
COMPUTE upper1=4.5+(1.5/sqrt(15))*IDF.T(0.975,15-1).
EXECUTE.

The second case has enough observations for us to try to use the normal distribution and corresponding z-scores.

The formulae will be

Code:

Lower/Upper Boundary of 95% Confidence Intervals = mean +/- (z-score*std.error) = mean +/- (z-score*(std.dev./sqrt(n))) =
mean +/- (IDF.NORMAL(0.975,0,1)*(std.dev./sqrt(n)))
In SPSS we compute:

Code:

COMPUTE upper2=5.6+(1.9/sqrt(45))*IDF.NORMAL(0.975,0,1).
EXECUTE.
 
COMPUTE lower2=5.6-(1.9/sqrt(45))*IDF.NORMAL(0.975,0,1).
EXECUTE.

The results are 3.67-5.33 and 5.04-6.16. Everybody can infer that Student's t distribution isn't a great option but SPSS uses it in most analyses as a default.

The necessary minimum sample size for specific margin of error and standard deviation

by Slava on 03/25/13
in SPSS

Calculating confidence limits, we sometimes want to calculate the minimum sample size (n) to achieve the specific margin of error (m), standard deviation (std.dev.) and level of confidence, denoted as a z-score (z). So it can help us define the necessary number of observations to gather to make a report statistically appropriate.

The expression we are to use is:

Code:

n = ((z*std.dev.)/m)^2

Let's consider an example. We have statistics for the price for the new hi-tech wax which our respondents (car owners) might pay for. The number of observations isn't sufficient for us as we found out the crappy results of analysis. We want to know how many new observations we have to gather to fit our confidence limits. The standard deviation is $3.5, m=$0.5, z=1.96 (95% confidence).

Calculating the appropriate n:

Code:

COMPUTE n=((1.96*3.5)/0.5) ** 2.
EXECUTE.

We have got 188.24 as n so we need 189 respondents' answers.

Calculating Probability without Standard Normal Distribution Table

by Slava on 03/22/13
in SPSS

The standard normal distribution table helps us answer the following questions:

  1. Is there a difference in the means of population and a custom sample?
  2. What is the probability of an event with custom parameters?

Looking up the z-score values in the table is boring so I offer to calculate the p-value on the fly.

Examples will be:

  1. Suppose that young people eat 2340 calories per day on the average basis with 250 calories as the standard deviation. Somebody conducted an analysis on 45 youngsters and obtained 2450 calories per day. Does the researches have different means or the difference is just a coincidence?
  2. The average age of marketing managers is 35.2 with the standard deviation 7. What's the probability of hiring а 24 years old or younger marketing manager?

Solutions:

  1. To answer the question let's test the null hypothesis so the mean equals 2340. The alternative hypothesis will be two-tailed - the mean does not equals 2340, we don't care whether it is higher or lower. Let's compute the z-score.

    Code:

    COMPUTE zscore = (2450-2340)/(250/SQRT(45)).
    EXECUTE.
    We divided the standard deviation by square root of 45 because we are checking the probability of the mean for 45 observations and hence we got the z-score = 2.95. Therefore let's calculate the probability or p-value. As we have the two-tailed hypothesis we have to multiply the p-value by 2 so we can get p-values for p>2.95 and p<-2.95 (keep in mind our testing value is more than the mean and we need the area above the test statistic).

    Code:

    COMPUTE prob=2*(1-CDF.NORMAL(zscore,0,1)).
    EXECUTE.
    We've got 0.0031 or 0.31% of probability of the sample group of 45 youngsters to eat 2450 calories per day if the mean equals 2340 calories. So we are rejecting the null hypothesis and conclude that the average calories consumption for our sample is definitely higher that 2340.
  2. Here we have the one-tailed alternative hypothesis: the average age must be below 35.2. Actually we want to get the probability of obtaining 24 for only one observation. Let's calculate the z-score and the p-value. Since 24 is less than the mean (imagine a chart of standard normal curve) we have to take the area below the z-score and shouldn't subtract the calculated p-value from 1.

    Code:

    COMPUTE zage = (24-35.2)/7.
    EXECUTE.
     
    COMPUTE probage = CDF.NORMAL(zage,0,1).
    EXECUTE.
    We have got -1.6 as the z-score and 5.47% as the probability. So hiring a 24 years old or younger marketing manager is on the verge of the reality but we can't accept the alternative hypothesis and say it's unreal with the mean of 35.2.

So with the help of CDF.NORMAL function we don't need any tables any more though graphic interpretation would be helpful.

Detecting outliers in SPSS: using z-scores for the task

by Slava on 03/21/13
in SPSS

Detecting outliers is one of the must-do phases of the data preparation in SPSS. Without doing it one risks to obtain wrong incomprehensive results for interpretation. There are some effective technique for this task: manual frequency tables observation or boxplots. But you can use z-scores (standardized values) for this. Z-scores are transformed observations of variables so there is a mean that equals 0 and standard deviation that equals 1.

To obtain it one has to subtract the mean from an observation and divide the result on the standard deviation. Or use menu Descriptives to save the z-scores as a new variable. Therefore one needs to calculate how many absolute values of the z-scores fall within specific limits according to normal distribution parameters:

  • 1.96 (5% of all observations)
  • 2.58 (1% of all observations)
  • 3.29 (0% of all observations)

The SPSS syntax for quick calculation is following. The test file age.sav can be downloaded below.

Create a syntax file:

Code:

DESCRIPTIVES   VARIABLES  age/SAVE.
 
COMPUTE AbsOutlier = abs(Zage).
 
EXECUTE.
 
RECODE AbsOutlier (3.29 THRU HIGHEST = 1)
(2.58 THRU HIGHEST   = 2) (1.96 THRU HIGHEST   = 3)
(LOWEST THRU 1.96 = 4).
 
VALUE LABELS AbsOutlier
1 'Greater than 3.29'
2 'Greater than 2.58'
3 'Greater than 1.96'
4 'Less than 1.96'.
 
FREQUENCIES   VARIABLES = AbsOutlier
/ORDER=ANALYSIS.

Run it and the output will be like that.

Now there is a nice supportive table for outliers detecting.

age.sav

A Caller behind a NAT can't hear anything

by Slava on 03/14/13
in A2billing

This is by no means a confusing situation when a caller who is sitting behind a NAT can't hear the voice. Almost everyone who has ever touched Asterisk encountered such a situation. There are many possible problem's reasons but the most common one is a no-NAT tuned server.

The solution usually is simple as: in sip.conf (or another substantive file or GUI menu if used) specify

Code:

nat=yes
canreinvite=no
qualify=yes

Thus you must do it on a global level, not for a context.

1 2 3 4 5 6 7 8 Next Page >>