Variational series of variants. Variational and statistical distribution series. How to interpret the value of the Wilcoxon criterion

A set of values of a parameter studied in a given experiment or observation, ordered by magnitude (ascending or descending), is called a variational series.

Suppose we measured blood pressure in ten patients and recorded only the upper value of blood pressure, the systolic pressure, i.e. a single number per patient.

Imagine that a series of observations (statistical aggregate) of arterial systolic pressure in 10 observations is as follows (Table 1):

Table 1

The elements of a variational series are called variants (options). A variant is a numerical value of the characteristic under study.

Building a variational series from a statistical aggregate of observations is only the first step towards understanding the characteristics of the whole aggregate. Next, it is necessary to determine the average level of the quantitative characteristic under study (the average level of blood protein, the average weight of the patients, the average time to onset of anesthesia, etc.).

The average level is measured using criteria called average values. An average value is a generalizing numerical characteristic of qualitatively homogeneous values that describes the entire statistical aggregate with a single number for one characteristic. The average value expresses what is common and typical of the characteristic in the given set of observations.

Three types of average values are commonly used: the mode (Mo), the median (Me) and the arithmetic mean (M).

To determine any average, it is necessary to use the results of individual observations, writing them in the form of a variational series (Table 2).

Mode - the value that occurs most often in a series of observations. In our example the mode is Mo = 120. If no value in the variational series is repeated, the series is said to have no mode. If several values are repeated the same number of times, the smallest of them is taken as the mode.

Median - the value that divides the distribution into two equal parts, i.e. the central value of a series of observations ordered in ascending or descending order. If the variational series has 5 values, its median is the third member of the series; if the series has an even number of members, the median is the arithmetic mean of its two central observations, e.g. with 10 observations the median is the arithmetic mean of the 5th and 6th observations. In our example, Me = 120.

We note an important feature of the mode and the median: their values are not affected by the numerical values of the extreme variants.

The arithmetic mean is calculated by the formula:

$$M = \frac{\sum_{i=1}^{n} V_i}{n}$$

where V_i is the value observed in the i-th observation and n is the number of observations. For our case, M = 1190 / 10 = 119.

The arithmetic mean has three properties:

The mean occupies the middle position in the variational series. In a strictly symmetrical series, M = Mo = Me.

The mean is a generalizing value: random fluctuations and differences in individual data are not visible behind it. It reflects what is typical of the whole aggregate.

The sum of the deviations of all variants from the mean is zero: Σ (V - M) = 0. The deviation of a variant from the mean is denoted d = V - M.

A variational series consists of variants and their corresponding frequencies. Among the ten values, 120 occurred 6 times, 115 occurred 3 times and 125 occurred once. The frequency (P) is the absolute number of occurrences of an individual variant in the aggregate; it shows how many times that variant is found in the variational series.
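The three averages described above can be checked with a short Python sketch. The ten readings are an assumption: they are rebuilt from the frequencies just stated (115 three times, 120 six times, 125 once), because the original order of the observations in Table 1 is not available. The tie rule for the mode follows the convention given in the text.

```python
from collections import Counter
from statistics import mean, median

# Assumed readings, rebuilt from the stated frequencies (order is arbitrary).
readings = [120, 115, 120, 125, 120, 115, 120, 120, 115, 120]

series = sorted(readings)            # ranked variational series
freq = Counter(series)               # variant -> frequency
top = max(freq.values())
# Smallest of the most frequent variants, following the tie rule in the text.
mode = min(v for v, p in freq.items() if p == top)

print(series)
print("mode:", mode, "median:", median(series), "mean:", mean(series))
```

With these values the mode and the median are both 120 and the mean is 119; the single extreme reading of 125 shifts the mean but not the other two averages.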

A variational series may be simple (each frequency = 1) or grouped (shortened, by 3-5 variants). A simple series is used when the number of observations is small, a grouped series when it is large.

A special place in statistical analysis belongs to determining the average level of the characteristic or phenomenon under study. The average level of a characteristic is measured by average values.

An average value characterizes the overall quantitative level of the characteristic under study and is a group property of the statistical aggregate. It levels out and weakens the random deviations of individual observations in either direction and brings out the main, typical property of the characteristic.

Average values are widely used:

1. To assess the health of the population: characteristics of physical development (height, weight, chest circumference, etc.), determining the prevalence and duration of various diseases, analysis of demographic indicators (natural population movement, average life expectancy, population reproduction, average population size, etc.).

2. To study the activities of treatment and prevention institutions and medical personnel and to assess the quality of their work, and to plan and determine the population's needs for various types of medical care (the average number of requests or visits per resident per year, the average length of a patient's hospital stay, the average duration of a patient's examination, the average provision of doctors, hospital beds, etc.).

3. To characterize the sanitary and epidemiological situation (the average dust content of the air in a workshop, the average floor area per person, the average consumption norms for proteins, fats and carbohydrates, etc.).

4. To determine medical and physiological indicators in health and disease, when processing laboratory data, and to establish the reliability of the results of a sample study in socio-hygienic, clinical and experimental research.

The calculation of average values is based on variational series. A variational series is a qualitatively homogeneous statistical aggregate whose individual units characterize the quantitative differences in the characteristic or phenomenon under study.

Quantitative variation can be of two types: discontinuous (discrete) and continuous.

A discontinuous (discrete) characteristic is expressed only as an integer and cannot take any intermediate values (for example, the number of visits, the size of the population, the number of children in a family, the severity of a disease in points, etc.).

A continuous characteristic can take any value within certain limits, including fractional values, and is expressed only approximately (for example, weight, which for adults may be rounded to kilograms and for newborns to grams; height; blood pressure; time spent on seeing a patient, etc.).



The numerical value of each individual characteristic or phenomenon included in a variational series is called a variant and is denoted by the letter V. Other notations, such as x or y, are found in the mathematical literature.

A variational series in which each variant is listed once is called simple. Such series are used in most statistical problems when the data are processed by computer.

As the number of observations grows, repeated values of the variants usually appear. In this case a grouped variational series is created, in which the number of repetitions is indicated (the frequency, denoted by the letter "P").

A ranked variational series consists of variants arranged in ascending or descending order. Both simple and grouped series can be ranked.

An interval variational series is compiled to simplify subsequent calculations performed without a computer when the number of observation units is very large (more than 1000).

A continuous variational series includes variants that can take any values.

If the values of the characteristic (variants) in a variational series are given as individual specific numbers, the series is called discrete.

The generalizing characteristics of the values reflected in a variational series are average values. The most commonly used are the arithmetic mean M, the mode Mo and the median Me. Each of these characteristics is distinctive. They cannot replace one another, and only together do they describe the features of a variational series fully and concisely.

The mode (Mo) is the value of the most frequently occurring variant.

The median (Me) is the value of the variant that divides a ranked variational series in half (half of the variants lie on each side of the median). In the rare cases when the variational series is symmetrical, the mode and the median are equal to each other and coincide with the arithmetic mean.

The most typical characteristic of the values of the variants is the arithmetic mean (M). In the mathematical literature it is denoted x̄.

The arithmetic mean (M) is the generalized quantitative characteristic of a given attribute of the phenomena under study that make up a qualitatively homogeneous statistical aggregate. A distinction is made between the simple and the weighted arithmetic mean. The simple arithmetic mean is calculated for a simple variational series by summing all the variants and dividing this sum by the total number of variants in the series. The calculation uses the formula:

$$M = \frac{\sum V}{n}$$

where: M - simple arithmetic mean;

Σ V - sum of the variants;

n - number of observations.

For a grouped variational series, the weighted arithmetic mean is determined. It is calculated by the formula:

$$M = \frac{\sum V P}{n}$$

where: M - weighted arithmetic mean;

Σ VP - sum of the products of the variants and their frequencies;

n - number of observations.
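As a minimal sketch of the weighted mean, the Python function below multiplies each variant by its frequency and divides the total by n. The function name `weighted_mean` is illustrative, and n is assumed to equal the sum of the frequencies; the data are the grouped blood-pressure values quoted earlier in the text.

```python
def weighted_mean(variants, frequencies):
    """Weighted arithmetic mean M = sum(V * P) / n, with n = sum(P)."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("the series contains no observations")
    return sum(v * p for v, p in zip(variants, frequencies)) / n

# Grouped series from the blood-pressure example: variants V, frequencies P.
V = [115, 120, 125]
P = [3, 6, 1]
print(weighted_mean(V, P))  # 119.0
```

The result equals the simple mean of the ten ungrouped readings, which is the point of weighting by frequency.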

With a large number of observations in the case of manual calculations, the method of moments can be applied.

The arithmetic mean has the following properties:

· The sum of the deviations of the variants from the mean ( Σ d ) is equal to zero (see Table 15);

· When all the variants are multiplied (divided) by the same factor (divisor), the arithmetic mean is multiplied (divided) by the same factor (divisor);

· If the same number is added to (subtracted from) all the variants, the arithmetic mean increases (decreases) by that number.
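The three properties listed above can be verified numerically. The sketch below uses a small made-up data set (not taken from the text) chosen so that all intermediate results are whole numbers.

```python
from statistics import mean

values = [2, 4, 4, 6, 9]            # illustrative data, not from the text
M = mean(values)                    # 5

# Property 1: deviations from the mean sum to zero.
deviations = [v - M for v in values]
print(sum(deviations))              # 0

# Property 2: multiplying every variant by k multiplies the mean by k.
k = 3
print(mean([v * k for v in values]) == M * k)   # True

# Property 3: adding c to every variant adds c to the mean.
c = 10
print(mean([v + c for v in values]) == M + c)   # True
```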

Arithmetic means taken on their own, without regard to the variability of the series from which they were calculated, may not fully reflect the properties of a variational series, especially when they have to be compared with other means. Means of similar value can be obtained from series with different degrees of scatter. The closer the individual variants are to one another in value, the smaller the scatter (variability) of the series and the more typical its mean.

The main parameters for evaluating the variability of a characteristic are:

· Limits;

· Amplitude;

· Root-mean-square deviation;

· Coefficient of variation.

The variability of a characteristic can be judged approximately from the limits and the amplitude of the variational series. The limits are the maximum (V max) and minimum (V min) variants in the series. The amplitude (A m) is the difference between these variants: A m = V max - V min.

The main, generally accepted measure of the variability of a variational series is the dispersion, or variance (D). The parameter used most often, however, is a more convenient one calculated from the dispersion: the root-mean-square deviation (σ). It takes into account the deviation (d) of each variant of the series from the arithmetic mean (d = V - M).

Since the deviations of the variants from the mean can be positive or negative, they sum to zero (Σ d = 0). To avoid this, the deviations (d) are squared and then averaged. Thus the dispersion of a variational series is the mean square of the deviations of the variants from the arithmetic mean and is calculated by the formula:

$$D = \frac{\sum d^2}{n}$$

The dispersion is the most important characteristic of variability and is used in calculating many statistical criteria.

Since the dispersion is expressed in squared deviations, its value cannot be compared directly with the arithmetic mean. For this purpose the root-mean-square deviation is used, denoted by the sign "sigma" (σ). It characterizes the average deviation of all the variants of the series from the arithmetic mean in the same units as the mean itself, so the two can be used together.

The root-mean-square deviation is determined by the formula:

$$\sigma = \sqrt{\frac{\sum d^2}{n}}$$

This formula is applied when the number of observations (n) is greater than 30. With a smaller n, the value of the root-mean-square deviation contains an error due to mathematical bias (n - 1). A more accurate result can therefore be obtained by taking this bias into account in the formula for the standard deviation:

$$s = \sqrt{\frac{\sum d^2}{n - 1}}$$

The standard deviation (s) is an estimate of the root-mean-square deviation of a random variable X from its mathematical expectation, based on an unbiased estimate of its dispersion.

For n > 30 the root-mean-square deviation (σ) and the standard deviation (s) are practically the same (σ ≈ s). Therefore most practical manuals treat these criteria as equivalent. In Excel the standard deviation can be calculated with the function =СТАНДОТКЛОН(range) (STDEV in the English-language version). To calculate the root-mean-square deviation, an appropriate formula has to be written.
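The difference between dividing by n and by n - 1 can be seen in a short Python sketch. The function name `dispersion` and its `sample` flag are illustrative; the ten values are an assumption rebuilt from the blood-pressure frequencies quoted earlier in the text (115 three times, 120 six times, 125 once).

```python
from math import sqrt

def dispersion(values, sample=False):
    """Mean squared deviation from the arithmetic mean.

    sample=False divides by n (sigma squared);
    sample=True divides by n - 1 (s squared).
    """
    n = len(values)
    m = sum(values) / n
    squared = sum((v - m) ** 2 for v in values)
    return squared / (n - 1 if sample else n)

# Assumed readings, rebuilt from the frequencies stated in the text.
values = [115, 115, 115, 120, 120, 120, 120, 120, 120, 125]

sigma = sqrt(dispersion(values))              # divides by n
s = sqrt(dispersion(values, sample=True))     # divides by n - 1
print(round(sigma, 2), round(s, 2))           # 3.0 3.16
```

With only ten observations the two estimates differ noticeably; as n grows, the ratio of the two approaches 1.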

The root-mean-square or standard deviation makes it possible to determine how strongly the values of a characteristic may differ from the mean. Suppose there are two cities with the same average daytime temperature in summer. One of them is on the coast and the other is inland. It is known that in coastal cities the differences in daytime temperatures are smaller than in inland cities. Therefore the root-mean-square deviation of daytime temperatures will be smaller for the coastal city than for the inland one. In practice this means that the air temperature on any particular day in the inland city will differ more from the average than in the coastal city. In addition, the standard deviation makes it possible to estimate possible deviations of the temperature from the average with the required level of probability.

According to probability theory, in phenomena that obey the normal distribution law there is a strict relationship between the values of the arithmetic mean, the root-mean-square deviation and the variants (the three-sigma rule). For example, 68.3% of the values of a varying characteristic lie within M ± 1σ, 95.5% within M ± 2σ and 99.7% within M ± 3σ.
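The percentages of the three-sigma rule follow from the normal distribution function. As a sketch, the share of values within k sigma of the mean can be computed with the error function from Python's standard library; the function name `share_within` is illustrative.

```python
from math import erf, sqrt

def share_within(k):
    """Probability that a normal variable lies within k sigma of its mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    # Prints the share in percent, to two decimal places.
    print(k, round(100 * share_within(k), 2))
```

The computed shares are about 68.27%, 95.45% and 99.73%, which the text quotes rounded to one decimal place.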

The size of the root-mean-square deviation makes it possible to judge the homogeneity of the variational series and of the group under study. If the root-mean-square deviation is small, this indicates a fairly high homogeneity of the phenomenon under study, and the arithmetic mean should be regarded as quite characteristic of the series. However, a sigma that is too small suggests an artificial selection of observations. With a very large sigma, the arithmetic mean characterizes the variational series to a lesser degree, which indicates considerable variability of the characteristic or phenomenon under study, or heterogeneity of the group. Root-mean-square deviations can be compared, however, only for characteristics of the same dimension. Indeed, if we compare the variability of the weight of newborns and of adults, we will always obtain higher sigma values for adults.

The variability of characteristics of different dimensions can be compared using the coefficient of variation. It expresses the variability as a percentage of the mean, which makes it possible to compare different characteristics. The coefficient of variation is denoted by "C" in the medical literature and by "v" in the mathematical literature, and is calculated by the formula:

$$C = \frac{\sigma}{M} \cdot 100\%$$

Values of the coefficient of variation below 10% indicate small scatter, from 10 to 20% medium scatter, and above 20% strong scatter of the variants around the arithmetic mean.
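A small Python sketch of the coefficient of variation and of the grading just described. The function names are illustrative, and treating exactly 10% and 20% as "medium" is an assumption, since the text does not say which grade the boundary values belong to. The inputs σ = 3 and M = 119 are the values obtained for the assumed blood-pressure readings.

```python
def coefficient_of_variation(sigma, mean):
    """C = sigma / M * 100, in percent."""
    if mean == 0:
        raise ValueError("the coefficient is undefined for a zero mean")
    return sigma / mean * 100

def scatter_level(cv):
    """Grading from the text: below 10 small, 10-20 medium, above 20 strong."""
    if cv < 10:
        return "small"
    if cv <= 20:
        return "medium"
    return "strong"

cv = coefficient_of_variation(3.0, 119.0)
print(round(cv, 1), scatter_level(cv))   # 2.5 small
```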

The arithmetic mean is usually calculated from sample data. In repeated studies it may change under the influence of random factors. This is because, as a rule, only part of the possible units of observation, i.e. a sample, is studied. Information about all possible units representing the phenomenon under study can be obtained by studying the entire general population, which is not always possible. At the same time, in order to generalize the experimental data, it is the value of the mean in the general population that is of interest. Therefore, to formulate a general conclusion about the phenomenon under study, the results obtained from the sample must be transferred to the general population by statistical methods.

To determine how closely the sample study agrees with the general population, the size of the error that inevitably arises in sample observation must be estimated. This error is called the "error of representativeness" or the "mean error of the arithmetic mean". It is in effect the difference between the mean obtained by sample statistical observation and the analogous value that would be obtained by a complete study of the same object, i.e. by studying the general population. Since the sample mean is a random variable, such a prediction is made with a probability acceptable to the researcher. In medical research it is at least 95%.

The error of representativeness must not be confused with recording errors or errors of attention, etc., which must be kept to a minimum by adequate methods and tools used in the experiment.

The size of the error of representativeness depends both on the sample size and on the variability of the characteristic. The greater the number of observations, the closer the sample is to the general population and the smaller the error. The more variable the characteristic, the greater the statistical error.

In practice, the error of representativeness in variational series is determined by the following formula:

$$m = \frac{\sigma}{\sqrt{n}}$$

where: m - error of representativeness;

σ - root-mean-square deviation;

n - number of observations in the sample.

The formula shows that the size of the mean error is directly proportional to the root-mean-square deviation, i.e. to the variability of the characteristic under study, and inversely proportional to the square root of the number of observations.
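The dependence on the square root of n can be illustrated with a few lines of Python. The function name `standard_error` and the input values are illustrative.

```python
from math import sqrt

def standard_error(sigma, n):
    """Error of representativeness m = sigma / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive number of observations")
    return sigma / sqrt(n)

m_10 = standard_error(3.0, 10)
m_40 = standard_error(3.0, 40)
# Four times as many observations halve the error.
print(round(m_10, 3), round(m_40, 3))
```

Halving the error therefore requires four times as many observations, not twice as many.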

When statistical analysis is based on the calculation of relative values, constructing a variational series is not mandatory. The mean error for relative indicators can then be determined by a simplified formula:

$$m = \sqrt{\frac{P \cdot q}{n}}$$

where: P - the value of the relative indicator, expressed in percent, per mille, etc.;

q - the complement of P, expressed as (1 - P), (100 - P), (1000 - P), etc., depending on the base to which the indicator is calculated;

n - number of observations in the sample.

However, this formula for the error of representativeness of relative values can be used only when the value of the indicator is less than its base. In some calculations of intensive indicators this condition is not met, and the indicator may be expressed by a number greater than 100% or 1000‰. In such a situation a variational series is constructed and the error of representativeness is calculated by the formula for average values, based on the root-mean-square deviation.
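A minimal Python sketch of the simplified formula, including the restriction that the indicator must not exceed its base. The function name `relative_error`, the `base` parameter and the example figures (an indicator of 20% in a sample of 400) are illustrative assumptions.

```python
from math import sqrt

def relative_error(p, n, base=100):
    """Mean error of a relative indicator: m = sqrt(P * q / n), q = base - P."""
    if not 0 <= p <= base:
        raise ValueError("the indicator must lie between 0 and its base")
    if n <= 0:
        raise ValueError("n must be a positive number of observations")
    q = base - p
    return sqrt(p * q / n)

# Hypothetical indicator of 20% observed in a sample of 400.
print(relative_error(20, 400))   # 2.0
```

For an indicator above its base the function raises an error, mirroring the remark that the formula for average values should be used in that case.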

The value of the arithmetic mean in the general population is predicted by giving two values, a minimum and a maximum. These extreme values of the possible deviations, within which the sought mean of the general population may fluctuate, are called "confidence limits".

The postulates of probability theory prove that, for a normally distributed characteristic, with a probability of 99.7% the extreme deviations of the mean will not exceed three times the error of representativeness (M ± 3m); with a probability of 95.5%, twice the mean error (M ± 2m); and with a probability of 68.3%, one mean error (M ± 1m) (Fig. 9).

Fig. 9. Probability density of the normal distribution (vertical axis: P%).

Note that the above statement holds only for a characteristic that follows the normal (Gaussian) distribution law.
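The confidence limits M ± t·m can be sketched in Python as follows. The function name `confidence_limits` and the parameter `t` (the number of mean errors: 1, 2 or 3) are illustrative, and the input values are made up; the sketch assumes a normally distributed characteristic, as the text requires.

```python
def confidence_limits(mean, m, t=2):
    """Return (lower, upper) = (M - t*m, M + t*m).

    t = 1, 2, 3 corresponds to probabilities of about 68.3%, 95.5%, 99.7%
    for a normally distributed characteristic.
    """
    if t <= 0 or m < 0:
        raise ValueError("t must be positive and m non-negative")
    return mean - t * m, mean + t * m

# Hypothetical sample mean 119.0 with a mean error of 0.95.
lower, upper = confidence_limits(119.0, 0.95, t=2)
print(round(lower, 2), round(upper, 2))
```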

Most experimental studies, including those in medicine, involve measurements whose results can take almost any value within a given interval and are therefore, as a rule, described by a model of continuous random variables. For this reason most statistical methods deal with continuous distributions. One such distribution, which plays a fundamental role in mathematical statistics, is the normal, or Gaussian, distribution.

There are several reasons for this.

1. First of all, many experimental observations can be described successfully by a normal distribution. It should be noted at once that no distribution of empirical data is exactly normal, since a normally distributed random variable ranges from -∞ to +∞, which never occurs in practice. However, the normal distribution very often serves well as an approximation.

Whether weight, height or other physiological parameters of the human body are measured, the results are always influenced by a very large number of random factors (natural causes and measurement errors). As a rule, the effect of each of these factors is insignificant. Experience shows that in such cases the results are distributed approximately normally.

2. Many distributions associated with a random sample tend to the normal distribution as the sample size increases.

3. The normal distribution is well suited as an approximate description of other continuous distributions (for example, asymmetric ones).

4. The normal distribution has a number of favorable mathematical properties that largely account for its wide use in statistics.

At the same time, it should be noted that medical data include many experimental distributions that cannot be described by the normal distribution model. For these, statistics has developed methods called "non-parametric".

The choice of a statistical method suitable for processing the data of a particular experiment must depend on whether the data follow the normal distribution law. The hypothesis that a characteristic follows the normal distribution law is tested using a histogram (graph) of the frequency distribution and a number of statistical criteria. Among them:

Asymmetry criterion (b);

Excess (kurtosis) criterion (g);

Shapiro-Wilk criterion (W).

Analysis of the nature of the data distribution (also called a check for normality of the distribution) is carried out for each parameter. To judge with confidence whether the distribution of a parameter follows the normal law, a sufficiently large number of observation units is needed (at least 30 values).

For a normal distribution, the asymmetry and excess criteria are equal to 0. If the distribution is shifted to the right, b > 0 (positive asymmetry); if b < 0, the distribution curve is shifted to the left (negative asymmetry). The asymmetry criterion checks the shape of the distribution curve. For the normal law g = 0. If g > 0, the distribution curve is sharper-peaked; if g < 0, the peak is flatter than that of the normal distribution function.
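The text does not give formulas for b and g. As a hedged sketch, the Python function below uses the common moment-based definitions (third central moment divided by σ³, and fourth central moment divided by σ⁴ minus 3); this choice, the function name and the sample data are assumptions, and statistical packages may apply small-sample corrections that give slightly different numbers.

```python
def asymmetry_and_excess(values):
    """Moment-based asymmetry b and excess g of a sample (assumed formulas)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n   # third central moment
    m4 = sum((v - m) ** 4 for v in values) / n   # fourth central moment
    if m2 == 0:
        raise ValueError("all values are equal; b and g are undefined")
    b = m3 / m2 ** 1.5
    g = m4 / m2 ** 2 - 3
    return b, g

# Symmetric illustrative data: b is 0, g is negative (a flat peak).
b, g = asymmetry_and_excess([1, 2, 3, 4, 5])
print(round(b, 3), round(g, 3))
```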

To check for normality with the Shapiro-Wilk criterion, the critical value of the criterion must be found in statistical tables for the required significance level and the number of observation units (degrees of freedom); see Appendix 1. The hypothesis of normality is rejected for small values of the criterion, as a rule W < 0.8.

A set of objects or phenomena united by some common feature or property of a qualitative or quantitative nature is called the object of observation.

Any object of statistical observation consists of individual elements, the units of observation.

The results of statistical observation are numerical information, the data. Statistical data are information about the values taken by the characteristic of interest to the researcher in the statistical aggregate.

If the values of a characteristic are expressed by numbers, the characteristic is called quantitative.

If a characteristic describes some property or state of the elements of the aggregate, it is called qualitative.

If all elements of the aggregate are studied (complete observation), the statistical aggregate is called the general population.

If only part of the elements of the general population is studied, the statistical aggregate is called a sample. The sample is drawn from the general population at random, so that every element of the population has an equal chance of being selected.

The values of a characteristic change (vary) from one element of the aggregate to another, which is why in statistics the different values of a characteristic are also called variants. Variants are usually denoted by lowercase Latin letters x, y, z.

The ordinal number of a variant (value of the characteristic) is called its rank: x_1 is the 1st variant (1st value of the characteristic), x_2 is the 2nd variant (2nd value of the characteristic), x_i is the i-th variant (i-th value of the characteristic).

A series of values of a characteristic (variants), arranged in ascending or descending order together with their corresponding weights, is called a variational series (distribution series).

Frequencies or relative frequencies serve as the weights.

The frequency (m_i) shows how many times a given variant (value of the characteristic) occurs in the statistical aggregate.

The relative frequency (w_i) shows what proportion of the units of the aggregate have a given variant. It is calculated as the ratio of the frequency of the variant to the sum of all frequencies of the series:

$$w_i = \frac{m_i}{\sum_{j=1}^{l} m_j}. \quad (6.1)$$

The sum of all relative frequencies is equal to 1:

$$\sum_{i=1}^{l} w_i = 1. \quad (6.2)$$
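Formulas (6.1) and (6.2) can be sketched in Python as follows. The observations are illustrative, and exact fractions are used so that the sum of the relative frequencies is exactly 1 rather than subject to floating-point rounding.

```python
from collections import Counter
from fractions import Fraction

observations = [1, 2, 2, 3, 3, 3, 4]      # illustrative data, not from the text

m = Counter(observations)                  # frequencies m_i
n = sum(m.values())                        # sum of all frequencies
# Relative frequencies w_i = m_i / n, formula (6.1).
w = {x: Fraction(m_i, n) for x, m_i in sorted(m.items())}

for x, w_i in w.items():
    print(x, m[x], w_i)
print(sum(w.values()) == 1)                # formula (6.2): True
```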

Variational series are either discrete or interval.

Discrete variational series are usually built when the values of the characteristic under study can differ from one another by no less than some finite amount.

In discrete variational series, point values of the characteristic are specified.

The general form of a discrete variational series is shown in Table 6.1.

Table 6.1.

where i = 1, 2, ..., l.

In interval variational series, each interval has an upper and a lower boundary.

The difference between the upper and lower boundaries of an interval is called the interval difference, or the length (size) of the interval.

The size of the first interval k_1 is determined by the formula:

k_1 = a_2 - a_1;

of the second: k_2 = a_3 - a_2; ...

of the last: k_l = a_l - a_{l-1}.

In general, the interval difference k_i is calculated by the formula:

k_i = x_i(max) - x_i(min). (6.3)

If an interval has both boundaries, it is called closed.

The first and last intervals may be open, i.e. have only one boundary.

For example, the first interval may be specified as "up to 100", the second as "100-110", ..., the penultimate as "190-200" and the last as "200 and more". Clearly, the first interval has no lower boundary and the last has no upper one; both are open.

Open intervals often have to be closed conditionally. To do this, the size of the first interval is usually taken to be equal to that of the second, and the size of the last to that of the penultimate. In our example the size of the second interval is 110 - 100 = 10, so the lower boundary of the first interval is conditionally 100 - 10 = 90; the size of the penultimate interval is 200 - 190 = 10, so the upper boundary of the last interval is conditionally 200 + 10 = 210.

In addition, an interval variational series may contain intervals of different lengths. If the intervals in the series all have the same length (interval difference), they are called equal intervals; otherwise, unequal intervals.

When an interval variational series is constructed, the problem of choosing the size of the intervals (the interval difference) often arises.

To determine the optimal size of the intervals (when a series with equal intervals is built), Sturges' formula is applied:

$$k = \frac{x_{(\max)} - x_{(\min)}}{1 + 3.322 \lg n}, \quad (6.4)$$

where n is the number of units in the aggregate,

x (max) and x (min) are the largest and smallest values of the variants of the series.
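A short Python sketch of the interval size, assuming the usual form of Sturges' formula, k = (x(max) - x(min)) / (1 + 3.322 lg n). The function name `sturges_width` and the example figures (values from 90 to 210 in 100 observations) are illustrative.

```python
from math import log10

def sturges_width(x_min, x_max, n):
    """Interval size k = (x_max - x_min) / (1 + 3.322 * lg n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if x_max < x_min:
        raise ValueError("x_max must not be smaller than x_min")
    return (x_max - x_min) / (1 + 3.322 * log10(n))

# Hypothetical series: values from 90 to 210, 100 observations.
k = sturges_width(90, 210, 100)
print(round(k, 1))   # 15.7
```

In practice the result is usually rounded to a convenient size, for example 15 or 20 here.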

To characterize a variational series, accumulated (cumulative) frequencies and accumulated relative frequencies are used along with frequencies and relative frequencies.

Accumulated frequencies (relative frequencies) show how many units of the aggregate (what proportion of them) do not exceed a given value (variant) x.

For a discrete series, the accumulated frequencies (v_i) can be calculated by the following formula:

$$v_i = \sum_{j=1}^{i} m_j. \quad (6.5)$$

For an interval variational series, the accumulated frequency is the sum of the frequencies (relative frequencies) of all intervals up to and including the given one.
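Accumulated frequencies are running totals, which Python's `itertools.accumulate` computes directly. The variants and frequencies below are illustrative.

```python
from itertools import accumulate

variants = [0, 1, 2, 3]        # illustrative variants x_i, in ascending order
freqs = [2, 5, 2, 1]           # their frequencies m_i

cumulative = list(accumulate(freqs))          # v_i = m_1 + ... + m_i
n = cumulative[-1]                            # the last total equals n
cumulative_rel = [v / n for v in cumulative]  # accumulated relative frequencies

for x, v, r in zip(variants, cumulative, cumulative_rel):
    print(x, v, r)
```

The last accumulated frequency always equals the number of observations, and the last accumulated relative frequency equals 1.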

A discrete variational series can be represented graphically by a polygon of the distribution of frequencies or relative frequencies.

When a distribution polygon is constructed, the values of the characteristic (variants) are plotted along the abscissa axis and the frequencies or relative frequencies along the ordinate axis. Points are plotted at the intersections of the values of the characteristic with the corresponding frequencies (relative frequencies) and are then connected by line segments. The resulting broken line is called the polygon of the distribution of frequencies (relative frequencies).

Fig. 6.1. Distribution polygon (abscissa: variants x_1, x_2, ..., x_k).

An interval variational series can be represented graphically by a histogram, i.e. a bar chart.

When a histogram is constructed, the values of the characteristic under study (the interval boundaries) are plotted along the abscissa axis.

If the intervals are of equal size, frequencies or relative frequencies can be plotted along the ordinate axis.

If the intervals are of different sizes, the values of the absolute or relative distribution density must be plotted along the ordinate axis.

Absolute density - the ratio of the frequency of an interval to the size of the interval:

$$f(a)_i = \frac{m_i}{k_i}; \quad (6.6)$$

where: f(a)_i is the absolute density of the i-th interval;

m_i is the frequency of the i-th interval;

k_i is the size of the i-th interval (interval difference).

The absolute density shows how many units of the aggregate there are per unit of interval length.

Relative density - the ratio of the relative frequency of an interval to the size of the interval:

$$f(o)_i = \frac{w_i}{k_i}; \quad (6.7)$$

where: f(o)_i is the relative density of the i-th interval;

w_i is the relative frequency of the i-th interval.

The relative density shows what proportion of the units of the aggregate there is per unit of interval length.
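Both densities can be sketched in Python for a series with unequal intervals. The function name `densities`, the interval boundaries and the frequencies are illustrative.

```python
def densities(boundaries, frequencies):
    """Return (absolute, relative) density for each interval.

    boundaries  : interval boundaries a_1 < a_2 < ... (one more than intervals)
    frequencies : frequency m_i of each interval
    """
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need exactly one more boundary than intervals")
    n = sum(frequencies)
    result = []
    for low, high, m_i in zip(boundaries, boundaries[1:], frequencies):
        k_i = high - low                 # interval size
        w_i = m_i / n                    # relative frequency
        result.append((m_i / k_i, w_i / k_i))
    return result

# Hypothetical series: intervals 100-110 and 110-130 with 20 and 30 units.
for absolute, relative in densities([100, 110, 130], [20, 30]):
    print(round(absolute, 3), round(relative, 3))
```

Although the second interval has the larger frequency, its density is lower because it is twice as wide, which is why densities rather than frequencies are plotted for unequal intervals.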

[Figure: histogram; abscissa x_i with the interval boundaries a_1, a_2, ..., a_l marked]

Both discrete and interval variational series can be represented graphically as cumulates (cumulative curves) and ogives.

When constructing a cumulate for a discrete series, the values of the attribute (options) are plotted along the abscissa axis and the accumulated frequencies or relative frequencies along the ordinate axis. Points are plotted at the intersections of the attribute values (options) with the corresponding accumulated frequencies (relative frequencies) and are then connected by line segments or a smooth curve. The resulting broken line (curve) is called the cumulate (cumulative curve).

When constructing a cumulate for an interval series, the interval boundaries are plotted along the abscissa axis. The abscissas of the points are the upper boundaries of the intervals, and the ordinates are the accumulated frequencies (relative frequencies) of the corresponding intervals. Often one more point is added, whose abscissa is the lower boundary of the first interval and whose ordinate is zero. Connecting the points with line segments or a curve gives the cumulate.

An ogive is constructed in the same way as a cumulate, except that the accumulated frequencies (relative frequencies) are plotted along the abscissa axis and the attribute values (options) along the ordinate axis.
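As a minimal sketch of the difference between the two curves, the following Python snippet computes the vertices of a cumulate and of an ogive for a discrete series. The series itself is hypothetical, and the accumulated frequencies are taken as running totals, which is an assumption of this sketch.

```python
from itertools import accumulate

# Hypothetical discrete series (illustrative values only).
values = [1, 2, 3, 4]   # options x_i
freqs = [2, 5, 2, 1]    # frequencies n_i

cum_freqs = list(accumulate(freqs))   # running totals of the frequencies

# Cumulate: option on the abscissa, accumulated frequency on the ordinate.
cumulate_points = list(zip(values, cum_freqs))

# Ogive: the same points with the axes swapped.
ogive_points = [(c, x) for x, c in cumulate_points]

print(cumulate_points)
print(ogive_points)
```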

Russian Academy of National Economy and Public Service under the President of the Russian Federation

Oryol branch

Department of Mathematics and Mathematical Methods in Management

Independent work

Mathematics

on the topic "Variational series and its characteristics"

for full-time students of the Faculty of Economics and Management

field of study: "Personnel Management"


Purpose of work: Mastering the concepts of mathematical statistics and the techniques of primary data processing.

An example of solving typical tasks.

Task 1.

The following data were obtained in a survey:

1 2 3 2 2 4 3 3 5 1 0 2 4 3 2 2 3 3 1 3 2 4 2 4 3 3 3 2 0 6

3 3 1 1 2 3 1 4 3 1 7 4 3 4 2 3 2 3 3 1 4 3 1 4 5 3 4 2 4 5

3 6 4 1 3 2 4 1 3 1 0 0 4 6 4 7 4 1 3 5

Required:

1) Compile a variational series (the statistical distribution of the sample), first writing out the ranked discrete series of options.

2) Construct the frequency polygon and the cumulate.

3) Compile the distribution series of relative frequencies.

4) Find the main numerical characteristics of the variational series (using simplified formulas to find them): a) the arithmetic mean $\bar{x}$; b) the median Me and the mode Mo; c) the variance $s^2$; d) the standard deviation $s$; e) the coefficient of variation $V$.

5) Explain the meaning of the results obtained.

Solution.

1) To compile the ranked discrete series of options, we sort the survey data by value and arrange them in ascending order.

0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2

3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4

5 5 5 5 6 6 6 7 7.

We compile the variational series by writing the observed values (options) in the first row of the table and the corresponding frequencies in the second row (Table 1).

Table 1.

2) The frequency polygon is a broken line connecting the points $(x_i; n_i)$, $i = 1, 2, \ldots, m$, where $m$ is the number of distinct values of the attribute $X$.

We depict the frequency polygon of the variational series (Fig. 1).

Fig. 1. Frequency polygon

The cumulative curve (cumulate) for a discrete variational series is a broken line connecting the points $(x_i; n_i^{acc})$, $i = 1, 2, \ldots, m$.

We find the accumulated frequencies $n_i^{acc}$ (the accumulated frequency shows how many options were observed with a value of the attribute smaller than $x$). The values found are entered in the third row of Table 1.
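The frequencies and accumulated frequencies of Task 1 can be checked with a short Python sketch. The data are the 80 survey values given above; treating the accumulated frequency as the running total $n_1 + \ldots + n_i$ is an assumption of this sketch, not something the text states explicitly.

```python
from collections import Counter
from itertools import accumulate

# The 80 survey values of Task 1, in the order given in the statement.
data = (
    [1, 2, 3, 2, 2, 4, 3, 3, 5, 1, 0, 2, 4, 3, 2, 2, 3, 3, 1, 3,
     2, 4, 2, 4, 3, 3, 3, 2, 0, 6]
    + [3, 3, 1, 1, 2, 3, 1, 4, 3, 1, 7, 4, 3, 4, 2, 3, 2, 3, 3, 1,
       4, 3, 1, 4, 5, 3, 4, 2, 4, 5]
    + [3, 6, 4, 1, 3, 2, 4, 1, 3, 1, 0, 0, 4, 6, 4, 7, 4, 1, 3, 5]
)

counts = Counter(data)
options = sorted(counts)                    # distinct values x_i in ascending order
freqs = [counts[x] for x in options]        # frequencies n_i
cum_freqs = list(accumulate(freqs))         # running totals (assumed convention)

print("x_i:     ", options)
print("n_i:     ", freqs)
print("n_i^acc: ", cum_freqs)
```

The pairs `(options[i], freqs[i])` are the vertices of the frequency polygon, and the pairs `(options[i], cum_freqs[i])` are the vertices of the cumulate under the stated convention.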



We construct the cumulate (Fig. 2).

Fig. 2. Cumulate

3) We find the relative frequencies $w_i = \dfrac{n_i}{n}$, $i = 1, 2, \ldots, m$, where $m$ is the number of distinct values of the attribute $X$; we calculate all of them to the same accuracy.

We write the distribution series of relative frequencies in the form of Table 2.

Table 2
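A brief Python sketch of this step follows. The frequencies are those counted from the survey data of Task 1; rounding to four decimal places is an assumed choice of "the same accuracy".

```python
# Options and frequencies counted from the survey data of Task 1.
options = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [4, 13, 14, 24, 16, 4, 3, 2]

n = sum(freqs)                                   # sample size
rel_freqs = [n_i / n for n_i in freqs]           # w_i = n_i / n
rounded = [round(w, 4) for w in rel_freqs]       # same accuracy for every w_i (assumed: 4 places)

for x, w in zip(options, rounded):
    print(x, w)
print("sum of relative frequencies:", sum(rel_freqs))
```

The relative frequencies must add up to one, which is a convenient check on the arithmetic.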

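4) We find the main numerical characteristics of the variational series:

a) We find the arithmetic mean using the simplified formula:

$\bar{x} = \dfrac{\sum_{i=1}^{m} u_i n_i}{n}\, k + c$,

where $u_i = \dfrac{x_i - c}{k}$ are the conditional options.

We set $c = 3$ (one of the middle observed values) and $k = 1$ (the difference between two adjacent options) and compile a calculation table (Table 3).

Table 3.

| $x_i$ | $n_i$ | $u_i$ | $u_i n_i$ | $u_i^2 n_i$ |
|---|---|---|---|---|
| 0 | 4 | -3 | -12 | 36 |
| 1 | 13 | -2 | -26 | 52 |
| 2 | 14 | -1 | -14 | 14 |
| 3 | 24 | 0 | 0 | 0 |
| 4 | 16 | 1 | 16 | 16 |
| 5 | 4 | 2 | 8 | 16 |
| 6 | 3 | 3 | 9 | 27 |
| 7 | 2 | 4 | 8 | 32 |
| Sum | 80 | | -11 | 193 |

Then the arithmetic mean is

$\bar{x} = \dfrac{-11}{80} \cdot 1 + 3 \approx 2.86$.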

b) The median Me of a variational series is the value of the attribute that falls in the middle of the ranked series of observations. This discrete variational series contains an even number of terms ($n = 80$), so the median is equal to half the sum of the two middle options. Here the 40th and 41st terms of the ranked series are both equal to 3, so Me = 3.

The mode Mo of a variational series is the option with the highest frequency. For this variational series, the highest frequency $n_{\max} = 24$ corresponds to the option $x = 3$, so the mode Mo = 3.
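The median and mode can be verified with Python's standard `statistics` module. This is only a check under the frequencies counted from the Task 1 data; the ranked series is rebuilt from those frequencies.

```python
import statistics

# Options and frequencies counted from the survey data of Task 1.
options = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [4, 13, 14, 24, 16, 4, 3, 2]

# Rebuild the ranked series: each option repeated n_i times.
ranked = [x for x, n_i in zip(options, freqs) for _ in range(n_i)]

# For an even number of terms, statistics.median averages the two middle terms.
median = statistics.median(ranked)
mode = statistics.mode(ranked)      # the option with the highest frequency

print("two middle terms:", ranked[39], ranked[40])
print("Me =", median, " Mo =", mode)
```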

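c) We find the variance $s^2$, which is a measure of the scatter of the possible values of the indicator $X$ around its mean, using the simplified formula:

$s^2 = \dfrac{\sum_{i=1}^{m} u_i^2 n_i}{n}\, k^2 - (\bar{x} - c)^2$,

where $u_i$ are the conditional options.

The intermediate calculations are also entered in Table 3.

Then the variance is

$s^2 = \dfrac{193}{80} \cdot 1^2 - (2.86 - 3)^2 \approx 2.39$.

d) We find the standard deviation $s$ by the formula:

$s = \sqrt{s^2} = \sqrt{2.39} \approx 1.55$.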

e) The coefficient of variation $V$: $V = \dfrac{s}{\bar{x}} \cdot 100\%$ ($\bar{x} \neq 0$).

The coefficient of variation is a dimensionless quantity, so it is suitable for comparing the scatter of variational series whose options have different units of measurement.

The coefficient of variation is

$V = \dfrac{1.55}{2.86} \cdot 100\% \approx 54\%$.
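The calculations of items a), c), d) and e) can be reproduced with the following Python sketch. It uses the conditional options $u_i = (x_i - c)/k$ with $c = 3$ and $k = 1$ as in the text; the exact form of the simplified formulas is reconstructed here from the columns of Table 3 and should be read as an assumption.

```python
import math

# Options and frequencies counted from the survey data of Task 1.
options = [0, 1, 2, 3, 4, 5, 6, 7]
freqs = [4, 13, 14, 24, 16, 4, 3, 2]

c, k = 3, 1                          # false zero and step, as chosen in the text
n = sum(freqs)

u = [(x - c) / k for x in options]   # conditional options u_i
sum_un = sum(ui * ni for ui, ni in zip(u, freqs))        # sum of u_i * n_i
sum_u2n = sum(ui ** 2 * ni for ui, ni in zip(u, freqs))  # sum of u_i^2 * n_i

mean = c + k * sum_un / n                             # arithmetic mean
variance = k ** 2 * sum_u2n / n - (mean - c) ** 2     # variance s^2
sd = math.sqrt(variance)                              # standard deviation s
cv = sd / mean * 100                                  # coefficient of variation, %

print(f"mean = {mean:.2f}, s^2 = {variance:.2f}, s = {sd:.2f}, V = {cv:.0f}%")
```

Shifting by $c$ and scaling by $k$ keeps the intermediate numbers small, which is the point of the simplified formulas when the table is filled in by hand.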

5) The meaning of the results obtained is that $\bar{x}$ characterizes the average value of the attribute $X$ within the sample under consideration; that is, the average value was 2.86. The standard deviation $s$ describes the absolute scatter of the values of the indicator $X$ and in this case is $s \approx 1.55$. The coefficient of variation $V$ characterizes the relative variability of the indicator $X$, that is, the relative scatter around its mean, and in this case is $V \approx 54\%$.

Answer: $\bar{x} \approx 2.86$; $s^2 \approx 2.39$; $s \approx 1.55$; $V \approx 54\%$.

Task 2.

The following data are available on the equity capital of the 40 largest banks in Central Russia:

12.0 49.4 22.4 39.3 90.5 15.2 75.0 73.0 62.3 25.2
70.4 50.3 72.0 71.6 43.7 68.3 28.3 44.9 86.6 61.0
41.0 70.9 27.3 22.9 88.6 42.5 41.9 55.0 56.9 68.1
120.8 52.4 42.0 119.3 49.6 110.6 54.5 99.3 111.5 26.1

Required:

1) Construct an interval variational series.

2) Calculate the sample mean and the sample variance.

3) Find the standard deviation and the coefficient of variation.

4) Construct a histogram of the frequency distribution.

Solution.

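1) We choose an arbitrary number of intervals, for example 8. Then the width of the interval is:

$k = \dfrac{x_{\max} - x_{\min}}{8} = \dfrac{120.8 - 12.0}{8} = 13.6$, which for convenience is rounded up to $k = 15$, with the first interval starting at 10.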

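Let us compile a calculation table:

| Interval $x_k$ – $x_{k+1}$ | Frequency $n_i$ | Interval midpoint $x_i$ | Conditional option $u_i$ | $u_i n_i$ | $u_i^2 n_i$ | $(u_i + 1)^2 n_i$ |
|---|---|---|---|---|---|---|
| 10 – 25 | 4 | 17.5 | -3 | -12 | 36 | 16 |
| 25 – 40 | 5 | 32.5 | -2 | -10 | 20 | 5 |
| 40 – 55 | 11 | 47.5 | -1 | -11 | 11 | 0 |
| 55 – 70 | 6 | 62.5 | 0 | 0 | 0 | 6 |
| 70 – 85 | 6 | 77.5 | 1 | 6 | 6 | 24 |
| 85 – 100 | 4 | 92.5 | 2 | 8 | 16 | 36 |
| 100 – 115 | 2 | 107.5 | 3 | 6 | 18 | 32 |
| 115 – 130 | 2 | 122.5 | 4 | 8 | 32 | 50 |
| Sum | 40 | | | -5 | 139 | 169 |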

The value $c = 62.5$ is chosen as the false zero (this option lies approximately in the middle of the variational series).

The conditional options are determined by the formula

$u_i = \dfrac{x_i - c}{k} = \dfrac{x_i - 62.5}{15}$.
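The grouping and the remaining calculations of Task 2 can be sketched in Python as follows. The sketch assumes eight intervals of width 15 starting at 10, each interval including its lower boundary and excluding its upper one, and applies the same simplified formulas as in Task 1; these conventions are assumptions made for illustration.

```python
import math

# Equity capital of the 40 banks, as given in the statement of Task 2.
capital = [
    12.0, 49.4, 22.4, 39.3, 90.5, 15.2, 75.0, 73.0, 62.3, 25.2,
    70.4, 50.3, 72.0, 71.6, 43.7, 68.3, 28.3, 44.9, 86.6, 61.0,
    41.0, 70.9, 27.3, 22.9, 88.6, 42.5, 41.9, 55.0, 56.9, 68.1,
    120.8, 52.4, 42.0, 119.3, 49.6, 110.6, 54.5, 99.3, 111.5, 26.1,
]

start, k, n_bins = 10, 15, 8     # assumed: intervals [10, 25), [25, 40), ..., [115, 130)
c = 62.5                         # false zero, as chosen in the text
n = len(capital)

# Count how many values fall into each interval.
freqs = [0] * n_bins
for x in capital:
    freqs[int((x - start) // k)] += 1

mids = [start + k * i + k / 2 for i in range(n_bins)]   # interval midpoints x_i
u = [(m - c) / k for m in mids]                          # conditional options u_i

sum_un = sum(ui * ni for ui, ni in zip(u, freqs))
sum_u2n = sum(ui ** 2 * ni for ui, ni in zip(u, freqs))

mean = c + k * sum_un / n                                # sample mean
variance = k ** 2 * (sum_u2n / n - (sum_un / n) ** 2)    # sample variance
sd = math.sqrt(variance)                                 # standard deviation
cv = sd / mean * 100                                     # coefficient of variation, %

print("frequencies:", freqs)
print(f"mean = {mean:.3f}, variance = {variance:.2f}, s = {sd:.2f}, V = {cv:.1f}%")
```

The heights of the histogram bars in item 4) are the values in `freqs`, since all intervals have the same width.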

A group of numbers united by some common attribute is called a population.

As noted above, primary statistical material in sport is a group of scattered numbers that gives the coach no idea of the essence of a phenomenon or process. The task is to turn this population into a system and to use its indicators to obtain the required information.

Compiling a variational series is precisely the formation of such a mathematical system.

Example 2. For 34 skiers, the following pulse recovery times after covering a distance were recorded (in seconds):

81; 78; 84; 90; 78; 74; 84; 85; 81; 84; 79; 84; 74; 84; 84;

85; 81; 84; 78; 81; 74; 84; 81; 84; 85; 81; 78; 81; 81; 84;

As can be seen, this group of numbers by itself conveys no information.

To compile a variational series, we first perform the ranking operation: arranging the numbers in ascending or descending order. For example, ranking in ascending order gives the following:

74; 74; 74; 74;

78; 78; 78; 78; 78; 78;

81; 81; 81; 81; 81; 81; 81; 81; 81;

84; 84; 84; 84; 84; 84; 84; 84; 84; 84; 84;

85; 85; 85;

90.

Ranking in descending order gives the following group of numbers:

90;

85; 85; 85;

84; 84; 84; 84; 84; 84; 84; 84; 84; 84; 84;

81; 81; 81; 81; 81; 81; 81; 81; 81;

78; 78; 78; 78; 78; 78;

74; 74; 74; 74.

After ranking, it becomes obvious that this way of recording the group of numbers is irrational: the same number is repeated many times. It is therefore natural to rewrite the record so that it shows which number is repeated how many times. For example, for the ranking in ascending order:

74 - 4
78 - 6
81 - 9
84 - 11
85 - 3
90 - 1

Here the number on the left is the athlete's pulse recovery time, and the number on the right is how many times this reading is repeated in the group of 34 athletes.

In accordance with the above conventions on mathematical symbols, we denote the measurements in this group by a letter, for example $x$. Taking the numbers of this group in increasing order: $x_1 = 74$ s; $x_2 = 78$ s; $x_3 = 81$ s; $x_4 = 84$ s; $x_5 = 85$ s; $x_6 = 90$ s (the last value), each of these numbers can be denoted by the symbol $x_i$.

We denote the number of repetitions of each measurement by the letter $n$ with the same index. Then:

$n_1 = 4$; $n_2 = 6$; $n_3 = 9$; $n_4 = 11$; $n_5 = 3$; $n_6 = 1$ (the last frequency), and each number of repetitions can be denoted by $n_i$.

The total number of measurements, as follows from the statement of the example, is 34. This means that the sum of all $n_i$ is equal to 34, or in symbolic form:

$\sum_{i=1}^{6} n_i = 34$.

We denote this sum by a single letter, $n$. The initial data of the example can then be written in the following form (Table 1).

The resulting group of numbers is a transformed version of the chaotic, scattered readings that the coach obtained at the start of the work.

Table 1

| $x_i$ | $n_i$ |
|---|---|
| 74 | 4 |
| 78 | 6 |
| 81 | 9 |
| 84 | 11 |
| 85 | 3 |
| 90 | 1 |
| | $n = 34$ |

Such a group is a definite system whose parameters characterize the measurements performed. The numbers representing the measurement results ($x_i$) are called options; the numbers of their repetitions ($n_i$) are called frequencies; the sum of all frequencies ($n$) is the volume of the population.

The whole system obtained is called a variational series. Such series are sometimes called empirical or statistical.

It is easy to see that a special case of the variational series is possible in which all frequencies are equal to one, $n_i = 1$, that is, each measurement in the group of numbers occurs only once.

The resulting variational series, like any other, can be represented graphically. To plot the graph of the series, one must first agree on the scale of the horizontal and vertical axes.

In this problem we plot the pulse recovery time ($x_i$) on the horizontal axis so that an arbitrarily chosen unit of length corresponds to one second. We begin plotting these values at 70 s, at a conventional offset from the intersection point 0 of the two axes.

On the vertical axis we plot the frequencies of our series ($n_i$), taking the scale: one unit of length equals one unit of frequency.

Having thus prepared for plotting, we proceed to work with the variational series obtained.

The first pair of numbers, $x_1 = 74$ and $n_1 = 4$, is plotted as follows: on the $x$ axis we find $x_1 = 74$ and erect a perpendicular from this point; on the $n$ axis we find $n_1 = 4$ and draw a horizontal line from it until it intersects the perpendicular. Both lines, the vertical and the horizontal, are auxiliary and are therefore drawn dotted. Their point of intersection represents the pair $x_1 = 74$, $n_1 = 4$ on the scale of this graph.

All the other points of the graph are plotted in the same way. They are then connected by straight line segments. To give the graph a closed form, the extreme points are connected by segments to the adjacent points of the horizontal axis.

The resulting figure is the graph of our variational series (Fig. 1).

It is quite clear that each variational series has its own graph.

Fig. 1. Graphic representation of the variational series.

Fig. 1 shows that:

1) among all those examined, the largest group consisted of athletes whose pulse recovery time was 84 s;

2) for many, this time was 81 s;

3) the smallest groups were athletes with a short pulse recovery time (74 s) and a long one (90 s).

Thus, after performing a series of tests, the numbers obtained should be ranked and compiled into a variational series, which is a definite mathematical system. For clarity, the variational series can be illustrated by a graph.

The variational series above is also called a discrete series, that is, one in which each option is expressed by a single number.

Let us give a few more examples of compiling variational series.

Example 3. Twelve shooters, performing a prone-position exercise of 10 shots, showed the following results (in points):

94; 91; 96; 94; 94; 92; 91; 92; 91; 95; 94; 94.

To form a variational series, we rank these numbers:

91; 91; 91;

92; 92;

94; 94; 94; 94; 94;

95;

96.

After ranking, we compile the variational series (Table 3).
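The ranking and the counting of repetitions in Example 3 can be sketched in Python. The data are the twelve results listed above; the variable names are chosen only for this illustration.

```python
from collections import Counter

# Results of the 12 shooters from Example 3 (in points).
results = [94, 91, 96, 94, 94, 92, 91, 92, 91, 95, 94, 94]

ranked = sorted(results)              # ranking in ascending order
counts = Counter(ranked)              # how many times each result occurs
series = sorted(counts.items())       # variational series: (option x_i, frequency n_i)
n = sum(counts.values())              # volume of the population

print("ranked:", ranked)
for x_i, n_i in series:
    print(x_i, n_i)
print("n =", n)
```

Each pair in `series` corresponds to one row of the variational series, and the frequencies add up to the number of shooters.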