
What is "mathematical statistics"? The subject of mathematical statistics General population and sampling

Mathematical statistics is a modern branch of mathematics that deals with the statistical description of the results of experiments and observations, as well as with the construction of mathematical models involving the concept of probability. The theoretical basis of mathematical statistics is probability theory.

In the structure of mathematical statistics, two main sections are traditionally distinguished: descriptive statistics and statistical inference (Figure 1.1).

Fig. 1.1. Main sections of mathematical statistics

Descriptive statistics is used for:

o summarizing the indicators of a single variable (statistics of a random sample);

o identifying relationships between two or more variables (correlation and regression analysis).

Descriptive statistics makes it possible to obtain new information, comprehend it quickly and evaluate it comprehensively; that is, it performs the scientific function of describing the objects of study, which justifies its name. The methods of descriptive statistics are designed to turn a set of individual empirical data into a system of forms and numbers convenient for perception: frequency distributions and measures of central tendency, variability and association. These methods compute the statistics of a random sample, which serve as the basis for statistical inference.

Statistical inference makes it possible to:

o evaluate the accuracy, reliability and efficiency of sample statistics and find the errors that arise in the process of statistical research (statistical estimation);

o draw conclusions about the parameters of the general population on the basis of sample statistics (testing of statistical hypotheses).

The main objective of scientific research is the acquisition of new knowledge about a large class of phenomena, persons or events, which is commonly called the general population.

The general population is the totality of the objects of study; a sample is the part of it that is formed in a certain scientifically substantiated way.

The term "general population" is used when it comes to a large but finite set of objects under study. For example, about the totality of applicants in Ukraine in 2009 or the totality of children preschool age the city of Rivne. General populations can reach significant volumes, be finite and infinite. In practice, as a rule, one deals with finite sets. And if the ratio of the size of the general population to the size of the sample is more than 100, then, according to Glass and Stanley, the estimation methods for finite and infinite populations give essentially the same results. The general set can also be called the complete set of values ​​of some attribute. The fact that the sample belongs to the general population is the main basis for assessing the characteristics of the general population according to the characteristics of the sample.

The main idea of mathematical statistics rests on the fact that, in most scientific problems, a complete study of all objects of the general population is either practically impossible or economically impractical, since it requires a great deal of time and significant material costs. Therefore mathematical statistics uses the sampling approach, the principle of which is shown in the diagram in Fig. 1.2.

For example, according to the way they are formed, samples may be randomized (simple or systematic), stratified, or clustered (see Section 4).

Fig. 1.2. Scheme of application of the methods of mathematical statistics

According to the sampling approach, mathematical and statistical methods are applied in the following sequence (see Fig. 1.2):

o from the general population whose properties are to be investigated, a sample is formed by certain methods - a typical but limited set of objects to which the research methods are applied;

o as a result of observational methods, experimental actions and measurements on sample objects, empirical data are obtained;

o processing the empirical data with the methods of descriptive statistics yields sample indicators, which are called statistics - incidentally, the same word as the name of the discipline;

o applying the methods of statistical inference to these statistics, one obtains the parameters that characterize the properties of the general population (a minimal code sketch of this sequence is given below).
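To make the sequence concrete, here is a minimal sketch in Python (NumPy assumed available); the synthetic population, the sample size and the 95% normal-approximation interval are illustrative assumptions, not part of the source:

```python
# A minimal sketch of the sequence above, assuming Python with NumPy.
import numpy as np

rng = np.random.default_rng(seed=1)

# 1. A hypothetical finite general population, e.g. test scores of 100,000 people.
population = rng.normal(loc=70, scale=10, size=100_000)

# 2. Form a simple random sample of limited size from the population.
n = 200
sample = rng.choice(population, size=n, replace=False)

# 3. Descriptive statistics: sample indicators ("statistics").
x_bar = sample.mean()                 # sample mean
s = sample.std(ddof=1)                # sample standard deviation

# 4. Statistical inference: a 95% confidence interval for the population mean
#    based on the normal approximation (z = 1.96).
half_width = 1.96 * s / np.sqrt(n)
print(f"sample mean = {x_bar:.2f}, 95% CI = ({x_bar - half_width:.2f}, {x_bar + half_width:.2f})")
print(f"true population mean = {population.mean():.2f}")
```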

Example 1.1. In order to assess the stability of the level of knowledge (variable X), a randomized sample of students of size n was tested. (A randomized sample - from the English "random" - is a representative sample formed according to a strategy of random selection.) The tests contained m tasks, each of which was evaluated on a binary scale: "completed" - 1, "not completed" - 0. Has the average current achievement of the students X̄ remained at the level of previous years μ?

Solution sequence:

o state a substantive hypothesis of the type: "if the current test results do not differ from those of the past, then the level of the students' knowledge can be considered unchanged, and the educational process stable";

o formulate an adequate statistical hypothesis, for example the null hypothesis H0 that "the current mean score X̄ does not differ statistically from the mean of previous years μ", i.e. H0: X̄ = μ, against the corresponding alternative hypothesis H1: X̄ ≠ μ;

o build empirical distributions of the investigated variable X;

o determine (if necessary) correlations, for example between the variable X and other indicators, and build regression lines;

o check the correspondence of the empirical distribution to the normal law;

o estimate the values of point indicators and the confidence intervals of parameters, for example of the mean;

o choose the criteria for testing the statistical hypotheses;

o test the statistical hypotheses using the selected criteria;

o formulate a decision on the statistical null hypothesis at a chosen significance level;

o move from the decision to accept or reject the statistical null hypothesis to the interpretation of the conclusions regarding the substantive hypothesis;

o formulate substantive conclusions (an illustrative code sketch of these steps follows the list).
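The sketch below, assuming Python with NumPy/SciPy, illustrates the hypothesis-testing steps of Example 1.1; the scores and the previous-years mean mu are made-up numbers, and the one-sample t-test is only one possible criterion for H0: X̄ = μ, not the one prescribed by the source.

```python
# A hedged sketch of the hypothesis-testing steps for Example 1.1.
import numpy as np
from scipy import stats

mu = 0.70                      # assumed mean score of previous years (per-task scale 0..1)
scores = np.array([0.55, 0.80, 0.65, 0.75, 0.60, 0.70, 0.85, 0.50,
                   0.75, 0.65, 0.80, 0.60, 0.70, 0.55, 0.75, 0.65])  # current sample

alpha = 0.05                                               # significance level
t_stat, p_value = stats.ttest_1samp(scores, popmean=mu)    # H0: E[X] = mu vs H1: E[X] != mu

print(f"sample mean = {scores.mean():.3f}, t = {t_stat:.3f}, p = {p_value:.3f}")
if p_value < alpha:
    print("Reject H0: the current mean differs from the previous-years level.")
else:
    print("Do not reject H0: no statistically significant change detected.")
```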

So, if we summarize the above procedures, the application of statistical methods consists of three main blocks:

The transition from an object of reality to an abstract mathematical and statistical scheme, that is, the construction of a probabilistic model of a phenomenon, process, property;

Carrying out computational actions by proper mathematical means within the framework of a probabilistic model based on the results of measurements, observations, experiments and the formulation of statistical conclusions;

Interpretation of statistical conclusions about the real situation and making an appropriate decision.

Statistical methods of processing and interpreting data rest on probability theory, which is the foundation of the methods of mathematical statistics. Without its fundamental concepts and laws it is impossible to justify the generalization of the conclusions of mathematical statistics, and hence their sound use for scientific and practical purposes.

Thus, the task of descriptive statistics is to transform a set of sample data into a system of indicators - statistics: frequency distributions, measures of central tendency and variability, association coefficients, and the like. However, statistics are characteristics, strictly speaking, of a particular sample. It is, of course, possible to calculate sample distributions, sample means, variances, etc., but such "data analysis" is of limited scientific and educational value: the "mechanical" transfer of any conclusions drawn on the basis of such indicators to other populations is not correct.

In order to be able to transfer sample indicators to other or to broader populations, one needs mathematically justified statements about the correspondence of sample characteristics to the characteristics of these broader, so-called general populations. Such statements rest on theoretical approaches and schemes associated with probabilistic models of reality, for example on the axiomatic approach, the law of large numbers, and so on. Only with their help is it possible to transfer the properties established from the analysis of limited empirical information to other or to wider sets. Thus the construction, the laws of functioning and the use of probabilistic models - the subject of the mathematical field called "probability theory" - become the essence of statistical methods.

Thus, two parallel lines of indicators are used in mathematical statistics: the first is relevant to practice (sample indicators), the second is based on theory (indicators of the probabilistic model). For example, the empirical frequencies determined from the sample correspond to the concept of theoretical probability; the sample mean (practice) corresponds to the expected value (theory), and so on. Moreover, in research the sample characteristics are, as a rule, primary: they are calculated from observations, measurements and experiments, then undergo statistical assessment of their consistency and efficiency and testing of statistical hypotheses in accordance with the objectives of the research, and in the end are accepted, with a certain probability, as indicators of the properties of the populations under study.

Questions and tasks.

1. Describe the main sections of mathematical statistics.

2. What is the main idea of ​​mathematical statistics?

3. Describe the ratio of the general and sample populations.

4. Explain the scheme for applying the methods of mathematical statistics.

5. Specify the list of the main tasks of mathematical statistics.

6. What are the main blocks of the application of statistical methods? Describe them.

7. Expand the connection between mathematical statistics and probability theory.

Introduction

1. Subject and methods of mathematical statistics

2. Basic concepts of mathematical statistics

2.1 Basic concepts of sampling

2.2 The sample distribution

2.3 Empirical distribution function, histogram

Conclusion

Bibliography

Introduction

Mathematical statistics is the science of mathematical methods of systematization and use of statistical data for scientific and practical conclusions. In many of its branches, mathematical statistics is based on the theory of probability, which makes it possible to assess the reliability and accuracy of conclusions drawn from limited statistical material (for example, to estimate the required sample size to obtain results of the required accuracy in a sample survey).
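As an illustration of the sample-size estimate mentioned above, here is a small sketch under the usual normal-approximation assumptions; the spread sigma and the desired accuracy eps are hypothetical inputs, not values from the source.

```python
# A sketch of estimating the required sample size for a given accuracy,
# using the normal approximation: choose the smallest n with z*sigma/sqrt(n) <= eps.
import math

def required_sample_size(sigma: float, eps: float, z: float = 1.96) -> int:
    """Smallest n such that z * sigma / sqrt(n) <= eps (95% confidence for z = 1.96)."""
    return math.ceil((z * sigma / eps) ** 2)

# Example: the studied variable has spread sigma = 10 and we want the mean within +/- 1.5.
print(required_sample_size(sigma=10, eps=1.5))   # -> 171
```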

In probability theory one considers random variables with a given distribution, or random experiments whose properties are fully known. The subject of probability theory is the properties of and relationships among these quantities (distributions).

But often the experiment is a black box, giving only some results, according to which it is required to draw a conclusion about the properties of the experiment itself. The observer has a set of numerical (or they can be made numerical) results obtained by repeating the same random experiment under the same conditions.

In this case, for example, the following questions arise: If we observe one random variable, how can we draw the most accurate conclusion about its distribution from a set of its values ​​in several experiments?

An example of such a series of experiments is a sociological survey, a set of economic indicators or, finally, the sequence of heads and tails in a thousand-fold coin toss.

All of the above determines the relevance and importance of the topic of this work at the present stage, aimed at a deep and comprehensive study of the basic concepts of mathematical statistics.

In this regard, the purpose of this work is to systematize, accumulate and consolidate knowledge about the concepts of mathematical statistics.

1. Subject and methods of mathematical statistics

Mathematical statistics is the science of mathematical methods for analyzing data obtained during mass observations (measurements, experiments). Depending on the mathematical nature of the specific results of observations, mathematical statistics is divided into statistics of numbers, multidimensional statistical analysis, analysis of functions (processes) and time series, and statistics of objects of non-numerical nature. A significant part of mathematical statistics is based on probabilistic models. One distinguishes the general tasks of data description, estimation and hypothesis testing, as well as more specific tasks related to conducting sample surveys, restoring dependencies, and building and using classifications (typologies), etc.

To describe the data, tables, charts, and other visual representations are built, for example, correlation fields. Probabilistic models are usually not used. Some data description methods rely on advanced theory and the capabilities of modern computers. These include, in particular, cluster analysis, aimed at identifying groups of objects that are similar to each other, and multidimensional scaling, which makes it possible to visualize objects on a plane, distorting the distances between them to the least extent.

Estimation and hypothesis-testing methods rely on probabilistic models of data generation. These models are divided into parametric and non-parametric. In parametric models it is assumed that the objects under study are described by distribution functions depending on a small number (1-4) of numerical parameters; in non-parametric models the distribution functions are assumed to be arbitrary continuous ones. In mathematical statistics one estimates parameters and characteristics of distributions (mathematical expectation, median, variance, quantiles, etc.), densities and distribution functions, and dependencies between variables (based on linear and non-parametric correlation coefficients, as well as parametric or non-parametric estimates of the functions expressing the dependencies), and so on. Both point estimates and interval estimates (which give bounds for the true values) are used.

In mathematical statistics there is a general theory of hypothesis testing and a large number of methods devoted to testing specific hypotheses. Hypotheses are considered about the values of parameters and characteristics, about homogeneity (that is, the coincidence of characteristics or distribution functions in two samples), about the agreement of the empirical distribution function with a given distribution function or with a parametric family of such functions, about the symmetry of the distribution, and so on.
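Two of the hypothesis-testing problems just listed are sketched below, assuming Python with NumPy/SciPy and synthetic data: (a) agreement of an empirical distribution with a given distribution function, (b) homogeneity of two samples. The Kolmogorov-Smirnov tests are illustrative choices, not the only possible criteria.

```python
# Hedged sketches of a goodness-of-fit test and a homogeneity test on synthetic data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2)

# (a) Goodness of fit: does the sample agree with the fully specified N(0, 1) law?
x = rng.normal(loc=0.0, scale=1.0, size=300)
ks_stat, p_fit = stats.kstest(x, "norm")          # Kolmogorov-Smirnov test vs N(0, 1)
print(f"goodness of fit: D = {ks_stat:.3f}, p = {p_fit:.3f}")

# (b) Homogeneity: do two samples come from the same distribution?
y = rng.normal(loc=0.3, scale=1.0, size=300)      # second sample, with a shifted mean
ks2_stat, p_hom = stats.ks_2samp(x, y)            # two-sample Kolmogorov-Smirnov test
print(f"homogeneity: D = {ks2_stat:.3f}, p = {p_hom:.3f}")
```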

Of great importance is the section of mathematical statistics associated with conducting sample surveys, with the properties of various sampling schemes and the construction of adequate methods for estimating and testing hypotheses.

Dependency recovery problems have been actively studied for more than 200 years, since the development of the method of least squares by K. Gauss in 1794. Currently, the methods of searching for an informative subset of variables and non-parametric methods are the most relevant.

The development of methods for data approximation and for reducing the dimensionality of the description began more than 100 years ago, when K. Pearson created the method of principal components. Later, factor analysis and numerous nonlinear generalizations were developed.

The various methods of constructing (cluster analysis), analyzing and using (discriminant analysis) classifications (typologies) are also called methods of pattern recognition (supervised and unsupervised), automatic classification, and so on.

Mathematical methods in statistics are based either on the use of sums (via the central limit theorem of probability theory) or on indicators of difference (distances, metrics), as in the statistics of objects of non-numerical nature. Usually only asymptotic results are rigorously substantiated. Nowadays computers play a large role in mathematical statistics: they are used both for calculations and for simulation modelling (in particular, in resampling methods and in studying the suitability of asymptotic results).

2. Basic concepts of mathematical statistics

2.1 Basic concepts of the sampling method

Let X be a random variable observed in a random experiment. The underlying probability space is assumed to be given (and will not concern us further).

We will assume that, having carried out this experiment n times under the same conditions, we obtained the numbers X1, X2, ..., Xn - the values of this random variable in the first, second, etc. experiments. The random variable X has some distribution F, which is partially or completely unknown to us.

Let us take a closer look at the set X1, ..., Xn, called a sample.

In a series of experiments already performed, a sample is a set of numbers. But if this series of experiments is repeated again, then instead of this set we will get a new set of numbers. Instead of the number X1 another number will appear - one of the values of the random variable X. That is, X1 (and X2, and X3, etc.) is a variable quantity that can take the same values as the random variable X, and just as often (with the same probabilities). Therefore, before the experiment X1 is a random variable identically distributed with X, and after the experiment it is the number that we observe in this first experiment, i.e. one of the possible values of the random variable X.

A sample of size n is a set X1, ..., Xn of independent and identically distributed random variables ("copies of X") that, like X, have the distribution F.

What does it mean to "draw a conclusion about the distribution from a sample"? The distribution is characterized by the distribution function, the density or the distribution table, and by a set of numerical characteristics: E X, D X, E X^k, and so on. From the sample one must be able to build approximations for all these characteristics.

2.2 The sample distribution

Consider the realization of the sample on one elementary outcome - a set of numbers X1, ..., Xn. On a suitable probability space introduce a random variable X* taking the values X1, ..., Xn with probability 1/n each (if some of the values coincide, the probabilities are added the corresponding number of times). Thus the distribution table of X* assigns probability 1/n to each sample value, and the distribution function of X* equals F_n*(y) = (number of Xi < y) / n.

The distribution of the quantity X* is called the empirical or sample distribution. Let us calculate the mathematical expectation and variance of X* and introduce notation for these quantities:

E X* = (1/n) · (X1 + ... + Xn) = X̄ (the sample mean), D X* = (1/n) · Σ (Xi − X̄)² = S² (the sample variance).

In the same way we calculate the moment of order k: E (X*)^k = (1/n) · Σ Xi^k.

In the general case, for a function g we denote by ḡ the sample average ḡ = (1/n) · Σ g(Xi).

If, when constructing all the characteristics introduced above, we regard the sample X1, ..., Xn as a set of random variables, then these characteristics themselves - X̄, S², and so on - become random variables. These characteristics of the sample distribution are used to estimate (approximate) the corresponding unknown characteristics of the true distribution.

The reason for using the characteristics of the distribution of X* to estimate the characteristics of the true distribution of X lies in the closeness of these distributions for large n.

Consider, for example, tossing a fair die. Let Xi be the number of points that came up on the i-th throw, i = 1, ..., n. Suppose that a one occurs n1 times in the sample, a two n2 times, and so on. Then the random variable X* takes the values 1, ..., 6 with probabilities n1/n, ..., n6/n respectively. By the law of large numbers these proportions approach 1/6 as n grows. That is, the distribution of X* in a certain sense approaches the true distribution of the number of points obtained when tossing a fair die.
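The following sketch, assuming Python with NumPy, illustrates the die example: as n grows the empirical frequencies n_k/n approach 1/6 and the sample characteristics approach the true expectation 3.5 and variance 35/12.

```python
# A sketch of the die example: convergence of the empirical distribution to the true one.
import numpy as np

rng = np.random.default_rng(seed=3)

for n in (60, 600, 60_000):
    sample = rng.integers(1, 7, size=n)                  # n throws of a fair die (values 1..6)
    freqs = np.bincount(sample, minlength=7)[1:] / n     # n_1/n, ..., n_6/n
    print(f"n = {n:6d}  frequencies = {np.round(freqs, 3)}  "
          f"sample mean = {sample.mean():.3f}  sample variance = {sample.var():.3f}")
# True values: each probability 1/6 ~ 0.167, E[X] = 3.5, D[X] = 35/12 ~ 2.917.
```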

We will not specify what is meant by the closeness of the sample and true distributions. In the following paragraphs, we will take a closer look at each of the characteristics introduced above and examine its properties, including its behavior with increasing sample size.

2.3 Empirical distribution function, histogram

Since the unknown distribution can be described, for example, by its distribution function F(y) = P(X < y), we will construct an "estimate" of this function from the sample.

Definition 1.

The empirical distribution function built on a sample X1, ..., Xn of size n is the random function F_n*, for each y equal to

F_n*(y) = (number of Xi < y) / n = (1/n) · Σ I(Xi < y), the sum being taken over i = 1, ..., n.

Reminder: the random function I(Xi < y) is called the indicator of the event {Xi < y}. For each y it is a random variable having a Bernoulli distribution with parameter p = P(Xi < y) = F(y). (Why?)

In other words, for any y the value F(y), which equals the true probability that the random variable X is less than y, is estimated by the proportion of sample elements smaller than y.

If the sample elements X1, ..., Xn are sorted in ascending order (on each elementary outcome), a new set of random variables is obtained, called the variation series:

X(1) ≤ X(2) ≤ ... ≤ X(n).

The element X(k), k = 1, ..., n, is called the k-th member of the variation series or the k-th order statistic.

Example 1

Sample:

Variation series:

Fig. 1. Example 1

The empirical distribution function has jumps at the sample points; the jump at a point equals m/n, where m is the number of sample elements coinciding with that point.

In terms of the variation series the empirical distribution function can be written as: F_n*(y) = 0 for y ≤ X(1); F_n*(y) = k/n for X(k) < y ≤ X(k+1); F_n*(y) = 1 for y > X(n).
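A minimal sketch of the empirical distribution function, assuming Python with NumPy; the sample values are illustrative, and the strict inequality follows the convention F(y) = P(X < y) used above.

```python
# F_n*(y) = (#{X_i < y}) / n for a small hypothetical sample.
import numpy as np

def ecdf(sample: np.ndarray, y: float) -> float:
    """Proportion of sample elements strictly less than y."""
    return np.mean(sample < y)

sample = np.array([2.0, 1.0, 3.5, 1.0, 4.0])   # hypothetical sample, n = 5
for y in (0.5, 1.0, 1.5, 3.6, 10.0):
    print(f"F_5*({y}) = {ecdf(sample, y):.1f}")
# Jumps occur at the sample points; the jump at a point equals m/n,
# where m is the number of sample elements equal to that point.
```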

Another characteristic of a distribution is the table (for discrete distributions) or the density (for absolutely continuous ones). The empirical, or sample, analogue of a table or density is the so-called histogram.

The histogram is based on grouped data. The presumed range of values of the random variable (or the range of the sample data) is divided, independently of the sample, into a certain number of intervals (not necessarily of equal length). Let A1, ..., Ak be intervals on the line, called grouping intervals. Denote by ν_j, for j = 1, ..., k, the number of sample elements that fall into the interval A_j:

ν_j = number of i such that Xi ∈ A_j. (1)

On each of the intervals A_j a rectangle is built whose area is proportional to ν_j. The total area of all the rectangles must equal one. Let l_j be the length of the interval A_j. Then the height f_j of the rectangle above A_j equals f_j = ν_j / (n · l_j).

The resulting figure is called a histogram.
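A sketch of building such a histogram, assuming Python with NumPy; the sample is synthetic, and the number of grouping intervals k ≈ 1 + log2 n follows the rule discussed below.

```python
# A histogram with total area 1: heights = nu_j / (n * l_j).
import numpy as np

rng = np.random.default_rng(seed=4)
sample = rng.normal(loc=0.0, scale=1.0, size=400)

k = int(1 + np.log2(len(sample)))                            # ~ 1 + log2(400) = 9 intervals
heights, edges = np.histogram(sample, bins=k, density=True)  # density=True gives nu_j / (n * l_j)

widths = np.diff(edges)
print("interval lengths:", np.round(widths, 2))
print("heights:         ", np.round(heights, 3))
print("total area:      ", np.sum(heights * widths))         # equals 1 by construction
```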

Example 2

Consider the variation series from Example 1. The number of grouping intervals is often chosen as k ≈ 1 + 3.32 · lg n.

Here lg is the decimal logarithm, so k ≈ 1 + log2 n; that is, when the sample size is doubled, the number of grouping intervals increases by 1. Note that the more grouping intervals there are, the better. But if the number of intervals is taken, say, of the order of n, then as n grows the histogram will not approach the density.

The following statement is true:

If the density of the distribution of the sample elements is a continuous function, then, as k → ∞ in such a way that k/n → 0, pointwise convergence in probability of the histogram to the density takes place.

So choosing the number of intervals of the order of the logarithm of n is reasonable, but it is not the only possible choice.

Conclusion

Mathematical (or theoretical) statistics is based on the methods and concepts of probability theory, but in a sense it solves inverse problems.

If we observe the simultaneous manifestation of two (or more) attributes, i.e. we have a set of values of several random variables, what can be said about their dependence? Is it present or not? And if so, what kind of dependence is it?

It is often possible to make some assumptions about the distribution hidden in the "black box" or about its properties. In this case it is required to confirm or refute these assumptions ("hypotheses") from the experimental data. At the same time, we must remember that the answer "yes" or "no" can only be given with a certain degree of certainty, and the longer we can continue the experiment, the more accurate the conclusions can be. The most favourable situation for research is when certain properties of the observed experiment can be confidently asserted - for example, the presence of a functional dependence between the observed quantities, the normality of the distribution, its symmetry, the existence of a density of the distribution or its discrete nature, and so on.

So, it makes sense to remember about (mathematical) statistics if

there is a random experiment, the properties of which are partially or completely unknown,

We are able to reproduce this experiment under the same conditions some (or better, any) number of times.

Bibliography

1. Baumol W. Economic Theory and Operations Research. Moscow: Nauka, 1999.

2. Bolshev L.N., Smirnov N.V. Tables of Mathematical Statistics. Moscow: Nauka, 1995.

3. Borovkov A.A. Mathematical Statistics. Moscow: Nauka, 1994.

4. Korn G., Korn T. Handbook of Mathematics for Scientists and Engineers. St. Petersburg: Lan, 2003.

5. Korshunov D.A., Chernova N.I. A Collection of Problems and Exercises in Mathematical Statistics. Novosibirsk: Publishing House of the S.L. Sobolev Institute of Mathematics, SB RAS, 2001.

6. Peheletsky I.D. Mathematics: A Textbook for Students. Moscow: Academia, 2003.

7. Sukhodolsky V.G. Lectures on Higher Mathematics for the Humanities. St. Petersburg: St. Petersburg State University Publishing House, 2003.

8. Feller W. An Introduction to Probability Theory and Its Applications. Vol. 2. Moscow: Mir, 1984.

9. Harman G. Modern Factor Analysis. Moscow: Statistika, 1972.



Probability theory and mathematical statistics are the basis of probabilistic-statistical methods of data processing. And we process and analyze the data primarily for decision-making. To use the modern mathematical apparatus, it is necessary to express the considered problems in terms of probabilistic-statistical models.

The application of a specific probabilistic-statistical method consists of three stages:

The transition from economic, managerial, technological reality to an abstract mathematical and statistical scheme, i.e. building a probabilistic model of a control system, a technological process, a decision-making procedure, in particular based on the results of statistical control, etc.

Carrying out calculations and obtaining conclusions by purely mathematical means within the framework of a probabilistic model;

Interpretation of mathematical and statistical conclusions in relation to a real situation and making an appropriate decision (for example, on the conformity or non-compliance of product quality with established requirements, the need to adjust the technological process, etc.), in particular, conclusions (on the proportion of defective units of products in a batch, on a specific form of laws of distribution of controlled parameters of the technological process, etc.).

Mathematical statistics uses the concepts, methods and results of probability theory. Below we consider the main issues of building probabilistic models in economic, managerial, technological and other situations. We emphasize that the active and correct use of normative-technical and instructive-methodological documents on probabilistic-statistical methods requires prior knowledge: one must know under what conditions a particular document should be applied, what initial information is needed for its selection and application, what decisions should be made based on the results of data processing, and so on.

Examples of the application of probability theory and mathematical statistics. Let us consider several examples in which probabilistic-statistical models are a good tool for solving managerial, industrial, economic and national-economic problems. Thus, for example, in A.N. Tolstoy's novel "Walking Through the Torments" (vol. 1) it says: "the shop turns out twenty-three percent rejects; you hold on to that figure," Strukov told Ivan Ilyich.

How should these words in a conversation between factory managers be understood? A single unit of production cannot be 23% defective: it is either good or defective. Perhaps Strukov meant that a large batch contains approximately 23% defective units. Then the question arises as to what "approximately" means. Suppose 30 out of 100 tested units turn out to be defective, or 300 out of 1,000, or 30,000 out of 100,000, and so on - should Strukov then be accused of lying?

Or another example. A coin used for drawing lots must be "symmetric": when it is tossed, heads (the coat of arms) should come up on average in half of the cases and tails (the number) in the other half. But what does "on average" mean? If you carry out many series of 10 tosses each, you will often encounter series in which the coin comes up heads exactly 4 times; for a symmetric coin this happens in 20.5% of series. And if there are 40,000 heads in 100,000 tosses, can the coin be considered symmetric? The decision procedure is built on probability theory and mathematical statistics.
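The "20.5% of series" figure can be checked directly: for a symmetric coin the number of heads in 10 tosses is binomial, so the probability of exactly 4 heads is C(10, 4)/2^10.

```python
# A quick check of the 20.5% figure for a symmetric coin.
from math import comb

p = comb(10, 4) / 2**10
print(f"P(exactly 4 heads in 10 tosses) = {p:.4f}")   # ~ 0.2051, i.e. about 20.5%
```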

The example may not seem serious enough, but that impression is mistaken. Drawing lots is widely used in organizing industrial and feasibility experiments, for example when processing the results of measuring a quality index (the friction moment) of bearings as a function of various technological factors (the influence of the conservation environment, the methods of preparing bearings before measurement, the effect of the bearing load during measurement, etc.). Suppose it is necessary to compare the quality of bearings depending on the results of their storage in different conservation oils, that is, in oils of composition A and of composition B. When planning such an experiment, the question arises which bearings should be placed in the oil of composition A and which in the oil of composition B, in such a way as to avoid subjectivity and ensure the objectivity of the decision.

A similar example can be given with the quality control of any product. To decide whether or not an inspected batch of products meets the established requirements, a sample is taken from it. Based on the results of the sample control, a conclusion is made about the entire batch. In this case, it is very important to avoid subjectivity in the formation of the sample, i.e. it is necessary that each unit of product in the controlled lot has the same probability of being selected in the sample. Under production conditions, the selection of units of production in the sample is usually carried out not by lot, but by special tables of random numbers or with the help of computer random number generators.

Similar problems of ensuring the objectivity of comparison arise when comparing various schemes for organizing production, remuneration, when holding tenders and competitions, selecting candidates for vacant positions, etc. Everywhere you need a lottery or similar procedures.

Suppose it is necessary to identify the strongest and the second-strongest team when organizing a tournament by the Olympic (knockout) system, where the loser is eliminated. Assume that the stronger team always defeats the weaker one. It is clear that the strongest team will certainly become the champion. The second-strongest team will reach the final if and only if it plays no games with the future champion before the final; if such a game is scheduled, it will not reach the final. Whoever plans the tournament can either "knock out" the second-strongest team ahead of time by pairing it against the leader in the first round, or guarantee it second place by arranging meetings with weaker teams right up to the final. To avoid such subjectivity, lots are drawn. For an 8-team tournament, the probability that the two strongest teams meet in the final is 4/7; accordingly, with probability 3/7 the second-strongest team leaves the tournament early.
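A hedged simulation of the random draw, assuming Python: the two strongest teams meet in the final exactly when the draw places them in different halves of the bracket, which happens with probability 4/7 ≈ 0.571.

```python
# Simulating the 8-team single-elimination draw to check the 4/7 figure.
import random

random.seed(5)
trials = 200_000
meet_in_final = 0
for _ in range(trials):
    bracket = list(range(8))        # teams 0 and 1 are the two strongest
    random.shuffle(bracket)         # random draw of bracket positions
    # positions 0-3 form one half of the bracket, positions 4-7 the other half
    same_half = (bracket.index(0) // 4) == (bracket.index(1) // 4)
    meet_in_final += not same_half
print(meet_in_final / trials, "~", 4 / 7)
```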

In any measurement of product units (using a caliper, micrometer, ammeter, etc.), there are errors. To find out if there are systematic errors, it is necessary to make repeated measurements of a unit of production, the characteristics of which are known (for example, a standard sample). It should be remembered that in addition to the systematic error, there is also a random error.

Therefore, the question arises of how to find out from the measurement results whether there is a systematic error. If we record only whether the error obtained in the next measurement is positive or negative, the problem reduces to the one already considered. Indeed, compare a measurement with a coin toss, a positive error with heads and a negative one with tails (with a sufficiently fine scale a zero error almost never occurs). Then checking the absence of a systematic error is equivalent to checking the symmetry of the coin.

So the problem of checking the absence of a systematic error is reduced to the problem of checking the symmetry of a coin. The above reasoning leads to the so-called sign test in mathematical statistics.
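A sketch of the sign test, assuming Python with SciPy and made-up measurement errors: under H0 (no systematic error) the number of positive signs is Binomial(n, 1/2), just like the number of heads for a symmetric coin.

```python
# Two-sided sign test for the absence of a systematic measurement error.
from scipy.stats import binom

errors = [0.12, -0.05, 0.08, 0.03, 0.10, -0.02, 0.07, 0.09, 0.04, -0.01, 0.06, 0.11]
n = len(errors)
k = sum(e > 0 for e in errors)                 # number of positive errors

# Two-sided p-value for H0: P(positive error) = 1/2.
p_value = 2 * min(binom.cdf(k, n, 0.5), 1 - binom.cdf(k - 1, n, 0.5))
p_value = min(1.0, p_value)
print(f"{k} positive signs out of {n}, p = {p_value:.3f}")
```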

In the statistical regulation of technological processes, rules and plans for statistical process control are developed on the basis of the methods of mathematical statistics, aimed at timely detection of process disorder and at taking measures to adjust the processes and prevent the release of products that do not meet the established requirements. These measures are aimed at reducing production costs and losses from the supply of low-quality products. With statistical acceptance control, also based on the methods of mathematical statistics, quality control plans are developed by analyzing samples from product batches. The difficulty lies in being able to build probabilistic-statistical decision-making models correctly. In mathematical statistics, probabilistic models and methods for testing hypotheses have been developed for this, in particular hypotheses that the proportion of defective units of production equals a certain number p0, for example p0 = 0.23 (recall the words of Strukov from the novel by A.N. Tolstoy).
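A hedged sketch of testing such a hypothesis, assuming Python with SciPy and the illustrative outcome "30 defective units out of 100 inspected"; the normal approximation is used for simplicity.

```python
# Testing H0: p = p0 = 0.23 against a two-sided alternative, normal approximation.
from math import sqrt
from scipy.stats import norm

p0, n, defective = 0.23, 100, 30
p_hat = defective / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 2 * norm.sf(abs(z))                  # two-sided p-value
print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, p = {p_value:.3f}")
# p is well above 0.05, so 30 defective out of 100 does not contradict the claimed 23%.
```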

Estimation problems. In a number of managerial, industrial, economic and national-economic situations, problems of a different type arise - problems of estimating the characteristics and parameters of probability distributions.

Consider an example. Let there be a batch of N electric lamps, from which a sample of n lamps is taken. A number of natural questions arise. How can the average service life of the lamps be determined from the results of testing the sample elements, and with what accuracy can this characteristic be estimated? How does the accuracy change if a larger sample is taken? For what number of hours T can it be guaranteed that at least 90% of the lamps will last T hours or more?

Suppose that when testing a sample of size n, X lamps turn out to be defective. What limits can then be given for the number D of defective lamps in the whole batch, for the defect level D/N, and so on?
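A sketch of these estimation questions, assuming Python and a simple normal-approximation confidence interval for the defect level (the finite-population correction is ignored for brevity); N, n and X are hypothetical numbers.

```python
# Point and interval estimate of the defect level D/N from a sample of n lamps.
from math import sqrt

N, n, X = 10_000, 400, 12          # batch size, sample size, defective lamps in the sample
p_hat = X / n                      # point estimate of the defect level D / N
half = 1.96 * sqrt(p_hat * (1 - p_hat) / n)   # 95% normal-approximation half-width

low, high = max(0.0, p_hat - half), min(1.0, p_hat + half)
print(f"defect level: point estimate {p_hat:.3f}, 95% CI ({low:.3f}, {high:.3f})")
print(f"defective lamps D in the batch: roughly between {int(N*low)} and {int(N*high)}")
```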

Or, in a statistical analysis of the accuracy and stability of technological processes, it is necessary to evaluate such quality indicators as the average value of the controlled parameter and the degree of its spread in the process under consideration. According to the theory of probability, it is advisable to use its mathematical expectation as the mean value of a random variable, and the variance, standard deviation, or coefficient of variation as a statistical characteristic of the spread. Questions arise: how to evaluate these statistical characteristics from sample data, with what accuracy can this be done?

There are many similar examples. Here it was important to show how probability theory and mathematical statistics can be used in engineering and management problems.

The modern concept of mathematical statistics. Mathematical statistics is understood as "a branch of mathematics devoted to the mathematical methods of collecting, systematizing, processing and interpreting statistical data, as well as of using them for scientific or practical conclusions". The rules and procedures of mathematical statistics are based on probability theory, which makes it possible to evaluate the accuracy and reliability of the conclusions obtained in each problem on the basis of the available statistical material. Here statistical data means information about the number of objects in some more or less extensive collection that have certain characteristics.

According to the type of problems being solved, mathematical statistics is usually divided into three sections: data description, estimation, and hypothesis testing.

According to the type of statistical data being processed, mathematical statistics is divided into four areas:

One-dimensional statistics (statistics of random variables), in which the result of an observation is described by a real number;

Multivariate statistical analysis, where the result of observation of an object is described by several numbers (vector);

Statistics of random processes and time series, where the result of observation is a function;

Statistics of objects of non-numerical nature, in which the observation result is of non-numerical nature, for example a set (a geometric figure), an ordering, or the result of a measurement on a qualitative attribute.

Historically, some areas of statistics of objects of a non-numerical nature appeared first (in particular, the problems of estimating the percentage of defective products and testing hypotheses about it) and univariate statistics. The mathematical apparatus is simpler for them, therefore, by their example, they usually demonstrate the main ideas of mathematical statistics.

Only those methods of data processing are evidence-based, i.e. belong to mathematical statistics, that are based on probabilistic models of the corresponding real phenomena and processes. We are talking about models of consumer behaviour, the occurrence of risks, the functioning of technological equipment, the obtaining of experimental results, the course of a disease, and so on. A probabilistic model of a real phenomenon should be considered built if the quantities under consideration and the relationships between them are expressed in terms of probability theory. The correspondence of the probabilistic model to reality, i.e. its adequacy, is substantiated, in particular, with the help of statistical methods for testing hypotheses.

Non-probabilistic methods of data processing are exploratory: they can be used only in preliminary data analysis, since they do not make it possible to assess the accuracy and reliability of conclusions obtained from limited statistical material.

Probabilistic and statistical methods are applicable wherever it is possible to construct and substantiate a probabilistic model of a phenomenon or process. Their use is mandatory when conclusions drawn from sample data are transferred to the entire population (for example, from a sample to an entire batch of products).

In specific areas of application, both probabilistic-statistical methods of wide application and specific ones are used. For example, in the section of production management devoted to statistical methods of product quality management, applied mathematical statistics (including the design of experiments) are used. With the help of its methods, a statistical analysis of the accuracy and stability of technological processes and a statistical assessment of quality are carried out. Specific methods include methods of statistical acceptance control of product quality, statistical regulation of technological processes, assessment and control of reliability, etc.

Such applied probabilistic-statistical disciplines as reliability theory and queuing theory are widely used. The content of the first is clear from its title; the second deals with the study of systems such as a telephone exchange, which receives calls at random times - the requests of subscribers dialling numbers on their telephones. The duration of servicing these requests, i.e. the duration of the conversations, is also modelled by random variables. A huge contribution to the development of these disciplines was made by Corresponding Member of the USSR Academy of Sciences A.Ya. Khinchin (1894-1959), Academician of the Academy of Sciences of the Ukrainian SSR B.V. Gnedenko (1912-1995) and other domestic scientists.

Briefly about the history of mathematical statistics. Mathematical statistics as a science begins with the works of the famous German mathematician Carl Friedrich Gauss (1777-1855), who, based on the theory of probability, investigated and substantiated the least squares method, which he created in 1795 and applied to the processing of astronomical data (in order to clarify the orbit of a small planet Ceres). One of the most popular probability distributions, normal, is often named after him, and in the theory of random processes, the main object of study is Gaussian processes.

At the end of the 19th and the beginning of the 20th century, a major contribution to mathematical statistics was made by English researchers, primarily K. Pearson (1857-1936) and R.A. Fisher (1890-1962). In particular, Pearson developed the chi-square test for testing statistical hypotheses, and Fisher developed the analysis of variance, the theory of experimental design, and the maximum likelihood method of parameter estimation.

In the 1930s the Pole Jerzy Neyman (1894-1977) and the Englishman E. Pearson developed a general theory of testing statistical hypotheses, while the Soviet mathematicians Academician A.N. Kolmogorov (1903-1987) and Corresponding Member of the USSR Academy of Sciences N.V. Smirnov (1900-1966) laid the foundations of nonparametric statistics. In the 1940s the Romanian-born A. Wald (1902-1950) built the theory of sequential statistical analysis.

Mathematical statistics is rapidly developing at the present time. So, over the past 40 years, four fundamentally new areas of research can be distinguished:

Development and implementation of mathematical methods for planning experiments;

Development of statistics of objects of non-numerical nature as an independent direction in applied mathematical statistics;

Development of statistical methods resistant to small deviations from the used probabilistic model;

Widespread development of work on the creation of computer software packages designed for statistical analysis of data.

Probabilistic-statistical methods and optimization. The idea of ​​optimization permeates modern applied mathematical statistics and other statistical methods. Namely, methods of planning experiments, statistical acceptance control, statistical control of technological processes, etc. On the other hand, optimization formulations in decision theory, for example, the applied theory of optimizing product quality and standard requirements, provide for the widespread use of probabilistic-statistical methods, primarily applied mathematical statistics.

In production management, in particular when optimizing product quality and standard requirements, it is especially important to apply statistical methods at the initial stage of the product life cycle, i.e. at the stage of research preparation for experimental design work (development of prospective requirements for the product, preliminary design, terms of reference for experimental design development). This is due to the limited information available at the initial stage of the product life cycle and to the need to predict the technical possibilities and the economic situation for the future. Statistical methods should be applied at all stages of solving an optimization problem - when scaling variables, developing mathematical models of the functioning of products and systems, conducting technical and economic experiments, and so on.

In optimization problems, including optimization of product quality and standard requirements, all areas of statistics are used. Namely, statistics of random variables, multivariate statistical analysis, statistics of random processes and time series, statistics of objects of non-numerical nature. Recommendations on the choice of a statistical method for the analysis of specific data have been developed.

Mathematical statistics is one of the main sections of such a science as mathematics, and is a branch that studies the methods and rules for processing certain data. In other words, it explores ways to uncover patterns that are inherent in large collections of identical objects, based on their sample survey.

The task of this branch is to develop methods for estimating probabilities or making certain decisions about the nature of the events under study on the basis of the results obtained. Tables, charts and correlation fields are used to describe the data; probabilistic models are rarely applied at this stage.

Mathematical statistics are used in various fields of science. For example, it is important for the economy to process information about homogeneous sets of phenomena and objects. They can be products manufactured by industry, personnel, profit data, etc. Depending on the mathematical nature of the results of observations, one can single out the statistics of numbers, the analysis of functions and objects of a non-numerical nature, and multidimensional analysis. In addition, they consider general and particular (related to the restoration of dependencies, the use of classifications, selective studies) tasks.

The authors of some textbooks believe that the theory of mathematical statistics is only a section of the theory of probability, while others believe that it is an independent science with its own goals, objectives and methods. However, in any case, its use is very extensive.

Thus, mathematical statistics finds especially clear application in psychology. Its use allows the specialist to substantiate conclusions correctly, find relationships in the data, generalize them, avoid many logical errors, and much more. It should be noted that measuring a particular psychological phenomenon or property of a personality without computational procedures is often simply impossible, which shows that the basics of this science are necessary. In other words, probability theory can be called its source and foundation.

The method of research that relies on the consideration of statistical data is used in other areas as well. However, it should be noted at once that its features, as applied to objects of different nature and origin, are always specific, so it makes little sense to merge them all into one science. The common features of this method come down to counting the number of objects that belong to a particular group, studying the distribution of quantitative characteristics, and applying probability theory to obtain certain conclusions.

Elements of mathematical statistics are used in areas such as physics, astronomy, etc. Here, the values ​​of characteristics and parameters, hypotheses about the coincidence of any characteristics in two samples, about the symmetry of the distribution, and much more can be considered.

Mathematical statistics plays an important role in carrying out such studies; its goal is most often to build adequate methods of estimation and hypothesis testing. At present, computer technology is of great importance in this science: it not only greatly simplifies the calculations, but also makes it possible to generate replicated samples (resampling) and to study the suitability of the results obtained in practice.

In the general case, the methods of mathematical statistics lead to one of two conclusions: either the desired judgment about the nature or properties of the data under study and their relationships can be made, or it is shown that the results obtained are not sufficient to draw conclusions.
