A program for finding patterns online

Fighting a combinatorial explosion calls for a "combinatorial crowbar". In practice, two tools make it possible to tackle hard combinatorial problems. The first is massive parallelization of computation. It is important not only to have a large number of parallel processors, but also to choose an algorithm that allows the task to be parallelized so that all available computing power is put to work.

The second tool is the principle of restriction. The main method built on this principle is the method of "random subspaces". Sometimes a combinatorial problem allows a strong restriction of its initial conditions while keeping the hope that after these restrictions the data will retain enough information for the required solution to be found. There can be many ways to restrict the initial conditions, and not all of them will succeed. But if there is at least some probability that successful restrictions exist, then the hard problem can be split into a large number of restricted subproblems, each of which is solved much more easily than the original one.

Combining these two principles, we can build a solution to our problem.

Combinatorial space

Take the input bit vector and fix its bits. Create combinatorial "points". At each point, bring together several random bits of the input vector (figure below). Watching the input, each of these points will see not the whole picture but only a small part of it, determined by which bits converge at that point. For example, in the figure below the leftmost point, with index 0, watches only bits 1, 6, 10 and 21 of the original input signal. Let us create many such points and call their set a combinatorial space.
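As a sketch, the construction of such points can be written in a few lines of Python. The function name and defaults here are illustrative, not taken from the author's actual program:

```python
import random

def build_points(input_bits=256, n_points=60000, bits_per_point=32, seed=42):
    """Each point of the combinatorial space watches its own random
    subset of positions of the input bit vector."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(input_bits), bits_per_point))
            for _ in range(n_points)]

# A tiny space of 5 points, each seeing 4 of the 256 input positions
points = build_points(n_points=5, bits_per_point=4)
```

Because each point samples its positions independently, the same input bit is usually shared by many points.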


Combinatorial space

What is the meaning of this space? We assume that the input signal is not random but contains certain patterns. Patterns come in two main types. First, something in the input description may appear somewhat more often than something else. In our case, for example, individual letters appear more often than their combinations. With bit coding, this means that certain combinations of bits occur more often than others.

The second type of pattern arises when there is a training signal, and something contained in the input signal turns out to be associated with something contained in the training signal. In our case, active output bits are a reaction to a combination of certain input bits.

If you look for patterns head-on, that is, staring at the entire input and output vectors, it is not at all clear what to do or where to move. As soon as you start forming hypotheses about what may depend on what, a combinatorial explosion occurs instantly: the number of possible hypotheses is monstrous.

The classic method widely used in neural networks is gradient descent. For it, the important thing is to understand which way to move. That is usually easy when there is a single target output. For example, if we want to teach a neural network to recognize handwritten digits, we show it images of digits and tell it which digit it is seeing. The network then knows "how and where to descend". But if we show pictures containing several digits at once, and name all those digits at once without indicating which is which, the situation becomes much more complicated.

When the points of the combinatorial space are created with a strongly limited "field of view" (random subspaces), it turns out that some points may get lucky and will see a pattern if not perfectly clean, then at least in a substantially purified form. Such a limited view makes it possible, for example, to perform a gradient descent and extract the clean pattern. The probability that a single point stumbles on a pattern may not be very high, but one can always choose a number of points large enough to guarantee that any pattern "surfaces somewhere".

Of course, if the points are made too narrow, that is, if the number of bits per point is chosen roughly equal to the number of bits expected in a pattern, then the required size of the combinatorial space will tend toward the number of options in a full enumeration of possible hypotheses, which brings us back to the combinatorial explosion. Fortunately, we can widen the view of the points while reducing their total number. This reduction is not free (the combinatorics "moves into the point"), but up to a certain limit it is not fatal.

Now create the output vector. Simply attach each bit of the output to several points of the combinatorial space. Which points are chosen randomly. The number of points converging on one bit corresponds to how many times we want to compress the combinatorial space. The resulting output vector will be a hash function of the state of the combinatorial space. How this state is computed will be discussed a little later.
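The random wiring of points to output bits can be sketched the same way (again with illustrative names and defaults):

```python
import random

def wire_output(n_points=60000, output_bits=256, seed=7):
    """Attach every point of the space to one randomly chosen output
    bit; on average n_points / output_bits points feed each bit."""
    rng = random.Random(seed)
    return [rng.randrange(output_bits) for _ in range(n_points)]

out_of = wire_output()   # out_of[p] is the output bit fed by point p
```

With 60000 points and 256 output bits, each bit gathers roughly 234 points, which is the compression factor of the space.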

In general, as shown in the figure above, the sizes of the input and the output may differ. In our string-transcoding example these sizes coincide.

Receptor clusters

How do we look for patterns in the combinatorial space? Each point sees its own fragment of the input vector. If what it sees contains a fairly large number of active bits, we can assume that what it sees is some pattern. That is, the set of active bits falling into a point can be treated as a hypothesis about the presence of a pattern. We will memorize such a hypothesis, that is, record the set of active bits visible at the point. In the situation shown in the figure below, it can be seen that at point 0 bits 1, 6 and 21 should be recorded.


Recording bits in a cluster

We will call the stored number of a single bit a receptor for that bit. This implies that the receptor watches the state of the corresponding bit of the input vector and reacts when a one appears there.

A set of receptors will be called a receptor cluster, or receptive cluster. When an input vector is presented, the receptors of a cluster react if there are ones in the corresponding positions of the vector. For the cluster we can then count the number of triggered receptors.

Since information is encoded not by individual bits but by a code, the number of bits we take into a cluster determines the precision with which we formulate a hypothesis. The article comes with the source of a program that solves the string-transcoding task. By default, the program uses the following settings:

  • length of the input vector: 256 bits;
  • length of the output vector: 256 bits;
  • a single letter is encoded with 8 bits;
  • string length: 5 characters;
  • number of offset contexts: 10;
  • size of the combinatorial space: 60000 points;
  • number of bits converging at a point: 32;
  • cluster creation threshold: 6;
  • partial cluster activation threshold: 4.
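Gathered into code, these defaults might look like the following dictionary (a sketch; the parameter names are mine, not the original program's):

```python
CONFIG = {
    "input_bits": 256,       # length of the input vector
    "output_bits": 256,      # length of the output vector
    "bits_per_letter": 8,    # code length of a single letter
    "string_length": 5,      # characters per string
    "n_contexts": 10,        # number of offset contexts
    "space_size": 60000,     # points in the combinatorial space
    "bits_per_point": 32,    # input bits converging at a point
    "create_threshold": 6,   # active bits needed to create a cluster
    "partial_threshold": 4,  # receptors needed for a partial firing
}

# A 5-letter string lights at most 40 of the 256 input bits
print(CONFIG["bits_per_letter"] * CONFIG["string_length"])   # 40
```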

With these settings, almost every bit present in the code of one letter also occurs in the code of another letter, and even in the codes of several letters. Therefore a single receptor cannot reliably indicate a pattern. Two receptors indicate a letter much better, but they may just as well point at a combination of entirely different letters. We can introduce a certain length threshold beyond which it becomes possible to judge reliably whether the code we need has landed in the cluster.

We introduce a minimum threshold on the number of receptors required to form a hypothesis (in the example it equals 6). Now we start training. We feed in the source code and the code we want to obtain at the output. For the source code, it is easy to count how many active bits fall into each point of the combinatorial space. We select only those points that are connected to active bits of the output code and in which the number of incoming active bits of the input code is at least the cluster creation threshold. At such points we create receptor clusters with the corresponding bits and save them at the points where they were created. To avoid duplicates, we first check that these clusters are unique for those points and that the points do not already contain exactly the same clusters.
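One training step of this cluster-creation rule might be sketched as follows (names are illustrative; the uniqueness check mentioned above is omitted for brevity):

```python
def create_clusters(points, out_of, x_in, y_out, create_threshold=6):
    """At every point wired to an active output bit, record the active
    input bits the point sees as a new hypothesis cluster."""
    clusters = []
    for p, watched in enumerate(points):
        if not y_out[out_of[p]]:
            continue                      # point's output bit inactive
        seen = tuple(b for b in watched if x_in[b])
        if len(seen) >= create_threshold:
            clusters.append((p, seen))    # store the cluster at point p
    return clusters

# Two points; only point 0 feeds an active output bit and sees 8 ones
pts = [(0, 1, 2, 3, 4, 5, 6, 7), (8, 9, 10, 11, 12, 13, 14, 15)]
cs = create_clusters(pts, out_of=[0, 1],
                     x_in=[1] * 8 + [0] * 8, y_out=[1, 0])
print(cs)   # [(0, (0, 1, 2, 3, 4, 5, 6, 7))]
```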

Let us say the same in other words. From the output vector we know which bits must be active. Accordingly, we can select the points of the combinatorial space connected to them. For each such point we can formulate the hypothesis that what it currently sees on the input vector is the pattern responsible for the activity of the bit this point is connected to. From a single example we cannot tell whether this hypothesis is right or wrong, but nothing prevents us from making the assumption.

Training. Memory consolidation

During training, each new example creates a huge number of hypotheses, most of them wrong. All these hypotheses must be tested and the false ones cut off. We can do this by watching whether they are confirmed on subsequent examples. Besides, when creating a new cluster we memorize all the bits the point sees, and even if a pattern is present among them, the set also contains random bits that came from other concepts, do not affect our output, and in our case are noise. Accordingly, it is required not only to confirm or refute that the memorized combination of bits contains the needed pattern, but also to clean that combination of noise, leaving only the "pure" rule.

There are different approaches to this task. I will describe one of them without claiming it is the best. I went through many options; this one won me over with its quality of work and simplicity, but that does not mean it cannot be improved.

It is convenient to think of clusters as autonomous computers. If each cluster can test its hypothesis and make decisions independently of the rest, that is very good for potential parallelization of the computation. After creation, each receptor cluster begins an independent life: it watches the incoming signals, accumulates experience, changes itself and, if necessary, decides on self-destruction.

A cluster is a set of bits about which we have assumed that it contains a pattern associated with the firing of the output bit to which the point holding this cluster is connected. If a pattern is there, it most likely involves only some of the bits, and we do not know which. Therefore we will record all moments when a significant number of the cluster's receptors fire (in the example, at least 4). It is possible that at these moments the pattern, if it exists, manifests itself. Once enough statistics accumulate, we can try to determine whether there is anything regular in such partial firings.

An example of such statistics is shown in the figure below. A plus at the beginning of a line indicates that at the moment of the partial cluster firing the output bit was also active. The cluster bits are taken from the corresponding bits of the input vector.


Chronicle of partial firings of a cluster's receptors

What should interest us in these statistics? It matters which bits fire together most often. Do not confuse this with the most frequent bits. If for each bit we computed its frequency of appearance and took the most frequent bits, we would get an averaging, which is not at all what we need. If several stable patterns meet at one point, averaging produces the mean between them, which is itself "not a pattern". In our example it can be seen that lines 1, 2 and 4 resemble one another, as do lines 3, 4 and 6. We need to pick one of these patterns, preferably the strongest one, and clean it of extraneous bits.

The most frequent combination that manifests itself as joint firing of certain bits is the first principal component of these statistics. To compute the principal component, a Hebbian filter can be used. To do this, start with a weight vector of unit initial weights. Then obtain the cluster activity by multiplying the weight vector by the current cluster state. Then shift the weights toward the current state, the more strongly the higher this activity. So that the weights do not grow uncontrollably, normalize them after each change, for example by the maximum value of the weight vector.

This procedure is repeated over all available examples. As a result, the weight vector comes ever closer to the first principal component. If there are not enough examples for convergence, the process can be repeated several times over the same examples, gradually decreasing the learning rate.
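The steps just described can be sketched directly (a minimal Hebbian filter with max-normalization; names, rates and pass counts are illustrative):

```python
def hebb_filter(examples, n_passes=4, rate=0.1):
    """Hebbian extraction of the dominant joint-firing combination:
    weights drift toward each example in proportion to the cluster
    activity it evokes, then are normalized by their maximum."""
    w = [1.0] * len(examples[0])
    for p in range(n_passes):
        lr = rate / (p + 1)                              # decaying rate
        for x in examples:
            a = sum(wi * xi for wi, xi in zip(w, x))     # activity
            w = [wi + lr * a * xi for wi, xi in zip(w, x)]
            m = max(w)
            w = [wi / m for wi in w]                     # keep bounded
    return w

# Bits 0 and 1 fire together in most partial firings; bit 2 is noise
stats = [[1, 1, 0], [1, 1, 0], [1, 1, 1], [1, 1, 0]]
w = hebb_filter(stats)   # weights of bits 0 and 1 stay at 1.0,
                         # the weight of bit 2 sinks well below them
```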

The main idea is that as the weights approach the principal component, the cluster starts reacting more strongly to "its" examples and more weakly to the rest; thanks to this, training in the right direction goes faster than "bad" examples can spoil it. The result of this algorithm after several iterations is shown below.


The result obtained after several iterations of extracting the first principal component

If we now trim the cluster, that is, keep only the receptors with high weights (for example, above 0.75), we obtain the pattern cleaned of extraneous noise bits. This procedure can be repeated several times as the statistics accumulate. As a result we can tell whether there is any regularity in the cluster, or whether we gathered a random set of bits. If there is no regularity, trimming will leave too short a fragment. In that case such a cluster can be removed as a failed hypothesis.
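The trimming rule is simple enough to state in code (thresholds here are illustrative, matching the 0.75 weight cutoff mentioned above):

```python
def trim(cluster_bits, weights, w_min=0.75, min_len=3):
    """Keep only receptors with high Hebbian weights; a cluster whose
    trimmed remainder is too short is discarded (returns None) as a
    failed hypothesis."""
    kept = tuple(b for b, w in zip(cluster_bits, weights) if w > w_min)
    return kept if len(kept) >= min_len else None

print(trim((1, 6, 10, 21), (0.9, 0.8, 0.2, 0.95)))   # (1, 6, 21)
print(trim((3, 7, 12), (0.9, 0.1, 0.2)))             # None: no pattern
```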

Besides trimming the cluster, we must make sure that the right pattern has been caught. The original string mixes the codes of several letters, each of them a pattern. Any of these codes may be "caught" by a cluster. But we are only interested in the code of the letter that affects the formation of the output bit. For this reason, most hypotheses will be false and must be rejected. This can be done by the criterion that a partial, or even full, firing of the cluster too often fails to coincide with the activity of the required output bit. Such clusters are subject to deletion. The process of this control and removal of unneeded clusters, together with their "trimming", can be called memory consolidation.

New clusters accumulate quite quickly: each new experience spawns several thousand new hypothesis clusters. It is advisable to train in stages, with breaks for sleep. When the number of clusters becomes critical, we switch to an "idle" mode. In this mode, previously memorized experience is replayed, but no new hypotheses are created; only the old ones are tested. As a result of this "sleep", a huge percentage of false hypotheses can be removed, leaving only hypotheses that have passed the test. After "sleep", the combinatorial space is not only cleaned up and ready to receive new information, but also orients itself much more confidently in what it learned "yesterday".

The output of the combinatorial space

As clusters accumulate statistics and undergo consolidation, clusters will appear whose hypothesis is quite likely either true or close to the truth. We take such clusters and track when they are fully activated, that is, when all receptors of the cluster are active.

Next, from this activity we form the output as a hash of the combinatorial space. We take into account that the longer a cluster, the higher the chance that it has caught a pattern. For short clusters there is a chance that the combination of bits arose by accident as a mixture of other concepts. To increase noise immunity we borrow the idea of boosting: for short clusters we require that the output bit activate only when several such firings occur, while for long clusters we assume a single firing is enough. This can be expressed through a potential that arises when a cluster fires; the longer the cluster, the higher the potential. The potentials of points connected to the same output bit are summed, and if the total exceeds a certain threshold, the bit is activated.
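This potential-summing vote might be sketched as follows (the length-to-potential scaling and the threshold are my illustrative choices, not the article's exact formula):

```python
def activate_output(fired, out_of, n_out, threshold=2.0):
    """Boosting-like vote: every fully fired cluster adds a potential
    proportional to its length to its point's output bit; the bit
    turns on when the summed potential crosses the threshold."""
    pot = [0.0] * n_out
    for point, bits in fired:
        pot[out_of[point]] += len(bits) / 6.0   # longer => stronger vote
    return [1 if p >= threshold else 0 for p in pot]

out_of = [0, 0, 1]                  # points 0, 1 -> bit 0; point 2 -> bit 1
fired = [(0, (1, 2, 3, 4, 5, 6)),   # two medium clusters agree on bit 0
         (1, (7, 8, 9, 10, 11, 12)),
         (2, (1, 2, 3))]            # one short cluster is not enough
print(activate_output(fired, out_of, n_out=2))   # [1, 0]
```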

After some training, part of the output begins to coincide with what we want to obtain (figure below).


An example of the combinatorial space at work during training (about 200 steps). Top: the source code; middle: the required code; bottom: the code predicted by the combinatorial space

Gradually, the output of the combinatorial space reproduces the required output code better and better. After several thousand training steps, the output is reproduced with rather high accuracy (figure below).


An example of the trained combinatorial space at work. Top: the source code; middle: the required code; bottom: the code predicted by the combinatorial space

To make it easier to picture how all this works, I recorded a video of the training process. My explanations there may also help in understanding this whole kitchen.

Strengthening the rules

To capture more complex patterns, inhibitory receptors can be used, that is, patterns that block the firing of certain affirmative rules when a specific combination of input bits appears. This amounts to creating, under certain conditions, receptor clusters with inhibitory properties. When such a cluster fires, it does not increase but decreases the potential of the point.

It is easy to devise rules for testing inhibitory hypotheses and to run consolidation of the inhibitory receptor clusters.

Since inhibitory clusters are created at specific points, they do not block the output bit as such, but only block its triggering by the rules detected at that point. One could complicate the connection architecture and introduce inhibitory rules shared by a group of points, or by all points connected to an output bit. It seems a lot of interesting things could be invented here, but for now let us stick with the simple model described.

Random forest

The described mechanism makes it possible to find the patterns that in Data Mining are customarily called "if-then" rules. Accordingly, one can find things in common between our model and all the methods traditionally used for such tasks. The closest to us is probably Random Forest.

That method starts with the idea of "random subspaces". If the source data have too many variables, and these variables are weakly but mutually correlated, it becomes difficult to pick out individual patterns from the full set. In that case one can create subspaces in which both the variables used and the training examples are limited. That is, each subspace contains only part of the input data, and these data are represented not by all the variables but by a random limited subset of them. For some of these subspaces, the chances of detecting a pattern poorly visible in the full data increase considerably.

Then, in each subspace, a decision tree is trained on the limited set of variables and training examples. A decision tree is a tree structure (figure below) whose nodes test the input variables (attributes). The results of the tests in the nodes determine a path from the root to a terminal node, called a leaf of the tree. The leaf holds the result, which may be a value of some quantity or a class number.


An example of a decision tree

Various learning algorithms exist for decision trees, allowing a tree to be built with more or less optimal attributes in its nodes.

At the final stage the idea of boosting is applied. The decision trees form a committee for voting, and the most plausible answer is produced from the collective opinion. The main advantage of boosting is the possibility, by combining a multitude of "bad" algorithms (each only slightly better than random), of obtaining an arbitrarily "good" final result.
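The committee idea is easy to demonstrate in miniature (a toy sketch, not the actual Random Forest voting code):

```python
def majority_vote(weak_answers):
    """Committee decision: the binary answer most weak learners agree on."""
    return 1 if sum(weak_answers) * 2 > len(weak_answers) else 0

# 101 weak learners, each only slightly better than chance: 60 of 101
# happen to answer correctly (the true answer is 1)
answers = [1] * 60 + [0] * 41
print(majority_vote(answers))   # 1: the committee recovers the truth
```

Even though each individual learner is nearly useless, the aggregate vote is reliable, which is exactly what the summed cluster potentials exploit.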

Our algorithm, operating with the combinatorial space and receptor clusters, uses the same fundamental ideas as the random forest method. So it is not surprising that our algorithm works and gives a good result.

The biology of learning

This article essentially describes the software implementation of the mechanisms laid out in the previous parts of the series. We therefore will not repeat everything from the very beginning, noting only the essentials. If you have forgotten how neurons work, you can re-read the earlier parts.

The neuron membrane contains many different receptors. Most of these receptors are in "free float": the membrane provides a medium in which receptors can move freely, easily changing their position on the neuron's surface (Sheng, M., Nakagawa, T., 2002) (Tovar K. R., Westbrook G. L., 2002).


The membrane and receptors

In the classical approach, the reasons for such "freedom" of receptors are usually not emphasized. When a synapse increases its sensitivity, this is accompanied by movement of receptors from the extrasynaptic space into the synaptic cleft (Malenka R.C., Nicoll R.A., 1999). This fact is usually cited as the justification for receptor mobility.

In our model, it can be assumed that the main reason for receptor mobility is the need to form clusters from them "on the fly". The picture is roughly as follows. A variety of receptors sensitive to different neurotransmitters drift freely across the membrane. An information signal arising in a minicolumn causes neurotransmitter release by the axon endings of neurons and by astrocytes. In each synapse where neurotransmitters are released, besides the main neurotransmitter there is a unique additive that identifies that synapse. The neurotransmitters spill out of the synaptic clefts into the surrounding space, so that at each place on the dendrites (the points of the combinatorial space) a specific cocktail of neurotransmitters arises (the ingredients of the cocktail indicate the bits converging at the point). Those free-floating receptors that at this moment find their neurotransmitter in this cocktail (the receptors of the specific input bits) switch into a new, search state. In that state they have a short time (until the next tick arrives) during which they can meet other "active" receptors and form a common cluster (a receptor cluster sensitive to a specific combination of bits).

Metabotropic receptors, and it is these we are talking about, have a rather complicated shape (figure below). They consist of seven transmembrane domains connected by loops. In addition, they have two free ends. Thanks to their different electrostatic charges, the free ends can "snap" to each other through the membrane. It is through such connections that receptors join into clusters.


A single metabotropic receptor

After joining, the shared life of the receptors in the cluster begins. It can be assumed that the positions of the receptors relative to one another can vary widely, and the cluster can take on bizarre shapes. If we suppose that receptors which fire together tend to move closer to one another, for example due to electrostatic forces, an interesting consequence follows. The closer such "jointly firing" receptors get, the stronger their mutual attraction becomes; they begin reinforcing each other's influence. Such behavior reproduces the behavior of the Hebbian filter that extracts the first principal component. The more precisely the filter is tuned to the principal component, the stronger its reaction when that component appears in an example. Thus, if after a number of iterations the jointly firing receptors gather in the conventional "center" of the cluster while the "extra" receptors end up at its edges, then in principle such "extra" receptors may at some point detach, that is, simply break away from the cluster. We then get cluster behavior similar to what was described above in our computational model.

Clusters that have passed consolidation can move to some quiet harbor, for example into the synaptic cleft. The postsynaptic density there offers receptor clusters something to anchor to, losing their now unneeded mobility. Nearby are ion channels that they will be able to control through G-proteins. From then on, these receptors begin to influence the formation of the local postsynaptic potential (the potential of the point).

The local potential is made up of the joint influence of a number of activating and inhibitory receptors. In our approach, the activating ones are responsible for recognizing patterns that urge the output bit to activate, and the inhibitory ones for detecting patterns that block the action of local rules.

Synapses (points) are located on the dendritic tree. If somewhere on this tree there is a place where several activating receptors fire within a small area and are not blocked by inhibitory receptors, a dendritic spike arises, propagates to the neuron body and, reaching the axon hillock, causes a spike of the neuron itself. The dendritic tree unites many synapses, gathering them onto one neuron, which is very similar to the formation of an output bit of the combinatorial space.

The combination of signals from different synapses of one dendritic tree may be not a simple logical sum, but something more complex, implementing some cunning boosting algorithm.

Let me remind you that the base element of the cortex is the cortical minicolumn. A minicolumn contains about a hundred neurons stacked one above another. They are densely wrapped in connections, which are much richer inside the minicolumn than the connections running to neighboring minicolumns. The entire cortex of the brain is a space of such minicolumns. One neuron of a minicolumn can correspond to one output bit; all the neurons of one cortical minicolumn can be the analogue of the output binary vector.

The receptor clusters described in this chapter create a memory responsible for finding patterns. Earlier we described how to create a holographic event memory using receptor clusters. These are two different types of memory performing different functions, although based on common mechanisms.

Sleep

In a healthy person, sleep begins with the first stage of slow-wave sleep, which lasts 5-10 minutes. Then comes the second stage, lasting about 20 minutes. Another 30-45 minutes fall on the third and fourth stages. After that the sleeper returns to the second stage of slow-wave sleep, followed by the first episode of REM sleep, which is short, about 5 minutes. During REM sleep the eyeballs periodically make rapid movements under closed eyelids. If the sleeper is awakened at this time, in 90% of cases you will hear an account of a vivid dream. This whole sequence is called a cycle. The first cycle lasts 90-100 minutes. The cycles then repeat, with the share of slow-wave sleep decreasing and the share of REM sleep gradually growing; its last episode can in some cases reach an hour. On average, five full cycles occur during a full healthy night's sleep.

It can be assumed that during sleep the main work of clearing the receptor clusters accumulated during the day takes place. In the computational model we described the "idle" procedure: old experience is presented to the brain without causing the formation of new clusters. The goal is to test the already existing hypotheses. Such a test consists of two stages. The first is computing the principal component of the pattern and verifying that the number of bits responsible for it suffices for clear identification. The second is verifying the truth of the hypothesis, that is, that the pattern occurred at the right point, associated with the right output bit. It can be assumed that some stages of night sleep are connected with such procedures.

All processes associated with changes in cells are accompanied by the expression of certain proteins and transcription factors. There are proteins and factors which have been shown to participate in the formation of new experience. It turns out that their quantity increases greatly during wakefulness and decreases sharply during sleep.

The concentration of proteins can be seen and estimated by staining brain sections with a dye that reacts selectively to the desired protein. Such observations have shown that the most large-scale changes in memory-associated proteins occur during sleep (Chiara Cirelli, Giulio Tononi, 1998) (Cirelli, 2002) (figures below).


Distribution of the Arc protein in the cortex of the rat after three hours of sleep (S) and after three hours of spontaneous wakefulness (W) (Cirelli, 2002)


Distribution of the transcription factor P-CREB in coronal sections of the cortex of the rat after three hours of sleep (S) and under sleep deprivation for three hours (SD) (Cirelli, 2002)

A well-known observation fits well into such reasoning about the role of sleep: "the morning is wiser than the evening". In the morning we understand much better what was not particularly clear yesterday. Everything becomes clearer and more distinct. It is possible that we owe this precisely to the large-scale clearing of receptor clusters that happened during sleep. False and dubious hypotheses are removed; reliable ones undergo consolidation and begin actively participating in information processes.

In simulation it was seen that the number of false hypotheses exceeds the number of true ones by many thousands of times. Since only time and experience can tell one from the other, the brain has no choice but to save all this informational ore in the hope of finding its grams of radium in it. As new experience arrives, the number of clusters with hypotheses requiring testing grows constantly. The number of clusters formed during a day and containing ore still to be processed may exceed the number of clusters responsible for encoding the proven experience acquired over the whole previous life. The brain's resource for storing raw hypotheses requiring verification must be limited. It seems that over 16 hours of daytime wakefulness, almost all the available space becomes clogged with receptor clusters. When that moment comes, the brain begins forcing us into sleep mode to let it perform consolidation and clear free space. Apparently the full clearing process takes about 8 hours. If we are woken earlier, some clusters remain unprocessed. Hence the phenomenon of accumulating fatigue. If you undersleep for several days, you then have to catch up on sleep. Otherwise the brain begins to remove clusters "in emergency mode", which leads to nothing good, since it deprives us of the chance to learn from the experience gained. The event memory will most likely be preserved, but the patterns will remain undetected.
By the way, my personal advice: do not neglect quality sleep, especially if you are studying. Do not try to save on sleep to gain more time. Sleep is no less important for learning than attending lectures and reviewing material in practical classes. It is no accident that children, in those periods of development when the accumulation and generalization of information is most active, spend most of their time asleep.

The speed of the brain

The assumption about the role of receptor clusters allows a fresh look at the question of the brain's speed. Earlier we said that each minicolumn of the cortex, consisting of a hundred or so neurons, is an independent computing module that works out an interpretation of the incoming information in its own separate context. This allows one cortical zone to consider up to a million possible interpretations simultaneously.

Now it can be assumed that each receptor cluster can work as an autonomous computing element, performing the whole cycle of computation needed to test its hypothesis. There may be hundreds of millions of such clusters in one cortical minicolumn. This means that although the frequencies at which the brain operates are far from those of modern computers, the brain should not be underestimated. Hundreds of millions of receptor clusters working in parallel in each minicolumn of the cortex make it possible to successfully solve complex problems right on the edge of a combinatorial explosion. There will be no miracles. But one can learn to walk along that edge.


    This tutorial describes the main advantages of the INDEX and MATCH functions in Excel, which make them more attractive than VLOOKUP. You will see several formula examples that will help you easily cope with many complex tasks that VLOOKUP cannot handle.

    In several recent articles we made every effort to explain the basics of VLOOKUP to novice users and to show examples of more complex formulas to advanced users. Now we will try, if not to talk you out of using VLOOKUP, then at least to show you alternative ways of implementing vertical lookup in Excel.

    "Why do we need this?" you may ask. Because VLOOKUP is not the only lookup function in Excel, and its numerous limitations may prevent you from getting the desired result in many situations. The INDEX and MATCH functions, on the other hand, are more flexible and have a number of features that make them more attractive than VLOOKUP.

    Basic information about INDEX and MATCH

    Since the goal of this tutorial is to show the capabilities of INDEX and MATCH for vertical lookup in Excel, we will not dwell long on their syntax and usage.

    We will give just the minimum needed to grasp the essence, and then examine in detail formula examples that show the advantages of using INDEX and MATCH instead of VLOOKUP.

    INDEX - syntax and usage

    The INDEX function in Excel returns a value from an array according to the specified row and column numbers. The function has the following syntax:

    INDEX(array, row_num, [column_num])

    Each argument has a very simple explanation:

    • array is the range of cells from which the value is to be extracted.
    • row_num is the number of the row in the array from which to extract the value. If omitted, the column_num argument is required.
    • column_num is the number of the column in the array from which to extract the value. If omitted, the row_num argument is required.

    If both arguments are specified, the INDEX function returns the value from the cell at the intersection of the specified row and column.

    Here is the simplest example of the INDEX function:

    =INDEX(A1:C10,2,3)

    The formula searches the range A1:C10 and returns the value of the cell in the 2nd row and 3rd column, that is, of cell C2.

    Very simple, right? However, in practice you do not always know which row and column you need, and that is where MATCH helps.
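    For readers who think in code, INDEX's behavior can be sketched in Python. This is only an illustrative analogy (Excel itself is not involved); note that Excel counts rows and columns from 1:

```python
def index(array, row_num, col_num):
    """Rough Python analogue of Excel's INDEX: 1-based row/column lookup."""
    return array[row_num - 1][col_num - 1]

# A1:C10 as a list of rows; only the first two rows are shown for brevity.
a1_c10 = [
    ["a1", "b1", "c1"],
    ["a2", "b2", "c2"],
]

print(index(a1_c10, 2, 3))  # row 2, column 3 -> "c2", like =INDEX(A1:C10,2,3)
```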

    MATCH - syntax and usage

    The MATCH function in Excel searches for a specified value in a range of cells and returns the relative position of this value within the range.

    For example, if the range B1:B3 contains the values New York, Paris and London, the following formula returns the number 3, because "London" is the third item in the list:

    =MATCH("London",B1:B3,0)

    The MATCH function has the following syntax:

    MATCH(lookup_value, lookup_array, [match_type])

    • lookup_value is the number or text you are looking for. The argument can be a value, including a logical value, or a cell reference.
    • lookup_array is the range of cells in which the search takes place.
    • match_type tells MATCH whether you want to find an exact or an approximate match:
      • 1 or omitted finds the largest value that is less than or equal to the lookup value. The lookup array must be sorted in ascending order, that is, from smallest to largest.
      • 0 finds the first value exactly equal to the lookup value. The INDEX/MATCH combination almost always needs an exact match, so the third argument of MATCH should be 0.
      • -1 finds the smallest value that is greater than or equal to the lookup value. The lookup array must be sorted in descending order, that is, from largest to smallest.
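    The exact-match case (match_type 0) can likewise be sketched in Python. This is an analogy only, and the 1-based result mirrors Excel's convention:

```python
def match_exact(lookup_value, lookup_array):
    """Rough analogue of Excel's MATCH(..., ..., 0): 1-based position of the first exact match."""
    for position, value in enumerate(lookup_array, start=1):
        if value == lookup_value:
            return position
    raise LookupError("#N/A")  # Excel returns the #N/A error when nothing matches

print(match_exact("London", ["New York", "Paris", "London"]))  # -> 3
```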

    At first glance the usefulness of MATCH may seem doubtful. Who needs to know the position of an element in a range? We want to know the value of that element!

    Let me remind you that the relative position of the lookup value (i.e. the row and/or column number) is exactly what we must supply to the row_num and/or column_num arguments of INDEX. As you remember, INDEX can return the value at the intersection of a given row and column, but it cannot determine which row and column we are interested in.

    How to use INDEX and MATCH in Excel

    Now that you know the basics of these two functions, I believe it is becoming clear how MATCH and INDEX can work together: MATCH determines the relative position of the lookup value in the given range of cells, and INDEX uses this number (or numbers) to return the result from the corresponding cell.

    Still not entirely clear? Think of INDEX and MATCH in this form:

    =INDEX(column to return a value from, MATCH(lookup value, column to look up against, 0))

    I think an example will make it even easier to understand. Suppose you have the following list of national capitals:

    Let's find the population of one of the capitals, for example of Japan, using the following formula:

    =INDEX($D$2:$D$10,MATCH("Japan",$B$2:$B$10,0))

    Now let's look at what each element of this formula does:

    • The MATCH function searches for "Japan" in column B, specifically in cells B2:B10, and returns the number 3, because "Japan" is third in the list.
    • The INDEX function uses 3 for its row_num argument, which says which row to return the value from. So it comes down to a simple formula:

      =INDEX($D$2:$D$10,3)

      The formula says the following: look in cells D2 through D10 and extract the value from the third row, that is, from cell D4, because counting starts from the second row.

    Here is the result you will see in Excel:

    Important! The number of rows and columns in the array used by the INDEX function must match the values of the row_num and column_num arguments supplied by MATCH. Otherwise the result of the formula will be wrong.
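    Chained together, the two steps above look like this in Python (an illustrative sketch; the country list and population figures are invented for the example, not taken from the article's worksheet):

```python
countries = ["China", "India", "Japan"]             # column B (B2:B10, shortened)
populations = [21_500_000, 18_000_000, 13_189_000]  # column D (invented figures)

# MATCH("Japan", B2:B10, 0): the 1-based position of "Japan".
row_num = countries.index("Japan") + 1              # -> 3

# INDEX($D$2:$D$10, row_num): the value from column D in that row.
population = populations[row_num - 1]
print(population)                                   # -> 13189000
```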

    Wait, wait... why can't we just use VLOOKUP? Is there any point in wasting time trying to figure out the labyrinths of MATCH and INDEX?

    =VLOOKUP("Japan",$B$2:$D$10,3)

    In this case, none at all! The purpose of this example is purely demonstrative, so that you understand how MATCH and INDEX work in a pair. The following examples will show you the true power of the INDEX and MATCH combination, which easily copes with many difficult situations where VLOOKUP finds itself at a dead end.

    Why is INDEX/MATCH better than VLOOKUP?

    When deciding which formula to use for vertical lookup, most Excel gurus agree that INDEX/MATCH is much better than VLOOKUP. However, many Excel users still resort to VLOOKUP because it is a simpler function. This happens because very few people fully understand all the advantages of switching from VLOOKUP to the INDEX and MATCH combination, and nobody wants to spend time learning a more complex formula.

    The 4 main advantages of using MATCH/INDEX in Excel:

    1. Lookup to the left. As any competent Excel user knows, VLOOKUP cannot look to its left, which means the lookup value must be in the leftmost column of the lookup range. With MATCH/INDEX the lookup column can be on either the left or the right side of the range. The example "How to look up to the left using MATCH and INDEX" shows this capability in action.

    2. Safe addition or removal of columns. Formulas with VLOOKUP stop working or return wrong values if you delete a column from, or add a column to, the lookup table. Any inserted or removed column changes the result of a VLOOKUP formula, because VLOOKUP's syntax requires you to specify the whole range and a specific column number from which to extract the data.

    For example, if you have the table A1:C10 and need to extract data from column B, you must set the col_index_num argument of VLOOKUP to 2, like this:

    =VLOOKUP("lookup value",A1:C10,2)

    If you later insert a new column between columns A and B, the argument will have to be changed from 2 to 3, otherwise the formula will return the result from the newly inserted column.

    With MATCH/INDEX you can delete or add columns in the lookup range without distorting the result, because the column containing the lookup value is specified directly. This is indeed a great advantage, especially when working with large volumes of data: you can add and delete columns without worrying about correcting every VLOOKUP formula you have used.

    3. No limit on the size of the lookup value. When using VLOOKUP, remember the 255-character limit on the length of the lookup value, otherwise you risk getting a #VALUE! error. So if the table contains long strings, the only working solution is INDEX/MATCH.

    Suppose you use the following VLOOKUP formula to search cells B5 through D10 for the value specified in cell A2:

    =VLOOKUP(A2,B5:D10,3,FALSE)

    The formula will not work if the value in cell A2 is longer than 255 characters. Instead, you need a similar INDEX/MATCH formula:

    =INDEX(D5:D10,MATCH(TRUE,INDEX(B5:B10=A2,0),0))
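    The trick here is that INDEX(B5:B10=A2,0) produces an array of TRUE/FALSE comparison results, and MATCH then finds the first TRUE, so no lookup value ever hits VLOOKUP's 255-character limit. A Python sketch of the idea (an analogy, not Excel):

```python
lookup_value = "x" * 300                     # a lookup value longer than 255 characters
column_b = ["short", "x" * 300, "other"]     # B5:B7 (range shortened for the example)
column_d = [10, 20, 30]                      # D5:D7

# B5:B10 = A2 -> an array of booleans.
flags = [cell == lookup_value for cell in column_b]

# MATCH(TRUE, flags, 0): 1-based position of the first TRUE.
row_num = flags.index(True) + 1              # -> 2

# INDEX(D5:D10, row_num): the value from the same row.
print(column_d[row_num - 1])                 # -> 20
```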

    4. Higher speed. If you work with small tables, the difference in Excel's speed will most likely be unnoticeable, especially in recent versions. But if you work with large tables that contain thousands of rows and hundreds of lookup formulas, Excel will work much faster with MATCH and INDEX instead of VLOOKUP. On the whole, such a replacement speeds Excel up by about 13%.

    VLOOKUP's impact on Excel's performance is especially noticeable when a workbook contains hundreds of complex array formulas such as VLOOKUP + SUM. The point is that checking each value in the array requires a separate call of the VLOOKUP function. So the more values the array contains and the more array formulas your table holds, the slower Excel works.

    A formula with the MATCH and INDEX functions, on the other hand, simply performs the lookup and returns the result, doing the same job noticeably faster.

    INDEX and MATCH - formula examples

    Now that you understand the reasons for learning the MATCH and INDEX functions, let's move on to the most interesting part and see how this theoretical knowledge can be applied in practice.

    How to look up to the left using MATCH and INDEX

    Any VLOOKUP tutorial says that this function cannot look to the left. That is, if the lookup column is not the leftmost one in the lookup range, there is no chance of getting the desired result from VLOOKUP.

    The MATCH and INDEX functions in Excel are far more flexible: they do not care where the column with the value to extract is located. As an example, let's return to the table with national capitals and populations. This time we will write a MATCH/INDEX formula that shows where the capital of Russia (Moscow) ranks by population.

    As you can see in the figure below, the formula copes with this task perfectly:

    =INDEX($A$2:$A$10,MATCH("Russia",$B$2:$B$10,0))

    Now you should have no trouble understanding how this formula works:

    • First we use the MATCH function, which finds the position of "Russia" in the list:

      =MATCH("Russia",$B$2:$B$10,0)

    • Next we set the range for the INDEX function from which to extract the value. In our case it is A2:A10.
    • Then we join the two parts and get the formula:

      =INDEX($A$2:$A$10,MATCH("Russia",$B$2:$B$10,0))

    Tip: the right approach is to always use absolute references with INDEX and MATCH, so that the lookup ranges do not shift when the formula is copied to other cells.

    Calculations with INDEX and MATCH in Excel (AVERAGE, MAX, MIN)

    You can nest other Excel functions inside INDEX and MATCH, for example to find the minimum, the maximum, or the value closest to the average. Here are a few formula variants, applied to the table shown earlier:

    1. MAX. The formula finds the maximum in column D and returns the value from column C of the same row:

    =INDEX($C$2:$C$10,MATCH(MAX($D$2:$D$10),$D$2:$D$10,0))

    Result: Beijing

    2. MIN. The formula finds the minimum in column D and returns the value from column C of the same row:

    =INDEX($C$2:$C$10,MATCH(MIN($D$2:$D$10),$D$2:$D$10,0))

    Result: Lima

    3. AVERAGE. The formula computes the average of the range D2:D10, then finds the value nearest to it and returns the value from column C of the same row:

    =INDEX($C$2:$C$10,MATCH(AVERAGE($D$2:$D$10),$D$2:$D$10,1))

    Result: Moscow

    What to remember when using AVERAGE together with INDEX and MATCH

    When using the AVERAGE function in combination with INDEX and MATCH, you will most often need to specify 1 or -1 as the third argument of MATCH, unless you are sure that the lookup range contains a value exactly equal to the average. If you are sure such a value exists, specify 0 to search for an exact match.

    • If you specify 1, the values in the lookup column must be sorted in ascending order, and the formula will return the largest value that is less than or equal to the average.
    • If you specify -1, the values in the lookup column must be sorted in descending order, and the formula will return the smallest value that is greater than or equal to the average.

    In our example the values in column D are sorted in ascending order, so we use match type 1. The INDEX/MATCH formula returns "Moscow", because the population of Moscow is the closest value below the average (12,269,006).
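    The approximate match of type 1 (the largest value less than or equal to the lookup value in an ascending list) corresponds to a right-bisection step. A Python sketch with invented population figures (not the article's table):

```python
import bisect

def match_ascending(lookup_value, ascending_array):
    """Rough analogue of MATCH(..., ..., 1): 1-based position of the
    largest value that is less than or equal to the lookup value."""
    pos = bisect.bisect_right(ascending_array, lookup_value)
    if pos == 0:
        raise LookupError("#N/A")  # every value exceeds the lookup value
    return pos

populations = [2_000_000, 5_000_000, 12_000_000, 21_000_000]  # column D, ascending
average = sum(populations) / len(populations)                 # 10,000,000
print(match_ascending(average, populations))  # -> 2: 5,000,000 is the largest value <= the average
```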

    How to use INDEX and MATCH to look up by a known row and column

    This formula is the equivalent of a two-dimensional VLOOKUP and allows you to find the value at the intersection of a specific row and column.

    In this example the INDEX/MATCH formula will be very similar to the formulas we have already discussed in this lesson, with only one difference. Guess which?

    As you remember, the syntax of the INDEX function allows three arguments:

    INDEX(array, row_num, [column_num])

    And I congratulate those of you who guessed!

    Let's start by writing the formula template. Take the already familiar INDEX/MATCH formula and add one more MATCH function to it, which will return the column number:

    =INDEX(your table, MATCH(value for vertical lookup, column to look up against, 0), MATCH(value for horizontal lookup, row to look up against, 0))

    Note that for a two-dimensional lookup you need to specify the whole table in the array argument of the INDEX function.

    And now let's test this template in practice. Below you see a list of the world's most populous countries. Suppose our task is to find the US population in 2015.

    Well, let's write the formula. When I need to create a complex formula in Excel with nested functions, I first write out each nested part separately.

    So, let's start with the two MATCH functions that will return the row and column numbers for the INDEX function:

    • MATCH for the column: we search column B, or rather the range B1:B11, for the value specified in cell H2 (USA). The function looks like this:

      =MATCH($H$2,$B$1:$B$11,0)

      The result will be 4, because "USA" is the 4th element of the list in column B (counting the header).

    • MATCH for the row: we search for the value of cell H3 (2015) in row 1, that is, in cells A1:E1:

      =MATCH($H$3,$A$1:$E$1,0)

      The result of this formula will be 5, because "2015" is in the 5th column.

    Now plug these formulas into the INDEX function, and voilà:

    =INDEX($A$1:$E$11,MATCH($H$2,$B$1:$B$11,0),MATCH($H$3,$A$1:$E$1,0))

    If you replace the MATCH functions with the values they return, the formula becomes simple and clear:

    =INDEX($A$1:$E$11,4,5)

    This formula returns the value at the intersection of the 4th row and 5th column of the range A1:E11, that is, the value of cell E4. Simple? Yes!
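    The two-dimensional lookup can be sketched in Python with a list of rows (an analogy; the years and population figures are invented for illustration):

```python
# A1:E11, shortened: a header row plus data rows (populations in millions, invented).
table = [
    ["Country", "2013", "2014", "2015"],
    ["China",   1357,   1364,   1371],
    ["India",   1252,   1296,   1311],
    ["USA",      316,    318,    321],
]

def match_exact(value, array):
    return array.index(value) + 1            # 1-based, like Excel's MATCH(..., 0)

row = match_exact("USA", [r[0] for r in table])  # search the first column
col = match_exact("2015", table[0])              # search the header row
print(table[row - 1][col - 1])                   # INDEX(table, row, col) -> 321
```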

    Looking up by several criteria with MATCH and INDEX

    In the VLOOKUP tutorial we showed an example of a VLOOKUP formula that searches by several criteria. However, a substantial limitation of that solution was the need to add a helper column. The good news: an INDEX/MATCH formula can search by values in two columns without any helper column!

    Suppose we have a list of orders and want to find the amount by two criteria: the buyer's name (Customer) and the product (Product). The task is complicated by the fact that one buyer can buy several different products at once, and the buyers' names in the table on the Lookup table sheet are in random order.

    Here is such a formula INDEX/Search board Solves the task:

    (\u003d Index ("Lookup Table"! $ A $ 2: $ C $ 13, Match (1, (A2 \u003d "Lookup Table"! $ A $ 2: $ A $ 13) *
    (B2 \u003d "Lookup Table"! $ B $ 2: $ B $ 13), 0), 3))
    (\u003d Index ("Lookup Table"! $ A $ 2: $ C $ 13; Search Company (1; (A2 \u003d "Lookup Table"! $ A $ 2: $ A $ 13) *
    (B2 \u003d "Lookup Table"! $ B $ 2: $ B $ 13); 0); 3))

    This formula is more complicated by the others that we have discussed earlier, but armed with knowledge of functions. INDEX and Search board You will overcome it. The most difficult part is a function. Search boardI think it needs to be explained first.

    Match (1, (A2 \u003d "Lookup Table"! $ A $ 2: $ A $ 13), 0) * (B2 \u003d "Lookup Table"! $ B $ 2: $ B $ 13)
    Search Company (1; (A2 \u003d "Lookup Table"! $ A $ 2: $ A $ 13); 0) * (B2 \u003d "Lookup Table"! $ B $ 2: $ B $ 13)

    In the formula shown above, the lookup value is 1, and the lookup array is the result of a multiplication. So what should we multiply, and why? Let's take it step by step:

    • Take the first value in column A (Customer) on the Main table sheet and compare it with every buyer name in the table on the Lookup table sheet (A2:A13).
    • If a match is found, the comparison returns 1 (TRUE); if not, 0 (FALSE).
    • Next, we do the same for the values of column B (Product).
    • Then we multiply the results (the 1s and 0s). Only where matches are found in both columns (i.e. both criteria are true) will you get 1. If both criteria are false, or only one of them is met, you get 0.

    Now you see why we specified 1 as the lookup value? Exactly: so that MATCH returns the position only where both criteria are met.

    Note: in this case we have to use the third, optional argument of the INDEX function. It is necessary because the first argument specifies the whole table, so we must tell the function which column to extract the value from. In our case it is column C (Sum), so we enter 3.

    And finally, since we need to check every cell of the array, this formula must be an array formula. You can tell by the curly braces in which it is enclosed. So when you finish entering the formula, do not forget to press Ctrl+Shift+Enter.

    If everything is done correctly, you will get the result as in the figure below:
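    The boolean-multiplication trick translates directly into Python (an illustrative sketch with invented order data):

```python
customers = ["Ann", "Bob", "Ann", "Bob"]   # 'Lookup table' column A (invented)
products  = ["Tea", "Tea", "Milk", "Milk"] # column B
sums      = [10, 20, 30, 40]               # column C

wanted_customer, wanted_product = "Ann", "Milk"

# (A2=range)*(B2=range): 1 only where BOTH criteria hold.
flags = [int(c == wanted_customer) * int(p == wanted_product)
         for c, p in zip(customers, products)]

row_num = flags.index(1) + 1               # MATCH(1, ..., 0), 1-based
print(sums[row_num - 1])                   # INDEX(..., row_num, 3) -> 30
```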

    INDEX and MATCH in combination with IFERROR in Excel

    As you have probably noticed (more than once), if you enter a wrong value, for example one that is not present in the lookup array, an INDEX/MATCH formula reports the #N/A or #VALUE! error. If you want to replace such a message with something more understandable, you can wrap the INDEX and MATCH formula in the IFERROR function.

    The syntax of IFERROR is very simple:

    IFERROR(value, value_if_error)

    Here the value argument is the value checked for an error (in our case, the result of the INDEX/MATCH formula), and the value_if_error argument is the value to return if the formula results in an error.

    For example, you can wrap the formula in IFERROR like this:

    =IFERROR(INDEX($A$1:$E$11,MATCH($G$2,$B$1:$B$11,0),MATCH($G$3,$A$1:$E$1,0)),
    "No match found. Try again!")

    Now, if someone enters a wrong value, the formula will give this result:

    If you prefer to leave the cell empty in case of an error, you can use empty quotes ("") as the value of IFERROR's second argument, like this:

    =IFERROR(INDEX(array,MATCH(lookup_value,lookup_array,0)),"")
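    In Python terms, IFERROR is simply a try/except wrapped around the lookup (an analogy, not Excel):

```python
def iferror(value_fn, value_if_error):
    """Rough analogue of Excel's IFERROR: run the lookup, fall back on failure."""
    try:
        return value_fn()
    except (LookupError, ValueError):
        return value_if_error

capitals = ["Tokyo", "Moscow", "Lima"]
# A successful lookup passes through; a failed one yields the fallback value.
print(iferror(lambda: capitals.index("Moscow") + 1, "No match found. Try again!"))
print(iferror(lambda: capitals.index("Atlantis") + 1, "No match found. Try again!"))
```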

    I hope that at least one of the formulas described in this tutorial proved useful to you. If you have faced other lookup tasks for which you could not find a suitable solution here, feel free to describe your problem in the comments, and we will all try to solve it together.

    User's manual.

    NatClass

    Name of the operation

    Building a classification and analysis of genomic sequences.

    Conditions under which the operation is possible

    Procedure for performing the operation
    Preparatory actions

    Basic steps in the required sequence.

    1. Loading the input data.

    The input data for the program are two sequence samples in FASTA format: positive sequences (the sample of sequences under study) and negative sequences (a sample of random sequences, or contrasting genomic sequences).

    Use the menu command Source -> Add Positive Sequences (Fig. 1) or the toolbar button to load the data. A wizard appears on the screen and asks you to specify the name of the file with the positive/negative sequence sample.

    Project files saved earlier with this program can also serve as input data. Such project files store all the data that had been loaded or computed at the moment of saving.

    2. Setting the program parameters. Starting pattern generation.

    The first tab, "Rules", contains the controls for pattern discovery (Fig. 2). You need to specify the search settings and click the "Start" button.

    The search parameters are:

    Confidence Interval : the significance threshold for the value of Fisher's criterion;

    Min. Level of CP : the minimum conditional probability;

    Size of Finish Buffer : the number of patterns to detect;

    Size of Sub Buffers : the size of the auxiliary pattern buffers.

    The operation mode is also selected here: fixed positions (Fixed Positions) or sliding window mode (Shift Positions). The latter is used for recognition along a long genomic sequence and requires specifying the window size (Width of Scanning Frame).
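    The sliding window mode can be pictured as scanning every frame of the given width along the sequence. A Python sketch of the windowing idea only (not of NatClass's actual pattern-search algorithm):

```python
def scanning_frames(sequence, width):
    """Enumerate every window of the given width along a genomic sequence."""
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]

print(scanning_frames("gtccgt", 4))  # -> ['gtcc', 'tccg', 'ccgt']
```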

    The program allows you to pause the pattern generation process (by pressing the "Pause" button) or to stop it (by pressing the "Stop" button).


    Fig. 2. Controls for pattern discovery.

    When the pattern search finishes, the program issues the message "The process of finding patterns has been successfully completed." As a result, the discovered patterns are presented to the user in the order of their detection (Fig. 3).

    Fig. 3. Detected patterns.

    3. Building ideal class objects.

    Besides the detected patterns, the output of NatClass also includes the ideal representatives of the classes. They are built on the "Objects" tab of the program (Fig. 4). Ideal objects can be built either from the initial objects of the positive training sample (the "Original objects" option) or from the patterns (the "Regularities" option). You can also choose one of three variants of the construction algorithm (Idealization type), setting the priorities between removing and adding features. After building the ideal objects, the program assigns the training objects to one of the detected classes, or recognizes them as belonging to the new class "New". As with pattern generation, the idealization process can be paused (the "Pause" button) or stopped (the "Stop" button).

    Fig. 4. Controls for building ideal objects.

    Upon completion of the idealization process, the program issues the message "The idealization process is successfully completed."

    4. Applying the discovered patterns. Calculating recognition errors.

    The "Classes" tab contains the functions for processing the resulting output data (Fig. 5).

    The following functions for analyzing the results are available here: classification of control samples ("Classification"), recognition against the available classes ("Recognition control data"), counting recognition errors ("Recognition errors count"), and the bootstrap procedure.

    To load control sequences, use the menu command Control -> Add Control Positive.

    When calculating recognition errors, the program displays the optimal result (builds a histogram), but the user can adjust it manually by setting either the recognition threshold ("Recogn. level") or the value of the type I error ("1st level error").


    Fig. 5. The "Classes" tab.

    Right-clicking on an ideal object makes it possible to delete the object ("Delete", Fig. 6), show the class objects ("Show objects", Fig. 7), the patterns ("Show regularities"), the prediction matrix ("Prediction matrix"), or the recognition matrix ("Recognition matrix", Fig. 8).

    Fig. 6. Operations available for an ideal object.

    Fig. 7. The "Show objects" operation displaying the class objects.


    Fig. 8. The "Recognition matrix" operation displaying the class object recognition matrix.

    The recognition and error results are saved by the program as an HTML table.

    Control example: building a classification and analysis of transcription factor binding sites (TFBS) for EGR1.

    1. Loading the input data.

    A sample of EGR1 TFBS sequences is supplied to the program as the positive sample:

    >S1916
    gtccgtgggt
    >S4809
    ttggggcga
    >S6067
    gagggggcgg

    (file EGR1_pos.seq)

    As the negative sample, random sequences generated with the same nucleotide frequencies as the positive ones are used:

    >S1916;_n1_h1_w1
    gggttggc
    >S1916;_n1_h2_w1
    gggcgtttcg
    >S1916;_n1_h3_w1
    ggtgggctotal

    (file neg_2200.seq)

    To load the input data, see paragraph 1 of the user's manual.

    2. Setting the program parameters. Starting pattern generation.

    The search parameters are set as follows:

    Confidence Interval: 0.05;

    Min. Level of CP: 0.8;

    Size of Finish Buffer: 2000;

    Size of Sub Buffers: 100.

    The program discovered 2000 patterns (Fig. 9).


    Fig. 9. Patterns that satisfy the search parameters.

    3. Building ideal class objects.

    As a result of the program's work, one class was discovered. The ideal class object and the prediction matrix are shown in Fig. 10.


    Fig. 10. Ideal class object and prediction matrix for the EGR1 TFBS.

    4. Applying the discovered patterns. Calculating recognition errors.

    As the negative control, sequences generated with the same nucleotide frequencies as the positive ones were taken (file control_neg_1000.seq). The program performed classification, calculating the weight of each object, and recognition (Fig. 11).


    Fig. 11. Classification and recognition of control objects for the EGR1 TFBS.

    The regular contest in functional programming, held under the auspices of the fund for the promotion of functional programming (FP), has come to an end. Traditionally, I would like to sum up the contest and discuss the solution of the competitive task in the Haskell programming language. So I invite everyone interested to have a look at this little note.

    The task proposed for the contest was to search for regularities in a series of occurrences of a seemingly "random" event. But since nothing in this world is purely random (except, apparently, the results of measuring quantum states), in everything else certain patterns can be found. So here: a list of dates on which a certain event occurred was given, and contestants were asked to answer two questions:

    1. What is the minimal period such that on at least one day of that period the frequency-based probability of the event is 50% or higher?
    2. Give a forecast of the event's occurrences from the contest date until the end of the year.

    Only two contestants managed to provide solutions. However, both of them were wrong, since the correct answer to the first question is the number 24. The second task will be checked at the end of the year, when the statistics of the event's occurrences become known. So the prize for the first question remained unclaimed, and the prize for the second question will go to the contestant whose forecast scores more points next year.

    Well, it remains to consider the solution of these tasks in the Haskell programming language.

    Searching for the period

    First of all, the list of the event's occurrences was defined:

    dates :: [String]
    dates = ["27.09.2013", "06.10.2013", "23.10.2013", "06.11.2013", "26.11.2013", "27.11.2013", "21.12.2013", "30.12.2013", "06.01.2014", "16.01.2014", "17.01.2014", "21.01.2014", "21.01.2014", "26.01.2014", "06.02.2014", "11.02.2014", "21.02.2014", "02.03.2014", "07.03.2014", "30.03.2014", "08.04.2014", "18.04.2014", "24.04.2014", "23.04.2014", "05.05.2014", "15.05.2014", "15.05.2014", "18.05.2014", "19.05.2014", "20.05.2014", "25.05.2014", "26.05.2014", "28.05.2014"]

    Even a quick look shows that there are no obvious patterns in this list. Perhaps, after meditating over it for several days in a row, one could spot something; but in order not to waste time on meditation, it was decided to write a program that searches for patterns over periods of different lengths. Let's start writing this program, as is usual when developing in functional programming languages, from top to bottom. Here is the main function of the program:

    -- Imports used by the code in this note:
    import Control.Arrow (first, second, (***))
    import Data.Function (on)
    import Data.List (delete, groupBy, maximumBy, sortBy)
    import Data.Ord (comparing)
    import Data.Time.Calendar.MonthDay
           (dayOfYearToMonthAndDay, monthAndDayToDayOfYear)

    main :: IO ()
    main = do
        putStrLn ("In the sequence of dates, patterns are found " ++
                  "with a minimal period of " ++
                  show (length findMinimalPeriod) ++ " days.")
        revealSequences findMinimalPeriod

    Two functions are called here, which we will examine below. The first, findMinimalPeriod, returns the period of minimal length for a given probability threshold. The attentive reader will notice that no arguments are passed to it, so the probability threshold must be defined somewhere as a constant. In general this is a vicious practice from the point of view of the functional programming paradigm, but for research purposes it sometimes helps to get results quickly. So here are the definitions:

    significance :: Int
    significance = 5

    lowProbability :: Float
    lowProbability = 0.5

    findMinimalPeriod :: [(Int, Float)]
    findMinimalPeriod =
        head $
        filter (\l -> maximum (map snd l) >= lowProbability) $
        map process [1 .. interval `div` significance]

    The significance constant determines the minimal "height" of the cylinder onto which the time axis is wound (to find periods, one can mark the event dates on a long tape, which is then coiled around a cylinder of a given circumference; the circumference determines the period, and patterns will look like vertical lines on the cylinder). The lowProbability constant sets the minimal probability threshold for the event. The findMinimalPeriod function takes the head of the list obtained by filtering, for the presence of a probability no less than the given threshold, the list produced by applying the process function to the numbers from 1 to a certain upper bound.
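    The cylinder metaphor translates into code directly: winding the tape around a cylinder of circumference p is nothing more than taking day numbers modulo p. A tiny illustrative sketch (foldOntoCylinder is my own name, not part of the contest solution):

```haskell
import Data.List (sort)

-- Winding the time axis onto a cylinder of circumference p is just
-- taking remainders modulo p: two days fall on the same vertical
-- line of the cylinder iff their numbers are congruent modulo p.
foldOntoCylinder :: Int -> [Int] -> [Int]
foldOntoCylinder p = sort . map (`mod` p)
```

    For instance, events on days 2, 9, 16 and 23 all collapse onto day 2 of a 7-day period, forming one vertical line.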

    The upper bound is determined using the interval function, defined as follows:

    interval :: Int
    interval = finish - start
      where start  = stringToDayOfYear $ head dates
            finish = 365 + stringToDayOfYear (last dates)

    As can be seen, here we compute the length of the interval on which the event dates lie: the first date is subtracted from the last (one could also add 1). This function is not very good, because it hard-codes the number 365, which makes it non-universal. Never mind. In addition, the previous function (findMinimalPeriod) is, generally speaking, badly written and can throw a runtime error, since the list passed to the head function is not checked for emptiness.
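    As a side note, this partiality could be avoided. A possible total variant of the same selection logic (the names safeHead and findFirstAbove are my own; this is only a sketch, not the contest code):

```haskell
import Data.Maybe (listToMaybe)

-- listToMaybe is a total replacement for head: it yields Nothing
-- on an empty list instead of throwing a runtime error.
safeHead :: [a] -> Maybe a
safeHead = listToMaybe

-- The same selection as in findMinimalPeriod, made total: return the
-- first histogram whose best day reaches the probability threshold,
-- or Nothing if no candidate period qualifies.
findFirstAbove :: Float -> [[(Int, Float)]] -> Maybe [(Int, Float)]
findFirstAbove threshold =
    safeHead . filter (\l -> not (null l) && maximum (map snd l) >= threshold)
```

    The caller then has to handle the Nothing case explicitly instead of discovering the problem at run time.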

    Now let's turn to the definition of the process function:

    process :: Fractional a => Int -> [(Int, a)]
    process p =
        map (\l -> ( fst $ head l
                   , ((/) `on` fromIntegral) (sum $ map snd l) (length l) )) $
        groupBy ((==) `on` fst) $
        sortBy (comparing fst) $
        map (first (`mod` p) . \i -> if i `elem` ds3 then (i, 1) else (i, 0))
            [0 .. last ds3]
      where ds1 = map stringToDayOfYear dates
            ds2 = uncurry (++) $ second (map (+ 365)) $ span (>= head ds1) ds1
            ds3 = map (subtract (head ds2)) ds2

    The function takes the length of the period as input and returns a list of pairs (a histogram), in which the first element is the number of a day within the period and the second is the frequency probability of the event occurring on that day. The local definitions ds1, ds2 and ds3 build a list of consecutive numbers of the days on which the event occurred, counting from the first day in the dates list. This list is then subjected to the following procedure. For every day number from 0 up to the number of the last event day, the remainder of division by the period length is computed. Each remainder is given the flag 0 if no event occurred on the corresponding day, and the flag 1 if one did. Then the list of remainders with flags is grouped by remainder, after which each group is collapsed into a pair of the form (day number within the period, probability of the event's occurrence). That's all.
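    This procedure is easier to digest on toy data. The following self-contained sketch reimplements the same remainder/flag/group/average pipeline, taking the list of event days explicitly (histogramFor is an illustrative name of mine, not the contest code):

```haskell
import Data.Function (on)
import Data.List (groupBy, sortBy)
import Data.Ord (comparing)

-- For each day from 0 to the last event day: compute its remainder
-- modulo the period p and a 0/1 flag for "an event happened that day";
-- then group equal remainders and average the flags within each group.
histogramFor :: Int -> [Int] -> [(Int, Double)]
histogramFor p eventDays =
    map collapse $
    groupBy ((==) `on` fst) $
    sortBy (comparing fst) $
    map (\i -> (i `mod` p, if i `elem` eventDays then 1 else 0))
        [0 .. maximum eventDays]
  where
    collapse l = ( fst (head l)
                 , fromIntegral (sum (map snd l) :: Int)
                   / fromIntegral (length l) )
```

    For events on days 0, 3 and 6, a period of 3 gives certainty on day 0 of the period and zero probability elsewhere: histogramFor 3 [0, 3, 6] yields [(0, 1.0), (1, 0.0), (2, 0.0)].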

    Here we also have to consider two more auxiliary functions:

    double :: a -> (a, a)
    double x = (x, x)

    stringToDayOfYear :: String -> Int
    stringToDayOfYear =
          uncurry (monthAndDayToDayOfYear False)
        . (read . take 2 . drop 3 *** read . take 2)
        . double

    There is nothing to say about the first one (it is only strange that its definition is not in the standard Prelude module; although that is understandable, since it is so trivial). The second function translates a date from its string form "dd.mm.yyyy" into the numeric form accepted in the Data.Time.Calendar.MonthDay module, by means of which the dates are processed.
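    A quick sanity check of the conversion (the function is reproduced here so that the snippet is self-contained; it relies on the time package, which ships with GHC, and False selects non-leap-year arithmetic):

```haskell
import Control.Arrow ((***))
import Data.Time.Calendar.MonthDay (monthAndDayToDayOfYear)

double :: a -> (a, a)
double x = (x, x)

-- Cut the month out of positions 3-4 and the day out of positions 0-1
-- of a "dd.mm.yyyy" string, then convert to a day-of-year number.
stringToDayOfYear :: String -> Int
stringToDayOfYear =
      uncurry (monthAndDayToDayOfYear False)
    . (read . take 2 . drop 3 *** read . take 2)
    . double
```

    In GHCi, stringToDayOfYear "27.09.2013" evaluates to 270, the 270th day of a non-leap year.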

    Finally, we define the revealSequences function:

    revealSequences :: [(Int, Float)] -> IO ()
    revealSequences ps = do
        let l = length ps
            (d1, p1) = maximumBy (comparing snd) ps
            (d2, p2) = maximumBy (comparing snd) $ delete (d1, p1) ps
        putStrLn ("The maximum manifestation of the event (probability: " ++
                  show p1 ++ ") occurs on day " ++ show (d1 + 1) ++
                  " of the " ++ show l ++ "-day period.")
        putStrLn ("The second maximum (probability: " ++ show p2 ++
                  ") occurs on day " ++ show (d2 + 1) ++ ".")

    As you can see, it simply finds the two probability peaks of the event in the given list (histogram) and prints them to the screen. This function was implemented solely for research convenience and has no direct relation to solving the contest task.

    Forecasting

    Now let's turn to the second question of the contest: forecasting the dates of the event until the end of the year. It should be noted that the forecasting task is, in general, a thankless one. It is one thing when we have a clear pattern (even if only a probabilistic one) and can more or less apply a deterministic formula. It is quite another when there is only some (very small) number of values on the basis of which a possible future must be constructed. There are myriads of bifurcation points, so that even tolerances and interval arithmetic will not help. Nevertheless, one can invent some method and try to assess its applicability in hindsight, when a certain period of time has passed and we can compare the forecast values with what actually happened.

    That is exactly what we will do here. The following method can be proposed. Consider all possible periods, starting with the minimal one found at the previous stage (shorter ones, to be honest, do not inspire confidence in terms of statistical plausibility) and ending with the length of the known observation interval divided by the significance factor (the cylinder "height", as we called it earlier). For each of these periods, obtain the probability of the event's occurrence on every day of the period, as we have learned to do using the process function. Then simply compute, for each day, the average over the probabilities found for all the periods under consideration.
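    The averaging step itself is just a pointwise fold over the per-period histograms. A minimal sketch on toy data (averageHistograms is my own illustrative name):

```haskell
import Data.Bifunctor (second)

-- Sum the probabilities day by day across all histograms (keeping
-- the day numbers of the first one), then divide by their count.
-- foldr1 assumes a non-empty input list of equal-length histograms.
averageHistograms :: [[(Int, Double)]] -> [(Int, Double)]
averageHistograms hs =
    map (second (/ fromIntegral (length hs))) $
    foldr1 (zipWith (\(d, q1) (_, q2) -> (d, q1 + q2))) hs
```

    For example, averaging [(0, 0.25), (1, 0.75)] with [(0, 0.75), (1, 0.25)] gives [(0, 0.5), (1, 0.5)].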

    It is this method that the following function implements:

    forecast :: FilePath -> String -> String -> IO ()
    forecast fp sd fd = do
        let (b, e) = (length findMinimalPeriod, interval `div` significance)
        writeFile fp $ unlines $
            map (\(n, q) ->
                   let (m, d) = dayOfYearToMonthAndDay False (n - 365)
                   in  prettyShowInt d ++ "." ++ prettyShowInt m ++
                       ".2014: " ++ prettyShowFloat q) $
            map (second (/ toEnum (e - b + 1))) $
            foldr1 (zipWith (\(d, q1) (_, q2) -> (d, q1 + q2))) $
            map getProbabilities [b .. e]
      where
        getProbabilities p =
            let ds = stringToDayOfYear $ head dates
                fs = 365 + stringToDayOfYear (last dates)
                d1 = 365 + stringToDayOfYear sd
                d2 = 365 + stringToDayOfYear fd
            in  take (d2 - d1 + 1) $
                drop (interval + (d1 - fs)) $
                zipWith (\x (_, q) -> (x, q)) [ds ..] $
                cycle $ process p

    leadingZero :: String -> String
    leadingZero [c] = '0' : [c]
    leadingZero cs  = cs

    prettyShowInt :: Int -> String
    prettyShowInt i = leadingZero $ show i

    prettyShowFloat :: Float -> String
    prettyShowFloat f =
        let (d, r) = span (/= '.') $ show (f * 100)
        in  leadingZero d ++ take 5 (r ++ cycle "0")

    Its definition looks somewhat monstrous, but the core of the computation is the local definition getProbabilities (its name should make clear which step of the method it corresponds to). The rest is just plumbing for writing the obtained values to a file in the format specified by the contest rules.

    That's all, in general. Now it only remains to wait for the end of the year and compare the forecast with the facts.

    Service