What is the MP3 format? Does MP3 have competitors?

In 1987, the Fraunhofer Institute began work on a perception-based audio coding algorithm as part of the EUREKA project EU147, Digital Audio Broadcasting (DAB). In collaboration with the University of Erlangen (Prof. Dieter Seitzer), Fraunhofer IIS conceived and developed a very powerful algorithm that was standardized as ISO-MPEG Audio Layer-3 (IS 11172-3 and IS 13818-3).

Various information on the developments of the Fraunhofer Institute can be found at

General information

The format is sometimes confused with MPEG-3, but MP3 is intended only for compressing audio, and its full name is MPEG Audio Layer-3. MPEG-3 was intended for high-definition television (HDTV) systems with bitrates of 20-40 Mbps, but was later folded into the MPEG-2 standard and is no longer mentioned separately.

This is not to say that this audio compression format made its way into the mainstream easily. In the early days of its promotion, Fraunhofer, the institute that created MP3, nearly killed its own brainchild through overly hasty greed (a mistake, incidentally, that many developers of new audio formats make). Seeing that nobody wanted to pay for a pig in a poke, it took the only right step and made the format open and free. To say that MP3 then became popular is an understatement: its popularity exploded! The format, which spread to the masses so quickly, offered a compression ratio that was unheard of at the time, with reasonably high sound quality, and easily won over anyone who loved listening to music. A whole industry sprang up almost overnight: websites devoted entirely to MP3, makers of software and hardware MP3 players, and illegal distributors of music, better known as pirates, who were the first to release discs along the lines of "every song by this band on one disc". Demand for CD-R writers and blank discs grew enormously. Today MP3 is a universally recognized audio format. It is used in games, and MP3 codecs are built into operating systems. For several years now, MP3 has been at the height of its popularity...

But the euphoria caused by its arrival has gradually faded, and it has become clear that MP3 is far from flawless. Despite the high compression ratio, files are still too large for MP3 to be a truly network-friendly format, and 128 kbps, the bitrate so beloved by pirates and the public, gives such low quality that not only experienced musicians and audiophiles but also quite ordinary listeners hear the flaws during playback. This created a need for new, more advanced audio compression algorithms, and such algorithms appeared very quickly. Some were developed almost simultaneously with MP3 (VQF, for example) but for some reason came out later and lost the lead; others were developed and positioned by their authors as replacements for MP3 (the MPEG-2 AAC family). In any case, in terms of capabilities and quality, these algorithms are in many ways superior to MP3.

MP3Pro

An enhanced version of MP3 from Coding Technologies that uses Spectral Band Replication (SBR) to improve efficiency at bitrates below 96 kbps (stereo). Because it is not part of the MPEG standard, mp3PRO is supported only by some software and hardware products, for example the Thomson demo player/encoder and input plugin for Winamp, MusicMatch JukeBox, Nero, dbPowerAMP, JetAudio, Steinberg myMP3PRO, Impload, Spacial Audio, Audion 3, and the RCA Lyra portable and DVD players. This is unlikely to change in the future; however, some listening tests have shown good performance at low bitrates compared with other codecs.

Implementation

LAME (LAME Ain't an MP3 Encoder)

Development of LAME began around mid-1998, when Mike Cheng started improving and fixing the source code of the 8hz-MP3 encoder. After the community raised doubts about its quality, Mike decided to start over from a rough draft based on the ISO dist10 sources. This branch became LAME 2.0, and it was not until LAME 3.81 that the last of the dist10 code was removed and LAME finally stopped being just a revision of it. The project quickly became a team effort. Mike Cheng eventually stepped down as leader and started working on an MP2 encoder, tooLame. Mark Taylor took over, and version 3.0 came out with a new psychoacoustic model, gpsycho, that he had developed. Today LAME is considered the best MP3 encoder at high and variable bitrates, and a big thank-you is due to the talented developers who have devoted themselves to this work, such as Takehiro Tominaga, Naoki Shibata, Darin Morrison, Gabriel Bouvigne, Robert Hegemann and others. Development of LAME continues to this day, which makes it by far the most promising MP3 encoder. Its remarkable quality and wide availability have earned it broad recognition. It is used in Winamp to encode Audio CDs to MP3 and is also available in dbPowerAmp.

Gogo-No-Coda

Gogo is a fork of the LAME MP3 encoder in which the most CPU-intensive routines were rewritten in assembly language. This made Gogo one of the fastest MP3 encoders with acceptable quality. It was developed by a team of Japanese programmers.

FhG Fastencc

Fastencc is an MP3 encoder built on coding libraries stolen from the Fraunhofer Institute. A developer is rumored to have violated a confidentiality agreement and made these libraries available to some programmers. One of those programmers wrote a command-line interface (CLI) for the libraries and called it fastencc.

This encoder is known for a nasty stereo bug, and its use is strongly discouraged.

Fraunhofer IIS

This is the codec from the creators of MP3 and AAC. It is considered the slowest MP3 codec, but it delivers fairly high quality. It is included in the standard Windows distribution and is used in Adobe Audition.

Helix

After RealNetworks bought the Xing codec, its development continued, and the codec became known as Helix. On the Doom9 forums, user karl_lillevold (apparently a RealNetworks developer) announced that the source code of the Helix project was being opened. The community welcomed the move warmly, and forum member Enig123 began publishing builds with fixes and improvements.

The Helix community site describes an MP3 decoder whose key features are optimization for ARM processors, easy integration as a library and, in general, high-quality code.

Sounds of every kind surround us from the moment we are born. Admit it: without them, our lives would be far poorer. Just imagine that the cheerful chirping of birds, the bewitching roar of the surf, a child's infectious laughter, and indeed the human voice itself vanished from the world overnight. Unbearable! That is why we cannot imagine our existence without sound in all its forms, and above all in the form of music.

To see that this is true, just think of the time you spend every day in front of a monitor. How often do you listen to music? Almost always, if not always. Yet few people think about what lies behind the music stored online, on the hard drive of a PC or tablet, on a pocket music player, or on other gadgets. Behind it is the ubiquitous MP3 format, the key file format for storing sound in today's digital world. It is the hero of today's story.


MP3 - what is it and how does it work?

In professional terms, MP3 is a Layer 3 codec (full name: MPEG-1 Audio Layer 3) designed to encode and store audio with losses that are barely perceptible to human listeners. The compression algorithm used in MP3 can significantly reduce the size of audio data (by some estimates, up to 12 times) compared with CD audio. At the same time, MP3 playback sounds practically the same as the original; at least, the overwhelming majority of ordinary listeners are convinced of this.
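The "up to 12 times" figure follows directly from the bitrate of CD audio. A minimal sketch, assuming standard CD parameters (44.1 kHz, 16 bits, 2 channels) and constant-bitrate MP3; the function names are hypothetical:

```python
# CD audio parameters: 44,100 samples/s per channel, 16 bits, 2 channels.
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2


def cd_bitrate_kbps() -> float:
    """Bitrate of uncompressed CD audio (PCM) in kbps."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS / 1000


def compression_ratio(mp3_kbps: float) -> float:
    """How many times smaller a constant-bitrate MP3 stream is than CD audio."""
    return cd_bitrate_kbps() / mp3_kbps


print(f"CD audio: {cd_bitrate_kbps():.1f} kbps")
for kbps in (128, 192, 256, 320):
    print(f"{kbps} kbps MP3 -> {compression_ratio(kbps):.1f}:1")
```

CD audio runs at 1411.2 kbps, so 128 kbps gives roughly 11:1, close to the estimate above, while 320 kbps still gives about 4.4:1.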

Why does the sound quality stay practically unchanged despite such heavy compression? It's quite simple. During digital processing (encoding to MP3), the parts of the audio stream that the human ear cannot distinguish are removed from the original file. The information that remains after this filtering is stored and then played back in this reduced form. That, in layman's terms, is the essence of the MP3 format.

Thanks to its compactness, MP3 has become an indispensable attribute of the digital age. Without it, transmitting and storing most of the audio content on the Internet would be hard to imagine. It is recognized by all popular operating systems and supported by virtually every portable and stationary audio device. In a word, MP3 rules!

So where did it all come from? Let's find out.

The Germans are to blame for everything

It is commonly stated on the Internet that the history of the MP3 format began in the second half of the 1980s. In fact, its origins go back a decade earlier.

In the early 1970s, a group of like-minded students gathered at the University of Erlangen-Nuremberg (Germany) under the leadership of Professor Dieter Seitzer. The group's goal was to solve the problem of transmitting human speech with high fidelity over ordinary telephone lines.

It is not known for certain what the researchers achieved along this path, because in the second half of the 1970s their original goal no longer seemed so urgent. That was when a real revolution took place in the telecommunications industry: the world learned about fiber-optic cable and digital communication networks (ISDN). The adoption of these innovations left the Seitzer group's task largely without a purpose.

Undeterred, the researchers turned their attention to another problem: the efficient encoding (compression) of music signals.

In 1979, the scientists' determination bore its first fruit: Seitzer and his team developed the world's first digital audio compression algorithm. During this work, a student named Karlheinz Brandenburg, who later became the "father" of the MP3 format, was especially zealous. He was the first to draw his colleagues' attention to the fact that optimal compression of audio is impossible without taking into account how human hearing works.

Subsequently, under Seitzer's leadership, Brandenburg and the rest of the team made significant improvements to digital audio compression algorithms. It should be noted, however, that at the time their results were more theoretical than practical. Their research had not reached the masses. Not yet.


The era of CDs, or the delayed triumph of MP3

In 1981, the world was introduced to the compact disc, or simply CD. Its arrival, on the one hand, heralded a new era in the recording, storage and playback of digital sound, and on the other, pushed Brandenburg's research into the shadows.

The euphoria that followed the start of CD mass production in 1982 noticeably cooled the general public's interest in compressing digital audio. And really, why bother with such trifles if an Audio CD lets you store and play a fairly large amount of very high-quality audio? So what if it isn't compressed? Who cares?

Nobody did. But a few years later, the boom in digital technology brought the need for digital audio compression back to the fore. There were several reasons for this, and the main ones boiled down to the following:

  • firstly, most PCs of the time had limited disk space (up to 1000 MB) while the amount of digital content kept growing, so a way to save that space was needed;

  • secondly, digital data transmission speeds left much to be desired, so a way to speed up transmission had to be found;

  • thirdly, a new audio format was needed that, thanks to its convenience (small size combined with high quality), would be widely adopted by the popular software of the time.

And here our old acquaintances, the German scientists, who had been researching in exactly this direction since the 1970s, come on stage again.

MP3 format: gradual rise

From then on, leaving aside some details, events around MP3 unfolded like an avalanche, as the following chronology shows.

  • 1987 - A research alliance is formed between the University of Erlangen-Nuremberg and the Fraunhofer Institute within the framework of the European research initiative EUREKA, under the code name Project EU147. Project EU147 focused on digital audio broadcasting (DAB). The German team was joined by American and French partners, the research divisions of AT&T Bell Labs and Thomson. This time, Karlheinz Brandenburg led the project.

  • 1988 - The first working prototypes of the MP3 format are created. In January of the same year, under the auspices of the International Organization for Standardization (ISO), a body is formed to develop and implement international standards for compressing and transmitting digital video and audio. Its name reflected its purpose: the Moving Picture Experts Group, known worldwide simply as MPEG.

  • April 1989 - The Fraunhofer Institute receives a German patent for MP3. Curiously, files in this format do not yet exist.

  • 1991 - The expert group for the new MPEG-1 digital compression standard receives 14 different proposals for audio compression. Among them is ASPEC (Adaptive Spectral Perceptual Entropy Coding), an experimental codec developed by the alliance mentioned above.
    In the end (1992), MPEG chose ASPEC, which, after some modifications and a name change, became the basic codec of MPEG-1 Layer 3. Thanks to its advanced qualities, this codec soon came to be used independently of the rest of MPEG-1 for storing music in little disk space and for transferring audio files over the Internet. But that came later; for now...

  • ...along came 1994, and the Fraunhofer Institute presented a pioneering piece of software, L3enc, the world's first MP3 encoder.

  • 1995 - Files in this format get the .mp3 extension. Until then, the research teams had used the .bit extension for them. This year is therefore considered the official birthday of the MP3 name.

  • In September of that year, another landmark event took place: the release of WinPlay3, the world's first working MP3 player. From then on, millions of people around the world could create and then play MP3 files on their PCs. The era of MP3 had begun!

MP3's world domination

From here on, the history of MP3 is the story of its total expansion across the world.

  • 1996 - The MP3 format is patented in the USA. In addition, Worldspace, a satellite radio network popular at the time, becomes one of the first to announce the use of MP3 for encoding audio.

  • 1997 - The mp3.com portal launches on the Web. At first it gathered the most relevant information about the new format (encoders, players, etc.). Over time, it grew into what was then the largest legal archive of MP3 music files on the planet.

  • 1998 - Portable MP3 players appear on store shelves. The first were the Rio 100 in the USA and the MPMAN in South Korea. The era of Sony's Walkman disc players and their counterparts was relentlessly drawing to a close.
    This year is also significant because the Fraunhofer Institute (after the overwhelming success of Winamp) began demanding that anyone who tried in any way to commercially exploit its patented MP3 compression algorithms buy an appropriate license. The days of the free MP3 were over!

  • 1999 - The record label Sub Pop becomes the first in the world to dare to distribute music tracks in MP3 format.

  • 2000 - A real boom in sales of MP3 devices breaks out in the United States. Since then, millions of MP3-capable devices have been sold there every year. All over the world, companies producing all kinds of MP3 devices spring up like mushrooms after rain. All this showed that the format was turning into a cultural phenomenon of the new millennium.

  • 2004 - Our familiar German developers keep improving their audio compression algorithms and, as a result, present an updated format to the public: MP3 Surround. Now MP3 files can deliver multichannel surround sound, not just stereo!

  • 2007 - The year is remembered for a grand celebration of the twentieth anniversary of the successful work, above all by German researchers, on digital audio coding algorithms.

  • After the festivities, work in this direction continued. That's the Germans for you! In 2009, the Fraunhofer Institute, together with Technicolor, showed the world MP3 HD. The updated format offers efficient compression while guaranteeing maximum sound quality, without the slightest loss relative to the original audio stream.

Curious epilogue

One could talk endlessly about the Cinderella story of MP3, but a single article simply cannot hold it all. So, in conclusion, I would like to thank the clever Germans, thanks to whom our lives and sound have become truly inseparable!

By the way, the practical Germans receive more than abstract gratitude. According to the latest estimates, MP3 technology has created over 10,000 jobs in Germany. The taxes the German treasury collects from the commercial exploitation of MP3 algorithms exceed 300 million euros a year. And Germans themselves spend over 1.5 billion euros a year on MP3 players and related accessories. A nice bonus on top of everyone's thanks :)

Today it is hard to find someone unfamiliar with the three-letter acronym MP3. But when you ask what it is and what it stands for, some people just shrug in bewilderment, while others say: "What do you mean? MP3 is music!" Besides, many mobile phones support MP3. We all know that much, but what else? :) Clearly, very few people understand what this format really is. In this article I will explain what MP3 actually is.

MP3 is the most popular format for storing and transmitting audio in digital form using signal compression. The MP3 (MPEG Audio Layer-3) format was developed by Fraunhofer IIS and Thomson. Compared with WAV files that are copies of Audio CD tracks (PCM, 16 bit, stereo, 44.1 kHz), MP3 songs take up much less disk space. An ordinary CD-R/RW blank can hold over 11 hours of music in quite decent quality.
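The "over 11 hours" claim can be checked with simple arithmetic. This sketch assumes 128 kbps constant-bitrate encoding, 1 MB = 1,048,576 bytes, and no file-system or tag overhead; `hours_on_disc` is a hypothetical helper:

```python
def hours_on_disc(capacity_mb: float, bitrate_kbps: float) -> float:
    """Hours of constant-bitrate MP3 audio that fit into capacity_mb.

    Assumes 1 MB = 1024 * 1024 bytes and 1 kbps = 1000 bit/s; file-system
    overhead and ID3 tags are ignored.
    """
    capacity_bits = capacity_mb * 1024 * 1024 * 8
    seconds = capacity_bits / (bitrate_kbps * 1000)
    return seconds / 3600


for capacity in (650, 700):
    print(f"{capacity} MB at 128 kbps: {hours_on_disc(capacity, 128):.1f} h")
```

Under these assumptions a 650 MB blank holds roughly 11.8 hours and a 700 MB blank roughly 12.7 hours.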

Many excellent programs have been written for MP3 (encoders, players, etc.), hardware players (home, portable and car) are mass-produced, and every modern phone supports MP3 ringtones (and even has a built-in player for convenient playback). Compared with many other audio compression formats, MP3 provides the best sound quality and is now perhaps the second most popular audio format after Audio CD.

MP3 format description

The MP3 audio compression format (short for MPEG Layer 3) was one of the first popular audio compression methods. It was developed by the German Fraunhofer IIS and later, with the support of Thomson, incorporated into the MPEG-1 and MPEG-2 standards. It provides high-quality sound with relatively small file sizes.

MP3 technical details

MP3 achieves its high compression ratio with a rather complex encoding algorithm. It combines mathematical compression methods with features of human hearing (the psychoacoustic model): a weak sound at one frequency is masked by a louder sound at the same or a nearby frequency, the ear is less sensitive to a quiet sound immediately after a loud one, and sounds below a certain loudness are not heard at all.
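The last effect, the threshold of hearing in quiet, is easy to illustrate numerically. The sketch below uses Terhardt's well-known approximation of that threshold; it illustrates the idea and is not the exact curve any particular MP3 encoder uses. The function names are hypothetical:

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate absolute threshold of hearing in dB SPL (Terhardt's formula)."""
    f = freq_hz / 1000.0  # the formula is expressed in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def audible_in_quiet(freq_hz: float, level_db_spl: float) -> bool:
    """True if a lone tone at this level rises above the threshold of hearing."""
    return level_db_spl > threshold_in_quiet_db(freq_hz)


for freq in (100, 1000, 3300, 16000):
    print(f"{freq:>5} Hz: {threshold_in_quiet_db(freq):6.1f} dB SPL")
```

By this formula the ear is most sensitive around 3-4 kHz, while at 16 kHz a tone must exceed roughly 66 dB SPL to be heard, so quiet high-frequency content is a natural candidate for removal.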

During encoding, the audio stream is divided into equal sections called frames. Each frame is encoded separately with its own parameters and contains a header that specifies those parameters. Compression can be performed at different quality levels, with correspondingly different final file sizes.
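To make the frame header concrete, here is a minimal parser for the 4-byte header. The bit positions and tables follow the MPEG-1 Audio header layout (IS 11172-3) rather than anything stated in this article; the sketch handles only MPEG-1 Layer III, and the class and function names are hypothetical:

```python
from dataclasses import dataclass

# MPEG-1 Layer III bitrates (kbps) by 4-bit index; None = free format/invalid.
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128,
                 160, 192, 224, 256, 320, None]
# MPEG-1 sample rates (Hz) by 2-bit index; None = reserved.
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]


@dataclass
class FrameHeader:
    bitrate_kbps: int
    sample_rate_hz: int
    padding: int
    channel_mode: str
    frame_length: int  # bytes, including the 4-byte header


def parse_frame_header(data: bytes) -> FrameHeader:
    """Decode the 32-bit header at the start of an MPEG-1 Layer III frame."""
    if len(data) < 4:
        raise ValueError("a frame header is 4 bytes long")
    h = int.from_bytes(data[:4], "big")
    if h >> 21 != 0x7FF:
        raise ValueError("missing 11-bit frame sync")
    if (h >> 19) & 0x3 != 0x3 or (h >> 17) & 0x3 != 0x1:
        raise ValueError("not an MPEG-1 Layer III frame")
    bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
    sample_rate = SAMPLE_RATES_HZ[(h >> 10) & 0x3]
    if bitrate is None or sample_rate is None:
        raise ValueError("unsupported bitrate or sample-rate index")
    padding = (h >> 9) & 0x1
    mode = CHANNEL_MODES[(h >> 6) & 0x3]
    # 1152 samples per frame: 1152 / 8 * bitrate / sample_rate = 144 * ...
    length = 144 * bitrate * 1000 // sample_rate + padding
    return FrameHeader(bitrate, sample_rate, padding, mode, length)


print(parse_frame_header(bytes([0xFF, 0xFB, 0x90, 0x64])))
```

The sample bytes `FF FB 90 64` decode to a 128 kbps, 44.1 kHz, joint-stereo frame that is 417 bytes long.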

The degree of compression is characterized by the bitrate, the amount of information transmitted per unit of time. MP3 files are usually encoded at bitrates from 64 to 320 kilobits per second (kbps or kb/s), or with a variable bitrate (VBR), where each frame uses its own bitrate, the one best suited to that section.
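Because each frame holds a fixed number of samples (1152 for MPEG-1 Layer III, a figure taken from the standard rather than from this article), the bitrate translates directly into frame duration and file size. A small sketch with hypothetical helper names:

```python
SAMPLES_PER_FRAME = 1152  # samples per MPEG-1 Layer III frame


def frame_duration_ms(sample_rate_hz: int) -> float:
    """Playing time of a single frame in milliseconds."""
    return SAMPLES_PER_FRAME * 1000 / sample_rate_hz


def megabytes_per_minute(bitrate_kbps: float) -> float:
    """Size of one minute of CBR audio (1 kbps = 1000 bit/s, 1 MB = 2**20 bytes)."""
    bytes_per_minute = bitrate_kbps * 1000 * 60 / 8
    return bytes_per_minute / (1024 * 1024)


print(f"frame at 44.1 kHz: {frame_duration_ms(44100):.2f} ms")
for kbps in (64, 128, 192, 256, 320):
    print(f"{kbps:>3} kbps: {megabytes_per_minute(kbps):.2f} MB per minute")
```

At 44.1 kHz each frame lasts about 26 ms, and one minute of music takes a little under 1 MB at 128 kbps and about 2.3 MB at 320 kbps.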

Using a bank of filters, the original signal is divided into several frequency bands. For each band, the encoder determines how much masking comes from neighboring bands and the previous frame, and discards signals that would be inaudible. For the remaining data, it determines for each band how few bits can be used while keeping the quantization noise below the masking threshold. This completes the work of the psychoacoustic model; the resulting stream is then further compressed with Huffman coding (an entropy-coding method also used by general-purpose archivers such as RAR).
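Both steps can be sketched in a few lines. The bit-allocation part uses the textbook rule that each extra quantizer bit lowers quantization noise by about 6.02 dB, so a band needs roughly SMR / 6.02 bits, where SMR is the signal-to-mask ratio; real MP3 encoders use a non-uniform quantizer and iterative loops instead. The Huffman part builds an optimal prefix code from symbol counts, whereas MP3 itself selects among fixed code tables defined in the standard. All names are hypothetical:

```python
import heapq
import math
from collections import Counter

DB_PER_BIT = 6.02  # each extra quantizer bit lowers the noise by about 6 dB


def bits_for_band(signal_db: float, mask_db: float) -> int:
    """Fewest bits that keep quantization noise below the masking threshold."""
    smr = signal_db - mask_db  # signal-to-mask ratio
    if smr <= 0:
        return 0  # the band is fully masked and can be dropped
    return math.ceil(smr / DB_PER_BIT)


def huffman_code_lengths(symbols) -> dict:
    """Code length (in bits) of each symbol in an optimal prefix code."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # heap entries: (total weight, tie-breaker, {symbol: code length so far})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)  # two lightest subtrees
        w2, _, b = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


print(bits_for_band(70, 40), huffman_code_lengths("aaaabbc"))
```

A band 30 dB above its masking threshold needs 5 bits here, and in the string "aaaabbc" the frequent symbol gets a 1-bit code while the rarer ones get 2 bits.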

Even at 320 kbps, the highest bitrate MP3 allows, encoding remains lossy: the psychoacoustic model is still applied, it simply has more bits to work with. Keep in mind that different encoders can encode the same audio differently; the differences are especially noticeable at high frequencies and low bitrates. For stereo signals, MP3 offers several channel modes:

  • Dual Channel - each channel gets half of the stream and is encoded separately, so two completely different signals can be recorded.
  • Stereo - each channel is encoded separately, but the encoder program can use the free space of one channel to accommodate the information of the other. Stereo is the default in most encoders.
  • Joint Stereo (MS Stereo) - the stereo signal is decomposed into a signal common to both channels (mid) and a difference signal (side). A variant, MS/IS Stereo, additionally uses intensity stereo, which simplifies the difference signal.
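The mid/side idea behind Joint Stereo can be shown on plain sample lists. This sketch uses a 1/sqrt(2) scaling, the convention commonly described for MP3's MS stereo; a real encoder applies the transform to frequency-domain values within each band rather than to raw samples, and the function names are hypothetical:

```python
import math

SCALE = 1 / math.sqrt(2)


def ms_encode(left, right):
    """Turn left/right channels into mid (sum) and side (difference) signals."""
    mid = [(lt + rt) * SCALE for lt, rt in zip(left, right)]
    side = [(lt - rt) * SCALE for lt, rt in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode and recover the left/right channels."""
    left = [(m + s) * SCALE for m, s in zip(mid, side)]
    right = [(m - s) * SCALE for m, s in zip(mid, side)]
    return left, right


# Nearly identical channels give a side signal close to zero,
# which needs very few bits to encode.
left = [0.50, 0.25, -0.10, 0.80]
right = [0.48, 0.26, -0.10, 0.79]
mid, side = ms_encode(left, right)
print("side:", [round(s, 3) for s in side])
```

The transform is lossless by itself; the savings come from the side signal being small for typical music, where both channels are highly correlated.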

Strengths of the MP3 format:

  • High compression ratio with acceptable sound quality.
  • The compression ratio and quality can be adjusted by the user.
  • The frame structure is convenient for network transmission and lets you jump to any point in the file.
  • Widespread use of hardware and software.

Using MP3 in practice

Although MP3 encoding discards some of the original information, at 256 and 320 kbps it is almost impossible to tell the compressed signal from the original by ear, especially on ordinary audio equipment. Even then, the file is roughly 4 times smaller than CD audio.

For compact players and other devices with modest-quality speakers, a bitrate of 192 kbps is quite sufficient. Bitrates below 192 kbps are recommended for signals with a limited frequency range or low fidelity requirements (for example, speech or a TV broadcast).

What idea is MP3 audio compression based on?

You have probably noticed that when you talk to a friend who is shut off from the outside world by a "music phone", he answers your questions unnaturally loudly: his own voice, heard over the roar of a rock concert, seems unusually quiet to him. This is a feature of human perception. It is not so much about the sharpness of hearing as about the brain's ability to "digest" sound information: it does not respond to impulses below a certain level, does not hear a whisper right after a loud roar, and so on.

MP3 encoders exploit exactly this. Each encoder can implement its own psychoacoustic model, tuned to its goals, which determines which relatively weak signals can be neglected.

How do these methods work?

The original audio signal is divided into separate blocks called frames. A special coding algorithm is applied to each frame, and the compression parameters can differ considerably from frame to frame. While encoding a block, the encoder splits the signal into several frequency bands. For each band, it calculates how strongly a weak signal is masked by a more powerful one in a neighboring band or in the previous frame. Based on the results, it removes minor sounds that the "average" listener would not hear because a louder signal is present at that moment. It also takes into account that most people cannot hear high-frequency signals (above 16 kHz).

Audio compressed this way can be streamed, for example over the Internet, or stored in MP3 files.

Bit rate and its meaning

One of the most important characteristics of an MP3 file is its bitrate: the rate of the encoded data stream, that is, the total amount of information transmitted per unit of time. The value covers the whole stream, whether it carries mono or stereo audio.

Fraunhofer IIS deemed 128 kbps the optimal bitrate for use on the Internet, and some encoder makers spread the idea that this bitrate is enough to encode music at near-CD quality. It is not: on good equipment, the irreversible loss of audio information becomes noticeable.

The higher the bitrate, the more disk space the final MP3 file requires, but generally the higher the quality of the encoded signal. Each bitrate has its own area of application.

Even professional experts with a fine ear for music sometimes cannot tell, on good equipment, an Audio CD track from its copy encoded to MP3 at a low compression ratio, for example 4:1 (320 kbps). For an ordinary music lover, the difference becomes almost imperceptible at 192-256 kbps.

If you only use computer speakers or inexpensive home audio equipment, 160-192 kbps is quite enough for encoding and then listening to songs. For compressing short-lived pop music, or for posting a music archive online, 128 kbps is quite suitable. Bitrates below 128 kbps cannot deliver proper sound quality. Bitrates of 64-96 kbps are most often used for foreign-language audio lessons, lectures, interviews and audio broadcasts.

For a long time, encoders supported only a constant bitrate (CBR - Constant BitRate): the user set a fixed data rate, and the program delivered the best encoding quality it could at that rate. But the density of meaningful information obviously varies from frame to frame (why spend bits encoding pauses, for example?). So encoder developers decided to choose a bitrate for each frame separately, aiming to minimize the data rate while keeping the quality level constant. This is how the idea of VBR (Variable BitRate) was born.
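The VBR idea can be sketched as picking, for every frame, the smallest standard bitrate whose frame is large enough for the bits the psychoacoustic model asks for. This is a simplification under stated assumptions: the per-frame bit demand is taken as given, MPEG-1 Layer III at 44.1 kHz is assumed, details such as the bit reservoir are ignored, and all names and demand values are hypothetical:

```python
# Bitrates (kbps) that an MPEG-1 Layer III frame may use.
ALLOWED_KBPS = [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320]
SAMPLES_PER_FRAME = 1152


def frame_capacity_bits(kbps: int, sample_rate_hz: int = 44100) -> float:
    """Bits available in one frame at the given bitrate."""
    return kbps * 1000 * SAMPLES_PER_FRAME / sample_rate_hz


def pick_frame_bitrate(bits_needed: float, sample_rate_hz: int = 44100) -> int:
    """Smallest allowed bitrate whose frame can hold bits_needed bits."""
    for kbps in ALLOWED_KBPS:
        if frame_capacity_bits(kbps, sample_rate_hz) >= bits_needed:
            return kbps
    return ALLOWED_KBPS[-1]  # cannot exceed the format maximum


def average_kbps(frame_kbps: list) -> float:
    """All frames last equally long, so the average bitrate is a plain mean."""
    return sum(frame_kbps) / len(frame_kbps)


# Hypothetical per-frame demands: a pause, a quiet passage, a dense passage.
demands = [300, 2500, 9000]
chosen = [pick_frame_bitrate(d) for d in demands]
print(chosen, round(average_kbps(chosen), 1))
```

The pause gets the minimum 32 kbps while the dense passage is capped at 320 kbps, which is exactly the saving CBR cannot make.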

I hope it is now a little clearer what kind of music your phone "prefers". Enjoy your music! See you in future articles on mobime!

Today it is difficult to find a person who is unfamiliar with the three-letter abbreviation - mp3. But when you start asking what it is and how it is deciphered, some people shrug their hands in bewilderment, while others say: “What are you? This is this MP3 music! ”. In addition, many mobile phones support MP3. We know about this, and then, and more? :) It is clear that very few people understand what this format is. In this article I will just explain what it is all the same, this MP3.

MP3 is the most popular format for storing and transmitting information in digital form, using signal compression. The MP3, or MPEG Audio Layer-3, format was developed by Fraunhofer IIS and Thomson. Compared to WAV files, which are copies of Audio CD tracks (PCM, 16 bit, Stereo, 44.1 kHz), MP3 songs take up much less disk space. An ordinary CD-R / RW blank can store over 11 hours of music of quite decent quality.
For MP3, many excellent programs have been written (encoders, players, etc.), the production of hardware (stationary, pocket and car) players has been established, every modern phone supports MP3 melodies (and even has a built-in player for their convenient playback). Compared to many other audio compression formats, MP3 provides the best sound quality and is now perhaps the second most popular after Audio CD.

MP3 format description

The MP3 audio compression format (short for MPEG Layer3) was one of the first popular audio compression methods. Developed by the German company Fraunhofer IIS and later, with the support of THOMSON, implemented as part of the MPEG1 and MPEG2 video formats. Provides high quality sound with relatively small file sizes.

MP3 technical details

A high compression ratio in MP3 is achieved due to a rather complex encoding algorithm. Both mathematical methods of compression and features of human hearing (psychoacoustic model) are used: the effect of masking a weak sound of one frequency with a louder sound of the same or an adjacent frequency, lowering the ear's sensitivity to a quiet sound immediately after a loud one, immunity to sounds below a certain volume level.

During encoding, the audio stream is divided into equal sections (frames). Each of the frames is encoded separately with its own parameters and contains a header in which these parameters are specified. Compression can be performed with different quality and, accordingly, the size of the final file.
The compression ratio is characterized by the bitrate - the amount of information transmitted per unit of time. MP3 files are usually encoded with a bit rate from 64 to 320 kilobits per second (kbps or kb / s), as well as with a variable bit rate (VBR) - when each frame uses its own, optimal for a given section, bit rate.
Using filters, the original signal is divided into several frequency ranges, for each range, the amount of masking effect from adjacent ranges and the previous frame is determined, insignificant signals are ignored. For the remaining data, for each band, it is determined how many bits can be sacrificed to keep the loss below the masking value. This completes the work of the psychoacoustic model, and the final stream is additionally compressed using the Huffman algorithm (similar to the RAR archiver).

At a bitrate of 320 kbps, only the final compression is applied, without psychoacoustic modeling. Keep in mind that different codecs can encode the audio signal differently, the differences are especially evident at high frequencies and low bitrates. The MP3 format encodes a stereo signal, and several conversion options are possible:

Dual Channel - each channel receives half of the stream and is encoded separately - recording of two completely different signals is possible.
Stereo - each channel is encoded separately, but the encoder program can use the free space of one channel to accommodate the information of the other. Stereo is the default in most encoders.
Joint Stereo (MS Stereo) - the stereo signal is decomposed into a common signal for both channels and a difference one. Has an option - MS / IS Stereo with a simplified difference signal.

Strengths of MP3 format:

High compression ratio with acceptable sound quality.
The compression ratio and quality can be adjusted by the user.
The frame structure is well suited to transmission over a network and allows seeking to any point in the file.
Widespread support in hardware and software.

September 6, 2010 at 03:53 PM

Inside MP3. And how does it all work?


Once I needed to solve what seemed (at the time) a simple problem: finding out the duration of an mp3 file from a PHP script. I had heard of ID3 tags and immediately assumed that the duration was stored either in the tags or in the headers of the mp3 file. A quick search on the Internet showed that the problem could not be solved in a couple of minutes. Since I'm curious by nature and was not pressed for time, I decided not to use third-party tools and to figure out one of the most popular formats on my own.

If you are curious about what is inside, read on.

In this article we will not dwell on extracting ID3v2 tags; that deserves a separate article, since there are various nuances. We also skip header fields that are practically unused nowadays (for example, the Emphasis field of the mp3 frame header), and we do not cover the structure of the audio data itself, the part you actually hear from the speakers.

ID3 tags

ID3 (short for "Identify an MP3") is a metadata format most often used in MP3 audio files. An ID3 tag contains information such as the track title, album and artist name, which multimedia players, other programs and hardware players use to display file information and organize the audio collection automatically.

Wikipedia

There are two completely different versions of ID3 data: ID3v1 and ID3v2.

ID3v1 has a fixed size of 128 bytes, appended to the end of the mp3 file. It can store the track title, artist, album, year, comment, track number (in version 1.1) and genre.
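Because ID3v1 has a fixed layout (a 'TAG' marker, 30-byte title, artist and album fields, a 4-byte year, a 30-byte comment and a 1-byte genre), reading it takes only a few slices. A minimal sketch, assuming the text is ISO-8859-1 (Latin-1) padded with zero bytes; `parse_id3v1` is a hypothetical helper name.

```python
def parse_id3v1(data: bytes):
    """Return a dict of ID3v1 fields, or None if the last 128 bytes are not a tag."""
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fixed-width field: cut at the first NUL, assume Latin-1, drop trailing spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    fields = {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(tag[97:127]),
        "genre": tag[127],  # index into the ID3v1 genre list
        "track": None,
    }
    # ID3v1.1: comment byte 28 is zero and byte 29 holds a non-zero track number.
    if tag[125] == 0 and tag[126] != 0:
        fields["comment"] = text(tag[97:125])
        fields["track"] = tag[126]
    return fields
```

Pass it the bytes of the whole file (or just its last 128 bytes).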

It quickly became clear that 128 bytes is far too little space for such data, so over time a second version, ID3v2, appeared and is now widely used.
Unlike the first version, v2 tags have a variable length and are placed at the beginning of the file, which makes them suitable for streaming. (ID3v2.4 also allows the tag to be stored at the end of the file.)
ID3v2 data consists of a header followed by ID3v2 frames; version ID3v2.3 alone defines more than 70 frame types. The 10-byte header contains the following fields:

  • Marker (3 bytes): always equal to 'ID3'.
  • Version (2 bytes). There are currently three versions: ID3v2.2, ID3v2.3 and ID3v2.4.
    v2.2 is deprecated.
    v2.3 is the most popular version.
    v2.4 is gaining popularity. One of its differences from v2.3 is that it allows UTF-8 text (v2.3 defines only ISO-8859-1 and UTF-16).
  • Flags (1 byte). In v2.3 only the three highest bits (7, 6 and 5) are used; v2.4 adds a fourth, the footer flag (bit 4):
    %abc00000 (binary)
    a 'Unsynchronisation': the unsynchronisation scheme has been applied, so that no byte sequence in the tag can be mistaken by older players for an MPEG frame sync.
    b 'Extended header': an extended header is present.
    c 'Experimental indicator': the tag is in an experimental stage.
  • Length (4 bytes). The length is stored as a "syncsafe" integer: the highest bit (bit 7) of each byte is unused and always set to 0, so only 28 bits carry the value. The length does not include the 10-byte header itself.
Let's consider an example:

In this case, together with the ID3v2 header (10 bytes), the ID3v2 data occupies 1024 bytes.
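Decoding the syncsafe length can be sketched as follows. The header bytes below were constructed for illustration so that the whole tag occupies 1024 bytes, as in the text; they are not taken from a real file. The function names are hypothetical, and the extra 10 bytes for the ID3v2.4 footer flag follow the v2.4 specification.

```python
def syncsafe_to_int(b: bytes) -> int:
    """Decode a 4-byte syncsafe integer: only the low 7 bits of each byte carry data."""
    value = 0
    for byte in b:
        value = (value << 7) | (byte & 0x7F)
    return value


def id3v2_total_size(data: bytes) -> int:
    """Bytes occupied by an ID3v2 tag at the start of `data` (0 if there is none)."""
    if len(data) < 10 or data[:3] != b"ID3":
        return 0
    major, flags = data[3], data[5]
    size = syncsafe_to_int(data[6:10])  # does not include the 10-byte header
    total = 10 + size
    if major == 4 and flags & 0x10:  # v2.4 footer present: 10 more bytes at the end
        total += 10
    return total


header = b"ID3\x03\x00\x00\x00\x00\x07\x76"  # hypothetical v2.3 header
print(id3v2_total_size(header))
```

Here the length field 00 00 07 76 decodes to 7 × 128 + 0x76 = 1014, and adding the 10-byte header gives 1024 bytes.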

The ID3v2 header is followed by the actual tag frames. As mentioned above, I decided not to cover reading them in detail in this article.

Now we know whether ID3 tags are present and how long they are, so we can start parsing the mp3 frame to find out where the duration is stored, and understand everything else along the way.

MP3 frame

Apart from the tags, an mp3 file consists entirely of frames, which can only be read sequentially. Each frame contains a header and audio data. Since we are not aiming to write firmware for a music player, we are only interested in the frame header.
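Locating the first frame (for example, right after the ID3v2 tag) amounts to scanning for the 11-bit sync pattern. A minimal sketch with a hypothetical `find_frame_sync`; a robust parser should also validate the candidate header and check that another frame starts where this one ends, because the pattern can occur by chance in audio or tag data.

```python
def find_frame_sync(data: bytes, start: int = 0) -> int:
    """Offset of the first candidate frame header at or after `start`, or -1.

    A frame header begins with 11 set bits: 0xFF followed by a byte whose
    top three bits are all 1.
    """
    i = data.find(b"\xff", start)
    while i != -1 and i + 1 < len(data):
        if (data[i + 1] & 0xE0) == 0xE0:
            return i
        i = data.find(b"\xff", i + 1)
    return -1


print(find_frame_sync(b"\x00\xff\x00\xff\xfb\x90\x64"))  # skips the lone 0xFF at 1
```

In practice `start` would be the total size of the ID3v2 tag, so the search begins where the audio data should start.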

More about the header (lots of tables and dry information)

The header size is 4 bytes.

Description:
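The 4-byte header packs its fields as AAAAAAAA AAABBCCD EEEEFFGH IIJJKLMM: A sync (11 bits), B MPEG version, C layer, D protection (CRC) bit, E bitrate index, F sample rate index, G padding, H private bit, I channel mode, J mode extension, K copyright, L original, M emphasis. Below is a minimal sketch that decodes the fields needed for the duration, restricted to Layer III. The function and table names are hypothetical; the table values follow the standard.

```python
# Layer III bitrates in kbps by index; None marks "free format" (0) and "bad" (15).
BITRATES_L3 = {
    "MPEG-1": [None, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None],
    "MPEG-2": [None, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, None],
}
SAMPLE_RATES = {
    "MPEG-1": [44100, 48000, 32000],
    "MPEG-2": [22050, 24000, 16000],
    "MPEG-2.5": [11025, 12000, 8000],
}
VERSIONS = {0b00: "MPEG-2.5", 0b10: "MPEG-2", 0b11: "MPEG-1"}  # 0b01 is reserved
CHANNEL_MODES = ["Stereo", "Joint Stereo", "Dual Channel", "Mono"]


def parse_layer3_header(b: bytes) -> dict:
    h = int.from_bytes(b[:4], "big")
    if (h >> 21) != 0x7FF:
        raise ValueError("no frame sync")
    version = VERSIONS.get((h >> 19) & 0b11)
    if version is None or ((h >> 17) & 0b11) != 0b01:  # layer bits 01 = Layer III
        raise ValueError("not an MPEG Layer III header")
    bitrate_index = (h >> 12) & 0xF
    sr_index = (h >> 10) & 0b11
    # MPEG-2 and MPEG-2.5 share the same Layer III bitrate table.
    kbps = BITRATES_L3["MPEG-1" if version == "MPEG-1" else "MPEG-2"][bitrate_index]
    if kbps is None or sr_index == 3:
        raise ValueError("free-format or invalid bitrate/sample rate")
    return {
        "version": version,
        "crc": not ((h >> 16) & 1),  # protection bit 0 means a 16-bit CRC follows
        "bitrate": kbps * 1000,
        "sample_rate": SAMPLE_RATES[version][sr_index],
        "padding": (h >> 9) & 1,
        "channel_mode": CHANNEL_MODES[(h >> 6) & 0b11],
    }


print(parse_layer3_header(b"\xff\xfb\x90\x64"))
```

The sample header FF FB 90 64 decodes as MPEG-1 Layer III, no CRC, 128 kbps, 44100 Hz, Joint Stereo.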

Data compression modes, or what the bitrate is

There are 3 data compression modes:

CBR (constant bitrate): the bitrate does not change throughout the track.

VBR (variable bitrate): the bitrate changes from frame to frame throughout the track.

ABR (average bitrate): this concept exists only at encoding time; the encoder varies the bitrate while aiming at a target average. The output is a VBR file.

CBR

If the file is encoded with a constant bitrate, we can finally get the duration of our track with the following formula:
Duration (s) = Audio data size (bytes) * 8 / Bitrate (bits per second)

For example, the file is 350670 bytes in size. It has an ID3v1 tag (128 bytes) and an ID3v2 tag (1024 bytes), and the bitrate is 96 kbps. The audio data therefore occupies 350670 - 128 - 1024 = 349518 bytes.
Duration = 349518 * 8 / 96000 = 29.1265 ≈ 29 seconds
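The CBR formula and the worked example translate directly into code. A sketch with a hypothetical function name, assuming that kbps means 1000 bits per second (as it does for MP3 bitrates) and that the tag sizes are already known:

```python
def cbr_duration(file_size: int, bitrate_kbps: int,
                 id3v1_size: int = 0, id3v2_size: int = 0) -> float:
    """Duration in seconds of a constant-bitrate mp3 file."""
    audio_bytes = file_size - id3v1_size - id3v2_size
    # bytes -> bits, divided by bits per second (1 kbps = 1000 bit/s)
    return audio_bytes * 8 / (bitrate_kbps * 1000)


print(cbr_duration(350670, 96, id3v1_size=128, id3v2_size=1024))
```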

VBR

It remains to explain how to determine the compression mode. Usually this is simple: when a file is compressed with VBR, the encoder adds a VBR header to its first frame, so its presence tells us that a variable bitrate is used. (LAME also writes this header to CBR files, but marks it 'Info' instead of 'Xing'.)
There are two kinds of headers: Xing and VBRI.
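Xing is placed right after the side information of the first mp3 frame. Its offset, counted from the end of the 4-byte frame header (add 2 bytes if the frame has a CRC), is:
  • MPEG-1: 32 bytes for Stereo, Joint Stereo and Dual Channel; 17 bytes for Mono
  • MPEG-2 and MPEG-2.5: 17 bytes for Stereo, Joint Stereo and Dual Channel; 9 bytes for Mono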

For example, suppose our ID3v2 tag is 1024 bytes, so the first mp3 frame starts at byte 1024. If the file is MPEG-1 with the "Stereo" channel mode and no CRC, the Xing header starts 32 bytes after the 4-byte frame header, at offset 1024 + 4 + 32 = 1060 bytes.

The VBRI header is always located 32 bytes after the end of the first frame's 4-byte header, i.e. 36 bytes from the beginning of the frame.

The first four bytes of each header hold its marker: 'Xing' or 'Info' for a Xing header, and 'VBRI' for a VBRI header.
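Putting these offsets together, here is a sketch that looks for a VBR header inside the first frame. It assumes `frame` holds the bytes of the first frame starting at its sync word, and that the MPEG version, channel mode and CRC flag have already been read from the frame header; the function name and parameters are hypothetical.

```python
# Layer III side-information size in bytes, keyed by (is_mpeg1, is_mono).
# The Xing/Info header follows the side information directly.
SIDE_INFO_SIZE = {
    (True, False): 32,   # MPEG-1, Stereo / Joint Stereo / Dual Channel
    (True, True): 17,    # MPEG-1, Mono
    (False, False): 17,  # MPEG-2 / MPEG-2.5, stereo modes
    (False, True): 9,    # MPEG-2 / MPEG-2.5, Mono
}


def find_vbr_header(frame: bytes, mpeg1: bool, mono: bool, has_crc: bool = False):
    """Return (marker, offset within the frame) of a VBR header, or (None, -1)."""
    # 4-byte frame header, optional 2-byte CRC, then the side information
    xing_at = 4 + (2 if has_crc else 0) + SIDE_INFO_SIZE[(mpeg1, mono)]
    marker = frame[xing_at:xing_at + 4]
    if marker in (b"Xing", b"Info"):
        return marker.decode("ascii"), xing_at
    # VBRI: always 32 bytes after the 4-byte frame header
    if frame[36:40] == b"VBRI":
        return "VBRI", 36
    return None, -1
```

The absolute position in the file is the offset of the first frame plus the returned value.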

These VBR headers have a variable length and contain various information about how the file was encoded. The structure of VBR headers (and much more) is described in detail in dedicated format references.

I will only describe what interests us at the moment, namely the number of frames (Number of Frames), a 4-byte big-endian value.
In the Xing header it is located at offset +8 from the start of the header (after the 4-byte marker and a 4-byte flags field) and is present only if bit 0 of the flags is set. In the VBRI header it is at offset +14 from the start of the header.
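Reading the frame count then takes a few big-endian reads. A sketch, assuming `header_at` is the offset within `frame` at which the 'Xing'/'Info' or 'VBRI' marker was found (the names are hypothetical):

```python
def read_frame_count(frame: bytes, kind: str, header_at: int):
    """Number of audio frames stored in a Xing/Info or VBRI header, or None if absent."""
    if kind in ("Xing", "Info"):
        # Marker (4 bytes), then a 4-byte flags field; bit 0 = 'frames' field present
        flags = int.from_bytes(frame[header_at + 4:header_at + 8], "big")
        if not (flags & 0x1):
            return None
        return int.from_bytes(frame[header_at + 8:header_at + 12], "big")
    if kind == "VBRI":
        # Marker, version, delay, quality and byte count come first; frames at +14
        return int.from_bytes(frame[header_at + 14:header_at + 18], "big")
    return None
```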

Knowing the number of samples per frame (for Layer III: 1152 in MPEG-1, 576 in MPEG-2 and MPEG-2.5), we can get the duration of an mp3 file encoded with a variable bitrate:

Duration = Number of Frames * Samples Per Frame / Sample Rate

For example, the VBRI header gives 1118 frames, the number of samples per frame is 1152, and the sample rate is 44100 Hz.
Duration = 1118 * 1152 / 44100 = 29.205 ≈ 29 seconds
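Finally, the VBR duration formula together with the samples-per-frame table, in a sketch with hypothetical names that reproduces the 1118-frame example:

```python
# Samples per frame by (MPEG version, layer).
SAMPLES_PER_FRAME = {
    ("MPEG-1", 1): 384, ("MPEG-1", 2): 1152, ("MPEG-1", 3): 1152,
    ("MPEG-2", 1): 384, ("MPEG-2", 2): 1152, ("MPEG-2", 3): 576,
    ("MPEG-2.5", 1): 384, ("MPEG-2.5", 2): 1152, ("MPEG-2.5", 3): 576,
}


def vbr_duration(frames: int, sample_rate: int,
                 version: str = "MPEG-1", layer: int = 3) -> float:
    """Duration in seconds: total samples divided by samples per second."""
    return frames * SAMPLES_PER_FRAME[(version, layer)] / sample_rate


print(round(vbr_duration(1118, 44100), 3))
```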

That's all for today. I hope it was useful to someone; thanks for reading.

For those who want to dig straight into the internals of mp3:
