EFCL Channelizer
Cite as:
Klusáček, David, 2017,
EFCL Channelizer, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL),
http://hdl.handle.net/11234/1-2380.
Date issued
2017-01-10
Description
Extremely fast digital audio channelizer implementation, usable as a building block for experimental ASR front-ends or signal-denoising applications. Thanks to its high throughput it is also applicable in software-defined radios. It comes in the form of a C/C++ library and an example executable that reads an input stream, splits it into equidistant frequency channels, and emits their data to the output.
Features:
(1) Hand-tuned SIMD-aware assembly for x86 (SSE) and x86-64 (AVX), as well as for ARM (NEON) processors.
(2) Generic non-SIMD C++ implementation for other architectures.
(3) Capable of taking advantage of multicore CPUs.
(4) Fully configurable number of channels and the output decimation rate.
(5) User-supplied FIR for the channel separation filter, which allows specifying the width of the channels and whether they overlap or are separated.
(6) Input and output signal samples are treated as complex numbers.
(7) Speed over 750 complex MS/s achieved on a Core i7 4710HQ @ 2.5 GHz when channelizing into 72 output channels with a FIR length of 1152 samples, using 3 computing threads.
(8) Runs under Linux OS.
Acknowledgement
Ministerstvo školství, mládeže a tělovýchovy České republiky
Project code: LM2015071
Project name: LINDAT/CLARIN: Institut pro analýzu, zpracování a distribuci lingvistických dat
Collections
Files in this item
- efcl-0.3.tbz (1.02 MB, application/octet-stream) --- EFCL source code and documentation; MD5 6aaa44332c31a36f4cc05760f666c8da

- EFCL-v03.html (33.21 KB, text/html) --- Short introduction; MD5 bc8e806778b73206e164a3157cd282c1

EFCL CHANNELIZER
v0.3, January 2017
KEYWORDS
Fast Channelizer, Filterbank, Speech Recognition Front End, Software
Defined Radio, Polyphase Filter, Frequency Multiplexing, Audio Denoising,
High Performance Computing, HPC, SDR, FFT, FFTW, SIMD, AVX, SSE, NEON.
DOWNLOAD
10.01.2017: EFCL-v0.3 minor compatibility bug fix, documentation update
31.12.2016: EFCL-v0.2 2 bug fixes
30.12.2016: EFCL-v0.1 initial release
ACKNOWLEDGMENTS
This work has been supported by the LINDAT/CLARIN project of the Ministry
of Education, Youth and Sports of the Czech Republic (project LM2015071).
WHAT IT IS
This is a library and a stand-alone program that can split an input signal
stream into a number of equally spaced channels. The samples are complex
valued single precision floats. If Fs is the sampling frequency, the
channels' center frequencies will be 0, Fs/C, 2*Fs/C, ... (C-1)*Fs/C,
where C is the number of channels (i.e. the zeroth channel spans the
interval (-Fs/2C, Fs/2C), the first channel (Fs/2C, 3Fs/2C), and the last
one ((C-1.5)*Fs/C, (C-0.5)*Fs/C), which --- due to spectrum periodicity ---
can be regarded as channel "-1" spanning from -3Fs/2C to -Fs/2C).
The output of the channelizer can optionally be decimated by an integer
factor D (typically D=C).
The library is designed for high throughput, using assembly codelets that
are compiled and tuned on the fly for the particular machine and the C and
D parameters when the program starts. The compiled codelet is then
dynamically linked with the main program so that the channelizer can use
it. The assembly relies heavily on SIMD instructions, the flavor of which
(SSE, AVX or NEON) is detected by the makefile when the library is
compiled. So if one wants to distribute platform-independent binaries, it
is necessary to compile libefcl.so for the different SIMD flavors and
distribute several libefcl.so files from which the installer (or the main
program) selects the proper one (in fact this would be more complicated,
as there are also tuning constants and options which are CPU-model
specific).
For CPUs that lack SIMD instructions, there is still a generic C version
of the code, which is about 3 times slower.
The library can also take advantage of multicore CPUs to further increase
its speed by running the computation in multiple threads. The number of
threads is runtime configurable, and the library adaptively balances the
load (even turning itself off on cores overloaded by other tasks) so as to
achieve maximal throughput.
WHAT IT ISN'T
To select a channel out of a signal one needs a filter. The efcl
channelizer expects to be given its impulse response in a file containing
binary float numbers. The response itself may or may not be symmetric ---
it does not matter, as the filtering is fully general. If you need to use
a minimum-phase filter for some reason, that is possible. It is enough to
specify the filter for channel 0, i.e. the low-pass filter --- the filters
for the other channels are derived from it automatically by efcl.
Although there is a utility "./simple_filter" which produces an example
filter (a sigma-factor-weighted truncated sin(x)/x), it is meant only for
debugging and performance testing, not for meaningful signal processing.
Designing a proper filter is much more involved. The crucial thing to
consider is that the stop band of such a filter must fall off as 1/f, or
better 1/f^2, so that the stop band sums to a small constant when folded
by aliasing (which happens due to decimation). Otherwise the noise and
spectral leakage will be much higher than expected. Designing such a
filter is, however, beyond the scope of this library. Use GnuRadio or
write your own design tool (a weighted Remez algorithm might be the method
of choice). Keep in mind that the filter is zero-padded to the nearest
higher multiple of C (the number of channels). This new length is
displayed as L in efcl's verbose mode. Note that the padding zeroes are
processed as if they were general numbers, thereby consuming processing
time. It is therefore advisable to design filters whose length is already
divisible by C, so that all taps do useful work.
HOW FAST IS IT
The speed generally depends on the number of channels C, the filter length
L and the decimation factor D. To get a notion I did some testing using
C=72, D=50 and L=1152. The performance is measured in complex megasamples
per second emitted by the program. Note that for C != D the output speed
differs from the input one. Moreover, the performance (in MS/s) is almost
insensitive to the value of D, so it is better to specify the speed in
output MS/s than in input MS/s (the latter can be computed from the output
speed by multiplying it by D/C, since every D input samples yield C output
samples).
The plots show how the performance depends on the buffer size, which also
affects the latency of the filterbank (the smaller, the better). Note that
it is a fairly flat function, so we don't lose much when the buffer size
is not optimal. Anyway, for a known machine the makefile sets it to an
optimal value, and users can easily measure the optimum for their machine
and recompile with that size. In any case, the buffer size can always be
overridden from the command line using the -B= option.
The following plots show a single core performance of various CPUs in MS/s. On each CPU
the program was compiled with SIMD enabled, i.e. Intel and AMD used SSEx86 implementation,
whereas the ARM used NEON implementation.
[Plot: single-core performance in MS/s versus buffer size for various CPUs.]
Note that this is dry performance only: no real I/O was performed and the
program just worked from memory to memory. In reality, I/O may become a
non-negligible CPU load due to increased cache pressure. Nevertheless, at
least 50% of what was measured should be attainable in a practical setup.
Note that there is a certain minimal buffer size (namely L+D) below which
the algorithm cannot work, and which defines the minimum latency. If less
than this is specified, it is automatically replaced by L+D. This explains
the flat region at the beginning of all the plots. The noise that can be
seen there is caused by natural variations in speed. In fact these are
slightly greater than what can be seen in the picture, because each point
was measured 10 times and the best speed was plotted. This was to hide
sudden drops in performance which I attribute to interference with system
processes (as the testing time for each buffer size was as short as
possible, even a small amount of time stolen by the system or another task
shows up as a large drop in performance). Note that the noise increases to
the right, because the data no longer fit into the cache.
The next plots show the same thing in a relative way, counting how many
complex samples (on average) leave the channelizer per CPU clock. This is
the best way to compare CPU architectures with different clock speeds. It
also shows that in the filter setup used for the test it takes about 100
CPU cycles to produce a single pair of output floats.
[Plot: single-core performance in samples per CPU clock.]
Especially note that the per-clock performance of the ARM A53 is not at
all bad, considering it is an in-order machine competing against
out-of-order Intel and AMD parts. In fact, the assembly part ran as fast
as its counterpart on Intel; only the FFTW library seemed to be somewhat
lazier on ARM (which might change after recompiling it with A53-specific
optimization turned on, but I have not tried that).
Also note that the A53 is a 64-bit machine, but here it was run in AArch32
mode, in which it emulates the older 32-bit ARM instruction set. If
programmed in native instructions it would surely be faster (its NEON unit
has twice as many registers there, for example). Unfortunately, the NEON64
implementation is not ready yet.
The A53 (as used in the Raspberry Pi 3 Model B) is in fact a quad-core
CPU, and the next picture shows its performance when more than one thread
is running. The number of worker threads used is denoted by the
t=<number>_ prefix there. Note that it does not make much sense to run
more than 3 computation threads, as there must always be a manager thread
that supplies the data, which also consumes some CPU time.
[Plot: A53 performance with 1 to 3 worker threads.]
Finally, the last plots show absolute and relative performance of modern multi-core
CPUs where 3 cores were utilized for computation:
[Plots: absolute and relative performance of modern multi-core CPUs using 3 computation threads.]
Note that using more cores at the minimal buffer size brings only a small
improvement. But when the buffer grows above a certain threshold, the
performance of the multicore run rapidly increases (this happens when the
buffers are big enough that all three threads can be given their own chunk
of the current buffer).
The next plot shows the same thing per CPU clock.
[Plot: multi-core performance per CPU clock.]
Notice that although the Phenom running on 3 cores was more powerful in
MS/s than a single Core i7 core, it would actually be slower if it ran at
the same clock speed. The IPC performance of the new CPUs has increased
dramatically: it now takes only about 10 CPU clocks to produce a single
sample of the result, not 100 as in the case of the AMD Athlon XP and its
competitors shown above.
DEPENDENCIES
Only the Linux operating system is currently supported. The library was
tested to work in Cygwin too, but only in its generic C/C++
implementation. To compile the efcl library one needs the following
software packages.
PACKAGE              VERSIONS KNOWN TO WORK (*)
gcc/g++              3.3.6   4.7.3   4.9.2   5.4.0
C library (getconf)  2.2.5   2.19    2.23
pthread library      0.9     2.19
FFTW library         3.3.3   3.3.4
nasm                 2.11.06 2.10.07 2.11.05
ld                   2.14    2.23.2  2.25
Extra programs needed for NEON_IMPLEMENTATION (on an ARM CPU):
as                   2.25
grep                 2.20
dash                 ? (does anyone know how to determine it?)
(*) There might be other versions that work too; the listed ones were
actually tested. Note that nasm and ld are also required at runtime if the
library has been configured to use the SIMD code. Likewise, grep, as and
dash are needed at runtime for ARM/NEON.
QUICK TEST
Run
make bench
to benchmark performance of your machine and to test that compilation works.
HOW TO COMPILE AND INSTALL
To get help on what the makefile provides, run
make
To install the library, just type
make install
and it will compile and install:
efcl and gosese binary to $DESTDIR/usr/local/bin
libefcl.a to $DESTDIR/usr/local/lib
efcl.h header file to $DESTDIR/usr/local/include
If you want a different configuration than the one discovered by the
SIMD_probe script, type:
make clean
SIMD=<implementation_name> make install
where <implementation_name> is one of SSEx86, SSE, AVX, AVX512, NEON,
NEON64, C1, C2, C3, A1, A2, A4. Note that the A1, A2 and A4
implementations compile only with an older gcc such as gcc-3.3.6, but this
is OK, as they were only prototypes of the SSEx86 implementation, which is
now faster than any of the Ax implementations.
PROBLEMS
On the latest Intel processors (Skylake, Kaby Lake), the TSC counters of
individual cores can somehow get out of sync. This causes the load
balancer to mispredict the optimal workload for those cores, which
effectively degrades the overall performance, as some cores have to wait
for the others to finish their jobs. Consequently, the suspend/resume
mechanism detects this and usually suspends all the threads but two. These
two still perform worse than they would on a system with synchronized TSC
counters.
The problem is already known to the Linux developers, as can be seen here:
http://lkml.org/lkml/2016/12/16/215
So there is some hope they will eventually fix it. Until then, users of
the most recent Intel CPUs will have to get by with the single-threaded
performance, which is unaffected by the bug.
FURTHER READING
See the pgmdoc.txt, usrdoc.txt and math.pdf files distributed with the
source code for more detailed information. The easiest way to understand
the internal workings of the code is to start with efcl0.cc, which is the
algorithm in its pure form, without the complications stemming from
multithreading and the use of assembly language.
ORIGIN
This code was created at the Faculty of Mathematics and Physics of Charles
University, in the Institute of Formal and Applied Linguistics. A student
of mine wanted to implement a channelizer in his bachelor thesis [1], as
described in the book [2]. He wanted to be faster than a similar
channelizer in GnuRadio, in which he succeeded, outperforming it by a
factor of 3. Nevertheless, he did not have time to try all my advice in
his implementation. To fulfil my tutor role I decided to implement it from
scratch to test whether my ideas were right, hoping that my student might
appreciate some of them at last.
The resulting program is, however, quite far from the neat style of
polished software engineers. It is more akin to the wild style of "real
programmers" [3], though it is not spaghetti code. It is nevertheless
quite complex, and the complexity is not hidden in
functions/objects/whatever --- partly because of performance and partly
because efcl can be configured in so many ways. The user can turn the
different means by which it achieves high speed on or off to study their
impact. The price for this was heavy use of conditional compilation, which
somewhat obfuscates the otherwise clean basic structure.
Anyway, when efcl is used as a library, all this mess is hidden inside, exposing
only the interface which is pretty straightforward.
AUTHOR
David Klusacek
HOW TO CITE
The permanent link to this work is http://hdl.handle.net/11234/1-2380.
REFERENCES
[1] Jan Hrach: Frequency Spectrum Monitoring System, Bachelor Thesis, MFF UK, 2016.
[2] Frederic J. Harris: Multirate Signal Processing for Communication Systems,
Prentice Hall PTR, Upper Saddle River, NJ, USA, 2004, ISBN 0131465112.
[3] Ed Nather: The Story of Mel, USENET, May 21, 1983.