
Detecting symptoms of diseases in poultry through audio signal processing

CHAPTER 1
INTRODUCTION
The integrated nature of the animal agriculture system in the United States has led to
significant improvement in production efficiencies. Diseases pose a significant threat to
these industries because of their potential to spread quickly among many animals. The
effect is usually a decrease in performance and sometimes the loss of the livestock. There
is significant concern from consumers and producers about the health and overall well-being of the animals. If diseases are discovered and treated earlier in their development,
the morbidity and mortality rates can often be significantly lowered. Despite the scale of
many of these industries, there is usually no automated way to continuously monitor the
livestock for the onset of disease. Instead, diseases are not discovered until recognized by
human workers. This sometimes occurs at the later stages of the disease as the managers
are usually not trained veterinarians. Thus, symptoms of diseases may not be noticed until
they have become severe or widespread. This delays the implementation of corrective or
preventative measures and can result in significant losses for producers.
Poultry farm workers are often taught to sit quietly and listen to the sounds a flock
makes in order to discover diseases or other problems [1]. The birds themselves may be
the best judges of their own welfare, and their vocalizations reflect their physical and
mental state. Thus, much can be learned about their health from the sounds they produce
[2]. Since human workers are able to discover diseases based on what they hear, there is
potential for automating that process through audio signal processing and machine
learning.
Rales are one of the sounds that workers specifically listen for as an indicator of
disease in chickens. They sound like a gurgling or rattling noise that occurs as the
chickens breathe, and are often relatively quiet. Rales are characteristic symptoms of
common respiratory diseases, especially infectious bronchitis. Using data recorded from
chickens sick with infectious bronchitis, we developed an algorithm to automatically
detect and label rales.


CHAPTER 2
LITERATURE SURVEY
Signal processing techniques have been applied to sounds produced by animals in a
variety of settings. Chedad et al. passed components of the power spectral density into
probabilistic neural networks to detect pig coughs [3]. Lee et al. used mel frequency
cepstral coefficients (MFCCs) and linear discriminant analysis to identify different kinds
of frog and cricket calls [4]. Moura et al. characterized piglet vocalizations under
stressful and non-stressful conditions [5]. Jahns used MFCCs and hidden Markov models
(HMMs) to translate cow calls into English meanings [6]. Brown and Smaragdis
compared HMM and Gaussian mixture model (GMM) performance for classifying killer whale vocalizations [7]. Various works have addressed the problem of automatically identifying bird species based on their calls and songs [8]-[10]. Otu-Nyarko used GMMs and HMMs to detect various types of stress in chickens [11].
Clemins et al. generalized MFCCs and perceptual linear prediction (PLP) features by creating species-specific frequency warpings and equal-loudness curves, showing modest improvements in speaker recognition tasks for elephants and birds [12]. Aide et al. set up
a network of remote monitoring stations in the wilderness and used HMMs to track the
vocal activity of the fauna over long periods of time [13].


CHAPTER 3
PROPOSED METHOD
Figure 3.1. Block diagram of the proposed method: sound from poultry → windowing → MFCC → clustering (k-means algorithm) → histogram → decision tree → output.


Using the data, the detection algorithm will be developed as follows (a MATLAB sketch of the full pipeline follows this list):
1) Calculate MFCCs (including the deltas and delta-deltas) using a 25 ms window width and a 10 ms shift, yielding 39-dimensional vectors for each time slice.
2) Cluster the vectors of MFCCs into 60 clusters using the k-means algorithm, yielding a single cluster index for each time slice.
3) Take histograms of the cluster indices over a 100 ms (or 8 sample) wide window with a 30 ms (or 3 sample) shift.
4) Train a decision tree using Weka's implementation of the C4.5 algorithm with a confidence factor of 0.002 and a minimum of 5 samples per leaf.
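The following is a minimal MATLAB sketch of these four steps, not the project's exact implementation. It assumes a recent release with the Audio Toolbox (mfcc) and the Statistics and Machine Learning Toolbox (kmeans, fitctree); since Weka's C4.5 is a Java implementation, fitctree's CART-style tree stands in for it here, and the file name flock.wav and the window labels y are hypothetical placeholders for the recorded data.

```matlab
% Sketch of steps 1-4 under the assumptions stated above.
[x, fs] = audioread('flock.wav');                % hypothetical file name
x = mean(x, 2);                                  % force mono
win = round(0.025 * fs);                         % 25 ms window
hop = round(0.010 * fs);                         % 10 ms shift

% 1) 13 MFCCs plus deltas and delta-deltas -> 39 dimensions per time slice
[c, d, dd] = mfcc(x, fs, 'Window', hamming(win, 'periodic'), ...
                  'OverlapLength', win - hop, 'LogEnergy', 'Ignore');
feat = [c d dd];

% 2) Vector-quantize each time slice into one of 60 clusters
K = 60;
[idx, C] = kmeans(feat, K, 'MaxIter', 500);      % C: centroids, kept for later use

% 3) Histogram the cluster indices over a sliding window
%    (8 slices wide, 3-slice shift, matching the counts given above)
wWin = 8; wHop = 3;
nWin = floor((numel(idx) - wWin) / wHop) + 1;
H = zeros(nWin, K);
for w = 1:nWin
    H(w, :) = histcounts(idx((w-1)*wHop + (1:wWin)), 0.5:1:K+0.5);
end

% 4) Train a decision tree on labeled windows
% y: one label per window, e.g. 'rale' or 'clear' (hypothetical ground truth)
mdl = fitctree(H, y, 'MinLeafSize', 5);
```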
3.1 METHODOLOGY
3.1.1 Windowing
Speech is a non-stationary signal whose properties change quite rapidly over time. This is perfectly natural, but it makes direct use of the DFT or autocorrelation over the whole signal impractical. For most phonemes, however, the properties of the speech remain roughly invariant over a short period of time (about 5-100 ms). Thus, for a short window of time, traditional signal processing methods can be applied relatively successfully.


Most speech processing is in fact done this way: by taking short (possibly overlapping) windows and processing them. A short window of signal like this is called a frame. From an implementation point of view, windowing corresponds to what is known in filter design as the window method: a long signal (for instance speech, or an ideal impulse response) is multiplied by a window function of finite length, giving a finite-length, usually weighted, version of the original signal. An illustration is shown in Figure 3.2, and a short framing sketch follows the figure.

Figure 3.2. The original signal (top) and its windowed version (bottom).
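For illustration, the framing described above can be sketched in MATLAB as follows; this assumes the Signal Processing Toolbox's buffer function, and the file name flock.wav is a hypothetical placeholder.

```matlab
% Frame a long signal into overlapping, Hamming-windowed frames.
[x, fs] = audioread('flock.wav');                % hypothetical file name
x = mean(x, 2);                                  % force mono
winLen = round(0.025 * fs);                      % 25 ms frame
hopLen = round(0.010 * fs);                      % 10 ms shift
frames = buffer(x, winLen, winLen - hopLen, 'nodelay');  % one frame per column
frames = frames .* hamming(winLen, 'periodic');  % multiply each frame by the window
```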

3.1.2 Mel Frequency Cepstral Coefficient (MFCC)


The first step in any automatic speech recognition system is to extract features, i.e., to identify the components of the audio signal that are good for identifying the linguistic content, and to discard everything else, which carries information such as background noise and emotion.


The main point to understand about speech is that the sounds generated by a human are filtered by the shape of the vocal tract, including the tongue, teeth, etc. This shape
determines what sound comes out. If we can determine the shape accurately, this should
give us an accurate representation of the phoneme being produced. The shape of the vocal
tract manifests itself in the envelope of the short time power spectrum, and the job of
MFCCs is to accurately represent this envelope.
Mel frequency cepstral coefficients (MFCCs) are a feature widely used in automatic speech and speaker recognition. They were introduced by Davis and Mermelstein in 1980 and have been state-of-the-art ever since. Prior to the introduction of MFCCs, linear prediction coefficients (LPCs) and linear prediction cepstral coefficients (LPCCs) were the main feature types for automatic speech recognition (ASR).
The implementation steps for MFCCs are as follows (a step-by-step sketch follows this list):
1) Frame the signal into short frames.
2) For each frame, calculate the periodogram estimate of the power spectrum.
3) Apply the mel filterbank to the power spectra and sum the energy in each filter.
4) Take the logarithm of all filterbank energies.
5) Take the DCT of the log filterbank energies.
6) Keep DCT coefficients 2-13 and discard the rest.
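The MATLAB sketch below walks through these steps for a single frame, continuing from the framing sketch above (it reuses frames and fs). It assumes the Signal Processing Toolbox for periodogram and dct; the filterbank construction follows the mel-scale formulas given later in this section, and the filter count of 26 is the conventional choice, not a value taken from this project.

```matlab
% Step-by-step MFCCs for one frame (illustrative, not a tuned implementation).
numFilters = 26; nfft = 512;
[P, f] = periodogram(frames(:, 1), [], nfft, fs);      % 2) power spectrum

% 3) Triangular mel filterbank: edges equally spaced on the mel scale
melEdges = linspace(0, 1125*log(1 + (fs/2)/700), numFilters + 2);
hzEdges  = 700 * (exp(melEdges/1125) - 1);
fb = zeros(numFilters, numel(f));
for m = 1:numFilters
    rise = (f - hzEdges(m))   / (hzEdges(m+1) - hzEdges(m));
    fall = (hzEdges(m+2) - f) / (hzEdges(m+2) - hzEdges(m+1));
    fb(m, :) = max(0, min(rise, fall)).';              % filter m's response
end
E = fb * P;                                            % energy in each filter

% 4)-6) log, DCT, keep coefficients 2-13 (12 of the 26)
cepstrum = dct(log(E));
mfccs = cepstrum(2:13);
```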

An audio signal is constantly changing, so to simplify things we assume that on short time scales the audio signal doesn't change much. (When we say it doesn't change, we mean statistically, i.e., it is statistically stationary; obviously the samples are constantly changing on even short time scales.) This is why we frame the signal into 20-40 ms frames. If the frame is much shorter, we don't have enough samples to get a reliable spectral estimate; if it is longer, the signal changes too much throughout the frame.

The next step is to calculate the power spectrum of each frame. This is motivated by
the human cochlea (an organ in the ear) which vibrates at different spots depending on
the frequency of the incoming sounds. Depending on the location in the cochlea that
vibrates (which wobbles small hairs), different nerves fire informing the brain that certain
frequencies are present. Our periodogram estimate performs a similar job for us,
identifying which frequencies are present in the frame.
The periodogram spectral estimate still contains a lot of information not required for
Automatic Speech Recognition (ASR). In particular, the cochlea cannot discern the
difference between two closely spaced frequencies. This effect becomes more
pronounced as the frequencies increase. For this reason we take clumps of periodogram
bins and sum them up to get an idea of how much energy exists in various frequency
regions. This is performed by our Mel filterbank: the first filter is very narrow and gives
an indication of how much energy exists near 0 Hertz. As the frequencies get higher our
filters get wider as we become less concerned about variations. We are only interested in
roughly how much energy occurs at each spot. The Mel scale tells us exactly how to
space our filterbanks and how wide to make them.
Once we have the filterbank energies, we take the logarithm of them. This is also
motivated by human hearing: we don't hear loudness on a linear scale. Generally, to double the perceived volume of a sound we need to put 8 times as much energy into it.
This means that large variations in energy may not sound all that different if the sound is
loud to begin with. This compression operation makes our features match more closely
what humans actually hear. Why the logarithm and not a cube root? The logarithm allows
us to use cepstral mean subtraction, which is a channel normalisation technique.
The final step is to compute the DCT of the log filterbank energies. There are 2 main
reasons this is performed. Because our filterbanks are all overlapping, the filterbank
energies are quite correlated with each other. The DCT decorrelates the energies which
means diagonal covariance matrices can be used to model the features in, e.g., an HMM classifier. But notice that only 12 of the 26 DCT coefficients are kept. This is because the
higher DCT coefficients represent fast changes in the filterbank energies and it turns out


that these fast changes actually degrade ASR performance, so we get a small
improvement by dropping them.
The Mel scale relates perceived frequency, or pitch, of a pure tone to its actual measured
frequency. Humans are much better at discerning small changes in pitch at low
frequencies than they are at high frequencies. Incorporating this scale makes our features
match more closely what humans hear.
The formula for converting from frequency to the mel scale is:

M(f) = 1125 ln(1 + f/700)

To go from mels back to frequency:

M^-1(m) = 700 (e^(m/1125) - 1)
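These two conversions are easy to sanity-check in MATLAB; the snippet below is illustrative only.

```matlab
hz2mel = @(f) 1125 * log(1 + f/700);        % frequency (Hz) -> mel
mel2hz = @(m) 700 * (exp(m/1125) - 1);      % mel -> frequency (Hz)
hz2mel(1000)                                % about 998 mel
mel2hz(hz2mel(1000))                        % round-trips to 1000 Hz
```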

3.1.3 Histogram
A histogram is a graphical representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable. To construct a histogram, the first step is to "bin" the range of values, that is, to divide the entire range of values into a series of intervals, and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) must be adjacent and are usually of equal size.
If the bins are of equal size, a rectangle is erected over each bin with height proportional to the frequency, the number of cases in that bin. In general, however, bins need not be of equal width; in that case, the erected rectangle has area proportional to the frequency of cases in the bin, and the vertical axis is not frequency but density: the number of cases per unit of the variable on the horizontal axis. A histogram may also be normalized to display relative frequencies. It then shows the proportion of cases that fall into each of several categories, with the sum of the heights equaling 1.


Histograms give a rough sense of the density of the underlying distribution of the data, and are often used for density estimation: estimating the probability density function of the underlying variable. The total area of a histogram used for probability density is always normalized to 1. If the lengths of the intervals on the x-axis are all 1, then a histogram is identical to a relative frequency plot.
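As a small illustration of the density normalization just described, the MATLAB snippet below builds a density-normalized histogram from randomly generated example data; the total area comes out to 1.

```matlab
data = randn(1000, 1);                                    % example data only
[counts, edges] = histcounts(data, 'Normalization', 'pdf');
totalArea = sum(counts .* diff(edges))                    % equals 1
```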
3.1.4 Decision tree
C4.5 is an algorithm used to generate a decision tree. The decision trees generated by
C4.5 can be used for classification, and for this reason, C4.5 is often referred to as a
statistical classifier.
C4.5 builds decision trees from training data using the concept of information entropy. The training data is a set of already classified samples. Each sample s_i consists of a p-dimensional vector (x_{1,i}, x_{2,i}, ..., x_{p,i}), where the x_j represent attribute values or features of the sample, as well as the class in which s_i falls.

At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. The splitting criterion is the normalized information gain (difference in entropy). The attribute with the highest normalized information gain is chosen to make the decision, and the algorithm then recurses on the partitioned sublists. A sketch of the gain computation follows.
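The MATLAB sketch below computes the (unnormalized) information gain of a candidate binary split; C4.5's gain ratio further divides this by the entropy of the split itself, i.e. infoGain(y, split) / entropyOf(split). This is an illustrative helper, not Weka's implementation.

```matlab
function ig = infoGain(y, split)
% Information gain of a binary split.
% y     : class labels for the samples at this node
% split : logical vector, true for samples sent to the left child
% Example: infoGain(y, X(:, j) < t) for a candidate threshold t on feature j
ig = entropyOf(y) - mean(split)  * entropyOf(y(split)) ...
                  - mean(~split) * entropyOf(y(~split));
end

function h = entropyOf(y)
p = histcounts(categorical(y)) / numel(y);   % class proportions
p = p(p > 0);                                % drop empty classes
h = -sum(p .* log2(p));
end
```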
This algorithm has a few base cases:
1) All the samples in the list belong to the same class. When this happens, C4.5 simply creates a leaf node for the decision tree saying to choose that class.
2) None of the features provides any information gain. In this case, C4.5 creates a decision node higher up the tree using the expected value of the class.
3) An instance of a previously unseen class is encountered. Again, C4.5 creates a decision node higher up the tree using the expected value.


CHAPTER 4
IMPLEMENTATION
Using the recorded audio data, we calculate MFCCs (including the deltas and delta-deltas) using a 25 ms window width and a 10 ms shift, yielding 39-dimensional vectors for each time slice. We then cluster the vectors of MFCCs into 60 clusters using the k-means algorithm, yielding a single cluster index for each time slice. Finally, histograms of the cluster indices are taken over a sliding window, and we train a decision tree to examine each histogram and determine whether the distribution of sounds matches that which would be expected during a rale.
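A sketch of applying the trained classifier to a new recording is shown below. Here extractFeatures is a hypothetical wrapper around steps 1-3 of Chapter 3 (MFCCs, nearest-centroid cluster assignment, sliding histograms), mdl and C are the tree and k-means centroids from the pipeline sketch there, and the file name is a placeholder.

```matlab
% Label the windows of a new recording (illustrative sketch).
[xNew, fs] = audioread('new_flock.wav');       % hypothetical file name
HNew = extractFeatures(xNew, fs, C);           % hypothetical wrapper: MFCCs ->
                                               % nearest centroid in C -> histograms
labels = predict(mdl, HNew);                   % one label per histogram window
raleFraction = mean(strcmp(labels, 'rale'))    % overall rale activity level
```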

Figure 4.1. Block diagram of the implementation: voice → MFCC → clustering → vector → output.

4.1 Hardware and Software Requirements
The entire project will be implemented using MATLAB.
The hardware requirements are:
System: 2.66 GHz Core 2 Quad CPU
Hard disk: 80 GB
RAM: 512 MB
The software requirements are:
Operating system: Windows XP Professional
Coding language: MATLAB


CHAPTER 5
APPLICATION
Although this work focuses specifically on rale detection, this type of system could be
applied to various other monitoring tasks as well, given appropriate training data.
Automatic detection of coughs and sneezes would be useful in detecting the onset of laryngotracheitis. The frequency of chirps may be a useful signal for non-respiratory diseases and for heat stress, as the birds will often become quiet and conserve their energy when they do not feel well. There may also be potential to detect changes in the background noise indicative of environmental problems, such as fans going bad or feeders getting jammed.


CHAPTER 6
CONCLUSION
We presented an audio signal processing algorithm that detects rales, the gurgling noises that are a distinct symptom of common respiratory diseases in poultry. Although the algorithm produces some misclassifications, it successfully detects enough rales to easily distinguish between times when the chickens are sick and when they are healthy. Algorithms such as this could be used to continuously monitor chickens in commercial poultry farms, providing an early warning system that could significantly reduce the costs incurred from disease.


REFERENCES
[1] University of Kentucky, "Poultry production manual," chapter 16, http://www2.ca.uky.edu/poultryprofitability/production_manual.html, accessed 2014-06-12.
[2] G. Manteuffel, B. Puppe, and P. C. Schon, "Vocalization of farm animals as a measure of welfare," Applied Animal Behaviour Science, vol. 88, no. 1, pp. 163-182, Sep. 2004.
[3] A. Chedad, D. Moshou, J. M. Aerts, A. V. Hirtum, H. Ramon, and D. Berckmans, "Recognition system for pig cough based on probabilistic neural networks," Journal of Agricultural Engineering Research, vol. 79, no. 4, pp. 449-457, 2001.
[4] C.-H. Lee, C.-H. Chou, C.-C. Han, and R.-Z. Huang, "Automatic recognition of animal vocalizations using averaged MFCC and linear discriminant analysis," Pattern Recognition Letters, vol. 27, no. 2, pp. 93-101, Jan. 2006.
[5] D. Moura, W. Silva, I. Naas, Y. Tolon, K. Lima, and M. Vale, "Real time computer stress monitoring of piglets using vocalization analysis," Computers and Electronics in Agriculture, vol. 64, no. 1, pp. 11-18, Nov. 2008.
[6] G. Jahns, "Call recognition to identify cow conditions: a call-recogniser translating calls to text," Computers and Electronics in Agriculture, vol. 62, no. 1, pp. 54-58, Jun. 2008.
[7] J. C. Brown and P. Smaragdis, "Hidden Markov and Gaussian mixture models for automatic call classification," The Journal of the Acoustical Society of America, vol. 125, no. 6, pp. EL221-EL224, 2009.
[8] Z. Chen and R. C. Maher, "Semi-automatic classification of bird vocalizations using spectral peak tracks," The Journal of the Acoustical Society of America, vol. 120, no. 5, pt. 1, pp. 2974-2984, Nov. 2006.
[9] S. Fagerlund, "Bird species recognition using support vector machines," EURASIP Journal on Advances in Signal Processing, vol. 2007, pp. 1-8, 2007.


[10] J. Cheng, Y. Sun, and L. Ji, "A call-independent and automatic acoustic system for the individual recognition of animals: A novel model using four passerines," Pattern Recognition, vol. 43, no. 11, pp. 3846-3852, Nov. 2010.
[11] E. Otu-Nyarko, "The effect of stress on the vocalizations of captive poultry populations," Ph.D. dissertation, University of Connecticut, 2010.
[12] P. J. Clemins, M. B. Trawicki, K. Adi, J. Tao, and M. T. Johnson, "Generalized perceptual features for vocalization analysis across multiple species," in Proceedings of the 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), vol. 1, May 2006.
[13] T. M. Aide, C. Corrada-Bravo, M. Campos-Cerqueira, C. Milan, G. Vega, and R. Alvarez, "Real-time bioacoustics monitoring and automated species identification," PeerJ, vol. 1, p. e103, Jul. 2013.
[14] L. C. W. Pols, "Spectral analysis and identification of Dutch vowels in monosyllabic words," Ph.D. dissertation, University of Amsterdam, 1977. [Online]. Available: http://dare.uva.nl/en/record/137029
[15] J. MacQueen, "Some methods for classification and analysis of multivariate observations," in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281-297, 1967.
