Talkative Tom
Background:
Talkative Tom is an embedded systems project developed by me and my partner Anmol under Professor D.V. Gadre at CEDT. The idea for Talkative Tom first came from a Hamleys toy. An Indian, open-source version of such a toy, built for teaching and usable as pedagogy material, had not been done before.
Block Diagram:
Figure 1: Block Diagram of Talkative Tom
Introduction:
Talkative Tom brings together a variety of electronics and software concepts that are explained in detail later in this read. This is both a blog post and a comprehensive guide to building this or a similar project end to end. The aim is not limited to this particular project but to teach concepts that can be applied in almost any project. The read covers the hardware design, then the detailed software design, which is the heart and soul of the hardware, and the theoretical aspects critical to this project, along with the technical parameters that were carefully set to obtain the desired output.
Overview:
Talkative Tom employs an Atmel ATmega328P microcontroller for all the audio and digital processing jobs; outside the processing block are the carefully designed, application-specific analog circuits which amplify the audio signal to the desired level with minimal noise. The functioning of this project can basically be divided into 3 cycles:
- Record
- Store
- Replay
The first step is to record audio, which involves the microphone, pre-amplifier and passive low pass filter on the hardware side, and, on the software side, ADC interfacing using interrupts and the concept of dual buffering to collect audio data. The next step is to store, which involves the serial flash on the hardware side and, on the software side, the Serial Peripheral Interface (SPI) communication protocol to write data to the storage chip. The final step is the replay cycle, which reads data back from the serial flash over SPI and uses Pulse Width Modulation (PWM) to recreate the audio after passing it through a filter and amplifier.
Concepts:
Interrupts:
Due to the basic architecture of a single-core CPU, it can perform only one processing job at a time, which is part of a larger program sequence in memory. To perform an operation outside that series, the CPU must complete the instruction at hand and jump to another sequence at a different memory location; this branch in the CPU's execution path is known as a subroutine. For I/O processing, the CPU would otherwise have to wait for an input or output event before responding to the I/O device. For example, with the CPU running at 8 MHz, waiting 60 µs for an I/O device wastes about 480 clock cycles, which could have been used for processing other parts of the program of equal if not higher priority than the I/O device. To solve this problem, I/O devices interact through an I/O interface, which does the job of waiting for the device to react and then notifies the CPU about the data in the form of an interrupt signal.
This interrupt is serviced by the CPU after jumping to a memory location unique to the I/O device, called the interrupt vector. The interrupt service code is stored at this location, and after it completes, the CPU returns to the process it was executing at the time. The main intuition behind the use of interrupts is that the CPU is used in such a way that it gives a feeling of multitasking.
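As a concrete illustration, here is a minimal AVR-GCC sketch (the pin and interrupt choice are assumptions for illustration, not part of this project's firmware) in which the main loop keeps running while an external interrupt is serviced:

```c
#include <avr/io.h>
#include <avr/interrupt.h>

volatile uint8_t button_events = 0;   /* shared with the ISR, hence volatile */

/* Interrupt vector for INT0: the CPU jumps here when the interrupt fires,
 * runs this code, then returns to wherever the main loop was interrupted. */
ISR(INT0_vect)
{
    button_events++;
}

int main(void)
{
    EICRA = (1 << ISC01);   /* trigger INT0 on a falling edge */
    EIMSK = (1 << INT0);    /* unmask the INT0 interrupt      */
    sei();                  /* global interrupt enable        */

    while (1) {
        /* The main program keeps executing here; it is paused only briefly
         * whenever the ISR above runs, giving the feel of multitasking. */
    }
}
```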
Analog Design:
This section will discuss the various analog aspects and related concepts of the project.
Audio Amplifier
The analog output is generated by PWM, and the output of the PWM channel is of low amplitude; if fed directly to the speaker, the output voice would not be audible. Hence an audio amplifier is required for a loud output. The output of the PWM channel is first fed to a second-order low pass filter and then to the audio amplifier.
The audio is output from the microcontroller as a relatively high-frequency pulse-width modulated (PWM) rectangular wave which, when passed through a low pass filter (LPF), regenerates the audio signal. This audio signal is of reasonable voltage but of insufficient power to generate any significant vibrations in the speaker coil; for that, an audio amplifier, sometimes called a power amplifier, is used.
Figure 2: Schematic diagram of the audio amplifier circuit
The audio amplifier used is the IC PAM8403, a 3 W class-D audio amplifier, as shown in Figure 2. With the same number of external components, the efficiency of the PAM8403 is much better than that of its class-AB cousins. This IC outputs signals using the same PWM concept, with a 230 kHz carrier frequency and the message carried on the duty cycle, which makes it more power-efficient and can extend battery life significantly, so it is well-suited for portable applications. It offers low THD+N (Total Harmonic Distortion plus Noise), which benefits the dynamic range and allows it to achieve high-quality sound reproduction. The PAM8403's gain, frequency response, and other characteristics with 8-ohm speakers are shown in Figure 3.
Figure 3: (a) Efficiency of this amplifier for 4-ohm and 8-ohm impedance speakers; (b) characteristics when driving 4-ohm vs. 8-ohm impedance speakers. Source: PAM8403 Datasheet
Pre-Amplifier
These circuits are termed pre-amplifiers because of the purpose they serve: preparing the weak output of a transducer for further processing.
An electret microphone is a transducer built around a Junction Field Effect Transistor (JFET); its diaphragm responds to ambient vibrations, of which the most significant are sound vibrations, and the JFET in the capsule amplifies the raw signal.
The electrical signal produced by most transducers is feeble. To process this signal, it must be amplified and cleaned. This preparation job gives the name pre-amplifier to the circuit shown in Figure 4.
Looking at the datasheet of the electret microphone used (HM4522P-423-G), the operating voltage ranges from 1-10 V, so the microphone is biased from the 5 V supply, since the op-amp for the preamplifier is also powered at 5 V. Next, a coupling capacitor blocks DC and lets the alternating signal pass. Then comes a "point bias arrangement," which biases the input of the op-amp to a DC voltage of 1.7 V (ATmega Vcc/2 ≈ 1.7 V). This arrangement clamps the negative swing of the microphone signal above the ground reference, so the signal swings around the point bias voltage. The op-amp used for the preamplifier is the LM358N, which can operate at 3.3 V, but to get a wider voltage swing range it is operated at 5 V.
The low-voltage signal from the microphone is coupled to the input of the op-amp with a DC bias at the input, so the AC variations sit around the bias point. The signal is then amplified, where capacitor C20 blocks gain for the DC component and provides gain only to the AC component; the amplified output is available at AMPOUT.
Figure 4: Schematic diagram of microphone preamplifier circuit
Low Pass Filter (LPF)
After amplification, the signal must be cleaned. Cleaning is done by a passive low pass filter (Figure 5) with a cutoff of approximately 5 kHz (about 4.8 kHz) for the first stage, which can be seen as the frequency around the -3 dB mark in Figures 6 (simulated results) and 7 (experimental results), panel (a). Adding the second stage gives a second-order low pass filter that attenuates high-frequency noise with a reasonable quality factor of 0.3; plots similar to the first stage can be seen in panel (b) of the same figures. The filtered output is passed through a diode network that prevents overvoltage on the microcontroller's input pin: the diodes conduct when the voltage goes out of bounds, i.e. below 0 V or above 3.3 V.
Note: Figures 6,7 show the frequency response of the passive LPF to a sine wave of increasing frequency, and hence we can see the attenuation when the frequency reaches around 5kHz, making the frequencies containing the voice signal more prominent.
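For reference, the cutoff of a single passive RC stage is f_c = 1/(2πRC). With example values of R = 10 kΩ and C = 3.3 nF (assumed here purely for illustration; the actual values are on the schematic in Figure 5), f_c = 1/(2π × 10×10³ × 3.3×10⁻⁹) ≈ 4.8 kHz, which matches the -3 dB point seen in the plots.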
Figure 5: Schematic diagram of microphone LPF circuit
Figure 6: Bode plot of the 2nd order RC LPF: (a) stage 1 only, (b) along with stage 2. Source for the graphs: sim.okawa-denshi.jp/en/CRCRtool.php
Figure 7: Experimental results of the 2nd order RC LPF: (a) stage 1 only, (b) results with stage 2 added.
Power Supply
The power supply is one of the most essential components of any embedded system. Without an efficient power supply, a system cannot function correctly, and problems in this part can result in catastrophic failure. A power supply consists mainly of 3 elements:
- Source: This is the input of the power supply. In this case, it is either a mini-USB or lithium-polymer(LiPo) battery of 9000 mAh.
- LiPo Charger: The LiPo battery also needs to be charged, and hence the IC TP4056 is used.
- Conversion: The input source may not meet the requirements of the project, and therefore it must be converted or modified to make it compatible. In this case, when the LiPo battery is selected as the source it needs to be boosted, since the LiPo can provide at most 4.2 V, but the project needs two voltages, 5 V and 3.3 V; this is where conversion comes into the picture. The process of obtaining 5 V from the LiPo is called boosting. If USB is selected as the input source, no conversion is needed, as the voltage is already 5 V and is fed directly to the regulation stage.
- Voltage Boosting Process:
- The voltage to be boosted is fed to the boost input. Initially, the boost converter (BL8530) acts as a closed switch. Hence, current flows through the inductor in a clockwise direction, and the inductor stores some energy by building a magnetic field.
- When the switch is opened, the current is reduced as the impedance is higher. The magnetic field previously created collapses to maintain the current towards the load, so the polarity of the inductor voltage reverses.
- As a result, the two sources end up in series, so their voltages add up. The capacitor is charged through the diode D up to this boosted voltage. If the switch is cycled fast enough, the inductor does not discharge fully between charging stages, and the load always sees a voltage greater than that of the input source alone while the switch is open.
- Also, while the switch is open, the capacitor in parallel with the load is charged to this combined voltage. When the switch is then closed and the right-hand side is shorted out from the left-hand side, the capacitor is able to provide the voltage and energy to the load; during this time, the blocking diode prevents the capacitor from discharging through the switch. The switch must, of course, be opened again fast enough to prevent the capacitor from discharging too much. This on-off switching cycle is performed by the IC BL8530, shown in Figure 9, a boost circuit designed to obtain 5 V from the battery voltage; the BL8530 is a voltage-mode pulse-frequency modulation (PFM) step-up DC-DC converter.
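For intuition, an ideal (lossless) boost converter switching with duty cycle D (the fraction of each cycle the switch is closed) settles at V_out = V_in / (1 - D); stepping a nominal 3.7 V LiPo up to 5 V therefore needs D ≈ 1 - 3.7/5 ≈ 0.26. The BL8530's PFM control adjusts this switching on the fly to hold the output at 5 V.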
Figure 9: Boost circuit
- Regulation: This is the most important and engrossing part of designing any power supply, the regulatory circuit. This project requires a regulated supply of 3.3 V, which is achieved with an LDO (Low Drop Out) voltage regulator.
Figure 8: LiPo charger circuit
Micro-Controller & Code:
The microcontroller used in this project is the Atmel ATmega328, an 8-bit AVR RISC-based microcontroller with 32 kB of ISP flash memory. The choice of this microcontroller is justified as follows:
- The most used microcontroller in the Arduino development platform.
- 8-bit architecture and hence lower complexity than others.
- Robust in terms of operating voltage flexibility.
ADC:
ADC is the acronym for analog-to-digital converter. This module takes the analog input from the microphone circuit for further processing. The ATmega328 has a 10-bit Successive Approximation Register (SAR) ADC, but for our purposes we use only the upper 8 bits.
ADC Input
Anything analog, as we observe the world around us, has infinite variations at any given time; to record every single variation we would, theoretically, need an unlimited amount of processing and storage. We cannot record an analog signal perfectly, but we can come very close to the original. In practice, analog signals are recorded by sampling and storing those finite-resolution samples for later processing. The signal is recreated by aggregating the recorded samples with a software or hardware integrator so that the output closely resembles the original or desired analog signal. The audio data is input via an ADC channel whose 8 most significant bits are taken for further processing. This is done for two reasons: first, the flash storage used has a word length of 8 bits (one memory location is 8 bits wide); second, the ATmega328 has an 8-bit architecture and processes 8 bits in a single cycle, while anything more takes twice as many instructions or more. Since the voice data remains at a desirable quality even at 8 bits, using 8 bits instead of all 10 increases the speed by about 4x (2x from the recording phase and 2x from the storage phase) and also allows a higher bit rate for the audio, which results in smoother playback.
ADC Sampling
Audio can be recorded in many ways. A seemingly straightforward yet unwise approach would be to take an audio sample whenever we get a chance, which would give the highest possible number of samples the processor can produce in a given time frame. This approach only makes sense if we also record the time between samples, which brings several complications: irregular samples, increased storage use, and inaccurate regeneration due to timekeeping errors. The other approach is to use a fixed sample rate so that we get consistent results. An ADC interrupt is used to obtain periodically generated samples. This practice is visualized in the "Profiling" section.
SPI Communication:
SPI is short for Serial Peripheral Interface. It is one of the many communication protocols used in the electronics industry to establish communication between two devices; for example, SPI is used to interface the SD cards found in phones, cameras, etc. SPI in its basic form is a 4-wire protocol, having SCK (Serial Clock), MISO (Master In, Slave Out), MOSI (Master Out, Slave In), and CS/SS (Chip/Slave Select). Being a synchronous serial protocol, SPI is fast and cheap to implement, and it can handle multiple slaves on a single bus, each activated with its own slave select.
The following steps are taken to transmit and receive a data word:-
- The data is placed from the data register to the shift register.
- Next, the CS (active low) pin of the required slave is turned LOW for selecting that particular slave.
- As soon as the CS is turned LOW, the data is shifted out along the falling clock pulses
- When the data has been transferred fully, the CS pin is kept low and the slave sends its response data back; it could be the same data for an error check, or a return value to a command issued to the slave, e.g. the contents of its status register.
The SPI communication protocol can be implemented in two ways, hardware or software. Hardware SPI is implemented using dedicated hardware such as shift registers, a hardware clock, buffer registers, etc. Hardware SPI is the fastest, but usually only one or two such peripherals are provided on a microcontroller; a software, or bit-banged, SPI can be used to get around this limitation.
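To make the bit-banged option concrete, a minimal Mode-0 software SPI byte transfer could look like the sketch below; the pin assignments are assumptions made for this sketch, not the project's actual wiring, and the pins are assumed to be configured as outputs/input beforehand.

```c
#include <avr/io.h>

/* Assumed pin mapping on PORTB, for this sketch only */
#define SOFT_SCK   PB5
#define SOFT_MISO  PB4
#define SOFT_MOSI  PB3

/* Shift one byte out on MOSI and clock one byte in from MISO, MSB first,
 * SPI Mode 0 (clock idles low, data is sampled on the rising edge). */
uint8_t soft_spi_transfer(uint8_t out)
{
    uint8_t in = 0;

    for (uint8_t bit = 0; bit < 8; bit++) {
        /* put the next data bit on MOSI while the clock is still low */
        if (out & 0x80)
            PORTB |= (1 << SOFT_MOSI);
        else
            PORTB &= ~(1 << SOFT_MOSI);
        out <<= 1;

        PORTB |= (1 << SOFT_SCK);       /* rising edge: slave samples MOSI */
        in <<= 1;
        if (PINB & (1 << SOFT_MISO))    /* sample MISO on the same edge    */
            in |= 1;
        PORTB &= ~(1 << SOFT_SCK);      /* falling edge: data may change   */
    }
    return in;
}
```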
Figure 11 shows the timing diagram of a routine SPI data transfer, which gives more clarity on how the data is actually transferred over SPI. As discussed, a clock synchronizes the process, but the clock itself can be of 4 different types based on its 2 properties, namely phase and polarity. In a Motorola-based SPI system these are defined as follows:
- CPOL, Clock Polarity: determines the idle state of the clock and therefore whether a transfer begins with a rising or a falling edge. It is 0 when the clock idles LOW and 1 when it idles HIGH.
- CPHA, Clock Phase: gives the point at which the data is sampled and shifted out. CPHA 0 means data is sampled on the first (leading) edge of every cycle, and 1 means data is sampled on the following (trailing) edge.
In this project, SPI Mode 0 is used, which means CPOL=0, CPHA=0. SS* is pulled low, data is placed on MOSI, and on the clock's rising edge data is transmitted and received at the same time; the data on MOSI is then changed on the next falling edge and sampled again on the following rising edge.
Figure 11: SPI communication diagram, sourced from the ATmega328P datasheet
ADC Implementation
The ADC peripheral can be used in two ways, single conversion or auto-triggered continuous conversions, both with interrupt capability. Setting it up requires configuring the clock to the peripheral, the mode of conversion, and the input channel. The ADC settings are listed below, followed by a short initialization sketch:
- ADC Prescaler divides the system clock to derive the clock for the peripheral. This controls the sample rate, as the number of clock cycles required for a single conversion is constant. We want to regulate the ADC sample rate to leave time for processing other things.
- Input Channel selection is important as this setting specifies the pin receiving the analog input.
- Result Left Adjust: We only need the most significant 8 bits of the ADC result for capturing the audio, so this feature is used to collect those most significant bits in ADCH.
- Interrupt Enable: To enable interrupts and ADC operation, the ADIE (interrupt enable) and ADEN (enable) bits must be set in the ADCSRA register.
- Global Interrupt Enable/Disable and Start Conversion
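Putting these settings together, an initialization routine could look roughly like the following; the register bits are the standard ATmega328P ones, but the chosen prescaler and input channel are assumptions for the sketch.

```c
#include <avr/io.h>
#include <avr/interrupt.h>

static void adc_init(void)
{
    /* ADC0 as input channel, AVcc reference, left-adjusted result so the
     * top 8 bits of every conversion land in ADCH */
    ADMUX = (1 << REFS0) | (1 << ADLAR);

    /* Enable the ADC and its interrupt, prescaler = 64
     * (8 MHz / 64 = 125 kHz ADC clock; at 13 ADC clocks per conversion this
     *  gives roughly 9.6 k samples/s when each conversion is retriggered
     *  as soon as the previous one finishes) */
    ADCSRA = (1 << ADEN) | (1 << ADIE) | (1 << ADPS2) | (1 << ADPS1);

    sei();                     /* global interrupt enable       */
    ADCSRA |= (1 << ADSC);     /* kick off the first conversion */
}
```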
The jobs performed in the corresponding service routine involve the use of double buffers (explained ahead). The catch in this implementation is that both the recording and the replaying processes are done inside the ADC interrupt service routine. In the recording part, the ADC conversion result is stored. In the replay part, the conversion result is discarded; the ISR is used only to maintain a fixed output rate.
The SPI implementation in our project involves the following settings, which put the communication interface into master mode with the required pins as outputs set to a voltage corresponding to logic LOW. The mode used is the default and corresponds to the Motorola-defined SPI Mode 0 (CPHA=0, CPOL=0), which means the clock and data sampling begin with a rising edge and the data is changed on the next falling edge. CPHA and CPOL are clock phase and clock polarity respectively.
To transmit data via SPI, a shift register is loaded with the data, and the data is transmitted automatically as per the SPI mode set by the internal registers.
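A hardware-SPI master setup along these lines is sketched below (Mode 0 on the standard ATmega328P SPI pins; the clock divider of F_CPU/4 is an assumption):

```c
#include <avr/io.h>

void spi_master_init(void)
{
    /* SCK (PB5), MOSI (PB3) and SS (PB2) as outputs; SS must be driven
     * (or held high) for the AVR to remain in master mode */
    DDRB  |= (1 << DDB5) | (1 << DDB3) | (1 << DDB2);
    PORTB |= (1 << PORTB2);              /* deselect the flash (CS* high) */

    /* Enable SPI, master mode, Mode 0 (CPOL=0, CPHA=0), clock = F_CPU/4 */
    SPCR = (1 << SPE) | (1 << MSTR);
}

uint8_t spi_transfer(uint8_t data)
{
    SPDR = data;                         /* load the shift register       */
    while (!(SPSR & (1 << SPIF)))        /* wait for the transfer to end  */
        ;
    return SPDR;                         /* byte clocked in from MISO     */
}
```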
Figure 12: Routine for obtaining the JEDEC ID. Source: Winbond W25QXX SPI Flash Datasheet
Using this and the method of obtaining the JEDEC ID shown in Figure 12, a function can be written to read the JEDEC ID from the serial flash chip.
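The routine boils down to pulling CS* low, sending the JEDEC ID opcode (0x9F) and clocking in three bytes. A sketch of such a function, reusing the spi_transfer() helper above and assuming CS* is wired to PB2, is given below; Winbond parts return 0xEF as the manufacturer byte.

```c
#include <avr/io.h>

#define FLASH_CS  PORTB2    /* assumed chip-select pin for this sketch */

uint8_t spi_transfer(uint8_t data);   /* from the SPI sketch above */

/* Read the 3-byte JEDEC ID: manufacturer, memory type, capacity */
void flash_read_jedec_id(uint8_t *mfr, uint8_t *type, uint8_t *cap)
{
    PORTB &= ~(1 << FLASH_CS);        /* select the flash (CS* low)   */
    spi_transfer(0x9F);               /* JEDEC ID instruction         */
    *mfr  = spi_transfer(0x00);       /* dummy bytes clock data out   */
    *type = spi_transfer(0x00);
    *cap  = spi_transfer(0x00);
    PORTB |= (1 << FLASH_CS);         /* deselect (CS* high)          */
}
```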
Figure 13: Data from the logic analyzer during a JEDEC ID read
Dual buffer implementation refers to the usage of two data buffers back to back for recording real-time data in a resource-efficient way.
In the process shown in Figure 14, step 1 has both buffers idle. In step 2, buffer 'A' starts filling up with audio data, and as soon as it is full, buffer 'B' takes its place (step 4); meanwhile, buffer 'A' dumps its data into the storage memory. When 'A' is done storing, it is ready for buffering again, so it replaces 'B' for filling up, and now it is 'B's turn to dump its data into storage. This way the two buffers never keep the audio data waiting, giving a real-time response to each audio sample, because at any given time there is at least one buffer available to take in the ADC value.
Using the above-defined variables, the following code implements the dual buffering technique. The enable flag for one buffer is True and for the other is False, meaning that initially only one buffer is working; this ensures that while the first buffer collects data, no garbage values are written into the flash storage. Why this could have been a potential problem, we'll see further in this section.
Buffer 0 is enabled and takes in the audio data as shown in Figure 14 step 2, while the other buffer just sits idle. When "bufByteCount" reaches 256, 256 bytes have been filled and the 257th interrupt is being called, and now the real action happens. Both buffers are enabled, and "work_pointer" is inverted, making them switch jobs. "work_pointer" being True is taken as buffer 0 filling data and buffer 1 dumping, so if buffer 1 had been enabled initially it would have dumped 256 bytes of worthless data into the memory. When the jobs switch, buffer 0 goes to job 2 and buffer 1 comes in for a fill. Looking at job 2, "work_pointer" being False means that buffer 0 dumps its data and buffer 1 fills up, so buffer 0 is dumped into the memory; this happens very quickly, and if nothing were done after the data dump, buffer 0 would be dumped again. To prevent that, buffer 0 is disabled until the next job switch happens, as we see buffer 'A' waiting for buffer 'B' in Figure 14 step 6.
This process continues until a set number of pages are written in the memory.
JOB 1: Sample the data or output it to the speaker (done in the ISR).
JOB 2: Read or dump the audio data in the flash memory; perform buffer switch and management operations.
Figure 14: Dual buffers A and B shown working in quick succession.
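Since the original listing is referenced above, here is a rough, hedged sketch of the switching logic just described; the variable names follow the text, and flash_write_page() is an assumed helper, not the project's real function.

```c
#include <avr/io.h>
#include <avr/interrupt.h>
#include <stdbool.h>

#define BUF_SIZE 256

volatile uint8_t  buffer[2][BUF_SIZE];
volatile uint16_t bufByteCount   = 0;
volatile bool     work_pointer   = true;             /* true: buffer 0 fills, buffer 1 dumps */
volatile bool     buf_enabled[2] = { true, false };  /* buffer 1 idle at start: no garbage dump */

void flash_write_page(const uint8_t *page);          /* assumed helper: writes 256 bytes */

/* JOB 1 (in the ISR): store the latest sample, swap buffer jobs when one is full */
ISR(ADC_vect)
{
    uint8_t fill = work_pointer ? 0 : 1;

    if (bufByteCount < BUF_SIZE) {
        buffer[fill][bufByteCount++] = ADCH;         /* top 8 bits of the result */
    } else {                                         /* 257th interrupt: switch jobs */
        work_pointer   = !work_pointer;
        buf_enabled[0] = buf_enabled[1] = true;
        bufByteCount   = 0;
        buffer[work_pointer ? 0 : 1][bufByteCount++] = ADCH;
    }
    ADCSRA |= (1 << ADSC);                           /* start the next conversion */
}

/* JOB 2 (in the main loop): dump the full buffer into the flash exactly once */
void buffer_service(void)
{
    uint8_t dump = work_pointer ? 1 : 0;

    if (buf_enabled[dump]) {
        flash_write_page((const uint8_t *)buffer[dump]);
        buf_enabled[dump] = false;                   /* wait for the next job switch */
    }
}
```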
To tackle the pitch change problem we need to understand what pitch actually is. Pitch is a term used to describe a voice; it is an indicator of the range of frequencies of the sound a person produces, with a loosely defined reference (usually a typical male voice). Higher pitch means higher frequencies. The problem is that we need to increase the pitch of our recorded sound and replay it. To do that, we first look at how computer software changes pitch; Audacity, for example, is software that can be used for manipulating sounds. In Figure 15, the normal waveform was recorded from a laptop microphone in single-channel mode.
When the built-in pitch effect is applied to increase pitch, we see that at a basic level the waveform is made shorter, which means it is played faster, effectively decreasing the time between samples and increasing the pitch.
Figure 15: Pitch increase effect applied to a simple audio waveform
After seeing this, we can safely say that if we increase the replay rate of the sound, the effective pitch will increase; hence we do exactly that. In Figure 16 we can see that the audio is sampled at a fixed rate and fed into a buffer for further processing, and the time between samples is used to store the other buffer into the SPI flash, as described in the previous subsection 'Dual Buffer Implementation'.
Figure 16: Sample points represented by vertical lines on the time axis
Figure 17 shows the replay rate. As we saw earlier, the replay is done inside the ADC ISR, and since the sample rate can be controlled, so can the replay rate: reducing the prescaler/divider of the ADC clock by a factor of 2 effectively doubles the rate and hence changes the pitch. By adding a custom delay after the retrieved sample is put on the output of a high-frequency PWM (see the PWM subsection), the amount of pitch change can be controlled. In this process too, the time between two output updates is used to retrieve data from storage into the other buffer, as described in the previous subsection 'Dual Buffer Implementation'.
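As an illustration of the rate change, halving the ADC prescaler doubles the interrupt rate and therefore the replay rate. The bit values below are the standard ATmega328P ones, but the specific divisors (64 and 32) are assumptions for this sketch, not necessarily the project's settings.

```c
#include <avr/io.h>

#define ADC_PRESCALER_MASK ((1 << ADPS2) | (1 << ADPS1) | (1 << ADPS0))

/* Normal replay: ADC clock = F_CPU / 64 -> base interrupt/replay rate */
void replay_rate_normal(void)
{
    ADCSRA = (ADCSRA & ~ADC_PRESCALER_MASK) | (1 << ADPS2) | (1 << ADPS1);
}

/* Pitched-up replay: ADC clock = F_CPU / 32 -> the ISR fires twice as often,
 * samples reach the PWM output twice as fast, and the pitch goes up */
void replay_rate_fast(void)
{
    ADCSRA = (ADCSRA & ~ADC_PRESCALER_MASK) | (1 << ADPS2) | (1 << ADPS0);
}
```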
Figure 17: Replay routine of the process
When tackling the alien voice problem, the motive was to give the voice output the same pitch but a broken effect with a twist. There could have been two ways to achieve this. The first, intuitive way would have been to mix a square wave into the output of our system, which would effectively break the output into pieces with gaps in between.
The other way, which is actually used and which I stumbled upon during testing, is to introduce an interruption into the process of putting out the output. This was done by triggering a new conversion before the job inside the current ISR is performed, so every so often a new interrupt is generated without the current one being serviced completely, which breaks the flow and also increases the duration for which a sample stays on the output, giving it a twist. This trick worked wonders, but it only works in a specific case, with the prescaler being 16 at 8 MHz and 8 at 16 MHz.
Figure 18 shows the difference between the outputs of alien mode and the normal pitch-change mode for a 1 kHz sine wave input to the microcontroller. You can see that in alien mode the waveform is almost the same but switches on and off periodically, which gives exactly the alien voice we desired.
Figure 18: Comparison between the normal and the alien mode output: (a) normal pitch-changed output (1.35 kHz), (b) the broken alien-voice waveform (1.25 kHz)
EWMA stands for Exponentially Weighted Moving Average. The term itself signifies that the latest sample is weighted exponentially with respect to the previous samples in the average calculation, which means the latest sample has the highest impact on our threshold calculation.
EWMA, being a statistical quality-control measure, provides the smoothness of a digital low pass filter and, for our purposes, resistance to sudden noise spikes. The parameter in question is lambda, which gives the weight of the latest value; it must satisfy 0 <= lambda <= 1, and it is often recommended that lambda lies in the range 0.05 <= lambda <= 0.25 or 0.2 <= lambda <= 0.3.
If <x> is the moving average and v is the new value coming in, then the EWMA update is:
<x> = lambda * v + (1 - lambda) * <x>
Our purpose is to monitor the idle microphone value and set it as a zero reference; a suitable reference was settled on after experimenting with the input values and the lambda values.
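On the AVR, the same averaging can be done cheaply in fixed point by choosing a power-of-two lambda; the sketch below assumes lambda = 1/16 ≈ 0.06, which may differ from the value actually used in the project.

```c
#include <stdint.h>

/* Running average kept with 4 extra fractional bits so that the
 * division by 16 (lambda = 1/16) loses no precision. */
static uint16_t ewma_acc = 0;

/* Feed one 8-bit ADC sample, get the current moving average back:
 *   <x> = lambda*v + (1 - lambda)*<x>,  with lambda = 1/16 */
uint8_t ewma_update(uint8_t sample)
{
    ewma_acc = ewma_acc - (ewma_acc >> 4) + sample;
    return (uint8_t)(ewma_acc >> 4);     /* back to the 8-bit sample scale */
}
```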
The following Figures 19, 20, 21, 22 show the progressing development of a system where a threshold can be set to safely determine whether something is being spoken or not.
Figure 19: The normal analog waveform, showing an example of the speech "Hello!"
Figure 20: The normal analog waveform with EWMA applied; its application has smoothed the waveform and enabled us to determine an approximate idle sound value of 132 (decimal)
Going by the concepts of communication, we can ride data on a square wave by varying its duty cycle (the duty cycle is the percentage of time a square wave remains high in one cycle). The square wave acts as a carrier for the data on its duty cycle; if this PWM wave's frequency is significantly greater than the frequency of the data, then passing it through an LPF (Low Pass Filter) attenuates the high-frequency PWM and lets the lower-frequency averaged signal through. This effectively creates a single-pin DAC (Digital to Analog Converter), and we can output audio data easily.
Figure 23: A PWM wave, unfiltered and filtered with an LPF (Low Pass Filter)
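A sketch of setting up such a PWM 'DAC' on the ATmega328P: 8-bit fast PWM on Timer0 with no prescaling at an 8 MHz clock gives 8 MHz / 256 ≈ 31.25 kHz, well above the audio band. The choice of Timer0/OC0A here is an assumption for illustration, not necessarily the channel used in the project.

```c
#include <avr/io.h>

/* Fast PWM on OC0A (PD6): 8 MHz / 256 = ~31.25 kHz carrier */
void pwm_dac_init(void)
{
    DDRD  |= (1 << DDD6);                      /* OC0A pin as output          */
    TCCR0A = (1 << COM0A1)                     /* clear OC0A on compare match */
           | (1 << WGM01) | (1 << WGM00);      /* fast PWM, TOP = 0xFF        */
    TCCR0B = (1 << CS00);                      /* no prescaling               */
}

/* Output one 8-bit audio sample: the duty cycle carries the signal */
static inline void pwm_dac_write(uint8_t sample)
{
    OCR0A = sample;
}
```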
Shown below is an example of a signal riding on a PWM carrier. Figure 24 shows a 31 kHz PWM carrying a 1 kHz sine wave which was recorded and then replayed at a higher pitch, i.e. a higher frequency; after filtering out the PWM with a cutoff of about 5 kHz, we get a roughly 1.33 kHz wave.
(a) PWM carrying the signal on its duty cycle; the variations in duty cycle represent the signal. (b) The output of the LPF, an approximately 1.3 kHz sine wave.
Figure 24: Working of the PWM to Audio output
Appendix:
Storage Memory
The storage memory used in the project is the Winbond serial flash W25Q32 with the following specifications: 32M-bit / 4M-byte, single 2.7 to 3.6 V supply, 4 mA active current, -40 to +85 degrees Celsius operating range, 20-year data retention. The pin configuration and description of the W25Q32 are shown below:
Figure 25: Serial Flash Details
Chip Select (CS*)
The SPI Chip Select pin is active low (as indicated by the asterisk) and enables and disables device operation. When CS* is high the device is deselected and the Serial Data Output (DO, or IO0, IO1, IO2, IO3) pins are at high impedance. When deselected, the device's power consumption will be at standby levels unless an internal erase, program or status register cycle is in progress.
When CS* is brought low the device is selected, power consumption increases to active levels, and instructions can be written to and data read from the device. After power-up, CS* must transition from high to low before a new instruction will be accepted. The CS* input must track the VCC supply level at power-up. If needed, a pull-up resistor on CS* can be used to accomplish this.
Standard SPI instructions use the unidirectional DI (input) pin to serially write instructions, addresses or data to the device on the rising edge of the Serial Clock (CLK) input pin. Standard SPI also uses the unidirectional DO (output) pin to read data or status from the device on the falling edge of CLK.
The Write Protect (WP*) pin can be used to prevent the Status Register from being written. Used in conjunction with the Status Register's Block Protect (SEC, TB, BP2, BP1, and BP0) bits and Status Register Protect (SRP) bits, a portion of or the entire memory array can be hardware protected. The WP* pin is active low.
The HOLD* pin allows the device to be paused while it is actively selected. When HOLD* is brought low, while CS* is low, the DO pin will be at high impedance and signals on the DI and CLK pins will be ignored (don't care). When HOLD* is brought high, device operation can resume. The HOLD* function can be useful when multiple devices are sharing the same SPI signals. The HOLD* pin is active low.
The SPI Serial Clock Input (CLK) pin provides the timing for serial input and output operations.
In order to program the ATmega IC via the bootloader method, a UART communication link between the PC and the ATmega is required, which is provided by a bridge IC that converts USB to UART. The CH340 is a cheap USB-to-UART converter which does the job without needing a complex circuit. The major advantage of using an external programmer is that additional circuitry which may not be used very often is kept off the PCB.
(a) USB to UART CH340 Programmer (b) CH340 onboard circuit for early prototypes
To design a good circuit for communication protocols, one must understand how a pin changes the voltage on its output. The GPIO pins of any microcontroller follow a totem-pole arrangement so that they can be put into a high-Z state when wanted; this creates problems when a pin is made high-Z and its voltage is measured on the line it is connected to. Pulling the GPIO pin to a logic level that is neutral for the device and constant is recommended. The following points should be kept in mind when designing the circuit for a communication protocol:
- Use pull-up or pull-down resistors on all control pins.
- Don't have stubs on clock pins. Instead, use clock buffers.
- Ensure the communication pins are latched to a defined logic level.
Figure 27: Talkative Tom Eagle CAD Schematic Diagram
Board Design
Figure 28: Talkative Tom Eagle CAD Board Layout
Speaker
The speaker is one of the important output components of this project. The speaker used has a diameter of 2.5 inches and an impedance of 8 ohms. Typical audio drivers can easily drive 8-ohm speakers, which is loud enough for a room and serves our purpose. Per the datasheet, the PAM8403 can drive a 4-ohm speaker at 3 W and an 8-ohm speaker at half that wattage, meaning less power consumption and higher efficiency. The speaker of choice is shown in Figure 29. For details on the characteristics of the PAM8403, refer to Figure 3 in 'Audio Amplifier'.
Figure 29: 8-ohm speaker
Profiling
Profiling is a neat technique for finding out the time taken by program segments with the help of an oscilloscope. To profile, a GPIO pin is toggled on and off as a program part begins, and toggled on and off again when that part ends. Neglecting the time taken by the toggling instructions themselves, there will be a spike on the oscilloscope screen at the beginning as well as at the end of the program part, and by measuring between them we can find how much time the program part takes. The waveform we see on the screen is known as the program's time profile.
In this project, profiling was used to accurately determine the sample rate and replay rate for the audio, as shown in the following Figure 30.
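In code, profiling is just bracketing the segment of interest with a pin write; a minimal sketch, assuming a spare pin PD2 is used as the profiling output:

```c
#include <avr/io.h>

#define PROFILE_PIN PD2          /* assumed spare GPIO routed to the scope */

void profile_init(void)
{
    DDRD |= (1 << PROFILE_PIN);  /* profiling pin as output */
}

/* Toggle the pin on and off to leave a short spike on the oscilloscope */
static inline void profile_mark(void)
{
    PORTD |=  (1 << PROFILE_PIN);
    PORTD &= ~(1 << PROFILE_PIN);
}

void some_program_part(void)
{
    profile_mark();              /* spike at the start of the segment */
    /* ... the code segment being timed, e.g. one buffer dump to flash ... */
    profile_mark();              /* spike at the end of the segment */
    /* The time between the two spikes on the scope is the segment's duration */
}
```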
(a) Recording at 15ksps (b) Replay at 20ksps
Figure 30: Time profiles of the record and replay programs
Avrdude
Avrdude is a command-line utility for reading, writing, and manipulating Atmel AVR microcontrollers. With this tool one can upload code, manipulate fuses, and program almost any AVR microcontroller from the command prompt. It comes pre-installed with the Arduino IDE package and is used under the hood to upload sketches.
To form commands for fuses visit http://www.engbedded.com/fusecalc/.
Easter Eggs:
This section is dedicated to photographs of the various versions of this project that were tried and tested in the prototype phase. In the end, we went with the approach of keeping the board out of a case, so it can be used to teach aspiring minds who want to see how it looks and feels to hold a bare, working PCB with its components.
The Latest Talkative Tom (2018), faster and better with MSP430
Figure 31: Talkative Tom V1 prototype board, Atmega328P @5V 16MHz, DAC with 3.5mm audio jack for output, and an active filter (5kHz). Large size and inefficient layout.
Figure 32: Talkative Tom V2 prototype board, Atmega328P @3.3V 8MHz, with a selector for active and passive LPFs to study the quality differences in audio. Significantly more compact, with a better layout.