In [324]:
import os

import pandas as pd
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show

from speaker_id.transcribe_result import TranscribeResult

# Render Bokeh plots inline in the notebook.
output_notebook()

 Summary

This notebook visualizes the results returned by the Amazon Transcribe service. The initial idea was to build a dashboard that analyzes recorded meetings to see how long each speaker talks.

https://aws.amazon.com/transcribe/

The spoken audio was taken from an open-source audio service. Transcribe transcribed the file correctly and identified the different speakers.

The service documentation states that audio can be streamed via HTTP/2, but the documentation for the Python library does not show how to do this.

Compared to AWS, the documentation of the corresponding Google service is much more detailed and has examples in many different languages (Ruby, Java, Python, JavaScript, ...). Google supports 120 languages, whereas AWS offers only Spanish and English.

Load the AWS Transcribe results into Python

In [325]:
result = TranscribeResult()
# normpath collapses the 'tests/..' detour; os.path.join with a single argument did nothing.
result.file_path = os.path.normpath('/Users/renzo/workspace/speaker_id/speaker_id/tests/../transcribe_results/test_2.json')
result.load_result()
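
The `TranscribeResult` class is project-specific and its code is not part of this notebook. As a rough sketch of what loading amounts to, assuming the result file is plain JSON with the `results` / `speaker_labels` / `segments` nesting that the next cell reads, the standard `json` module is sufficient. The helper `load_segments` and the sample payload are hypothetical and only illustrate the shape of the data.

```python
import json


def load_segments(text):
    """Parse a Transcribe-style JSON string and return its speaker segments."""
    raw = json.loads(text)
    return raw["results"]["speaker_labels"]["segments"]


# Hand-written sample payload (an assumption, not real service output).
sample = json.dumps({
    "results": {
        "speaker_labels": {
            "segments": [
                {"start_time": "0.54", "end_time": "3.15", "speaker_label": "spk_1"},
                {"start_time": "4.3", "end_time": "15.85", "speaker_label": "spk_1"},
            ]
        }
    }
})

segments = load_segments(sample)
print(len(segments), segments[0]["speaker_label"])
```

Note that the times arrive as strings, which is why the DataFrame step converts them to numbers.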

Create a pandas DataFrame for further processing

In [326]:
df = pd.DataFrame(data=result.raw_dict['results']['speaker_labels']['segments'])
df[["end_time", "start_time"]] = df[["end_time", "start_time"]].apply(pd.to_numeric)
df["duration"] = df['end_time'] - df['start_time']
df[0:5]
Out[326]:
   end_time  items                                              speaker_label  start_time  duration
0      3.15  [{'start_time': '0.54', 'speaker_label': 'spk_...  spk_1                0.54      2.61
1     15.85  [{'start_time': '4.3', 'speaker_label': 'spk_1...  spk_1                4.30     11.55
2     20.05  [{'start_time': '17.44', 'speaker_label': 'spk...  spk_1               17.44      2.61
3     26.95  [{'start_time': '21.34', 'speaker_label': 'spk...  spk_1               21.34      5.61
4     29.05  [{'start_time': '28.31', 'speaker_label': 'spk...  spk_1               28.31      0.74
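
The conversion step matters because the segment times are stored as strings, so subtracting them directly would fail. A minimal sketch of the same steps on two hand-copied rows (the `demo` and `time_cols` names are hypothetical, and the rows only mirror the first two lines of the output above):

```python
import pandas as pd

# Two segments with string times, as they appear in the raw result.
segments = [
    {"start_time": "0.54", "end_time": "3.15", "speaker_label": "spk_1"},
    {"start_time": "4.3", "end_time": "15.85", "speaker_label": "spk_1"},
]

demo = pd.DataFrame(segments)

# Convert the string columns to floats, then derive the segment length.
time_cols = ["start_time", "end_time"]
demo[time_cols] = demo[time_cols].apply(pd.to_numeric)
demo["duration"] = demo["end_time"] - demo["start_time"]

print(demo)
```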

Plot a speaker timeline

In [327]:
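# Wrap the DataFrame for Bokeh; the plot below still reads from df directly.
source = ColumnDataSource(df)

# Speaker labels assigned by Transcribe, used as the categorical y-axis.
categories = ['spk_0', 'spk_1', 'spk_2', 'spk_3', 'spk_4', 'spk_5']

p = figure(y_range=categories, plot_width=800, plot_height=300, title="Speaker Graph")
# One horizontal bar per segment, spanning its start and end time in seconds.
p.hbar(y=df['speaker_label'].values, left=df['start_time'].values, right=df['end_time'].values, height=0.4)
show(p)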

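List each speaker's total speech time

This can be done with a simple group-by. Only the duration column is meaningful in the result; the summed start and end times have no useful interpretation.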

In [328]:
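# Sum only the numeric columns; the 'items' column holds lists and cannot be totalled.
sums = df.groupby('speaker_label')[['end_time', 'start_time', 'duration']].sum()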
sums
Out[328]:
                end_time  start_time  duration
speaker_label
spk_0           4328.121    4233.205    94.916
spk_1           2575.202    2506.372    68.830
spk_2          29818.231   29330.758   487.473
spk_3           1321.726    1287.966    33.760
spk_4          21001.536   20845.856   155.680
spk_5           7840.746    7710.864   129.882
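
For a dashboard, each speaker's share of the total talk time may be easier to read than raw seconds. A small sketch, assuming the duration totals from the output above are copied into a Series (the `totals` and `share` names are hypothetical):

```python
import pandas as pd

# Duration totals in seconds, copied from the group-by output above.
totals = pd.Series(
    {
        "spk_0": 94.916,
        "spk_1": 68.830,
        "spk_2": 487.473,
        "spk_3": 33.760,
        "spk_4": 155.680,
        "spk_5": 129.882,
    },
    name="duration",
)

# Percentage of the overall speech time per speaker, largest first.
share = (totals / totals.sum() * 100).round(1).sort_values(ascending=False)
print(share)
```

In the notebook itself the same expression could be applied to `sums['duration']` instead of the hand-copied values.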
In [329]:
p = figure(y_range=categories, plot_width=800, plot_height=300, title="Speaker Total Time Graph")
# One bar per speaker, from 0 to the total duration in seconds.
p.hbar(y=sums['duration'].index, right=sums['duration'].values, height=0.5, left=0,
       color="navy")

show(p)