Tutorial

sed_eval – Evaluation toolbox for Sound Event Detection

The structure of the sed_eval toolbox is as follows:

  • For evaluating a sound event detection system (SED system from here on), two types of metrics are available: segment-based and event-based. Each type has its own metric class: SegmentBasedMetrics and EventBasedMetrics. The member function evaluate() is used to go through pairs of system output (estimated event list) and ground truth (reference event list). The results() function returns the metric values as a dictionary. For convenience, there are also functions that return the results as a formatted string (e.g. result_report_overall()), or you can simply print the class instance.
  • For evaluating an acoustic scene classification system, there is a similar evaluation class, SceneClassificationMetrics.

sed_eval also includes the following additional submodules:

  • io, which contains convenience functions for loading annotations
  • util, which includes miscellaneous functions for handling event lists (lists of event items), event rolls (event activity indicator matrices used in evaluation), and scene lists.
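
To make the term event roll more concrete, the following is a minimal standard-library sketch of how such an activity indicator matrix could be built from an event list. It is not the sed_eval implementation: the function name event_roll, the rounding of onsets down and offsets up to segment boundaries, and the list-of-lists layout are all assumptions made for illustration.

```python
import math


def event_roll(event_list, labels, time_resolution=1.0):
    """Build a binary matrix: one row per segment, one column per label.

    Hypothetical helper for illustration only; sed_eval ships its own
    event roll utilities, which may differ in detail.
    """
    if not event_list:
        return []
    last_offset = max(event["event_offset"] for event in event_list)
    n_segments = math.ceil(last_offset / time_resolution)
    roll = [[0] * len(labels) for _ in range(n_segments)]
    for event in event_list:
        column = labels.index(event["event_label"])
        # Assumption: an event marks every segment it overlaps.
        first = math.floor(event["event_onset"] / time_resolution)
        last = math.ceil(event["event_offset"] / time_resolution)
        for segment in range(first, last):
            roll[segment][column] = 1
    return roll


events = [
    {"event_label": "cough", "event_onset": 0.2, "event_offset": 1.4},
    {"event_label": "alert", "event_onset": 1.0, "event_offset": 3.0},
]
roll = event_roll(events, labels=["alert", "cough"])
for row in roll:
    print(row)
```

Each row corresponds to one segment of time_resolution seconds, and each column to one event label, so two event lists can be compared segment by segment.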

Quickstart: Using the evaluators

The easiest way to evaluate systems with sed_eval is to use the provided evaluators. Evaluators are Python scripts that can be run from the command prompt; they use sed_eval to compute metrics from the reference and estimated annotations you provide. To use the evaluators, you must first install sed_eval and its dependencies (see Getting started). The evaluator scripts can be found in the evaluators folder of the sed_eval repository:

https://github.com/TUT-ARG/sed_eval/tree/master/evaluators

Currently two evaluators are available: one for sound event detection systems and one for acoustic scene classification systems.

Sound event detection

To get usage help:

./sound_event_eval.py --help

The evaluator takes a csv-formatted file list as its argument. The list contains pairs of filenames, one pair per row: first the filename of the reference event list file, then the filename of the estimated event list file. The format is [reference_file][delimiter][estimated_file], and the supported delimiters are comma (,), semicolon (;), and tab.

Example of a file list:

office_snr0_high_v2.txt     office_snr0_high_v2_detected.txt
office_snr0_med_v2.txt      office_snr0_med_v2_detected.txt
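
If you want to check a file list before passing it to the evaluator, a small standard-library sketch like the one below may help. It only mirrors the row format described above; read_file_list and SUPPORTED_DELIMITERS are hypothetical names, not part of sed_eval, and the delimiter detection (first supported delimiter found in the row) is an assumption.

```python
import csv

SUPPORTED_DELIMITERS = (",", ";", "\t")


def read_file_list(text):
    """Return (reference_file, estimated_file) pairs from file-list text."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty rows
        # Assumption: use the first supported delimiter that occurs in the row.
        delimiter = next((d for d in SUPPORTED_DELIMITERS if d in line), None)
        if delimiter is None:
            raise ValueError("no supported delimiter in row: %r" % line)
        row = next(csv.reader([line], delimiter=delimiter))
        # Drop empty fields produced by repeated delimiters (e.g. aligned tabs).
        fields = [field.strip() for field in row if field.strip()]
        if len(fields) != 2:
            raise ValueError("expected two filenames in row: %r" % line)
        pairs.append((fields[0], fields[1]))
    return pairs


sample = (
    "office_snr0_high_v2.txt\toffice_snr0_high_v2_detected.txt\n"
    "office_snr0_med_v2.txt;office_snr0_med_v2_detected.txt\n"
)
pairs = read_file_list(sample)
for reference_file, estimated_file in pairs:
    print(reference_file, "->", estimated_file)
```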

An event list is a csv-formatted text file. Supported formats for the file are:

  • [event onset (float >= 0)][delimiter][event offset (float >= 0)]
  • [event onset (float >= 0)][delimiter][event offset (float >= 0)][delimiter][label]
  • [filename][delimiter][scene_label][delimiter][event onset (float >= 0)][delimiter][event offset (float >= 0)][delimiter][event label]

Supported delimiters: comma (,), semicolon (;), and tab

Example of an event list file:

21.64715    23.00552        alert
36.91184    38.27021        alert
69.72575    71.09029        alert
63.53990    64.89827        alert
84.25553    84.83920        alert
20.92974    21.82661        clearthroat
28.39992    29.29679        clearthroat
80.47837    81.95937        clearthroat
44.48363    45.96463        clearthroat
78.13073    79.05953        clearthroat
15.17031    16.27235        cough
20.54931    21.65135        cough
27.79964    28.90168        cough
75.45959    76.32490        cough
70.81708    71.91912        cough
21.23203    22.55902        doorslam
7.546220    9.014880        doorslam
34.11303    35.04183        doorslam
45.86001    47.32867        doorslam
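
The three row formats listed above can be told apart by their field count. The sketch below shows one way to do that with the standard library only. In practice sed_eval.io.load_event_list does the loading for you; parse_event_row is a hypothetical helper, and the dictionary keys are borrowed from the in-memory examples later in this tutorial.

```python
import csv

SUPPORTED_DELIMITERS = (",", ";", "\t")


def parse_event_row(line):
    """Turn one event list row into a dictionary (illustrative only)."""
    delimiter = next((d for d in SUPPORTED_DELIMITERS if d in line), None)
    if delimiter is None:
        raise ValueError("no supported delimiter in row: %r" % line)
    row = next(csv.reader([line], delimiter=delimiter))
    fields = [field.strip() for field in row if field.strip()]

    event = {}
    if len(fields) in (2, 3):
        # [onset][offset] or [onset][offset][label]
        onset, offset = fields[0], fields[1]
        if len(fields) == 3:
            event["event_label"] = fields[2]
    elif len(fields) == 5:
        # [filename][scene_label][onset][offset][event label]
        event["file"], event["scene_label"] = fields[0], fields[1]
        onset, offset = fields[2], fields[3]
        event["event_label"] = fields[4]
    else:
        raise ValueError("unsupported number of fields: %r" % line)

    event["event_onset"] = float(onset)
    event["event_offset"] = float(offset)
    if event["event_onset"] < 0 or event["event_offset"] < 0:
        raise ValueError("onset and offset must be >= 0: %r" % line)
    return event


short_event = parse_event_row("21.64715\t23.00552\talert")
full_event = parse_event_row("audio/street/b099.wav,street,0.0,2.5,car")
print(short_event)
print(full_event)
```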

To print a report of the segment-based and event-based metrics, run:

./sound_event_eval.py file_list.txt

To save the segment-based and event-based metrics in YAML format, run:

./sound_event_eval.py file_list.txt -o results.yaml

Acoustic scene classification

To get usage help:

./scene_eval.py --help

The evaluator takes a csv-formatted file list as its argument. The list contains pairs of filenames, one pair per row: first the filename of the reference scene list file, then the filename of the estimated scene list file. The format is [reference_file][delimiter][estimated_file], and the supported delimiters are comma (,), semicolon (;), and tab.

Example of a file list:

fold1_reference.txt fold1_estimated.txt
fold2_reference.txt fold2_estimated.txt
fold3_reference.txt fold3_estimated.txt
fold4_reference.txt fold4_estimated.txt
fold5_reference.txt fold5_estimated.txt

A scene list is a csv-formatted text file. Supported formats for the file are:

  • [filename][delimiter][scene label]
  • [filename][delimiter][segment start (float >= 0)][delimiter][segment stop (float >= 0)][delimiter][scene label]

Supported delimiters: comma (,), semicolon (;), and tab

Example of a scene list file:

scenes_stereo/supermarket09.wav     supermarket
scenes_stereo/tubestation10.wav     tubestation
scenes_stereo/quietstreet08.wav     quietstreet
scenes_stereo/restaurant05.wav      restaurant
scenes_stereo/busystreet05.wav      busystreet
scenes_stereo/openairmarket04.wav   openairmarket
scenes_stereo/quietstreet01.wav     quietstreet
scenes_stereo/supermarket05.wav     supermarket
scenes_stereo/openairmarket01.wav   openairmarket
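
If your classifier keeps its predictions in memory, a scene list in the two-column format above can be produced with the csv module from the standard library. This is only a sketch: format_scene_list and the predictions variable are hypothetical names, and the tab delimiter is one of the three supported choices.

```python
import csv
import io


def format_scene_list(items, delimiter="\t"):
    """Serialize (filename, scene_label) pairs as scene list text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    for filename, scene_label in items:
        writer.writerow([filename, scene_label])
    return buffer.getvalue()


# Hypothetical system output: one predicted scene label per audio file.
predictions = [
    ("scenes_stereo/supermarket09.wav", "supermarket"),
    ("scenes_stereo/tubestation10.wav", "tubestation"),
]
scene_list_text = format_scene_list(predictions)
print(scene_list_text, end="")
```

The resulting text can be written to a file and listed as the estimated file in the file list.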

To print the metrics, run:

./scene_eval.py file_list.txt

To save the metrics in YAML format, run:

./scene_eval.py file_list.txt -o results.yaml

Quickstart: Using sed_eval in Python code

After sed_eval is installed (see Getting started), it can be imported into your Python code as follows:

import sed_eval

Sound event detection

Usage example when reading event lists from disk (you can run the example in the path tests/data/sound_event):

import sed_eval
import dcase_util

file_list = [
    {
     'reference_file': 'office_snr0_high_v2.txt',
     'estimated_file': 'office_snr0_high_v2_detected.txt'
    },
    {
     'reference_file': 'office_snr0_med_v2.txt',
     'estimated_file': 'office_snr0_med_v2_detected.txt'
    }
]

data = []

# Get used event labels
all_data = dcase_util.containers.MetaDataContainer()
for file_pair in file_list:
    reference_event_list = sed_eval.io.load_event_list(
        filename=file_pair['reference_file']
    )
    estimated_event_list = sed_eval.io.load_event_list(
        filename=file_pair['estimated_file']
    )

    data.append({'reference_event_list': reference_event_list,
                 'estimated_event_list': estimated_event_list})

    all_data += reference_event_list

event_labels = all_data.unique_event_labels

# Start evaluating

# Create metrics classes, define parameters
segment_based_metrics = sed_eval.sound_event.SegmentBasedMetrics(
    event_label_list=event_labels,
    time_resolution=1.0
)

event_based_metrics = sed_eval.sound_event.EventBasedMetrics(
    event_label_list=event_labels,
    t_collar=0.250
)

# Go through files
for file_pair in data:
    segment_based_metrics.evaluate(
        reference_event_list=file_pair['reference_event_list'],
        estimated_event_list=file_pair['estimated_event_list']
    )

    event_based_metrics.evaluate(
        reference_event_list=file_pair['reference_event_list'],
        estimated_event_list=file_pair['estimated_event_list']
    )

# Get only certain metrics
overall_segment_based_metrics = segment_based_metrics.results_overall_metrics()
print("Accuracy:", overall_segment_based_metrics['accuracy']['accuracy'])

# Or print all metrics as reports
print(segment_based_metrics)
print(event_based_metrics)
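
The t_collar=0.250 argument in the example above sets a tolerance, in seconds, for deciding whether an estimated event matches a reference event. The sketch below illustrates the idea in a deliberately simplified form: it compares onsets only and matches greedily, whereas the library can also take offsets into account. The function count_onset_matches and the event data are hypothetical and not taken from sed_eval.

```python
def count_onset_matches(reference, estimated, t_collar=0.250):
    """Count true positives, false positives and false negatives.

    Simplified assumption: an estimated event is correct when it has the
    same label as a reference event and its onset lies within t_collar
    seconds of the reference onset. Each estimate is used at most once.
    """
    used = set()
    true_positives = 0
    for ref_label, ref_onset in reference:
        for index, (est_label, est_onset) in enumerate(estimated):
            if index in used:
                continue
            if est_label == ref_label and abs(est_onset - ref_onset) <= t_collar:
                used.add(index)
                true_positives += 1
                break
    false_positives = len(estimated) - true_positives
    false_negatives = len(reference) - true_positives
    return true_positives, false_positives, false_negatives


# Illustrative (label, onset) pairs, not real evaluation data.
reference = [("alert", 21.65), ("cough", 15.17)]
estimated = [("alert", 21.80), ("cough", 15.60), ("doorslam", 7.50)]

counts = count_onset_matches(reference, estimated)
print(counts)
```

Here only the alert event falls inside the collar; the cough estimate starts 0.43 s late and the doorslam estimate has no reference counterpart.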

Usage example to evaluate results stored in variables:

import sed_eval
import dcase_util

reference_event_list = dcase_util.containers.MetaDataContainer(
    [
        {
            'event_label': 'car',
            'event_onset': 0.0,
            'event_offset': 2.5,
            'file': 'audio/street/b099.wav',
            'scene_label': 'street'
        },
        {
            'event_label': 'car',
            'event_onset': 2.8,
            'event_offset': 4.5,
            'file': 'audio/street/b099.wav',
            'scene_label': 'street'
        },
        {
            'event_label': 'car',
            'event_onset': 6.0,
            'event_offset': 10.0,
            'file': 'audio/street/b099.wav',
            'scene_label': 'street'
        }
    ]
)

estimated_event_list = dcase_util.containers.MetaDataContainer(
    [
        {
            'event_label': 'car',
            'event_onset': 1.0,
            'event_offset': 3.5,
            'file': 'audio/street/b099.wav',
            'scene_label': 'street'
        },
        {
            'event_label': 'car',
            'event_onset': 7.0,
            'event_offset': 8.0,
            'file': 'audio/street/b099.wav',
            'scene_label': 'street'
        }
    ]
)

segment_based_metrics = sed_eval.sound_event.SegmentBasedMetrics(
    event_label_list=reference_event_list.unique_event_labels,
    time_resolution=1.0
)
event_based_metrics = sed_eval.sound_event.EventBasedMetrics(
    event_label_list=reference_event_list.unique_event_labels,
    t_collar=0.250
)

for filename in reference_event_list.unique_files:
    reference_event_list_for_current_file = reference_event_list.filter(
        filename=filename
    )

    estimated_event_list_for_current_file = estimated_event_list.filter(
        filename=filename
    )

    segment_based_metrics.evaluate(
        reference_event_list=reference_event_list_for_current_file,
        estimated_event_list=estimated_event_list_for_current_file
    )

    event_based_metrics.evaluate(
        reference_event_list=reference_event_list_for_current_file,
        estimated_event_list=estimated_event_list_for_current_file
    )

# Get only certain metrics
overall_segment_based_metrics = segment_based_metrics.results_overall_metrics()
print("Accuracy:", overall_segment_based_metrics['accuracy']['accuracy'])

# Or print all metrics as reports
print(segment_based_metrics)
print(event_based_metrics)
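
To build some intuition for what time_resolution=1.0 means for the event lists above, the following standard-library sketch counts segment-level hits and misses by hand for the single car class. It is a simplified illustration, not the sed_eval implementation: it assumes an event marks every 1-second segment it overlaps, and the numbers it prints are not guaranteed to match the library's report.

```python
import math


def active_segments(events, time_resolution=1.0):
    """Return the indices of segments overlapped by any (onset, offset) event."""
    segments = set()
    for onset, offset in events:
        first = math.floor(onset / time_resolution)
        last = math.ceil(offset / time_resolution)
        segments.update(range(first, last))
    return segments


# Onsets and offsets of the 'car' events from the example above.
reference = [(0.0, 2.5), (2.8, 4.5), (6.0, 10.0)]
estimated = [(1.0, 3.5), (7.0, 8.0)]

ref_active = active_segments(reference)
est_active = active_segments(estimated)

tp = len(ref_active & est_active)  # active in both
fp = len(est_active - ref_active)  # active only in the system output
fn = len(ref_active - est_active)  # active only in the reference

precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print("TP, FP, FN:", tp, fp, fn)
print("F1:", round(f1, 3))
```

Under these assumptions the system is active in four of the nine reference-active segments and never active outside them, which gives a precision of 1.0 and a recall of 4/9.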

Acoustic scene classification

Usage example to evaluate files:

import sed_eval
import dcase_util

file_list = [
    {'reference_file': 'fold1_reference.txt', 'estimated_file': 'fold1_estimated.txt'}
]

data = []

# Get used scene labels and load data in
all_data = []
for file_pair in file_list:
    reference_scene_list = sed_eval.io.load_scene_list(
        filename=file_pair['reference_file'],
        csv_header=False,
        file_format=dcase_util.utils.FileFormat.CSV,
        fields=['filename', 'scene_label']
    )
    estimated_scene_list = sed_eval.io.load_scene_list(
        filename=file_pair['estimated_file'],
        csv_header=False,
        file_format=dcase_util.utils.FileFormat.CSV,
        fields=['filename', 'onset', 'offset', 'scene_label']
    )

    data.append(
        {
            'reference_scene_list': reference_scene_list,
            'estimated_scene_list': estimated_scene_list
        }
    )

    all_data += reference_scene_list

scene_labels = sed_eval.sound_event.util.unique_scene_labels(all_data)

# Create metrics class
scene_metrics = sed_eval.scene.SceneClassificationMetrics(
    scene_labels=scene_labels
)
for file_pair in data:
    scene_metrics.evaluate(
        reference_scene_list=file_pair['reference_scene_list'],
        estimated_scene_list=file_pair['estimated_scene_list']
    )

# Get only certain metrics
overall_metrics_results = scene_metrics.results_overall_metrics()
print("Accuracy:", overall_metrics_results['accuracy'])

# Or print all metrics as reports
print(scene_metrics)

Usage example to evaluate results stored in variables:

import sed_eval
import dcase_util

reference = dcase_util.containers.MetaDataContainer([
    {
        'scene_label': 'supermarket',
        'file': 'supermarket09.wav'
    },
    {
        'scene_label': 'tubestation',
        'file': 'tubestation10.wav'
    },
    {
        'scene_label': 'quietstreet',
        'file': 'quietstreet08.wav'
    },
    {
        'scene_label': 'office',
        'file': 'office10.wav'
    },
    {
        'scene_label': 'bus',
        'file': 'bus01.wav'
    },
])

estimated = dcase_util.containers.MetaDataContainer([
    {
        'scene_label': 'supermarket',
        'file': 'supermarket09.wav'
    },
    {
        'scene_label': 'bus',
        'file': 'tubestation10.wav'
    },
    {
        'scene_label': 'quietstreet',
        'file': 'quietstreet08.wav'
    },
    {
        'scene_label': 'park',
        'file': 'office10.wav'
    },
    {
        'scene_label': 'car',
        'file': 'bus01.wav'
    },
])

scene_labels = sed_eval.sound_event.util.unique_scene_labels(reference)

scene_metrics = sed_eval.scene.SceneClassificationMetrics(scene_labels)
scene_metrics.evaluate(
    reference_scene_list=reference,
    estimated_scene_list=estimated
)

print(scene_metrics)
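
As a sanity check on the example above, the plain fraction of correctly labelled files can be computed by hand with a few lines of standard-library Python. This sketch assumes that reference and estimated items are paired by file name; the accuracy reported by SceneClassificationMetrics may be aggregated differently (for example, per class), so treat the value below as an illustration only.

```python
# File name -> scene label, copied from the example above.
reference = {
    "supermarket09.wav": "supermarket",
    "tubestation10.wav": "tubestation",
    "quietstreet08.wav": "quietstreet",
    "office10.wav": "office",
    "bus01.wav": "bus",
}
estimated = {
    "supermarket09.wav": "supermarket",
    "tubestation10.wav": "bus",
    "quietstreet08.wav": "quietstreet",
    "office10.wav": "park",
    "bus01.wav": "car",
}

# A file counts as correct when its estimated label equals the reference label.
correct = sum(
    1 for filename, label in reference.items() if estimated.get(filename) == label
)
accuracy = correct / len(reference)

print("Correct files:", correct, "of", len(reference))
print("Fraction correct:", accuracy)
```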