Thursday, September 25, 2014

What Dataset am I going to use?


Hypothesizing and validating any anomaly detection algorithm requires ample data streams. Fortunately, CSN set up its network of inexpensive sensors some time ago, so its servers already held plenty of data for the task, spanning 400 sensors and 36 events.
Some of them are:
Earthquakes in Southern California

Name                       UTC Date and Time (DD-MM-YYYY HH:MM)
La Canada                  17-10-2013 12:35
Joshua Tree                06-10-2013 02:06
La Verne 2                 19-09-2013 12:06
La Verne                   19-09-2013 11:43
Weldon 2                   25-08-2013 18:50
Weldon                     24-07-2013 16:46
Rancho Palos Verdes        07-06-2013 11:25
Santa Barbara Channel      29-05-2013 14:38
Meiners Oaks               07-05-2013 09:06
Marina del Rey 2           29-04-2013 03:06
Marina del Rey             27-04-2013 02:52
Los Angeles                20-03-2013 16:22
Fontana 2                  17-03-2013 14:00
Anza 2                     13-03-2013 04:21
Anza                       11-03-2013 16:56
Rancho Cucamonga           10-02-2013 20:26
Avalon                     14-12-2012 10:36
North Hollywood 2          19-11-2012 15:44
Monterey Park              14-11-2012 00:25
Lennox                     05-11-2012 04:06
Manhattan Beach            06-11-2012 02:39
San Fernando               08-11-2012 03:33
Newhall 2                  28-10-2012 15:24
Beverly Hills              03-09-2012 10:26
Beverly Hills Long         03-09-2012 10:26
Brawley                    26-08-2012 20:57
Yorba Linda 2              08-08-2012 16:33
Altadena                   05-07-2012 18:48
Altadena Long              05-07-2012 18:48
Fontana 3                  15-01-2014 09:35
Fontana Monica             15-01-2014 09:35
Fontana 3 Long             15-01-2014 09:35
Fontana 3 Long Millikan    15-01-2014 09:35
Hollywood                  08-02-2013 18:13
Westwood                   17-03-2014 13:25
La Habra                   29-03-2014 04:09

Figure 1. A CSN sensor.
The remaining task was to collect this data from the servers. CSN provides a command-line interface for downloading data in the convenient SAC format, which ObsPy can read and process very well. I downloaded over 1200 streams of data, each with the following characteristics:
  • Each stream represents one channel of one sensor, that is, one axis of acceleration (X, Y, or Z).
  • Each stream is exactly 20 minutes long and contains the acceleration values that the server collected from the sensor.
  • The data is sampled at 50 samples per second.
  • An earthquake occurs in each data stream at precisely the 18th minute, based on the event time given in the official United States Geological Survey (USGS) report.
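To make these numbers concrete, here is a small sketch of the arithmetic behind one stream. It assumes that "the 18th minute" means the event falls 18 minutes after the start of the stream; the function name `stream_window` is my own and is not part of the CSN tooling.

```python
from datetime import datetime, timedelta, timezone

SAMPLING_RATE_HZ = 50                    # samples per second, as stated above
STREAM_LENGTH = timedelta(minutes=20)    # length of every stream, as stated above
# Assumption: the event sits 18 minutes after the start of the stream.
ONSET_OFFSET = timedelta(minutes=18)


def stream_window(event_time_utc):
    """Return (start, end) of the 20-minute stream for an event time
    written as DD-MM-YYYY HH:MM (UTC), the format used in the table."""
    event = datetime.strptime(event_time_utc, "%d-%m-%Y %H:%M")
    event = event.replace(tzinfo=timezone.utc)
    start = event - ONSET_OFFSET
    return start, start + STREAM_LENGTH


# Number of samples in one stream and the sample index of the event onset.
samples_per_stream = int(STREAM_LENGTH.total_seconds()) * SAMPLING_RATE_HZ
onset_sample = int(ONSET_OFFSET.total_seconds()) * SAMPLING_RATE_HZ

# The La Canada event from the table.
start, end = stream_window("17-10-2013 12:35")
print(samples_per_stream, onset_sample)
print(start.isoformat(), end.isoformat())
```

Under that assumption, each stream holds 60000 samples per channel, and the earthquake begins around sample 54000, which leaves 18 minutes of quiet data before the event and 2 minutes after its onset.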
Thus a typical data stream looked like the one below:
Figure 2. A plot generated with ObsPy. Each point in the data stream is the acceleration value the sensor experienced at that time instant.
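For reference, a plot like this can be produced with ObsPy roughly as follows. This is only a minimal sketch: the file name `example_stream.sac` is a placeholder rather than an actual CSN file, and the helper names `trace_duration_minutes` and `load_and_plot` are my own. ObsPy is imported inside the function so that the duration helper can be used without it installed.

```python
def trace_duration_minutes(npts, sampling_rate):
    """Approximate length of a trace in minutes from its sample count
    and its sampling rate in samples per second."""
    return npts / sampling_rate / 60


def load_and_plot(path):
    """Read one SAC file with ObsPy, print its basic properties and plot it."""
    from obspy import read  # imported here so the helper above works without ObsPy

    stream = read(path)     # a SAC file is returned as a Stream of traces
    trace = stream[0]       # one channel of one sensor
    stats = trace.stats
    print("channel:", stats.channel)
    print("start time (UTC):", stats.starttime)
    print("sampling rate:", stats.sampling_rate)
    print("length in minutes:",
          trace_duration_minutes(stats.npts, stats.sampling_rate))
    stream.plot()           # draws the waveform, amplitude against time
    return trace


if __name__ == "__main__":
    # Placeholder path; replace it with a real downloaded SAC file.
    load_and_plot("example_stream.sac")
```

For a stream matching the description above, the printed sampling rate should be 50 and the length close to 20 minutes.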
