seed_vault.service package

Submodules

seed_vault.service.db module

Database management module for the SEED-vault archive

This module provides a DatabaseManager class for handling seismic data storage in SQLite, including archive data and arrival data. It implements connection management, data insertion, querying, and database maintenance operations.

class seed_vault.service.db.DatabaseManager(db_path)[source]

Bases: object

Manages seismic data storage and retrieval using SQLite.

This class handles database connections, table creation, data insertion, and querying for seismic archive and arrival data.

db_path

Path to the SQLite database file.

Type:

str

Initialize DatabaseManager with database path.

Parameters:

db_path (str) – Path where the SQLite database should be created/accessed.

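Example

A minimal usage sketch (the database path here is hypothetical):

>>> from seed_vault.service.db import DatabaseManager
>>> db_manager = DatabaseManager("/data/seedvault.sqlite")
>>> db_manager.display_contents("archive_data", limit=10)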
analyze_table(table_name)[source]

Update table statistics for query optimization.

Parameters:

table_name (str) – Name of the table to analyze.

bulk_insert_archive_data(archive_list)[source]

Insert multiple archive data records.

Parameters:

archive_list (List[Tuple]) – List of tuples containing archive data records.

Returns:

Number of inserted records.

Return type:

int

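Example

A sketch assuming each tuple follows the archive_data column order (network, station, location, channel, starttime, endtime), as produced by miniseed_to_db_elements():

>>> records = [
...     ("IU", "ANMO", "00", "BHZ", "2020-01-01T00:00:00", "2020-01-02T00:00:00"),
...     ("IU", "ANMO", "00", "BHN", "2020-01-01T00:00:00", "2020-01-02T00:00:00"),
... ]
>>> inserted = db_manager.bulk_insert_archive_data(records)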
bulk_insert_arrival_data(arrival_list)[source]

Insert multiple arrival data records.

Parameters:

arrival_list (List[Tuple]) – List of tuples containing arrival data records.

Returns:

Number of inserted records.

Return type:

int

check_data_existence(netcode, stacode, location, channel, starttime, endtime)[source]

Check whether a database entry exists for a given trace.

Parameters:
  • netcode (str) – Network code

  • stacode (str) – Station code

  • location (str) – Location code

  • channel (str) – Channel code

  • starttime (str) – Start time in ISO format

  • endtime (str) – End time in ISO format

Returns:

True if data exists for the specified parameters, False otherwise

Return type:

bool

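Example

A sketch using ISO-formatted time strings, as the parameters require:

>>> db_manager.check_data_existence("IU", "ANMO", "00", "BHZ",
...                                 "2020-01-01T00:00:00", "2020-01-02T00:00:00")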
connection(max_retries=3, initial_delay=1)[source]

Context manager for database connections with retry mechanism.

Parameters:
  • max_retries (int) – Maximum number of connection retry attempts.

  • initial_delay (float) – Initial delay between retries in seconds.

Yields:

sqlite3.Connection – Database connection object.

Raises:

sqlite3.OperationalError – If database connection fails after all retries.

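Example

A sketch of the context-manager usage (the archive_data table is created by setup_database()):

>>> with db_manager.connection(max_retries=3) as conn:
...     cursor = conn.cursor()
...     cursor.execute("SELECT COUNT(*) FROM archive_data")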
delete_elements(table_name, start_time=0, end_time=4102444799)[source]

Delete elements from specified table within time range.

Parameters:
  • table_name (str) – Name of the table (‘archive_data’ or ‘arrival_data’).

  • start_time (Union[int, float, datetime, UTCDateTime]) – Start time for deletion range.

  • end_time (Union[int, float, datetime, UTCDateTime]) – End time for deletion range.

Returns:

Number of deleted rows.

Return type:

int

Raises:

ValueError – If table_name is invalid or time format is incorrect.

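Example

A sketch deleting one month of archive_data entries (any of the accepted time types works, e.g. UTCDateTime):

>>> from obspy import UTCDateTime
>>> deleted = db_manager.delete_elements("archive_data",
...                                      UTCDateTime("2020-01-01"),
...                                      UTCDateTime("2020-02-01"))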
display_contents(table_name, start_time=0, end_time=4102444799, limit=100)[source]

Display contents of a specified table within a given time range.

Parameters:
  • table_name (str) – Name of the table to query (‘archive_data’ or ‘arrival_data’).

  • start_time (Union[int, float, datetime, UTCDateTime]) – Start time for the query.

  • end_time (Union[int, float, datetime, UTCDateTime]) – End time for the query.

  • limit (int) – Maximum number of rows to return.

execute_query(query)[source]

Execute an SQL query and return results.

Parameters:

query (str) – SQL query to execute.

Returns:

  • bool: Whether an error occurred

  • str: Status message or error description

  • Optional[pd.DataFrame]: Results for SELECT queries, None otherwise

Return type:

Tuple containing

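Example

A sketch unpacking the documented (error, message, results) tuple:

>>> error, message, df = db_manager.execute_query(
...     "SELECT network, station FROM archive_data LIMIT 5")
>>> if not error and df is not None:
...     print(df)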
fetch_arrivals_distances(resource_id, netcode, stacode)[source]

Retrieve arrival times and distance metrics for a specific event and station.

Parameters:
  • resource_id (str) – Unique identifier for the seismic event.

  • netcode (str) – Network code for the station.

  • stacode (str) – Station code.

Returns:

Tuple containing

(p_arrival, s_arrival, dist_km, dist_deg, azimuth), where:
  • p_arrival: P wave arrival time (timestamp)

  • s_arrival: S wave arrival time (timestamp)

  • dist_km: Distance in kilometers

  • dist_deg: Distance in degrees

  • azimuth: Azimuth angle from event to station

Returns None if no matching record is found.

Return type:

Optional[Tuple[float, float, float, float, float]]

get_arrival_data(resource_id, netcode, stacode)[source]

Retrieve complete arrival data for a specific event and station.

Parameters:
  • resource_id (str) – Unique identifier for the seismic event.

  • netcode (str) – Network code for the station.

  • stacode (str) – Station code.

Returns:

Dictionary containing all arrival data fields for the specified event and station, or None if no matching record is found.

Return type:

Optional[Dict[str, Any]]

get_events_for_station(netcode, stacode)[source]

Retrieve all seismic events recorded by a specific station.

Parameters:
  • netcode (str) – Network code for the station.

  • stacode (str) – Station code.

Returns:

List of dictionaries containing arrival data for all events recorded by the station. Returns empty list if no events found.

Return type:

List[Dict[str, Any]]

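Example

A sketch; the exact keys of each dictionary depend on the arrival_data schema:

>>> events = db_manager.get_events_for_station("IU", "ANMO")
>>> print(f"{len(events)} events recorded by IU.ANMO")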
get_stations_for_event(resource_id)[source]

Retrieve all station data associated with a specific seismic event.

Parameters:

resource_id (str) – Unique identifier for the seismic event.

Returns:

List of dictionaries containing arrival data for all stations that recorded the event. Returns empty list if no stations found.

Return type:

List[Dict[str, Any]]

join_continuous_segments(gap_tolerance=30)[source]

Join continuous data segments in the database.

Parameters:

gap_tolerance (float) – Maximum allowed gap (in seconds) to consider segments continuous.

reindex_tables()[source]

Reindex both tables (archive_data and arrival_data) in the database.

setup_database()[source]

Initialize database schema with required tables and indices.

vacuum_database()[source]

Rebuild the database file to reclaim unused space.

seed_vault.service.db.miniseed_to_db_elements(file_path)[source]

Convert a miniseed file to a database element tuple.

Processes a miniseed file and extracts relevant metadata for database storage. Expects files in the format: network.station.location.channel.*.year.julday

Parameters:

file_path (str) – Path to the miniseed file.

Returns:

A tuple containing:
  • network: Network code

  • station: Station code

  • location: Location code

  • channel: Channel code

  • start_time: ISO format start time

  • end_time: ISO format end time

Returns None if file is invalid or cannot be processed.

Return type:

Optional[Tuple[str, str, str, str, str, str]]

Example

>>> element = miniseed_to_db_elements("/path/to/IU.ANMO.00.BHZ.D.2020.001")
>>> if element:
...     network, station, location, channel, start, end = element
seed_vault.service.db.populate_database_from_files(cursor, file_paths=[])[source]

Insert or update MiniSEED file metadata into an SQL database.

Takes a list of SDS archive file paths, extracts metadata, and updates a database tracking data availability. If data spans overlap with existing database entries, the spans are merged. Uses miniseed_to_db_elements() to parse file metadata.

Parameters:
  • cursor (sqlite3.Cursor) – Database cursor for executing SQL commands

  • file_paths (list, optional) – List of paths to MiniSeed files. Defaults to empty list.

Notes

  • Database must have an ‘archive_data’ table with columns:
    • network (text)

    • station (text)

    • location (text)

    • channel (text)

    • starttime (integer): Unix timestamp

    • endtime (integer): Unix timestamp

    • importtime (integer): Unix timestamp of database insertion

  • Handles overlapping time spans by merging them into a single entry

  • Sets importtime to current Unix timestamp

  • Skips files that fail metadata extraction (when miniseed_to_db_elements returns None)

Examples

>>> import sqlite3
>>> conn = sqlite3.connect('archive.db')
>>> cursor = conn.cursor()
>>> files = ['/path/to/IU.ANMO.00.BHZ.mseed', '/path/to/IU.ANMO.00.BHN.mseed']
>>> populate_database_from_files(cursor, files)
>>> conn.commit()
seed_vault.service.db.populate_database_from_files_dumb(cursor, file_paths=[])[source]

Simple version of database population from MiniSEED files without span merging.

A simplified “dumb” version that blindly replaces existing database entries with identical network/station/location/channel codes, rather than checking for and merging overlapping time spans.

Parameters:
  • cursor (sqlite3.Cursor) – Database cursor for executing SQL commands

  • file_paths (list, optional) – List of paths to MiniSeed files. Defaults to empty list.

seed_vault.service.db.populate_database_from_sds(sds_path, db_path, search_patterns=['??.*.*.???.?.????.???'], newer_than=None, num_processes=None, gap_tolerance=60)[source]

Scan an SDS archive directory and populate a database with data availability.

Recursively searches an SDS (Seismic Data Structure) archive for MiniSEED files, extracts their metadata, and records data availability in a SQLite database. Supports parallel processing and can optionally filter for recently modified files.

Parameters:
  • sds_path (str) – Path to the root SDS archive directory

  • db_path (str) – Path to the SQLite database file

  • search_patterns (list, optional) – List of file patterns to match. Defaults to [“??.*.*.???.?.????.???”] (standard SDS naming pattern).

  • newer_than (str or UTCDateTime, optional) – Only process files modified after this time. Defaults to None (process all files).

  • num_processes (int, optional) – Number of parallel processes to use. Defaults to None (use all available CPU cores).

  • gap_tolerance (int, optional) – Maximum time gap in seconds between segments that should be considered continuous. Defaults to 60.

Notes

  • Uses DatabaseManager class to handle database operations

  • Attempts multiprocessing but falls back to single process if it fails (common on OSX and Windows)

  • Follows symbolic links when walking directory tree

  • Files are processed using miniseed_to_db_elements() function

  • After insertion, continuous segments are joined based on gap_tolerance

  • Progress is displayed using tqdm progress bars

  • If newer_than is provided, it’s converted to a Unix timestamp for comparison

Raises:

RuntimeError – If bulk insertion into database fails

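Examples

A sketch with hypothetical paths, limiting work to recently modified files:

>>> from obspy import UTCDateTime
>>> populate_database_from_sds(
...     sds_path="/data/SDS",
...     db_path="/data/seedvault.sqlite",
...     newer_than=UTCDateTime("2024-01-01"),
...     num_processes=4)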
seed_vault.service.db.stream_to_db_elements(st)[source]

Convert an ObsPy Stream object to multiple database element tuples, properly handling gaps. Creates database elements from a stream, assuming all traces have the same Network-Station-Location-Channel (NSLC) codes (e.g. an SDS file).

Parameters:

st (Stream) – ObsPy Stream object containing seismic traces.

Returns:

A list of tuples, each containing:
  • network: Network code

  • station: Station code

  • location: Location code

  • channel: Channel code

  • start_time: ISO format start time

  • end_time: ISO format end time

Returns empty list if stream is empty.

Return type:

List[Tuple[str, str, str, str, str, str]]

Example

>>> stream = obspy.read()
>>> elements = stream_to_db_elements(stream)
>>> for element in elements:
...     network, station, location, channel, start, end = element

seed_vault.service.events module

The events service retrieves events based on selection (filter) settings. The UI should generate the selection and pass it here. We need a single function here that takes the selection and runs Rob’s script.

We should also be able to support multi-select areas.

@TODO: For now, dummy scripts are used. @Yunlong to fix.

seed_vault.service.events.event_response_to_df(data)[source]

@TODO: based on the response from FDSN, the below should be re-written

seed_vault.service.events.get_event_data(settings)[source]
seed_vault.service.events.remove_duplicate_events(events)[source]

seed_vault.service.gen_config_models module

seed_vault.service.seismoloader module

The main functions for SEED-vault, from the original CLI-only version (Pickle 2024).

class seed_vault.service.seismoloader.CustomConfigParser(*args, **kwargs)[source]

Bases: ConfigParser

Custom configuration parser that can preserve case sensitivity for specified sections.

This class extends the standard ConfigParser to allow certain sections to maintain case sensitivity while others are converted to lowercase.

case_sensitive_sections

Set of section names that should preserve case sensitivity.

Type:

set

Initialize the CustomConfigParser.

Parameters:
  • *args – Variable length argument list passed to ConfigParser.

  • **kwargs – Arbitrary keyword arguments passed to ConfigParser.

optionxform(optionstr)[source]

Transform option names during parsing.

Overrides the default behavior to preserve the original string case.

Parameters:

optionstr (str) – The option string to transform.

Returns:

The original string unchanged.

Return type:

str

seed_vault.service.seismoloader.archive_request(request, waveform_clients, sds_path, db_manager)[source]

Download seismic data for a request and archive it in SDS format.

Retrieves waveform data from FDSN web services, saves it in SDS format, and updates the database. Handles authentication, data merging, and various error conditions.

Parameters:
  • request (Tuple[str, str, str, str, str, str]) – Tuple containing (network, station, location, channel, start_time, end_time)

  • waveform_clients (Dict[str, Client]) – Dictionary mapping network codes to FDSN clients. Special key ‘open’ is used for default client.

  • sds_path (str) – Root path of the SDS archive.

  • db_manager (DatabaseManager) – DatabaseManager instance for updating the database.

Return type:

None

Note

  • Supports per-network and per-station authentication

  • Handles splitting of large station list requests

  • Performs data merging when files already exist

  • Attempts STEIM2 compression, falls back to uncompressed format

  • Groups traces by day to handle fragmented data efficiently

Example

>>> clients = {'IU': Client('IRIS'), 'open': Client('IRIS')}
>>> request = ("IU", "ANMO", "00", "BHZ", "2020-01-01", "2020-01-02")
>>> archive_request(request, clients, "/data/seismic", db_manager)
seed_vault.service.seismoloader.collect_requests(inv, time0, time1, days_per_request=3, cha_pref=None, loc_pref=None)[source]

Generate time-windowed data requests for all channels in an inventory.

Creates a list of data requests by breaking a time period into smaller windows and collecting station metadata for each window. Can optionally filter for preferred channels and location codes.

Parameters:
  • inv (obspy.core.inventory.Inventory) – Station inventory to generate requests for

  • time0 (obspy.UTCDateTime) – Start time for data requests

  • time1 (obspy.UTCDateTime) – End time for data requests

  • days_per_request (int, optional) – Length of each request window in days. Defaults to 3.

  • cha_pref (list, optional) – List of preferred channel codes in priority order. If provided, only these channels will be requested. Defaults to None.

  • loc_pref (list, optional) – List of preferred location codes in priority order. If provided, only these location codes will be requested. Defaults to None.

Returns:

List of tuples containing request parameters:
(network_code, station_code, location_code, channel_code, start_time_iso, end_time_iso)

Returns None if start time is greater than or equal to end time.

Return type:

list or None

Notes

  • End time is capped at 120 seconds before current time

  • Times in returned tuples are ISO formatted strings with ‘Z’ suffix

  • Uses get_preferred_channels() if cha_pref or loc_pref are specified

Examples

>>> from obspy import UTCDateTime
>>> t0 = UTCDateTime("2020-01-01")
>>> t1 = UTCDateTime("2020-01-10")
>>> requests = collect_requests(inventory, t0, t1,
...                           days_per_request=2,
...                           cha_pref=['HHZ', 'BHZ'],
...                           loc_pref=['', '00'])
seed_vault.service.seismoloader.collect_requests_event(eq, inv, model=None, settings=None)[source]

Collect data requests and arrival times for an event at multiple stations.

For a given earthquake event, calculates arrival times and generates data requests for all appropriate stations in the inventory.

Parameters:
  • eq (Event) – ObsPy Event object containing earthquake information.

  • inv (Inventory) – ObsPy Inventory object containing station information.

  • model (Optional[TauPyModel]) – Optional TauPyModel for travel time calculations. If None, uses model from settings or falls back to IASP91.

  • settings (Optional[SeismoLoaderSettings]) – Optional SeismoLoaderSettings object containing configuration.

Returns:

  • List of request tuples (net, sta, loc, chan, start, end)

  • List of arrival data tuples for database

  • Dictionary mapping “net.sta” to P-arrival timestamps

Return type:

Tuple containing

Note

Requires a DatabaseManager instance to check for existing arrivals. Time windows are constructed around P-wave arrivals using settings. Handles both new calculations and retrieving existing arrival times.

Example

>>> event = client.get_events()[0]
>>> inventory = client.get_stations(network="IU")
>>> requests, arrivals, p_times = collect_requests_event(
...     event, inventory, model=TauPyModel("iasp91")
... )
seed_vault.service.seismoloader.combine_requests(requests)[source]

Combine multiple data requests for efficiency.

Groups requests by network and time range, combining stations, locations, and channels into comma-separated lists to minimize the number of requests.

Parameters:

requests (List[Tuple[str, str, str, str, str, str]]) – List of request tuples, each containing: (network, station, location, channel, start_time, end_time)

Return type:

List[Tuple[str, str, str, str, str, str]]

Returns:

List of combined request tuples with the same structure but with station, location, and channel fields potentially containing comma-separated lists.

Example

>>> original = [
...     ("IU", "ANMO", "00", "BHZ", "2020-01-01", "2020-01-02"),
...     ("IU", "COLA", "00", "BHZ", "2020-01-01", "2020-01-02")
... ]
>>> combined = combine_requests(original)
>>> print(combined)
[("IU", "ANMO,COLA", "00", "BHZ", "2020-01-01", "2020-01-02")]
seed_vault.service.seismoloader.get_events(settings)[source]

Retrieve seismic event catalogs based on configured criteria.

Queries FDSN web services or loads local catalogs for seismic events matching specified criteria including time range, magnitude, depth, and geographic constraints.

Parameters:

settings (SeismoLoaderSettings) – Configuration settings containing event search criteria, client information, and filtering preferences.

Return type:

List[Catalog]

Returns:

List of ObsPy Catalog objects containing matching events. Returns empty catalog if no events found.

Raises:
  • FileNotFoundError – If local catalog file not found.

  • PermissionError – If unable to access local catalog file.

  • ValueError – If invalid geographic constraint type specified.

Example

>>> settings = SeismoLoaderSettings()
>>> settings.event.min_magnitude = 5.0
>>> catalogs = get_events(settings)
seed_vault.service.seismoloader.get_missing_from_request(db_manager, eq_id, requests, st)[source]

Compare requested seismic data against what’s present in a Stream. Handles comma-separated values for location and channel codes.

Return type:

dict

Parameters:
  • db_manager (DatabaseManager) – Database manager instance

  • eq_id (str) – Earthquake ID to use as dictionary key

  • requests (List[Tuple]) – List of request tuples, each containing (network, station, location, channel, starttime, endtime)

  • st (Stream) – ObsPy Stream object containing seismic traces

Returns:

Nested dictionary with structure {eq_id: {"network.station": value, "network2.station2": value2, …}}, where value is either:
  • a list of missing channel strings (“network.station.location.channel”)

  • “Not Attempted” if the stream is empty

  • “ALL” if all requested channels are missing

  • [] if all requested channels are present

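Example

A sketch with a hypothetical earthquake ID:

>>> requests = [("IU", "ANMO", "00", "BHZ,BHN", "2020-01-01", "2020-01-02")]
>>> missing = get_missing_from_request(db_manager, "evt123", requests, st)
>>> missing["evt123"]["IU.ANMO"]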
seed_vault.service.seismoloader.get_p_s_times(eq, dist_deg, ttmodel)[source]

Calculate theoretical P and S wave arrival times for an earthquake at a given distance.

Uses a travel time model to compute the first P and S wave arrivals for a given earthquake and distance. The first arrival (labeled as “P”) may not necessarily be a direct P wave. For S waves, only phases explicitly labeled as ‘S’ are considered.

Parameters:
  • eq (obspy.core.event.Event) – Earthquake event object containing origin time and depth information

  • dist_deg (float) – Distance between source and receiver in degrees

  • ttmodel (obspy.taup.TauPyModel) – Travel time model to use for calculations

Returns:

A tuple containing:
  • (UTCDateTime or None): Time of first arrival (“P” wave)

  • (UTCDateTime or None): Time of first S wave arrival

Returns (None, None) if travel time calculation fails.

Return type:

tuple

Notes

  • Earthquake depth is expected in meters in the QuakeML format and is converted to kilometers for the travel time calculations

  • For S waves, only searches for explicit ‘S’ phase arrivals

  • Warns if no P arrival is found at any distance

  • Warns if no S arrival is found at distances ≤ 90 degrees

Examples

>>> from obspy.taup import TauPyModel
>>> model = TauPyModel(model="iasp91")
>>> p_time, s_time = get_p_s_times(earthquake, 45.3, model)
seed_vault.service.seismoloader.get_preferred_channels(inv, cha_rank=None, loc_rank=None, time=None)[source]

Select the best available channels from an FDSN inventory based on rankings.

Filters an inventory to keep only the preferred channels based on channel code and location code rankings. For each component (Z, N, E), selects the channel with the highest ranking.

Parameters:
  • inv (Inventory) – ObsPy Inventory object to filter.

  • cha_rank (Optional[List[str]]) – List of channel codes in order of preference (e.g., [‘BH’, ‘HH’]). Lower index means higher preference.

  • loc_rank (Optional[List[str]]) – List of location codes in order of preference (e.g., [‘’, ‘00’]). Lower index means higher preference. ‘–’ is treated as empty string.

  • time (Optional[UTCDateTime]) – Optional time to filter channel availability at that time.

Return type:

Inventory

Returns:

Filtered ObsPy Inventory containing only the preferred channels. If all channels would be filtered out, returns original station.

Note

Channel preference takes precedence over location preference. If neither cha_rank nor loc_rank is provided, returns original inventory.

Example

>>> inventory = client.get_stations(network="IU", station="ANMO")
>>> cha_rank = ['BH', 'HH', 'EH']
>>> loc_rank = ['00', '10', '']
>>> filtered = get_preferred_channels(inventory, cha_rank, loc_rank)
seed_vault.service.seismoloader.get_selected_stations_at_channel_level(settings)[source]

Update inventory information to include channel-level details for selected stations.

Retrieves detailed channel information for each station in the selected inventory using the specified FDSN client.

Parameters:

settings (SeismoLoaderSettings) – Configuration settings containing station selection and client information.

Return type:

SeismoLoaderSettings

Returns:

Updated settings with refined station inventory including channel information.

Example

>>> settings = SeismoLoaderSettings()
>>> settings = get_selected_stations_at_channel_level(settings)
seed_vault.service.seismoloader.get_stations(settings)[source]

Retrieve station inventory based on configured criteria.

Gets station information from FDSN web services or local inventory based on settings, including geographic constraints, network/station filters, and channel preferences.

Parameters:

settings (SeismoLoaderSettings) – Configuration settings containing station selection criteria, client information, and filtering preferences.

Return type:

Optional[Inventory]

Returns:

Inventory containing matching stations, or None if no stations found or if station service is unavailable.

Note

The function applies several layers of filtering:
  1. Basic network/station/location/channel criteria

  2. Geographic constraints (if specified)

  3. Station exclusions/inclusions

  4. Channel and location preferences

  5. Sample rate filtering

Example

>>> settings = SeismoLoaderSettings()
>>> settings.station.network = "IU"
>>> inventory = get_stations(settings)
seed_vault.service.seismoloader.prune_requests(requests, db_manager, sds_path, min_request_window=3)[source]

Remove overlapping requests where data already exists in the archive.

Checks both the database and filesystem for existing data and removes or splits requests to avoid re-downloading data that should be there already.

Parameters:
  • requests (List[Tuple[str, str, str, str, str, str]]) – List of request tuples containing: (network, station, location, channel, start_time, end_time)

  • db_manager (DatabaseManager) – DatabaseManager instance for querying existing data.

  • sds_path (str) – Root path of the SDS archive.

  • min_request_window (float) – Minimum time window in seconds to keep a request. Requests shorter than this are discarded. Default is 3 seconds.

Return type:

List[Tuple[str, str, str, str, str, str]]

Returns:

List of pruned request tuples, sorted by start time, network, and station.

Note

This function will update the database if it finds files in the SDS structure that aren’t yet recorded in the database.

Example

>>> requests = [("IU", "ANMO", "00", "BHZ", "2020-01-01", "2020-01-02")]
>>> pruned = prune_requests(requests, db_manager, "/data/SDS")
seed_vault.service.seismoloader.read_config(config_file)[source]

Read and process a configuration file with case-sensitive handling for specific sections.

Reads a configuration file and processes it such that certain sections (AUTH, DATABASE, SDS, WAVEFORM) preserve their case sensitivity while other sections are converted to lowercase.

Parameters:

config_file (str) – Path to the configuration file to read.

Returns:

Processed configuration with appropriate case handling

for different sections.

Return type:

CustomConfigParser

Example

>>> config = read_config("config.ini")
>>> auth_value = config.get("AUTH", "ApiKey")  # Case preserved
>>> other_value = config.get("settings", "parameter")  # Converted to lowercase
seed_vault.service.seismoloader.run_continuous(settings, stop_event=None)[source]

Retrieves continuous seismic data over long time intervals for a set of stations defined in settings. The function manages multiple steps including generating data requests, pruning unnecessary requests based on existing data, combining requests for efficiency, and finally archiving the retrieved data.

The function uses a client setup based on the configuration in settings to handle different data sources and authentication methods. Errors during client creation or data retrieval are handled gracefully, with issues logged to the console.

Parameters:
  • settings (SeismoLoaderSettings) – Configuration settings containing client information, authentication details, and database paths necessary for data retrieval and storage. This should include the start and end times for data collection, the database path, and the SDS archive path, among other configurations.

  • stop_event (threading.Event) – Optional event flag for canceling the operation mid-execution. If provided and set, the function will terminate gracefully at the next safe point.

Workflow:
  1. Initialize clients for waveform data retrieval.

  2. Retrieve station information based on settings.

  3. Collect initial data requests for the given time interval.

  4. Prune requests based on existing data in the database to avoid redundancy.

  5. Combine similar requests to minimize the number of individual operations.

  6. Update or create clients based on specific network credentials if necessary.

  7. Execute data retrieval requests, archive data to disk, and update the database.

Raises:

Exception – General exceptions could be raised due to misconfiguration, unsuccessful data retrieval, or client initialization errors. These exceptions are caught and logged, but not re-raised, allowing the process to continue with other requests.

Notes
  • It is crucial to ensure that the settings object is correctly configured, especially the client details and authentication credentials, to avoid runtime errors.

  • The function logs detailed information about the processing steps and errors to aid in debugging and monitoring of data retrieval processes.

seed_vault.service.seismoloader.run_event(settings, stop_event=None)[source]

Processes and downloads seismic event data for each event in the provided catalog using the specified settings and station inventory. The function manages multiple steps including data requests, arrival time calculations, database updates, and data retrieval.

The function handles data retrieval from FDSN web services with support for authenticated access and restricted data. Processing can be interrupted via the stop_event parameter, and errors during execution are handled gracefully with detailed logging.

Parameters:
  • settings (SeismoLoaderSettings) – Configuration settings that include client details, authentication credentials, event-specific parameters like radius and time window, and paths for data storage.

  • stop_event (threading.Event) – Optional event flag for canceling the operation mid-execution. If provided and set, the function will terminate gracefully at the next safe point.

Workflow:
  1. Initialize paths and database connections

  2. Load the appropriate travel time model for arrival calculations

  3. Process each event in the catalog:
     1. Calculate arrival times and generate data requests

     2. Update arrival information in the database

     3. Check for existing data and prune redundant requests

     4. Download and archive new data

     5. Add event metadata to traces (arrivals, distances, azimuths)

  4. Combine data into event streams with complete metadata

Returns:

List[obspy.Stream] – List of streams, each containing data for one event with complete metadata including arrival times, distances, and azimuths. Returns None if the operation is canceled or no data is processed.

Raises:

Exception – General exceptions from client creation, data retrieval, or processing are caught and logged but not re-raised, allowing processing to continue with remaining events.

Notes
  • The function supports threading and can be safely interrupted via stop_event

  • Station metadata is enriched with event-specific information including arrivals

  • Data is archived in SDS format and the database is updated accordingly

  • Each stream in the output includes complete event metadata for analysis

seed_vault.service.seismoloader.run_main(settings=None, from_file=None, stop_event=None)[source]

Main entry point for seismic data retrieval and processing.

Coordinates the overall workflow for retrieving and processing seismic data, handling both continuous and event-based data collection based on settings.

Parameters:
  • settings (Optional[SeismoLoaderSettings]) – Configuration settings for data retrieval and processing. If None, settings must be provided via from_file.

  • from_file (Optional[str]) – Path to configuration file to load settings from. Only used if settings is None.

  • stop_event (Optional[Event]) – Optional event flag for canceling the operation mid-execution. If provided and set, the function will terminate gracefully at the next safe point.

Return type:

None

Returns:

The result from run_continuous or run_event, or None if cancelled.

Example

>>> # Using settings object
>>> settings = SeismoLoaderSettings()
>>> settings.download_type = DownloadType.EVENT
>>> run_main(settings)
>>> # Using configuration file
>>> run_main(from_file="config.ini")
seed_vault.service.seismoloader.select_highest_samplerate(inv, minSR=10, time=None)[source]

Filters an inventory to keep only the highest sample rate channels where duplicates exist.

For each station in the inventory, this function identifies duplicate channels (those sharing the same location code) and keeps only those with the highest sample rate. Channels must meet the minimum sample rate requirement to be considered.

Parameters:
  • inv (obspy.core.inventory.Inventory) – Input inventory object

  • minSR (float, optional) – Minimum sample rate in Hz. Defaults to 10.

  • time (obspy.UTCDateTime, optional) – Specific time to check channel existence. If provided, channels are considered duplicates if they share the same location code and both exist at that time. If None, channels are considered duplicates if they share the same location code and time span. Defaults to None.

Returns:

Filtered inventory containing only the highest sample rate channels where duplicates existed.

Return type:

obspy.core.inventory.Inventory

Examples

>>> # Filter inventory keeping only highest sample rate channels
>>> filtered_inv = select_highest_samplerate(inv)
>>>
>>> # Filter for a specific time, minimum 1 Hz
>>> from obspy import UTCDateTime
>>> time = UTCDateTime("2020-01-01")
>>> filtered_inv = select_highest_samplerate(inv, minSR=1, time=time)

Notes

  • Channel duplicates are determined by location code and either:
    • Existence at a specific time (if time is provided)

    • Having identical time spans (if time is None)

  • All retained channels must have sample rates >= minSR

  • For duplicate channels, all channels with the highest sample rate are kept

seed_vault.service.seismoloader.setup_paths(settings)[source]

Initialize paths and database for seismic data management.

Parameters:

settings (SeismoLoaderSettings) – Configuration settings containing paths and database information.

Returns:

  • Updated settings with validated paths

  • Initialized DatabaseManager instance

Return type:

Tuple containing

Raises:

ValueError – If SDS path is not set in settings.

Example

>>> settings = SeismoLoaderSettings()
>>> settings.sds_path = "/data/seismic"
>>> settings, db_manager = setup_paths(settings)

seed_vault.service.stations module

The stations service retrieves stations based on selection (filter) settings. The UI should generate the selection and pass it here. We need a single function here that takes the selection and runs Rob’s script.

We should also be able to support multi-select areas.

@TODO: For now, dummy scripts are used. @Yunlong to fix.

seed_vault.service.stations.get_station_data(settings)[source]
seed_vault.service.stations.remove_duplicate_inventories(inventories)[source]
seed_vault.service.stations.station_response_to_df(inventory)[source]

Convert ObsPy Inventory data into a DataFrame with station information.

seed_vault.service.utils module

seed_vault.service.utils.check_client_services(client_name)[source]

Check which services are available for a given client name.

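Example

A sketch; the structure of the returned value is not documented here:

>>> services = check_client_services("IRIS")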
seed_vault.service.utils.convert_to_datetime(value)[source]

Convert a string or other value to a date and time object, handling different formats.

If only a date is provided, it defaults to 00:00:00 time.

Note that this returns a tuple of (date, time).

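Example

A sketch of the documented (date, time) tuple return:

>>> d, t = convert_to_datetime("2020-01-01")  # time defaults to 00:00:00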
seed_vault.service.utils.filter_catalog_by_geo_constraints(catalog, constraints)[source]

Filter an ObsPy event catalog to include events within ANY of the original search constraints. This cleans up any superfluous events that our reduced get_event calls may have introduced.

Return type:

Catalog

Parameters:
  • catalog (obspy.core.event.Catalog) – The input event catalog to filter

  • constraints – settings.event.geo_constraint (whatever object type this is TODO)

Returns:

A new catalog containing events within any of the specified circles

seed_vault.service.utils.filter_inventory_by_geo_constraints(inventory, constraints)[source]

Filter an ObsPy inventory to include stations within ANY of the original search constraints.

Return type:

Inventory

Parameters:
  • inventory (obspy.Inventory) – The input inventory to filter

  • constraints – settings.event.geo_constraint; list of geographical constraints

Returns:

A new inventory containing only stations within any of the specified constraints

seed_vault.service.utils.format_error(station, error)[source]
seed_vault.service.utils.get_sds_filenames(n, s, l, c, time_start, time_end, sds_path)[source]

Generate SDS (SeisComP Data Structure) format filenames for a time range.

Creates a list of daily SDS format filenames for given network, station, location, and channel codes over a specified time period.

Parameters:
  • n (str) – Network code.

  • s (str) – Station code.

  • l (str) – Location code.

  • c (str) – Channel code.

  • time_start (UTCDateTime) – Start time for data requests.

  • time_end (UTCDateTime) – End time for data requests.

  • sds_path (str) – Root path of the SDS archive.

Returns:

List of SDS format filepaths in the form /sds_path/YEAR/NETWORK/STATION/CHANNEL.D/NET.STA.LOC.CHA.D.YEAR.DOY

Return type:

List[str]

Example

>>> paths = get_sds_filenames(
...     "IU", "ANMO", "00", "BHZ",
...     UTCDateTime("2020-01-01"),
...     UTCDateTime("2020-01-03"),
...     "/data/seismic"
... )
seed_vault.service.utils.get_time_interval(interval_type, amount=1)[source]

Get the current date-time and the date-time that is amount intervals earlier.

Parameters:
  • interval_type (str) – One of [‘hour’, ‘day’, ‘week’, ‘month’]

  • amount (int) – Number of intervals to go back (default is 1)

Returns:

(current_datetime, past_datetime)

Return type:

tuple

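Example

A sketch of the documented (current_datetime, past_datetime) return:

>>> now, two_weeks_ago = get_time_interval("week", amount=2)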
seed_vault.service.utils.is_in_enum(item, enum_class)[source]
seed_vault.service.utils.parse_inv(inv)[source]

Return 4 lists (net, sta, loc, cha) detailing the contents of an ObsPy inventory file

Parameters:

inv (Inventory) – ObsPy Inventory object

Returns:

Four lists containing all network, station, location, and channel codes

Return type:

tuple

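Example

A sketch unpacking the four documented lists:

>>> nets, stas, locs, chas = parse_inv(inventory)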
seed_vault.service.utils.remove_duplicate_events(catalog)[source]

Remove duplicate events from an ObsPy Catalog based on resource IDs.

Takes a catalog of earthquake events and returns a new catalog containing only unique events, where uniqueness is determined by the event’s resource_id. The first occurrence of each resource_id is kept.

Parameters:

catalog (obspy.core.event.Catalog) – Input catalog containing earthquake events

Returns:

New catalog containing only unique events

Return type:

obspy.core.event.Catalog

Examples

>>> from obspy import read_events
>>> cat = read_events('events.xml')
>>> unique_cat = remove_duplicate_events(cat)
>>> print(f"Removed {len(cat) - len(unique_cat)} duplicate events")
seed_vault.service.utils.shift_time(reftime, interval_type, amount=1)[source]

Shift time by amount intervals relative to reftime.

Parameters:
  • reftime (datetime) – Reference time

  • interval_type (str) – One of [‘hour’, ‘day’, ‘week’, ‘month’, ‘year’]

  • amount (int) – Number of intervals to shift (positive = forward, negative = backward)

Returns:

The new datetime after the shift, capped at current time if shifting forward

Return type:

datetime

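Example

A sketch shifting a reference time two weeks backward:

>>> from datetime import datetime
>>> earlier = shift_time(datetime(2024, 1, 1), "week", -2)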
seed_vault.service.utils.to_timestamp(time_obj)[source]

Convert various time objects to Unix timestamp.

Parameters:

time_obj (Union[int, float, datetime, UTCDateTime]) – Time object to convert. Can be one of:
  • int/float: Already a timestamp

  • datetime: Python datetime object

  • UTCDateTime: ObsPy UTCDateTime object

Returns:

Unix timestamp (seconds since epoch).

Return type:

float

Raises:

ValueError – If the input time object type is not supported.

Example

>>> ts = to_timestamp(datetime.now())
>>> ts = to_timestamp(UTCDateTime())
>>> ts = to_timestamp(1234567890.0)

seed_vault.service.waveform module

seed_vault.service.waveform.check_is_archived(cursor, req)[source]
seed_vault.service.waveform.get_local_waveform(request, settings)[source]

Get waveform data from a local client, handling comma-separated values for network, station, location, and channel fields. Unlike remote requests, a local SDS client does not handle comma-separated lists itself.

Parameters:
  • request (Tuple[str, str, str, str, str, str]) – Tuple containing (network, station, location, channel, starttime, endtime)

  • settings (SeismoLoaderSettings) – Settings object containing SDS path

Return type:

Optional[Stream]

Returns:

Stream object containing requested waveform data, or None if no data found

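Example

A sketch assuming settings is a SeismoLoaderSettings with its SDS path configured:

>>> request = ("IU", "ANMO", "00", "BHZ,BHN",
...            "2020-01-01T00:00:00", "2020-01-02T00:00:00")
>>> st = get_local_waveform(request, settings)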
seed_vault.service.waveform.get_local_waveform_OLD(request, settings)[source]
seed_vault.service.waveform.stream_to_dataframe(stream)[source]

Module contents