seed_vault.service package

Submodules

seed_vault.service.db module

Database management module for the SEED-vault archive

This module provides a DatabaseManager class for handling seismic data storage in SQLite, including archive data and arrival data. It implements connection management, data insertion, querying, and database maintenance operations.

class seed_vault.service.db.DatabaseManager(db_path)[source]

Bases: object

Manages seismic data storage and retrieval using SQLite.

This class handles database connections, table creation, data insertion, and querying for seismic archive and arrival data.

db_path

Path to the SQLite database file.

Type:

str

Initialize DatabaseManager with database path.

Parameters:

db_path (str) – Path where the SQLite database should be created/accessed.

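Example

A minimal usage sketch (the database path here is hypothetical):

>>> from seed_vault.service.db import DatabaseManager
>>> db_manager = DatabaseManager("/data/seedvault.sqlite")
>>> db_manager.display_contents("archive_data", limit=10)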
analyze_table(table_name)[source]

Update table statistics for query optimization.

Parameters:

table_name (str) – Name of the table to analyze.

bulk_insert_archive_data(archive_list)[source]

Insert multiple archive data records.

Parameters:

archive_list (List[Tuple]) – List of tuples containing archive data records.

Returns:

Number of inserted records.

Return type:

int

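Example

A sketch assuming each tuple follows the archive_data column order (network, station, location, channel, starttime, endtime), as produced by miniseed_to_db_elements():

>>> records = [
...     ("IU", "ANMO", "00", "BHZ", "2020-01-01T00:00:00", "2020-01-02T00:00:00"),
...     ("IU", "ANMO", "00", "BHN", "2020-01-01T00:00:00", "2020-01-02T00:00:00"),
... ]
>>> inserted = db_manager.bulk_insert_archive_data(records)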
bulk_insert_arrival_data(arrival_list)[source]

Insert multiple arrival data records.

Parameters:

arrival_list (List[Tuple]) – List of tuples containing arrival data records.

Returns:

Number of inserted records.

Return type:

int

check_data_existence(netcode, stacode, location, channel, starttime, endtime)[source]

Check whether a database entry exists for a given trace.

Parameters:
  • netcode (str) – Network code

  • stacode (str) – Station code

  • location (str) – Location code

  • channel (str) – Channel code

  • starttime (str) – Start time in ISO format

  • endtime (str) – End time in ISO format

Returns:

True if data exists for the specified parameters, False otherwise

Return type:

bool

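Example

A sketch using ISO-formatted time strings, as the parameters require:

>>> db_manager.check_data_existence("IU", "ANMO", "00", "BHZ",
...                                 "2020-01-01T00:00:00", "2020-01-02T00:00:00")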
connection(max_retries=3, initial_delay=1)[source]

Context manager for database connections with retry mechanism.

Parameters:
  • max_retries (int) – Maximum number of connection retry attempts.

  • initial_delay (float) – Initial delay between retries in seconds.

Yields:

sqlite3.Connection – Database connection object.

Raises:

sqlite3.OperationalError – If database connection fails after all retries.

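Example

A sketch of the context-manager usage (the archive_data table is created by setup_database()):

>>> with db_manager.connection(max_retries=3) as conn:
...     cursor = conn.cursor()
...     cursor.execute("SELECT COUNT(*) FROM archive_data")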
delete_elements(table_name, start_time=0, end_time=4102444799)[source]

Delete elements from specified table within time range.

Parameters:
  • table_name (str) – Name of the table (‘archive_data’ or ‘arrival_data’).

  • start_time (Union[int, float, datetime, UTCDateTime]) – Start time for deletion range.

  • end_time (Union[int, float, datetime, UTCDateTime]) – End time for deletion range.

Returns:

Number of deleted rows.

Return type:

int

Raises:

ValueError – If table_name is invalid or time format is incorrect.

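Example

A sketch deleting one month of archive_data entries (any of the accepted time types works, e.g. UTCDateTime):

>>> from obspy import UTCDateTime
>>> deleted = db_manager.delete_elements("archive_data",
...                                      UTCDateTime("2020-01-01"),
...                                      UTCDateTime("2020-02-01"))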
display_contents(table_name, start_time=0, end_time=4102444799, limit=100)[source]

Display contents of a specified table within a given time range.

Parameters:
  • table_name (str) – Name of the table to query (‘archive_data’ or ‘arrival_data’).

  • start_time (Union[int, float, datetime, UTCDateTime]) – Start time for the query.

  • end_time (Union[int, float, datetime, UTCDateTime]) – End time for the query.

  • limit (int) – Maximum number of rows to return.

execute_query(query)[source]

Execute an SQL query and return results.

Parameters:

query (str) – SQL query to execute.

Returns:

  • bool: Whether an error occurred

  • str: Status message or error description

  • Optional[pd.DataFrame]: Results for SELECT queries, None otherwise

Return type:

Tuple containing

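Example

A sketch unpacking the documented (error, message, results) tuple:

>>> error, message, df = db_manager.execute_query(
...     "SELECT network, station FROM archive_data LIMIT 5")
>>> if not error and df is not None:
...     print(df)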
fetch_arrivals_distances(resource_id, netcode, stacode)[source]

Retrieve arrival times and distance metrics for a specific event and station.

Parameters:
  • resource_id (str) – Unique identifier for the seismic event.

  • netcode (str) – Network code for the station.

  • stacode (str) – Station code.

Returns:

Tuple containing

(p_arrival, s_arrival, dist_km, dist_deg, azimuth), where:
  • p_arrival: P wave arrival time (timestamp)

  • s_arrival: S wave arrival time (timestamp)

  • dist_km: Distance in kilometers

  • dist_deg: Distance in degrees

  • azimuth: Azimuth angle from event to station

Returns None if no matching record is found.

Return type:

Optional[Tuple[float, float, float, float, float]]

get_arrival_data(resource_id, netcode, stacode)[source]

Retrieve complete arrival data for a specific event and station.

Parameters:
  • resource_id (str) – Unique identifier for the seismic event.

  • netcode (str) – Network code for the station.

  • stacode (str) – Station code.

Returns:

Dictionary containing all arrival data fields for the specified event and station, or None if no matching record is found.

Return type:

Optional[Dict[str, Any]]

get_events_for_station(netcode, stacode)[source]

Retrieve all seismic events recorded by a specific station.

Parameters:
  • netcode (str) – Network code for the station.

  • stacode (str) – Station code.

Returns:

List of dictionaries containing arrival data for all events recorded by the station. Returns empty list if no events found.

Return type:

List[Dict[str, Any]]

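Example

A sketch; the exact keys of each dictionary depend on the arrival_data schema:

>>> events = db_manager.get_events_for_station("IU", "ANMO")
>>> print(f"{len(events)} events recorded by IU.ANMO")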
get_stations_for_event(resource_id)[source]

Retrieve all station data associated with a specific seismic event.

Parameters:

resource_id (str) – Unique identifier for the seismic event.

Returns:

List of dictionaries containing arrival data for all stations that recorded the event. Returns empty list if no stations found.

Return type:

List[Dict[str, Any]]

join_continuous_segments(gap_tolerance=30)[source]

Join continuous data segments in the database.

Parameters:

gap_tolerance (float) – Maximum allowed gap (in seconds) to consider segments continuous.

reindex_tables()[source]

Reindex both tables (archive_data and arrival_data) in the database.

setup_database()[source]

Initialize database schema with required tables and indices.

vacuum_database()[source]

Rebuild the database file to reclaim unused space.

seed_vault.service.db.miniseed_to_db_elements(file_path)[source]

Convert a miniseed file to a database element tuple.

Processes a miniseed file and extracts relevant metadata for database storage. Expects files in the format: network.station.location.channel.*.year.julday

Parameters:

file_path (str) – Path to the miniseed file.

Returns:

A tuple containing:
  • network: Network code

  • station: Station code

  • location: Location code

  • channel: Channel code

  • start_time: ISO format start time

  • end_time: ISO format end time

Returns None if file is invalid or cannot be processed.

Return type:

Optional[Tuple[str, str, str, str, str, str]]

Example

>>> element = miniseed_to_db_elements("/path/to/IU.ANMO.00.BHZ.D.2020.001")
>>> if element:
...     network, station, location, channel, start, end = element
seed_vault.service.db.populate_database_from_files(cursor, file_paths=[])[source]

Insert or update MiniSEED file metadata into an SQL database.

Takes a list of SDS archive file paths, extracts metadata, and updates a database tracking data availability. If data spans overlap with existing database entries, the spans are merged. Uses miniseed_to_db_elements() to parse file metadata.

Parameters:
  • cursor (sqlite3.Cursor) – Database cursor for executing SQL commands

  • file_paths (list, optional) – List of paths to MiniSeed files. Defaults to empty list.

Notes

  • Database must have an ‘archive_data’ table with columns:
    • network (text)

    • station (text)

    • location (text)

    • channel (text)

    • starttime (integer): Unix timestamp

    • endtime (integer): Unix timestamp

    • importtime (integer): Unix timestamp of database insertion

  • Handles overlapping time spans by merging them into a single entry

  • Sets importtime to current Unix timestamp

  • Skips files that fail metadata extraction (when miniseed_to_db_elements returns None)

Examples

>>> import sqlite3
>>> conn = sqlite3.connect('archive.db')
>>> cursor = conn.cursor()
>>> files = ['/path/to/IU.ANMO.00.BHZ.mseed', '/path/to/IU.ANMO.00.BHN.mseed']
>>> populate_database_from_files(cursor, files)
>>> conn.commit()
seed_vault.service.db.populate_database_from_files_dumb(cursor, file_paths=[])[source]

Simple version of database population from MiniSEED files without span merging.

A simplified “dumb” version that blindly replaces existing database entries with identical network/station/location/channel codes, rather than checking for and merging overlapping time spans.

Parameters:
  • cursor (sqlite3.Cursor) – Database cursor for executing SQL commands

  • file_paths (list, optional) – List of paths to MiniSeed files. Defaults to empty list.

seed_vault.service.db.populate_database_from_sds(sds_path, db_path, search_patterns=['??.*.*.???.?.????.???'], newer_than=None, num_processes=None, gap_tolerance=60)[source]

Scan an SDS archive directory and populate a database with data availability.

Recursively searches an SDS (Seismic Data Structure) archive for MiniSEED files, extracts their metadata, and records data availability in a SQLite database. Supports parallel processing and can optionally filter for recently modified files.

Parameters:
  • sds_path (str) – Path to the root SDS archive directory

  • db_path (str) – Path to the SQLite database file

  • search_patterns (list, optional) – List of file patterns to match. Defaults to [“??.*.*.???.?.????.???”] (standard SDS naming pattern).

  • newer_than (str or UTCDateTime, optional) – Only process files modified after this time. Defaults to None (process all files).

  • num_processes (int, optional) – Number of parallel processes to use. Defaults to None (use all available CPU cores).

  • gap_tolerance (int, optional) – Maximum time gap in seconds between segments that should be considered continuous. Defaults to 60.

Notes

  • Uses DatabaseManager class to handle database operations

  • Attempts multiprocessing but falls back to single process if it fails (common on OSX and Windows)

  • Follows symbolic links when walking directory tree

  • Files are processed using miniseed_to_db_elements() function

  • After insertion, continuous segments are joined based on gap_tolerance

  • Progress is displayed using tqdm progress bars

  • If newer_than is provided, it’s converted to a Unix timestamp for comparison

Raises:

RuntimeError – If bulk insertion into database fails

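Examples

A sketch with hypothetical paths, limiting work to recently modified files:

>>> from obspy import UTCDateTime
>>> populate_database_from_sds(
...     sds_path="/data/SDS",
...     db_path="/data/seedvault.sqlite",
...     newer_than=UTCDateTime("2024-01-01"),
...     num_processes=4)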
seed_vault.service.db.stream_to_db_elements(st)[source]

Convert an ObsPy Stream object to multiple database element tuples, properly handling gaps. Creates database elements from a stream, assuming all traces have the same Network-Station-Location-Channel (NSLC) codes (e.g. an SDS file).

Parameters:

st (Stream) – ObsPy Stream object containing seismic traces.

Returns:

A list of tuples, each containing:
  • network: Network code

  • station: Station code

  • location: Location code

  • channel: Channel code

  • start_time: ISO format start time

  • end_time: ISO format end time

Returns empty list if stream is empty.

Return type:

List[Tuple[str, str, str, str, str, str]]

Example

>>> stream = obspy.read()
>>> elements = stream_to_db_elements(stream)
>>> for element in elements:
...     network, station, location, channel, start, end = element

seed_vault.service.events module

The events service retrieves events based on selection (filter) settings. The UI should generate the selection and pass it here. We need a single function here that takes the selection and runs Rob’s script.

We should also be able to support multi-select areas.

@TODO: For now, dummy scripts are used. @Yunlong to fix.

seed_vault.service.events.event_response_to_df(data)[source]

@TODO: based on the response from FDSN, the below should be re-written

seed_vault.service.events.get_event_data(settings)[source]
seed_vault.service.events.remove_duplicate_events(events)[source]

seed_vault.service.gen_config_models module

seed_vault.service.seismoloader module

The main functions for SEED-vault, from the original CLI-only version (Pickle 2024).

class seed_vault.service.seismoloader.CustomConfigParser(*args, **kwargs)[source]

Bases: ConfigParser

Custom configuration parser that can preserve case sensitivity for specified sections.

This class extends the standard ConfigParser to allow certain sections to maintain case sensitivity while others are converted to lowercase.

case_sensitive_sections

Set of section names that should preserve case sensitivity.

Type:

set

Initialize the CustomConfigParser.

Parameters:
  • *args – Variable length argument list passed to ConfigParser.

  • **kwargs – Arbitrary keyword arguments passed to ConfigParser.

optionxform(optionstr)[source]

Transform option names during parsing.

Overrides the default behavior to preserve the original string case.

Parameters:

optionstr (str) – The option string to transform.

Returns:

The original string unchanged.

Return type:

str

seed_vault.service.seismoloader.archive_request(request, waveform_clients, sds_path, db_manager)[source]

Download seismic data for a request and archive it in SDS format.

Retrieves waveform data from FDSN web services, saves it in SDS format, and updates the database. Handles authentication, data merging, and various error conditions.

Parameters:
  • request (Tuple[str, str, str, str, str, str]) – Tuple containing (network, station, location, channel, start_time, end_time)

  • waveform_clients (Dict[str, Client]) – Dictionary mapping network codes to FDSN clients. Special key ‘open’ is used for default client.

  • sds_path (str) – Root path of the SDS archive.

  • db_manager (DatabaseManager) – DatabaseManager instance for updating the database.

Return type:

None

Note

  • Supports per-network and per-station authentication

  • Handles splitting of large station list requests

  • Performs data merging when files already exist

  • Attempts STEIM2 compression, falls back to uncompressed format

  • Groups traces by day to handle fragmented data efficiently

Example

>>> clients = {'IU': Client('IRIS'), 'open': Client('IRIS')}
>>> request = ("IU", "ANMO", "00", "BHZ", "2020-01-01", "2020-01-02")
>>> archive_request(request, clients, "/data/seismic", db_manager)
seed_vault.service.seismoloader.collect_requests(inv, time0, time1, days_per_request=3, cha_pref=None, loc_pref=None)[source]

Generate time-windowed data requests for all channels in an inventory.

Creates a list of data requests by breaking a time period into smaller windows and collecting station metadata for each window. Can optionally filter for preferred channels and location codes.

Parameters:
  • inv (obspy.core.inventory.Inventory) – Station inventory to generate requests for

  • time0 (obspy.UTCDateTime) – Start time for data requests

  • time1 (obspy.UTCDateTime) – End time for data requests

  • days_per_request (int, optional) – Length of each request window in days. Defaults to 3.

  • cha_pref (list, optional) – List of preferred channel codes in priority order. If provided, only these channels will be requested. Defaults to None.

  • loc_pref (list, optional) – List of preferred location codes in priority order. If provided, only these location codes will be requested. Defaults to None.

Returns:

List of tuples containing request parameters:
(network_code, station_code, location_code, channel_code, start_time_iso, end_time_iso)

Returns None if start time is greater than or equal to end time.

Return type:

list or None

Notes

  • End time is capped at 120 seconds before current time

  • Times in returned tuples are ISO formatted strings with ‘Z’ suffix

  • Uses get_preferred_channels() if cha_pref or loc_pref are specified

Examples

>>> from obspy import UTCDateTime
>>> t0 = UTCDateTime("2020-01-01")
>>> t1 = UTCDateTime("2020-01-10")
>>> requests = collect_requests(inventory, t0, t1,
...                           days_per_request=2,
...                           cha_pref=['HHZ', 'BHZ'],
...                           loc_pref=['', '00'])
seed_vault.service.seismoloader.collect_requests_event(eq, inv, model=None, settings=None)[source]

Collect data requests and arrival times for an event at multiple stations.

For a given earthquake event, calculates arrival times and generates data requests for all appropriate stations in the inventory.

Parameters:
  • eq (Event) – ObsPy Event object containing earthquake information.

  • inv (Inventory) – ObsPy Inventory object containing station information.

  • model (Optional[TauPyModel]) – Optional TauPyModel for travel time calculations. If None, uses model from settings or falls back to IASP91.

  • settings (Optional[SeismoLoaderSettings]) – Optional SeismoLoaderSettings object containing configuration.

Returns:

  • List of request tuples (net, sta, loc, chan, start, end)

  • List of arrival data tuples for database

  • Dictionary mapping “net.sta” to P-arrival timestamps

Return type:

Tuple containing

Note

Requires a DatabaseManager instance to check for existing arrivals. Time windows are constructed around P-wave arrivals using settings. Handles both new calculations and retrieving existing arrival times.

Example

>>> event = client.get_events()[0]
>>> inventory = client.get_stations(network="IU")
>>> requests, arrivals, p_times = collect_requests_event(
...     event, inventory, model=TauPyModel("iasp91")
... )
seed_vault.service.seismoloader.combine_requests(requests)[source]

Combine multiple data requests for efficiency.

Groups requests by network and time range, combining stations, locations, and channels into comma-separated lists to minimize the number of requests.

Parameters:

requests (List[Tuple[str, str, str, str, str, str]]) – List of request tuples, each containing: (network, station, location, channel, start_time, end_time)

Return type:

List[Tuple[str, str, str, str, str, str]]

Returns:

List of combined request tuples with the same structure but with station, location, and channel fields potentially containing comma-separated lists.

Example

>>> original = [
...     ("IU", "ANMO", "00", "BHZ", "2020-01-01", "2020-01-02"),
...     ("IU", "COLA", "00", "BHZ", "2020-01-01", "2020-01-02")
... ]
>>> combined = combine_requests(original)
>>> print(combined)
[("IU", "ANMO,COLA", "00", "BHZ", "2020-01-01", "2020-01-02")]
seed_vault.service.seismoloader.get_events(settings)[source]

Retrieve seismic event catalogs based on configured criteria.

Queries FDSN web services or loads local catalogs for seismic events matching specified criteria including time range, magnitude, depth, and geographic constraints.

Parameters:

settings (SeismoLoaderSettings) – Configuration settings containing event search criteria, client information, and filtering preferences.

Return type:

List[Catalog]

Returns:

List of ObsPy Catalog objects containing matching events. Returns empty catalog if no events found.

Raises:
  • FileNotFoundError – If local catalog file not found.

  • PermissionError – If unable to access local catalog file.

  • ValueError – If invalid geographic constraint type specified.

Example

>>> settings = SeismoLoaderSettings()
>>> settings.event.min_magnitude = 5.0
>>> catalogs = get_events(settings)
seed_vault.service.seismoloader.get_missing_from_request(db_manager, eq_id, requests, st)[source]

Compare requested seismic data against what’s present in a Stream. Handles comma-separated values for location and channel codes.

Return type:

dict

Parameters:
  • db_manager (DatabaseManager) – Database manager instance

  • eq_id (str) – Earthquake ID to use as dictionary key

  • requests (List[Tuple]) – List of request tuples, each containing (network, station, location, channel, starttime, endtime)

  • st (Stream) – ObsPy Stream object containing seismic traces

Returns:

Nested dictionary with structure {eq_id: {"network.station": value, "network2.station2": value2, …}}, where value is either:
  • a list of missing channel strings (“network.station.location.channel”)

  • “Not Attempted” if the stream is empty

  • “ALL” if all requested channels are missing

  • [] if all requested channels are present

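Example

A sketch with a hypothetical earthquake ID:

>>> requests = [("IU", "ANMO", "00", "BHZ,BHN", "2020-01-01", "2020-01-02")]
>>> missing = get_missing_from_request(db_manager, "evt123", requests, st)
>>> missing["evt123"]["IU.ANMO"]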
seed_vault.service.seismoloader.get_p_s_times(eq, dist_deg, ttmodel)[source]

Calculate theoretical P and S wave arrival times for an earthquake at a given distance.

Uses a travel time model to compute the first P and S wave arrivals for a given earthquake and distance. The first arrival (labeled as “P”) may not necessarily be a direct P wave. For S waves, only phases explicitly labeled as ‘S’ are considered.

Parameters:
  • eq (obspy.core.event.Event) – Earthquake event object containing origin time and depth information

  • dist_deg (float) – Distance between source and receiver in degrees

  • ttmodel (obspy.taup.TauPyModel) – Travel time model to use for calculations

Returns:

A tuple containing:
  • (UTCDateTime or None): Time of first arrival (“P” wave)

  • (UTCDateTime or None): Time of first S wave arrival

Returns (None, None) if travel time calculation fails.

Return type:

tuple

Notes

  • Earthquake depth is expected in meters in the QuakeML format and is converted to kilometers for the travel time calculations

  • For S waves, only searches for explicit ‘S’ phase arrivals

  • Warns if no P arrival is found at any distance

  • Warns if no S arrival is found at distances ≤ 90 degrees

Examples

>>> from obspy.taup import TauPyModel
>>> model = TauPyModel(model="iasp91")
>>> p_time, s_time = get_p_s_times(earthquake, 45.3, model)
seed_vault.service.seismoloader.get_preferred_channels(inv, cha_rank=None, loc_rank=None, time=None)[source]

Select the best available channels from an FDSN inventory based on rankings.

Filters an inventory to keep only the preferred channels based on channel code and location code rankings. For each component (Z, N, E), selects the channel with the highest ranking.

Parameters:
  • inv (Inventory) – ObsPy Inventory object to filter.

  • cha_rank (Optional[List[str]]) – List of channel codes in order of preference (e.g., [‘BH’, ‘HH’]). Lower index means higher preference.

  • loc_rank (Optional[List[str]]) – List of location codes in order of preference (e.g., [‘’, ‘00’]). Lower index means higher preference. ‘–’ is treated as empty string.

  • time (Optional[UTCDateTime]) – Optional time to filter channel availability at that time.

Return type:

Inventory

Returns:

Filtered ObsPy Inventory containing only the preferred channels. If all channels would be filtered out, returns original station.

Note

Channel preference takes precedence over location preference. If neither cha_rank nor loc_rank is provided, returns original inventory.

Example

>>> inventory = client.get_stations(network="IU", station="ANMO")
>>> cha_rank = ['BH', 'HH', 'EH']
>>> loc_rank = ['00', '10', '']
>>> filtered = get_preferred_channels(inventory, cha_rank, loc_rank)
seed_vault.service.seismoloader.get_selected_stations_at_channel_level(settings)[source]

Update inventory information to include channel-level details for selected stations.

Retrieves detailed channel information for each station in the selected inventory using the specified FDSN client.

Parameters:

settings (SeismoLoaderSettings) – Configuration settings containing station selection and client information.

Return type:

SeismoLoaderSettings

Returns:

Updated settings with refined station inventory including channel information.

Example

>>> settings = SeismoLoaderSettings()
>>> settings = get_selected_stations_at_channel_level(settings)
seed_vault.service.seismoloader.get_stations(settings)[source]

Retrieve station inventory based on configured criteria.

Gets station information from FDSN web services or local inventory based on settings, including geographic constraints, network/station filters, and channel preferences.

Parameters:

settings (SeismoLoaderSettings) – Configuration settings containing station selection criteria, client information, and filtering preferences.

Return type:

Optional[Inventory]

Returns:

Inventory containing matching stations, or None if no stations found or if station service is unavailable.

Note

The function applies several layers of filtering:
  1. Basic network/station/location/channel criteria

  2. Geographic constraints (if specified)

  3. Station exclusions/inclusions

  4. Channel and location preferences

  5. Sample rate filtering

Example

>>> settings = SeismoLoaderSettings()
>>> settings.station.network = "IU"
>>> inventory = get_stations(settings)
seed_vault.service.seismoloader.prune_requests(requests, db_manager, sds_path, min_request_window=3)[source]

Remove overlapping requests where data already exists in the archive.

Checks both the database and filesystem for existing data and removes or splits requests to avoid re-downloading data that should be there already.

Parameters:
  • requests (List[Tuple[str, str, str, str, str, str]]) – List of request tuples containing: (network, station, location, channel, start_time, end_time)

  • db_manager (DatabaseManager) – DatabaseManager instance for querying existing data.

  • sds_path (str) – Root path of the SDS archive.

  • min_request_window (float) – Minimum time window in seconds to keep a request. Requests shorter than this are discarded. Default is 3 seconds.

Return type:

List[Tuple[str, str, str, str, str, str]]

Returns:

List of pruned request tuples, sorted by start time, network, and station.

Note

This function will update the database if it finds files in the SDS structure that aren’t yet recorded in the database.

Example

>>> requests = [("IU", "ANMO", "00", "BHZ", "2020-01-01", "2020-01-02")]
>>> pruned = prune_requests(requests, db_manager, "/data/SDS")
seed_vault.service.seismoloader.read_config(config_file)[source]

Read and process a configuration file with case-sensitive handling for specific sections.

Reads a configuration file and processes it such that certain sections (AUTH, DATABASE, SDS, WAVEFORM) preserve their case sensitivity while other sections are converted to lowercase.

Parameters:

config_file (str) – Path to the configuration file to read.

Returns:

Processed configuration with appropriate case handling

for different sections.

Return type:

CustomConfigParser

Example

>>> config = read_config("config.ini")
>>> auth_value = config.get("AUTH", "ApiKey")  # Case preserved
>>> other_value = config.get("settings", "parameter")  # Converted to lowercase
seed_vault.service.seismoloader.run_continuous(settings, stop_event=None)[source]

Retrieves continuous seismic data over long time intervals for a set of stations defined in settings. The function manages multiple steps including generating data requests, pruning unnecessary requests based on existing data, combining requests for efficiency, and finally archiving the retrieved data.

The function uses a client setup based on the configuration in settings to handle different data sources and authentication methods. Errors during client creation or data retrieval are handled gracefully, with issues logged to the console.

Parameters:
  • settings (SeismoLoaderSettings) – Configuration settings containing client information, authentication details, and database paths necessary for data retrieval and storage. This should include the start and end times for data collection, the database path, and the SDS archive path, among other configurations.

  • stop_event (threading.Event) – Optional event flag for canceling the operation mid-execution. If provided and set, the function will terminate gracefully at the next safe point.

Workflow:
  1. Initialize clients for waveform data retrieval.

  2. Retrieve station information based on settings.

  3. Collect initial data requests for the given time interval.

  4. Prune requests based on existing data in the database to avoid redundancy.

  5. Combine similar requests to minimize the number of individual operations.

  6. Update or create clients based on specific network credentials if necessary.

  7. Execute data retrieval requests, archive data to disk, and update the database.

Raises:

Exception – General exceptions could be raised due to misconfiguration, unsuccessful data retrieval, or client initialization errors. These exceptions are caught and logged, but not re-raised, allowing the process to continue with other requests.

Notes
  • It is crucial to ensure that the settings object is correctly configured, especially the client details and authentication credentials, to avoid runtime errors.

  • The function logs detailed information about the processing steps and errors to aid in debugging and monitoring of data retrieval processes.

seed_vault.service.seismoloader.run_event(settings, stop_event=None)[source]

Processes and downloads seismic event data for each event in the provided catalog using the specified settings and station inventory. The function manages multiple steps including data requests, arrival time calculations, database updates, and data retrieval.

The function handles data retrieval from FDSN web services with support for authenticated access and restricted data. Processing can be interrupted via the stop_event parameter, and errors during execution are handled gracefully with detailed logging.

Parameters:
  • settings (SeismoLoaderSettings) – Configuration settings that include client details, authentication credentials, event-specific parameters like radius and time window, and paths for data storage.

  • stop_event (threading.Event) – Optional event flag for canceling the operation mid-execution. If provided and set, the function will terminate gracefully at the next safe point.

Workflow:
  1. Initialize paths and database connections

  2. Load the appropriate travel time model for arrival calculations

  3. Process each event in the catalog:
     1. Calculate arrival times and generate data requests

     2. Update arrival information in the database

     3. Check for existing data and prune redundant requests

     4. Download and archive new data

     5. Add event metadata to traces (arrivals, distances, azimuths)

  4. Combine data into event streams with complete metadata

Returns:

List[obspy.Stream] – List of streams, each containing data for one event with complete metadata including arrival times, distances, and azimuths. Returns None if the operation is canceled or no data is processed.

Raises:

Exception – General exceptions from client creation, data retrieval, or processing are caught and logged but not re-raised, allowing processing to continue with remaining events.

Notes
  • The function supports threading and can be safely interrupted via stop_event

  • Station metadata is enriched with event-specific information including arrivals

  • Data is archived in SDS format and the database is updated accordingly

  • Each stream in the output includes complete event metadata for analysis

seed_vault.service.seismoloader.run_main(settings=None, from_file=None, stop_event=None)[source]

Main entry point for seismic data retrieval and processing.

Coordinates the overall workflow for retrieving and processing seismic data, handling both continuous and event-based data collection based on settings.

Parameters:
  • settings (Optional[SeismoLoaderSettings]) – Configuration settings for data retrieval and processing. If None, settings must be provided via from_file.

  • from_file (Optional[str]) – Path to configuration file to load settings from. Only used if settings is None.

  • stop_event (Optional[Event]) – Optional event flag for canceling the operation mid-execution. If provided and set, the function will terminate gracefully at the next safe point.

Return type:

None

Returns:

The result from run_continuous or run_event, or None if cancelled.

Example

>>> # Using settings object
>>> settings = SeismoLoaderSettings()
>>> settings.download_type = DownloadType.EVENT
>>> run_main(settings)
>>> # Using configuration file
>>> run_main(from_file="config.ini")
seed_vault.service.seismoloader.select_highest_samplerate(inv, minSR=10, time=None)[source]

Filters an inventory to keep only the highest sample rate channels where duplicates exist.

For each station in the inventory, this function identifies duplicate channels (those sharing the same location code) and keeps only those with the highest sample rate. Channels must meet the minimum sample rate requirement to be considered.

Parameters:
  • inv (obspy.core.inventory.Inventory) – Input inventory object

  • minSR (float, optional) – Minimum sample rate in Hz. Defaults to 10.

  • time (obspy.UTCDateTime, optional) – Specific time to check channel existence. If provided, channels are considered duplicates if they share the same location code and both exist at that time. If None, channels are considered duplicates if they share the same location code and time span. Defaults to None.

Returns:

Filtered inventory containing only the highest sample rate channels where duplicates existed.

Return type:

obspy.core.inventory.Inventory

Examples

>>> # Filter inventory keeping only highest sample rate channels
>>> filtered_inv = select_highest_samplerate(inv)
>>>
>>> # Filter for a specific time, minimum 1 Hz
>>> from obspy import UTCDateTime
>>> time = UTCDateTime("2020-01-01")
>>> filtered_inv = select_highest_samplerate(inv, minSR=1, time=time)

Notes

  • Channel duplicates are determined by location code and either:
    • Existence at a specific time (if time is provided)

    • Having identical time spans (if time is None)

  • All retained channels must have sample rates >= minSR

  • For duplicate channels, all channels with the highest sample rate are kept

seed_vault.service.seismoloader.setup_paths(settings)[source]

Initialize paths and database for seismic data management.

Parameters:

settings (SeismoLoaderSettings) – Configuration settings containing paths and database information.

Returns:

  • Updated settings with validated paths

  • Initialized DatabaseManager instance

Return type:

Tuple containing

Raises:

ValueError – If SDS path is not set in settings.

Example

>>> settings = SeismoLoaderSettings()
>>> settings.sds_path = "/data/seismic"
>>> settings, db_manager = setup_paths(settings)

seed_vault.service.stations module

The stations service retrieves stations based on selection (filter) settings. The UI should generate the selection and pass it here. We need a single function here that takes the selection and runs Rob’s script.

We should also be able to support multi-select areas.

@TODO: For now, dummy scripts are used. @Yunlong to fix.

seed_vault.service.stations.get_station_data(settings)[source]
seed_vault.service.stations.remove_duplicate_inventories(inventories)[source]
seed_vault.service.stations.station_response_to_df(inventory)[source]

Convert ObsPy Inventory data into a DataFrame with station information.

seed_vault.service.utils module

seed_vault.service.utils.check_client_services(client_name)[source]

Check which services are available for a given client name.

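Example

A sketch; the structure of the returned value is not documented here:

>>> services = check_client_services("IRIS")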
seed_vault.service.utils.convert_to_datetime(value)[source]

Convert a string or other value to a date and time object, handling different formats.

If only a date is provided, it defaults to 00:00:00 time.

Note that this returns a tuple of (date, time).

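Example

A sketch of the documented (date, time) tuple return:

>>> d, t = convert_to_datetime("2020-01-01")  # time defaults to 00:00:00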
seed_vault.service.utils.filter_catalog_by_geo_constraints(catalog, constraints)[source]

Filter an ObsPy event catalog to include events within ANY of the original search constraints. This cleans up any superfluous events that our reduced get_event calls may have introduced.

Return type:

Catalog

Parameters:
  • catalog (obspy.core.event.Catalog) – The input event catalog to filter

  • constraints – settings.event.geo_constraint (whatever object type this is TODO)

Returns:

A new catalog containing events within any of the specified circles

seed_vault.service.utils.filter_inventory_by_geo_constraints(inventory, constraints)[source]

Filter an ObsPy inventory to include stations within ANY of the original search constraints.

Return type:

Inventory

Parameters:
  • inventory (obspy.Inventory) – The input inventory to filter

  • constraints – settings.event.geo_constraint; list of geographical constraints

Returns:

A new inventory containing only stations within any of the specified constraints

seed_vault.service.utils.format_error(station, error)[source]
seed_vault.service.utils.get_sds_filenames(n, s, l, c, time_start, time_end, sds_path)[source]

Generate SDS (SeisComP Data Structure) format filenames for a time range.

Creates a list of daily SDS format filenames for given network, station, location, and channel codes over a specified time period.

Parameters:
  • n (str) – Network code.

  • s (str) – Station code.

  • l (str) – Location code.

  • c (str) – Channel code.

  • time_start (UTCDateTime) – Start time for data requests.

  • time_end (UTCDateTime) – End time for data requests.

  • sds_path (str) – Root path of the SDS archive.

Returns:

List of SDS format filepaths in the form /sds_path/YEAR/NETWORK/STATION/CHANNEL.D/NET.STA.LOC.CHA.D.YEAR.DOY

Return type:

List[str]

Example

>>> paths = get_sds_filenames(
...     "IU", "ANMO", "00", "BHZ",
...     UTCDateTime("2020-01-01"),
...     UTCDateTime("2020-01-03"),
...     "/data/seismic"
... )
seed_vault.service.utils.get_time_interval(interval_type, amount=1)[source]

Get the current date-time and the date-time that is amount intervals earlier.

Parameters:
  • interval_type (str) – One of [‘hour’, ‘day’, ‘week’, ‘month’]

  • amount (int) – Number of intervals to go back (default is 1)

Returns:

(current_datetime, past_datetime)

Return type:

tuple

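Example

A sketch of the documented (current_datetime, past_datetime) return:

>>> now, two_weeks_ago = get_time_interval("week", amount=2)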
seed_vault.service.utils.is_in_enum(item, enum_class)[source]
seed_vault.service.utils.parse_inv(inv)[source]

Return 4 lists (net, sta, loc, cha) detailing the contents of an ObsPy inventory file

Parameters:

inv (Inventory) – ObsPy Inventory object

Returns:

Four lists containing all network, station, location, and channel codes

Return type:

tuple

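Example

A sketch unpacking the four documented lists:

>>> nets, stas, locs, chas = parse_inv(inventory)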
seed_vault.service.utils.remove_duplicate_events(catalog)[source]

Remove duplicate events from an ObsPy Catalog based on resource IDs.

Takes a catalog of earthquake events and returns a new catalog containing only unique events, where uniqueness is determined by the event’s resource_id. The first occurrence of each resource_id is kept.

Parameters:

catalog (obspy.core.event.Catalog) – Input catalog containing earthquake events

Returns:

New catalog containing only unique events

Return type:

obspy.core.event.Catalog

Examples

>>> from obspy import read_events
>>> cat = read_events('events.xml')
>>> unique_cat = remove_duplicate_events(cat)
>>> print(f"Removed {len(cat) - len(unique_cat)} duplicate events")
seed_vault.service.utils.shift_time(reftime, interval_type, amount=1)[source]

Shift time by amount intervals relative to reftime.

Parameters:
  • reftime (datetime) – Reference time

  • interval_type (str) – One of [‘hour’, ‘day’, ‘week’, ‘month’, ‘year’]

  • amount (int) – Number of intervals to shift (positive = forward, negative = backward)

Returns:

The new datetime after the shift, capped at current time if shifting forward

Return type:

datetime

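Example

A sketch shifting a reference time two weeks backward:

>>> from datetime import datetime
>>> earlier = shift_time(datetime(2024, 1, 1), "week", -2)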
seed_vault.service.utils.to_timestamp(time_obj)[source]

Convert various time objects to Unix timestamp.

Parameters:

time_obj (Union[int, float, datetime, UTCDateTime]) – Time object to convert. Can be one of:
  • int/float: Already a timestamp

  • datetime: Python datetime object

  • UTCDateTime: ObsPy UTCDateTime object

Returns:

Unix timestamp (seconds since epoch).

Return type:

float

Raises:

ValueError – If the input time object type is not supported.

Example

>>> ts = to_timestamp(datetime.now())
>>> ts = to_timestamp(UTCDateTime())
>>> ts = to_timestamp(1234567890.0)

seed_vault.service.waveform module

seed_vault.service.waveform.check_is_archived(cursor, req)[source]
seed_vault.service.waveform.get_local_waveform(request, settings)[source]

Get waveform data from a local client, handling comma-separated values for network, station, location, and channel fields. Unlike remote requests, a local SDS client does not handle comma-separated lists itself.

Parameters:
  • request (Tuple[str, str, str, str, str, str]) – Tuple containing (network, station, location, channel, starttime, endtime)

  • settings (SeismoLoaderSettings) – Settings object containing SDS path

Return type:

Optional[Stream]

Returns:

Stream object containing requested waveform data, or None if no data found

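Example

A sketch assuming settings is a SeismoLoaderSettings with its SDS path configured:

>>> request = ("IU", "ANMO", "00", "BHZ,BHN",
...            "2020-01-01T00:00:00", "2020-01-02T00:00:00")
>>> st = get_local_waveform(request, settings)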
seed_vault.service.waveform.get_local_waveform_OLD(request, settings)[source]
seed_vault.service.waveform.stream_to_dataframe(stream)[source]

Module contents