seed_vault.service package
Submodules
seed_vault.service.db module
Database management module for the SEED-vault archive
This module provides a DatabaseManager class for handling seismic data storage in SQLite, including archive data and arrival data. It implements connection management, data insertion, querying, and database maintenance operations.
- class seed_vault.service.db.DatabaseManager(db_path)[source]
Bases:
object
Manages seismic data storage and retrieval using SQLite.
This class handles database connections, table creation, data insertion, and querying for seismic archive and arrival data.
- db_path
Path to the SQLite database file.
- Type:
str
Initialize DatabaseManager with database path.
- Parameters:
db_path (str) – Path where the SQLite database should be created/accessed.
- analyze_table(table_name)[source]
Update table statistics for query optimization.
- Parameters:
table_name (str) – Name of the table to analyze.
- bulk_insert_archive_data(archive_list)[source]
Insert multiple archive data records.
- Parameters:
archive_list (List[Tuple]) – List of tuples containing archive data records.
- Returns:
Number of inserted records.
- Return type:
int
- bulk_insert_arrival_data(arrival_list)[source]
Insert multiple arrival data records.
- Parameters:
arrival_list (List[Tuple]) – List of tuples containing arrival data records.
- Returns:
Number of inserted records.
- Return type:
int
- check_data_existence(netcode, stacode, location, channel, starttime, endtime)[source]
Run a simple check to see whether a database element exists for a trace.
- Parameters:
netcode (str) – Network code
stacode (str) – Station code
location (str) – Location code
channel (str) – Channel code
starttime (str) – Start time in ISO format
endtime (str) – End time in ISO format
- Returns:
True if data exists for the specified parameters, False otherwise
- Return type:
bool
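The underlying lookup can be sketched with plain sqlite3. This is an illustrative reimplementation, not the actual method; the archive_data schema and the overlap condition are assumed from the notes under populate_database_from_files (times shown as Unix timestamps, where the real method accepts ISO strings).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE archive_data
    (network TEXT, station TEXT, location TEXT, channel TEXT,
     starttime INTEGER, endtime INTEGER)""")
conn.execute("INSERT INTO archive_data VALUES ('IU', 'ANMO', '00', 'BHZ', 1000, 2000)")

def data_exists(conn, net, sta, loc, cha, start, end):
    # A row satisfies the request if its span covers the requested window.
    row = conn.execute(
        """SELECT EXISTS(SELECT 1 FROM archive_data
           WHERE network=? AND station=? AND location=? AND channel=?
           AND starttime<=? AND endtime>=?)""",
        (net, sta, loc, cha, start, end)).fetchone()
    return bool(row[0])
```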
- connection(max_retries=3, initial_delay=1)[source]
Context manager for database connections with retry mechanism.
- Parameters:
max_retries (int) – Maximum number of connection retry attempts.
initial_delay (float) – Initial delay between retries in seconds.
- Yields:
sqlite3.Connection – Database connection object.
- Raises:
sqlite3.OperationalError – If database connection fails after all retries.
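A minimal stdlib sketch of the retry-with-backoff pattern this context manager implements (illustrative only; the real method lives on DatabaseManager and uses its configured db_path):

```python
import sqlite3
import time
from contextlib import contextmanager

@contextmanager
def connection(db_path, max_retries=3, initial_delay=1):
    """Yield a sqlite3 connection, retrying with exponential backoff on lock errors."""
    delay = initial_delay
    for attempt in range(max_retries):
        try:
            conn = sqlite3.connect(db_path, timeout=20)
            break
        except sqlite3.OperationalError:
            if attempt == max_retries - 1:
                raise  # out of retries, propagate the error
            time.sleep(delay)
            delay *= 2  # back off before the next attempt
    try:
        yield conn
    finally:
        conn.close()
```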
- delete_elements(table_name, start_time=0, end_time=4102444799)[source]
Delete elements from specified table within time range.
- Parameters:
table_name (str) – Name of the table ('archive_data' or 'arrival_data').
start_time (Union[int, float, datetime, UTCDateTime]) – Start time for deletion range.
end_time (Union[int, float, datetime, UTCDateTime]) – End time for deletion range.
- Returns:
Number of deleted rows.
- Return type:
int
- Raises:
ValueError – If table_name is invalid or time format is incorrect.
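Accepting Union[int, float, datetime, UTCDateTime] implies normalizing every input to a Unix timestamp before building the DELETE query. A hedged sketch of that normalization (ObsPy's UTCDateTime is omitted to keep this stdlib-only; the real code would convert it via its timestamp attribute):

```python
from datetime import datetime, timezone

def to_unix(t):
    """Normalize an int/float/datetime time argument to a float Unix timestamp."""
    if isinstance(t, (int, float)):
        return float(t)
    if isinstance(t, datetime):
        # Naive datetimes are assumed to be UTC here.
        if t.tzinfo is None:
            t = t.replace(tzinfo=timezone.utc)
        return t.timestamp()
    raise ValueError(f"Unsupported time format: {type(t)!r}")
```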
- display_contents(table_name, start_time=0, end_time=4102444799, limit=100)[source]
Display contents of a specified table within a given time range.
- Parameters:
table_name (str) – Name of the table to query ('archive_data' or 'arrival_data').
start_time (Union[int, float, datetime, UTCDateTime]) – Start time for the query.
end_time (Union[int, float, datetime, UTCDateTime]) – End time for the query.
limit (int) – Maximum number of rows to return.
- execute_query(query)[source]
Execute an SQL query and return results.
- Parameters:
query (str) – SQL query to execute.
- Returns:
bool: Whether an error occurred
str: Status message or error description
Optional[pd.DataFrame]: Results for SELECT queries, None otherwise
- Return type:
Tuple[bool, str, Optional[pd.DataFrame]]
- fetch_arrivals_distances(resource_id, netcode, stacode)[source]
Retrieve arrival times and distance metrics for a specific event and station.
- Parameters:
resource_id (str) – Unique identifier for the seismic event.
netcode (str) – Network code for the station.
stacode (str) – Station code.
- Returns:
- Tuple (p_arrival, s_arrival, dist_km, dist_deg, azimuth), where:
p_arrival: P wave arrival time (timestamp)
s_arrival: S wave arrival time (timestamp)
dist_km: Distance in kilometers
dist_deg: Distance in degrees
azimuth: Azimuth angle from event to station
Returns None if no matching record is found.
- Return type:
Optional[Tuple[float, float, float, float, float]]
- get_arrival_data(resource_id, netcode, stacode)[source]
Retrieve complete arrival data for a specific event and station.
- Parameters:
resource_id (str) – Unique identifier for the seismic event.
netcode (str) – Network code for the station.
stacode (str) – Station code.
- Returns:
Dictionary containing all arrival data fields for the specified event and station, or None if no matching record is found.
- Return type:
Optional[Dict[str, Any]]
- get_events_for_station(netcode, stacode)[source]
Retrieve all seismic events recorded by a specific station.
- Parameters:
netcode (str) – Network code for the station.
stacode (str) – Station code.
- Returns:
List of dictionaries containing arrival data for all events recorded by the station. Returns empty list if no events found.
- Return type:
List[Dict[str, Any]]
- get_stations_for_event(resource_id)[source]
Retrieve all station data associated with a specific seismic event.
- Parameters:
resource_id (str) – Unique identifier for the seismic event.
- Returns:
List of dictionaries containing arrival data for all stations that recorded the event. Returns empty list if no stations found.
- Return type:
List[Dict[str, Any]]
- seed_vault.service.db.miniseed_to_db_elements(file_path)[source]
Convert a miniseed file to a database element tuple.
Processes a miniseed file and extracts relevant metadata for database storage. Expects files in the format: network.station.location.channel.*.year.julday
- Parameters:
file_path (str) – Path to the miniseed file.
- Returns:
- A tuple containing:
network: Network code
station: Station code
location: Location code
channel: Channel code
start_time: ISO format start time
end_time: ISO format end time
Returns None if file is invalid or cannot be processed.
- Return type:
Optional[Tuple[str, str, str, str, str, str]]
Example
>>> element = miniseed_to_db_elements("/path/to/IU.ANMO.00.BHZ.D.2020.001")
>>> if element:
...     network, station, location, channel, start, end = element
- seed_vault.service.db.populate_database_from_files(cursor, file_paths=[])[source]
Insert or update MiniSEED file metadata into an SQL database.
Takes a list of SDS archive file paths, extracts metadata, and updates a database tracking data availability. If data spans overlap with existing database entries, the spans are merged. Uses miniseed_to_db_elements() to parse file metadata.
- Parameters:
cursor (sqlite3.Cursor) – Database cursor for executing SQL commands
file_paths (list, optional) – List of paths to MiniSeed files. Defaults to empty list.
Notes
- Database must have an ‘archive_data’ table with columns:
network (text)
station (text)
location (text)
channel (text)
starttime (integer): Unix timestamp
endtime (integer): Unix timestamp
importtime (integer): Unix timestamp of database insertion
Handles overlapping time spans by merging them into a single entry
Sets importtime to current Unix timestamp
Skips files that fail metadata extraction (when miniseed_to_db_elements returns None)
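The required archive_data table from the notes above can be created with plain sqlite3. This is a sketch built from the column list in the notes; the actual schema may add indexes or constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # the real code uses the configured db_path
conn.execute("""
    CREATE TABLE IF NOT EXISTS archive_data (
        network TEXT,
        station TEXT,
        location TEXT,
        channel TEXT,
        starttime INTEGER,   -- Unix timestamp
        endtime INTEGER,     -- Unix timestamp
        importtime INTEGER   -- Unix timestamp of database insertion
    )
""")
```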
Examples
>>> import sqlite3
>>> conn = sqlite3.connect('archive.db')
>>> cursor = conn.cursor()
>>> files = ['/path/to/IU.ANMO.00.BHZ.mseed', '/path/to/IU.ANMO.00.BHN.mseed']
>>> populate_database_from_files(cursor, files)
>>> conn.commit()
- seed_vault.service.db.populate_database_from_files_dumb(cursor, file_paths=[])[source]
Simple version of database population from MiniSEED files without span merging.
A simplified “dumb” version that blindly replaces existing database entries with identical network/station/location/channel codes, rather than checking for and merging overlapping time spans.
- Parameters:
cursor (sqlite3.Cursor) – Database cursor for executing SQL commands
file_paths (list, optional) – List of paths to MiniSeed files. Defaults to empty list.
- seed_vault.service.db.populate_database_from_sds(sds_path, db_path, search_patterns=['??.*.*.???.?.????.???'], newer_than=None, num_processes=None, gap_tolerance=60)[source]
Scan an SDS archive directory and populate a database with data availability.
Recursively searches an SDS (Seismic Data Structure) archive for MiniSEED files, extracts their metadata, and records data availability in a SQLite database. Supports parallel processing and can optionally filter for recently modified files.
- Parameters:
sds_path (str) – Path to the root SDS archive directory
db_path (str) – Path to the SQLite database file
search_patterns (list, optional) – List of file patterns to match. Defaults to [“??.*.*.???.?.????.???”] (standard SDS naming pattern).
newer_than (str or UTCDateTime, optional) – Only process files modified after this time. Defaults to None (process all files).
num_processes (int, optional) – Number of parallel processes to use. Defaults to None (use all available CPU cores).
gap_tolerance (int, optional) – Maximum time gap in seconds between segments that should be considered continuous. Defaults to 60.
Notes
Uses DatabaseManager class to handle database operations
Attempts multiprocessing but falls back to single process if it fails (common on OSX and Windows)
Follows symbolic links when walking directory tree
Files are processed using miniseed_to_db_elements() function
After insertion, continuous segments are joined based on gap_tolerance
Progress is displayed using tqdm progress bars
If newer_than is provided, it’s converted to a Unix timestamp for comparison
- Raises:
RuntimeError – If bulk insertion into database fails
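The step where "continuous segments are joined based on gap_tolerance" can be illustrated with a small pure-Python merge over (start, end) Unix-timestamp spans. This is a sketch of the idea, not the actual implementation.

```python
def merge_spans(spans, gap_tolerance=60):
    """Merge (start, end) spans whose gaps are <= gap_tolerance seconds."""
    merged = []
    for start, end in sorted(spans):
        if merged and start - merged[-1][1] <= gap_tolerance:
            # Gap is small enough: extend the previous span.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(s) for s in merged]
```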
- seed_vault.service.db.stream_to_db_elements(st)[source]
Convert an ObsPy Stream object to multiple database element tuples, properly handling gaps. Creates database elements from a stream, assuming all traces have the same Network-Station-Location-Channel (NSLC) codes (e.g. an SDS file).
- Parameters:
st (Stream) – ObsPy Stream object containing seismic traces.
- Returns:
- A list of tuples, each containing:
network: Network code
station: Station code
location: Location code
channel: Channel code
start_time: ISO format start time
end_time: ISO format end time
Returns empty list if stream is empty.
- Return type:
List[Tuple[str, str, str, str, str, str]]
Example
>>> stream = obspy.read()
>>> elements = stream_to_db_elements(stream)
>>> for element in elements:
...     network, station, location, channel, start, end = element
seed_vault.service.events module
The events service should get events based on the selection (filter) settings. The UI should generate the selection and pass it here. We need a single function here that takes the selection and runs Rob’s script.
We should also be able to support multi-select areas.
@TODO: For now, dummy scripts are used. @Yunlong to fix.
seed_vault.service.gen_config_models module
seed_vault.service.seismoloader module
The main functions for SEED-vault, from original CLI-only version (Pickle 2024)
- class seed_vault.service.seismoloader.CustomConfigParser(*args, **kwargs)[source]
Bases:
ConfigParser
Custom configuration parser that can preserve case sensitivity for specified sections.
This class extends the standard ConfigParser to allow certain sections to maintain case sensitivity while others are converted to lowercase.
- case_sensitive_sections
Set of section names that should preserve case sensitivity.
- Type:
set
Initialize the CustomConfigParser.
- Parameters:
*args – Variable length argument list passed to ConfigParser.
**kwargs – Arbitrary keyword arguments passed to ConfigParser.
- seed_vault.service.seismoloader.archive_request(request, waveform_clients, sds_path, db_manager)[source]
Download seismic data for a request and archive it in SDS format.
Retrieves waveform data from FDSN web services, saves it in SDS format, and updates the database. Handles authentication, data merging, and various error conditions.
- Parameters:
request (Tuple[str, str, str, str, str, str]) – Tuple containing (network, station, location, channel, start_time, end_time).
waveform_clients (Dict[str, Client]) – Dictionary mapping network codes to FDSN clients. Special key 'open' is used for the default client.
sds_path (str) – Root path of the SDS archive.
db_manager (DatabaseManager) – DatabaseManager instance for updating the database.
- Return type:
None
Note
Supports per-network and per-station authentication
Handles splitting of large station list requests
Performs data merging when files already exist
Attempts STEIM2 compression, falls back to uncompressed format
Groups traces by day to handle fragmented data efficiently
Example
>>> clients = {'IU': Client('IRIS'), 'open': Client('IRIS')}
>>> request = ("IU", "ANMO", "00", "BHZ", "2020-01-01", "2020-01-02")
>>> archive_request(request, clients, "/data/seismic", db_manager)
- seed_vault.service.seismoloader.collect_requests(inv, time0, time1, days_per_request=3, cha_pref=None, loc_pref=None)[source]
Generate time-windowed data requests for all channels in an inventory.
Creates a list of data requests by breaking a time period into smaller windows and collecting station metadata for each window. Can optionally filter for preferred channels and location codes.
- Parameters:
inv (obspy.core.inventory.Inventory) – Station inventory to generate requests for
time0 (obspy.UTCDateTime) – Start time for data requests
time1 (obspy.UTCDateTime) – End time for data requests
days_per_request (int, optional) – Length of each request window in days. Defaults to 3.
cha_pref (list, optional) – List of preferred channel codes in priority order. If provided, only these channels will be requested. Defaults to None.
loc_pref (list, optional) – List of preferred location codes in priority order. If provided, only these location codes will be requested. Defaults to None.
- Returns:
- List of tuples containing request parameters:
- (network_code, station_code, location_code, channel_code,
start_time_iso, end_time_iso)
Returns None if start time is greater than or equal to end time.
- Return type:
list or None
Notes
End time is capped at 120 seconds before current time
Times in returned tuples are ISO formatted strings with ‘Z’ suffix
Uses get_preferred_channels() if cha_pref or loc_pref are specified
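The time-windowing described above amounts to splitting [time0, time1] into consecutive chunks of at most days_per_request days. A stdlib sketch using datetime instead of ObsPy's UTCDateTime (illustrative only; the real function also caps the end time and collects per-channel metadata):

```python
from datetime import datetime, timedelta

def split_windows(time0, time1, days_per_request=3):
    """Split [time0, time1] into consecutive windows of at most days_per_request days."""
    windows = []
    start = time0
    while start < time1:
        # The last window is clipped so it never runs past time1.
        end = min(start + timedelta(days=days_per_request), time1)
        windows.append((start, end))
        start = end
    return windows
```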
Examples
>>> from obspy import UTCDateTime
>>> t0 = UTCDateTime("2020-01-01")
>>> t1 = UTCDateTime("2020-01-10")
>>> requests = collect_requests(inventory, t0, t1,
...                             days_per_request=2,
...                             cha_pref=['HHZ', 'BHZ'],
...                             loc_pref=['', '00'])
- seed_vault.service.seismoloader.collect_requests_event(eq, inv, model=None, settings=None)[source]
Collect data requests and arrival times for an event at multiple stations.
For a given earthquake event, calculates arrival times and generates data requests for all appropriate stations in the inventory.
- Parameters:
eq (Event) – ObsPy Event object containing earthquake information.
inv (Inventory) – ObsPy Inventory object containing station information.
model (Optional[TauPyModel]) – Optional TauPyModel for travel time calculations. If None, uses model from settings or falls back to IASP91.
settings (Optional[SeismoLoaderSettings]) – Optional SeismoLoaderSettings object containing configuration.
- Returns:
List of request tuples (net, sta, loc, chan, start, end)
List of arrival data tuples for database
Dictionary mapping “net.sta” to P-arrival timestamps
- Return type:
Tuple containing
Note
Requires a DatabaseManager instance to check for existing arrivals. Time windows are constructed around P-wave arrivals using settings. Handles both new calculations and retrieving existing arrival times.
Example
>>> event = client.get_events()[0]
>>> inventory = client.get_stations(network="IU")
>>> requests, arrivals, p_times = collect_requests_event(
...     event, inventory, model=TauPyModel("iasp91")
... )
- seed_vault.service.seismoloader.combine_requests(requests)[source]
Combine multiple data requests for efficiency.
Groups requests by network and time range, combining stations, locations, and channels into comma-separated lists to minimize the number of requests.
- Parameters:
requests (List[Tuple[str, str, str, str, str, str]]) – List of request tuples, each containing: (network, station, location, channel, start_time, end_time).
- Return type:
List[Tuple[str, str, str, str, str, str]]
- Returns:
List of combined request tuples with the same structure but with station, location, and channel fields potentially containing comma-separated lists.
Example
>>> original = [
...     ("IU", "ANMO", "00", "BHZ", "2020-01-01", "2020-01-02"),
...     ("IU", "COLA", "00", "BHZ", "2020-01-01", "2020-01-02")
... ]
>>> combined = combine_requests(original)
>>> print(combined)
[("IU", "ANMO,COLA", "00", "BHZ", "2020-01-01", "2020-01-02")]
- seed_vault.service.seismoloader.get_events(settings)[source]
Retrieve seismic event catalogs based on configured criteria.
Queries FDSN web services or loads local catalogs for seismic events matching specified criteria including time range, magnitude, depth, and geographic constraints.
- Parameters:
settings (SeismoLoaderSettings) – Configuration settings containing event search criteria, client information, and filtering preferences.
- Return type:
List[Catalog]
- Returns:
List of ObsPy Catalog objects containing matching events. Returns empty catalog if no events found.
- Raises:
FileNotFoundError – If local catalog file not found.
PermissionError – If unable to access local catalog file.
ValueError – If invalid geographic constraint type specified.
Example
>>> settings = SeismoLoaderSettings()
>>> settings.event.min_magnitude = 5.0
>>> catalogs = get_events(settings)
- seed_vault.service.seismoloader.get_missing_from_request(db_manager, eq_id, requests, st)[source]
Compare requested seismic data against what’s present in a Stream. Handles comma-separated values for location and channel codes.
- Parameters:
db_manager (DatabaseManager) – Database manager instance.
eq_id (str) – Earthquake ID to use as dictionary key.
requests (List[Tuple]) – List of request tuples, each containing (network, station, location, channel, starttime, endtime).
st (Stream) – ObsPy Stream object containing seismic traces.
- Returns:
Nested dictionary with structure {eq_id: {"network.station": value, ...}}, where value is either:
list of missing channel strings ("network.station.location.channel")
"Not Attempted" if stream is empty
"ALL" if all requested channels are missing
[] if all requested channels are present
- Return type:
dict
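The nested result structure can be illustrated with a simplified pure-Python version that compares requested NSLC codes against a set of codes present in the stream. This is a hypothetical helper for illustration; the real function also consults db_manager and operates on ObsPy Stream objects.

```python
def missing_channels(eq_id, requests, present_nslc):
    """Build {eq_id: {"NET.STA": value}} from request tuples.

    requests: (net, sta, loc, cha, t0, t1) tuples; loc/cha may be comma-separated.
    present_nslc: set of "NET.STA.LOC.CHA" strings actually in the stream.
    """
    result = {}
    for net, sta, locs, chas, _t0, _t1 in requests:
        key = f"{net}.{sta}"
        # Expand comma-separated location and channel codes.
        wanted = [f"{net}.{sta}.{loc}.{cha}"
                  for loc in locs.split(",") for cha in chas.split(",")]
        missing = [w for w in wanted if w not in present_nslc]
        if not present_nslc:
            result[key] = "Not Attempted"   # empty stream
        elif len(missing) == len(wanted):
            result[key] = "ALL"             # nothing requested was present
        else:
            result[key] = missing           # possibly [] when all present
    return {eq_id: result}
```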
- seed_vault.service.seismoloader.get_p_s_times(eq, dist_deg, ttmodel)[source]
Calculate theoretical P and S wave arrival times for an earthquake at a given distance.
Uses a travel time model to compute the first P and S wave arrivals for a given earthquake and distance. The first arrival (labeled as “P”) may not necessarily be a direct P wave. For S waves, only phases explicitly labeled as ‘S’ are considered.
- Parameters:
eq (obspy.core.event.Event) – Earthquake event object containing origin time and depth information
dist_deg (float) – Distance between source and receiver in degrees
ttmodel (obspy.taup.TauPyModel) – Travel time model to use for calculations
- Returns:
- A tuple containing:
(UTCDateTime or None): Time of first arrival (“P” wave)
(UTCDateTime or None): Time of first S wave arrival
Returns (None, None) if travel time calculation fails.
- Return type:
tuple
Notes
Earthquake depth is expected in meters in the QuakeML format and is converted to kilometers for the travel time calculations
For S waves, only searches for explicit ‘S’ phase arrivals
Warns if no P arrival is found at any distance
Warns if no S arrival is found at distances ≤ 90 degrees
Examples
>>> from obspy.taup import TauPyModel
>>> model = TauPyModel(model="iasp91")
>>> p_time, s_time = get_p_s_times(earthquake, 45.3, model)
- seed_vault.service.seismoloader.get_preferred_channels(inv, cha_rank=None, loc_rank=None, time=None)[source]
Select the best available channels from an FDSN inventory based on rankings.
Filters an inventory to keep only the preferred channels based on channel code and location code rankings. For each component (Z, N, E), selects the channel with the highest ranking.
- Parameters:
inv (Inventory) – ObsPy Inventory object to filter.
cha_rank (Optional[List[str]]) – List of channel codes in order of preference (e.g., ['BH', 'HH']). Lower index means higher preference.
loc_rank (Optional[List[str]]) – List of location codes in order of preference (e.g., ['', '00']). Lower index means higher preference. '--' is treated as empty string.
time (Optional[UTCDateTime]) – Optional time to filter channel availability at that time.
- Return type:
Inventory
- Returns:
Filtered ObsPy Inventory containing only the preferred channels. If all channels would be filtered out, returns original station.
Note
Channel preference takes precedence over location preference. If neither cha_rank nor loc_rank is provided, returns original inventory.
Example
>>> inventory = client.get_stations(network="IU", station="ANMO")
>>> cha_rank = ['BH', 'HH', 'EH']
>>> loc_rank = ['00', '10', '']
>>> filtered = get_preferred_channels(inventory, cha_rank, loc_rank)
- seed_vault.service.seismoloader.get_selected_stations_at_channel_level(settings)[source]
Update inventory information to include channel-level details for selected stations.
Retrieves detailed channel information for each station in the selected inventory using the specified FDSN client.
- Parameters:
settings (SeismoLoaderSettings) – Configuration settings containing station selection and client information.
- Return type:
SeismoLoaderSettings
- Returns:
Updated settings with refined station inventory including channel information.
Example
>>> settings = SeismoLoaderSettings()
>>> settings = get_selected_stations_at_channel_level(settings)
- seed_vault.service.seismoloader.get_stations(settings)[source]
Retrieve station inventory based on configured criteria.
Gets station information from FDSN web services or local inventory based on settings, including geographic constraints, network/station filters, and channel preferences.
- Parameters:
settings (SeismoLoaderSettings) – Configuration settings containing station selection criteria, client information, and filtering preferences.
- Return type:
Optional[Inventory]
- Returns:
Inventory containing matching stations, or None if no stations found or if station service is unavailable.
Note
The function applies several layers of filtering:
1. Basic network/station/location/channel criteria
2. Geographic constraints (if specified)
3. Station exclusions/inclusions
4. Channel and location preferences
5. Sample rate filtering
Example
>>> settings = SeismoLoaderSettings()
>>> settings.station.network = "IU"
>>> inventory = get_stations(settings)
- seed_vault.service.seismoloader.prune_requests(requests, db_manager, sds_path, min_request_window=3)[source]
Remove overlapping requests where data already exists in the archive.
Checks both the database and the filesystem for existing data, removing or splitting requests to avoid re-downloading data that is already archived.
- Parameters:
requests (List[Tuple[str, str, str, str, str, str]]) – List of request tuples containing: (network, station, location, channel, start_time, end_time).
db_manager (DatabaseManager) – DatabaseManager instance for querying existing data.
sds_path (str) – Root path of the SDS archive.
min_request_window (float) – Minimum time window in seconds to keep a request. Requests shorter than this are discarded. Default is 3 seconds.
- Return type:
List[Tuple[str, str, str, str, str, str]]
- Returns:
List of pruned request tuples, sorted by start time, network, and station.
Note
This function will update the database if it finds files in the SDS structure that aren’t yet recorded in the database.
Example
>>> requests = [("IU", "ANMO", "00", "BHZ", "2020-01-01", "2020-01-02")]
>>> pruned = prune_requests(requests, db_manager, "/data/SDS")
- seed_vault.service.seismoloader.read_config(config_file)[source]
Read and process a configuration file with case-sensitive handling for specific sections.
Reads a configuration file and processes it such that certain sections (AUTH, DATABASE, SDS, WAVEFORM) preserve their case sensitivity while other sections are converted to lowercase.
- Parameters:
config_file (str) – Path to the configuration file to read.
- Returns:
Processed configuration with appropriate case handling for different sections.
- Return type:
CustomConfigParser
Example
>>> config = read_config("config.ini")
>>> auth_value = config.get("AUTH", "ApiKey")  # Case preserved
>>> other_value = config.get("settings", "parameter")  # Converted to lowercase
- seed_vault.service.seismoloader.run_continuous(settings, stop_event=None)[source]
Retrieves continuous seismic data over long time intervals for a set of stations defined by the inv parameter. The function manages multiple steps including generating data requests, pruning unnecessary requests based on existing data, combining requests for efficiency, and finally archiving the retrieved data.
The function uses a client setup based on the configuration in settings to handle different data sources and authentication methods. Errors during client creation or data retrieval are handled gracefully, with issues logged to the console.
- Parameters:
settings (SeismoLoaderSettings) – Configuration settings containing client information, authentication details, and database paths necessary for data retrieval and storage. This should include the start and end times for data collection, the database path, and the SDS archive path, among other configurations.
stop_event (threading.Event) – Optional event flag for canceling the operation mid-execution. If provided and set, the function will terminate gracefully at the next safe point.
Workflow:
1. Initialize clients for waveform data retrieval.
2. Retrieve station information based on settings.
3. Collect initial data requests for the given time interval.
4. Prune requests based on existing data in the database to avoid redundancy.
5. Combine similar requests to minimize the number of individual operations.
6. Update or create clients based on specific network credentials if necessary.
7. Execute data retrieval requests, archive data to disk, and update the database.
- Raises:
Exception – General exceptions may be raised due to misconfiguration, unsuccessful data retrieval, or client initialization errors. These exceptions are caught and logged, but not re-raised, allowing the process to continue with other requests.
Notes
It is crucial to ensure that the settings object is correctly configured, especially the client details and authentication credentials, to avoid runtime errors.
The function logs detailed information about the processing steps and errors to aid in debugging and monitoring of data retrieval processes.
- seed_vault.service.seismoloader.run_event(settings, stop_event=None)[source]
Processes and downloads seismic event data for each event in the provided catalog using the specified settings and station inventory. The function manages multiple steps including data requests, arrival time calculations, database updates, and data retrieval.
The function handles data retrieval from FDSN web services with support for authenticated access and restricted data. Processing can be interrupted via the stop_event parameter, and errors during execution are handled gracefully with detailed logging.
- Parameters:
settings (SeismoLoaderSettings) – Configuration settings that include client details, authentication credentials, event-specific parameters like radius and time window, and paths for data storage.
stop_event (threading.Event) – Optional event flag for canceling the operation mid-execution. If provided and set, the function will terminate gracefully at the next safe point.
Workflow:
1. Initialize paths and database connections
2. Load appropriate travel time model for arrival calculations
3. Process each event in the catalog:
Calculate arrival times and generate data requests
Update arrival information in database
Check for existing data and prune redundant requests
Download and archive new data
Add event metadata to traces (arrivals, distances, azimuths)
Combine data into event streams with complete metadata
- Returns:
List[obspy.Stream] – List of streams, each containing data for one event with complete metadata including arrival times, distances, and azimuths. Returns None if the operation is canceled or no data is processed.
- Raises:
Exception – General exceptions from client creation, data retrieval, or processing are caught and logged but not re-raised, allowing processing to continue with remaining events.
Notes
The function supports threading and can be safely interrupted via stop_event.
Station metadata is enriched with event-specific information including arrivals.
Data is archived in SDS format and the database is updated accordingly.
Each stream in the output includes complete event metadata for analysis.
- seed_vault.service.seismoloader.run_main(settings=None, from_file=None, stop_event=None)[source]
Main entry point for seismic data retrieval and processing.
Coordinates the overall workflow for retrieving and processing seismic data, handling both continuous and event-based data collection based on settings.
- Parameters:
settings (Optional[SeismoLoaderSettings]) – Configuration settings for data retrieval and processing. If None, settings must be provided via from_file.
from_file (Optional[str]) – Path to configuration file to load settings from. Only used if settings is None.
stop_event (Optional[Event]) – Optional event flag for canceling the operation mid-execution. If provided and set, the function will terminate gracefully at the next safe point.
- Returns:
The result from run_continuous or run_event, or None if cancelled.
Example
>>> # Using settings object
>>> settings = SeismoLoaderSettings()
>>> settings.download_type = DownloadType.EVENT
>>> run_main(settings)
>>> # Using configuration file
>>> run_main(from_file="config.ini")
- seed_vault.service.seismoloader.select_highest_samplerate(inv, minSR=10, time=None)[source]
Filters an inventory to keep only the highest sample rate channels where duplicates exist.
For each station in the inventory, this function identifies duplicate channels (those sharing the same location code) and keeps only those with the highest sample rate. Channels must meet the minimum sample rate requirement to be considered.
- Parameters:
inv (obspy.core.inventory.Inventory) – Input inventory object
minSR (float, optional) – Minimum sample rate in Hz. Defaults to 10.
time (obspy.UTCDateTime, optional) – Specific time to check channel existence. If provided, channels are considered duplicates if they share the same location code and both exist at that time. If None, channels are considered duplicates if they share the same location code and time span. Defaults to None.
- Returns:
Filtered inventory containing only the highest sample rate channels where duplicates existed.
- Return type:
obspy.core.inventory.Inventory
Examples
>>> # Filter inventory keeping only highest sample rate channels
>>> filtered_inv = select_highest_samplerate(inv)
>>> # Filter for a specific time, minimum 1 Hz
>>> from obspy import UTCDateTime
>>> time = UTCDateTime("2020-01-01")
>>> filtered_inv = select_highest_samplerate(inv, minSR=1, time=time)
Notes
Channel duplicates are determined by location code and either:
* Existence at a specific time (if time is provided)
* Having identical time spans (if time is None)
All retained channels must have sample rates >= minSR
For duplicate channels, all channels with the highest sample rate are kept
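The duplicate-resolution rule above can be sketched in pure Python over (location, channel_code, sample_rate) tuples rather than Inventory objects (illustrative only; the real function works on ObsPy inventories and also considers time spans):

```python
def pick_highest_samplerate(channels, minSR=10):
    """From (location, code, sample_rate) tuples, keep per-location channels
    with the highest sample rate, dropping any below minSR."""
    by_loc = {}
    for loc, code, sr in channels:
        if sr < minSR:
            continue  # below the minimum sample rate requirement
        by_loc.setdefault(loc, []).append((code, sr))
    kept = []
    for loc, chans in by_loc.items():
        top = max(sr for _code, sr in chans)
        # All channels tied at the highest rate are retained.
        kept.extend((loc, code, sr) for code, sr in chans if sr == top)
    return kept
```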
- seed_vault.service.seismoloader.setup_paths(settings)[source]
Initialize paths and database for seismic data management.
- Parameters:
settings (SeismoLoaderSettings) – Configuration settings containing paths and database information.
- Returns:
Updated settings with validated paths
Initialized DatabaseManager instance
- Return type:
Tuple containing
- Raises:
ValueError – If SDS path is not set in settings.
Example
>>> settings = SeismoLoaderSettings()
>>> settings.sds_path = "/data/seismic"
>>> settings, db_manager = setup_paths(settings)
seed_vault.service.stations module
The stations service should get the stations based on a selection (filter) settings. UI should generate the selection and pass it here. We need a single function here that gets the selection and runs Rob’s script.
We should also be able to support multi-select areas.
@TODO: For now, dummy scripts are used. @Yunlong to fix.
seed_vault.service.utils module
- seed_vault.service.utils.check_client_services(client_name)[source]
Check which services are available for a given client name.
- seed_vault.service.utils.convert_to_datetime(value)[source]
Convert a string or other value to a date and time, handling different formats. If only a date is provided, the time defaults to 00:00:00.
Note that this returns a tuple of (date, time).
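The described behaviour can be sketched as follows; the accepted format strings are an assumption, not the package's actual parsing logic:

```python
from datetime import datetime, date, time

def convert_to_datetime(value):
    """Parse a date or date-time string and return a (date, time) tuple.
    A date-only input defaults the time component to 00:00:00."""
    # Try the richer formats first, falling back to date-only.
    for fmt in ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            dt = datetime.strptime(value, fmt)
            return dt.date(), dt.time()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date/time format: {value!r}")
```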
- seed_vault.service.utils.filter_catalog_by_geo_constraints(catalog, constraints)[source]
Filter an ObsPy event catalog to include only events within ANY of the original search constraints. This cleans up superfluous events that our reduced get_event calls may have introduced.
- Parameters:
catalog (obspy.core.event.Catalog) – The input event catalog to filter
constraints (settings.event.geo_constraint) – List of geographical constraints (TODO: confirm the exact object type)
- Returns:
A new catalog containing events within any of the specified constraints
- Return type:
obspy.core.event.Catalog
- seed_vault.service.utils.filter_inventory_by_geo_constraints(inventory, constraints)[source]
Filter an ObsPy inventory to include stations within ANY of the original search constraints.
- Parameters:
inventory (obspy.Inventory) – The input inventory to filter
constraints (settings.event.geo_constraint) – List of geographical constraints
- Returns:
A new inventory containing only stations within any of the specified constraints
- Return type:
obspy.Inventory
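Both filters implement the same OR-combination pattern: keep an item if it satisfies at least one constraint. A minimal sketch for circular constraints follows; the `(center_lat, center_lon, radius_km)` tuple form is an assumption, since the actual geo_constraint objects may also describe bounding boxes:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def within_any(lat, lon, circles):
    """True if (lat, lon) falls inside ANY (center_lat, center_lon, radius_km) circle."""
    return any(haversine_km(lat, lon, clat, clon) <= rad for clat, clon, rad in circles)
```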
- seed_vault.service.utils.get_sds_filenames(n, s, l, c, time_start, time_end, sds_path)[source]
Generate SDS (SeisComP Data Structure) format filenames for a time range.
Creates a list of daily SDS format filenames for given network, station, location, and channel codes over a specified time period.
- Parameters:
n (str) – Network code.
s (str) – Station code.
l (str) – Location code.
c (str) – Channel code.
time_start (UTCDateTime) – Start time for data requests.
time_end (UTCDateTime) – End time for data requests.
sds_path (str) – Root path of the SDS archive.
- Returns:
List of SDS format filepaths in the form /sds_path/YEAR/NETWORK/STATION/CHANNEL.D/NET.STA.LOC.CHA.D.YEAR.DOY
- Return type:
List[str]
Example
>>> paths = get_sds_filenames(
...     "IU", "ANMO", "00", "BHZ",
...     UTCDateTime("2020-01-01"),
...     UTCDateTime("2020-01-03"),
...     "/data/seismic"
... )
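The daily path construction can be sketched with the standard library alone (plain `datetime` stands in for `UTCDateTime`; this illustrates the SDS naming pattern, not the package's exact code):

```python
from datetime import datetime, timedelta

def sds_filenames(net, sta, loc, cha, start, end, sds_path):
    """Build one SDS daily filepath per day in [start, end]:
    sds_path/YEAR/NET/STA/CHA.D/NET.STA.LOC.CHA.D.YEAR.DOY"""
    paths = []
    day = datetime(start.year, start.month, start.day)
    while day <= end:
        year, doy = day.strftime("%Y"), day.strftime("%j")  # %j = zero-padded day of year
        paths.append(
            f"{sds_path}/{year}/{net}/{sta}/{cha}.D/"
            f"{net}.{sta}.{loc}.{cha}.D.{year}.{doy}"
        )
        day += timedelta(days=1)
    return paths
```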
- seed_vault.service.utils.get_time_interval(interval_type, amount=1)[source]
Get the current date-time and the date-time amount intervals earlier.
- Parameters:
interval_type (str) – One of [‘hour’, ‘day’, ‘week’, ‘month’]
amount (int) – Number of intervals to go back (default is 1)
- Returns:
(current_datetime, past_datetime)
- Return type:
tuple
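A minimal sketch of this helper; the 30-day approximation for a month is an assumption, as the real implementation may use calendar months:

```python
from datetime import datetime, timedelta

def get_time_interval(interval_type, amount=1):
    """Return (now, now - amount * interval)."""
    deltas = {
        "hour": timedelta(hours=1),
        "day": timedelta(days=1),
        "week": timedelta(weeks=1),
        "month": timedelta(days=30),  # assumption: approximate month
    }
    now = datetime.now()
    return now, now - amount * deltas[interval_type]
```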
- seed_vault.service.utils.parse_inv(inv)[source]
Return four lists (net, sta, loc, cha) detailing the contents of an ObsPy Inventory object
- Parameters:
inv (Inventory) – ObsPy Inventory object
- Returns:
Four lists containing all network, station, location, and channel codes
- Return type:
tuple
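The traversal is a nested walk over networks, stations, and channels. The sketch below uses plain dicts as stand-ins for ObsPy objects and emits one entry per channel; whether the real function deduplicates codes is not specified here:

```python
def parse_inv(inv):
    """Walk a nested inventory-like structure (networks -> stations -> channels)
    and return four parallel code lists: (nets, stas, locs, chas)."""
    nets, stas, locs, chas = [], [], [], []
    for net in inv:
        for sta in net["stations"]:
            for cha in sta["channels"]:
                nets.append(net["code"])
                stas.append(sta["code"])
                locs.append(cha["location_code"])
                chas.append(cha["code"])
    return nets, stas, locs, chas
```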
- seed_vault.service.utils.remove_duplicate_events(catalog)[source]
Remove duplicate events from an ObsPy Catalog based on resource IDs.
Takes a catalog of earthquake events and returns a new catalog containing only unique events, where uniqueness is determined by the event’s resource_id. The first occurrence of each resource_id is kept.
- Parameters:
catalog (obspy.core.event.Catalog) – Input catalog containing earthquake events
- Returns:
New catalog containing only unique events
- Return type:
obspy.core.event.Catalog
Examples
>>> from obspy import read_events
>>> cat = read_events('events.xml')
>>> unique_cat = remove_duplicate_events(cat)
>>> print(f"Removed {len(cat) - len(unique_cat)} duplicate events")
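The keep-first-occurrence rule is a standard order-preserving de-duplication. A sketch with dicts standing in for ObsPy Event objects:

```python
def remove_duplicate_events(events, key=lambda e: e["resource_id"]):
    """Keep the first occurrence of each resource_id, preserving order."""
    seen = set()
    unique = []
    for ev in events:
        k = key(ev)
        if k not in seen:
            seen.add(k)
            unique.append(ev)
    return unique
```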
- seed_vault.service.utils.shift_time(reftime, interval_type, amount=1)[source]
Shift a reference time by a number of intervals.
- Parameters:
reftime (datetime) – Reference time
interval_type (str) – One of [‘hour’, ‘day’, ‘week’, ‘month’, ‘year’]
amount (int) – Number of intervals to shift (positive = forward, negative = backward)
- Returns:
The new datetime after the shift, capped at the current time if shifting forward
- Return type:
datetime
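Month and year shifts need calendar arithmetic rather than a fixed timedelta. A sketch of the described behaviour, including the forward cap (the day-clamping rule for short months is an assumption):

```python
import calendar
from datetime import datetime, timedelta

def shift_time(reftime, interval_type, amount=1):
    """Shift reftime by `amount` intervals; results past now are capped at now."""
    if interval_type in ("hour", "day", "week"):
        step = {"hour": timedelta(hours=1), "day": timedelta(days=1),
                "week": timedelta(weeks=1)}[interval_type]
        shifted = reftime + amount * step
    elif interval_type in ("month", "year"):
        months = amount * (12 if interval_type == "year" else 1)
        total = reftime.year * 12 + (reftime.month - 1) + months
        year, month = divmod(total, 12)
        month += 1
        # Clamp the day so e.g. Jan 31 + 1 month lands on the last day of February.
        day = min(reftime.day, calendar.monthrange(year, month)[1])
        shifted = reftime.replace(year=year, month=month, day=day)
    else:
        raise ValueError(f"Unsupported interval type: {interval_type!r}")
    return min(shifted, datetime.now())  # caps only forward shifts past now
```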
- seed_vault.service.utils.to_timestamp(time_obj)[source]
Convert various time objects to Unix timestamp.
- Parameters:
time_obj (Union[int, float, datetime, UTCDateTime]) – Time object to convert. Can be one of:
- int/float: Already a timestamp
- datetime: Python datetime object
- UTCDateTime: ObsPy UTCDateTime object
- Returns:
Unix timestamp (seconds since epoch).
- Return type:
float
- Raises:
ValueError – If the input time object type is not supported.
Example
>>> ts = to_timestamp(datetime.now())
>>> ts = to_timestamp(UTCDateTime())
>>> ts = to_timestamp(1234567890.0)
seed_vault.service.waveform module
- seed_vault.service.waveform.get_local_waveform(request, settings)[source]
Get waveform data from a local client, handling comma-separated values in the network, station, location, and channel fields. Unlike remote FDSN requests, a local SDS client does not expand such lists itself.
- Parameters:
request (Tuple[str, str, str, str, str, str]) – Tuple containing (network, station, location, channel, starttime, endtime)
settings (SeismoLoaderSettings) – Settings object containing the SDS path
- Returns:
Stream object containing the requested waveform data, or None if no data found
- Return type:
Optional[Stream]
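The comma-separated expansion amounts to taking the Cartesian product of the split fields and issuing one concrete request per combination. A sketch of that step alone (the `expand_request` name is hypothetical):

```python
from itertools import product

def expand_request(request):
    """Expand comma-separated network/station/location/channel fields into
    one concrete (net, sta, loc, cha, t0, t1) request per combination."""
    net, sta, loc, cha, t0, t1 = request
    combos = product(net.split(","), sta.split(","), loc.split(","), cha.split(","))
    return [(n, s, l, c, t0, t1) for n, s, l, c in combos]
```

For example, a request naming two stations and two channels expands into four single-trace requests suitable for a local SDS client.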