Remote access using python
The easiest way to use the Madrigal Python remote data access API is to let the web interface write the script you need. Choose the "Access data" pull-down menu, select "Create a command to download multiple exps", and follow the instructions; you will end up with a command that downloads whatever you want from Madrigal. Be sure to select python as the language for the command. You can either download files as they are stored in Madrigal, in column-delimited ascii, Hdf5, or netCDF4 format, or choose the parameters yourself (including derived parameters) and optionally filter the data you get back.
The web interface generates Python commands using one of two scripts: globalDownload.py and globalIsprint.py. Use globalDownload.py if you want data as it is stored in Madrigal; use globalIsprint.py to choose parameters and/or filters. Both scripts are documented below for those who prefer to build the arguments themselves rather than use the web interface.
Finally, this page describes the script globalCitation.py, which creates a permanent citation to a group of Madrigal files.
globalDownload.py
Usage:
globalDownload.py --url=<Madrigal url> --outputDir=<output directory> \
    --user_fullname=<user fullname> --user_email=<user email> \
    --user_affiliation=<user affiliation> --format=<ascii,hdf5> [options]
where:
--url=<Madrigal url> - URL of the home page of the site to be searched (i.e., http://madrigal.haystack.mit.edu/). This is required.
--outputDir=<output directory> - the output directory to store all files in. By default, all files are stored in this one directory, and a number is added to a filename if a file would otherwise be overwritten. Set the --tree flag to store files in the same directory structure they have in Madrigal, which also lets all files keep their original names.
--user_fullname=<user fullname> - the full user name (probably in quotes unless your name is Sting or Madonna)
--user_email=<user email>
--user_affiliation=<user affiliation> - user affiliation. Use quotes if it contains spaces.
--format=<ascii or hdf5> - the format of the downloaded files. This is required.
and options are:
--startDate=<MM/DD/YYYY> - start date to filter experiments before. Defaults to allow all experiments.
--endDate=<MM/DD/YYYY> - end date to filter experiments after. Defaults to allow all experiments.
--inst=<instrument list> - comma separated list of instrument codes or names. See Madrigal documentation for this list. Defaults to allow all instruments. If names are given, the argument must be enclosed in double quotes. An asterisk will perform matching as in glob. For example: --inst=10,30 or --inst="Jicamarca IS Radar,Arecibo*"
--expName - filter experiments by the experiment name. Give all or part of the experiment name. Matching is case insensitive. Default is no filtering by experiment name.
--fileDesc - filter files by their file description string, using case-insensitive fnmatch matching.
--kindat=<kind of data list> - comma separated list of kind of data codes. See Madrigal documentation for this list. Defaults to allow all kinds of data. If names are given, the argument must be enclosed in double quotes. An asterisk will perform matching as in glob. For example: --kindat=3001,13201 or --kindat="INSCAL Basic Derived Parameters,*efwind*,2001"
--seasonalStartDate=<MM/DD> - seasonal start date to filter experiments before. Use this to select only part of the year to collect data. Defaults to Jan 1. Example: --seasonalStartDate=07/01 would only allow experiments after July 1st from each year.
--seasonalEndDate=<MM/DD> - seasonal end date to filter experiments after. Use this to select only part of the year to collect data. Defaults to Dec 31. Example: --seasonalEndDate=10/31 would only allow experiments before Oct 31 of each year.
--tree - add if you want to store the downloaded files in the same hierarchy as in Madrigal: <YYYY>/<instCode>/<experimentDir>. Without --tree, all downloaded files are stored in one directory.
--includeNonDefault - if given, include realtime files when there are no default files. Default is to search only default files.
--verbose - if given, print information about each file processed to stdout. Default is to run silently.
Example:
globalDownload.py --url=http://madrigal.haystack.mit.edu --outputDir=/tmp --user_fullname="Bill Rideout" [email protected] --user_affiliation=MIT --format=hdf5 --startDate=01/01/1998 --endDate=01/30/1998 --inst=30
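If you call globalDownload.py from another Python program, it can help to build the argument list in code. The sketch below is a hypothetical helper (build_global_download_args is not part of Madrigal); it only assembles the flags documented above into a list. Passing such a list to subprocess.call runs the script without a shell, so names containing spaces need no extra quoting; the double quotes mentioned in the option descriptions are needed only when typing the command into a shell.

```python
def build_global_download_args(url, output_dir, fullname, email, affiliation,
                               fmt, **options):
    """Assemble a globalDownload.py command line as a list (hypothetical helper).

    The positional arguments map to the required flags. Each keyword in
    options becomes --name=value, except True, which becomes a bare flag
    such as --tree or --verbose; None and False are left out.
    """
    args = ['globalDownload.py',
            '--url=%s' % url,
            '--outputDir=%s' % output_dir,
            '--user_fullname=%s' % fullname,
            '--user_email=%s' % email,
            '--user_affiliation=%s' % affiliation,
            '--format=%s' % fmt]
    for name in sorted(options):
        value = options[name]
        if value is True:
            args.append('--%s' % name)
        elif value is not None and value is not False:
            args.append('--%s=%s' % (name, value))
    return args


args = build_global_download_args(
    'http://madrigal.haystack.mit.edu', '/tmp', 'Your Name', 'you@example.com',
    'Your Institution', 'hdf5',
    startDate='01/01/1998', endDate='01/30/1998', inst=30, tree=True)
print(args)
# To run it (assumes globalDownload.py is installed and executable on your PATH):
# import subprocess; subprocess.call(args)
```

Keeping each flag as one list element means a value such as --inst="Jicamarca IS Radar,Arecibo*" can be passed as inst='Jicamarca IS Radar,Arecibo*' without any quoting.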
globalIsprint.py
Usage:
globalIsprint.py --url=<Madrigal url> --parms=<Madrigal parms> --output=<output file> \
    --user_fullname=<user fullname> --user_email=<user email> \
    --user_affiliation=<user affiliation> [options]
where:
--url=<Madrigal url> - URL of the home page of the site to be searched (i.e., http://madrigal.haystack.mit.edu/). This is required.
--parms=<Madrigal parms> - a comma delimited string listing the desired Madrigal parameters in mnemonic form (example: gdalt,dte,te). Data will be returned in the same order as given in this string. For all possible parameters, see http://madrigal.haystack.mit.edu/cgi-bin/madrigal/getMetadata and choose "Parameter code table".
--output=<output file or directory name> - the file or directory name to store the resulting data. If you give a file, all output will be stored in the single ascii file that you specify. Use a directory name if you want data stored as individual files, in either ascii, Hdf5, or netCDF4 format; in that case you must also set the optional --format argument. File names will be based on file names in Madrigal. Hdf5 and netCDF4 formats are only available from Madrigal 3.0 or higher sites.
--user_fullname=<user fullname> - the full user name (probably in quotes unless your name is Sting or Madonna)
--user_email=<user email>
--user_affiliation=<user affiliation> - user affiliation. Use quotes if it contains spaces.
and options are:
--startDate=<MM/DD/YYYY> - start date to filter experiments before. Defaults to allow all experiments.
--endDate=<MM/DD/YYYY> - end date to filter experiments after. Defaults to allow all experiments.
--inst=<instrument list> - comma separated list of instrument codes or names. See Madrigal documentation for this list. Defaults to allow all instruments. If names are given, the argument must be enclosed in double quotes. An asterisk will perform matching as in glob. Examples: (--inst=10,30 or --inst="Jicamarca IS Radar,Arecibo*")
--format=<Hdf5 or netCDF4 or ascii> - the format must be specified if output is a directory, so that data is stored in individual files, one for each Madrigal file. Hdf5 and netCDF4 formats are only available from Madrigal 3.0 or higher sites.
--expName - filter experiments by the experiment name. Give all or part of the experiment name. Matching is case insensitive and fnmatch characters * and ? are allowed. Default is no filtering by experiment name.
--fileDesc - filter files by their file description string. Give all or part of the file description string. Matching is case insensitive and fnmatch characters * and ? are allowed. Default is no filtering by file description.
--kindat=<kind of data list> - comma separated list of kind of data codes. See Madrigal documentation for this list. Defaults to allow all kinds of data. If names are given, the argument must be enclosed in double quotes. An asterisk will perform matching as in glob. Examples: (--kindat=3001,13201 or --kindat="INSCAL Basic Derived Parameters,*efwind*,2001")
--filter=<[mnemonic] or [mnemonic1,[+-*/],mnemonic2]>,<lower limit1>,<upper limit1>[or<lower limit2>,<upper limit2>...] - a filter using any measured or derived Madrigal parameter, or two Madrigal parameters either added, subtracted, multiplied or divided. Each filter has one or more allowed ranges, and accepts data that is in any allowed range. If the Madrigal parameter value is missing, the filter always rejects that data. Multiple filter arguments are allowed on the command line. To skip either a lower limit or an upper limit, leave it blank. Examples: --filter=ti,500,1000 accepts data when 500 <= Ti <= 1000; --filter=gdalt,-,sdwht,0, accepts data when gdalt > shadow height (sdwht), that is, when the point is in direct sunlight; --filter=gdalt,200,300or1000,1200 accepts data when 200 <= gdalt <= 300 or 1000 <= gdalt <= 1200.
--seasonalStartDate=<MM/DD> - seasonal start date to filter experiments before. Use this to select only part of the year to collect data. Defaults to Jan 1. Example: (--seasonalStartDate=07/01) would only allow experiments after July 1st from each year.
--seasonalEndDate=<MM/DD> - seasonal end date to filter experiments after. Use this to select only part of the year to collect data. Defaults to Dec 31. Example: (--seasonalEndDate=10/31) would only allow experiments before Oct 31 of each year.
--showFiles - if given, show file names. Default is to not show file names. Not used if format is Hdf5 or netCDF4.
--showSummary - if given, summarize all arguments at the beginning. Default is to not show summary. Not used if format is Hdf5 or netCDF4.
--includeNonDefault - if given, include realtime files when there are no default files. Default is to search only default files. Not used if format is Hdf5 or netCDF4.
--missing=<missing string> - string printed in place of missing values (defaults to "missing"). Not used if format is Hdf5 or netCDF4.
--assumed=<assumed string> - string printed in place of assumed values (defaults to "assumed"). Not used if format is Hdf5 or netCDF4.
--knownbad=<knownbad string> - string printed in place of known bad values (defaults to "knownbad"). Not used if format is Hdf5 or netCDF4.
--verbose - if given, print each file processed info to stdout. Default is to run silently.
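The --filter syntax and its acceptance rule are compact, so a small sketch may help. format_filter and filter_accepts below are hypothetical helpers, not part of Madrigal: the first assembles a --filter argument from a mnemonic (or two mnemonics and an operator) and a list of ranges; the second mimics the documented rule locally. It assumes limits are inclusive, as the Ti example suggests, and uses None to stand in for a missing value.

```python
def format_filter(mnemonic, ranges, op=None, mnemonic2=None):
    """Build a --filter argument (hypothetical helper).

    ranges is a list of (lower, upper) pairs; None leaves that limit blank.
    If op is one of + - * /, the filter applies to (mnemonic op mnemonic2).
    """
    if op is None:
        expr = mnemonic
    elif len(op) == 1 and op in '+-*/' and mnemonic2:
        expr = '%s,%s,%s' % (mnemonic, op, mnemonic2)
    else:
        raise ValueError('op must be one of + - * / and mnemonic2 must be given')
    limits = []
    for lower, upper in ranges:
        limits.append('%s,%s' % ('' if lower is None else lower,
                                 '' if upper is None else upper))
    # multiple ranges are joined with the literal word "or"
    return '--filter=%s,%s' % (expr, 'or'.join(limits))


def filter_accepts(value, ranges):
    """Mimic the documented rule: a missing value (None here) is always
    rejected; otherwise accept if value lies in any range (assumed inclusive)."""
    if value is None:
        return False
    for lower, upper in ranges:
        if (lower is None or value >= lower) and (upper is None or value <= upper):
            return True
    return False


print(format_filter('ti', [(500, 1000)]))  # --filter=ti,500,1000
print(format_filter('gdalt', [(0, None)], op='-', mnemonic2='sdwht'))  # --filter=gdalt,-,sdwht,0,
print(format_filter('gdalt', [(200, 300), (1000, 1200)]))  # --filter=gdalt,200,300or1000,1200
print(filter_accepts(250, [(200, 300), (1000, 1200)]))  # True
```

The same filter syntax, without the leading dashes and separated by spaces, is what the regression test below passes to MadrigalData.isprint ('filter=gdalt,500,600 filter=ti,1900,2000').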
globalCitation.py
The script globalCitation.py runs a global search through Madrigal data and returns a permanent citation to the group of files found.
Usage:
globalCitation.py --user_fullname=<user fullname> --user_email=<user email> \
    --user_affiliation=<user affiliation> --startDate=<MM/DD/YYYY> --endDate=<MM/DD/YYYY> \
    --inst=<instrument list> [options]
where:
--user_fullname=<user fullname> - the full user name (probably in quotes unless your name is Sting or Madonna)
--user_email=<user email>
--user_affiliation=<user affiliation> - user affiliation. Use quotes if it contains spaces.
--startDate=<MM/DD/YYYY> - start date to filter experiments before. Defaults to allow all experiments.
--endDate=<MM/DD/YYYY> - end date to filter experiments after. Defaults to allow all experiments.
--inst=<instrument list> - comma separated list of instrument codes or names. See Madrigal documentation for this list. Defaults to allow all instruments. If names are given, the argument must be enclosed in double quotes. An asterisk will perform matching as in glob. Examples: (--inst=10,30 or --inst="Jicamarca IS Radar,Arecibo*")
and options are:
--expName - filter experiments by the experiment name. Give all or part of the experiment name. Matching is case insensitive and fnmatch characters * and ? are allowed. Default is no filtering by experiment name.
--excludeExpName - exclude experiments by the experiment name. Give all or part of the experiment name. Matching is case insensitive and fnmatch characters * and ? are allowed. Default is no excluding experiments by experiment name.
--fileDesc - filter files by their file description string. Give all or part of the file description string. Matching is case insensitive and fnmatch characters * and ? are allowed. Default is no filtering by file description.
--kindat=<kind of data list> - comma separated list of kind of data codes. See Madrigal documentation for this list. Defaults to allow all kinds of data. If names are given, the argument must be enclosed in double quotes. An asterisk will perform matching as in glob. Examples: (--kindat=3001,13201 or --kindat="INSCAL Basic Derived Parameters,*efwind*,2001")
--seasonalStartDate=<MM/DD> - seasonal start date to filter experiments before. Use this to select only part of the year to collect data. Defaults to Jan 1. Example: (--seasonalStartDate=07/01) would only allow experiments after July 1st from each year.
--seasonalEndDate=<MM/DD> - seasonal end date to filter experiments after. Use this to select only part of the year to collect data. Defaults to Dec 31. Example: (--seasonalEndDate=10/31) would only allow experiments before Oct 31 of each year.
--includeNonDefault - if given, include realtime files when there are no default files. Default is to search only default files.
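All three scripts accept --inst as a comma-separated mix of instrument codes and names, with glob-style wildcards in names. The hypothetical function below sketches how such a list could be matched against one instrument; it illustrates the documented behavior and is not Madrigal's own code. It treats purely numeric entries as codes and, as an assumption, compares names case-insensitively by applying fnmatch.fnmatchcase to lower-cased strings.

```python
import fnmatch


def instrument_selected(code, name, inst_arg):
    """Return True if the instrument (code, name) passes an --inst style list.

    Hypothetical helper: numeric entries are compared with the instrument
    code; other entries are glob patterns compared with the name,
    case-insensitively (an assumption; the option text only says "as in glob").
    """
    for entry in inst_arg.split(','):
        entry = entry.strip()
        if not entry:
            continue
        if entry.isdigit():
            if int(entry) == code:
                return True
        elif fnmatch.fnmatchcase(name.lower(), entry.lower()):
            return True
    return False


print(instrument_selected(30, 'Millstone Hill IS Radar', '10,30'))  # True
print(instrument_selected(30, 'Millstone Hill IS Radar',
                          'Jicamarca IS Radar,Arecibo*'))  # False
```

fnmatchcase is used instead of fnmatch so the result does not depend on the operating system's filename case rules.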
The rest of this tutorial is for those who want to go beyond the automatically generated commands and write more advanced python applications that access Madrigal data.
This page describes the remote Python API and gives some examples of its use. These examples have been tested on both Windows and Linux, and require only internet access and Python 2.3 to run. The API is available for download from the Madrigal website.
The remote Python API is organized in the same way as the Madrigal data model, from Instrument at the highest level down to the level of data values. Readers who are not familiar with the Madrigal data model should read its documentation before proceeding with this tutorial.
The basic object in the remote Python API is the MadrigalData class, found in the madrigalWeb module. Initializing a MadrigalData object requires only one argument: the URL of the home page of any Madrigal 2.3 (or later) site. Calling the methods of this object returns all available information from that one Madrigal site. The other classes in madrigalWeb simply hold returned information; for example, a MadrigalExperiment object holds information about one experiment.
The methods of MadrigalData are listed and fully documented in the Madrigal Python API reference guide.
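Most scripts start by creating a MadrigalData object and translating an instrument name into its code, as the realtime script at the end of this page does. A minimal sketch of that lookup follows, written as a hypothetical function so it can be tried without network access. It relies only on the .name and .code attributes of the instrument objects returned by getAllInstruments, as used in the examples on this page; the namedtuple is a stand-in for those objects.

```python
import collections


def instrument_code(inst_list, name):
    """Return the code of the instrument whose name matches, ignoring case.

    inst_list is expected to be the list returned by
    MadrigalData.getAllInstruments(); only .name and .code are used.
    """
    for inst in inst_list:
        if inst.name.lower() == name.lower():
            return inst.code
    raise ValueError('Unknown instrument %s' % name)


# Live usage (requires network access and the madrigalWeb package):
# import madrigalWeb.madrigalWeb
# data = madrigalWeb.madrigalWeb.MadrigalData('http://madrigal.haystack.mit.edu')
# print(instrument_code(data.getAllInstruments(), 'Millstone Hill IS Radar'))

# Offline stand-in for the instrument objects, for trying the function.
FakeInstrument = collections.namedtuple('FakeInstrument', ['name', 'code'])
print(instrument_code([FakeInstrument('Millstone Hill IS Radar', 30)],
                      'millstone hill is radar'))  # 30
```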
Two applications written with the remote Python API follow. The first is a simple regression test that is run to test web services when Madrigal is installed. The second is a script that downloads realtime data from any desired Madrigal site.
Simple regression test
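This simple script calls the following MadrigalData methods: getAllInstruments, getExperiments, getExperimentFiles, downloadFile, simplePrint, getExperimentFileParameters, isprint, and madCalculator.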
This example also shows how to get data from a different Madrigal site than the one you start with.
To use this regression test, cd to the examples directory in the installation directory, and type:
python exampleMadrigalWebServices.py
"""exampleMadrigalWebServices.py runs an example of the Madrigal Web Services interface
for a given Madrigal server.
usage:
python exampleMadrigalWebServices.py
"""
# $Id: exampleMadrigalWebServices.py 3984 2012-03-20 14:20:17Z brideout $
import madrigalWeb.madrigalWeb
# constants
user_fullname = 'Bill Rideout - automated test'
user_email = '[email protected]'
user_affiliation = 'MIT Haystack'
madrigalUrl = 'http://madrigal.haystack.mit.edu'
testData = madrigalWeb.madrigalWeb.MadrigalData(madrigalUrl)
print('Example of call to getAllInstruments')
instList = testData.getAllInstruments()
# print out Millstone
for inst in instList:
    if inst.code == 30:
        print(str(inst) + '\n')
print('Example of call to getExperiments')
expList = testData.getExperiments(30, 1998,1,19,0,0,0,1998,1,22,0,0,0)
for exp in expList:
    # should be only one
    print(str(exp) + '\n')
print('Example of call to getExperimentFiles')
fileList = testData.getExperimentFiles(expList[0].id)
for thisFile in fileList:
    # category 1 is the default file
    if thisFile.category == 1:
        print(str(thisFile.name) + '\n')
        thisFilename = thisFile.name
        break
print('Example of downloadFile - simple and hdf5 formats:')
result = testData.downloadFile(thisFilename, "/tmp/test.txt",
                               user_fullname, user_email, user_affiliation, "simple")
result = testData.downloadFile(thisFilename, "/tmp/test.hdf5",
                               user_fullname, user_email, user_affiliation, "hdf5")
print('Example of simplePrint - only first 1000 characters printed')
result = testData.simplePrint(thisFilename, user_fullname, user_email, user_affiliation)
print(result[:1000])
print('')
print('Example of call to getExperimentFileParameters - only first 10 printed')
fileParms = testData.getExperimentFileParameters(thisFilename)
# slicing avoids an IndexError if the file has fewer than 10 parameters
for fileParm in fileParms[:10]:
    print(fileParm)
print('')
print('Example of call to isprint (prints data)')
print(testData.isprint(thisFilename,
                       'gdalt,ti',
                       'filter=gdalt,500,600 filter=ti,1900,2000',
                       user_fullname, user_email, user_affiliation))
print('Example of call to madCalculator (gets derived data at any time)')
result = testData.madCalculator(1999,2,15,12,30,0,45,55,5,-170,-150,10,200,200,0,'sdwht,kp')
# print each row of returned values on one line
for line in result:
    print(' '.join(['%8.2e' % (value) for value in line]))
print('')
print('Example of searching all Madrigal sites for an experiment - here we search for PFISR data')
expList = testData.getExperiments(61,2008,4,1,0,0,0,2008,4,30,0,0,0,local=0)
print(expList[0])
print('Since this experiment is not local (note the experiment id = -1), we need to create a new MadrigalData object to get it')
testData2 = madrigalWeb.madrigalWeb.MadrigalData(expList[0].madrigalUrl)
print('Now repeat the same calls as above to get PFISR data from the SRI site')
expList2 = testData2.getExperiments(61,2008,4,1,0,0,0,2008,4,30,0,0,0,local=1)
print('This is a PFISR experiment')
print(expList2[0])
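Two steps of the regression test generalize well: picking the file with category 1 (the default file) from getExperimentFiles, and deciding which site to query when a search with local=0 returns an experiment from another site (experiment id -1). The hypothetical helpers below sketch both. They use only the attributes seen above (category, name, id, madrigalUrl), and default_file_name returns None rather than leaving the filename undefined when no default file exists. The namedtuples and the example.org URL are offline stand-ins, not real Madrigal objects or sites.

```python
import collections


def default_file_name(file_list):
    """Return the name of the first file with category 1, or None."""
    for thisFile in file_list:
        if thisFile.category == 1:
            return thisFile.name
    return None


def site_url_for(exp, current_url):
    """Return the URL of the Madrigal site that holds exp.

    As in the test above, an experiment id of -1 marks an experiment found
    on another site; that site's URL is in exp.madrigalUrl.
    """
    if exp.id == -1:
        return exp.madrigalUrl
    return current_url


# Stand-ins for madrigalWeb objects so the helpers can be tried offline.
FakeFile = collections.namedtuple('FakeFile', ['name', 'category'])
FakeExp = collections.namedtuple('FakeExp', ['id', 'madrigalUrl'])

print(default_file_name([FakeFile('a.h5', 2), FakeFile('b.h5', 1)]))  # b.h5
print(site_url_for(FakeExp(-1, 'http://example.org/madrigal'),
                   'http://madrigal.haystack.mit.edu'))  # http://example.org/madrigal
```

With real objects, the PFISR step above could then be written as MadrigalData(site_url_for(expList[0], madrigalUrl)).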
Script to download realtime data from Madrigal
The following is a demonstration script that shows how real-time data can be imported from any Madrigal site that is updated on a real-time basis.
In this example, data from the "Millstone Hill IS Radar" instrument is imported from http://www.haystack.mit.edu/madrigal. The following Madrigal parameters are retrieved:
year,month,day,hour,min,sec,gdlat,glon,gdalt,azm,elm,vo,dvo
for all records from the past 15 minutes.
Although the Madrigal site (http://www.haystack.mit.edu/madrigal), the instrument ("Millstone Hill IS Radar"), the parameters, and the time window are hard-coded in this example, they could easily be made into arguments.
To avoid missing data, vo is used as the filter parameter. The filter accepts any vo value between -1E30 and 1E30, so its only effect is to reject records where vo is missing.
Running this script requires the Madrigal Python API, which can be downloaded from http://www.haystack.edu/madrigal/madDownload.html.
import time
import madrigalWeb.madrigalWeb

# constants
madrigalUrl = 'http://www.haystack.mit.edu/madrigal'
instrument = 'Millstone Hill IS Radar'
user_fullname = 'Put your name here!!!'
user_email = '[email protected]'
user_affiliation = 'Put your affiliation here!!!'
# each line of data contains the following parameters
params = 'year,month,day,hour,min,sec,gdlat,glon,gdalt,azm,elm,vo,dvo'
filterParm = 'vo'
timeDelay = 15  # minutes of data to retrieve

# create the main object to get all needed info from Madrigal
madrigalObj = madrigalWeb.madrigalWeb.MadrigalData(madrigalUrl)

# these next few lines convert instrument name to code
code = None
instList = madrigalObj.getAllInstruments()
for inst in instList:
    if inst.name.lower() == instrument.lower():
        code = inst.code
        break
if code is None:
    raise ValueError('Unknown instrument %s' % (instrument))

# next, get a list of real time experiments in the last timeDelay minutes (UTC)
now = time.time()
startTime = time.gmtime(now - timeDelay*60.0)
endTime = time.gmtime(now)
expList = madrigalObj.getExperiments(code,
                                     startTime[0], startTime[1], startTime[2],
                                     startTime[3], startTime[4], startTime[5],
                                     endTime[0], endTime[1], endTime[2],
                                     endTime[3], endTime[4], endTime[5])
if len(expList) == 0:
    raise ValueError('No realtime experiments found')

# assume there's only one realtime experiment, and get the file names
fileList = madrigalObj.getExperimentFiles(expList[0].id)
if len(fileList) == 0:
    raise ValueError('No realtime experiment files found')

# build the filter: filterParm must not be missing, and records must be in the time window
startDateStr = ' date1=' + time.strftime('%m/%d/%Y', startTime)
startTimeStr = ' time1=' + time.strftime('%H:%M:%S', startTime)
endDateStr = ' date2=' + time.strftime('%m/%d/%Y', endTime)
endTimeStr = ' time2=' + time.strftime('%H:%M:%S', endTime)
filterString = ('filter=%s,-1E30,1E30' % (filterParm)
                + startDateStr + startTimeStr + endDateStr + endTimeStr)

# get data from each of the files
for dataFile in fileList:
    result = madrigalObj.isprint(dataFile.name, params, filterString,
                                 user_fullname, user_email, user_affiliation)
    # make sure it succeeded
    if result.find('No records were selected') != -1:
        continue
    if result.find('****') != -1:
        continue
    print(result)
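As noted above, the hard-coded site, instrument, parameters, and time window could be made into arguments. The sketch below shows one way to do that with argparse, and factors the filter-string construction into a function that takes the current time as a parameter so it can be checked offline. The option names (--url, --instrument, --parms, --filterParm, --minutes) are hypothetical choices for this sketch, not options of any Madrigal script.

```python
import argparse
import time


def parse_args(argv=None):
    """Parse the values the realtime script hard-codes (hypothetical options)."""
    parser = argparse.ArgumentParser(description='Fetch recent realtime Madrigal data')
    parser.add_argument('--url', default='http://www.haystack.mit.edu/madrigal')
    parser.add_argument('--instrument', default='Millstone Hill IS Radar')
    parser.add_argument('--parms',
                        default='year,month,day,hour,min,sec,gdlat,glon,gdalt,azm,elm,vo,dvo')
    parser.add_argument('--filterParm', default='vo')
    parser.add_argument('--minutes', type=float, default=15.0)
    return parser.parse_args(argv)


def realtime_filter_string(filter_parm, minutes, now=None):
    """Build the isprint filter string for the last `minutes` minutes (UTC).

    now is seconds since the epoch; it defaults to the current time.
    """
    if now is None:
        now = time.time()
    start = time.gmtime(now - minutes * 60.0)
    end = time.gmtime(now)
    return ('filter=%s,-1E30,1E30' % filter_parm
            + time.strftime(' date1=%m/%d/%Y time1=%H:%M:%S', start)
            + time.strftime(' date2=%m/%d/%Y time2=%H:%M:%S', end))


args = parse_args([])  # defaults only; pass sys.argv[1:] in a real script
print(realtime_filter_string(args.filterParm, args.minutes, now=900))
# filter=vo,-1E30,1E30 date1=01/01/1970 time1=00:00:00 date2=01/01/1970 time2=00:15:00
```

Passing now explicitly also means the experiment search and the filter can share one timestamp, which the script above does with its single now variable.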