This is the first in what will hopefully be a series of instructional articles describing some of MST’s software projects and how to use them. It makes sense to start with one of our most widely used open source projects, CATE Numpy, which provides a handy Python module for interacting with the CATE API and allows users to easily retrieve data and manage CATE archives.

Installation

CATE-Numpy (or catenp) can be installed from PyPI, the standard Python package repository, so all you have to do is type

pip install catenp

However, if playing with code is more your thing, you can get the source code from https://github.com/motionsignaltechnologies/cate-numpy

The basics – Authentication and Getting Data

All CATE servers are access protected (that way only you have access to your data), so once you have the Python module the first step is to authenticate to a CATE server. CATE Numpy provides a convenient function for doing this, namely Authenticate

from catenp import Authenticate 
tk = Authenticate("0.0.0.0",8000,"cate-user-name","cate-password")

Here you replace 0.0.0.0 with the address of the CATE server and 8000 with the port, and supply the user name and password for your account on the server. The return value of the function is an API token that is used in future requests to the server. The module also remembers this token, so you do not have to save it yourself.
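
If you would rather not hard-code the password in a script, one option is to read the connection details from environment variables and pass them to Authenticate. The sketch below is only an illustration: the variable names (CATE_HOST, CATE_PORT, CATE_USER, CATE_PASSWORD) and the helper cate_credentials are hypothetical and are not part of catenp.

```python
import os

def cate_credentials(env=None):
    """Return (host, port, user, password) read from environment variables.

    The variable names are hypothetical, chosen for this example only.
    """
    env = os.environ if env is None else env
    host = env.get("CATE_HOST", "0.0.0.0")
    port = int(env.get("CATE_PORT", "8000"))
    user = env["CATE_USER"]          # raises KeyError if the variable is missing
    password = env["CATE_PASSWORD"]
    return host, port, user, password

# Usage (requires catenp and a reachable CATE server):
#   from catenp import Authenticate
#   tk = Authenticate(*cate_credentials())
```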

Now that you have an authentication token, getting data from the compressed archive is just a matter of requesting the data interval you want. For example, the following retrieves 30 seconds of data starting at 17:00 on 2023-02-06 for channels 5000 to 6000 from the archive.

from catenp import GetData
arr=GetData("0.0.0.0",8000,"cate-user-name", 
   "2023-02-06T17:00:00+00:00","2023-02-06T17:00:30+00:00", 
   5000,6000)
print(arr.shape) # (1875,1001) 
print(arr.dtype) # int16 
print(arr) 
  # [[-24 14 127 ... -20 -25 12] 
  # [-17 -8 -21 ... 52 27 20] 
  # [ -2 -27 -97 ... 3 7 -30] 
  # ... 
  # [ 13 49 -18 ... 44 76 118] 
  # [-11 53 9 ... 7 -5 -81] 
  # [ 3 -86 -8 ... -44 -36 -37]]

The result is returned as a row-major numpy array; GetData and the CATE API have handled all the decompression and data assembly for you. The data is organised with channels as the columns and time samples as the rows (i.e. channel 0 time 0, channel 1 time 0, …). It is also worth noting that the function takes the server address, port and username as its first inputs; these are used by the catenp module to look up the authentication token we acquired when we logged in.
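
To pick a particular time and channel out of the returned array, you need to convert them to a row and column index. The sketch below assumes that row 0 corresponds to the requested start time, that column 0 corresponds to the first requested channel, and that the archive is sampled at 62.5 Hz (the rate implied by 1875 rows in 30 seconds). The helper locate is hypothetical and is not part of catenp.

```python
from datetime import datetime

def locate(start_time, first_channel, sample_rate_hz, time, channel):
    """Return the (row, column) of a time/channel point in a GetData array.

    Assumes row 0 is start_time and column 0 is first_channel.
    """
    t0 = datetime.fromisoformat(start_time)
    t = datetime.fromisoformat(time)
    row = round((t - t0).total_seconds() * sample_rate_hz)
    col = channel - first_channel
    return row, col

row, col = locate("2023-02-06T17:00:00+00:00", 5000, 62.5,
                  "2023-02-06T17:00:16+00:00", 5550)
print(row, col)  # 1000 550
```

Under those assumptions, arr[row, col] would be the sample for channel 5550 at 17:00:16 in the array requested above.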

Advanced – Getting Archive Information

So we have seen that getting data is pretty simple, but how do you know what data there is to get from a CATE server? Well, the CATE API provides a number of endpoints for querying what data is available in the archive, and as you might expect catenp provides some handy interface functions.

The first and most general query function is ArchiveInfo, which provides general information on the dataset such as sample rate and channel range.

from catenp.catenumpy import ArchiveInfo
info = ArchiveInfo("0.0.0.0",8000,"cate-user-name") 
for kk in info: print(kk," : ",info[kk]) 
  # archive_id : cateid 
  # archive_name : full-cate-archive-name 
  # archive_version : {'major_version': 0, 'minor_version': 1, 'git_commit_id': None} 
  # database : {'db_type': 'SQLITE', 'url': 'xx://xxxxx/xxxxxx'} 
  # time_axis : {'sample_rate_hz': 62.5} 
  # channel_axis : {'start_channel': 0, 'stop_channel': 19999}

The return value is a dictionary. In this example the archive is sampled at 62.5 Hz and covers channels 0 to 19999.
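
The values in this dictionary can be used to estimate how large a request will be before calling GetData. The sketch below is illustrative: it uses a hard-coded dictionary with the same keys as the output above rather than a live server, assumes the channel range in a request is inclusive (as the 1001 columns returned for channels 5000 to 6000 suggest), and assumes 2 bytes per sample because the example array was int16. The helper expected_shape is hypothetical.

```python
from datetime import datetime

# Same keys as the ArchiveInfo output above, hard-coded for illustration
info = {
    "time_axis": {"sample_rate_hz": 62.5},
    "channel_axis": {"start_channel": 0, "stop_channel": 19999},
}

def expected_shape(info, tmin, tmax, cmin, cmax):
    """Estimate the (rows, columns) of the array for a time/channel request."""
    rate = info["time_axis"]["sample_rate_hz"]
    seconds = (datetime.fromisoformat(tmax)
               - datetime.fromisoformat(tmin)).total_seconds()
    rows = round(seconds * rate)
    cols = cmax - cmin + 1  # assumes an inclusive channel range
    return rows, cols

rows, cols = expected_shape(info, "2023-02-06T17:00:00+00:00",
                            "2023-02-06T17:00:30+00:00", 5000, 6000)
size_bytes = rows * cols * 2  # int16 samples, 2 bytes each
print(rows, cols, size_bytes)  # 1875 1001 3753750
```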

For an overview of what data is available, CATE-Numpy provides the function DatabaseInfo, which is intended for initial archive queries.

from catenp.catenumpy import DatabaseInfo
info = DatabaseInfo("0.0.0.0",8000,"cate-user-name") 
for kk in info: 
  if kk !="segments": 
    print(" ",kk,":",info[kk]) 
  else: 
    print(" segments:") 
    for xx in info[kk]: 
      for ll in xx: print(" ",ll,":",xx[ll]) 
      print("")
  #
  # archive_resource : xx://xxxxx.sqlite
  # db_backend : SQLITE
  #
  # segments:
  #   min_time : 2023-02-06T01:14:38.880100+00:00
  #   max_time : 2023-02-06T02:00:38.864100+00:00
  #   min_channel : 0
  #   max_channel : 14591
  #   id : 1
  #   segment_url : xx://xxx
  #   size_mb : None
  #   number_of_resources : None
  #   row_series_info : []
  #  ...

The resulting segments list is designed to be used as a first query and only returns a summary of the available data. Each item in the segments list is roughly an hour long (although this is configurable) and describes the bounding box of the data available within that hour.
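
Because each entry in the segments list is a bounding box, a quick first check is to see which segments overlap the interval you are interested in. The sketch below works on a hard-coded copy of the segment shown above rather than a live response, and the helper overlapping_segments is hypothetical, not part of catenp.

```python
from datetime import datetime

# One segment copied from the DatabaseInfo output above
segments = [{
    "min_time": "2023-02-06T01:14:38.880100+00:00",
    "max_time": "2023-02-06T02:00:38.864100+00:00",
    "min_channel": 0,
    "max_channel": 14591,
    "id": 1,
}]

def overlapping_segments(segments, tmin, tmax, cmin, cmax):
    """Return the segments whose bounding box intersects the query box."""
    q0, q1 = datetime.fromisoformat(tmin), datetime.fromisoformat(tmax)
    hits = []
    for seg in segments:
        s0 = datetime.fromisoformat(seg["min_time"])
        s1 = datetime.fromisoformat(seg["max_time"])
        time_overlap = s0 <= q1 and q0 <= s1
        channel_overlap = seg["min_channel"] <= cmax and cmin <= seg["max_channel"]
        if time_overlap and channel_overlap:
            hits.append(seg)
    return hits

hits = overlapping_segments(segments, "2023-02-06T01:15:00+00:00",
                            "2023-02-06T01:20:00+00:00", 0, 10000)
print([seg["id"] for seg in hits])  # [1]
```

An overlap here only says that the bounding boxes intersect; it does not guarantee that every sample in the interval is present.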

The user can then use DatabaseCoverage to find exactly what data is available in a specific time and channel range, as follows:

from catenp.catenumpy import DatabaseCoverage 
cov=DatabaseCoverage("0.0.0.0",8000,"cate-user-name", 
      "2023-02-06T01:15:00+00:00","2023-02-06T01:20:00+00:00", 
      0,10000) 
# { 
#   query: { 
#     id: 1 
#     segment_url : "xx://xxxx" 
#     size_mb : None 
#     row_series_info: 
#     [{ 
#       "min_time": "2023-02-06T01:14:38.880100+00:00", 
#       "max_time": "2023-02-06T01:15:38.864100+00:00", 
#       "min_channel": 0, 
#       "max_channel": 14591, 
#       "id": 1, 
#       "data_url": "xx://xxxx", 
#       "number_of_rows": 3750 
#     } 
#     ..... 
#     ], 
#   } 
# }

Unlike the `segments` list returned by DatabaseInfo, the `row_series_info` list returned by DatabaseCoverage provides a full list of the data segments in the archive that cover the desired time and channel interval.
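
One use of the `row_series_info` list is to look for gaps in the returned coverage. The sketch below assumes each entry has the min_time and max_time fields shown above and that consecutive entries are normally one sample interval apart (0.016 s at 62.5 Hz). The first entry is copied from the example output; the other two are made up for illustration, and the helper find_gaps is hypothetical.

```python
from datetime import datetime, timedelta

row_series_info = [
    {"min_time": "2023-02-06T01:14:38.880100+00:00",
     "max_time": "2023-02-06T01:15:38.864100+00:00"},  # from the output above
    {"min_time": "2023-02-06T01:15:38.880100+00:00",
     "max_time": "2023-02-06T01:16:38.864100+00:00"},  # illustrative
    {"min_time": "2023-02-06T01:18:38.880100+00:00",
     "max_time": "2023-02-06T01:19:38.864100+00:00"},  # illustrative, after a gap
]

def find_gaps(series, sample_rate_hz=62.5):
    """Return (gap_start, gap_end) pairs between consecutive row series."""
    # Allow up to 1.5 sample intervals between the end of one series
    # and the start of the next before calling it a gap.
    tolerance = timedelta(seconds=1.5 / sample_rate_hz)
    ordered = sorted(series, key=lambda s: datetime.fromisoformat(s["min_time"]))
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        end = datetime.fromisoformat(prev["max_time"])
        start = datetime.fromisoformat(nxt["min_time"])
        if start - end > tolerance:
            gaps.append((prev["max_time"], nxt["min_time"]))
    return gaps

print(find_gaps(row_series_info))
# [('2023-02-06T01:16:38.864100+00:00', '2023-02-06T01:18:38.880100+00:00')]
```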

Advanced 2 – Checking if points are covered

Suppose you have a list of points given as times and channel numbers and you want to know if they are covered by data in the CATE archive. You could write a function using the above queries, but fortunately catenp provides the convenience method CheckPointsCoverage for exactly this purpose.


from catenp.catenumpy import CheckPointsCoverage
cov=CheckPointsCoverage("0.0.0.0",8000,"cate-user-name",
    [{"time":"2023-02-06T17:00:15+00:00","channel": 5550}]
    )
# {'results': 
#   [{
#     'time': '2023-02-06T17:00:15+00:00', 
#     'channel': 5550, 
#     'result': 'inside', 
#     'segment_id': 12, 
#     'row_series_id': 60
#   }]
# }

Each point in the input is checked against the archive, and its result is returned as either inside or outside.
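
Here is a small sketch of how the input list might be built and the result filtered. It assumes the input and output formats shown above; the response used here is a hard-coded example (the second entry is made up to show an outside result) rather than a live server reply, and the helper names are hypothetical.

```python
from datetime import datetime, timezone

def make_points(pairs):
    """Turn (datetime, channel) pairs into the list of dicts used above."""
    return [{"time": t.isoformat(), "channel": ch} for t, ch in pairs]

def points_outside(response):
    """Return the (time, channel) pairs not reported as inside the archive."""
    return [(r["time"], r["channel"])
            for r in response["results"] if r["result"] != "inside"]

points = make_points([(datetime(2023, 2, 6, 17, 0, 15, tzinfo=timezone.utc), 5550)])
print(points)
# [{'time': '2023-02-06T17:00:15+00:00', 'channel': 5550}]

# Hard-coded response in the format shown above
response = {"results": [
    {"time": "2023-02-06T17:00:15+00:00", "channel": 5550, "result": "inside",
     "segment_id": 12, "row_series_id": 60},
    {"time": "2023-02-06T23:00:00+00:00", "channel": 5550,
     "result": "outside"},  # illustrative entry
]}
print(points_outside(response))
# [('2023-02-06T23:00:00+00:00', 5550)]
```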

Guru level – Requesting Uploads from onsite servers

One of the most powerful features of the CATE system is the ability to pair a cloud-side server with one or more onsite units collecting data. The cloud server can then be used to request data from the onsite (or upstream) units, allowing the user to make efficient use of limited bandwidth and upload just the data that they are interested in.

As with all the other CATE operations, the underlying functionality is handled through the CATE API; however, catenp does provide a convenience function to request uploads from onsite units.

from catenp.catenumpy import RequestUploads
# Each segment to upload is described by its time range (tmin, tmax)
# and its channel range (cmin, cmax)
rr=RequestUploads("0.0.0.0",8000,"cate-user-name",
    [
      {"tmin":"2023-02-06T17:00:15+00:00",
       "tmax":"2023-02-06T17:00:16+00:00",
       "cmin":0 , "cmax":1023},
      {"tmin":"2023-02-06T17:00:15+00:00",
       "tmax":"2023-02-06T17:00:16+00:00",
       "cmin":1024 , "cmax":2047},
      {"tmin":"2023-02-06T17:00:16+00:00",
       "tmax":"2023-02-06T17:00:17+00:00",
       "cmin":1024 , "cmax":2047},
    ])

When the function is called, the data segments are submitted to the cloud-side server, which first checks whether any of the segments are already covered. Data segments that are not covered by the cloud server are submitted to the onsite/upstream servers, and if they have the data available it is uploaded and appears on the cloud-side server.
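
If you need to request a larger region, the list of segments can be generated rather than typed by hand. The sketch below splits a time and channel box into tiles of one second by 1024 channels, matching the sizes used in the example above. It assumes each segment is a dictionary with tmin, tmax, cmin and cmax keys and an inclusive channel range; the helper tile_requests is hypothetical, and the tile sizes are arbitrary choices, not a catenp requirement.

```python
from datetime import datetime, timedelta

def tile_requests(tmin, tmax, cmin, cmax, seconds=1, channels=1024):
    """Split a time/channel box into a list of smaller request dictionaries."""
    t_end = datetime.fromisoformat(tmax)
    step = timedelta(seconds=seconds)
    tiles = []
    t0 = datetime.fromisoformat(tmin)
    while t0 < t_end:
        t1 = min(t0 + step, t_end)
        for c0 in range(cmin, cmax + 1, channels):
            c1 = min(c0 + channels - 1, cmax)  # inclusive upper channel
            tiles.append({"tmin": t0.isoformat(), "tmax": t1.isoformat(),
                          "cmin": c0, "cmax": c1})
        t0 = t1
    return tiles

tiles = tile_requests("2023-02-06T17:00:15+00:00",
                      "2023-02-06T17:00:17+00:00", 0, 2047)
print(len(tiles))  # 4
print(tiles[0])
# {'tmin': '2023-02-06T17:00:15+00:00', 'tmax': '2023-02-06T17:00:16+00:00', 'cmin': 0, 'cmax': 1023}
```

Under the assumptions above, the resulting list could then be passed as the last argument to RequestUploads.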

Summary

Hopefully that gave you a flavour of what the CATE Numpy (catenp) package can do and how it can aid in workflows with CATE data. We are continuously working to improve and develop the package, so I would not be surprised if we come back to it in a later edition of Software Spotlight. If you’d like to know more about CATE, CATE Numpy or any other services MST offers, feel free to get in touch.