6.2. UCTB.preprocess package

6.2.1. UCTB.preprocess.GraphGenerator module

class UCTB.preprocess.GraphGenerator.GraphGenerator(data_loader, graph='Correlation', threshold_distance=1000, threshold_correlation=0, threshold_interaction=500, **kwargs)

Bases: object

This class is used to build graphs. The adjacency matrices and Laplacian matrices are stored in self.AM and self.LM.

Parameters:
  • data_loader (NodeTrafficLoader) – data_loader object.
  • graph (str) – Types of graphs used in neural methods. The value should be a subset of { 'Correlation', 'Distance', 'Interaction', 'Line', 'Neighbor', 'Transfer' } joined by '-' (e.g. 'Distance-Correlation'), and the dataset should contain the data needed for the selected graphs. Default: 'Correlation'
  • threshold_distance (float) – Used in building the distance graph. If the distance between two nodes in meters is smaller than threshold_distance, the corresponding position of the distance graph will be 1 and otherwise 0. Default: 1000
  • threshold_correlation (float) – Used in building of correlation graph. If the Pearson correlation coefficient is larger than threshold_correlation, the corresponding position of the correlation graph will be 1 and otherwise 0. Default: 0
  • threshold_interaction (float) – Used in building the interaction graph. If, in the latest 12 months, the number of interactions between two nodes is larger than threshold_interaction, the corresponding position of the interaction graph will be 1 and otherwise 0. Default: 500
AM

array – Adjacency matrices of graphs.

LM

array – Laplacian matrices of graphs.

static adjacent_to_laplacian(adjacent_matrix)

Convert adjacent_matrix into a Laplacian matrix.
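As an illustration of this conversion, the sketch below computes the symmetric normalized Laplacian L = I - D^(-1/2) A D^(-1/2) with NumPy. This is one common definition; the documentation above does not state which variant adjacent_to_laplacian uses, and the name normalized_laplacian_sketch is hypothetical.

```python
import numpy as np

def normalized_laplacian_sketch(adjacent_matrix):
    """Hypothetical helper: symmetric normalized Laplacian I - D^-1/2 A D^-1/2."""
    a = np.asarray(adjacent_matrix, dtype=float)
    degree = a.sum(axis=1)
    # Isolated nodes have degree 0; use a zero scaling factor for them instead of inf.
    inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    inv_sqrt[nonzero] = degree[nonzero] ** -0.5
    d_inv_sqrt = np.diag(inv_sqrt)
    return np.eye(len(a)) - d_inv_sqrt @ a @ d_inv_sqrt

# A 3-node path graph: node 1 is connected to nodes 0 and 2.
am = np.array([[0, 1, 0],
               [1, 0, 1],
               [0, 1, 0]])
lm = normalized_laplacian_sketch(am)
```

The result is symmetric with ones on the diagonal for every node that has at least one neighbor.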

static correlation_adjacent(traffic_data, threshold)

Calculate the correlation graph based on the Pearson correlation coefficient.

Parameters:
  • traffic_data (ndarray) – numpy array with shape [sequence_length, num_node].
  • threshold (float) – float in [-1, 1]; nodes whose Pearson correlation coefficient is larger than this threshold will be linked together.
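The thresholding described above can be sketched with NumPy as follows. This is an illustrative re-implementation under the stated parameter shapes, not the library's code; correlation_adjacent_sketch and the toy data are hypothetical.

```python
import numpy as np

def correlation_adjacent_sketch(traffic_data, threshold):
    """Hypothetical sketch: link nodes whose Pearson correlation exceeds threshold."""
    # rowvar=False treats each column (one node's time series) as a variable,
    # so the result has shape [num_node, num_node].
    corr = np.corrcoef(traffic_data, rowvar=False)
    return (corr > threshold).astype(np.float32)

# Toy data with shape [sequence_length=48, num_node=3]:
# node 1 is a positive linear function of node 0, node 2 is its negation.
t = np.arange(48)
traffic = np.stack([np.sin(t / 4.0),
                    2 * np.sin(t / 4.0) + 1,
                    -np.sin(t / 4.0)], axis=1)
am = correlation_adjacent_sketch(traffic, threshold=0.5)
```

In this sketch every node is linked to itself, because a series always has correlation 1 with itself; whether the library keeps or removes self-loops is not stated above.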
distance_adjacent(lat_lng_list, threshold)

Calculate the distance graph based on geographic distance.

Parameters:
  • lat_lng_list (list) – A list of geographic locations. The format of each element in the list is [latitude, longitude].
  • threshold (float) – (meters) nodes with geographic distance smaller than this threshold will be linked together.
static haversine(lat1, lon1, lat2, lon2)

Calculate the great circle distance between two points on the earth (specified in decimal degrees)
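A minimal sketch of the haversine formula and of a distance graph built on top of it, using only the standard library. The Earth radius constant, the choice of meters as the output unit, and the names haversine_sketch and distance_adjacent_sketch are assumptions made for illustration.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_M = 6371000  # assumed mean Earth radius in meters

def haversine_sketch(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

def distance_adjacent_sketch(lat_lng_list, threshold):
    """Hypothetical sketch: 1 where two nodes are closer than threshold meters."""
    return [[1 if haversine_sketch(*p, *q) < threshold else 0
             for q in lat_lng_list]
            for p in lat_lng_list]

# Three [latitude, longitude] points: the first two are about 111 m apart,
# the third is several kilometers north of both.
stations = [[40.7128, -74.0060], [40.7138, -74.0060], [40.8000, -74.0060]]
am = distance_adjacent_sketch(stations, threshold=1000)
```

With a 1000 m threshold only the first two stations are linked to each other; each station is also within the threshold of itself.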

static interaction_adjacent(interaction_matrix, threshold)

Binarize interaction_matrix based on threshold.

Parameters:
  • interaction_matrix (ndarray) – Array with shape [num_node, num_node], where each element represents the number of interactions between the corresponding nodes during a certain period, e.g. 6 months.
  • threshold (float or int) – nodes with number of interactions between them greater than this threshold will be linked together.
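The binarization can be sketched in one NumPy expression. The function name and the toy counts are hypothetical; the strict "greater than" comparison follows the parameter description above.

```python
import numpy as np

def interaction_adjacent_sketch(interaction_matrix, threshold):
    """Hypothetical sketch: 1 where the interaction count exceeds threshold."""
    return (np.asarray(interaction_matrix) > threshold).astype(np.float32)

# Toy interaction counts between three nodes.
counts = np.array([[0, 620, 30],
                   [620, 0, 500],
                   [30, 500, 0]])
am = interaction_adjacent_sketch(counts, threshold=500)
```

A count equal to the threshold (500 here) does not produce a link, since the comparison is strict.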
UCTB.preprocess.GraphGenerator.scaled_Laplacian_ASTGCN(W)

Compute the scaled Laplacian \tilde{L}.

Parameters:W (np.ndarray) – shape is (num_node, num_node).
Returns:scaled_Laplacian_ASTGCN
Return type:np.ndarray, shape (num_node, num_node)
UCTB.preprocess.GraphGenerator.scaled_laplacian_STGCN(W)

Normalized graph Laplacian function.

Parameters:W (np.ndarray) – [num_node, num_node], weighted adjacency matrix of G.
Returns:Scaled Laplacian matrix.
Type:np.matrix, [num_node, num_node].
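For orientation, the rescaling commonly used with Chebyshev graph convolutions is \tilde{L} = 2L / lambda_max - I, which maps the eigenvalues of L into [-1, 1]. The sketch below applies it to the combinatorial Laplacian L = D - W of a symmetric matrix. Which Laplacian each of the two functions above starts from is not stated in this section, so the choice of D - W and the name scaled_laplacian_sketch are assumptions.

```python
import numpy as np

def scaled_laplacian_sketch(W):
    """Hypothetical sketch: 2L / lambda_max - I with L = D - W.

    Assumes W is symmetric and the graph has at least one edge,
    so that lambda_max is positive.
    """
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W            # combinatorial Laplacian
    lambda_max = np.linalg.eigvalsh(L).max()  # eigvalsh: symmetric input
    return 2 * L / lambda_max - np.eye(len(W))

# Two nodes joined by a single edge.
W = np.array([[0, 1],
              [1, 0]])
L_tilde = scaled_laplacian_sketch(W)
eigenvalues = np.linalg.eigvalsh(L_tilde)
```

For this two-node graph L has eigenvalues 0 and 2, so the rescaled matrix has eigenvalues -1 and 1.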

6.2.2. UCTB.preprocess.preprocessor module

class UCTB.preprocess.preprocessor.MaxMinNormalizer(X, method='all')

Bases: UCTB.preprocess.preprocessor.Normalizer

This class normalizes and denormalizes data using the maximum and minimum of the data, via the transform and inverse_transform methods.

Parameters:
  • X (ndarray) – Data which normalizer extracts characteristics from.
  • method (str) – Parameter to choose in which way the input data will be processed.
inverse_transform(X)

Restore normalized data.

Parameters:X (ndarray) – normalized data.
Returns:denormalized data.
Type:numpy.ndarray.
transform(X)

Process input data to obtain normalized data.

Parameters:X (ndarray) – input data.
Returns:normalized data.
Type:numpy.ndarray.
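The transform and inverse_transform pair can be sketched as follows. MinMaxSketch is a hypothetical stand-in, not the UCTB class: it assumes scaling to [0, 1] with a single global minimum and maximum, and it does not model the method parameter.

```python
import numpy as np

class MinMaxSketch:
    """Hypothetical stand-in: scale data to [0, 1] using global min and max."""

    def __init__(self, X):
        # Statistics are extracted once from the data passed to the constructor.
        self._min = np.min(X)
        self._max = np.max(X)

    def transform(self, X):
        return (X - self._min) / (self._max - self._min)

    def inverse_transform(self, X):
        return X * (self._max - self._min) + self._min

train = np.array([[10.0, 20.0],
                  [30.0, 50.0]])
normalizer = MinMaxSketch(train)
scaled = normalizer.transform(train)
restored = normalizer.inverse_transform(scaled)
```

Fitting on the training data and reusing the same object for predictions keeps inverse_transform consistent with transform.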
class UCTB.preprocess.preprocessor.Normalizer(X)

Bases: abc.ABC

Normalizer is the abstract base class for normalizers such as MaxMinNormalizer and ZscoreNormalizer. You can also build your own normalizer by inheriting from this class.

Parameters:X (ndarray) – Data which normalizer extracts characteristics from.
class UCTB.preprocess.preprocessor.ST_MoveSample(closeness_len, period_len, trend_len, target_length=1, daily_slots=24)

Bases: object

This class converts raw data into temporal features, including closeness, period and trend features.

Parameters:
  • closeness_len (int) – The length of closeness data history. The former consecutive closeness_len time slots of data will be used as closeness history.
  • period_len (int) – The length of period data history. The data of exact same time slots in former consecutive period_len days will be used as period history.
  • trend_len (int) – The length of trend data history. The data of exact same time slots in former consecutive trend_len weeks (every seven days) will be used as trend history.
  • target_length (int) – The number of steps to predict from one piece of history data. Currently must be 1. Default: 1
  • daily_slots (int) – The number of records in one day, calculated as 24 * 60 / time_fitness. Default: 24
move_sample(data)

Input data to generate closeness, period, trend features and target vector y.

Parameters:data (ndarray) – Original temporal data.

Returns:closeness, period, trend and y matrices.
Type:numpy.ndarray.
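The three kinds of history described by the parameters above can be sketched as a sliding window over the time axis. This is an illustration under assumptions, not the library's implementation: move_sample_sketch is a hypothetical name, the ordering of values inside each feature (oldest first) and the output shapes are choices made for this sketch, and target_length is fixed to 1.

```python
import numpy as np

def move_sample_sketch(data, closeness_len, period_len, trend_len, daily_slots=24):
    """Hypothetical sketch: build closeness, period, trend features and target y."""
    data = np.asarray(data)
    weekly_slots = 7 * daily_slots
    # First index that has enough history for all three feature types.
    start = max(closeness_len, period_len * daily_slots, trend_len * weekly_slots)
    closeness, period, trend, y = [], [], [], []
    for t in range(start, len(data)):
        # The closeness_len consecutive slots right before t.
        closeness.append(data[t - closeness_len:t])
        # The same slot in each of the previous period_len days.
        period.append([data[t - d * daily_slots] for d in range(period_len, 0, -1)])
        # The same slot in each of the previous trend_len weeks.
        trend.append([data[t - w * weekly_slots] for w in range(trend_len, 0, -1)])
        y.append(data[t])
    return np.array(closeness), np.array(period), np.array(trend), np.array(y)

# A toy hourly series whose value equals its time index.
series = np.arange(400)
closeness, period, trend, y = move_sample_sketch(
    series, closeness_len=3, period_len=2, trend_len=1, daily_slots=24)
```

Because one week of hourly data is 168 slots, the first usable target in this example is index 168, with closeness taken from indices 165 to 167, period from indices 120 and 144, and trend from index 0.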

class UCTB.preprocess.preprocessor.SplitData

Bases: object

This class splits data through the split_data and split_feed_dict methods.

static split_data(data, ratio_list)

Divide the data based on the given parameter ratio_list.

Parameters:
  • data (ndarray) – Data to be split.
  • ratio_list (list) – Split ratio, the data will be split according to the ratio.
Returns:The elements in the returned list are the divided data, and the list has the same length as ratio_list.
Type:list
static split_feed_dict(feed_dict, sequence_length, ratio_list)

Divide the value data in feed_dict based on the given parameter ratio_list.

Parameters:
  • feed_dict (dict) – It is a dictionary composed of key-value pairs.
  • sequence_length (int) – If the length of value in feed_dict is equal to sequence_length, then this method divides the value according to the ratio without changing its key.
  • ratio_list (list) – Split ratio, the data will be split according to the ratio.
Returns:

The elements in the returned list are the divided dictionaries, and the list has the same length as ratio_list.

Type:

list
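The two splitting methods above can be sketched in plain Python. The function names are hypothetical, ratios are normalized by their sum, and copying values whose length differs from sequence_length unchanged into every output dictionary is an assumption, since the text above only describes the matching case.

```python
def split_data_sketch(data, ratio_list):
    """Hypothetical sketch: cut data into consecutive parts following ratio_list."""
    total = sum(ratio_list)
    n = len(data)
    parts, start, cumulative = [], 0, 0
    for i, ratio in enumerate(ratio_list):
        cumulative += ratio
        # The last part always runs to the end so that no element is dropped.
        end = n if i == len(ratio_list) - 1 else int(n * cumulative / total)
        parts.append(data[start:end])
        start = end
    return parts

def split_feed_dict_sketch(feed_dict, sequence_length, ratio_list):
    """Hypothetical sketch: split only values whose length equals sequence_length."""
    result = [{} for _ in ratio_list]
    for key, value in feed_dict.items():
        if hasattr(value, "__len__") and len(value) == sequence_length:
            for part_dict, part in zip(result, split_data_sketch(value, ratio_list)):
                part_dict[key] = part
        else:
            # Assumption: other values are shared unchanged by every split.
            for part_dict in result:
                part_dict[key] = value
    return result

feed = {"traffic": list(range(10)), "graph": [[0, 1], [1, 0]]}
train, val, test = split_feed_dict_sketch(feed, sequence_length=10, ratio_list=[8, 1, 1])
```

The splits are consecutive rather than shuffled, which keeps the temporal order of the data.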

class UCTB.preprocess.preprocessor.WhiteNormalizer(X, method='all')

Bases: UCTB.preprocess.preprocessor.Normalizer

This normalizer leaves the data unchanged.

inverse_transform(X)

Restore normalized data.

Parameters:X (ndarray) – normalized data.
Returns:denormalized data.
Type:numpy.ndarray.
transform(X)

Process input data to obtain normalized data.

Parameters:X (ndarray) – input data.
Returns:normalized data.
Type:numpy.ndarray.
class UCTB.preprocess.preprocessor.ZscoreNormalizer(X, method='all')

Bases: UCTB.preprocess.preprocessor.Normalizer

This class normalizes and denormalizes data using the mean and standard deviation of the data, via the transform and inverse_transform methods.

Parameters:
  • X (ndarray) – Data which normalizer extracts characteristics from.
  • method (str) – Parameter to choose in which way the input data will be processed.
inverse_transform(X)

Restore normalized data.

Parameters:X (ndarray) – normalized data.
Returns:denormalized data.
Type:numpy.ndarray.
transform(X)

Process input data to obtain normalized data.

Parameters:X (ndarray) – input data.
Returns:normalized data.
Type:numpy.ndarray.
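A z-score normalizer with the same two-method interface can be sketched as below. ZscoreSketch is a hypothetical stand-in, not the UCTB class: it uses a single global mean and population standard deviation and does not model the method parameter.

```python
import numpy as np

class ZscoreSketch:
    """Hypothetical stand-in: standardize data with a global mean and std."""

    def __init__(self, X):
        self._mean = np.mean(X)
        self._std = np.std(X)  # population standard deviation (ddof=0)

    def transform(self, X):
        return (X - self._mean) / self._std

    def inverse_transform(self, X):
        return X * self._std + self._mean

train = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
normalizer = ZscoreSketch(train)
scaled = normalizer.transform(train)
restored = normalizer.inverse_transform(scaled)
```

For this toy array the mean is 5 and the standard deviation is 2, so the value 9 maps to 2.0 and the value 2 maps to -1.5.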
UCTB.preprocess.preprocessor.chooseNormalizer(in_arg, X_train)

Choose a normalizer consistent with the user’s input.

Parameters:
  • in_arg (str|bool|object) – The value used to select which normalizer to create.
  • X_train (numpy.ndarray) – The data used to initialize the normalizer.
Returns:

The normalizer consistent with definition.

Type:

object.

6.2.3. UCTB.preprocess.time_utils module

UCTB.preprocess.time_utils.is_valid_date(date_str)
Parameters:date_str (string) – e.g. 2019-01-01
Returns:True if date_str is a valid date, otherwise False.
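A date check of this kind can be sketched with the standard library. The name is_valid_date_sketch and the fixed format string '%Y-%m-%d', inferred from the example value above, are assumptions.

```python
from datetime import datetime

def is_valid_date_sketch(date_str, fmt="%Y-%m-%d"):
    """Hypothetical sketch: True if date_str parses as a real calendar date."""
    try:
        datetime.strptime(date_str, fmt)
        return True
    except (ValueError, TypeError):
        # ValueError: wrong format or impossible date; TypeError: not a string.
        return False

ok = is_valid_date_sketch("2019-01-01")
impossible = is_valid_date_sketch("2019-02-30")
malformed = is_valid_date_sketch("not-a-date")
```

Parsing with strptime rejects impossible dates such as February 30 as well as strings in the wrong format.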
UCTB.preprocess.time_utils.is_work_day_america(date, city)
Parameters:date (string or datetime) – e.g. 2019-01-01
Returns:True if date is not a holiday in America, otherwise False.
UCTB.preprocess.time_utils.is_work_day_china(date, city)
Parameters:date (string or datetime) – e.g. 2019-01-01
Returns:True if date is not a holiday in China, otherwise False.