6.2. UCTB.preprocess package¶

6.2.1. UCTB.preprocess.GraphGenerator module¶

class UCTB.preprocess.GraphGenerator.GraphGenerator(data_loader, graph='Correlation', threshold_distance=1000, threshold_correlation=0, threshold_interaction=500, **kwargs)¶

Bases: object

This class builds graphs. The adjacency matrices and Laplacian matrices are stored in self.AM and self.LM.

Parameters:

data_loader (NodeTrafficLoader) – data_loader object.
graph (str) – Types of graphs used in neural methods. The value is one or more names from { 'Correlation', 'Distance', 'Interaction', 'Line', 'Neighbor', 'Transfer' }, joined by '-', and the dataset must contain the data needed for each selected graph. Default: 'Correlation'
threshold_distance (float) – Used to build the distance graph. If the distance between two nodes, in meters, is smaller than threshold_distance, the corresponding entry of the distance graph is 1, otherwise 0. Default: 1000
threshold_correlation (float) – Used to build the correlation graph. If the Pearson correlation coefficient of two nodes is larger than threshold_correlation, the corresponding entry of the correlation graph is 1, otherwise 0. Default: 0
threshold_interaction (float) – Used to build the interaction graph. If the number of interactions between two nodes in the latest 12 months is larger than threshold_interaction, the corresponding entry of the interaction graph is 1, otherwise 0. Default: 500
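The format of the graph argument can be illustrated with a small sketch. The helper parse_graph_spec below is hypothetical and not part of UCTB; it only shows how a '-'-joined value breaks down into the graph names listed above.

```python
# Hypothetical helper, not part of UCTB: split a graph spec into its names.
VALID_GRAPHS = {"Correlation", "Distance", "Interaction", "Line", "Neighbor", "Transfer"}


def parse_graph_spec(graph):
    """Split a '-'-joined graph spec and reject names outside the documented set."""
    names = graph.split("-")
    unknown = [name for name in names if name not in VALID_GRAPHS]
    if unknown:
        raise ValueError(f"unknown graph type(s): {unknown}")
    return names


print(parse_graph_spec("Distance-Correlation"))
```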

AM¶: array – Adjacency matrices of graphs.

LM¶: array – Laplacian matrices of graphs.

static adjacent_to_laplacian(adjacent_matrix)¶: Convert adjacent_matrix into its Laplacian matrix.
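This reference does not say which Laplacian variant adjacent_to_laplacian computes. As a minimal sketch of the general idea, the hypothetical function below builds the textbook combinatorial Laplacian L = D - A, with nested lists standing in for an ndarray; UCTB's own result may be normalized or rescaled differently.

```python
def combinatorial_laplacian(adjacent_matrix):
    """Return L = D - A for a square adjacency matrix given as nested lists."""
    n = len(adjacent_matrix)
    # Off-diagonal entries are the negated adjacency weights.
    laplacian = [[0.0 - adjacent_matrix[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # Degree of node i, ignoring any self-loop on the diagonal.
        degree = sum(adjacent_matrix[i][j] for j in range(n) if j != i)
        laplacian[i][i] = float(degree)
    return laplacian


# Path graph 0 - 1 - 2
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
for row in combinatorial_laplacian(adjacency):
    print(row)
```

Every row of the result sums to zero, which is a quick sanity check for this variant.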

static correlation_adjacent(traffic_data, threshold)¶

Calculate the correlation graph based on the Pearson correlation coefficient.

Parameters:

traffic_data (ndarray) – numpy array with shape [sequence_length, num_node].
threshold (float) – Value in [-1, 1]. Nodes whose Pearson correlation coefficient is larger than this threshold are linked together.
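The thresholding described above can be sketched in plain Python. The names pearson and correlation_adjacent_sketch are illustrative, nested lists stand in for the ndarray, and the treatment of the diagonal is an assumption of this sketch, not a statement about UCTB's implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series (undefined for constant series)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_adjacent_sketch(traffic_data, threshold):
    """traffic_data has shape [sequence_length, num_node]; returns a 0/1 matrix."""
    num_node = len(traffic_data[0])
    # One time series per node (the columns of traffic_data).
    series = [[row[j] for row in traffic_data] for j in range(num_node)]
    return [[1 if pearson(series[i], series[j]) > threshold else 0
             for j in range(num_node)]
            for i in range(num_node)]


# Nodes 0 and 1 rise together; node 2 falls while they rise.
traffic = [[1, 2, 4], [2, 4, 3], [3, 6, 2], [4, 8, 1]]
print(correlation_adjacent_sketch(traffic, threshold=0))
```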

distance_adjacent(lat_lng_list, threshold)¶

Calculate distance graph based on geographic distance.

Parameters:

lat_lng_list (list) – A list of geographic locations. Each element has the format [latitude, longitude].
threshold (float) – Distance in meters. Nodes with a geographic distance smaller than this threshold are linked together.

static haversine(lat1, lon1, lat2, lon2)¶: Calculate the great circle distance between two points on the earth (specified in decimal degrees)
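As an illustration of how haversine and distance_adjacent fit together, here is a minimal sketch of the standard haversine formula followed by thresholding. The function names, the Earth radius constant, and the choice of meters as the return unit are assumptions of this sketch; UCTB's own implementation may differ in these details.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters; an assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_adjacent_sketch(lat_lng_list, threshold):
    """Link two nodes (entry 1) when their distance is smaller than threshold meters."""
    return [[1 if haversine_m(lat1, lon1, lat2, lon2) < threshold else 0
             for lat2, lon2 in lat_lng_list]
            for lat1, lon1 in lat_lng_list]


stations = [[0.0, 0.0], [0.0, 0.005], [1.0, 0.0]]
print(distance_adjacent_sketch(stations, threshold=1000))
```

The first two stations are roughly 556 m apart and end up linked; the third is more than 100 km away from both.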

static interaction_adjacent(interaction_matrix, threshold)¶

Binarize interaction_matrix based on threshold.

Parameters:

interaction_matrix (ndarray) – Array with shape [num_node, num_node], where each element is the number of interactions between the corresponding nodes during a certain period, e.g. 6 months.
threshold (float or int) – Nodes with more interactions between them than this threshold are linked together.

UCTB.preprocess.GraphGenerator.scaled_Laplacian_ASTGCN(W)¶

Compute the scaled Laplacian \tilde{L}.

Parameters:	W (np.ndarray) – Adjacency matrix with shape (num_node, num_node).
Returns:	The scaled Laplacian matrix.
Return type:	np.ndarray, shape (num_node, num_node)
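The formula behind \tilde{L} is not spelled out in this reference. Assuming the function follows the usual convention for Chebyshev graph convolutions, as used in ASTGCN, the scaled Laplacian would be:

```latex
L = D - W, \qquad D_{ii} = \sum_{j} W_{ij}, \qquad
\tilde{L} = \frac{2}{\lambda_{\max}} L - I_N
```

Here D is the diagonal degree matrix, \lambda_{\max} is the largest eigenvalue of L, and I_N is the identity matrix of size num_node. The rescaling maps the eigenvalues of L into [-1, 1].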

UCTB.preprocess.GraphGenerator.scaled_laplacian_STGCN(W)¶

Normalized graph Laplacian function.

Parameters:	W (np.ndarray) – [num_node, num_node], weighted adjacency matrix of G.
Returns:	Scaled Laplacian matrix.
Type:	np.matrix, [num_node, num_node].

6.2.2. UCTB.preprocess.preprocessor module¶

class UCTB.preprocess.preprocessor.MaxMinNormalizer(X, method='all')¶

Bases: UCTB.preprocess.preprocessor.Normalizer

This class normalizes and denormalizes data using the maximum and minimum of the data, via the transform and inverse_transform methods.

Parameters:

X (ndarray) – Data from which the normalizer extracts its characteristics.
method (str) – Selects how the input data is processed.

inverse_transform(X)¶

Restore normalized data.

Parameters:	X (ndarray) – normalized data.
Returns:	denormalized data.
Type:	numpy.ndarray.

transform(X)¶

Process input data to obtain normalized data.

Parameters:	X (ndarray) – input data.
Returns:	normalized data.
Type:	numpy.ndarray.
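The transform and inverse_transform pair can be sketched as follows. This is a stand-in class, not UCTB's MaxMinNormalizer: it assumes scaling to the range [0, 1], works on a flat list instead of an ndarray, and ignores the method argument.

```python
class MaxMinSketch:
    """Stand-in max-min normalizer operating on a flat list of numbers."""

    def __init__(self, X):
        # Characteristics are extracted once from the data passed to the constructor.
        self._min = min(X)
        self._max = max(X)

    def transform(self, X):
        span = self._max - self._min
        return [(x - self._min) / span for x in X]

    def inverse_transform(self, X):
        span = self._max - self._min
        return [x * span + self._min for x in X]


normalizer = MaxMinSketch([0.0, 5.0, 10.0])
scaled = normalizer.transform([2.5, 7.5])
print(scaled, normalizer.inverse_transform(scaled))
```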

class UCTB.preprocess.preprocessor.Normalizer(X)¶

Bases: abc.ABC

Normalizer is the abstract base class for normalizers such as MaxMinNormalizer and ZscoreNormalizer. You can also build your own normalizer by inheriting from this class.

Parameters:	X (ndarray) – Data from which the normalizer extracts its characteristics.
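Building a custom normalizer by inheritance might look like the sketch below. NormalizerSketch is a stand-in for the real base class so that the example runs without UCTB installed; its two abstract methods mirror the transform and inverse_transform methods documented for the concrete normalizers, which is an assumption about the base class interface. Log1pNormalizer is a hypothetical example.

```python
import math
from abc import ABC, abstractmethod


class NormalizerSketch(ABC):
    """Stand-in for the abstract base class; not UCTB's own Normalizer."""

    def __init__(self, X):
        self._reference = X  # data the normalizer could extract characteristics from

    @abstractmethod
    def transform(self, X):
        """Return normalized data."""

    @abstractmethod
    def inverse_transform(self, X):
        """Return denormalized data."""


class Log1pNormalizer(NormalizerSketch):
    """Hypothetical custom normalizer: log(1 + x) and its inverse."""

    def transform(self, X):
        return [math.log1p(x) for x in X]

    def inverse_transform(self, X):
        return [math.expm1(x) for x in X]


normalizer = Log1pNormalizer([0.0, 9.0, 99.0])
print(normalizer.inverse_transform(normalizer.transform([0.0, 9.0, 99.0])))
```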

class UCTB.preprocess.preprocessor.ST_MoveSample(closeness_len, period_len, trend_len, target_length=1, daily_slots=24)¶

Bases: object

This class converts raw data into temporal features: closeness, period and trend features.

Parameters:

closeness_len (int) – The length of closeness data history. The former consecutive closeness_len time slots of data will be used as closeness history.
period_len (int) – The length of period data history. The data of exact same time slots in former consecutive period_len days will be used as period history.
trend_len (int) – The length of trend data history. The data of exact same time slots in former consecutive trend_len weeks (every seven days) will be used as trend history.
target_length (int) – The number of steps predicted from one piece of history data. Must be 1 for now. Default: 1
daily_slots (int) – The number of records in one day, calculated as 24 * 60 / time_fitness. Default: 24

move_sample(data)¶

Input data to generate closeness, period, trend features and target vector y.

Parameters:	data (ndarray) – Original temporal data.
Returns:	closeness, period, trend and y matrices.
Type:	numpy.ndarray.
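The index arithmetic behind the three feature types can be sketched for a one-dimensional series. The function move_sample_sketch is illustrative only: it assumes target_length = 1, orders each history from oldest to newest, and returns plain lists, none of which is confirmed by this reference for the real move_sample.

```python
def move_sample_sketch(data, closeness_len, period_len, trend_len, daily_slots=24):
    """Build closeness, period and trend histories plus the target for each usable slot."""
    weekly_slots = 7 * daily_slots
    # First slot that has enough history for all three feature types.
    start = max(closeness_len, period_len * daily_slots, trend_len * weekly_slots)
    closeness, period, trend, y = [], [], [], []
    for t in range(start, len(data)):
        # The closeness_len slots immediately before t.
        closeness.append(data[t - closeness_len:t])
        # The same slot of day on each of the previous period_len days.
        period.append([data[t - k * daily_slots] for k in range(period_len, 0, -1)])
        # The same slot of day on each of the previous trend_len weeks.
        trend.append([data[t - k * weekly_slots] for k in range(trend_len, 0, -1)])
        y.append(data[t])
    return closeness, period, trend, y


# Two slots per day keeps the example small; values equal their own index.
closeness, period, trend, y = move_sample_sketch(
    list(range(16)), closeness_len=2, period_len=2, trend_len=1, daily_slots=2)
print(closeness[0], period[0], trend[0], y[0])
```

With two slots per day, one week spans 14 slots, so the first usable target is slot 14.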

class UCTB.preprocess.preprocessor.SplitData¶

Bases: object

This class splits data via the split_data and split_feed_dict methods.

static split_data(data, ratio_list)¶

Divide the data based on the given parameter ratio_list.

Parameters:

data (ndarray) – Data to be split.
ratio_list (list) – Split ratios; the data is split according to these ratios.

Returns:	The elements in the returned list are the divided data, and the length of the list is the same as that of ratio_list.
Type:	list

static split_feed_dict(feed_dict, sequence_length, ratio_list)¶

Divide the value data in feed_dict based on the given parameter ratio_list.

Parameters:

feed_dict (dict) – A dictionary of key-value pairs.
sequence_length (int) – If the length of a value in feed_dict equals sequence_length, this method splits that value according to ratio_list and keeps its key unchanged.
ratio_list (list) – Split ratios; the data is split according to these ratios.
Returns:	The elements in the returned list are divided dictionaries, and the dimensions of the list are the same as ratio_list.
Type:	list
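Both SplitData methods can be illustrated with a short sketch. The functions below are stand-ins, not UCTB's code. Two assumptions are made: the ratios are normalized by their sum, and values in feed_dict whose length differs from sequence_length are copied unchanged into every part.

```python
from itertools import accumulate


def split_data_sketch(data, ratio_list):
    """Split data into consecutive chunks proportional to ratio_list."""
    cumulative = list(accumulate(ratio_list))
    total = cumulative[-1]
    # Chunk boundaries as indices into data, starting at 0 and ending at len(data).
    bounds = [0] + [round(c / total * len(data)) for c in cumulative]
    return [data[bounds[i]:bounds[i + 1]] for i in range(len(ratio_list))]


def split_feed_dict_sketch(feed_dict, sequence_length, ratio_list):
    """Return one dictionary per ratio, with the same keys as feed_dict."""
    parts = [{} for _ in ratio_list]
    for key, value in feed_dict.items():
        if hasattr(value, "__len__") and len(value) == sequence_length:
            pieces = split_data_sketch(value, ratio_list)
        else:
            # Assumption: values of any other length are shared by all parts.
            pieces = [value] * len(ratio_list)
        for part, piece in zip(parts, pieces):
            part[key] = piece
    return parts


feed = {"x": list(range(10)), "graph": [[1]]}
train, val, test = split_feed_dict_sketch(feed, sequence_length=10, ratio_list=[8, 1, 1])
print(train["x"], val["x"], test["x"])
```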

class UCTB.preprocess.preprocessor.WhiteNormalizer(X, method='all')¶

Bases: UCTB.preprocess.preprocessor.Normalizer

This normalizer does nothing: transform and inverse_transform leave the data unchanged.

inverse_transform(X)¶

Restore normalized data.

Parameters:	X (ndarray) – normalized data.
Returns:	denormalized data.
Type:	numpy.ndarray.

transform(X)¶

Process input data to obtain normalized data.

Parameters:	X (ndarray) – input data.
Returns:	normalized data.
Type:	numpy.ndarray.

class UCTB.preprocess.preprocessor.ZscoreNormalizer(X, method='all')¶

Bases: UCTB.preprocess.preprocessor.Normalizer

This class normalizes and denormalizes data using the mean and standard deviation of the data, via the transform and inverse_transform methods.

Parameters:

X (ndarray) – Data from which the normalizer extracts its characteristics.
method (str) – Selects how the input data is processed.

inverse_transform(X)¶

Restore normalized data.

Parameters:	X (ndarray) – normalized data.
Returns:	denormalized data.
Type:	numpy.ndarray.

transform(X)¶

Process input data to obtain normalized data.

Parameters:	X (ndarray) – input data.
Returns:	normalized data.
Type:	numpy.ndarray.
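A z-score round trip can be sketched as follows. ZscoreSketch is a stand-in, not UCTB's ZscoreNormalizer: it works on a flat list, ignores the method argument, and uses the population standard deviation, which is an assumption of this sketch.

```python
import statistics


class ZscoreSketch:
    """Stand-in z-score normalizer operating on a flat list of numbers."""

    def __init__(self, X):
        self._mean = statistics.fmean(X)
        self._std = statistics.pstdev(X)  # population standard deviation (assumed)

    def transform(self, X):
        return [(x - self._mean) / self._std for x in X]

    def inverse_transform(self, X):
        return [x * self._std + self._mean for x in X]


data = [2, 4, 4, 4, 5, 5, 7, 9]
normalizer = ZscoreSketch(data)
print(normalizer.transform(data))
```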

UCTB.preprocess.preprocessor.chooseNormalizer(in_arg, X_train)¶

Choose a normalizer that matches the user’s input.

Parameters:

in_arg (str | bool | object) – Determines which normalizer is chosen.
X_train (numpy.ndarray) – Data used to initialize the normalizer.
Returns:	The normalizer that matches in_arg.
Type:	object.

6.2.3. UCTB.preprocess.time_utils module¶

UCTB.preprocess.time_utils.is_valid_date(date_str)¶

Parameters:	date_str (string) – e.g. 2019-01-01
Returns:	True if date_str is a valid date, otherwise False.
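A check of this kind can be sketched with the standard library. The function is_valid_date_sketch is illustrative and assumes the YYYY-MM-DD format shown in the example; the formats accepted by UCTB's own is_valid_date are not stated here.

```python
from datetime import datetime


def is_valid_date_sketch(date_str):
    """Return True if date_str parses as a real calendar date in YYYY-MM-DD form."""
    try:
        datetime.strptime(date_str, "%Y-%m-%d")
    except ValueError:
        return False
    return True


print(is_valid_date_sketch("2019-01-01"), is_valid_date_sketch("2019-02-30"))
```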

UCTB.preprocess.time_utils.is_work_day_america(date, city)¶

Parameters:	date (string or datetime) – e.g. 2019-01-01
Returns:	True if date is not a holiday in America, otherwise False.

UCTB.preprocess.time_utils.is_work_day_china(date, city)¶

Parameters:	date (string or datetime) – e.g. 2019-01-01
Returns:	True if date is not a holiday in China, otherwise False.