6.2. UCTB.preprocess package¶
6.2.1. UCTB.preprocess.GraphGenerator module¶
-
class
UCTB.preprocess.GraphGenerator.
GraphGenerator
(data_loader, graph='Correlation', threshold_distance=1000, threshold_correlation=0, threshold_interaction=500, **kwargs)¶ Bases:
object
This class is used to build graphs. The adjacency matrices and Laplacian matrices are stored in self.AM and self.LM.
Parameters: - data_loader (NodeTrafficLoader) – data_loader object.
- graph (str) – Types of graphs used in neural methods. The value should be a subset of {'Correlation', 'Distance', 'Interaction', 'Line', 'Neighbor', 'Transfer'} joined by '-', and the dataset must contain the data required by the selected graphs. Default: 'Correlation'
- threshold_distance (float) – Used in building the distance graph. If the distance between two nodes in meters is larger than threshold_distance, the corresponding position of the distance graph will be 1, otherwise 0. Default: 1000
- threshold_correlation (float) – Used in building the correlation graph. If the Pearson correlation coefficient is larger than threshold_correlation, the corresponding position of the correlation graph will be 1, otherwise 0. Default: 0
- threshold_interaction (float) – Used in building the interaction graph. If, in the latest 12 months, the number of interactions between two nodes is larger than threshold_interaction, the corresponding position of the interaction graph will be 1, otherwise 0. Default: 500
-
AM
¶ array – Adjacency matrices of graphs.
-
LM
¶ array – Laplacian matrices of graphs.
-
static
adjacent_to_laplacian
(adjacent_matrix)¶ Convert adjacent_matrix into a Laplacian matrix.
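The conversion can be illustrated with the symmetric normalized Laplacian, L = I - D^{-1/2} A D^{-1/2}. This is a minimal sketch only: the function name is hypothetical, and whether UCTB uses this exact normalization is an assumption not confirmed by this page.

```python
import numpy as np

def normalized_laplacian_sketch(adjacent_matrix):
    """Hypothetical helper (not UCTB source): L = I - D^{-1/2} A D^{-1/2}."""
    a = np.asarray(adjacent_matrix, dtype=float)
    degree = a.sum(axis=1)
    # Isolated nodes have degree 0; give them a zero scaling factor
    # instead of dividing by zero.
    inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    inv_sqrt[nonzero] = degree[nonzero] ** -0.5
    # Scale row i by d_i^{-1/2} and column j by d_j^{-1/2}.
    return np.eye(len(a)) - inv_sqrt[:, None] * a * inv_sqrt[None, :]

# A three-node path graph: 0 - 1 - 2
am = np.array([[0, 1, 0],
               [1, 0, 1],
               [0, 1, 0]])
lm = normalized_laplacian_sketch(am)
```

For an undirected graph the result is symmetric with ones on the diagonal for every node that has at least one neighbor.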
-
static
correlation_adjacent
(traffic_data, threshold)¶ Calculate the correlation graph based on the Pearson correlation coefficient.
Parameters: - traffic_data (ndarray) – numpy array with shape [sequence_length, num_node].
- threshold (float) – Value in [-1, 1]; nodes whose Pearson correlation coefficient is larger than this threshold will be linked together.
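The thresholding described above can be sketched with NumPy alone. The function name below is hypothetical and the code is an illustration of the documented behavior, not UCTB's implementation.

```python
import numpy as np

def correlation_adjacent_sketch(traffic_data, threshold):
    """Hypothetical helper: link nodes whose Pearson correlation exceeds threshold."""
    # traffic_data has shape [sequence_length, num_node]; rowvar=False tells
    # np.corrcoef that each column (node) is one variable.
    corr = np.corrcoef(traffic_data, rowvar=False)
    return (corr > threshold).astype(np.float32)

# Three nodes: node 1 rises with node 0, node 2 moves in the opposite direction.
base = np.arange(10.0)
traffic = np.stack([base, 2 * base + 1, -base], axis=1)
am = correlation_adjacent_sketch(traffic, threshold=0)
```

With a threshold of 0, nodes 0 and 1 are linked to each other, while node 2 is linked only to itself.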
-
distance_adjacent
(lat_lng_list, threshold)¶ Calculate distance graph based on geographic distance.
Parameters:
-
static
haversine
(lat1, lon1, lat2, lon2)¶ Calculate the great circle distance between two points on the earth (specified in decimal degrees)
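For reference, the haversine formula can be written with the standard library only. The sketch below assumes the result is wanted in meters (matching the unit of threshold_distance) and uses a mean Earth radius of 6,371,000 m; the function name and the radius constant are illustrative, not taken from UCTB.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed unit)

def haversine_sketch(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# One degree of longitude along the equator.
one_degree = haversine_sketch(0.0, 0.0, 0.0, 1.0)
```

One degree of longitude at the equator comes out at roughly 111.2 km under this radius.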
-
static
interaction_adjacent
(interaction_matrix, threshold)¶ Binarize interaction_matrix based on threshold.
Parameters: - interaction_matrix (ndarray) – Array with shape [num_node, num_node], where each element is the number of interactions between the corresponding nodes during a certain period, e.g. 6 months.
- threshold (float or int) – Nodes with more interactions between them than this threshold will be linked together.
-
UCTB.preprocess.GraphGenerator.
scaled_Laplacian_ASTGCN
(W)¶ Compute \tilde{L}.
Parameters: W (np.ndarray) – shape is (num_node, num_node). Returns: scaled_Laplacian_ASTGCN. Return type: np.ndarray, shape (num_node, num_node)
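In Chebyshev-polynomial graph convolutions, \tilde{L} usually denotes the Laplacian rescaled so that its eigenvalues lie in [-1, 1]: \tilde{L} = 2L / \lambda_{max} - I, with L = D - W. The sketch below assumes that definition and a symmetric W; the function name is hypothetical and the code is not copied from UCTB.

```python
import numpy as np

def scaled_laplacian_sketch(W):
    """Hypothetical helper: rescale L = D - W so its spectrum lies in [-1, 1]."""
    W = np.asarray(W, dtype=float)
    # Combinatorial Laplacian: degree matrix minus weighted adjacency matrix.
    L = np.diag(W.sum(axis=1)) - W
    # eigvalsh assumes a symmetric matrix, i.e. an undirected graph.
    lambda_max = np.linalg.eigvalsh(L).max()
    return 2 * L / lambda_max - np.eye(len(W))

# A three-node path graph: 0 - 1 - 2
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
L_tilde = scaled_laplacian_sketch(W)
eigenvalues = np.linalg.eigvalsh(L_tilde)
```

Because the smallest eigenvalue of L is 0 and the largest is \lambda_{max}, the eigenvalues of the result run from -1 to 1.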
-
UCTB.preprocess.GraphGenerator.
scaled_laplacian_STGCN
(W)¶ Normalized graph Laplacian function.
Parameters: W (np.ndarray) – [num_node, num_node], weighted adjacency matrix of G. Returns: Scaled Laplacian matrix. Type: np.matrix, [num_node, num_node].
6.2.2. UCTB.preprocess.preprocessor module¶
-
class
UCTB.preprocess.preprocessor.
MaxMinNormalizer
(X, method='all')¶ Bases:
UCTB.preprocess.preprocessor.Normalizer
This class normalizes and denormalizes data using the maximum and minimum of the data, through the transform and inverse_transform methods.
Parameters: - X (ndarray) – Data which normalizer extracts characteristics from.
- method (str) – Parameter to choose in which way the input data will be processed.
-
inverse_transform
(X)¶ Restore normalized data.
Parameters: X (ndarray) – normalized data. Returns: denormalized data. Type: numpy.ndarray.
-
transform
(X)¶ Process input data to obtain normalized data.
Parameters: X (ndarray) – input data. Returns: normalized data. Type: numpy.ndarray.
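The transform and inverse_transform pair can be sketched as follows. The sketch assumes scaling to the range [0, 1] with a single global minimum and maximum; the class name is hypothetical, and the target range and the effect of the method argument in UCTB are not confirmed by this page.

```python
import numpy as np

class MinMaxSketch:
    """Hypothetical min-max normalizer scaling data to [0, 1]."""

    def __init__(self, X):
        # Statistics are taken once from the data passed at construction time.
        self._min = np.min(X)
        self._max = np.max(X)

    def transform(self, X):
        return (X - self._min) / (self._max - self._min)

    def inverse_transform(self, X):
        return X * (self._max - self._min) + self._min

train = np.array([[10.0, 20.0], [30.0, 50.0]])
normalizer = MinMaxSketch(train)
scaled = normalizer.transform(train)
restored = normalizer.inverse_transform(scaled)
```

A constant input would make the denominator zero, so a production version would need to guard against that case.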
-
class
UCTB.preprocess.preprocessor.
Normalizer
(X)¶ Bases:
abc.ABC
Normalizer is the abstract base class for normalizers such as MaxMinNormalizer and ZscoreNormalizer. You can also build your own normalizer by inheriting from this class.
Parameters: X (ndarray) – Data which normalizer extracts characteristics from.
-
class
UCTB.preprocess.preprocessor.
ST_MoveSample
(closeness_len, period_len, trend_len, target_length=1, daily_slots=24)¶ Bases:
object
This class converts raw data into temporal features, including closeness, period and trend features.
Parameters: - closeness_len (int) – The length of closeness data history. The preceding closeness_len consecutive time slots of data will be used as closeness history.
- period_len (int) – The length of period data history. The data of the exact same time slot in the preceding period_len consecutive days will be used as period history.
- trend_len (int) – The length of trend data history. The data of the exact same time slot in the preceding trend_len consecutive weeks (every seven days) will be used as trend history.
- target_length (int) – The number of steps to predict from one piece of history data. Must be 1 for now. Default: 1
- daily_slots (int) – The number of records in one day, calculated as 24 * 60 / time_fitness. Default: 24
-
move_sample
(data)¶ Input data to generate closeness, period, trend features and target vector y.
Parameters: data (ndarray) – Original temporal data. Returns: closeness, period, trend and y matrices. Type: numpy.ndarray.
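The way the three histories are sliced can be sketched as below, following the parameter descriptions of this class. The function name is hypothetical, and the output layout (one row per sample, oldest value first) is an assumption; UCTB's actual array shapes may differ.

```python
import numpy as np

def move_sample_sketch(data, closeness_len, period_len, trend_len, daily_slots=24):
    """Hypothetical helper: build closeness, period, trend histories and targets."""
    weekly_slots = 7 * daily_slots
    # Earliest target index for which every history window is fully available.
    start = max(closeness_len, period_len * daily_slots, trend_len * weekly_slots)
    closeness, period, trend, y = [], [], [], []
    for t in range(start, len(data)):
        # The closeness_len slots immediately before t.
        closeness.append(data[t - closeness_len:t])
        # The same slot on each of the previous period_len days.
        period.append([data[t - d * daily_slots] for d in range(period_len, 0, -1)])
        # The same slot in each of the previous trend_len weeks.
        trend.append([data[t - w * weekly_slots] for w in range(trend_len, 0, -1)])
        y.append(data[t])
    return tuple(np.array(part) for part in (closeness, period, trend, y))

# Hourly data where each value equals its own time index, so slices are easy to read.
series = np.arange(341)
c, p, tr, target = move_sample_sketch(series, closeness_len=3, period_len=2, trend_len=1)
```

With hourly data and one week of trend history, the first usable target is index 168; its closeness window is slots 165 to 167, its period history is slots 120 and 144, and its trend history is slot 0.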
-
class
UCTB.preprocess.preprocessor.
SplitData
¶ Bases:
object
This class splits data through the split_data and split_feed_dict methods.
-
static
split_data
(data, ratio_list)¶ Divide the data based on the given parameter ratio_list.
Parameters: - data (ndarray) – Data to be split.
- ratio_list (list) – Split ratio, the data will be split according to the ratio.
Returns: The elements in the returned list are the divided data, and the dimensions of the list are the same as ratio_list.
Type: list
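A ratio-based split of this kind can be sketched as follows. The sketch assumes the data is split in order along its first axis and that ratio_list may hold either fractions or integer weights; the function name is hypothetical and the rounding rule is an illustration, not UCTB's documented behavior.

```python
def split_by_ratio_sketch(data, ratio_list):
    """Hypothetical helper: cut data into consecutive parts proportional to ratio_list."""
    total = sum(ratio_list)
    bounds = [0]
    running = 0
    for ratio in ratio_list:
        running += ratio
        # Cumulative boundary, rounded down to a whole number of samples.
        bounds.append(int(len(data) * running / total))
    # Make sure rounding never drops the tail of the data.
    bounds[-1] = len(data)
    return [data[bounds[i]:bounds[i + 1]] for i in range(len(ratio_list))]

train, val, test = split_by_ratio_sketch(list(range(10)), [8, 1, 1])
```

Keeping the parts consecutive preserves temporal order, which matters when the data is a time series.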
-
static
split_feed_dict
(feed_dict, sequence_length, ratio_list)¶ Divide the value data in feed_dict based on the given parameter ratio_list.
Parameters: - feed_dict (dict) – It is a dictionary composed of key-value pairs.
- sequence_length (int) – If the length of value in feed_dict is equal to sequence_length, then this method divides the value according to the ratio without changing its key.
- ratio_list (list) – Split ratio, the data will be split according to the ratio.
Returns: The elements in the returned list are divided dictionaries, and the dimensions of the list are the same as ratio_list.
Type: list
-
class
UCTB.preprocess.preprocessor.
WhiteNormalizer
(X, method='all')¶ Bases:
UCTB.preprocess.preprocessor.Normalizer
This normalizer does nothing: transform and inverse_transform return the data unchanged.
-
inverse_transform
(X)¶ Restore normalized data.
Parameters: X (ndarray) – normalized data. Returns: denormalized data. Type: numpy.ndarray.
-
transform
(X)¶ Process input data to obtain normalized data.
Parameters: X (ndarray) – input data. Returns: normalized data. Type: numpy.ndarray.
-
-
class
UCTB.preprocess.preprocessor.
ZscoreNormalizer
(X, method='all')¶ Bases:
UCTB.preprocess.preprocessor.Normalizer
This class normalizes and denormalizes data using the mean and standard deviation of the data, through the transform and inverse_transform methods.
Parameters: - X (ndarray) – Data which normalizer extracts characteristics from.
- method (str) – Parameter to choose in which way the input data will be processed.
-
inverse_transform
(X)¶ Restore normalized data.
Parameters: X (ndarray) – normalized data. Returns: denormalized data. Type: numpy.ndarray.
-
transform
(X)¶ Process input data to obtain normalized data.
Parameters: X (ndarray) – input data. Returns: normalized data. Type: numpy.ndarray.
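Z-score normalization and its inverse can be sketched as below. The class name is hypothetical, and the sketch assumes a single global mean and standard deviation; how the method argument changes this in UCTB is not stated on this page.

```python
import numpy as np

class ZscoreSketch:
    """Hypothetical z-score normalizer using one global mean and standard deviation."""

    def __init__(self, X):
        # Statistics are taken once from the data passed at construction time.
        self._mean = np.mean(X)
        self._std = np.std(X)

    def transform(self, X):
        return (X - self._mean) / self._std

    def inverse_transform(self, X):
        return X * self._std + self._mean

train = np.array([[1.0, 2.0], [3.0, 6.0]])
normalizer = ZscoreSketch(train)
scaled = normalizer.transform(train)
restored = normalizer.inverse_transform(scaled)
```

After the transform the data has zero mean and unit standard deviation, and applying the inverse recovers the original values up to floating-point error.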
-
UCTB.preprocess.preprocessor.
chooseNormalizer
(in_arg, X_train)¶ Choose a proper normalizer consistent with user’s input.
Parameters: - in_arg (str|bool|object) – The user's input, which determines which normalizer is chosen.
- X_train (numpy.ndarray) – Data used to initialize the normalizer.
Returns: The normalizer matching in_arg.
Type: object.
6.2.3. UCTB.preprocess.time_utils module¶
-
UCTB.preprocess.time_utils.
is_valid_date
(date_str)¶ Parameters: date_str (string) – e.g. 2019-01-01 Returns: True if date_str is a valid date, otherwise False.
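A check of this kind can be written with the standard library. The sketch below assumes the YYYY-MM-DD format suggested by the example 2019-01-01; the function name and the fmt parameter are hypothetical.

```python
from datetime import datetime

def is_valid_date_sketch(date_str, fmt="%Y-%m-%d"):
    """Hypothetical helper: True if date_str parses as a real calendar date."""
    try:
        datetime.strptime(date_str, fmt)
        return True
    except ValueError:
        # Raised for a wrong format as well as for impossible dates such as February 30.
        return False

ok = is_valid_date_sketch("2019-01-01")
bad = is_valid_date_sketch("2019-02-30")
```

Parsing with strptime rejects impossible calendar dates, which a plain pattern match on the digits would not catch.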