6.2. UCTB.preprocess package
6.2.1. UCTB.preprocess.GraphGenerator module
- class UCTB.preprocess.GraphGenerator.GraphGenerator(data_loader, graph='Correlation', threshold_distance=1000, threshold_correlation=0, threshold_interaction=500, **kwargs)
Bases: object
This class builds graphs. The adjacency matrices and Laplacian matrices of the graphs are stored in self.AM and self.LM, respectively.
- Parameters:
data_loader (NodeTrafficLoader) – data_loader object.
graph (str) – Types of graphs used in neural methods. Graphs should be a subset of {'Correlation', 'Distance', 'Interaction', 'Line', 'Neighbor', 'Transfer'} concatenated by '-' (e.g. 'Correlation-Distance'), and the dataset must contain the data required by the selected graphs. Default: 'Correlation'
threshold_distance (float) – Used to build the distance graph. If the distance between two nodes in meters is larger than threshold_distance, the corresponding entry of the distance graph is 1, otherwise 0. Default: 1000
threshold_correlation (float) – Used to build the correlation graph. If the Pearson correlation coefficient between two nodes is larger than threshold_correlation, the corresponding entry of the correlation graph is 1, otherwise 0. Default: 0
threshold_interaction (float) – Used to build the interaction graph. If the number of interactions between two nodes in the latest 12 months is larger than threshold_interaction, the corresponding entry of the interaction graph is 1, otherwise 0. Default: 500
- AM
Adjacency matrices of the graphs.
- Type:
array
- LM
Laplacian matrices of graphs.
- Type:
array
- static adjacent_to_laplacian(adjacent_matrix)
Convert adjacent_matrix into a Laplacian matrix.
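The docstring does not state which Laplacian is produced. The dependency-free sketch below assumes the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}, a common choice for graph convolution; removing self-loops and giving isolated nodes a zero D^{-1/2} are choices of this sketch, and normalized_laplacian is a hypothetical name, not UCTB's API.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalized_laplacian(adjacency: Matrix) -> Matrix:
    """Return I - D^{-1/2} A D^{-1/2} for a square adjacency matrix A.

    Self-loops are ignored; a node with zero degree gets D^{-1/2} = 0,
    so its row of the result is just the identity row.
    """
    n = len(adjacency)
    # Drop self-loops by zeroing the diagonal.
    a = [[0.0 if i == j else float(adjacency[i][j]) for j in range(n)] for i in range(n)]
    # Row sums give node degrees; invert their square roots (0 for isolated nodes).
    inv_sqrt_deg = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in (sum(row) for row in a)]
    return [
        [(1.0 if i == j else 0.0) - inv_sqrt_deg[i] * a[i][j] * inv_sqrt_deg[j] for j in range(n)]
        for i in range(n)
    ]


# A 3-node path graph 0 - 1 - 2.
lap = normalized_laplacian([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
```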
- static correlation_adjacent(traffic_data, threshold)
Calculate the correlation graph based on the Pearson correlation coefficient.
- Parameters:
traffic_data (ndarray) – numpy array with shape [sequence_length, num_node].
threshold (float) – A value in [-1, 1]; nodes whose Pearson correlation coefficient is larger than this threshold are linked together.
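The thresholding rule above can be illustrated with a minimal, dependency-free sketch: compute Pearson's r for every pair of node columns of a [sequence_length, num_node] array and set the entry to 1 when r exceeds threshold. The helper names are hypothetical, and mapping a constant series to r = 0 (where Pearson's r is undefined) is an assumption of this sketch, not documented UCTB behavior.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat an undefined coefficient as 0
    return cov / math.sqrt(var_x * var_y)


def correlation_graph(traffic_data: List[List[float]], threshold: float) -> List[List[int]]:
    """traffic_data has shape [sequence_length, num_node]; link nodes whose r > threshold."""
    num_node = len(traffic_data[0])
    # Column j of the input is the time series of node j.
    series = [[row[j] for row in traffic_data] for j in range(num_node)]
    return [
        [1 if pearson(series[i], series[j]) > threshold else 0 for j in range(num_node)]
        for i in range(num_node)
    ]


# Nodes 0 and 1 rise together; node 2 moves the opposite way.
data = [[1, 2, 9], [2, 4, 7], [3, 6, 5], [4, 8, 3]]
graph = correlation_graph(data, threshold=0)
```

In this sketch every node correlates perfectly with itself, so the diagonal is 1 for any threshold below 1.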
- distance_adjacent(lat_lng_list, threshold)
Calculate the distance graph based on geographic distance.
- static haversine(lat1, lon1, lat2, lon2)
Calculate the great-circle distance between two points on the Earth (specified in decimal degrees).
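For reference, the standard haversine formula is sketched below. Returning meters with a mean Earth radius of 6,371 km is an assumption of this sketch, chosen to match the meter-based threshold_distance above; haversine_m is a hypothetical name.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of latitude along a meridian is roughly 111 km.
d = haversine_m(0.0, 0.0, 1.0, 0.0)
```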
- static interaction_adjacent(interaction_matrix, threshold)
Binarize interaction_matrix based on threshold.
- Parameters:
interaction_matrix (ndarray) – Array with shape [num_node, num_node], where each element is the number of interactions between the corresponding nodes during a certain period, e.g. 6 months.
threshold (float or int) – Nodes whose number of interactions is greater than this threshold are linked together.
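The binarization is a single element-wise comparison with a strict "greater than", as described above. A dependency-free sketch (binarize_interactions is a hypothetical name; the counts are made-up illustration values):

```python
from typing import List


def binarize_interactions(interaction_matrix: List[List[float]], threshold: float) -> List[List[int]]:
    """Return 1 where the interaction count is strictly greater than threshold, else 0."""
    return [[1 if count > threshold else 0 for count in row] for row in interaction_matrix]


# Made-up interaction counts between three nodes; the matrix need not be symmetric.
counts = [[0, 620, 30], [580, 0, 500], [12, 501, 0]]
graph = binarize_interactions(counts, threshold=500)
```

Note that a count equal to the threshold (500 here) is not linked.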
- UCTB.preprocess.GraphGenerator.scaled_Laplacian_ASTGCN(W)
Compute \tilde{L}, the scaled Laplacian.
- Parameters:
W (np.ndarray) – Weighted adjacency matrix with shape (num_node, num_node).
- Returns:
The scaled Laplacian \tilde{L}.
- Return type:
np.ndarray, shape (num_node, num_node)
- UCTB.preprocess.GraphGenerator.scaled_laplacian_STGCN(W)
Normalized graph Laplacian function.
- Parameters:
W (np.ndarray) – Weighted adjacency matrix of the graph, with shape [num_node, num_node].
- Returns:
Scaled Laplacian matrix.
- Return type:
np.matrix, [num_node, num_node].
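Both functions above appear to perform the Chebyshev rescaling used by spectral graph convolutions, which maps the Laplacian's eigenvalues from [0, \lambda_{\max}] into [-1, 1]. The standard form is given below as background; which Laplacian each function starts from (combinatorial D - W or normalized) is not documented here and is an assumption.

```latex
\tilde{L} = \frac{2}{\lambda_{\max}} L - I_N,
\qquad L = D - W \quad\text{or}\quad L = I_N - D^{-1/2} W D^{-1/2},
\qquad D_{ii} = \sum_{j} W_{ij}
```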
6.2.2. UCTB.preprocess.preprocessor module
- class UCTB.preprocess.preprocessor.MaxMinNormalizer(X, method='all')
Bases: Normalizer
This class normalizes and denormalizes data using the maximum and minimum of the data, via its transform and inverse_transform methods.
- Parameters:
X (ndarray) – Data from which the normalizer extracts its statistics.
method (str) – Selects how the input data is processed. Default: 'all'
- inverse_transform(X)
Restore normalized data.
- Parameters:
X (ndarray) – normalized data.
- Returns:
denormalized data.
- Return type:
numpy.ndarray.
- transform(X)
Process input data to obtain normalized data.
- Parameters:
X (ndarray) – input data.
- Returns:
normalized data.
- Return type:
numpy.ndarray.
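A minimal dependency-free sketch of max-min scaling into [0, 1] follows. It assumes a single global minimum and maximum taken from X (what method='all' selects is not documented here, so this is an assumption) and non-constant data; MinMaxScalerSketch is a hypothetical name and works on flat lists rather than ndarrays.

```python
from typing import List


class MinMaxScalerSketch:
    """Scale data into [0, 1] using the global min and max of the fitting data X."""

    def __init__(self, X: List[float]):
        self.min_val = min(X)
        self.max_val = max(X)  # assumes max_val > min_val (non-constant data)

    def transform(self, X: List[float]) -> List[float]:
        span = self.max_val - self.min_val
        return [(x - self.min_val) / span for x in X]

    def inverse_transform(self, X: List[float]) -> List[float]:
        span = self.max_val - self.min_val
        return [x * span + self.min_val for x in X]


scaler = MinMaxScalerSketch([10.0, 20.0, 30.0])
scaled = scaler.transform([10.0, 25.0, 30.0])  # [0.0, 0.75, 1.0]
restored = scaler.inverse_transform(scaled)    # [10.0, 25.0, 30.0]
```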
- class UCTB.preprocess.preprocessor.Normalizer(X)
Bases: ABC
Normalizer is the abstract base class for normalizers such as MaxMinNormalizer and ZscoreNormalizer. You can also build your own normalizer by inheriting from this class.
- Parameters:
X (ndarray) – Data from which the normalizer extracts its statistics.
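Subclassing follows the usual abc pattern. Because the abstract methods declared by Normalizer are not listed here, the sketch below defines its own hypothetical stand-in base, NormalizerBase, assuming a normalizer must provide transform and inverse_transform (the methods the concrete classes document), and builds a hypothetical log1p normalizer on top of it.

```python
import math
from abc import ABC, abstractmethod
from typing import List


class NormalizerBase(ABC):
    """Hypothetical stand-in for a normalizer base class; not UCTB's Normalizer."""

    def __init__(self, X: List[float]):
        self.X = X  # data the normalizer could extract statistics from

    @abstractmethod
    def transform(self, X: List[float]) -> List[float]: ...

    @abstractmethod
    def inverse_transform(self, X: List[float]) -> List[float]: ...


class Log1pNormalizer(NormalizerBase):
    """Example custom normalizer: log(1 + x), suitable for non-negative counts."""

    def transform(self, X: List[float]) -> List[float]:
        return [math.log1p(x) for x in X]

    def inverse_transform(self, X: List[float]) -> List[float]:
        return [math.expm1(x) for x in X]


norm = Log1pNormalizer([0.0, 9.0])
values = norm.transform([0.0, 9.0])
back = norm.inverse_transform(values)
```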
- class UCTB.preprocess.preprocessor.ST_MoveSample(closeness_len, period_len, trend_len, target_length=1, daily_slots=24)
Bases: object
This class converts raw data into temporal features: closeness, period, and trend features.
- Parameters:
closeness_len (int) – The length of the closeness history. The data of the preceding closeness_len consecutive time slots will be used as closeness history.
period_len (int) – The length of the period history. The data of the same time slot in the preceding period_len consecutive days will be used as period history.
trend_len (int) – The length of the trend history. The data of the same time slot in the preceding trend_len consecutive weeks (every seven days) will be used as trend history.
target_length (int) – The number of steps to predict from one piece of history data. Currently must be 1. Default: 1
daily_slots (int) – The number of records per day, calculated as 24 * 60 / time_fitness. Default: 24
- move_sample(data)
Generate closeness, period, and trend features and the target vector y from the input data.
- Parameters:
data (ndarray) – Original temporal data.
- Returns:
closeness, period, trend and y matrices.
- Return type:
numpy.ndarray.
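The windowing described above can be sketched for a single node's 1-D series as follows. This is an illustration under assumptions, not UCTB's implementation: ordering each window oldest first, skipping targets whose history is incomplete, and the single-step target (target_length=1) are choices of this sketch, and move_sample_sketch is a hypothetical name.

```python
from typing import List, Tuple


def move_sample_sketch(data: List[float], closeness_len: int, period_len: int,
                       trend_len: int, daily_slots: int = 24
                       ) -> Tuple[List[List[float]], List[List[float]], List[List[float]], List[float]]:
    """Build closeness/period/trend windows and one-step targets.

    For each target slot t:
      closeness = data[t - closeness_len : t]
      period    = data[t - k * daily_slots]     for k = period_len, ..., 1
      trend     = data[t - k * 7 * daily_slots] for k = trend_len, ..., 1
    """
    week_slots = 7 * daily_slots
    # First t for which every history window is fully available.
    start = max(closeness_len, period_len * daily_slots, trend_len * week_slots)
    closeness, period, trend, y = [], [], [], []
    for t in range(start, len(data)):
        closeness.append(data[t - closeness_len:t])
        period.append([data[t - k * daily_slots] for k in range(period_len, 0, -1)])
        trend.append([data[t - k * week_slots] for k in range(trend_len, 0, -1)])
        y.append(data[t])
    return closeness, period, trend, y


# Three days of hourly slots (values 0..71); trend_len=0 because a weekly
# trend would need more than a week of history.
series = list(range(72))
c, p, tr, y = move_sample_sketch(series, closeness_len=3, period_len=2, trend_len=0)
```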
- class UCTB.preprocess.preprocessor.SplitData
Bases: object
This class splits data via its split_data and split_feed_dict methods.
- static split_data(data, ratio_list)
Split the data according to ratio_list.
- Parameters:
data (ndarray) – Data to be split.
ratio_list (list) – Split ratios; the data is split according to these ratios.
- Returns:
A list of the split data, with the same length as ratio_list.
- Return type:
list
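A dependency-free sketch of ratio-based splitting along the first axis is shown below. It assumes consecutive (unshuffled) chunks and normalizes ratios that do not sum to 1; rounding the cut points is a choice of this sketch, and split_by_ratio is a hypothetical name.

```python
from typing import List, Sequence


def split_by_ratio(data: Sequence, ratio_list: Sequence[float]) -> List[Sequence]:
    """Split data into consecutive chunks whose sizes follow ratio_list."""
    total = float(sum(ratio_list))  # normalize, so [8, 1, 1] behaves like [0.8, 0.1, 0.1]
    n = len(data)
    # Cumulative cut points, e.g. [0, 8, 9, 10] for 10 items and ratios [0.8, 0.1, 0.1].
    bounds, cum = [0], 0.0
    for ratio in ratio_list:
        cum += ratio
        bounds.append(round(cum / total * n))
    return [data[bounds[i]:bounds[i + 1]] for i in range(len(ratio_list))]


train, val, test_set = split_by_ratio(list(range(10)), [0.8, 0.1, 0.1])
```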
- static split_feed_dict(feed_dict, sequence_length, ratio_list)
Split the values in feed_dict according to ratio_list.
- Parameters:
feed_dict (dict) – A dictionary of key-value pairs.
sequence_length (int) – If the length of a value in feed_dict equals sequence_length, that value is split according to ratio_list and its key is kept unchanged.
ratio_list (list) – Split ratios; the data is split according to these ratios.
- Returns:
A list of split dictionaries, with the same length as ratio_list.
- Return type:
list
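A minimal sketch of the dictionary split described above, reusing the same cut-point logic as a ratio split. What happens to values whose length differs from sequence_length is not documented; this sketch ASSUMES they are shared unchanged by every part, and it assumes all values are sized sequences. split_feed_dict_sketch is a hypothetical name.

```python
from typing import Any, Dict, List, Sequence


def split_feed_dict_sketch(feed_dict: Dict[str, Any], sequence_length: int,
                           ratio_list: Sequence[float]) -> List[Dict[str, Any]]:
    """Split each value of length sequence_length by ratio_list, keeping its key."""
    total = float(sum(ratio_list))
    # Cumulative cut points along the sequence axis, e.g. [0, 7, 10].
    bounds, cum = [0], 0.0
    for ratio in ratio_list:
        cum += ratio
        bounds.append(round(cum / total * sequence_length))
    parts = []
    for i in range(len(ratio_list)):
        part = {}
        for key, value in feed_dict.items():
            if len(value) == sequence_length:
                part[key] = value[bounds[i]:bounds[i + 1]]
            else:
                # ASSUMPTION: values of other lengths are shared unchanged by every part.
                part[key] = value
        parts.append(part)
    return parts


feed = {"x": list(range(10)), "y": list(range(100, 110)), "graph": [[1.0]]}
first, second = split_feed_dict_sketch(feed, sequence_length=10, ratio_list=[0.7, 0.3])
```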
- class UCTB.preprocess.preprocessor.WhiteNormalizer(X, method='all')
Bases: Normalizer
This normalizer performs no normalization: the data is returned unchanged.
- inverse_transform(X)
Restore normalized data.
- Parameters:
X (ndarray) – normalized data.
- Returns:
denormalized data.
- Return type:
numpy.ndarray.
- transform(X)
Process input data to obtain normalized data.
- Parameters:
X (ndarray) – input data.
- Returns:
normalized data.
- Return type:
numpy.ndarray.
- class UCTB.preprocess.preprocessor.ZscoreNormalizer(X, method='all')
Bases: Normalizer
This class normalizes and denormalizes data using the mean and standard deviation of the data, via its transform and inverse_transform methods.
- Parameters:
X (ndarray) – Data from which the normalizer extracts its statistics.
method (str) – Selects how the input data is processed. Default: 'all'
- inverse_transform(X)
Restore normalized data.
- Parameters:
X (ndarray) – normalized data.
- Returns:
denormalized data.
- Return type:
numpy.ndarray.
- transform(X)
Process input data to obtain normalized data.
- Parameters:
X (ndarray) – input data.
- Returns:
normalized data.
- Return type:
numpy.ndarray.
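A dependency-free sketch of z-score standardization and its inverse is shown below. It assumes a single global mean and the population standard deviation (ddof=0, numpy's default for np.std); whether UCTB computes them globally or per feature under method='all' is not documented here. ZscoreSketch is a hypothetical name.

```python
import statistics
from typing import List


class ZscoreSketch:
    """Standardize data with the mean and population standard deviation of X."""

    def __init__(self, X: List[float]):
        self.mean = statistics.mean(X)
        # Population standard deviation (ddof=0); assumes X is not constant.
        self.std = statistics.pstdev(X)

    def transform(self, X: List[float]) -> List[float]:
        return [(x - self.mean) / self.std for x in X]

    def inverse_transform(self, X: List[float]) -> List[float]:
        return [x * self.std + self.mean for x in X]


z = ZscoreSketch([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # mean 5.0, std 2.0
standardized = z.transform([5.0, 7.0])        # [0.0, 1.0]
restored = z.inverse_transform(standardized)  # [5.0, 7.0]
```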
- UCTB.preprocess.preprocessor.chooseNormalizer(in_arg, X_train)
Choose the normalizer that matches the user’s input in_arg.
6.2.3. UCTB.preprocess.time_utils module
- UCTB.preprocess.time_utils.is_valid_date(date_str)
- Parameters:
date_str (string) – e.g. 2019-01-01
- Returns:
True if date_str is a valid date, otherwise False.
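Assuming the expected format is YYYY-MM-DD (inferred from the example 2019-01-01), validity can be checked with the standard library as sketched below. This is not UCTB's implementation; is_valid_date_sketch is a hypothetical name, and strptime also accepts unpadded months and days such as 2019-1-1.

```python
from datetime import datetime


def is_valid_date_sketch(date_str: str) -> bool:
    """True if date_str parses as a YYYY-MM-DD calendar date, otherwise False."""
    try:
        datetime.strptime(date_str, "%Y-%m-%d")
        return True
    except ValueError:
        return False
```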
- UCTB.preprocess.time_utils.is_work_day_america(date, city)
- Parameters:
date (string or datetime) – e.g. 2019-01-01
- Returns:
True if date is not a holiday in America, otherwise False.
- UCTB.preprocess.time_utils.is_work_day_china(date, city)
- Parameters:
date (string or datetime) – e.g. 2019-01-01
- Returns:
True if date is not a holiday in China, otherwise False.
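The docstrings do not say how holidays are determined or what city is used for. The sketch below only illustrates the general check under stated assumptions: the caller supplies the holiday set (the holidays parameter is hypothetical), and weekends count as non-work days. Official calendars, such as adjusted working weekends in China, are not modeled.

```python
from datetime import date, datetime
from typing import AbstractSet, Union


def is_work_day_sketch(day: Union[str, date], holidays: AbstractSet[date]) -> bool:
    """True if day is Monday to Friday and not in the caller-supplied holiday set."""
    if isinstance(day, str):
        day = datetime.strptime(day, "%Y-%m-%d").date()
    elif isinstance(day, datetime):
        day = day.date()  # compare calendar dates, not timestamps
    return day.weekday() < 5 and day not in holidays


holidays = {date(2019, 1, 1)}  # hypothetical holiday set: New Year's Day
```

For example, 2019-01-01 (a Tuesday in the set) is not a work day, while 2019-01-02 (a Wednesday) is.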