This is a package to generate random transactional data. You can use it to study pandas operations or customer segmentation methods such as RFM (recency, frequency, monetary value) analysis.
With this package you can create a table with transactional data containing:
- consumer_id: ID of the customer who makes the transaction;
- transaction_created_at: Date of transaction;
- transaction_payment_value: Monetary value of transaction.
All the fields are customizable.
The consumer_id field is generated by a range function, returning a sequence of integers from 1 to n_consumers:
consumer_ids = range(1, n_consumers + 1)
The transaction_created_at field is generated by the pandas function date_range, which returns n_rows evenly spaced timestamps between the first and last transaction dates (see the pandas documentation for date_range for details):
created_at_list = list(pd.date_range(start=first_transaction_date, end=last_transaction_date, periods=n_rows))
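To illustrate how `periods` behaves, here is a minimal sketch with made-up dates (not values taken from the package). It asks for five timestamps across a single day; pandas spaces them evenly, six hours apart, with both endpoints included.

```python
import pandas as pd

# Illustrative values only; the package takes these from its own parameters.
first_transaction_date = "2020-01-01"
last_transaction_date = "2020-01-02"
n_rows = 5

created_at_list = list(
    pd.date_range(start=first_transaction_date, end=last_transaction_date, periods=n_rows)
)

# Print each generated timestamp to see the even spacing.
for ts in created_at_list:
    print(ts)
```

Because the spacing is derived from n_rows, a large n_rows over a short date range produces timestamps with fractional seconds.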
The transaction_payment_value field is sampled from a normal distribution whose mean is the transaction_mean_value parameter and whose standard deviation is the transaction_std_value parameter:
list(np.random.normal(transaction_mean_value, transaction_std_value, n_rows))
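The three pieces above could be combined roughly as in the sketch below. This is not the package's actual implementation: the helper name `sketch_transactions` is hypothetical, the use of a seeded NumPy `Generator` instead of `np.random.normal` is a substitution made for reproducibility, and drawing a consumer at random (with replacement) for each row is an assumption, since the text does not say how IDs are assigned to rows.

```python
import numpy as np
import pandas as pd


def sketch_transactions(n_rows, n_consumers, mean_value, std_value,
                        first_date, last_date, seed=0):
    """Hypothetical helper, not part of the package: builds the three columns."""
    rng = np.random.default_rng(seed)  # seeded so the sketch is reproducible
    consumer_ids = list(range(1, n_consumers + 1))
    return pd.DataFrame(
        {
            # Assumption: each row gets a consumer drawn at random, with replacement.
            "consumer_id": rng.choice(consumer_ids, size=n_rows),
            # Evenly spaced timestamps between the two dates.
            "transaction_created_at": pd.date_range(
                start=first_date, end=last_date, periods=n_rows
            ),
            # Normally distributed payment values.
            "transaction_payment_value": rng.normal(mean_value, std_value, n_rows),
        }
    )


df_sketch = sketch_transactions(1000, 100, 100, 10, "2020-01-01", "2021-01-01")
print(df_sketch.head())
print(df_sketch["transaction_payment_value"].describe())
```

Note that a normal distribution is unbounded, so when the standard deviation is large relative to the mean, some sampled payment values can be negative.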
You can get started with RandomDataGen using this example code:
from random_data_gen.data_generator import TransactionalDataGenerator
TRGenerator = TransactionalDataGenerator(
n_rows=1000,
n_consumers=100,
transaction_mean_value=100,
transaction_std_value=10,
first_transaction_date="2020-01-01",
last_transaction_date="2021-01-01",
)
df = TRGenerator()
In this snippet we define a dataframe with 1000 rows and 100 unique consumers, a mean transaction value of 100 monetary units, a standard deviation of 10 monetary units, and transactions dated from 2020-01-01 to 2021-01-01.
The returned dataframe has the following form (the values shown are illustrative):
| consumer_id | transaction_created_at | transaction_payment_value |
|:-----------:|:-----------------------------:|:-------------------------:|
| 234 | 2020-01-01 00:00:00.000000000 | 120.10 |
| 43 | 2020-01-01 08:47:34.054054054 | 87.10 |
| 321 | 2021-10-23 10:27:12.092356134 | 12.98 |
| 3123 | 2020-12-30 21:37:17.837837840 | 12.84 |
The number of rows in this dataframe is set by the n_rows parameter.
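As an example of the kind of study this data supports, the sketch below computes a basic RFM table with pandas. It builds stand-in data with the same three columns so that it runs on its own; in practice you would use the `df` returned by the generator. The snapshot date (one day after the last transaction) and the output column names `recency_days`, `frequency`, and `monetary` are illustrative choices, not something the package defines.

```python
import numpy as np
import pandas as pd

# Stand-in data with the same three columns as the generated dataframe.
rng = np.random.default_rng(42)
n_rows = 1000
df = pd.DataFrame(
    {
        "consumer_id": rng.integers(1, 101, size=n_rows),
        "transaction_created_at": pd.date_range("2020-01-01", "2021-01-01", periods=n_rows),
        "transaction_payment_value": rng.normal(100, 10, n_rows),
    }
)

# Reference date for recency: the day after the last transaction (an arbitrary choice).
snapshot = df["transaction_created_at"].max() + pd.Timedelta(days=1)

# One row per consumer: days since last purchase, number of purchases, total spend.
rfm = df.groupby("consumer_id").agg(
    recency_days=("transaction_created_at", lambda s: (snapshot - s.max()).days),
    frequency=("transaction_created_at", "count"),
    monetary=("transaction_payment_value", "sum"),
)
print(rfm.head())
```

From here, each of the three columns can be binned (for example with `pd.qcut`) to assign RFM scores, or passed to a clustering algorithm.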
To contribute you need to install Poetry.
After installing it, clone this repo and run the following command:
poetry install -n
Before pushing your code to the repo, apply the project style to the new code by running:
make format
After that, run:
make check
This command will check your code with flake8 and pytest.