felipesassi/random-data-gen

RandomDataGen - Random Data Generator Package

Code style: black Checked with mypy Downloads

This package generates random transactional data. You can use it to practice pandas operations or customer segmentation methods such as RFM.

With this package you can create a table with transactional data containing:

  • consumer_id: ID of the customer who made the transaction;
  • transaction_created_at: Date of transaction;
  • transaction_payment_value: Monetary value of transaction.

All the fields are customizable.

How the data is generated

The consumer_id field is generated by a range function, returning a sequence of integers from 1 to n_consumers:

consumer_ids = range(1, n_consumers + 1)

The transaction_created_at field is generated with the pandas function date_range, which here returns n_rows evenly spaced timestamps between the first and last transaction dates. See the pandas documentation for more about this function:

created_at_list = list(pd.date_range(start=first_transaction_date, end=last_transaction_date, periods=n_rows))
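A minimal sketch of how date_range behaves when given start, end and periods. The dates and the number of periods below are illustrative values, not package defaults:

```python
import pandas as pd

# Five evenly spaced timestamps between two dates (both endpoints included).
timestamps = pd.date_range(start="2020-01-01", end="2020-01-05", periods=5)

created_at_list = list(timestamps)
print(created_at_list[0])   # first element is the start date
print(created_at_list[-1])  # last element is the end date
```

Because the timestamps are spaced linearly, a large n_rows over a short date range produces sub-second intervals, which is why the generated values can carry nanosecond precision.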

The transaction_payment_value field is sampled from a normal distribution whose mean is the transaction_mean_value parameter and whose standard deviation is the transaction_std_value parameter:

list(np.random.normal(transaction_mean_value, transaction_std_value, n_rows))
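The three steps above could be combined roughly as follows. This is a sketch, not the package's actual implementation: it assumes each row gets a consumer drawn uniformly at random (the README does not say how IDs are assigned to rows), and it uses NumPy's seeded default_rng in place of np.random.normal so the result is reproducible.

```python
import numpy as np
import pandas as pd

# Illustrative parameter values, mirroring the names used above.
n_rows = 1000
n_consumers = 100
transaction_mean_value = 100
transaction_std_value = 10

rng = np.random.default_rng(seed=42)  # seeded so the sketch is reproducible

consumer_ids = range(1, n_consumers + 1)

df = pd.DataFrame(
    {
        # Assumption: consumers are assigned to rows uniformly at random.
        "consumer_id": rng.choice(list(consumer_ids), size=n_rows),
        # Evenly spaced timestamps between the two dates.
        "transaction_created_at": pd.date_range(
            start="2020-01-01", end="2021-01-01", periods=n_rows
        ),
        # Normally distributed payment values.
        "transaction_payment_value": rng.normal(
            transaction_mean_value, transaction_std_value, n_rows
        ),
    }
)

print(df.head())
```

Note that a normal distribution is unbounded, so a standard deviation that is large relative to the mean can produce negative payment values.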

How to use

You can start using RandomDataGen with this example:

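from random_data_gen.data_generator import TransactionalDataGenerator

# Configure the generator.
tr_generator = TransactionalDataGenerator(
    n_rows=1000,  # number of transactions (rows)
    n_consumers=100,  # number of unique consumers
    transaction_mean_value=100,  # mean payment value
    transaction_std_value=10,  # standard deviation of the payment value
    first_transaction_date="2020-01-01",
    last_transaction_date="2021-01-01",
)

# Calling the generator returns the dataframe.
df = tr_generator()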

This snippet defines a dataframe with 1000 rows, 100 unique consumers, a mean transaction value of 100 monetary units, a standard deviation of 10 monetary units, a first transaction date of 2020-01-01 and a last transaction date of 2021-01-01.

The returned dataframe has the following form (the values shown are illustrative):

| consumer_id |     transaction_created_at    | transaction_payment_value |
|:-----------:|:-----------------------------:|:-------------------------:|
|     234     | 2020-01-01 00:00:00.000000000 |           120.10          |
|      43     | 2020-01-01 08:47:34.054054054 |           87.10           |
|     321     | 2021-10-23 10:27:12.092356134 |           12.98           |
|     3123    | 2020-12-30 21:37:17.837837840 |           12.84           |

The number of rows in this dataframe is set by the n_rows parameter.
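As an example of the kind of study this data supports, here is a rough sketch of an RFM (recency, frequency, monetary) table built with pandas. To stay self-contained it builds a stand-in dataframe with the same three columns instead of calling the package; the snapshot date, the output column names and the random assignment of consumers are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Stand-in for the dataframe returned by the generator (same column names).
rng = np.random.default_rng(seed=0)
n_rows = 1000
df = pd.DataFrame(
    {
        "consumer_id": rng.integers(1, 101, size=n_rows),
        "transaction_created_at": pd.date_range(
            start="2020-01-01", end="2021-01-01", periods=n_rows
        ),
        "transaction_payment_value": rng.normal(100, 10, n_rows),
    }
)

# Reference date for recency: one day after the last transaction (arbitrary choice).
snapshot = df["transaction_created_at"].max() + pd.Timedelta(days=1)

rfm = df.groupby("consumer_id").agg(
    # Days since the consumer's most recent transaction.
    recency_days=("transaction_created_at", lambda s: (snapshot - s.max()).days),
    # Number of transactions per consumer.
    frequency=("transaction_created_at", "count"),
    # Total amount spent per consumer.
    monetary=("transaction_payment_value", "sum"),
)

print(rfm.head())
```

From here, each of the three columns can be binned into scores (for example with pd.qcut) or passed to a clustering algorithm.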

Contribute

To contribute you need to install Poetry.

After installing, you need to clone this repo and run the following command:

poetry install -n

Before pushing your code, apply the project style to it by running:

make format

After that, run:

make check

This command will check your code with flake8 and pytest.
