RandomDataGen - Random Data Generator Package

RandomDataGen is a package for generating random transactional data. You can use it to practice Pandas operations or customer segmentation methods such as RFM (recency, frequency, monetary value) analysis.

With this package you can create a table with transactional data containing:

consumer_id: ID identifying the customer who makes the transaction;
transaction_created_at: Date of transaction;
transaction_payment_value: Monetary value of transaction.

All the fields are customizable.

How the data is generated

The consumer_id field is generated by a range function, returning a sequence of integers from 1 to n_consumers:

consumer_ids = range(1, n_consumers + 1)

The transaction_created_at field is generated by the Pandas function date_range, which returns n_rows evenly spaced timestamps between the first and last transaction dates. See the Pandas documentation for more about this function:

created_at_list = list(pd.date_range(start=first_transaction_date, end=last_transaction_date, periods=n_rows))
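
As a small illustration of how date_range behaves when given start, end, and periods, the sketch below spreads five timestamps over a single day. The dates and the variable name timestamps are chosen only for this example and are not taken from the package.

```python
import pandas as pd

# With start, end and periods, date_range returns linearly spaced timestamps
# that include both endpoints. Five points over one day are 24h / (5 - 1) = 6h apart.
timestamps = list(pd.date_range(start="2020-01-01", end="2020-01-02", periods=5))

for ts in timestamps:
    print(ts)
```

By the same arithmetic, 1000 rows between 2020-01-01 and 2021-01-01 (366 days) are spaced 366 days / 999, or roughly 8 hours 47 minutes 34 seconds, apart.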

The transaction_payment_value field is sampled from a normal distribution whose mean is the transaction_mean_value parameter and whose standard deviation is the transaction_std_value parameter:

list(np.random.normal(transaction_mean_value, transaction_std_value, n_rows))
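
To see how the three fields could fit together, here is a minimal sketch that assembles them into a dataframe. It is not the package's actual implementation: how consumer IDs are assigned to rows is not described above, so the uniform random draw with replacement (row_consumer_ids) is an assumption, as are the fixed seed and the variable names payment_values and df.

```python
import numpy as np
import pandas as pd

# Parameters mirroring the names used in this README.
n_rows = 1000
n_consumers = 100
transaction_mean_value = 100
transaction_std_value = 10
first_transaction_date = "2020-01-01"
last_transaction_date = "2021-01-01"

np.random.seed(42)  # assumption: fixed seed so the sketch is reproducible

# The three generation steps described above.
consumer_ids = range(1, n_consumers + 1)
created_at_list = list(
    pd.date_range(start=first_transaction_date, end=last_transaction_date, periods=n_rows)
)
payment_values = list(
    np.random.normal(transaction_mean_value, transaction_std_value, n_rows)
)

# ASSUMPTION: each row gets a consumer drawn uniformly at random, with replacement.
row_consumer_ids = np.random.choice(list(consumer_ids), size=n_rows, replace=True)

df = pd.DataFrame(
    {
        "consumer_id": row_consumer_ids,
        "transaction_created_at": created_at_list,
        "transaction_payment_value": payment_values,
    }
)

print(df.head())
print(df["transaction_payment_value"].describe())
```

Because the values come from a normal distribution, the sample mean and standard deviation will be close to, but not exactly equal to, the requested parameters, and negative values are possible when the standard deviation is large relative to the mean.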

How to use

You can get started with RandomDataGen using this example code:

from random_data_gen.data_generator import TransactionalDataGenerator

TRGenerator = TransactionalDataGenerator(
    n_rows=1000,
    n_consumers=100,
    transaction_mean_value=100,
    transaction_std_value=10,
    first_transaction_date="2020-01-01",
    last_transaction_date="2021-01-01",
)

df = TRGenerator()

This snippet generates a dataframe with 1000 rows and 100 unique consumers. Transaction values have a mean of 100 monetary units and a standard deviation of 10 monetary units, and the transactions run from the first transaction date (2020-01-01) to the last transaction date (2021-01-01).

The returned dataframe has the following form (the values shown are illustrative only):

| consumer_id |     transaction_created_at    | transaction_payment_value |
|:-----------:|:-----------------------------:|:-------------------------:|
|     234     | 2020-01-01 00:00:00.000000000 |           120.10          |
|      43     | 2020-01-01 08:47:34.054054054 |           87.10           |
|     321     | 2021-10-23 10:27:12.092356134 |           12.98           |
|     3123    | 2020-12-30 21:37:17.837837840 |           12.84           |

The number of rows in this dataframe is set by the n_rows parameter.
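
Since RFM is one of the suggested uses, the sketch below shows one way to derive recency, frequency, and monetary values from a dataframe with these three columns. To stay self-contained it builds a tiny hand-written dataframe instead of calling the generator. The snapshot_date convention (one day after the last transaction) and the output column names are assumptions made for this example.

```python
import pandas as pd

# A tiny hand-written dataframe with the same columns the generator returns.
df = pd.DataFrame(
    {
        "consumer_id": [1, 1, 2, 3, 3, 3],
        "transaction_created_at": pd.to_datetime(
            [
                "2020-01-05", "2020-03-10", "2020-02-01",
                "2020-01-20", "2020-06-15", "2020-12-30",
            ]
        ),
        "transaction_payment_value": [100.0, 90.0, 110.0, 95.0, 105.0, 120.0],
    }
)

# ASSUMPTION: recency is measured from one day after the last transaction.
snapshot_date = df["transaction_created_at"].max() + pd.Timedelta(days=1)

grouped = df.groupby("consumer_id")
rfm = pd.DataFrame(
    {
        # Days since each consumer's most recent transaction.
        "recency_days": (snapshot_date - grouped["transaction_created_at"].max()).dt.days,
        # Number of transactions per consumer.
        "frequency": grouped["transaction_created_at"].count(),
        # Total amount spent per consumer.
        "monetary": grouped["transaction_payment_value"].sum(),
    }
)

print(rfm)
```

The same groupby logic applies unchanged to a dataframe produced by the generator, since only the three column names are used.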

Contribute

To contribute you need to install Poetry.

After installing, you need to clone this repo and run the following command:

poetry install -n

Before pushing your code to the repo, apply the project style to the new code by running:

make format

After that, run:

make check

This command checks your code with flake8 and pytest.
