Replies: 1 comment
-
Check your calculations and make sure that each thread selects a unique range of data. When I tried your example, I saw threads selecting overlapping sets of rows: the first thread selected everything, the second thread selected all rows except the first batch, and the third thread selected all rows except the first two batches.

I did have to set

If memory is an issue, you may want to keep prefetchrows at its default size and tune only arraysize.

Since you seem to be holding the whole dataset in memory, you could use

You may be interested in https://medium.com/oracledevs/selecting-from-an-oracle-database-table-in-parallel-using-python-31ecaa2c28c8. Many variables affect whether a parallel extraction is faster, including all the memory management you are doing in Python. What performance benefit are you seeing?

If you still have problems, can you share why you think there is a leak and how you are measuring it?
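To illustrate the "unique range per thread" point, here is a minimal sketch of how the row ranges could be computed so that they never overlap and together cover the whole table. The helper name `chunk_ranges`, the table name `my_table`, the `id` ordering column, and the bind names `:rowoffset` and `:maxrows` are all assumptions for illustration, not taken from your script. The query text assumes Oracle's `OFFSET ... FETCH NEXT` row-limiting syntax.

```python
def chunk_ranges(total_rows, num_threads):
    """Split total_rows into contiguous, non-overlapping (offset, count) pairs.

    Hypothetical helper: the pairs cover rows 0..total_rows-1 exactly once.
    """
    base, extra = divmod(total_rows, num_threads)
    ranges = []
    offset = 0
    for i in range(num_threads):
        # The first `extra` chunks take one additional row each.
        count = base + (1 if i < extra else 0)
        if count:
            ranges.append((offset, count))
        offset += count
    return ranges


# Hypothetical query: each thread binds its own (offset, count) pair.
# A stable ORDER BY is needed so every thread sees the same row order.
SQL = """select *
         from my_table
         order by id
         offset :rowoffset rows fetch next :maxrows rows only"""

if __name__ == "__main__":
    for rowoffset, maxrows in chunk_ranges(1_000_000, 4):
        print(f"thread range: offset={rowoffset}, maxrows={maxrows}")
```

Each thread would then execute the query with its own pair as bind values, so the per-thread row counts sum to the table size instead of each thread re-reading rows another thread already fetched.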
-
I have a large table (~1 million rows) and I am trying to read it using multiple threads. The following script runs perfectly well on a table with 40k rows, but on the large table it causes a memory leak. What is the cause of the memory issue, and how can I best overcome it? I find that it's related to max_row size and row_iter. How can I best choose these so that each thread takes one chunk of the data and the chunks combined add up to the full dataframe?