Many projects work with databases, and of course the performance of the system has to be tested. Usually, smaller amounts of data are entered by means of manual tests, but how do you get bulk data?
The usual answer is, of course: “Buy one of the tools” – and usually rightly so. These tools have an (often) fairly easy-to-use UI, are quick and above all they can generate semantically meaningful data.
Generating data with SQL
However, with a little SQL you can also generate such data yourself. It gets interesting when linking multiple tables, which plays an important role in the following example.
Scenario: Our project manages companies and their plants and for them invoice data is to be generated. There are different views and queries for this data in the project. We want to have some areas with little data (to test the functioning of the system), but also bulk data.
First, we create the companies (here only three of them) with a single instruction. This is the first important point to achieve good performance in the generation of the data. Of course, you could write three INSERT statements one after the other or use a loop 1-3 with PL/SQL. But after a certain amount of data, that you want to generate, it will become massively slower.
Sub-select „company_seq“: Creates a sequence of 1-3
Insert Statement: The company will get a meaningful name (high number = more data).
insert into company (name) select 'company '||company_seq.company_number as name from ( select rownum as company_number from dual connect by level<=3 ) company_seq ;
Next, we would like to assign the plants to the companies. Remember: We want some areas with little data, others with a lot of data. Therefore, three plants are to be assigned to the first company, six plants to the second company, etc.
Sub-select “Company“: The ranking gives you a running number 1-N, so you can easily set the desired number of plants per company. This query produces exactly one line per company.
Sub-select “plant_seq“: We need a sequence to generate a row by joining the “company” subquery and desired number of attachments. To do this, we use a generously estimated sequence. The join is limited by the previously determined number of plants per company.
Insert Statement: The plant name should indicate which company the plant belongs to.
insert into plant (firma_id, name) select company.company_id, 'plant '||company.company_number||'.'||plant_seq.plant_number from ( select company_id, dense_rank() over (order by company_id) as company_number, dense_rank() over (order by company_id) * 3 as amount_plants from company ) company join ( select rownum as plant_number from dual connect by level<=1000 ) plant_seq on (plant_seq.plant_number <= company.amount_plants) ;
Finally, a billing per calendar month is to be generated for each company and plant.
The core element for random data in Oracle is the dbms_random package. It offers several functions for generating numbers, with the default range 0-1 or with self-defined ranges. However, the decimal places of the generated numbers have to be checked by rounding up or down. There is also a generator for characters available, with some modes like Large, Small, Alphanumeric, …
Column “Invoice number“: Three capital letters followed by six digits
Column “Amount“: A random value in the range € 100.900.
Column “Surcharges”: For approx. 10% of invoices a surcharge of € 50 is to be charged, this is achieved by comparing a random number in the range 0-1 to < 0.10.
insert into billing (company_id,plant_id,billing_month,invoice_number,amount,surcharges) select company.company_id, plant.plant_id, to_date('01.'||month||'.2021', 'dd.mm.yyyy') as billing_month, dbms_random.string('U',3) || round(dbms_random.value(10000,99999), 0) as invoice_number, round(100 + dbms_random.value(100,900),2) as amount, case when dbms_random.value<0.10 then 50 else null end as surcharges from company join plant on plant.company_id = company.company_id join ( select rownum as month from dual connect by level<=12 ) month_seq on (1=1) order by 1,2,3 ;
Conclusion
These data do not win a beauty prize, so it may be better to resort to the mentioned products. But in our project, this approach has helped to provide data of sufficient quality and, above all, quantity from day one. This enabled us to ensure the performance of the system right from the start. In addition, the end result is that the system can process more data with better performance than similar systems.
Source Code
-------------------------------------------------------------------------------------------------------------- -- Create data model -------------------------------------------------------------------------------------------------------------- drop table billing; drop table plant; drop table company; create table company ( company_id number(9) generated as identity, name varchar2(100) not null, constraint pk_company primary key (company_id) ); create table plant ( plant_id number(9) generated as identity, company_id number(9) not null, name varchar2(100) not null, constraint pk_plant primary key (anlage_id), constraint fk_plant_01 foreign key (company_id) references company (company_id) ); create table billing ( billing_id number(9) generated as identity, company_id number(9) not null, plant_id number(9) not null, billing_month date not null, invoice_number varchar(30) not null, amount number(18,2) not null, surcharges number(18,2), constraint pk_billing primary key (billing_id), constraint fk_billing_01 foreign key (company_id) references company (company_id), constraint fk_billing_02 foreign key (plant_id) references plant (plant_id) ); -------------------------------------------------------------------------------------------------------------- -- Generate data -------------------------------------------------------------------------------------------------------------- -- Delete all data truncate table billing; truncate table plant; truncate table company; whenever sqlerror exit rollback; -- Generate Companies (here 3) insert into company (name) select 'company '||company_seq.company_number as name from (select rownum as company_number from dual connect by level<=3) company_seq ; commit; -- Insert attachments per company. The first company will receive three plants, the next six plants, ... insert into plant (company_id, name) select company.company_id, 'plant '||company.company_number||'.'||plant_seq.plant_number from ( -- Determine the serial number for each company and the desired number of plants select company_id, dense_rank() over (order by company_id) as company_number, dense_rank() over (order by company_id) * 3 as amount_plants from company ) company join ( -- A sequence of attachments by which to join the required rows per company with N attachments select rownum as plant_number from dual connect by level<=1000 ) plant_seq on (plant_seq.plant_number <= company.amount_plants) -- order by 1,2 ; commit; -- Generate an invoice per calendar month for each company and plant. -- The invoice number is a random string with three letters and six digits. -- The amount is a random value in the range 100-900 euros. -- The surcharge of 50 euros is applied to approx. 10% of the invoices. insert into billing (company_id,plant_id,billing_month,invoice_number,amount,surcharges) select company.company_id, plant.plant_id, to_date('01.'||month||'.2021', 'dd.mm.yyyy') as billing_month, dbms_random.string('U',3) || round(dbms_random.value(10000,99999), 0) as invoice_number, round(dbms_random.value(100,1000),2) as amount, case when dbms_random.value<0.10 then 50 else null end as surcharges from company join plant on plant.company_id = company.company_id join (select rownum as month from dual connect by level<=12) month_seq on (1=1) -- order by 1,2,3 ; commit;