Illustrate is an interesting Pig operator that can generate test data automatically. It samples the data in your input, scans the Pig script for conditions, and works out which of those conditions your input data does not cover. Pig then creates data for those conditions on the fly and feeds it to your script. This functionality was introduced in Pig 0.4 and enhanced in 0.9. It is useful when you are running with a small amount of data, as in a unit test or in local mode, and are not sure whether your sample data covers every path. It is also useful when you are not sure that your input data contains all the scenarios you have coded in the Pig script.
O'Reilly's Programming Pig describes the illustrate operator as follows:
Illustrate takes a sample of your data and runs it through your script, but as it encounters operators that remove data (such as filter, join, etc.) it makes sure some records pass through the operator and some do not. When necessary, it will manufacture records that look like yours (i.e., that have the same schema) but are not in the sample it took.
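The description above mentions joins as well as filters, while the cases in this post only exercise filters. As a minimal sketch of the join case, assuming two hypothetical pipe-delimited inputs named 'people_hdfs' and 'regions_hdfs' (these names and their schemas are made up for illustration and are not part of my test), a script could look like this:

```pig
-- Hypothetical inputs: 'people_hdfs' and 'regions_hdfs' are assumed names.
people  = load 'people_hdfs'  using PigStorage('|') as (sex:chararray, country:chararray);
regions = load 'regions_hdfs' using PigStorage('|') as (country:chararray, cont:chararray);

-- The join drops people whose country has no match in regions.
joined = join people by country, regions by country;

-- After the join, fields are addressed with the relation prefix.
asian = filter joined by regions::cont == 'asia';

-- Show example records for every step that leads to 'asian'.
illustrate asian;
```

I have not run this variant, so I am not showing any output for it. Going by the description above, illustrate should try to show example records that survive the join and the filter as well as records that do not.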
To validate this, I wrote a sample Pig script that filters one type of record. My script looks like this:
inputfile = load 'second_hdfs' using PigStorage('|') as (sex:chararray,country:chararray,cont:chararray);
describe inputfile;
relC = filter inputfile by cont == 'asia';
--store relC into 'second_out3' using PigStorage('|');
illustrate relC;
CASE 1: The input file does not contain any record that satisfies the filter for relC, so Pig manufactures a record that passes the filter.
My input file looks like this:
MALE|INDIA|AMERICA
MALE|INDIA|AFRICA
MALE|INDIA|ANTARTICA
None of the records in the input file passes the filter, but because of the illustrate operator Pig has manufactured a record on the fly that satisfies the condition. My Pig job output looks like this. The record with cont = asia was created by Pig on the fly; you do not need to modify your input data.
2013-06-05 07:29:07,461 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
-------------------------------------------------------------------------
| inputfile | sex: bytearray | country: bytearray | cont: bytearray |
-------------------------------------------------------------------------
| | MALE | INDIA | asia |
| | MALE | INDIA | ANTARTICA |
-------------------------------------------------------------------------
-------------------------------------------------------------------------
| inputfile | sex: chararray | country: chararray | cont: chararray |
-------------------------------------------------------------------------
| | MALE | INDIA | asia |
| | MALE | INDIA | ANTARTICA |
-------------------------------------------------------------------------
--------------------------------------------------------------------
| relC | sex: chararray | country: chararray | cont: chararray |
--------------------------------------------------------------------
| | MALE | INDIA | asia |
CASE 2: Every record in the input file passes the filter in the script, but the file does not contain any negative test case. Pig manufactures one: the record with cont = 0 in the output below fails the filter.
MALE|india|asia
MALE|INDIA|asia
FEMALE|india|asia
2013-06-03 11:58:08,649 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
-------------------------------------------------------------------------
| inputfile | sex: bytearray | country: bytearray | cont: bytearray |
-------------------------------------------------------------------------
| | FEMALE | india | 0 |
| | FEMALE | india | asia |
-------------------------------------------------------------------------
-------------------------------------------------------------------------
| inputfile | sex: chararray | country: chararray | cont: chararray |
-------------------------------------------------------------------------
| | FEMALE | india | 0 |
| | FEMALE | india | asia |
-------------------------------------------------------------------------
--------------------------------------------------------------------
| relC | sex: chararray | country: chararray | cont: chararray |
--------------------------------------------------------------------
| | FEMALE | india | asia |
--------------------------------------------------------------------
CASE 3: Multiple conditions in multiple statements inside the script.
The script now applies two filters, one after the other:
inputfile = load 'second_hdfs' using PigStorage('|') as (sex:chararray,country:chararray,cont:chararray);
describe inputfile;
relC = filter inputfile by cont == 'africa';
store relC into 'second_out3' using PigStorage('|');
--explain relC;
relD = filter relC by country == 'PAKISTAN';
store relD into 'second_out3a' using PigStorage('|');
illustrate relD;
MALE|INDIA|ASIA
MALE|INDIA|ASIA
FEMALE|INDIA|ASIA
MALE|INDIA|ASIA
MALE|SRILANKA|ASIA
MALE|BRAZIL|AMERICA
| inputfile | sex: bytearray | country: bytearray | cont: bytearray |
-------------------------------------------------------------------------
| | MALE | PAKISTAN | africa |
| | MALE | 0 | africa |
| | MALE | INDIA | ASIA |
-------------------------------------------------------------------------
-------------------------------------------------------------------------
| inputfile | sex: chararray | country: chararray | cont: chararray |
-------------------------------------------------------------------------
| | MALE | PAKISTAN | africa |
| | MALE | 0 | africa |
| | MALE | INDIA | ASIA |
-------------------------------------------------------------------------
--------------------------------------------------------------------
| relC | sex: chararray | country: chararray | cont: chararray |
--------------------------------------------------------------------
| | MALE | PAKISTAN | africa |
| | MALE | 0 | africa |
--------------------------------------------------------------------
--------------------------------------------------------------------
| relD | sex: chararray | country: chararray | cont: chararray |
--------------------------------------------------------------------
| | MALE | PAKISTAN | africa |