Generating an id/counter for foreach in Pig Latin

Source: https://www.devze.com, 2023-04-10 01:03 (origin: the web)
I want some sort of unique identifier/line_number/counter to be generated/appended in my foreach construct while iterating through the records. Is there a way to accomplish this without writing a UDF?

B = foreach A generate a_unique_id, field1,...etc

How do I get that 'a_unique_id' implemented?

Thanks!


If you are using Pig 0.11 or later, then the RANK operator is exactly what you are looking for. For example:

DUMP A;
(foo,19)
(foo,19)
(foo,7)
(bar,90)
(etc.,0)

B = RANK A ;

DUMP B ;
(1,foo,19)
(2,foo,19)
(3,foo,7)
(4,bar,90)
(5,etc.,0)


There is no built-in UUID function in the main Pig distribution or piggybank. Unfortunately, I think your only option is going to be writing a UDF.

There is a standard way of building UUIDs, and there is existing Java code out there that you can build on for your UDF.
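As a rough idea of how little code the UUID part needs, here is a minimal sketch using the JDK's `java.util.UUID`. The class name `UniqueIdSketch` and the method `newId` are made up for illustration. To use this from Pig you would still have to wrap the call in a UDF (typically a class extending `org.apache.pig.EvalFunc<String>` and returning the value from `exec`), which is left out here so the snippet compiles without the Pig jar.

```java
import java.util.UUID;

// Hypothetical helper: the core of what a UUID-generating UDF would return.
// In an actual Pig UDF this call would sit inside EvalFunc<String>.exec(Tuple),
// which is omitted here so the sketch has no dependency on the Pig jar.
class UniqueIdSketch {

    // Returns a random (version 4) UUID in its canonical 36-character form.
    static String newId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Print a few ids to show that each call yields a different value.
        for (int i = 0; i < 3; i++) {
            System.out.println(newId());
        }
    }
}
```

Unlike a counter, ids generated this way are random rather than sequential, so they identify a record but say nothing about its position in the input.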

Is there a particular reason why you don't want to write a UDF?
