[GAP Forum] Lookup tables in gap

Frank Lübeck frank.luebeck at math.rwth-aachen.de
Mon Jun 30 21:20:57 BST 2008


Dear Arnaldo Mandel, Dear Forum,


> First, obvious solution: produced an ordered list L of all labels
> (that was fast!).  Although L is longer than N, it is true that the
> label of N[i] is L[i], so, given a label s, the corresponding record
> is N[Position(L,s)].  Also, since L is ordered, lookup should be fast.

Short answer: Use N[PositionSorted(L,s)]. More generally, always use
'PositionSorted' instead of 'Position', when you know that your list
is sorted (but you are responsible to make sure that this is true).


And below is a longer answer as well. I'll try to discuss some aspects of 
the mentioned problem in a  commented GAP session. Maybe this is of more
general interest, since even experienced GAP users are sometimes running 
into certain traps when dealing with long lists. 

> Last chance: use a record as an associative array.  That is, create a
> record R such that, for every label s, R.(s) = Position(L,s). Looking
No, this doesn't help. In the moment record components are stored unsorted
and so they are searched linearly. (We will probably change this with the
next release of GAP.)


With best regards,

    Frank




gap> # As an example we produce a sorted list of strings without duplicates,
gap> # containing almost 10^6 entries:
gap> L := Set(List([1..1000000], i->
>                    ShallowCopy(String(Random(100000000,900000000)))));;
gap> Length(L);
999379
gap> 
gap> # Now L is mutable and has mutable entries.
gap> # Using 'Position' on L uses linear search:
gap> for i in [1..100] do p := Position(L, L[i]); od; time;
0
gap> for i in [1..100] do p := Position(L, L[900000+i]); od; time;
9037
gap> 
gap> # Here GAP cannot remember that L is actually sorted, because the list L
gap> # doesn't know about changes of its entries (and changing an entry could
gap> # make the list unsorted). So, GAP cannot do better above.
gap> 
gap> # Now, if we make the entries of L immutable, the above problem cannot
gap> # occur, it will be no longer possible to change the entries of L (of
gap> # course, this step may not be an option for you, if you do want to 
gap> # change  the entries of L, maybe later).
gap> for s in L do MakeImmutable(s); od;
gap> 
gap> # ok, let's try again:
gap> for i in [1..100] do p := Position(L, L[i]); od; time;
0
gap> for i in [1..100] do p := Position(L, L[900000+i]); od; time;
9016
gap> 
gap> # Hm, we get the same as before? The problem now is that GAP didn't run
gap> # through the whole list to check if it is sorted. Let's look at the
gap> # information GAP has stored about L:
gap> TNUM_OBJ(L);
[ 20, "list (plain,dense)" ]
gap> 
gap> # Ok, we make GAP learning more about L:
gap> IsSortedList(L);
true
gap> TNUM_OBJ(L);
[ 40, "list (plain,table,ssort)" ]
gap> 
gap> # Since the entries of L are now immutable, GAP can remember the 
gap> # sortedness.  Another try for our loops:
gap> for i in [1..100] do p := Position(L, L[i]); od; time;
0
gap> for i in [1..100] do p := Position(L, L[900000+i]); od; time;
0
gap> 
gap> # So, we are lucky that GAP has a builtin feature to remember that L is
gap> # sorted in this case! It can now use a much faster binary search. We 
gap> # can run much longer loops:
gap> for i in [1..Length(L)] do p := Position(L, L[i]); od; time;
700
gap> 
gap> # But, there can still be a problem in a very similar setting. Let us 
gap> # double the last entry of L:
gap> Add(L,L[Length(L)]);
gap> 
gap> # Now L is still sorted, but entries are not unique. Nevertheless, to
gap> # find an entry in L we could still use a binary search. But see what
gap> # happens:
gap> IsSortedList(L);
true
gap> TNUM_OBJ(L);
[ 38, "list (plain,table,nsort)" ]
gap> for i in [1..100] do p := Position(L, L[i]); od; time;
0
gap> for i in [1..100] do p := Position(L, L[900000+i]); od; time;
9049
gap> 
gap> # Unfortunately, GAP has only a hook for remembering that a list is 
gap> # sorted without duplicates, but not just that it is sorted. So, now 
gap> # we are unlucky, even having immutable entries in L doesn't help.
gap> 
gap> # But, remember my very first hint, if you know that a list is sorted
gap> # use 'PositionSorted' instead of 'Position', then you don't need to
gap> # care about all these internals:
gap> for i in [1..Length(L)] do p := PositionSorted(L, L[i]); od; time;
1052
gap> 
gap> # Finally, let me mention an effect (a trap), which becomes worse if the 
gap> # entries of a long list are again lists or records, and their entries
gap> # ....
gap> # If the entries are mutable then GAP runs recursively through this
gap> # structure whenever it is used as an argument of an operation. (This
gap> # is to find the "type" of the object and to find the right method for
gap> # the operation.) This is particularly annoying if the operation is 
gap> # doing something quite cheap and most information in the 'type' of
gap> # the list is not needed. 
gap> N := List(L, s-> rec(label:=rec(string:=[s])));;
gap> # 'Size' for a list returns its 'Length':
gap> for i in [1..100] do l := Size(N); od; time;
3940
gap> # If you use 'Length' directly, GAP uses a kernel hook to avoid the 
gap> # problem in this particular case:
gap> for i in [1..1000000] do l := Length(N); od; time;
64
gap> # Making entries immutable helps. But again, sometimes this may not 
gap> # be a solution because you do want to change your objects later.
gap> for r in N do MakeImmutable(r); od;
gap> for i in [1..1000000] do l := Size(N); od; time;
280



More information about the Forum mailing list