[GAP] Effect of using malloc for temporary allocations in GAP
dmitrii.pasechnik at cs.ox.ac.uk
Tue Jan 3 16:49:27 GMT 2017
On Tue, Jan 03, 2017 at 03:45:44PM +0100, Max Horn wrote:
> > On 03 Jan 2017, at 14:20, Dmitrii Pasechnik <dmitrii.pasechnik at cs.ox.ac.uk> wrote:
> > On Tue, Jan 03, 2017 at 10:36:08AM +0000, Christopher Jefferson wrote:
> >> A couple of points (replying in my phone, sorry for bad formatting).
> >> The use of things like sbrk is I believe just because this code is old, and originated before you could rely on a good mmap and friends.
> >> Other processes shouldn't be a problem -- all modern OSes give every process a separate memory space. It could obviously be a problem for things like sage, which want to load into the same process as gap.
> > Aren't also the heap and a virtual memory region obtained by
> > mmap separate even within a process?
> This question is "not even wrong" -- it simply doesn't make sense. You seem
> to presuppose that "the heap" is a single memory region, which it is not.
> The memory manager (e.g. libc's malloc) may and will allocate additional
> pages to satisfy allocation requests.
> To illustrate this by a hypothetical example: Suppose "the heap" so far
> covers addresses 0x0 till 0x0FFFFF, and now we mmap some data, and it gets
> mapped to addresses 0x10000 till 0x1FFFFF. Now somebody tries to malloc()
> some data which does not fit in the existing hap anymore. Now libc extends
> the heap, and quite possibly the new (additional) heap region will end up
> right after our mmap'ed data. So we cannot grow that region contiguously
This explanation appears to contradict the modern picture of contiguous
virtual memory regions, which may well consist of non-contiguous
physical pages (that's what e.g. vmalloc gives one in the Linux kernel).
Although probably I don't understand something here.
> >> With regards GASMAN. It is very fast, and I believe any replacement would be much slower. One of the biggest problems with HPC gap is that boehm is much slower.
> >> Of course gasman has issues -- it can't support multithreading (hence can't be in hpcgap), is moving, requires continuous address space. But any replacement will, for gap's current usage, be I believe slower. Also much of gaps internals relies on features of gasman other allocators may not provide (like cheap access to the exact size of any allocation for example).
> > Why would GASMAN be problematic with multithreading? Surely, it will
> > need a mechanism to pause all the other threads while collecting
> > garbage, but apart from this, is there anything fundamental that
> > makes it hard to port?
> Pausing all threads to do GC is problematic on its own; for starters, all
> memory allocations (which currently are blindly fast with GASMAN, and GAP
> code kind of relies on that) would need to take a lock, which is expensive.
Why would allocations need a lock?
Allocations may be queued and processed one by one.
> But let's ignore that for the moment (as perhaps there is a clever scheme to
> avoid or reduce that overhead).
> Then we still would have to teach GASMAN to inspect the stack and CPU
> register contents (!) of all other running threads, as those could contain
> references to objects. This is a very fundamental and major change. And
> non-portable, too. It makes up a considerable part of the complexity of
> Boehm GC, I think.
Oh, right, registers, this might be a pain indeed. One begins to understand why
reference counting might be easier to deal with...
> I guess e.g. Reimer would fit the bill, but he has a job already, and
> AFAIK we have no money to pay him or anybody else to work on a new GC
Although DFG forked out a huge sum for computer algebra,
and some of it is reaching GAP-related German universities.
More information about the Gap