Security question?

Thu Nov 4 15:39:51 PST 2004

On Thursday 04 November 2004 16:46, Roland Mainz wrote:
> Jim Gettys wrote:
> > > It would be interesting to see if the results in this paper can be
> > > improved upon by using linux futexes rather than the Unix socket for
> > > synchronization. The implementation referred to in this paper is still
> > > available on a branch of the DRI xc tree, if anyone feels like some
> > > archaeology.
> >
> > Not to mention the fact that Unix domain sockets on Linux are really,
> > *really* fast;
>
> ... which is a myth. It's fast but shared memory can usually beat it
> without problems.

As mentioned in the SMT paper I referenced earlier, shared memory needs a 
synchronization primitive.  Unix domain sockets give that to you for free, in 
the form of poll().  Shared memory transports are fine if your implementation 
overcomes the sync latency.  Alternatively, the common case is the server 
sleeping in poll(), and the kernel can wake the server up directly once the 
send() completes from the client.  Tough to beat that.
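Here is a minimal Python sketch of that wake-up path. It assumes Linux and uses a socketpair() in place of a real X connection, and the names serve_one and client are illustrative only. The "server" blocks in poll() and is woken as soon as the client's send completes. No separate synchronization primitive is involved.

```python
import select
import socket
import threading
import time

def serve_one(server_sock: socket.socket, bufsize: int = 65536) -> bytes:
    """Sleep in poll() until the peer's send() makes the socket readable."""
    poller = select.poll()
    poller.register(server_sock, select.POLLIN)
    events = poller.poll()  # no timeout: blocks in the kernel, no busy-waiting
    assert events and events[0][1] & select.POLLIN
    return server_sock.recv(bufsize)

# A connected pair of Unix domain sockets stands in for the X connection.
server_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

def client() -> None:
    time.sleep(0.05)                # give the "server" time to go to sleep
    client_end.sendall(b"request")  # completing the send wakes the poller

t = threading.Thread(target=client)
t.start()
received = serve_one(server_end)
t.join()
print(received)  # b'request'
```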

Like I said, it'd be interesting to see if shm+futexes are faster.
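For contrast, here is a sketch of the shared-memory arrangement. Bulk data goes through a shared mapping, and only a one-byte wake-up travels over the socket. Python's standard library has no futex call, so this sketch keeps the socket as the wake-up mechanism. That is the role the Unix socket plays in the implementation discussed earlier in the thread, and a futex variant would replace only that doorbell. The segment layout (a 4-byte length prefix followed by the payload) and all names are hypothetical. Threads stand in for separate client and server processes.

```python
import mmap
import select
import socket
import threading

SEG_SIZE = 256 * 1024  # hypothetical segment size

# Anonymous shared mapping (MAP_SHARED on Unix). A real transport would share
# it between processes; threads are enough for this sketch.
segment = mmap.mmap(-1, SEG_SIZE)
server_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

def client_write(payload: bytes) -> None:
    # Bulk data is written into the shared mapping, not sent through the socket.
    segment.seek(0)
    segment.write(len(payload).to_bytes(4, "little"))
    segment.write(payload)
    # The socket carries only a one-byte "doorbell" that wakes the server.
    client_end.sendall(b"\x01")

def server_read() -> bytes:
    poller = select.poll()
    poller.register(server_end, select.POLLIN)
    poller.poll()        # same sleep/wake-up as the plain-socket case
    server_end.recv(1)   # consume the doorbell
    length = int.from_bytes(segment[0:4], "little")
    return bytes(segment[4:4 + length])

t = threading.Thread(target=client_write, args=(b"x" * 100_000,))
t.start()
data = server_read()
t.join()
```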

> You save at least one data copy (which is important 
> when you shuffle around large amounts of image/texture/etc. data) and 
> don't have to split the data (there's the BIGREQUESTS extension but Xsun
> doesn't support it right now).

I am not concerned with implementation deficiencies in other people's X 
servers ;).  And yes, you'll have to split the data eventually.  BIGREQUESTS 
gives you a 16M X11 packet by default, but that gets split over 1.5K TCP 
segments or 64K Unix segments.  Surprisingly enough that can actually be 
faster than a naive shared memory implementation due to cache friendliness.
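The splitting arithmetic can be checked quickly. This sketch takes "16M" as 16 MiB. It reads "1.5K TCP segments" as a 1460-byte Ethernet MSS, which is an assumption: the exact figure depends on the link MTU.

```python
BIG_REQUEST = 16 * 1024 * 1024  # the "16M" BIG-REQUESTS figure, taken as 16 MiB
TCP_SEGMENT = 1460              # assumed: typical Ethernet TCP MSS (~1.5K)
UNIX_SEGMENT = 64 * 1024        # "64K Unix segments"

def segments_needed(request_bytes: int, segment_bytes: int) -> int:
    """Number of transport-sized pieces one request is split into (ceiling division)."""
    return (request_bytes + segment_bytes - 1) // segment_bytes

print(segments_needed(BIG_REQUEST, TCP_SEGMENT))   # 11492
print(segments_needed(BIG_REQUEST, UNIX_SEGMENT))  # 256
```

Each piece is far smaller than the whole request, which is where the cache-friendliness argument comes from.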

Also due to cache issues on inferior platforms (x86), a smaller shm segment is 
sometimes better.  Which means splitting the data.  256k is only 256x256 @ 
32bpp.
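The following sketch checks the 256k figure and shows one way to split an image into bands that each fit a 256 KiB segment. Row-band splitting and the 1024x768 example are illustrative assumptions, not a description of any particular X server.

```python
BYTES_PER_PIXEL = 4    # 32bpp
SEGMENT = 256 * 1024   # hypothetical 256k shm segment

def bytes_for(width: int, height: int, bpp_bytes: int = BYTES_PER_PIXEL) -> int:
    return width * height * bpp_bytes

def row_bands(width: int, height: int, segment: int = SEGMENT):
    """Yield (first_row, row_count) bands that each fit in one segment."""
    rows_per_band = segment // (width * BYTES_PER_PIXEL)
    if rows_per_band == 0:
        raise ValueError("a single row does not fit in the segment")
    for first in range(0, height, rows_per_band):
        yield first, min(rows_per_band, height - first)

assert bytes_for(256, 256) == SEGMENT  # 256x256 @ 32bpp is exactly 256k
bands = list(row_bands(1024, 768))
print(len(bands), bands[0])            # 12 (0, 64)
```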

> And in the case of a shared memory X 
> transport you can even skip all the weird endian tests and copying -
> you usually write into the shared memory area and let the Xserver do the
> rest.

In principle there's no reason we can't skip the endian shuffling on Unix 
sockets too.
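In X11, the client's first byte at connection setup declares its byte order: 0x42 ('B') for MSB first, 0x6C ('l') for LSB first. The server swaps multi-byte fields only when that order differs from its own. A client reached over a local Unix socket normally runs on the same CPU, so it declares the server's native order and the swap path never runs. Here is a minimal sketch of that decision (the function names are hypothetical):

```python
import struct
import sys

# X11 connection setup: the client's first byte declares its byte order.
MSB_FIRST = 0x42  # 'B'
LSB_FIRST = 0x6C  # 'l'

def needs_swap(byte_order_byte: int, server_order: str = sys.byteorder) -> bool:
    """True if the client's declared order differs from the server's."""
    client_order = {MSB_FIRST: "big", LSB_FIRST: "little"}[byte_order_byte]
    return client_order != server_order

def read_card32(buf: bytes, offset: int, byte_order_byte: int) -> int:
    """Decode a 32-bit protocol field in the client's declared byte order."""
    fmt = ">I" if byte_order_byte == MSB_FIRST else "<I"
    return struct.unpack_from(fmt, buf, offset)[0]

# A local client declares the same order as the server, so no swapping is needed.
native = MSB_FIRST if sys.byteorder == "big" else LSB_FIRST
print(needs_swap(native))  # False
```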

> > Some serious performance work is in order, and not just the
> > "x11perf" flavor micro optimization work either.
>
> Using "x11perf" for benchmarking is more or less useless these days as
> its mixture of protocol requests doesn't reflect how applications use
> it.

Agreed.

> For example if you measure Sun's shared memory transport with 
> "x11perf" you only get a one-digit percent improvement - but when you
> test it with Mozilla's DHTML perf. tests then you get a perf.
> improvement in the three-digit percent range (!!!; this is the reason
> why Mozilla/Firefox use the Xsun shared memory transport by default),
> assuming you use the default buffer size - which is far too small for
> today's applications; increasing it makes applications even faster.

Sun boxes are not x86 boxes.  (Well.  amd64 excluded.)  Sparc chips have a 
sensible MMU; the ratio of GPU:CPU performance can be much higher than it is 
on a typical Linux desktop; they have high bandwidth relative to their clock 
speed.  They are, in short, not subject to the same design considerations.

So, yes, SMT works for Sun.  We don't know that it works for x86.  Previous 
research indicates that it doesn't, but that may have changed by now.

- ajax