Memory access after ioremap very slow


I'm working on a Linux kernel driver that makes a chunk of physical memory available to user space. I have a working version of the driver, but it's currently very slow. So, I've gone back a few steps and tried making a small, simple driver to recreate the problem.

I reserve the memory at boot time using the kernel parameter memmap=2G$1G. Then, in the driver's __init function, I ioremap some of this memory, and initialize it to a known value. I put in some code to measure the timing as well:

#include <linux/module.h>
#include <linux/init.h>
#include <linux/time.h>
#include <linux/io.h>

#define RESERVED_REGION_SIZE    (1 * 1024 * 1024 * 1024)   // 1GB
#define RESERVED_REGION_OFFSET  (1 * 1024 * 1024 * 1024)   // 1GB

static void __iomem *reservedBlock;   // mapping returned by ioremap()

// Timing helper: microsecond difference between two timevals (t2 - t1)
static int usec_diff( struct timeval *t2, struct timeval *t1 )
{
    return (t2->tv_sec - t1->tv_sec) * 1000000 + (t2->tv_usec - t1->tv_usec);
}

static int __init memdrv_init(void)
{
    struct timeval t1, t2;
    printk(KERN_INFO "[memdriver] init\n");

    // Remap reserved physical memory (that we grabbed at boot time)
    do_gettimeofday( &t1 );
    reservedBlock = ioremap( RESERVED_REGION_OFFSET, RESERVED_REGION_SIZE );
    do_gettimeofday( &t2 );
    printk( KERN_ERR "[memdriver] ioremap() took %d usec\n", usec_diff( &t2, &t1 ) );

    // Set the memory to a known value
    do_gettimeofday( &t1 );
    memset( reservedBlock, 0xAB, RESERVED_REGION_SIZE );
    do_gettimeofday( &t2 );
    printk( KERN_ERR "[memdriver] memset() took %d usec\n", usec_diff( &t2, &t1 ) );

    // Register the character device
    ...

    return 0;
}

I load the driver, and check dmesg. It reports:

[memdriver] init
[memdriver] ioremap() took 76268 usec
[memdriver] memset() took 12622779 usec

That's 12.6 seconds for the memset. That means the memset is running at 81 MB/sec. Why on earth is it so slow?

This is kernel 2.6.34 on Fedora 13, and it's an x86_64 system.

EDIT:

The goal behind this scheme is to take a chunk of physical memory and make it available to both a PCI device (via the memory's bus/physical address) and a user space application (via a call to mmap, supported by the driver). The PCI device will then continually fill this memory with data, and the user-space app will read it out. If ioremap is a bad way to do this (as Ben suggested below), I'm open to other suggestions that'll allow me to get any large chunk of memory that can be directly accessed by both hardware and software. I can probably make do with a smaller buffer also.
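The mmap support in the driver can be done with remap_pfn_range() over the reserved region; a rough sketch (not my exact code) would look something like:

#include <linux/module.h>
#include <linux/fs.h>
#include <linux/mm.h>

static int memdrv_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long size = vma->vm_end - vma->vm_start;

    // Refuse mappings larger than the reserved region
    if (size > RESERVED_REGION_SIZE)
        return -EINVAL;

    // Hand the reserved physical pages straight to user space;
    // the caching attributes come from vma->vm_page_prot
    return remap_pfn_range(vma, vma->vm_start,
                           RESERVED_REGION_OFFSET >> PAGE_SHIFT,
                           size, vma->vm_page_prot);
}

static const struct file_operations memdrv_fops = {
    .owner = THIS_MODULE,
    .mmap  = memdrv_mmap,
};

User space would then open the device node and call mmap() on it to get a pointer into the reserved region.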


See my eventual solution below.


ioremap allocates uncacheable pages, as you'd desire for access to a memory-mapped I/O device. That would explain your poor performance.

You probably want kmalloc or vmalloc. The usual reference materials will explain the capabilities of each.
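For example, a cacheable buffer from kmalloc can still be handed to hardware via its physical address; a minimal sketch, with the buffer size and names made up:

#include <linux/slab.h>
#include <linux/io.h>

#define BUF_SIZE (1 * 1024 * 1024)    // hypothetical 1 MB buffer

static void *buf;                     // kernel virtual address (cacheable)
static phys_addr_t buf_phys;          // physical address for the device

static int alloc_buf(void)
{
    // kmalloc memory is physically contiguous and cacheable,
    // so a plain memset() runs at normal RAM speed
    buf = kmalloc(BUF_SIZE, GFP_KERNEL);
    if (!buf)
        return -ENOMEM;

    buf_phys = virt_to_phys(buf);
    memset(buf, 0xAB, BUF_SIZE);
    return 0;
}

Note that vmalloc memory is only virtually contiguous, so it is less convenient to hand to a PCI device as a single physical address.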


I don't think ioremap() is what you want there. You should only access the result (what you call reservedBlock) with readb, readl, writeb, memcpy_toio, etc. It is not even guaranteed that the return value is virtually mapped (although it apparently is on your platform). I'd guess that the region is being mapped uncached (suitable for I/O registers), leading to the terrible performance.
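In other words, access to an __iomem mapping would look more like this (illustrative only):

#include <linux/io.h>

static void touch_io_region(void __iomem *regs, size_t len)
{
    u32 val;

    // Bulk-initialize device memory with the io variant of memset
    memset_io(regs, 0xAB, len);

    // Individual accesses go through writel()/readl(), not plain pointers
    writel(0x12345678, regs);
    val = readl(regs);
    (void)val;
}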


It's been a while, but I'm updating since I did eventually find a workaround for this ioremap problem.

Since we had custom hardware writing directly to the memory, it was probably more correct to mark it uncacheable, but it was unbearably slow and wasn't working for our application. Our solution was to only read from that memory (a ring buffer) once there was enough new data to fill a whole cache line on our architecture (I think that was 256 bytes). This guaranteed we never got stale data, and it was plenty fast.
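In outline, the reader side was something like the following (the names, the assumption that the positions are free-running byte counters, and the power-of-two ring size are illustrative, not our exact code):

#include <stddef.h>

#define CHUNK_SIZE 256   // one cache line's worth of data on our hardware

// Consume the ring only in whole CHUNK_SIZE pieces, so we never read a
// cache line the device is still in the middle of filling.
static void drain_ring(const unsigned char *ring, size_t ring_size,
                       size_t *read_pos, size_t write_pos,
                       void (*consume)(const void *data, size_t len))
{
    while (write_pos - *read_pos >= CHUNK_SIZE) {
        consume(ring + (*read_pos & (ring_size - 1)), CHUNK_SIZE);
        *read_pos += CHUNK_SIZE;
    }
}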


I have also tried reserving a huge memory chunk with memmap.

ioremapping this chunk gave me a mapped address space that starts far beyond a few terabytes.

When you ask to reserve 128 GB of memory starting at 64 GB, you see the following in /proc/vmallocinfo:

0xffffc9001f3a8000-0xffffc9201f3a9000 137438957568 0xffffffffa00831c9 phys=1000000000 ioremap

So the address space starts at 0xffffc9001f3a8000 (which is way too high).

Secondly, your observation is correct: even memset_io results in extremely large delays (tens of minutes) to touch all of this memory.

So the time taken is mostly down to address-space conversion and the loading of uncacheable pages.
