Patch Name:  PHKL_10284

Patch Description: s700 10.01 Buffer cache cumulative patch

Creation Date: 97/03/04

Post Date:  97/03/07

Hardware Platforms - OS Releases:
        s700: 10.01

Products: N/A

Filesets:
        OS-Core.CORE-KRN

Automatic Reboot?: Yes

Status: General Superseded

Critical:
        Yes
        PHKL_10284: PANIC
        PHKL_8329: CORRUPTION
        PHKL_7957: HANG
        PHKL_7793: HANG
        PHKL_7408: HANG

Path Name:  /hp-ux_patches/s700/10.X/PHKL_10284

Symptoms:
        PHKL_10284:
        Panic trap 15 in bwrite() under heavy disk I/O stress.

        PHKL_8329:
        Data loss with UFS files using fragments.

        PHKL_7957:
        VxFS hangs waiting for I/O to finish.

        PHKL_7793:
        A heavily used VxFS file system with a snapshot hangs.

        PHKL_7408:
        System hang can occur during heavy buffer cache activity
        in combination with readahead (prevalent in VxFS).

Defect Description:
        PHKL_10284:
        A buffer arrives in bwrite() with B_ASYNC/B_DELWRI and
        B_INVAL set and with a null vnode pointer (bp->b_vp == 0;
        the buffer has been disowned). When bwrite() attempts to
        complete the write, resolving VOP_STRATEGY through the
        null vp results in a trap 15.
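
        The failing case can be illustrated with a small user-space
        sketch in C.  It is not the kernel source, and this text
        does not describe the actual fix; the flag values, the
        struct layouts, strategy() and bwrite_sketch() are all
        assumptions made for illustration.  It shows one plausible
        guard: a disowned buffer is discarded instead of being
        handed to the strategy routine.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values only; the real kernel values may differ. */
#define B_ASYNC  0x01
#define B_DELWRI 0x02
#define B_INVAL  0x04

struct vnode { int strategy_calls; };
struct buf   { int b_flags; struct vnode *b_vp; };

enum bwrite_result { BW_STARTED, BW_DISCARDED };

/* Hypothetical stand-in for VOP_STRATEGY: it dereferences the vnode. */
static void strategy(struct buf *bp)
{
    assert(bp->b_vp != NULL);      /* a null vp here models the trap 15 */
    bp->b_vp->strategy_calls++;
}

/* Sketch of a write path that checks for a disowned buffer first. */
enum bwrite_result bwrite_sketch(struct buf *bp)
{
    if (bp->b_vp == NULL) {
        /* No vnode to write through: drop the pending write. */
        bp->b_flags &= ~(B_DELWRI | B_ASYNC);
        bp->b_flags |= B_INVAL;
        return BW_DISCARDED;
    }
    bp->b_flags &= ~B_DELWRI;
    strategy(bp);                  /* safe: b_vp is known to be non-null */
    return BW_STARTED;
}
```

        In this sketch a buffer with B_ASYNC, B_DELWRI and B_INVAL
        set and a null b_vp yields BW_DISCARDED, while a buffer
        that still has a vnode is passed to strategy() as before.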

        PHKL_8329:
        There was a code path where dirty buffers could possibly
        be dropped (not flushed) when extending UFS files using
        fragments.

        PHKL_7957:
        There is a race condition in the buffer cache.  A client of
        the buffer cache can arrange to have its own completion
        routine run when the I/O finishes.  But because of a problem
        in biodone(), the buffer might be released while the I/O
        completion routine is still running!  There are many reasons
        why this is bad, including the possibility that the flags
        for the buffer will be corrupted.  In one particular case,
        the busy flag is incorrectly left set when it should have
        been cleared.
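
        The stale busy flag can be reproduced in a deterministic
        toy model in C.  This is a sketch only: the flag values,
        the interleave hook, client_iodone() and both biodone_*()
        functions are hypothetical, and the real biodone() logic is
        not shown in this text.  The model assumes the completion
        routine performs a read-modify-write of the flags, so a
        release that lands in the middle of it is overwritten.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values only. */
#define B_BUSY 0x01
#define B_DONE 0x02
#define B_CALL 0x04

struct buf {
    int b_flags;
    void (*b_iodone)(struct buf *);   /* client completion routine */
};

/* Releasing a buffer clears its busy flag. */
static void brelse_sketch(struct buf *bp) { bp->b_flags &= ~B_BUSY; }

/* Hook run in the middle of the completion routine; it models another
 * context touching the buffer at that moment. */
static void (*interleave)(struct buf *) = NULL;

/* Hypothetical client routine doing a read-modify-write of the flags. */
static void client_iodone(struct buf *bp)
{
    int snapshot = bp->b_flags;        /* read */
    if (interleave != NULL)
        interleave(bp);                /* buffer released underneath us */
    bp->b_flags = snapshot | B_DONE;   /* write back a stale value */
}

/* Defective ordering: the release overlaps the running callback. */
void biodone_racy(struct buf *bp)
{
    assert(bp->b_iodone != NULL);
    interleave = brelse_sketch;
    bp->b_iodone(bp);
    interleave = NULL;
}

/* Safe ordering: the callback finishes before the buffer is released. */
void biodone_ordered(struct buf *bp)
{
    assert(bp->b_iodone != NULL);
    interleave = NULL;
    bp->b_iodone(bp);
    brelse_sketch(bp);
}
```

        With biodone_racy() the buffer ends up with B_BUSY still
        set even though it was released; with biodone_ordered()
        the flag is cleared as intended.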

        PHKL_7793:
        When VxFS calls getnewbuf() to acquire a buffer, it passes
        VX_NONBLOCK in bxflags to avoid potential deadlock. However,
        getnewbuf(), when finding a B_DELWRI buffer, proceeds to
        call bwrite() to flush the buffer without checking the
        bxflags. This causes deadlock in the following scenario:

        1. getnewbuf() tries to flush a buffer that belongs to
           VxFS. The VxFS strategy layer finds that the buffer is
           involved in an uncommitted transaction and decides to
           flush the log buffer first.

        2. This file system has a snapshot and the region of the
           log we are about to overwrite happens to change for the
           first time. So the snapshot strategy layer needs to copy
           the old data from primary disk to the snapshot before
           flushing the log. In order to do so, it asks for another
           buffer.

        3. getnewbuf() is called again and yet another VxFS dirty
           buffer needs to be flushed.

        4. The VxFS strategy layer decides the current log buffer
           must be flushed before any dirty buffer. Since the
           current log buffer is already locked, it sleeps waiting
           for a lock that it owns itself.

        The fix is that if a B_DELWRI buffer with VX_NONBLOCK is
        chosen, getnewbuf() will return NULL instead of flushing
        and returning the buffer.
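
        The behavior described by the fix can be sketched in C as
        follows.  This is a simplified user-space model, not the
        patched source: the flag values, the single-candidate
        interface, the flushed counter and the *_sketch() names are
        assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; the two flags live in different words. */
#define B_DELWRI    0x01   /* in b_flags: buffer holds a delayed write */
#define VX_NONBLOCK 0x01   /* in bxflags: caller must not block */

struct buf { int b_flags; int flushed; };

/* Hypothetical stand-in for bwrite(): flushes the delayed write. */
static void bwrite_sketch(struct buf *bp)
{
    bp->b_flags &= ~B_DELWRI;
    bp->flushed++;
}

/* Sketch of the buffer selection step for one candidate buffer. */
struct buf *getnewbuf_sketch(struct buf *candidate, int bxflags)
{
    assert(candidate != NULL);
    if (candidate->b_flags & B_DELWRI) {
        if (bxflags & VX_NONBLOCK)
            return NULL;           /* flushing could re-enter the caller */
        bwrite_sketch(candidate);  /* blocking callers may still flush */
    }
    return candidate;
}
```

        A caller that passes VX_NONBLOCK must be prepared for a
        NULL return and retry or back off; callers that do not pass
        the flag see the previous flush-and-return behavior.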

        PHKL_7408:
        This defect will occur under the following conditions:
                - We are doing readahead on the disk.  JFS is
                  aggressive this way.  Essentially, the BX_NONBLOCK
                  and/or BX_NOBUFWAIT flags will be set for the
                  buffer read.
                - The buffer cache virtual space (check bufmap) is
                  highly fragmented.  Another possibility (though
                  it hasn't been seen) is that the current buffer
                  overlaps another locked buffer.  Essentially,
                  anything that necessitates sleeping in brealloc1()
                  or allocbuf1().

        brealloc1() and allocbuf1() refuse to sleep if either of
        the nonblock flags is set.  However, ogetblk() has a bug:
        it ignores this error return and simply loops if the call
        to brealloc1() fails.
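
        The loop can be modeled in C as below.  This is a sketch
        under stated assumptions, not the kernel routine: the flag
        values, the must_sleep parameter and the *_sketch()
        functions are hypothetical, and the real functions take
        different arguments.  It shows the error return being
        propagated to the caller instead of being retried forever.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values only. */
#define BX_NONBLOCK  0x01
#define BX_NOBUFWAIT 0x02

struct buf { int b_bufsize; };

/* Hypothetical brealloc1(): returns 1 on success, or 0 when it would
 * have to sleep but a nonblock flag forbids sleeping. */
static int brealloc1_sketch(struct buf *bp, int size, int bxflags,
                            int must_sleep)
{
    if (must_sleep && (bxflags & (BX_NONBLOCK | BX_NOBUFWAIT)))
        return 0;
    bp->b_bufsize = size;   /* a blocking caller sleeps, then succeeds */
    return 1;
}

/* Sketch of the lookup loop with the error return honored. */
struct buf *ogetblk_sketch(struct buf *bp, int size, int bxflags,
                           int must_sleep)
{
    assert(bp != NULL);
    for (;;) {
        if (brealloc1_sketch(bp, size, bxflags, must_sleep))
            return bp;
        /* The defect was to retry here unconditionally.  With a
         * nonblock flag set the failure cannot clear by itself. */
        if (bxflags & (BX_NONBLOCK | BX_NOBUFWAIT))
            return NULL;
    }
}
```

        Without the check on the nonblock flags, a readahead
        request in this model would spin in the for loop, which
        corresponds to the hang described above.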

SR:
        1653166496 1653173922 4701327338 4701333419 4701348359
        5003314906

Patch Files:
        /usr/conf/lib/libhp-ux.a(vfs_bio.o)

what(1) Output:
        /usr/conf/lib/libhp-ux.a(vfs_bio.o):
                vfs_bio.c $Date: 97/03/04 07:35:11 $ $Revision: 1.20
                        .72.76 $ PATCH_10.01 (PHKL_10284)

cksum(1) Output:
        4284804816 27612 /usr/conf/lib/libhp-ux.a(vfs_bio.o)

Patch Conflicts: None

Patch Dependencies:  None

Hardware Dependencies:  None

Other Dependencies:  None

Supersedes:
        PHKL_7408 PHKL_7793 PHKL_7957 PHKL_8329

Equivalent Patches:
        PHKL_10285:
        s800: 10.01

        PHKL_10286:
        s700: 10.10

        PHKL_10287:
        s800: 10.10

        PHKL_10288:
        s700: 10.20

        PHKL_10289:
        s800: 10.20

Patch Package Size:  80 Kbytes

Installation Instructions:
        Please review all instructions and the Hewlett-Packard
        SupportLine User Guide or your Hewlett-Packard support terms
        and conditions for precautions, scope of license,
        restrictions, and limitation of liability and warranties
        before installing this patch.
        ------------------------------------------------------------
        1. Back up your system before installing a patch.

        2. Log in as root.

        3. Copy the patch to the /tmp directory.

        4. Move to the /tmp directory and unshar the patch:

                cd /tmp
                sh PHKL_10284

        5a. For a standalone system, run swinstall to install the
            patch:

                swinstall -x autoreboot=true -x match_target=true \
                        -s /tmp/PHKL_10284.depot

        5b. For a homogeneous NFS Diskless cluster, run swcluster on the
            server to install the patch on the server and the clients:

                swcluster -i -b

            This will invoke swcluster in the interactive mode and
            force all clients to be shut down.

            WARNING: All cluster clients must be shut down prior to the
                     patch installation.  Installing the patch while the
                     clients are booted is unsupported and can lead to
                     serious problems.

            The swcluster command will invoke an swinstall session in which
            you must specify:

                alternate root path  -  default is /export/shared_root/OS_700
                source depot path    -  /tmp/PHKL_10284.depot

            To complete the installation, select the patch by choosing
            "Actions -> Match What Target Has" and then "Actions -> Install"
            from the Menubar.

        5c. For a heterogeneous NFS Diskless cluster:

                - run swinstall on the server as in step 5a to install
                  the patch on the cluster server.

                - run swcluster on the server as in step 5b to install
                  the patch on the cluster clients.

        By default swinstall will archive the original software in
        /var/adm/sw/patch/PHKL_10284.  If you do not wish to retain a
        copy of the original software, you can create an empty file
        named /var/adm/sw/patch/PATCH_NOSAVE.

        Warning: If this file exists when a patch is installed, the
                 patch cannot be deinstalled.  Please be careful
                 when using this feature.

        It is recommended that you move the PHKL_10284.text file to
        /var/adm/sw/patch for future reference.

        To put this patch on a magnetic tape and install from the
        tape drive, use the command:

                dd if=/tmp/PHKL_10284.depot of=/dev/rmt/0m bs=2k

Special Installation Instructions:  None