Tag Archives: quality control
OneSearch, E-Resources, and Dedup
2016-01-26
Over the winter break, OLS applied a major improvement to OneSearch results: we “de-deduped” many e-resource records!
The Problem
As OneSearch was initially configured, it applied its dedup process to all records equally, whether they were print or electronic.
What is dedup? The dedup process allows OneSearch to present a single result when the identical resource is held by multiple schools. Multiple records are merged into one display record. This is useful in the case of print resources, allowing the system to offer all CUNY locations to the user in one result.
However, this proved to be a major problem for e-resource records, because the merged data includes the MARC 856 fields (which often provide links to electronic resources). OneSearch uses these fields to create a “View Online” button that links to the full text whenever the underlying metadata indicates that this is the purpose of the field (see Why Does OneSearch Say We Have the Electronic Book?).
Because each merged record carried multiple 856 fields from the different schools' e-resource records, OneSearch had difficulty determining the correct availability for many deduped records. The result was the dreaded “Full text may be available, see ‘Details’ for links” availability message, with its accompanying grey dot and no “View Online” button.
The Solution
E-resource records whose links require proxying are now prevented from matching other records for the purpose of deduping. This allows each local view to accurately identify the availability status of its own e-resources.
The solution was applied only to records that require proxying, on the assumption that e-resources that do not require proxying are freely available to everyone and should be shared with everyone.
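To make the rule concrete, here is a minimal sketch of dedup grouping with the proxy exception. It is not Primo's actual matching algorithm, which is more involved and not described here; the single `match_key` per record and the `PROXY_PREFIX` value are hypothetical simplifications. Print records and free e-resources with the same key are merged into one display record, while every proxied e-resource keeps a group of its own.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical proxy prefix for illustration; CUNY's real proxy hosts are not given in the post.
PROXY_PREFIX = "https://proxy.example.edu/login?url="


@dataclass
class Record:
    record_id: str
    school: str
    match_key: str          # simplified dedup key, e.g. a normalized ISBN (assumption)
    is_electronic: bool
    links: list = field(default_factory=list)  # 856 $u URLs


def requires_proxy(rec: Record) -> bool:
    """An e-resource counts as proxied if any of its links goes through the proxy."""
    return rec.is_electronic and any(url.startswith(PROXY_PREFIX) for url in rec.links)


def dedup_groups(records: list[Record]) -> list[list[Record]]:
    """Group records into display records.

    Print records and unproxied (free) e-resources share a group when their
    match keys agree; proxied e-resources always get a group of their own.
    """
    groups: dict[tuple[str, str], list[Record]] = defaultdict(list)
    for rec in records:
        if requires_proxy(rec):
            key = ("unique", rec.record_id)   # never merged with another school's record
        else:
            key = ("shared", rec.match_key)   # merged with records that share the key
        groups[key].append(rec)
    return list(groups.values())
```

With two schools holding the same print book, the same proxied e-book, and the same free e-book, this yields one merged print group, two separate proxied records, and one merged free e-resource group.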
The Caveat
Please note that this solution works best in the local school views. A version of the problem still exists for many records in the CUNY view: instead of a single deduped record, the CUNY view now displays a group of records, because these records are now FRBRized instead of deduped.
Clicking on the title takes the user to a list of the individual e-resource records. Records that are available only through a local library will still show the “Full text may be available” message in the CUNY view.
If you run across any problems, please be sure to report them to OLS by opening a work order with the CUNY Service Desk.
OneSearch vs Aleph as an E-resource Source
Last week’s post about link checking describes one of the ways that OLS is working on improving access to e-resources via OneSearch.
Another area of work concerns e-resource records batchloaded to Aleph that are also available via the Primo Central Index (PCI). Because of the large number of duplicate records coming from Aleph, OneSearch often has trouble determining availability correctly for these batchloaded records.
To avoid these problems, when OLS finds that PCI records are duplicated in Aleph, we mark the Aleph records to prevent them from being transferred to OneSearch.
Note: OLS is not removing records from Aleph, we are simply not using those records in OneSearch.
Additional benefits of this approach:
- Efficiency. Duplication doesn’t only happen with PCI. Sometimes the same records are loaded to Aleph for multiple libraries. Any library using OneSearch as its discovery tool can choose to stop running these duplicate batchloads.
- Faster turnaround, because updates come to PCI directly from the publisher.
- Reduced potential for errors and outdated content.
How do I know which Aleph records are being blocked?
An Aleph record marked STA = PCI is one we don’t need in OneSearch. These records can be found in Aleph via a CCL search (where “xx” is your library code):
wst=pci and wow=xx
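The blocking rule itself is simple: anything marked STA = PCI is left out of the OneSearch export but stays in Aleph. A minimal sketch, assuming each record is represented as a Python dict that maps a field tag to a list of values (a hypothetical representation; the actual Aleph-to-OneSearch publishing process is not shown in the post):

```python
def is_pci_duplicate(record: dict) -> bool:
    """True if the record carries STA = PCI (compared case-insensitively)."""
    return any(value.strip().upper() == "PCI" for value in record.get("STA", []))


def records_for_onesearch(records: list[dict]) -> list[dict]:
    """Keep every Aleph record except those marked as duplicated in PCI.

    Records are only skipped for the OneSearch export; nothing is deleted from Aleph.
    """
    return [rec for rec in records if not is_pci_duplicate(rec)]
```

Records with no STA field, or with some other STA value, pass through unchanged.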
What if there is more metadata in the Aleph record than in the PCI record?
Many PCI records come with searchable full text: for example, in addition to the metadata, the entire text of an article may be searchable. Although a PCI record may sometimes contain less metadata than the corresponding Aleph record, we believe that in such cases the searchable full text is a more than adequate substitute. The following information is available on the OLS Support Site:
- PCI collection descriptions, including collection name, vendor, content, coverage, and whether the full text is searchable
- Each library’s PCI collection activations
How do we know if specific e-resources are available in PCI in order to decide what content is duplicated?
PCI makes activated collections searchable (aka “discoverable”). Full-text access for CUNY (aka “Full text Available” in OneSearch) happens because the resource is also activated in SFX. Many PCI collections can be searched for free, even when delivery of full-text is only available to subscribers. See the Primo, PCI, and SFX blog post for more information on this relationship.
At CUNY, we have activated all free-to-search PCI collections in our testing (“Sandbox”) version of OneSearch. To check whether specific e-resources are available via PCI, we search for them in the Sandbox and, once the initial results are displayed, select the “Include results without full text online” checkbox so that content to which we do not have full-text access is included.
Aleph e-resource records that are not found in PCI will continue to be sent to OneSearch.
Sandbox Caveat: The Sandbox does not include all CUNY content. It is limited to 100,000 local records, so it contains only a small percentage of our Aleph content.
Why does OneSearch sometimes say “No full-text” when there is a working link to full-text in the “Details” tab?
Distinguishing what is really available as full-text is challenging. To minimize user frustration, it is considered preferable to under-promise and over-deliver rather than to over-promise and under-deliver. In other words, the system errs on the side of false negatives (“No full-text” displayed even though full text is available), instead of false positives (“Full text available” displayed even though full text is not available).
For this reason, it helps to remember that while “Full text available” is generally reliable, “No full-text” is less so. It is usually worthwhile to click on the “Details” tab of a “No full-text” result, in case it contains a link that leads to the full text.
Example: HathiTrust provides no indication of whether its content is open access or not, forcing its users to choose between false negatives and false positives. In response, we at CUNY have chosen to indicate that all HathiTrust content is “No full-text” even though many records will contain a link to full-text in the “Details” tab.
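The under-promise rule can be expressed as a small decision function. This is only an illustration of the policy described above, not OneSearch's actual availability logic; the three-valued `confirmed_full_text` input and the `FORCED_NO_FULL_TEXT` set are assumptions made for the sketch.

```python
from enum import Enum
from typing import Optional


class Availability(Enum):
    FULL_TEXT = "Full text available"
    NO_FULL_TEXT = "No full-text"


# Sources whose open-access status cannot be determined from their metadata.
# The post names HathiTrust; treating this as a configurable set is an assumption.
FORCED_NO_FULL_TEXT = {"HathiTrust"}


def display_availability(source: str, confirmed_full_text: Optional[bool]) -> Availability:
    """Under-promise: only show full text when access is positively confirmed.

    confirmed_full_text is True (confirmed), False (confirmed absent),
    or None (unknown). Unknown is treated like absent, so mistakes become
    false negatives rather than false positives.
    """
    if source in FORCED_NO_FULL_TEXT:
        return Availability.NO_FULL_TEXT
    if confirmed_full_text is True:
        return Availability.FULL_TEXT
    return Availability.NO_FULL_TEXT
```

Note that a HathiTrust record gets “No full-text” even when its link would work, which is exactly the false negative the example describes.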
Are e-books showing up in Google Scholar? Many faculty will go to Google Scholar as their discovery tool and bypass the library.
E-resources activated in SFX show as available at CUNY in Google Scholar. This means that a side effect of this project is more availability information being shared with Google Scholar, inadvertently leading to a better user experience in Google Scholar. Please be aware, however, that Google Scholar shares little about how and where it gets the information it provides.
What are e-resources?
“E-resources” refers to any type of content that is available in electronic or digital form. This includes articles, books, videos, and more.
Looking for Link Errors in Aleph Catalog Records
The Office of Library Services is pleased to report that we have completed a thorough review of 856 (URL) fields in our catalog. On Thursday, November 5, we emailed the Cataloging Committee members explaining how to access their school’s error list in the Aleph task manager and providing details about the results.
Project Goals
- a school-specific report listing all the problem fields
- a method for temporarily removing these broken URLs from OneSearch
- an easy method for catalogers to update Aleph and OneSearch once the 856 fields are corrected
- an easy method for catalogers to obtain updated lists of any remaining broken 856 fields
To that end, suspect 856 fields have been marked in the Aleph cataloging record with a $z subfield reading “Broken Link”. For example:
85641$ahttp://state.tn.us/correction$zBroken Link
Until the link is fixed and the subfield is removed, the URL will not appear in OneSearch.
When the cataloger corrects the link in the 856 field, they will also remove the Broken Link subfield and, later that day or the next day (depending on the time of the update), the link will reappear in OneSearch.
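The suppress-until-fixed behavior can be sketched as a filter over 856 fields written in the flat form shown in the example above (tag, two indicators, then `$`-delimited subfields). This is a hypothetical illustration, not the actual Aleph-to-OneSearch publishing code. It only returns URLs from $u, the standard MARC location for a URL; a field flagged with $zBroken Link is skipped regardless of which subfield holds the URL.

```python
def parse_field(line: str) -> tuple[str, str, list[tuple[str, str]]]:
    """Split '85641$a...$z...' into tag, indicators and (code, value) subfields.

    Assumes '$' only ever appears as a subfield delimiter (a URL containing
    a literal '$' would be mis-split).
    """
    tag, indicators, rest = line[:3], line[3:5], line[5:]
    subfields = [(chunk[0], chunk[1:]) for chunk in rest.split("$")[1:] if chunk]
    return tag, indicators, subfields


def is_flagged_broken(subfields: list[tuple[str, str]]) -> bool:
    """True if any $z subfield reads 'Broken Link' (case-insensitive)."""
    return any(code == "z" and value.strip().lower() == "broken link" for code, value in subfields)


def links_for_onesearch(field_lines: list[str]) -> list[str]:
    """Return the $u URLs of all 856 fields that are not flagged with $zBroken Link."""
    links = []
    for line in field_lines:
        tag, _indicators, subfields = parse_field(line)
        if tag != "856" or is_flagged_broken(subfields):
            continue  # broken links stay hidden until the cataloger removes the $z
        links.extend(value for code, value in subfields if code == "u")
    return links
```

Once a cataloger corrects the URL and deletes the $z subfield, the same field passes the filter again, which mirrors the link reappearing in OneSearch after the next update.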
While we have provided a list of broken links for each campus, any Aleph power user can easily get an updated list via an Aleph CCL (Common Command Language) search. For example, this search will output all broken links owned by CUNY Central (OWN=AL):
WUR="broken link" AND WOW=AL
Recommendations
To help you get started, we recommend that catalogers prioritize the error list in the following ways:
- Troubleshoot e-resource-only records first (before mixed print+electronic). If these don’t work, the records have no reason to exist.
- Identify 856s in old records (government docs and books) that point to non-full-text content (like table of contents). Consider removing those without assessment.
- Identify 856s for online versions of print material. Update if possible, otherwise remove the links (PURLs often don’t work).
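If a school's error list is exported for triage, the recommended order can be applied as a sort key. The post does not say how to recognize each category, so this sketch makes two assumptions: a hypothetical `electronic_only` flag on each entry, and the MARC 856 second indicator as a rough signal (2 = related resource, such as a table of contents; 1 = version of resource, such as an online copy of a print item).

```python
from dataclasses import dataclass


@dataclass
class BrokenLinkEntry:
    record_id: str
    electronic_only: bool   # hypothetical flag: record has no print holdings
    ind2: str               # 856 second indicator


def priority(entry: BrokenLinkEntry) -> int:
    """Lower numbers should be worked on first."""
    if entry.electronic_only:
        return 0            # e-resource-only records first
    if entry.ind2 == "2":
        return 1            # related resource, e.g. table of contents
    if entry.ind2 == "1":
        return 2            # version of resource, e.g. online copy of print
    return 3                # everything else


def prioritize(entries: list[BrokenLinkEntry]) -> list[BrokenLinkEntry]:
    # sorted() is stable, so entries with equal priority keep their list order.
    return sorted(entries, key=priority)
```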
Additional Notes
In September, we wrote about new error checks for the 856 field that had been added in the Aleph GUI. These error checks allow the cataloger to open a record, press CTRL-U (Check Record), and get an error report showing the type of error. Clicking on that error will take them directly to the problematic 856 field.
That September blog entry also lists all the MARC fields that can contain a URL. At the beginning of this project, we reviewed the content of all those other URL fields (505, 506, etc.) and found that their use was extremely rare at CUNY. For that reason, we ran this URL checking process only on 856 fields.
Now that we have added the GUI-based URL checks, we do not anticipate running this process more than once a year.
New in Aleph GUI: Checking URLs for Errors
Most links in our catalog are in the MARC 856 field. Many of these are added to the catalog by batch record loads, but some are added by catalogers.
URLs manually entered in Aleph by catalogers will now be checked for errors as part of the standard Aleph GUI error checking procedures.
This has been made possible by the addition of the following new tests to the Aleph Cataloging module:
- check a URL for validity
- check that the URL was not mistakenly entered in subfield a ($a)
Checking a URL for Validity
In addition to the 856 field, the URL subfield ($u) can also be used in the following fields:
- 505 (Formatted Contents Note)
- 506 (Restrictions on Access Note)
- 514 (Data Quality Note)
- 520 (Summary, Etc.)
- 530 (Additional Physical Form Available Note)
- 540 (Terms Governing Use and Reproduction Note)
- 545 (Biographical or Historical Note)
- 552 (Entity and Attribute Information Note)
- 555 (Cumulative Index/Finding Aids Note)
- 563 (Binding Information)
- 583 (Action Note)
While $u is rarely used in these fields, this validation process will check any URLs that are entered into $u of these fields, too.
The error is expressed as a URL error code (an HTTP status code). Lists of the common HTTP status codes and their meanings, such as 404 “Not Found”, are widely available on the web.
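The validity check can be approximated outside Aleph: collect every $u in the listed fields, request each URL, and report any status of 400 or above. This is a minimal sketch using Python's standard `urllib`, not Aleph's implementation. Fields are represented as hypothetical `(tag, [(code, value), ...])` tuples, and the status lookup is passed in as a function so it can be replaced, for example by a stub when testing without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional

# Fields whose $u is checked: the 856 plus the note fields listed above.
URL_FIELDS = {"505", "506", "514", "520", "530", "540", "545",
              "552", "555", "563", "583", "856"}


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for a HEAD request, or None if the URL is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:   # 4xx/5xx responses
        return err.code
    except (OSError, ValueError):           # DNS failure, timeout, malformed URL
        return None


def url_errors(fields, fetch_status: Callable[[str], Optional[int]] = http_status):
    """Return (tag, url, status) for each $u that is unreachable or returns >= 400."""
    errors = []
    for tag, subfields in fields:
        if tag not in URL_FIELDS:
            continue
        for code, value in subfields:
            if code != "u":
                continue  # only $u is checked; a URL in $a is never seen here
            status = fetch_status(value)
            if status is None or status >= 400:
                errors.append((tag, value, status))
    return errors
```

Some servers reject HEAD requests even when the page works, so a production check might retry with GET before reporting an error.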
Checking that a URL is Not in $a
The URL validity check described above only looks at URLs that are located in $u. It will never cause an error message when the URL is entered in the wrong subfield.
One of the more common 856 URL errors is a URL entered in $a by mistake. In fact, 856 fields rarely need $a at all.
For this reason, we have created a second check that reports an error when $a exists in the 856 field but $u does not.
This check is only run on field 856 because the 856 is overwhelmingly the most common location of URLs in our catalog records.
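The subfield rule is easy to state as code. A minimal sketch of the same logic, not Aleph's own check-table configuration, using the same hypothetical `(tag, [(code, value), ...])` field representation:

```python
def misplaced_url_fields(fields: list[tuple[str, list[tuple[str, str]]]]) -> list[int]:
    """Return the positions of 856 fields that have $a but no $u."""
    flagged = []
    for position, (tag, subfields) in enumerate(fields):
        if tag != "856":
            continue  # the check only runs on field 856
        codes = {code for code, _value in subfields}
        if "a" in codes and "u" not in codes:
            flagged.append(position)
    return flagged
```

An 856 with both $a and $u passes, as does a 245 with $a, since only the 856 is examined.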
The New Checks in Action
Both new checks run along with the other standard Aleph error checks, which are triggered in any of these ways:
- by saving the record
- from anywhere in the record, by going to the Edit Actions menu > Check Record
- with the cursor in the 856 field, by going to the Edit Actions menu > Check Field
We hope that these new features will be useful to you and aid in our ongoing efforts to make e-resources more easily accessible for end users.
Recent Posts
- Keyboard shortcuts in Alma 2020-06-03
- Serials Solutions MARC records being removed from Aleph this month 2019-07-09
- Activating CUNY Central collections in OCLC WorldShare Collection Manager 2019-07-03
- OneSearch | Highlights of May 2019 update 2019-06-06
- New workflow for OneSearch bug reports 2019-04-05